First of all, I hope the title isn’t too catchy or misleading. There are no magic secrets here, actually, it’s all pretty common stuff for anyone that has some experience with any DBMS. But that is, in itself, part of point of this article: that sometimes the basic stuff can be overlooked.
(Editorial note: For privacy reasons I changed the names of several things in order to keep the client identity and industry as less-evident as possible)
I won a contract to build a platform for a client. This was to be used internally but also as a way for the client’s clients to access and analyze data. They had a pretty old platform in place that was aging quite fast and lacked useful features. So they hired a company to get them a more modern platform with a flashy UI. Unfortunately, this company didn’t build anything new, rather they used something they already had and made lots of tweaks here and there to try and fit a round peg in a square hole. The end product was interesting at first, but in a short period of time it began to show its weaknesses. So again the need came to actually get something built from the ground up that met the needs of the present and was ready for the future. This is when I come in.
The main table of the database holds millions and millions of rows. Although the users are not that many, the amount of data being stored every hour is huge. This means the database grows a couple of Gb every single month. But the main problem of the database was not the storage space it needed – I was told – but that it was so big that needed to be as performant as possible.
The main queries that are done, which retrieve data to be analyzed or compared over spans of time, could take anywhere from 30 to 180seconds, depending on the time span of the query. You read that right, seconds. It was painful to use it.
Because they were still maintaining the old platform plus the current one, they had a mixture of MySQL and Postgres running. When I proposed to get a fresh start and merge everything together in just one MySQL database I felt some doubts were raised as if I really knew what I was doing. After all, when performance is an issue postgres is the answer. Right? Not necessarily!
Step 1 – Actually utilize the resources you have
Early on in the development process I got access to the current server so I could look at how things were running. The server had a nice 24Gb of RAM, a recent upgrade was done (I was told) from 16Gb, in order to try and improve things. There was so much of stuff being ran at the same time for this platform to work! I never got to understand properly the whole architecture of it but there was a lot of java everywhere. In spite of the RAM increase, there was no noticeable difference in performance.
Upon some looking here and there, I noticed that about 90% of the RAM was free. Does that mean that the recent upgrade was totally mislead and unnecessary? Yes, it does!
Worse than that, I ended up finding out that the company that installed the server and platform didn’t even bother to do a proper setup of the MySql server. That means that it was using default values! For instance, innodb buffer pool size was 128Mb! No one wonder MySql wasn’t touching all that available RAM. It was basically a very costly and nonsense server upgrade. Mainly because the server doesn’t even get that many requests.
Step 2 – Can you believe it?
I got a partial dump of the database so that I could run some tests on my dev machine. I had the queries that were run most of the time, in order to make sure I was targeting a good performance increase.
On my machine I could get slightly faster query results, but still in the order of about 1min, on average.
My second shock of horror was when I discover that this table that holds all the important data had the usual auto-incremented primary key, but no indexes whatsoever. No wonder the queries were taking too long!
The queries were most of the time something similar to
SELECT [cols...] FROM [datatable] WHERE user_id = ? AND location_id = ? AND datestamp BETWEEN ? AND ?
This was the way to fetch all relevant data from a specific location, from a user, during a span of time. But sometimes, for another kind of analysis, it would drop the location and filter only by user and datetime.
The priority for using an index was then User_id 🠖 datestamp 🠖 location_id
Of course you can create an index for all of these columns, but in a table this big it is also important to keep in mind that indexes can take up a lot of space so you don’t want too many of them.
I ended up creating a composite index, taking into consideration the priority above and that MySql uses the keys from left to right.
That means if the index is [user_id, location_id, datestamp] but location_id is not part of the query, then MySql won’t be able to use this index even if datestamp is part of the query.
The composite index ended up as [user_id, datestamp, location_id] which is somewhat complex but is able to match exactly the kind of queries the platform needs to do. In the event of wanting to retrieve all data from a user in a certain span of time (ignoring location_id), MySql can ignore the last key because it is on the right side of the index. And because data is never fetched without specifying a clear time period, the first two keys are always used.
After changing the table accordingly I ran a couple of tests with some of the most demanding queries. Data was now fetched always under 1 second! It was like night and day.
I decided to give this a try in a very weak server. I created a new VPS in Digital Ocean and opted from the lowest of the lowest: 1vCPU, 1Gb Ram. Actually faced some troubles to get npm to work due to memory constraints (sigh), but after that was done, everything was still blazing fast.
This was only used for demos and eventually the client created their own new VPS to host the new platform once it was done, at a 90% reduced yearly-cost, compared to the current one.
So this is my lesson learnt, coming straight from the trenches: never overlook the simple stuff, and never think that others have done the simple stuff too!
So, to conclude
– Don’t assume the basics have been taking into consideration when you are handling work done by someone else
– Check all the default values and adjust accordingly to your server and needs
– Plan your indexes according to the queries you are supposed to execute