10 Jun 2016

How to Speed Up the Database Behind Your API

I have changed my tune and jumped on the bandwagon for having APIs, BaaS, mBaaS, etc..
I like the idea, the flexibility and the potential it has.
As such, I have done some research and will now provide some practical advice on how to speed up the database that powers your API.


1) ORMs vs YeSQL - Speed


I have recently taking a liking to Clojure. Clojure uses something very simple and elegant to connect to databases and that is YeSQL. YeSQL is basically an SQL template engine which largely separates  SQL queries from the code. In an SQL file, the SQL queries are kept with annotations and those annotations can be used as the names of the functions in the code.
This keeps the code clean and the SQL queries separate.

There have been discussions about why using SQL is more performant than an ORM.
I will add that in my professional opinion as a DBA, ORMs will:
  1. Make things slower
  2. Give you issues with the relational/object mismatch
  3. Make it difficult to find where the performance issue is
  4. Make it harder to fix the ORM query after you/I find it in the database (is it easy for you to change an ORM generated query to include an index hint? could you change a join to a specific sub-query?)
The truth is that SQL is an extremely powerful DSL for reading and manipulating data and is very mature and thoroughly tested.


2) ORMs vs YeSQL - Teamwork and Monitoring


With YeSQL, you can write your SQL query with a few parameters like: WHERE date = :startdate
and your application can fill those parameters at runtime.
If you should need to make changes to your SQL queries,  as long as you don't change the parameters themselves, you do not need to update your application code.

Stringing together that idea of easily updating your SQL queries and separating your SQL code to an sql file, you now have 2 very substantial features:
  1. You can ask me, a database performance engineer to make the queries run faster by providing only your sql file. I can give you back the sql file with faster performing queries and as long as they have the same parameters and produce the same result sets, you don't need to make any further changes in your application code. This means you can easily add different members of a team and different experts to work on your system.
  2. Depending on where you host your API, you may NOT HAVE access to performance information from the database. So knowing which SQL queries to tune, will help you find the worst performing ones.
Some of the hosting providers will only give you performance information about how long your API takes to answer calls. Some might give you more access to the database or maybe the MySQL performance schema. It varies on the hosting provider, and it would be helpful to keep your queries in a nice and clean place, just in case you need to find the worst performing ones.


3) More Monitoring


I recommend you set thresholds for your database queries and log those somewhere. For example, log all queries + their parameters to a log file or a database table if they take longer than 0.5 seconds. If you have N+1 queries, then maybe log the sum of those set of queries.
This will help you a lot later on and will help you be more proactive in finding issues and depending on your vendor, you may not be able to get this information any other way. You may also be able to connect that information to a alert and monitoring software. 


4) Caching - Memcached


Depending on where you host your API, you may have the ability to have a caching layer sit in front of the database. I would recommend this and it would prevent your application hitting (maybe even literally) the database too many times.


5) Caching - Varnish, Nginx


Depending on where you host your API, you may have the ability to cache the calls of the API. In most cases that I have researched, this is usually a default option. If not, then I would recommend that you add it, but I am guessing that you probably knew that.


6) Caching - Summary tables, Count tables


One of the strategies which you can use as an engineer are creating new tables that have summarised information. You can then query those tables to get faster response times. I have a video about this in the context of big data, but it very much applies here as well as a form of caching.
Consider generating and maintaining tables that have a summarised version of your existing tables and that are substantially smaller. You can also "enrich" those summarised tables with data that will prevent you from doing more JOINs. Then fill those tables in a batch job when needed.

This strategy is particularly useful for improving response time for reporting queries on projects that you inherited with highly normalised database tables.


7) Data Modelling - Do it Properly


Take the data modelling part of the application very seriously. I am a bit biased about this but, consider bringing an expert at the beginning or early in the project to help plan your database tables to be optimal.
Take your time to think about how you would run queries against those table in the long term.
What would be the best way to store and query the data for the requirements now and in the possible future?
Don't just stick everything in a JSON/text and hope for the best.
Don't just normalise it to the 3rd, 4th, 5th, 6th normal form.
Plan it carefully.


Some sites you may want to look at