MariaDB's Columnar Store

I have been keeping an eye on MariaDB's Columnar store progress for a bit longer then half a year.

MariaDB chose to take the infinidb code after Calpoint closed shop about two years ago and implemented it into their product. I was a bit wary about infinidb as well as it was a columnar store without compression that had mixed reviews on news hacker.

However, it seems like MariaDB have pulled it off. They have added the infinidb engine to MariaDB with all its quirks and extra commands and they have added snappy compression as well. This is truely a huge win for them and their users, specifically in the area of reporting and analytics.

Here are two real life examples for getting data ready for reporting currently happening in the wild:
1) MySQL -> Sqoop -> Hadoop - where you would need a) 5-6 additional servers, b) someone to set those servers up in a hadoop cluster and then c) monitor the daily data transfer.
2) MySQL -> CDC -> Kafka -> Hadoop - a) more technologies to master, b) a few more servers and some c) more monitoring. But this time, its streaming.

To set all of this up could take from a couple of months to a year.

Now with MariaDB, you have:
1) MariaDB + some SQL scripts -  such as INSERT INTO datawarehouse.facttable SELECT ... FROM site.table1 WHERE date >= curdate() - interval 1 day;
2) MariaDB -> Pentaho/Talend -> MariaDB - Could be a bit slower, but with a GUI and really a lot of monitoring out of the box.

As you can see, there are a lot fewer technologies, a lot fewer complexities and it is a lot more straight forward to develop.

It is also very important to add that no one other than MariaDB is doing this. The closest you have is Tokudb which is great and can also last you a while, but a Columnar store is definitely more suited for this type of task.

So once again, congratulations to MariaDB for offering such a huge benefit to its users.

Using a Generated Column to help with date lookups

I have a table that has two columns: year and month.
While its ok to search on exact dates or by year, it is harder to search between two dates.

Lets see if we can solve this issue by using a generated column.
(Table taken from https://github.com/jonathanvx/uk_land_registry_paid_dataset)


JSON and MySQL Stored Procedures

You probably heard that MySQL 5.7 supports JSON.
But did you know that you can also use JSON with MySQL Stored Procedures - making them very flexible?

Less talk, more code:
(Data was used from the UK Land Registry that I worked on)