16 Oct 2017

Top 3 Reasons Why SQL is Faster than Java

I had a discussion with a colleague the other day. He was trying to write some SQL to use for a less-than-optimal data structure and was getting frustrated that it was looking "cumbersome". He wanted some advice, but was keen to simply write it with a mix of a few light SQL statements and some Java.

I would like to explain why this option would be slower than using "ugly looking" and "cumbersome" SQL:

1) Disk I/O

If you were to use Java, you would need to probably get a larger dataset from the database, process it in some way and output the results. This would mean that the database would need to fetch that larger dataset for you which would mean more (sometimes much more IO)

If you were to use SQL, you are leaving the fetching operation to the database's optimiser and with the help of indexes - may not fetch as much as with Java.

In short, you are allowing the database to reach the right data and filter what not to fetch - for you.

2) Network

For the reason above, the large dataset normally has to travel over a network. This is unless the java app server is located on the same machine as the database. This is not very common nowadays.

That network overheard can become more pronounced in a virtual or containerised environment where network issues can be a headache. (*note: I am not an expert, just observing from a distance)

In addition, needing to pass data through a network can be an additional overhead in parallel systems where the data needs to travel to each machine before starting an operation.

3) Java's Garbage Collection

It may not be known to most people, but Java adds quite the memory overhead for objects and some data structures. You can sometimes get a x100 difference. This does not mean that that you need x100 available memory, but it would mean that the GC would work extra hard with more CPU cycles to clean up the extra memory churn.

Bonus: SQL takes far less code than it would do in Java

While SQL can be an ugly string in your code sometimes, doing it in Java can take between x30-x100 more lines of code including tests. You may also need to test that your code does what SQL already does well such as JOINs and aggregate functions.

Caveat: When it is a good idea to use Java over SQL?

For processing a lot of data, Databases have the following concept:
Row vs Chunk vs Too Big

'Row' would be the slowest way of getting and processing data - unless you need to guarantee some level of data quality which requires it.
'Chunk' or a set of rows, is just right. Usually this would mean querying a large table by using 2 or 3 keys and get a result set that the database can handle well.
'Too Big' is a case where the database cannot handle well the number of rows and you would need to split your SQL into 'Chunks' using Java and process it that way.

Please also check out this book that I found useful in this matter:

No comments:

Post a Comment