3 Nov 2007

From Developing to Data-basing and back

So first of, I'd like to say thank you to David Edwards who wrote Log Buffer #69: a Carnival of the Vanities for DBAs - for writing about my blog!

It really amazes me how vital blogs are to my day to day life and work. Most of the blogs I read are by people who work in some area of computers and they give you a rare look at their life and how they solved problems in their field.
Its really first hand- first class information that benefits me immensely.

And now for my blog...


Most database users are not interested in your tables

Its true, they really don’t care. In fact, most developers have to "map" your tables to their application, meaning tell the program which table to look, which field to look for and if need be, convert it or manipulate it in some way.

For developers, the database is a sort of big headache. The SQL queries look scary and fragile. The tables seem to be connected to 50 other tables and they need to understand why. Add to that, making sure the database connection/session is turned on or off to save resources and improve speed.

All this adds many many more lines of code, which can lead to bugs and hair loss (from pulling out hair).

There are some tools out there, even very good ones that use Object-Relational mapping (ORM) which do help greatly with that, but there still is an issue of understanding the tables and how they are structured.

And as I said before, they are not interested in your tables. Database users or developers in my case, want some sort of result from the database and that’s it. Which to me means: that if there was an abstraction layer between the database and whoever wants to use the database, it might benefit everyone.

So how do we do that? Well, we can use views, stored procedures and functions. In a way, we should ask the users - how they want the data to be shown. After we know what it is they want, we can join tables, make small adjustments with a function or two and then store this method of showing the result in the database. Then all the user has to do is call on this method and start working.

Experience from my company

In my company, we have a hotel reservation system that searches for available rooms for the user and then allows them to book the rooms. We are using PHP to get data from the database, process it and then display the results. All in all, totally the standard way of doing things on websites, these days (and even Nati Shalom wont argue with that).

The problem is the way the original developers of the website wrote the code, they did it, largely how they understood it in their heads. For example:

1) Take the list of all the hotels in the database - by the location selected by the user
2) For each hotel, get the list of rooms they have.
3) For each room, check if the room can hold the people the user asked for (for example 2 adults and 1 child)
4) For each room, check if the room is available on the dates the user selected
5) Get the best price for the room (cheapest rate) to give user an idea of how much it will cost.
rate can be for example: with breakfast included, all meals included or no meals at all.

So you have location -> hotel -> rooms -> rates and then, get the price.

The problem with the way the PHP was written is that it asks FAR too many queries from the database. Each "for each" is a query in itself. This can get slow... very slow in fact.

Now even here, the original developers could have used some slightly complicated joins in 1 SQL to get results better. In examples 1 and 2, you could do a simple join right there. But the developers were thinking in a human-logical way of how they would process this in real-life.

Anyway, the way I got this system, it took 40 seconds to run this kind of search on all the hotels we had. Now, after serious tweaking to the database and some changes to the PHP code (I could only change certain things), it runs on 8 seconds with twice as many hotels as the 40-second example. But, I wanted to go further.


Go further, baby! Yeah!

I wanted to see what it would be like to run this program in the database, but only using 1 loooong SQL line. So I tried it out, just to see how it would be like and learn from it…

I got the same results from the database that the PHP got, but in 0.93 second!!!!oneone111!!oneone.

I was really amazed myself. I showed it to my bosses, but even before that when I suggested this kind of change, they told me they don’t want any major changes for the time being till all the other errors are solved. In any case, the database "did its thing" much much faster then the PHP could do.


Hey, wait a second here!

Now maybe I can hear some of you saying to yourselves, that this doesn’t make sense, PHP isn’t slow and that I am biased. Well look.... bad code and bad database design (which is very much the case here) will ALWAYS be slow. And if the original developers used better SQLs, it would have been much faster. This is also true.

In my particular example, I moved all the if's and business logic into the 1 (very long) SQL line and the database produced a very fast result. Now even if I break this down a bit and make 3-4 SQL lines, let the PHP hold the business logic and do it that way, then it is still perfectly ok. I know that many people absolutely hate the fact that business logic would be anywhere else then the application and heaven forbid it to be in the database.

You can do it anyway you are comfortable with, but the point is that I want to make is NOT to let the developers play with the tables themselves. In this case, they completely didn’t factor in that all they needed was just some results and not to load all the tables one by one to the memory, then process them.


My 2 cents

I actually like MySQL for writing some business logic inside of it. Why? because for me, its quite easy to do it. But the best part of all is that I can see how long it took the database to produce the results. For those who don’t know, in the MySQL query browser (which you get free from MySQL), there is a feature that tells you how many seconds it took to produce the results. I like this feature, because it lets me see straight away where there can be bottlenecks and where I can tweak things as I write them. Its really great feedback!


Conclusion

1) So as I was saying, database users do NOT want to see your tables. They just want the results.
2) You REALLY need to consider the database when you code, if you want to keep things fast.
3) (and I will even go as far as to say, you need to consider the code when you plan your database)
4) Get feedback how fast your SQL queries are and you can solve bottlenecks early on.


Thank you so much for taking the time to read my blog.

No comments:

Post a Comment