14 Dec 2008

Reporting SpeedUp Trick

I've been asked to make some reports recently for some people in my company.
Almost all of them are quite heavy and hit the database pretty hard.
The thing is that these reports are usually generated only once or twice a month, but they process a year's worth of data, and every time they run they really upset my poor production MySQL database.

I started wondering if there might be a better way: just save the processed data in a table and update it every now and then.

So.. this is what I did:
  1. I created a table that holds the aggregated results for the report.
  2. I prefixed the SQL statement with "INSERT INTO reports.thistable SELECT (the world's most intensive SQL query);".
  3. At this point I realised I needed a bit of extra help determining which parts of the data still had to be processed and which were already in the table. So I added a "last_updated" column to both tables, defaulting to CURRENT_TIMESTAMP.
  4. I changed the INSERT INTO statement to an INSERT ... ON DUPLICATE KEY UPDATE so it can add to the previous amounts in the aggregated table.
  5. I also made sure the reports table's primary key covers all the grouping columns (a composite key), so the statement knows which rows are in fact duplicates.
  6. Lastly, I added a WHERE clause that filters the data table's timestamp column against (SELECT the latest timestamp from the reports table). So, in theory, only fresh data gets pulled into the reports table. (There's a full sketch after this list.)
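
Putting it all together, here's a minimal sketch of the idea. All the names (a sales source table, a reports.daily_sales aggregate table, the sale_date / product_id / amount columns) are made up for illustration; the real SELECT is, of course, far heavier.

  CREATE TABLE reports.daily_sales (
      sale_date    DATE NOT NULL,
      product_id   INT NOT NULL,
      total_amount DECIMAL(12,2) NOT NULL,
      last_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                   ON UPDATE CURRENT_TIMESTAMP,
      -- composite primary key: this is what ON DUPLICATE KEY matches on
      PRIMARY KEY (sale_date, product_id)
  );

  -- MySQL won't let the INSERT's target table appear in a subquery of the
  -- same statement, so grab the watermark into a variable first
  SET @last := (SELECT COALESCE(MAX(last_updated), '1970-01-01')
                FROM reports.daily_sales);

  INSERT INTO reports.daily_sales (sale_date, product_id, total_amount)
  SELECT s.sale_date, s.product_id, SUM(s.amount)
  FROM   sales s
  WHERE  s.last_updated > @last   -- only rows changed since the last run
  GROUP  BY s.sale_date, s.product_id
  ON DUPLICATE KEY UPDATE
      total_amount = total_amount + VALUES(total_amount);

Since last_updated is set on insert and bumped on every duplicate-key update, MAX(last_updated) doubles as the high-water mark for the next run.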

This is fine for tables that only have data inserted into them. If existing rows get updated, you might need to use a trigger.
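
For what it's worth, such a trigger might look something like this, reusing the made-up tables from the sketch above and assuming a sale's date and product never change, only its amount:

  DELIMITER //
  CREATE TRIGGER sales_after_update
  AFTER UPDATE ON sales
  FOR EACH ROW
  BEGIN
      -- back out the old amount and apply the new one in the aggregate
      UPDATE reports.daily_sales
      SET    total_amount = total_amount - OLD.amount + NEW.amount
      WHERE  sale_date  = OLD.sale_date
      AND    product_id = OLD.product_id;
  END//
  DELIMITER ;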

What I plan to do is give users an option to clear the whole report table if they don't trust the data. In the future, though, I will stop that query from running during certain work hours to avoid upsetting the production database.
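
Clearing the report table would just be a TRUNCATE followed by a full re-run; with the table empty, the watermark in the sketch above falls back to '1970-01-01', so the same INSERT rebuilds the whole year from scratch:

  TRUNCATE TABLE reports.daily_sales;
  -- now re-run the SET @last / INSERT ... ON DUPLICATE KEY UPDATE from above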

