13 Nov 2007

Exploring Event-Driven Publishing

So I have been very interested in event-driven publishing and, as I have often discovered, Oracle has already beaten me to it, with a whole system in place before I had even thought about it.


What is Event-Driven Publishing?

Well, in short, it means that an event triggers the data or content to be published. This is unlike request-driven publishing, where the content is generated each time a user asks to see it. The event could be the database being updated with new data, which triggers all the relevant parts of the website to be updated accordingly. Because the work is done in advance, you save the processing time that each request would otherwise spend, and depending on the situation this can significantly increase speed.
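To make the difference concrete, here is a minimal Python sketch that isn't tied to any database. The class names and the `render_list` function are hypothetical. Each page counts how many times it has been generated.

```python
from typing import Callable

Renderer = Callable[[list[str]], str]


class RequestDrivenPage:
    """Builds the page every time a visitor asks for it."""

    def __init__(self, render: Renderer) -> None:
        self.render = render
        self.data: list[str] = []
        self.renders = 0

    def update(self, item: str) -> None:
        # An update only changes the data; nothing is rendered yet.
        self.data.append(item)

    def request(self) -> str:
        # Every request pays the rendering cost.
        self.renders += 1
        return self.render(self.data)


class EventDrivenPage:
    """Rebuilds the page when the data changes and serves the stored copy."""

    def __init__(self, render: Renderer) -> None:
        self.render = render
        self.data: list[str] = []
        self.published = render(self.data)  # initial publish
        self.renders = 1

    def update(self, item: str) -> None:
        # The update is the event that triggers publishing.
        self.data.append(item)
        self.published = self.render(self.data)
        self.renders += 1

    def request(self) -> str:
        # Requests just return the ready-made content.
        return self.published


def render_list(items: list[str]) -> str:
    """Hypothetical renderer: turns the data into an HTML list."""
    return "<ul>" + "".join(f"<li>{item}</li>" for item in items) + "</ul>"
```

Suppose there is one update followed by 100 requests. Both pages return the same HTML. The request-driven page renders 100 times. The event-driven page renders twice: once for the initial publish and once for the update.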

Now I did some more research (another useful article) and found that there are real benefits to event-driven publishing. However, it's not always a silver bullet. When the data changes often, or when the results vary greatly, it just adds more overhead and slows things down.
Its applicability heavily depends on the ratio between the frequency of data modification events and the Web client's requests. For example, if the events occur more frequently than the client's requests, this approach creates even more overhead on the server than request-driven publishing. Imagine a rarely visited Webpage that displays current server time with millisecond precision. Use the event-driven approach to publish such a page and your server will die long before the first curious client can enjoy the fruits of your labor.
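Here is a toy Python count that puts rough numbers on that ratio. The function name and the timeline format are my own illustration. It counts how many times each approach generates a page for a given sequence of updates and requests. It deliberately ignores how expensive each render is and the cost of storing the cache.

```python
from collections import Counter


def render_counts(timeline: list[str]) -> dict[str, int]:
    """Count page generations for a sequence of 'update' and 'request' events.

    Event-driven publishing renders once per update; request-driven
    publishing renders once per request. Initial publishing is ignored.
    """
    counts = Counter(timeline)
    return {
        "event_driven": counts["update"],
        "request_driven": counts["request"],
    }


# A clock page that changes every millisecond but is visited once a minute:
clock = ["update"] * 60_000 + ["request"]
print(render_counts(clock))  # {'event_driven': 60000, 'request_driven': 1}
```

Flip the ratio and the picture flips too. A page updated once and visited 500 times costs 1 render event-driven and 500 renders request-driven.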


How can I use Event-Driven Publishing with MySQL?

Well, this goes back to my previous article about chunking things down into blocks. If I take the dynamic parts of my web page and separate them into blocks, I can construct the blocks in advance and save them in the database, ready to be called on.

How do I do that?
Well, this approach requires the database to hold some of the application logic. Not everyone likes this, and many articles have called it "evil". My view is that databases should produce results, because no one is interested in just looking at tables. I also feel that databases handle data really well, and I would rather use the database than code caches, arrays and logic that manipulate data. In fact, it would be easier and faster (in my opinion) to let the database handle "its own data" internally and just give me the results I'm after.


Step 1 - Integrating

So the first step is to accept that I need to take HTML code and glue it to the relevant data. I don't think there is any way around it. What I would like to do is use some sort of configuration table that saves this as a long SQL line in the database, like this:

select orders.customerId AS customer_id,
       concat('', orders.idOrders, '', customers.name, '(', customers.email, ')', sum(orders.total_cost), '') AS result
from orders join customers on (orders.customerId = customers.id)
group by orders.customerId

(yes, I also think it's very attractive)
So what we have here is an SQL line with HTML code in it, which produces results that can be dropped straight into the page. I would prefer the PHP developer or designer to show me how they would like the data to be displayed, and then we can work out how to write this long SQL line.
Now, this is the only part that looks ugly. Later stages get much easier.
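Below is a runnable sketch of the same idea. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, so a few details differ from the query above:

- The HTML markup inside the original `concat()` call didn't survive, so the `<tr>`/`<td>` tags here are hypothetical.
- SQLite concatenates with `||` rather than MySQL's `concat()`.
- I've left out `orders.idOrders`, because a single order id isn't well defined once the rows are grouped per customer.
- The sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (idOrders INTEGER PRIMARY KEY, customerId INTEGER, total_cost REAL);
    INSERT INTO customers VALUES (1, 'Ann', 'ann@example.com'), (2, 'Bob', 'bob@example.com');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

# One ready-made HTML row per customer; the <tr>/<td> markup is illustrative.
ROW_SQL = """
    SELECT orders.customerId AS customer_id,
           '<tr><td>' || customers.name || ' (' || customers.email || ')</td>'
           || '<td>' || printf('%.2f', SUM(orders.total_cost)) || '</td></tr>' AS result
    FROM orders JOIN customers ON orders.customerId = customers.id
    GROUP BY orders.customerId, customers.name, customers.email
    ORDER BY orders.customerId
"""
rows = conn.execute(ROW_SQL).fetchall()
for customer_id, html_row in rows:
    print(customer_id, html_row)
```

The application never touches the individual columns. It just receives ready-made rows that the designer's template can drop in as is.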


Step 2 - Saving and Caching

At this stage, things become surprisingly flexible. You can either save one line of ready-made HTML per row, or save a number of HTML lines together as a block.
If you join the lines together (with group_concat() or a stored procedure with a cursor), you can create a complete block that is shown on the page as is.
From then on you only need to focus on these blocks: whether they are up to date, and when to use and reuse them.
So again, you can save:
  1. Ready-made one-liners.
  2. Blocks, i.e. a few one-liners grouped together.
  3. Whole pages, assuming they don't change often.
Now, what you need to do is create a memory table. Remember, this is your cache. Then you save the lines, blocks or pages in this memory table and wait for the user to call on them.
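Here is a minimal sketch of the three cache levels, again with `sqlite3` standing in for MySQL. An in-memory SQLite database plays the role of a MySQL `ENGINE=MEMORY` table. The `html_cache` table, its columns and the key names are hypothetical. The block is built with `group_concat()`, as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL ENGINE=MEMORY table
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    -- Hypothetical cache table: one row per ready-made line, block or page.
    CREATE TABLE html_cache (
        cache_key TEXT PRIMARY KEY,
        kind      TEXT CHECK (kind IN ('line', 'block', 'page')),
        html      TEXT NOT NULL
    );
""")

# 1. Ready-made one-liners, one per customer.
conn.execute("""
    INSERT INTO html_cache (cache_key, kind, html)
    SELECT 'customer:' || id, 'line', '<li>' || name || '</li>' FROM customers
""")

# 2. A block: the one-liners joined with group_concat().
#    (SQLite does not formally guarantee the concatenation order.)
conn.execute("""
    INSERT INTO html_cache (cache_key, kind, html)
    SELECT 'customer_list', 'block', '<ul>' || group_concat(html, '') || '</ul>'
    FROM (SELECT html FROM html_cache WHERE kind = 'line' ORDER BY cache_key)
""")

# 3. A whole page built around the block.
conn.execute("""
    INSERT INTO html_cache (cache_key, kind, html)
    SELECT 'page:customers', 'page', '<html><body>' || html || '</body></html>'
    FROM html_cache WHERE cache_key = 'customer_list'
""")
conn.commit()
```

Keeping all three levels in one table keyed by name means the application only ever does a lookup by `cache_key`, whatever the granularity.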


Step 3 - Triggers and Events

So now you have your cache ready, but what happens when an "event" happens and someone updates the database?
Well, basically, you need to delete the cache that's no longer relevant and recreate it. For this you can use triggers in the database.

Triggers, from what I understand, run in the same thread as your connection to the database, so in effect they are just add-ons to your SQL statement. This means the user who entered the data has to wait until all the triggers and the original SQL statement have finished before continuing. This can slow things down. Essentially, it moves the wait from the user who requests the data to the user who updated it.
This can be good or bad, depending on the situation. Take my case: a user comes online, and the other members of that user's group need to know about it. The user then has to wait while the cache for each member of the group is deleted and regenerated, so logging in might take a long time.
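Here is a small sketch of trigger-based invalidation, once more using `sqlite3` as a stand-in. The trigger name and the invalidation rule are hypothetical: a new order makes that customer's line and the summary block stale. SQLite, like MySQL, runs the trigger as part of the statement that fired it. So by the time the `INSERT` returns, the `DELETE` has already happened, and that is exactly where the writer's extra wait comes from. MySQL's own syntax differs slightly (it needs `FOR EACH ROW`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (idOrders INTEGER PRIMARY KEY, customerId INTEGER, total_cost REAL);
    CREATE TABLE html_cache (cache_key TEXT PRIMARY KEY, html TEXT NOT NULL);

    -- Hypothetical invalidation rule: a new order makes that customer's
    -- line and the summary block stale, so both are deleted.
    CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
    BEGIN
        DELETE FROM html_cache
        WHERE cache_key IN ('customer:' || NEW.customerId, 'customer_list');
    END;
""")

conn.executemany("INSERT INTO html_cache VALUES (?, ?)", [
    ("customer:1", "<li>Ann: 12.50</li>"),
    ("customer:2", "<li>Bob: 3.00</li>"),
    ("customer_list", "<ul><li>Ann: 12.50</li><li>Bob: 3.00</li></ul>"),
])

# The trigger runs inside this statement; the caller waits for it.
conn.execute("INSERT INTO orders (customerId, total_cost) VALUES (1, 4.0)")
remaining = [key for (key,) in conn.execute("SELECT cache_key FROM html_cache ORDER BY cache_key")]
print(remaining)  # ['customer:2']
```

If the trigger also regenerated the deleted entries, that work would be added to the same wait. The next step moves it out of the writer's way.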

So that the user doesn't have to wait so long, the database needs to run this work in a different thread. The best way I have found to do this is to use events (which have only been available since MySQL 5.1.6).
You can use the triggers to delete the cache and then have an event recreate it, so the user doesn't have to wait. Then you can really call this "event" driven.
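One way to hand the work from the trigger to the event is a queue table. The trigger deletes the stale entry and queues its key, and the event later rebuilds whatever is queued. SQLite has no event scheduler, so in this sketch `run_rebuild_event` is a plain Python function standing in for a scheduled MySQL event. The table, trigger and function names are all hypothetical.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE html_cache (cache_key TEXT PRIMARY KEY, html TEXT NOT NULL);
    -- Hypothetical work queue: cache keys that need rebuilding.
    CREATE TABLE rebuild_queue (cache_key TEXT PRIMARY KEY);

    -- The trigger stays cheap: drop the stale entry and queue a rebuild.
    CREATE TRIGGER customers_after_update AFTER UPDATE ON customers
    BEGIN
        DELETE FROM html_cache WHERE cache_key = 'customer:' || NEW.id;
        INSERT OR IGNORE INTO rebuild_queue VALUES ('customer:' || NEW.id);
    END;
""")


def build_customer_line(conn: sqlite3.Connection, customer_id: int) -> str:
    """Generate one ready-made HTML line (hypothetical markup)."""
    (name,) = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return f"<li>{escape(name)}</li>"


def run_rebuild_event(conn: sqlite3.Connection) -> int:
    """Stand-in for a scheduled MySQL event: rebuild every queued entry.

    Returns the number of cache entries rebuilt.
    """
    keys = [key for (key,) in conn.execute("SELECT cache_key FROM rebuild_queue")]
    for key in keys:
        customer_id = int(key.split(":", 1)[1])
        conn.execute(
            "INSERT OR REPLACE INTO html_cache VALUES (?, ?)",
            (key, build_customer_line(conn, customer_id)),
        )
        conn.execute("DELETE FROM rebuild_queue WHERE cache_key = ?", (key,))
    conn.commit()
    return len(keys)


# The UPDATE only pays for the cheap trigger...
conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.execute("INSERT INTO html_cache VALUES ('customer:1', '<li>Ann</li>')")
conn.execute("UPDATE customers SET name = 'Anne' WHERE id = 1")
# ...and the expensive rebuild happens later, when the "event" fires.
rebuilt = run_rebuild_event(conn)
```

In MySQL the scheduled part would be an event. For example, `CREATE EVENT rebuild_cache ON SCHEDULE EVERY 1 MINUTE DO CALL rebuild_cache_proc();`, where `rebuild_cache_proc` is a hypothetical stored procedure doing what `run_rebuild_event` does. The event scheduler also has to be switched on with `SET GLOBAL event_scheduler = ON`.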


Step 4 - Final Checks

I think all of this answers the problem. Still, there should be a check on each request: if the cache is empty, create it while the user waits.
The way I see it, the application communicates with a stored procedure. The stored procedure first checks the cache to see if the data is there. If it is, it gives it back to the application. If not, it runs the function that generates the data and then gives it back to the application. This should be the same function that the event uses. This is similar to the way Oracle's WebCache does it (but I'd like to think that I came up with it first).
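Here is the check-then-generate logic as a minimal sketch. It is written as Python application code over `sqlite3` rather than a MySQL stored procedure. The `BUILDERS` registry and the function names are hypothetical. The key point is that the generator is shared: the scheduled rebuild and a cache miss both call `build_customer_list`.

```python
import sqlite3
from html import escape
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE html_cache (cache_key TEXT PRIMARY KEY, html TEXT NOT NULL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
""")


def build_customer_list(conn: sqlite3.Connection) -> str:
    """Shared generator: the scheduled rebuild and a cache miss both call this."""
    names = [name for (name,) in conn.execute("SELECT name FROM customers ORDER BY id")]
    return "<ul>" + "".join(f"<li>{escape(name)}</li>" for name in names) + "</ul>"


# Hypothetical registry mapping cache keys to their generator functions.
BUILDERS: dict[str, Callable[[sqlite3.Connection], str]] = {
    "customer_list": build_customer_list,
}


def get_block(conn: sqlite3.Connection, key: str) -> tuple[str, bool]:
    """Return (html, served_from_cache), building and storing the block on a miss."""
    row = conn.execute("SELECT html FROM html_cache WHERE cache_key = ?", (key,)).fetchone()
    if row is not None:
        return row[0], True
    markup = BUILDERS[key](conn)  # the user waits only on this path
    conn.execute("INSERT OR REPLACE INTO html_cache VALUES (?, ?)", (key, markup))
    conn.commit()
    return markup, False
```

The first call to `get_block(conn, "customer_list")` builds the block and stores it. Later calls return the stored copy until a trigger deletes it. Two simultaneous misses would both build the block, but `INSERT OR REPLACE` makes that harmless.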


Conclusion

1) To speed your website up considerably, you will probably need to use a combination of event-driven and request-driven publishing. Many people (correct me if I'm wrong) do not use event-driven publishing, so knowing about it will give you a considerable advantage.
2) Event-driven publishing can be more complicated than regular request-driven publishing.
3) Both methods get the job done. When you combine them, what you are really doing is managing when, and how often, the server runs its processes, so as not to keep the user waiting.


Thank you for taking the time to read my blog.