13 Dec 2007

My Database is a Monster!! Run Away!!

Firstly, I have to apologize. I have been busy with some personal things and also been looking for a new job, so I haven't had a lot of time to update my blog.

I have read something on Coding Horror (which is a really good blog, by the way) about whats called "analysis paralysis".
I myself, often get stuck in this stage. Especially in a paradigm-rich environment like java, which gives you many philosophies about how to do things more effectively, when I myself have to admit that it would take less time to just start to do some code and make corrections later. For the people, that have been following my blog, I do talk a lot about theory and not a lot about action. This is something I would very much like to change. So I plan to take my idea for "simplified business rules engine " and actually start working on it - as my pet project.
So I hereby commit myself to this idea and also to promise that I will try to cut down on my theory and take more action.


The Leap From a Regular Database to an AI Monster

When I came up with some of my previous ideas and did some research on them, I eventually ended up with systems that used rules to help them make sense of the data. I intended the rules to be added manually at the start of the development process by the user according to their business needs. But I later found out that there are other possibilities to add rules after you inputted your data and found patterns that you didn't see before.
This is the realm of rule-based systems, knowledge system, expert systems and leads to artificial intelligence.
All of these are very big and scary words, but in essence, they use rules taken from patterns in the data and try to make assumptions or give answers.

For example (and this is something from my college days), there are 3 levels of data, which I will explain:
  1. Data - red light, 01/12/2007 09:15am
  2. Information - I drove my car through a red light on the 01/12/2007 at 09:15am.
  3. Knowledge - after concluding that if you cross at a red light there is a 77% chance you will be in an accident - Do not cross when the traffic light is red to avoid being in an accident.

So as you can see here, we found "knowledge" from a data pattern. If we tell this to a car with some artificial intelligence, they would know that when the traffic light is red, they should stop the car.

Now, I am not suggesting that machines will discover pattern and create rules themselves. Deciding on the rules is still done manually (I think anyway), since you may have many different decisions to consider before adding a new rule.


Rules, Rules and More Rules

Going back to business rules, our applications has logic, in the form of rules.
A typical rule for a mortgage application might look something like this:
IF (number-of-30-day-delinquencies > 4)
AND (number-of-30-day-delinquencies <> ....

So you have all sorts of considerations for your business. You need this to be done when something else happened. Basically, you need If-then rules to help you make sense of it all.
You already use this if-then rules in your application anyway, so the knowledge that you discovered through out your business life is already there.


Basic Rules

Lets start with rules to help you see whats going on in the database. Some data is there in the database to give you indications of what actions to take.
For example:
We have a dataset of the dates which hotel rooms are available to be reserved. There is a column for "soldout", for "room_status" (available or not) and you have "rooms_available".
The rule would be:
if (soldout = false) and (room_status = true) and (rooms_available > 0) then room is availble

Everyday, your application runs thousands of scans on this dataset to find available rooms and each time, it needs to decide if a room is available or not.
You can place that decision in the database by moving the IF statement from your application to your database. Then you just do a search by "select available rooms" where the rule if the room is available or not is placed in the database.

So, not exactly rocket science, but moving the IF to the database has its advantages, which I have mentioned before but I will go over them quickly:
  1. You don't need to pass all the extra data to the application so that it can make its decision in the application layer. So you save network traffic
  2. Databases usually have pre-built functions that you can use "off the shelf" to manipulate data and saves you writing things from scratch in the application layer. (although, I am sure there are pre-built functions in your application layer too)
  3. You can manage the rules in the database in a more structured way instead of looking for it between many lines of code in the application layer.


Guesses

You can make guesses using rules. For example, if an animal has striped fur then it is a zebra.
But wait a minute! a tiger also has striped fur, so lets improve this rule.
If an animal has striped fur, hooves and jagged teeth then it is a zebra.
If an animal has striped fur, paws and pointy teeth then it is a tiger.

So you can make guesses, you just need to make sure they are right ones and also consider to recheck them every once in a while to see if they are still right.


Estimates

You can make estimates according to previous data. This is especially good with databases, since you have data to do that with.
For example: Compared to last year's sales data multiplied the yearly sales increase percentage - this store is under performing and we need to see to why that is.
The rule is: check last year's sales for this store on this month and multiply it by the yearly sales increase percentage. If this number is higher then the store's monthly sales, then report it as a problem.

So estimates can give you good early warnings or insight as to whats going on, but they can also be wrong because of some unreported data or new pattern. So you need to check and recheck the data.


Keeping It Simple

Try not to get scared by the big names and use rules to benefit your goals. If in the future, after having collected more data, you would like to make some more rules, try to do so but be very careful and try to establish that the data you got for your assumptions is correct.
For example:
Every day at 6pm I get a small headache for the last week. Why is this happening?
Well, I walked home and maybe the walk gave me a headache?
Before I left the office, I went to a shop and bought a can of coke and a dough nut.
Is it the can of coke that gave me a headache?
Is it the dough nut that gave me a headache?
Is it the combination of coke and dough nut that gave me a headache?
Is it the combination of coke, dough nut and walking a lot that gave me a headache?

So to get to a rule about how to stop getting headaches at 6pm each day can be many things and you need to do some tests before you can see what the conclusion is.
And even after you did some test, the headache could go away from something entirely different.......


Conclusion
1) Rules are the logic behind things
2) Knowledge is based on patterns of data
3) Rules help you understand the data and use it for your goal.


For my next blog, I will start to actually do some code. Not C code, because I don't know that, but I will do some MySQL functions. So I plan to eat my own dog food.

4 comments:

  1. or, perhaps, your own donuts? :)

    I look forward to seeing your progress.

    ReplyDelete
  2. Thank you Sheeri for your kind words.

    Also, I wish you best of luck with your new job at the pythian group.

    ReplyDelete
  3. Kinda blatant plug, but meant as earnest help: perhaps my GRAPH engine (http://openquery.com.au/products/graph-engine) could be of use for this kind of application?

    ReplyDelete
  4. @arjen
    Feel free to "plug" my blog anytime you like :)

    Regarding your graph engine, I saw it.. and I have to say it looks interesting.
    I need to take a look at it a bit more, but definitely well done on it.

    ReplyDelete