3 Sep 2008

Decisions... Decisions

In my last blog post, I discussed storing conditions in the database. For a quick reminder, I suggested that you store all the under-lying conditions (the if-then-else statements) from your code in your database. From my example, I used "did the sales person sell more then average for that day" ( sold_more_than_average BOOLEAN) and the answer would be True or False.

In this post, I would like to discuss connecting those same conditions with a decision table.
Before I get started, I would like to take a moment and reminisce about some of the previous posts I talked about decision tables to help explain them better.


Decision Tables and Your Database - November 25, 2007
Where I talk about decision trees and decision tables.

Using Business Rules in MySQL - February 20, 2008
I have a very graphical example of how to use decision trees and tables in that post.

Decisions in Your Tables - June 30, 2008
My most up to date and well thought out explanation for understanding and implementing decision tables.

Storing Conditions - August 25, 2008
This was the prelude to this post. It explains the reasoning behind simply storing your conditions, while this chapter explains how to make further use of them along side decision tables. 


So What is the Trick?
The trick is basically, you take the conditions you have stored and do a search in the decision table to get only one result back.
Confusing? read on. It will become clearer.


Example

Now I would like to show how to integrate conditions stored in the database with decision tables. Once again, I will do so with less words and more examples.


Story - Maintenance and Cleanliness in a Hotel

An owner of a small guesthouse would like to manage better the cleaning woman and how often she should clean the rooms of the guesthouse. After discussing the issue with her, they decide that she will clean each room, once every 2 days under the condition that there are guests using the room (don't look at me, I was in a guesthouse that the cleaning lady cleaned once every 3 days or whenever she felt like it).

The owner would also like the cleaning lady to clean rooms before new guests come in and straight after they leave so that people can view a clean room when no one is using it. Also, if there is something wrong with the room (like a broken sink) then there is no point to clean the room until it is fixed.

The owner would also like to keep track of booked and available rooms in this system.
Using MySQL WorkBench I designed the database model and it looks like this


Bare with me for the explanation. Lets take a look at the rather multi-columned Rooms table:


As we can see, at the top is the standard data that you would need for the table - room id and the day. I also needed some more input from the user, so I added those 2 extra fields.
The conditions are what I talked about taking the if-then-else statements and putting the results in your database.
Is_room_booked - is the room booked or will have people using the room on that day.
Is_first_day and Is_last_day - are conditions to know if the room needs a special kind of cleaning. First day needs to be prepared for visit and cleaned. On the last day, the room needs to be cleaned for viewing only (which could mean, no towels and toilet paper are left in the room).
has_been_cleaned_yesterday - has to do with the arrangement for the cleaner that she wont have to clean the room everyday. If she did clean yesterday, she doesn't have to today. If she didn't clean yesterday, then she does today.

The decisions are what you want to do with the data you have. Once you have the conditions, you need to query the decision tables. The decision table holds the combinations of the conditions and decisions. The query will then give you a result of action you should take.
When you have all the combinations pre-defined, you have no side effects. That is to say, if you have if-then statement (that you wrote it quickly), you might have missed a side effect when you didn't use an ELSEIF statement for a certain condition.



(Please pretend that 0 is False and 1 is True)
Here we have the decision table with all (or most) of the conditions that might happen and their results. Notice the primary keys at the top of the table to ensure that there wont be 2 conditions twice in the table (another benefit to use relational database with decision tables). Please also notice that I could have just joined the conditions to this table, but I decided to keep a copy of the decision in the main table so that if any changes are made, the old results will still remain in the main table.


So What was that Trick Again?

As I mentioned before, you take the conditions from the table, you use those conditions to do a search in the decision table. So you can do: 
SELECT * FROM decision_maintenance WHERE is_booked = True AND....
You can then insert the results into the rooms table, since you should only get one result. You can even set up a Trigger in the database to do it for you.

Small Note - If you forgot to input a line in the decision table, you can make a function (in any language) that if you do not get a result from the decision table that it should add a new line and send an error to the user to choose a decision for those conditions. 


And Now, The Code

If you don't need to see this part, skip ahead.
(Code from MySQL WorkBench - p.s. I had an error on a constraint with this program. I had to remove it manually)
-- -----------------------------------------------------
-- Table `mydb`.`customers`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `mydb`.`customers` ;

CREATE TABLE IF NOT EXISTS `mydb`.`customers` (
`idcustomers` INT NOT NULL AUTO_INCREMENT ,
`customer_name` VARCHAR(45) NOT NULL ,
PRIMARY KEY (`idcustomers`) )
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `mydb`.`rooms_info`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `mydb`.`rooms_info` ;

CREATE TABLE IF NOT EXISTS `mydb`.`rooms_info` (
`idrooms` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT ,
`room_name` VARCHAR(45) NULL ,
PRIMARY KEY (`idrooms`) )
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `mydb`.`rooms`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `mydb`.`rooms` ;

CREATE TABLE IF NOT EXISTS `mydb`.`rooms` (
`idrooms` TINYINT UNSIGNED NOT NULL ,
`date` DATE NOT NULL ,
`idcustomers` INT UNSIGNED,
`has_been_cleaned` BOOLEAN NOT NULL ,
`has_some_problem` BOOLEAN NOT NULL ,
`is_booked` BOOLEAN NOT NULL ,
`is_first_day` BOOLEAN NOT NULL ,
`is_last_day` BOOLEAN NOT NULL ,
`has_been_cleaned_yesterday` BOOLEAN NOT NULL ,
`need_cleaning` BOOLEAN NULL ,
`prepare_for_visit` BOOLEAN NULL ,
`available` BOOLEAN NULL ,
PRIMARY KEY (`idrooms`, `date`) ,
INDEX fk_customers (`idcustomers` ASC) ,
INDEX fk_rooms (`idrooms` ASC) ,
CONSTRAINT `fk_rooms`
FOREIGN KEY (`idrooms` )
REFERENCES `mydb`.`rooms_info` (`idrooms` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `mydb`.`decision_available`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `mydb`.`decision_available` ;

CREATE TABLE IF NOT EXISTS `mydb`.`decision_available` (
`is_booked` BOOLEAN NOT NULL ,
`has_some_problem` BOOLEAN NOT NULL ,
`available` BOOLEAN NULL ,
PRIMARY KEY (`is_booked`, `has_some_problem`) )
ENGINE = MyISAM;


-- -----------------------------------------------------
-- Table `mydb`.`decision_maintenance`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `mydb`.`decision_maintenance` ;

CREATE TABLE IF NOT EXISTS `mydb`.`decision_maintenance` (
`has_been_cleaned_yesterday` BOOLEAN NOT NULL ,
`has_some_problem` BOOLEAN NOT NULL ,
`is_booked` BOOLEAN NOT NULL ,
`is_first_day` BOOLEAN NOT NULL ,
`is_last_day` BOOLEAN NOT NULL ,
`needs_cleaning` BOOLEAN NULL ,
`prepare_for_visit` BOOLEAN NULL ,
`prepare_for_view` BOOLEAN NULL ,
PRIMARY KEY (`has_been_cleaned_yesterday`, `is_booked`, `is_first_day`, `is_last_day`, `has_some_problem`) )
ENGINE = MyISAM;


INSERT INTO `mydb`.`decision_maintenance` VALUES (False,False,False,False,False,False,False,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (False,False,False,False,True,False,False,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (False,False,False,True,False,True,False,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (True,False,False,False,False,False,False,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (False,True,False,True,False,True,True,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (True,True,False,True,False,True,True,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (False,False,True,True,False,True,False,True);
INSERT INTO `mydb`.`decision_maintenance` VALUES (True,False,True,True,False,True,False,True);
INSERT INTO `mydb`.`decision_maintenance` VALUES (True,False,False,False,True,False,False,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (True,True,False,True,True,False,False,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (True,False,True,True,True,False,False,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (False,False,False,True,True,False,False,False);
INSERT INTO `mydb`.`decision_maintenance` VALUES (False,True,False,True,True,False,False,False);

INSERT INTO `mydb`.`decision_available` VALUES (True,False,False);
INSERT INTO `mydb`.`decision_available` VALUES (False,True,False);
INSERT INTO `mydb`.`decision_available` VALUES (False,False,True);

Conclusion - Some Pro's and Con's
1) Saves CPU usage by using more hard disk space
Pro: Saves hard disk usage and CPU usage from repeated processing. 
Con: Uses more space on the hard disk and uses the CPU (probably when you insert data) to processes it one time 
2)  Helps with data analysis
Pro: Simplifies OLAP and BI operations
3) Leaves clues about how the data was processed. 
Pro: Helps a lot with debugging and possibly documentation.
Con: Keeps extra data that might not be needed for later. 
4) Helps with the design of the database as well as with the code. 
Pro: While planning the database design for the example, I ran into many situations where I re-did some of the design during the actual design process. This was because I had to think ahead for which conditions I might need for the decisions. I found this far more helpful then designing the database first, then finding out during the coding phase, that I need to make changes to the database.
5) Helps manage decisions
Pro: Allows you to "configure" your decisions by making simple changes to the table. As apposed to looking through your code to make the changes.
6) Leverages regular database search functionality to use your decisions
Pro: You can use regular SELECT statements to find out which decision to take.
Con: Even though you can quickly find which decision to take, you cannot take the action needed. The database will just give you the data and you might have to set up CASE or IF statements in your code to know what to do with the decisions.

Some more points which are arguable..
7) Is database and programing language agnostic.
You are not locked in to database vendors or "slow" programming languages, because you can relatively easily change either database or programing language. The database just keeps the data, the programming language is restricted to very defined data it needs to provide to it and the actual decisions/rules are mostly kept in the database but as data.
8) Scales
Using single table is faster then using JOINs. If you have a table with everything you need inside of it, then in theory, you could scale better.  As well as the fact that you might not need to process data again and again. I understood that facebook uses very very few joins in their code and that everything is in single tables (they might duplicate the data to keep it that way).
9) Keeps the data cleaner
Only if you setup conditions for that to happen, but could still help analyze the data for integrity.
10) Develop software faster
In connection to point 4, but a shameless statement that would need much more proof to back it up.


Bibliography

Decision tables have been studied in academia since the 80s. Here are some papers on the subject touching on different fields, from data mining to artificial intelligence:




If you have any feedback about my idea, I would be glad to hear it.