19 Apr 2012

MariaDB's Virtual Columns

I wanted to evaluate MariaDB's virtual column and see if it could store business rules next to table data. With virtual columns this could be done and if you specify as 'virtual', would not even take up space on disk.

The imaginary use case is that there is a sales team in a company and you would like to evaluate if a particular salesperson is eligible for a bonus. In order to get a bonus, you need to sell above the average for the day and be in the top 5 amongst the salespeople.

So here is what I did:

MariaDB [test]> create table salespeople (id int unsigned not null auto_increment primary key, salesperson_id int unsigned not null , `date` datetime not null default 0, sold decimal(15,2) not null default 0, day_avg decimal(15,2) not null default 0,  above_avg char(1) as (if(sold>day_avg,'Y','N')) virtual);
Query OK, 0 rows affected (0.02 sec)
So far so good.
 MariaDB [test]> insert into salespeople(salesperson_id, `date`, sold, day_avg) values (1,now(),300,150);
Query OK, 1 row affected (0.01 sec)
MariaDB [test]> select * from salespeople;
+----+----------------+---------------------+--------+---------+-----------+
| id | salesperson_id | date                | sold   | day_avg | above_avg |
+----+----------------+---------------------+--------+---------+-----------+
|  1 |              1 | 2012-04-19 13:16:32 | 300.00 |  150.00 | Y         |
+----+----------------+---------------------+--------+---------+-----------+
1 row in set (0.00 sec)
Cool, just like mashing a view into a table.

MariaDB [test]> update salespeople set sold = 149 where id = 1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0
MariaDB [test]> select * from salespeople;
+----+----------------+---------------------+--------+---------+-----------+
| id | salesperson_id | date                | sold   | day_avg | above_avg |
+----+----------------+---------------------+--------+---------+-----------+
|  1 |              1 | 2012-04-19 13:59:57 | 149.00 |  150.00 | N         |
+----+----------------+---------------------+--------+---------+-----------+
1 row in set (0.00 sec)

Works again. No triggers for UPDATE needed.
Now lets try to add another business rule:

MariaDB [test]> drop table salespeople;
Query OK, 0 rows affected (0.03 sec)

MariaDB [test]> create table salespeople (
    -> id int unsigned not null auto_increment primary key,
    -> salesperson_id int unsigned not null,
    -> `date` datetime not null default 0,
    -> sold decimal(15,2) not null default 0,
    -> day_avg decimal(15,2) not null default 0,
    -> above_avg char(1) as (if(sold>day_avg,'Y','N')) virtual,
    -> day_position int not null default 0,
    -> give_bonus char(1) as (if(above_avg='Y' and day_position between 1 and 5,'Y','N') )virtual
    -> );
ERROR 1900 (HY000): A computed column cannot be based on a computed column
MariaDB [test]>


DOH! >.<

ADDITION:

I have found a way to store more business rules with the limitation of not being able to use another virtual column to "build" a more complex rule set:

MariaDB [test]> create table salespeople (
    -> id int unsigned not null auto_increment primary key,
    -> salesperson_id int unsigned not null,
    -> `date` datetime not null default 0,
    -> sold decimal(15,2) not null default 0,
    -> day_avg decimal(15,2) not null default 0,
    -> day_position int not null default 0,
    -> bonus_amount decimal(15,2) as (
    -> case when sold>day_avg and day_position between 1 and 5
    ->  then 0.2 * (sold - day_avg) #20% bonus
    -> when sold>day_avg and day_position >5
    ->  then 0.05 * (sold - day_avg) #5% bonus
    -> else 0
    -> end )virtual
    -> );
Query OK, 0 rows affected (0.75 sec) 
MariaDB [test]> insert into salespeople(salesperson_id, `date`, sold, day_avg, day_position) values (1,curdate(),300,150,3);
Query OK, 1 row affected (0.01 sec) 
MariaDB [test]> select * from salespeople;
+----+----------------+---------------------+--------+---------+--------------+--------------+
| id | salesperson_id | date                | sold   | day_avg | day_position | bonus_amount |
+----+----------------+---------------------+--------+---------+--------------+--------------+
|  1 |              1 | 2012-04-20 00:00:00 | 300.00 |  150.00 |            3 |        30.00 |
+----+----------------+---------------------+--------+---------+--------------+--------------+
1 row in set (0.00 sec)
In the above example, the actual bonus amount was calculated.



6 comments:

  1. But does it actually make any sense if you can't create an index on it anyway? I think this is a nice feature, but not really useful unless function-based indexes are implemented.

    ReplyDelete
  2. Try using "PERSISTENT" instead of "VIRTUAL."

    PERSISTENT columns take up storage, and can be indexed, with some limitations, whereas VIRTUAL columns are calculated on-the-fly, and thus can't be used by other fields or indices.

    ReplyDelete
  3. @Maciek
    I guess what I was trying to do is see if I wanted to construct a virtual column based on more then one business-rule. Those rules would be captured as virtual columns.
    With this limitation, all the rules have to fit into one virtual column.

    @bytesmiths
    Yes, I tried that, but it didn't work. It is probably an intentional limitation.

    ReplyDelete
  4. Cool. You are the first blogger after myself to dive into virtual columns. Clearly we have similar interests. Do you also like football?

    Anyway:

    Having a column like day_avg is not normalized! The proper way to do this would be to call AVG() in the column function, which will not work. (Can only reference your own column.) If it was allowed, it would still be ridiculously slow.

    As it is now, I don't see you actually saving any space: storing the average number within each row takes up more space than what you save with the two Y/N columns :-)

    You can make the second column work though (as long as you keep the day_avg column there). You just need to include the logic from above_avg directly into the function:

    above_avg char(1) as (if(sold>day_avg,'Y','N')) virtual,
    -> day_position int not null default 0,
    -> give_bonus char(1) as (if(if(sold>day_avg,true,false) and day_position between 1 and 5,'Y','N') )virtual

    ReplyDelete
  5. Eh, I mean you can only reference your own row, not column!

    ReplyDelete
  6. @hingo
    Thanks :) I was interested in virtual columns when MariaDB was in 5.1, but didn't really find the time to research how I could use it.

    The salespeople table was an aggregate with a row per day. The columns reflect those aggregated data.

    I could have implemented the rules in one column (in fact, I have to), I was just experimenting if its possible to "build" a complex one based on many smaller rules.

    I have however, added something to the original post which you may find interesting: I found that I can use case statements with virtual columns.

    ReplyDelete