15 Mar 2010

Denormalization - Examples Needed

Dear MySQL Community

I would like to ask for your help as I am writing a report for use cases for denormalization.
Could you please add a comment if you have used denormalization in the past to help solve a certain problem. If you could add what was the problem, how did you apply denormalization and how effective (or not) was it?

Thank you in advance.

8 comments:

  1. "...use cases for demoralisation."

    :)

    ReplyDelete
  2. LOL. You wrote "demoralisation." Is this an indication of how you feel about the matter?
    BTW, I am usually against denormalization, although I have twice used summary tables to solve pagination problems. In both cases, the trick was needed because the original design was faulty. So, in my book, normalization is still the first and best performance tuning advice, but I am always open to alternative views.

    Giuseppe

    ReplyDelete
  3. I was once accused of sawing demoralization in the company.

    But with regard to denormalization: quite a few cases.

    Simple example: t being large table, with foreign key into small lookup table l.

    Queries of the form:
    SELECT * FROM t JOIN l ON (t.l_id = l.l_id)
    WHERE t.customer_id = 3 AND l.customer_type = 7

    Can only gain from an index on a single table (since the second table will most hopefully be joined using the index on the foreign key).

    If neither filtering provides with reasonable row reduction, a case for denomalization stands.

    Copy customer_type into t. Have an index on both (customer_id, customer_type).

    I apologize for writing in such short form; I'm holding a crying baby in my other arm. I don't want to demoralise her.

    ReplyDelete
  4. @demoralisation
    damn spellcheck
    still, was quite funny..

    ReplyDelete
  5. I often feel 'demoralized' when dealing with (de)normalization done wrong.

    Most of the time I favor denormalization for performance reasons.

    My $.02
    G

    ReplyDelete
  6. Ok, now that the typo is fixed, I'd like to make the case that the question is very broad. I may elicit a lot of repsonse that way, which can be good, but at the same time, there is a risk that there will be too little focus.

    First, the definition of "denormalization" - what does that mean exactly? The word implies that we have something that is normalized, and that that state is then undone. But what is normalized? In my book, something is normalized whenever it is in at least 1NF.

    Should denormalize in this case mean storing data in sub 1NF? Or do you merely mean transforming data from higher to lower normal forms? Maybe you mean both?

    While we're at it, should denormalization include the process of joining tables? I mean, whenever you're joining, you are temporarily creating a result set that is typically in a lower normal form (typically 1NF) than the tables that are being joined (typically 3NF).

    Or does this "denormalization" due to joining not count as "real" denormalization because the redundancy that is introduced this way is controlled and cannot lead to data maintenance problems? In other words, should the term "denormalization" be confined to the act of persisting and maintaining such sub 3NF datasets?

    Another thing that might be worth bringing into the discussion is that maybe denormalization (as in, moving from higher to lower normal forms) is just one example of adding redundancy. Giuseppe already mentioned "aggregate tables", a phenomenon that certainly has to do with adding redundancy (because we can compute the aggregate from the lower-grained data). But is aggregation really a form of denormalization? If it is, it does not seem to fit so well in the idea of moving from higher to lower normal forms.

    Personally, my opinion is that "denormalization" or "redundancy" are fine as long as you can control it, and as long as it serves a purpose. The purpose maybe increased performance, although I am not so sure denormalzation always gives you that. I use denormalization, but for me it is mainly a way to simplify the data model for business intelligence applications like report writers.

    ReplyDelete
  7. Hi Jonathan,

    The best case to introduce some 'controlled redundancy' is when you need to join, but you can't filter very much on the first table. With MySQL only having nested-loop-joins, this creates a situation you'd rather not get into.

    i.e. Tables=Actors, Movies

    Find all actors that have a birth date between 1980 and 1990 and stared in a movie between 1985 and 1990.

    If we movie 'year_active_start' and 'year_active_end' to the actors table, I'd say this is a form of denormalization.

    ReplyDelete
  8. can somebody answer this question for me please?
    Under what situations would denormalization of a database schema be used? Give examples of denormalization.

    ReplyDelete