Slave Replication Errors

replication errors_edit1Image by MethoxyRoxy via Flickr

I had a problem with slave replication, every now and then we would get an error which manually had to be resolved

We used Set Global sql_slave_skip_counter = 1; start slave;

You might also get some of those when you set up a new slave using a mysqldump. If from the time you started the mysqldump and by the time it finished, some lines were added to the bin logs or you started it at the wrong position, then you would get some duplication errors.

While it looks concerning, don’t panic. You can fix it by skipping the lines that are duplicates, although it’s not an ideal situation.


Now if you want to completely avoid errors, or specifically duplicates and you are sure that the odd ones that come along are not important, then why not try log-slave-errors=1062 . Even better, you can set it to “all” and be completely oblivious to any problem that comes along.

Every now and then you can check the log errors and if there are too many problems, get a new mysqldump to re-sync the slave.

4 comments:

  1. "...try log-slave-errors=1062 . Even better, you can set it to “all”..."

    I hope you're kidding that this is better ;-)

    ReplyDelete
  2. mysqldump --lock-tables (or --lock-all-tables) solves the problem of data added to the binlog while the dump runs.

    ReplyDelete
  3. "Set Global sql_slave_skip_counter = 1; start slave;"

    Except the rare cases where I knew exactly why the problem occurred in the first place (e.g. a database existed on the slave which did not exist on the master, and a CREATE DATABASE was invoked),

    I have yet to encounter the case where it would be safe to simply skip a query. I -always- find that it leads to serious data corruption on the slave.

    That is not to say I don;t try it out: I do use it, but just as a way to understand "what went wrong and how". At any case I later on re-sync the slave with the master.

    If you're using InnoDB only tables, use mysqldump with --single-transaction --master-data and the rest of the usual suspects, It hasn't failed me yet.

    ReplyDelete
  4. Shlomi, I have encountered many time cases where query manually executed on the master are interrupted and somehow (don't ask me why) they are indeed replicated but the replication thread breaks. So you manually checks that the data affected by the query is actually up to date in the slave(s) and then you can simply skip that query and go on. You're not loosing consistency at all.

    ReplyDelete