November 2, 2015

Forcing a Slave Server to Recreate Temporary Tables After an Unsafe Shutdown

Losing temporary tables on a slave when binlog_format is not set to ROW is a well-known problem, and there is even a way to avoid it, as described by the safe slave shutdown procedure in the MySQL documentation. However, the documentation doesn't describe how to fix your slave if you accidentally shut it down while it has temporary tables open. In this blog post, I'll describe how to do that.

The Problem

Let's say that you run these statements on a master server:

-- statement 1 DROP TEMPORARY TABLE IF EXISTS tmp_table; -- statement 2 CREATE TEMPORARY TABLE tmp_table ( id int primary key, str varchar(10) ); -- statement 3 INSERT INTO real_table VALUES (1, 'str1', false, NULL); -- statement 4 INSERT INTO real_table VALUES (2, 'str2', false, NULL); -- statement 5 INSERT INTO tmp_table SELECT id, str FROM real_table; -- statement 6 UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table);

These statements will end up in the slave's binary log, and it will execute them one at a time. If the slave is shut down after it executes statement 2, but before it executes statements 5 and 6, then tmp_table will not be available to the SQL thread when the server is brought back up. This will result in errors that look like this in SHOW SLAVE STATUS:

Last_SQL_Errno: 1146 Last_SQL_Error: Error 'Table 'db1.tmp_table' doesn't exist' on query. Default database: 'db1'. Query: 'INSERT INTO tmp_table SELECT id, str FROM real_table'

The Solution

Normally, to fix a missing table on a slave, you can just manually create the table on the slave. However, temporary tables are unique to a given session, so if you create the temporary table manually, the SQL thread still won't see it. To fix the problem, you have to actually force the SQL thread to re-execute the CREATE TEMPORARY TABLE statement for the thread that originally executed the statement (as given by the thread_id identifier in the binary log). At the same time, you don't want the slave to re-execute statements that will result in duplicate key errors or cause the slave to become inconsistent.

To figure out how to do this in the example above, let's look at the binary logs for these events:

# at 603 #151015 16:46:35 server id 1 end_log_pos 705 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; DROP TEMPORARY TABLE IF EXISTS tmp_table /*!*/; # at 705 #151015 16:46:35 server id 1 end_log_pos 839 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; CREATE TEMPORARY TABLE tmp_table ( id int primary key, str varchar(10) ) /*!*/; # at 839 #151015 16:46:35 server id 1 end_log_pos 955 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; INSERT INTO real_table VALUES (1, 'str1', false, NULL) /*!*/; # at 955 #151015 16:46:35 server id 1 end_log_pos 1071 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; INSERT INTO real_table VALUES (2, 'str2', false, NULL) /*!*/; # at 1071 #151015 16:46:35 server id 1 end_log_pos 1185 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; INSERT INTO tmp_table SELECT id, str FROM real_table /*!*/; # at 1185 #151015 16:46:43 server id 1 end_log_pos 1352 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444942003/*!*/; SET @@session.time_zone='SYSTEM'/*!*/; UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table) /*!*/; # at 1352 #151015 16:46:51 server id 1 end_log_pos 1476 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444942011/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=8,@@session.collation_server=8/*!*/; DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `tmp_table` /*!*/;

Let's outline where our statements are located in the binary log:

  • Statement 1
    Statement: DROP TEMPORARY TABLE IF EXISTS tmp_table;
    Start pos: 603
    End pos: 705
  • statement 2
    statement: CREATE TEMPORARY TABLE tmp_table ( id int primary key, str varchar(10) );
    Start pos: 705
    End pos: 839
  • Statement 3
    Statement: INSERT INTO real_table VALUES (1, 'str1', false, NULL);
    Start pos: 839
    End pos: 955
  • Statement 4
    Statement: INSERT INTO real_table VALUES (2, 'str2', false, NULL);
    Start pos: 955
    End pos: 1071
  • Statement 5
    Statement: INSERT INTO tmp_table SELECT id, str FROM real_table;
    Start pos: 1071
    End pos: 1185
  • Statement 6
    Statement: UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table);
    Start pos: 1185
    End pos: 1352

If the slave is shutdown unsafely, it will likely fail at statement 5: i.e. when the slave tries to insert into the temporary table. So what can we tell the slave to do in order to fix this? In English, the steps would be:

  • Go back and execute statement 2.
  • Ignore statements 3-4.
  • Start executing normally at statement 5.

We can convert these steps into a series of commands using the binary log positions above:

STOP SLAVE; -- Statement 2 starts at 705 CHANGE MASTER TO Master_log_file='mysqld-bin.000001', Master_log_pos=705; -- Statement 3 starts at position 839, so let's end the slave before then START SLAVE UNTIL Master_log_file='mysqld-bin.000001', Master_log_pos=838; -- Wait a small amount of time for our SQL thread to execute the statement SELECT SLEEP(60); STOP SLAVE; -- Statement 5 starts at position 1071 CHANGE MASTER TO Master_log_file='mysqld-bin.000001', Master_log_pos=1071; START SLAVE;

Conclusion

It's not that difficult to recreate temporary tables on a slave after an unsafe shutdown, as long as you still have the binary logs that contained the CREATE TEMPORARY TABLE statements. However, it can be quite tedious if there were a lot of temporary tables open at the time of the shutdown. If your application uses a lot of temporary tables, you may have a lot of errors to fix. In that case, you should consider setting binlog_format=ROW, which would avoid this problem entirely.

Has anyone else figured out interesting ways around this problem?

About Geoff Montee

Geoff Montee is a Support Engineer with MariaDB. He has previous experience as a Database Administrator/Software Engineer with the U.S. Government, and as a System Administrator and Software Developer at Florida State University.

Read all posts by Geoff Montee