Forcing a Slave Server to Recreate Temporary Tables After an Unsafe Shutdown
Losing temporary tables on a slave when binlog_format
is not set to ROW
is a well-known problem, and there is even a way to avoid it, as described by the safe slave shutdown procedure in the MySQL documentation. However, the documentation doesn’t describe how to fix your slave if you accidentally shut it down while it has temporary tables open. In this blog post, I’ll describe how to do that.
The Problem
Let’s say that you run these statements on a master server:
-- statement 1 DROP TEMPORARY TABLE IF EXISTS tmp_table; -- statement 2 CREATE TEMPORARY TABLE tmp_table ( id int primary key, str varchar(10) ); -- statement 3 INSERT INTO real_table VALUES (1, 'str1', false, NULL); -- statement 4 INSERT INTO real_table VALUES (2, 'str2', false, NULL); -- statement 5 INSERT INTO tmp_table SELECT id, str FROM real_table; -- statement 6 UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table);
These statements will end up in the slave’s binary log, and it will execute them one at a time. If the slave is shut down after it executes statement 2, but before it executes statements 5 and 6, then tmp_table
will not be available to the SQL thread when the server is brought back up. This will result in errors that look like this in SHOW SLAVE STATUS
:
Last_SQL_Errno: 1146 Last_SQL_Error: Error 'Table 'db1.tmp_table' doesn't exist' on query. Default database: 'db1'. Query: 'INSERT INTO tmp_table SELECT id, str FROM real_table'
The Solution
Normally, to fix a missing table on a slave, you can just manually create the table on the slave. However, temporary tables are unique to a given session, so if you create the temporary table manually, the SQL thread still won’t see it. To fix the problem, you have to actually force the SQL thread to re-execute the CREATE TEMPORARY TABLE statement for the thread that originally executed the statement (as given by the thread_id
identifier in the binary log). At the same time, you don’t want the slave to re-execute statements that will result in duplicate key errors or cause the slave to become inconsistent.
To figure out how to do this in the example above, let’s look at the binary logs for these events:
# at 603 #151015 16:46:35 server id 1 end_log_pos 705 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; DROP TEMPORARY TABLE IF EXISTS tmp_table /*!*/; # at 705 #151015 16:46:35 server id 1 end_log_pos 839 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; CREATE TEMPORARY TABLE tmp_table ( id int primary key, str varchar(10) ) /*!*/; # at 839 #151015 16:46:35 server id 1 end_log_pos 955 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; INSERT INTO real_table VALUES (1, 'str1', false, NULL) /*!*/; # at 955 #151015 16:46:35 server id 1 end_log_pos 1071 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; INSERT INTO real_table VALUES (2, 'str2', false, NULL) /*!*/; # at 1071 #151015 16:46:35 server id 1 end_log_pos 1185 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444941995/*!*/; INSERT INTO tmp_table SELECT id, str FROM real_table /*!*/; # at 1185 #151015 16:46:43 server id 1 end_log_pos 1352 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444942003/*!*/; SET @@session.time_zone='SYSTEM'/*!*/; UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table) /*!*/; # at 1352 #151015 16:46:51 server id 1 end_log_pos 1476 Query thread_id=1 exec_time=0 error_code=0 SET TIMESTAMP=1444942011/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=8,@@session.collation_server=8/*!*/; DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `tmp_table` /*!*/;
Let’s outline where our statements are located in the binary log:
- Statement 1
Statement:DROP TEMPORARY TABLE IF EXISTS tmp_table;
Start pos: 603
End pos: 705 - Statement 2
Statement:
CREATE TEMPORARY TABLE tmp_table (
id int primary key,
str varchar(10)
);
Start pos: 705
End pos: 839
- Statement 3
Statement:INSERT INTO real_table VALUES (1, 'str1', false, NULL);
Start pos: 839
End pos: 955 - Statement 4
Statement:INSERT INTO real_table VALUES (2, 'str2', false, NULL);
Start pos: 955
End pos: 1071 - Statement 5
Statement:INSERT INTO tmp_table SELECT id, str FROM real_table;
Start pos: 1071
End pos: 1185 - Statement 6
Statement:UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table);
Start pos: 1185
End pos: 1352
If the slave is shutdown unsafely, it will likely fail at statement 5: i.e. when the slave tries to insert into the temporary table. So what can we tell the slave to do in order to fix this? In English, the steps would be:
- Go back and execute statement 2.
- Ignore statements 3-4.
- Start executing normally at statement 5.
We can convert these steps into a series of commands using the binary log positions above:
STOP SLAVE; -- Command 2 starts at 705 CHANGE MASTER TO Master_log_file='mysqld-bin.000001', Master_log_pos=705; -- Command 3 starts at position 839, so let's end the slave before then START SLAVE UNTIL Master_log_file='mysqld-bin.000001', Master_log_pos=838; -- Wait a small amount of time for our SQL thread to execute the statement SELECT SLEEP(60); STOP SLAVE; -- Command 5 starts at position 1071 CHANGE MASTER TO Master_log_file='mysqld-bin.000001', Master_log_pos=1071; START SLAVE;
Conclusion
It’s not that difficult to recreate temporary tables on a slave after an unsafe shutdown, as long as you still have the binary logs that contained the CREATE TEMPORARY TABLE
statements. However, it can be quite tedious if there were a lot of temporary tables open at the time of the shutdown. If your application uses a lot of temporary tables, you may have a lot of errors to fix. In that case, you should consider setting binlog_format=ROW
, which would avoid this problem entirely.
Has anyone else figured out interesting ways around this problem?