Forcing a Slave Server to Recreate Temporary Tables After an Unsafe Shutdown

Losing temporary tables on a slave when binlog_format is not set to ROW is a well-known problem, and there is even a way to avoid it, as described by the safe slave shutdown procedure in the MySQL documentation. However, the documentation doesn’t describe how to fix your slave if you accidentally shut it down while it has temporary tables open. In this blog post, I’ll describe how to do that.

The Problem

Let’s say that you run these statements on a master server:

-- statement 1
DROP TEMPORARY TABLE IF EXISTS tmp_table;
-- statement 2
CREATE TEMPORARY TABLE tmp_table (
	id int primary key,
	str varchar(10)
);
-- statement 3
INSERT INTO real_table VALUES (1, 'str1', false, NULL);
-- statement 4
INSERT INTO real_table VALUES (2, 'str2', false, NULL);
-- statement 5
INSERT INTO tmp_table SELECT id, str FROM real_table;
-- statement 6
UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table);

These statements will end up in the slave’s binary log, and it will execute them one at a time. If the slave is shut down after it executes statement 2, but before it executes statements 5 and 6, then tmp_table will not be available to the SQL thread when the server is brought back up. This will result in errors that look like this in SHOW SLAVE STATUS:

Last_SQL_Errno: 1146
Last_SQL_Error: Error 'Table 'db1.tmp_table' doesn't exist' on query. Default database: 'db1'. Query: 'INSERT INTO tmp_table SELECT id, str FROM real_table'

The Solution

Normally, to fix a missing table on a slave, you can just manually create the table on the slave. However, temporary tables are unique to a given session, so if you create the temporary table manually, the SQL thread still won’t see it. To fix the problem, you have to actually force the SQL thread to re-execute the CREATE TEMPORARY TABLE statement for the thread that originally executed the statement (as given by the thread_id identifier in the binary log). At the same time, you don’t want the slave to re-execute statements that will result in duplicate key errors or cause the slave to become inconsistent.

To figure out how to do this in the example above, let’s look at the binary logs for these events:

# at 603
#151015 16:46:35 server id 1  end_log_pos 705   Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1444941995/*!*/;
DROP TEMPORARY TABLE IF EXISTS tmp_table
/*!*/;
# at 705
#151015 16:46:35 server id 1  end_log_pos 839   Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1444941995/*!*/;
CREATE TEMPORARY TABLE tmp_table (
id int primary key,
str varchar(10)
)
/*!*/;
# at 839
#151015 16:46:35 server id 1  end_log_pos 955   Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1444941995/*!*/;
INSERT INTO real_table VALUES (1, 'str1', false, NULL)
/*!*/;
# at 955
#151015 16:46:35 server id 1  end_log_pos 1071  Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1444941995/*!*/;
INSERT INTO real_table VALUES (2, 'str2', false, NULL)
/*!*/;
# at 1071
#151015 16:46:35 server id 1  end_log_pos 1185  Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1444941995/*!*/;
INSERT INTO tmp_table SELECT id, str FROM real_table
/*!*/;
# at 1185
#151015 16:46:43 server id 1  end_log_pos 1352  Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1444942003/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table)
/*!*/;
# at 1352
#151015 16:46:51 server id 1  end_log_pos 1476  Query   thread_id=1     exec_time=0     error_code=0
SET TIMESTAMP=1444942011/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `tmp_table`
/*!*/;

Let’s outline where our statements are located in the binary log:

  • Statement 1
    Statement: DROP TEMPORARY TABLE IF EXISTS tmp_table;
    Start pos: 603
    End pos: 705
  • Statement 2
    Statement:

CREATE TEMPORARY TABLE tmp_table (
id int primary key,
str varchar(10)
);

Start pos: 705
End pos: 839

  • Statement 3
    Statement: INSERT INTO real_table VALUES (1, 'str1', false, NULL);
    Start pos: 839
    End pos: 955
  • Statement 4
    Statement: INSERT INTO real_table VALUES (2, 'str2', false, NULL);
    Start pos: 955
    End pos: 1071
  • Statement 5
    Statement: INSERT INTO tmp_table SELECT id, str FROM real_table;
    Start pos: 1071
    End pos: 1185
  • Statement 6
    Statement: UPDATE real_table SET processed=true, processed_time=NOW() WHERE id IN (SELECT id FROM tmp_table);
    Start pos: 1185
    End pos: 1352

If the slave is shutdown unsafely, it will likely fail at statement 5: i.e. when the slave tries to insert into the temporary table. So what can we tell the slave to do in order to fix this? In English, the steps would be:

  • Go back and execute statement 2.
  • Ignore statements 3-4.
  • Start executing normally at statement 5.

We can convert these steps into a series of commands using the binary log positions above:

STOP SLAVE;
-- Command 2 starts at 705
CHANGE MASTER TO Master_log_file='mysqld-bin.000001', Master_log_pos=705;
-- Command 3 starts at position 839, so let's end the slave before then
START SLAVE UNTIL Master_log_file='mysqld-bin.000001', Master_log_pos=838;
-- Wait a small amount of time for our SQL thread to execute the statement
SELECT SLEEP(60);
STOP SLAVE;
-- Command 5 starts at position 1071
CHANGE MASTER TO Master_log_file='mysqld-bin.000001', Master_log_pos=1071;
START SLAVE;

Conclusion

It’s not that difficult to recreate temporary tables on a slave after an unsafe shutdown, as long as you still have the binary logs that contained the CREATE TEMPORARY TABLE statements. However, it can be quite tedious if there were a lot of temporary tables open at the time of the shutdown. If your application uses a lot of temporary tables, you may have a lot of errors to fix. In that case, you should consider setting binlog_format=ROW, which would avoid this problem entirely.

Has anyone else figured out interesting ways around this problem?