The Sign: row-based binary logging and integer signedness in MySQL and MariaDB

“I saw the sign and it opened up my eyes, I saw the sign Life is demanding without understanding” –I Saw The Sign, Ace of Base

An interesting question appeared in #mysql on Freenode recently. A user was trying to build a client that would act as a replication slave so that it could consume row-based replication events coming via the replication stream, in order to transform them into JSON that could be sent to ElasticSearch. Neat idea! But he ran into a problem: it’s not possible to tell whether integer values in the replication stream are signed or unsigned. Let’s take a look at how that works. The row-based binary log format in MySQL and MariaDB utilizes a couple kinds of events to convey various information to the client (normally a slave). The first event is TABLE_MAP_EVENT, which gives information about the structure of the tables that will be changed in a subsequent ROWS_EVENT. There’s pretty good documentation about TABLE_MAP_EVENT at the TABLE_MAP_EVENT page in the MySQL Internals Manual where we can see that this structure has fields that account for schema name, table name, and column types. The column_type_def field consists of one byte per column in the table, with the value of each byte being one of the types taken from the list on the COM_QUERY Response page. There are different types for each size of integer, including MYSQL_TYPE_INT24 and MYSQL_TYPE_LONG andMYSQL_TYPE_LONGLONG, but none of that carries any information about signedness. The column_meta_def field contains length information for data types that can vary in size, but integer types are represented by a fixed number of bytes, thus no meta information is included for integer types. There’s nowhere else to look in the TABLE_MAP_EVENT for additional information, and we’re left with the harsh reality: it is not possible to tell whether an integer value is meant to be signed or unsigned by examining the replication stream alone. And we can easily observe this by experimentation. Let’s create a table with two integer columns, one unsigned and one signed. We’ll insert the maximum value into the unsigned column and we’ll insert the equivalent sequence of bytes into the signed column, which is represented by the value “-1”. Here are some silly tricks to show the equivalency:

mysql> select conv(-1,10,2);
+------------------------------------------------------------------+
| conv(-1,10,2)                                                    |
+------------------------------------------------------------------+
| 1111111111111111111111111111111111111111111111111111111111111111 |
+------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select conv(18446744073709551615,10,2);
+------------------------------------------------------------------+
| conv(18446744073709551615,10,2)                                  |
+------------------------------------------------------------------+
| 1111111111111111111111111111111111111111111111111111111111111111 |
+------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select conv(repeat(1,64),2,-10);
+--------------------------+
| conv(repeat(1,64),2,-10) |
+--------------------------+
| -1                       |
+--------------------------+
1 row in set (0.00 sec)

mysql> select conv(repeat(1,64),2,10);
+-------------------------+
| conv(repeat(1,64),2,10) |
+-------------------------+
| 18446744073709551615    |
+-------------------------+
1 row in set (0.00 sec)

And here’s the table:

mysql> create table t1 (id int unsigned not null auto_increment primary key, col1 bigint unsigned, col2 bigint signed) engine=innodb;
Query OK, 0 rows affected (0.02 sec)

mysql> flush logs; show master status;
Query OK, 0 rows affected (0.09 sec)
+----------------------+----------+--------------+------------------+-------------------+
| File                 | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+----------------------+----------+--------------+------------------+-------------------+
| prosimmon-bin.000019 |      120 |              |                  |                   |
+----------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> insert into t1 (col1, col2) values (18446744073709551615, -1);
Query OK, 1 row affected (0.00 sec)

mysql> flush logs;
Query OK, 0 rows affected (0.10 sec)

mysql> select * from t1;
+----+----------------------+------+
| id | col1                 | col2 |
+----+----------------------+------+
|  1 | 18446744073709551615 |   -1 |
+----+----------------------+------+
1 row in set (0.00 sec)

Now we use mysqlbinlog with -vv so that it will “Reconstruct pseudo-SQL statements out of row events” and add “comments on column data types”.

$ mysqlbinlog -vv ./data/prosimmon-bin.000019
...
### INSERT INTO `test`.`t1`
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2=-1 (18446744073709551615) /* LONGINT meta=0 nullable=1 is_null=0 */
###   @3=-1 (18446744073709551615) /* LONGINT meta=0 nullable=1 is_null=0 */

We definitely inserted different values into the two columns, and our SELECT statement returns different values: 18446744073709551615 and -1. But mysqlbinlog can’t tell that they’re meant to be represented differently. It just sees the string of bytes in the binary log and then outputs both signed and unsigned interpretations, with the unsigned interpretation in parentheses. Going back a bit, you’ll recall that theTABLE_MAP_EVENT includes information about schema name, table name, and column types. Note that there is no provision in this structure to include column names! That’s why the best we can get in mysqlbinlog output is the numeric index of the column in the table, which of course means that a slave can never have a different column layout than a master when using row-based binary logging (except for additional columns at the end of the table), but also that a theoretical non-slave replication client cannot really understand what it is seeing unless it also knows the table structure. A workaround for any clever clients like the one proposed in #mysql might be for the client to scrape tables in information_schema to get information about table structure for all tables in the database, but it would then have to take on the burden of watching for new tables and changes to existing tables. I think it would be interesting for the replication facility to evolve into something that was more general purpose, in order to facilitate a variety of interesting applications, but that would require some changes to the protocol that might jeopardize backward compatibility and interoperability.