A New, Consistent Way to Monitor Replication Lag in MariaDB Community Server 11.8 LTS Release

spacer

This blog post describes three new fields to monitor replication lag added in MariaDB Community Server 11.8 LTS and MariaDB Enterprise Server 10.6.20 and 11.4.3: Master_last_event_time, Slave_last_event_time, and Master_Slave_time_diff. The existing method of monitoring replication lag using Seconds_Behind_Master is both complex and confusing. That is, the formula used to calculate the value changes depending on the underlying replication parallelization configuration (i.e. is the replica serial or parallel) and state (i.e. is the replica currently idle or not).  These new fields aim to improve on Seconds_Behind_Master by providing consistent behavior independent of replica configuration, with unambiguous definitions.

It is also worth-while to mention that replica status can now be queried via the information schema table INFORMATION_SCHEMA.slave_status in these same MariaDB versions (MDEV-33526). This allows users to query specific fields output by SHOW ALL REPLICAS STATUS, and also provides introspection abilities into a server’s replication settings using stored routines.

Background

The current method of monitoring replication lag is to use the field Seconds_Behind_Master from the command SHOW REPLICA STATUS. This field, however, is both complex and confusing. The MariaDB documentation currently defines this field as follows:

Difference between the timestamp logged on the master for the event that the replica is currently processing, and the current timestamp on the replica. Zero if the replica is not currently processing an event. With serial replication, seconds_behind_master is updated when the SQL thread begins executing a transaction. With parallel replication, seconds_behind_master is updated only after transactions commit. Starting in MariaDB 10.5.19, 10.6.12, 10.8.7, 10.9.5, 10.10.3, and 10.11.2, an exception is drawn on the parallel replica to additionally update seconds_behind_master when the first transaction received after idling is queued to a worker for execution, to provide a reliable initial value for the duration until a transaction commits. Additional behaviors to be aware of are as follows: 1) Seconds_Behind_Master will update for ignored events, e.g. those skipped due to sql_slave_skip_counter. 2) On the serial replica, transactions with prior timestamps can update Seconds_Behind_Master such that it can go backwards, though this is not true for the parallel replica. 3) When configured with MASTER_DELAY, as a replicated transaction begins executing (i.e. on a serial or post-idle parallel replica), Seconds_Behind_Master will update before delaying, and while delaying occurs will grow to encompass the configured value. 4) There is a known issue, tracked by MDEV-17516, such that Seconds_Behind_Master will initially present as 0 on replica restart until a replicated transaction begins executing, even if the last replica session was lagging behind when stopped.

Figure 1 provides some context to this definition, both to 1) help decipher how it works, and 2) use as contrast to understand how the three new fields work. Note the workflow depicts the process to update Seconds_Behind_Master for a serial replica; however, the idea is generalizable for each parallel worker thread, and therefore, we also show how the parallel replica updates value on the same figure.

The workflow starts with a user executing two transactions, Trx1 and Trx2 on the MariaDB Primary. Each transaction is written into the primary server’s binary log using the same pattern: a GTID event begins the transaction, Rows events provide the data modified by the transaction, and the XID event ends the transaction (the replica will COMMIT the transaction when encountering this event). These events are sent to the replica’s IO Thread by the primary’s dump thread. The IO Thread simply buffers these events into the replica’s relay log for crash safety. The replica’s SQL Thread reads each event one by one, and processes it. That is, it will delay the event according to the configured MASTER_DELAY (if any), execute the event, and when the final event of the transaction is encountered (i.e. the XID event in the figure), will additionally update the state of the replica (e.g., the gtid_slave_pos is updated at this point). Note that on the parallel replica, the processing of the event and updating of the replica state is offloaded to a parallel worker thread. Once one transaction is finished, the SQL Thread will repeat the process for the next transaction, until all events in the relay log have been processed.

To understand how Seconds_Behind_Master relates to this workflow, it is calculated by taking the difference between two values: 1) slave.now(), the current system time as reported by the replica server, and 2) last_master_timestamp, the timestamp of the last binary log event seen by the replica. Where the current system time (slave.now()) on the replica does add some complexity to this calculation, the main source of confusion comes from last_master_timestamp, as it is updated at different points depending on the configuration of the replica. For a serial replica, the SQL Thread will update last_master_timestamp for every event it sees, as soon as it reads it from the relay log. This is shown by the yellow line in figure 1. The parallel replica will update last_master_timestamp at two times: 1) by a parallel worker thread when the slave state is updated, which happens at commit-time (which is after an event is delayed by the configured MASTER_DELAY, and 2) by the SQL Thread for the first event that comes in after the replica has been idle (which is before the event is delayed). The updates to last_master_timestamp on the parallel replica are shown in figure 1 by the red lines.

Due to the behavior of Seconds_Behind_Master changing depending on the replica’s configuration, with many things happening in-between the different points at which the value can be updated, it has led to a lot of confusion in trying to understand its reported values. In addition, because of this complexity, it has proven to be prone to bugs. Ultimately, because Seconds_Behind_Master is not well-defined, it is not a consistent and reliable source to determine replication lag. It provides a vague notion at best.

Solution

To provide a consistent solution to monitor replication lag, three new fields have been added to the output of SHOW ALL REPLICAS STATUS and INFORMATION_SCHEMA.slave_status (taking the definitions from our documentation):

  1. Master_last_event_time: Timestamp of the last event read from the primary by the IO thread. NULL until the replica has started and the replica has read one query event from the primary that changes data.
  2. Slave_last_event_time: Primary timestamp of the last event committed on the replica. NULL until the replica has started and the replica has read one query event from the primary that changes data.
  3. Master_Slave_time_diff: The difference of the above two timestamps. NULL until the replica has started and the replica has read one query event from the primary that changes data.

Figure 2 shows how a replica updates these new fields. Master_last_event_time is updated by the IO Thread when it reads in a new event (events which commit only), and the value used is the timestamp from the event. Effectively, this field provides the timestamp of the latest transaction from the primary which has been transmitted to the replica. Slave_last_event_time is updated by the SQL Thread (or parallel worker threads, if using a parallel configuration) when a transaction is committed, and the value used is the timestamp of the event which commits the transaction. Effectively, this field provides the timestamp when the primary had that same data state. Master_Slave_time_diff is calculated when running SHOW ALL REPLICAS STATUS as the difference between Master_last_event_time and Slave_last_event_time. This field provides insight into replication lag, as a function of time passed on the primary between the current state of the replica, and the last known transaction applied on the primary.

The benefits of using these new fields to monitor replication lag, as opposed to Seconds_Behind_Master, include: 

  1. Each field is well-defined and serves one specific purpose. Seconds_Behind_Master, on the other hand, combines many ideas into one reported value.
  2. Each new field has a single point at which it is updated, allowing for consistent behavior across different replication configurations. Seconds_Behind_Master uses last_master_timestamp, which is updated at different points depending on the replica configuration.
  3. The current system time of the replica is irrelevant in the new fields, leading to a simpler understanding of their reported values.
  4. These timestamps can be directly cross-referenced with the binary log, allowing for more straightforward analysis. Seconds_Behind_Master cannot be referenced in the binary log.

Please let us know if you’ve found more benefits than these listed!

Examples

To showcase how these new fields report on replication lag, we set up a simple asynchronous primary-replica topology with two transactions that reflect the same binary log as those in figures 1 and 2 (that is, two transactions with the same timestamps). On the replica, the first transaction will complete, and we will artificially hold the second transaction on a row lock so it can’t finish. This simulates a replica which is actively executing events, giving us an opportunity to view the replication lag. This following snippet outlines the binary log events (header data only) that will be replicated, and is taken from mariadb-binlog:

#250319 13:49:05 server id 1  end_log_pos 371 CRC32 0x73119bd0 	GTID 0-1-1 trans
#250319 13:49:05 server id 1 end_log_pos 0 CRC32 0xbd5a6bb1 Annotate_rows:
#250319 13:49:05 server id 1 end_log_pos 0 CRC32 0x1d696443 Table_map: `test`.`t1` mapped to number 33
#250319 13:49:05 server id 1 end_log_pos 0 CRC32 0x81922514 Write_rows: table id 33 flags: STMT_END_F
#250319 13:49:05 server id 1 end_log_pos 533 CRC32 0xff151939 Xid = 115
#250319 13:49:07 server id 1 end_log_pos 575 CRC32 0x24a20b61 GTID 0-1-2 trans
#250319 13:49:07 server id 1 end_log_pos 0 CRC32 0x00083d64 Annotate_rows:
#250319 13:49:07 server id 1 end_log_pos 0 CRC32 0xa9db9cae Table_map: `test`.`t2` mapped to number 34
#250319 13:49:07 server id 1 end_log_pos 0 CRC32 0x06902dc9 Write_rows: table id 34 flags: STMT_END_F
#250319 13:49:07 server id 1 end_log_pos 738 CRC32 0xe8b3d05b Xid = 117

Query via SHOW ALL REPLICAS STATUS

One can monitor these new fields the traditional way using SHOW ALL REPLICAS STATUS, for example:

MariaDB [(none)]> SHOW ALL REPLICAS STATUS\G
*************************** 1. row ***************************
...
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-1-2
...
Slave_SQL_Running_State: Waiting for table metadata lock
...
Gtid_Slave_Pos: 0-1-1
Master_last_event_time: 2025-03-19 13:49:07
Slave_last_event_time: 2025-03-19 13:49:05
Master_Slave_time_diff: 2
1 row in set (0.003 sec)

First, we prove the state of the replica matches the aforementioned experimental setup description. Gtid_IO_Pos reflects the state of the IO Thread as the latest transaction it has written into its relay logs. It shows as 0-1-2, meaning that the slave has received both transactions and written them into the relay log. Gtid_Slave_Pos reflects the last transaction committed by the SQL Thread on the replica. It shows as 0-1-1, meaning that the slave has committed the first transaction, but not the second. Slave_SQL_Running_State describes what the SQL Thread is currently doing. It shows that it is waiting for table metadata lock, meaning that it is working on transaction 0-1-2, but has not yet committed it.

The last three fields of the output are the new fields which help report replication lag of this execution state. Master_last_event_time reflects the timestamp of the latest transaction transmitted to the replica (i.e. GTID 0-1-2 with timestamp 2025-03-19 13:49:07). Slave_last_event_time, reflects the timestamp of the last committed transaction (i.e. GTID 0-1-1 with timestamp 2025-03-19 13:49:05). Master_Slave_time_diff calculates the difference between these timestamps to show how much time has elapsed on the primary between the replica’s current state, and the last transaction applied on the primary (i.e., 2 seconds).

Query via INFORMATION_SCHEMA.slave_status

Alternatively, one can run SELECT queries on the new table INFORMATION_SCHEMA.slave_status to monitor the values of these new fields:

MariaDB [(none)]> SELECT Master_last_event_time, Slave_last_event_time, Master_slave_time_diff FROM INFORMATION_SCHEMA.slave_status;
+------------------------+-----------------------+------------------------+
| Master_last_event_time | Slave_last_event_time | Master_slave_time_diff |
+------------------------+-----------------------+------------------------+
| 2025-03-19 13:49:07 | 2025-03-19 13:49:05 | 2 |
+------------------------+-----------------------+------------------------+
1 row in set (0.002 sec)

Note that all fields output by SHOW ALL REPLICAS STATUS may be queried via INFORMATION_SCHEMA.slave_status; though for this example, we limit the output columns to the new fields only.

Conclusion

This blog post provided an overview of three new fields to monitor replication lag added in MariaDB Community Server 11.8 LTS: Master_last_event_time, Slave_last_event_time, and Master_Slave_time_diff, while also touching on the new information schema table, INFORMATION_SCHEMA.slave_status, which holds all fields traditionally viewed by SHOW ALL REPLICAS STATUS. The key idea was to show that the new fields add consistency to monitoring replication lag, in contrast to the old method of using Seconds_Behind_Master which is not well-defined.