MariaDB Point In Time Recovery

spacer

Database backup and restore are arguably the most important tasks of the database administrator (DBA). Once in a while, the DBA must restore a database to a point in time in the past. In this article, we will learn to do just that in MariaDB Server. We will restore from a backup and then apply the binary logs up to a point in the past.

GETTING STARTED

Complete the exercises in this article on a MariaDB instance that is for testing purposes. It is assumed you are accessing the server as root.

We will create a database schema with a table that is constantly changing in a way that we can measure changes in time.

On the linux command line, use git to download the project files:
git clone https://github.com/mariadb-edwardstoever/MY_DB.git
cd MY_DB

Review the files that are included in the download.

Copy the file zzz_options.cnf to the appropriate directory for your linux distro.

ON DEBIAN distros:
cp zzz_options.cnf /etc/mysql/mariadb.conf.d/

ON REDHAT distros:

cp zzz_options.cnf /etc/my.cnf.d/

On REDHAT distros, create the directory /var/log/mysql:
mkdir -p /var/log/mysql
chown mysql:mysql /var/log/mysql

Restart the MariaDB instance:
systemctl restart mariadb

Check that that binary logging is ON and saving binlogs to /var/log/mysql
mariadb -Ae "show global variables where Variable_name in ('log_bin','log_bin_basename');"

Check that the max_binlog_size is 524288 bytes:
mariadb -Ae "show global variables like 'max_binlog_size';"

A small value for max_binlog_size will cause the database to generate new binary logs frequently, which is what we need for this exercise.

Run the CREATE_MY_DB_SCHEMA.sql to create a schema called MY_DB:
mariadb < CREATE_MY_DB_SCHEMA.sql

Add the following entry to crontab to insert and update rows continuously:
* * * * * mariadb < /root/MY_DB/ROWS_ON_MY_DB_TABLE.sql

Check that rows are being added to the table MY_DB_TABLE:
mariadb -Ae "select count(*) from MY_DB.MY_DB_TABLE;"

Once we have created the schema MY_DB and the SQL script is being run from crontab, we can leave it alone to accumulate some rows. I suggest allowing it to run for 15 minutes before going to the next step.

INITIAL BACKUP

After 15 minutes, the table MY_DB_TABLE will have about 900 rows. To take the first backup, we run the following script:
./backup_mariadb.sh
The backup is a consistent picture of the database up to the point in time that it was taken.

We want to demonstrate recovering the database to any point in time, not just to the point that the backup was taken. To do that we need to allow more rows to be inserted, and more binary logs to be generated. Allow the changes made by the script in crontab to run for about 40 minutes more before going to the next step.

REVIEWING DATABASE ACTIVITY

Now that the database has accumulated many changes, let’s see how many rows exist on our table:
mariadb -Ae "select count(*) from MY_DB.MY_DB_TABLE;"

If we run that query multiple times, we can see the table is growing and changing continuously.

STOPPING DATABASE ACTIVITY

Comment out the crontab instruction and save
# * * * * * mariadb < /root/MY_DB/ROWS_ON_MY_DB_TABLE.sql

Wait a minute for the last cron task to complete. Then check that the number of rows does not grow by selecting count a few times.
mariadb -Ae "select count(*) from MY_DB.MY_DB_TABLE;"

We don’t want to generate new binary logs while we are applying changes that exist in binary logs.

Edit the file zzz_options.cnf that is saved in a subdirectory of /etc, uncomment or add the line skip-log-bin and restart the database.

Now, check that log_bin is OFF:
mariadb -Ae "show global variables like 'log_bin'"

Let’s take a note of a few details about our table to see exactly where it is after the final transaction has been completed.
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);"
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_UPDATED=(select max(ROW_UPDATED) from MY_DB.MY_DB_TABLE);"
mariadb -Ae "checksum table MY_DB.MY_DB_TABLE;"

Copy the results of these queries and paste them into a text editor and set them aside. You will refer to them later.

RESTORE FROM BACKUP

Our first task is to restore the database. In the directory created by the git command, you will find a script restore_from_backup.sh. It will look like this:

#!/bin/bash
# Script by Edward Stoever for Mariadb Support

# EDIT THE VARIABLES ACCORDINGLY
TARGETDIR=/BACKUPS/mariabackup_2024-07-06-16-40
DATADIR=/var/lib/mysql

if [ ! -d ${TARGETDIR} ]; then echo "Edit this script accordingly."; exit; fi
systemctl stop mariadb
rm -fr ${DATADIR}/*
mariabackup --prepare --target-dir=${TARGETDIR}
mariabackup --copy-back --target-dir=${TARGETDIR}
chown -R mysql:mysql ${DATADIR}
systemctl start mariadb

Edit the script and change the values for the variables according to what is on your server. Run the script restore_from_backup.sh

We want to see how many rows are now on our table MY_DB_TABLE.
mariadb -Ae "select count(*) from MY_DB.MY_DB_TABLE;"

You should find fewer rows than existed prior to the restore. That is because all of the rows that were created and updated after the backup do not exist yet.

IDENTIFY THE STARTING POINT

In order to restore the database to its most recent transaction, we need to find a text file called xtrabackup_binlog_info.
find /BACKUPS/ -name xtrabackup_binlog_info

Review the contents of that file. In my case, it looks like this:
mariadb-bin.000002 208286 0-1-423

We see that in order to start applying binary logs, we must begin with log mariadb-bin.000002 at position 208286.

WHAT IS A BINLOG POSITION?

A position is an integer that increases with each new transaction. You should understand the following:

  • Positions are not sequential. The first one in a given logfile is always 256, but from there, they go up in uneven and arbitrary intervals.
  • If you indicate a position but you do not indicate a file name, you have provided incomplete information. The position could be from any binlog file.
  • In a binary log that is converted to text by mariadb-binlog, each position is indicated by end_log_pos and this indication is written prior to the event it corresponds to.
APPLY ALL BINARY LOGS

In the directory created by the git command, you will find a script apply_all_binary_logs.sh. It will look like this:

#!/bin/bash
# Script by Edward Stoever for Mariadb Support

# EDIT THE VARIABLES ACCORDINGLY
MARIADB_CLIENT="mariadb -u root --socket=/run/mysqld/mysqld.sock"
BINLOGDIR=/var/log/mysql
BINLOG_BASENAME=mariadb-bin
FIRSTBINLOG=${BINLOGDIR}/${BINLOG_BASENAME}.000002
STARTPOSITION=208286

LOG_BIN=$($MARIADB_CLIENT  -ABNe "select VARIABLE_VALUE from information_schema.GLOBAL_VARIABLES where VARIABLE_NAME='LOG_BIN';")
if [ ! "${LOG_BIN}" == "OFF" ]; then echo "BINARY LOGGING SHOULD BE OFF."; exit; fi
$MARIADB_CLIENT -Ae "select 1" 1>/dev/null 2>&1 || echo "Mariadb client is not connecting."

# SANITIZE BINLOG NAMES
FIRSTBINLOG=$(echo ${FIRSTBINLOG} | sed "s/\/\//\//g"); LASTBINLOG=$(echo ${LASTBINLOG} | sed "s/\/\//\//g")
if [ ! -f ${FIRSTBINLOG} ]; then echo "Edit this script accordingly."; exit; fi

# A SIMPLE ARRAY, ALL BINLOGS IN ORDER:
unset STARTED BINLOGNAMES ALLBINLOGNAMES
ALLBINLOGNAMES=($(find ${BINLOGDIR} -name "${BINLOG_BASENAME}.*[0-9]" | sort))
# POPULATE A SUBARRAY STARTING AT $FIRSTBINLOG
for (( k=0; k<${#ALLBINLOGNAMES[@]}; k++ )); do
  if [ "${ALLBINLOGNAMES[$k]}" == "${FIRSTBINLOG}" ] || [ $STARTED ]; then
   STARTED=true;
   BINLOGNAMES+=(${ALLBINLOGNAMES[$k]});
  fi
done
unset ALLBINLOGNAMES

for (( j=0; j<${#BINLOGNAMES[@]}; j++ ));
do
if [ "$j" == "0" ]; then
  # THIS BINLOG IS THE FIRST
  echo "Applying binlog ${BINLOGNAMES[$j]} (first from position ${STARTPOSITION})"
  mariadb-binlog --no-defaults --start-position=${STARTPOSITION} ${BINLOGNAMES[$j]} | $MARIADB_CLIENT

else
  # THIS BINLOG IS NOT THE FIRST
  echo "Applying binlog ${BINLOGNAMES[$j]}"
  mariadb-binlog --no-defaults ${BINLOGNAMES[$j]} | $MARIADB_CLIENT

fi
done

$MARIADB_CLIENT -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);"
$MARIADB_CLIENT -Ae "select * from MY_DB.MY_DB_TABLE where ROW_UPDATED=(select max(ROW_UPDATED) from MY_DB.MY_DB_TABLE);"
$MARIADB_CLIENT -Ae "checksum table MY_DB.MY_DB_TABLE;"

Edit the script and change the values for the variables according to what is on your server. Run the script apply_all_binary_logs.sh

When the script completes, you will see the results of the queries. You should find that the values and checksum of the table match the values from our queries before the database was restored from backup. This is a clear example of restoring to the very last transaction available.

IDENTIFY A POINT BEFORE THE MOST RECENT TRANSACTION

Sometimes it is necessary to ignore the final transaction, or perhaps all of the transactions seconds, minutes or hours before the final transaction. Let’s identify the insert that took place 5 minutes before the final INSERT on MY_DB_TABLE. Run this command:
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED)-interval 5 minute from MY_DB.MY_DB_TABLE);"

When I run the query, I get this result:

+------+-------------------------------------------------------------------------+---------------------+-------------+
| ID   | NOTE                                                                    | ROW_CREATED         | ROW_UPDATED |
+------+-------------------------------------------------------------------------+---------------------+-------------+
| 6240 | I d wd prqwdrawvbbpcmawwmaewwpreqq brqamaqevcew bjpbvfpwacbpbb cvp afv. | 2024-06-07 18:21:00 | NULL        |
+------+-------------------------------------------------------------------------+---------------------+-------------+

We need to identify the binary log that has a modified time after the timestamp for ROW_CREATED and that contains the insert with NOTE “I d wd prqwdrawvbbpcmawwmaewwpreqq brqamaqevcew bjpbvfpwacbpbb cvp afv.”.

I can see that the binary log I am looking for is /var/log/mysql/mariadb-bin.000013. This command identifies it:
mariadb-binlog /var/log/mysql/mariadb-bin.000013| grep "brqamaqevcew bjpbvfpwacbpbb"

Now, showing 16 lines beyond that point in the binlog, we can see the end log position of 51707 of the COMMIT for that transaction:

root@db:~/MY_DB$ mariadb-binlog /var/log/mysql/mariadb-bin.000013 | grep -A16 "brqamaqevcew bjpbvfpwacbpbb"
#Q> insert into `MY_DB_TABLE` (NOTE) VALUES ( NAME_CONST('my_note',_latin1'I d wd prqwdrawvbbpcmawwmaewwpreqq brqamaqevcew bjpbvfpwacbpbb cvp afv.' COLLATE 'latin1_general_cs'))
#240607 18:21:00 server id 1  end_log_pos 51557 CRC32 0xd996999a        Table_map: `MY_DB`.`MY_DB_TABLE` mapped to number 141
# at 51557
#240607 18:21:00 server id 1  end_log_pos 51676 CRC32 0x4101dcbe        Write_rows: table id 141 flags: STMT_END_F

BINLOG '
zIdjZhMBAAAAPgAAAGXJAAAAAI0AAAAAAAEABU1ZX0RCAAtNWV9EQl9UQUJMRQAECA8SEgRkAAAA
DpqZltk=
zIdjZhcBAAAAdwAAANzJAAAAAI0AAAAAAAEABA/4YBgAAAAAAABHSSBkIHdkIHBycXdkcmF3dmJi
cGNtYXd3bWFld3dwcmVxcSBicnFhbWFxZXZjZXcgYmpwYnZmcHdhY2JwYmIgY3ZwIGFmdi6Zs48l
QL7cAUE=
'/*!*/;
# Number of rows: 1
# at 51676
#240607 18:21:00 server id 1  end_log_pos 51707 CRC32 0xc6f661d2        Xid = 57458
COMMIT/*!*/;
# at 51707

What we want to do is apply binlogs up to that commit and finish there.

RESTORE FROM BACKUP (AGAIN)

Run the script restore_from_backup.sh again.

RECOVER TO A POINT IN TIME

In the directory created by the git command, you will find a script recover_to_a_point_in_time.sh. It will look like this:

#!/bin/bash
# Script by Edward Stoever for Mariadb Support

# EDIT THE VARIABLES ACCORDINGLY
MARIADB_CLIENT="mariadb -u root --socket=/run/mysqld/mysqld.sock"
BINLOGDIR=/var/log/mysql
BINLOG_BASENAME=mariadb-bin
FIRSTBINLOG=${BINLOGDIR}/${BINLOG_BASENAME}.000002
STARTPOSITION=208286
LASTBINLOG=${BINLOGDIR}/${BINLOG_BASENAME}.000013
STOPPOSITION=51707

LOG_BIN=$($MARIADB_CLIENT  -ABNe "select VARIABLE_VALUE from information_schema.GLOBAL_VARIABLES where VARIABLE_NAME='LOG_BIN';")
if [ ! "${LOG_BIN}" == "OFF" ]; then echo "BINARY LOGGING SHOULD BE OFF."; exit; fi
$MARIADB_CLIENT -Ae "select 1" 1>/dev/null 2>&1 || echo "Mariadb client is not connecting."

# SANITIZE BINLOG NAMES
FIRSTBINLOG=$(echo ${FIRSTBINLOG} | sed "s/\/\//\//g"); LASTBINLOG=$(echo ${LASTBINLOG} | sed "s/\/\//\//g")
if [ ! -f ${FIRSTBINLOG} ]; then echo "Edit this script accordingly."; exit; fi

# A SIMPLE ARRAY, ALL BINLOGS IN ORDER:
unset STARTED ENDED BINLOGNAMES ALLBINLOGNAMES
ALLBINLOGNAMES=($(find ${BINLOGDIR} -name "${BINLOG_BASENAME}.*[0-9]" | sort))
# POPULATE A SUBARRAY STARTING AT $FIRSTBINLOG ENDING AT $LASTBINLOG
for (( k=0; k<${#ALLBINLOGNAMES[@]}; k++ )); do
if [ ! $ENDED ]; then
  if [ "${ALLBINLOGNAMES[$k]}" == "${FIRSTBINLOG}" ] || [ $STARTED ]; then
   STARTED=true;
   BINLOGNAMES+=(${ALLBINLOGNAMES[$k]});
  fi
fi
if [ "${ALLBINLOGNAMES[$k]}" == "${LASTBINLOG}" ]; then ENDED=true; fi
done
unset ALLBINLOGNAMES


for (( j=0; j<${#BINLOGNAMES[@]}; j++ ));
do
if [ "$j" == "0" ]&&[ "$((${#BINLOGNAMES[@]} - 1))" == "0" ]; then
  # THIS BINLOG IS THE FIRST AND THE LAST
  echo "Applying binlog ${BINLOGNAMES[$j]} (first and last, ${STARTPOSITION} to ${STOPPOSITION})"
  mariadb-binlog --no-defaults --start-position=${STARTPOSITION} --stop-position=${STOPPOSITION} ${BINLOGNAMES[$j]} | $MARIADB_CLIENT

elif [ "$j" == "0" ]; then
  # THIS BINLOG IS THE FIRST
  echo "Applying binlog ${BINLOGNAMES[$j]} (first from position ${STARTPOSITION})"
  mariadb-binlog --no-defaults --start-position=${STARTPOSITION} ${BINLOGNAMES[$j]} | $MARIADB_CLIENT

elif  [ "$j" == "$((${#BINLOGNAMES[@]} - 1))" ]; then
  # THIS BINLOG IS THE LAST
  echo "Applying binlog ${BINLOGNAMES[$j]} (last to position ${STOPPOSITION})"
  mariadb-binlog --no-defaults --stop-position=${STOPPOSITION} ${BINLOGNAMES[$j]} | $MARIADB_CLIENT

else
  # THIS BINLOG IS NEITHER THE FIRST NOR THE LAST
  echo "Applying binlog ${BINLOGNAMES[$j]}"
  mariadb-binlog --no-defaults ${BINLOGNAMES[$j]} | $MARIADB_CLIENT

fi
done

$MARIADB_CLIENT -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);"

Edit the script and change the values for the variables according to what is on your server. The FIRSTBINLOG and STARTPOSITION are identified in the file xtrabackup_binlog_info. The LASTBINLOG and STOPPOSITION are identified by the COMMIT for the transaction where we want to stop applying transactions. Run the script recover_to_a_point_in_time.sh

POINT IN TIME RECOVERY WITHOUT BASH SCRIPTS

For some MariaDB administrators, such as those using Windows, using a bash script may be difficult or impossible. If you can read through the bash commands in the script, each step can be done one at a time leading to the same result.

With mariadb-binlog, there is a way to minimize the steps by using the database’s catalog of binary logs. To demonstrate this, edit the file zzz_options.cnf that you saved in a subdirectory of /etc, and comment out or remove the line “skip-log-bin”. Then restart the MariaDB instance. Run the following command to see a list of binary logs:
mariadb -Ae "show binary logs;"

RESTORE DATABASE AND APPLY ALL BINARY LOGS WITHOUT A SCRIPT

Mariadb-binlog has client capability, meaning it can connect to the database and gather data from it. Thus, it has access to the list of binary logs that are shown by the show binary logs command. Keep in mind:

  • To use mariadb-binlog as a client include the flag --read-from-remote-server (even if server is localhost)
  • Include the basename of the binlog that it will start reading from, but do not include the path leading to the file
  • Do not pipe the output back into the database as it is created. Doing so will create a never ending loop of binlogs to be applied

Here is an example of applying all of the binary logs. These steps are to be run on the command line one-by-one:

Stop database, remove all files:
systemctl stop mariadb
rm -fr /var/lib/mysql/*

Restore the database, start the database:
mariabackup --prepare --target-dir=/BACKUPS/mariabackup_2024-07-06-16-40
mariabackup --copy-back --target-dir=/BACKUPS/mariabackup_2024-07-06-16-40
chown -R mysql:mysql /var/lib/mysql
systemctl start mariadb

Generate the logs to apply, starting at mariadb-bin.000002, position 208286 and continuing to end:

# In bash, the backslash at the end of each line allows the very long command to continue to the next line.
mariadb-binlog --read-from-remote-server \
               --user=root --protocol=socket --host=localhost \
               --to-last-log --start-position=208286 \
               mariadb-bin.000002 \
               > /tmp/all_logs.sql

Edit the configuration file, and include skip-log-bin:
vi /etc/mysql/mariadb.conf.d/zzz_options.cnf # (debian)
vi /etc/my.cnf.d/zzz_options.cnf # (redhat)

Restart database:
systemctl restart mariadb

Check that binary logging is stopped:
mariadb -Ae "show global variables like 'log_bin';"

Apply the binary logs:
mariadb < /tmp/all_logs.sql

Check that our table is up to the final transaction:
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);"
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_UPDATED=(select max(ROW_UPDATED) from MY_DB.MY_DB_TABLE);"
mariadb -Ae "checksum table MY_DB.MY_DB_TABLE;"

The queries show that we have restored the database to the final transaction.

RESTORE DATABASE AND RECOVER TO A POINT IN TIME WITHOUT A SCRIPT

One method to recover to a point in time before the final transaction is exactly the same as the previous example of applying all binary logs with the additional step of editing the SQL output file before it is run. Simply put, you edit the file /tmp/all_logs.sql and add the command exit on a new line, at the point in which you want it to stop. If the file is massive, then this might not be possible.

Another method is to indicate a timestamp at which mariadb-binlog must stop generating. The syntax is --stop-datetime="2023-11-22 19:21:01" which will stop generating output at the first transaction with a datetime equal or posterior to the datetime indicated. So, to include a transaction, it must have a timestamp at least one second before the indicated timestamp.

All of the steps from the previous example apply here, except for the mariadb-binlog command which will now stop generating SQL commands 5 minutes of activity prior to the final transaction.

When we restored to a point in time using a bash script, the final insert was at 18:21:00. To include that transaction and the UPDATE that occurred in the same second, we use this command to generate the SQL script:

# In bash, the backslash at the end of each line allows the very long command to continue to the next line.
mariadb-binlog --read-from-remote-server \
               --user=root --protocol=socket --host=localhost \
               --to-last-log --stop-datetime="2024-06-07 18:21:01" \
               --start-position=208286 \
               mariadb-bin.000002 \
               > /tmp/point_in_time.sql

After restoring and applying the script point_in_time.sql, the final insert matches with our previous point in time example:

root@db:~/MY_DB$ mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);"
+------+-------------------------------------------------------------------------+---------------------+-------------+
| ID   | NOTE                                                                    | ROW_CREATED         | ROW_UPDATED |
+------+-------------------------------------------------------------------------+---------------------+-------------+
| 6240 | I d wd prqwdrawvbbpcmawwmaewwpreqq brqamaqevcew bjpbvfpwacbpbb cvp afv. | 2024-06-07 18:21:00 | NULL        |
+------+-------------------------------------------------------------------------+---------------------+-------------+

 

CONCLUSION

Today we learned how to perform point in time recovery in MariaDB Server. The key is identifying where to start and stop applying transactions from binary logs. We can identify the starting point from the file xtrabackup_binlog_info in our backup target directory. The ending point is determined by what has happened to your database. You should now be able to perform point in time recovery using scripts or manually.