MariaDB Point In Time Recovery
Database backup and restore are arguably the most important tasks of the database administrator (DBA). Once in a while, the DBA must restore a database to a point in time in the past. In this article, we will learn to do just that in MariaDB Server. We will restore from a backup and then apply the binary logs up to a point in the past.
GETTING STARTED
Complete the exercises in this article on a MariaDB instance that is for testing purposes. It is assumed you are accessing the server as root.
We will create a database schema with a table that is constantly changing in a way that we can measure changes in time.
On the linux command line, use git to download the project files:
git clone https://github.com/mariadb-edwardstoever/MY_DB.git
cd MY_DB
Review the files that are included in the download.
Copy the file zzz_options.cnf to the appropriate directory for your linux distro.
ON DEBIAN distros:
cp zzz_options.cnf /etc/mysql/mariadb.conf.d/
ON REDHAT distros:
cp zzz_options.cnf /etc/my.cnf.d/
On REDHAT distros, create the directory /var/log/mysql:
mkdir -p /var/log/mysql
chown mysql:mysql /var/log/mysql
Restart the MariaDB instance:
systemctl restart mariadb
Check that that binary logging is ON and saving binlogs to /var/log/mysql
mariadb -Ae "show global variables where Variable_name in ('log_bin','log_bin_basename');"
Check that the max_binlog_size is 524288 bytes:
mariadb -Ae "show global variables like 'max_binlog_size';"
A small value for max_binlog_size will cause the database to generate new binary logs frequently, which is what we need for this exercise.
Run the CREATE_MY_DB_SCHEMA.sql to create a schema called MY_DB:
mariadb < CREATE_MY_DB_SCHEMA.sql
Add the following entry to crontab to insert and update rows continuously:
* * * * * mariadb < /root/MY_DB/ROWS_ON_MY_DB_TABLE.sql
Check that rows are being added to the table MY_DB_TABLE:
mariadb -Ae "select count(*) from MY_DB.MY_DB_TABLE;"
Once we have created the schema MY_DB and the SQL script is being run from crontab, we can leave it alone to accumulate some rows. I suggest allowing it to run for 15 minutes before going to the next step.
INITIAL BACKUP
After 15 minutes, the table MY_DB_TABLE will have about 900 rows. To take the first backup, we run the following script:
./backup_mariadb.sh
The backup is a consistent picture of the database up to the point in time that it was taken.
We want to demonstrate recovering the database to any point in time, not just to the point that the backup was taken. To do that we need to allow more rows to be inserted, and more binary logs to be generated. Allow the changes made by the script in crontab to run for about 40 minutes more before going to the next step.
REVIEWING DATABASE ACTIVITY
Now that the database has accumulated many changes, let’s see how many rows exist on our table:
mariadb -Ae "select count(*) from MY_DB.MY_DB_TABLE;"
If we run that query multiple times, we can see the table is growing and changing continuously.
STOPPING DATABASE ACTIVITY
Comment out the crontab instruction and save
# * * * * * mariadb < /root/MY_DB/ROWS_ON_MY_DB_TABLE.sql
Wait a minute for the last cron task to complete. Then check that the number of rows does not grow by selecting count a few times.
mariadb -Ae "select count(*) from MY_DB.MY_DB_TABLE;"
We don’t want to generate new binary logs while we are applying changes that exist in binary logs.
Edit the file zzz_options.cnf that is saved in a subdirectory of /etc, uncomment or add the line skip-log-bin
and restart the database.
Now, check that log_bin is OFF:
mariadb -Ae "show global variables like 'log_bin'"
Let’s take a note of a few details about our table to see exactly where it is after the final transaction has been completed.
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);"
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_UPDATED=(select max(ROW_UPDATED) from MY_DB.MY_DB_TABLE);"
mariadb -Ae "checksum table MY_DB.MY_DB_TABLE;"
Copy the results of these queries and paste them into a text editor and set them aside. You will refer to them later.
RESTORE FROM BACKUP
Our first task is to restore the database. In the directory created by the git command, you will find a script restore_from_backup.sh. It will look like this:
#!/bin/bash # Script by Edward Stoever for Mariadb Support # EDIT THE VARIABLES ACCORDINGLY TARGETDIR=/BACKUPS/mariabackup_2024-07-06-16-40 DATADIR=/var/lib/mysql if [ ! -d ${TARGETDIR} ]; then echo "Edit this script accordingly."; exit; fi systemctl stop mariadb rm -fr ${DATADIR}/* mariabackup --prepare --target-dir=${TARGETDIR} mariabackup --copy-back --target-dir=${TARGETDIR} chown -R mysql:mysql ${DATADIR} systemctl start mariadb
Edit the script and change the values for the variables according to what is on your server. Run the script restore_from_backup.sh
We want to see how many rows are now on our table MY_DB_TABLE.
mariadb -Ae "select count(*) from MY_DB.MY_DB_TABLE;"
You should find fewer rows than existed prior to the restore. That is because all of the rows that were created and updated after the backup do not exist yet.
IDENTIFY THE STARTING POINT
In order to restore the database to its most recent transaction, we need to find a text file called xtrabackup_binlog_info.
find /BACKUPS/ -name xtrabackup_binlog_info
Review the contents of that file. In my case, it looks like this:
mariadb-bin.000002 208286 0-1-423
We see that in order to start applying binary logs, we must begin with log mariadb-bin.000002 at position 208286.
WHAT IS A BINLOG POSITION?
A position is an integer that increases with each new transaction. You should understand the following:
- Positions are not sequential. The first one in a given logfile is always 256, but from there, they go up in uneven and arbitrary intervals.
- If you indicate a position but you do not indicate a file name, you have provided incomplete information. The position could be from any binlog file.
- In a binary log that is converted to text by mariadb-binlog, each position is indicated by end_log_pos and this indication is written prior to the event it corresponds to.
APPLY ALL BINARY LOGS
In the directory created by the git command, you will find a script apply_all_binary_logs.sh. It will look like this:
#!/bin/bash # Script by Edward Stoever for Mariadb Support # EDIT THE VARIABLES ACCORDINGLY MARIADB_CLIENT="mariadb -u root --socket=/run/mysqld/mysqld.sock" BINLOGDIR=/var/log/mysql BINLOG_BASENAME=mariadb-bin FIRSTBINLOG=${BINLOGDIR}/${BINLOG_BASENAME}.000002 STARTPOSITION=208286 LOG_BIN=$($MARIADB_CLIENT -ABNe "select VARIABLE_VALUE from information_schema.GLOBAL_VARIABLES where VARIABLE_NAME='LOG_BIN';") if [ ! "${LOG_BIN}" == "OFF" ]; then echo "BINARY LOGGING SHOULD BE OFF."; exit; fi $MARIADB_CLIENT -Ae "select 1" 1>/dev/null 2>&1 || echo "Mariadb client is not connecting." # SANITIZE BINLOG NAMES FIRSTBINLOG=$(echo ${FIRSTBINLOG} | sed "s/\/\//\//g"); LASTBINLOG=$(echo ${LASTBINLOG} | sed "s/\/\//\//g") if [ ! -f ${FIRSTBINLOG} ]; then echo "Edit this script accordingly."; exit; fi # A SIMPLE ARRAY, ALL BINLOGS IN ORDER: unset STARTED BINLOGNAMES ALLBINLOGNAMES ALLBINLOGNAMES=($(find ${BINLOGDIR} -name "${BINLOG_BASENAME}.*[0-9]" | sort)) # POPULATE A SUBARRAY STARTING AT $FIRSTBINLOG for (( k=0; k<${#ALLBINLOGNAMES[@]}; k++ )); do if [ "${ALLBINLOGNAMES[$k]}" == "${FIRSTBINLOG}" ] || [ $STARTED ]; then STARTED=true; BINLOGNAMES+=(${ALLBINLOGNAMES[$k]}); fi done unset ALLBINLOGNAMES for (( j=0; j<${#BINLOGNAMES[@]}; j++ )); do if [ "$j" == "0" ]; then # THIS BINLOG IS THE FIRST echo "Applying binlog ${BINLOGNAMES[$j]} (first from position ${STARTPOSITION})" mariadb-binlog --no-defaults --start-position=${STARTPOSITION} ${BINLOGNAMES[$j]} | $MARIADB_CLIENT else # THIS BINLOG IS NOT THE FIRST echo "Applying binlog ${BINLOGNAMES[$j]}" mariadb-binlog --no-defaults ${BINLOGNAMES[$j]} | $MARIADB_CLIENT fi done $MARIADB_CLIENT -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);" $MARIADB_CLIENT -Ae "select * from MY_DB.MY_DB_TABLE where ROW_UPDATED=(select max(ROW_UPDATED) from MY_DB.MY_DB_TABLE);" $MARIADB_CLIENT -Ae "checksum table MY_DB.MY_DB_TABLE;"
Edit the script and change the values for the variables according to what is on your server. Run the script apply_all_binary_logs.sh
When the script completes, you will see the results of the queries. You should find that the values and checksum of the table match the values from our queries before the database was restored from backup. This is a clear example of restoring to the very last transaction available.
IDENTIFY A POINT BEFORE THE MOST RECENT TRANSACTION
Sometimes it is necessary to ignore the final transaction, or perhaps all of the transactions seconds, minutes or hours before the final transaction. Let’s identify the insert that took place 5 minutes before the final INSERT on MY_DB_TABLE. Run this command:
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED)-interval 5 minute from MY_DB.MY_DB_TABLE);"
When I run the query, I get this result:
+------+-------------------------------------------------------------------------+---------------------+-------------+ | ID | NOTE | ROW_CREATED | ROW_UPDATED | +------+-------------------------------------------------------------------------+---------------------+-------------+ | 6240 | I d wd prqwdrawvbbpcmawwmaewwpreqq brqamaqevcew bjpbvfpwacbpbb cvp afv. | 2024-06-07 18:21:00 | NULL | +------+-------------------------------------------------------------------------+---------------------+-------------+
We need to identify the binary log that has a modified time after the timestamp for ROW_CREATED and that contains the insert with NOTE “I d wd prqwdrawvbbpcmawwmaewwpreqq brqamaqevcew bjpbvfpwacbpbb cvp afv.”.
I can see that the binary log I am looking for is /var/log/mysql/mariadb-bin.000013. This command identifies it:
mariadb-binlog /var/log/mysql/mariadb-bin.000013| grep "brqamaqevcew bjpbvfpwacbpbb"
Now, showing 16 lines beyond that point in the binlog, we can see the end log position of 51707 of the COMMIT for that transaction:
root@db:~/MY_DB$ mariadb-binlog /var/log/mysql/mariadb-bin.000013 | grep -A16 "brqamaqevcew bjpbvfpwacbpbb" #Q> insert into `MY_DB_TABLE` (NOTE) VALUES ( NAME_CONST('my_note',_latin1'I d wd prqwdrawvbbpcmawwmaewwpreqq brqamaqevcew bjpbvfpwacbpbb cvp afv.' COLLATE 'latin1_general_cs')) #240607 18:21:00 server id 1 end_log_pos 51557 CRC32 0xd996999a Table_map: `MY_DB`.`MY_DB_TABLE` mapped to number 141 # at 51557 #240607 18:21:00 server id 1 end_log_pos 51676 CRC32 0x4101dcbe Write_rows: table id 141 flags: STMT_END_F BINLOG ' zIdjZhMBAAAAPgAAAGXJAAAAAI0AAAAAAAEABU1ZX0RCAAtNWV9EQl9UQUJMRQAECA8SEgRkAAAA DpqZltk= zIdjZhcBAAAAdwAAANzJAAAAAI0AAAAAAAEABA/4YBgAAAAAAABHSSBkIHdkIHBycXdkcmF3dmJi cGNtYXd3bWFld3dwcmVxcSBicnFhbWFxZXZjZXcgYmpwYnZmcHdhY2JwYmIgY3ZwIGFmdi6Zs48l QL7cAUE= '/*!*/; # Number of rows: 1 # at 51676 #240607 18:21:00 server id 1 end_log_pos 51707 CRC32 0xc6f661d2 Xid = 57458 COMMIT/*!*/; # at 51707
What we want to do is apply binlogs up to that commit and finish there.
RESTORE FROM BACKUP (AGAIN)
Run the script restore_from_backup.sh again.
RECOVER TO A POINT IN TIME
In the directory created by the git command, you will find a script recover_to_a_point_in_time.sh. It will look like this:
#!/bin/bash # Script by Edward Stoever for Mariadb Support # EDIT THE VARIABLES ACCORDINGLY MARIADB_CLIENT="mariadb -u root --socket=/run/mysqld/mysqld.sock" BINLOGDIR=/var/log/mysql BINLOG_BASENAME=mariadb-bin FIRSTBINLOG=${BINLOGDIR}/${BINLOG_BASENAME}.000002 STARTPOSITION=208286 LASTBINLOG=${BINLOGDIR}/${BINLOG_BASENAME}.000013 STOPPOSITION=51707 LOG_BIN=$($MARIADB_CLIENT -ABNe "select VARIABLE_VALUE from information_schema.GLOBAL_VARIABLES where VARIABLE_NAME='LOG_BIN';") if [ ! "${LOG_BIN}" == "OFF" ]; then echo "BINARY LOGGING SHOULD BE OFF."; exit; fi $MARIADB_CLIENT -Ae "select 1" 1>/dev/null 2>&1 || echo "Mariadb client is not connecting." # SANITIZE BINLOG NAMES FIRSTBINLOG=$(echo ${FIRSTBINLOG} | sed "s/\/\//\//g"); LASTBINLOG=$(echo ${LASTBINLOG} | sed "s/\/\//\//g") if [ ! -f ${FIRSTBINLOG} ]; then echo "Edit this script accordingly."; exit; fi # A SIMPLE ARRAY, ALL BINLOGS IN ORDER: unset STARTED ENDED BINLOGNAMES ALLBINLOGNAMES ALLBINLOGNAMES=($(find ${BINLOGDIR} -name "${BINLOG_BASENAME}.*[0-9]" | sort)) # POPULATE A SUBARRAY STARTING AT $FIRSTBINLOG ENDING AT $LASTBINLOG for (( k=0; k<${#ALLBINLOGNAMES[@]}; k++ )); do if [ ! $ENDED ]; then if [ "${ALLBINLOGNAMES[$k]}" == "${FIRSTBINLOG}" ] || [ $STARTED ]; then STARTED=true; BINLOGNAMES+=(${ALLBINLOGNAMES[$k]}); fi fi if [ "${ALLBINLOGNAMES[$k]}" == "${LASTBINLOG}" ]; then ENDED=true; fi done unset ALLBINLOGNAMES for (( j=0; j<${#BINLOGNAMES[@]}; j++ )); do if [ "$j" == "0" ]&&[ "$((${#BINLOGNAMES[@]} - 1))" == "0" ]; then # THIS BINLOG IS THE FIRST AND THE LAST echo "Applying binlog ${BINLOGNAMES[$j]} (first and last, ${STARTPOSITION} to ${STOPPOSITION})" mariadb-binlog --no-defaults --start-position=${STARTPOSITION} --stop-position=${STOPPOSITION} ${BINLOGNAMES[$j]} | $MARIADB_CLIENT elif [ "$j" == "0" ]; then # THIS BINLOG IS THE FIRST echo "Applying binlog ${BINLOGNAMES[$j]} (first from position ${STARTPOSITION})" mariadb-binlog --no-defaults --start-position=${STARTPOSITION} ${BINLOGNAMES[$j]} | $MARIADB_CLIENT elif [ "$j" == "$((${#BINLOGNAMES[@]} - 1))" ]; then # THIS BINLOG IS THE LAST echo "Applying binlog ${BINLOGNAMES[$j]} (last to position ${STOPPOSITION})" mariadb-binlog --no-defaults --stop-position=${STOPPOSITION} ${BINLOGNAMES[$j]} | $MARIADB_CLIENT else # THIS BINLOG IS NEITHER THE FIRST NOR THE LAST echo "Applying binlog ${BINLOGNAMES[$j]}" mariadb-binlog --no-defaults ${BINLOGNAMES[$j]} | $MARIADB_CLIENT fi done $MARIADB_CLIENT -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);"
Edit the script and change the values for the variables according to what is on your server. The FIRSTBINLOG and STARTPOSITION are identified in the file xtrabackup_binlog_info. The LASTBINLOG and STOPPOSITION are identified by the COMMIT for the transaction where we want to stop applying transactions. Run the script recover_to_a_point_in_time.sh
POINT IN TIME RECOVERY WITHOUT BASH SCRIPTS
For some MariaDB administrators, such as those using Windows, using a bash script may be difficult or impossible. If you can read through the bash commands in the script, each step can be done one at a time leading to the same result.
With mariadb-binlog, there is a way to minimize the steps by using the database’s catalog of binary logs. To demonstrate this, edit the file zzz_options.cnf that you saved in a subdirectory of /etc, and comment out or remove the line “skip-log-bin”. Then restart the MariaDB instance. Run the following command to see a list of binary logs:
mariadb -Ae "show binary logs;"
RESTORE DATABASE AND APPLY ALL BINARY LOGS WITHOUT A SCRIPT
Mariadb-binlog has client capability, meaning it can connect to the database and gather data from it. Thus, it has access to the list of binary logs that are shown by the show binary logs command. Keep in mind:
- To use mariadb-binlog as a client include the flag
--read-from-remote-server
(even if server is localhost) - Include the basename of the binlog that it will start reading from, but do not include the path leading to the file
- Do not pipe the output back into the database as it is created. Doing so will create a never ending loop of binlogs to be applied
Here is an example of applying all of the binary logs. These steps are to be run on the command line one-by-one:
Stop database, remove all files:
systemctl stop mariadb
rm -fr /var/lib/mysql/*
Restore the database, start the database:
mariabackup --prepare --target-dir=/BACKUPS/mariabackup_2024-07-06-16-40
mariabackup --copy-back --target-dir=/BACKUPS/mariabackup_2024-07-06-16-40
chown -R mysql:mysql /var/lib/mysql
systemctl start mariadb
Generate the logs to apply, starting at mariadb-bin.000002, position 208286 and continuing to end:
# In bash, the backslash at the end of each line allows the very long command to continue to the next line. mariadb-binlog --read-from-remote-server \ --user=root --protocol=socket --host=localhost \ --to-last-log --start-position=208286 \ mariadb-bin.000002 \ > /tmp/all_logs.sql
Edit the configuration file, and include skip-log-bin:
vi /etc/mysql/mariadb.conf.d/zzz_options.cnf # (debian)
vi /etc/my.cnf.d/zzz_options.cnf # (redhat)
Restart database:
systemctl restart mariadb
Check that binary logging is stopped:
mariadb -Ae "show global variables like 'log_bin';"
Apply the binary logs:
mariadb < /tmp/all_logs.sql
Check that our table is up to the final transaction:
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);"
mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_UPDATED=(select max(ROW_UPDATED) from MY_DB.MY_DB_TABLE);"
mariadb -Ae "checksum table MY_DB.MY_DB_TABLE;"
The queries show that we have restored the database to the final transaction.
RESTORE DATABASE AND RECOVER TO A POINT IN TIME WITHOUT A SCRIPT
One method to recover to a point in time before the final transaction is exactly the same as the previous example of applying all binary logs with the additional step of editing the SQL output file before it is run. Simply put, you edit the file /tmp/all_logs.sql and add the command exit on a new line, at the point in which you want it to stop. If the file is massive, then this might not be possible.
Another method is to indicate a timestamp at which mariadb-binlog must stop generating. The syntax is --stop-datetime="2023-11-22 19:21:01"
which will stop generating output at the first transaction with a datetime equal or posterior to the datetime indicated. So, to include a transaction, it must have a timestamp at least one second before the indicated timestamp.
All of the steps from the previous example apply here, except for the mariadb-binlog command which will now stop generating SQL commands 5 minutes of activity prior to the final transaction.
When we restored to a point in time using a bash script, the final insert was at 18:21:00. To include that transaction and the UPDATE that occurred in the same second, we use this command to generate the SQL script:
# In bash, the backslash at the end of each line allows the very long command to continue to the next line. mariadb-binlog --read-from-remote-server \ --user=root --protocol=socket --host=localhost \ --to-last-log --stop-datetime="2024-06-07 18:21:01" \ --start-position=208286 \ mariadb-bin.000002 \ > /tmp/point_in_time.sql
After restoring and applying the script point_in_time.sql, the final insert matches with our previous point in time example:
root@db:~/MY_DB$ mariadb -Ae "select * from MY_DB.MY_DB_TABLE where ROW_CREATED=(select max(ROW_CREATED) from MY_DB.MY_DB_TABLE);" +------+-------------------------------------------------------------------------+---------------------+-------------+ | ID | NOTE | ROW_CREATED | ROW_UPDATED | +------+-------------------------------------------------------------------------+---------------------+-------------+ | 6240 | I d wd prqwdrawvbbpcmawwmaewwpreqq brqamaqevcew bjpbvfpwacbpbb cvp afv. | 2024-06-07 18:21:00 | NULL | +------+-------------------------------------------------------------------------+---------------------+-------------+
CONCLUSION
Today we learned how to perform point in time recovery in MariaDB Server. The key is identifying where to start and stop applying transactions from binary logs. We can identify the starting point from the file xtrabackup_binlog_info in our backup target directory. The ending point is determined by what has happened to your database. You should now be able to perform point in time recovery using scripts or manually.