Understanding and Testing Non-blocking Backup Locks with MariaDB Enterprise Server

Let’s talk about database backups for a minute. Making sure your data is safe and recoverable is a key component of Database Administration. Keeping regular and reliable backups are essential to the success of any organization. When something unexpected happens like a server crashes, data corruption or even data tampering, you need to be able to recover mission critical applications quickly and reliably.

One of the new features in MariaDB Enterprise Server and MariaDB Enterprise Backup is the use of non-blocking backup lock to minimize workload impact during backup operations. It does this by doing the majority of the backup with a minimal number of server locks. This works really well on transactional tables like InnoDB and MyRocks. A short backup lock is needed for new commits and copying statistics and log tables. MariaDB Enterprise Backup comes from InnoDB’s recovery technology. It copies InnoDB tablespaces, which are not consistent after a straight copy. Then a crash recovery is performed to gain consistency on these copied tables by using redo logs, undo logs, the backup’s new DDL log, and backup locks when needed. This non-blocking feature temporarily locks tables during the backup stages to give you a consistent point in time for your backups.

MariaDB Enterprise Backup works by observing and noting the log sequence numbers in the transaction log before it starts copying the tablespaces. Simultaneously, it starts another background process to keep track of the transaction log entries. Later these logs will be used this to keep the copied tablespaces consistent. Backup locks block DDL operations on a table temporarily for a short time at the end of the backup so that the DDL can be made consistent with the data. This minimizes the impact and duration that a table is locked for backups. For example, for an Aria table that stores data in 3 files with extensions .frm, .MAI and .MAD, normal read/write operations can continue as normal.

The non-blocking backup functionality, which is only available in MariaDB Enterprise backups, is different from historical backup functionality in that it includes optimizations in the backup stages, including DDL statement tracking, which reduces lock-time during backups. Earlier releases of MariaDB (Community versions) use FLUSH TABLE WITH READ LOCK, this closes the open tables and only allows them to be reopened with a read lock during the backup stage. This also means the backup cannot run if there are long running queries. The reason why FLUSH is needed is to ensure that all the tables are in a consistent state at the exact same point in time.

MariaDB Enterprise Backup works by using multiple backup stages.  Each stage performs certain functions in the backup process. The first stage is called the BACKUP STAGE START. At this stage non-transactional tables not in use are copied.  It also starts the copy of tablespaces and other logs, it blocks redo log purges for certain storage engines and logs DDL commands into a ddl.log.

BACKUP STAGE FLUSH, blocks new writes locks on non-transactional tables, flushes all changes for non-transactional tables, with the exception of statistics and log tables. It closes tables not in use and marks them so.  It does not block DDL statements in this stage, because table inconsistency will be addressed later when the transaction logs and redo logs are applied. It also does not block or wait on read-only transactions.

During the BACKUP STAGE BLOCK_DDL, it waits for all non-transactional tables with write locks to complete. It blocks CREATE, DROP, TRUNCATE and RENAME table commands.  ALTER TABLE is handled in different ways depending on which stage ALTER TABLE is in. It blocks any new ALTER table commands and it blocks the final RENAME command of currently running ALTER TABLE. Currently running ALTER TABLE statements are not blocked.

Next is BACKUP STAGE BLOCK_COMMIT. In this stage the binary log is locked–this also prevents commits and rollbacks. If there are active commits or data to be copied to the binary log this will be allowed to finish. This stage doesn’t lock temporary tables that are not used by replication. These will be blocked when it is time to write to the binary log. System log tables and statistics tables are locked and flushed at this time. When the BLOCK_COMMIT stage returns, this is our consistent ‘backup time’. Everything that should be in this backup is committed and what shouldn’t is rolled back.

The final stage of MariaDB Enterprise Backup is BACKUP STAGE END. This ends DDL logging and frees up resources used by MariaDB Enterprise Backup. For more detail please look at MariaDB’s Enterprise backup stage documentation. It can be found here.

For a more interactive explanation, my fellow Enterprise Architect, Manjot Singh, has a video that describes these stages as well:

Practical Application

Now that we understand MariaDB Enterprise Backup locks. I will run through a quick test of the new non-blocking backup locks for MariaDB Enterprise Server 10.2.25-1. I create two separate database instances running MariaDB Enterprise Server 10.2.25-1. One database instance will be for taking the backup and the other database instance will be for the recovery process. We then can compare the two database instances for accuracy and show that they are identical. For further information about MariaDB Enterprise Backups please click here.

Connect to the server, log on to the first database instance and create a database for testing.  This instance will be known as “bob”.

ssh <username>@<server1>.com
MariaDB [(none)]> CREATE DATABASE bob;

We then create a backup user with the appropriate privileges for backing up and restoring the instance.

MariaDB [(none)]> CREATE USER 'mariadb_bob'@'localhost' IDENTIFIED BY 'mariadb_bob';
MariaDB [(none)]> GRANT RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'mariadb_bob'@'localhost';

Create some tables to be populated with a large volume of rows, to be later filled with the scripts below.

MariaDB [(none)]> use bob;
create table bigt1(x int primary key auto_increment);
create table bigt2(x int primary key auto_increment);
create table bigt3(x int primary key auto_increment);
create table bigt4(x int primary key auto_increment);
create table bigt5(x int primary key auto_increment);
create table bigt6(x int primary key auto_increment);
create table bigt7(x int primary key auto_increment);

Populate those tables with at least 5Gb worth of data. We do this by inserting simple values into the tables.

insert into bigt1 () values(),(),();
insert into bigt2 () values(),(),();
insert into bigt3 () values(),(),();
insert into bigt4 () values(),(),();
insert into bigt5 () values(),(),();
insert into bigt6 () values(),(),();
insert into bigt7 () values(),(),();

Then we insert some more rows with these commands. Running it each time, doubling the table size.  This is put into a script to be continuously run until we get the acceptable table sizes we want to test against.

insert into bigt1 (x) select x + (select count(*) from bigt1) from bigt1;
insert into bigt2 (x) select x + (select count(*) from bigt2) from bigt2;
insert into bigt3 (x) select x + (select count(*) from bigt3) from bigt3;
insert into bigt4 (x) select x + (select count(*) from bigt4) from bigt4;
insert into bigt5 (x) select x + (select count(*) from bigt5) from bigt5;
insert into bigt6 (x) select x + (select count(*) from bigt6) from bigt6;
insert into bigt7 (x) select x + (select count(*) from bigt7) from bigt7;

To check the size of each of the tables as they grow, use the select statements below.

SELECT count(*) from bigt1;

             or

SELECT table_schema as `Database`, table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024 ), 2) `Size in MB` FROM information_schema.TABLES  WHERE table_schema="bob" ORDER BY (data_length + index_length) DESC;

We now have a database instance with some volume of data. To show the value of the non-blocking lock feature available in MariaDB Enterprise Backups in MariaDB Enterprise Server 10.2.25-1.  We will do different levels of mariabackup while introducing a workload against the database instance “bob” showing that backups do not interfere with the running workload. Next, we then restore the different levels of mariabackup to the second database instance, comparing the two instances to ensure they are identical.

We need to create some backup directories to store the backup files. Each sub directory will store a different level of mariabackup.  L0 for the full backup, known as level 0. L1 for the first incremental mariabackup. L2 for the second incremental mariabackup.

ssh <username>@<server1>.com
cd /backup/backup_test
mkdir L0 L1 L2

Running the script below, introduces a workload. It continuously does an ALTER TABLE, INSERT INTO multiple tables, a constant RENAME of a table, and a DROP TABLE, against the “bob” instance to simulate the workload.

 <server>:/backup/backup_test] # ./jobs_10.2.25-1.sh

 With the workload running we start a level 0 backup with the following command:

date; /mariabackup --defaults-file=/mysql/test_3600/etc/my.cnf --no-version-check --user=mariadb_bob --password=mariadb_bob --throttle=50000 --parallel=1 --ftwrl-wait-timeout=300 --ftwrl-wait-threshold=86400 --ftwrl-wait-query-type=update --backup --target-dir=/backup/test_3600/L0 | & tee -a /backup/backup_test/L0_10.2.25-1-bkp.log; date

Check for backup completion. This can be found by looking at the xtrabackup_info file created from the mariabackup. 

# Example.
[00] 2019-09-10 07:54:42 Writing xtrabackup_info
[00] 2019-09-10 07:54:42      ...done
[00] 2019-09-10 07:54:42 Redo log (from LSN 620478195674 to 622323984403) was copied.
[00] 2019-09-10 07:54:42 completed OK!

 

Keeping the workload active against the instance “bob”.  We now start an incremental backup calling it L1. 

/mariabackup --defaults-file=/mysql/test_3600/etc/my.cnf --no-version-check --user=mariadb_bob --password=mariadb_bob --throttle=50000 --parallel=1 --ftwrl-wait-timeout=300 --ftwrl-wait-threshold=86400 --ftwrl-wait-query-type=update --backup –target-dir=/backup/test_3600/L1 --incremental-basedir=/backup/backup_test/L0 | & tee -a /backup/backup_test/L1_10.2.25_bkp.log; date

Again, check for backup completion.

# Example.
[00] 2019-09-10 07:59:34 Writing xtrabackup_info
[00] 2019-09-10 07:59:34      ...done
[00] 2019-09-10 07:59:34 Redo log (from LSN 622780501285 to 625194016167) was copied.
[00] 2019-09-10 07:59:34 completed OK!

Stop the jobs script that was simulating the workload against “bob”.  

ps -ef | grep .py
root   33966      1 0 09:53 pts/0 00:00:00 python ./Tab5_Insert.py 3626
root   34037      1 0 09:53 pts/0 00:00:00 python ./Tab6_Insert.py 3626
root   34178      1 0 09:53 pts/0 00:00:00 python ./global_Alter.py 3626 bigt1
root   34694      1 0 09:53 pts/0 00:00:00 python ./global_Alter.py 3626 bigt2
….

Kill -9 <pids>

Start another incremental backup and call it L2.

/mariabackup --defaults-file=/mysql/test_3600/etc/my.cnf --no-version-check --user=mariadb_bob --password=mariadb_bob --throttle=50000 --parallel=1 --ftwrl-wait-timeout=300 --ftwrl-wait-threshold=86400 --ftwrl-wait-query-type=update --backup --target-dir=/backup/backup_test/L2 --incremental-basedir=/backup/backup_test/L1 |& tee -a /backup/backup_test/L2_10.2.25_bkp.log 

Once again checking for backup completion.

# Example.
[00] 2019-09-10 08:20:35 Writing xtrabackup_info
[00] 2019-09-10 08:20:35      ...done
[00] 2019-09-10 08:20:35 Redo log (from LSN 633112486665 to 633112486674) was copied.
[00] 2019-09-10 08:20:35 completed OK!

Recover this backup to another instance.

Stop the second database instance if it is running with the following command: 

service mysql stop

Remove the datafiles from the data_dir directory for database instance 3601.

cd /mysqld/test_3601/
rm -rf data/*
rm -rf log/*.log
rm -rf log/ib* 
rm -rf log/binlog/*

Prepare the 10.2.21-1 Enterprise MariaDB backup.

mariabackup --defaults-file=/mysql/test_3600/etc/my.cnf --prepare --target=/backup/backup_test/L0

Check for completion.

# Example.
[00] 2019-09-10 08:24:20 Last binlog file /mysql/test_3600/log/mysql-binlog.000326, position 969970776
[00] 2019-09-10 08:24:20 completed OK! 

Prepare the L1 incremental backups to the base backup.

/mariabackup --defaults-file=/mysql/test_3600/etc/my.cnf --prepare --target=/backup/backup_test/L0 --incremental-dir=/backup/backup_test/L1

Check for completion.

 # Example.
[00] 2019-09-10 08:40:03 Copying /backup/backup_test_3600/L1/xtrabackup_info to ./xtrabackup_info
[00] 2019-09-10 08:40:03      ...done
[00] 2019-09-10 08:40:03 completed OK!

 Prepare the L2 incremental backups to the base backup. 

mariabackup --defaults-file=/mysql/backup_test_3600/etc/my.cnf --prepare --target=/backup/backup_test/L0 --incremental-dir=/backup/backup_test/L2

Check for completion.

# Example.
[00] 2019-09-10 08:44:05 Copying /backup/test_3600/L2/xtrabackup_info to ./xtrabackup_info
[00] 2019-09-10 08:44:05      ...done
[00] 2019-09-10 08:44:05 completed OK!

Copy the backup data, back to the 3601 data_dir directory.

mariabackup --defaults-file=/mysql/backup_test_3600/etc/my.cnf --copy-back –target-dir=/mysql/backup_test_3601; date

Check and change file permissions.

chown -R mysql:mysql /mysql/backup_test_3601/*

Start the 10.2.21-1 Enterprise Server on the 3601 database instance.

service mysql start

To test, check both instances for the “bob” databases and consistency within. Use the following commands in each instance (3600 and 3601) to verify the ALTER TABLE,INSERT INTO,RENAME and DROP table scripts from the simulated workload.

Check the ALTER TABLE

Show create table bigt1; 

Check the INSERT TABLE

Select count(*) from bigt1;

and

SELECT table_schema as `Database`, table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024 ), 2) `Size in MB` FROM information_schema.TABLES  WHERE table_schema="bob" ORDER BY (data_length + index_length) DESC;

Checking the RENAME. 

Show tables;

If all was done correctly. You will have two identical database instances complete with the same tables. Each with the correct number of rows. The table that was continuously renamed will be identical in each instance. The dropped and recreated table will be gone. Lastly, all altered tables will be consistent as well.

MariaDB Database Backup plays a substantial role in achieving Recovery Point Objective (RPO), which defines the maximum amount of data a business is willing to lose and Recovery Time Objective (RTO), which defines how quickly a business needs to restore service in the event of a fault. Using MariaDB Enterprise Backup’s non-blocking feature reduces the amount of time your database is locked, improving RPO and RTO. Try it out yourself and see how MariaDB Enterprise Backup locks feature can help your organization by downloading MariaDB Enterprise Server, part of MariaDB Platform here.