Point-In-Time Recovery
Point-in-time recovery (PITR) is a feature that allows you to restore a MariaDB instance to a specific point in time. To achieve this, it combines a full base backup with the binary logs that record all changes made to the database after the backup. The process is fully automated by the operator, covering both archival and restoration up to a specific time, ensuring business continuity and reducing RTO and RPO.
Supported MariaDB versions and topologies
The operator uses mariadb-binlog to replay binary logs. In particular, it filters binlog events by passing a GTID to mariadb-binlog via the --start-position flag. This is only supported by MariaDB Server 10.8 and later, so make sure you are using a compatible MariaDB version.
Regarding supported MariaDB topologies, at the moment binary log archiving and point-in-time recovery are only supported by the asynchronous replication topology, which already relies on the binary logs for replication. Galera and standalone topologies will be supported in upcoming releases.
Storage types
Full base backups and binary logs can be stored in object storage.
For additional details on configuring storage, refer to the storage types section in the physical backup documentation; the same settings are applicable to the PointInTimeRecovery object.
Configuration
To be able to perform a point-in-time restoration, a physical backup must be configured as the full base backup. For example, you can configure a nightly backup:
```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup-daily
spec:
  mariaDbRef:
    name: mariadb-repl
  schedule:
    cron: "0 0 * * *"
    suspend: false
    immediate: true
  compression: bzip2
  maxRetention: 720h
  storage:
    s3:
      bucket: physicalbackups
      prefix: mariadb
      endpoint: minio.minio.svc.cluster.local:9000
      region: us-east-1
      accessKeyIdSecretKeyRef:
        name: minio
        key: access-key-id
      secretAccessKeySecretKeyRef:
        name: minio
        key: secret-access-key
      tls:
        enabled: true
        caSecretKeyRef:
          name: minio-ca
          key: ca.crt
```
Refer to the full base backup section for additional details on how to configure the full base backup.
The next step is to configure common aspects of both binary log archiving and point-in-time restoration by defining a PointInTimeRecovery object:
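As a sketch of such an object, reusing the names from the previous example (the object name `pitr`, the `binlogs` bucket, and the exact field layout are illustrative assumptions; check the CRD reference for the exact schema):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PointInTimeRecovery
metadata:
  name: pitr
spec:
  # Reference to the PhysicalBackup used as full base backup.
  physicalBackupRef:
    name: physicalbackup-daily
  # Object storage for archived binary logs.
  storage:
    s3:
      bucket: binlogs
      prefix: mariadb
      endpoint: minio.minio.svc.cluster.local:9000
      region: us-east-1
      accessKeyIdSecretKeyRef:
        name: minio
        key: access-key-id
      secretAccessKeySecretKeyRef:
        name: minio
        key: secret-access-key
  compression: gzip
  archiveTimeout: 1h
  archiveInterval: 10m
  maxParallel: 1
```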
- `physicalBackupRef`: Reference to the `PhysicalBackup` resource used as full base backup. See full base backup.
- `storage`: Object storage configuration for binary logs. See storage types.
- `compression`: Algorithm used to compress binary logs. Disabled by default. See compression.
- `archiveTimeout`: Maximum duration of the binary log archival. If exceeded, the agent will return an error and archival will be retried in the next archive cycle. Defaults to 1h.
- `archiveInterval`: Interval at which binary logs are archived. Defaults to 10m. See archival for additional details.
- `maxParallel`: Maximum number of workers that can be used for parallel binary log archival and restoration. Defaults to 1. See parallelization.
- `maxRetention`: Maximum retention duration for binary logs. By default, binary logs are not automatically deleted. See retention policy.
- `strictMode`: Controls the behavior when a point-in-time restoration cannot reach the exact target time. Disabled by default. See strict mode.
With this configuration in place, you can enable binary log archival in a MariaDB instance by setting a reference to the PointInTimeRecovery object:
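As an illustrative sketch (the placement of the reference field within the `MariaDB` spec is an assumption; consult the CRD reference):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-repl
spec:
  replicas: 3
  replication:
    enabled: true
  # Enables binary log archival using the PointInTimeRecovery configuration.
  pointInTimeRecoveryRef:
    name: pitr
```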
Once a full base backup has been completed and the binary logs have been archived, you can perform a point-in-time restoration. For example, you can create a new MariaDB instance with the following configuration:
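A hedged sketch of such an instance, assuming the `pitr` object name from earlier examples and a target time taken from later in this page:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-restored
spec:
  bootstrapFrom:
    pointInTimeRecoveryRef:
      name: pitr
    # Desired restore point, in RFC3339 format.
    targetRecoveryTime: "2026-02-27T20:10:42Z"
```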
Refer to the point-in-time restoration section for additional details.
Full base backup
To enable point-in-time recovery, a PhysicalBackup resource should be configured as full base backup. The backup should be a complete snapshot of the database at a specific point in time, and it will serve as the starting point for replaying the binary logs. Any of the supported backup strategies can be used as full base backup, as all of them provide a consistent snapshot of the database and a starting GTID position.
It is very important to note that a full physical backup must be completed before a point-in-time restoration can be performed. This is something that the operator accounts for when computing the last recoverable time.
To further expand the last recoverable time, it is recommended to take physical backups after the primary Pod has changed. This can be automated by setting schedule.onPrimaryChange, as documented in the physical backup docs:
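For example, the nightly backup from earlier can be extended as follows (a sketch; `onPrimaryChange` is placed under `schedule` as the field path suggests):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup-daily
spec:
  mariaDbRef:
    name: mariadb-repl
  schedule:
    cron: "0 0 * * *"
    # Automatically take a backup whenever the primary Pod changes.
    onPrimaryChange: true
```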
Alternatively, you can schedule an on-demand physical backup or rely on the cron scheduling for doing so:
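One way to take an on-demand backup, sketched under the assumption that a `PhysicalBackup` without a `schedule` runs once immediately (names are illustrative):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup-ondemand
spec:
  mariaDbRef:
    name: mariadb-repl
  storage:
    s3:
      bucket: physicalbackups
      prefix: mariadb
      endpoint: minio.minio.svc.cluster.local:9000
```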
The backup taken in the new primary will establish a baseline for a new binlog timeline, which will be expanded when new binary logs are archived.
Archival
The mariadb-enterprise-operator sidecar agent will periodically check for new binary logs and archive them to the configured object storage. The archival process is controlled by the archiveInterval and archiveTimeout settings in the PointInTimeRecovery configuration, which determine how often the archival process runs and how long it can take before it is considered failed.
The archival process is performed on the primary Pod in the asynchronous replication topology. You can check the logs of the agent sidecar container, Kubernetes events and the status of the MariaDB object to monitor the current status of the archival process.
There are some important considerations regarding binary log archival:

- The archival process should start from a clean state, which means that the object storage should be empty at the time of the first archival.
- It is not recommended to set `archiveInterval` to a very low value (< 1m), as it can lead to increased load on the database Pod and the storage system.
- If the archival process fails (e.g., due to network issues or storage unavailability), it will be retried in the next archive cycle.
- If the `binlog_expire_logs_seconds` server variable is configured, it should be set to a value higher than the `archiveInterval` to prevent automatic deletion of binary logs before they are archived.
- Manually executing the `PURGE BINARY LOGS` command on the database is not recommended, as it can lead to inconsistencies between the database and the archived binary logs.
- Manually executing the `FLUSH BINARY LOGS` command on the database is compatible with the archival process: it forces the active binary log to be closed, and it will be archived by the agent in the next archive cycle.
Binary log size
The server has a default max_binlog_size of 1GB, which means that a new binary log file is created once the current one reaches that size. This is a sensible default for most cases, but it can be adjusted based on data volume in order to enable faster archival, and therefore a reduced RPO:
| Workload | max_binlog_size | Rationale |
| --- | --- | --- |
| Low traffic | 128MB | Keeps file size minimal for slow-growing logs. |
| Standard | 256MB | Balances rotation frequency with server overhead. |
| High throughput | 512MB - 1GB | Reduces the contention caused by frequent rotations in write-heavy environments. |
The smaller the binlog file size, the more frequently the files will be rotated and archived, which can lead to increased load on the database Pod and the storage system. On the other hand, setting a very high binlog file size can lead to longer archival times and increased RPO.
Refer to the configuration documentation for instructions on how to set the max_binlog_size server variable in the MariaDB instance.
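As a sketch, server variables can typically be set via the `myCnf` field of the `MariaDB` resource (an assumption about where this is configured in your version; the referenced configuration documentation is authoritative):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-repl
spec:
  myCnf: |
    [mariadb]
    # Rotate binary logs at 256MB instead of the 1GB default.
    max_binlog_size=256M
```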
Compression
In order to reduce storage usage and save bandwidth during archival and restoration, the operator supports compressing the binary log files. Compression is enabled by setting the compression field in the PointInTimeRecovery configuration:
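For example, to enable gzip compression (the object name is illustrative):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PointInTimeRecovery
metadata:
  name: pitr
spec:
  compression: gzip
```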
The supported compression algorithms are:
- `bzip2`: Good compression ratio, but slower compression/decompression speed compared to gzip.
- `gzip`: Good compression/decompression speed, but worse compression ratio compared to bzip2.
- `none`: No compression.
Compression is disabled by default, and there are some important considerations before enabling it:

- Compression is immutable: once configured and binary logs have been archived with a specific algorithm, it cannot be changed. This also applies to restoration, where the same compression algorithm must be configured as the one used for archival.
- Although it saves storage space and bandwidth, the restoration process may take longer when compression is enabled, leading to an increased RTO. This can be mitigated by enabling parallelization.
Server-Side Encryption with Customer-Provided Keys (SSE-C) for S3
When using S3-compatible storage, you can enable server-side encryption using your own encryption key (SSE-C) by providing a reference to a Secret containing a 32-byte (256-bit) key encoded in base64:
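As a sketch of the S3 storage fragment (the key reference field name below is an assumption for illustration; consult the CRD reference for the exact SSE-C field):

```yaml
spec:
  storage:
    s3:
      bucket: binlogs
      # Field name is illustrative; the Secret must contain a
      # base64-encoded 32-byte (256-bit) encryption key.
      sseCustomerKeySecretKeyRef:
        name: sse-c-key
        key: key
```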
When using SSE-C, you are responsible for managing and securely storing the encryption key. If you lose the key, you will not be able to decrypt your binary logs. Ensure you have proper key management procedures in place.
When replaying SSE-C encrypted binary logs via bootstrapFrom, the same key must be provided in the S3 configuration.
Parallelization
Several tasks during both the archival and restoration processes can take a significant amount of time, especially when managing large data volumes. These tasks include compressing and uploading binary logs during archival, and downloading and decompressing binary logs during restoration. This can lead to longer archival and restoration times, which can impact the RTO.
To mitigate this, the operator supports parallelization of these tasks by using multiple workers. The maximum number of workers can be configured via the maxParallel field in the PointInTimeRecovery configuration:
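For example, to allow up to 4 parallel workers (object name illustrative):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PointInTimeRecovery
metadata:
  name: pitr
spec:
  maxParallel: 4
```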
This will create up to 4 workers, each of them responsible for the operations related to a single binary log, which means that up to 4 binary logs can be processed in parallel. This can significantly reduce archival and restoration times, especially when compression is enabled.
Parallelization is disabled by default (maxParallel: 1), and there are some important considerations to be taken into account when enabling it:
- During archival, the workers are spawned in the agent sidecar container, sharing storage with the primary database Pod. Using an elevated number of workers can exhaust IOPS and/or CPU resources of the primary Pod, which can impact the performance of the database.
- During both archival and restoration, using an elevated number of workers can saturate the network bandwidth when pulling/pushing multiple binary logs in parallel, which can degrade the performance of the database.
Retention policy
Binary logs can grow significantly in size, especially in write-heavy environments, which can lead to increased storage costs. To mitigate this, the operator supports automatic purging of binary logs based on a retention policy defined by the maxRetention field in the PointInTimeRecovery configuration:
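For example, to keep binary logs for 30 days (the object name and value are illustrative):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PointInTimeRecovery
metadata:
  name: pitr
spec:
  # 720h = 30 days.
  maxRetention: 720h
```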
The binary logs that exceed the defined retention will be automatically deleted from the object storage after each archival cycle.
By default, binary logs are never purged from object storage, and there are a few considerations regarding configuring a retention policy:

- The date of the last event in a binary log is used to determine its age, and therefore whether it should be purged or not.
- The `maxRetention` field should not be set to a value lower than the `archiveInterval`, as it can lead to situations where binary logs are purged before they can be archived.
Binlog inventory
The operator maintains an inventory of the archived binary logs in an index.yaml file located at the root of the configured object storage. This file contains a list of all the archived binary logs for each server, along with their GTIDs and other metadata used internally. Here is an example of the index.yaml file:
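The following is a hedged, illustrative sketch of what such an inventory might contain; the actual schema used by the operator may differ, and this file should be treated as opaque:

```yaml
# Illustrative sketch of an index.yaml inventory; field names are assumptions.
servers:
  mariadb-repl-0:
    binlogs:
      - name: mariadb-bin.000001
        firstGtid: "0-10-100"
        lastGtid: "0-10-250"
        lastEventTime: "2026-02-27T15:00:00Z"
```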
This file is used internally by the operator to keep track of the archived binary logs, and it is updated after each successful archival. It should not be modified manually, as it can lead to inconsistencies between the actual archived binary logs and the inventory.
When it comes to point-in-time restoration, this file serves as a source of truth to compute the binlog timeline and the last recoverable time.
Binlog timeline and last recoverable time
Taking into account the GTID of the last completed physical backup and the archived binlogs in the inventory, the operator computes a timeline of binary logs that can be replayed, and the corresponding last recoverable time. The last recoverable time is the latest timestamp that the MariaDB instance can be restored to. This information is crucial for understanding the RPO of the system and for making informed decisions during a recovery process.
You can easily check the last recoverable time by looking at the status of the PointInTimeRecovery object:
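For example, the status might look similar to the following (the exact status field names are illustrative; the timestamp matches the examples later in this page):

```yaml
status:
  lastRecoverableTime: "2026-02-27T20:10:42Z"
```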
Then, you may provide exactly this timestamp, or an earlier one, as target recovery time when bootstrapping a new MariaDB instance, as described in the point-in-time restoration section.
Point-in-time restoration
In order to perform a point-in-time restoration, you can create a new MariaDB instance with a reference to the PointInTimeRecovery object in the bootstrapFrom field, along with the targetRecoveryTime field indicating the desired point-in-time to restore to.
For setting the targetRecoveryTime, it is recommended to check the last recoverable time first in the PointInTimeRecovery object:
- `pointInTimeRecoveryRef`: Reference to the `PointInTimeRecovery` object that contains the configuration for the point-in-time recovery.
- `targetRecoveryTime`: The desired point in time to restore to, in RFC3339 format. If not provided, the current time will be used as target recovery time, which means restoring up to the last recoverable time.
- `restoreJob`: Compute resources and metadata configuration for the restoration job. To reduce RTO, it is recommended to properly tune compute resources.
- `logLevel`: Log level for the operator container, part of the restoration job.
The restoration process will match the closest physical backup before or at the targetRecoveryTime, and then it will replay the archived binary logs from the backup GTID position up until the targetRecoveryTime:
As you can see, the restoration process includes the following steps:
1. Perform a rolling restore of the full base backup, one Pod at a time.
2. Configure replication in the MariaDB instance.
3. Get the base backup GTID, to be used as the starting point for replaying the binary logs.
4. Schedule the point-in-time restoration job, which will:
   - Build the binlog timeline based on the base backup GTID and the archived binary log inventory.
   - Pull the binary logs in the timeline into a staging area.
   - Replay the binary logs using mariadb-binlog from the GTID position of the base backup up to the targetRecoveryTime.
After the restoration process has completed, the following status conditions will be available for you to inspect the restoration process:
Strict mode
The strict mode controls whether the target recovery time provided during the bootstrap process should be strictly met or not. This is configured via the strictMode field in the PointInTimeRecovery configuration, and it is disabled by default:
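For example, to enable strict mode (object name illustrative):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PointInTimeRecovery
metadata:
  name: pitr
spec:
  strictMode: true
```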
When strict mode is enabled (recommended), if the target recovery time cannot be met, the initialization process will return an error early, and the MariaDB instance will not be created. This can happen, for example, if the target recovery time is later than the last recoverable time. Let's assume strict mode is enabled and the last recoverable time is:
If we attempt to provision the following MariaDB instance:
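A sketch of such an instance, using the target time referenced below (resource names are illustrative):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-restored
spec:
  bootstrapFrom:
    pointInTimeRecoveryRef:
      name: pitr
    # Later than the last recoverable time, so strict mode will reject it.
    targetRecoveryTime: "2026-02-28T20:10:42Z"
```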
The following errors will be returned, as the target recovery time 2026-02-28T20:10:42Z is later than the last recoverable time 2026-02-27T20:10:42Z:
When strict mode is disabled (the default) and the target recovery time cannot be met, the MariaDB provisioning will proceed and the last recoverable time will be used. This means that the MariaDB instance will be provisioned with a recovery time of 2026-02-27T20:10:42Z, which is the last recoverable time:
After setting strictMode=false, if we attempt to create the same MariaDB instance as before, it will be successfully provisioned, but a recovery time of 2026-02-27T20:10:42Z will be used instead of the requested 2026-02-28T20:10:42Z.
It is important to note that the last recoverable time is stored in the status field of the PointInTimeRecovery object; therefore, if this object is deleted and recreated, the last recoverable time metadata will be lost and will not be available until it is recomputed. When it comes to restoration, this implies that the error will be returned later in the process, when computing the binary log timeline, but the strict mode behavior still applies. This is the error returned for that scenario:
Staging storage
The operator uses a staging area to temporarily store the binary logs during the restoration process. By default, the staging area is an emptyDir volume attached to the restoration job, which means that the binary logs are kept in the storage of the node where the job has been scheduled. This may not be suitable for large binary logs, as it can exhaust the node's storage, causing the restoration process to fail and potentially impacting other workloads running on the same node.
You can configure an alternative staging area using the stagingStorage field under the bootstrapFrom section in the MariaDB resource:
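As a sketch (the volume fields shown are assumptions; check the CRD reference for the exact staging storage schema):

```yaml
spec:
  bootstrapFrom:
    pointInTimeRecoveryRef:
      name: pitr
    stagingStorage:
      # Provision a dedicated PVC instead of the default emptyDir.
      persistentVolumeClaim:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
```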
This will provision a PVC and attach it to the restoration job to be used as staging area.
Limitations
- A `PointInTimeRecovery` object can only be referenced by a single `MariaDB` object via the `pointInTimeRecoveryRef` field.
- A combination of object storage bucket + prefix can only be utilized by a single `MariaDB` instance to archive binary logs.
Troubleshooting
The operator tracks the current archival status under the MariaDB status subresource. This status is updated after each archival cycle, and it contains metadata about the binary logs that have been archived, along with other useful information for troubleshooting:
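An illustrative sketch of what this status might look like (field names are assumptions; inspect your own MariaDB resource for the actual schema):

```yaml
status:
  # Illustrative field names; the actual status schema may differ.
  binlogArchival:
    lastArchivedBinlog: mariadb-bin.000042
    lastArchiveTime: "2026-02-27T16:04:15Z"
```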
Additionally, also under the status subresource, the operator sets status conditions whenever a specific state of the binlog archival or point-in-time restoration process is reached:
The operator also emits Kubernetes events during both archival and restoration process, to either report an outstanding event or error:
Common errors
Unable to start archival process
The following error will be returned if the archival process is configured pointing to a non-empty object storage, as the operator expects to start from a clean state:
To solve this, you can update the PointInTimeRecovery configuration to point to another object storage bucket or prefix that is empty:
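For example, switching to an empty prefix (names are illustrative):

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PointInTimeRecovery
metadata:
  name: pitr
spec:
  storage:
    s3:
      bucket: binlogs
      # An empty prefix, so archival can start from a clean state.
      prefix: mariadb-new
```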
After updating the PointInTimeRecovery configuration, the error will be cleared in the next archival cycle, and a new archival operation will be attempted.
Alternatively, you can also consider deleting the existing binary logs and index.yaml inventory file, only after having double checked that they are not needed for recovery.
Target recovery time is after latest recoverable time
This error is returned in the MariaDB init process, when the targetRecoveryTime provided to bootstrap is later than the last recoverable time reported by the PointInTimeRecovery status.
For example, if you have configured the bootstrapFrom.targetRecoveryTime field with the value 2026-02-28T20:10:42Z, the following error will be returned:
There are two ways to solve this issue:
- Update the `targetRecoveryTime` in the `MariaDB` resource to be earlier than or equal to the last recoverable time, which in this case is `2026-02-27T20:10:42Z`.
- Disable `strictMode` in the `PointInTimeRecovery` configuration, allowing the restoration to proceed up to the latest recoverable time, in this case `2026-02-27T20:10:42Z`.
Invalid binary log timeline: error getting binlog timeline between GTID and target time: timeline did not reach target time
This error is returned when computing the binary log timeline during the restoration process, and it means that the operator could not build a timeline that reaches the targetRecoveryTime provided in the bootstrapFrom field of the MariaDB resource.
For example, if you have the following binary log inventory:
And your targetRecoveryTime is 2026-02-28T20:10:42Z, the following error will be returned:
There are two ways to solve this issue:
- Update the `targetRecoveryTime` in the `MariaDB` resource to be earlier than or equal to the last recoverable time, which in this case is `2026-02-27T16:04:15Z`.
- Disable `strictMode` in the `PointInTimeRecovery` configuration, allowing the restoration to proceed up to the latest recoverable time, in this case `2026-02-27T16:04:15Z`.