MariaDB Enterprise Kubernetes Operator automates provisioning, scaling, backups, and high availability, making cloud-native database operations efficient and reliable.
Customer access to docker.mariadb.com
This documentation aims to provide guidance on how to configure access to docker.mariadb.com in your MariaDB Enterprise Kubernetes Operator resources.
Customer credentials
MariaDB Corporation requires customers to authenticate when logging in to the MariaDB Enterprise Docker Registry. A Customer Download Token must be provided as the password. Customer Download Tokens are available through the MariaDB Customer Portal. To retrieve the Customer Download Token for your account:
Navigate to the MariaDB Customer Portal.
Log in using your MariaDB ID.
Copy the Customer Download Token to use as the password when logging in to the MariaDB Enterprise Docker Registry.
Then, configure a Kubernetes Secret of type kubernetes.io/dockerconfigjson to authenticate:
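Below is a minimal sketch of such a Secret. It assumes the Secret is named mariadb-enterprise, the name referenced elsewhere in this documentation, and the username and token values are placeholders you must replace. Because it uses stringData, the API server handles the encoding and you do not need to base64-encode the Docker config JSON yourself. An equivalent Secret can also be created with kubectl create secret docker-registry.

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Assumed name; reference it from imagePullSecrets in your resources.
  name: mariadb-enterprise
type: kubernetes.io/dockerconfigjson
stringData:
  # Placeholders: replace with your account username and Customer Download Token.
  .dockerconfigjson: |
    {
      "auths": {
        "docker.mariadb.com": {
          "username": "<customer-username>",
          "password": "<customer-download-token>"
        }
      }
    }
```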
OpenShift
If you are running in OpenShift, it is recommended to use the global pull secret to configure the registry credentials. The global pull secret is automatically used by all Pods in the cluster, without having to specify imagePullSecrets explicitly.
To configure the global pull secret, you can use the following commands:
Extract your current global pull secret:
Log in to the MariaDB registry, providing the Customer Download Token as the password:
Update the global pull secret:
Alternatively, you can create a dedicated Secret for authentication:
MariaDB
To configure access to docker.mariadb.com in your MariaDB resources, use the imagePullSecrets field to specify your customer credentials:
As a result, the Pods created as part of the reconciliation process will have these imagePullSecrets configured.
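The following minimal MariaDB sketch references the credentials Secret. It assumes a pull Secret named mariadb-enterprise already exists in the same namespace. The storage size is illustrative, and any field not shown is left to the operator defaults.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  # Pull the MariaDB Enterprise Server image using the customer credentials Secret.
  imagePullSecrets:
    - name: mariadb-enterprise
  storage:
    size: 1Gi
```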
MaxScale
As with MariaDB, you can configure access to docker.mariadb.com in your MaxScale resources:
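A sketch for MaxScale follows. The MariaDB name mariadb and the Secret name mariadb-enterprise are assumptions for illustration.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: maxscale
spec:
  # MariaDB resource that MaxScale will sit in front of (assumed name).
  mariaDbRef:
    name: mariadb
  # Pull the MaxScale image using the customer credentials Secret.
  imagePullSecrets:
    - name: mariadb-enterprise
```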
Backup, Restore and SqlJob
The batch Job resources inherit the imagePullSecrets from the referenced MariaDB, as they also use its image. However, you can also provide dedicated imagePullSecrets for these resources:
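The Backup below is one possible sketch with its own pull Secret. The Secret name backup-registry and the PVC storage settings are illustrative assumptions.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb
  # Dedicated pull Secret for the backup Job, on top of the ones inherited from the MariaDB.
  imagePullSecrets:
    - name: backup-registry
  storage:
    persistentVolumeClaim:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```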
When the resources from the previous examples are created, a Job with both the mariadb-enterprise and backup-registry imagePullSecrets will be reconciled.
Configure multiple backup strategies and perform restoration.
Installation
Installation instructions for MariaDB Enterprise Kubernetes Operator in Kubernetes and OpenShift
Plugins
Learn about the plugins supported by the MariaDB Enterprise Kubernetes Operator and how to configure them.
Topologies
Different topologies supported by the operator.
Migrations
Learn about migrations with MariaDB Enterprise Kubernetes Operator. This section covers strategies and procedures for smoothly migrating your MariaDB databases within Kubernetes environments.
Data Plane
To effectively manage the full lifecycle of both replication and Galera topologies, the operator relies on a set of components that run alongside the MariaDB instances and expose APIs for remote management. These components are collectively referred to as the "data-plane".
Components
The mariadb-enterprise-operator data-plane components are implemented as lightweight containers that run alongside the MariaDB instances within the same Pod. These components are available in the operator image. More precisely, they are subcommands of the CLI shipped as a binary inside the image.
Init container
The init container is responsible for dynamically generating the Pod-specific configuration files before the MariaDB container starts. It also plays a crucial role in the MariaDB container startup, enabling replica recovery for the replication topology and guaranteeing ordered deployment of Pods for the Galera topology.
Agent sidecar
The agent sidecar provides an HTTP API that enables the operator to remotely manage MariaDB instances. Through this API, the operator is able to remotely operate the data directory and handle the instance lifecycle, including operations such as replica recovery for replication and cluster recovery for the Galera topology.
It supports several authentication methods to ensure that only the operator is able to call the agent API.
Agent auth methods
As previously mentioned, the agent exposes an API to remotely manage the replication and Galera clusters. The following authentication methods are supported to ensure that only the operator is able to call the agent:
ServiceAccount based authentication
The operator uses its ServiceAccount token as a means of authentication when communicating with the agent, which then verifies the token by creating a TokenReview. This is the default authentication method, and it is equivalent to setting:
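Below is a sketch for a Galera MariaDB. It assumes the agent settings live under spec.galera.agent. The field path for the replication topology may differ, so check the API reference before relying on it.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  replicas: 3
  galera:
    enabled: true
    agent:
      # Agent validates the operator's ServiceAccount token (assumed field path).
      kubernetesAuth:
        enabled: true
```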
This Kubernetes-native authentication mechanism eliminates the need for the operator to manage credentials, as it relies entirely on Kubernetes for this purpose. However, the drawback is that the agent requires cluster-wide permissions to impersonate the ClusterRole and to create TokenReviews, which are cluster-scoped objects.
Basic authentication
As an alternative, the agent also supports basic authentication:
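A sketch of basic authentication for a Galera MariaDB follows, under the same assumption that the field path is spec.galera.agent. The username and Secret name are hypothetical. The generate flag asks the operator to create the password when the Secret does not exist yet.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  replicas: 3
  galera:
    enabled: true
    agent:
      basicAuth:
        enabled: true
        # Hypothetical username and Secret name; adjust to your environment.
        username: mariadb-operator
        passwordSecretKeyRef:
          name: mariadb-galera-agent-auth
          key: password
          # Let the operator generate the password if the Secret is missing.
          generate: true
```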
Unlike ServiceAccount based authentication, basic authentication requires the operator to explicitly generate credentials. The advantage of this approach is that it is entirely decoupled from Kubernetes and does not require cluster-wide permissions on the Kubernetes API.
Updates
Please refer to the updates documentation for more information about data-plane updates.
This operator allows you to configure standalone MariaDB Enterprise Server instances. To achieve this, you can either omit the replicas field or set it to 1:
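A minimal standalone sketch follows. The name and storage size are illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  # A single replica, or omitting the field entirely, yields a standalone instance.
  replicas: 1
  storage:
    size: 1Gi
```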
The examples catalog contains a number of sample manifests that aim to show the operator functionality in a practical way. Follow these instructions for getting started:
This section outlines a recommended StorageClass configuration for the Azure Blob Storage CSI driver (blob-csi-driver) that resolves common mounting and list operation issues encountered in Kubernetes environments.
The following is recommended when working with Azure Blob Storage (ABS).
Next, reference this StorageClass when defining your PhysicalBackup:
25.08 version update guide
This guide illustrates, step by step, how to update to 25.8.0 from previous versions.
Uninstall your current mariadb-enterprise-operator to prevent conflicts:
Alternatively, you may only downscale the operator and delete its webhook configurations:
Suspend Reconciliation
Suspended state
When a resource is suspended, all operations performed by the operator are disabled, including but not limited to:
Provisioning
Supported Docker Images
The following is a list of images that have plugins installed and available to use.
Even though these images have plugins installed, that doesn't necessarily mean that they are enabled by default. You may need to install them. The recommended operator-native way to do so is to use:
Each supported plugin will have a section on how to install it.
Issue 1: Access for Non-Root Containers (-o allow_other)
The default configuration prevents non-root Kubernetes containers from accessing the mounted blob container, which makes the volume inaccessible to them. Setting the mount option -o allow_other grants non-root containers access to the volume, resolving this issue.
Issue 2: Immediate List Operations and Backup Deletion (--cancel-list-on-mount-seconds=0)
When using the blob-csi-driver with its default settings, list operations (which are critical for cleaning up old backups) may not work immediately upon mount, leading to issues like old physical backups never being deleted. Setting the mountOption --cancel-list-on-mount-seconds to "0" ensures that list operations work as expected immediately after the volume is mounted.
Setting cancel-list-on-mount-seconds to 0 forces the driver to perform an immediate list operation, which may increase both initial mount time and Azure transaction costs (depending on the number of objects in the container). Operators should consider these performance and financial trade-offs and consult the official Azure Blob Storage documentation or an Azure representative for guidance.
Upgrade mariadb-enterprise-operator-crds to 25.8.0:
The Galera data-plane must be updated to the 25.8.0 version.
If you want the operator to automatically update the data-plane (i.e. init and agent containers), you can set updateStrategy.autoUpdateDataPlane=true in your MariaDB resources:
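A sketch of this setting on a Galera MariaDB follows. The resource name is an assumption.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  galera:
    enabled: true
  updateStrategy:
    # Let the operator bump the init and agent containers to its own version.
    autoUpdateDataPlane: true
```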
Alternatively, you can also do this manually:
Upgrade mariadb-enterprise-operator to 25.8.0:
If you previously decided to downscale the operator, make sure you scale it back up:
If you previously set updateStrategy.autoUpdateDataPlane=true, you may consider reverting the changes once the upgrades have finished:
More specifically, the operator's reconciliation loop is skipped, so nothing that is part of it happens while the resource is suspended. This can be useful in maintenance scenarios where manual operations need to be performed, as it helps prevent conflicts with the operator.
Suspend a resource
Currently, only MariaDB and MaxScale resources support suspension. You can enable it by setting suspend=true:
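A minimal sketch of a suspended MariaDB follows. The name is illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  # Disable the reconciliation loop for this resource until set back to false.
  suspend: true
```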
This results in the reconciliation loop being disabled and the status being marked as Suspended:
To re-enable it, simply remove the suspend setting or set it to suspend=false.
In this guide, we will migrate an external MariaDB into a new MariaDB instance running in Kubernetes and managed by MariaDB Enterprise Kubernetes Operator, using logical backups to perform the migration.
If you are currently using or migrating to a Galera instance, use the following command instead:
2. Ensure that your backup file matches the following format: backup.2024-08-26T12:24:34Z.sql. If the file name does not follow this format, it will be ignored by the operator.
3. Upload the backup file to one of the supported backup storage types. We recommend using S3.
4. Create your MariaDB resource, declaring that you want to bootstrap from the backup and providing a root password Secret that matches the backup:
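A sketch of bootstrapping from S3 follows. The bucket, prefix, endpoint and Secret names are hypothetical. targetRecoveryTime is set to the timestamp from the example file name in step 2, on the assumption that the operator picks the backup closest to that time.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  # Must match the root password contained in the backup.
  rootPasswordSecretKeyRef:
    name: mariadb
    key: root-password
  bootstrapFrom:
    s3:
      bucket: backups
      prefix: mariadb
      endpoint: s3.amazonaws.com
      accessKeyIdSecretKeyRef:
        name: s3-credentials
        key: access-key-id
      secretAccessKeySecretKeyRef:
        name: s3-credentials
        key: secret-access-key
    targetRecoveryTime: 2024-08-26T12:24:34Z
  storage:
    size: 1Gi
```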
5. If you are using Galera in your new instance, migrate your previous users and grants to use the User and Grant CRs. Refer to the SQL resources documentation for further details.
MariaDB Enterprise Kubernetes Operator supports managing resources in external MariaDB instances, i.e. instances running outside of the Kubernetes cluster where the operator runs. This feature allows you to manage users, privileges and databases, run SQL jobs declaratively, and take backups using the same CRs that you use to manage internal MariaDB instances.
ExternalMariaDB configuration
The ExternalMariaDB resource is similar to the internal MariaDB resource, but we need to provide a host, a username, and a Secret containing the user password.
Storage
This operator gives you flexibility to define the storage that will back the /var/lib/mysql data directory mounted by MariaDB.
Configuration
The simplest way to configure storage for your MariaDB is:
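A minimal sketch follows. The size is illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  storage:
    # Size of the PVC backing /var/lib/mysql; the default StorageClass is used.
    size: 1Gi
```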
Enabling TLS in existing instances
In this guide, we will be migrating existing MariaDB Galera and MaxScale instances to TLS without downtime.
1. Ensure that MariaDB has TLS enabled and not enforced. Set the following options if needed:
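A sketch of these options on a Galera MariaDB follows. The resource name is an assumption.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  tls:
    # Issue and configure certificates for MariaDB.
    enabled: true
    # Accept both TLS and non-TLS connections during the migration.
    required: false
```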
By setting these options, the operator will issue and configure certificates for MariaDB, but TLS will not be enforced in the connections, i.e. both TLS and non-TLS connections will be accepted. TLS enforcement can optionally be configured at the end of the migration process.
This will trigger a rolling upgrade; make sure it finishes successfully before proceeding to the next step. Refer to the updates documentation for further information about update strategies.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: blob-fuse
provisioner: blob.csi.azure.com
parameters:
  protocol: fuse2
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
  # Resolves the issue where non-root containers cannot access the mounted blob container.
  - -o allow_other
  # Ensures list operations (critical for backups/deletion) work immediately upon mount.
  - --cancel-list-on-mount-seconds=0
```
```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  # ...
  storage:
    persistentVolumeClaim:
      # Specify your own class
      storageClassName: blob-fuse
```
These will be the connection details that the operator uses to connect to the external MariaDB in order to manage resources. Make sure that the specified user has enough privileges:
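Below is a sketch of an ExternalMariaDB. The host, username and Secret names are hypothetical. The connection block is the optional connection template, which makes the operator create a Connection for this instance.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: ExternalMariaDB
metadata:
  name: external-mariadb
spec:
  # Hypothetical address of the MariaDB running outside the cluster.
  host: mariadb.example.com
  port: 3306
  username: root
  # Secret holding the password of the user above.
  passwordSecretKeyRef:
    name: external-mariadb
    key: password
  # Optional connection template.
  connection:
    secretName: external-mariadb-conn
    healthCheck:
      interval: 5s
```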
If you need to use TLS to connect to the external MariaDB, you can provide the server CA certificate and the client certificate Secrets via the tls field:
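A sketch of the tls field follows. The Secret names are hypothetical, and the field names follow the description in the text. Check the API reference to confirm the exact shape in your operator version.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: ExternalMariaDB
metadata:
  name: external-mariadb
spec:
  host: mariadb.example.com
  port: 3306
  username: root
  passwordSecretKeyRef:
    name: external-mariadb
    key: password
  tls:
    enabled: true
    # CA used to verify the external server certificate.
    serverCASecretRef:
      name: external-mariadb-ca
    # Client certificate presented during the TLS handshake.
    clientCertSecretRef:
      name: external-mariadb-client-cert
```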
When using TLS, if you don't want to send the client certificate during the TLS handshake, please set tls.mutual=false:
As a result, you will be able to specify the ExternalMariaDB as a reference in multiple objects, the same way you would for an internal MariaDB resource.
As part of the ExternalMariaDB reconciliation, a Connection will be created whenever the connection template is specified. This is useful for tracking the external connection status and for declaratively creating a connection string in a Secret that applications can consume to connect to the external MariaDB.
Supported objects
Currently, the ExternalMariaDB resource is supported by the following objects:
Connection
User
Grant
Database
Backup
SqlJob
You can use it as an internal MariaDB resource, just by setting kind to ExternalMariaDB in the mariaDBRef field:
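A sketch of a User that targets the external instance follows. The resource and Secret names are illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: User
metadata:
  name: user-external
spec:
  mariaDbRef:
    name: external-mariadb
    # Point to the ExternalMariaDB instead of an internal MariaDB.
    kind: ExternalMariaDB
  passwordSecretKeyRef:
    name: user-external
    key: password
```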
When the previous example gets reconciled, a user will be created in the referenced external MariaDB instance.
This will make use of the default StorageClass available in your cluster, but you can also provide a different one:
Behind the scenes, the operator configures the StatefulSet's volumeClaimTemplate property, which you can also provide yourself:
Volume resize
The StorageClass used for volume resizing must define allowVolumeExpansion = true.
It is possible to resize your storage after having provisioned a MariaDB. We need to distinguish between:
PVCs already in use.
StatefulSet storage size, which will be used when provisioning new replicas.
It is important to note that, for the first case, your StorageClass must support volume expansion by declaring allowVolumeExpansion = true. In that case, it is safe to expand the storage by increasing the size and setting resizeInUseVolumes = true:
Depending on your storage provider, this operation might take a while. You can make the MariaDB wait for the resize to complete before becoming ready by setting waitForVolumeResize = true. Operations such as Galera cluster recovery will not be performed while the MariaDB resource is not ready.
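A sketch of a resize follows. It assumes the volume was originally provisioned with a smaller size and that the StorageClass allows expansion.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  storage:
    # Increased from the originally provisioned size.
    size: 2Gi
    # Expand PVCs that are already in use.
    resizeInUseVolumes: true
    # Keep the MariaDB not ready until the resize finishes.
    waitForVolumeResize: true
```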
Ephemeral storage
Provisioning standalone MariaDB instances with ephemeral storage can be done by setting ephemeral = true:
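A minimal sketch follows. Data does not survive Pod restarts with this setting.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-ephemeral
spec:
  storage:
    # Back the data directory with ephemeral storage instead of a PVC.
    ephemeral: true
```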
This may be useful for multiple use cases, like provisioning ephemeral MariaDBs for the integration tests of your CI.
2. If you are currently using MaxScale, it is important to note that, unlike MariaDB, it does not support TLS and non-TLS connections simultaneously (see limitations). For this reason, you must temporarily point your applications to MariaDB during the migration process. You can achieve this by configuring your application to use the MariaDB Service. At the end of the MariaDB migration process, the MaxScale instance will need to be recreated in order to use TLS, and then you will be able to point your application back to MaxScale. Ensure that all applications are pointing to MariaDB before moving on to the next step.
3. MariaDB is now accepting TLS connections. The next step is migrating your applications to use TLS by pointing them to MariaDB securely. Ensure that all applications are connecting to MariaDB via TLS before proceeding to the next step.
4. If you are currently using MaxScale and you plan to connect via TLS through it, you should now delete your MaxScale instance. If needed, keep a copy of the MaxScale manifest, as you will need to recreate it with TLS enabled in a later step:
It is very important that you wait until your old MaxScale instance is fully terminated to make sure that the old configuration is cleaned up by the operator.
5. For enhanced security, it is recommended to enforce TLS in all MariaDB connections by setting the following options. This will trigger a rolling upgrade; make sure it finishes successfully before proceeding to the next step:
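A sketch of enforced TLS on a Galera MariaDB follows. The resource name is an assumption.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  tls:
    enabled: true
    # Reject non-TLS connections from now on.
    required: true
```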
6. For improved security, you can optionally configure TLS for Galera SSTs by following the steps below:
Run the migration script. Make sure you replace <mariadb-name> with the name of the MariaDB resource:
Set the following option to enable TLS for Galera SSTs:
This will trigger a rolling upgrade; make sure it finishes successfully before proceeding to the next step.
7. As mentioned in step 4, recreate your MaxScale instance with tls.enabled=true if needed:
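A sketch of the recreated MaxScale follows. It assumes the MariaDB is named mariadb-galera. Merge the tls block into the manifest you kept in step 4.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: maxscale
spec:
  mariaDbRef:
    name: mariadb-galera
  # Recreate MaxScale with TLS enabled.
  tls:
    enabled: true
```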
8. MaxScale is now accepting TLS connections. Next, you need to migrate your applications to use TLS by pointing them back to MaxScale securely. You did this previously for MariaDB; you just need to update your application configuration to use the MaxScale Service and its CA bundle.
This documentation provides guidance on installing the MariaDB Enterprise Kubernetes Operator in OpenShift. This operator has been certified by Red Hat and is available in the OpenShift console.
Operators are deployed into OpenShift with the Operator Lifecycle Manager (OLM), which facilitates the installation, updates, and overall management of their lifecycle.
The recommended way to configure credentials is to use the global pull secret provided by OpenShift, as described in the customer access documentation. Alternatively, the operator bundle has a mariadb-enterprise imagePullSecret configured by default. This means that you can configure a Secret named mariadb-enterprise in the same namespace where the operator will be installed in order to pull images from the MariaDB Enterprise registry.
PackageManifest
You can install the certified operator in OpenShift clusters that have the mariadb-enterprise-operator PackageManifest available. To check this, run the following command:
SecurityContextConstraints
Both the operator and the operand Pods run with the restricted-v2 SecurityContextConstraint, the most restrictive SCC in OpenShift in terms of container permissions. This implies that OpenShift automatically assigns a SecurityContext for the Pods with minimum permissions, for example:
OpenShift does not assign SecurityContexts in the default and kube-system namespaces. Please refrain from deploying operands on them, as it will result in permission errors when trying to write to the filesystem.
You can read more about SecurityContextConstraints in the OpenShift documentation.
Installation in all namespaces
To install the operator watching resources on all namespaces, you need to create a Subscription object for mariadb-enterprise-operator using the stable channel in the openshift-operators namespace:
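A sketch of such a Subscription follows. It assumes the certified catalog source is named certified-operators in the openshift-marketplace namespace, which is the usual OpenShift default. Verify the catalog source in your cluster before applying.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: mariadb-enterprise-operator
  namespace: openshift-operators
spec:
  channel: stable
  # Automatic lets OLM upgrade the operator within the channel.
  installPlanApproval: Automatic
  name: mariadb-enterprise-operator
  source: certified-operators
  sourceNamespace: openshift-marketplace
```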
This will use the global-operators OperatorGroup that is created by default in the openshift-operators namespace. This OperatorGroup will watch all namespaces in the cluster, and the operator will be able to manage resources across all namespaces.
You can read more about OperatorGroups in the Operator Lifecycle Manager documentation.
Installation in specific namespaces
In order to define which namespaces the operator will be watching, you need to create an OperatorGroup in the namespace where the operator will be installed:
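A sketch of such an OperatorGroup follows. The namespace names are hypothetical.

```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: mariadb-enterprise-operator
  # Namespace where the operator will be installed (hypothetical).
  namespace: databases
spec:
  # Namespaces the operator will watch (hypothetical).
  targetNamespaces:
    - databases
    - team-a
```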
This OperatorGroup will watch the namespaces defined in the targetNamespaces field. The operator will be able to manage resources only in these namespaces.
Then, the operator can be installed by creating a Subscription object in the same namespace as the OperatorGroup:
Release channels
We maintain support across a variety of OpenShift channels to ensure compatibility with different release schedules and stability requirements. Below, you will find an overview of the specific OpenShift channels we support.
Channel
Supported OpenShift Versions
Description
An example Subscription would look like this:
Updates
Updates are fully managed by OLM and controlled by the installPlanApproval field in the Subscription object. The default value is Automatic, which means that OLM will automatically update the operator to the latest version available in the channel. If you want to control the updates, you can set this field to Manual, and OLM will only update the operator when you approve the update.
Uninstalling
The first step for uninstalling the operator is to delete the Subscription object. This will not remove the operator, but it will stop OLM from managing the operator:
After that, you can uninstall the ClusterServiceVersion (CSV) object that was created by OLM. This will remove the operator from the cluster:
OpenShift console
As an alternative to creating Subscription objects via the command line, you can install operators using the OpenShift console. Go to the Operators > OperatorHub section and search for mariadb enterprise:
Select MariaDB Enterprise Kubernetes Operator, click on install, and you will be able to create a Subscription object via the UI.
Once deployed, the operator comes with example resources that can be deployed from the console directly. For instance, to create a MariaDB:
As you can see in the previous screenshot, the form view that the OpenShift console offers is limited, so we recommend using the YAML view:
This guide illustrates, step by step, how to update to 25.10.4 from previous versions. This guide only applies if you are updating from a version prior to 25.10.x; otherwise, you may upgrade directly (see the Helm and OpenShift docs).
The Galera data-plane must be updated to the 25.10.4 version. You must set updateStrategy.autoUpdateDataPlane=true in your MariaDB resources before updating the operator. Then, once the operator is updated, it will also update the data-plane to match its own version:
Once set, you may proceed to update the operator. If you are using Helm:
Upgrade the mariadb-enterprise-operator-crds helm chart to 25.10.4:
Upgrade the mariadb-enterprise-operator helm chart to 25.10.4:
As part of the 25.10 LTS release, we have introduced support for LTS versions. Refer to the Helm documentation for details on sticking to LTS versions.
If you are on OpenShift:
If you are on the stable channel using installPlanApproval=Automatic in your Subscription object, then the operator will be automatically updated. If you use installPlanApproval=Manual, you should have a new InstallPlan which needs to be approved to update the operator:
As part of the 25.10 LTS release, we have introduced new release channels. Consider switching to the stable-v25.10 channel if you want to stay on the 25.10.x version:
Consider reverting updateStrategy.autoUpdateDataPlane back to false in your MariaDB object to avoid unexpected updates:
Version 0.37.1 of the MariaDB Community Operator is installed in the cluster.
MariaDB Community resources will be migrated to their MariaDB Enterprise counterparts. In this case, we will be using version 11.4.4, which is supported in both the Community and Enterprise versions. Check the supported images and migrate to a counterpart Community version first if needed.
MaxScale resources cannot be migrated in a similar way, they need to be recreated. To avoid downtime, temporarily point your applications to MariaDB directly during the migration.
1. Install the Enterprise CRDs as described in the installation documentation.
2. Get the migration script and grant it execute permissions:
3. Migrate MariaDB resources using the migration script. Make sure you replace <mariadb-name> with the name of the MariaDB resource to be migrated and <operator-version> with the version of the Enterprise operator you will be installing:
4. Update the apiVersion of the rest of CRs to enterprise.mariadb.com/v1alpha1.
5. Uninstall the Community operator:
6. If your MariaDB Community had Galera enabled, delete the <mariadb-name> Role, as it references the Community CRDs:
7. Install the Enterprise operator as described in the installation documentation. This will trigger a rolling upgrade; make sure it finishes successfully before proceeding to the next step.
8. Delete the finalizers and uninstall the Community CRDs:
9. Run mariadb-upgrade in all Pods. Make sure you replace <mariadb-name> with the name of the MariaDB resource:
This guide aims to provide a quick way to get started with the MariaDB Enterprise Kubernetes Operator for Kubernetes. It will walk you through the process of deploying a MariaDB Enterprise Cluster and MaxScale via the MariaDB and MaxScale CRs (Custom Resources) respectively.
Before you begin, ensure you meet the following prerequisites:
The first step will be configuring a Secret with the credentials used by the MariaDB CR:
Next, we will deploy a MariaDB Enterprise Cluster (Galera) using the following CR:
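Below is a sketch of such a CR. It assumes the credentials Secret from the previous step is named mariadb and has the keys root-password and password, and that the customer credentials Secret is named mariadb-enterprise. The initial user and database names are illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  rootPasswordSecretKeyRef:
    name: mariadb
    key: root-password
  # Initial user and database created on bootstrap.
  username: mariadb
  passwordSecretKeyRef:
    name: mariadb
    key: password
  database: mariadb
  storage:
    size: 1Gi
  replicas: 3
  galera:
    enabled: true
  imagePullSecrets:
    - name: mariadb-enterprise
```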
Let's break it down:
rootPasswordSecretKeyRef: A reference to a Secret containing the root password.
imagePullSecrets: The name of the Secret containing the customer credentials to pull the MariaDB Enterprise Server image.
After applying the CR, we can observe the MariaDB Pods being created:
Now, let's deploy a MaxScale CR:
Again, let's break it down:
imagePullSecrets: The name of the Secret containing the customer credentials to pull the MaxScale image.
mariaDbRef: A reference to the MariaDB CR that we want to connect to.
After applying the CR, we can observe the MaxScale Pods being created, and that both the MariaDB and MaxScale CRs will become ready eventually:
To conclude, let's connect to the MariaDB Enterprise Cluster through MaxScale using the initial user and database we defined in the MariaDB CR:
You have successfully deployed a MariaDB Enterprise Cluster with MaxScale in Kubernetes using the MariaDB Enterprise Kubernetes Operator!
MariaDB Enterprise Kubernetes Operator enables you to manage SQL resources declaratively through CRs. By SQL resources, we refer to users, grants, and databases that are typically created using SQL statements.
The key advantage of this approach is that, unlike executing SQL statements manually, which is a one-time operation, declaring a SQL resource via a CR ensures that the resource is periodically reconciled by the operator. This provides a guarantee that the resource will be recreated if it gets manually deleted. Additionally, it prevents state drift, as the operator will regularly update the resource according to the CR specification.
User CR
Updates
By leveraging the automation provided by MariaDB Enterprise Kubernetes Operator, you can declaratively manage large fleets of databases using CRs. This also covers day two operations, such as upgrades, which can be risky when rolling out updates to thousands of instances simultaneously.
To mitigate this, and to give you full control over the upgrade process, you can choose between multiple update strategies described in the following sections.
Update strategies
In order to provide you with flexibility for updating MariaDB reliably, this operator supports multiple update strategies:
Metadata
This documentation shows how to configure metadata in the MariaDB Enterprise Kubernetes Operator CRs.
Children object metadata
MariaDB and MaxScale resources allow you to propagate metadata to all the children objects by specifying the inheritMetadata field:
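A sketch on a MariaDB follows. The label and annotation keys and values are illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  # Labels and annotations propagated to the children objects created by the operator.
  inheritMetadata:
    labels:
      app.kubernetes.io/part-of: database
    annotations:
      example.com/owner: platform-team
  storage:
    size: 1Gi
```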
By creating this resource, you are declaring an intent to create a user in the referenced MariaDB instance, just like a CREATE USER statement would:
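A sketch of such a User follows. It assumes a Secret named bob-password with a password key, and a MariaDB named mariadb. The connection limit is an optional, illustrative setting.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: User
metadata:
  name: bob
spec:
  mariaDbRef:
    name: mariadb
  # Plain-text password read from this Secret key.
  passwordSecretKeyRef:
    name: bob-password
    key: password
  maxUserConnections: 20
```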
In the example above, a user named bob, identified by the password available in the bob-password Secret, will be created in the mariadb instance.
Refer to the API reference for more detailed information about every field.
Custom name
By default, the CR name is used to create the user in the database, but you can specify a different one providing the name field under spec:
Grant CR
By creating this resource, you are declaring an intent to grant permissions to a given user in the referenced MariaDB instance, just like a GRANT statement would.
You may provide any set of privileges supported by MariaDB.
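A sketch of such a Grant follows. The privileges, user and scope are illustrative. The comment shows the roughly equivalent SQL statement.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Grant
metadata:
  name: grant-bob
spec:
  mariaDbRef:
    name: mariadb
  # Roughly: GRANT SELECT, INSERT, UPDATE ON *.* TO 'bob'@'%';
  privileges:
    - SELECT
    - INSERT
    - UPDATE
  database: "*"
  table: "*"
  username: bob
  host: "%"
```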
Refer to the API reference for more detailed information about every field.
Database CR
By creating this resource, you are declaring an intent to create a logical database in the referenced MariaDB instance, just like a CREATE DATABASE statement would:
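A sketch of such a Database follows. The name, character set and collation are illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Database
metadata:
  name: app
spec:
  mariaDbRef:
    name: mariadb
  # Roughly: CREATE DATABASE app CHARACTER SET utf8 COLLATE utf8_general_ci;
  characterSet: utf8
  collate: utf8_general_ci
```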
Refer to the API reference for more detailed information about every field.
Custom name
By default, the CR name is used as the name of the database, but you can specify a different one by providing the name field under spec:
Initial User, Grant and Database
If you only need one user to interact with a single logical database, you can use the initial user and database fields of the MariaDB resource to configure it, instead of creating the User, Grant and Database resources separately:
Behind the scenes, the operator will create a User resource with ALL PRIVILEGES in the initial Database.
Authentication plugins
This feature requires the skip-strict-password-validation option to be set.
Passwords can be supplied using the passwordSecretKeyRef field in the User CR. This is a reference to a Secret that contains a password in plain text.
Alternatively, you can use MariaDB authentication plugins to avoid passing passwords in plain text and provide the password in a hashed format instead. This doesn't affect the end user experience, as they will still need to provide the password in plain text to authenticate.
Password hash
Provide the password hashed using the PASSWORD function:
The password hash can be obtained by executing SELECT PASSWORD('<password>'); in an existing MariaDB installation.
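A sketch follows, assuming the hash is stored under a passwordHash key of the bob-password Secret. The hash value is a placeholder for the output of the query above.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: bob-password
stringData:
  # Placeholder: output of SELECT PASSWORD('<password>');
  passwordHash: "<password-hash>"
---
apiVersion: enterprise.mariadb.com/v1alpha1
kind: User
metadata:
  name: bob
spec:
  mariaDbRef:
    name: mariadb
  passwordHashSecretKeyRef:
    name: bob-password
    key: passwordHash
```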
Password plugin
Provide the password hashed using any of the available authentication plugins, for example mysql_native_password:
The plugin name should be available in a Secret referenced by pluginNameSecretKeyRef, and the argument passed to it in pluginArgSecretKeyRef. In most cases, the argument is the hashed password; refer to the MariaDB documentation of the specific plugin for further details.
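A sketch using mysql_native_password follows. The Secret name mariadb-auth and its keys are hypothetical, and the plugin argument is a placeholder for the password hash.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mariadb-auth
stringData:
  pluginName: mysql_native_password
  # Placeholder: the mysql_native_password hash, e.g. from SELECT PASSWORD('<password>');
  pluginArg: "<password-hash>"
---
apiVersion: enterprise.mariadb.com/v1alpha1
kind: User
metadata:
  name: bob
spec:
  mariaDbRef:
    name: mariadb
  passwordPlugin:
    pluginNameSecretKeyRef:
      name: mariadb-auth
      key: pluginName
    pluginArgSecretKeyRef:
      name: mariadb-auth
      key: pluginArg
```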
Configure reconciliation
As we previously mentioned, SQL resources are periodically reconciled by the operator into SQL statements. You are able to configure the reconciliation interval using the following fields:
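A sketch on a User follows. The interval values are illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: User
metadata:
  name: bob
spec:
  mariaDbRef:
    name: mariadb
  passwordSecretKeyRef:
    name: bob-password
    key: password
  # Next reconciliation after a successful SQL statement.
  requeueInterval: 30s
  # Next attempt after a failed SQL statement.
  retryInterval: 5s
```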
If the SQL statement executed by the operator is successful, it will schedule the next reconciliation cycle using the requeueInterval. If the statement encounters an error, the operator will use the retryInterval instead.
Cleanup policy
Whenever you delete a SQL resource, the operator will also delete the associated resource in the database. This is the default behaviour, which can also be set explicitly with cleanupPolicy=Delete:
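A sketch on a Database follows. The names are illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Database
metadata:
  name: app
spec:
  mariaDbRef:
    name: mariadb
  # Drop the database when this CR is deleted (default); Skip keeps it.
  cleanupPolicy: Delete
```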
You can opt out of this cleanup process using cleanupPolicy=Skip. Note that, in this case, the resources will remain in the database.
ReplicasFirstPrimaryLast: Roll out replica Pods one by one, wait for each of them to become ready, and then proceed with the primary Pod.
RollingUpdate: Utilize the rolling update strategy from Kubernetes.
OnDelete: Updates are performed manually by deleting Pods.
Never: Pause updates.
Configuration
The update strategy can be configured in the updateStrategy field of the MariaDB resource:
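A minimal sketch follows. The type shown is the default.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  updateStrategy:
    # One of: ReplicasFirstPrimaryLast (default), RollingUpdate, OnDelete, Never.
    type: ReplicasFirstPrimaryLast
```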
It defaults to ReplicasFirstPrimaryLast if not provided.
Trigger updates
Updates are not limited to changes in the image field of the MariaDB resource: an update will be triggered whenever any field of the Pod template is changed. In practice, this means changing any MariaDB field that maps directly or indirectly to the Pod template, for instance, the CPU and memory resources:
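A sketch of a resources change that would trigger an update follows. The values are illustrative.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  # Changing these values modifies the Pod template and therefore triggers an update.
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi
```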
Once the update is triggered, the operator manages it differently based on the selected update strategy.
ReplicasFirstPrimaryLast
This role-aware update strategy consists of rolling out the replica Pods one by one first, waiting for each of them to become ready (i.e. readiness probe passed), and then proceeding with the primary Pod. This is the default update strategy, as it can potentially meet various reliability requirements and minimize the risks associated with updates:
Write operations won't be affected until all the replica Pods have been rolled out. If something goes wrong in the update, such as an update to an incompatible MariaDB version, this is detected early when the replicas are being rolled out and the update operation will be paused at that point.
Read operations impact is minimized by only rolling one replica Pod at a time.
Waiting for every Pod to be synced minimizes the impact on the clustering protocols and the network.
RollingUpdate
This strategy leverages the rolling update strategy from the StatefulSet resource, which, unlike ReplicasFirstPrimaryLast, does not take into account the role of the Pods (primary or replica). Instead, it rolls out the Pods one by one, from the highest to the lowest StatefulSet index.
You are able to pass extra parameters to this strategy via the rollingUpdate object:
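A minimal sketch, assuming the rollingUpdate object mirrors the Kubernetes StatefulSet rollingUpdate fields such as partition; check the CRD reference for the fields your operator version accepts:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # StatefulSet semantics (assumed): only Pods with an ordinal >= partition are updated.
      # For a 3-replica MariaDB named mariadb, partition: 2 would only roll out mariadb-2.
      partition: 0
```

Raising partition temporarily is a common way to canary an update on the highest-index Pod before rolling out the rest.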
OnDelete
This strategy aims to provide a method to update MariaDB resources manually by allowing the user to restart the Pods individually. This way, the user has full control over the update process and can decide which Pods are rolled out at any given time.
Whenever an update is triggered, the MariaDB will be marked as pending to update:
From this point, you are able to delete the Pods to trigger the update, which will result in the MariaDB being marked as updating:
Once all the Pods have been rolled out, the MariaDB resource will be back to a ready state:
Never
The operator will not perform updates on the StatefulSet whenever this update strategy is configured. This could be useful in multiple scenarios:
Progressive fleet upgrades: If you're managing large fleets of databases, you likely prefer to roll out updates progressively rather than simultaneously across all instances.
Operator upgrades: When upgrading the operator, changes to the StatefulSet or the Pod template may occur from one version to another, which could trigger a rolling update of your MariaDB instances.
Data-plane updates
Highly available topologies rely on data-plane containers that run alongside MariaDB to enable the remote management of the database instances. These containers use the mariadb-enterprise-operator image, which can be automatically updated by the operator based on its image version:
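A minimal sketch of opting in to automatic data-plane updates; the strategy type shown is just an example:

```yaml
spec:
  updateStrategy:
    type: ReplicasFirstPrimaryLast
    # Roll out the mariadb-enterprise-operator data-plane containers
    # when the operator is upgraded to a newer image version.
    autoUpdateDataPlane: true
```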
By default, updateStrategy.autoUpdateDataPlane is false, which means that no automatic upgrades will be performed, but you can opt-in/opt-out from this feature at any point in time by updating this field. For instance, you may want to selectively enable updateStrategy.autoUpdateDataPlane in a subset of your MariaDB instances after the operator has been upgraded to a newer version, and then disable it once the upgrades are completed.
It is important to note that this feature is fully compatible with the Never strategy: no upgrades will happen when updateStrategy.autoUpdateDataPlane=true and updateStrategy.type=Never.
When the inheritMetadata field is set, all the reconciled objects inherit its labels and annotations. For instance, see the Services and Pods:
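A minimal sketch of inheritMetadata on a MariaDB resource. The label and annotation keys are hypothetical examples:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  inheritMetadata:
    labels:
      example.com/team: databases          # hypothetical label
    annotations:
      example.com/owner: dba@example.com   # hypothetical annotation
```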
Pod metadata
You have the ability to provide dedicated metadata for Pods by specifying the podMetadata field in any CR that reconciles a Pod, for instance: MariaDB, MaxScale, Backup, Restore and SqlJobs:
It is important to note that the podMetadata field supersedes the inheritMetadata field, therefore the labels and annotations provided in the former will override the ones in the latter.
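A sketch of the precedence rule described above, using a hypothetical label key set in both fields; on Pods, the podMetadata value is expected to win:

```yaml
spec:
  inheritMetadata:
    labels:
      example.com/tier: database       # inherited by every reconciled object
  podMetadata:
    labels:
      example.com/tier: database-pod   # overrides the inherited value on Pods only
    annotations:
      example.com/scrape: "true"       # hypothetical annotation, Pods only
```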
Service metadata
Dedicated metadata for Services in MariaDB resources can be provided via the service, primaryService and secondaryService fields:
In the case of MaxScale, you can also do this via the kubernetesService field.
Refer to the to know more about the Service fields and MaxScale.
PVC metadata
Both MariaDB and MaxScale allow you to define a volumeClaimTemplate to be used by the underlying StatefulSet. You may also define metadata for it:
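A minimal sketch, assuming volumeClaimTemplate sits under storage and accepts a metadata object next to the usual PersistentVolumeClaim spec fields; the label and annotation are hypothetical:

```yaml
spec:
  storage:
    volumeClaimTemplate:
      metadata:
        labels:
          example.com/backup-tier: gold    # hypothetical label
        annotations:
          example.com/retain: "true"       # hypothetical annotation
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```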
Use cases
Being able to provide metadata allows you to integrate with other CNCF landscape projects:
Metallb
If you run on bare metal and you use Metallb for managing the LoadBalancer objects, you can declare its IPs via annotations:
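A sketch using the metallb.universe.tf/loadBalancerIPs annotation on each Service. It assumes a highly available topology is enabled (its fields are omitted here), since primaryService and secondaryService only apply when there is a primary and replicas; the IP addresses are hypothetical:

```yaml
spec:
  service:
    type: LoadBalancer
    metadata:
      annotations:
        metallb.universe.tf/loadBalancerIPs: 172.18.0.20
  primaryService:
    type: LoadBalancer
    metadata:
      annotations:
        metallb.universe.tf/loadBalancerIPs: 172.18.0.120
  secondaryService:
    type: LoadBalancer
    metadata:
      annotations:
        metallb.universe.tf/loadBalancerIPs: 172.18.0.130
```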
Istio
Istio injects the data-plane container to all Pods, but you might want to opt-out of this feature in some cases:
For instance, you probably don't want to inject the Istio sidecar into Backup Pods, as it will prevent the Jobs from finishing and your backup process will therefore hang.
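A sketch of opting a Backup out of injection via podMetadata with Istio's sidecar.istio.io/inject key. The PVC storage settings are illustrative, and recent Istio releases also accept this key as a label, so check which form your Istio version expects:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb
  podMetadata:
    annotations:
      # Skip the Istio sidecar so the Job can run to completion.
      sidecar.istio.io/inject: "false"
  storage:
    persistentVolumeClaim:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```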
Points to the latest stable version of the operator. This channel may span multiple major versions.
stable-v25.10
4.18, 4.16
v25.10.x is an LTS release. This channel points to the latest patch release of 25.10. Use this if you require version pinning to a stable version of the operator without necessarily looking for newer features.
All the Docker images used by this operator are based on Red Hat UBI and have been certified by Red Hat. The advantages of using UBI based images are:
Immutability: UBI images are built to be secure and stable, reducing the risk of unintended changes or vulnerabilities due to mutable base layers.
Small size: The UBI variants used by this operator are designed to be lightweight, containing only the essential packages. This can lead to smaller container image sizes, resulting in faster build times, reduced storage requirements, and quicker image pulls.
Security and compliance: Regular CVE scanning and vulnerability patching help maintain compliance with industry standards and security best practices.
Enterprise-grade support: UBI images are maintained and supported by Red Hat, ensuring timely security updates and long-term stability.
List of compatible images
MariaDB Enterprise Kubernetes Operator is compatible with the following Docker images:
Component
Image
Supported Tags
CPU Architecture
Refer to the registry documentation to .
Working With Air-Gapped Environments
This section outlines several methods for pulling official MariaDB container images from docker.mariadb.com and making them available in your private container registry. This is often necessary for air-gapped, offline, or secure environments.
Option 1: Direct Pull, Tag, and Push
This method is ideal for a "bastion" or "jump" host that has network access to both the public internet (specifically docker.mariadb.com) and your internal private registry.
Log in to both registries. You will need a MariaDB token for the public registry and your credentials for the private one. Refer to the .
Pull the required image. Pull the official MariaDB Enterprise Kubernetes Operator image from its public registry.
Tag the image for your private registry. Create a new tag for the image that points to your private registry's URL and desired repository path.
Option 2: Using a Proxy or Caching Registry
Many modern container registries can be configured to function as a pull-through cache or proxy for public registries. When an internal client requests an image, your registry pulls it from the public source, stores a local copy, and then serves it. This automates the process after initial setup.
You can use as a pull-through cache (Harbor calls this Replication Rules).
Option 3: Offline Transfer using docker save and docker push
This method is designed for fully air-gapped environments where no single machine has simultaneous access to the internet and the private registry.
On the Internet-Connected Machine
Log in and pull the image.
Save the image to a tar archive. This command packages the image into a single, portable file.
Use a tool like scp or sftp or a USB drive to copy the generated .tar archives from the internet-connected machine to your isolated systems.
On the Machine with Private Registry Access
Load the image from the archive.
Log in to your private registry.
Tag the loaded image. The image loaded from the tar file will retain its original tag. You must re-tag it for your private registry.
Option 4: For OpenShift, you can use OpenShift Disconnected Installation Mirroring
Refer to the
Option 5: Offline Transfer for containerd Environments
This method is for air-gapped environments that use containerd as the container runtime (common in Kubernetes) and do not have the Docker daemon. It uses the ctr command-line tool to import, tag, and push images. ⚙️
1. On the Bastion Host (with Internet)
First, on a machine with internet access, you'll pull the images and export them to portable archive files.
Pull the Container Image. Use the ctr image pull command to download the required image from its public registry.
Note: If your bastion host uses Docker, you can use docker pull instead as we did in Option 3.
Export the Image to an Archive. Next, export the pulled image to a .tar file.
Repeat this process for all the container images you need to transfer.
2. Transfer the Archives
Use a tool like scp or sftp or a USB drive to copy the generated .tar archives from the bastion host to your isolated systems.
3. On the Isolated Host
Finally, on the isolated system, you will import the archives into containerd.
Importing for Kubernetes (Important!) ⚙️: If the images need to be available to Kubernetes, you must import them into the k8s.io namespace by adding the -n=k8s.io flag.
Verify the Image. Check that containerd recognizes the newly imported image.
You can also verify that the Container Runtime Interface (CRI) sees it by running:
Important Note
The examples above use the mariadb-enterprise-operator:25.8.0 image. You must repeat the chosen process for all required container images. A complete list is available
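Once the images are in your private registry, the MariaDB resource can point at them. A sketch in which the registry host, repository path, tag and pull Secret name are all hypothetical placeholders for whatever you pushed:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  # Hypothetical private registry location; use the repository and tag you pushed.
  image: registry.example.internal/mariadb/enterprise-server:11.4.7-4
  imagePullSecrets:
    - name: private-registry   # hypothetical docker-registry Secret for the private registry
```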
# Remove the finalizers from every custom resource in the k8s.mariadb.com API group.
for crd in $(kubectl get crds -o json | jq -r '.items[] | select(.spec.group=="k8s.mariadb.com") | .metadata.name'); do
  kubectl get "$crd" -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)"' | while read -r cr; do
    ns=$(echo "$cr" | cut -d'/' -f1)
    name=$(echo "$cr" | cut -d'/' -f2)
    echo "Removing finalizers from $crd: $name in $ns..."
    kubectl patch "$crd" "$name" -n "$ns" --type merge -p '{"metadata":{"finalizers":[]}}'
  done
done
helm uninstall mariadb-operator-crds
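# Run mariadb-upgrade in every Pod of the instance (replace <mariadb-name> with your MariaDB resource name).
for pod in $(kubectl get pods -l app.kubernetes.io/instance=<mariadb-name> -o jsonpath='{.items[*].metadata.name}'); do
  kubectl exec "$pod" -- sh -c 'mariadb-upgrade -u root -p${MARIADB_ROOT_PASSWORD} -f'
done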
Push the re-tagged image. Push the newly tagged image to your private registry.
Push the image to your private registry.
Use ctr image export to write an image to an archive. The format is ctr image export <output-filename> <image-name>.
Note: To find the exact image name as containerd sees it, run ctr image ls. The Docker equivalent for this step is docker save <image-name> -o <output-filename>.
This documentation aims to provide guidance on various configuration aspects shared across many MariaDB Enterprise Kubernetes Operator CRs.
my.cnf
An inline configuration file can be provisioned in the MariaDB resource via the myCnf field:
In this field, you may provide any system variable or option supported by MariaDB.
Under the hood, the operator automatically creates a ConfigMap with the contents of the myCnf field, which is mounted in the MariaDB instance. Alternatively, you can manage your own configuration in a pre-existing ConfigMap by referencing it via myCnfConfigMapKeyRef. Note that the key in this ConfigMap, i.e. the config file name, must have a .cnf extension in order to be detected by MariaDB:
To ensure your configuration changes take effect, the operator triggers a MariaDB update whenever the myCnf field or the ConfigMap is updated. For the operator to detect changes in a ConfigMap, the ConfigMap must be labeled with enterprise.mariadb.com/watch. Refer to the External resources section for further detail.
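A minimal sketch of an inline myCnf; the option values are illustrative and taken from the LDAP example later in this documentation:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  myCnf: |
    [mariadb]
    bind-address=*
    default_storage_engine=InnoDB
    max_allowed_packet=256M
```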
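A sketch of a self-managed ConfigMap referenced via myCnfConfigMapKeyRef. The ConfigMap name mariadb-config is hypothetical; the key my.cnf satisfies the .cnf requirement, and the watch label lets the operator react to changes:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mariadb-config   # hypothetical name
  labels:
    enterprise.mariadb.com/watch: ""
data:
  my.cnf: |
    [mariadb]
    max_allowed_packet=256M
---
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  myCnfConfigMapKeyRef:
    name: mariadb-config
    key: my.cnf   # must end in .cnf to be picked up by MariaDB
```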
Compute resources
CPU and memory resources can be configured via the resources field in both the MariaDB and MaxScale CRs:
In the case of MariaDB, it is recommended to set the innodb_buffer_pool_size system variable to 70-80% of the available memory. This can be done via the myCnf field:
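A worked sketch of the 70-80% rule for a hypothetical 4Gi memory limit: 0.75 × 4096Mi = 3072Mi, i.e. 3G. The resource figures are illustrative:

```yaml
spec:
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      memory: 4Gi
  myCnf: |
    [mariadb]
    # ~75% of the 4Gi memory limit: 0.75 * 4096Mi = 3072Mi = 3G
    innodb_buffer_pool_size=3G
```

Leaving 20-30% headroom accounts for per-connection buffers and other memory MariaDB allocates outside the buffer pool.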
Timezones
By default, MariaDB does not load timezone data on startup for performance reasons and defaults the timezone to SYSTEM, obtaining the timezone information from the environment where it runs. See the for further information.
You can explicitly configure a timezone in your MariaDB instance by setting the timeZone field:
This setting is immutable and implies loading the timezone data on startup.
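A minimal sketch; the value UTC is only an example, and named zones such as Europe/Madrid work as well once timezone data is loaded:

```yaml
spec:
  # Immutable: choose it when the MariaDB resource is created.
  timeZone: "UTC"
```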
In regards to Backup and SqlJob resources, which get reconciled into CronJobs, you can also define a timeZone associated with their cron expression:
If timeZone is not provided, the local timezone will be used, as described in the .
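A sketch for a scheduled Backup, assuming timeZone sits at the spec level next to the schedule and takes an IANA zone name like the Kubernetes CronJob field it maps to; the cron expression and storage settings are illustrative:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb
  schedule:
    cron: "0 2 * * *"   # every day at 02:00 in the zone below
  timeZone: "Europe/Madrid"
  storage:
    persistentVolumeClaim:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```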
Passwords
Some CRs require passwords provided as Secret references to function properly. For instance, the root password for a MariaDB resource:
By default, fields like rootPasswordSecretKeyRef are optional and defaulted by the operator, resulting in random password generation if not provided:
You may choose to explicitly provide a Secret reference via rootPasswordSecretKeyRef and opt-out from random password generation by either not providing the generate field or setting it to false:
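A minimal sketch of opting out of random generation; the Secret name and key are illustrative and the Secret itself is expected to be created separately:

```yaml
spec:
  rootPasswordSecretKeyRef:
    name: mariadb
    key: root-password
    # Do not generate a random password; wait for the Secret to exist.
    generate: false
```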
This way, we are telling the operator that we are expecting a Secret to be available eventually, enabling the use of GitOps tools to seed the password:
Sealed Secrets: The Secret is reconciled from a SealedSecret, which is decrypted by the sealed-secrets controller.
External Secrets: The Secret is reconciled from an ExternalSecret, which is read by the external-secrets controller from an external secrets source (Vault, AWS Secrets Manager ...).
External resources
Many CRs have references to external resources (e.g. ConfigMap, Secret) that are not managed by the operator.
These external resources should be labeled with enterprise.mariadb.com/watch so that the operator can watch them and perform reconciliations based on their changes. For example, see the my.cnf ConfigMap:
Probes
Kubernetes probes serve as an inversion of control mechanism, enabling the application to communicate its health status to Kubernetes. This allows Kubernetes to take appropriate actions when the application is unhealthy, such as restarting the Pods or no longer sending traffic to them.
Make sure you check the if you are unfamiliar with Kubernetes probes.
Fine-tuning probes for databases running in Kubernetes is critical; you may do so by tweaking the following fields:
There isn't a universally correct default value for these thresholds, so we recommend determining your own based on factors like the compute resources, network, storage, and other aspects of the environment where your MariaDB and MaxScale instances are running.
The Hashicorp Key Management Plugin is used to implement encryption using keys stored in the Hashicorp Vault KMS.
For more information about configuring the plugin as well as different capabilities, please check the . This guide will cover a minimal example for configuring the plugin with the operator.
Configuring TDE in MariaDB Using Hashicorp Key Management Plugin
Transparent Data Encryption (TDE) can be configured in MariaDB leveraging the Hashicorp Key Management Plugin.
Requirements
Running and accessible Vault KMS setup with a valid SSL certificate.
Vault is unsealed and you've logged in to it with vault login $AUTH_TOKEN, where $AUTH_TOKEN is an authentication token given to you by an administrator.
openssl for generating secrets
Steps
Create A New Key-Value Store In Vault. Create a new key-value store and take note of its path. In our example, we will use mariadb.
Add The Necessary Secrets. We will add two secrets with IDs 1 and 2: key 2 will be used for temporary files, while key 1 will be used for everything else. It is not necessary to create both; if only key 1 exists, temporary files will use it as well.
Note: Here you should use the path we chose in the previous step.
(Optional) Create An Authentication Token With Policy. This step can be skipped if you want to use your own token; consult a Vault administrator regarding this. Policies are Vault's way of restricting what you are allowed to do. The following policy, which follows the principle of least privilege, should be used by the token.
After which, we can create a new token with the given policy.
You will see output similar to:
Your new token is: EXAMPLE_TOKEN.
Create A Secret For The Vault Token. Now that you have either created a new token or are using an existing one, you need to create a Secret containing it.
Create a Secret for the Certificate Authority (CA) used to issue the Vault certificate. If you have the certificate locally in a file called ca.crt, you can run:
Create A MariaDB Custom Resource. The final step is creating a new MariaDB instance.
mariadb-vault.yaml
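A minimal sketch of what mariadb-vault.yaml could contain. The Secret names vault-token (key token) and vault-ca (key ca.crt), the mount path and the Vault URL are hypothetical; the URL ends with /v1/mariadb to match the KV store created earlier. It relies on the plugin reading the token from the VAULT_TOKEN environment variable when hashicorp-key-management-token is not set:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  storage:
    size: 1Gi
  env:
    # Token consumed by the plugin when hashicorp-key-management-token is unset.
    - name: VAULT_TOKEN
      valueFrom:
        secretKeyRef:
          name: vault-token   # hypothetical Secret name
          key: token
  volumes:
    - name: vault-ca
      secret:
        secretName: vault-ca  # hypothetical Secret holding ca.crt
  volumeMounts:
    - name: vault-ca
      mountPath: /etc/vault/certs
  myCnf: |
    [mariadb]
    plugin_load_add = hashicorp_key_management
    # Hypothetical Vault address; the path must point at the "mariadb" KV store.
    hashicorp-key-management-vault-url = https://vault.vault.svc.cluster.local:8200/v1/mariadb
    hashicorp-key-management-vault-ca = /etc/vault/certs/ca.crt
    innodb_encrypt_tables = ON
    innodb_encrypt_temporary_tables = ON
    innodb_encrypt_log = ON
    encrypt_tmp_files = ON
    # Background threads that perform (re-)encryption, needed for key rotation.
    innodb_encryption_threads = 4
```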
kubectl apply -f mariadb-vault.yaml
Verify Encryption Works.
You should see something along the lines of:
At this point, you can check the encryption status:
If you create a new database and then a table, the above query should return additional information about them, similar to:
Note: The above query is truncated. In reality, you will see a few more columns.
Day-2 Operations
Rotating Secrets
Put A New Secret In Vault. After logging in to vault, you can run again:
This will start re-encrypting data.
Monitor Re-Encryption.
If you check the encryption status again:
You should see the CURRENT_KEY_VERSION column start getting updated to point to the new key version.
Rotating Token
Make sure to rotate the token before it expires.
Acquire a new token and update the secret.
Restart MariaDB Pods. MariaDB will continue using the old token until the Pods are restarted. You can add the following annotation to the Pods in order to trigger an update, see the updates documentation for further detail:
Known Issues/Limitations
Vault Not Being Accessible Will Result In MariaDB Not Working
As MariaDB uses Vault to fetch its decryption key, if Vault becomes unavailable, MariaDB will not be able to fetch the decryption key and will stop working. The Hashicorp plugin has a configurable cache, which should be enabled and can keep MariaDB working for a few seconds to minutes depending on its configuration. However, the cache is not a reliable safeguard, as it is ephemeral and short-lived.
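A sketch of the cache settings in myCnf. The option names follow the MariaDB Hashicorp Key Management Plugin documentation, but verify them and their defaults for your server version; the timeout value is illustrative:

```yaml
spec:
  myCnf: |
    [mariadb]
    plugin_load_add = hashicorp_key_management
    hashicorp-key-management-caching-enabled = ON
    # How long, in milliseconds, fetched keys are kept in the cache.
    hashicorp-key-management-cache-timeout = 60000
    # Keep serving cached keys if a request to Vault times out.
    hashicorp-key-management-use-cache-on-timeout = ON
```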
Deleting The Decryption Key Will Make Your Data Inaccessible.
It is recommended to back up the decryption key so accidental deletions will not result in issues.
Decryption Key Must Be Hexadecimal
Use the following to generate correct decryption keys.
Rotating The Decryption Key Before A Previous Re-Encryption Has Finished Will Result In Data Corruption.
To check the re-encryption progress, you can run:
Look at the CURRENT_KEY_VERSION column and make sure it is in sync with the latest key version you have in Vault.
MariaDB Enterprise Kubernetes Operator provides the Connection resource to configure connection strings for applications connecting to MariaDB. This resource creates and maintains a Kubernetes Secret containing the credentials and connection details needed by your applications.
Connection CR
A Connection resource declares an intent to create a connection string for applications to connect to a MariaDB instance. When reconciled, it creates a Secret containing the DSN and optionally, individual connection parameters:
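A minimal sketch of a Connection. The username, password Secret and database are illustrative, and secretName is assumed to set the name of the generated Secret:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Connection
metadata:
  name: connection
spec:
  mariaDbRef:
    name: mariadb
  username: mariadb
  passwordSecretKeyRef:
    name: mariadb
    key: password
  database: mariadb
  secretName: connection   # assumed: name of the generated Secret
  # Optional: target a specific Service (see Service selection).
  serviceName: mariadb-primary
```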
The operator creates a Secret named connection containing a DSN and individual fields like username, password, host, port, and database. Applications can mount this Secret to obtain the connection details.
Service selection
By default, the host in the generated Secret points to the Service named after the referenced MariaDB or MaxScale resource (the same as metadata.name). For HA MariaDB, this Service load balances across all pods, so use serviceName to target a specific Service such as <mariadb-name>-primary.
Please refer to the to identify which Services are available.
Credential generation
The operator can automatically generate credentials for users via the GeneratedSecretKeyRef type with the generate: true field. This feature is available in the MariaDB, MaxScale, and User resources.
For example, when creating a MariaDB resource with an initial user:
The operator will automatically generate a random password and store it in a Secret named app-password. You can then reference this Secret in your Connection resource:
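A sketch with an initial user app whose password is generated into the app-password Secret, and a Connection consuming it. The user and database names are illustrative:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  username: app
  passwordSecretKeyRef:
    name: app-password
    key: password
    generate: true   # the operator creates app-password with a random value
  database: app
  storage:
    size: 1Gi
---
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Connection
metadata:
  name: connection
spec:
  mariaDbRef:
    name: mariadb
  username: app
  passwordSecretKeyRef:
    name: app-password
    key: password
  database: app
```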
If you prefer to provide your own password, you can opt-out from random password generation by either not providing the generate field or setting it to false. This enables the use of GitOps tools like or to seed the password.
Secret template
The secretTemplate field allows you to customize the output Secret, allowing you to include individual connection parameters:
The resulting Secret will contain:
dsn: The full connection string
username: The database username
password: The database password
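A sketch of a secretTemplate. The key field for the DSN and the usernameKey/passwordKey fields are assumptions about the template's shape; hostKey, portKey and databaseKey are further assumed keys for the remaining individual parameters:

```yaml
spec:
  secretTemplate:
    key: dsn              # assumed: Secret key holding the full DSN
    usernameKey: username
    passwordKey: password
    hostKey: host         # assumed additional field
    portKey: port         # assumed additional field
    databaseKey: database # assumed additional field
```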
Custom DSN format
You can customize the DSN format using Go templates via the format field:
Available template variables:
{{ .Username }}: The database username
{{ .Password }}: The database password
{{ .Host }}: The database host
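A sketch of a custom format, assuming format lives inside secretTemplate. Only the three variables listed above are used, so the port and database name are hard-coded as illustrative values:

```yaml
spec:
  secretTemplate:
    key: dsn
    # Go template rendered into the dsn key.
    format: "mysql://{{ .Username }}:{{ .Password }}@{{ .Host }}:3306/app"
```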
Refer to the for additional details about the template syntax.
TLS authentication
Connection supports TLS client certificate authentication as an alternative to password authentication:
When using TLS authentication, provide tlsClientCertSecretRef instead of passwordSecretKeyRef. The referenced Secret must be a Kubernetes TLS Secret containing the client certificate and key.
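A minimal sketch; the user app-tls and the Secret app-tls-client-cert (of type kubernetes.io/tls) are hypothetical:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Connection
metadata:
  name: connection-tls
spec:
  mariaDbRef:
    name: mariadb
  username: app-tls                  # hypothetical user authenticating with a client certificate
  tlsClientCertSecretRef:
    name: app-tls-client-cert        # hypothetical kubernetes.io/tls Secret
  database: app
```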
Cross-namespace connections
Connection resources can reference MariaDB instances in different namespaces:
This creates a Connection in the app namespace that references a MariaDB in the mariadb namespace.
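A sketch of that setup, assuming the password Secret lives in the app namespace alongside the Connection; names are illustrative:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Connection
metadata:
  name: connection
  namespace: app
spec:
  mariaDbRef:
    name: mariadb
    namespace: mariadb   # MariaDB running in another namespace
  username: app
  passwordSecretKeyRef:
    name: app-password   # assumed to exist in the app namespace
    key: password
  database: app
```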
MaxScale connections
Connection resources can reference MaxScale instances using maxScaleRef:
When referencing a MaxScale, the operator uses the MaxScale Service and its listener port. The health check will consume connections from the MaxScale connection pool.
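A minimal sketch, assuming a MaxScale resource named maxscale; user and database values are illustrative:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Connection
metadata:
  name: connection-maxscale
spec:
  maxScaleRef:
    name: maxscale
  username: app
  passwordSecretKeyRef:
    name: app-password
    key: password
  database: app
```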
External MariaDB connections
Connection resources can reference ExternalMariaDB instances by specifying kind: ExternalMariaDB in the mariaDbRef:
This is useful for generating connection strings to external MariaDB instances running outside of Kubernetes.
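A minimal sketch, assuming an ExternalMariaDB resource named external-mariadb; the other values are illustrative:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Connection
metadata:
  name: connection-external
spec:
  mariaDbRef:
    name: external-mariadb
    kind: ExternalMariaDB
  username: app
  passwordSecretKeyRef:
    name: app-password
    key: password
  database: app
```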
Health checking
The healthCheck field configures periodic health checks to verify database connectivity:
interval: How often to perform health checks (default: 30s)
retryInterval: How often to retry after a failed health check (default: 3s)
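The defaults above written out explicitly; adjust them to how quickly you want connectivity problems reflected in the Connection status:

```yaml
spec:
  healthCheck:
    interval: 30s
    retryInterval: 3s
```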
The Connection status reflects the health check results, allowing you to monitor connectivity issues through Kubernetes.
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  # Tune your liveness probe accordingly to avoid Pod restarts.
  livenessProbe:
    periodSeconds: 10
    timeoutSeconds: 5
  # Tune your readiness probe accordingly to prevent disruptions in network traffic.
  readinessProbe:
    periodSeconds: 10
    timeoutSeconds: 5
  # Tune your startup probe accordingly to ensure that the SST completes with a large amount of data.
  # failureThreshold × periodSeconds = 30 × 10 = 300s = 5m until the container gets restarted if unhealthy
  startupProbe:
    failureThreshold: 30
    periodSeconds: 10
    timeoutSeconds: 5
If you don't see a command prompt, try pressing enter.
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 95
Server version: 11.4.7-4-MariaDB-enterprise MariaDB Enterprise Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
SELECT * from information_schema.INNODB_TABLESPACES_ENCRYPTION;
The MariaDB pam plugin facilitates user authentication by interfacing with the Pluggable Authentication Modules (PAM) framework, enabling diverse and centralized authentication schemes.
Currently the enterprise operator utilizes this plugin to provide support for:
LDAP based authentication
LDAP
This guide outlines the process of configuring MariaDB to authenticate users against an LDAP or Active Directory service. The integration is achieved by using MariaDB's Pluggable Authentication Module (PAM) plugin, which delegates authentication requests to the underlying Linux PAM framework.
How Does It Work?
To enable LDAP authentication for MariaDB through PAM, several components work in tandem:
PAM (Pluggable Authentication Modules): A framework used by Linux and other UNIX-like systems to consolidate authentication tasks. Applications like MariaDB can use PAM to authenticate users without needing to understand the underlying authentication mechanism. Operations such as system login, screen unlocking, and sudo access commonly use PAM.
nss-pam-ldapd: This is the software package that provides the necessary bridge between PAM and an LDAP server. It includes the core components required for authentication.
pam_ldap.so: A specific PAM module, provided by the nss-pam-ldapd package. This module is the "plug-in" that the PAM framework loads to handle authentication requests destined for an LDAP server.
The nslcd daemon runs as a sidecar container and communicates with the pam_ldap.so module through a shared Unix socket, following the container best practice of keeping a single process per container.
What is needed for LDAP Auth?
nslcd is configured with two files: nslcd.conf, which tells the daemon about the LDAP server, and nsswitch.conf, which determines the sources from which to obtain name-service information.
nslcd can be configured to run as a specific user based on the uid and gid properties specified in the config file. However, that user must have sufficient permissions to read and write /var/run/nslcd, must own both nslcd.conf and nsswitch.conf, and these files must not be too permissive (0600).
Both of these configuration files will be attached later on in the example given.
nslcd.conf
The /etc/nslcd.conf is the configuration file for LDAP nameservice daemon.
In a production environment it is recommended to use LDAPS (LDAP secure), which uses traditional TLS encryption to secure data in transit. To do so, you need to add the following to your nslcd.conf file:
nsswitch.conf
The Name Service Switch (NSS) configuration file is located at /etc/nsswitch.conf. It is used by the GNU C Library and certain other applications to determine the sources from which to obtain name-service information in a range of categories, and in what order. Each category of information is identified by a database name.
Installing The PAM Plugin
The pam plugin is not enabled by default (even though it is installed). To enable it, you should add the following lines to your MariaDB Custom Resource:
See below for a complete example.
Combining It All Together
First, we need to create the ConfigMaps and Secrets that store the nsswitch.conf, the nslcd.conf and the MariaDB PAM module configuration.
Make sure to adapt the nslcd-conf as per your ldap server configuration.
mariadb-nss-config.yaml:
kubectl apply -f mariadb-nss-config.yaml
Now that our configuration is done, we need to create the MariaDB custom resource along with needed configurations.
mariadb.yaml:
kubectl apply -f mariadb.yaml
Finally, we need to create the user in the database, which must have the same name as a user in the LDAP server. In the example below, that is ldap-user. We also create the mariadb-ldap Secret, which holds the name of the plugin we are using as well as the PAM module to load.
mariadb-user.yaml:
kubectl apply -f mariadb-user.yaml
After a few seconds, the user should have been created by the operator. To verify that all is working as expected, modify the <password> field below and run:
You should see something along the lines of:
LDAPS
If you followed the instructions for setting up a basic MariaDB instance with LDAP, you need to fetch the public certificate that your LDAP server is set up with and add it to a Secret called mariadb-ldap-tls.
If you have the certificate locally in a file called tls.crt you can run:
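To make the certificate available at /etc/openldap/certs/tls.crt, as referenced by tls_cacertfile in nslcd.conf, the Secret can be mounted into the nslcd sidecar. A sketch of only the additions to the MariaDB spec from the complete example; append these entries to the existing volumes list and to the nslcd container's volumeMounts:

```yaml
spec:
  volumes:
    - name: ldap-tls
      secret:
        secretName: mariadb-ldap-tls   # contains the tls.crt key
  sidecarContainers:
    - name: nslcd
      volumeMounts:
        - name: ldap-tls
          mountPath: /etc/openldap/certs
```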
Known Issues
Slow Start On KIND
This may be caused by the maximum number of file handles a process can allocate. Some systems set this value very high, which causes the issue. To remedy this, delete your kind cluster and run:
nslcd (Name Service Lookup Daemon): This daemon acts as an intermediary service. The pam_ldap.so module does not communicate directly with the LDAP server. Instead, it forwards authentication requests to the nslcd daemon, which manages the connection and communication with the LDAP directory. This design allows for connection caching and a more robust separation of concerns.
# /etc/nslcd.conf: Configuration file for nslcd(8)
# The user/group nslcd will run as. Note that these should not be LDAP users.
# required to be `mysql`
uid mysql
# required to be `mysql`
gid mysql
# The location of the LDAP server.
uri ldap://openldap-service.default.svc.cluster.local:389
# The search base that will be used for all queries.
base dc=openldap-service,dc=default,dc=svc,dc=cluster,dc=local
# The distinguished name with which to bind to the directory server for lookups.
# This is a service account used by the daemon.
binddn cn=admin,dc=openldap-service,dc=default,dc=svc,dc=cluster,dc=local
bindpw PASSWORD_REPLACE-ME
# Change the protocol to `ldaps`
+uri ldaps://openldap-service.default.svc.cluster.local:636
-uri ldap://openldap-service.default.svc.cluster.local:389
# ...
+# Look at: https://linux.die.net/man/5/ldap.conf then search for TLS_REQCERT
+tls_reqcert demand
+# You will need to mount this certificate (from a secret) later
+tls_cacertfile /etc/openldap/certs/tls.crt
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: mariadb-nslcd-secret
stringData:
  nslcd.conf: |
    # /etc/nslcd.conf: Configuration file for nslcd(8)
    # The user/group nslcd will run as. Note that these should not be LDAP users.
    # required to be `mysql`
    uid mysql
    # required to be `mysql`
    gid mysql
    # The location of the LDAP server.
    uri ldap://openldap-service.default.svc.cluster.local:389
    # The search base that will be used for all queries.
    base dc=openldap-service,dc=default,dc=svc,dc=cluster,dc=local
    # The distinguished name with which to bind to the directory server for lookups.
    # This is a service account used by the daemon.
    binddn cn=admin,dc=openldap-service,dc=default,dc=svc,dc=cluster,dc=local
    bindpw PASSWORD_REPLACE-ME
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: mariadb-nsswitch-configmap
  labels:
    enterprise.mariadb.com/watch: ""
data:
  nsswitch.conf: |
    passwd: files ldap
    group: files ldap
    shadow: files ldap
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: mariadb-pam-configmap
  labels:
    enterprise.mariadb.com/watch: ""
data:
  mariadb: |
    # This is needed to tell PAM to use pam_ldap.so
    auth required pam_ldap.so
    account required pam_ldap.so
---
apiVersion: v1
kind: Secret
metadata:
  name: mariadb # Used to hold the mariadb and root user passwords
  labels:
    enterprise.mariadb.com/watch: ""
stringData:
  password: MariaDB11!
  root-password: MariaDB11!
---
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  rootPasswordSecretKeyRef:
    name: mariadb
    key: root-password
  username: mariadb
  passwordSecretKeyRef:
    name: mariadb
    key: password
    generate: true
  database: mariadb
  port: 3306
  storage:
    size: 1Gi
  service:
    type: LoadBalancer
    metadata:
      annotations:
        metallb.universe.tf/loadBalancerIPs: 172.18.0.20
  myCnf: |
    [mariadb]
    bind-address=*
    default_storage_engine=InnoDB
    binlog_format=row
    innodb_autoinc_lock_mode=2
    innodb_buffer_pool_size=800M
    max_allowed_packet=256M
    plugin_load_add = auth_pam # Load auth plugin
  resources:
    requests:
      cpu: 1
      memory: 128Mi
    limits:
      memory: 1Gi
  metrics:
    enabled: true
  volumes: # Attach `nslcd.conf`, `nsswitch.conf` and `mariadb` (pam). Also add an emptyDir volume for `nslcd` socket
    - name: nslcd
      secret:
        secretName: mariadb-nslcd-secret
        defaultMode: 0600
    - name: nsswitch
      configMap:
        name: mariadb-nsswitch-configmap
        defaultMode: 0600
    - name: mariadb-pam
      configMap:
        name: mariadb-pam-configmap
        defaultMode: 0600
    - name: nslcd-run
      emptyDir: {}
  sidecarContainers:
    # The `nslcd` daemon is run as a sidecar container
    - name: nslcd
      image: docker.mariadb.com/nslcd:0.9.10-13
      volumeMounts:
        - name: nslcd
          mountPath: /etc/nslcd.conf
          subPath: nslcd.conf
        - name: nsswitch
          mountPath: /etc/nsswitch.conf
          subPath: nsswitch.conf
        # nslcd-run is missing because volumeMounts from main container are shared with sidecar
  volumeMounts:
    - name: mariadb-pam
      mountPath: /etc/pam.d/mariadb
      subPath: mariadb
    - name: nslcd-run
      mountPath: /var/run/nslcd
---
apiVersion: v1
kind: Secret
metadata:
  name: mariadb-ldap
stringData:
  plugin: pam # name of the plugin, must be `pam`
  pamModule: mariadb # This is the name of the pam config file placed in `/etc/pam.d/`
---
apiVersion: enterprise.mariadb.com/v1alpha1
kind: User
metadata:
  name: ldap-user # This user must exist already in your ldap server.
spec:
  mariaDbRef:
    name: mariadb
  host: "%" # Don't specify the ldap host here. Keep this as is
  passwordPlugin:
    pluginNameSecretKeyRef:
      name: mariadb-ldap
      key: plugin
    pluginArgSecretKeyRef:
      name: mariadb-ldap
      key: pamModule
  cleanupPolicy: Delete
  requeueInterval: 10h
  retryInterval: 30s
If you don't see a command prompt, try pressing enter.
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 95
Server version: 11.4.7-4-MariaDB-enterprise MariaDB Enterprise Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
A physical backup is a snapshot of the entire data directory (/var/lib/mysql), including all data files. This type of backup captures the exact state of the database at a specific point in time, allowing for quick restoration in case of data loss or corruption.
Physical backups are the recommended method for backing up MariaDB databases, especially in production environments, as they are faster and more efficient than logical backups.
Backup strategies
Multiple strategies are available for performing physical backups, including:
mariadb-backup: Taken using the enterprise version of the mariadb-backup utility, which is available in the MariaDB enterprise images. The operator supports scheduling Jobs to perform backups using this utility.
Kubernetes VolumeSnapshot: Use Kubernetes VolumeSnapshots to create snapshots of the persistent volumes used by the MariaDB Pods. This method relies on a compatible CSI (Container Storage Interface) driver that supports volume snapshots. See the VolumeSnapshots section for more details.
In order to use VolumeSnapshots, you will need to provide a VolumeSnapshotClass that is compatible with your storage provider. The operator will use this class to create snapshots of the persistent volumes:
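The following is a minimal sketch of a PhysicalBackup that uses VolumeSnapshots. The resource name and the csi-hostpath-snapclass class name are hypothetical placeholders. Replace them with a VolumeSnapshotClass that your CSI driver actually provides.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup-snapshot # hypothetical name
spec:
  mariaDbRef:
    name: mariadb
  storage:
    volumeSnapshot:
      # Hypothetical class name: use one available in your cluster
      volumeSnapshotClassName: csi-hostpath-snapclass
```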
For the rest of the compatible storage types, the mariadb-backup CLI will be used to perform the backup. For instance, to use S3 as backup storage:
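A sketch of a PhysicalBackup storing mariadb-backup files in an S3 compatible bucket. The bucket, endpoint and Secret names (minio, minio-ca) are hypothetical. The credentials and CA certificate are read from Secret keys rather than inlined.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  mariaDbRef:
    name: mariadb
  storage:
    s3:
      bucket: physicalbackups # hypothetical bucket
      prefix: mariadb
      endpoint: minio.minio.svc.cluster.local:9000 # hypothetical endpoint
      region: us-east-1
      accessKeyIdSecretKeyRef:
        name: minio # hypothetical Secret
        key: access-key-id
      secretAccessKeySecretKeyRef:
        name: minio
        key: secret-access-key
      tls:
        enabled: true
        caSecretKeyRef:
          name: minio-ca # hypothetical Secret holding the CA certificate
          key: ca.crt
```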
Storage types
Multiple storage types are supported for storing physical backups, including:
S3 compatible storage: Store backups in an S3 compatible storage, such as AWS S3 or Minio.
Persistent Volume Claims (PVC): Use any of the StorageClasses available in your Kubernetes cluster to create a PersistentVolumeClaim (PVC) for storing backups.
Kubernetes Volumes: Store backups in any of the volume types supported natively by Kubernetes, such as NFS.
Scheduling
Physical backups can be scheduled using the spec.schedule field in the PhysicalBackup resource. The schedule is defined using a cron expression and allows you to specify how often backups should be taken:
If you want to immediately trigger a backup after creating the PhysicalBackup resource, you can set the immediate field to true. This will create a backup immediately, regardless of the schedule.
If you want to suspend the schedule, you can set the suspend field to true. This will prevent any new backups from being created until the PhysicalBackup is resumed.
Note that, by default, backups will only be scheduled if the referred MariaDB resource is in a ready state. You can override this behavior by setting mariaDbRef.waitForIt=false, which allows backups to be scheduled even if the MariaDB resource is not ready.
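The scheduling fields described above can be combined as shown in this sketch. The daily 02:00 cron expression is an arbitrary example, and storage is omitted for brevity.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  mariaDbRef:
    name: mariadb
    waitForIt: true   # set to false to schedule even if the MariaDB is not ready
  schedule:
    cron: "0 2 * * *" # every day at 02:00
    immediate: true   # also take a backup right after creation
    suspend: false    # set to true to pause new backups
  # storage omitted for brevity
```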
Compression
When using physical backups based on mariadb-backup, you are able to choose the compression algorithm used to compress the backup files. The available options are:
bzip2: Good compression ratio, but slower compression/decompression speed compared to gzip.
gzip: Good compression/decompression speed, but worse compression ratio compared to bzip2.
none: No compression.
To specify the compression algorithm, you can use the compression field in the PhysicalBackup resource:
compression is defaulted to none by the operator.
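A minimal fragment selecting a compression algorithm. The mariaDbRef, schedule and storage fields are omitted for brevity.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  # mariaDbRef, schedule and storage omitted for brevity
  compression: bzip2 # one of: bzip2, gzip, none (default)
```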
Server-Side Encryption with Customer-Provided Keys (SSE-C)
You can enable server-side encryption using your own encryption key (SSE-C) by providing a reference to a Secret containing a 32-byte (256-bit) key encoded in base64:
When using SSE-C, you are responsible for managing and securely storing the encryption key. If you lose the key, you will not be able to decrypt your backups. Ensure you have proper key management procedures in place.
When restoring from SSE-C encrypted backups via bootstrapFrom, the same key must be provided in the S3 configuration.
Retention policy
You can define a retention policy both for backups based on mariadb-backup and for VolumeSnapshots. The retention policy allows you to specify how long backups should be retained before they are automatically deleted. This can be defined via the maxRetention field in the PhysicalBackup resource:
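As an illustration, a 30-day retention expressed as a duration. The 720h value is an arbitrary example.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  # mariaDbRef, schedule and storage omitted for brevity
  maxRetention: 720h # 30 days
```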
When using physical backups based on mariadb-backup, the operator will automatically delete backup files in the specified storage that are older than the retention period. The cleanup process will be performed after each successful backup.
When using VolumeSnapshots, the operator will automatically delete the VolumeSnapshot resources older than the retention period using the Kubernetes API. The cleanup process will be performed after a VolumeSnapshot is successfully created.
Target policy
You can define a target policy both for backups based on mariadb-backup and for VolumeSnapshots. The target policy allows you to specify in which Pod the backup should be taken. This can be defined via the target field in the PhysicalBackup resource:
The following target policies are available:
Replica: The backup will be taken in a ready replica. If no ready replicas are available, the backup will not be scheduled.
PreferReplica: The backup will be taken in a ready replica if available, otherwise it will be taken in the primary Pod.
When using the PreferReplica target policy, you may want to schedule the backups even if the MariaDB resource is not ready. In this case, you can set mariaDbRef.waitForIt=false to allow scheduling the backup even if no replicas are available.
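A sketch combining the PreferReplica target policy with mariaDbRef.waitForIt=false, as described above. The schedule and storage fields are omitted for brevity.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  mariaDbRef:
    name: mariadb
    waitForIt: false      # schedule even if the MariaDB is not ready
  target: PreferReplica   # or Replica
  # schedule and storage omitted for brevity
```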
Restoration
Physical backups can only be restored in brand new MariaDB instances without any existing data. This means that you cannot restore a physical backup into an existing MariaDB instance that already has data.
To perform a restoration, you can specify a PhysicalBackup as restoration source under the spec.bootstrapFrom field in the MariaDB resource:
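A minimal sketch of a new MariaDB bootstrapped from a PhysicalBackup. The new instance name is hypothetical, and fields such as storage and credentials are omitted.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-restored # hypothetical new instance
spec:
  bootstrapFrom:
    backupRef:
      name: physicalbackup
      kind: PhysicalBackup
  # rootPasswordSecretKeyRef, storage, etc. omitted for brevity
```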
This will take into account the backup strategy and storage type used in the PhysicalBackup, and it will perform the restoration accordingly.
As an alternative, you can also provide a reference to an S3 bucket that was previously used to store the physical backup files:
It is important to note that the backupContentType field must be set to Physical when restoring from a physical backup. This ensures that the operator uses the correct restoration method.
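A sketch of bootstrapping directly from an S3 bucket that holds physical backup files. The bucket, endpoint and Secret names are hypothetical. Note backupContentType: Physical.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  bootstrapFrom:
    s3:
      bucket: physicalbackups # hypothetical bucket
      prefix: mariadb
      endpoint: minio.minio.svc.cluster.local:9000 # hypothetical endpoint
      accessKeyIdSecretKeyRef:
        name: minio # hypothetical Secret
        key: access-key-id
      secretAccessKeySecretKeyRef:
        name: minio
        key: secret-access-key
    backupContentType: Physical # required for physical backups
```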
To restore a VolumeSnapshot, you can provide a reference to a specific VolumeSnapshot resource in the spec.bootstrapFrom field:
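A sketch that references an existing VolumeSnapshot. This assumes a volumeSnapshotRef field under spec.bootstrapFrom. The snapshot name is a hypothetical placeholder for one listed in your namespace.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  bootstrapFrom:
    volumeSnapshotRef:
      name: physicalbackup-20250611163352 # hypothetical VolumeSnapshot name
```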
Target recovery time
By default, the operator will match the backup closest to the current time. You can specify a different target recovery time by using the targetRecoveryTime field under spec.bootstrapFrom in the MariaDB resource. This lets you define the exact point in time you want to restore to:
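A sketch setting a target recovery time. This assumes the field lives under spec.bootstrapFrom of the MariaDB being restored, since restoration is driven from there. The timestamp is an arbitrary RFC 3339 example.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  bootstrapFrom:
    backupRef:
      name: physicalbackup
      kind: PhysicalBackup
    targetRecoveryTime: 2025-06-17T08:07:00Z # closest backup to this time is used
```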
Timeout
By default, both backups based on mariadb-backup and VolumeSnapshots will have a timeout of 1 hour. You can change this timeout by using the timeout field in the PhysicalBackup resource:
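A minimal fragment raising the timeout from the 1 hour default. The 2h value is an arbitrary example.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  # mariaDbRef, schedule and storage omitted for brevity
  timeout: 2h
```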
When a backup times out, the operator will delete the Jobs or VolumeSnapshot resources associated with the PhysicalBackup resource. If the PhysicalBackup resource is still scheduled, the operator will create new Jobs or VolumeSnapshots to retry the backup operation.
Log level
When taking backups based on mariadb-backup, you can specify the log level to be used by the mariadb-enterprise-operator container using the logLevel field in the PhysicalBackup resource:
Extra options
When taking backups based on mariadb-backup, you can specify extra options to be passed to the mariadb-backup command using the args field in the PhysicalBackup resource:
Refer to the mariadb-backup documentation for a list of available options.
S3 credentials
Credentials for accessing an S3 compatible storage can be provided via the s3 key in the storage field of the PhysicalBackup resource. The credentials can be provided as a reference to a Kubernetes Secret:
Alternatively, if you are running in EKS, you can use dynamic credentials from an EKS Service Account using EKS Pod Identity or IRSA:
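A sketch using EKS dynamic credentials. It assumes a hypothetical ServiceAccount named mariadb-backup, already bound to an IAM role through EKS Pod Identity or IRSA, and a hypothetical bucket name.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  mariaDbRef:
    name: mariadb
  serviceAccountName: mariadb-backup # hypothetical, bound to an IAM role
  storage:
    s3:
      bucket: physicalbackups # hypothetical bucket
      prefix: mariadb
      endpoint: s3.amazonaws.com
      region: us-east-1
      tls:
        enabled: true
      # no accessKeyIdSecretKeyRef/secretAccessKeySecretKeyRef: EKS provides the credentials
```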
By leaving out the accessKeyIdSecretKeyRef and secretAccessKeySecretKeyRef credentials and pointing to the correct serviceAccountName, the backup Job will use the dynamic credentials from EKS.
Staging area
S3 backups based on mariadb-backup are the only scenario that requires a staging area.
When using S3 storage for backups, a staging area is used for keeping the external backups while they are being processed. By default, this staging area is an emptyDir volume, which means that the backups are temporarily stored in the local storage of the node where the PhysicalBackup Job is scheduled. In production environments, large backups may lead to issues if the node doesn't have sufficient space, potentially causing the backup/restore process to fail.
Additionally, when restoring these backups, the operator will pull the backup files from S3, uncompress them if needed, and restore them to each of the MariaDB Pods in the cluster individually. To save network bandwidth and compute resources, a staging area is used to keep the uncompressed backup files after they have been restored to the first MariaDB Pod. This allows the operator to restore the same backup to the rest of the MariaDB Pods seamlessly, without needing to pull and uncompress the backup again.
To configure the staging area, you can use the stagingStorage field in the PhysicalBackup resource:
Similarly, you may also use a staging area when bootstrapping a new instance from a backup, in the MariaDB resource:
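A sketch of a staging area for both the PhysicalBackup and the MariaDB bootstrap. No storageClassName is set, so the default StorageClass is assumed. The 10Gi size is a placeholder that should exceed your uncompressed backup size.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  # mariaDbRef and storage.s3 omitted for brevity
  stagingStorage:
    persistentVolumeClaim:
      resources:
        requests:
          storage: 10Gi # placeholder size
      accessModes:
        - ReadWriteOnce
---
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  bootstrapFrom:
    # s3 source and backupContentType: Physical omitted for brevity
    stagingStorage:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 10Gi # placeholder size
        accessModes:
          - ReadWriteOnce
```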
In the examples above, a PVC with the default StorageClass will be provisioned to be used as staging area.
VolumeSnapshots
Before using this feature, ensure that you meet the following prerequisites:
and its CRs are installed in the cluster.
The operator is capable of creating VolumeSnapshots of the PVCs used by the MariaDB Pods. This allows you to create point-in-time snapshots of your data in a Kubernetes-native way, leveraging the capabilities of your storage provider.
Most of the fields described in this documentation apply to VolumeSnapshots, including scheduling, retention policy, and compression. The main difference with the mariadb-backup based backups is that the operator will not create a Job to perform the backup, but instead it will create a VolumeSnapshot resource directly.
In order to create consistent, point-in-time snapshots of the MariaDB data, the operator will perform the following steps:
Execute a BACKUP STAGE START statement followed by BACKUP STAGE BLOCK_COMMIT in one of the secondary Pods.
Create a VolumeSnapshot resource of the data PVC mounted by the MariaDB secondary Pod.
This backup process is described in the MariaDB documentation and is designed to be non-blocking.
Non-blocking physical backups
For both the mariadb-backup and VolumeSnapshot strategies, the enterprise operator performs non-blocking physical backups by leveraging the BACKUP STAGE commands. This means that backups are taken without long-held read locks, enabling consistent, production-grade backups with minimal impact on running workloads, which is ideal for high-availability and performance-sensitive environments.
Important considerations and limitations
Root credentials
When restoring a backup, the root credentials specified through the spec.rootPasswordSecretKeyRef field in the MariaDB resource must match the ones in the backup. These credentials are used by the liveness and readiness probes, and if they are invalid, the probes will fail, causing your MariaDB Pods to restart after the backup restoration.
Restore Job
When using backups based on mariadb-backup, restoring and uncompressing large backups can consume significant compute resources and may cause restoration Jobs to become stuck due to insufficient resources. To prevent this, you can define the compute resources allocated to the Job:
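A sketch sizing the restoration Job. It assumes a restoreJob.resources field under spec.bootstrapFrom. The CPU and memory values are placeholders to be tuned to your backup size.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  bootstrapFrom:
    backupRef:
      name: physicalbackup
      kind: PhysicalBackup
    restoreJob:
      resources:
        requests:
          cpu: 100m      # placeholder
          memory: 128Mi  # placeholder
        limits:
          memory: 1Gi    # placeholder
```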
ReadWriteOncePod access mode partially supported
When using backups based on mariadb-backup, the data PVC used by the MariaDB Pod cannot use the ReadWriteOncePod access mode, as it needs to be mounted at the same time by both the MariaDB Pod and the PhysicalBackup Job. In this case, use either the ReadWriteOnce or ReadWriteMany access mode instead.
Alternatively, if you want to keep using the ReadWriteOncePod access mode, you must use backups based on VolumeSnapshots, which do not require creating a Job to perform the backup and therefore avoid the volume sharing limitation.
PhysicalBackup Jobs scheduling
PhysicalBackup Jobs must mount the data PVC used by one of the secondary MariaDB Pods. To avoid scheduling issues caused by the commonly used ReadWriteOnce access mode, the operator schedules backup Jobs on the same node as MariaDB by default.
If you prefer to disable this behavior and allow Jobs to run on any node, you can set podAffinity=false:
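A minimal fragment disabling the default Pod affinity, for example when the data PVC uses ReadWriteMany.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: physicalbackup
spec:
  # mariaDbRef, schedule and storage omitted for brevity
  podAffinity: false # allow backup Jobs to run on any node
```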
This configuration may be suitable when using the ReadWriteMany access mode, which allows multiple Pods across different nodes to mount the volume simultaneously.
Troubleshooting
Custom columns are used to display the status of the PhysicalBackup resource:
To get a higher level of detail, you can also check the status field directly:
You may also check the related events for the PhysicalBackup resource to see if there are any issues:
In some situations, when using the mariadb-backup strategy, you may encounter the following error in the backup Job logs:
This can be addressed by increasing the innodb_log_file_size in the MariaDB configuration. You can do this by adding the following to your MariaDB resource:
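A sketch raising innodb_log_file_size through myCnf. The 200M value is an arbitrary example and should be sized for your write volume.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  myCnf: |
    [mariadb]
    innodb_log_file_size=200M
```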
Refer to for further details on this issue.
mariadb-backup Job fails to start because the Pod cannot mount MariaDB PVC created with StorageClass provider
Without explicitly enabled the ReadWriteOnce access mode is treated as ReadWriteOncePod.
A logical backup is a backup that contains the logical structure of the database, such as tables, indexes, and data, rather than the physical storage format. It is created using mariadb-dump, which generates SQL statements that can be used to recreate the database schema and populate it with data.
Logical backups serve not just as a source of restoration, but also enable data mobility between MariaDB instances. These backups are called "logical" because they are independent from the MariaDB topology, as they only contain DDLs and INSERT
Synchronous Multi-Master With Galera
MariaDB Enterprise Kubernetes Operator provides cloud native support for provisioning and operating multi-master MariaDB clusters using Galera. This setup allows writes to be performed on a single node and reads on all nodes, enhancing availability and allowing scalability across multiple nodes.
In certain circumstances, all the nodes of your cluster may go down at the same time. Galera is not able to recover from this by itself and requires manual action to bring the cluster up again, as documented in the Galera documentation. The MariaDB Enterprise Kubernetes Operator encapsulates this operational expertise in the MariaDB CR. You just need to declaratively specify spec.galera, as explained in more detail below.
To accomplish this, after the MariaDB cluster has been provisioned, the operator will regularly monitor the cluster's status to make sure it is healthy. If any issues are detected, the operator will initiate the cluster recovery process to restore the cluster to a healthy state. During this process, the operator will set status conditions in the MariaDB and emit Events
You have a compatible CSI driver that supports VolumeSnapshots installed in the cluster.
You have a VolumeSnapshotClass configured for your CSI driver.
Wait until the VolumeSnapshot is provisioned by the storage system. When timing out, the operator will delete the VolumeSnapshot resource and retry the operation.
Although logical backups are a great fit for data mobility and migrations, they are not as efficient as physical backups for large databases. For this reason, physical backups are the recommended method for backing up MariaDB databases, especially in production environments.
Storage types
Currently, the following storage types are supported:
S3 compatible storage: Store backups in an S3 compatible storage, such as AWS S3 or Minio.
PVCs: Use the available StorageClasses in your Kubernetes cluster to provision a PVC dedicated to storing the backup files.
Kubernetes volumes: Use any of the volume types supported natively by Kubernetes.
Our recommendation is to store the backups externally in an S3 compatible storage.
Backup CR
You can take a one-time backup of your MariaDB instance by declaring the following resource:
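A minimal sketch of a one-time logical Backup stored in a PVC provisioned from the default StorageClass. The 100Mi size is a placeholder.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb
  storage:
    persistentVolumeClaim:
      resources:
        requests:
          storage: 100Mi # placeholder size
      accessModes:
        - ReadWriteOnce
```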
This will use the default StorageClass to provision a PVC to hold the backup files, but ideally you should use an S3 compatible storage:
By providing the authentication details and the TLS configuration via references to Secret keys, this example will store the backups in a local Minio instance.
Alternatively you can use dynamic credentials from an EKS Service Account using EKS Pod Identity or IRSA:
By leaving out the accessKeyIdSecretKeyRef and secretAccessKeySecretKeyRef credentials and pointing to the correct serviceAccountName, the backup Job will use the dynamic credentials from EKS.
Scheduling
To minimize the Recovery Point Objective (RPO) and mitigate the risk of data loss, it is recommended to perform backups regularly. You can do so by providing a spec.schedule in your Backup resource:
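A sketch of an hourly schedule for a logical Backup. The cron expression is an arbitrary example, and storage is omitted for brevity.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup-scheduled
spec:
  mariaDbRef:
    name: mariadb
  schedule:
    cron: "0 * * * *" # at minute 0 of every hour
    suspend: false
  # storage omitted for brevity
```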
This resource gets reconciled into a CronJob that periodically takes the backups.
Regularly scheduled Backups work especially well together with the target recovery time feature detailed below.
Retention policy
Given that the backups can consume a substantial amount of storage, it is crucial to define your retention policy by providing the spec.maxRetention field in your Backup resource:
Compression
You are able to compress backups by providing the compression algorithm you want to use in the spec.compression field:
Currently the following compression algorithms are supported:
bzip2: Good compression ratio, but slower compression/decompression speed compared to gzip.
gzip: Good compression/decompression speed, but worse compression ratio compared to bzip2.
none: No compression.
compression is defaulted to none by the operator.
Server-Side Encryption with Customer-Provided Keys (SSE-C)
You can enable server-side encryption using your own encryption key (SSE-C) by providing a reference to a Secret containing a 32-byte (256-bit) key encoded in base64:
When using SSE-C, you are responsible for managing and securely storing the encryption key. If you lose the key, you will not be able to decrypt your backups. Ensure you have proper key management procedures in place.
When restoring from SSE-C encrypted backups, the same key must be provided in the Restore CR or bootstrapFrom configuration.
Restore CR
You can easily restore a Backup in your MariaDB instance by creating the following resource:
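A minimal Restore that references an existing Backup named backup in the same namespace.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore
spec:
  mariaDbRef:
    name: mariadb
  backupRef:
    name: backup
```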
This will trigger a Job that will mount the same storage as the Backup and apply the dump to your MariaDB database.
Nevertheless, the Restore resource doesn't necessarily need to specify a spec.backupRef; you can point to another storage source that contains backup files, for example an S3 bucket:
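A sketch of a Restore that reads backup files directly from S3. The bucket, endpoint and Secret names are hypothetical.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore-s3
spec:
  mariaDbRef:
    name: mariadb
  s3:
    bucket: backups # hypothetical bucket
    prefix: mariadb
    endpoint: minio.minio.svc.cluster.local:9000 # hypothetical endpoint
    accessKeyIdSecretKeyRef:
      name: minio # hypothetical Secret
      key: access-key-id
    secretAccessKeySecretKeyRef:
      name: minio
      key: secret-access-key
```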
Target recovery time
If you have multiple backups available, especially after configuring a scheduled Backup, the operator is able to infer which backup to restore based on the spec.targetRecoveryTime field.
The operator will look for the closest backup available and utilize it to restore your MariaDB instance.
By default, spec.targetRecoveryTime will be set to the current time, which means that the latest available backup will be used.
Bootstrap new MariaDB instances
To minimize your Recovery Time Objective (RTO) and to swiftly spin up new clusters from existing Backups, you can provide a Restore source directly in the MariaDB object via the spec.bootstrapFrom field:
As with the Restore resource, you don't strictly need to specify a reference to a Backup; you can provide other storage types that contain backup files:
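A sketch of a new MariaDB bootstrapped from a logical Backup, picking the backup closest to a given time. The instance name and timestamp are arbitrary examples.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-from-backup # hypothetical name
spec:
  bootstrapFrom:
    backupRef:
      name: backup
    targetRecoveryTime: 2024-08-26T12:24:34Z
  # storage, credentials, etc. omitted for brevity
```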
Under the hood, the operator creates a Restore object just after the MariaDB resource becomes ready. The advantage of using spec.bootstrapFrom over a standalone Restore is that the MariaDB is bootstrap-aware and this will allow the operator to hold primary switchover/failover operations until the restoration is finished.
Backup and restore specific databases
By default, all the logical databases are backed up when a Backup is created, but you may also select specific databases by providing the databases field:
When it comes to restore, all the databases available in the backup will be restored, but you may also choose a single database to be restored via the database field available in the Restore resource:
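A sketch backing up two databases and restoring only one of them. The database names db1 and db2 are placeholders, and db1 must already exist before the Restore runs.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb
  databases:
    - db1
    - db2
  # storage omitted for brevity
---
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore
spec:
  mariaDbRef:
    name: mariadb
  backupRef:
    name: backup
  database: db1 # restored via --one-database
```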
There are a couple of points to consider here:
The referred database (db1 in the example) must previously exist for the Restore to succeed.
The mariadb CLI invoked by the operator under the hood only supports selecting a single database to restore via the --one-database option; restoring multiple specific databases is not supported.
Extra options
Not all the flags supported by mariadb-dump and mariadb have their counterpart field in the Backup and Restore CRs respectively, but you may pass extra options by using the args field. For example, setting the --verbose flag can be helpful to track the progress of backup and restore operations:
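A sketch passing --verbose to both mariadb-dump (Backup) and mariadb (Restore) through the args field. Storage is omitted for brevity.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb
  args:
    - --verbose # passed to mariadb-dump
  # storage omitted for brevity
---
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore
spec:
  mariaDbRef:
    name: mariadb
  backupRef:
    name: backup
  args:
    - --verbose # passed to the mariadb CLI
```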
Refer to the mariadb-dump and mariadb CLI options in the reference section.
Staging area
S3 is the only storage type that supports a staging area.
When using S3 storage for backups, a staging area is used for keeping the external backups while they are being processed. By default, this staging area is an emptyDir volume, which means that the backups are temporarily stored in the local storage of the node where the Backup/Restore Job is scheduled. In production environments, large backups may lead to issues if the node doesn't have sufficient space, potentially causing the backup/restore process to fail.
To overcome this limitation, you can define your own staging area by setting the stagingStorage field in both the Backup and Restore CRs:
In the examples above, a PVC with the default StorageClass will be used as staging area. Refer to the API reference for more configuration options.
When restoring a backup, the root credentials specified through the spec.rootPasswordSecretKeyRef field in the MariaDB resource must match the ones in the backup. These credentials are used by the liveness and readiness probes, and if they are invalid, the probes will fail, causing your MariaDB Pods to restart after the backup restoration.
Restore job
Restoring large backups can consume significant compute resources and may cause Restore Jobs to become stuck due to insufficient resources. To prevent this, you can define the compute resources allocated to the Job:
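A sketch setting compute resources on a Restore. The values are placeholders to be tuned to the size of your dump.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore
spec:
  mariaDbRef:
    name: mariadb
  backupRef:
    name: backup
  resources:
    requests:
      cpu: 100m      # placeholder
      memory: 128Mi  # placeholder
    limits:
      memory: 1Gi    # placeholder
```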
Galera backup limitations
mysql.global_priv
Galera only replicates tables that use the InnoDB engine; see the Galera docs.
This does not include mysql.global_priv, the table used to store users and grants, which uses the MyISAM engine. This means that a Galera instance with mysql.global_priv populated will not replicate this data to an empty Galera instance. However, DDL statements (CREATE USER, ALTER USER ...) will be replicated.
Taking this into account, consider a restore scenario where:
The backup file includes a DROP TABLE statement for the mysql.global_priv table.
The backup has some INSERT statements for the mysql.global_priv table.
The Galera cluster has 3 nodes: galera-0, galera-1 and galera-2.
The backup is restored in galera-0.
This is what happens behind the scenes while restoring the backup:
The DROP TABLE statement is a DDL so it will be executed in galera-0, galera-1 and galera-2.
The INSERT statements are not DDLs, so they will only be applied to galera-0.
As a result, galera-1 and galera-2 end up without the mysql.global_priv table.
After the backup is fully restored, the liveness and readiness probes kick in. They succeed in galera-0, but fail in galera-1 and galera-2, as they rely on the root credentials available in mysql.global_priv, causing galera-1 and galera-2 to be restarted.
To address this issue, when backing up MariaDB instances with Galera enabled, the mysql.global_priv table will be excluded from backups by using the --ignore-table option with mariadb-dump. This prevents the replication of the DROP TABLE statement for the mysql.global_priv table. You can opt out of this feature by setting spec.ignoreGlobalPriv=false in the Backup resource.
Also, to avoid situations where mysql.global_priv is unreplicated, all the entries in that table must be managed via DDLs. This is the recommended approach suggested in the Galera docs. There are a couple of ways that we can guarantee this:
Use the rootPasswordSecretKeyRef, username and passwordSecretKeyRef fields of the MariaDB CR to create the root and initial user respectively. These fields will be translated into DDLs by the image entrypoint.
For this reason, the operator automatically adds the --skip-add-locks option to the Backup to overcome this limitation.
Migrations using logical backups
Migrating an external MariaDB to a MariaDB running in Kubernetes
You can leverage logical backups to bring your external MariaDB data into a new MariaDB instance running in Kubernetes. Follow this runbook for doing so:
Take a logical backup of your external MariaDB using one of the commands below:
If you are using Galera or planning to migrate to a Galera instance, make sure you understand the Galera backup limitations and use the following command instead:
Ensure that your backup file is named in the following format: backup.2024-08-26T12:24:34Z.sql. If the file name does not follow this format, it will be ignored by the operator.
Upload the backup file to one of the supported storage types. We recommend using S3.
If you are using Galera in your new instance, migrate your previous users and grants to use the User and Grant CRs. Refer to the SQL resource documentation for further detail.
Migrating to a MariaDB with different topology
Database mobility between MariaDB instances with different topologies is possible with logical backups. However, there are a couple of technical details that you need to be aware of in the following scenarios:
Migrating between standalone and replicated MariaDBs
This should be fully compatible; no issues have been detected.
Migrating from standalone/replicated to Galera MariaDBs
There are a couple of limitations regarding the backups in Galera, please make sure you read the Galera backup limitations section before proceeding.
To overcome these limitations, the Backup in the standalone/replicated instance needs to be taken with spec.ignoreGlobalPriv=true. In the following example, we are backing up a standalone MariaDB (single instance):
Once the previous Backup is completed, we will be able to bootstrap a new Galera instance from it:
After doing so, ensure that your backup does not contain a DROP TABLE mysql.global_priv; statement, as it will cause your liveness and readiness probes to fail after the backup restoration.
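A sketch of such a Backup taken from a standalone instance. The instance name mariadb-standalone, the bucket and the Secret names are hypothetical.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup-standalone
spec:
  mariaDbRef:
    name: mariadb-standalone # hypothetical standalone instance
  ignoreGlobalPriv: true # exclude mysql.global_priv from the dump
  storage:
    s3:
      bucket: backups # hypothetical bucket
      prefix: mariadb-standalone
      endpoint: minio.minio.svc.cluster.local:9000 # hypothetical endpoint
      accessKeyIdSecretKeyRef:
        name: minio # hypothetical Secret
        key: access-key-id
      secretAccessKeySecretKeyRef:
        name: minio
        key: secret-access-key
```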
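A sketch of a three-node Galera instance bootstrapped from a Backup named backup-standalone (a hypothetical name). Storage and credentials are omitted for brevity.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  replicas: 3
  galera:
    enabled: true
  bootstrapFrom:
    backupRef:
      name: backup-standalone # hypothetical Backup name
  # storage, credentials, etc. omitted for brevity
```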
so you have a better understanding of the recovery progress and the underlying activities being performed. For example, you may want to know which Pods were out of sync to further investigate infrastructure-related issues (e.g. networking, storage...) on the nodes where these Pods were scheduled.
MariaDB configuration
The easiest way to get a MariaDB Galera cluster up and running is setting spec.galera.enabled = true:
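A minimal Galera-enabled MariaDB relying on the operator defaults. The 3 replicas and 1Gi storage size are example values.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  replicas: 3
  galera:
    enabled: true
  storage:
    size: 1Gi
```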
This relies on sensible defaults set by the operator, which may not be suitable for your Kubernetes cluster. This can be solved by overriding the defaults, so you have fine-grained control over the Galera configuration.
Refer to the API reference to better understand the purpose of each field.
Storage
By default, the operator provisions two PVCs for running Galera:
Storage PVC: Used to back the MariaDB data directory, mounted at /var/lib/mysql.
Config PVC: Where the Galera config files are located, mounted at /etc/mysql/conf.d.
However, you are also able to use just one PVC for keeping both the data and the config files:
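A sketch of reusing the storage PVC for the Galera config files. It assumes the setting is exposed as galera.config.reuseStorageVolume. Verify the field name against the API reference for your operator version.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  replicas: 3
  galera:
    enabled: true
    config:
      reuseStorageVolume: true # assumed field: keep config files in the data PVC
```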
Wsrep provider
You are able to pass extra options to the Galera wsrep provider by using the galera.providerOptions field:
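A sketch passing a wsrep provider option. gcs.fc_limit is a standard Galera flow-control option; the value 64 is an arbitrary example.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  galera:
    enabled: true
    providerOptions:
      gcs.fc_limit: '64'
```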
Note that ist.recv_addr cannot be set by the user, as the operator automatically configures it with the Pod IP, which a user cannot know beforehand.
If you have a Kubernetes cluster running with IPv6, the operator will automatically detect the IPv6 addresses of your Pods and it will configure several wsrep provider options to ensure that the Galera protocol runs smoothly with IPv6.
Galera cluster recovery
MariaDB Enterprise Kubernetes Operator monitors the Galera cluster and acts accordingly to recover it if needed. This feature is enabled by default, but you may tune it as needed:
The minClusterSize field indicates the minimum cluster size (either an absolute number of replicas or a percentage) for the operator to consider the cluster healthy. If the cluster is unhealthy for longer than the period defined in clusterHealthyTimeout (30s by default), the operator initiates a cluster recovery process. The process is explained in the Galera documentation and consists of the following steps:
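A sketch of the recovery settings. The 50% minimum size and 10s monitoring interval are example values; 30s matches the default clusterHealthyTimeout.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  galera:
    enabled: true
    recovery:
      enabled: true
      minClusterSize: "50%"       # or an absolute number of replicas
      clusterMonitorInterval: 10s # how often cluster health is checked
      clusterHealthyTimeout: 30s  # unhealthy period before recovery starts
```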
Recover the sequence number from the grastate.dat on each node.
Trigger a recovery Job to obtain the sequence numbers if the previous step did not succeed.
Mark the node with the highest sequence number (bootstrap node) as safe to bootstrap.
Bootstrap a new cluster in the bootstrap node.
Restart and wait until the bootstrap node becomes ready.
Restart the rest of the nodes one by one so they can join the new cluster.
The operator monitors the Galera cluster health periodically and performs the cluster recovery described above if needed. You are able to tune the monitoring interval via the clusterMonitorInterval field.
Refer to the API reference to better understand the purpose of each field.
Galera recovery Job
During the recovery process, a Job is triggered for each MariaDB Pod to obtain the sequence numbers. It's crucial for this Job to succeed; otherwise, the recovery process will fail. As a user, you are responsible for adjusting this Job to allocate sufficient resources and provide the necessary metadata to ensure its successful completion.
For example, if you're using a service mesh like Istio, it's important to add the sidecar.istio.io/inject=false label. Without this label, the Job will not complete, which would prevent the recovery process from finishing successfully.
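A sketch adding the Istio label and resources to the recovery Job. This assumes the Job is configured through galera.recovery.job. The resource values are placeholders.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  galera:
    enabled: true
    recovery:
      enabled: true
      job:
        metadata:
          labels:
            sidecar.istio.io/inject: "false" # let the Job complete under Istio
        resources:
          requests:
            cpu: 100m      # placeholder
            memory: 128Mi  # placeholder
          limits:
            memory: 256Mi  # placeholder
```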
Force cluster bootstrap
Use this option only in exceptional circumstances. Not selecting the Pod with the highest sequence number may result in data loss.
Ensure you unset forceClusterBootstrapInPod after completing the bootstrap to allow the operator to choose the appropriate Pod to bootstrap from in the event of a cluster recovery.
You have the ability to manually select which Pod is used to bootstrap a new cluster during the recovery process by setting forceClusterBootstrapInPod:
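A sketch forcing the bootstrap from mariadb-galera-0. Use it only under the conditions listed below, and remove the field once the cluster is up.

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  galera:
    enabled: true
    recovery:
      enabled: true
      forceClusterBootstrapInPod: "mariadb-galera-0" # unset after bootstrapping
```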
This should only be used in exceptional circumstances:
You are absolutely certain that the chosen Pod has the highest sequence number.
The operator has not yet selected a Pod to bootstrap from.
You can verify this with the following command:
In this case, assuming that mariadb-galera-2 sequence is lower than 350454, it should be safe to bootstrap from mariadb-galera-0.
Finally, after your cluster has been bootstrapped, remember to unset forceClusterBootstrapInPod to allow the operator to select the appropriate node for bootstrapping in the event of a cluster recovery.
Bootstrap Galera cluster from existing PVCs
MariaDB Enterprise Kubernetes Operator will never delete your MariaDB PVCs. Whenever you delete a MariaDB resource, the PVCs remain intact, so you can reuse them to re-provision a new cluster.
That said, Galera is unable to form a cluster from pre-existing state; it requires a cluster recovery process to identify which Pod has the highest sequence number to bootstrap a new cluster. That is exactly what the operator does: whenever a new MariaDB Galera cluster is created and previously created PVCs exist, a cluster recovery process is automatically triggered.
Quickstart
Apply the following manifests to get started with Galera in Kubernetes:
Next, check the MariaDB status and the resources created by the operator:
Let's now proceed with simulating a Galera cluster failure by deleting all the Pods at the same time:
After some time, we will see the MariaDB entering a not Ready state:
Eventually, the operator will kick in and recover the Galera cluster:
Finally, the MariaDB resource will become Ready and your Galera cluster will be operational again:
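The failure drill above can be scripted end to end. The hypothetical function below deletes all Pods using the app.kubernetes.io/instance label shown in the Quickstart output and then blocks until recovery completes. It assumes the MariaDB resource exposes a Ready condition, which the READY column suggests.

```shell
# Hypothetical helper reproducing the Quickstart failure drill:
# delete every Galera Pod at once, then wait for the operator to recover.
# Assumption: the MariaDB resource exposes a "Ready" status condition.
simulate_galera_outage() {
  mariadb="$1"         # e.g. mariadb-galera
  timeout="${2:-10m}"  # upper bound for each wait
  kubectl delete pods -l "app.kubernetes.io/instance=$mariadb"
  # Wait for the outage to be noticed first; otherwise the next wait could
  # return immediately on the stale Ready=True condition.
  kubectl wait "mariadb/$mariadb" --for=condition=Ready=false --timeout="$timeout"
  kubectl wait "mariadb/$mariadb" --for=condition=Ready --timeout="$timeout"
}
```

Waiting for Ready=false before Ready=true avoids a false positive right after the deletion, when the status has not been updated yet.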
Troubleshooting
This section shows how to diagnose your Galera cluster when something goes wrong. In these situations, observability is key to understanding the problem, so we recommend following these steps before you start debugging:
Inspect MariaDB status conditions.
Make sure network connectivity is fine by checking that you have an Endpoint per Pod in your Galera cluster.
Check the events associated with the MariaDB object, as they provide significant insights for diagnosis, particularly within the context of cluster recovery.
Enable debug logs in mariadb-enterprise-operator.
Get the logs of all the MariaDB Pod containers: not only the main mariadb container, but also the agent and init containers.
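Most of the steps above can be collected in one pass. The function below is a hypothetical sketch; it assumes the headless Service is named <mariadb-name>-internal and that the Pods carry the app.kubernetes.io/instance=<mariadb-name> label used in the Quickstart. Enabling operator debug logs depends on how the operator was installed and is not covered.

```shell
# Hypothetical helper gathering the Galera troubleshooting context in one go.
# Assumptions: headless Service named <mariadb-name>-internal, Pods labelled
# app.kubernetes.io/instance=<mariadb-name>.
collect_galera_diagnostics() {
  mariadb="$1"        # e.g. mariadb-galera
  ns="${2:-default}"  # namespace of the MariaDB resource
  echo "== Status conditions"
  kubectl get mariadb "$mariadb" -n "$ns" -o jsonpath='{.status.conditions}{"\n"}'
  echo "== Endpoints (expect one address per Pod)"
  kubectl get endpoints "$mariadb-internal" -n "$ns" -o wide
  echo "== Events"
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$mariadb" \
    --sort-by='.lastTimestamp'
  echo "== Container logs"
  for pod in $(kubectl get pods -n "$ns" -l "app.kubernetes.io/instance=$mariadb" -o name); do
    # --all-containers covers the mariadb, agent and init containers;
    # --prefix tags every line with its Pod and container name.
    kubectl logs -n "$ns" "$pod" --all-containers --prefix
  done
}
```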
Once you are done with these steps, you will have the context required to jump ahead to the Common errors section to see if any of them matches your case.
Common errors
Galera cluster recovery not progressing
If your MariaDB Galera cluster has been in GaleraNotReady state for a long time, the recovery process might not be progressing. You can diagnose this by checking:
Operator logs.
Galera recovery status:
MariaDB events:
If you have Pods named <mariadb-name>-<ordinal>-recovery-<suffix> running for a long time, check their logs to understand whether something is wrong.
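A hypothetical sketch for the checks above: print the recovery status and the logs of any Pod matching the recovery naming pattern. It only relies on the naming pattern and the galeraRecovery status field shown on this page.

```shell
# Hypothetical helper: show the Galera recovery status and the logs of any
# recovery Pods (<mariadb-name>-<ordinal>-recovery-<suffix>).
inspect_galera_recovery() {
  mariadb="$1"  # e.g. mariadb-galera
  kubectl get mariadb "$mariadb" -o jsonpath='{.status.galeraRecovery}{"\n"}'
  kubectl get pods -o name | grep "^pod/$mariadb-[0-9][0-9]*-recovery-" |
    while read -r pod; do
      echo "--- $pod"
      kubectl logs "$pod" --all-containers
    done
}
```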
One possible cause is a misconfigured Galera recovery Job; make sure you have read the Galera recovery Job section. If, after checking all the points above, there are still no clear symptoms of what could be wrong, continue reading.
First of all, you could attempt to forcefully bootstrap a new cluster as described in the Force cluster bootstrap section. Refrain from doing so if the conditions described there are not met.
Alternatively, if you can afford some downtime and your PVCs are in a healthy state, you may follow this procedure:
Delete your existing MariaDB. This will leave your PVCs intact.
Create your MariaDB again. This will trigger a Galera recovery process as described in the Bootstrap Galera cluster from existing PVCs section.
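The two steps can be sketched as a hypothetical shell function. It expects the manifest you originally applied; the output of kubectl get -o yaml carries server-populated fields (resourceVersion, uid, status) that get in the way when re-creating an object.

```shell
# Hypothetical helper for the delete-and-recreate procedure.
recreate_galera_cluster() {
  mariadb="$1"   # e.g. mariadb-galera
  manifest="$2"  # the manifest you originally applied (hypothetical path)
  kubectl delete mariadb "$mariadb" --wait=true
  # The PVCs should still be listed here, since the operator never deletes them.
  kubectl get pvc
  kubectl apply -f "$manifest"
}
```

For example: recreate_galera_cluster mariadb-galera ./mariadb-galera.yaml.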
As a last resort, you can always delete the PVCs and bootstrap a new MariaDB from a backup, as described in the backup documentation.
Permission denied writing Galera configuration
This error occurs when the user that runs the container does not have enough privileges to write in /etc/mysql/mariadb.conf.d:
To mitigate this, by default the operator sets the following securityContext in the MariaDB StatefulSet:
This enables the CSIDriver and the kubelet to recursively set the ownership of the /etc/mysql/mariadb.conf.d folder to the group 999, which is the one expected by MariaDB. Note that not all CSIDriver implementations support this feature; see the CSIDriver documentation for further information.
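To check whether this applies to your cluster, the hypothetical function below prints the Pod securityContext of a StatefulSet and the fsGroupPolicy of each installed CSI driver. The fsGroupPolicy field is part of the Kubernetes CSIDriver API; the StatefulSet name is assumed to match the MariaDB name, as in the Quickstart output.

```shell
# Hypothetical helper: confirm the fsGroup set on the StatefulSet and whether
# the installed CSI drivers honour fsGroup at all.
check_fsgroup_support() {
  statefulset="$1"  # e.g. mariadb-galera
  kubectl get statefulset "$statefulset" \
    -o jsonpath='{.spec.template.spec.securityContext}{"\n"}'
  # fsGroupPolicy "None": the driver never changes volume ownership.
  # "File": fsGroup is always applied.
  # "ReadWriteOnceWithFSType" (the default): applied only to RWO volumes with an fsType.
  kubectl get csidrivers \
    -o custom-columns='NAME:.metadata.name,FSGROUPPOLICY:.spec.fsGroupPolicy'
}
```

The same check works for other StatefulSets managed by the operator, since it only reads standard Kubernetes fields.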
Unauthorized error disabling bootstrap
This situation occurs when the mariadb-enterprise-operator credentials passed to the agent for authentication are either invalid or the agent is unable to verify them. To confirm this, ensure that both the mariadb-enterprise-operator and the MariaDB ServiceAccounts are able to create TokenReview objects:
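One way to run this check is kubectl auth can-i with ServiceAccount impersonation, sketched below as a hypothetical function. The ServiceAccount names come from the ClusterRoleBindings described below; the namespaces are yours to supply. Impersonating with --as requires impersonation rights, which cluster administrators normally have.

```shell
# Hypothetical helper: ask the API server whether both ServiceAccounts may
# create TokenReview objects. Each check should print "yes".
check_tokenreview_access() {
  operator_ns="$1"                 # namespace of the operator release
  mariadb_ns="$2"                  # namespace of the MariaDB resource
  mariadb_sa="${3:-mariadb-galera}"
  kubectl auth can-i create tokenreviews.authentication.k8s.io \
    --as="system:serviceaccount:$operator_ns:mariadb-enterprise-operator"
  kubectl auth can-i create tokenreviews.authentication.k8s.io \
    --as="system:serviceaccount:$mariadb_ns:$mariadb_sa"
}
```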
If that's not the case, check that the following ClusterRole and ClusterRoleBindings are available in your cluster:
mariadb-enterprise-operator:auth-delegator is the ClusterRoleBinding bound to the mariadb-enterprise-operator ServiceAccount, which is created by the Helm chart, so you can re-install the Helm release to recreate it:
mariadb-galera:auth-delegator is the ClusterRoleBinding bound to the mariadb-galera ServiceAccount, which is created on the fly by the operator as part of the reconciliation logic. You may check the mariadb-enterprise-operator logs to see if there are any issues reconciling it.
Bear in mind that ClusterRoleBindings are cluster-wide resources that are not garbage collected when the MariaDB owner object is deleted, which means that creating and deleting MariaDBs could leave leftovers in your cluster. These leftovers can lead to RBAC misconfigurations, as the ClusterRoleBinding might not point to the right ServiceAccount. To overcome this, you can override the ClusterRoleBinding name by setting the spec.galera.agent.kubernetesAuth.authDelegatorRoleName field.
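To spot a leftover binding, you can print the subjects it points to and, if needed, give your MariaDB a distinct binding name. Both functions below are hypothetical sketches; the patch uses the spec.galera.agent.kubernetesAuth.authDelegatorRoleName field named above, and the example binding name is made up.

```shell
# Hypothetical helpers for the auth-delegator ClusterRoleBinding.

# Print the subjects a binding points to, one per line as kind/namespace/name.
show_binding_subjects() {
  kubectl get clusterrolebinding "$1" \
    -o jsonpath='{range .subjects[*]}{.kind}/{.namespace}/{.name}{"\n"}{end}'
}

# Give the MariaDB its own ClusterRoleBinding name to avoid clashing with leftovers.
set_auth_delegator_role_name() {
  mariadb="$1"  # e.g. mariadb-galera
  name="$2"     # hypothetical, e.g. mariadb-galera-prod:auth-delegator
  kubectl patch mariadb "$mariadb" --type=merge \
    -p "{\"spec\":{\"galera\":{\"agent\":{\"kubernetesAuth\":{\"authDelegatorRoleName\":\"$name\"}}}}}"
}
```

If show_binding_subjects mariadb-galera:auth-delegator prints a ServiceAccount from a namespace you no longer use, the binding is likely a leftover.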
Timeout waiting for Pod to be Synced
This error appears in the mariadb-enterprise-operator logs when a Pod stays in a non-synced state for longer than spec.galera.recovery.podRecoveryTimeout. Right after that, the operator restarts the Pod.
Increase this timeout if you consider that your Pod may take longer to recover.
Galera cluster bootstrap timed out
This error is returned by the mariadb-enterprise-operator after exceeding spec.galera.recovery.clusterBootstrapTimeout while recovering the cluster. At this point, the operator resets the recovered sequence numbers and starts again from a clean state.
Increase this timeout if you consider that your Galera cluster may take longer to recover.
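Both timeouts mentioned above can be raised in a single merge patch. The function name is hypothetical; the two field paths are the ones named in this section.

```shell
# Hypothetical helper: raise the Galera recovery timeouts in one patch.
set_galera_recovery_timeouts() {
  mariadb="$1"            # e.g. mariadb-galera
  pod_recovery="$2"       # spec.galera.recovery.podRecoveryTimeout, e.g. 5m
  cluster_bootstrap="$3"  # spec.galera.recovery.clusterBootstrapTimeout, e.g. 15m
  kubectl patch mariadb "$mariadb" --type=merge -p \
    "{\"spec\":{\"galera\":{\"recovery\":{\"podRecoveryTimeout\":\"$pod_recovery\",\"clusterBootstrapTimeout\":\"$cluster_bootstrap\"}}}}"
}
```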
kubectl get events --field-selector involvedObject.name=physicalbackup
LAST SEEN TYPE REASON OBJECT MESSAGE
116s Normal WaitForFirstConsumer persistentvolumeclaim/physicalbackup waiting for first consumer to be created before binding
116s Normal JobScheduled physicalbackup/physicalbackup Job physicalbackup-20250714140837 scheduled
116s Normal ExternalProvisioning persistentvolumeclaim/physicalbackup Waiting for a volume to be created either by the external provisioner 'rancher.io/local-path' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
116s Normal Provisioning persistentvolumeclaim/physicalbackup External provisioner is provisioning volume for claim "default/physicalbackup"
113s Normal ProvisioningSucceeded persistentvolumeclaim/physicalbackup Successfully provisioned volume pvc-7b7c71f9-ea7e-4950-b612-2d41d7ab35b7
mariadb [00] 2025-08-04 09:15:57 Was only able to copy log from 58087 to 59916, not 68968; try increasing innodb_log_file_size
mariadb mariabackup: Stopping log copying thread.[00] 2025-08-04 09:15:57 Retrying read of log at LSN=59916
kubectl get mariadbs
NAME READY STATUS PRIMARY POD AGE
mariadb-galera True Running mariadb-galera-0 48m
kubectl get events --field-selector involvedObject.name=mariadb-galera --sort-by='.lastTimestamp'
LAST SEEN TYPE REASON OBJECT MESSAGE
...
45m Normal GaleraClusterHealthy mariadb/mariadb-galera Galera cluster is healthy
kubectl get mariadb mariadb-galera -o jsonpath="{.status.conditions[?(@.type=='GaleraReady')]}" | jq
{
"lastTransitionTime": "2023-07-13T18:22:31Z",
"message": "Galera ready",
"reason": "GaleraReady",
"status": "True",
"type": "GaleraReady"
}
kubectl get mariadb mariadb-galera -o jsonpath="{.status.conditions[?(@.type=='GaleraConfigured')]}" | jq
{
"lastTransitionTime": "2023-07-13T18:22:31Z",
"message": "Galera configured",
"reason": "GaleraConfigured",
"status": "True",
"type": "GaleraConfigured"
}
kubectl get statefulsets
NAME READY AGE
mariadb-galera 3/3 58m
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mariadb-galera-0 2/2 Running 0 58m 10.244.2.4 mdb-worker3 <none> <none>
mariadb-galera-1 2/2 Running 0 58m 10.244.1.9 mdb-worker2 <none> <none>
mariadb-galera-2 2/2 Running 0 58m 10.244.5.4 mdb-worker4 <none> <none>
kubectl delete pods -l app.kubernetes.io/instance=mariadb-galera
pod "mariadb-galera-0" deleted
pod "mariadb-galera-1" deleted
pod "mariadb-galera-2" deleted
kubectl get mariadb mariadb-galera
NAME READY STATUS PRIMARY POD AGE
mariadb-galera False Galera not ready mariadb-galera-0 67m
kubectl get events --field-selector involvedObject.name=mariadb-galera --sort-by='.lastTimestamp'
LAST SEEN TYPE REASON OBJECT MESSAGE
...
48s Warning GaleraClusterNotHealthy mariadb/mariadb-galera Galera cluster is not healthy
kubectl get mariadb mariadb-galera -o jsonpath="{.status.conditions[?(@.type=='GaleraReady')]}" | jq
{
"lastTransitionTime": "2023-07-13T19:25:17Z",
"message": "Galera not ready",
"reason": "GaleraNotReady",
"status": "False",
"type": "GaleraReady"
}
kubectl get events --field-selector involvedObject.name=mariadb-galera --sort-by='.lastTimestamp'
LAST SEEN TYPE REASON OBJECT MESSAGE
...
16m Warning GaleraClusterNotHealthy mariadb/mariadb-galera Galera cluster is not healthy
16m Normal GaleraPodStateFetched mariadb/mariadb-galera Galera state fetched in Pod 'mariadb-galera-2'
16m Normal GaleraPodStateFetched mariadb/mariadb-galera Galera state fetched in Pod 'mariadb-galera-1'
16m Normal GaleraPodStateFetched mariadb/mariadb-galera Galera state fetched in Pod 'mariadb-galera-0'
16m Normal GaleraPodRecovered mariadb/mariadb-galera Recovered Galera sequence in Pod 'mariadb-galera-1'
16m Normal GaleraPodRecovered mariadb/mariadb-galera Recovered Galera sequence in Pod 'mariadb-galera-2'
17m Normal GaleraPodRecovered mariadb/mariadb-galera Recovered Galera sequence in Pod 'mariadb-galera-0'
17m Normal GaleraClusterBootstrap mariadb/mariadb-galera Bootstrapping Galera cluster in Pod 'mariadb-galera-2'
20m Normal GaleraClusterHealthy mariadb/mariadb-galera Galera cluster is healthy
kubectl get mariadb mariadb-galera -o jsonpath="{.status.galeraRecovery}" | jq
{
"bootstrap": {
"pod": "mariadb-galera-2",
"time": "2023-07-13T19:25:28Z"
},
"recovered": {
"mariadb-galera-0": {
"seqno": 3,
"uuid": "bf00b9c3-21a9-11ee-984f-9ba9ff0e9285"
},
"mariadb-galera-1": {
"seqno": 3,
"uuid": "bf00b9c3-21a9-11ee-984f-9ba9ff0e9285"
},
"mariadb-galera-2": {
"seqno": 3,
"uuid": "bf00b9c3-21a9-11ee-984f-9ba9ff0e9285"
}
},
"state": {
"mariadb-galera-0": {
"safeToBootstrap": false,
"seqno": -1,
"uuid": "bf00b9c3-21a9-11ee-984f-9ba9ff0e9285",
"version": "2.1"
},
"mariadb-galera-1": {
"safeToBootstrap": false,
"seqno": -1,
"uuid": "bf00b9c3-21a9-11ee-984f-9ba9ff0e9285",
"version": "2.1"
},
"mariadb-galera-2": {
"safeToBootstrap": false,
"seqno": -1,
"uuid": "bf00b9c3-21a9-11ee-984f-9ba9ff0e9285",
"version": "2.1"
}
}
}
kubectl get mariadb mariadb-galera -o jsonpath="{.status.conditions[?(@.type=='GaleraReady')]}" | jq
{
"lastTransitionTime": "2023-07-13T19:27:51Z",
"message": "Galera ready",
"reason": "GaleraReady",
"status": "True",
"type": "GaleraReady"
}
kubectl get mariadb mariadb-galera
NAME READY STATUS PRIMARY POD AGE
mariadb-galera True Running mariadb-galera-0 82m
kubectl get events --field-selector involvedObject.name=mariadb-galera --sort-by='.lastTimestamp'
LAST SEEN TYPE REASON OBJECT MESSAGE
...
16m Warning GaleraClusterNotHealthy mariadb/mariadb-galera Galera cluster is not healthy
16m Normal GaleraPodStateFetched mariadb/mariadb-galera Galera state fetched in Pod 'mariadb-galera-2'
16m Normal GaleraPodStateFetched mariadb/mariadb-galera Galera state fetched in Pod 'mariadb-galera-1'
16m Normal GaleraPodStateFetched mariadb/mariadb-galera Galera state fetched in Pod 'mariadb-galera-0'
16m Normal GaleraPodRecovered mariadb/mariadb-galera Recovered Galera sequence in Pod 'mariadb-galera-1'
16m Normal GaleraPodRecovered mariadb/mariadb-galera Recovered Galera sequence in Pod 'mariadb-galera-2'
17m Normal GaleraPodRecovered mariadb/mariadb-galera Recovered Galera sequence in Pod 'mariadb-galera-0'
17m Normal GaleraClusterBootstrap mariadb/mariadb-galera Bootstrapping Galera cluster in Pod 'mariadb-galera-2'
20m Normal GaleraClusterHealthy mariadb/mariadb-galera Galera cluster is healthy
kubectl get clusterrole system:auth-delegator
NAME CREATED AT
system:auth-delegator 2023-08-03T19:12:37Z
kubectl get clusterrolebinding | grep mariadb | grep auth-delegator
mariadb-galera:auth-delegator ClusterRole/system:auth-delegator 108m
mariadb-enterprise-operator:auth-delegator ClusterRole/system:auth-delegator 112m
Timeout waiting for Pod 'mariadb-galera-2' to be Synced
Galera cluster bootstrap timed out. Resetting recovery status
MaxScale Database Proxy
MaxScale is a sophisticated database proxy, router, and load balancer designed specifically for and by MariaDB. It provides a range of features that ensure optimal high availability:
Query-based routing: Transparently route write queries to the primary nodes and read queries to the replica nodes.
Connection-based routing: Load balance connections between multiple servers.
Automatic primary failover based on MariaDB internals.
Replay pending transactions when a server goes down.
Support for Galera and Replication.
To better understand what MaxScale is capable of, you may check the product page and the documentation.
MaxScale resources
Prior to configuring MaxScale within Kubernetes, it's essential to have a basic understanding of the resources managed through its API.
Servers
A server defines the backend database servers that MaxScale forwards traffic to. For more detailed information, please consult the server reference.
Monitors
A monitor is an agent that queries the state of the servers and makes it available to the services in order to route traffic based on it. For more detailed information, please consult the monitor reference.
Depending on which highly available configuration your servers have, you will need to choose between the following modules:
Galera Monitor: Detects whether servers are part of the cluster, ensuring synchronization among them, and assigning primary and replica roles as needed.
MariaDB Monitor: Probes the state of the cluster, assigns roles to the servers, and executes failover, switchover, and rejoin operations as necessary.
Services
A service defines how traffic is routed to the servers, based on a routing algorithm that takes into account the state of the servers and their roles. For more detailed information, please consult the service reference.
Depending on your requirements to route traffic, you may choose between the following routers:
Readwritesplit: Route write queries to the primary server and read queries to the replica servers.
Readconnroute: Load balance connections between multiple servers.
Listeners
A listener specifies a port where MaxScale listens for incoming connections. It is associated with a service that handles the requests received on that port. For more detailed information, please consult the listener reference.
MaxScale CR
The minimal spec you need to provision a MaxScale instance is just a reference to a MariaDB resource:
This will provision a new StatefulSet for running MaxScale and configure the servers specified by the MariaDB resource. Refer to the Server configuration section if you want to manually configure the MariaDB servers.
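A minimal manifest of this kind can be rendered from a shell function and piped to kubectl apply. This is a sketch: the function name is hypothetical, and the apiVersion enterprise.mariadb.com/v1alpha1 is an assumption you can confirm with kubectl api-resources | grep -i maxscale.

```shell
# Hypothetical helper: render a minimal MaxScale manifest that only
# references a MariaDB resource; the operator defaults everything else.
# Assumption: apiVersion enterprise.mariadb.com/v1alpha1.
render_maxscale() {
  name="$1"     # MaxScale resource name, e.g. maxscale-galera
  mariadb="$2"  # referenced MariaDB resource, e.g. mariadb-galera
  cat <<EOF
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: $name
spec:
  mariaDbRef:
    name: $mariadb
EOF
}
```

Usage: render_maxscale maxscale-galera mariadb-galera | kubectl apply -f -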
The rest of the configuration uses reasonable defaults set automatically by the operator. If you need a more fine-grained configuration, you can provide these values yourself:
As you can see, the MaxScale resources we previously mentioned have a counterpart resource in the MaxScale CR.
The previous example configured MaxScale for a Galera cluster, but you can also configure MaxScale with a MariaDB that uses replication. Note that the monitor module is automatically inferred by the operator from the MariaDB reference you provide; however, its parameters are specific to each monitor module:
You also need to set a reference in the MariaDB resource to make it MaxScale-aware. This is explained in the MariaDB CR section.
You can set spec.maxScaleRef in your MariaDB resource to make it MaxScale-aware. By doing so, the primary server reported by MaxScale will be used in MariaDB, and high availability tasks such as primary failover will be delegated to MaxScale:
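On an existing MariaDB, the reference can also be added with a merge patch. The function name is hypothetical; the spec.maxScaleRef field is the one described above, and its name subfield is assumed to work like spec.mariaDbRef.name.

```shell
# Hypothetical helper: make an existing MariaDB MaxScale-aware.
# Assumption: spec.maxScaleRef takes a "name" field, like spec.mariaDbRef.
link_mariadb_to_maxscale() {
  mariadb="$1"   # e.g. mariadb-galera
  maxscale="$2"  # e.g. maxscale-galera
  kubectl patch mariadb "$mariadb" --type=merge \
    -p "{\"spec\":{\"maxScaleRef\":{\"name\":\"$maxscale\"}}}"
}
```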
To streamline the setup outlined in the MaxScale CR and MariaDB CR sections, you can provision a MaxScale to be used with MariaDB in just one resource:
This will automatically set the references between MariaDB and MaxScale and default the rest of the fields.
Note that this is intended for simple use cases that only require a single replica and where no further modifications are made to the spec.maxscale field. If you need a more fine-grained configuration or need to perform further updates to the MaxScale resource, use a dedicated MaxScale as described in the MaxScale CR section.
MariaDB Enterprise Kubernetes Operator aims to provide highly configurable CRs, but at the same time maximize its usability by providing reasonable defaults. In the case of MaxScale, the following defaulting logic is applied:
spec.servers are inferred from spec.mariaDbRef.
spec.monitor.module is inferred from the spec.mariaDbRef.
spec.monitor.cooperativeMonitoring is set if high availability is enabled.
If spec.services is not provided, a readwritesplit service is configured on port 3306 by default.
Server configuration
As an alternative to providing a reference to a MariaDB via spec.mariaDbRef, you can also specify the servers manually:
As shown, you can refer to in-cluster MariaDB servers by providing the DNS names of the MariaDB Pods as server addresses. In addition, you can refer to external MariaDB instances running outside of the Kubernetes cluster where the operator is deployed:
Pointing to external MariaDBs has some limitations: Since the operator doesn't have a reference to a MariaDB resource (spec.mariaDbRef), it will be unable to perform the following actions:
Infer the monitor module (spec.monitor.module), so it will need to be provided by the user.
Autogenerate authentication credentials (spec.auth), so they will need to be provided by the user. See the Authentication section.
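A sketch of a MaxScale pointing at external servers, rendered from a hypothetical shell function. The host names are made up, the apiVersion is assumed, and the field names (servers[].name, servers[].address, monitor.module) and the module value mariadbmon are assumptions to check against the API reference. For in-cluster servers, the address would instead be a Pod DNS name of the form <pod>.<headless-service>.<namespace>.svc.cluster.local. The spec.auth section is required in this setup but omitted here.

```shell
# Hypothetical helper: render a MaxScale for servers outside the cluster.
# Assumptions: apiVersion, field names, and the "mariadbmon" module value.
render_external_maxscale() {
  name="$1"  # MaxScale resource name, e.g. maxscale-external
  cat <<EOF
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: $name
spec:
  servers:
    - name: db-1
      address: db-1.example.com
    - name: db-2
      address: db-2.example.com
  monitor:
    # Without spec.mariaDbRef, the module cannot be inferred.
    module: mariadbmon
  # spec.auth with user-provided credentials is also required; omitted here.
EOF
}
```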
Primary server switchover
Only the MariaDB Monitor, to be used with MariaDB replication, supports the primary switchover operation.
You can declaratively select the primary server by setting spec.primaryServer=<server>:
This will trigger a switchover operation and MaxScale will promote the specified server to be the new primary server.
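The switchover can be scripted so that it waits for MaxScale to report the new primary. The patch mirrors the kubectl patch command shown in this page's output; reading the current primary from .status.primaryServer (the likely source of the PRIMARY column) is an assumption, as is the function name.

```shell
# Hypothetical helper: request a switchover and poll until MaxScale reports
# the requested server as primary, or give up after timeout_s seconds.
# Assumption: the current primary is exposed at .status.primaryServer.
switchover_primary() {
  maxscale="$1"          # e.g. maxscale-repl
  server="$2"            # e.g. mariadb-repl-1
  timeout_s="${3:-300}"
  kubectl patch maxscale "$maxscale" --type=merge \
    -p "{\"spec\":{\"primaryServer\":\"$server\"}}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    current="$(kubectl get maxscale "$maxscale" -o jsonpath='{.status.primaryServer}')"
    if [ "$current" = "$server" ]; then
      echo "primary is now $server"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "timed out waiting for $server to become primary" >&2
  return 1
}
```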
Server maintenance
You can put servers in maintenance mode by setting the server field maintenance=true:
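Because spec.servers is a list, a JSON merge patch would replace the whole list. A JSON patch can target a single entry instead, as in this hypothetical helper; it assumes spec.servers is populated on the resource (explicitly or through operator defaulting) and that you know the server's index.

```shell
# Hypothetical helper: toggle maintenance mode for one server by list index.
set_server_maintenance() {
  maxscale="$1"  # e.g. maxscale-repl
  index="$2"     # 0-based position of the server in spec.servers
  enabled="$3"   # true or false
  # "add" also overwrites an existing value at that path.
  kubectl patch maxscale "$maxscale" --type=json \
    -p "[{\"op\":\"add\",\"path\":\"/spec/servers/$index/maintenance\",\"value\":$enabled}]"
}
```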
Configuration
Similar to MariaDB, MaxScale allows you to provide global configuration parameters in a maxscale.conf file. Instead of providing this file directly, you can use spec.config.params to instruct the operator to create maxscale.conf:
Both this global configuration and the resources created by the operator through the MaxScale API are stored in a volume provisioned by spec.config.volumeClaimTemplate. Refer to the Troubleshooting section if you get errors writing to this volume.
Refer to the MaxScale documentation for more details about the supported parameters.
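A single parameter can be added or changed with a merge patch on spec.config.params. The helper name is hypothetical, log_info is used only as an example MaxScale parameter, and passing values as strings is an assumption about the field's type.

```shell
# Hypothetical helper: set one global maxscale.conf parameter.
# A merge patch adds or overwrites this key and leaves the other params untouched.
set_maxscale_param() {
  maxscale="$1"  # e.g. maxscale-galera
  key="$2"       # e.g. log_info
  value="$3"     # passed as a string, e.g. true
  kubectl patch maxscale "$maxscale" --type=merge \
    -p "{\"spec\":{\"config\":{\"params\":{\"$key\":\"$value\"}}}}"
}
```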
Authentication
MaxScale requires authentication with different levels of permissions for the following components/actors:
MaxScale API consumed by MariaDB Enterprise Kubernetes Operator.
Clients connecting to MaxScale.
MaxScale connecting to MariaDB servers.
MaxScale monitor connecting to MariaDB servers.
MaxScale configuration syncer connecting to MariaDB servers. See the High availability section.
By default, the operator generates these credentials when spec.mariaDbRef is set and spec.auth.generate = true, but you can still provide your own:
As shown, you can also limit the number of connections for each component/actor. Bear in mind that when running in high availability you may need to increase this number, as more MaxScale instances imply more connections.
Kubernetes Services
To enable your applications to communicate with MaxScale, a Kubernetes Service is provisioned with all the ports specified in the MaxScale listeners. You have the flexibility to provide a template to customize this Service:
This results in the reconciliation of the following Service:
There is also another Kubernetes Service to access the GUI, please refer to the MaxScale GUI section for further detail.
Connection
You can leverage the Connection resource to automatically configure connection strings as Secret resources that your applications can mount:
Alternatively, you can also provide a connection template to your MaxScale resource:
Note that the Connection uses the Service described in the Kubernetes Services section, and you can specify which MaxScale service to connect to by providing the port (spec.port) of the corresponding MaxScale listener.
High availability
To synchronize the configuration state across multiple replicas, MaxScale stores the configuration externally in a MariaDB table, which all replicas poll periodically. By default, the mysql.maxscale_config table is used, but both the table and the synchronization interval can be configured by the user.
Another crucial aspect of HA is that only one monitor can be running at any given time to avoid conflicts. This can be achieved via cooperative locking, which can be configured by the user. Refer to the MaxScale documentation on cooperative monitoring for more information.
Multiple MaxScale replicas can be specified via the spec.replicas field. MaxScale exposes the scale subresource, so you can scale it up or down by running the following command:
You can also configure a HorizontalPodAutoscaler to do this automatically.
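Because the scale subresource is exposed, kubectl scale works on MaxScale directly. The HorizontalPodAutoscaler below is a hypothetical sketch using the standard autoscaling/v2 API; it assumes the MaxScale apiVersion shown, that the scale subresource exposes a label selector, that the MaxScale Pods declare CPU requests, and that a metrics server is running. The 80% CPU target is an arbitrary example.

```shell
# Scale MaxScale through its scale subresource.
scale_maxscale() {
  kubectl scale maxscale "$1" --replicas="$2"
}

# Hypothetical: render a CPU-based HorizontalPodAutoscaler for a MaxScale.
render_maxscale_hpa() {
  name="$1"  # MaxScale resource name, also used for the HPA
  min="$2"
  max="$3"
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: $name
spec:
  scaleTargetRef:
    apiVersion: enterprise.mariadb.com/v1alpha1
    kind: MaxScale
    name: $name
  minReplicas: $min
  maxReplicas: $max
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
EOF
}
```

Usage: scale_maxscale maxscale-galera 3, or render_maxscale_hpa maxscale-galera 2 4 | kubectl apply -f -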
Suspend resources
In order to enable this feature, you must set the --feature-maxscale-suspend feature flag:
Then you will be able to suspend any MaxScale resources, for instance, you can suspend a monitor:
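With the feature flag enabled, suspending the monitor could look like the hypothetical helper below. The field path spec.monitor.suspend is an assumption; check the API reference for the exact name.

```shell
# Hypothetical helper: suspend or resume the MaxScale monitor.
# Assumption: the flag is spec.monitor.suspend, and the operator runs with
# --feature-maxscale-suspend.
set_monitor_suspended() {
  maxscale="$1"   # e.g. maxscale-galera
  suspended="$2"  # true or false
  kubectl patch maxscale "$maxscale" --type=merge \
    -p "{\"spec\":{\"monitor\":{\"suspend\":$suspended}}}"
}
```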
MaxScale GUI
MaxScale offers a great user interface that provides very useful information about the MaxScale resources. You can enable it by providing the following configuration:
The GUI is exposed via a dedicated Kubernetes Service on the same port as the MaxScale API. Once you access it, you need to enter the MaxScale API credentials that the operator stores in a Secret. See the Authentication section for more details.
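For local access, the GUI Service can be port-forwarded. This is a sketch with several assumptions: the GUI is toggled by spec.admin.guiEnabled, the Service is named <maxscale-name>-gui, and the API listens on 8989 (MaxScale's default admin port). Adjust these to what you find in your cluster.

```shell
# Hypothetical helpers for the MaxScale GUI.
# Assumptions: spec.admin.guiEnabled, a Service named <maxscale-name>-gui,
# and the MaxScale API on port 8989 unless configured otherwise.
enable_maxscale_gui() {
  kubectl patch maxscale "$1" --type=merge \
    -p '{"spec":{"admin":{"guiEnabled":true}}}'
}

# Forward the GUI to http://localhost:<local-port>.
open_maxscale_gui() {
  maxscale="$1"
  local_port="${2:-8989}"
  admin_port="${3:-8989}"
  kubectl port-forward "svc/$maxscale-gui" "$local_port:$admin_port"
}
```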
MaxScale API
MariaDB Enterprise Kubernetes Operator interacts with the MaxScale API to reconcile the specification provided by the user, taking into account both the MaxScale status retrieved from the API and the provided spec.
Troubleshooting
The operator tracks both the status of the MaxScale Kubernetes resources and the status of the MaxScale API resources. This information is available in the status field of the MaxScale resource and can be very useful for debugging:
Kubernetes events emitted by mariadb-enterprise-operator may also be very relevant for debugging. For instance, an event is emitted whenever the primary server changes:
The operator logs can also be a good source of information for troubleshooting. You can increase its verbosity and enable MaxScale API request logs by running:
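The status and events above can be pulled together with a hypothetical helper. The event reason MaxScalePrimaryServerChanged comes from the event output shown on this page; reason is a supported field selector for Kubernetes events.

```shell
# Hypothetical helper: dump MaxScale status and related events.
maxscale_troubleshoot() {
  maxscale="$1"  # e.g. maxscale-repl
  kubectl get maxscale "$maxscale" -o jsonpath='{.status}{"\n"}'
  kubectl get events --field-selector "involvedObject.name=$maxscale" \
    --sort-by='.lastTimestamp'
  # Narrow down to primary server changes only.
  kubectl get events \
    --field-selector "involvedObject.name=$maxscale,reason=MaxScalePrimaryServerChanged"
}
```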
Common errors
Permission denied writing /var/lib/maxscale
This error occurs when the user that runs the container does not have enough privileges to write in /var/lib/maxscale:
To mitigate this, by default, the operator sets the following securityContext in the MaxScale's StatefulSet:
This enables the CSIDriver and the kubelet to recursively set the ownership of the /var/lib/maxscale folder to the group 999, which is the one expected by MaxScale. Note that not all CSIDriver implementations support this feature; see the CSIDriver documentation for further information.
Asynchronous Replication
The operator supports provisioning and operating MariaDB clusters with replication as a high availability topology. The following sections cover how to manage the full lifecycle of a replication cluster.
In a replication setup, one primary server handles all write operations while one or more replica servers replicate data from the primary, being able to handle read operations. More precisely, the primary has a binary log and the replicas asynchronously replicate the binary log events over the network.
Please refer to the MariaDB documentation for more details about replication.
kubectl patch maxscale maxscale-repl \
--type='merge' \
-p '{"spec":{"primaryServer":"mariadb-repl-1"}}'
kubectl get maxscale
NAME READY STATUS PRIMARY AGE
maxscale-repl False Switching primary to 'mariadb-repl-1' mariadb-repl-0 2m15s
kubectl get events --field-selector involvedObject.name=mariadb-repl-maxscale --sort-by='.lastTimestamp'
LAST SEEN TYPE REASON OBJECT MESSAGE
24s Normal MaxScalePrimaryServerChanged maxscale/mariadb-repl-maxscale MaxScale primary server changed from 'mariadb-repl-0' to 'mariadb-repl-1'
To provision a replication cluster, configure a number of replicas greater than 1 and set replication.enabled=true in the MariaDB CR:
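A minimal manifest of this kind, rendered from a hypothetical shell function, might look as follows. The apiVersion and the storage block are assumptions, and a real deployment will usually need more fields (credentials, resources); only replicas and replication.enabled come from the text above.

```shell
# Hypothetical helper: render a 3-node replication MariaDB (1 primary, 2 replicas).
# Assumptions: apiVersion enterprise.mariadb.com/v1alpha1 and spec.storage.size.
render_replication_mariadb() {
  name="$1"  # e.g. mariadb-repl
  cat <<EOF
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: $name
spec:
  replicas: 3
  replication:
    enabled: true
  storage:
    size: 1Gi
EOF
}
```

Usage: render_replication_mariadb mariadb-repl | kubectl apply -f -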
After applying the previous CR, the operator will provision a replication cluster with one primary and two replicas. The operator will take care of setting up replication, configuring the replication user and monitoring the replication status:
As you can see, the primary can be identified in the PRIMARY column of the kubectl get mariadb output. You may also inspect the current replication status by checking the MariaDB CR status:
The operator continuously monitors the replication status via SHOW SLAVE STATUS, taking it into account for internal operations and updating the CR status accordingly.
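To cross-check what the operator reports, you can print the CR status and run SHOW SLAVE STATUS yourself on a replica. Both helpers are hypothetical. The second assumes the main container is named mariadb (as referenced in the Galera troubleshooting steps), that the mariadb client is available in it, and that the root password is exposed to it as MARIADB_ROOT_PASSWORD; adjust these to your image.

```shell
# Hypothetical helper: print the MariaDB CR status, including replication info.
show_replication_status() {
  kubectl get mariadb "$1" -o jsonpath='{.status}{"\n"}'
}

# Hypothetical helper: run SHOW SLAVE STATUS on a replica and keep the fields
# used by the liveness and readiness probes.
# Assumption: root password available as MARIADB_ROOT_PASSWORD in the container.
show_slave_status() {
  pod="$1"  # e.g. mariadb-repl-1
  kubectl exec "$pod" -c mariadb -- \
    sh -c 'mariadb -uroot -p"$MARIADB_ROOT_PASSWORD" -e "SHOW SLAVE STATUS\G"' |
    grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
}
```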
Asynchronous vs semi-synchronous replication
By default, semi-synchronous replication is configured, which requires an acknowledgement from at least one replica before committing the transaction back to the client. This trades off performance for better consistency and facilitates failover and switchover operations.
If you are aiming for better performance, you can disable semi-synchronous replication and go fully asynchronous; refer to the Configuration section to do so.
Configuration
The replication settings can be customized under the replication section of the MariaDB CR. The following options are available:
gtidStrictMode: Enables GTID strict mode. It is recommended and enabled by default. See MariaDB documentation.
semiSyncEnabled: Determines whether semi-synchronous replication should be enabled. It is enabled by default. See MariaDB documentation.
semiSyncAckTimeout: ACK timeout for the replicas to acknowledge transactions to the primary. It requires semi-synchronous replication. See the MariaDB documentation.
semiSyncWaitPoint: Determines whether the transaction should wait for an ACK after having synced the binlog (AfterSync) or after having committed to the storage engine (AfterCommit, the default). It requires semi-synchronous replication. See the MariaDB documentation.
syncBinlog: Number of events after which the binary log is synchronized to disk. See the MariaDB documentation.
standaloneProbes: Determines whether to use regular non-HA startup and liveness probes. It is disabled by default.
These options are used by the operator to create a replication configuration file that is applied to all nodes in the cluster. When updating any of these options, an update of the cluster will be triggered in order to apply the new configuration.
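For example, going fully asynchronous means turning off semiSyncEnabled under the replication section. The helper name is hypothetical; as noted above, the change triggers an update of the cluster.

```shell
# Hypothetical helper: switch a replication cluster to fully asynchronous mode.
disable_semi_sync() {
  kubectl patch mariadb "$1" --type=merge \
    -p '{"spec":{"replication":{"semiSyncEnabled":false}}}'
}
```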
For replica-specific configuration options, please refer to the replica configuration section. Additional system variables may be configured via the myCnf configuration field. Refer to the configuration documentation for more details.
Replica configuration
The following options are replica-specific and can be configured under the replication.replica section of the MariaDB CR:
replPasswordSecretKeyRef: Reference to the Secret key containing the password for the replication user, used by the replicas to connect to the primary. By default, a Secret with a random password will be created.
gtid: GTID position mode to be used (CurrentPos and SlavePos are allowed). It defaults to CurrentPos. See the MariaDB documentation.
connectionRetrySeconds: Number of seconds that the replica waits between connection retries. See the MariaDB documentation.
maxLagSeconds: Maximum acceptable lag in seconds between the replica and the primary. If the lag exceeds this value, the readiness probe fails and the replica is marked as not ready. It defaults to 0, meaning that no lag is allowed. See the Lagged replicas section for more details.
syncTimeout: Timeout for the replicas to be synced during switchover and failover operations. It defaults to 10s. See the switchover and failover sections for more details.
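Both lag-related settings live under replication.replica and can be adjusted together. The helper is a hypothetical sketch; maxLagSeconds is sent as a number and syncTimeout as a duration string, matching the defaults described above (0 and 10s).

```shell
# Hypothetical helper: tune replica lag tolerance and switchover/failover sync timeout.
set_replica_lag_settings() {
  mariadb="$1"       # e.g. mariadb-repl
  max_lag="$2"       # integer seconds, e.g. 30
  sync_timeout="$3"  # duration string, e.g. 30s
  kubectl patch mariadb "$mariadb" --type=merge \
    -p "{\"spec\":{\"replication\":{\"replica\":{\"maxLagSeconds\":$max_lag,\"syncTimeout\":\"$sync_timeout\"}}}}"
}
```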
Probes
In the replication topology, Kubernetes probes are resolved by the agent (see the data-plane documentation), taking into account both the MariaDB and the replication status. Additionally, as described in the configuration documentation, probe thresholds may be tuned for better reliability in your environment.
In the following sub-sections we will be covering specifics about the replication topology.
Liveness probe
As part of the liveness probe, the agent checks that the MariaDB server is running and that the replication threads (Slave_IO_Running and Slave_SQL_Running) are both running on replicas. If any of these checks fail, the liveness probe will fail.
If such behaviour is undesirable, you can opt in to regular standalone startup/liveness probes (by default, a SELECT 1 query). See standaloneProbes in the Configuration section.
Readiness probe
The readiness probe checks that the MariaDB server is running and that the Seconds_Behind_Master value is within the acceptable lag range defined by the spec.replication.replica.maxLagSeconds configuration option. If the lag exceeds this value, the readiness probe will fail and the replica will be marked as not ready.
Lagged replicas
A replica is considered to be lagging behind the primary when the Seconds_Behind_Master value reported by SHOW SLAVE STATUS exceeds the spec.replication.replica.maxLagSeconds configuration option. This results in the readiness probe failing for that replica, and it has the following implications:
When taking a physical backup, lagged replicas will not be considered as a target for taking the backup.
During a primary switchover managed by the operator, lagged replicas will block switchover operations, as all the replicas must be in sync before promoting the new primary. This doesn't affect MaxScale switchover operations.
During a primary failover managed by the operator, lagged replicas will not be considered as candidates to be promoted as the new primary. MaxScale failover will not consider lagged replicas either.
During updates, lagged replicas will block the update operation, as each of the replicas must pass the readiness probe before proceeding to the update of the next one.
Backing up and restoring
In order to back up and restore a replication cluster, all the concepts and procedures described in the physical backup documentation apply.
Additionally, for the replication topology, the operator tracks the GTID position at the time of taking the backup, and sets this position based on the gtid_current_pos system variable when restoring the backup, as described in the MariaDB documentation.