Built-in Alert Rules
MariaDB Enterprise Manager includes a comprehensive set of pre-configured alert rules that provide production-ready monitoring for your entire database stack out of the box. These alerts are built on the integrated Grafana Alerting engine and are designed to detect common issues across your MariaDB Servers, Galera Clusters, MaxScale instances, and the underlying operating systems.
A key feature of these rules is the use of a "sustained for" duration. This means a condition must remain true for a specified period (e.g., 3 minutes) before an alert will fire. This prevents alert fatigue from brief, transient spikes and ensures you are only notified of persistent, actionable problems.
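The "sustained for" behavior described above can be illustrated with a minimal sketch. This is not the Grafana Alerting implementation; the class name `SustainedCondition` and its fields are hypothetical, and time is passed in as plain seconds to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedCondition:
    """Hypothetical helper: fires only after a condition has held
    continuously for `for_seconds`."""
    for_seconds: float
    pending_since: Optional[float] = None  # when the condition first became true

    def update(self, now: float, condition_true: bool) -> bool:
        """Record one evaluation and return True if the alert should fire."""
        if not condition_true:
            # Any evaluation where the condition is false resets the timer.
            self.pending_since = None
            return False
        if self.pending_since is None:
            self.pending_since = now
        return now - self.pending_since >= self.for_seconds


# Example: a 3-minute "sustained for" window, evaluated once per minute.
rule = SustainedCondition(for_seconds=180)
for minute, breached in enumerate([True, True, False, True, True, True, True]):
    firing = rule.update(now=minute * 60, condition_true=breached)
    print(minute, breached, firing)
```

In this sketch the brief recovery at minute 2 resets the timer, so the alert fires only once the condition has again held for a full 3 minutes.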
MariaDB Server
MariadbInstanceDown
MariaDB instance down for 3 minutes (sustained for 3m). Triggers when the exporter reports the instance as down (mariadb_up = 0) or when no sample from mariadb_up has been received for more than 120 seconds.
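The two trigger conditions (an explicit "down" report, or no recent sample at all) can be sketched as follows. The function and parameter names are hypothetical and only illustrate the logic; the actual rule is evaluated by the alerting engine against the `mariadb_up` metric.

```python
from typing import Optional


def instance_down(mariadb_up: Optional[float],
                  last_sample_ts: Optional[float],
                  now: float,
                  stale_after: float = 120.0) -> bool:
    """Hypothetical check mirroring the two conditions described above.

    Returns True when the latest sample is older than `stale_after`
    seconds (or no sample exists), or when the exporter reports 0.
    """
    if last_sample_ts is None or now - last_sample_ts > stale_after:
        return True  # no recent data: exporter or scrape may be broken
    return mariadb_up == 0  # exporter is reachable but reports the instance as down
```

Treating missing data as "down" means a dead exporter is not mistaken for a healthy instance.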
ReplicaProcessDown
MariaDB instance has a replica process down (sustained for 3m). Triggers when replication is unhealthy: the I/O or SQL thread is stopped, or Seconds_Behind_Master is missing (replica not reporting progress).
ReplicaSecondsBehindPrimary
MariaDB replica is more than 600s behind primary (sustained for 3m). Triggers when replication lag exceeds 600 seconds.
HighUtilizationMaxConnections
MariaDB instance has high connection utilization (sustained for 5m). Triggers when Threads_connected exceeds ~80% of max_connections.
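As a worked example of the threshold, the following sketch computes the utilization ratio from the two values named above. The function names and the exact 0.80 cutoff are assumptions for illustration; the shipped rule may round or compare differently.

```python
def connection_utilization(threads_connected: int, max_connections: int) -> float:
    """Fraction of max_connections currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return threads_connected / max_connections


def high_connection_utilization(threads_connected: int,
                                max_connections: int,
                                threshold: float = 0.80) -> bool:
    """Hypothetical check: True when utilization exceeds the threshold."""
    return connection_utilization(threads_connected, max_connections) > threshold
```

For instance, with `max_connections` set to 500, the condition becomes true once more than 400 threads are connected.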
MariaDBInstanceRestart
MariaDB instance restarted recently (sustained for 5m). Triggers when server uptime is below 1 hour, indicating a recent restart.
MariaDBDeadlockFound
MariaDB Deadlock found in the last 15m (sustained for 5m). Triggers when the count of InnoDB deadlocks increases compared to 15 minutes ago.
Galera Cluster
GaleraClusterDown
Galera instance down for 5 minutes (sustained for 5m). Triggers when the cluster is not in Primary state (wsrep_cluster_status ≠ 1) or the node is not ready (wsrep_ready ≠ 1).
GaleraNodeNotReady
Galera node not ready (state ≠ 4) for 5m (sustained for 5m). Triggers when the node is not in Synced state and it’s not a temporary DESYNC (desync counter did not change in the last 5 minutes).
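The combination of the two conditions can be sketched as below. This assumes the node state is the numeric `wsrep_local_state` value (4 meaning Synced) and that the desync counter is sampled now and 5 minutes earlier; the function and parameter names are hypothetical.

```python
SYNCED = 4  # numeric node state for Synced, as used in the rule description


def node_not_ready(local_state: int,
                   desync_count_now: int,
                   desync_count_5m_ago: int) -> bool:
    """Hypothetical check: the node is not Synced, and the desync counter
    has not changed in the last 5 minutes (so this is not a temporary,
    deliberate desync)."""
    desync_changed = desync_count_now != desync_count_5m_ago
    return local_state != SYNCED and not desync_changed
```

A node that was recently desynced on purpose, for example for a backup, changes the counter and is therefore excluded in this sketch.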
GaleraInWrongState
Galera instance is in an unexpected state (sustained for 5m). Triggers when the node’s state comment isn’t one of the normal values (Synced / Donor / Joining / Joined / Waiting for SST).
GaleraClusterDonorFallingBehind
Galera donor lagging (recv queue > 100) for 5m (sustained for 5m). Triggers when a Donor node (state=2) accumulates a large receive queue, indicating it’s falling behind replication.
GaleraClusterSizeChanged
Galera cluster size changed in last 15m (sustained for 5m). Triggers when the cluster size increases within 15 minutes.
MaxScale
MaxScaleInstanceDown
MaxScale down for 3 minutes (sustained for 3m). Triggers when no recent MaxScale metrics have been received for more than 120 seconds (e.g., MaxScale down or exporter/scrape pipeline issue).
MaxScaleNoPrimary
MaxScale has no primary for 3 minutes (sustained for 3m). Triggers when MaxScale reports zero servers with role = Primary/Master.
Node/OS
NodeFilesystemSpaceUsage
Filesystem disk space is above 90% (sustained for 1h). Triggers when disk space used exceeds 90% on a writable filesystem.
NodeFilesystemSpaceFillingUp
Filesystem predicted to run out of space within ~24h (sustained for 1h). Triggers when usage is above 80% and the trend (predictive model) indicates free space will reach zero within ~24 hours; excludes read-only filesystems.
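The predictive part of this rule can be approximated with a straight-line fit over recent free-space samples, extrapolated 24 hours ahead. The sketch below is an assumption about the general approach (a simple least-squares trend), not the exact model used by the rule; all function and parameter names are hypothetical.

```python
from typing import List, Tuple


def predict_free_bytes(samples: List[Tuple[float, float]],
                       horizon_seconds: float) -> float:
    """Fit a least-squares line through (timestamp, free_bytes) samples and
    extrapolate `horizon_seconds` past the most recent sample."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return samples[-1][1]  # no time spread: assume no trend
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + horizon_seconds)


def filling_up(samples: List[Tuple[float, float]],
               size_bytes: float,
               read_only: bool,
               horizon_seconds: float = 24 * 3600,
               used_threshold: float = 0.80) -> bool:
    """Hypothetical check: usage is above 80% AND the trend reaches zero
    free space within the horizon; read-only filesystems are skipped."""
    if read_only or len(samples) < 2:
        return False
    used_ratio = 1 - samples[-1][1] / size_bytes
    return (used_ratio > used_threshold
            and predict_free_bytes(samples, horizon_seconds) <= 0)
```

Requiring both conditions keeps a fast-growing but mostly empty filesystem from alerting, and keeps a nearly full but stable one for the separate NodeFilesystemSpaceUsage rule.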
NodeMemoryHighUtilization
Instance is running out of memory, with utilization above 95% (sustained for 15m). Triggers when memory utilization exceeds 95%.
NodeCPUHighUtilization
Instance is running out of CPU, with utilization above 90% (sustained for 15m). Triggers when CPU utilization exceeds 90% over a 5-minute window.
NodeFilesystemAlmostOutOfFiles
Filesystem has less than 3% inodes left (sustained for 1h). Triggers when available inodes drop below 3% on a writable filesystem.
NodeNetworkReceiveErrs
Network interface has a high receive-error rate (sustained for 1h). Triggers when receive errors exceed 1% of total received packets over a 2-minute rate window.
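The error ratio can be illustrated with counter values taken at the start and end of the rate window. This is a simplified sketch with hypothetical function and parameter names; it ignores counter resets, which a real rate calculation would need to handle.

```python
def receive_error_ratio(errs_start: int, errs_end: int,
                        packets_start: int, packets_end: int) -> float:
    """Share of received packets that were errors during the window,
    computed from cumulative counters sampled at both ends."""
    packets = packets_end - packets_start
    if packets <= 0:
        return 0.0  # no traffic in the window: nothing to compare against
    return (errs_end - errs_start) / packets


def receive_errors_high(errs_start: int, errs_end: int,
                        packets_start: int, packets_end: int,
                        threshold: float = 0.01) -> bool:
    """Hypothetical check: True when errors exceed 1% of received packets."""
    ratio = receive_error_ratio(errs_start, errs_end, packets_start, packets_end)
    return ratio > threshold
```

For example, 30 errors against 2,000 received packets in the window is a ratio of 1.5%, which exceeds the 1% threshold.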
NodeFileDescriptorLimit
Kernel is predicted to exhaust file descriptors soon (sustained for 15m). Triggers when allocated file descriptors exceed 70% of the kernel limit.
NodeFileDescriptorLimit
Kernel is close to exhausting file descriptors (sustained for 15m). Triggers when allocated file descriptors exceed 90% of the kernel limit.
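The two NodeFileDescriptorLimit rules share a name and differ only in their threshold (70% and 90%). A minimal sketch of that two-tier check follows; the tier labels "warning" and "critical" are assumptions for illustration and are not taken from the rule definitions.

```python
from typing import Optional


def fd_limit_tier(allocated: int, maximum: int) -> Optional[str]:
    """Hypothetical helper: classify file-descriptor usage against the
    70% and 90% thresholds described above."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    ratio = allocated / maximum
    if ratio > 0.90:
        return "critical"  # assumed label for the 90% rule
    if ratio > 0.70:
        return "warning"   # assumed label for the 70% rule
    return None
```

Checking the higher threshold first ensures that a node above 90% is reported at the more urgent tier only.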
This page is: Copyright © 2025 MariaDB. All rights reserved.

