# Built-in Alert Rules

MariaDB Enterprise Manager ships with a set of pre-configured alert rules that provide production-ready monitoring for your database stack out of the box. The rules are built on the integrated Grafana Alerting engine and detect common issues across MariaDB Servers, Galera Clusters, MaxScale instances, and the underlying operating systems.

A key feature of these rules is the use of a **"sustained for"** duration. This means a condition must remain true for a specified period (e.g., 3 minutes) before an alert will fire. This prevents alert fatigue from brief, transient spikes and ensures you are only notified of persistent, actionable problems.
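
To make the "sustained for" behavior concrete, the following is a minimal Python sketch of how such a hold period can be modeled. It is a simplified illustration, not the Grafana Alerting implementation; the class name `SustainedCondition`, its fields, and the state labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedCondition:
    """Tracks how long a condition has been continuously true (illustrative only)."""
    hold_seconds: float                    # the "sustained for" duration
    pending_since: Optional[float] = None  # time the condition first became true

    def evaluate(self, condition_true: bool, now: float) -> str:
        """Return 'normal', 'pending', or 'firing' for one evaluation cycle."""
        if not condition_true:
            # Any evaluation where the condition is false resets the timer.
            self.pending_since = None
            return "normal"
        if self.pending_since is None:
            self.pending_since = now
        if now - self.pending_since >= self.hold_seconds:
            return "firing"
        return "pending"


# A rule sustained for 3 minutes, evaluated once per minute (timestamps in seconds).
rule = SustainedCondition(hold_seconds=180)
samples = [(0, True), (60, True), (120, False), (180, True),
           (240, True), (300, True), (360, True)]
states = [rule.evaluate(cond, t) for t, cond in samples]
```

In this sketch the brief recovery at the 120-second mark resets the timer, so the rule only reaches `firing` once the condition has held without interruption for the full 180 seconds.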

## MariaDB Server

| Alert name                        | Description                                                                                                                                                                                                             |
| --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **MariadbInstanceDown**           | MariaDB instance down for 3 minutes (sustained for 3m). Triggers when the exporter reports the instance as down (`mariadb_up = 0`) **or** when no sample from `mariadb_up` has been received for more than 120 seconds. |
| **ReplicaProcessDown**            | MariaDB instance has a replica process down (sustained for 3m). Triggers when replication is unhealthy: the I/O or SQL thread is stopped, **or** `Seconds_Behind_Master` is missing (replica not reporting progress).   |
| **ReplicaSecondsBehindPrimary**   | MariaDB replica is more than 600s behind primary (sustained for 3m). Triggers when replication lag exceeds 600 seconds.                                                                                                 |
| **HighUtilizationMaxConnections** | MariaDB instance has high connection utilization (sustained for 5m). Triggers when `Threads_connected` exceeds \~80% of `max_connections`.                                                                              |
| **MariaDBInstanceRestart**        | MariaDB instance restarted recently (sustained for 5m). Triggers when server uptime is below 1 hour, indicating a recent restart.                                                                                       |
| **MariaDBDeadlockFound**          | MariaDB deadlock found in the last 15m (sustained for 5m). Triggers when the count of InnoDB deadlocks increases compared to 15 minutes ago.                                                                            |
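
The trigger conditions for **MariadbInstanceDown** and **HighUtilizationMaxConnections** can be sketched in Python as follows. This only illustrates the logic described in the table; the function names and parameters are hypothetical, and treating a never-seen sample as "down" is an assumption of this sketch rather than documented behavior.

```python
from typing import Optional

STALE_AFTER_SECONDS = 120  # staleness threshold quoted in the table above


def instance_down(last_value: Optional[float],
                  last_sample_ts: Optional[float],
                  now: float) -> bool:
    """True if mariadb_up is 0 or its latest sample is older than 120 seconds."""
    if last_value is None or last_sample_ts is None:
        return True  # assumption: no sample at all is treated as down
    if now - last_sample_ts > STALE_AFTER_SECONDS:
        return True  # exporter stopped reporting
    return last_value == 0


def high_connection_utilization(threads_connected: int,
                                max_connections: int,
                                threshold: float = 0.80) -> bool:
    """True if Threads_connected exceeds the given fraction of max_connections."""
    return max_connections > 0 and threads_connected / max_connections > threshold
```

Either condition would still need to hold for the rule's sustained duration (3 or 5 minutes) before the alert fires.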

## Galera Cluster

| Alert name                          | Description                                                                                                                                                                                               |
| ----------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **GaleraClusterDown**               | Galera instance down for 5 minutes (sustained for 5m). Triggers when the cluster is not in Primary state (`wsrep_cluster_status ≠ 1`) **or** the node is not ready (`wsrep_ready ≠ 1`).                   |
| **GaleraNodeNotReady**              | Galera node not ready (state ≠ 4) for 5m (sustained for 5m). Triggers when the node is not in **Synced** state and it’s **not** a temporary DESYNC (desync counter did not change in the last 5 minutes). |
| **GaleraInWrongState**              | Galera instance is in an unexpected state (sustained for 5m). Triggers when the node’s state comment isn’t one of the normal values (Synced / Donor / Joining / Joined / Waiting for SST).                |
| **GaleraClusterDonorFallingBehind** | Galera donor lagging (recv queue > 100) for 5m (sustained for 5m). Triggers when a **Donor** node (state=2) accumulates a large receive queue, indicating it’s falling behind replication.                |
| **GaleraClusterSizeChanged**        | Galera cluster size changed in last 15m (sustained for 5m). Triggers when the cluster size **increases** within 15 minutes.                                                                               |
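
As an illustration of the **GaleraNodeNotReady** and **GaleraClusterDonorFallingBehind** conditions, the sketch below expresses them as plain Python predicates. The state codes 4 (Synced) and 2 (Donor) come from the table above; the function and parameter names are hypothetical, and the inputs are assumed to be values already collected from the node.

```python
SYNCED = 4  # node state for Synced, as referenced in the table
DONOR = 2   # node state for Donor, as referenced in the table


def node_not_ready(local_state: int,
                   desync_count_now: int,
                   desync_count_5m_ago: int) -> bool:
    """True if the node is not Synced and the desync counter has not moved in 5 minutes."""
    temporarily_desynced = desync_count_now != desync_count_5m_ago
    return local_state != SYNCED and not temporarily_desynced


def donor_falling_behind(local_state: int,
                         recv_queue: float,
                         limit: float = 100) -> bool:
    """True if a Donor node has a receive queue above the limit."""
    return local_state == DONOR and recv_queue > limit
```

Excluding nodes whose desync counter changed recently keeps a deliberate, short-lived desync (for example, during a backup) from being reported as a not-ready node.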

## MaxScale

| Alert name               | Description                                                                                                                                                                                        |
| ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **MaxScaleInstanceDown** | MaxScale down for 3 minutes (sustained for 3m). Triggers when **no recent MaxScale metrics** have been received for more than 120 seconds (e.g., MaxScale down or exporter/scrape pipeline issue). |
| **MaxScaleNoPrimary**    | MaxScale has no primary for 3 minutes (sustained for 3m). Triggers when MaxScale reports **zero servers with role = Primary/Master**.                                                              |
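
The **MaxScaleNoPrimary** condition amounts to counting servers whose role is Primary/Master and checking that the count is zero. A minimal Python sketch follows; it assumes server states arrive as comma-separated strings such as `"Master, Running"`, which is an assumption about the input format, and the function name `has_no_primary` is hypothetical.

```python
from typing import Iterable

PRIMARY_LABELS = {"master", "primary"}


def has_no_primary(server_states: Iterable[str]) -> bool:
    """True if none of the given server state strings carries a primary role."""
    for state in server_states:
        # Split "Master, Running" into individual lowercase flags.
        flags = {flag.strip().lower() for flag in state.split(",")}
        if flags & PRIMARY_LABELS:
            return False
    return True
```

Note that an empty server list also yields `True` in this sketch, since no server with a primary role was seen.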

## Node/OS

| Alert name                         | Description                                                                                                                                                                                                                             |
| ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **NodeFilesystemSpaceUsage**       | Filesystem disk space is above 90% (sustained for 1h). Triggers when disk space used exceeds 90% on a writable filesystem.                                                                                                              |
| **NodeFilesystemSpaceFillingUp**   | Filesystem predicted to run out of space within \~24h (sustained for 1h). Triggers when usage is above 80% **and** the trend (predictive model) indicates free space will reach zero within \~24 hours; excludes read-only filesystems. |
| **NodeMemoryHighUtilization**      | Instance memory utilization exceeds 95% (sustained for 15m). Triggers when memory utilization exceeds 95%.                                                                                                                              |
| **NodeCPUHighUtilization**         | Instance CPU utilization exceeds 90% (sustained for 15m). Triggers when CPU utilization exceeds 90% over a 5-minute window.                                                                                                             |
| **NodeFilesystemAlmostOutOfFiles** | Filesystem has less than 3% inodes left (sustained for 1h). Triggers when available inodes drop below 3% on a writable filesystem.                                                                                                      |
| **NodeNetworkReceiveErrs**         | Network interface has a high receive-error rate (sustained for 1h). Triggers when receive errors exceed **1%** of total received packets over a 2-minute rate window.                                                                   |
| **NodeFileDescriptorLimit**        | Kernel is predicted to exhaust file descriptors soon (sustained for 15m). Triggers when allocated file descriptors exceed **70%** of the kernel limit.                                                                                  |
| **NodeFileDescriptorLimit**        | Kernel is close to exhausting file descriptors (sustained for 15m). Triggers when allocated file descriptors exceed **90%** of the kernel limit.                                                                                        |

<sub>*This page is: Copyright © 2025 MariaDB. All rights reserved.*</sub>

