Host Anomaly Detector for MariaDB Xpand

Overview

MariaDB Xpand 6.1.1 introduces the Host Anomaly Detector, which can help determine the cause of cluster-wide instability, with an initial focus on diagnosing network issues:

  • Xpand's Host Anomaly Detector aggregates and analyzes the cluster logs to detect issues

  • Metrics can be collected by scraping the exposed Prometheus endpoint or by configuring the monitor to export to InfluxDB directly
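
As a rough illustration of the scraping option, the sketch below parses a small sample of Prometheus text exposition output into tuples. The metric names, label names, and values in the sample are hypothetical and are not the names that the Host Anomaly Detector actually exports. The parser only handles simple `name{labels} value` lines.

```python
import re

# Hypothetical sample of Prometheus text-format output. The metric and
# label names are invented for illustration; they are NOT taken from Xpand.
SAMPLE_SCRAPE = """\
# HELP example_group_changes_total Group changes seen by this node (hypothetical)
# TYPE example_group_changes_total counter
example_group_changes_total{node="1"} 4
example_connect_failures_total{node="1",peer="3"} 17
example_connect_failures_total{node="1",peer="2"} 0
"""

# metric name, optional {label set}, then the sample value
_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# one label="value" pair inside the braces
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse simple text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_samples(SAMPLE_SCRAPE)
for name, labels, value in samples:
    print(name, labels, value)
```

In practice a Prometheus server would scrape the endpoint itself; a hand-written parser like this is only useful for quick inspection of the raw output.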

Compatibility

  • MariaDB Xpand 6.1 (6.1.1 and later)

Use Cases

  • Identify potential problems with nodes sooner

  • Detect when backend network connections repeatedly fail

  • Detect when particular periodic tasks take too long

Aggregates and Analyzes Cluster Logs

MariaDB Xpand's Host Anomaly Detector functions by aggregating and analyzing the cluster logs.

Aggregating and analyzing the cluster logs adds complexity, but the logs are an excellent source of information about the state of the Xpand cluster that is not easily available through other means, especially when the database itself can't be queried.

For example, consider a scenario where a cluster is repeatedly looping through group changes due to poor network connectivity, so the cluster never reaches the point where it can execute queries. In this scenario, the Xpand cluster can't be queried, so some other source of data must be used to obtain the state of the cluster. Since the logs are available even when the database is down, they can be used independently of the state of the local node or of the cluster as a whole.

Additionally, the logs from different nodes can be used to correlate events across nodes. In our example scenario, the problematic nodes with poor connectivity would fail to connect to the good nodes roughly as often as the good nodes would fail to connect to the problematic nodes. Since the Host Anomaly Detector collects logs from all nodes, it can compare similar events across all of the logs and find the node that the error messages have in common.
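
The correlation idea can be sketched in a few lines. The example below is not the detector's actual algorithm. It assumes that connection-failure messages have already been parsed out of each node's log into hypothetical (reporting node, unreachable peer) pairs, and it simply counts how often each node appears on either side of a failure.

```python
from collections import Counter

# Hypothetical, already-parsed connection-failure events:
# (node that reported the failure, peer it could not reach).
failures = [
    (1, 3), (2, 3), (4, 3),   # good nodes failing to reach node 3
    (3, 1), (3, 2), (3, 4),   # node 3 failing to reach the good nodes
    (1, 2),                   # an unrelated one-off failure
]


def rank_suspects(events):
    """Count how many failure events each node takes part in, on either side."""
    involvement = Counter()
    for reporter, peer in events:
        involvement[reporter] += 1
        involvement[peer] += 1
    # Most-involved node first.
    return involvement.most_common()


ranking = rank_suspects(failures)
suspect, count = ranking[0]
print(f"node {suspect} appears in {count} of {len(failures)} failure events")
```

Counting both sides of each event matters here: a node with poor connectivity shows up both as the reporter and as the unreachable peer, so it stands out even when every other node logs only a few failures.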

Metrics

MariaDB Xpand's Host Anomaly Detector currently tracks the following metrics on each Xpand node:

  • The amount of memory used (in kilobytes) by the monitor process

  • The Unix timestamp of the last observed database process restart on the monitor's host

  • The number of errors reported by this node

  • The number of alerts reported by this node

  • The number of crashes reported by this node

  • The Unix timestamp of this node's most recent crash

  • The number of group changes this node has experienced

  • The Unix timestamp of when the most recent group change began

  • The Unix timestamp of when the most recent group formed

  • The number of times this node has failed to connect to each other node in the cluster
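
One way these metrics might be interpreted is shown in the sketch below. The field names are hypothetical, and the rule that a group change is still in progress when the latest group change began after the latest group formed is an assumption made for illustration, not documented detector behavior. The second helper compares two snapshots of the group change counter to see how many group changes happened between them.

```python
from dataclasses import dataclass


@dataclass
class NodeSnapshot:
    # Hypothetical field names for three of the per-node metrics listed above.
    group_changes: int               # number of group changes seen so far
    last_group_change_began: float   # Unix timestamp
    last_group_formed: float         # Unix timestamp


def group_change_in_progress(snapshot):
    """Assumed rule: a change that began after the last group formed is unfinished."""
    return snapshot.last_group_change_began > snapshot.last_group_formed


def group_changes_since(previous, current):
    """Group changes between two snapshots of the same node.

    A lower counter in the newer snapshot means the counter was reset;
    this sketch reports zero new changes in that case.
    """
    return max(0, current.group_changes - previous.group_changes)


earlier = NodeSnapshot(group_changes=4,
                       last_group_change_began=1_700_000_000.0,
                       last_group_formed=1_700_000_020.0)
later = NodeSnapshot(group_changes=9,
                     last_group_change_began=1_700_000_300.0,
                     last_group_formed=1_700_000_250.0)

print(group_change_in_progress(earlier))     # False under the assumed rule
print(group_change_in_progress(later))       # True under the assumed rule
print(group_changes_since(earlier, later))   # 5
```

A steadily climbing group change count between snapshots would match the looping group change scenario described earlier.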