MaxScale Troubleshooting


systemd Watchdog Kills MaxScale

This can occur if a reverse DNS name lookup takes a long time. To disable reverse name lookups of client IPs to client hostnames, add skip_name_resolve=true under the [maxscale] section.
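As a minimal sketch, the setting sits in the global section of the MaxScale configuration file. The file location (commonly /etc/maxscale.cnf) is an assumption and may differ on your installation.

```ini
[maxscale]
# Do not resolve client IP addresses to hostnames
skip_name_resolve=true
```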

High Memory Usage

MaxScale starting with 22.08.4

The default value of writeq_high_water was lowered to 64KiB to reduce excessive memory usage. This change should result in a net decrease in memory usage and possibly a small improvement in performance.

On versions before 22.08.4, set writeq_high_water and writeq_low_water to lower values, for example writeq_high_water=512 and writeq_low_water=128. The default in those versions is to buffer a maximum of 16MB in memory before network throttling begins, which under intensive loads can result in a large amount of memory being used per client.

The query classifier cache in MaxScale by default takes up to 15% of memory to cache query classification data. This value can be lowered using the query_classifier_cache_size parameter.

The retain_last_statements and session_trace debugging parameters can cause memory usage to increase. Disabling them under intensive loads is recommended if they are not needed. Note that the maxctrl list queries command requires retain_last_statements=1.
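The cache and debugging settings could be combined in the global section roughly as follows. This is a sketch only: the cache size is a hypothetical value chosen for illustration, not a recommendation, and it is written in bytes to avoid assumptions about size suffixes.

```ini
[maxscale]
# Hypothetical cap for the query classifier cache: 256 MiB, given in bytes
query_classifier_cache_size=268435456
# Debugging aids switched off; 0 disables each of them
retain_last_statements=0
session_trace=0
```

Remember that setting retain_last_statements=0 also means maxctrl list queries has nothing to show.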

Profiling Memory Usage

Profiling the memory usage can be useful for finding out why MaxScale appears to use more memory than it should. It is especially helpful for analyzing OOM situations or other cases where the memory grows linearly and causes problems.

To profile the memory usage of MaxScale, there are multiple options. The following sections describe the methods that are available.

If a problem in memory usage is identified and it appears to be due to a bug in MaxScale, please open a new bug report on the MariaDB Jira under the MaxScale project. Remember to include all the profiling and leak check reports along with the MaxScale version number and the configuration file with all passwords and other sensitive information removed.

Debug Binaries

The easiest option is to install the MaxScale debug binaries which are built with AddressSanitizer and LeakSanitizer enabled. These are low-impact instrumentation tools that detect memory access errors as well as memory leaks.

Once installed, make sure that the maxlog parameter is not disabled and then start MaxScale. Let it run until the memory usage grows beyond normal limits and then shut MaxScale down with systemctl stop maxscale.service. The MaxScale log should contain a verbose explanation of where memory leaks occurred, if any were found.

Jemalloc Heap Profiling

Jemalloc is an alternative to the default glibc memory allocator. It is capable of analyzing the heap memory usage of a process which allows it to be used to detect all sorts of memory usage problems with a lower overhead compared to tools like Valgrind. Unlike the ASAN and LSAN sanitizers, it is capable of detecting cases where memory doesn't actually leak but keeps growing with no upper limit (e.g. items get appended to a list but are never removed).

Ubuntu and Debian

To enable jemalloc, the packages for it must be first installed from the system repositories. Ubuntu 20.04 requires the following packages to be installed for jemalloc profiling:

apt-get -y install libjemalloc2 libjemalloc-dev binutils graphviz ghostscript gv

Configuring MaxScale for Jemalloc Heap Profiling

Once installed, edit /lib/systemd/system/maxscale.service and add the following two lines into the [Service] section.

Environment=MALLOC_CONF=prof:true,prof_leak:true,prof_gdump:true,lg_prof_sample:18,prof_prefix:/var/log/maxscale/jeprof
Environment=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

Then run systemctl daemon-reload and restart MaxScale with systemctl restart maxscale.service.

The MaxScale log directory in /var/log/maxscale/ will start to be filled by versioned files with a .heap suffix. Every time the virtual memory used by MaxScale reaches a new high, a file will be created. Initially, the files will be created very often but eventually the pace will slow down. Once the problematic memory usage has been identified, the latest .heap file can be analyzed with the jeprof program.

The easiest way to look at the generated heap profile is with the PDF output. To generate the PDF report of the latest heap dump, run the following command:

jeprof --pdf /usr/bin/maxscale $(ls -1 /var/log/maxscale/*.heap|sort -V|tail -n 1) > heap-report.pdf

The generated heap-report.pdf will contain a breakdown of the memory usage of MaxScale.

Note that the report generation with the jeprof program must be done on the same system where the profiling was done. If done elsewhere, the binaries do not necessarily match and can cause the report generation to fail.
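Picking the newest dump by hand is error-prone because the sequence numbers in the file names are not zero-padded. The following helper is a small sketch of the selection step used in the command above. The function name latest_heap_file is made up for this example, and the default directory assumes the /var/log/maxscale prefix configured earlier.

```shell
# Print the path of the newest .heap file in a directory.
# The directory defaults to /var/log/maxscale (an assumption based on the
# prof_prefix used in this article); pass another directory as $1 if needed.
latest_heap_file() {
    dir=${1:-/var/log/maxscale}
    # sort -V (GNU coreutils) orders the numeric fields naturally,
    # so a dump numbered 10 sorts after a dump numbered 9.
    ls -1 "$dir"/*.heap 2>/dev/null | sort -V | tail -n 1
}
```

With the function defined, the report can be generated with jeprof --pdf /usr/bin/maxscale "$(latest_heap_file)" > heap-report.pdf.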

Tcmalloc Heap Profiling

Similarly to the jemalloc memory allocator, the tcmalloc memory allocator comes with a leak checker and heap profiler.

Installation

Rocky Linux 8:

sudo dnf -y install gperftools

Ubuntu 20.04:

sudo apt -y install google-perftools

Service file configuration

Once tcmalloc is installed, edit /lib/systemd/system/maxscale.service and add the following lines into the [Service] section.

Note: Make sure to use the correct path to the tcmalloc library in LD_PRELOAD. The following example uses the Debian location of the library. The file is usually located in /usr/lib64/libtcmalloc_and_profiler.so.4.5.3 on RHEL systems. The version number of the library can also change which might require other adjustments to the library path.

Environment=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_and_profiler.so.4.5.3
Environment=HEAPPROFILE=/var/log/maxscale/maxscale.prof
Environment=HEAPCHECK=normal
Environment=HEAP_CHECK_AFTER_DESTRUCTORS=true

Then run systemctl daemon-reload and restart MaxScale with systemctl restart maxscale.service.

Report generation

Depending on which OS you are using, the report generation program is named either pprof (RHEL) or google-pprof (Debian/Ubuntu).

It is important to pick the latest .heap file to analyze. The following command generates the heap-report.pdf from the latest heap dump. The file will show the breakdown of the memory usage.

pprof --pdf /usr/bin/maxscale $(ls /var/log/maxscale/*.heap|sort -V|tail -n 1) > heap-report.pdf
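Because the program name differs between distributions, a script that has to run on both can look the name up first. The sketch below assumes only the two names mentioned above; find_pprof is a hypothetical helper name.

```shell
# Print the name of the first available pprof program.
# google-pprof (Debian/Ubuntu) is tried before pprof (RHEL).
find_pprof() {
    for candidate in google-pprof pprof; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no pprof program found in PATH" >&2
    return 1
}
```

It can then stand in for the program name in the command above, for example "$(find_pprof)" --pdf /usr/bin/maxscale followed by the heap file and the output redirection.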

Valgrind

Valgrind can be used to analyze memory usage problems but usually it is left as the last resort due to the heavy performance penalty that it incurs. However, the use of Valgrind is simple as it is widely available and can be used with existing MaxScale binaries.

To use Valgrind for memory leak detection, edit the /lib/systemd/system/maxscale.service file and replace the following values:

  • ExecStart=/usr/bin/maxscale with ExecStart=valgrind --leak-check=full /usr/bin/maxscale -d
  • Type=forking with Type=simple

Then reload the daemon with systemctl daemon-reload and restart the MaxScale process with systemctl restart maxscale.service. Once the memory problem is confirmed, stop the MaxScale process with systemctl stop maxscale.service. Valgrind will print the leak report into the system journal that can be viewed with journalctl -u maxscale.

Authentication Errors

Access Denied

If you are receiving authentication errors like this:

ERROR 1045 (28000): Access denied for user 'bob'@'office' (using password: YES)

Make sure you create users for both 'bob'@'office' and 'bob'@'maxscale'. The host 'office' is where the client is attempting to connect from and 'maxscale' is the host where MaxScale is installed.
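To make the two accounts concrete, the sketch below prints the CREATE USER statements for review instead of executing them. The function name print_user_sql and the CHANGE_ME password placeholder are made up for this example. Both accounts normally need the same password and the same grants, which are not shown here.

```shell
# Print CREATE USER statements for the same user at two hosts:
# the host the client connects from and the host MaxScale runs on.
# Arguments: <user> <client host> <MaxScale host>
print_user_sql() {
    user=$1
    client_host=$2
    maxscale_host=$3
    for host in "$client_host" "$maxscale_host"; do
        # CHANGE_ME is a placeholder; replace it before running the SQL
        printf "CREATE USER IF NOT EXISTS '%s'@'%s' IDENTIFIED BY 'CHANGE_ME';\n" "$user" "$host"
    done
}
```

For the example in the error message, print_user_sql bob office maxscale prints one statement for 'bob'@'office' and one for 'bob'@'maxscale'. After editing the password, the output can be run through the mariadb client on the database server.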

If you do not want to create a second set of users, you can enable proxy_protocol in MaxScale and configure the MariaDB server to allow proxied connections from the MaxScale host.
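A proxy protocol setup touches two configuration files, one on each side. The fragment below is a sketch under stated assumptions: the server name server1 and both addresses are hypothetical, and the MariaDB setting belongs in the database server's own configuration file, not in the MaxScale one.

```ini
# MaxScale configuration: send the proxy protocol header to this server.
# The section name and address are hypothetical.
[server1]
type=server
address=192.0.2.10
port=3306
proxy_protocol=true

# MariaDB server configuration (a separate file such as my.cnf):
# accept proxy protocol headers from the MaxScale host (hypothetical address).
[mariadb]
proxy_protocol_networks=192.0.2.20
```

With this in place the database server sees the original client address, so only the 'bob'@'office' style account is needed.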

Verifying that a user is allowed to connect

  • MaxScale connection
    1. SSH to the server where MaxScale is installed
    2. Connect to MariaDB
    3. Check output of SHOW GRANTS
  • Client connection
    1. SSH to the server where the client is connecting from
    2. Connect to MariaDB
    3. Check output of SHOW GRANTS

Checking MaxScale has correct grants

Service Grants

Make sure that the MaxScale services have a user configured and that it has the correct grants. Refer to the MariaDB protocol documentation on what grants are required for services.

Monitor Grants

The monitor user requires different grants than the service user and each monitor type requires different grants.

Other Errors

For all authentication and permission related errors, add debug=enable-statement-logging under the [maxscale] section of your MaxScale configuration file. This will cause all SQL statements to be logged on the notice level which will help you figure out what the problem is.

Access denied errors for user root!

If you want to connect as root, you'll need to add enable_root_user=true to the service.

Access denied on databases/tables containing underscores

There seems to be a bug that affects grants on databases whose names contain underscores. Connect as root and check the grants with "SHOW GRANTS FOR user".

GRANT SELECT ON `my\_database`.* TO 'user'@'%' <-- bad

GRANT SELECT ON `my_database`.* TO 'user'@'%' <-- good

If you get a grant containing an escaped underscore, you can add the strip_db_esc=true parameter to the service to automatically strip escape characters, or just replace the grant with an unescaped one.

System Errors

Failed to write message: 11, Resource temporarily unavailable

MaxScale starting with 22.08.0

MaxScale 22.08 no longer uses pipes for internal communication. This means that this error is never logged and the pipe size no longer needs to be adjusted.

Starting with MaxScale 2.1 and until MaxScale 6, MaxScale can log the Failed to write message: 11, Resource temporarily unavailable message under extremely intensive workloads (see MXS-1983).

The first action to take when these messages are encountered is to upgrade your MaxScale installation to the latest version. Whenever this message is seen, it means that something is causing the internal message queue in MaxScale to fill up. More often than not it is a sign of a possible bug in MaxScale and most likely has been fixed in the most recent release of MaxScale.

To correct it, increase the pipe buffer size from the default 1MB to a higher value. At least 8MB is recommended, and the value should be increased until the message stops appearing.

To set the pipe buffer size, execute the following command.

sudo sysctl -w fs.pipe-max-size=8388608

Before MaxScale 6.4.5, messages in the queue would end up taking 4096 bytes of memory which translated to a maximum of 256 messages with a 1MiB pipe size. In MaxScale 6.4.5, the size is 24 bytes which causes the maximum limit to be increased to about 43k messages.
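The figures above follow from dividing the pipe size by the per-message size. The short calculation below reproduces them with shell arithmetic. The variable names are chosen for this example only.

```shell
# Pipe sizes in bytes
default_pipe=$((1024 * 1024))        # 1 MiB default
larger_pipe=$((8 * 1024 * 1024))     # 8 MiB, the value 8388608 used with sysctl

# Per-message sizes in bytes, as stated in this article
old_msg_size=4096                    # before MaxScale 6.4.5
new_msg_size=24                      # MaxScale 6.4.5

echo "before 6.4.5, 1 MiB pipe: $((default_pipe / old_msg_size)) messages"   # 256
echo "6.4.5, 1 MiB pipe: $((default_pipe / new_msg_size)) messages"          # 43690
echo "6.4.5, 8 MiB pipe: $((larger_pipe / new_msg_size)) messages"           # 349525
```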

If after all these actions you still see these warnings, please open a bug report on the MariaDB Jira under the MaxScale project.

Error 23: Too many open files

This is a common error when the system limits for open files are too low. The fix is to increase the limits.

Systemd

Edit or add LimitNOFILE=<number of files> under the [Service] section in /usr/lib/systemd/system/maxscale.service.
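After changing the unit file, run systemctl daemon-reload and restart MaxScale. To check which limit the running process actually has, the kernel's per-process limits file can be read. The helper below is a sketch for Linux systems. The function name soft_nofile_limit is hypothetical.

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# Argument: path to the limits file.
soft_nofile_limit() {
    # The line looks like: "Max open files  <soft>  <hard>  files",
    # so the soft limit is the fourth whitespace-separated field.
    awk '/^Max open files/ { print $4 }' "$1"
}
```

For a running instance this could be called as soft_nofile_limit /proc/$(pidof maxscale)/limits, assuming pidof is available and exactly one maxscale process is running.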

Binlogrouter

Commands not Working

Make sure you are connecting on the port where the binlogrouter is listening. A common mistake is to connect to a readwritesplit or readconnroute port and execute the replication configuration commands there.

MaxScale CDC: Avrorouter

For most problems, resetting the conversion state is the solution. If the conversion repeatedly stops at a certain point, please open a bug report.

Resetting conversion state

  • Stop MaxScale
  • Remove the avro.index and avro-conversion.ini files along with any generated .avro files from the directory where the Avro files are stored
  • Start MaxScale
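The file removal in the middle step can be sketched as a small helper. The function name reset_avro_state is made up for this example, and the directory must be passed explicitly because the location of the Avro files depends on your router configuration.

```shell
# Remove the avrorouter conversion state from the given directory:
# avro.index, avro-conversion.ini and all generated .avro files.
# Argument: directory where the Avro files are stored.
reset_avro_state() {
    avrodir=$1
    if [ -z "$avrodir" ] || [ ! -d "$avrodir" ]; then
        echo "usage: reset_avro_state <avro directory>" >&2
        return 1
    fi
    # rm -f stays silent if some of the files do not exist
    rm -f "$avrodir/avro.index" "$avrodir/avro-conversion.ini" "$avrodir"/*.avro
}
```

Run it only while MaxScale is stopped: systemctl stop maxscale.service, then reset_avro_state with the Avro directory as its argument, then systemctl start maxscale.service.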

Binlog files are not found

Make sure the start_index parameter is set to the lowest binlog file number. For example, to start from mariadb-bin-000005, set start_index=5.
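In the service definition this could look roughly like the fragment below. It is a sketch, not a complete configuration: the section name avro-service and the binlogdir path are hypothetical, and other parameters your version requires are left out.

```ini
# Hypothetical avrorouter service reading mariadb-bin-000005 and later files
[avro-service]
type=service
router=avrorouter
# Assumed location of the binlog files
binlogdir=/var/lib/mysql
# File name prefix of the binlogs, matching mariadb-bin-000005
filestem=mariadb-bin
start_index=5
```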

Access denied to CDC interface

Create the user with maxadmin call command cdc add_user <service name> <user> <password> or maxctrl call command cdc add_user <service name> <user> <password>.
