MaxScale Troubleshooting
SystemD Watchdog Kills MaxScale
This can occur if a reverse DNS name lookup takes a long time. To disable reverse name lookups of client IPs to client hostnames, add skip_name_resolve=true under the [maxscale] section.
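For example:

[maxscale]
skip_name_resolve=true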
High Memory Usage
MaxScale starting with 22.08.4
The default value of writeq_high_water was lowered to 64KiB to reduce excessive memory usage. This change should result in a net decrease in memory usage and possibly a small improvement in performance.
Set writeq_high_water and writeq_low_water to lower values, for example writeq_high_water=512 and writeq_low_water=128. Before 22.08.4, the default was to buffer a maximum of 16MB in memory before network throttling begins, which under intensive loads can result in a large amount of memory being used per client.
The query classifier cache in MaxScale by default takes up to 15% of the system's total memory to cache query classification data. This value can be lowered with the query_classifier_cache_size parameter.
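As a sketch, these settings can be combined in the configuration file. The unit suffixes and the exact sizes below are illustrative assumptions, not prescribed values:

[maxscale]
# example values only; tune for your workload
writeq_high_water=512Ki
writeq_low_water=128Ki
query_classifier_cache_size=256Mi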
The retain_last_statements and session_trace debugging parameters can cause memory usage to increase. Disabling them under intensive loads is recommended if they are not needed. Note that the maxctrl list queries command requires that retain_last_statements=1 is set.
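If these have been enabled for debugging, they can be switched off again in the configuration; a minimal sketch, assuming a value of 0 disables them (0 is the default for both):

[maxscale]
retain_last_statements=0
session_trace=0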
Profiling Memory Usage
Profiling the memory usage can be useful for finding out why MaxScale appears to use more memory than it should. It is especially helpful for analyzing OOM situations or other cases where the memory grows linearly and causes problems.
To profile the memory usage of MaxScale, there are multiple options. The following sections describe the methods that are available.
If a problem in memory usage is identified and it appears to be due to a bug in MaxScale, please open a new bug report on the MariaDB Jira under the MaxScale project. Remember to include all the profiling and leak check reports along with the MaxScale version number and the configuration file with all password and other sensitive information removed.
Debug Binaries
The easiest option is to install the MaxScale debug binaries which are built with AddressSanitizer and LeakSanitizer enabled. These are low-impact instrumentation tools that detect memory access errors as well as memory leaks.
Once installed, make sure that the maxlog parameter is not disabled and then start MaxScale. Let it run until the memory usage grows beyond normal limits and then shut MaxScale down with systemctl stop maxscale.service. The MaxScale log should contain a verbose explanation of where memory leaks occurred, if any were found.
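For instance, assuming the default log location, the presence of a leak report can be checked for after shutdown (the exact wording of the report may vary between versions):

grep -n "LeakSanitizer" /var/log/maxscale/maxscale.log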
Jemalloc Heap Profiling
Jemalloc is an alternative to the default glibc memory allocator. It is capable of analyzing the heap memory usage of a process which allows it to be used to detect all sorts of memory usage problems with a lower overhead compared to tools like Valgrind. Unlike the ASAN and LSAN sanitizers, it is capable of detecting cases where memory doesn't actually leak but keeps growing with no upper limit (e.g. items get appended to a list but are never removed).
Ubuntu and Debian
To enable jemalloc, the packages for it must be first installed from the system repositories. Ubuntu 20.04 requires the following packages to be installed for jemalloc profiling:
apt-get -y install libjemalloc2 libjemalloc-dev binutils graphviz ghostscript gv
Configuring MaxScale for Jemalloc Heap Profiling
Once installed, edit /lib/systemd/system/maxscale.service and add the following two lines into the [Service] section.
Environment=MALLOC_CONF=prof:true,prof_leak:true,prof_gdump:true,lg_prof_sample:18,prof_prefix:/var/log/maxscale/jeprof
Environment=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
Then run systemctl daemon-reload and restart MaxScale with systemctl restart maxscale.service.
The MaxScale log directory in /var/log/maxscale/ will start to be filled by versioned files with a .heap suffix. Every time the virtual memory used by MaxScale reaches a new high, a file will be created. Initially, the files will be created very often but eventually the pace will slow down. Once the problematic memory usage has been identified, the latest .heap file can be analyzed with the jeprof program.
The easiest way to look at the generated heap profile is with the PDF output. To generate the PDF report of the latest heap dump, run the following command:
jeprof --pdf /usr/bin/maxscale $(ls -1 /var/log/maxscale/*.heap|sort -V|tail -n 1) > heap-report.pdf
The generated heap-report.pdf will contain a breakdown of the memory usage of MaxScale.
Note that the report generation with the jeprof program must be done on the same system where the profiling was done. If done elsewhere, the binaries do not necessarily match, which can cause the report generation to fail.
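If PDF generation is not possible, for example because graphviz or ghostscript is missing, jeprof can also print a plain-text breakdown; a sketch using the same latest heap file:

jeprof --text /usr/bin/maxscale $(ls -1 /var/log/maxscale/*.heap|sort -V|tail -n 1)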
Tcmalloc Heap Profiling
Similarly to the jemalloc memory allocator, the tcmalloc memory allocator comes with a leak checker and heap profiler.
Installation
Rocky Linux 8
sudo dnf -y install gperftools
Ubuntu 20.04
sudo apt -y install google-perftools
Service file configuration
Once tcmalloc is installed, edit /lib/systemd/system/maxscale.service and add the following lines into the [Service] section.
Note: Make sure to use the correct path to the tcmalloc library in LD_PRELOAD. The following example uses the Debian location of the library; on RHEL systems the file is usually located in /usr/lib64/libtcmalloc_and_profiler.so.4.5.3. The version number of the library can also change, which might require other adjustments to the library path.
Environment=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_and_profiler.so.4.5.3
Environment=HEAPPROFILE=/var/log/maxscale/maxscale.prof
Environment=HEAPCHECK=normal
Environment=HEAP_CHECK_AFTER_DESTRUCTORS=true
Then run systemctl daemon-reload and restart MaxScale with systemctl restart maxscale.service.
Report generation
Depending on which OS you are using, the report generation program is named either pprof (RHEL) or google-pprof (Debian/Ubuntu).
It is important to pick the latest .heap file to analyze. The following command generates heap-report.pdf from the latest heap dump; the file will show the breakdown of the memory usage.
pprof --pdf /usr/bin/maxscale $(ls /var/log/maxscale/*.heap|sort -V|tail -n 1) > heap-report.pdf
Valgrind
Valgrind can be used to analyze memory usage problems but usually it is left as the last resort due to the heavy performance penalty that it incurs. However, the use of Valgrind is simple as it is widely available and can be used with existing MaxScale binaries.
To use valgrind for memory leak detection, edit the /lib/systemd/system/maxscale.service file and replace the following values:
- Replace ExecStart=/usr/bin/maxscale with ExecStart=valgrind --leak-check=full /usr/bin/maxscale -d
- Replace Type=forking with Type=simple
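After these replacements, the relevant part of the [Service] section should look like this (other settings in the section are left as they are):

[Service]
Type=simple
ExecStart=valgrind --leak-check=full /usr/bin/maxscale -d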
Then reload the daemon with systemctl daemon-reload and restart the MaxScale process with systemctl restart maxscale.service. Once the memory problem is confirmed, stop the MaxScale process with systemctl stop maxscale.service. Valgrind will print the leak report into the system journal, which can be viewed with journalctl -u maxscale.
Authentication Errors
Access Denied
If you are receiving authentication errors like this:
ERROR 1045 (28000): Access denied for user 'bob'@'office' (using password: YES)
Make sure you create users for both 'bob'@'office' and 'bob'@'maxscale'. The host 'office' is where the client is attempting to connect from and 'maxscale' is the host where MaxScale is installed.
If you do not want to create a second set of users, you can enable proxy_protocol in MaxScale and configure the MariaDB server to allow proxied connections from the MaxScale host.
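A minimal sketch of this setup, assuming MaxScale runs on the placeholder address 192.0.2.10 and the backend is defined as [server1]:

# maxscale.cnf: send the client's real address to the server
[server1]
proxy_protocol=true

# MariaDB server configuration: accept proxied connections from MaxScale
[mariadb]
proxy_protocol_networks=192.0.2.10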
Verifying that a user is allowed to connect
- MaxScale connection
  - SSH to the server where MaxScale is installed
  - Connect to MariaDB
  - Check the output of SHOW GRANTS
- Client connection
  - SSH to the server where the client is connecting from
  - Connect to MariaDB
  - Check the output of SHOW GRANTS
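Using the earlier example account, the client-side check would look roughly like this; the grant should name the host you are connecting from:

-- connected as bob from the office host
SHOW GRANTS;
-- expected to include something like:
-- GRANT USAGE ON *.* TO 'bob'@'office'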
Checking MaxScale has correct grants
Service Grants
Make sure that the MaxScale services have a user configured and that it has the correct grants. Refer to the MariaDB protocol documentation on what grants are required for services.
Monitor Grants
The monitor user requires different grants than the service user and each monitor type requires different grants.
Other Errors
For all authentication and permission related errors, add debug=enable-statement-logging under the [maxscale] section of your MaxScale configuration file. This will cause all SQL statements to be logged on the notice level, which will help you figure out what the problem is.
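For example:

[maxscale]
debug=enable-statement-logging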
Access denied errors for user root!
If you want to connect as root, you'll need to add enable_root_user=true to the service.
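For example, for a hypothetical service named [MyService]:

[MyService]
# existing service parameters are kept as they are
enable_root_user=true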
Access denied on databases/tables containing underscores
There seems to be a bug for databases containing underscores. Connect as root and check the user's grants with SHOW GRANTS FOR 'user'@'%'.
GRANT SELECT ON `my\_database`.* TO 'user'@'%' <-- bad
GRANT SELECT ON `my_database`.* TO 'user'@'%' <-- good
If you have a grant containing an escaped underscore, you can add the strip_db_esc=true parameter to the service to automatically strip the escape characters, or simply replace the grant with an unescaped one.
System Errors
Failed to write message: 11, Resource temporarily unavailable
MaxScale starting with 22.08.0
MaxScale 22.08 no longer uses pipes for internal communication. This means that this error is never logged and the pipe size no longer needs to be adjusted.
Starting with MaxScale 2.1 and until MaxScale 6, MaxScale can log the Failed to write message: 11, Resource temporarily unavailable message under extremely intensive workloads (see MXS-1983).
The first action to take when these messages are encountered is to upgrade your MaxScale installation to the latest version. Whenever this message is seen, it means that something is causing the internal message queue in MaxScale to fill up. More often than not it is a sign of a possible bug in MaxScale and most likely has been fixed in the most recent release of MaxScale.
To correct it, increase the pipe buffer size from the default 1MB to a higher value. At least 8MB is recommended, and the value should be increased until the message stops appearing.
To set the pipe buffer size, execute the following command.
sudo sysctl -w fs.pipe-max-size=8388608
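To make the setting persist across reboots, it can also be written to a sysctl configuration file; the file name here is only an example:

echo "fs.pipe-max-size=8388608" | sudo tee /etc/sysctl.d/90-maxscale.conf
sudo sysctl --system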
Before MaxScale 6.4.5, each message in the queue took 4096 bytes of memory, which translated to a maximum of 256 messages with a 1MiB pipe size. In MaxScale 6.4.5, the size is 24 bytes, which raises the maximum to about 43k messages.
If after all these actions you still see these warnings, please open a bug report on the MariaDB Jira under the MaxScale project.
Error 23: Too many open files
This is a common error when the system limit for open files is too low. The fix is to increase the limit.
Systemd
Edit or add LimitNOFILE=<number of files> under the [Service] section in /usr/lib/systemd/system/maxscale.service.
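For example, with an illustrative limit of 65535:

[Service]
LimitNOFILE=65535

Then reload systemd and restart MaxScale:

sudo systemctl daemon-reload
sudo systemctl restart maxscale.service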
Binlogrouter
Commands not Working
Make sure you are connecting on the port where the binlogrouter is listening. A common mistake is to connect to a readwritesplit or readconnroute port and execute the replication configuration commands there.
MaxScale CDC: Avrorouter
For most problems, resetting the conversion state is the solution. If the conversion repeatedly stops at a certain point, please open a bug report.
Resetting conversion state
- Stop MaxScale
- Remove the avro.index and avro-conversion.ini files, along with any generated .avro files, from the directory where the Avro files are stored (see the example below)
- Start MaxScale
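For example, assuming the Avro files are stored in the default data directory /var/lib/maxscale (adjust the paths if the avrodir parameter points elsewhere):

systemctl stop maxscale.service
rm /var/lib/maxscale/avro.index /var/lib/maxscale/avro-conversion.ini /var/lib/maxscale/*.avro
systemctl start maxscale.service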
Binlog files are not found
Make sure the start_index parameter is set to the lowest binlog file number. For example, to start from mariadb-bin-000005, set start_index=5.
Access denied to CDC interface
Create the user with maxadmin call command cdc add_user <service name> <user> <password> or maxctrl call command cdc add_user <service name> <user> <password>.