MaxScale Workload Capture and Replay

Record and replay real database traffic. This filter captures client sessions to create realistic benchmarks and test environments based on production workloads.

The WCAR (Workload Capture and Replay) module is a sophisticated feature designed to capture and archive client traffic passing through a MaxScale instance. This allows system administrators and developers to process and store vast volumes of data related to client-server interactions in a reliable manner. By harnessing this captured data, users gain the flexibility to replay and simulate the varied client activity typically seen in a production environment.

One of the module's primary advantages is that it removes the necessity of creating explicit traffic generators, which can be resource-intensive and complex to maintain. Instead, the WCAR module provides a seamless method for mirroring realistic client interactions and behavior patterns, which can be critical for testing, debugging, and optimizing system performance.

Additionally, by facilitating traffic replay, the WCAR module aids in identifying potential system vulnerabilities and performance bottlenecks, allowing for preemptive optimization. This proactive approach ensures that systems are well-prepared for live production scenarios, enhancing overall efficiency and uptime.

In essence, the WCAR module not only preserves detailed and valuable traffic data but also empowers users with the tools to analyze and refine their systems through accurate simulation, paving the way for robust and resilient system architectures.

Overview

The WCAR filter (module wcar) captures client traffic and stores it in a replayable format.

WCAR is designed to capture traffic on a production MaxScale instance. The captured data can then be used as a reproducible way of generating client traffic without having to write application-specific traffic generators.

The captured workloads can be used to:

Verify that upgrades of MariaDB behave as expected.
Repeatedly measure effects of configuration changes, which is useful for database tuning.
Investigate why certain scenarios take longer then expected, as a kind of SQL debugging tool.

Prerequisites

Both the capture MaxScale and replay MaxScale servers must use the same linux distribution and CPU architecture. For example, if the capture was taken on an x86_64 RHEL 8 instance, the replay should also happen on an x86_64 RHEL 8 instance. Captured workloads are however usually compatible across different linux distributions that use the same CPU architecture.
The capture MariaDB instance must have binlogging enabled (log-bin=1)

Capture

Quick start

Workload capture can be used without definitions in static configuration files and without a MaxScale restart.

If you have an existing routing service named, e.g., RWS-Router in your configuration you can attach a capture filter to it dynamically:

maxctrl create filter CAPTURE_FLTR wcar
maxctrl link service RWS-Router CAPTURE_FLTR

You can then start a capture with

maxctrl call command wcar start CAPTURE_FLTR <options>

If limiting options were given the capture will stop automatically when one of the limits is triggered. You can also stop the capture at any time with:

maxctrl call command wcar stop CAPTURE_FLTR

See Replay to see how the captured files are used.

When capture is no longer needed you can remove it with:

maxctrl unlink service RWS-Router CAPTURE_FLTR
maxctrl destroy filter CAPTURE_FLTR

File based configuration

Define a capture filter by adding the following configuration object and add it to each service whose traffic is to be captured. The traffic from all services that use the filter will be combined so only use the filter in services that point to the same database cluster.

[CAPTURE_FLTR]
type=filter
module=wcar
capture_duration=1h # Limit capture duration to one hour
capture_size=1Gi    # Limit capture size to 1GiB
start_capture=true  # Start capturing immediately after starting MaxScale

Example configuration file

Here is an example configuration for capturing from a single MariaDB server, where capture starts when MaxScale starts and stops when MaxScale is stopped (start_capture=true). MaxScale listens on port 4006 and connects to MariaDB on port 3306.

[server1]
type=server
address=127.0.0.1
port=3306

[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1
user=maxuser
password=maxpwd

[CAPTURE_FLTR]
type=filter
module=wcar
capture_duration=1h     # Limit capture duration to one hour
capture_size=1Gi        # Limit capture size to 1GiB
start_capture=true      # Start capturing immediately after starting MaxScale

[RWS-Router]
type=service
router=readwritesplit
cluster=MariaDB-Monitor
user=maxuser
password=maxpwd
filters=CAPTURE_FLTR

[RWS-Listener]
type=listener
service=RWS-Router
protocol=MariaDBClient
port=4006

Capturing Traffic

This section explains how capture is done with configuration value start_capture=true.

Two things are needed to replay a workload: the client traffic that's captured by MaxScale and a backup of the database that is used to initialize the replay server. The backup should be taken from the point in time where the capture starts and the simplest way to achieve this is to take a logical backup by doing the following.

Stop MaxScale
Take a backup of the database with mariadb-dump --all-databases --system=all
Start MaxScale

Once MaxScale has been started, the captured traffic will be written to files in/var/lib/maxscale/wcar/<name> where <name> is the name of the filter (CAPTURE_FLTR in the examples).

Each capture will generate a number of files named NAME_YYYY-MM-DD_HHMMSS.SUFFIX where NAME is the capture name (defaults to capture), YYYY-MM-DD is the date and HHMMSS is the time and the SUFFIX is one of .cx, .ex or.tx. For example, a capture started on the 18th of April 2024 at 10:26:11 would generate a file named capture_2024-04-18_102611.cx.

Stopping the Capture

To stop the capture, simply stop MaxScale, or issue the command:

maxctrl call command wcar stop CAPTURE_FLTR

where "CAPTURE_FLTR" is the name given to the filter as in the example configuration above.

To disable capturing altogether, remove the capture filter from the configuration and remove it from all services that it was added to. Restart MaxScale.

If the replay is to take place on another server, the results can be collected easily from /var/lib/maxscale/wcar/ with the following command.

tar -caf captures.tar.gz -C /var/lib/maxscale wcar

Once the capture tarball has been generated, copy it to the replay server. You might then want to delete the directories on the capture server from /var/lib/maxscale/wcar/* to save space (and not copy them again later).

Commands

Each of the commands can be called with the following syntax.

maxctrl call command wcar <command> <filter> [options]

The <filter> is the name of the filter instance. In the example configuration, the value is CAPTURE_FLTR. The [options] is a list of optional arguments that the command might expect.

`start <filter> [options]`

Starts a new capture. Issuing a start command will stop any ongoing capture.

The start command supports optional key-value pairs. If the values are also defined in the configuration file the command line options have priority. The supported keys are:

prefix The prefix added to capture files. The default value is capture.
duration Limit capture to this duration. See also configuration file value 'capture_duration'.
size Limit capture to approximately this many bytes in the file system. See also configuration file value 'capture_size'.

The start command options are not persistent, and only apply to the capture that was thus started.

For example, starting a capture with the below command would create a capture file named Scenario1_2024-04-18_102605.cx and limit the file system usage to approximately 10GiB. If capture_duration was defined in the configuration file it would also be used.

If both duration and size are specified, the one that triggers first, stops the capture.

maxctrl call command wcar start CAPTURE_FLTR prefix=Scenario1 size=10G

Running the same command again, but without size=10G, the capture_size used would be that defined in the configuration file or no limit if there was no such definition.

`stop <filter>`

Stops the currently active capture if one is in progress.

maxctrl call command wcar stop CAPTURE_FLTR

Replay

Installation

Install the required packages on the MaxScale server where the replay is to be done. An additional dependency that must be manually installed is Python, version 3.9 or newer. On most linux distributions a new enough version is available as the default Python interpreter. You may also need to install the development packages for Python, python3-devel on RHEL based systems or python3-dev on Debian based systems, as well as a C++ compiler.

For RHEL 8, Rocky Linux 8 and Alma Linux 8, a newer version of Python must be installed along with the development headers with dnf install python39 python39-devel gcc-c++ and it must be set as the default python implementation with alternatives --set python3 /usr/bin/python3.9.

The replay consists of restoring the database to the point in time where the capture was started. Start by restoring the replay database to this state. Once the database has been restored from the backup, copy the capture files over to the replay MaxScale server.

Preparing the Replay MariaDB Database

Full Restore

Start by restoring the database from the backup to put it at the point in time where the capture was started. The GTID position of the first commit within the capture can be seen in the output of the summary command:

maxplayer summary /path/to/capture.cx

If the captured data has not been transformed to replay format yet, the command will perform the transformation before displaying the summary.

Run maxplayer --help to see the command line options. The help output is also shown at the end of this file.

The replay also requires a user account using which the captured traffic is replayed. This user must have access to all the tables in question. In practice the simplest way to do this for testing is to create the user as follows:

CREATE USER 'maxreplay'@'%' IDENTIFIED BY 'replay-pw';
GRANT ALL ON *.* TO 'maxreplay'@'%';

Restore for read-only Replay

For captures that are intended for read-only Replay, it may not be as important that the servers to be tested against are in the exact GTID the capture server was when capture started. In fact, it may be advantageous that the servers are at the state after the capture finished.

On the other hand, Replay also supports write-only. Following the Full Restore procedure above and then running a write-only Replay prepares the replay server(s) for easily running read-only multiple times. This way of running read-only may, for example, be used when fine tuning server settings.

Replaying the Capture

When replay is first done, the capture files will be transformed in-place. Transform can be run separately as well. Depending on the size and structure of the capture file, Transform can use up to twice the space of the capture.ex file. The files with extension .ex contain most of the captured data (events).

Start by copying the replay file tarball created earlier (captures.tar.gz) to the replay MaxScale server and copy it to a directory of your choice (here called/path/to/capture-dir). Then extract the files.

cd /path/to/capture-dir
tar -xaf captures.tar.gz

After this, replay the workload against the baseline MariaDB setup:

maxplayer replay --user maxreplay --password replay-pw --host <host:port> --output baseline-result.csv /path/to/capture.cx

Once the baseline replay results have been generated, run the replay again but this time against the new MariaDB setup to which the baseline is compared to:

maxplayer replay --user maxreplay --password replay-pw --host <host:port> --output comparison-result.csv /path/to/capture.cx

After both replays have been completed, the results can be post-processed and visualized.

Visualizing

The results of the captured replay must first be post-processed into summaries that the visualization will then use. First, the canonicals.csv file must be generated that is needed in the post-processing:

maxplayer canonicals /path/to/capture.cx > canonicals.csv

After that, the baseline and comparison replay results can be post-processed into summaries using the maxpostprocess command:

maxpostprocess canonicals.csv baseline-result.csv -o baseline-summary.json
maxpostprocess canonicals.csv comparison-result.csv -o comparison-summary.json

The visualization itself is done with the maxvisualize program. The visualization will open up a browser window to show the visualization. If no browser opens up, the visualization URL is also printed into the command line which by default should be http://localhost:8866/.

maxvisualize baseline-summary.json comparison-summary.json

To listen on all network interfaces, use --Voila.ip='0.0.0.0' as the last argument.

maxvisualize baseline-summary.json comparison-summary.json --Voila.ip='0.0.0.0'

Settings

`capture_dir`

Type: path
Default: /var/lib/maxscale/wcar/
Mandatory: No
Dynamic: No

Directory under which capture directories are stored. Each capture directory has the name of the filter. In the examples above the name "CAPTURE_FLTR" was used.

`start_capture`

Type: boolean
Default: false
Mandatory: No
Dynamic: No

Start capture when maxscale starts.

`capture_duration`

Type: duration
Default: 0s
Maximum: Unlimited in MaxScale, 5min in MaxScale Lite.
Mandatory: No
Dynamic: No

Limit capture to this duration. If set to zero there is no limit.

`capture_size`

Type: size
Default: 0
Maximum: Unlimited in MaxScale, 10MB in MaxScale Lite.
Mandatory: No
Dynamic: No

Limit capture to approximately this many bytes in the file system. If set to zero there is no limit.

maxplayer command line options

maxplayer -u user -p pwd --speed 1.5 -i 5s -o baseline.csv capture_2024-09-06_090002.cx --help
Usage: maxplayer [OPTION]... [COMMAND] FILE

Commands: (default: replay)
summary    Show a summary of the capture.
replay     Replay the capture.
convert    Converts the input file (either .cx or .rx) to a replay file (.rx or .csv).
canonicals List the canonical forms of the captured SQL as CSV.
dump-data  Dump capture data as SQL.
show       Show the SQL of one or more events.

Options:
--user          User name for login to the replay server.
-u              This version does not support using the actual user names
                that were used during capture.

--password      Only clear text passwords are supported as of yet.
-p

--host          The address of the replay server in <IP>:<port> format.
-h              E.g. 127.0.0.1:4006

--output        The name of the output file: e.g. baseline.csv.
-o

--report        Periodically report statistics of ongoing operations.
-r              The option takes a duration, such as 10s.

--report-file   The --report option by default writes to stdout.
-R              Provide the name of the file to write to. The file will
                be truncated every time it is written to, allowing for a
                simple status window by running 'watch cat <path-to-file>'
                in a terminal.

--speed         The value is a multiplier. 2.5 is 2.5x speed and 0.5 is half speed.
-s              A value of zero means no limit, or replay as fast as possible.
                A multiplier of 2.5 might not have any effect as the actual time spent
                depends on many factors, such as the captured volume and replay server.

--idle-wait     Relates to playback speed, and can be used together with --speed.
-i              During capture there can be long delays where there is no traffic.
                One hour of no capture traffic would mean replay waits for one hour.
                idle-wait allows to move simulation time forwards when such gaps
                occure. A 'gap' starts when all prior queries have fully executed.
                --idle-wait takes a duration value. A negative value turns the feature off,
                            i.e. the one hour wait would happen.
                --idle-wait 0s means time moves to the event start-time immediately
                            when a gap is detected, i.e., all gaps are skipped over.
                --idle-wait 10s means time moves to the event start-time 10 seconds
                            (wall time) after the gap was detected. Shorter
                            gaps than 10 seconds will thus be fully waited for.
                --idle-wait has a default value of 1 second.
                Examples: 1h, 60m, 3600s, 3600000ms, which all define the same duration.

--query-filter  Options: none, write-only, read-only. Default: none.
-f              Replay can optionally apply only writes or only reads. This option is useful
                once the databases to be tested have been prepared (see full documentation)
                and optionally either a write-only run, or a full replay has been run.
                Now multiple read-only runs against the server(s) are simple as no further
                data syncronization is needed.
                Note that this mode has its limitations as the query results may
                be very different than what they were during capture.

--analyze       Enabling this option will track the server Rows_read statistic for each query.
-A              This will slow down the overall replay time. The query time measurements
                are still valid, but currently this option should only be used when
                it is of real value to know how many rows the server read for each query.

--verbose       Verbose output. The option can be repeated for more verbosity: -vvv
-v

--version       Display the version number and copyrights.
-V

input file:       capture_2024-09-06_090002.cx
-h --help         true
-u --user         user
-p --password     pwd
-H --host         127.1.1.0:3306
-o --output       baseline.csv
-r --report       0ns
-R --report-file
-s --speed        1.5
-i --idle-wait    5s
-f --query-filter none
-A --analyze      false
-v --verbose      0
-V --version      0.2

Limitations

KILL commands do not work correctly during replay and may kill the wrong session (MXS-5056)
COM_STMT_BULK_EXECUTE is not captured (MXS-5057)
COM_STMT_EXECUTE that uses a cursor is replayed without a cursor (MXS-5059)
For MyISAM and Aria tables, this will cause the table level lock to be held for a shorter time.
Execution of a COM_STMT_SEND_LONG_DATA will not work (MXS-5060)
The capture files are not necessarily compatible with different linux distributions and CPU architectures than the original capture server has. Different combinations will require further testing, and once done, this document will be updated.

_{This page is licensed: CC BY-SA / Gnu FDL}

PreviousMaxScale Top Filter NextMaxScale MaxCtrl

Last updated 2 months ago

Was this helpful?

Good evening

hashtagOverview

hashtagPrerequisites

hashtagCapture

hashtagQuick start

hashtagFile based configuration

hashtagExample configuration file

hashtagCapturing Traffic

hashtagStopping the Capture

hashtagCommands

hashtagstart <filter> [options]

hashtagstop <filter>

hashtagReplay

hashtagInstallation

hashtagPreparing the Replay MariaDB Database

hashtagReplaying the Capture

hashtagVisualizing

hashtagSettings

hashtagcapture_dir

hashtagstart_capture

hashtagcapture_duration

hashtagcapture_size

hashtagmaxplayer command line options

hashtagLimitations