MariaDB MaxScale plugin development guide

This document and the attached example code explain the MariaDB MaxScale plugin API to prospective plugin developers, and present some best practices and possible pitfalls in module development. We expect filters and routers to be the module types developers are most likely to work on, so the APIs of these two are discussed in detail.

Introduction

MariaDB MaxScale is designed to be an extensible program. Much, if not most, of the actual processing is done by plugin modules. Plugins receive network data, process it and relay it to its destination. The MaxScale core loads plugins, manages client sessions and threads and, most importantly, offers a selection of functions for the plugins to call upon. This collection of functions is called the MaxScale Public Interface or just MPI for short.

The plugin modules are shared libraries (.so-files) implementing a set of interface functions, the plugin API. Different plugin types have different APIs, although there are similarities. The MPI is a set of C and C++ header files, from which the module code includes the ones required. MariaDB MaxScale is written in C/C++ and the plugin API is in pure C. Although it is possible to write plugins in any language capable of exposing a C interface and dynamically binding to the core program, in this document we assume plugin modules are written in C++.

The RoundRobinRouter is a practical example of a simple router plugin. The RoundRobinRouter is compiled, installed and run in the section "Hands-on example: RoundRobinRouter". The source for the router is located in the examples-folder.

Module categories

This section lists all the module types and summarises their core tasks. The modules are listed in the order a client packet would typically travel through. For more information about a particular module type, see the corresponding folder in MaxScale/Documentation/, located in the main MariaDB MaxScale repository.

Protocol modules implement I/O between clients and MaxScale, and between MaxScale and backend servers. Protocol modules read and write to socket descriptors using raw I/O functions provided by the MPI, and implement protocol-specific I/O functions to be used through a common interface. The Protocol module API is defined in protocol.h. Currently, the only implemented database protocol is MySQL.

Authenticator modules retrieve user account information from the backend databases, store it and use it to authenticate connecting clients. MariaDB MaxScale includes authenticators for MySQL (normal and GSSApi). The authenticator API is defined in authenticator.h.

Filter modules process data from clients before routing. A data buffer may travel through multiple filters before arriving in a router. For a data buffer going from a backend to the client, the router receives it first and the filters receive it in reverse order. MaxScale includes a healthy selection of filters, ranging from logging and overwriting query data to caching. The filter API is defined in filter.h.

Router modules route packets from the last filter in the filter chain to backends and reply data from backends to the last filter. The routing decisions may be based on a variety of conditions; typically packet contents and backend status are the most significant factors. Routers are often used for load balancing, dividing clients and even individual queries between backends. Routers use protocol functions to write to backends, making them somewhat protocol-agnostic. The router API is defined in router.h.

Monitor modules do not process data flowing through MariaDB MaxScale, but support the other modules in their operation by updating the status of the backend servers. Monitors are run in their own threads to minimize interference with the worker threads. They periodically connect to all their assigned backends, query their status and write the results in global structs. The monitor API is defined in monitor.h.

Common definitions and headers

Generally, most type definitions, macros and functions exposed by the MPI to be used by modules are prefixed with MXS. This should avoid name collisions in the case a module includes many symbols from the MPI.

Every compilation unit in a module should begin with #define MXS_MODULE_NAME "<name>". This definition will be used by log macros for clarity, prepending <name> to every log message. Next, the module should #include <maxscale/cppdefs.h> (for C++) or #include <maxscale/cdefs.h> (for C). These headers contain compilation environment dependent definitions and global constants, and include some generally useful headers. Including one of them first in every source file enables later global redefinitions across all MaxScale modules. If your module is composed of multiple source files, the above should be placed in a common header file included at the beginning of the source files. The file with the module API definition should also include the header for the module type, e.g. filter.h.
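To illustrate why the module name is defined before anything else, the sketch below mimics a log macro that prepends the name to each message. EXAMPLE_LOG_NOTICE and example_format_log are hypothetical stand-ins and not MPI symbols; a real module would include the MaxScale headers named above and use the log macros they provide.

```cpp
// Sketch only: a stand-in for how a log macro can pick up the module name.
// EXAMPLE_LOG_NOTICE and example_format_log are hypothetical, not part of the MPI.
#define MXS_MODULE_NAME "examplefilter"   // defined at the top of the source file

#include <string>

// A real module would include <maxscale/cppdefs.h> at this point.

inline std::string example_format_log(const char* module, const std::string& msg)
{
    return std::string("[") + module + "] " + msg;
}

// The macro expands MXS_MODULE_NAME at the point of use, so each
// compilation unit gets the prefix it defined for itself.
#define EXAMPLE_LOG_NOTICE(msg) example_format_log(MXS_MODULE_NAME, (msg))
```

With these definitions, EXAMPLE_LOG_NOTICE("Creating instance.") yields a string prefixed with the module name in square brackets.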

Other common MPI header files required by most modules are listed in the table below.

Header          Contents
alloc.h         Malloc, calloc etc. replacements
buffer.h        Packet buffer management
config.h        Configuration settings
dcb.h           I/O using descriptor control blocks
debug.h         Debugging macros
modinfo.h       Module information structure
server.h        Backend server information
service.h       Service definition
session.h       Client session definition
logmanager.h    Logging macros and functions

Module information container

A module must implement the MXS_CREATE_MODULE()-function, which returns a pointer to a MXS_MODULE-structure. This function is called by the module loader during program startup. MXS_MODULE (type defined in modinfo.h) contains function pointers to further module entrypoints, miscellaneous information about the module and the configuration parameters accepted by the module. This function must be exported without C++ name mangling, so in C++ code it should be defined extern "C".

The information container describes the module in general and is constructed once during program execution. A module may have multiple instances with different values for configuration parameters. For example, a filter module can be used with two different configurations in different services (or even in the same service). In this case the loader uses the same module information container for both but creates two module instances.

The MariaDB MaxScale configuration file maxscale.cnf is parsed by the core. The core also checks that all the defined parameters are of the correct type for the module. For this, the MXS_MODULE-structure includes a list of parameters accepted by the module, defining parameter names, types and default values. In the actual module code, parameter values should be extracted using functions defined in config.h.
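The relationship between the information container and its parameter list can be sketched with simplified stand-in types. ExampleModuleInfo, ExampleParam and example_create_module are hypothetical names chosen for this illustration; the real structure is MXS_MODULE in modinfo.h and its fields differ.

```cpp
// Sketch only: simplified stand-ins for the module information container.
// ExampleModuleInfo, ExampleParam and example_create_module are hypothetical;
// the real types are defined in modinfo.h.
#include <string>
#include <vector>

enum class ExampleParamType { Count, Bool, String };

struct ExampleParam
{
    std::string      name;
    ExampleParamType type;
    std::string      default_value;
};

struct ExampleModuleInfo
{
    std::string               description;
    std::string               version;
    std::vector<ExampleParam> parameters;   // names, types and defaults
};

// Exported without C++ name mangling so a loader could look it up by name.
extern "C" ExampleModuleInfo* example_create_module()
{
    // Constructed once; every instance of the module shares it.
    static ExampleModuleInfo info = {
        "An example router", "V1.0.0",
        {
            {"max_backends", ExampleParamType::Count, "0"},
            {"print_on_routing", ExampleParamType::Bool, "false"},
        }
    };
    return &info;
}
```

Because the container is a function-local static, repeated calls return the same object, which mirrors the idea of one container shared by several module instances.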

Module API

Overview

This section explains some general concepts encountered when implementing a module API. For more detailed information, see the module specific subsection, header files or the doxygen documentation.

Modules with configuration data define an INSTANCE object, which is created by the module code in a createInstance-function or equivalent. The instance creation function is called during MaxScale startup, usually when creating services. MaxScale core holds the module instance data in the SERVICE-structure (or other higher level construct) and gives it as a parameter when calling functions from the module in question. The instance structure should contain all non-client-specific information required by the functions of the module. The core does not know what the object contains (since it is defined by the module itself), nor will it modify the pointer or the referenced object in any way.

Modules dealing with client-specific data require a SESSION object for every client. As with the instance data, the definition of the module session structure is up to the module writer and MaxScale treats it as an opaque type. Usually the session contains status indicators and any resources required by the client. MaxScale core has its own MXS_SESSION object, which tracks a variety of client related information. The MXS_SESSION is given as a parameter to module-specific session creation functions and is required for several typical operations such as connecting to backends.
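The opaque-pointer arrangement described for instance and session objects can be sketched as follows. All names here (ExampleService, ExampleInstance and the example_* functions) are hypothetical; the point is only that the core stores a pointer it never inspects and hands it back to the module.

```cpp
// Sketch only: how a core can carry module data it does not understand.
// ExampleService, ExampleInstance and the example_* functions are hypothetical.
#include <string>

struct ExampleInstance          // defined by the module, non-client-specific
{
    int max_backends;
};

struct ExampleService           // core-side structure
{
    std::string name;
    void*       router_instance;   // opaque to the core
};

// Module entrypoints receive the opaque pointer back and cast it.
inline void* example_create_instance(int max_backends)
{
    return new ExampleInstance{max_backends};
}

inline int example_max_backends(void* instance)
{
    return static_cast<ExampleInstance*>(instance)->max_backends;
}

inline void example_destroy_instance(void* instance)
{
    delete static_cast<ExampleInstance*>(instance);
}
```

A module session object would be handled the same way, with one object allocated per client instead of one per service.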

Descriptor control blocks (DCB) are generalized I/O descriptor types. DCBs store the file descriptor, state, remote address, username, session, and other data. DCBs are created whenever a new socket is created. Typically this happens when a new client connects or MaxScale connects the client session to backend servers. The module writer should use DCB handling functions provided by the MPI to manage connections instead of calling general networking libraries. This ensures that I/O is handled asynchronously by epoll. In general, module code should avoid blocking I/O, sleep, yield or other potentially costly operations, as the same thread is typically used for many client sessions.

Network data such as client queries and backend replies are held in a buffer container called GWBUF. Multiple GWBUFs can form a linked list with type information and properties in each GWBUF-node. Each node includes a pointer to a reference counted shared buffer (SHARED_BUF), which finally points to a slice of the actual data. In effect, multiple GWBUF-chains can share some data while keeping some parts private. This construction is meant to minimize the need for data copying and makes it easy to append more data to partially received data packets. Plugin writers should use the MPI to manipulate GWBUFs. For more information on the GWBUF, see Filter and Router.
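The idea of buffer nodes sharing reference-counted data can be modelled with standard smart pointers. This is a toy model under the assumption that std::shared_ptr is an adequate stand-in for SHARED_BUF; ExampleBuf and its helper functions are hypothetical and are not the GWBUF API, which plugin code should use through the MPI.

```cpp
// Sketch only: a toy model of buffer nodes sharing reference-counted data.
// ExampleBuf and its helpers are hypothetical; real code uses GWBUF via the MPI.
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

using ExampleSharedData = std::shared_ptr<std::vector<uint8_t>>;

struct ExampleBuf
{
    ExampleSharedData           data;    // shared, reference-counted storage
    std::size_t                 start;   // this node's slice is [start, end)
    std::size_t                 end;
    std::shared_ptr<ExampleBuf> next;    // next node in the chain
};

inline std::shared_ptr<ExampleBuf> example_alloc(const std::string& bytes)
{
    auto data = std::make_shared<std::vector<uint8_t>>(bytes.begin(), bytes.end());
    return std::make_shared<ExampleBuf>(ExampleBuf{data, 0, bytes.size(), nullptr});
}

// A shallow clone points at the same storage; no bytes are copied.
inline std::shared_ptr<ExampleBuf> example_clone(const ExampleBuf& buf)
{
    return std::make_shared<ExampleBuf>(ExampleBuf{buf.data, buf.start, buf.end, nullptr});
}

// Appending links a new node to the tail, e.g. when more of a packet arrives.
inline void example_append(std::shared_ptr<ExampleBuf> head, std::shared_ptr<ExampleBuf> tail)
{
    while (head->next)
    {
        head = head->next;
    }
    head->next = tail;
}

// Total number of bytes in a chain.
inline std::size_t example_length(const ExampleBuf* buf)
{
    std::size_t total = 0;
    for (; buf; buf = buf->next.get())
    {
        total += buf->end - buf->start;
    }
    return total;
}
```

In this model a cloned node keeps its own chain links while sharing the bytes, so appending to one chain leaves the clone's length unchanged.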

General module management

int process_init()
void process_finish()
int thread_init()
void thread_finish()

These four functions are present in all MXS_MODULE structs and are not part of the API of any individual module type. process_init and process_finish are called by the module loader right after loading a module and just before MaxScale terminates, respectively. Usually, these can be set to null in MXS_MODULE unless the module needs some general initializations before creating any instances. thread_init and thread_finish are thread-specific equivalents.

void diagnostics(INSTANCE *instance, DCB *dcb)

A diagnostics printing routine is present in nearly all module types, although with varying signatures. This entrypoint should print various statistics and status information about the module instance instance in string form. The target of the printing is the given DCB, and printing should be implemented by calling dcb_printf.
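A diagnostics routine of this kind might look roughly like the sketch below. ExampleDcb stands in for a DCB and example_dcb_printf for dcb_printf; both are hypothetical simplifications that collect output in a string, and the counters are illustrative only.

```cpp
// Sketch only: a diagnostics routine writing statistics in string form.
// ExampleDcb stands in for a DCB and example_dcb_printf for dcb_printf;
// both are hypothetical simplifications.
#include <cstdarg>
#include <cstdio>
#include <string>

struct ExampleDcb
{
    std::string output;   // collects what would be written to the connection
};

inline void example_dcb_printf(ExampleDcb* dcb, const char* fmt, ...)
{
    char line[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(line, sizeof(line), fmt, args);
    va_end(args);
    dcb->output += line;
}

struct ExampleRouterInstance
{
    unsigned long queries_ok;
    unsigned long queries_failed;
    unsigned long replies;
};

// Prints the instance counters to the given target, one per line.
inline void example_diagnostics(const ExampleRouterInstance* instance, ExampleDcb* dcb)
{
    example_dcb_printf(dcb, "Queries routed successfully: %lu\n", instance->queries_ok);
    example_dcb_printf(dcb, "Failed routing attempts: %lu\n", instance->queries_failed);
    example_dcb_printf(dcb, "Client replies: %lu\n", instance->replies);
}
```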

Protocol

int32_t read(struct dcb *)
int32_t write(struct dcb *, GWBUF *)
int32_t write_ready(struct dcb *)
int32_t error(struct dcb *)
int32_t hangup(struct dcb *)
int32_t accept(struct dcb *)
int32_t connect(struct dcb*, struct server*, MXS_SESSION*)
int32_t close(struct dcb *)
int32_t listen(struct dcb *, char *)
int32_t auth(struct dcb*, struct server*, MXS_SESSION*, GWBUF*)
int32_t session(struct dcb *, void *)
char auth_default()
int32_t connlimit(struct dcb *, int limit)

Protocol modules are laborious to implement due to their low level nature. Each DCB maintains pointers to the correct protocol functions to be used with it, allowing the DCB to be used in a protocol-independent manner.

read, write_ready, error and hangup are epoll handlers for their respective events. write implements writing and is usually called in a router module. accept is a listener socket handler. connect is used during session creation when connecting to backend servers. listen creates a listener socket. close closes a DCB created by accept, connect or listen.
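The way a connection object carries its protocol's function table can be sketched as below. ExampleProtocol, ExampleConn and the echo_* functions are hypothetical and much smaller than the real protocol API; they only show how a caller can write through the table without naming the protocol.

```cpp
// Sketch only: a connection object carrying its protocol's function table.
// ExampleProtocol, ExampleConn and the echo_* functions are hypothetical.
#include <cstdint>
#include <string>

struct ExampleConn;

struct ExampleProtocol
{
    int32_t (*read)(ExampleConn*);
    int32_t (*write)(ExampleConn*, const std::string&);
    int32_t (*close)(ExampleConn*);
};

struct ExampleConn
{
    ExampleProtocol func;      // protocol entrypoints for this connection
    std::string     written;   // stand-in for the socket
    bool            open;
};

inline int32_t echo_read(ExampleConn*)
{
    return 1;
}

inline int32_t echo_write(ExampleConn* conn, const std::string& data)
{
    if (!conn->open)
    {
        return 0;              // nothing is written to a closed connection
    }
    conn->written += data;
    return 1;
}

inline int32_t echo_close(ExampleConn* conn)
{
    conn->open = false;
    return 1;
}

// Callers such as routers go through the table and never name the protocol.
inline int32_t example_send(ExampleConn* conn, const std::string& data)
{
    return conn->func.write(conn, data);
}
```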

In the ideal case, modules other than the protocol modules themselves should not be protocol-specific. This is currently difficult to achieve, since many actions in the modules are dependent on protocol-specific details. In the future, protocol modules may be expanded to implement a generic query parsing and information API, allowing filters and routers to be used with different SQL variants.

Authenticator

void* initialize(char **options)
void* create(void* instance)
int extract(struct dcb *, GWBUF *)
bool connectssl(struct dcb *)
int authenticate(struct dcb *)
void free(struct dcb *)
void destroy(void *)
int loadusers(struct servlistener *)
void diagnostic(struct dcb*, struct servlistener *)
int reauthenticate(struct dcb *, const char *user, uint8_t *token,
                   size_t token_len, uint8_t *scramble, size_t scramble_len,
                   uint8_t *output, size_t output_len);

Authenticators must communicate with the client or the backends and implement authentication. The authenticators can be divided into client and backend modules, although the two types are linked and must be used together. Authenticators are also dependent on the protocol modules.

Filter and Router

Filter and router APIs are nearly identical and are presented together. Since these are the modules most likely to be implemented by plugin developers, their APIs are discussed in more detail.

INSTANCE* createInstance(SERVICE* service, char** options)
void destroyInstance(INSTANCE* instance)

createInstance should read the options and initialize an instance object for use with service. Often, simply saving the configuration values to fields is enough. destroyInstance is called when the service using the module is deallocated. It should free any resources claimed by the instance. All sessions created by this instance should be closed before calling the destructor.

SESSION* newSession(INSTANCE* instance, MXS_SESSION* mxs_session, SERVICE* service)
void closeSession(INSTANCE* instance, SESSION* session)
void freeSession(INSTANCE* instance, SESSION* session)

These functions manage sessions. newSession should allocate a router or filter session attached to the client session represented by mxs_session. MaxScale will pass the returned pointer to all the API entrypoints that process user data for the particular client. closeSession should close connections the session has opened and release any resources specific to the served client. The SESSION structure allocated in newSession should not be deallocated by closeSession but in freeSession. These two are called in succession by the core.
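The split between closing and freeing a session can be sketched as follows. ExampleFilterSession and the example_* functions are hypothetical; the integers stand in for backend connections held by the session.

```cpp
// Sketch only: closing releases client resources, freeing releases the object.
// ExampleFilterSession and the example_* functions are hypothetical.
#include <vector>

struct ExampleFilterSession
{
    std::vector<int> open_connections;   // stand-in for backend connections
    bool             closed;
};

inline ExampleFilterSession* example_new_session()
{
    return new ExampleFilterSession{{1, 2, 3}, false};
}

inline void example_close_session(ExampleFilterSession* session)
{
    session->open_connections.clear();   // close connections, keep the object
    session->closed = true;
}

inline void example_free_session(ExampleFilterSession* session)
{
    delete session;                      // only here is the memory released
}
```

Keeping the two steps separate means the session object is still valid between the close and free calls.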

int routeQuery(INSTANCE* instance, SESSION* session, GWBUF* queue)
void clientReply(INSTANCE* instance, SESSION* session, GWBUF* queue, const mxs::ReplyRoute& down, const mxs::Reply& reply)
uint64_t getCapabilities(INSTANCE* instance)

routeQuery is called for client requests which should be routed to backends, and clientReply for backend reply packets which should be routed to the client. For some modules, MaxScale itself is the backend. For filters, these can be NULL, in which case the filter will be skipped for that packet type.

routeQuery is often the most complicated function in a router, as it implements the routing logic. It typically considers the client request queue, the router settings in instance and the session state in session when making a routing decision. For filters as well, routeQuery typically implements the main logic, although the routing target is constant. For router modules, routeQuery should send data forward with dcb->func.write(). Filters should directly call routeQuery for the next filter or router in the chain.

clientReply processes data flowing from backend back to client. For routers, this function is often much simpler than routeQuery, since there is only one client to route to. Depending on the router, some packets may not be routed to the client. For example, if a client query was routed to multiple backends, MaxScale will receive multiple replies while the client only expects one. Routers should pass the reply packet to the last filter in the chain (reversed order) using the function mxs_route_reply. Filters should call the clientReply of the previous filter in the chain. There is no need for filters to worry about being the first filter in the chain, as this is handled transparently by the session creation routine.
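The order in which the chain sees a packet in each direction can be traced with a small model. ExampleChain is hypothetical and records names only; it does not call any MPI function, and the filter names used with it are arbitrary.

```cpp
// Sketch only: queries pass filters in order, replies in reverse order.
// ExampleChain is a hypothetical model that records who saw the packet.
#include <cstddef>
#include <string>
#include <vector>

struct ExampleChain
{
    std::vector<std::string> filters;   // in configured order
    std::vector<std::string> trace;     // records each step

    // Filter i handles the query, then calls the next filter or the router.
    void route_query(std::size_t i = 0)
    {
        if (i == filters.size())
        {
            trace.push_back("router:routeQuery");
            return;
        }
        trace.push_back(filters[i] + ":routeQuery");
        route_query(i + 1);
    }

    // The router hands the reply to the last filter, which passes it to the
    // previous one, until the reply reaches the client.
    void client_reply()
    {
        trace.push_back("router:clientReply");
        for (std::size_t i = filters.size(); i > 0; --i)
        {
            trace.push_back(filters[i - 1] + ":clientReply");
        }
        trace.push_back("client");
    }
};
```

For a chain with filters "log" and "cache", the query trace visits log, cache and then the router, while the reply trace visits the router, cache, log and finally the client.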

Application data is not always received in complete packets from the network stack. How partial packets are handled by the receiving protocol module depends on the attached filters and the router, communicated by their getCapabilities-functions. getCapabilities should return a bitfield resulting from OR-ing the individual capabilities. routing.hh lists the allowed capability flags.

If a router or filter sets no capabilities, routeQuery or clientReply may be called to route partial packets. If the routing logic does not require any information on the contents of the packets or even tracking the number of packets, this may be fine. For many cases though, receiving a data packet in a complete GWBUF chain or in one contiguous GWBUF is required. The former can be requested by getCapabilities returning RCAP_TYPE_STMT, the latter by RCAP_TYPE_CONTIGUOUS. Separate settings exist for queries and replies. For replies, an additional value, RCAP_TYPE_RESULTSET_OUTPUT, is defined. This requests the protocol module to gather partial results into one result set. Enforcing complete packets will delay processing, since the protocol module will have to wait for the entire data packet to arrive before sending it down the processing chain.
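Building and testing a capability bitfield can be sketched as below. The EX_RCAP_* names and their numeric values are hypothetical and chosen only for this illustration; the real flags and their values are those listed in routing.hh.

```cpp
// Sketch only: building and testing a capability bitfield.
// The EX_RCAP_* flags and their values are hypothetical; the real flags
// are listed in routing.hh.
#include <cstdint>

enum ExampleCapability : uint64_t
{
    EX_RCAP_STMT_INPUT       = 1 << 0,   // complete packets for queries
    EX_RCAP_CONTIGUOUS_INPUT = 1 << 1,   // one contiguous buffer for queries
    EX_RCAP_RESULTSET_OUTPUT = 1 << 2,   // complete result sets for replies
};

// A module combines the capabilities it needs with bitwise OR.
inline uint64_t example_get_capabilities()
{
    return EX_RCAP_STMT_INPUT | EX_RCAP_RESULTSET_OUTPUT;
}

// The caller checks a single flag with bitwise AND.
inline bool example_has_capability(uint64_t caps, ExampleCapability flag)
{
    return (caps & flag) == static_cast<uint64_t>(flag);
}
```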

bool handleError(INSTANCE* instance, SESSION* session, GWBUF* errmsgbuf, mxs::Endpoint* problem, const mxs::Reply& reply);

This router-only entrypoint is called if a network error occurs in one of the backend server connections in use by the session. When the entrypoint is called, the router should try to continue the session if possible. If the session can continue operating normally, the function should return true. If the router cannot continue routing queries, for example due to a complete cluster outage, the function should return false which will cause the whole session to close.
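One possible shape for the decision made in this entrypoint is sketched below, assuming a router that can keep going as long as at least one of its backends is still usable. ExampleBackend, ExampleRouterSession and example_handle_error are hypothetical and take a backend name instead of the real parameters.

```cpp
// Sketch only: decide whether a session can survive a backend failure.
// ExampleBackend, ExampleRouterSession and example_handle_error are hypothetical.
#include <string>
#include <vector>

struct ExampleBackend
{
    std::string name;
    bool        in_use;
};

struct ExampleRouterSession
{
    std::vector<ExampleBackend> backends;
};

// Returns true if the session can continue, false if it should be closed.
inline bool example_handle_error(ExampleRouterSession* session, const std::string& failed)
{
    bool usable_left = false;
    for (auto& backend : session->backends)
    {
        if (backend.name == failed)
        {
            backend.in_use = false;   // stop routing to the failed backend
        }
        usable_left = usable_left || backend.in_use;
    }
    return usable_left;
}
```

A router with stricter requirements, for example one that cannot operate without its write backend, would return false earlier.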

Monitor

MONITOR* startMonitor(MXS_MONITOR *monitor, const MXS_CONFIG_PARAMETER *params)
void stopMonitor(MXS_MONITOR *monitor)
void diagnostics(DCB *, const MXS_MONITOR *)

Monitor modules typically run a repeated monitor routine with a user-defined interval. The MXS_MONITOR is a standard monitor definition used for all monitors and contains a void pointer for storing module specific data. startMonitor should create a new thread for itself using functions in the MPI and have it regularly run a monitor loop. At the beginning of every monitor loop, the monitor should lock the SERVER-structures of its servers. This prevents any administrative action from interfering with the monitor during its pass.
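A single pass of such a monitor loop can be sketched as follows. ExampleServer, ExampleMonitor and the probe callback are hypothetical, and a std::mutex is assumed here as a stand-in for locking the SERVER-structures. A real monitor would run this pass from its own thread at the configured interval, using the thread functions of the MPI.

```cpp
// Sketch only: one pass of a monitor loop over its servers.
// ExampleServer, ExampleMonitor and the probe callback are hypothetical.
#include <functional>
#include <mutex>
#include <string>
#include <vector>

struct ExampleServer
{
    std::string name;
    bool        running;
};

struct ExampleMonitor
{
    std::vector<ExampleServer> servers;
    std::mutex                 servers_lock;   // stand-in for locking the servers
    unsigned                   passes = 0;
};

// One monitor pass: lock the servers, probe each one, publish the result.
inline void example_monitor_tick(ExampleMonitor& mon,
                                 const std::function<bool(const std::string&)>& probe)
{
    std::lock_guard<std::mutex> guard(mon.servers_lock);
    for (auto& server : mon.servers)
    {
        server.running = probe(server.name);   // a real probe would query the backend
    }
    ++mon.passes;
}
```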

Compiling, installing and running

The requirements for compiling a module are:

  • The public headers (MPI)
  • A compatible compiler, typically GCC
  • Libraries required by the public headers

Some of the public header files themselves include headers from other libraries. These libraries need to be installed and it may be required to point out their location to gcc. Some of the more commonly required libraries are:

  • MySQL Connector-C, used by the MySQL protocol module
  • pcre2 regular expressions (libpcre2-dev), used for example by the header modutil.h

After all dependencies are accounted for, the module should compile with a command similar to

gcc -I /usr/local/include/mariadb -shared -fPIC -g -o libmymodule.so mymodule.cpp

Large modules composed of several source files and using additional libraries may require a more complicated compilation scheme, but that is outside the scope of this document. The result of compiling a plugin should be a single shared library file.

The compiled .so-file needs to be copied to the MaxScale library folder, which is /usr/local/lib/maxscale by default. MaxScale expects the filename to be lib<name>.so, where <name> must match the module name given in the configuration file.
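The naming rule can be expressed as a one-line helper. example_library_filename is a hypothetical function written for this illustration and is not something MaxScale provides.

```cpp
// Sketch only: derive the expected library filename from a module name.
// example_library_filename is hypothetical, not a MaxScale function.
#include <string>

inline std::string example_library_filename(const std::string& module_name)
{
    return "lib" + module_name + ".so";
}
```

For a service configured with router=roundrobinrouter, this gives libroundrobinrouter.so, the filename used in the hands-on example.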

Hands-on example: RoundRobinRouter

In this example, the RoundRobinRouter is compiled, installed and tested. The software environment in which this section was written and tested is listed below. Any recent Linux setup should be applicable.

  • Linux Mint 18
  • gcc 5.4.0, glibc 2.23
  • MariaDB MaxScale 2.1.0 debug build (binaries in usr/local/maxscale, modules in /usr/local/lib/maxscale)
  • MariaDB Connector-C 2.3.2 (installed to /usr/local/lib/mariadb, headers in /usr/local/include/mariadb)
  • roundrobinrouter.cpp in the current directory
  • MaxScale plugin development headers (in usr/include/maxscale)

Step 1 Compile RoundRobinRouter with $ gcc -I /usr/local/include/mariadb -shared -fPIC -g -o libroundrobinrouter.so roundrobinrouter.cpp. Assuming all headers were found, the shared library libroundrobinrouter.so is produced.

Step 2 Copy the compiled module to the MaxScale module directory: $ sudo cp libroundrobinrouter.so /usr/local/lib/maxscale.

Step 3 Modify the MaxScale configuration file to use the RoundRobinRouter as a router. Example service and listener definitions are below. The servers and write_backend-lines should be configured according to the actual backend configuration.

[RR-Service]
type=service
router=roundrobinrouter
servers=LocalPrimary1,LocalReplica1,LocalReplica2
user=maxscale
password=maxscale
filters=MyLogFilter1
max_backends=10
write_backend=LocalPrimary1
print_on_routing=true
dummy_setting=two

[RR-Listener]
type=listener
service=RR-Service
protocol=MariaDBClient
port=4009

Step 4 Start MaxScale: $ maxscale -d. Output:

MariaDB Corporation MaxScale 2.1.0  Mon Feb 20 17:22:18 2017
------------------------------------------------------
Info : MaxScale will be run in the terminal process.
    See the log from the following log files :

Configuration file : /etc/maxscale.cnf
Log directory      : /var/log/maxscale
Data directory     : /var/lib/maxscale
Module directory   : /usr/local/lib/maxscale
Service cache      : /var/cache/maxscale

Step 5 Test with a MySQL client. The RoundRobinRouter has been tested with both a command line and a GUI client. With DEBUG_RRROUTER defined and print_on_routing enabled, the /var/log/maxscale/maxscale.log file will report nearly every action taken by the router.

2017-02-21 10:37:23   notice : [RoundRobinRouter] Creating instance.
2017-02-21 10:37:23   notice : [RoundRobinRouter] Settings read:
2017-02-21 10:37:23   notice : [RoundRobinRouter] 'max_backends': 10
2017-02-21 10:37:23   notice : [RoundRobinRouter] 'write_backend': 0xf0ce70
2017-02-21 10:37:23   notice : [RoundRobinRouter] 'print_on_routing': 1
2017-02-21 10:37:23   notice : [RoundRobinRouter] 'dummy_setting': 2
.
.
.
2017-02-21 10:37:37   notice : [RoundRobinRouter] Session with 4 connections created.
2017-02-21 10:37:37   notice : [RoundRobinRouter] QUERY: SHOW VARIABLES WHERE Variable_name in ('max_allowed_packet', 'system_time_zone', 'time_zone', 'sql_mode')
2017-02-21 10:37:37   notice : [RoundRobinRouter] Routing statement of length 110u  to backend 'LocalPrimary1'.
2017-02-21 10:37:37   notice : [RoundRobinRouter] Replied to client.
2017-02-21 10:37:37   notice : [RoundRobinRouter] QUERY: set session autocommit=1,sql_mode='NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES'
2017-02-21 10:37:37   notice : [RoundRobinRouter] Routing statement of length 103u to 4 backends.
2017-02-21 10:37:37   notice : [RoundRobinRouter] Replied to client.
2017-02-21 10:37:37   notice : [RoundRobinRouter] QUERY: SET @ApplicationName='DBeaver 3.8.5 - Main'
2017-02-21 10:37:37   notice : [RoundRobinRouter] Routing statement of length 48u to 4 backends.
2017-02-21 10:37:37   notice : [RoundRobinRouter] Replied to client.
2017-02-21 10:37:37   notice : [RoundRobinRouter] QUERY: select @@lower_case_table_names
2017-02-21 10:37:37   notice : [RoundRobinRouter] Routing statement of length 36u  to backend 'LocalReplica1'.
2017-02-21 10:37:37   notice : [RoundRobinRouter] Replied to client.

Step 6 Connect with MaxCtrl, print diagnostics and call a custom command.

$ maxctrl
maxctrl show service RR-Service
┌─────────────────────┬────────────────────────────────────────────┐
│ Service             │ RR-Service                                 │
├─────────────────────┼────────────────────────────────────────────┤
│ Router              │ roundrobinrouter                           │
├─────────────────────┼────────────────────────────────────────────┤
│ State               │ Started                                    │
├─────────────────────┼────────────────────────────────────────────┤
│ Started At          │ Tue Apr 28 08:45:19 2020                   │
├─────────────────────┼────────────────────────────────────────────┤
│ Current Connections │ 0                                          │
├─────────────────────┼────────────────────────────────────────────┤
│ Total Connections   │ 0                                          │
├─────────────────────┼────────────────────────────────────────────┤
│ Max Connections     │ 0                                          │
├─────────────────────┼────────────────────────────────────────────┤
│ Cluster             │                                            │
├─────────────────────┼────────────────────────────────────────────┤
│ Servers             │ Server1                                    │
├─────────────────────┼────────────────────────────────────────────┤
│ Services            │                                            │
├─────────────────────┼────────────────────────────────────────────┤
│ Filters             │                                            │
├─────────────────────┼────────────────────────────────────────────┤
│ Parameters          │ {                                          │
│                     │     "router_options": null,                │
│                     │     "targets": null,                       │
│                     │     "user": "maxskysql",                   │
│                     │     "password": "*****",                   │
│                     │     "enable_root_user": false,             │
│                     │     "max_connections": 0,                  │
│                     │     "connection_timeout": 0,               │
│                     │     "net_write_timeout": 0,                │
│                     │     "auth_all_servers": false,             │
│                     │     "strip_db_esc": true,                  │
│                     │     "localhost_match_wildcard_host": true, │
│                     │     "version_string": null,                │
│                     │     "log_auth_warnings": true,             │
│                     │     "session_track_trx_state": false,      │
│                     │     "retain_last_statements": -1,          │
│                     │     "session_trace": false,                │
│                     │     "cluster": null,                       │
│                     │     "rank": "primary",                     │
│                     │     "connection_keepalive": 300,           │
│                     │     "connection_init_sql_file": null,      │
│                     │     "max_backends": 0,                     │
│                     │     "print_on_routing": false,             │
│                     │     "write_backend": null,                 │
│                     │     "dummy_setting": "the_answer"          │
│                     │ }                                          │
├─────────────────────┼────────────────────────────────────────────┤
│ Router Diagnostics  │ {                                          │
│                     │     "queries_ok": 0,                       │
│                     │     "queries_failed": 0,                   │
│                     │     "replies": 0                           │
│                     │ }                                          │
└─────────────────────┴────────────────────────────────────────────┘
maxctrl

MaxScale> call command roundrobinrouter test_command "one" 0

The result of the test_command "one" 0 is printed to the terminal MaxScale is running in:

RoundRobinRouter wishes the Admin a good day.
The module got 2 arguments.
Argument 0: type 'string' value 'one'
Argument 1: type 'boolean' value 'false'

Summary and conclusion

Plugins offer a way to extend MariaDB MaxScale whenever the standard modules are found insufficient. The plugins need only implement a set API, can be compiled independently, and are installed with a simple file copy and some configuration file modifications.

Out of the different plugin types, filters are the easiest to implement. They work independently and have few requirements. Protocol and authenticator modules require in-depth knowledge of the database protocol they implement. Router module complexity depends on the routing logic requirements.

The provided RoundRobinRouter example code should serve as a valid starting point for both filters and routers. Studying the MaxScale Public Interface headers to get a general idea of what services the core provides for plugins is also highly recommended.

Lastly, MariaDB MaxScale is an open-source project, so code contributions can be accepted if they fulfill the requirements.
