The SchemaRouter provides an easy and manageable sharding solution by
building a single logical database server from multiple separate ones. Each
database is shown to the client and queries targeting unique databases are
routed to their respective servers. In addition to providing simple
database-based sharding, the schemarouter also enables cross-node
session variable usage by routing all queries that modify the session to all
nodes.
By default the SchemaRouter assumes that each database and table is only located
on one server. If it finds the same database or table on multiple servers, it
will close the session with the following error:
The exception to this rule are the system tables mysql, information_schema,performance_schema, sys that are never treated as duplicates.
If duplicate tables are expected, use the parameter to controls which
duplicate tables are allowed. To disable the duplicate database detection, useignore_tables_regex=.*.
Schemarouter compares table and database names case-insensitively. This means
that the tables test.t1 and test.T1 are assumed to refer to the same table.
The main limitation of SchemaRouter is that aside from session variable writes
and some specific queries, a query can only target one server. This means that
queries which depend on results from multiple servers give incorrect results.
See for more information.
From 2.3.0 onwards, SchemaRouter is capable of limited table family sharding.
Older versions of MaxScale required that the auth_all_servers parameter was
enabled in order for the schemarouter services to load the authentication data
from all servers instead of just one server. Starting with MaxScale 24.02, the
schemarouter automatically fetches the authentication data from all servers and
joins it together. At the same time, the auth_all_servers parameter has been
deprecated and is ignored if present in the configuration.
If a command modifies the session state by modifying any session or user
variables, the query is routed to all nodes. These statements include SET
statements as well as any other statements that modify the behavior of the
client.
If a client changes the default database after connecting, either with a USE <db> query or a COM_INIT_DB command, the query is routed to all servers
that contain the database. This same logic applies when a client connects with
a default database: the default database is set only on servers that actually
contain it.
This means that all administrative commands, replication related command as
well as certain transaction control statements (XA transaction) are routed to
the first available server in certain cases. To avoid problems, use routing
hints to direct where these statements should go.
Starting with MaxScale 6.4.5, transaction control commands (BEGIN, COMMIT
and ROLLBACK) are routed to all nodes. Older versions of MaxScale routed the
queries to the first available backend. This means that cross-shard
transactions are technically possible but, without external synchronization,
the transactions are not guaranteed to be globally consistent.
LOAD DATA LOCAL INFILE commands are routed to the first available server
that contains the tables listed in the query.
Custom SQL Commands
To check how databases and tables map to servers, execute the special querySHOW SHARDS. The query does not support any modifiers such as LIKE.
The schemarouter will also intercept the SHOW DATABASES command and generate
it based on its internal data. This means that newly created databases will not
show up immediately and will only be visible when the cached data has been
updated.
Database Mapping
The schemarouter maps each of the servers to know where each database and table
is located. As each user has access to a different set of tables and databases,
the result is unique to the username and the set of servers that the service
uses. These results are cached by the schemarouter. The lifetime of the cached
result is controlled by the refresh_interval parameter.
When a server needs to be mapped, the schemarouter will route a query to each of
the servers using the client's credentials. While this query is being executed,
all other sessions that would otherwise share the cached result will wait for
the update to complete. This waiting functionality was added in MaxScale 2.4.19,
older versions did not wait for existing updates to finish and would perform
parallel database mapping queries.
Here is an example configuration of the schemarouter:
The module generates the list of databases based on the servers parameter
using the connecting client's credentials. The user and password parameters
define the credentials that are used to fetch the authentication data from
the database servers. The credentials used only require the same grants as
mentioned in the configuration documentation.
The list of databases is built by sending a SHOW DATABASES query to all the
servers. This requires the user to have at least USAGE and SELECT grants on
the databases that need be sharded.
If you are connecting directly to a database or have different users on some
of the servers, you need to get the authentication data from all the
servers. You can control this with the auth_all_servers parameter. With
this parameter, MariaDB MaxScale forms a union of all the users and their
grants from all the servers. By default, the schemarouter will fetch the
authentication data from all servers.
For example, if two servers have the database shard and the following
rights are granted only on one server, all queries targeting the databaseshard would be routed to the server where the grants were given.
This would in effect allow the user 'john' to only see the database 'shard'
on this server. Take notice that these grants are matched against MariaDB
MaxScale's hostname instead of the client's hostname. Only user
authentication uses the client's hostname and all other grants use MariaDB
MaxScale's hostname.
Router Parameters
List of full table names (e.g. db1.t1) to ignore when checking for duplicate
tables. By default no tables are ignored.
This parameter was once called ignore_databases.
ignore_tables_regex
A
that is matched against database names when checking for duplicate databases.
By default no tables are ignored.
The following configuration ignores duplicate tables in the databases db1 and db2,
and all tables starting with "t" in db3.
This parameter was once called ignore_databases_regex.
max_sescmd_history
This parameter has been moved to
in MaxScale 6.0.
disable_sescmd_history
This parameter has been moved to
in MaxScale 6.0.
refresh_databases
Enable database map refreshing mid-session. These are triggered by a failure to
change the database i.e. USE ... queries. This feature is disabled by default.
Before MaxScale 6.2.0, this parameter did nothing. Starting with the 6.2.0
release of MaxScale this parameter now works again but it is disabled by default
to retain the same behavior as in older releases.
refresh_interval
The minimum interval between database map refreshes in seconds. The default
value is 300 seconds.
The interval is specified as documented . If no explicit unit
is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent
versions a value without a unit may be rejected. Note that since the granularity
of the intervaltimeout is seconds, a timeout specified in milliseconds will be rejected,
even if the duration is longer than a second.
The time how long stale database map entries can be used while an update is in
progress. When a database map entry goes stale, the next connection to be
created will start an update of the database map. While this update is ongoing,
other connections can use the stale entry for up to max_staleness seconds. If
this limit is exceeded and the update still hasn't completed, new connections
will instead block and wait for the update to finish.
This feature was added in MaxScale 23.08.0. Older versions of MaxScale
always waited for the update to complete when the database map entry
went stale.
Table Family Sharding
This functionality was introduced in 2.3.0.
If the same database exists on multiple servers, but the database contains different
tables in each server, SchemaRouter is capable of routing queries to the right server,
depending on which table is being addressed.
As an example, suppose the database db exists on servers server1 and server2, but
that the database on server1 contains the table tbl1 and on server2 contains the
table tbl2. The query SELECT * FROM db.tbl1 will be routed to server1 and the querySELECT * FROM db.tbl2 will be routed to server2. As in the example queries, the table
names must be qualified with the database names for table-level sharding to work.
Specifically, the query series below is not supported.
Router Diagnostics
The router_diagnostics output for a schemarouter service contains the
following fields.
shard_map_hits: Cache hits for the shard map cache.
shard_map_misses: Cache misses for the shard map cache.
In MaxScale 24.02, the queries sescmd_percentage, longest_sescmd_chain,times_sescmd_limit_exceeded, longest_session, shortest_session andaverage_session statistics have been replaced by core service statistics.
Module commands
Read documentation for
details about module commands.
The schemarouter supports the following module commands.
invalidate SERVICE
Invalidates the database map cache of the given service. This can be used to schedule
the updates to the database maps to happen at off-peak hours by configuring a
high value for refresh_interval and invalidating the cache externally.
Clears the database map cache of the given service. This forces new connections
to use a freshly retrieved entry.
If the set of databases and tables in each shard is very large, the update can
take some time. If there are stale cache entries and max_staleness is
configured to be higher than the time it takes to update the database map, the
invalidation will only slow down one client connection that ends up doing the
update. When the cache is cleared completely, all clients will have to wait for
the update to complete. In general, cache invalidation should be preferred over
cache clearing.
Cross-database queries (e.g. SELECT column FROM database1.table UNION select column FROM database2.table) are not properly supported. Such queries are routed either to the
first explicit database in the query, the current database in use or to the first
available database, depending on which succeeds.
Without a default database, queries that do not use fully qualified table
names and which do not modify the session state (e.g. SELECT * FROM t1) will
be routed to the first available server. This includes queries such as explicit
transaction commands (BEGIN, COMMIT, ROLLBACK
is a small tutorial on how
to set up a sharded database.
This page is licensed: CC BY-SA / Gnu FDL