Spider Storage Engine Overview
An introduction to the Spider storage engine, which provides built-in sharding by linking to tables on remote MariaDB servers, supporting partitioning and XA transactions.
About

The Spider storage engine has built-in sharding features. It supports partitioning and XA transactions, and allows tables on different MariaDB instances to be handled as if they were on the same instance. It is one possible implementation of ISO/IEC 9075-9:2008 SQL/MED.
When a table is created with the Spider storage engine, the table links to a table on a remote server. The remote table can use any storage engine. The link is concretely achieved by establishing a connection from the local MariaDB server to the remote MariaDB server. The connection is shared by all tables that are part of the same transaction.
The Spider documentation in the MariaDB documentation is currently incomplete. See the Spider website for more, as well as the spider-1.0-doc and spider-2.0-doc repositories.
Spider Versions in MariaDB
Usage
Basic Usage
To create a table in the Spider storage engine format, the COMMENT and/or CONNECTION clauses of the CREATE TABLE statement are used to pass connection information about the remote server.
For example, the following table exists on a remote server (in this example, the remote node was created with the MySQL Sandbox tool, an easy way to test with multiple installations):
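A minimal sketch of such a remote table (table and column names are illustrative):

```sql
-- On the remote server: an ordinary table, here using InnoDB
CREATE TABLE s (
  id INT NOT NULL AUTO_INCREMENT,
  code VARCHAR(10),
  PRIMARY KEY (id)
) ENGINE=InnoDB;
```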
On the local server, a Spider table can be created as follows:
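A sketch, assuming the remote sandbox listens on 127.0.0.1:5711 with the hypothetical credentials msandbox/msandbox. The column definitions should match the remote table:

```sql
-- On the local server: a Spider table linking to the remote table
-- (host, port, and credentials are hypothetical sandbox values)
CREATE TABLE s (
  id INT NOT NULL AUTO_INCREMENT,
  code VARCHAR(10),
  PRIMARY KEY (id)
) ENGINE=SPIDER
COMMENT='host "127.0.0.1", port "5711", user "msandbox", password "msandbox", database "test", table "s"';
```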
Records can now be inserted on the local server, and they are stored on the remote server:
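For instance (the row values are illustrative):

```sql
-- On the local server: insert through the Spider table
INSERT INTO s (code) VALUES ('a'), ('b');

-- On the remote server: the rows are present in the remote table
SELECT id, code FROM s;
```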
MariaDB starting with 10.8.1
As an alternative to specifying the data node information in the COMMENT clause, certain information (server, database, table) can also be specified using table options, like so:
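A minimal sketch, assuming a server definition named srv created with CREATE SERVER (host, port, and credentials are placeholders):

```sql
-- Declare the data node once as a server definition
CREATE SERVER srv FOREIGN DATA WRAPPER mysql
OPTIONS (HOST '127.0.0.1', DATABASE 'test',
         USER 'msandbox', PASSWORD 'msandbox', PORT 5711);

-- Reference it through table options instead of the COMMENT clause
CREATE TABLE s (
  id INT NOT NULL AUTO_INCREMENT,
  code VARCHAR(10),
  PRIMARY KEY (id)
) ENGINE=SPIDER
REMOTE_SERVER="srv" REMOTE_DATABASE="test" REMOTE_TABLE="s";
```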
Further Examples
Prepare a 10M record table using the sysbench utility.
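A sketch with sysbench 0.4/0.5-style options (host and credentials are placeholders):

```bash
# Prepare a 10M row sbtest table on the first backend
sysbench --test=oltp --oltp-table-size=10000000 \
  --mysql-host=backend1 --mysql-port=3306 \
  --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=test prepare
```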
Make a first read-only benchmark to check the initial single-node performance.
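For example (same placeholder connection options):

```bash
# Read-only baseline against a single backend
sysbench --test=oltp --oltp-read-only=on --oltp-table-size=10000000 \
  --num-threads=4 --max-time=60 --max-requests=0 \
  --mysql-host=backend1 --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=test run
```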
Define an easy way to access the nodes from the MariaDB or MySQL client.
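For instance, with shell aliases (hostnames, ports, and credentials are placeholders):

```bash
# One alias per node for quick interactive access
alias backend1='mysql -h 192.168.0.201 -P 3306 -u sbtest -psbtest test'
alias backend2='mysql -h 192.168.0.202 -P 3306 -u sbtest -psbtest test'
alias spider1='mysql -h 192.168.0.101 -P 3306 -u sbtest -psbtest test'
```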
Create the empty tables to hold the data and repeat for all available backend nodes.
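A sketch using the classic sysbench table layout:

```sql
-- Run on each backend node: an empty sbtest table
CREATE TABLE sbtest (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  k INT UNSIGNED NOT NULL DEFAULT '0',
  c CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k (k)
) ENGINE=InnoDB;
```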
Federation Setup
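The federation can be sketched as follows, assuming the placeholder hosts and credentials used earlier:

```sql
-- On the Spider node: declare the first backend
-- (repeat with the appropriate host for backend2)
CREATE SERVER backend1 FOREIGN DATA WRAPPER mysql
OPTIONS (HOST '192.168.0.201', DATABASE 'test',
         USER 'sbtest', PASSWORD 'sbtest', PORT 3306);

-- A plain (non-partitioned) Spider table federating the backend's sbtest
CREATE TABLE sbtest (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  k INT UNSIGNED NOT NULL DEFAULT '0',
  c CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k (k)
) ENGINE=SPIDER
COMMENT='wrapper "mysql", srv "backend1", table "sbtest"';
```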


Without a connection pool or the MariaDB thread pool, HAProxy and Spider protected the TCP socket from overflow without specific TCP tuning. In reality, with a well-tuned TCP stack or thread pool, the curve should not drop so abruptly to 0. Refer to the MariaDB Thread Pool to explore this feature.
Sharding Setup

Create the spider table on the Spider Node
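A sketch, assuming the CREATE SERVER definitions for backend1 and backend2 from the federation step:

```sql
-- On the Spider node: one Spider table hash-partitioned across both backends
CREATE TABLE sbtest (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  k INT UNSIGNED NOT NULL DEFAULT '0',
  c CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k (k)
) ENGINE=SPIDER
COMMENT='wrapper "mysql", table "sbtest"'
PARTITION BY KEY (id) (
  PARTITION pt1 COMMENT = 'srv "backend1"',
  PARTITION pt2 COMMENT = 'srv "backend2"'
);
```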
Copy the data from the original sysbench table to the spider table
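For example (the source table name sbtest_orig is hypothetical; it stands for wherever the 10M row sysbench table was loaded):

```sql
-- On the Spider node: rows are redistributed to the backends
-- according to the partitioning key
INSERT INTO sbtest SELECT * FROM sbtest_orig;
```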
We observe a common issue with partitioning: a non-uniform distribution of data between the backends, depending on the partition key hashing algorithm.
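A quick way to check, run on each backend:

```sql
-- Compare row counts on backend1 and backend2
SELECT COUNT(*) FROM sbtest;
```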
Rerun the benchmark with fewer queries.
The response time worsens to 0.04s. This is expected: query latency increases with the extra network round trips, and condition pushdown is not implemented yet. Sysbench runs a lot of range queries; for now, consider such a range query as a badly optimized query.
We need to increase the concurrency to get better throughput.
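For example (same placeholder connection options, now pointed at the Spider node with more client threads):

```bash
# Read-only run against the Spider node with higher concurrency
sysbench --test=oltp --oltp-read-only=on \
  --num-threads=16 --max-time=60 --max-requests=0 \
  --mysql-host=spider1 --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=test run
```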

Background Setup
Background search is not available in MariaDB, and it won't arrive before MariaDB 10.2. However, the next table definition mainly enables improving the performance of a single complex query plan with background search, which can be found in the upstream Spiral branch binaries of MariaDB.
We have 4 cores per backend and 2 backends.
On backend1
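A sketch (table names are illustrative; each backend gets one table per local partition):

```sql
-- On backend1: two empty tables of the sysbench layout, one per partition
CREATE TABLE sbtest_1 LIKE sbtest;
CREATE TABLE sbtest_2 LIKE sbtest;
```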
On backend2
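The same split applies on the second backend:

```sql
-- On backend2: identical empty tables for its two partitions
CREATE TABLE sbtest_1 LIKE sbtest;
CREATE TABLE sbtest_2 LIKE sbtest;
```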
On Spider Node
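A sketch, reusing the server definitions from before (the table name sbtest_bg is illustrative):

```sql
-- Four partitions, two per backend, so a background search
-- can drive four shards concurrently
CREATE TABLE sbtest_bg (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  k INT UNSIGNED NOT NULL DEFAULT '0',
  c CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k (k)
) ENGINE=SPIDER
COMMENT='wrapper "mysql"'
PARTITION BY KEY (id) (
  PARTITION pt1 COMMENT = 'srv "backend1", table "sbtest_1"',
  PARTITION pt2 COMMENT = 'srv "backend1", table "sbtest_2"',
  PARTITION pt3 COMMENT = 'srv "backend2", table "sbtest_1"',
  PARTITION pt4 COMMENT = 'srv "backend2", table "sbtest_2"'
);

-- Redistribute the existing rows across the four shards
INSERT INTO sbtest_bg SELECT * FROM sbtest;
```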
Now test the following query:
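The original query is not reproduced here; an illustrative aggregate whose plan benefits from scanning all partitions in parallel might look like this:

```sql
-- A range aggregate touching every partition
SELECT COUNT(*), MAX(k) FROM sbtest_bg WHERE k BETWEEN 100000 AND 500000;
```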

High Availability Setup
MariaDB starting with 10.7.5
Spider's high availability feature has been deprecated (MDEV-28479) and removed. Please use other high availability solutions, such as replication or Galera Cluster.

What happens if we stop one backend?
Let's fix this with Spider monitoring; a sketch follows below. Note that msi is the list of Spider node @@server_id values participating in the quorum.
Monitoring should be set up between the Spider nodes participating in the cluster. We only have one Spider node here, and spider_link_mon_servers represents the inter-connection of all Spider nodes in our setup.
This simple setup does not bring HA if the Spider node itself becomes unavailable. In a production setup, the number of Spider nodes in the spider_link_mon_servers table should be at least 3 to get a majority consensus.
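A sketch of the legacy setup, assuming this Spider node has @@server_id 5054 and is registered under a hypothetical server definition named spider1:

```sql
-- Hypothetical monitoring entry: register this Spider node (server_id 5054)
-- as the monitor for every Spider table and link
INSERT INTO mysql.spider_link_mon_servers
  (db_name, table_name, link_id, sid, server)
VALUES ('%', '%', '%', 5054, 'spider1');

-- Make Spider pick up the new monitoring entry
SELECT spider_flush_table_mon_cache();

-- Legacy HA table: both backends hold a full copy of the data;
-- mkd/mbk enable monitoring, msi names the monitoring quorum
CREATE TABLE sbtest_ha (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  k INT UNSIGNED NOT NULL DEFAULT '0',
  c CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k (k)
) ENGINE=SPIDER
COMMENT='wrapper "mysql", table "sbtest", srv "backend1 backend2", mbk "2", mkd "2", msi "5054", link_status "1 1"';
```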
Checking the state of the nodes:
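For example (link_status 1 = OK, 2 = recovering, 3 = broken):

```sql
-- Each link of each Spider table is one row; watch the link_status column
SELECT db_name, table_name, link_id, link_status
FROM mysql.spider_tables;
```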

No change has been made to the cluster, so let's create a divergence:
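For instance, writing through the Spider table while backend1 is marked broken only reaches backend2 (values are illustrative):

```sql
-- backend1 is down: this row lands only on backend2, creating a divergence
INSERT INTO sbtest_ha (k, c, pad) VALUES (42, 'diverged-row', 'x');
```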
Reintroducing the failed backend1 in the cluster:
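A sketch using the spider_copy_tables UDF, assuming link 0 is backend1 and link 1 is backend2:

```sql
-- Resync the failed link (0 = backend1) from the healthy one (1 = backend2)...
SELECT spider_copy_tables('test.sbtest_ha', '1', '0');

-- ...then declare both links healthy again
ALTER TABLE sbtest_ha
COMMENT='wrapper "mysql", table "sbtest", srv "backend1 backend2", mbk "2", mkd "2", msi "5054", link_status "1 1"';
```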
This page is licensed: CC BY-SA / Gnu FDL