Uses for MariaDB and the Spider Storage Engine

Posted on November 18, 2019 by Anders Karlsson

Spider is a storage engine for MariaDB Platform that allows you to build distributed databases from a standard MariaDB setup. The technology is not per se complicated, although the implementation is. This blog will explain how the Spider storage engine works, what it does and will also show some of the use cases.

MariaDB Storage Engines

Before we look at the Spider storage engine, let’s have a quick look at the Storage Engine concept. A storage engine is the implementation of code that manages the low level of data access in MariaDB. The storage engine handles things such as reading and writing data, row-level locking, if supported, multi versioning and transaction management, among other things.

Storage engines are defined on a table by table bases, and once a table is created and defined to use a particular storage engine (using the ENGINE Table attribute) it will generally be treated just like any other tables. Tables using different engines, once created, are handled as equal and can be joined, have data selected from one into another, etc. Tables may also have the storage engine changed after the table has been created.

Spider From a High Level

What Spider provides, first and foremost, is a way to access a table on one MariaDB Server from another MariaDB Server. The MariaDB Server that holds the actual table data does not have any specific Spider code on it at all, it is a normal MariaDB Server. The MariaDB Server that is configured to access that data then uses the Spider storage engine to access the data on the other server using the usual MariaDB protocol.

As can be seen, Spider is only active on the referencing node, the target node does not need to have Spider installed on it. Creating a “spider table” means that we define a table that contains the same, or a subset of, the columns in the target table and referencing the target server.

Also note that there is no data for these tables on the “spider node” and there is no duplication of data, all data resides on the target node.

Installing the Spider Storage Engine

Spider is included with MariaDB Server and there is also an installation script that installs the Spider engine and also some utilities are supporting objects and it is recommended to install spider using this script. If you have installed MariaDB as an RPM this spider installation script is in /usr/share/mysql and is called install_spider.sql. To run it use the MariaDB mysql command line tool and use the source command, like this.

$ mysql -u root
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 2835
Server version: 10.4.6-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB> source /usr/share/mysql/install_spider.sql

After running the script check if Spider is installed using the SHOW ENGINES command:

MariaDB> SHOW ENGINES;
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine | Support | Comment | Transactions | XA | Savepoints |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| SPIDER | YES | Spider storage engine | YES | YES | NO |
| MRG_MyISAM | YES | Collection of identical MyISAM tables | NO | NO | NO |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO |
| Aria | YES | Crash-safe tables with MyISAM heritage. Used for internal temporary tables and privilege tables | NO | NO | NO |
| MyISAM | YES | Non-transactional engine with good performance and small data footprint | NO | NO | NO |
| SEQUENCE | YES | Generated tables filled with sequential values | YES | NO | YES |
| InnoDB | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables | YES | YES | YES |
| PERFORMANCE_SCHEMA | YES | Performance Schema | NO | NO | NO |
| CSV | YES | Stores tables as CSV files | NO | NO | NO |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
9 rows in set (0.001 sec)

We are now ready to start using the Spider engine.

Single table link with Spider

Let’s look at an example, a very basic one. On the “target” server, we create a table. Note that on this server, Spider isn’t necessary, this is only required on the server that accesses remote data, and we will get to that next. So now we create a table on the “target” server, let’s call that “Server 2”:

$ mysql -u root -S /tmp/mariadb2.sock -u root
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 10.4.8-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE DATABASE spidertest;
Query OK, 1 row affected (0.001 sec)

MariaDB [(none)]> use spidertest;
Database changed
MariaDB [spidertest]> CREATE TABLE customer(
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(200) NOT NULL,
address VARCHAR(255) NOT NULL);
Query OK, 0 rows affected (0.539 sec)

Following this, let’s insert some data into this table:

MariaDB [spidertest]> INSERT INTO customer VALUES(NULL, 'John Doe', '1 Main Street');
Query OK, 1 row affected (0.309 sec)

MariaDB [spidertest]> INSERT INTO customer VALUES(NULL, 'Bob Smith', '45 Elm Street');
Query OK, 1 row affected (0.092 sec)

MariaDB [spidertest]> INSERT INTO customer VALUES(NULL, 'Jane Jones',
'18 Second Street');
Query OK, 1 row affected (0.094 sec)

I admit that this wasn’t the most exciting experiment since Marie Curie invented Radium, but we are not done yet. What we are going to do now is to access this table from another instance of MariaDB Server. As Spider connects to the remote server as a normal user, we need to create a user and grant it some access to the table we create above, on the same MariaDB Server instance as above:

MariaDB [spidertest]> CREATE USER 'spider'@'192.168.0.11' IDENTIFIED BY 'spider';
Query OK, 0 rows affected (0.236 sec)

MariaDB [spidertest]> GRANT ALL ON spidertest.* TO 'spider'@'192.168.0.11';
Query OK, 0 rows affected (0.238 sec)

MariaDB [spidertest]> GRANT ALL ON mysql.* TO 'spider'@'192.168.0.11';
Query OK, 0 rows affected (0.238 sec)

The next step is to create a SERVER. If you haven’t used Spider you have probably not used the command before, but what it does is that it defines the parameters that are used to connect to another MariaDB Server instance, so the server is defined on the MariaDB Server instance that is to access the table we create above (let’s call that Server1).

r$ mysql -u root -S /tmp/mariadb1.sock -u root
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 12
Server version: 10.4.8-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [none]> CREATE SERVER Server2 FOREIGN DATA WRAPPER mysql
OPTIONS(HOST '192.168.0.11', DATABASE 'spidertest', PORT 10482,
USER 'spider', PASSWORD 'spider');
Query OK, 0 rows affected (0.233 sec)

With this in place, let’s create a link using Spider from Server1 to Server2. Note that we do not have to use all the fields in the target table.

$ mysql -u root -S /tmp/mariadb1.sock -u root
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 33
Server version: 10.4.8-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> DROP DATABASE IF EXISTS spidertest;
Query OK, 0 rows affected, 1 warning (0.000 sec)

MariaDB [(none)]> CREATE DATABASE spidertest;
Query OK, 1 row affected (0.001 sec)

MariaDB [(none)]> use spidertest;
Database changed
MariaDB [spidertest]> CREATE TABLE customer(
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(200) NOT NULL) ENGINE=Spider
COMMENT = 'wrapper "mysql", srv "Server2"';
Query OK, 0 rows affected (0.132 sec)

OK, no errors, then try it by SELECTing some data:

MariaDB [spidertest]> SELECT * FROM customer;
+----+------------+
| id | name |
+----+------------+
| 1 | John Doe |
| 2 | Bob Smith |
| 3 | Jane Jones |
+----+------------+
3 rows in set (0.006 sec)

Yes, this seems to work, still Marie Curie beats us in the excitement department, but we are getting there.

Uses for a Single Table Link

There are quite a few uses for even a single table link such as the one described above. In some cases it can be used to replace replication where it is just a single table that is to be replicated. Maybe you have a single table that is maintained in one database instance such as a customer table or something and then you want a reference to a customer id from a completely different application running in some other MariaDB Server instance.

One needs to understand that there are drawbacks to this kind of setup also, for example can performance be an issue, for every statement that uses a spider table a connection is made to the connected server. Join between a local table and a Spider table can also be slow, but that depends on a few things. In general Spider is pretty performant and is not a bottleneck per se and there is quite a bit of tuning possible.

A third option is when we have a single table, for example a log table, that I don’t want to mix with any other tables. Then a Spider table might be a way to achieve that.

Using Spider for Status Across Multiple Servers
If you have a MariaDB Cluster running, with a Primary and several Secondaries for example, then there is probably a need to see the status of all the servers in the cluster as one unit. Then Spider comes in handy. Here we are to look at how that can be done. We have two servers, moe and homer, that we want to monitor. Let us assume that moe is the “main” server and that we from that box want to look at the global status on both servers.

On homer we create a view for GLOBAL_STATUS which is based on the GLOBAL STATUS table in INFORMATION_SCHEMA, but adds a column with the name of the server and put that in the mysql database.

CREATE OR REPLACE VIEW global_status_homer
AS SElECT 'homer' host, gs.variable_name, gs.variable_value
FROM information_schema.global_status gs;

And then we do a similar thing on moe.

CREATE OR REPLACE VIEW global_status_moe
AS SElECT 'moe' host, gs.variable_name, gs.variable_value
FROM information_schema.global_status gs;

The next step is to create a link from what we consider the master (moe) to the other server (homer). So on moe we execute this command:

CREATE OR REPLACE SERVER homer FOREIGN DATA WRAPPER mysql
OPTIONS(HOST '192.168.0.11', DATABASE 'mysql', PORT 10482, USER 'spider',
PASSWORD 'spider');

With this we are ready to create a view on moe for viewing the status on homer.

CREATE OR REPLACE TABLE global_status_homer(host varchar(2048), variable_name VARCHAR(64), variable_value VARCHAR(64)) ENGINE=Spider
COMMENT='wrapper "mysql", srv "homer"';

On homer we now have one table and one view, one for each server and with similar schema, that represent the status across the cluster. Let’s combine these two into a single VIEW.

CREATE OR REPLACE VIEW global_status_all AS
SELECT host, variable_name, variable_value
FROM global_status_homer
UNION
SELECT host, variable_name, variable_value
FROM global_status_moe;

And then a VIEW that shows the summary status across the cluster.

CREATE OR REPLACE VIEW global_status_total AS
SELECT variable_name, SUM(variable_value) sum, MAX(variable_value) max,
MIN(variable_value) min
FROM global_status_all
GROUP BY variable_name;

Let’s give this a shot now and see how it works:

MariaDB [mysql]> SELECT * FROM global_status_total WHERE variable_name LIKE 'open%';
+--------------------------+------+------+------+
| variable_name | sum | max | min |
+--------------------------+------+------+------+
| OPENED_FILES | 629 | 477 | 152 |
| OPENED_PLUGIN_LIBRARIES | 1 | 1 | 0 |
| OPENED_TABLES | 112 | 75 | 37 |
| OPENED_TABLE_DEFINITIONS | 125 | 95 | 30 |
| OPENED_VIEWS | 85 | 43 | 42 |
| OPEN_FILES | 132 | 76 | 56 |
| OPEN_STREAMS | 0 | 0 | 0 |
| OPEN_TABLES | 77 | 46 | 31 |
| OPEN_TABLE_DEFINITIONS | 83 | 49 | 34 |
+--------------------------+------+------+------+
9 rows in set (0.029 sec)

I think this is useful, although this is a simple example. With more servers than these two it is a lot more useful.

Combining Multiple Server Tables in the General Case

In the above example, we look at how tables with the same structure with similar content can be combined into one single view, using Spider. We look at a specific example there, but in the general case there are uses for this. Let’s say that you have an application that runs in multiple instances, say some ERP application that is used by multiple departments. If you then want to do reporting across the departments from a reporting server, then you can use this model to access all the instances of this application.

The alternative to using Spider is to use multi-source replication, but that requires a lot more data to be stored in the reporting server, so Spider does have some advantages.

Sharding with Spider

Sharding is the most common use case for Spider. The examples I described above are less common, but still useful. In the cases above I mapped a table on one server to a table on another machine. With Spider used for a partitioned table on one server, each partition exists on a separate server. Except for this, there aren’t many differences in practice, although the partitioning use case allows Spider to do some interesting things and Spider has some extra performance enhancing tricks up its sleeve, for this particular use case.

To show how sharding with Spider works, let’s show a very simple example. We are going to show how to set up sharding with two shards only, just to show the principle. Let us use the customer table where we have a total of three servers, two “data” servers with the data for two shards and one server than is the “Spider” server that hold no actual data for the table we are working with, but instead points to data that resides on the other two servers.

Let’s start at the bottom, which is to create the tables we are going to use in MariaDB Servers Server2 and Server3 (this looks similar to the customer table above, but not exactly so). This is executed of both of these two Servers as user root

CREATE DATABASE IF NOT EXISTS spidertest;
CREATE TABLE spidertest.customer(
id INT NOT NULL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
address VARCHAR(255) NOT NULL);

Now we have two shards created, so let’s then create links so we can reach them from MariaDB Server1, so on Server1 execute this SQL statement, replacing my port, host, username and password with something relevant in your case.

CREATE OR REPLACE SERVER Server2 FOREIGN DATA WRAPPER mysql
OPTIONS(HOST '192.168.0.11', DATABASE 'spidertest', PORT 10482,
USER 'spider', PASSWORD 'spider');

CREATE OR REPLACE SERVER Server3 FOREIGN DATA WRAPPER mysql
OPTIONS(HOST '192.168.0.11', DATABASE 'spidertest', PORT 10483,
USER 'spider', PASSWORD 'spider');

Let us then tie it together with a partitioned table, and note that you can use any reasonable partitioning scheme here, I just picked a simple one to make a point.

CREATE TABLE spidertest.customer(id INT NOT NULL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
address VARCHAR(255) NOT NULL) ENGINE=Spider
COMMENT 'wrapper "mysql", table "customer"'
PARTITION BY RANGE(id) (
PARTITION p0 VALUES LESS THAN (1000) COMMENT = 'srv "Server2"',
PARTITION p1 VALUES LESS THAN (2000) COMMENT = 'srv "Server3"');

We can now on Server1 insert some data and see it appear as appropriate on Server2 and Server3.

INSERT INTO customer VALUES(1, 'Larry', 'Main Street 1');
INSERT INTO customer VALUES(2, 'Ed', 'Main Street 1');
INSERT INTO customer VALUES(3, 'Bob', 'Main Street 1');
INSERT INTO customer VALUES(1001, 'Monty', 'Main Street 1');
INSERT INTO customer VALUES(1002, 'David', 'Main Street 1');
INSERT INTO customer VALUES(1003, 'Allan', 'Main Street 1');

Based on our partitioning set up, the three first row goes onto Server2 and the last three to Server3. I would like to point out before I finish this section that the individual shards, as they are common MariaDB instances, without even the Spider engine, so they can be accessed shard by shard by connecting to, in this case, Server2 or Server3.

Uses for Sharding with Spider

The most obvious advantage with sharding is to increase performance when working with large datasets. But with Spider there are other advantages, most coming from the facts that the individual shards are plain MariaDB Servers that can be accessed individually, so the dataset can be looked at shard by shard, without any performance bottleneck, or you can look at it as a whole, using Spider, at the same time.

Conclusion

This blog has presented some uses for the Spider Storage Engine. All useful in their own right. Most documentation around Spider has focused on the sharding use case, so the first two uses might be surprising to some.

Happy SQL’ing
/Karlsson