Spider Storage Engine Core Concepts

A typical Spider deployment has a shared-nothing clustered architecture. The system runs on inexpensive commodity hardware and has few specific hardware or software requirements. It consists of a set of computers running one or more MariaDB processes, known as nodes.

The nodes that store the data are designated as Backend Nodes, and can be any MariaDB, MySQL, or Oracle server instance, using any storage engine available on the backend.

The Spider Proxy Nodes are instances running at least MariaDB 10. Spider Proxy Nodes are used to declare, per table, the attachment to the backend nodes. In addition, Spider Proxy Nodes can be set up so that tables are split and mirrored across multiple Backend Nodes, as in the sketch below.
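
A minimal sketch of this per-table attachment, assuming two backend servers named backend1 and backend2 that each hold a physical orders table in a sales database; all names, hosts and credentials here are illustrative, not taken from the text above.

    -- On the Spider Proxy Node: declare the backend connections (illustrative hosts/credentials).
    CREATE SERVER backend1 FOREIGN DATA WRAPPER mysql
      OPTIONS (HOST '192.168.1.11', DATABASE 'sales', USER 'spider', PASSWORD 'secret', PORT 3306);
    CREATE SERVER backend2 FOREIGN DATA WRAPPER mysql
      OPTIONS (HOST '192.168.1.12', DATABASE 'sales', USER 'spider', PASSWORD 'secret', PORT 3306);

    -- Spider table whose partitions are attached to different Backend Nodes;
    -- the physical table 'orders' must already exist on each backend.
    CREATE TABLE orders (
      id INT NOT NULL,
      customer_id INT,
      amount DECIMAL(10,2),
      PRIMARY KEY (id)
    ) ENGINE=SPIDER
      COMMENT='wrapper "mysql", table "orders"'
      PARTITION BY KEY (id) (
        PARTITION pt1 COMMENT='srv "backend1"',
        PARTITION pt2 COMMENT='srv "backend2"'
      );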

Spider Common Usage


In the default high availability setup, Spider Nodes produce SQL errors when a backend server is not responding. Per-table monitoring can be set up to preserve availability in case of unresponsive backends, using monitoring_bg_kind=1 or monitoring_bg_kind=2. The Monitoring Spider Nodes are inter-connected through the system table mysql.link_mon_servers in order to manage network partitioning, better known as split brain. An odd number of Spider Monitor Nodes should be set up so that a consensus can be reached by majority: either a single, separate, shared Monitoring Node instance or a minimum set of 3 Spider Nodes (a configuration sketch follows the deprecation note below). More information can be found here.

MariaDB starting with 10.7.5

Spider's high availability feature has been deprecated (MDEV-28479) and will be removed. Please use other high availability solutions such as replication or Galera Cluster instead.
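
For reference only, given the deprecation above, the following is a minimal sketch of the per-table monitoring parameters described earlier. It reuses the two illustrative backend servers from the first example and assumes the monitoring parameters are passed through the table COMMENT; the parameter values are illustrative.

    -- Spider table mirrored to two Backend Nodes (space-separated srv list), with
    -- background monitoring enabled via monitoring_bg_kind. The interval is in
    -- microseconds and is an illustrative value.
    CREATE TABLE orders_ha (
      id INT NOT NULL,
      PRIMARY KEY (id)
    ) ENGINE=SPIDER
      COMMENT='wrapper "mysql", table "orders", srv "backend1 backend2",
               monitoring_bg_kind "2", monitoring_bg_interval "10000000"';

    -- The Spider Monitor Nodes that vote on backend availability are registered in
    -- the system table mysql.link_mon_servers on each monitoring instance.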

Spider Storage Engine Federation

Spider is a pluggable storage engine, acting as a proxy between the optimizer and the remote backends. When the optimizer issues multiple calls to the storage engine, Spider enforces consistency by using the two-phase commit protocol towards the backends and by creating transactions on the backends, so that a single SQL statement remains atomic. Preserving atomicity during execution applies at multiple levels of the architecture: for the regular optimizer plan it refers to multiple split reads, and for concurrent partition scans it refers to semi-transactions.
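
As a rough illustration, assuming the orders table from the first sketch and the spider_internal_xa system variable (an assumption here; it controls whether Spider itself drives XA two-phase commit towards the backends), a single statement touching both backends stays atomic:

    -- Assumption: let Spider manage XA / two-phase commit towards the backends.
    SET SESSION spider_internal_xa = ON;

    START TRANSACTION;
    -- The matching rows live on different Backend Nodes (pt1 and pt2), but Spider
    -- opens a transaction on each backend and commits them through two-phase
    -- commit, so the single UPDATE remains atomic.
    UPDATE orders SET amount = amount * 1.10 WHERE customer_id = 42;
    COMMIT;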

Costly queries can be executed more efficiently when part of the execution plan can be fully pushed down to each backend and the results merged afterwards. Spider enables such execution with direct execution shortcuts.
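
One such shortcut, sketched here under the assumption that the spider_direct_aggregate system variable is available, pushes simple aggregates down to the backends:

    -- Assumption: compute simple aggregates on each Backend Node instead of
    -- streaming all rows back to the Spider Proxy Node.
    SET SESSION spider_direct_aggregate = 1;

    -- Each backend returns its partial COUNT/MAX; Spider only merges the partials.
    SELECT COUNT(*), MAX(amount) FROM orders;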


Spider Threading Model

Spider uses a per-partition and per-table threading model to access the remote backend nodes concurrently. For in-memory workloads, this property can be used to define multiple partitions on a single remote backend node, so that the concurrency better matches the CPUs available on the hardware, as sketched below.
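
A minimal sketch of this pattern, reusing the illustrative backend1 server and assuming four physical tables orders_p0 to orders_p3 already exist on it:

    -- Four partitions attached to the same Backend Node: Spider can scan them
    -- concurrently, roughly one connection per partition, to use more CPUs.
    CREATE TABLE orders_hot (
      id INT NOT NULL,
      PRIMARY KEY (id)
    ) ENGINE=SPIDER
      COMMENT='wrapper "mysql", srv "backend1"'
      PARTITION BY KEY (id) (
        PARTITION p0 COMMENT='table "orders_p0"',
        PARTITION p1 COMMENT='table "orders_p1"',
        PARTITION p2 COMMENT='table "orders_p2"',
        PARTITION p3 COMMENT='table "orders_p3"'
      );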


Spider maintains an internal dictionary of table and index statistics, refreshed by separate background threads. By default the statistics are pulled on a time-interval basis; they are referred to as crd for cardinality and sts for table status.
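
A short tuning sketch, assuming the spider_crd_interval and spider_sts_interval system variables (values in seconds, chosen for illustration):

    -- crd (cardinality) and sts (table status) are refreshed at most once per
    -- interval; shorter intervals give fresher statistics at the cost of more
    -- round trips to the backends.
    SET GLOBAL spider_crd_interval = 60;
    SET GLOBAL spider_sts_interval = 60;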

Spider Memory Model

Spider stores result sets in memory, but with spider_quick_mode=3 result sets larger than quick_table_size are stored in internal temporary tables.
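
A minimal sketch of enabling this behaviour for the current session (quick_table_size itself is the threshold mentioned above):

    -- Result sets below the quick_table_size threshold stay in memory; larger ones
    -- spill into internal temporary tables instead.
    SET SESSION spider_quick_mode = 3;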
