Evaluation Model for MariaDB Xpand

Overview

This section describes how a query is evaluated in the database. In Xpand, data is sliced across nodes and the query is sent to the data. This is one of the fundamental principles that allows the database to scale almost linearly as more nodes are added.

For concepts about how the data is distributed, refer to "Data Distribution with MariaDB Xpand"; this page assumes an understanding of those concepts. The primary concepts to remember are that tables and indexes are split across nodes, and that each table and index has its own distribution, which allows us to know, given the lead column(s), precisely where the data lives.
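
As a rough illustration, the following Python sketch models a table and one of its indexes, each with its own distribution: hashing the lead column(s) of a key identifies the slice, and therefore the node, that owns the matching rows. The table names, slice counts, and hash scheme are assumptions made for this sketch, not Xpand's actual distribution algorithm.

    # Simplified model of independent distributions for a table and an index.
    # The layout and hashing below are illustrative assumptions only.
    import hashlib

    def slice_for(lead_values, num_slices):
        """Map the lead column value(s) of a key to a slice number."""
        digest = hashlib.md5(repr(lead_values).encode()).hexdigest()
        return int(digest, 16) % num_slices

    # Hypothetical placement: each distribution maps its slices to nodes
    # independently of the other.
    orders_table = {"num_slices": 6,
                    "slice_to_node": {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3}}
    orders_by_customer = {"num_slices": 3,
                          "slice_to_node": {0: 3, 1: 1, 2: 2}}

    def node_for(distribution, lead_values):
        s = slice_for(lead_values, distribution["num_slices"])
        return distribution["slice_to_node"][s]

    # A lookup by primary key is routed to the node owning that table slice;
    # a lookup by customer_id is routed to the node owning that index slice.
    print(node_for(orders_table, (1001,)))        # WHERE order_id = 1001
    print(node_for(orders_by_customer, (42,)))    # WHERE customer_id = 42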

Discussion

Let's go over the concepts involved in distributed evaluation and parallel processing of queries.

Query Lifetime

User queries are distributed across nodes in Xpand. Here is how a query is processed:

  1. The query is associated with a session on the node where it arrives, and that node orchestrates the concurrency control for it.

  2. After parsing, the query goes through the query optimizer and planner where an optimal plan is chosen for it based on the statistics.

  3. The plan is then compiled: the compiler breaks it into smaller query fragments and applies code optimizations.

  4. The compiled query fragments are then executed in a distributed fashion across nodes, and the results are returned to the node where the query started and then back to the user (see the sketch below).
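
The sketch below is a much-simplified model of this flow, using placeholder parse/plan/compile helpers and in-process function calls in place of real nodes; it is only meant to show the shape of the lifecycle (the session node plans and compiles, fragments run where the data lives, and results come back to the session node), not Xpand internals.

    # Conceptual model of the query lifetime. Every name here is an
    # illustrative placeholder, not an Xpand API.

    def parse(sql):
        # Steps 1-2: the session node parses the query.
        return {"sql": sql}

    def plan(ast):
        # Step 2: the optimizer/planner chooses a plan based on statistics.
        return {"plan_for": ast["sql"]}

    def compile_to_fragments(query_plan):
        # Step 3: the compiler breaks the plan into smaller fragments.
        return [{"fragment": i, "plan": query_plan} for i in range(3)]

    def run_on_owning_node(fragment):
        # Step 4: each fragment runs on the node that owns the relevant slice.
        return f"rows from fragment {fragment['fragment']}"

    def evaluate(sql):
        fragments = compile_to_fragments(plan(parse(sql)))
        partials = [run_on_owning_node(f) for f in fragments]
        return partials  # gathered on the session node, then sent to the user

    print(evaluate("SELECT 1"))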

Using Pipeline and Concurrent Parallelism

Xpand uses both concurrent parallelism and pipeline parallelism for massively parallel computation of your queries. Simpler queries, such as point selects and inserts, usually use only a single core on one node for their evaluation. As more nodes are added, more cores and memory become available, allowing Xpand to handle more simple queries concurrently.
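
As a back-of-the-envelope sketch of that scaling, the toy Python snippet below treats each simple query as a task that occupies one core for its whole evaluation, so the number of such queries that can run at once grows with the cluster's total core count. The node and core counts are made-up values for illustration.

    # Toy model: each simple query (point select or insert) uses one core,
    # so concurrent capacity grows as nodes (and their cores) are added.
    from concurrent.futures import ThreadPoolExecutor

    NODES = 3
    CORES_PER_NODE = 8
    total_cores = NODES * CORES_PER_NODE  # adding nodes adds concurrent capacity

    def simple_query(i):
        # Stands in for a point select or insert evaluated on a single core.
        return f"result {i}"

    with ThreadPoolExecutor(max_workers=total_cores) as pool:
        results = list(pool.map(simple_query, range(1000)))

    print(f"{len(results)} simple queries handled, up to {total_cores} at a time")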

More complex analytic queries, such as those involving joins and aggregates, take advantage of both pipeline and concurrent parallelism, using cores from multiple nodes and multiple cores on a single node to evaluate queries quickly. Take a look at the following examples of how queries scale to better understand the parallelism.

Concurrent parallelism occurs between slices, each of which may be on one or more nodes, so different slices can be processed at the same time.

Pipeline parallelism occurs among the rows of a single query. Each row in an analytic query might involve three hops for a two-way join: for example, a hop to read the row from the first table's slice, a hop to the slice holding the matching rows of the second table, and a hop back to the originating node. The latency added to any one row is the latency of those three hops, so set-up time is negligible. The other possible limit is bandwidth; as previously mentioned, Xpand has an independent distribution for each table and index, so rows are forwarded only to the correct nodes.
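
To make the pipeline idea concrete, the Python sketch below streams rows through the three hypothetical hops of a two-way join using generators: a scan of the first table, a forward to the slice owning the matching rows of the second table, and a return to the originating node. Many rows can be in flight at once, and each row only pays the latency of its own hops. Table names and data are invented for this illustration.

    # Toy pipeline for a two-way join: rows stream through three "hops"
    # instead of each stage materializing all rows before the next starts.
    # Names and data are illustrative only.

    def scan_t1():
        for pk in range(5):
            yield {"t1.id": pk, "t1.val": pk * 10}        # hop 1: read a t1 row

    def join_t2(rows):
        t2 = {pk: {"t2.note": f"note-{pk}"} for pk in range(5)}
        for row in rows:
            # hop 2: the row is forwarded to the slice owning the matching t2 key
            yield {**row, **t2[row["t1.id"]]}

    def return_to_origin(rows):
        for row in rows:
            yield row                                      # hop 3: back to the session node

    for joined_row in return_to_origin(join_t2(scan_t1())):
        print(joined_row)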

How Do You Scale Evaluation in a Distributed System?

This table summarizes the different elements of a scalable distributed database, as well as what it takes to build one. Each element is considered along three dimensions: Throughput, Speedup, and Concurrent Writes.

Logical Requirement

  • Throughput: Work increases linearly with the number of queries.

  • Speedup: Analytic query speed increases linearly with the number of nodes.

  • Concurrent Writes: Concurrency overhead does not increase with the number of writes.

System Characteristic

  • Throughput: For evaluation of one query, the number of messages (and work) does not increase with the number of nodes.

  • Speedup: For evaluation of one query:

    • Multiple nodes should process it in parallel.

    • Multiple cores within a node should process it in parallel.

    • Every new node should add more memory to hold an ever-larger active working set.

  • Concurrent Writes: For evaluation of one query:

    • Only one node owns each part of the data and writes to it.

    • The system is able to take fine-grained, row-level locks.

    • Read and write interference is removed through MVCC.

Why Systems Fail

  • Throughput: Many databases use a sharding-like approach to slice their data. They co-locate indexes with the base table, so index lookups become broadcasts: nodes read the broadcast messages to look up information, fetch data from disk, and then find they have no matching data, which is wasted work.

  • Speedup: Many databases do not use Massively Parallel Processing. They do not use more than one node to evaluate a query; they might push down filters when fetching data from other nodes, but that is minimal, and most work is coordinated or completed by a single node.

  • Concurrent Writes: Many databases use a shared-data architecture, where multiple nodes pull data from the same storage node, multiple nodes need to write to the same data (causing ping-pong of hot data), and database nodes take coarse-grained (database-level) locks.

Examples of Limited Design

  • Throughput: Co-location of indexes leading to broadcasts (MySQL Server, MongoDB, SQL sharding)

  • Speedup: Single-node query processing leading to limited query speedup (Oracle RAC, NuoDB, MongoDB)

  • Concurrent Writes: Shared data leading to a ping-pong effect on hot data (Oracle RAC, NuoDB)

  • Concurrent Writes: Coarse-grained locking leading to limited write concurrency (MongoDB)

Why Xpand Shines

  • Throughput: Xpand has an independent distribution for tables and indexes, so queries involving any key use unicast messages (see the sketch after this table). This results in near-linear scaling of queries.

  • Speedup: Xpand uses Massively Parallel Processing. Multiple cores within and across nodes work in parallel to speed up analytic queries.

  • Concurrent Writes: Xpand has a shared-nothing architecture, fine-grained row-level locking, and MVCC to remove read-write interference.
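
To make the broadcast-versus-unicast contrast concrete, the toy Python comparison below counts the messages needed for a lookup on a secondary key under two simplified layouts: one where the index is co-located with the base table's shards (so every node must be asked), and one where the index has its own distribution keyed on the lookup column (so exactly one node is asked). The layouts and node counts are assumptions for illustration only.

    # Message counts for a secondary-key lookup under two toy layouts.
    # Node counts and layouts are illustrative assumptions, not Xpand internals.

    def messages_colocated_index(num_nodes):
        # The index is sharded the same way as the base table (by primary key),
        # so a secondary-key lookup cannot be routed: it is broadcast to all
        # nodes, and most of them do wasted work finding no matching rows.
        return num_nodes

    def messages_independent_index(num_nodes):
        # The index has its own distribution keyed on the secondary column,
        # so the lookup is a single unicast to the node owning the matching slice.
        return 1

    for nodes in (4, 8, 16):
        print(f"{nodes} nodes: broadcast = {messages_colocated_index(nodes)} messages, "
              f"unicast = {messages_independent_index(nodes)} message")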

Conclusion

For both simple and complex queries, Xpand is able to scale almost linearly due to our query evaluation approach. We have also seen how most of our competitors hit various bottlenecks due to the design choices they have made.