Scaling Aggregates with MariaDB Xpand

Overview

Xpand uses concurrent parallelism, pipeline parallelism, and massively parallel processing (MPP) strategies to scale aggregates:

  • Xpand uses concurrent parallelism to execute the same fragment of a query on different slices in parallel. A different node or CPU core operates on each slice.

  • Xpand uses pipeline parallelism to execute different fragments of a query on different slices in parallel. A different node or CPU core executes each fragment.

  • Xpand uses massively parallel processing (MPP) strategies to coordinate query execution and minimize data transmission between nodes.

  • For some aggregate operations, Xpand performs the aggregation in a distributed manner. At the end, the partial aggregates from each node are combined.
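Concurrent parallelism, in particular, can be sketched in a few lines of Python. This is a toy model, not Xpand's implementation: the slice contents are invented, and a thread pool stands in for separate nodes and cores. The same query fragment (here, a filter) runs on every slice at once:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical slices of a table; in Xpand, each slice would live on
# a different node or CPU core.
slices = [
    [(1, 120.0), (2, 40.0)],
    [(3, 75.0), (4, 10.0)],
    [(5, 300.0)],
]

def fragment(rows):
    # The same query fragment (a filter: amount > 50) runs on every
    # slice concurrently.
    return [row for row in rows if row[1] > 50]

# Execute the fragment on all slices in parallel and collect results.
with ThreadPoolExecutor() as pool:
    results = [row for part in pool.map(fragment, slices) for row in part]

print(results)  # [(1, 120.0), (3, 75.0), (5, 300.0)]
```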

Compatibility

Information provided here applies to:

  • MariaDB Xpand 5.3

  • MariaDB Xpand 6.0

  • MariaDB Xpand 6.1

Example Tables

The example tables below are used throughout the next sections to show how Xpand evaluates queries:

CREATE TABLE test.bundler (
   id INT PRIMARY KEY AUTO_INCREMENT,
   name CHAR(60) DEFAULT NULL
);

CREATE TABLE test.donation (
   id INT PRIMARY KEY AUTO_INCREMENT,
   bundler_id INT,
   amount DOUBLE,
   KEY bundler_key (bundler_id, amount)
);

Since the first table has a primary key and no secondary index, Xpand creates 1 representation. Since the second table has a primary key and a secondary index, Xpand creates 2 representations:

Representation          Index         Table           Distribution Key
----------------------  ------------  --------------  ----------------
_id_primary_bundler     Primary Key   test.bundler    id
_id_primary_donation    Primary Key   test.donation   id
_bundler_key_donation   Key           test.donation   bundler_id
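As a rough mental model of how a distribution key places rows, consider this sketch. The slice count and the modulo-based routing are invented for illustration; a real system uses a stable distribution hash:

```python
NUM_SLICES = 3  # hypothetical slice count

def slice_for(distribution_key):
    # Route by the key modulo the slice count; this is a stand-in
    # for a real, stable distribution hash.
    return distribution_key % NUM_SLICES

# Invented rows of test.donation: (id, bundler_id, amount)
rows = [(1, 5, 100.0), (2, 5, 50.0), (3, 7, 25.0)]

# The primary representation distributes on id; the secondary
# representation (_bundler_key_donation) distributes on bundler_id.
by_primary = {r[0]: slice_for(r[0]) for r in rows}
by_secondary = {r[0]: slice_for(r[1]) for r in rows}

print(by_primary)    # {1: 1, 2: 2, 3: 0} -- the rows spread out
print(by_secondary)  # rows 1 and 2 (both bundler_id = 5) share a slice
```

Note that the two rows with bundler_id = 5 land on different slices under the primary representation but on the same slice under the secondary one. This is why the choice of representation matters for aggregation, as shown in the sections below.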

Scaling Aggregates

Scaling aggregates requires aggregating in parallel on multiple nodes. The final step, combining the partial aggregates from each node, then becomes a simple task over a much smaller data set.

Example Query

For example, consider the following SELECT statement, which uses SUM() to generate a report of total donations by bundler_id. In a real report, it would also make sense to join with the bundler table to get each bundler's name, but joins are covered elsewhere, so this example stays simple.

sql> SELECT bundler_id, SUM(amount) FROM donation GROUP BY bundler_id;

The donation table uses the id column as the distribution key. However, this query contains GROUP BY bundler_id, so results are aggregated on a value that is not part of the distribution key. This means that multiple slices may contain rows with the same bundler_id value, so the aggregation can only be partially distributed.

Xpand performs as much of the aggregation as possible in a distributed manner. Each node performs a partial aggregation on its local data. At the end of the operation, the original node combines all of the data and performs the final aggregation.
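The two-phase flow described above can be sketched as follows. The slice contents are invented, and the slices hold rows distributed by id, so the same bundler_id appears on multiple slices:

```python
from collections import Counter

# Invented slices of test.donation, distributed by id:
# (id, bundler_id, amount) -- bundler_id values overlap across slices.
slices = [
    [(1, 5, 100.0), (2, 7, 25.0)],
    [(3, 5, 50.0), (4, 9, 10.0)],
    [(5, 7, 75.0)],
]

def partial_sum(rows):
    # Phase 1: each node computes a partial SUM(amount) grouped by
    # bundler_id over its local slice.
    partial = Counter()
    for _id, bundler_id, amount in rows:
        partial[bundler_id] += amount
    return partial

partials = [partial_sum(s) for s in slices]

# Phase 2: the original node combines the partial sums. Counter
# addition merges the keys and adds values for overlapping keys.
final = sum(partials, Counter())

print(dict(final))  # {5: 150.0, 7: 100.0, 9: 10.0}
```

The key point is that only the small per-node partial results cross the network, not the raw rows.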

The following graphic shows how the data flows:

Distributed aggregates with Xpand

In some cases, Xpand can perform the entire aggregation operation in a distributed manner.

For example, consider the following SELECT DISTINCT statement:

SELECT DISTINCT bundler_id FROM test.donation;

In this case, Xpand can use the secondary index's representation called _bundler_key_donation, which uses bundler_id as a distribution key. This has two implications:

  • bundler_id values in each slice are unique.

  • bundler_id values in each slice are sorted.

At the end of query execution, each node sends the original node its sorted list of distinct bundler_id values. The original node does not have to perform any additional aggregation; it only has to merge the sorted results from each node.
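Because each node returns a sorted, duplicate-free stream, the final step is a streaming merge. A sketch using Python's heapq.merge, with invented per-node streams:

```python
import heapq

# Each node's distinct, sorted bundler_id values. Because the
# secondary representation is distributed on bundler_id, no value
# appears in more than one stream.
node_streams = [
    [2, 9, 14],
    [5, 11],
    [1, 7, 20],
]

# heapq.merge lazily interleaves already-sorted inputs, so the
# original node never re-aggregates or materializes the full set.
merged = list(heapq.merge(*node_streams))

print(merged)  # [1, 2, 5, 7, 9, 11, 14, 20]
```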

Distributed Aggregates (Xpand)

Here, the donation table is distributed on the donation.id column. If the query grouped by that column, rows on different nodes could never share a grouping key, so no re-aggregation would be needed. This example instead shows the general case, where the same grouping value may appear on different nodes (ignoring, for the moment, that the secondary index could be used). The flow in the most general case is as follows:

Here, we see that data motion and sequential work are minimized during aggregation. Xpand chooses distributed aggregation when statistics estimate that the data reduction will be large enough to be worthwhile.

Aggregates with no Re-aggregation

Note that re-aggregating the local aggregates on the GTM node is only necessary when values from different nodes may overlap. With complex queries involving joins, this is usually the case. Consider a simpler query:

sql> SELECT DISTINCT bundler_id FROM donation;

In this case, there is a secondary index representation called _bundler_key_donation, which is distributed by bundler_id. This has two implications:

  • bundler_id values on each node are unique

  • bundler_id values on each node are sorted

To implement this efficiently, Xpand's distributed aggregation only needs to hold one row at a time. If a node reads bundler_id = 5, it stores that value and forwards it to the GTM node, then discards all subsequent rows with the same value until it sees a new value, such as 6. On the GTM node, no re-aggregation is required because the values from each node are unique; it merely merges the sorted streams.
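The one-row-at-a-time DISTINCT on a sorted slice can be sketched as a small generator. The slice values are invented:

```python
def distinct_sorted(values):
    """Yield each value once from a sorted stream, holding only the
    last-seen value as state (a single row of aggregation state)."""
    last = object()  # sentinel that matches nothing
    for v in values:
        if v != last:
            yield v   # forward the new value downstream
            last = v  # then discard subsequent duplicates

# Sorted bundler_id values on one hypothetical slice.
slice_values = [5, 5, 5, 6, 6, 9]
print(list(distinct_sorted(slice_values)))  # [5, 6, 9]
```

Because the input is sorted, duplicates are always adjacent, so one remembered value suffices; no hash table or sort is needed on the node.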

Single Node Aggregates (the Competition)

Most traditional databases do not use massively parallel processing, so all aggregation happens on the single node that receives the query. This is how Oracle, NuoDB, and MongoDB work. That approach moves far more data, cannot use the resources of more than one machine, and does not scale well.