Scaling Aggregates with MariaDB Xpand
This page is part of MariaDB's Documentation.
Overview
Xpand uses concurrent parallelism, pipeline parallelism, and massively parallel processing (MPP) strategies to scale aggregates:
Xpand uses concurrent parallelism to execute the same fragment of a query on different slices in parallel. A different node or CPU core operates on each slice.
Xpand uses pipeline parallelism to execute different fragments of a query on different slices in parallel. A different node or CPU core executes each fragment.
Xpand uses massively parallel processing (MPP) strategies to coordinate query execution and minimize data transmission between nodes.
For some aggregate operations, Xpand performs the aggregation in a distributed manner. At the end, the partial aggregates from each node are combined.
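The two-phase pattern described above can be sketched in Python. This is a hypothetical illustration of the general technique, not Xpand's implementation: each "node" computes a partial SUM over its local slice, and a coordinator combines the much smaller partial results.

```python
# Illustrative sketch of two-phase distributed aggregation (assumed
# data and function names; not Xpand's actual internals).

def local_partial_sum(rows):
    """Phase 1: per-node partial aggregation, runs in parallel."""
    partial = {}
    for group_key, amount in rows:
        partial[group_key] = partial.get(group_key, 0.0) + amount
    return partial

def combine_partials(partials):
    """Phase 2: the coordinator merges the small partial results."""
    final = {}
    for partial in partials:
        for group_key, subtotal in partial.items():
            final[group_key] = final.get(group_key, 0.0) + subtotal
    return final

# Three slices, each held by a different node:
node_slices = [
    [(1, 10.0), (2, 5.0)],
    [(1, 7.5)],
    [(2, 2.5), (3, 1.0)],
]
partials = [local_partial_sum(s) for s in node_slices]
print(combine_partials(partials))  # {1: 17.5, 2: 7.5, 3: 1.0}
```

Only the per-group subtotals cross the network, not the individual rows, which is why the final combine step is cheap.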
Compatibility
Information provided here applies to:
MariaDB Xpand 5.3
MariaDB Xpand 6.0
MariaDB Xpand 6.1
Example Tables
The following example tables are used to show how Xpand evaluates queries in the sections below:
CREATE TABLE test.bundler (
id INT PRIMARY KEY AUTO_INCREMENT,
name CHAR(60) DEFAULT NULL
);
CREATE TABLE test.donation (
id INT PRIMARY KEY AUTO_INCREMENT,
bundler_id INT,
amount DOUBLE,
KEY bundler_key (bundler_id, amount)
);
Since the first table has a primary key and no secondary index, Xpand creates 1 representation. Since the second table has a primary key and a secondary index, Xpand creates 2 representations:
| Representation | Index | Table | Distribution Key |
|---|---|---|---|
| | Primary Key | test.bundler | id |
| | Primary Key | test.donation | id |
| _bundler_key_donation | Key (bundler_key) | test.donation | bundler_id |
Scaling Aggregates
Scaling aggregates requires aggregating in parallel on multiple nodes. The final step, combining the partial aggregates from each node, then becomes a simple task over a much smaller data set.
Example Query
For example, consider the following SELECT statement that aggregates data with SUM(). This query generates a report of the total donations per bundler_id. In practice, it would also make sense to join with the bundler table to retrieve each bundler's name, but joins are covered elsewhere, so this example is kept simple.

sql> SELECT bundler_id, SUM(amount) FROM donation d GROUP BY bundler_id;
The donation table uses the id column as its distribution key. However, this query contains GROUP BY bundler_id, so the results are aggregated on a value that is not part of the distribution key. This means that multiple slices may contain rows with the same bundler_id value, so the aggregation can only be partially distributed.
Xpand performs as much of the aggregation as possible in a distributed manner. Each node performs a partial aggregation on its local data. At the end of the operation, the original node combines all of the data and performs the final aggregation.
The following graphic shows how the data flows:
In some cases, Xpand can perform the entire aggregation operation in a distributed manner.
For example, consider the following SELECT DISTINCT statement:

SELECT DISTINCT bundler_id FROM test.donation;

In this case, Xpand can use the secondary index's representation, called _bundler_key_donation, which uses bundler_id as its distribution key. This has two implications:

The bundler_id values in each slice are unique.

The bundler_id values in each slice are sorted.

At the end of the query execution, each node sends the original node a list of distinct bundler_id values that are unique and sorted. The original node does not have to perform any further aggregation. It only has to merge the results from each node.
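The merge step can be sketched in Python. This is a hypothetical illustration, not Xpand's implementation: because bundler_id is the distribution key, each node holds a disjoint, sorted set of values, so the original node only interleaves already-sorted streams.

```python
# Illustrative sketch: merging disjoint, sorted per-node result
# streams (assumed example data; not Xpand's actual internals).
import heapq

# Each node's distinct bundler_id values, already unique and sorted:
node_streams = [
    [1, 4, 7],
    [2, 5, 8],
    [3, 6, 9],
]

# heapq.merge lazily interleaves already-sorted inputs; no further
# deduplication or aggregation is needed on the original node.
result = list(heapq.merge(*node_streams))
print(result)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```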
Distributed Aggregates (Xpand)
Here, the donation table is distributed on the donation.id field. If the query grouped by that field, rows from different nodes would never share the same grouping key, so no re-aggregation would be required. This example instead shows a case where the same value might appear on different nodes (ignoring, for the moment, that the index could be used). In the most general case, it works as follows:
Here, we see that the data motion and sequential work for aggregation are minimized. Xpand chooses distributed aggregation when the statistics estimate that the data reduction will be significant.
Aggregates with no Re-aggregation
Note that the results of the local aggregates need to be re-aggregated on the GTM node only if the values from different nodes may overlap. With complex queries involving joins, this is usually the case. Consider a simpler query:
sql> SELECT DISTINCT bundler_id FROM donation;
In this case, there is a secondary key (or index) called _bundler_key_donation that is distributed by bundler_id. This has two implications:

The bundler_id values on each node are unique.

The bundler_id values on each node are sorted.

To implement this efficiently, Xpand only needs the distributed aggregation to hold one row at a time. If a node reads bundler_id = 5, it stores that value and forwards it to the GTM node, then discards all subsequent rows with the value 5 until it sees a new value, such as 6. On the GTM node, no re-aggregation is required, since the values from each node are unique; it merely merges the streams.
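The one-row-at-a-time step can be sketched in Python. This is a hypothetical illustration, not Xpand's implementation: because the slice data is sorted by bundler_id, a node can emit each distinct value once while keeping only the last value seen.

```python
# Illustrative sketch of streaming DISTINCT over sorted input,
# holding one row of state (assumed data; not Xpand's internals).

def distinct_stream(sorted_values):
    """Yield each distinct value from a sorted input exactly once."""
    last = object()  # sentinel that matches no real value
    for value in sorted_values:
        if value != last:
            yield value       # forward the new value downstream
            last = value      # then discard duplicates of it

# A sorted slice containing duplicate bundler_id values:
print(list(distinct_stream([5, 5, 5, 6, 6, 9])))  # [5, 6, 9]
```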
Single Node Aggregates (the Competition)
Most databases do not support massively parallel processing (MPP), so all aggregation happens on the single node that receives the query. This is how Oracle, NuoDB, and MongoDB work. That approach moves far more data, cannot use the resources of more than one machine, and does not scale well.