Scaling Aggregates with MariaDB Xpand
Xpand uses concurrent parallelism, pipeline parallelism, and massively parallel processing (MPP) strategies to scale aggregates:
Xpand uses concurrent parallelism to execute the same fragment of a query on different slices in parallel. A different node or CPU core operates on each slice.
Xpand uses pipeline parallelism to execute different fragments of a query on different slices in parallel. A different node or CPU core executes each fragment.
Xpand uses massively parallel processing (MPP) strategies to coordinate query execution and minimize data transmission between nodes.
For some aggregate operations, Xpand performs the aggregation in a distributed manner. At the end, the partial aggregates from each node are combined.
Information provided here applies to:
MariaDB Xpand 5.3
MariaDB Xpand 6.0
MariaDB Xpand 6.1
The following example tables are used to show how Xpand evaluates queries:

CREATE TABLE test.bundler (
   id INT PRIMARY KEY AUTO_INCREMENT,
   name CHAR(60) DEFAULT NULL
);

CREATE TABLE test.donation (
   id INT PRIMARY KEY AUTO_INCREMENT,
   bundler_id INT,
   amount DOUBLE,
   KEY bundler_key (bundler_id, amount)
);

Since the first table has a primary key and no secondary index, Xpand creates one representation. Since the second table has a primary key and a secondary index, Xpand creates two representations.
Scaling aggregates requires aggregating in parallel on multiple nodes. This leaves the final work of combining the partial aggregation from each node to a simple task on a much smaller data size.
For example, consider the following SELECT statement, which aggregates data with SUM(). This query generates a report of the sum of donations by bundler_id. It would also make sense to join with the bundler table to get each bundler's name, but we already understand how joins work, so let's keep the example simple.

sql> SELECT bundler_id, SUM(amount) FROM donation d GROUP BY bundler_id;
The donation table uses the id column as its distribution key. However, this query contains GROUP BY bundler_id, so the results are aggregated on a value that is not part of the distribution key. This means that multiple slices may contain rows with the same bundler_id value, so this aggregation operation can only be partially distributed.
Xpand performs as much of the aggregation as possible in a distributed manner. Each node performs a partial aggregation on its local data. At the end of the operation, the original node combines all of the data and performs the final aggregation.
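The two-phase flow described above can be sketched as a small simulation. This is an illustrative sketch of the general technique, not Xpand internals; the node slices and values are hypothetical.

```python
# Sketch: distributed SUM ... GROUP BY where the grouping key (bundler_id)
# is not the distribution key. Rows are distributed across nodes by
# donation.id, so the same bundler_id can appear on several nodes.
from collections import defaultdict

# Rows as (id, bundler_id, amount), sliced by id across three nodes.
node_slices = [
    [(1, 5, 10.0), (4, 6, 20.0)],   # node 0
    [(2, 5, 30.0), (5, 7, 5.0)],    # node 1
    [(3, 6, 15.0), (6, 5, 2.5)],    # node 2
]

def partial_sum(rows):
    """Each node aggregates only its local slice."""
    acc = defaultdict(float)
    for _id, bundler_id, amount in rows:
        acc[bundler_id] += amount
    return acc

# Phase 1: partial aggregation runs in parallel on every node.
partials = [partial_sum(rows) for rows in node_slices]

# Phase 2: the original node combines the (much smaller) partial results.
final = defaultdict(float)
for p in partials:
    for bundler_id, subtotal in p.items():
        final[bundler_id] += subtotal

print(dict(final))  # bundler_id 5 appears on all three nodes, so it is re-aggregated
```

Only the per-group subtotals cross the network in phase 2, which is why the final combine is cheap relative to the scan.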
The following graphic shows how the data flows:
In some cases, Xpand can perform the entire aggregation operation in a distributed manner. For example, consider the following SELECT DISTINCT statement:

SELECT DISTINCT bundler_id FROM test.donation;

In this case, Xpand can use the secondary index's representation, _bundler_key_donation, which uses bundler_id as its distribution key. This has two implications:

bundler_id values in each slice are unique.
bundler_id values in each slice are sorted.

At the end of query execution, each node sends the original node a list of distinct bundler_id values that is unique and sorted. The original node does not have to perform any further aggregation. It only has to merge the results from each node.
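The final merge step can be sketched as follows. This is an illustrative sketch, not Xpand internals; it assumes (as the distribution key implies) that each bundler_id value hashes to exactly one slice, so the per-node result lists are disjoint as well as sorted.

```python
# Sketch: the original node merges per-node DISTINCT results that are
# already unique and sorted -- no re-aggregation, only a streaming merge.
import heapq

# Hypothetical per-node results: disjoint across nodes, sorted within each.
node_results = [
    [5, 9],      # node 0
    [6, 11],     # node 1
    [7, 14],     # node 2
]

# heapq.merge consumes the input streams lazily, so the original node
# never needs to materialize or sort the combined result.
merged = list(heapq.merge(*node_results))
print(merged)  # a globally sorted DISTINCT list
```

Because the inputs are pre-sorted, the merge is linear in the number of distinct values and holds only one candidate per stream in memory.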
Distributed Aggregates (Xpand)
The donation table is distributed on the donation.id field. If we grouped by this field, rows with the same grouping key would never be on different nodes, so no re-aggregation would be required. In this example, however, we group by bundler_id, so the same value might be on different nodes (let's ignore the fact that we could use the index). Below is how it works in the most general case:

Here, data motion and sequential work are minimized for aggregation. Xpand chooses distributed aggregation when the statistics estimate a good data reduction.
Aggregates with no Re-aggregation
Note that the results of local aggregates need to be re-aggregated on the GTM node only if the values from different nodes may overlap. With complex queries involving joins, this is usually the case. Let's look at a simpler query:
sql> SELECT DISTINCT bundler_id FROM donation;
In this case, there is a secondary key (index) called _bundler_key_donation that is distributed by bundler_id. This has two implications:

bundler_id values on each node are unique
bundler_id values on each node are sorted

To implement this efficiently, Xpand's distributed aggregation only needs to hold one row at a time. If a node reads bundler_id = 5, it stores that value and forwards it to the GTM node, then discards all subsequent rows with that value until it sees a new value, such as 6. On the GTM node, no re-aggregation is required because the values from each node are unique; it merely merges the streams.
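The per-node, one-row-at-a-time deduplication can be sketched as a generator. This is an illustrative sketch, not Xpand internals; the stream values are hypothetical.

```python
# Sketch: because the index representation is sorted by bundler_id, a node
# can deduplicate its local stream while holding only one value of state,
# forwarding each new value to the GTM node as soon as it is first seen.
def local_distinct(sorted_ids):
    """Yield each bundler_id once from a locally sorted stream, O(1) state."""
    current = None
    for value in sorted_ids:
        if value != current:   # new value: forward it to the GTM node
            yield value
            current = value    # discard subsequent duplicates of this value

# A node scanning its slice of the sorted index representation:
stream = [5, 5, 5, 6, 6, 9]
print(list(local_distinct(stream)))  # each value emitted exactly once
```

Since the stream is sorted, duplicates are always adjacent, which is what lets a single remembered value replace a full hash table of seen values.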
Single Node Aggregates (the Competition)
Most mainstream databases do not use massively parallel processing (MPP), so all aggregation happens on the single node that receives the query. This is how Oracle, NuoDB, and MongoDB work. This approach moves much more data, cannot use the resources of more than one machine, and does not scale well.