DISTINCT removal in aggregate functions

You are viewing an old version of this article. View the current version here.


One can use DISTINCT keyword to de-duplicate the arguments of an aggregate function. For example:


In order to compute this, the optimizer has to de-duplicate the function arguments. This may be computationally expensive.

After fix for MDEV-30660 (available from MariaDB 10.5.25, 10.6.18, 10.11.8, 11.0.6, 11.1.5, 11.2.4, 11.4.2) , the optimizer is able to detect that set of aggregate function arguments is already distinct and so no de-duplication is necessary.

When one can skip de-duplication

The most basic example: if we're doing a select from one table, then the values of Attachment 'primary_key' not found are already distinct:

SELECT aggregate_func(DISTINCT tbl.primary_key, ...) FROM tbl;

If the SELECT has other constant tables, that's also ok, as they will not create duplicates.

The next step: a part of the primary key can be "bound" by the GROUP BY clause.  Consider a query:
SELECT aggregate_func(DISTINCT t1.pk1, ...) FROM t1 GROUP BY t1.pk2;

Suppose the table has PRIMARY KEY(pk1, pk2). GROUP BY pk2 fixes the value of pk2 within each. Then, the values of pk1 must be unique within each group, and de-duplication is not necessary.


EXPLAIN or EXPLAIN FORMAT=JSON do not show any details about how aggregate functions are computed. One has to look at the Optimizer Trace. Search for aggregator_type:

When de-duplication is necessary, it will show:

            "prepare_sum_aggregators": {
              "function": "count(distinct t1.col1)",
              "aggregator_type": "distinct"

When de-duplication is not necessary, it will show:

            "prepare_sum_aggregators": {
              "function": "count(distinct t1.pk1)",
              "aggregator_type": "simple"


Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.