# DISTINCT removal in aggregate functions

## Basics

One can use `DISTINCT` keyword to de-duplicate the arguments of an aggregate function. For example:

```sql
SELECT COUNT(DISTINCT col1) FROM tbl1;
```

In order to compute this, MariaDB has to collect the values of `col1` and remove the duplicates. This may be computationally expensive.

After the fix for [MDEV-30660](https://jira.mariadb.org/browse/MDEV-30660) (available from [MariaDB 10.5.25](/docs/release-notes/community-server/old-releases/10.5/10.5.25.md), [MariaDB 10.6.18](/docs/release-notes/community-server/10.6/10.6.18.md), [MariaDB 10.11.8](/docs/release-notes/community-server/10.11/10.11.8.md), [MariaDB 11.0.6](/docs/release-notes/community-server/old-releases/11.0/11.0.6.md), [MariaDB 11.1.5](/docs/release-notes/community-server/old-releases/11.1/11.1.5.md), [MariaDB 11.2.4](/docs/release-notes/community-server/old-releases/11.2/11.2.4.md), [MariaDB 11.4.2](/docs/release-notes/community-server/11.4/11.4.2.md)), the optimizer can detect certain\
cases when argument of aggregate function will not have duplicates and so de-duplication can be skipped.

## When one can skip de-duplication

A basic example: if we're doing a select from one table, then the values of `primary_key` are already distinct:

```sql
SELECT aggregate_func(DISTINCT tbl.primary_key, ...) FROM tbl;
```

If the SELECT has other constant tables, that's also ok, as they will not create duplicates.

The next step: a part of the primary key can be "bound" by the GROUP BY clause. Consider a query:

```sql
SELECT aggregate_func(DISTINCT t1.pk1, ...) FROM t1 GROUP BY t1.pk2;
```

Suppose the table has `PRIMARY KEY(pk1, pk2)`. Grouping by `pk2` fixes the value of `pk2` within each group. Then, the values of `pk1` must be unique within each group, and de-duplication is not necessary.

## Observability

`EXPLAIN` or `EXPLAIN FORMAT=JSON` do not show any details about how aggregate functions are computed. One has to look at the Optimizer Trace. Search for `aggregator_type`:

When de-duplication is necessary, it will show:

```json
{
            "prepare_sum_aggregators": {
              "function": "count(distinct t1.col1)",
              "aggregator_type": "distinct"
            }
          }
```

When de-duplication is not necessary, it will show:

```json
{
            "prepare_sum_aggregators": {
              "function": "count(distinct t1.pk1)",
              "aggregator_type": "simple"
            }
          }
```

<sub>*This page is licensed: CC BY-SA / Gnu FDL*</sub>

{% @marketo/form formId="4316" %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/query-optimizations-distinct-removal-in-aggregate-functions.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
