# Semi-join Subquery Optimizations

MariaDB has a set of optimizations specifically targeted at *semi-join subqueries*.

## What is a Semi-Join Subquery

A semi-join subquery has a form of

```sql
SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...
```

that is, the subquery is an IN-subquery and it is located in the WHERE clause. The most important part here is

*with semi-join subquery, we're only interested in records of outer\_tables that have matches in the subquery*

Let's see why this is important. Consider a semi-join subquery:

```sql
SELECT * FROM Country 
WHERE 
  Continent='Europe' AND 
  Country.Code IN (SELECT City.country 
                   FROM City 
                   WHERE City.Population>1*1000*1000);
```

One can execute it "naturally", by starting from countries in Europe and checking if they have populous Cities:

![semi-join-outer-to-inner](https://2988006611-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSsmexDFPv2xG2OTyO5yV%2Fuploads%2Fgit-blob-07a7f4e9c66112a196b465aa20d5f267e59392ae%2Fsemi-join-outer-to-inner.png?alt=media)

The semi-join property also allows "backwards" execution: we can start from big cities, and check which countries they are in:

![semi-join-inner-to-outer](https://2988006611-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSsmexDFPv2xG2OTyO5yV%2Fuploads%2Fgit-blob-a199b3ee3fbf01b437a0575709de2068fbdc7942%2Fsemi-join-inner-to-outer.png?alt=media)

To contrast, let's change the subquery to be non-semi-join:

```sql
SELECT * FROM Country 
WHERE 
   Country.Continent='Europe' AND 
   (Country.Code IN (SELECT City.country 
                   FROM City WHERE City.Population>1*1000*1000) 
    OR Country.SurfaceArea > 100*1000  -- Added this part
   );
```

It is still possible to start from countries, and then check

* if a country has any big cities
* if it has a large surface area:

![non-semi-join-subquery](https://2988006611-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FSsmexDFPv2xG2OTyO5yV%2Fuploads%2Fgit-blob-0f92632d9e5739cc71edd62f34116d61a8fcfce4%2Fnon-semi-join-subquery.png?alt=media)

The opposite, city-to-country way is not possible. This is not a semi-join.

### Difference from Inner Joins

Semi-join operations are similar to regular relational joins. There is a difference though: with semi-joins, you don't care how many matches an inner table has for an outer row. In the above countries-with-big-cities example, Germany will be returned once, even if it has three cities with populations of more than one million each.

## Semi-Join Optimizations in MariaDB

MariaDB uses semi-join optimizations to run IN subqueries.The optimizations are enabled by default. You can disable them by turning off their [optimizer\_switch](https://mariadb.com/docs/server/server-management/variables-and-modes/server-system-variables#optimizer_switch) like so:

```sql
SET optimizer_switch='semijoin=off'
```

MariaDB has five different semi-join execution strategies:

* [Table pullout optimization](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/subquery-optimizations/table-pullout-optimization)
* [FirstMatch execution strategy](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/optimization-strategies/firstmatch-strategy)
* [Semi-join Materialization execution strategy](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/optimization-strategies/semi-join-materialization-strategy)
* [LooseScan execution strategy](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/optimization-strategies/loosescan-strategy)
* [DuplicateWeedout execution strategy](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/optimization-strategies/duplicateweedout-strategy)

## See Also

* [Subquery Optimizations Map](https://mariadb.com/docs/server/ha-and-performance/optimization-and-tuning/query-optimizations/subquery-optimizations/subquery-optimizations-map)
* ["Observations about subquery use cases"](https://s.petrunia.net/blog/?p=35) blog post
* [http:en.wikipedia.org/wiki/Semijoin](https://en.wikipedia.org/wiki/Semijoin)

<sub>*This page is licensed: CC BY-SA / Gnu FDL*</sub>

{% @marketo/form formId="4316" %}
