MariaDB SQL Set Operators

Set operators are the SQL operators that deal with combining, in different ways, different result sets. Say you have two different SELECTs that you want to combine into a single result set, the set operators come into play. MariaDB has been supporting the UNION and UNION ALL set operators for a long time, and these are by far the most common set operators.

But we are getting ahead of ourselves here, let me first explain the set operators that we have and how they work. If you want to give this a try, you can use your existing deployment of MariaDB Server, or try this out in a MariaDB SkySQL cloud database.

UNION and UNION ALL

The UNION and UNION ALL set operators add the result of two or more result sets. Let’s start with UNION ALL and UNION will then be a variation of UNION ALL.

SQL UNION Operator

Let’s have a look at what it looks like in SQL. Let’s assume we run a webshop and that for the products we sell, we have an inventory. Now we want to see all the products that are on order or are in inventory, a query for this would look something like this:

 SELECT oi.prod_id, p.prod_name
   FROM order_items oi JOIN products p ON oi.prod_id = p.id
 UNION ALL
 SELECT i.prod_id, p.prod_name
   FROM inventory i JOIN products p ON i.prod_id = p.id;

This is, in set theory, the UNION of the sets of products that have been ordered and the sets of products that are in inventory. Which is fine in theory, but there is an issue with the result of this query. The issue is that a product that appears in both the orders and the inventory, or in multiple places in the inventory, will appear more than once in the output. This issue is why UNION ALL isn’t used much and instead UNION DISTINCT (DISTINCT is the default and can be ignored) is used. For example:

SELECT oi.prod_id, p.prod_name
   FROM order_items oi JOIN products p ON oi.prod_id = p.id
 UNION
 SELECT i.prod_id, p.prod_name
   FROM inventory i JOIN products p ON i.prod_id = p.id;

With this query, a product that is either on order or exists in the inventory is only listed once. Note that when we are removing duplicates here, it is the values that are compared, so two rows with the same values in the same column are considered equal, even though the values come from different tables or columns.

To be honest though, there is nothing in the query above that can’t be done with a regular SELECT from the products table and a few joins. In some ways a UNION may be easier to read. On the other hand, if we want to have a list of products on order or in the inventory, and also want to know which one it was, then a query would be something like this:

 SELECT 'On order', oi.prod_id, p.prod_name
   FROM order_items oi JOIN products p ON oi.prod_id = p.id
 UNION
 SELECT 'Inventory', i.prod_id, p.prod_name
   FROM inventory i JOIN products p ON i.prod_id = p.id;

Here is a query that isn’t easy to do with a simple SELECT from the products table as we are looking at the same row from the product table twice (once for the order_items and once for the inventory).

More SQL set operators

With MariaDB Server 10.3 came two new set operators, largely introduced to enhance Oracle compatibility, but these operators are useful in their own right. MariaDB Server 10.4 then adds the ability to control set operator precedence. We’ll have a look at that also. Without the ability to control the operator precedence, the set operators don’t always work as you would want or expect.

The new set operators are INTERSECT and EXCEPT and they are useful, particularly when using analytics. Also, although JOINs and other constructs can often be used instead, set operators allow for a SQL syntax that can be easier to read and understand. And, if you have Oracle applications you are migrating to MariaDB, the usefulness of these operators is obvious.

The INTERSECT set operator

The INTERSECT operator will return all items that exist in two or more sets, or in SQL terms, all rows that exist in two result sets. In this case, a cross section of the two sets of items is created. In SQL terms it means that only rows that exist in both sets are returned, so if I want to check which products I have on order and which are also in inventory a query might look like this:

 SELECT oi.prod_id, p.prod_name
   FROM order_items oi JOIN products p ON oi.prod_id = p.id
 INTERSECT
 SELECT i.prod_id, p.prod_name
   FROM inventory i JOIN products p ON i.prod_id = p.id;

Again, this query could be constructed using a JOIN on the products table, but the query above is a bit clearer about what we are trying to achieve.

The EXCEPT set operator

SQL EXCEPT Operator

In the case of the EXCEPT operator, we want the items that are in one of the sets, but not in the other. So, again using the example above, if we want to see the products that we have on order but for which we don’t have inventory, we could write a query like this:

 SELECT oi.prod_id, p.prod_name
   FROM order_items oi JOIN products p ON oi.prod_id = p.id
 EXCEPT
 SELECT i.prod_id, p.prod_name
   FROM inventory i JOIN products p ON i.prod_id = p.id;

Again, there are other ways of writing this particular query, but for other, more advanced queries when we are combining data from two different tables, this is not the case.

Combining multiple set operators

You can combine more than 2 set operators if this is useful. For example, let us see if we can find products that are on order and have been delivered or are in stock. The SQL for this would look something like this:

SELECT oi.prod_id, p.prod_name
   FROM order_items oi JOIN products p ON oi.prod_id = p.id
 INTERSECT
 SELECT d.prod_id, p.prod_name
   FROM deliveries d JOIN products p ON d.prod_id = p.id
 UNION
 SELECT i.prod_id, p.prod_name
   FROM inventory i JOIN products p ON i.prod_id = p.id;

To express this in plain language, what is going on is that I first check which products are on order and that have been delivered, and then I combine this set of products with all products in the inventory. Any product not in the result set is not in the inventory but might be on order or might have been delivered, but not both.

But now, let’s express this differently, and see what happens. I want a list of all products that are in stock or have been delivered and are on order. The SQL would then be something like this, similar to the SQL above but slightly different:

 SELECT i.prod_id, p.prod_name
   FROM inventory i JOIN products p ON i.prod_id = p.id
 UNION
 SELECT oi.prod_id, p.prod_name
   FROM order_items oi JOIN products p ON oi.prod_id = p.id
 INTERSECT
 SELECT d.prod_id, p.prod_name
   FROM deliveries d JOIN products p ON d.prod_id = p.id;

How do you interpret this then? Do you list products that are in stock and that are on order and the products that are being delivered? This is what this looks like, right? It is just that INTERSECT (and EXCEPT for that matter) has precedence over UNION. The two SQL statements produce the same result set, at least in MariaDB and this is how the SQL Standard says things should work. But there is an exception, Oracle.

How this works in Oracle

Oracle has had all four of the set operators (UNION, UNION ALL, INTERSECT and EXCEPT) for a long time, long before they were standardized, so their implementation is a bit different. Let’s try with the tables above and insert some data in them. The data is very simple and reflects a not so successful company, but it works as an example, and we are only showing the relevant columns here.

 

Table products order_items inventory deliveries
Column prod_id prod_name order_id prod_id prod_id prod_id
Data 1 Vase Blue 1 1 1 2
2 Vase Red 2 1 2 3
3 Carpet Red 2 3

With data in place, let’s look at that last SQL statement above again. There is a feature that allows you to control the precedence, and that is to use parentheses, or brackets (Introduced in MariaDB 10.4, see https://jira.mariadb.org/browse/MDEV-11953), and using these to illustrate what is going on, the statement would look like this:

 MariaDB> SELECT i.prod_id, p.prod_name
     ->   FROM inventory i JOIN products p ON i.prod_id = p.id
     -> UNION
     -> (SELECT oi.prod_id, p.prod_name
     ->   FROM order_items oi JOIN products p ON oi.prod_id = p.id
     -> INTERSECT
     -> SELECT d.prod_id, p.prod_name
     ->   FROM deliveries d JOIN products p ON d.prod_id = p.id);
 +---------+------------+
 | prod_id | prod_name  |
 +---------+------------+
 |       1 | Vase Blue  |
 |       2 | Vase Red   |
 |       3 | Carpet Red |
 +---------+------------+
 3 rows in set (0.001 sec)

Now, let’s use the same technique to enforce the three components of the query to operate in strict order:

 MariaDB> (SELECT i.prod_id, p.prod_name
     ->   FROM inventory i JOIN products p ON i.prod_id = p.id
     -> UNION
     -> SELECT oi.prod_id, p.prod_name
     ->   FROM order_items oi JOIN products p ON oi.prod_id = p.id)
     -> INTERSECT
     -> SELECT d.prod_id, p.prod_name
     ->   FROM deliveries d JOIN products p ON d.prod_id = p.id;
 +---------+------------+
 | prod_id | prod_name  |
 +---------+------------+
 |       2 | Vase Red   |
 |       3 | Carpet Red |
 +---------+------------+
 2 rows in set (0.001 sec)

And lastly without parentheses:

 MariaDB [test]> SELECT i.prod_id, p.prod_name
     ->   FROM inventory i JOIN products p ON i.prod_id = p.id
     -> UNION
     -> SELECT oi.prod_id, p.prod_name
     ->   FROM order_items oi JOIN products p ON oi.prod_id = p.id
     -> INTERSECT
     -> SELECT d.prod_id, p.prod_name
     ->   FROM deliveries d JOIN products p ON d.prod_id = p.id;
 +---------+------------+
 | prod_id | prod_name  |
 +---------+------------+
 |       1 | Vase Blue  |
 |       2 | Vase Red   |
 |       3 | Carpet Red |
 +---------+------------+
 3 rows in set (0.001 sec)

We see that MariaDB, following the standard, assumed that INTERSECT has precedence over UNION. Which brings us to Oracle. Let’s try the above SQL in Oracle using sqlplus:

 SQL> SELECT i.prod_id, p.prod_name
   2   FROM inventory i JOIN products p ON i.prod_id = p.id
   3  UNION
   4  SELECT oi.prod_id, p.prod_name
   5   FROM order_items oi JOIN products p ON oi.prod_id = p.id
   6  INTERSECT
   7  SELECT d.prod_id, p.prod_name
   8   FROM deliveries d JOIN products p ON d.prod_id = p.id;
 
    PROD_ID PROD_NAME
 ---------- ------------------------------
          2 Vase Red
          3 Carpet Red

What is going on here, you ask? Well, Oracle doesn’t follow the standard. The different set operators are treated as equal and none has precedence over the other. This is an issue when you are migrating applications from Oracle to MariaDB, and what is worse, is that this difference is kind of hard to find. No error is produced and data is returned and, in many cases, the right data is returned. But, in some cases, where data is like in our example above, the wrong data is returned, which is an issue.

Effect on Migrating Data

So how do we deal with this if we are migrating an application from Oracle to MariaDB? There are a few options:

  • Rewrite the application so that it assumes that the data returned from a query like this is in line with the SQL Standard and MariaDB.
  • Rewrite the SQL statements, using brackets, so that MariaDB returns the same data as Oracle
  • Or, and this is the smartest way, use the MariaDB SQL_MODE=Oracle setting.

For the last, and smartest way, to work, we need to run with MariaDB 10.3.7 or higher (this was suggested by yours truly in https://jira.mariadb.org/browse/MDEV-13695). Let’s check how this works. Compare the result of this SELECT with the Oracle one above (which produces the same result) and the one from MariaDB above that (which doesn’t):

 MariaDB> set SQL_MODE=Oracle;
 Query OK, 0 rows affected (0.001 sec)
 
 MariaDB> SELECT i.prod_id, p.prod_name
     ->   FROM inventory i JOIN products p ON i.prod_id = p.id
     -> UNION
     -> SELECT oi.prod_id, p.prod_name
     ->   FROM order_items oi JOIN products p ON oi.prod_id = p.id
     -> INTERSECT
     -> SELECT d.prod_id, p.prod_name
     ->   FROM deliveries d JOIN products p ON d.prod_id = p.id;
 +---------+------------+
 | prod_id | prod_name  |
 +---------+------------+
 |       2 | Vase Red   |
 |       3 | Carpet Red |
 +---------+------------+
 2 rows in set (0.002 sec)

As you can see, when SQL_MODE is set to Oracle, MariaDB really behaves like Oracle. This is not the only thing that SQL_MODE=Oracle does, obviously, but it is one of the lesser known areas.

Conclusion

The set operators INTERSECT and EXCEPT are not used that much, although they do appear here and there, and there are some uses for them. The examples in this blog are there more to illustrate how these operators work than to show really good uses for them. There are probably better examples. When you are migrating from Oracle to MariaDB though, set operators are really useful since many Oracle applications use them, and as can be seen, MariaDB can be tricked into working just like Oracle, which is non-Standard but still serves a purpose. But, by default, of course MariaDB works like it should and follows the SQL Standard.

Happy SQL’ing
/Karlsson