GROUP BY trick has been optimized away

Group-by trick example: Find the most populous city in each state:

SELECT  state, city, population, COUNT(*) AS num_cities
    FROM
      ( SELECT  state, city, population
            FROM  us
            ORDER BY  state, population DESC ) p
    GROUP BY  state
    ORDER BY  state;
+-------+-------------+------------+------------+
| state | city        | population | num_cities |
+-------+-------------+------------+------------+
| AK    | Anchorage   |     276263 |         16 |
| AL    | Birmingham  |     231621 |         58 |
| AR    | Little Rock |     184217 |         40 |
| AZ    | Phoenix     |    1428509 |         51 |
| CA    | Los Angeles |    3877129 |        447 |
...
That was the output in MySQL 5.1.  But with MariaDB 5.5.23, I get:
+-------+-------------------+------------+------------+
| state | city              | population | num_cities |
+-------+-------------------+------------+------------+
| AK    | Anchorage         |     276263 |         16 |
| AL    | Alabaster         |      26738 |         58 |
| AR    | Arkadelphia       |      11062 |         40 |
| AZ    | Apache Junction   |      34904 |         51 |
| CA    | Adelanto          |      21955 |        447 |
...

The EXPLAIN plan no longer shows a subquery, as if the inner ORDER BY had been thrown away.
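One way to confirm this from the outside is to ask the server for the query as the optimizer rewrote it. EXPLAIN EXTENDED (available in both MySQL 5.1 and MariaDB 5.5) followed by SHOW WARNINGS returns a Note containing the rewritten statement. If the derived table `p` has been merged into the outer query, the inner ORDER BY should no longer appear in that text. This is a sketch; the exact wording of the Note varies between versions.

```sql
-- Plan for the original query (EXTENDED adds the "filtered" column
-- and makes the rewritten query available to SHOW WARNINGS)
EXPLAIN EXTENDED
SELECT  state, city, population, COUNT(*) AS num_cities
    FROM
      ( SELECT  state, city, population
            FROM  us
            ORDER BY  state, population DESC ) p
    GROUP BY  state
    ORDER BY  state;

-- The Note row holds the query as rewritten by the optimizer
SHOW WARNINGS;
```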

Granted, there is nothing in the definition of MySQL (much less in the SQL standard) that requires that Los Angeles should be bigger than Adelanto. But the replacement code for this 'trick' is quite messy.
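For comparison, a deterministic replacement need not be very messy. Here is a minimal sketch, assuming the same `us` table with `state`, `city` and `population` columns. A derived table computes the maximum population and the city count per state, and the query then joins back to `us` to pick the matching city. It relies only on documented semantics, so the optimizer is free to merge or reorder it without changing the answer.

```sql
-- Per-state maximum population and city count, joined back to the rows
SELECT  u.state, u.city, u.population, s.num_cities
    FROM  us AS u
    JOIN
      ( SELECT  state,
                MAX(population) AS max_population,
                COUNT(*)        AS num_cities
            FROM  us
            GROUP BY  state ) AS s
          ON  u.state = s.state
         AND  u.population = s.max_population
    ORDER BY  u.state;
```

One difference from the trick: if two cities in a state tie for the largest population, both rows are returned rather than one arbitrary one.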

Am I correct in deducing (from the outside, looking in) that MariaDB's optimizations are the cause of the change?

Answer (by Elena Stepanova):

Hi,

You can probably switch back to the old behavior for a while by setting optimizer_switch='derived_merge=off' (or optimizer_switch='derived_merge=off,derived_with_keys=off', depending on the structure of your table), either in your session or globally in the config file. But really, you are just playing with fire: most likely it won't be long before you hit the problem again, possibly without the luck of noticing it in time.
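The workaround described above would look roughly like this. It is a sketch: only the flags named in the string are changed, the rest of optimizer_switch keeps its current values, and whether derived_with_keys also needs to be turned off depends on the table, as the answer notes.

```sql
-- Inspect the current optimizer flags
SELECT @@optimizer_switch;

-- Current session only: stop merging derived tables into the outer query
SET SESSION optimizer_switch = 'derived_merge=off';

-- Variant that also stops the optimizer from creating keys on derived tables
SET SESSION optimizer_switch = 'derived_merge=off,derived_with_keys=off';

-- Server-wide, affecting new connections only (requires the SUPER privilege);
-- the same setting under [mysqld] in the config file survives a restart
SET GLOBAL optimizer_switch = 'derived_merge=off';
```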

While the MySQL documentation may not say anything about Los Angeles being bigger than Adelanto, it is perfectly clear about the type of query you are using, and explicitly says that the behavior here is undefined:

http://dev.mysql.com/doc/refman/5.5/en/group-by-hidden-columns.html

"MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses."
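A way to catch queries that depend on this undefined behavior, offered here as a suggestion rather than as part of the original answer, is to add ONLY_FULL_GROUP_BY to the SQL mode. With it enabled, the server should reject the original query with an error, because `city` and `population` are selected without being aggregated or named in GROUP BY. The expression below is a sketch that keeps any modes already set for the session.

```sql
-- Append ONLY_FULL_GROUP_BY to the session's SQL mode;
-- NULLIF turns an empty mode into NULL, which CONCAT_WS skips,
-- so no leading comma is produced
SET SESSION sql_mode =
    CONCAT_WS(',', NULLIF(@@SESSION.sql_mode, ''), 'ONLY_FULL_GROUP_BY');

-- Verify the resulting mode
SELECT @@SESSION.sql_mode;
```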

Nothing there suggests that your subquery trick makes a difference or guarantees the deterministic result you are hoping for.

Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.