chris.calender

I ran into an interesting issue today, while examining some EXPLAIN outputs, and wanted to share the findings, as some of this is undocumented.

Basically, you can start with a very simple InnoDB table - 2 INT columns, Primary Key (PK) on the 1st column, regular index on the 2nd:

CREATE TABLE `t1` (
  `id1` int(10) unsigned NOT NULL,
  `id2` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id1`),
  KEY `id2` (`id2`)
) ENGINE=InnoDB;

The query is:

SELECT id1 FROM t1;

This is a straight-forward query with no WHERE clause.

Given no WHERE clause, we know there will be a full table or index scan. Let's look at EXPLAIN:

mysql> EXPLAIN SELECT id1 FROM t1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: id2
      key_len: 5
          ref: NULL
         rows: 1
        Extra: Using index

EXPLAIN returns NULL for "possible_keys". This means there was no relevant index (though we see one was used at some point):

"It is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan."

http://dev.mysql.com/doc/refman/5.6/en/explain-output.html

So "possible_keys" returns NULL, but key returns `id2`.

Per the above, that means `id2` is a covering index.

This part is a bit strange, as it is not readily intuitive why this is the case.

I mean, the query asks for "id1", yet EXPLAIN says "id2" is the covering index.

How is this possible and why isn't just the Primary Key chosen anyway?

This is possible because of the way InnoDB's secondary indexes are stored. In InnoDB, secondary indexes are stored along with their corresponding primary key values, so by looking at the secondary index, you can see the primary key values.

As for why this is chosen instead of the primary key, it took some digging, but I found the exact comments as to why in the source code (sql/sql_select.cc) in the find_shortest_key() function:

"As far as 
1) clustered primary key entry data set is a set of all record
	fields (key fields and not key fields) and
2) secondary index entry data is a union of its key fields and
	primary key fields (at least InnoDB and its derivatives don't
	duplicate primary key fields there, even if the primary and
	the secondary keys have a common subset of key fields),
then secondary index entry data is always a subset of primary key entry.
Unfortunately, key_info[nr].key_length doesn't show the length
of key/pointer pair but a sum of key field lengths only, thus
we can't estimate index IO volume comparing only this key_length
value of secondary keys and clustered PK.
So, try secondary keys first, and choose PK only if there are no
usable secondary covering keys or found best secondary key include
all table fields (i.e. same as PK):"

So since secondary index entry data is always a subset of primary key entry, scanning the secondary index should generate slightly less IO than scanning the primary key.

Fwiw, this also explains why key_len is less if you specify "USE INDEX (Primary)" to force it to use the PK over the secondary index:

mysql> EXPLAIN SELECT id1 FROM t1 USE INDEX (Primary)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
        Extra: Using index

Note this reports key_len of 4, whereas the former reported 5.

Hope this helps.

 

Tags: 

About the Author

chris.calender's picture

Chris Calender is a Principal Support Engineer with MariaDB. Earlier he was a Principal Support Engineer at MySQL/Sun/Oracle. And before that, he worked as a MySQL DBA and developer for numerous Fortune 500 Companies, including Clear Channel, Western & Southern, and Cincinnati Bell. Chris has both Bachelor's and Master's degrees in Computer Science, from Miami University and the University of Cincinnati, respectively.

claudio

Great find! I took some time to look at it and noticed MariaDB takes a different approach, going straight with PRIMARY: <code>MariaDB [test]> EXPLAIN SELECT id1 FROM t1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 1 Extra: Using index 1 row in set (0.00 sec)</code> I also noticed that the same 'shortest index is best' approach is taken by MySQL 5.1, 5.5 and 5.6. <code>sql/sql_select.cc: if (usable_keys->is_set(nr)) { if (table->key_info[nr].key_length < min_length) { min_length=table->key_info[nr].key_length; best=nr; } }</code> Same exact piece of code in MariaDB introduces a different concept, 'cost': <code>sql/sql_select.cc: if (usable_keys->is_set(nr)) { double cost= table->file->keyread_time(nr, 1, table->file->records()); if (cost < min_cost) { min_cost= cost; best=nr; } }</code> Tests in different scenarios should be done to evaluate what's generally the best approach, what is clear is that MySQL focuses on the I/O impact while MariaDB uses the 'cost' concept. Interesting info can be found in keyread_time() but I'll leave that to the curious ones!
olivierdasini

Very interesting. key_len = 5 for id2 because of int {4 bytes} and NULL {1 byte} default value. The id2 index choice is counter intuitive but logic with your explanation. Thanks
冯勇_g

I had validated your option about key_len, it do work when indexed column type is int, but when indexed column type is varchar, it doesn't work, can you explain this to me? thank you! mysql> explain select pId from order_record where orderId='201405231401195646'; +----+-------------+--------------+-------+---------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------------+-------+---------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | order_record | const | PRIMARY | PRIMARY | 386 | const | 1 | | +----+-------------+--------------+-------+---------------+---------+---------+-------+------+-------+ 1 row in set (0.00 sec) mysql> explain select pId from order_record where name='201405231401195646'; +----+-------------+--------------+------+---------------+------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------------+------+---------------+------+---------+-------+------+-------------+ | 1 | SIMPLE | order_record | ref | name | name | 386 | const | 1 | Using where | +----+-------------+--------------+------+---------------+------+---------+-------+------+-------------+ 1 row in set (0.00 sec)
冯勇_g

[quote=冯勇_g] I had validated your option about key_len, it do work when indexed column type is int, but when indexed column type is varchar(128), and the key_len are 386 for both primary key and secondary index key, can you explain this to me? Thank you! [/quote]

Newsletter Signup

Subscribe to get MariaDB tips, tricks and news updates in your inbox: