InnoDB Primary Key versus Secondary Index: An Interesting Lesson from EXPLAIN

I ran into an interesting issue today, while examining some EXPLAIN outputs, and wanted to share the findings, as some of this is undocumented. Basically, you can start with a very simple InnoDB table – 2 INT columns, Primary Key (PK) on the 1st column, regular index on the 2nd:

CREATE TABLE `t1` (
  `id1` int(10) unsigned NOT NULL,
  `id2` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id1`),
  KEY `id2` (`id2`)
) ENGINE=InnoDB;

The query is:

SELECT id1 FROM t1;

This is a straight-forward query with no WHERE clause. Given no WHERE clause, we know there will be a full table or index scan. Let’s look at EXPLAIN:

mysql> EXPLAIN SELECT id1 FROM t1G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: id2
      key_len: 5
          ref: NULL
         rows: 1
        Extra: Using index

EXPLAIN returns NULL for “possible_keys”. This means there was no relevant index (though we see one was used at some point): “It is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.” http://dev.mysql.com/doc/refman/5.6/en/explain-output.html So “possible_keys” returns NULL, but key returns `id2`. Per the above, that means `id2` is a covering index. This part is a bit strange, as it is not readily intuitive why this is the case. I mean, the query asks for “id1”, yet EXPLAIN says “id2” is the covering index. How is this possible and why isn’t just the Primary Key chosen anyway? This is possible because of the way InnoDB’s secondary indexes are stored. In InnoDB, secondary indexes are stored along with their corresponding primary key values, so by looking at the secondary index, you can see the primary key values. As for why this is chosen instead of the primary key, it took some digging, but I found the exact comments as to why in the source code (sql/sql_select.cc) in the find_shortest_key() function:

"As far as 
1) clustered primary key entry data set is a set of all record
	fields (key fields and not key fields) and
2) secondary index entry data is a union of its key fields and
	primary key fields (at least InnoDB and its derivatives don't
	duplicate primary key fields there, even if the primary and
	the secondary keys have a common subset of key fields),
then secondary index entry data is always a subset of primary key entry.
Unfortunately, key_info[nr].key_length doesn't show the length
of key/pointer pair but a sum of key field lengths only, thus
we can't estimate index IO volume comparing only this key_length
value of secondary keys and clustered PK.
So, try secondary keys first, and choose PK only if there are no
usable secondary covering keys or found best secondary key include
all table fields (i.e. same as PK):"

So since secondary index entry data is always a subset of primary key entry, scanning the secondary index should generate slightly less IO than scanning the primary key. Fwiw, this also explains why key_len is less if you specify “USE INDEX (Primary)” to force it to use the PK over the secondary index:

mysql> EXPLAIN SELECT id1 FROM t1 USE INDEX (Primary)G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
        Extra: Using index

Note this reports key_len of 4, whereas the former reported 5. Hope this helps.