Initial impressions of InnoDB Fulltext

The much-anticipated, long-awaited, and possibly irrelevant (Sphinx? Lucene?) InnoDB Fulltext finally makes a semi-official appearance in MySQL 5.6.4! MySQL 5.6.4, release December 20, 2011, is the newest “development milestone release” (“DMR”) of MySQL 5.6. It’s important to bear in mind that this is the first release of InnoDB Fulltext. Hopefully, that means that considerable improvements will be forthcoming; for now, though, we are left to try to make sense of what is available at present. Creating an InnoDB table that uses a fulltext index is dead easy:

mysql> create table ibft1 (id int unsigned not null auto_increment primary key, c1 varchar(255), fulltext(c1)) engine=innodb;
Query OK, 0 rows affected (0.08 sec)

mysql> insert into ibft1 (c1) values ('brown cow'),('brown dog'),('white cow'),('white dog'),('yellow horse'),('green iguana');
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select * from ibft1 where match (c1) against ('brown');
+----+-----------+
| id | c1        |
+----+-----------+
|  1 | brown cow |
|  2 | brown dog |
+----+-----------+
2 rows in set (0.00 sec)

mysql> select * from ibft1 where match (c1) against ('cow');
+----+-----------+
| id | c1        |
+----+-----------+
|  1 | brown cow |
|  3 | white cow |
+----+-----------+
2 rows in set (0.01 sec)

Great! I’m happy to see it works, through I was a bit surprised to see that the default minimum word length was less than the value of ft_min_word_len. Those legacy ft_% variables continue to apply only to MyISAM fulltext indexes, evidently. There’s a new bundle of system variables to control InnoDB Fulltext:

mysql> show variables like '%ft%';
+---------------------------------+----------------+
| Variable_name                   | Value          |
+---------------------------------+----------------+
| ft_boolean_syntax               | + -><()~*:""&| |
| ft_max_word_len                 | 84             |
| ft_min_word_len                 | 4              |
| ft_query_expansion_limit        | 20             |
| ft_stopword_file                | (built-in)     |
| innodb_ft_aux_table             |                |
| innodb_ft_cache_size            | 32000000       |
| innodb_ft_enable_diag_print     | ON             |
| innodb_ft_enable_stopword       | ON             |
| innodb_ft_max_token_size        | 84             |
| innodb_ft_min_token_size        | 3              |
| innodb_ft_num_word_optimize     | 2000           |
| innodb_ft_server_stopword_table |                |
| innodb_ft_sort_pll_degree       | 2              |
| innodb_ft_user_stopword_table   |                |
+---------------------------------+----------------+
15 rows in set (0.01 sec)

You’ll also notice that there are a number of new tables in information_schema related to InnoDB Fulltext:

mysql> show tables from information_schema like 'INNODB_FT%';
+-------------------------------------------+
| Tables_in_information_schema (INNODB_FT%) |
+-------------------------------------------+
| INNODB_FT_CONFIG                          |
| INNODB_FT_DELETED                         |
| INNODB_FT_DEFAULT_STOPWORD                |
| INNODB_FT_INDEX_CACHE                     |
| INNODB_FT_BEING_DELETED                   |
| INNODB_FT_INDEX_TABLE                     |
| INNODB_FT_INSERTED                        |
+-------------------------------------------+
7 rows in set (0.00 sec)

Curiously, most of those tables are completely empty!

mysql> select * from INNODB_FT_CONFIG;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_DELETED;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_DELETED;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_INDEX_CACHE;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_BEING_DELETED;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_INDEX_TABLE;
Empty set (0.00 sec)

mysql> select * from INNODB_FT_INSERTED;
Empty set (0.00 sec)

Well, that’s not so curious if you’ve read the manual, which explains that you must set the value of innodb_ft_aux_table to the database & table that you wish to examine. That seems a bit of a round-about way to get at information in information_schema, but alright:

mysql> set innodb_ft_aux_table='test/ibft1';
ERROR 1229 (HY000): Variable 'innodb_ft_aux_table' is a GLOBAL variable and should be set with SET GLOBAL

Ah, yes. This takes me back to the good old 5.0 days when so much of the new functionality of 5.0 relied on the SUPER privilege, which carries with it all sorts of things that don’t really logically go along with the analysis of InnoDB Fulltext indexes. Well, alright, let’s set the thing and take a look at some of these tables.

mysql> set global innodb_ft_aux_table='test/ibft1';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from information_schema.INNODB_FT_CONFIG;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 7     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          | 8     |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

mysql> select * from information_schema.INNODB_FT_INDEX_TABLE;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| brown  |            1 |           2 |         2 |      1 |        0 |
| brown  |            1 |           2 |         2 |      2 |        0 |
| cow    |            1 |           3 |         2 |      1 |        6 |
| cow    |            1 |           3 |         2 |      3 |        6 |
| dog    |            2 |           4 |         2 |      2 |        6 |
| dog    |            2 |           4 |         2 |      4 |        6 |
| green  |            6 |           6 |         1 |      6 |        0 |
| horse  |            5 |           5 |         1 |      5 |        7 |
| iguana |            6 |           6 |         1 |      6 |        6 |
| white  |            3 |           4 |         2 |      3 |        0 |
| white  |            3 |           4 |         2 |      4 |        0 |
| yellow |            5 |           5 |         1 |      5 |        0 |
+--------+--------------+-------------+-----------+--------+----------+
12 rows in set (0.01 sec)

Well, there’s some good stuff in there, no doubt. Before I got a sense of what these various things mean, I started wondering about how privileges affect what of this information can be viewed by various users. I created a new database (newdb) and a new table (ibft2):

mysql> create database newdb;
Query OK, 1 row affected (0.00 sec)

mysql> use newdb
Database changed
mysql> create table ibft2 like test.ibft1;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into ibft2 select * from test.ibft1;
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> show create table ibft2G
*************************** 1. row ***************************
       Table: ibft2
Create Table: CREATE TABLE `ibft2` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c1` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `c1` (`c1`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

mysql> select * from ibft2;
+----+--------------+
| id | c1           |
+----+--------------+
|  1 | brown cow    |
|  2 | brown dog    |
|  3 | white cow    |
|  4 | white dog    |
|  5 | yellow horse |
|  6 | green iguana |
+----+--------------+
6 rows in set (0.00 sec)

mysql> set global innodb_ft_aux_table='newdb/ibft2';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from information_schema.INNODB_FT_CONFIG;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 1     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          |       |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

Whoa, hold on. Why is the contents of information_schema.INNODB_FT_CONFIG different than when I had innodb_ft_aux_table='test/ibft1'? Let’s confirm that…

mysql> set global innodb_ft_aux_table='test/ibft1';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from information_schema.INNODB_FT_CONFIG;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 7     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          | 8     |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

Sure enough, look at total_word_count … for newdb/ibft2 it was undefined, but for test/ibft1 it is 8. I wonder what INNODB_FT_INDEX_TABLE shows for the new table.

mysql> set global innodb_ft_aux_table='newdb/ibft2';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from information_schema.INNODB_FT_INDEX_TABLE;
Empty set (0.00 sec)

mysql> select * from newdb.ibft2 where match(c1) against ('dog');
+----+-----------+
| id | c1        |
+----+-----------+
|  2 | brown dog |
|  4 | white dog |
+----+-----------+
2 rows in set (0.00 sec)

So… INNODB_FT_INDEX_TABLE is empty(!) for the new table, but I can still make queries against this table? Hmm…

mysql> select * from information_schema.INNODB_FT_INDEX_CACHE;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| brown  |            1 |           2 |         2 |      1 |        0 |
| brown  |            1 |           2 |         2 |      2 |        0 |
| cow    |            1 |           3 |         2 |      1 |        6 |
| cow    |            1 |           3 |         2 |      3 |        6 |
| dog    |            2 |           4 |         2 |      2 |        6 |
| dog    |            2 |           4 |         2 |      4 |        6 |
| green  |            6 |           6 |         1 |      6 |        0 |
| horse  |            5 |           5 |         1 |      5 |        7 |
| iguana |            6 |           6 |         1 |      6 |        6 |
| white  |            3 |           4 |         2 |      3 |        0 |
| white  |            3 |           4 |         2 |      4 |        0 |
| yellow |            5 |           5 |         1 |      5 |        0 |
+--------+--------------+-------------+-----------+--------+----------+
12 rows in set (0.00 sec)

Aha! There’s the info I’m looking for; it’s in the _CACHE table because … well, just because it is, I guess. I don’t know why it’s there now and wasn’t there after I created my first table. It turns out that you need to OPTIMIZE TABLE in order to rebuild the index and I guess merge the “new” index data with the “main search index”.

mysql> optimize table ibft2;
+-------------+----------+----------+-------------------------------------------------------------------+
| Table       | Op       | Msg_type | Msg_text                                                          |
+-------------+----------+----------+-------------------------------------------------------------------+
| newdb.ibft2 | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| newdb.ibft2 | optimize | status   | OK                                                                |
+-------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.07 sec)

Oh, first you need to set a different global variable … innodb_optimize_fulltext_only.

mysql> set global innodb_optimize_fulltext_only=1;
Query OK, 0 rows affected (0.01 sec)

mysql> optimize table ibft2;
+-------------+----------+----------+----------+
| Table       | Op       | Msg_type | Msg_text |
+-------------+----------+----------+----------+
| newdb.ibft2 | optimize | status   | OK       |
+-------------+----------+----------+----------+
1 row in set (0.05 sec)

mysql> select * from information_schema.INNODB_FT_INDEX_TABLE;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| brown  |            1 |           2 |         2 |      1 |        0 |
| brown  |            1 |           2 |         2 |      2 |        0 |
| cow    |            1 |           3 |         2 |      1 |        6 |
| cow    |            1 |           3 |         2 |      3 |        6 |
| dog    |            2 |           4 |         2 |      2 |        6 |
| dog    |            2 |           4 |         2 |      4 |        6 |
| green  |            6 |           6 |         1 |      6 |        0 |
| horse  |            5 |           5 |         1 |      5 |        7 |
| iguana |            6 |           6 |         1 |      6 |        6 |
| white  |            3 |           4 |         2 |      3 |        0 |
| white  |            3 |           4 |         2 |      4 |        0 |
| yellow |            5 |           5 |         1 |      5 |        0 |
+--------+--------------+-------------+-----------+--------+----------+
12 rows in set (0.00 sec)

mysql> select * from information_schema.INNODB_FT_CONFIG;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 7     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          | 8     |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

Alright, we’re finally back to where we started! The thing I had planned to look at is how privileges interact with this information_schema metadata strategy. Basically, you need SUPER to be able to look at the data, which seems like an obvious potential problem in terms of unprivileged users being able to access information about InnoDB Fulltext indexes on their own tables. The more interesting thing, though, is that unprivileged users can learn quite a bit abount the contents of whatever table innodb_ft_aux_table happens to be set to.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| test               |
+--------------------+
2 rows in set (0.00 sec)

mysql> use newdb;
ERROR 1044 (42000): Access denied for user 'noprivs'@'localhost' to database 'newdb'
mysql> select * from newdb.ibft2;
ERROR 1142 (42000): SELECT command denied to user 'noprivs'@'localhost' for table 'ibft2'
mysql> select table_schema,table_name from information_schema.tables where table_name='ibft2';
Empty set (0.01 sec)
mysql> select @@innodb_ft_aux_table;
+-----------------------+
| @@innodb_ft_aux_table |
+-----------------------+
| newdb/ibft2           |
+-----------------------+
1 row in set (0.00 sec)

mysql> select * from information_schema.innodb_ft_config;
+---------------------------+-------+
| KEY                       | VALUE |
+---------------------------+-------+
| optimize_checkpoint_limit | 180   |
| synced_doc_id             | 7     |
| last_optimized_word       |       |
| deleted_doc_count         | 0     |
| total_word_count          | 8     |
| optimize_start_time       |       |
| optimize_end_time         |       |
| stopword_table_name       |       |
| use_stopword              | 1     |
| table_state               | 0     |
+---------------------------+-------+
10 rows in set (0.00 sec)

mysql> select * from information_schema.innodb_ft_index_table;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| brown  |            1 |           2 |         2 |      1 |        0 |
| brown  |            1 |           2 |         2 |      2 |        0 |
| cow    |            1 |           3 |         2 |      1 |        6 |
| cow    |            1 |           3 |         2 |      3 |        6 |
| dog    |            2 |           4 |         2 |      2 |        6 |
| dog    |            2 |           4 |         2 |      4 |        6 |
| green  |            6 |           6 |         1 |      6 |        0 |
| horse  |            5 |           5 |         1 |      5 |        7 |
| iguana |            6 |           6 |         1 |      6 |        6 |
| white  |            3 |           4 |         2 |      3 |        0 |
| white  |            3 |           4 |         2 |      4 |        0 |
| yellow |            5 |           5 |         1 |      5 |        0 |
+--------+--------------+-------------+-----------+--------+----------+
12 rows in set (0.00 sec)

My user can’t even see whether a particular table or database exists, but he can get all kinds of information about the contents of the table by looking at information_schema.INNODB_FT_INDEX_TABLE/CACHE when innodb_ft_aux_table is set. Not too big a problem, though, right, because any user who sets innodb_ft_aux_table can just unset it after they’ve gathered whatever metadata/statistics they need. Not so fast!

mysql> set global innodb_ft_aux_table='';
ERROR 1231 (42000): Variable 'innodb_ft_aux_table' can't be set to the value of ''

Yikes. I am trying to be careful to remind myself that this is the first appearance of InnoDB Fulltext in a public release, but I hope these quirks are just side-effects of immaturity and not design decisions that are intended to be kept around for very long!