Comments - SET Data Type

6 years, 6 months ago Slawomir Pryczek

Just a quick comment about this, as im trying to optimize a task scheduler and I was trying to find a way to index an enum column or to redesign the table whole yesterday, as it's getting bigger and it appears that mysql can't use indexes so we're not able to pick pending tasks fast enough. I wasn't able to find anything related to the subject, beside that's not possible / bug reports that index can't be used for SET...

It appears to the contrary to what's written it can be done using VIRTUAL COLUMNS.

Table definition

`tasks_running` set('INIT_LIST','INIT_PREBUILD','SAVE_DATA'...) NOT NULL DEFAULT ''
`tasks_completed` set('INIT_LIST','INIT_PREBUILD','SAVE_DATA'...) NOT NULL DEFAULT ''
[...]
So we'll have about 10 different task types which can be running or completed and we need to select specific task type(s) to be run, or to be checked (in running state). This will result in full table scan. In this case we're getting next pending task of type INIT_LIST...

SELECT memlist FROM scheduler
  WHERE
    NOT FIND_IN_SET('INIT_LIST', tasks_running) AND
    NOT FIND_IN_SET('INIT_LIST', tasks_completed) 
    ORDER BY created ASC LIMIT 1

Now maria / mysql can only use index if we're doing static comparison (column = "a,b,c"), we can't use index for finding if some value is just present in set. But we can use virtual columns to create indexes which will help in filtering out rows...

Multi-set case

If we have sets with same values which are co-related as like in this example (task can be only in 3 states - paused, running, completed), we can create 1 char column and put index on it. Notice that IF output will be trimmed, so that's why it can be done much simpler, without nested IFs... so in this case c means that task is completed and r running.

ALTER TABLE `scheduler` ADD COLUMN
`_status_init_list` char(1) AS (
  CONCAT(
    IF(FIND_IN_SET(tasks_completed, 'INIT_LIST'), "c", ""),
    IF(FIND_IN_SET(tasks_running, 'INIT_LIST'), "r", "")
  )
) PERSISTENT AFTER _filter,
ADD INDEX (`_status_init_list`);

Including new condition, to help mariadb filter out unneded rows. Q time drop from couple seconds to <0.1s.

SELECT memlist FROM scheduler
  WHERE
    NOT FIND_IN_SET('INIT_LIST', tasks_running) AND
    NOT FIND_IN_SET('INIT_LIST', tasks_completed) AND
    _status_init_list = "" /* empty string, not completed and not running */
    ORDER BY created ASC LIMIT 1

We need to create single index and single column for each value for which we want to speed up lookup, in SET.

Single-set case

If we want to index just single SET (eg. task_completed in following example) it's much simpler than multi-set. In this case we can just use y/n to indicate if that value is present in set.

ALTER TABLE `scheduler` ADD COLUMN
`_status_compl_init_list` char(1) AS (
   IF(FIND_IN_SET(tasks_completed, 'INIT_LIST'), "y", "n")
  )
) PERSISTENT AFTER _filter,
ADD INDEX (`_status_compl_init_list`);

And code can be easily written to both:

  • automatically create needed virtual columns, when schema changes.
  • add filtering condition to the query only if needed virtual column is present
 
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.