MariaDB Enterprise ColumnStore Schema Design

When designing a schema for MariaDB Enterprise ColumnStore, there are some important details to consider.

Best Practices

When you are designing a schema for Enterprise ColumnStore, you should follow some best practices to get the best performance.

Choose Appropriate Size for Numeric Data Types

Enterprise ColumnStore can use I/O much more efficiently if you choose the smallest possible size for a numeric data type, because the smaller type would allow Enterprise ColumnStore to fit more values into each extent.

For example, when the values for an integer column are small, queries can perform better when TINYINT is used instead of BIGINT, because more values can fit into each extent with TINYINT.

For best performance, choose the smallest data type possible for a given column.

Avoid Creating Long String Columns

It is recommended to avoid long string columns, because Enterprise ColumnStore can only store 8 bytes for each string value in a column's extent. Multi-byte character sets can cause the effective limit to be even lower. If a string value is longer than the effective limit, the value is stored in a separate dictionary extent, and the column's extent contains a pointer to the value in the dictionary extent.

The effective limit depends on the specific data type used by the column:

  • For the CHAR data type, a maximum of 8 bytes of the value can be the column's extent.

  • For the VARCHAR data type, a maximum of 7 bytes of the value can be stored in the column's extent, because 1 byte is required to store the length of the value.

The effective limit also depends on the specific character set used by the column:

  • For 1-byte character sets, such as latin1: if the column is declared as CHAR(8) or VARCHAR(7) or shorter, the value is stored in the column's extent.

  • For 2-byte character sets, such as ucs2: if the column is declared as CHAR(4) or VARCHAR(3) or shorter, the value is stored in the column's extent.

  • For 3-byte character sets, such as utf8: if the column is declared as CHAR(2) or VARCHAR(2) or shorter, the value is stored in the column's extent.

When a string column is stored in a separate dictionary extent, it creates multiple problems:

  • When Enterprise ColumnStore reads the values, it will have to read the pointer from the column's extent, and then it will have to read the value from the separate dictionary extent. As a consequence, Enterprise ColumnStore will have to read twice as many extents, so the query will require more I/O.

  • When Enterprise ColumnStore evaluates join conditions and filter conditions (i.e., WHERE clause), it usually uses extent elimination to reduce I/O. However, extent elimination does not apply to dictionary extents. As a consequence, Enterprise ColumnStore will have to read every column extent, so the query may require even more I/O.

These problems can be avoided by following the recommended best practices:

  • If a column's values can fit into a non-string data type, declare the column using the non-string data type, rather than the string data type. The specific details depend on the type of data:

    • If the column's values are compatible with a numeric data type (INT, FLOAT, NUMERIC, etc.), declare the column using the smallest numeric data type that is most appropriate for your data. This could apply to columns storing information like zip codes, product IDs, account numbers, etc.

    • If the column's values are compatible with a date and time data type (DATE, DATETIME, TIME, TIMESTAMP), declare the column using the date and time data type that is most appropriate for your data.

  • If a column's values can fit into a 1-byte character set, declare the column using the 1-byte character set.

  • If a column's values can only fit into a long string data type, try to avoid referencing the column in join conditions and filter conditions (i.e., WHERE clause).