MariaDB Enterprise ColumnStore Schema Design
When designing a schema for MariaDB Enterprise ColumnStore, there are some important details to consider.
When you are designing a schema for Enterprise ColumnStore, you should follow some best practices to get the best performance.
Choose Appropriate Size for Numeric Data Types
Enterprise ColumnStore can use I/O much more efficiently if you choose the smallest possible size for a numeric data type, because the smaller type would allow Enterprise ColumnStore to fit more values into each extent.
For best performance, choose the smallest data type possible for a given column.
Avoid Creating Long String Columns
It is recommended to avoid long string columns, because Enterprise ColumnStore can only store 8 bytes for each string value in a column's extent. Multi-byte character sets can cause the effective limit to be even lower. If a string value is longer than the effective limit, the value is stored in a separate dictionary extent, and the column's extent contains a pointer to the value in the dictionary extent.
The effective limit depends on the specific data type used by the column:
For the CHAR data type, a maximum of 8 bytes of the value can be the column's extent.
For the VARCHAR data type, a maximum of 7 bytes of the value can be stored in the column's extent, because 1 byte is required to store the length of the value.
The effective limit also depends on the specific character set used by the column:
For 1-byte character sets, such as
latin1: if the column is declared as
VARCHAR(7)or shorter, the value is stored in the column's extent.
For 2-byte character sets, such as
ucs2: if the column is declared as
VARCHAR(3)or shorter, the value is stored in the column's extent.
For 3-byte character sets, such as
utf8: if the column is declared as
VARCHAR(2)or shorter, the value is stored in the column's extent.
When a string column is stored in a separate dictionary extent, it creates multiple problems:
When Enterprise ColumnStore reads the values, it will have to read the pointer from the column's extent, and then it will have to read the value from the separate dictionary extent. As a consequence, Enterprise ColumnStore will have to read twice as many extents, so the query will require more I/O.
When Enterprise ColumnStore evaluates join conditions and filter conditions (i.e.,
WHEREclause), it usually uses extent elimination to reduce I/O. However, extent elimination does not apply to dictionary extents. As a consequence, Enterprise ColumnStore will have to read every column extent, so the query may require even more I/O.
These problems can be avoided by following the recommended best practices:
If a column's values can fit into a non-string data type, declare the column using the non-string data type, rather than the string data type. The specific details depend on the type of data:
If a column's values can fit into a 1-byte character set, declare the column using the 1-byte character set.
If a column's values can only fit into a long string data type, try to avoid referencing the column in join conditions and filter conditions (i.e.,