Data Types for MariaDB Enterprise ColumnStore

Overview

MariaDB Enterprise ColumnStore supports a subset of the MariaDB Enterprise Server data types.

Compatibility

  • MariaDB Enterprise ColumnStore 5

  • MariaDB Enterprise ColumnStore 6

Supported Data Types

Supported Integer Data Types

In MariaDB Enterprise ColumnStore, the supported integer data types are:

Data Type

Size

Signed Range

Unsigned Range

BOOLEAN

1 byte

0 - 1

0 - 1

TINYINT

1 byte

-126 - 127

0 - 253

SMALLINT

2 bytes

-32766 - 32767

0 - 65533

MEDIUMINT

3 bytes

-8388608 - 8388607

0 - 16777215

INT

4 bytes

-2147483646 - 2147483647

0 - 4294967293

BIGINT

8 bytes

-9223372036854775806 - 9223372036854775807

0 - 18446744073709551613

  • Enterprise ColumnStore reserves 2 integer values for internal use, so the maximum integer values are 2 less than the maximum integer values for Enterprise Server.

  • Enterprise ColumnStore uses "saturation semantics" for integer values. If a value is too big for an integer data type, Enterprise ColumnStore will substitute the maximum value for the specific integer data type. If a value is too small for an integer data type, Enterprise ColumnStore will substitute the minimum value for the specific integer data type.

Supported Fixed/Floating Point Numeric Data Types

In MariaDB Enterprise ColumnStore, the supported fixed and floating point numeric data types are:

Data Type

Aliases

Size

Format

DECIMAL[(M[,D])]

  • 2, 4, 8, or 16 bytes (ColumnStore 6)

  • 2, 4, or 8 bytes (ColumnStore 5)

Packed fixed-point number

FLOAT

4 bytes

IEEE-754 floating point number

DOUBLE

8 bytes

IEEE-754 floating point number

  • Enterprise ColumnStore stores DECIMAL, NUMERIC, and FIXED columns as a scaled integer without packing.

  • Enterprise ColumnStore supports specifying precision (M) and scale (D) for DECIMAL, NUMERIC, and FIXED:

    • In Enterprise ColumnStore 6, the maximum precision (M) is 38, and the maximum scale (D) is 38.

    • In Enterprise ColumnStore 5, the maximum precision (M) is 18, and the maximum scale (D) is 18.

    • When not specified, the default precision (M) is 10, and the default scale (D) is determined by the columnstore_decimal_scale system variable.

  • Enterprise ColumnStore does not support specifying precision and scale for FLOAT, DOUBLE, and REAL. For FLOAT, the maximum significant digits is about 6, and the range is +/- 1e38. For DOUBLE / REAL, the maximum significant digits is about 15, and the range is about +/- 1e308.

  • When executing SELECT statements bulk loads, Enterprise ColumnStore truncates, rather than rounds, numeric values that have too many decimal places.

  • When executing INSERT and UPDATE statements, Enterprise ColumnStore rounds numeric values that have too many decimal places.

Supported String Data Types

In MariaDB Enterprise ColumnStore, the supported string data types are:

Data Type

Size

Maximum Size

VARCHAR

1-4 bytes per character

8000 bytes

CHAR

1-4 bytes per character

255 bytes

  • Enterprise ColumnStore does not support the non-standard extensions VARCHAR(0) and CHAR(0), whereas other engines may support them.

  • Enterprise ColumnStore supports a maximum of 8000 bytes per VARCHAR column, which results in a maximum length for different character sets:

    • For 1-byte character sets (such as latin1), Enterprise ColumnStore supports up to VARCHAR(8000).

    • For 2-byte character sets (such as ucs2), Enterprise ColumnStore supports up to VARCHAR(4000).

    • For 3-byte character sets (such as utf8), Enterprise ColumnStore supports up to VARCHAR(2666).

    • For 4-byte character sets (such as utf8mb4), Enterprise ColumnStore supports up to VARCHAR(2000).

  • Enterprise ColumnStore supports a maximum of 255 bytes per CHAR column, which results in a maximum length for different character sets:

    • For 1-byte character sets (such as latin1), Enterprise ColumnStore supports up to CHAR(255).

    • For 2-byte character sets (such as ucs2), Enterprise ColumnStore supports up to VARCHAR(127).

    • For 3-byte character sets (such as utf8), Enterprise ColumnStore supports up to VARCHAR(85).

    • For 4-byte character sets (such as utf8mb4), Enterprise ColumnStore supports up to VARCHAR(63).

  • Enterprise ColumnStore stores large string values in dictionary extents, which do not support extent elimination, so it is recommended to avoid creating long string columns.

Supported Large Object Data Types

In MariaDB Enterprise ColumnStore, the supported large object data types are:

Data Type

Size

TINYBLOB

255 bytes

TINYTEXT

255 bytes

BLOB

64 KB

TEXT

64 KB

MEDIUMBLOB

16 MB

MEDIUMTEXT

16 MB

LONGBLOB

16 MB

LONGTEXT

16 MB

  • Enterprise ColumnStore supports a maximum size of 16 MB for the LONGBLOB and LONGTEXT data types, whereas other engines support a maximum size of 4 GB.

  • Enterprise ColumnStore stores large object values in dictionary extents, which do not support extent elimination.

Supported Date and Time Data Types

In MariaDB Enterprise ColumnStore, the supported date and time data types are:

Data Type

Size

DATE

4 bytes

DATETIME

8 bytes

TIME

8 bytes

TIMESTAMP

4 bytes

Data Type Recommendations

When you are designing a schema for Enterprise ColumnStore, you should follow some best practices to get the best performance.

Choose Appropriate Size for Numeric Data Types

Enterprise ColumnStore can use I/O much more efficiently if you choose the smallest possible size for a numeric data type, because the smaller type would allow Enterprise ColumnStore to fit more values into each extent.

For example, when the values for an integer column are small, queries can perform better when TINYINT is used instead of BIGINT, because more values can fit into each extent with TINYINT.

For best performance, choose the smallest data type possible for a given column.

Avoid Creating Long String Columns

It is recommended to avoid long string columns, because Enterprise ColumnStore can only store 8 bytes for each string value in a column's extent. Multi-byte character sets can cause the effective limit to be even lower. If a string value is longer than the effective limit, the value is stored in a separate dictionary extent, and the column's extent contains a pointer to the value in the dictionary extent.

The effective limit depends on the specific data type used by the column:

  • For the CHAR data type, a maximum of 8 bytes of the value can be the column's extent.

  • For the VARCHAR data type, a maximum of 7 bytes of the value can be stored in the column's extent, because 1 byte is required to store the length of the value.

The effective limit also depends on the specific character set used by the column:

  • For 1-byte character sets, such as latin1: if the column is declared as CHAR(8) or VARCHAR(7) or shorter, the value is stored in the column's extent.

  • For 2-byte character sets, such as ucs2: if the column is declared as CHAR(4) or VARCHAR(3) or shorter, the value is stored in the column's extent.

  • For 3-byte character sets, such as utf8: if the column is declared as CHAR(2) or VARCHAR(2) or shorter, the value is stored in the column's extent.

When a string column is stored in a separate dictionary extent, it creates multiple problems:

  • When Enterprise ColumnStore reads the values, it will have to read the pointer from the column's extent, and then it will have to read the value from the separate dictionary extent. As a consequence, Enterprise ColumnStore will have to read twice as many extents, so the query will require more I/O.

  • When Enterprise ColumnStore evaluates join conditions and filter conditions (i.e., WHERE clause), it usually uses extent elimination to reduce I/O. However, extent elimination does not apply to dictionary extents. As a consequence, Enterprise ColumnStore will have to read every column extent, so the query may require even more I/O.

These problems can be avoided by following the recommended best practices:

  • If a column's values can fit into a non-string data type, declare the column using the non-string data type, rather than the string data type. The specific details depend on the type of data:

    • If the column's values are compatible with a numeric data type (INT, FLOAT, NUMERIC, etc.), declare the column using the smallest numeric data type that is most appropriate for your data. This could apply to columns storing information like zip codes, product IDs, account numbers, etc.

    • If the column's values are compatible with a date and time data type (DATE, DATETIME, TIME, TIMESTAMP), declare the column using the date and time data type that is most appropriate for your data.

  • If a column's values can fit into a 1-byte character set, declare the column using the 1-byte character set.

  • If a column's values can only fit into a long string data type, try to avoid referencing the column in join conditions and filter conditions (i.e., WHERE clause).