Storage-Engine Independent Column Compression
MariaDB starting with 10.3.2
Storage-engine independent support for column compression was introduced in MariaDB 10.3.2
Contents
Storage-engine independent column compression enables TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, VARCHAR and VARBINARY columns to be compressed.
This is performed by means of a new COMPRESSED column attribute:
COMPRESSED[=<compression_method>]
Field Length Compatibility
When using COMPRESSED attribute, check that FIELD LENGTH is reduced by 1, for example: BLOB have 65535 length, and BLOB COMPRESSED have 65535-1 length, more info check MDEV-15592.
Using BLOB, with max length = 65535, and inserting into BLOB COMPRESSED with max length=65534 generate an error (ERROR 1406 (22001): Data too long for column 'xxx' at row 1), increase the COMPRESSED LENGTH to +1 changing from BLOB COMPRESSED to LONGBLOB COMPRESSED to solve this problem
New System Variables
column_compression_threshold
- Description: Minimum column data length eligible for compression.
- Commandline:
--column-compression-threshold=#
- Scope: Global, Session
- Dynamic: Yes
- Data Type:
numeric
- Default Value:
100
- Range:
0
to4294967295
column_compression_zlib_level
- Description: zlib compression level (1 gives best speed, 9 gives best compression).
- Commandline:
--column-compression-threshold=#
- Scope: Global, Session
- Dynamic: Yes
- Data Type:
numeric
- Default Value:
6
- Range:
1
to9
column_compression_zlib_strategy
- Description: The strategy parameter is used to tune the compression algorithm. Use the value
DEFAULT_STRATEGY
for normal data,FILTERED
for data produced by a filter (or predictor),HUFFMAN_ONLY
to force Huffman encoding only (no string match), orRLE
to limit match distances to one (run-length encoding). Filtered data consists mostly of small values with a somewhat random distribution. In this case, the compression algorithm is tuned to compress them better. The effect ofFILTERED
is to force more Huffman coding and less string matching; it is somewhat intermediate betweenDEFAULT_STRATEGY
andHUFFMAN_ONLY
.RLE
is designed to be almost as fast asHUFFMAN_ONLY
, but give better compression for PNG image data. The strategy parameter only affects the compression ratio but not the correctness of the compressed output even if it is not set appropriately.FIXED
prevents the use of dynamic Huffman codes, allowing for a simpler decoder for special applications. - Commandline:
--column-compression-zlib_strategy=#
- Scope: Global, Session
- Dynamic: Yes
- Data Type:
enum
- Default Value:
DEFAULT_STRATEGY
- Valid Values:
DEFAULT_STRATEGY
,FILTERED
,HUFFMAN_ONLY
,RLE
,FIXED
column_compression_zlib_wrap
- Description: If set to
1
(0
is default), generate zlib header and trailer and compute adler32 check value. It can be used with storage engines that don't provide data integrity verification to detect data corruption. - Commandline:
--column-compression-zlib-wrap{=0|1}
- Scope: Global, Session
- Dynamic: Yes
- Data Type:
boolean
- Default Value:
OFF
New Status Variables
Column_compressions
- Description: Incremented each time field data is compressed.
- Scope: Global, Session
- Data Type:
numeric
Column_decompressions
- Description: Incremented each time field data is decompressed.
- Scope: Global, Session
- Data Type:
numeric
Limitations
- The only supported method currently is zlib.
- The CSV storage engine stores data uncompressed on-disk even if the COMPRESSED attribute is present.
- It is not possible to create indexes over compressed columns.
Examples
CREATE TABLE cmp (i TEXT COMPRESSED); CREATE TABLE cmp2 (i TEXT COMPRESSED=zlib);