MariaDB Enterprise ColumnStore Query Evaluation

This page is part of MariaDB's Documentation.

The parent of this page is: MariaDB Enterprise ColumnStore Architecture

Topics on this page:

Overview
Scalability
- Horizontal Scalability
- Vertical Scalability
Extent Elimination
Custom Select Handler
Smart Storage Engine
- Configure the Select Handler
Joins
- Configure In-Memory Joins
- Configure Disk-Based Joins
Aggregations
- Configure Disk-Based Aggregations
Query Planning
ExeMgr Process/Facility
Query Evaluation Process

Overview

MariaDB Enterprise ColumnStore is a smart storage engine designed to efficiently execute analytical queries using distributed query execution and massively parallel processing (MPP) techniques.

Scalability

MariaDB Enterprise ColumnStore is designed to achieve vertical and horizontal scalability for production analytics using distributed query execution and massively parallel processing (MPP) techniques.

Enterprise ColumnStore evaluates each query as a sequence of job steps using sophisticated techniques to get the best performance for complex analytical queries. Some types of job steps are designed to scale with the system's resources. As you increase the number of ColumnStore nodes or the number of cores on each node, Enterprise ColumnStore can use those resources to more efficiently execute those types of job steps.

Enterprise ColumnStore stores each column on disk in extents. The storage format is designed to maintain scalability, even as the table grows. If an operation does not read parts of a large table, I/O costs are reduced. Enterprise ColumnStore uses a technique called extent elimination that compares the maximum and minimum values in the extent map to the query's conditions, and it avoids scanning extents that don't satisfy the conditions.

Enterprise ColumnStore provides exceptional scalability for analytical queries. Enterprise ColumnStore's design supports targeted scale-out to address increased workload requirements, whether it is a larger query load or increased storage and query processing capacity.

Horizontal Scalability

MariaDB Enterprise ColumnStore provides horizontal scalability by executing some types of job steps in a distributed manner using multiple nodes.

When Enterprise ColumnStore is evaluating a job step, the ExeMgr process or facility on the initiator/aggregator node requests the PrimProc process on each node to perform the job step on different extents in parallel. As more nodes are added, Enterprise ColumnStore can perform more work in parallel.

Enterprise ColumnStore also uses massively parallel processing (MPP) techniques to speed up some types of job steps. For some types of aggregation operations, each node can perform an initial local aggregation, and then the initiator/aggregator node only needs to combine the local results and perform a final aggregation. This technique can be very efficient for some types of aggregation operations, such as for queries that use the AVG(), COUNT(), or SUM() aggregate functions.

Vertical Scalability

MariaDB Enterprise ColumnStore provides vertical scalability by executing some types of job steps in a multi-threaded manner using a thread pool.

When the PrimProc process on a node receives work, it executes the job step on an extent in a multi-threaded manner using a thread pool. Each thread operates on a different block within the extent. As more CPUs are added, Enterprise ColumnStore can work on more blocks in parallel.

Extent Elimination

MariaDB Enterprise ColumnStore uses extent elimination to scale query evaluation as table size increases.

Most databases are row-based databases that use manually-created indexes to achieve high performance on large tables. This works well for transactional workloads. However, analytical queries tend to have very low selectivity, so traditional indexes are not typically effective for analytical queries.

Enterprise ColumnStore uses extent elimination to achieve high performance, without requiring manually created indexes. Enterprise ColumnStore automatically partitions all data into extents. Enterprise ColumnStore stores the minimum and maximum values for each extent in the extent map. Enterprise ColumnStore uses the minimum and maximum values in the extent map to perform extent elimination.

When Enterprise ColumnStore performs extent elimination, it compares the query's join conditions and filter conditions (i.e., WHERE clause) to the minimum and maximum values for each extent in the extent map. If the extent's minimum and maximum values fall outside the bounds of the query's conditions, Enterprise ColumnStore skips that extent for the query.

Extent elimination is automatically performed for every query. It can significantly decrease I/O for columns with clustered values. For example, extent elimination works effectively for series, ordered, patterned, and time-based data.

Custom Select Handler

The ColumnStore storage engine plugin implements a custom select handler to fully take advantage of Enterprise ColumnStore's capabilities.

All storage engines interact with ES using an internal handler API, which is highly extensible. Storage engines can implement different features by implementing different methods within the handler API.

For select statements, the handler API transforms each query into a SELECT_LEX object, which is provided to the select handler.

Before ES 10.4, the handler API implemented a generic select handler that all storage engines used. Starting with ES 10.4, the handler API allows storage engines to implement a custom select handler. Implementing a custom select handler enables storage engines to take full advantage of unique features.

The generic select handler is not optimal for Enterprise ColumnStore, because:

Enterprise ColumnStore selects data by column, but the generic select handler selects data by row
Enterprise ColumnStore supports parallel query evaluation, but the generic select handler does not
Enterprise ColumnStore supports distributed aggregations, but the generic select handler does not
Enterprise ColumnStore supports distributed functions, but the generic select handler does not
Enterprise ColumnStore supports extent elimination, but the generic select handler does not
Enterprise ColumnStore has its own query planner, but the generic select handler cannot use it

Smart Storage Engine

The ColumnStore storage engine plugin is known as a smart storage engine, because it implements a custom select handler. MariaDB Enterprise ColumnStore integrates with MariaDB Enterprise Server using the ColumnStore storage engine plugin. The ColumnStore storage engine plugin enables MariaDB Enterprise Server to interact with ColumnStore tables.

If a storage engine implements a custom select handler, it is known as a smart storage engine.

As a smart storage engine, the ColumnStore storage engine plugin tightly integrates Enterprise ColumnStore with ES, but it has enough independence to efficiently execute analytical queries using a completely unique approach.

Configure the Select Handler

The ColumnStore storage engine can use either the custom select handler or the generic select handler. The select handler can be configured using the columnstore_select_handler system variable:

Value	Description
`AUTO`	When set to `AUTO`, Enterprise ColumnStore automatically chooses the best select handler for a given `SELECT` query. `AUTO` was added in Enterprise ColumnStore 6.
`OFF`	When set to `OFF`, Enterprise ColumnStore uses the generic select handler for all `SELECT` queries. It is not recommended to use this value, unless recommended by MariaDB Support.
`ON`	When set to `ON`, Enterprise ColumnStore uses the custom select handler for all `SELECT` queries. `ON` is the default in Enterprise ColumnStore 5 and Enterprise ColumnStore 6.

Joins

MariaDB Enterprise ColumnStore performs join operations using hash joins.

By default, hash joins are performed in memory.

Configure In-Memory Joins

MariaDB Enterprise ColumnStore can be configured to allocate more memory for hash joins.

The relevant configuration options are:

Section	Option	Description
`HashJoin`	`PmMaxMemorySmallSide`	Configures the amount of memory available for a single join. Valid values are from 0 to 4 GB. Default value is 1 GB.
`HashJoin`	`TotalUmMemory`	Configures the amount of memory available for all joins. Values can be specified as a percentage of total system memory or as a specific amount of memory. Valid percentage values are from 0 to 100%. Default value is 25%.

For example, to configure Enterprise ColumnStore to use more memory for hash joins using the mcsSetConfig utility:

$ mcsSetConfig HashJoin PmMaxMemorySmallSide 2G
$ mcsSetConfig HashJoin TotalUmMemory '40%'

Configure Disk-Based Joins

MariaDB Enterprise ColumnStore can be configured to perform disk-based joins.

The relevant configuration options are:

Section	Option	Description
`HashJoin`	`AllowDiskBasedJoin`	Enables disk-based joins Valid values are `Y` and `N` Default value is `N`
`HashJoin`	`TempFileCompression`	Enables compression for temporary files used by disk-based joins Valid values are `Y` and `N` Default value is `N`
`SystemConfig`	`SystemTempFileDir`	Configures the directory used for temporary files used by disk-based joins and aggregations Default value is `/tmp/columnstore_tmp_files`

For example, to configure Enterprise ColumnStore to perform disk-based joins using the mcsSetConfig utility:

$ mcsSetConfig HashJoin AllowDiskBasedJoin Y
$ mcsSetConfig HashJoin TempFileCompression Y
$ mcsSetConfig SystemConfig SystemTempFileDir /mariadb/tmp

Aggregations

MariaDB Enterprise ColumnStore performs aggregation operations on all nodes in a distributed manner, and then all nodes send their results to a single node, which combines the results and performs the final aggregation.

By default, aggregation operations are performed in memory.

Configure Disk-Based Aggregations

In Enterprise ColumnStore 5.6.1 and later, disk-based aggregations can be configured.

The relevant configuration options are:

Section	Option	Description
`RowAggregation`	`AllowDiskBasedAggregation`	Enables disk-based aggregations Valid values are `Y` and `N` Default value is `N`
`RowAggregation`	`Compression`	Enables compression of temporary files used for disk-based aggregations Default value is `SNAPPY`
`SystemConfig`	`SystemTempFileDir`	Configures the directory used for temporary files used by disk-based joins and aggregations Default value is `/tmp/columnstore_tmp_files`

For example, to configure Enterprise ColumnStore to perform disk-based aggregations using the mcsSetConfig utility:

$ mcsSetConfig RowAggregation AllowDiskBasedAggregation Y
$ mcsSetConfig RowAggregation Compression SNAPPY
$ mcsSetConfig SystemConfig SystemTempFileDir /mariadb/tmp

Query Planning

The ColumnStore storage engine plugin is a smart storage engine, so MariaDB Enterprise ColumnStore to plan its own queries using the custom select handler.

MariaDB Enterprise ColumnStore's query planning is divided into two steps:

ES provides the query's SELECT_LEX object to the custom select handler. The custom select handler builds a ColumnStore Execution Plan (CSEP).
The custom select handler provides the CSEP to the ExeMgr process or facility on the same node. ExeMgr performs extent elimination and creates a job list.

ExeMgr Process/Facility

The ColumnStore storage engine provides the CSEP to the ExeMgr process or facility on the same node, which will act as the initiator/aggregator node for the query.

Starting with MariaDB Enterprise ColumnStore 22.08, the ExeMgr facility has been integrated into the PrimProc process, so it is no longer a separate process.

ExeMgr performs multiple tasks:

Performs extent elimination.
Views the optimizer statistics.
Transforms the CSEP to a job list, which consists of job steps.
Assigns distributed job steps to the PrimProc process on each node.
Evaluates non-distributed job steps itself.
Provides final query results to ES.

Query Evaluation Process

When Enterprise ColumnStore executes a query, it goes through the following process:

The client or application sends the query to MariaDB MaxScale's listener port.
The query is processed by the Read/Write Split Router (readwritesplit) service associated with the listener.
The service routes the query to the ES TCP port on a ColumnStore node.
MariaDB Enterprise Server (ES) evaluates the query using the handler interface.
1. The handler interface builds a SELECT_LEX object to represent the query.
2. The handler interface provides the SELECT_LEX object to the ColumnStore storage engine's select handler.
3. The select handler transforms the SELECT_LEX object into a ColumnStore Execution Plan (CSEP).
4. The select handler provides the CSEP to the ExeMgr facility on the same node, which will act as the initiator/aggregator node for the query.
ExeMgr transforms the CSEP into a job list, which consists of job steps.
ExeMgr evaluates each job step sequentially.
1. If it is a non-distributed job step, ExeMgr evaluates the job step itself.
2. If it is a distributed job step, ExeMgr provides the job step to the PrimProc process on each node. The PrimProc process on each node evaluates the job step in a multi-threaded manner using a thread pool. After the PrimProc process on each node evaluates its job step, the results are returned to ExeMgr on the initiator/aggregator node as a Row Group.
After all job steps are evaluated, ExeMgr returns the results to ES.
ES returns the results to MaxScale.
MaxScale returns the results to the client or application.