MariaDB ColumnStore query tuning optimizes analytics using data types, joins, projection elimination, WHERE clauses, and EXPLAIN for performance insights.
A number of system configuration variables exist to allow fine tuning of the system to suit the physical hardware and query characteristics. In general the default values will work relatively well for many cases.
The configuration parameters are maintained in the /etc/Columnstore.xml file. In a multiple server deployment these should only be edited on the PM1 server, as the file is automatically replicated to the other servers by the system. A system restart is required for a configuration change to take effect.
Convenience utility programs getConfig and setConfig are available to safely update the Columnstore.xml without needing to be comfortable with editing XML files. The -h argument will display usage information.
The NumBlocksPct configuration parameter specifies the percentage of physical memory to utilize for disk block caching. The default value is 50, which leaves enough physical memory for other processes.
The TotalUmMemory configuration parameter specifies the percentage of physical memory to utilize for joins, intermediate results and set operations. This specifies an upper limit for small table results in joins rather than a pre-allocation of memory. The default value is 25.
In a single server or combined deployment, the sum of NumBlocksPct and TotalUmMemory should typically not exceed 75% of physical memory. With very large memory servers this could be raised but the key point is to leave enough memory for other processes including mariadbd.
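As a minimal sketch of adjusting these two parameters with the convenience utilities (the DBBC and HashJoin section names are an assumption based on recent Columnstore.xml layouts, and newer releases name the tools mcsGetConfig/mcsSetConfig; verify against your version and configuration file):
# inspect the current values
getConfig DBBC NumBlocksPct
getConfig HashJoin TotalUmMemory
# illustrative values only: 40% block cache plus 30% join memory, then restart ColumnStore
setConfig DBBC NumBlocksPct 40
setConfig HashJoin TotalUmMemory 30%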
ColumnStore handles concurrent query execution by managing the rate of concurrent batch primitive steps. This is configured using the MaxOutstandingRequests parameter and has a default value of 20. Each batch primitive step is executed within the context of 1 extent column according to this high level process:
ColumnStore issues up to MaxOutstandingRequests number of batch primitive steps.
PrimProc processes the request, using many threads and returns its response. These generally take a fraction of a second up to a low number of seconds depending on the amount of Physical I/O and the performance of that storage.
ColumnStore issues new requests as prior requests complete maintaining the maximum number of outstanding requests.
This scheme allows large queries to use all available resources when they are not otherwise being consumed, and smaller queries to execute with minimal delay. Lower values optimize for higher throughput of smaller queries, while larger values optimize for the response time of a given large query. The default value should work well under most circumstances; however, the value should be increased as the number of nodes is increased.
The number of queries currently running and the number of queries currently queued can be checked with the calGetSQLCount() function:
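For example, assuming the calGetSQLCount() tracing function is installed (it also appears in the examples later on this page), the counts can be checked from any SQL session:
SELECT calGetSQLCount();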
ColumnStore maintains statistics for each table and utilizes these to determine which of the two tables is larger. This is based both on the number of blocks in the table and an estimate of the predicate cardinality. The first step is to apply any filters to the smaller table and return that data set to memory. The size of this data set is compared against the PmMaxMemorySmallSide configuration parameter, which has a default value of 64 (MB) and can be set as high as 4 GB. The default allows approximately 1M rows on the small table side to be joined against billions (or trillions) on the large table side. If the size of the small data set is less than PmMaxMemorySmallSide, the data set is sent to PrimProc for creation of a distributed hashmap. This setting is therefore important to the tuning of joins and to whether the operation can be distributed or not. It should be set to support your largest expected small-table join size, up to available memory:
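For example, a minimal sketch of raising the small-side join limit to 1 GB using setConfig (the HashJoin section name is assumed from recent Columnstore.xml layouts; a system restart is required afterwards):
setConfig HashJoin PmMaxMemorySmallSide 1G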
Although this increases the amount of data sent between nodes to support the join, it means that the join and subsequent aggregates are pushed down, scaled out, and a smaller data set is returned back.
In a multiple server deployment, the sizing should be based on the available physical memory on the servers, how much memory to reserve for block caching, and the number of simultaneous join operations expected to run multiplied by the average small-table join data size.
The above logic for a single-table join extrapolates to multi-table joins, where the small table values are precalculated and applied as one single scan against the large table. This works well for the typical star schema case of joining multiple dimension tables with a large fact table. For some join scenarios it may be necessary to sequence joins to create intermediate datasets for joining; this would happen, for instance, with a snowflake schema structure. In some extreme cases it may be hard for the optimizer to determine the optimal join path. In this case a hint is available to force a join ordering. The INFINIDB_ORDERED hint forces the first table in the FROM clause to be considered the largest table and overrides any statistics-based decision, for example:
Note: INFINIDB_ORDERED is deprecated and no longer works in ColumnStore 1.2 and above.
For ColumnStore 1.2 and 1.3, use: set infinidb_ordered_only=ON;
For ColumnStore 1.4 and later, use: set columnstore_ordered_only=ON;
When a join is very large and exceeds the PmMaxMemorySmallSide setting, the join is performed in memory on the UM rather than being distributed. For very large joins this could exceed the available memory, in which case the condition is detected and a query error is reported. Several configuration parameters are available to enable and configure the use of disk overflow should this occur:
AllowDiskBasedJoin – Controls whether disk-based joins may be used. Valid values are Y (enabled) and N (disabled). By default, this option is disabled.
TempFileCompression – Controls whether the disk join files are compressed or uncompressed. Valid values are Y (use compressed files) and N (use uncompressed files).
TempFilePath – The directory path used for the disk joins. By default, this path is the tmp directory for your installation (i.e., /tmp/columnstore_tmp_files/). Files named infinidb-join-data* are created and cleaned up in this directory as needed.
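The following sketch shows how disk-based joins might be enabled with setConfig, assuming these parameters live under the HashJoin section of Columnstore.xml (verify against your file), followed by a system restart:
setConfig HashJoin AllowDiskBasedJoin Y
setConfig HashJoin TempFileCompression Y
setConfig HashJoin TempFilePath /tmp/columnstore_tmp_files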
A MariaDB global or session variable is available to specify a memory limit at which point the query is switched over to disk-based joins:
infinidb_um_mem_limit - Memory limit in MB per user (i.e., switch to disk-based join if this limit is exceeded). By default, this limit is not set (value of 0).
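For example, to have joins for the current session spill to disk once roughly 2 GB of UM memory has been used (the value is illustrative only; in ColumnStore 1.4 and later the variable is named columnstore_um_mem_limit):
SET SESSION infinidb_um_mem_limit = 2048;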
The MariaDB SHOW PROCESSLIST statement is used to see a list of active queries on that UM:
MariaDB [test]> SHOW PROCESSLIST;
+----+------+-----------+-------+---------+------+-------+------------------+
| Id | User | Host      | db    | Command | Time | State | Info             |
+----+------+-----------+-------+---------+------+-------+------------------+
| 73 | root | localhost | ssb10 | Query   |    0 | NULL  | show processlist |
+----+------+-----------+-------+---------+------+-------+------------------+
1 row in set (0.01 sec)
getActiveSQLStatements is a mcsadmin command that shows which SQL statements are currently being executed on the database:
The calGetStats function provides statistics about resources used on the node, and network by the last run query. Example:
The output contains information on:
MaxMemPct - Peak memory utilization on the UM (User Module), likely in support of a large UM-based hash join operation.
NumTempFiles - Report on any temporary files created in support of query operations larger than available memory, typically for unusual join operations where the smaller table join cardinality exceeds some configurable threshold.
TempFileSpace - Report on space used by temporary files created in support of query operations larger than available memory, typically for unusual join operations where the smaller table join cardinality exceeds some configurable threshold.
ApproxPhyI/O - Approximate number of 8k blocks of physical I/O (reads from storage) performed for the query.
CacheI/O - Approximate number of 8k blocks processed in memory, adjusted down by the number of discrete PhyI/O calls required.
BlocksTouched - Approximate number of 8k blocks processed in memory.
PartitionBlocksEliminated - The number of block touches eliminated via the Extent Map elimination behavior.
MsgBytesIn, MsgBytesOut - Message size in MB sent between nodes in support of the query.
The output is useful to determine how much physical I/O was required, how much data was cached, and how many partition blocks were eliminated through use of extent map elimination. The system maintains min / max values for each extent and uses these to help implement where clause filters to completely bypass extents where the value is outside of the min/max range. When a column is ordered (or semi-ordered) during load, such as a time column, this offers very large performance gains as the system can avoid scanning many extents for the column.
While the MariaDB server's EXPLAIN utility can be used to look at the query plan, it is somewhat less helpful for ColumnStore tables as ColumnStore does not use indexes or make use of MariaDB I/O functionality. The execution plan for a query on a ColumnStore table is made up of multiple steps. Each step in the query plan performs a set of operations that are issued from the UM to the set of PMs in support of a given step in a query.
Full Column Scan - an operation that scans each entry in a column using all available threads on the Performance Modules. Speed of operation is generally related to the size of the data type and the total number of rows in the column. The closest analogy for a traditional system is an index scan operation.
Partitioned Column Scan - an operation that uses the Extent Map to identify that certain portions of the column do not contain any matching values for a given set of filters. The closest analogy for a traditional row-based DBMS is a partitioned index scan, or partitioned table scan operation.
Column lookup by row offset - once the set of matching filters has been applied and the minimal set of rows has been identified, additional blocks are requested using a calculation that determines exactly which block is required. The closest analogy for a traditional system is a lookup by rowid.
These operations are automatically executed together in order to execute appropriate filters and column lookup by row offset.
In MariaDB ColumnStore there is a set of SQL tracing stored functions provided to see the distributed query execution plan between the nodes.
The basic steps to using these SQL tracing stored functions are:
Start the trace for the particular session.
Execute the SQL statement in question.
Review the trace collected for the statement. As an example, the following session starts a trace, issues a query against a 6 million row fact table and 300,000 row dimension table, and then reviews the output from the trace:
The columns headings in the output are as follows:
Desc – Operation being executed. Possible values:
BPS - Batch Primitive Step: scanning or projecting the column blocks.
CES - Cross Engine Step: Performing a cross-engine join
DSS - Dictionary Structure Step: a dictionary scan for a particular variable length string value.
HJS - Hash Join Step: Performing a hash join between 2 tables
HVS - Having Step: Performing the having clause on the result set
SQS - Sub Query Step: Performing a sub query
TAS - Tuple Aggregation step: the process of receiving intermediate aggregation results from other nodes.
TNS - Tuple Annexation Step: Query result finishing, e.g. filling in constant columns, limit, order by and final distinct cases.
TUS - Tuple Union step: Performing a SQL union of 2 sub queries.
TCS - Tuple Constant Step: Process constant value columns.
WFS - Window Function Step: Performing a window function.
Mode – Where the operation was performed within the PrimProc library
Table – Table for which columns may be scanned/projected.
TableOID – ObjectID for the table being scanned.
ReferencedOIDs – ObjectIDs for the columns required by the query.
PIO – Physical I/O (reads from storage) executed for the query.
LIO – Logical I/O executed for the query, also known as Blocks Touched.
PBE – Partition Blocks Eliminated identifies blocks eliminated by Extent Map min/max.
Elapsed – Elapsed time for a given step.
Rows – Intermediate rows returned.
Sometimes it can be useful to clear caches to allow understanding of un-cached and cached query access. The calFlushCache() function will clear caches on all servers. This is only really useful for testing query performance:
It can be useful to view details about the extent map for a given column. This can be achieved using the editem utility on any ColumnStore server. Available arguments can be displayed using the -h flag. The most common use is to provide the column object id with the -o argument, which will output details for the column; in this case the -t argument is also provided to show min / max values as dates:
Here it can be seen that the extent maps for the o_orderdate (object id 3032) column are well partitioned since the order table source data was sorted by the order_date. This example shows 2 separate DBRoot values as the environment was a 2-node combined deployment.
Column object ids may be found by querying the calpontsys.syscolumn metadata table (deprecated) or information_schema.columnstore_columns table (version 1.0.6+).
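For example, an object id can typically be looked up with a query along these lines (the exact column names exposed by information_schema.columnstore_columns can vary slightly between versions):
SELECT table_schema, table_name, column_name, object_id
FROM information_schema.columnstore_columns
WHERE table_schema = 'test' AND column_name = 'o_orderdate';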
MariaDB ColumnStore query statistics history can be retrieved for analysis. By default, query stats collection is disabled. To enable the collection of query stats, the QueryStats Enabled element in the Columnstore.xml configuration file should be set to Y (the default is N).
Cross Engine Support must also be enabled before enabling Query Statistics. See the cross engine join configuration section.
For query statistics, the cross engine user needs the INSERT privilege on the querystats table.
Example:
When enabled, the history of query statistics across all sessions, along with execution times and the stats provided by calGetStats(), is stored in a table in the infinidb_querystats schema. Only the following ColumnStore statement types are available for statistics monitoring:
SELECT
INSERT
UPDATE
DELETE
INSERT ... SELECT
LOAD DATA INFILE
When QueryStats is enabled, the query statistics history is collected in the querystats table in the infinidb_querystats schema.
The columns of this table are:
queryID - A unique identifier assigned to the query
Session ID (sessionID) - The session number that executed the statement.
queryType - The type of the query: SELECT, INSERT, UPDATE, DELETE, INSERT SELECT, or LOAD DATA INFILE
query - The text of the query
Host (host) - The host that executed the statement.
User ID (user) - The user that executed the statement.
Priority (priority) - The priority the user has for this statement.
Query Execution Times (startTime, endTime) - Calculated as end time – start time.
start time - the time that the query gets to ExeMgr, DDLProc, or DMLProc
end time - the time that the last result packet exits ExeMgr, DDLProc or DMLProc
Rows returned or affected (rows) - The number of rows returned for SELECT queries, or the number of rows affected by DML queries. Not valid for DDL and other query types.
Error Number (errNo) - The IDB error number if this query failed, 0 if it succeeded.
Physical I/O (phyIO) - The number of blocks that the query accessed from the disk, including the pre-fetch blocks. This statistic is only valid for the queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
Cache I/O (cacheIO) - The number of blocks that the query accessed from the cache. This statistic is only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
Blocks Touched (blocksTouched) - The total number of blocks that the query accessed physically and from the cache. This should be equal to or less than the sum of physical I/O and cache I/O. This statistic is only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
Partition Blocks Eliminated (CPBlocksSkipped) - The number of blocks eliminated by the extent map casual partitioning. This statistic is only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
Messages to other nodes (msgOutUM) - The number of bytes in messages that ExeMgr sends to PrimProc. If a message needs to be distributed to all the PMs, the sum of all the distributed messages is counted. Only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
Messages from other nodes (msgInUM) - The number of bytes in messages that PrimProc sends to ExeMgr. Only valid for queries that are processed by ExeMgr, i.e. SELECT, DML with WHERE clause, and INSERT SELECT.
Memory Utilization (maxMemPct) - This field shows memory utilization in support of any join, group by, aggregation, distinct, or other operation.
Blocks Changed (blocksChanged) - Total number of blocks that queries physically changed on disk. This is only for delete/update statements.
Temp Files (numTempFiles) - This field shows any temporary file utilization in support of any join, group by, aggregation, distinct, or other operation.
Temp File Space (tempFileSpace) - This shows the size of any temporary file utilization in support of any join, group by, aggregation, distinct, or other operation.
Users can view the query statistics by selecting the rows from the query stats table in the infinidb_querystats schema. Examples listed below:
Example 1: List execution time, rows returned for all the select queries within the past 12 hours:
Example 2: List the three slowest running select queries of session 2 within the past 12 hours:
Example 3: List the average, min and max running time of all the INSERT SELECT queries within the past 12 hours:
In MariaDB Enterprise ColumnStore 6, the ExeMgr process uses optimizer statistics in its query planning process.
ColumnStore uses the optimizer statistics to add support for queries that contain circular inner joins.
In Enterprise ColumnStore 5 and before, ColumnStore would raise the following error when a query containing a circular inner join was executed:
ERROR 1815 (HY000): Internal error: IDB-1003: Circular joins are not supported.
The optimizer statistics store each column's NDV (Number of Distinct Values), which can help the ExeMgr process choose the optimal join order for queries with circular joins. When Enterprise ColumnStore executes a query with a circular join, the query's execution can take longer if ColumnStore chooses a sub-optimal join order. When you collect optimizer statistics for your ColumnStore tables, the ExeMgr process is less likely to choose a sub-optimal join order.
Enterprise ColumnStore's optimizer statistics can be collected for ColumnStore tables by executing ANALYZE TABLE:
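For example, for a hypothetical ColumnStore table named columnstore_tab:
ANALYZE TABLE columnstore_tab;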
Enterprise ColumnStore's optimizer statistics are not updated automatically. To update the optimizer statistics for a ColumnStore table, ANALYZE TABLE must be re-executed.
Enterprise ColumnStore does not implement an interface to show optimizer statistics.
When tuning queries for MariaDB Enterprise ColumnStore, there are some important details to consider.
Enterprise ColumnStore only reads the columns that are necessary to resolve a query.
For example, the following query selects every column in the table:
Whereas the following query only selects two columns in the table, so it requires less I/O:
For best performance, only select the columns that are necessary to resolve a query.
When Enterprise ColumnStore executes a query, the ExeMgr process on the initiator/aggregator node translates the ColumnStore execution plan (CSEP) into a job list. A job list is a sequence of job steps.
Enterprise ColumnStore uses many different types of job steps that provide different scalability benefits:
Some types of job steps perform operations in a distributed manner, using multiple nodes to operate on different extents. Distributed operations provide horizontal scalability.
SELECT calgetsqlcount();
SELECT /*! INFINIDB_ORDERED */ r_regionkey
FROM region r, customer c, nation n
WHERE r.r_regionkey = n.n_regionkey
AND n.n_nationkey = c.c_nationkey
mcsadmin> getActiveSQLStatements
getactivesqlstatements Wed Oct 7 08:38:32 2015
Get List of Active SQL Statements
=================================
Start Time Time (hh:mm:ss) Session ID SQL Statement
---------------- ---------------- -------------------- ------------------------------------------------------------
Oct 7 08:38:30      00:00:03         73                   select c_name,sum(lo_revenue) from customer, lineorder where lo_custkey = c_custkey and c_custkey = 6 group by c_name
MariaDB [test]> SELECT count(*) FROM wide2;
+----------+
| count(*) |
+----------+
| 5000000 |
+----------+
1 row in set (0.22 sec)
MariaDB [test]> SELECT calGetStats();
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| calGetStats() |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Stats: MaxMemPct-0; NumTempFiles-0; TempFileSpace-0B; ApproxPhyI/O-1931; CacheI/O-2446; BlocksTouched-2443; PartitionBlocksEliminated-0; MsgBytesIn-73KB; MsgBytesOut-1KB; Mode-Distributed |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
MariaDB [test]> SELECT calSetTrace(1);
+----------------+
| calSetTrace(1) |
+----------------+
| 0 |
+----------------+
1 row in set (0.00 sec)
MariaDB [test]> SELECT c_name, sum(o_totalprice)
-> FROM customer, orders
-> WHERE o_custkey = c_custkey
-> AND c_custkey = 5
-> GROUP BY c_name;
+--------------------+-------------------+
| c_name | sum(o_totalprice) |
+--------------------+-------------------+
| Customer#000000005 | 684965.28 |
+--------------------+-------------------+
1 row in set, 1 warning (0.34 sec)
MariaDB [test]> SELECT calGetTrace();
+------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------+
| calGetTrace() |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------+
|
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM customer 3024 (c_custkey,c_name) 0 43 36 0.006 1
BPS PM orders 3038 (o_custkey,o_totalprice) 0 766 0 0.032 3
HJS PM orders-customer 3038 - - - - ----- -
TAS UM - - - - - - 0.021 1
|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
MariaDB [test]> SELECT calFlushCache();
editem -o 3032 -t
Col OID = 3032, NumExtents = 10, width = 4
428032 - 432127 (4096) min: 1992-01-01, max: 1993-06-21, seqNum: 1, state: valid, fbo: 0, DBRoot: 1, part#: 0, seg#: 0, HWM: 0; status: avail
502784 - 506879 (4096) min: 1992-01-01, max: 1993-06-22, seqNum: 1, state: valid, fbo: 0, DBRoot: 2, part#: 0, seg#: 1, HWM: 0; status: unavail
708608 - 712703 (4096) min: 1993-06-21, max: 1994-12-11, seqNum: 1, state: valid, fbo: 0, DBRoot: 1, part#: 0, seg#: 2, HWM: 0; status: unavail
766976 - 771071 (4096) min: 1993-06-22, max: 1994-12-12, seqNum: 1, state: valid, fbo: 0, DBRoot: 2, part#: 0, seg#: 3, HWM: 0; status: unavail
989184 - 993279 (4096) min: 1994-12-11, max: 1996-06-01, seqNum: 1, state: valid, fbo: 4096, DBRoot: 1, part#: 0, seg#: 0, HWM: 8191; status: avail
1039360 - 1043455 (4096) min: 1994-12-12, max: 1996-06-02, seqNum: 1, state: valid, fbo: 4096, DBRoot: 2, part#: 0, seg#: 1, HWM: 8191; status: avail
1220608 - 1224703 (4096) min: 1996-06-01, max: 1997-11-22, seqNum: 1, state: valid, fbo: 4096, DBRoot: 1, part#: 0, seg#: 2, HWM: 8191; status: avail
1270784 - 1274879 (4096) min: 1996-06-02, max: 1997-11-22, seqNum: 1, state: valid, fbo: 4096, DBRoot: 2, part#: 0, seg#: 3, HWM: 8191; status: avail
1452032 - 1456127 (4096) min: 1997-11-22, max: 1998-08-02, seqNum: 1, state: valid, fbo: 0, DBRoot: 1, part#: 1, seg#: 0, HWM: 1930; status: avail
1510400 - 1514495 (4096) min: 1997-11-22, max: 1998-08-02, seqNum: 1, state: valid, fbo: 0, DBRoot: 2, part#: 1, seg#: 1, HWM: 1930; status: avail
<QueryStats>
<Enabled>Y</Enabled>
</QueryStats>
grant INSERT on infinidb_querystats.querystats to 'cross_engine'@'127.0.0.1';
grant INSERT on infinidb_querystats.querystats to 'cross_engine'@'localhost';
MariaDB [infinidb_querystats]> select queryid, query, endtime-starttime, rows from querystats
where starttime >= now() - interval 12 hour and querytype = 'SELECT';
MariaDB [infinidb_querystats]> select a.* from (select endtime-starttime execTime, query from queryStats
where sessionid = 2 and querytype = 'SELECT' and starttime >= now()-interval 12 hour
order by 1 limit 3) a;
MariaDB [infinidb_querystats]> select min(endtime-starttime), max(endtime-starttime), avg(endtime-starttime) from querystats
where querytype='INSERT SELECT' and starttime >= now() - interval 12 hour;
When Enterprise ColumnStore performs ORDER BY and LIMIT operations, the operations are performed in a single-threaded manner after the rest of the query processing has been completed, and the full unsorted result-set has been retrieved. For large data sets, the performance overhead can be significant.
When Enterprise ColumnStore 5 performs aggregations (i.e., DISTINCT, GROUP BY, COUNT(*), etc.), all of the aggregation work happens in-memory by default. As a consequence, more complex aggregation operations require more memory in that version.
For example, the following query could require a lot of memory in Enterprise ColumnStore 5, since it has to calculate many distinct values in memory:
Whereas the following query could require much less memory in Enterprise ColumnStore 5, since it has to calculate fewer distinct values:
In Enterprise ColumnStore 6, disk-based aggregations can be enabled.
For best performance, avoid excessive aggregations or enable disk-based aggregations.
For additional information, see "Configure Disk-Based Aggregations".
When Enterprise ColumnStore evaluates built-in functions and aggregate functions, it can often evaluate the function in a distributed manner. Distributed evaluation of functions can significantly improve performance.
Enterprise ColumnStore supports distributed evaluation for some built-in functions. For other built-in functions, the function must be evaluated serially on the final result set.
Enterprise ColumnStore also supports distributed evaluation for user-defined functions developed with ColumnStore's User-Defined Aggregate Function (UDAF) C++ API. For functions developed with Enterprise Server's standard User-Defined Function (UDF) API, the function must be evaluated serially on the final result set.
For best performance, avoid non-distributed functions.
By default, Enterprise ColumnStore performs all joins as in-memory hash joins.
If the joined tables are very large, the in-memory hash join can require too much memory for the default configuration. There are a couple options to work around this:
Enterprise ColumnStore can be configured to use more memory for in-memory hash joins.
Enterprise ColumnStore can be configured to use disk-based joins.
Enterprise ColumnStore can use optimizer statistics to better optimize the join order.
For additional information, see "Configure In-Memory Joins", "Configure Disk-Based Joins", and "Optimizer Statistics".
Enterprise ColumnStore uses extent elimination to optimize queries. Extent elimination uses the minimum and maximum values in the extent map to determine which extents can be skipped for a query.
When data is loaded into Enterprise ColumnStore, it appends the data to the latest extent. When an extent reaches the maximum number of column values, Enterprise ColumnStore creates a new extent. As a consequence, if ordered data is loaded in its proper order, then similar values will be clustered together in the same extent. This can improve query performance, because extent elimination performs best when similar values are clustered together.
For example, if you expect to query a table with a filter on a timestamp column, you should sort the data using the timestamp column before loading it into Enterprise ColumnStore. Later, when the table is queried with a filter on the timestamp column, Enterprise ColumnStore would be able to skip many extents using extent elimination.
For best performance, load ordered data in proper order.
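As an illustrative sketch (the file names, database, table, sort key, and delimiter are all hypothetical), the data could be pre-sorted on its leading timestamp field and then loaded with cpimport:
sort -t ',' -k 1 /data/events_raw.csv > /data/events_sorted.csv
cpimport mydb events /data/events_sorted.csv -s ','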
When Enterprise ColumnStore performs mathematical operations with very big values using the DECIMAL, NUMERIC, and FIXED data types, the operation can sometimes overflow ColumnStore's maximum precision or scale. The maximum precision and scale depend on the version of Enterprise ColumnStore:
In Enterprise ColumnStore 6, the maximum precision (M) is 38, and the maximum scale (D) is 38.
In Enterprise ColumnStore 5, the maximum precision (M) is 18, and the maximum scale (D) is 18.
In Enterprise ColumnStore 6, applications can configure Enterprise ColumnStore to check for decimal overflows by setting the columnstore_decimal_overflow_check system variable, but only when the column has a decimal precision that is 18 or more:
When decimal overflow checks are enabled, math operations have extra overhead.
When the decimal overflow check fails, MariaDB Enterprise ColumnStore raises an error with the ER_INTERNAL_ERROR SQL error code, and it writes detailed information about the overflow check failure to the ColumnStore system logs.
MariaDB Enterprise ColumnStore supports Enterprise Server's standard User-Defined Function (UDF) API. However, UDFs developed using that API cannot be executed in a distributed manner.
To support distributed execution of custom SQL, MariaDB Enterprise ColumnStore supports a Distributed User Defined Aggregate Functions (UDAF) C++ API:
The Distributed User Defined Aggregate Functions (UDAF) C++ API allows anyone to create aggregate functions of arbitrary complexity for distributed execution in the ColumnStore storage engine.
These functions can also be used as Analytic (Window) functions just like any built-in aggregate function.
Some types of job steps perform operations in a multi-threaded manner using a thread pool. Performing multi-threaded operations provides vertical scalability.
As you increase the number of ColumnStore nodes or the number of cores on each node, Enterprise ColumnStore can use those resources to more efficiently execute job steps.
For additional information, see "MariaDB Enterprise ColumnStore Query Evaluation.".
Enterprise ColumnStore defines a batch primitive step to handle many types of tasks, such as scanning/filtering columns, JOIN operations, aggregation, functional filtering, and projecting (putting values into a SELECT list).
In calGetTrace() output, a batch primitive step is abbreviated BPS.
Batch primitive steps are evaluated on multiple nodes in parallel. The PrimProc process on each node evaluates the batch primitive step to one extent at a time. The PrimProc process uses a thread pool to operate on individual blocks within the extent in parallel.
Enterprise ColumnStore defines a cross-engine step to perform cross-engine joins, in which a ColumnStore table is joined with a table that uses a different storage engine.
In calGetTrace() output, a cross-engine step is abbreviated CES.
Cross-engine steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore can perform cross-engine joins when the mandatory utility user is properly configured.
For additional information, refer to the "Mandatory Utility User Account" section.
Enterprise ColumnStore defines a dictionary structure step to scan the dictionary extents that ColumnStore uses to store variable-length string values.
In calGetTrace() output, a dictionary structure step is abbreviated DSS.
Dictionary structure steps are evaluated on multiple nodes in parallel. The PrimProc process on each node evaluates the dictionary structure step to one extent at a time. It uses a thread pool to operate on individual blocks within the extent in parallel.
Dictionary structure steps can require a lot of I/O for a couple of reasons:
Dictionary structure steps do not support extent elimination, so all extents for the column must be scanned.
Dictionary structure steps must read the column extents to find each pointer and the dictionary extents to find each value, so it doubles the number of extents to scan.
It is generally recommended to avoid queries that will cause dictionary scans.
For additional information, see "Avoid Creating Long String Columns".
Enterprise ColumnStore defines a hash join step to perform a hash join between two tables.
In calGetTrace() output, a hash join step is abbreviated HJS.
Hash join steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore performs the hash join in memory by default. If you perform large joins, you may be able get better performance by changing some configuration defaults with mcsSetConfig:
Enterprise ColumnStore can be configured to use more memory for in-memory hash joins.
Enterprise ColumnStore can be configured to use disk-based joins.
For additional information, see "Configure in-memory joins" and "Configure Disk-Based Joins".
Enterprise ColumnStore defines a having step to evaluate a HAVING clause on a result set.
In calGetTrace() output, a having step is abbreviated HVS.
Enterprise ColumnStore defines a subquery step to evaluate a subquery.
In calGetTrace() output, a subquery step is abbreviated SQS.
Enterprise ColumnStore defines a tuple aggregation step to collect intermediate aggregation results prior to the final aggregation and evaluation of the results.
In calGetTrace() output, a tuple aggregation step is abbreviated TAS.
Tuple aggregation steps are primarily evaluated by the ExeMgr process on the initiator/aggregator node. However, the PrimProc process on each node also plays a role, since the PrimProc process on each node provides the intermediate aggregation results to the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore defines a tuple annexation step to perform the final aggregation and evaluation of the results.
In calGetTrace() output, a tuple annexation step is abbreviated TNS.
Tuple annexation steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore 5 performs aggregation operations in memory. As a consequence, more complex aggregation operations require more memory in that version.
In Enterprise ColumnStore 6, disk-based aggregations can be enabled.
For additional information, see "Configure Disk-Based Aggregations".
Enterprise ColumnStore defines a tuple union step to perform a union of two subqueries.
In calGetTrace() output, a tuple union step is abbreviated TUS.
Tuple union steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore defines a tuple constant step to evaluate constant values.
In calGetTrace() output, a tuple constant step is abbreviated TCS.
Tuple constant steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore defines a window function step to evaluate window functions.
In calGetTrace() output, a window function step is abbreviated WFS.
Window function steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
ANALYZE TABLE columnstore_tab;
The ColumnStore storage engine uses a ColumnStore Execution Plan (CSEP) to represent a query plan internally.
When the select handler receives the SELECT_LEX object, it transforms it into a CSEP as part of the query planning and optimization process. For additional information, see "MariaDB Enterprise ColumnStore Query Evaluation."
The CSEP for a given query can be viewed by performing the following:
Calling the calSetTrace(1) function:
Executing the query:
Calling the calGetTrace() function:
SELECT * FROM tab;
SELECT col1, col2 FROM tab;
SELECT DISTINCT col1 FROM tab LIMIT 10000;
SELECT DISTINCT col1 FROM tab LIMIT 100;
SET SESSION columnstore_decimal_overflow_check=ON;
SELECT (big_decimal1 * big_decimal2) AS product
FROM columnstore_tab;
SELECT calSetTrace(1);
SELECT column1, column2
FROM columnstore_tab
WHERE column1 > '2020-04-01'
AND column1 < '2020-11-01';
SELECT calGetTrace();
MariaDB ColumnStore's query plans and Optimizer Trace show how analytical queries run in parallel across its distributed, columnar architecture, aiding performance tuning.
The high level components of the ColumnStore architecture are:
PrimProc: PrimProc (Primitives Processor) is responsible for parsing the SQL requests into an optimized set of primitive job steps executed by one or more servers. PrimProc is thus responsible for query optimization and orchestration of query execution by the servers. While every instance has its own PrimProc in a multi-server deployment, each query begins and ends on the same PrimProc it originated from. A database load balancer such as MariaDB MaxScale can be deployed to appropriately balance external requests against individual servers. PrimProc also executes granular job steps received from the server (mariadbd) in a multi-threaded manner. ColumnStore allows distribution of the work across many servers.
Extent Maps: ColumnStore maintains metadata about each column in a shared distributed object known as the Extent Map. The primary node references the Extent Map both to help generate the correct primitive job steps and to identify the correct disk blocks to read. Each column is made up of one or more files, and each file can contain multiple extents. As much as possible the system attempts to allocate contiguous physical storage to improve read performance.
Storage: ColumnStore can use either local storage or shared storage (e.g. SAN or EBS) to store data. Using shared storage allows for data processing to fail over to another node automatically in case of a server failing.
The system supports full MVCC ACID transactional logic via Insert, Update, and Delete statements. The MVCC architecture allows for concurrent query and DML / batch load. Although DML is supported, the system is optimized more for batch inserts and so larger data loads should be achieved through a batch load. The most flexible and optimal way to load data is via the cpimport tool. This tool optimizes the load path and can be run centrally or in parallel on each server.
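For example, a basic bulk load of a delimited file into an existing ColumnStore table might look like this (the database, table, and file names are placeholders):
cpimport mydb orders /data/orders.tbl -s '|'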
If the data contains a time (or other time-correlated, ascending value) column, then significant performance gains are achieved if the data is sorted by this field and also typically queried with a WHERE clause on that column. This is because the system records a minimum and maximum value for each extent, providing a system-maintained range partitioning scheme. This allows the system to completely eliminate scanning an extent if the query includes a WHERE clause on that field limiting the results to a subset of extents.
MariaDB ColumnStore has its own query optimizer and execution engine distinct from the MariaDB server implementation. This allows for scaling out query execution to multiple servers, and to optimize for handling data stored as columns rather than rows. As such, the factors influencing query performance are very different:
A query is first parsed by the MariaDB server (mariadbd) process and passed through to the ColumnStore storage engine. This passes the request onto the PrimProc process, which is responsible for optimizing and orchestrating execution of the query. The PrimProc module's optimizer creates a series of batch primitive steps that are executed on all nodes in the cluster. Since multiple servers can be deployed, this allows for scale-out execution of the queries. The optimizer attempts to process query execution in parallel. However, certain operations inherently must be executed centrally, for example final result ordering. Filtering, joins, aggregates, and GROUP BY clauses are generally pushed down and executed in parallel in PrimProc on all servers. In PrimProc, batch primitive steps are performed at a granular level where individual threads operate on individual 1K-8K blocks within an extent. This enables a large multi-core server to be fully utilized and enables scale-up within a single server. The current batch primitive steps available in the system include:
Single Column Scan: Scan one or more Extents for a given column based on a single column predicate, including operators like =, <>, IN (list), BETWEEN, and ISNULL. See the first scan section of performance configuration for additional details on tuning this.
Additional Single Column Filters: Project additional columns for any rows found by a previous scan and apply additional single column predicates as needed. Access of blocks is based on row identifier, going directly to the blocks. See the additional column read section of performance configuration for additional details on tuning this.
The following items should be considered when thinking about query execution in ColumnStore vs a row based store such as InnoDB.
ColumnStore is optimized for large scale aggregation / OLAP queries over large data sets. As such indexes typically used to optimize query access for row based systems do not make sense since selectivity is low for such queries. Instead ColumnStore gains performance by only scanning necessary columns, utilizing system maintained partitioning, and utilizing multiple threads and servers to scale query response time.
Since ColumnStore only reads the necessary columns to resolve a query, only include the necessary columns required. For example, SELECT * is significantly slower than SELECT col1, col2 FROM tbl.
Datatype size is important. If, say, you have a column that can only have values 0 through 100, then declare it as a TINYINT, as this will be represented with 1 byte rather than 4 bytes for INT. This reduces the I/O cost by a factor of 4.
For string types, an important threshold is CHAR(9) and VARCHAR(8) or greater. Each column storage file uses a fixed number of bytes per value. This enables fast positional lookup of other columns to form the row. Currently, the upper limit for inline columnar storage is 8 bytes. So, for strings longer than this, the system maintains an additional 'dictionary' extent where the values are stored, and the columnar extent file then stores a pointer into the dictionary. For example, it is more expensive to read and process a VARCHAR(8) column than a CHAR(8) column. Where possible, you get better performance if you can utilize shorter strings, especially if you avoid the dictionary lookup. All TEXT/BLOB data types in ColumnStore 1.1 onward utilize a dictionary and do a multiple-block 8KB lookup to retrieve that data if required. The longer the data, the more blocks are retrieved and the greater the potential performance impact.
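To make the sizing guidance concrete, here is a hypothetical table definition (all names are illustrative only): a TINYINT is used for a 0-100 value instead of INT, and the short code stays within CHAR(8) so it avoids a dictionary extent, while the longer description will use one:
CREATE TABLE measurements (
  sensor_pct   TINYINT,      -- values 0-100: 1 byte per value instead of 4 for INT
  status_code  CHAR(8),      -- fits in 8 bytes: stored inline, no dictionary lookup
  status_desc  VARCHAR(60)   -- longer string: stored via a dictionary extent
) ENGINE=ColumnStore;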
In a row-based system, adding redundant columns adds to the overall query cost, but in a columnar system a cost is only incurred if the column is referenced. Therefore, additional columns can be created to support different access paths. For instance, store a leading portion of a field in one column to allow for faster lookups, and additionally store the long-form value as another column. Scans on the shorter code or leading-portion column are faster.
ColumnStore distributes function application across all nodes for greater performance, but this requires a distributed implementation of the function in addition to the MariaDB server implementation. See the documentation on distributed functions for the full list.
Hash joins are utilized by ColumnStore to optimize for large scale joins, avoiding the need for indexes and the overhead of nested loop processing. ColumnStore maintains table statistics so as to determine the optimal join order. This is implemented by first identifying the small table side (based on extent map data) and materializing the necessary rows from that table for the join. If the size of this is less than the configuration setting PmMaxMemorySmallSide, the join is pushed down into PrimProc for distributed in-memory processing. Otherwise, the larger side's rows are not processed in a distributed manner for joining, and only the WHERE clause on that side is executed across all PrimProc modules in the cluster. If the join is too large for memory, disk-based joins can be enabled to allow the query to complete.
Similarly to scalar functions, ColumnStore distributes aggregate evaluation as much as possible. However, some post-processing is required to combine the final results. Enough memory must exist to handle queries with a very large number of values in the aggregate columns.
Aggregation performance is also influenced by the number of distinct aggregate column values. Generally, the same number of rows with 100 distinct values computes faster than 10000 distinct values. This is due to increased memory management as well as transfer overhead.
ORDER BY and LIMIT are implemented at the very end by the mariadbd server process on the temporary result set table. This means that the unsorted results must be fully retrieved before either are applied. The performance overhead of this is minimal on small to medium results, but for larger results, it can be significant.
Subqueries are executed in sequence: the subquery's intermediate results must be materialized, and then the join logic is applied with the outer query.
Window functions are executed as part of final aggregation in PrimProc due to the need for ordering of the window results. The ColumnStore window function engine uses a dedicated, faster sort process.
Automated system partitioning of columns is provided by ColumnStore. As data is loaded, the system captures and maintains min/max values of the column data in each extent. New rows are appended to each extent until it is full, at which point a new extent is created. For column values that are ordered or semi-ordered, this allows for very effective data partitioning. By using the min and max values, entire extents can be eliminated and not read when filtering data. This generally works particularly well for time dimension / series data or similar values that increase over time.
Table Level Filters: Project additional columns as required for any table level filters such as column1 < column2, or more advanced functions and expressions. Access of blocks is again based on row identifier, going directly to the blocks.
Project Join Columns for Joins: Project additional join columns as needed for any join operations. Access of blocks is again based on row identifier, going directly to the blocks. See the join tuning section of performance configuration for additional details on tuning this.
Execute Multi-Join: Apply one or more hash join operation against projected join columns, and use that value to probe a previously built hash map. Build out tuples as needed to satisfy inner or outer join requirements. See the multi-table join section of performance configuration for additional details on tuning this.
Cross-Table Level Filters: Project additional columns from the range of rows for the Primitive Step as needed for any cross-table level filters such as table1.column1 < table2.column2, or more advanced functions and expressions. Access of blocks is again based on row identifier, going directly to the blocks.
Aggregation/Distinct Operation Part 1: Apply any local group by, distinct, or aggregation operation against the set of joined rows assigned to a given Batch Primitive. Part 1 of this process is handled by PrimProc.
Aggregation/Distinct Operation Part 2: Apply any final group by, distinct, or aggregation operation against the set of joined rows assigned to a given Batch Primitive. This processing is handled by PrimProc. See the memory management section of performance configuration for additional details on tuning this.
This page is: Copyright © 2025 MariaDB. All rights reserved.