Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
MariaDB ColumnStore Quickstart Guides provide concise, Docker-friendly steps to quickly set up, configure, and explore the ColumnStore analytic engine.
Discover MariaDB ColumnStore, the powerful columnar storage engine for analytical workloads. Learn about its architecture, features, and how it enables high-performance data warehousing and analytics.
This section provides instructions for installing and configuring MariaDB ColumnStore. It covers various deployment scenarios, including single- and multi-node setups with both local and S3 storage.
Managing MariaDB ColumnStore involves setup, configuration, and tools like mcsadmin and cpimport for efficient analytics.
MariaDB ColumnStore backup and restore manage distributed data using snapshots or tools like mariadb-backup, with restoration ensuring cluster sync via cpimport or file system recovery.
Managing MariaDB ColumnStore means deploying its architecture, scaling modules, and maintaining performance through monitoring, optimization, and backups.
MariaDB ColumnStore ensures high availability with multi-node setups and shared storage, while MaxScale adds monitoring and failover for continuous analytics.
MariaDB ColumnStore uses MariaDB Server’s security—encryption, access control, auditing, and firewall—for secure analytics.
MariaDB ColumnStore supports standard MariaDB tools, BI connectors (e.g., Tableau, Power BI), data ingestion (cpimport, Kafka), and REST APIs for admin.
The ColumnStore StorageManager manages columnar data storage and retrieval, optimizing analytical queries.
When using ColumnStore, MariaDB Server creates a series of system databases used for operational purposes.
calpontsys
Database maintains table metadata about ColumnStore tables.
infinidb_querystats
Database maintains information about query performance. For more information, see .
columnstore_info
The database for stored procedures is used to retrieve information about ColumnStore usage. For more information, see the tables.
MariaDB ColumnStore uses a shared-nothing, distributed architecture with separate modules for SQL and storage, enabling scalable, high-performance analytics.
MariaDB ColumnStore is ideal for real-time analytics and complex queries on large datasets across industries.
MariaDB ColumnStore query tuning optimizes analytics using data types, joins, projection elimination, WHERE clauses, and EXPLAIN for performance insights.
MariaDB ColumnStore's query plans and Optimizer Trace show how analytical queries run in parallel across its distributed, columnar architecture, aiding performance tuning.
This page details MariaDB ColumnStore hardware requirements (CPU, RAM, storage, and network).
MariaDB ColumnStore is designed for analytical workloads and scales linearly with hardware resources. While the performance generally improves with more CPU cores, memory, and servers, understanding the minimum hardware specifications is crucial for successful deployment, especially in development and production environments.
MariaDB ColumnStore's performance directly benefits from additional hardware resources:
More CPU cores enable greater parallel processing, improving query processing time.
More memory allows for more data caching (reducing I/O), and more servers enable a larger distributed architecture.
HDDs vs. SSDs: SSDs don't deliver as much benefit as you might assume because ColumnStore is optimized towards block streaming, which usually performs well enough on HDDs.
Bare metal vs. virtual servers: Bare metal servers are recommended — they provide additional performance because ColumnStore can fully consume CPU cores and memory.
The specifications differentiate between a basic development environment and a production-ready setup:
CPU: A minimum of 8 CPU cores.
Memory (RAM): A minimum of 32 GB.
Storage: Local disk storage is acceptable for development purposes.
CPU: A minimum of 64 CPU cores.
This recommendation underscores the highly parallel nature of ColumnStore, which can effectively utilize a large number of cores for analytical processing.
Memory (RAM): A minimum of 128 GB.
Network interconnectivity plays a role for multi-server deployments.
Minimum Network: A minimum of a 1 Gigabit (1G) network is recommended.
This facilitates efficient data transfer between nodes via TCP/IP for replication and query processing across the distributed architecture. For optimal performance in heavy-load scenarios, higher bandwidth (for instance, 10G or more) is highly beneficial.
Adhering to these minimum specifications will provide a baseline for ColumnStore functionality. For specific workload requirements, it's always advisable to conduct performance testing and scale hardware accordingly.
For AWS, ColumnStore internal testing generally uses m4.4xlarge instance types as a cost-effective middle ground. The R4.8xlarge has also been tested, and performs about twice as fast for about twice the price.
Clients issue a query to the MariaDB Server, which has the ColumnStore storage engine installed. MariaDB Server parses the SQL, identifies the involved ColumnStore tables, and creates an initial logical query execution plan.
Using the ColumnStore storage engine interface (ha_columnstore), MariaDB Server converts involved table references into ColumnStore internal objects. These are then handed off to the ExeMgr, which is responsible for managing and orchestrating query execution across the cluster.
The ExeMgr analyzes the query plan and translates it into a distributed ColumnStore execution plan. It determines the necessary query steps and the execution order, including any required parallelization.
The ExeMgr then references the extent map to identify which PrimProc instances hold the relevant data segments. It applies extent elimination to exclude any PrimProc nodes whose extents do not match the query’s filter criteria.
The ExeMgr dispatches commands to the selected PrimProc instances to perform data block I/O operations.
The PrimProc components perform operations such as
Predicate filtering
Join processing
Initial aggregation
Data retrieval from local disk or external storage (e.g., S3 or cloud object storage)
They then return intermediate result sets to the ExeMgr.
The ExeMgr handles:
Final-stage aggregation
Window function evaluation
Result-set sorting and shaping
The completed result set is returned to the MariaDB Server, which performs any remaining SQL operations like ORDER BY, LIMIT, or computed expressions in the SELECT list.
Finally, the MariaDB Server returns the result set to the client.
To switchover to a new primary node with Enterprise ColumnStore, perform the following procedure.
The primary node can be switched in MaxScale using :
Use or another supported REST client.
Call a module command using the call command command.
As the first argument, provide the name for the module, which is .
As the second argument, provide the module command, which is switchover .
As the third argument, provide the name of the monitor.
For example:
With the above syntax, MaxScale will choose the most up-to-date replica to be the new primary.
If you want to manually select a new primary, provide the server name of the new primary as the fourth argument:
MaxScale is capable of checking the status of using :
List the servers using the list servers command, like this:
If switchover was properly performed, the State column of the new primary shows Master, Running.
This page is about security vulnerabilities that have been fixed for or still affect MariaDB ColumnStore. In addition, links are included to fixed security vulnerabilities in MariaDB Server since MariaDB ColumnStore is based on MariaDB Server.
Sensitive security issues can be sent directly to the persons responsible for MariaDB security: security [AT] mariadb (dot) org.
CVE® stands for "Common Vulnerabilities and Exposures". It is a publicly available and free-to-use database of known software vulnerabilities maintained at
The appropriate release notes listed document CVEs fixed within a given release. Additional information can also be found at .
There are no known CVEs on ColumnStore-specific infrastructure outside of the MariaDB server at this time.
MariaDB ColumnStore has a hard limit of 4096 columns per table.
However, it's likely that you run into other limitations before hitting that limit, including:
Row size limit of tables. This varies, depending on the storage engine you're using. For example, which indirectly limits the number of columns.
Size limit of .frm files. Those files hold the column description of tables. Column descriptions vary in length. Once all column descriptions combined reach a length of 64KB, the table's .frm file is full, limiting the number of columns you can have in a table.
Given that, the maximum number of columns a ColumnStore table can effectively have is around 2000 columns.
The ColumnStore storage engine uses a ColumnStore Execution Plan (CSEP) to represent a query plan internally.
When the select handler receives the SELECT_LEX object, it transforms it into a CSEP as part of the query planning and optimization process. For additional information, see "MariaDB Enterprise ColumnStore Query Evaluation."
The CSEP for a given query can be viewed by performing the following:
Calling the calSetTrace(1) function:
Executing the query:
Calling the calGetTrace() function:
MariaDB ColumnStore is a columnar storage engine that utilizes a massively parallel distributed data architecture. It's a columnar storage system built by porting InfiniDB 4.6.7 to MariaDB and released under the GPL license.
is available as a storage engine for MariaDB Server. Before then, it is available as a separate download.
It is designed for big data scaling to process petabytes of data, linear scalability, and exceptional performance with real-time response to analytical queries. It leverages the I/O benefits of columnar storage, compression, just-in-time projection, and horizontal and vertical partitioning to deliver tremendous performance when analyzing large data sets.
Links:
.
A Google Group exists for MariaDB ColumnStore that can be used to discuss ideas and issues and communicate with the community: Send email to mariadb-columnstore@googlegroups.com or use the
Bugs can be reported in MariaDB Jira: (see ). Please file bugs under the MCOL project and include the output from the if possible.
MariaDB ColumnStore is released under the GPL license.
In MariaDB Enterprise ColumnStore 6, the ExeMgr process uses optimizer statistics in its query planning process.
ColumnStore uses the optimizer statistics to add support for queries that contain circular inner joins.
In Enterprise ColumnStore 5 and before, ColumnStore would raise the following error when a query containing a circular inner join was executed:
ERROR 1815 (HY000): Internal error: IDB-1003: Circular joins are not supported.The optimizer statistics store each column's NDV (Number of Distinct Values), which can help the ExeMgr process choose the optimal join order for queries with circular joins. When Enterprise ColumnStore executes a query with a circular join, the query's execution can take longer if ColumnStore chooses a sub-optimal join order. When you collect optimizer statistics for your ColumnStore tables, the ExeMgr process is less likely to choose a sub-optimal join order.
Enterprise ColumnStore's optimizer statistics can be collected for ColumnStore tables by executing :
Enterprise ColumnStore's optimizer statistics are not updated automatically. To update the optimizer statistics for a ColumnStore table, must be re-executed.
Enterprise ColumnStore does not implement an interface to show optimizer statistics.
# Sample storagemanager.cnf
[ObjectStorage]
service = S3
object_size = 5M
metadata_path = /var/lib/columnstore/storagemanager/metadata
journal_path = /var/lib/columnstore/storagemanager/journal
max_concurrent_downloads = 21
max_concurrent_uploads = 21
common_prefix_depth = 3
[S3]
region = us-west-1
bucket = my_columnstore_bucket
endpoint = s3.amazonaws.com
aws_access_key_id = AKIAR6P77BUKULIDIL55
aws_secret_access_key = F38aR4eLrgNSWPAKFDJLDAcax0gZ3kYblU79
[LocalStorage]
path = /var/lib/columnstore/storagemanager/fake-cloud
fake_latency = n
max_latency = 50000
[Cache]
cache_size = 2g
path = /var/lib/columnstore/storagemanager/cacheStep 6: Install MariaDB MaxScale
This page details step 6 of the 9-step procedure "Deploy ColumnStore Shared Local Storage Topology".
This step installs MariaDB MaxScale 22.08. ColumnStore Object Storage requires 1 or more MaxScale nodes.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
Step 3: Install MariaDB Enterprise Server
This page details step 3 of the 9-step procedure "Deploy ColumnStore Object Storage Topology".
This step installs MariaDB Enterprise Server, MariaDB Enterprise ColumnStore, CMAPI, and dependencies.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
MariaDB Enterprise ColumnStore acquires table locks for some operations, and it provides utilities to view and clear those locks.
MariaDB Enterprise ColumnStore acquires table locks for some operations, such as:
DDL statements
DML statements
Bulk data loads
If an operation fails, the table lock does not always get released. If you try to access the table, you can see errors like the following:
From Columnstore 5.5.2, you can use AWS IAM roles in order to connect to S3 buckets without explicitly entering credentials into the storagemanager.cnf config file.
You need to modify the IAM role of your Amazon EC2 instance to allow for this. Please follow the AWS before beginning this process.
It is important to note that you must update the AWS S3 endpoint based on your chosen region; otherwise, you might face delays in propagation as discussed and .
For a complete list of AWS service endpoints, visit the AWS .
Learn about data ingestion for MariaDB ColumnStore. This section covers various methods and tools for efficiently loading large datasets into your columnar database for analytical workloads.
ColumnStore provides several mechanisms to ingest data:
provides the fastest performance for inserting data and ability to route data to particular PrimProc nodes. Normally, this should be the default choice for loading data .
provides another means of bulk inserting data.
Storage: StorageManager (S3) is recommended.
This implies leveraging cloud-object storage (like AWS S3 or compatible services) for scalable and durable data persistence in production.
Before data can be imported into the tables, create a matching schema.
On the primary server, create the schema:
For each database that you are importing, create the database with the statement:
For each table that you are importing, create the table with the statement:
Enterprise ColumnStore supports multiple methods to import data into ColumnStore tables.
MariaDB Enterprise ColumnStore includes cpimport, which is a command-line utility designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a TSV (tab-separated values) file, on the primary server run cpimport:
When data is loaded with the statement, MariaDB Enterprise ColumnStore loads the data using cpimport, which is a command-line utility designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a TSV (tab-separated values) file, on the primary server use statement:
MariaDB Enterprise ColumnStore can also import data directly from a remote database. A simple method is to query the table using the statement, and then pipe the results into cpimport, which is a command-line utility that is designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a remote MariaDB database:
Navigation in the Single-Node Enterprise ColumnStore topology with Local storage deployment procedure:
This page was step 5 of 5.
This procedure is complete.
MariaDB Corporation provides package repositories for CentOS / RHEL (YUM) and Debian / Ubuntu (APT). A download token is required to access the MariaDB Enterprise Repository.
Customer Download Tokens are customer-specific and are available through the MariaDB Customer Portal.
To retrieve the token for your account:
Navigate to
Log in.
Copy the Customer Download Token.
Substitute your token for CUSTOMER_DOWNLOAD_TOKEN when configuring the package repositories.
On the MaxScale node, install the prerequisites for downloading the software from the Web. Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
On the MaxScale node, configure package repositories and specify MariaDB MaxScale 22.08:
Checksums of the various releases of the mariadb_es_repo_setup script can be found in the section at the bottom of the page. Substitute ${checksum} in the example above with the latest checksum.
On the MaxScale node, install MariaDB MaxScale.
Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
Navigation in the procedure "Deploy ColumnStore Shared Local Storage Topology".
This page was step 6 of 9.
MariaDB Corporation provides package repositories for CentOS / RHEL (YUM) and Debian / Ubuntu (APT). A download token is required to access the MariaDB Enterprise Repository.
Customer Download Tokens are customer-specific and are available through the MariaDB Customer Portal.
To retrieve the token for your account:
Navigate to https://customers.mariadb.com/downloads/token/
Log in.
Copy the Customer Download Token.
Substitute your token for CUSTOMER_DOWNLOAD_TOKEN when configuring the package repositories.
On each Enterprise ColumnStore node, install the prerequisites for downloading the software from the Web. Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
On each Enterprise ColumnStore node, configure package repositories and specify Enterprise Server:
Checksums of the various releases of the mariadb_es_repo_setup script can be found in the section at the bottom of the page. Substitute ${checksum} in the example above with the latest checksum.
On each Enterprise ColumnStore node, install additional dependencies:
Install on CentOS and RHEL (YUM):
Install on Debian 9 and Ubuntu 18.04 (APT)
Install on Debian 10 and Ubuntu 20.04 (APT):
On each Enterprise ColumnStore node, install MariaDB Enterprise Server and MariaDB Enterprise ColumnStore:
Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
Navigation in the procedure "Deploy ColumnStore Object Storage Topology".
This page was step 3 of 9.
To solve this problem, MariaDB Enterprise ColumnStore provides two utilities to view and clear the table locks:
cleartablelock
viewtablelock
The viewtablelock utility shows table locks currently held by MariaDB Enterprise ColumnStore:
To view all table locks:
To view table locks for a specific table, specify the database and table:
The cleartablelock utility clears table locks currently held by MariaDB Enterprise ColumnStore.
To clear a table lock, specify the lock ID shown by the viewtablelock utility:
ERROR 1815 (HY000): Internal error: CAL0009: Drop table failed due to IDB-2009: Unable to perform the drop table operation because cpimport with PID 16301 is currently holding the table lock for session -1.Passwords are encrypted using the cspasswd utility
MariaDB Enterprise ColumnStore 6
MariaDB Enterprise ColumnStore 22.08
MariaDB Enterprise ColumnStore 23.02
MariaDB Enterprise ColumnStore stores its password encryption keys in the plain-text file /var/lib/columnstore/.secrets.
The encryption keys are not created by default, but can be generated by executing the cskeys utility:
In a multi-node Enterprise ColumnStore cluster, every ColumnStore node should have the same encryption keys. Therefore, it is recommended to execute cskeys on the primary server and then copy /var/lib/columnstore/.secrets to every other ColumnStore node and fix the file's permissions:
To encrypt a password:
Generate an encrypted password using the cspasswd utility:
If the --interactive command-line option is specified, cspasswd prompts for the password.
Set the encrypted password in Columnstore.xml using the mcsSetConfig utility:
To decrypt a password, execute the cspasswd utility and specify the --decrypt command-line option:
Call a module command using the call command command.
As the first argument, provide the name for the module, which is .
As the second argument, provide the module command, which is rejoin .
As the third argument, provide the name of the monitor.
As the fourth argument, provide the name of the server.
For example:
MaxScale is capable of checking the status of using :
List the servers using the list servers command, like this:
If the node properly rejoined, the State column of the node shows Slave, Running.
maxctrl list serversMariaDB Enterprise ColumnStore requires a table lock for write operations.
MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.
When a bulk data load is running:
Read queries will not be blocked.
Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.
The write metadata lock (MDL) can be monitored with the .
For additional information, see "MariaDB Enterprise ColumnStore Data Loading".
MariaDB Enterprise ColumnStore supports online schema changes, so that supported DDL operations can be performed without blocking reads. The supported DDL operations only require a write metadata lock (MDL) on the target table.
A system of record is the authoritative data source for a given piece of information. Organizations often store duplicate information in several systems, but only a single system can be the authoritative data source.
Enterprise ColumnStore is designed to handle analytical processing for OLAP, data warehousing, DSS, and hybrid workloads on very large data sets. Analytical processing does not generally happen on the system of record. Instead, analytical processing generally occurs on a specialized database that is loaded with data from the separate system of record. Additionally, very large data sets can be difficult to back up. Therefore, it may be beneficial to only backup the system of record.
If Enterprise ColumnStore is not acting as the system of record for your data, you should determine how the system of record affects your backup plan:
If your system of record is another database server, you should ensure that the other database server is properly backed up and that your organization has procedures to reload Enterprise ColumnStore from the other database server.
If your system of record is a set of data files, you should ensure that the set of data files is properly backed up and that your organization has procedures to reload Enterprise ColumnStore from the set of data files.
MariaDB Enterprise ColumnStore supports full backup and restore for all storage types. A full backup includes:
Enterprise ColumnStore's data and metadata
With S3: an S3 snapshot of the S3-compatible object storage and a file system snapshot or copy of the Storage Manager directory Without S3: a file system snapshot or copy of the DB Root directories.
The MariaDB data directory from the primary node
To see the procedure to perform a full backup and restore, choose the storage type:
Edit your Storage Manager configuration file located at /etc/columnstore/storagemanager.cnf in order to look similar to the example below (replacing those in the [S3] section with your own custom variables):
cpimport process.In transactional mode, DML inserts are performed, which is significantly slower and also consumes both binlog transaction files and ColumnStore VersionBuffer files.
DML, i.e. INSERT, UPDATE, and DELETE, provide row-level changes. ColumnStore is optimized towards bulk modifications, so these operations are slower than they would be in, for instance, InnoDB.
Currently ColumnStore does not support operating as a replication replica target.
Bulk DML operations will in general perform better than multiple individual statements.
with autocommit behaves similarly to LOAD DATE INFILE because, internally, it is mapped to cpimport for higher performance.
Bulk update operations based on a join with a small staging table can be relatively fast, especially if updating a single column.
Using ColumnStore Bulk Write SDK or ColumnStore Streaming Data Adapters.
Due to the frequent code changes and deviation from the AWS standards, none are approved at this time.
maxctrl list serversSELECT calSetTrace(1);SELECT column1, column2
FROM columnstore_tab
WHERE column1 > '2020-04-01'
AND column1 < '2020-11-01';[[analyze-table|ANALYZE TABLE]] columnstore_tab;Step 3: Install MariaDB Enterprise Server
This page details step 3 of the 9-step procedure "Deploy ColumnStore Shared Local Storage Topology".
This step installs MariaDB Enterprise Server, MariaDB Enterprise ColumnStore, CMAPI, and dependencies.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
MariaDB Corporation provides package repositories for CentOS / RHEL (YUM) and Debian / Ubuntu (APT). A download token is required to access the MariaDB Enterprise Repository.
Customer Download Tokens are customer-specific and are available through the MariaDB Customer Portal.
To retrieve the token for your account:
Navigate to
Log in.
Copy the Customer Download Token.
Substitute your token for CUSTOMER_DOWNLOAD_TOKEN when configuring the package repositories.
On each Enterprise ColumnStore node, install the prerequisites for downloading the software from the Web. Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
On each Enterprise ColumnStore node, configure package repositories and specify Enterprise Server:
Checksums of the various releases of the mariadb_es_repo_setup script can be found in the section at the bottom of the page. Substitute ${checksum} in the example above with the latest checksum.
On each Enterprise ColumnStore node, install additional dependencies:
Install on CentOS and RHEL (YUM):
Install on Debian 9 and Ubuntu 18.04 (APT)
Install on Debian 10 and Ubuntu 20.04 (APT):
On each Enterprise ColumnStore node, install MariaDB Enterprise Server and MariaDB Enterprise ColumnStore:
Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
Navigation in the procedure "Deploy ColumnStore Shared Local Storage Topology".
This page was step 3 of 9.
Step 9: Import Data
This page details step 9 of the 9-step procedure "Deploy ColumnStore Shared Local Storage Topology".
This step bulk imports data to Enterprise ColumnStore.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
Before data can be imported into the tables, create a matching schema.
On the primary server, create the schema:
For each database that you are importing, create the database with the CREATE DATABASE statement:
For each table that you are importing, create the table with the CREATE TABLE statement:
Enterprise ColumnStore supports multiple methods to import data into ColumnStore tables.
MariaDB Enterprise ColumnStore includes , which is a command-line utility designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a TSV (tab-separated values) file, on the primary server run :
When data is loaded with the LOAD DATA INFILE statement, MariaDB Enterprise ColumnStore loads the data using , which is a command-line utility designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a TSV (tab-separated values) file, on the primary server use LOAD DATA INFILE statement:
MariaDB Enterprise ColumnStore can also import data directly from a remote database. A simple method is to query the table using the SELECT statement, and then pipe the results into , which is a command-line utility that is designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a remote MariaDB database:
Navigation in the procedure "Deploy ColumnStore Shared Local Storage Topology".
This page was step 9 of 9.
This procedure is complete.
Step 6: Install MariaDB MaxScale
This page details step 6 of the 9-step procedure "Deploy ColumnStore Object Storage Topology".
This step installs MariaDB MaxScale 22.08.
ColumnStore Object Storage requires 1 or more MaxScale nodes.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
MariaDB Corporation provides package repositories for CentOS / RHEL (YUM) and Debian / Ubuntu (APT). A download token is required to access the MariaDB Enterprise Repository.
Customer Download Tokens are customer-specific and are available through the MariaDB Customer Portal.
To retrieve the token for your account:
Navigate to
Log in.
Copy the Customer Download Token.
Substitute your token for CUSTOMER_DOWNLOAD_TOKEN when configuring the package repositories.
On the MaxScale node, install the prerequisites for downloading the software from the Web. Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
On the MaxScale node, configure package repositories and specify MariaDB MaxScale 22.08:
Checksums of the various releases of the mariadb_es_repo_setup script can be found in the section at the bottom of the page. Substitute ${checksum} in the example above with the latest checksum.
On the MaxScale node, install MariaDB MaxScale.
Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
Navigation in the procedure "Deploy ColumnStore Object Storage Topology":
This page was step 6 of 9.
.
MariaDB offers varied deployment topologies by workload and technology, each named and diagrammed with benefits listed. Custom configurations are also supported.
MariaDB products can be deployed in many different topologies. The topologies described in this section are representative of the overall structure. MariaDB products can be deployed to form other topologies, leverage advanced product capabilities, or combine the capabilities of multiple topologies.
Topologies are the arrangements of nodes and links to achieve a purpose. This documentation describes a few of the many topologies that can be deployed using MariaDB database products.
We group topologies by workload (transactional, analytical, or hybrid) and technologies (Enterprise Spider). Single-node topologies are listed separately.
To help you select the correct topology:
Each topology is named, and this name is used consistently throughout the documentation.
MariaDB Enterprise ColumnStore integrates with MariaDB Enterprise Server using the ColumnStore storage engine plugin. The ColumnStore storage engine plugin enables MariaDB Enterprise Server to interact with ColumnStore tables.
For deployment instructions and available documentation, see "MariaDB Enterprise ColumnStore."
The ColumnStore storage engine has the following features:
Step 1: Prepare Systems for Enterprise ColumnStore Nodes
This page details step 1 of a 5-step procedure for deploying Single-Node Enterprise ColumnStore with Object storage.
This step prepares the system to host MariaDB Enterprise Server and MariaDB Enterprise ColumnStore.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
Step 4: Test Enterprise ColumnStore
This page details step 4 of a 5-step procedure for deploying Single-Node Enterprise ColumnStore with Object storage.
This step tests MariaDB Enterprise Server and MariaDB Enterprise ColumnStore.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
Quickstart guide for MariaDB ColumnStore
MariaDB ColumnStore is a specialized columnar storage engine designed for high-performance analytical processing and big data workloads. Unlike traditional row-based storage engines, ColumnStore organizes data by columns, which is highly efficient for analytical queries that often access only a subset of columns across vast datasets.
Step 2: Install Enterprise ColumnStore
This page details step 2 of a 5-step procedure for deploying Single-Node Enterprise ColumnStore with Object storage.
This step installs MariaDB Enterprise Server and MariaDB Enterprise ColumnStore.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
Step 5: Bulk Import of Data
MariaDB ColumnStore utilizes an Extent Map to manage data distribution across extents—logical blocks within physical segment files ranging from 8 to 64 MB. Each extent holds a consistent number of rows, with the Extent Map cataloging these extents, their corresponding block identifiers (LBIDs), and the minimum and maximum values for each column's data within the extent.​
The primary node maintains the master copy of the Extent Map. Upon system startup, this map is loaded into memory and propagated to other nodes for redundancy and quick access. Corruption of the master Extent Map can render the system unusable and lead to data loss.​
Step 2: Install Enterprise ColumnStore
This page provides a major release upgrade procedure for MariaDB Enterprise ColumnStore. A major release upgrade is an upgrade from an older major release to a newer major release, such as an upgrade from MariaDB Enterprise ColumnStore 5 to MariaDB Enterprise ColumnStore 22.08.
Enterprise ColumnStore 5
When Enterprise ColumnStore executes a query, the on the initiator/aggregator node translates the ColumnStore execution plan (CSEP) into a job list. A job list is a sequence of job steps.
Enterprise ColumnStore uses many different types of job steps that provide different scalability benefits:
Some types of job steps perform operations in a distributed manner, using multiple nodes to operate to different extents. Distributed operations provide horizontal scalability.
When tuning queries for MariaDB Enterprise ColumnStore, there are some important details to consider.
Enterprise ColumnStore only reads the columns that are necessary to resolve a query.
For example, the following query selects every column in the table:
Whereas the following query only selects two columns in the table, so it requires less I/O:
For best performance, only select the columns that are necessary to resolve a query.
CREATE DATABASE inventory;CREATE TABLE inventory.products (
product_name VARCHAR(11) NOT NULL DEFAULT '',
supplier VARCHAR(128) NOT NULL DEFAULT '',
quantity VARCHAR(128) NOT NULL DEFAULT '',
unit_cost VARCHAR(128) NOT NULL DEFAULT ''
) ENGINE=Columnstore DEFAULT CHARSET=utf8;$ sudo cpimport -s '\t' inventory products /tmp/inventory-products.tsvLOAD DATA INFILE '/tmp/inventory-products.tsv'
INTO TABLE inventory.products;$ mariadb --quick \
--skip-column-names \
--execute="SELECT * FROM inventory.products" \
| cpimport -s '\t' inventory products$ sudo yum install curl$ sudo apt install curl apt-transport-https$ curl -LsSO https://dlm.mariadb.com/enterprise-release-helpers/mariadb_es_repo_setup
$ echo "${checksum} mariadb_es_repo_setup" \
| sha256sum -c -
$ chmod +x mariadb_es_repo_setup
$ sudo ./mariadb_es_repo_setup --token="CUSTOMER_DOWNLOAD_TOKEN" --apply \
--skip-server \
--skip-tools \
--mariadb-maxscale-version="22.08"$ sudo yum install maxscale$ sudo apt install maxscale$ sudo yum install curl$ sudo apt install curl apt-transport-https$ curl -LsSO https://dlm.mariadb.com/enterprise-release-helpers/mariadb_es_repo_setup$ echo "${checksum} mariadb_es_repo_setup" \
| sha256sum -c -$ chmod +x mariadb_es_repo_setup$ sudo ./mariadb_es_repo_setup --token="CUSTOMER_DOWNLOAD_TOKEN" --apply \
--skip-maxscale \
--skip-tools \
--mariadb-server-version="11.4"$ sudo yum install jemalloc jq curl$ sudo apt install libjemalloc1 jq curl$ sudo apt install libjemalloc2 jq curl$ sudo yum install MariaDB-server \
MariaDB-backup \
MariaDB-shared \
MariaDB-client \
MariaDB-columnstore-engine \
MariaDB-columnstore-cmapi$ sudo apt install mariadb-server \
mariadb-backup \
libmariadb3 \
mariadb-client \
mariadb-plugin-columnstore \
mariadb-columnstore-cmapiviewtablelock
There is 1 table lock
Table LockID Process PID Session Txn CreationTime State DBRoots
hq_sales.invoices 1 cpimport 16301 BulkLoad n/a Wed April 7 14:20:42 2021 LOADING 1viewtablelock hq_sales invoices
There is 1 table lock
Table LockID Process PID Session Txn CreationTime State DBRoots
hq_sales.invoices 1 cpimport 16301 BulkLoad n/a Wed April 7 14:20:42 2021 LOADING 1cleartablelock 1$ cskeys$ scp 192.0.2.1:/var/lib/columnstore/.secrets /var/lib/columnstore/.secrets
$ sudo chown mysql:mysql /var/lib/columnstore/.secrets
$ sudo chmod 0400 /var/lib/columnstore/.secrets$ cspasswd util_user_passwd$ sudo mcsSetConfig CrossEngineSupport Password util_user_encrypted_passwd$ cspasswd --decrypt util_user_encrypted_passwdmaxctrl call command \
mariadbmon \
rejoin \
mcs_monitor \
mcs3[ObjectStorage]
service = S3
object_size = 5M
metadata_path = /var/lib/columnstore/storagemanager/metadata
journal_path = /var/lib/columnstore/storagemanager/journal
max_concurrent_downloads = 21
max_concurrent_uploads = 21
common_prefix_depth = 3
[S3]
ec2_iam_mode=enabled
bucket = my_mcs_bucket
region = us-west-2
endpoint = s3.us-west-2.amazonaws.com
[LocalStorage]
path = /var/lib/columnstore/storagemanager/fake-cloud
fake_latency = n
max_latency = 50000
[Cache]
cache_size = 2g
path = /var/lib/columnstore/storagemanager/cachemaxctrl call command \
mariadbmon \
switchover \
mcs_monitormaxctrl call command \
mariadbmon \
switchover \
mcs_monitor \
mcs2SELECT calGetTrace();A thumbnail diagram provides a small-scale summary of the topology's architecture.
Finally, we provide a list of the benefits of the topology.
Although multiple topologies are listed on this page, the listed topologies are not the only options. MariaDB products are flexible, configurable, and extensible, so it is possible to deploy different topologies that combine the capabilities of multiple topologies listed on this page. The topologies listed on this page are primarily intended to be representative of the most commonly requested use cases.
MariaDB Replication
Highly available
Asynchronous or semi-synchronous replication
Automatic failover via MaxScale
Manual provisioning of new nodes from backup
Galera Cluster Topology Multi-Primary Cluster Powered by Galera for Transactional/OLTP Workloads
InnoDB Storage Engine
Highly available
Virtually synchronous, certification-based replication
Automated provisioning of new nodes (IST/SST)
Columnar storage engine with shared local storage
Highly available
Automatic failover via MaxScale and CMAPI
Scales reads via MaxScale
Bulk data import
Columnar storage engine with S3-compatible object storage
Highly available
Automatic failover via MaxScale and CMAPI
Scales reads via MaxScale
Bulk data import
Single-stack hybrid transactional/analytical workloads
ColumnStore for analytics with scalable S3-compatible object storage
InnoDB for transactions• Cross-engine JOINs
Enterprise Server, Enterprise ColumnStore, MaxScale
Storage Engine
ColumnStore
Availability
ES 10.5+, CS 10.5+
MariaDB Enterprise Server
Workload Optimization
OLAP and Hybrid
Table Orientation
Columnar
ACID-compliant
Yes
Indexes
To create a ColumnStore table, use the statement with the ENGINE=ColumnStore option:
To deploy a multi-node Enterprise ColumnStore deployment, a configuration similar to below is required:
To configure the mandatory utility user account, use the mcsSetConfig command:
MariaDB Enterprise ColumnStore performs best with Linux kernel optimizations.
On each server to host an Enterprise ColumnStore node, optimize the kernel:
Set the relevant kernel parameters in a sysctl configuration file. To ensure proper change management, use an Enterprise ColumnStore-specific configuration file.
Create a /etc/sysctl.d/90-mariadb-enterprise-columnstore.conf file:
Use the sysctl command to set the kernel parameters at runtime
The Linux Security Modules (LSM) should be temporarily disabled on each Enterprise ColumnStore node during installation.
The LSM will be configured and re-enabled later in this deployment procedure.
The steps to disable the LSM depend on the specific LSM used by the operating system.
SELinux must be set to permissive mode before installing MariaDB Enterprise ColumnStore.
To set SELinux to permissive mode:
Set SELinux to permissive mode:
Set SELinux to permissive mode by setting SELINUX=permissive in /etc/selinux/config.
For example, the file will usually look like this after the change:
Confirm that SELinux is in permissive mode:
SELinux will be configured and re-enabled later in this deployment procedure. This configuration is not persistent. If you restart the server before configuring and re-enabling SELinux later in the deployment procedure, you must reset the enforcement to permissive mode.
AppArmor must be disabled before installing MariaDB Enterprise ColumnStore.
Disable AppArmor:
Reboot the system.
Confirm that no AppArmor profiles are loaded using aa-status:
AppArmor will be configured and re-enabled later in this deployment procedure.
When using MariaDB Enterprise ColumnStore, it is recommended to set the system's locale to UTF-8.
On RHEL 8, install additional dependencies:
Set the system's locale to en_US.UTF-8 by executing localedef:
If you want to use S3-compatible storage, it is important to create the S3 bucket before you start ColumnStore. If you already have an S3 bucket, confirm that the bucket is empty.
S3 bucket configuration will be performed later in this procedure.
Navigation in the Single-Node Enterprise ColumnStore topology with Object storage deployment procedure:
This page was step 1 of 5.
MariaDB Enterprise ColumnStore 23.10 includes a testS3Connection command to test the S3 configuration, permissions, and connectivity.
On each Enterprise ColumnStore node, test the S3 configuration:
If the testS3Connection command does not return OK, investigate the S3 configuration.
Connect to the server using using the root@localhost user account:
Query and confirm that the ColumnStore storage engine plugin is ACTIVE:
Create a test database, if it does not exist:
Create a ColumnStore table:
Add sample data into the table:
Read data from table:
Create an InnoDB table:
Add data to the table:
Perform a cross-engine join:
Navigation in the Single-Node Enterprise ColumnStore topology with Object storage deployment procedure:
This page was step 4 of 5.
Connect to the server using MariaDB Client using the root@localhost user account:
Query and confirm that the ColumnStore storage engine plugin is ACTIVE:
Create a test database, if it does not exist:
Create a ColumnStore table:
Add sample data into the table:
Read data from table:
Create an InnoDB table:
Add data to the table:
Perform a cross-engine join:
Navigation in the Single-Node Enterprise ColumnStore topology with Local storage deployment procedure:
This page was step 4 of 5.
MariaDB ColumnStore is a columnar storage engine that integrates with MariaDB Server. It employs a massively parallel distributed data architecture, making it ideal for processing petabytes of data with linear scalability. It was originally ported from InfiniDB and is released under the GPL license.
Exceptional Analytical Performance: Delivers superior performance for complex analytical queries (OLAP) due to its columnar nature, which minimizes disk I/O by reading only necessary columns.
High Data Compression: Columnar storage allows for much higher compression ratios compared to row-based storage, reducing disk space usage and improving query speed.
Massive Scalability: Designed to scale horizontally across multiple nodes, processing petabytes of data with ease.
Just-in-Time Projection: Only the required columns are processed and returned, further optimizing query execution.
Real-time Analytics: Capable of handling real-time analytical queries efficiently.
MariaDB ColumnStore utilizes a distributed architecture with different components working together:
User Module (UM): Handles incoming SQL queries, optimizes them for columnar processing, and distributes tasks.
Performance Module (PM): Manages data storage, compression, and execution of query fragments on the data segments.
Data Files: Data is stored in column-segments across the nodes, highly compressed.
MariaDB ColumnStore is installed as a separate package that integrates with MariaDB Server. The exact installation steps vary depending on your operating system and desired deployment type (single server or distributed cluster).
General Steps (conceptual):
Install MariaDB Server: Ensure you have a compatible MariaDB Server version installed (e.g., MariaDB 10.5.4 or later).
Install ColumnStore Package: Download and install the specific MariaDB ColumnStore package for your OS. This package includes the ColumnStore storage engine and its associated tools.
Linux (e.g., Debian/Ubuntu): You would typically add the MariaDB repository configured for ColumnStore and then install mariadb-plugin-columnstore.
Single Server vs. Distributed: For a single-server setup, you install all ColumnStore components on one machine. For a distributed setup, you install and configure components across multiple machines.
Configure MariaDB: After installation, you might need to adjust your MariaDB server configuration (my.cnf or equivalent) to properly load and manage the ColumnStore engine.
Initialize ColumnStore: Run a specific columnstore-setup or post-install script to initialize the ColumnStore environment.
Once MariaDB ColumnStore is installed and configured, you can create and interact with ColumnStore tables using standard SQL.
Specify ENGINE=ColumnStore when creating your table. Note that ColumnStore tables do not support primary keys in the same way as InnoDB, as their primary focus is analytical processing.
You can insert data using standard INSERT statements. For large datasets, bulk loading utilities (for instance, LOAD DATA INFILE) are highly recommended for performance.
Perform analytical queries. ColumnStore will efficiently process these, often leveraging its columnar nature and parallelism.
MariaDB Corporation provides package repositories for CentOS / RHEL (YUM) and Debian / Ubuntu (APT). A download token is required to access the MariaDB Enterprise Repository.
Customer Download Tokens are customer-specific and are available through the MariaDB Customer Portal.
To retrieve the token for your account:
Navigate to https://customers.mariadb.com/downloads/token/
Log in.
Copy the Customer Download Token.
Substitute your token for CUSTOMER_DOWNLOAD_TOKEN when configuring the package repositories.
On each Enterprise ColumnStore node, install the prerequisites for downloading the software from the Web. Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
On each Enterprise ColumnStore node, configure package repositories and specify Enterprise Server:
Checksums of the various releases of the mariadb_es_repo_setup script can be found in the section at the bottom of the page. Substitute ${checksum} in the example above with the latest checksum.
Install additional dependencies:
Install on CentOS / RHEL (YUM)
Install of Debian 10 and Ubuntu 20.04 (APT):
Install on Debian 9 and Ubuntu 18.04 (APT):
Install MariaDB Enterprise Server and MariaDB Enterprise ColumnStore:
Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
Navigation in the Single-Node Enterprise ColumnStore topology with Object storage deployment procedure:
This page was step 2 of 5.
Before data can be imported into the tables, create a matching schema.
On the primary server, create the schema:
For each database that you are importing, create the database with the statement:
For each table that you are importing, create the table with the statement:
Enterprise ColumnStore supports multiple methods to import data into ColumnStore tables.
MariaDB Enterprise ColumnStore includes cpimport, which is a command-line utility designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a TSV (tab-separated values) file, on the primary server run cpimport:
When data is loaded with the LOAD DATA INFILE statement, MariaDB Enterprise ColumnStore loads the data using cpimport, which is a command-line utility designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a TSV (tab-separated values) file, on the primary server use LOAD DATA INFILE statement:
MariaDB Enterprise ColumnStore can also import data directly from a remote database. A simple method is to query the table using the statement, and then pipe the results into cpimport, which is a command-line utility that is designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a remote MariaDB database:
Navigation in the Single-Node Enterprise ColumnStore topology with Object storage deployment procedure:
This page was step 5 of 5.
This procedure is complete.
ColumnStore's extent map is a smart structure that underpins its performance. By providing a logical partitioning scheme, it avoids the overhead associated with indexing and other common row-based database optimizations.
The primary node in a ColumnStore cluster holds the master copy of the extent map. Upon system startup, this master copy is read into memory and then replicated to all other participating nodes for high availability and disaster recovery. Nodes keep the extent map in memory for rapid access during query processing. As data within extents is modified, these updates are broadcast to all participating nodes to maintain consistency.
If the master copy of the extent map becomes corrupted, the entire system could become unusable, potentially leading to data loss. Having a recent backup of the extent map allows for a much faster recovery compared to reloading the entire database in such a scenario.
To safeguard against potential Extent Map corruption, regularly back up the master copy:
Lock Table:
Save BRM:
Create Backup Directory:
Copy Extent Map:
Unlock Tables:
Stop ColumnStore:
Rename Corrupted Map:
Clear Versioning Files:
Restore Backup:
Set Ownership:
Start ColumnStore:
Shutdown Cluster:
Rename Corrupted Map:
Clear Versioning Files:
Restore Backup:
Set Ownership:
Start Cluster:
Incorporate the save_brm command into your data import scripts (e.g., those using cpimport) to automate Extent Map backups. This practice ensures regular backups without manual intervention.
Refer to the MariaDB ColumnStore Backup Script for an example implementation.​
MariaDB Corporation provides package repositories for CentOS / RHEL (YUM) and Debian / Ubuntu (APT). A download token is required to access the MariaDB Enterprise Repository.
Customer Download Tokens are customer-specific and are available through the MariaDB Customer Portal.
To retrieve the token for your account:
Navigate to https://customers.mariadb.com/downloads/token/
Log in.
Copy the Customer Download Token.
Substitute your token for CUSTOMER_DOWNLOAD_TOKEN when configuring the package repositories.
On each Enterprise ColumnStore node, install the prerequisites for downloading the software from the Web. Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
On each Enterprise ColumnStore node, configure package repositories and specify Enterprise Server:
Checksums of the various releases of the mariadb_es_repo_setup script can be found in the section at the bottom of the page. Substitute ${checksum} in the example above with the latest checksum.
Install additional dependencies:
Install on CentOS / RHEL (YUM)
Install of Debian 10 and Ubuntu 20.04 (APT):
Install on Debian 9 and Ubuntu 18.04 (APT):
Install MariaDB Enterprise Server and MariaDB Enterprise ColumnStore:
Install on CentOS / RHEL (YUM):
Install on Debian / Ubuntu (APT):
Navigation in the Single-Node Enterprise ColumnStore topology with Local storage deployment procedure:
This page was step 2 of 5.
/etc/Columnstore.xml file. In a multiple server deployment these should only be edited on the PM1 server as this will be automatically replicated to other servers by the system. A system restart will be required for the configuration change to take affect.Convenience utility programs getConfig and setConfig are available to safely update the Columnstore.xml without needing to be comfortable with editing XML files. The -h argument will display usage information.
The NumBlocksPct configuration parameter specifies the percentage of physical memory to utilize for disk block caching. The default value is 25, to ensure enough physical memory.
The NumBlocksPct configuration parameter specifies the percentage of physical memory to utilize for disk block caching. The default value is 50, to ensure enough physical memory.
The TotalUmMemory configuration parameter specifies the percentage of physical memory to utilize for joins, intermediate results and set operations. This specifies an upper limit for small table results in joins rather than a pre-allocation of memory. The default value is 50.
The TotalUmMemory configuration parameter specifies the percentage of physical memory to utilize for joins, intermediate results and set operations. This specifies an upper limit for small table results in joins rather than a pre-allocation of memory. The default value is 25.
In a single server or combined deployment, the sum of NumBlocksPct and TotalUmMemory should typically not exceed 75% of physical memory. With very large memory servers this could be raised but the key point is to leave enough memory for other processes including mariadbd.
ColumnStore handles concurrent query execution by managing the rate of concurrent batch primitive steps. This is configured using the MaxOutstandingRequests parameter and has a default value of 20. Each batch primitive step is executed within the context of 1 extent column according to this high level process:
ColumnStore issues up to MaxOutstandingRequests number of batch primitive steps.
PrimProc processes the request, using many threads and returns its response. These generally take a fraction of a second up to a low number of seconds depending on the amount of Physical I/O and the performance of that storage.
ColumnStore issues new requests as prior requests complete maintaining the maximum number of outstanding requests.
This scheme allows for large queries to use all available resources when not otherwise being consumed and for smaller queries to execute with minimal delay. Lower values optimize for higher throughput of smaller queries while a larger value optimizes for response time of that query. The default value should work well under most circumstances however the value should be increased as the number of nodes is increased.
How many Queries are running and how many queries are currently in the queue can be checked with
ColumnStore maintains statistics for table and utilizes this to determine which is the larger table of the two. This is based both on the number of blocks in that table and estimation of the predicate cardinality. The first step is to apply any filters as appropriate to the smaller table and returning this data set to memory. The size of this data set is compared against the configuration parameter PmMaxMemorySmallSide which has a default value of 64 (MB). This value can be set all the way up to 4GB. This default allows for approximately 1M rows on the small table side to be joined against billions (or trillions) on the large table side. If the size of the small data set is less than PmMaxMemorySmallSide the dataset is sent to PrimProc for creation of a distributed hashmap. Thus this setting is important to tuning of joins and whether the operation can be distributed or not. This should be set to support your largest expected small table join size up to available memory:
Although this will increase the size of data between nodes to support the join, it means that the join and subsequent aggregates are pushed down, scaled out, and a smaller data set is returned back.
In a multiple server deployment, the sizing should be based from available physical memory on the servers, how much memory to reserve for block caching, and the number of simultaneous join operations that can be expected to run times the average small table join data size.
The above logic for a single table join extrapolates out to multi table joins where the small table values are precalculated and performed as one single scan against the large table. This works well for the typical star schema case joining multiple dimension tables with a large fact table. For some join scenarios it may be necessary to sequence joins to create the intermediate datasets for joining, this would happen for instance with a snowflake schema structure. In some extreme cases it may be hard for the optimizer to be able to determine the most optimal join path. In this case a hint is available to force a join ordering. The INFINIDB_ORDERED hint will force the first table in the from clause to be considered the largest table and override any statistics based decision, for example:
Note: INFINIDB\_ORDERED is deprecated and does not work anymore for ColumnStore 1.2 and above.
use set infinidb_ordered_only=ON;
and for 1.4 set columnstore_ordered_only=ON;
When a join is very large and exceeds the PmMaxMemorySmallSide setting, it is performed in memory. For very large joins, this could exceed the available memory, in which case this is detected and a query error reported. Several configuration parameters are available to enable and configure usage of disk overflow should this occur:
AllowDiskBasedJoin – Controls the option to use disk Based joins or not. Valid values are Y (enabled) or N (disabled). By default, this option is disabled.
TempFileCompression – Controls whether the disk join files are compressed or noncompressed. Valid values are Y (use compressed files) or N (use non-compressed files).
TempFilePath – The directory path used for the disk joins. By default, this path is the tmp directory for your installation (i.e., /tmp/columnstore_tmp_files/). Files (named infinidb-join-data*) in this directory will be created and cleaned on an as-needed basis. The entire directory is removed and recreated by ExeMgr at startup. It is strongly recommended that this directory be stored on a dedicated partition.
A MariaDB global or session variable is available to specify a memory limit at which point the query is switched over to disk-based joins:
infinidb_um_mem_limit - Memory limit in MB per user (i.e., switch to disk-based join if this limit is exceeded). By default, this limit is not set (value of 0).
Set the server object to maintenance mode using the set server command.
As the first argument, provide the name for the server.
As the second argument, provide maintenance as the state.
For example:
If the specified server is a primary server, then MaxScale will allow open transactions to complete before closing any connections.
If you would like MaxScale to immediately close all connections, the --force option can be provided as a third argument:
Confirm the state of the server object in MaxScale using :
List the servers using the list servers command, like this:
If the node is properly in maintenance mode, then the State column will show Maintenance as one of the states.
Now that the server is in maintenance mode in MaxScale, you can perform your maintenance.
While the server is in maintenance mode:
MaxScale doesn't route traffic to the node.
MaxScale doesn't select the node to be primary during failover.
The node can be rebooted.
The node's services can be restarted.
Maintenance mode for the server object for the node can be cleared in MaxScale using :
Use or another supported REST client.
Clear the server object's state using the clear server command.
As the first argument, provide the name for the server.
As the second argument, provide maintenance as the state.
For example:
Confirm the state of the server object in MaxScale using :
List the servers using the list servers command, like this:
If the node is no longer in maintenance mode, the State column no longer shows Maintenance as one of the states.
Enterprise ColumnStore 22.08
This procedure assumes that the new Enterprise ColumnStore version will be installed onto new servers.
To reuse existing servers for the new Enterprise ColumnStore version, you must adapt the procedure detailed below. After step 1, confirm all data has been backed-up and verify backups. The old version of Enterprise ColumnStore should then be uninstalled, and all Enterprise ColumnStore files should be deleted before continuing with step 2.
On the old ColumnStore cluster, perform a full backup.
MariaDB recommends backing up the table schemas to a single SQL file and backing up the table data to table-specific CSV files.
For each table, obtain the table's schema by executing the SHOW CREATE TABLE :
Backup the table schemas by copying the output to an SQL file. This procedure assumes that the SQL file is named schema-backup.sql.
For each table, backup the table data to a CSV file using the SELECT .. INTO OUTFILE :
Copy the SQL file containing the table schemas and the CSV files containing the table data to the primary node of the new ColumnStore cluster.
On the new ColumnStore cluster, follow the deployment instructions of the desired topology for the new ColumnStore version.
For deployment instructions, see "".
On the new ColumnStore cluster, restore the table schemas and data.
Restore the schema backup using :
HOST and PORT should refer to the following:
If you are connecting with MaxScale as a proxy, they should refer to the host and port of the MaxScale listener
If you are connecting directly to a multi-node ColumnStore cluster, they should refer to the host and port of the primary ColumnStore node
If you are connecting directly to single-node ColumnStore, they should refer to the host and port of the ColumnStore node
When the command is executed, mariadb client prompts for the user password
For each table, restore the data from the table's CSV file by executing the on the primary ColumnStore node:
On the new ColumnStore cluster, verify that the table schemas and data have been restored.
For each table, verify the table's definition by executing the SHOW CREATE TABLE statement:
For each table, verify the number of rows in the table by executing SELECT COUNT(*):
For each table, verify the data in the table executing the statement.
If the table is very large, you can limit the number of rows in the result set by adding a LIMIT clause:
Some types of job steps perform operations in a multi-threaded manner using a thread pool. Performing multi-threaded operations provides vertical scalability.
As you increase the number of ColumnStore nodes or the number of cores on each node, Enterprise ColumnStore can use those resources to more efficiently execute job steps.
For additional information, see "MariaDB Enterprise ColumnStore Query Evaluation.".
Enterprise ColumnStore defines a batch primitive step to handle many types of tasks, such as scanning/filtering columns, JOIN operations, aggregation, functional filtering, and projecting (putting values into a SELECT list).
In calGetTrace() output, a batch primitive step is abbreviated BPS.
Batch primitive steps are evaluated on multiple nodes in parallel. The PrimProc process on each node evaluates the batch primitive step to one extent at a time. The PrimProc process uses a thread pool to operate on individual blocks within the extent in parallel.
Enterprise ColumnStore defines a cross-engine step to perform cross-engine joins, in which a ColumnStore table is joined with a table that uses a different storage engine.
In calGetTrace() output, a cross-engine step is abbreviated CES.
Cross-engine steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore can perform cross-engine joins when the mandatory utility user is properly configured.
For additional information, refer to the "Mandatory Utility User Account"
Enterprise ColumnStore defines a dictionary structure step to scan the dictionary extents that ColumnStore uses to store variable-length string values.
In calGetTrace() output, a dictionary structure step is abbreviated DSS.
Dictionary structure steps are evaluated on multiple nodes in parallel. The PrimProc process on each node evaluates the dictionary structure step to one extent at a time. It uses a thread pool to operate on individual blocks within the extent in parallel.
Dictionary structure steps can require a lot of I/O for a couple of reasons:
Dictionary structure steps do not support extent elimination, so all extents for the column must be scanned.
Dictionary structure steps must read the column extents to find each pointer and the dictionary extents to find each value, so it doubles the number of extents to scan.
It is generally recommended to avoid queries that will cause dictionary scans.
For additional information, see "Avoid Creating Long String Columns".
Enterprise ColumnStore defines a hash join step to perform a hash join between two tables.
In calGetTrace() output, a hash join step is abbreviated HJS.
Hash join steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore performs the hash join in memory by default. If you perform large joins, you may be able get better performance by changing some configuration defaults with mcsSetConfig:
Enterprise ColumnStore can be configured to use more memory for in-memory hash joins.
Enterprise ColumnStore can be configured to use disk-based joins.
For additional information, see "Configure in-memory joins" and "Configure Disk-Based Joins".
Enterprise ColumnStore defines a having step to evaluate a HAVING clause on a result set.
In calGetTrace() output, a having step is abbreviated HVS.
Enterprise ColumnStore defines a subquery step to evaluate a subquery.
In calGetTrace() output, a subquery step is abbreviated SQS.
Enterprise ColumnStore defines a tuple aggregation step to collect intermediate aggregation prior to the final aggregation and evaluation of the results.
In calGetTrace() output, a tuple aggregation step is abbreviated TAS.
Tuple aggregation steps are primarily evaluated by the ExeMgr process on the initiator/aggregator node. However, the PrimProc process on each node also plays a role, since the PrimProc process on each node provides the intermediate aggregation results to the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore defines a tuple annexation step to perform the final aggregation and evaluation of the results.
In calGetTrace() output, a tuple annexation step is abbreviated TNS.
Tuple annexation steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore 5 performs aggregation operations in memory. As a consequence, more complex aggregation operations require more memory in that version.
In Enterprise ColumnStore 6, disk-based aggregations can be enabled.
For additional information, see "Configure Disk-Based Aggregations".
Enterprise ColumnStore defines a tuple union step to perform a union of two subqueries.
In calGetTrace() output, a tuple union step is abbreviated TUS.
Tuple union steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore defines a tuple constant step to evaluate constant values.
In calGetTrace() output, a tuple constant step is abbreviated TCS.
Tuple constant steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
Enterprise ColumnStore defines a window function step to evaluate window functions.
In calGetTrace() output, a window function step is abbreviated WFS.
Window function steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.
When Enterprise ColumnStore performs ORDER BY and LIMIT operations, the operations are performed in a single-threaded manner after the rest of the query processing has been completed, and the full unsorted result-set has been retrieved. For large data sets, the performance overhead can be significant.
When Enterprise ColumnStore 5 performs aggregations (i.e., DISTINCT, GROUP BY, COUNT(*), etc.), all of the aggregation work happens in-memory by default. As a consequence, more complex aggregation operations require more memory in that version.
For example, the following query could require a lot of memory in Enterprise ColumnStore 5, since it has to calculate many distinct values in memory:
Whereas the following query could require much less memory in Enterprise ColumnStore 5, since it has to calculate fewer distinct values:
In Enterprise ColumnStore 6, disk-based aggregations can be enabled.
For best performance, avoid excessive aggregations or enable disk-based aggregations.
For additional information, see "Configure Disk-Based Aggregations".
When Enterprise ColumnStore evaluates built-in functions and aggregate functions, it can often evaluate the function in a distributed manner. Distributed evaluation of functions can significantly improve performance.
Enterprise ColumnStore supports distributed evaluation for some built-in functions. For other built-in functions, the function must be evaluated serially on the final result set.
Enterprise ColumnStore also supports distributed evaluation for user-defined functions developed with ColumnStore's User-Defined Aggregate Function (UDAF) C++ API. For functions developed with Enterprise Server's standard User-Defined Function (UDF) API, the function must be evaluated serially on the final result set.
For best performance, avoid non-distributed functions.
By default, Enterprise ColumnStore performs all joins as in-memory hash joins.
If the joined tables are very large, the in-memory hash join can require too much memory for the default configuration. There are a couple options to work around this:
Enterprise ColumnStore can be configured to use more memory for in-memory hash joins.
Enterprise ColumnStore can be configured to use disk-based joins.
Enterprise ColumnStore can use optimizer statistics to better optimize the join order.
For additional information, see "Configure In-Memory Joins", "Configure Disk-Based Joins", and "Optimizer Statistics".
Enterprise ColumnStore uses extent elimination to optimize queries. Extent elimination uses the minimum and maximum values in the extent map to determine which extents can be skipped for a query.
When data is loaded into Enterprise ColumnStore, it appends the data to the latest extent. When an extent reaches the maximum number of column values, Enterprise ColumnStore creates a new extent. As a consequence, if ordered data is loaded in its proper order, then similar values will be clustered together in the same extent. This can improve query performance, because extent elimination performs best when similar values are clustered together.
For example, if you expect to query a table with a filter on a timestamp column, you should sort the data using the timestamp column before loading it into Enterprise ColumnStore. Later, when the table is queried with a filter on the timestamp column, Enterprise ColumnStore would be able to skip many extents using extent elimination.
For best performance, load ordered data in proper order.
When Enterprise ColumnStore performs mathematical operations with very big values using the , , and data types, the operation can sometimes overflow ColumnStore's maximum precision or scale. The maximum precision and scale depend on the version of Enterprise ColumnStore:
In Enterprise ColumnStore 6, the maximum precision (M) is 38, and the maximum scale (D) is 38.
In Enterprise ColumnStore 5, the maximum precision (M) is 18, and the maximum scale (D) is 18.
In Enterprise ColumnStore 6, applications can configure Enterprise ColumnStore to check for decimal overflows by setting the columnstore_decimal_overflow_check system variable, but only when the column has a decimal precision that is 18 or more:
When decimal overflow checks are enabled, math operations have extra overhead.
When the decimal overflow check fails, MariaDB Enterprise ColumnStore raises an error with the ER_INTERNAL_ERROR error SQL, and it writes detailed information about the overflow check failure to the ColumnStore system logs.
MariaDB Enterprise ColumnStore supports Enterprise Server's standard User-Defined Function (UDF) API. However, UDFs developed using that API cannot be executed in a distributed manner.
To support distributed execution of custom SQL, MariaDB Enterprise ColumnStore supports a Distributed User Defined Aggregate Functions (UDAF) C++ API:
The Distributed User Defined Aggregate Functions (UDAF) C++ API allows anyone to create aggregate functions of arbitrary complexity for distributed execution in the ColumnStore storage engine.
These functions can also be used as Analytic (Window) functions just like any built-in aggregate function.
Shell
SQL access is not required
SQL
Shell access is not required
Remote Database
Use normal database client
Avoid dumping data to intermediate filed


Step 3: Start and Configure Enterprise ColumnStore
This page details step 3 of a 5-step procedure for deploying Single-Node Enterprise ColumnStore with Local storage.
This step starts and configures MariaDB Enterprise Server and MariaDB Enterprise ColumnStore.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
Mandatory system variables and options for Single-Node Enterprise ColumnStore include:
Start and enable the MariaDB Enterprise Server service, so that it starts automatically upon reboot:
Start and enable the MariaDB Enterprise ColumnStore service, so that it starts automatically upon reboot:
Enterprise ColumnStore requires a mandatory utility user account. By default, it connects to the server using the root user with no password. MariaDB Enterprise Server 10.6 will reject this login attempt by default, so you will need to configure Enterprise ColumnStore to use a different user account and password and create this user account on Enterprise Server.
On the Enterprise ColumnStore node, create the user account with the statement:
On the Enterprise ColumnStore node, grant the user account SELECT privileges on all databases with the GRANT statement:
Configure Enterprise ColumnStore to use the utility user:
Set the password:
For details about how to encrypt the password, see "".
Passwords should meet your organization's password policies. If your MariaDB Enterprise Server instance has a password validation plugin installed, then the password should also meet the configured requirements.
The specific steps to configure the security module depend on the operating system.
Configure SELinux for Enterprise ColumnStore:
To configure SELinux, you have to install the packages required for audit2allow. On CentOS 7 and RHEL 7, install the following:
On RHEL 8, install the following:
Allow the system to run under load for a while to generate SELinux audit events.
After the system has taken some load, generate an SELinux policy from the audit events using audit2allow:
If no audit events were found, this will print the following:
If audit events were found, the new SELinux policy can be loaded using semodule:
Set SELinux to enforcing mode by setting SELINUX=enforcing in /etc/selinux/config.
For example, the file will usually look like this after the change:
Set SELinux to enforcing mode:
For information on how to create a profile, see on ubuntu.com.
Navigation in the Single-Node Enterprise ColumnStore topology with Local storage deployment procedure:
This page was step 3 of 5.
Step 9: Import Data
This page details step 9 of the 9-step procedure "Deploy ColumnStore Object Storage Topology".
This step bulk imports data to Enterprise ColumnStore.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
Before data can be imported into the tables, create a matching schema.
On the primary server, create the schema:
For each database that you are importing, create the database with the CREATE DATABASE statement:
For each table that you are importing, create the table with the CREATE TABLE statement:
Enterprise ColumnStore supports multiple methods to import data into ColumnStore tables.
MariaDB Enterprise ColumnStore includes , which is a command-line utility designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a TSV (tab-separated values) file, on the primary server run :
When data is loaded with the LOAD DATA INFILE statement, MariaDB Enterprise ColumnStore loads the data using , which is a command-line utility designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a TSV (tab-separated values) file, on the primary server use LOAD DATA INFILE statement:
MariaDB Enterprise ColumnStore can also import data directly from a remote database. A simple method is to query the table using the SELECT statement, and then pipe the results into , which is a command-line utility that is designed to efficiently load data in bulk. Alternative methods are available.
To import your data from a remote MariaDB database:
Navigation in the procedure "Deploy ColumnStore Object Storage Topology":
This page was step 9 of 9.
This procedure is complete.
The Read Replicas feature in MariaDB ColumnStore enables horizontal scaling of read performance by incorporating read-only nodes into a multi-node cluster. These replicas differ from standard ColumnStore nodes, in that they don't run the WriteEngineServer process. This means Read Replica nodes cannot handle write operations directly — instead, any write queries attempted on a replica are automatically forwarded to a read-write (RW) node.
Replicas utilize shared storage with other nodes in the cluster, ensuring data consistency without duplication. A key requirement is maintaining at least one RW node — a cluster consisting solely of read replicas is not operational and cannot process reads or writes.
Read-only nodes are incompatible with S3 as the storage backend.
Additionally, there is no automatic promotion of a read replica to RW mode if the only RW node fails, which could lead to temporary downtime until manual intervention.
Horizontal Read Scaling: Adds compute power for handling more read-intensive queries without impacting write performance.
Write Forwarding: Ensures writes on replicas are redirected to RW nodes, maintaining data integrity.
Shared Storage: Replicas access the same DBRoots as RW nodes, promoting efficiency and reducing storage overhead.
Add Read Replica. To introduce a read-only node for scaling reads, run this command:
Remove Node. To safely remove any node (RW or replica) from the cluster, run this command:
This reassigns resources as needed without cluster disruption.
Verify Status. To monitor the cluster's health and node roles, issue:
Node addition is restricted to private IPs only.
Incompatible with S3 storage, limiting use to shared file systems.
No automatic failover or promotion mechanism if the sole RW node goes down, requiring manual recovery.
At least one RW node must always be present for the cluster to function properly, supporting both read and write operations.
Refer to for exact mount points details.
Set Up MariaDB Repository
Run the following to add the MariaDB repository (adjust "11.4" to the latest stable version):
See for additional details about the ES repo setup.
Install Packages
Step 2: Configure Shared Local Storage
Step 2: Configure Shared Local Storage
This page details step 2 of the 9-step procedure "Deploy ColumnStore Shared Local Storage Topology".
This step configures shared local storage on systems hosting Enterprise ColumnStore.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
Step 1: Prepare Systems for Enterprise ColumnStore Nodes
This page provides information on optimizing Linux kernel parameters for improved performance with MariaDB ColumnStore.
The high level components of the ColumnStore architecture are:
PrimProc: PrimProc (Primitives Processor) is responsible for parsing the SQL requests into an optimized set of primitive job steps executed by one or more servers. PrimProc is thus responsible for query optimization and orchestration of query execution by the servers. While every instance has their own PrimProc in a multi-server deployment, each query begins and ends on the same PrimProc it originated from. A database load balancer such as MariaDB MaxScale can be deployed to appropriately balance external requests against individual servers. PrimProc also executes granular job steps received from the server (mariadbd) in a multi-threaded manner. ColumnStore allows distribution of the work across many servers.
CREATE DATABASE columnstore_db;
CREATE TABLE columnstore_db.analytics_test (
id INT,
str VARCHAR(50)
) ENGINE = ColumnStore;[mariadb]
log_error = mariadbd.err
character_set_server = utf8
collation_server = utf8_general_ci
log_bin = mariadb-bin
log_bin_index = mariadb-bin.index
relay_log = mariadb-relay
relay_log_index = mariadb-relay.index
log_slave_updates = ON
gtid_strict_mode = ON
# This must be unique on each cluster node
server_id = 1sudo mcsSetConfig CrossEngineSupport Host 127.0.0.1
sudo mcsSetConfig CrossEngineSupport Port 3306
sudo mcsSetConfig CrossEngineSupport User cross_engine
sudo mcsSetConfig CrossEngineSupport Password cross_engine_passwd# minimize swapping
vm.swappiness = 1
# Increase the TCP max buffer size
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# Increase the TCP buffer limits
# min, default, and max number of bytes to use
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
# for 1 GigE, increase this to 2500
# for 10 GigE, increase this to 30000
net.core.netdev_max_backlog = 2500$ sudo sysctl --load=/etc/sysctl.d/90-mariadb-enterprise-columnstore.conf$ sudo setenforce permissive# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=permissive
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted$ sudo getenforce
Permissive$ sudo systemctl disable apparmor$ sudo aa-statusapparmor module is loaded.
0 profiles are loaded.
0 profiles are in enforce mode.
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.$ sudo yum install glibc-locale-source glibc-langpack-en$ sudo localedef -i en_US -f UTF-8 en_US.UTF-8$ sudo testS3ConnectionStorageManager[26887]: Using the config file found at /etc/columnstore/storagemanager.cnf
StorageManager[26887]: S3Storage: S3 connectivity & permissions are OK
S3 Storage Manager Configuration OK$ sudo mariadbWelcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 38
Server version: 11.4.5-3-MariaDB-Enterprise MariaDB Enterprise Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>SELECT PLUGIN_NAME, PLUGIN_STATUS
FROM information_schema.PLUGINS
WHERE PLUGIN_LIBRARY LIKE 'ha_columnstore%';+---------------------+---------------+
| PLUGIN_NAME | PLUGIN_STATUS |
+---------------------+---------------+
| Columnstore | ACTIVE |
| COLUMNSTORE_COLUMNS | ACTIVE |
| COLUMNSTORE_TABLES | ACTIVE |
| COLUMNSTORE_FILES | ACTIVE |
| COLUMNSTORE_EXTENTS | ACTIVE |
+---------------------+---------------+CREATE DATABASE IF NOT EXISTS test;CREATE TABLE IF NOT EXISTS test.contacts (
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100)
) ENGINE=ColumnStore;INSERT INTO test.contacts (first_name, last_name, email)
VALUES
("Kai", "Devi", "kai.devi@example.com"),
("Lee", "Wang", "lee.wang@example.com");SELECT * FROM test.contacts;
+------------+-----------+----------------------+
| first_name | last_name | email |
+------------+-----------+----------------------+
| Kai | Devi | kai.devi@example.com |
| Lee | Wang | lee.wang@example.com |
+------------+-----------+----------------------+CREATE TABLE test.addresses (
email VARCHAR(100),
street_address VARCHAR(255),
city VARCHAR(100),
state_code VARCHAR(2)
) ENGINE = InnoDB;INSERT INTO test.addresses (email, street_address, city, state_code)
VALUES
("kai.devi@example.com", "1660 Amphibious Blvd.", "Redwood City", "CA"),
("lee.wang@example.com", "32620 Little Blvd", "Redwood City", "CA");SELECT name AS "Name", addr AS "Address"
FROM (SELECT CONCAT(first_name, " ", last_name) AS name,
email FROM test.contacts) AS contacts
INNER JOIN (SELECT CONCAT(street_address, ", ", city, ", ", state_code) AS addr,
email FROM test.addresses) AS addr
WHERE contacts.email = addr.email;+----------+-----------------------------------------+
| Name | Address |
+----------+-----------------------------------------+
| Kai Devi | 1660 Amphibious Blvd., Redwood City, CA |
| Lee Wang | 32620 Little Blvd, Redwood City, CA |
+----------+-----------------------------------------+
+-------------------+-------------------------------------+
| Name | Address |
+-------------------+-------------------------------------+
| Walker Percy | 500 Thomas More Dr., Covington, LA |
| Flannery O'Connor | 300 Tarwater Rd., Milledgeville, GA |
+-------------------+-------------------------------------+$ sudo mariadbWelcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 38
Server version: 11.4.5-3-MariaDB-Enterprise MariaDB Enterprise Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>SELECT PLUGIN_NAME, PLUGIN_STATUS
FROM information_schema.PLUGINS
WHERE PLUGIN_LIBRARY LIKE 'ha_columnstore%';+---------------------+---------------+
| PLUGIN_NAME | PLUGIN_STATUS |
+---------------------+---------------+
| Columnstore | ACTIVE |
| COLUMNSTORE_COLUMNS | ACTIVE |
| COLUMNSTORE_TABLES | ACTIVE |
| COLUMNSTORE_FILES | ACTIVE |
| COLUMNSTORE_EXTENTS | ACTIVE |
+---------------------+---------------+CREATE DATABASE IF NOT EXISTS test;CREATE TABLE IF NOT EXISTS test.contacts (
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100)
) ENGINE=ColumnStore;INSERT INTO test.contacts (first_name, last_name, email)
VALUES
("Kai", "Devi", "kai.devi@example.com"),
("Lee", "Wang", "lee.wang@example.com");SELECT * FROM test.contacts;+------------+-----------+----------------------+
| first_name | last_name | email |
+------------+-----------+----------------------+
| Kai | Devi | kai.devi@example.com |
| Lee | Wang | lee.wang@example.com |
+------------+-----------+----------------------+CREATE TABLE test.addresses (
email VARCHAR(100),
street_address VARCHAR(255),
city VARCHAR(100),
state_code VARCHAR(2)
) ENGINE = InnoDB;INSERT INTO test.addresses (email, street_address, city, state_code)
VALUES
("kai.devi@example.com", "1660 Amphibious Blvd.", "Redwood City", "CA"),
("lee.wang@example.com", "32620 Little Blvd", "Redwood City", "CA");SELECT name AS "Name", addr AS "Address"
FROM (SELECT CONCAT(first_name, " ", last_name) AS name,
email FROM test.contacts) AS contacts
INNER JOIN (SELECT CONCAT(street_address, ", ", city, ", ", state_code) AS addr,
email FROM test.addresses) AS addr
WHERE contacts.email = addr.email;+----------+-----------------------------------------+
| Name | Address |
+----------+-----------------------------------------+
| Kai Devi | 1660 Amphibious Blvd., Redwood City, CA |
| Lee Wang | 32620 Little Blvd, Redwood City, CA |
+----------+-----------------------------------------+
+-------------------+-------------------------------------+
| Name | Address |
+-------------------+-------------------------------------+
| Walker Percy | 500 Thomas More Dr., Covington, LA |
| Flannery O'Connor | 300 Tarwater Rd., Milledgeville, GA |
+-------------------+-------------------------------------+CREATE TABLE sales_data (
sale_id INT,
product_name VARCHAR(255),
category VARCHAR(100),
sale_date DATE,
quantity INT,
price DECIMAL(10, 2)
) ENGINE=ColumnStore;INSERT INTO sales_data (sale_id, product_name, category, sale_date, quantity, price) VALUES
(1, 'Laptop', 'Electronics', '2023-01-15', 1, 1200.00),
(2, 'Mouse', 'Electronics', '2023-01-15', 2, 25.00),
(3, 'Keyboard', 'Electronics', '2023-01-16', 1, 75.00);-- Get total sales per category
SELECT category, SUM(quantity * price) AS total_sales
FROM sales_data
WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY category
ORDER BY total_sales DESC;
-- Count distinct products
SELECT COUNT(DISTINCT product_name) FROM sales_data;$ sudo yum install curl$ sudo apt install curl apt-transport-https$ curl -LsSO https://dlm.mariadb.com/enterprise-release-helpers/mariadb_es_repo_setup$ echo "${checksum} mariadb_es_repo_setup" \
| sha256sum -c -$ chmod +x mariadb_es_repo_setup$ sudo ./mariadb_es_repo_setup --token="CUSTOMER_DOWNLOAD_TOKEN" --apply \
--skip-maxscale \
--skip-tools \
--mariadb-server-version="11.4"$ sudo yum install epel-release
$ sudo yum install jemalloc$ sudo apt install libjemalloc2$ sudo apt install libjemalloc1$ sudo yum install MariaDB-server \
MariaDB-backup \
MariaDB-shared \
MariaDB-client \
MariaDB-columnstore-engine$ sudo apt install mariadb-server \
mariadb-backup \
libmariadb3 \
mariadb-client \
mariadb-plugin-columnstoreCREATE DATABASE inventory;CREATE TABLE inventory.products (
product_name VARCHAR(11) NOT NULL DEFAULT '',
supplier VARCHAR(128) NOT NULL DEFAULT '',
quantity VARCHAR(128) NOT NULL DEFAULT '',
unit_cost VARCHAR(128) NOT NULL DEFAULT ''
) ENGINE=Columnstore DEFAULT CHARSET=utf8;$ sudo cpimport -s '\t' inventory products /tmp/inventory-products.tsvLOAD DATA INFILE '/tmp/inventory-products.tsv'
INTO TABLE inventory.products;$ mariadb --quick \
--skip-column-names \
--execute="SELECT * FROM inventory.products" \
| cpimport -s '\t' inventory productsmariadb -e "FLUSH TABLES WITH READ LOCK;"save_brmmkdir -p /extent_map_backupcp -f /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em /extent_map_backupmariadb -e "UNLOCK TABLES;"systemctl stop mariadb-columnstoremv /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em /tmp/BRM_saves_em.bad> /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_vbbm > /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_vsscp -f /extent_map_backup/BRM_saves_em /var/lib/columnstore/data1/systemFiles/dbrm/chown -R mysql:mysql /var/lib/columnstore/data1/systemFiles/dbrm/systemctl start mariadb-columnstorecurl -s -X PUT https://127.0.0.1:8640/cmapi/0.4.0/cluster/shutdown \ --header 'Content-Type:application/json' \ --header 'x-api-key:your_api_key' \ --data '{"timeout":60}' -kmv /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em /tmp/BRM_saves_em.bad> /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_vbbm > /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_vssmv cp -f /extent_map_backup/BRM_saves_em /var/lib/columnstore/data1/systemFiles/dbrm/chown -R mysql:mysql /var/lib/columnstore/data1/systemFiles/dbrmcurl -s -X PUT https://127.0.0.1:8640/cmapi/0.4.0/cluster/start \ --header 'Content-Type:application/json' \ --header 'x-api-key:your_api_key' \ --data '{"timeout":60}' -k$ sudo yum install curl$ sudo apt install curl apt-transport-https$ curl -LsSO https://dlm.mariadb.com/enterprise-release-helpers/mariadb_es_repo_setup$ echo "${checksum} mariadb_es_repo_setup" \
| sha256sum -c -$ chmod +x mariadb_es_repo_setup$ sudo ./mariadb_es_repo_setup --token="CUSTOMER_DOWNLOAD_TOKEN" --apply \
--skip-maxscale \
--skip-tools \
--mariadb-server-version="11.4"$ sudo yum install epel-release
$ sudo yum install jemalloc$ sudo apt install libjemalloc2$ sudo apt install libjemalloc1$ sudo yum install MariaDB-server \
MariaDB-backup \
MariaDB-shared \
MariaDB-client \
MariaDB-columnstore-engine$ sudo apt install mariadb-server \
mariadb-backup \
libmariadb3 \
mariadb-client \
mariadb-plugin-columnstoreSELECT calgetsqlcount();SELECT /*! INFINIDB_ORDERED */ r_regionkey
FROM region r, customer c, nation n
WHERE r.r_regionkey = n.n_regionkey
AND n.n_nationkey = c.c_nationkeymaxctrl set server \
mcs3 \
maintenancemaxctrl set server \
mcs3 \
maintenance \
--forcemaxctrl list serversmaxctrl clear server \
mcs3 \
maintenancemaxctrl list serversSHOW CREATE TABLE DATABASE_NAME.TABLE_NAME\GSELECT * INTO OUTFILE '/path/to/DATABASE_NAME-TABLE_NAME.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM DATABASE_NAME.TABLE_NAME;mariadb --host HOST --port PORT --user USER --password < schema-backup.sqlSHOW CREATE TABLE DATABASE_NAME.TABLE_NAME\GSELECT COUNT(*) FROM DATABASE_NAME.TABLE_NAME;SELECT * FROM DATABASE_NAME.TABLE_NAME LIMIT 100;SELECT * FROM tab;SELECT col1, col2 FROM tab;SELECT DISTINCT col1 FROM tab LIMIT 10000;SELECT DISTINCT col1 FROM tab LIMIT 100;SET SESSION columnstore_decimal_overflow_check=ON;
SELECT (big_decimal1 * big_decimal2) AS product
FROM columnstore_tab;$ sudo yum install curl$ sudo apt install curl apt-transport-https$ curl -LsSO https://dlm.mariadb.com/enterprise-release-helpers/mariadb_es_repo_setup$ echo "${checksum} mariadb_es_repo_setup" \
| sha256sum -c -$ chmod +x mariadb_es_repo_setup$ sudo ./mariadb_es_repo_setup --token="CUSTOMER_DOWNLOAD_TOKEN" --apply \
--skip-maxscale \
--skip-tools \
--mariadb-server-version="11.4"$ sudo yum install jemalloc jq curl$ sudo apt install libjemalloc1 jq curl$ sudo apt install libjemalloc2 jq curl$ sudo yum install MariaDB-server \
MariaDB-backup \
MariaDB-shared \
MariaDB-client \
MariaDB-columnstore-engine \
MariaDB-columnstore-cmapi$ sudo apt install mariadb-server \
mariadb-backup \
libmariadb3 \
mariadb-client \
mariadb-plugin-columnstore \
mariadb-columnstore-cmapiCREATE DATABASE inventory;CREATE TABLE inventory.products (
product_name VARCHAR(11) NOT NULL DEFAULT '',
supplier VARCHAR(128) NOT NULL DEFAULT '',
quantity VARCHAR(128) NOT NULL DEFAULT '',
unit_cost VARCHAR(128) NOT NULL DEFAULT ''
) ENGINE=Columnstore DEFAULT CHARSET=utf8;$ sudo cpimport -s '\t' inventory products /tmp/inventory-products.tsvLOAD DATA INFILE '/tmp/inventory-products.tsv'
INTO TABLE inventory.products;$ mariadb --quick \
--skip-column-names \
--execute="SELECT * FROM inventory.products" \
| cpimport -s '\t' inventory products$ sudo yum install curl$ sudo apt install curl apt-transport-https$ curl -LsSO https://dlm.mariadb.com/enterprise-release-helpers/mariadb_es_repo_setup
$ echo "${checksum} mariadb_es_repo_setup" \
| sha256sum -c -
$ chmod +x mariadb_es_repo_setup
$ sudo ./mariadb_es_repo_setup --token="CUSTOMER_DOWNLOAD_TOKEN" --apply \
--skip-server \
--skip-tools \
--mariadb-maxscale-version="22.08"$ sudo yum install maxscale$ sudo apt install maxscaleFor RPM-based systems, run this command:
Refer to this blog post for additional information.
For DEB-based systems, run these commands:
Start and Enable Services
Configure the Initial RW Node
On the primary RW node, set up the cluster API key (use a secure API key):
Add the Initial RW Node to the Cluster
Run this from the primary RW node:
Add Read Replica Nodes
From the primary RW node, add each read replica:
Verify the Cluster
Check the status to ensure nodes are added and the cluster is healthy:
Configure Replication Between Nodes
See for instructions on how to set up replication, and this page for instructions how to create user accounts and configure replication for multi-node local storage.
Configure MaxScale
See this page for instructions.
Scales read via MaxScale.
Enterprise Server 10.3+, MaxScale 2.5+
Scales reads via MaxScale Enterprise Server 10.3+, MariaDB Enterprise Cluster (powered by Galera), MaxScale 2.5+
Enterprise Server, Enterprise ColumnStore, MaxScale
Optional Read Replica topology
Enterprise Server, Enterprise ColumnStore, MaxScale





In a ColumnStore Object Storage topology, MariaDB Enterprise ColumnStore requires the Storage Manager directory to be located on shared local storage.
The Storage Manager directory is at the following path:
/var/lib/columnstore/storagemanager
The Storage Manager directory must be mounted on every ColumnStore node.
Select a Shared Local Storage solution for the Storage Manager directory:
For additional information, see "Shared Local Storage Options".
EBS is a high-performance block-storage service for AWS (Amazon Web Services). EBS Multi-Attach allows an EBS volume to be attached to multiple instances in AWS. Only clustered file systems, such as GFS2, are supported.
For Enterprise ColumnStore deployments in AWS:
EBS Multi-Attach is a recommended option for the Storage Manager directory.
Amazon S3 storage is the recommended option for data.
Consult the vendor documentation for details on how to configure EBS Multi-Attach.
EFS is a scalable, elastic, cloud-native NFS file system for AWS (Amazon Web Services)
For deployments in AWS:
EFS is a recommended option for the Storage Manager directory.
Amazon S3 storage is the recommended option for data.
Consult the vendor documentation for details on how to configure EFS.
Filestore is high-performance, fully managed storage for GCP (Google Cloud Platform).
For Enterprise ColumnStore deployments in GCP:
Filestore is the recommended option for the Storage Manager directory.
Google Object Storage (S3-compatible) is the recommended option for data.
Consult the vendor documentation for details on how to configure Filestore.
GlusterFS is a distributed file system. GlusterFS is a shared local storage option, but it is not one of the recommended options.
For more information, see "Recommended Storage Options".
On each Enterprise ColumnStore node, install GlusterFS.
Install on CentOS / RHEL 8 (YUM):
Install on CentOS / RHEL 7 (YUM):
Install on Debian (APT):
Install on Ubuntu (APT):
Start the GlusterFS daemon:
Before you can create a volume with GlusterFS, you must probe each node from a peer node.
On the primary node, probe all of the other cluster nodes:
On one of the replica nodes, probe the primary node to confirm that it is connected:
On the primary node, check the peer status:
Create the GlusterFS volumes for MariaDB Enterprise ColumnStore. Each volume must have the same number of replicas as the number of Enterprise ColumnStore nodes.
On each Enterprise ColumnStore node, create the directory for each brick in the /brick directory:
On the primary node, create the GlusterFS volumes:
On the primary node, start the volume:
On each Enterprise ColumnStore node, create mount points for the volumes:
On each Enterprise ColumnStore node, add the mount points to /etc/fstab:
On each Enterprise ColumnStore node, mount the volumes:
NFS is a distributed file system. NFS is available in most Linux distributions. If NFS is used for an Enterprise ColumnStore deployment, the storage must be mounted with the sync option to ensure that each node flushes its changes immediately.
For on-premises deployments:
NFS is the recommended option for the Storage Manager directory.
Any S3-compatible storage is the recommended option for data.
Consult the documentation for your NFS implementation for details on how to configure NFS.
Navigation in the procedure "Deploy ColumnStore Object Storage Topology":
This page was step 2 of 9.
In a ColumnStore Object Storage topology, MariaDB Enterprise ColumnStore requires the Storage Manager directory to be located on shared local storage.
The Storage Manager directory is at the following path:
/var/lib/columnstore/storagemanager
The N in dataN represents a range of integers that starts at 1 and stops at the number of nodes in the deployment. For example, with a 3-node Enterprise ColumnStore deployment, this would refer to the following directories:
/var/lib/columnstore/data1
/var/lib/columnstore/data2
/var/lib/columnstore/data3
The DB Root directories must be mounted on every ColumnStore node.
Select a Shared Local Storage solution for the Storage Manager directory:
EBS (Elastic Block Store) Multi-Attach
EFS (Elastic File System)
Filestore
GlusterFS
NFS (Network File System)
For additional information, see "Shared Local Storage Options".
EBS is a high-performance block-storage service for AWS (Amazon Web Services). EBS Multi-Attach allows an EBS volume to be attached to multiple instances in AWS. Only clustered file systems, such as GFS2, are supported.
For Enterprise ColumnStore deployments in AWS:
EBS Multi-Attach is a recommended option for the Storage Manager directory.
Amazon S3 storage is the recommended option for data.
Consult the vendor documentation for details on how to configure EBS Multi-Attach.
EFS is a scalable, elastic, cloud-native NFS file system for AWS (Amazon Web Services)
For deployments in AWS:
EFS is a recommended option for the Storage Manager directory.
Amazon S3 storage is the recommended option for data.
Consult the vendor documentation for details on how to configure EFS.
Filestore is high-performance, fully managed storage for GCP (Google Cloud Platform).
For Enterprise ColumnStore deployments in GCP:
Filestore is the recommended option for the Storage Manager directory.
Google Object Storage (S3-compatible) is the recommended option for data.
Consult the vendor documentation for details on how to configure Filestore.
GlusterFS is a distributed file system.
GlusterFS is a shared local storage option, but it is not one of the recommended options.
For more information, see "Recommended Storage Options".
On each Enterprise ColumnStore node, install GlusterFS.
Install on CentOS / RHEL 8 (YUM):
Install on CentOS / RHEL 7 (YUM):
Install on Debian (APT):
Install on Ubuntu (APT):
Start the GlusterFS daemon:
Before you can create a volume with GlusterFS, you must probe each node from a peer node.
On the primary node, probe all of the other cluster nodes:
On one of the replica nodes, probe the primary node to confirm that it is connected:
On the primary node, check the peer status:
Number of Peers: 2
Create the GlusterFS volumes for MariaDB Enterprise ColumnStore. Each volume must have the same number of replicas as the number of Enterprise ColumnStore nodes.
On each Enterprise ColumnStore node, create the directory for each brick in the /brick directory:
On the primary node, create the GlusterFS volumes:
On the primary node, start the volume:
On each Enterprise ColumnStore node, create mount points for the volumes:
On each Enterprise ColumnStore node, add the mount points to /etc/fstab:
On each Enterprise ColumnStore node, mount the volumes:
NFS is a distributed file system. NFS is available in most Linux distributions. If NFS is used for an Enterprise ColumnStore deployment, the storage must be mounted with the sync option to ensure that each node flushes its changes immediately.
For on-premises deployments:
NFS is the recommended option for the Storage Manager directory.
Any S3-compatible storage is the recommended option for data.
Consult the documentation for your NFS implementation for details on how to configure NFS.
Navigation in the procedure "Deploy ColumnStore Shared Local Storage Topology".
This page was step 2 of 9.
MariaDB Enterprise ColumnStore performs best with Linux kernel optimizations.
On each server to host an Enterprise ColumnStore node, optimize the kernel:
Set the relevant kernel parameters in a sysctl configuration file. To ensure proper change management, use an Enterprise ColumnStore-specific configuration file.
Create a /etc/sysctl.d/90-mariadb-enterprise-columnstore.conf file:
Use the sysctl command to set the kernel parameters at runtime
The Linux Security Modules (LSM) should be temporarily disabled on each Enterprise ColumnStore node during installation.
The LSM will be configured and re-enabled later in this deployment procedure.
The steps to disable the LSM depend on the specific LSM used by the operating system.
SELinux must be set to permissive mode before installing MariaDB Enterprise ColumnStore.
To set SELinux to permissive mode:
Set SELinux to permissive mode:
Set SELinux to permissive mode by setting SELINUX=permissive in /etc/selinux/config.
For example, the file will usually look like this after the change:
Confirm that SELinux is in permissive mode:
SELinux will be configured and re-enabled later in this deployment procedure. This configuration is not persistent. If you restart the server before configuring and re-enabling SELinux later in the deployment procedure, you must reset the enforcement to permissive mode.
AppArmor must be disabled before installing MariaDB Enterprise ColumnStore.
Disable AppArmor:
Reboot the system.
Confirm that no AppArmor profiles are loaded using aa-status:
AppArmor will be configured and re-enabled later in this deployment procedure.
When using MariaDB Enterprise ColumnStore, it is recommended to set the system's locale to UTF-8.
On RHEL 8, install additional dependencies:
Set the system's locale to en_US.UTF-8 by executing localedef:
Navigation in the Single-Node Enterprise ColumnStore topology with Local storage deployment procedure:
This page was step 1 of 5.
The following table lists the recommended optimized Linux kernel parameters for MariaDB ColumnStore:
For more information refer to .
vm.overcommit_memory
1
Disables overcommitting of memory, ensuring sufficient memory is available for MariaDB ColumnStore.
vm.dirty_background_ratio
5
Sets the percentage of dirty memory that can be written back to disk in the background. A lower value reduces the amount of dirty memory, improving performance.
vm.dirty_ratio
10
Sets the percentage of dirty memory that can be written back to disk before the kernel starts to write out clean pages. A lower value reduces the amount of dirty memory, improving performance.
vm.vfs_cache_pressure
50
To configure these parameters, you can add them to the /etc/sysctl.conf file. For example:
After making changes to the /etc/sysctl.conf file, you need to apply the changes by running the following command:
These optimized parameters are recommended for all MariaDB ColumnStore deployments, regardless of the specific workload. They can improve performance for various use cases, including:
Large-scale data warehousing
Real-time analytics
Business intelligence
Machine learning
By optimizing the Linux kernel parameters, you can significantly improve the performance of your MariaDB ColumnStore deployments. These recommendations provide a starting point for optimizing your system, and you may need to adjust the values based on your specific hardware and workload.
Extent Maps: ColumnStore maintains metadata about each column in a shared distributed object known as the Extent Map. The primary node references the Extent Map to help assist in generating the correct primitive job steps. The primary node server references the Extent Map to identify the correct disk blocks to read. Each column is made up of one or more files and each file can contain multiple extents. As much as possible the system attempts to allocate contiguous physical storage to improve read performance.
Storage: ColumnStore can use either local storage or shared storage (e.g. SAN or EBS) to store data. Using shared storage allows for data processing to fail over to another node automatically in case of a server failing.
The system supports full MVCC ACID transactional logic via Insert, Update, and Delete statements. The MVCC architecture allows for concurrent query and DML / batch load. Although DML is supported, the system is optimized more for batch inserts and so larger data loads should be achieved through a batch load. The most flexible and optimal way to load data is via the cpimport tool. This tool optimizes the load path and can be run centrally or in parallel on each server.
If the data contains a time or (time correlated ascending value) column then significant performance gains will be achieved if the data is sorted by this field and also typically queried with a where clause on that column. This is because the system records a minimum and maximum value for each extent providing for a system maintained range partitioning scheme. This allows the system to completely eliminate scanning an extent map if the query includes a where clause for that field limiting the results to a subset of extent maps.
MariaDB ColumnStore has its own query optimizer and execution engine distinct from the MariaDB server implementation. This allows for scaling out query execution to multiple servers, and to optimize for handling data stored as columns rather than rows. As such, the factors influencing query performance are very different:
A query is first parsed by the MariaDB server (mariadbd) process and passed through to the ColumnStore storage engine. This passes the request onto the PrimProc process which is responsible for optimizing and orchestrating execution of the query. The PrimProc module's optimizer creates a series of batch primitive steps that are executed on all nodes in the cluster. Since multiple servers can be deployed, this allows for scale-out execution of the queries. The optimizer attempts to process query execution in parallel. However, certain operations inherently must be executed centrally, for example final result ordering. Filtering, joins, aggregates, and GROUP BY clauses are general.y pushed down and executed in parallel in PrimProc on all servers. In PrimProc, batch primitive steps are performed at a granular level where individual threads operate on individual 1K-8K blocks within an extent. This enables a larger multi-core server to be fully consumed and scale out within a single server. The current batch primitive steps available in the system include:
Single Column Scan: Scan one or more Extents for a given column based on a single column predicate, including operators like =, <>, IN (list), BETWEEN, and ISNULL. See the first scan section of performance configuration for additional details on tuning this.
Additional Single Column Filters: Project additional columns for any rows found by a previous scan and apply additional single column predicates as needed. Access of blocks is based on row identifier, going directly to the blocks. See the additional column read section of performance configuration for additional details on tuning this.
Table Level Filters: Project additional columns as required for any table level filters such as column1 < column2, or more advanced functions and expressions. Access of blocks is again based on row identifier, going directly to the blocks.
Project Join Columns for Joins: Project additional join columns as needed for any join operations. Access of blocks is again based on row identifier, going directly to the blocks. See the join tuning section of for additional details on tuning this.
Execute Multi-Join: Apply one or more hash join operation against projected join columns, and use that value to probe a previously built hash map. Build out tuples as needed to satisfy inner or outer join requirements. See the multi-table join section of for additional details on tuning this.
Cross-Table Level Filters: Project additional columns from the range of rows for the Primitive Step as needed for any cross-table level filters such as table1.column1 < table2.column2, or more advanced functions and expressions. Access of blocks is again based on row identifier, going directly to the blocks.
Aggregation/Distinct Operation Part 1: Apply any local group by, distinct, or aggregation operation against the set of joined rows assigned to a given Batch Primitive. Part 1 of this process is handled by PrimProc.
Aggregation/Distinct Operation Part 2: Apply any final group by, distinct, or aggregation operation against the set of joined rows assigned to a given Batch Primitive. This processing is handled by PrimProc. See the memory management section of for additional details on tuning this.
The following items should be considered when thinking about query execution in ColumnStore vs a row based store such as InnoDB.
ColumnStore is optimized for large scale aggregation / OLAP queries over large data sets. As such indexes typically used to optimize query access for row based systems do not make sense since selectivity is low for such queries. Instead ColumnStore gains performance by only scanning necessary columns, utilizing system maintained partitioning, and utilizing multiple threads and servers to scale query response time.
Since ColumnStore only reads the necessary columns to resolve a query, only include the necessary columns required. For example, SELECT * is significantly slower than SELECT col1, col2 FROM tbl.
Datatype size is important. If say you have a column that can only have values 0 through 100 then declare this as a tinyint as this will be represented with 1 byte rather than 4 bytes for int. This reduces the I/O cost by 4 times.
For string types, an important threshold is CHAR(9) and VARCHAR(8) or greater. Each column storage file uses a fixed number of bytes per value. This enables fast positional lookup of other columns to form the row. Currently the upper limit for columnar data storage is 8 bytes. So. for strings longer than this, the system maintains an additional 'dictionary' extent where the values are stored. The columnar extent file then stores a pointer into the dictionary. For example, it is more expensive to read and process a VARCHAR(8) column than a CHAR(8) column. Where possible, you get better performance if you can utilize shorter strings, especially if you avoid the dictionary lookup. All TEXT/BLOB data types in ColumnStore 1.1 onward utilize a dictionary and do a multiple-block 8KB lookup to retrieve that data if required. The longer the data, the more blocks are retrieved, and the greater is a potential performance impact.
In a row-based system, adding redundant columns adds to the overall query cost, but in a columnar system a cost is only occurred if the column is referenced. Therefore, additional columns should be created to support different access paths. For instance, store a leading portion of a field in one column to allow for faster lookups, but additionally store the long-form value as another column. Scans on a shorter code or a leading-portion column are faster.
ColumnStore distributes function application across all nodes for greater performance, but this requires a distributed implementation of the function in addition to the MariaDB server implementation. See Distributed Functions for the full list.
Hash joins are utilized by ColumnStore to optimize for large scale joins and avoid the need for indexes and the overhead of nested loop processing. ColumnStore maintains table statistics so as to determine the optimal join order. This is implemented by first identifying the small table side (based on extent map data) and materializing the necessary rows from that table for the join. If the size of this is less than the configuration setting PmMaxMemorySmallSide, the join is pushed down into PrimProc for distributed in-memory processing. Otherwise, the larger side rows is not processed in a distributed manner for joining, and only the WHERE clause on that side is executed across all PrimProc modules in the cluster. If the join is too large for memory, disk-based join can be enabled to allow the query to complete.
Similarly to scalar functions ColumnStore distributes aggregate evaluation as much as possible. However some post processing is required to combine the final results. Enough memory must exist to handle queries with a very large number of values in the aggregate columns.
Aggregation performance is also influenced by the number of distinct aggregate column values. Generally, the same number of rows with 100 distinct values computes faster than 10000 distinct values. This is due to increased memory management as well as transfer overhead.
ORDER BY and LIMIT are implemented at the very end by the mariadbd server process on the temporary result set table. This means that the unsorted results must be fully retrieved before either are applied. The performance overhead of this is minimal on small to medium results, but for larger results, it can be significant.
Subqueries are executed in sequence thus the subquery intermediate results must be materialized and then the join logic applies with the outer query.
Window functions are executed as part of final aggregation in PrimProc due to the need for ordering of the window results. The ColumnStore window function engines uses a dedicated faster sort process.
Automated system partitioning of columns is provided by ColumnStore. As data is loaded into extent maps, the system will capture and maintain min/max values of column data in that extent map. New rows are appended to each extent map until full at which point a new extent map is created. For column values that are ordered or semi-ordered this allows for very effective data partitioning. By using the min and max values, entire extent maps can be eliminated and not read to filter data. This generally works particularly well for time dimension / series data or similar values that increase over time.
Unnecessary
Compression
Yes
High Availability (HA)
Yes
Main Memory Caching
Yes
Transaction Logging
Yes
Garbage Collection
Yes
Online Schema changes
Yes
Non-locking Reads
Yes
Set this system variable to utf8
Set this system variable to utf8_general_ci
columnstore_use_import_for_batchinsert
Set this system variable to ALWAYS to always use cpimport for and statements.
Shell
SQL access is not required
SQL
Shell access is not required
Remote Database
Use normal database client
Avoid dumping data to intermediate filed
This guide provides steps for deploying a single-node ColumnStore, setting up the environment, installing the software, and bulk importing data for online analytical processing (OLAP) workloads.
This procedure describes the deployment of the Single-Node Enterprise ColumnStore topology with Local storage.
MariaDB Enterprise ColumnStore is a columnar storage engine for MariaDB Enterprise Server 10.6. Enterprise ColumnStore is best suited for Online Analytical Processing (OLAP) workloads.
This procedure has 5 steps, which are executed in sequence.
This page provides an overview of the topology, requirements, and deployment procedures.
Please read and understand this procedure before executing.
Customers can obtain support by .
The following components are deployed during this procedure:
The Single-Node Enterprise ColumnStore topology provides support for Online Analytical Processing (OLAP) workloads to MariaDB Enterprise Server.
The Enterprise ColumnStore node:
Receives queries from the application
Executes queries
Uses the local disk for storage.
Single-Node Enterprise ColumnStore does not provide high availability (HA) for Online Analytical Processing (OLAP). If you would like to deploy Enterprise ColumnStore with high availability, see .
These requirements are for the Single-Node Enterprise ColumnStore, when deployed with MariaDB Enterprise Server 10.6 and MariaDB Enterprise ColumnStore 23.10.
Debian 11 (x86_64, ARM64)
Debian 12 (x86_64, ARM64)
Red Hat Enterprise Linux 8 (x86_64, ARM64)
Red Hat Enterprise Linux 9 (x86_64, ARM64)
MariaDB Enterprise ColumnStore's minimum hardware requirements are not intended for production environments, but the minimum hardware requirements can be appropriate for development and test environments. For production environments, see the instead.
The minimum hardware requirements are:
MariaDB Enterprise ColumnStore will refuse to start if the system has less than 3 GB of memory.
If Enterprise ColumnStore is started on a system with less memory, the following error message will be written to the ColumnStore system log called crit.log:
And the following error message will be raised to the client:
MariaDB Enterprise ColumnStore's recommended hardware requirements are intended for production analytics.
The recommended hardware requirements are:
MariaDB Enterprise Server packages are configured to read configuration files from different paths, depending on the operating system. Making custom changes to Enterprise Server default configuration files is not recommended because custom changes may be overwritten by other default configuration files that are loaded later.
To ensure that your custom changes will be read last, create a custom configuration file with the z- prefix in one of the include directories.
The systemctl command is used to start and stop the MariaDB Enterprise Server service.
Navigation in the Single-Node Enterprise ColumnStore topology with Local storage deployment procedure:
Next: Step 1: Install MariaDB Enterprise ColumnStore 23.10.
Step 7: Start and Configure MariaDB MaxScale
This page details step 7 of the 9-step procedure "Deploy ColumnStore Shared Local Storage Topology".
This step starts and configures MariaDB MaxScale 22.08.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
MariaDB MaxScale installations include a configuration file with some example objects. This configuration file should be replaced.
On the MaxScale node, replace the default /etc/maxscale.cnf with the following configuration:
For additional information, see "Global Parameters".
On the MaxScale node, restart the MaxScale service to ensure that MaxScale picks up the new configuration:
For additional information, see "Start and Stop Services".
On the MaxScale node, use to create a server object for each Enterprise ColumnStore node:
MaxScale uses monitors to retrieve additional information from the servers. This information is used by other services in filtering and routing connections based on the current state of the node. For MariaDB Enterprise ColumnStore, use the MariaDB Monitor (mariadbmon).
On the MaxScale node, use to create a MariaDB Monitor:
In this example:
columnstore_monitor is an arbitrary name that is used to identify the new monitor.
mariadbmon is the name of the module that implements the MariaDB Monitor.
user=MAXSCALE_USER sets the user parameter to the database user account that MaxScale uses to monitor the ColumnStore nodes.
Routers control how MaxScale balances the load between Enterprise ColumnStore nodes. Each router uses a different approach to routing queries. Consider the specific use case of your application and database load and select the router that best suits your needs.
Use to route connections to replica servers for a read-only pool.
On the MaxScale node, use to create a router:
In this example:
connection_router_service is an arbitrary name that is used to identify the new service.
readconnroute is the name of the module that implements the Read Connection Router.
user=MAXSCALE_USER sets the user parameter to the database user account that MaxScale uses to connect to the ColumnStore nodes.
These instructions reference TCP port 3308. You can use a different TCP port. The TCP port used must not be bound by any other listener.
On the MaxScale node, use the command to configure MaxScale to use a listener for the :
In this example:
connection_router_service is the name of the readconnroute service that was previously created.
connection_router_listener is an arbitrary name that is used to identify the new listener.
3308 is the TCP port.
MaxScale performs query-based load balancing. The router routes write queries to the primary and read queries to the replicas.
On the MaxScale node, use the maxctrl create service command to configure MaxScale to use the :
In this example:
query_router_service is an arbitrary name that is used to identify the new service.
readwritesplit is the name of the module that implements the Read/Write Split Router.
user=MAXSCALE_USER sets the user parameter to the database user account that MaxScale uses to connect to the ColumnStore nodes.
These instructions reference TCP port 3307. You can use a different TCP port. The TCP port used must not be bound by any other listener.
On the MaxScale node, use the command to configure MaxScale to use a listener for the :
In this example:
query_router_service is the name of the readwritesplit service that was previously created.
query_router_listener is an arbitrary name that is used to identify the new listener.
3307 is the TCP port.
protocol=MariaDBClient
To start the services and monitors, on the MaxScale node use :
Navigation in the procedure "Deploy ColumnStore Shared Local Storage Topology".
This page was step 7 of 9.
Step 1: Prepare ColumnStore Nodes
This page details step 1 of the 9-step procedure "Deploy ColumnStore Object Storage Topology".
This step prepares systems to host MariaDB Enterprise Server and MariaDB Enterprise ColumnStore.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
MariaDB Enterprise ColumnStore performs best with Linux kernel optimizations.
On each server to host an Enterprise ColumnStore node, optimize the kernel:
Set the relevant kernel parameters in a sysctl configuration file. To ensure proper change management, use an Enterprise ColumnStore-specific configuration file.
Create a /etc/sysctl.d/90-mariadb-enterprise-columnstore.conf file:
Use the sysctl command to set the kernel parameters at runtime
The Linux Security Modules (LSM) should be temporarily disabled on each Enterprise ColumnStore node during installation.
The LSM will be configured and re-enabled later in this deployment procedure.
The steps to disable the LSM depend on the specific LSM used by the operating system.
SELinux must be set to permissive mode before installing MariaDB Enterprise ColumnStore.
To set SELinux to permissive mode:
Set SELinux to permissive mode:
Set SELinux to permissive mode by setting SELINUX=permissive in /etc/selinux/config.
For example, the file will usually look like this after the change:
Confirm that SELinux is in permissive mode:
SELinux will be configured and re-enabled later in this deployment procedure. This configuration is not persistent. If you restart the server before configuring and re-enabling SELinux later in the deployment procedure, you must reset the enforcement to permissive mode.
AppArmor must be disabled before installing MariaDB Enterprise ColumnStore.
Disable AppArmor:
Reboot the system.
Confirm that no AppArmor profiles are loaded using aa-status:
AppArmor will be configured and re-enabled later in this deployment procedure.
MariaDB Enterprise ColumnStore requires the following TCP ports:
The firewall should be temporarily disabled on each Enterprise ColumnStore node during installation.
The firewall will be configured and re-enabled later in this deployment procedure.
The steps to disable the firewall depend on the specific firewall used by the operating system.
Check if the firewalld service is running:
If the firewalld service is running, stop it:
Firewalld will be configured and re-enabled later in this deployment procedure.
Check if the UFW service is running:
If the UFW service is running, stop it:
UFW will be configured and re-enabled later in this deployment procedure.
To install Enterprise ColumnStore on Amazon Web Services (AWS), the security group must be modified prior to installation.
Enterprise ColumnStore requires all internal communications to be open between Enterprise ColumnStore nodes. Therefore, the security group should allow all protocols and all ports to be open between the Enterprise ColumnStore nodes and the MaxScale proxy.
When using MariaDB Enterprise ColumnStore, it is recommended to set the system's locale to UTF-8.
On RHEL 8, install additional dependencies:
Set the system's locale to en_US.UTF-8 by executing localedef:
MariaDB Enterprise ColumnStore requires all nodes to have host names that are resolvable on all other nodes. If your infrastructure does not configure DNS centrally, you may need to configure static DNS entries in the /etc/hosts file of each server.
On each Enterprise ColumnStore node, edit the /etc/hosts file to map host names to the IP address of each Enterprise ColumnStore node:
Replace the IP addresses with the addresses in your own environment.
With the ColumnStore Object Storage topology, it is important to create the S3 bucket before you start ColumnStore. All Enterprise ColumnStore nodes access data from the same bucket.
If you already have an S3 bucket, confirm that the bucket is empty.
S3 bucket configuration will be performed later in this procedure.
Navigation in the procedure "Deploy ColumnStore Object Storage Topology":
This page was step 1 of 9.
.
MariaDB Enterprise ColumnStore supports backup and restore. If Enterprise ColumnStore uses S3-compatible object storage for data and shared local storage for the Storage Manager directory, the S3 bucket, the Storage Manager directory, and the MariaDB data directory must be backed up separately.
MariaDB Enterprise ColumnStore supports multiple .
This page discusses how to backup and restore Enterprise ColumnStore when it uses for data and shared local storage (such as NFS) for the .
Any file can become corrupt due to hardware issues, crashes, power loss, and other reasons. If the Enterprise ColumnStore data or metadata become corrupt, Enterprise ColumnStore could become unusable, resulting in data loss.
If Enterprise ColumnStore is your , it should be backed up regularly.
If Enterprise ColumnStore uses S3-compatible object storage for data and shared local storage for the , the following items must be backed up:
The MariaDB Data directory is backed up using .
The S3 bucket must be backed up using the vendor's snapshot procedure.
The must be backed up.
See the instructions below for more details.
Use the following process to take a backup:
Determine which node is the primary server using curl to send the status command to the CMAPI Server:
The output will show "dbrm_mode": "master" for the primary server:
Connect to the primary server using MariaDB Client as a user account that has privileges to lock the database:
Lock the database with the statement:
Ensure that the client remains connected to the primary server, so that the lock is held for the remaining steps.
Make a copy or snapshot of the . By default, it is located at /var/lib/columnstore/storagemanager.
For example, to make a copy of the directory with rsync:
Use to backup the MariaDB data directory:
Use to prepare the backup:
Create a snapshot of the S3-compatible storage. Consult the storage vendor's manual for details on how to do this.
Ensure that all previous operations are complete.
In the original client connection to the primary server, unlock the database with the statement:
Use the following process to restore a backup:
, so that you can restore the backup to an empty deployment.
Ensure that all services are stopped on each node:
Restore the backup of the . By default, it is located at /var/lib/columnstore/storagemanager.
For example, to restore the backup with rsync:
Use to restore the backup of the MariaDB data directory:
Restore the snapshot of your S3-compatible storage to the new S3 bucket that you plan to use. Consult the storage vendor's manual for details on how to do this.
Update storagemanager.cnf to configure Enterprise ColumnStore to use the S3 bucket. By default, it is located at /etc/columnstore/storagemanager.cnf.
For example:
The default local cache size is 2 GB.
The default local cache path is /var/lib/columnstore/storagemanager/cache.
Ensure that the local cache path has sufficient store space to store the local cache.
The bucket option must be set to the name of the bucket that you created from your snapshot in the previous step.
Start the services on each node:
Step 7: Start and Configure MariaDB MaxScale
This page details step 7 of the 9-step procedure "Deploy ColumnStore Object Storage Topology".
This step starts and configures MariaDB MaxScale 22.08.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
MariaDB MaxScale installations include a configuration file with some example objects. This configuration file should be replaced.
On the MaxScale node, replace the default /etc/maxscale.cnf with the following configuration:
For additional information, see "Global Parameters".
On the MaxScale node, restart the MaxScale service to ensure that MaxScale picks up the new configuration:
For additional information, see "Start and Stop Services".
On the MaxScale node, use to create a server object for each Enterprise ColumnStore node:
MaxScale uses monitors to retrieve additional information from the servers. This information is used by other services in filtering and routing connections based on the current state of the node. For MariaDB Enterprise ColumnStore, use the MariaDB Monitor (mariadbmon).
On the MaxScale node, use to create a MariaDB Monitor:
In this example:
columnstore_monitor is an arbitrary name that is used to identify the new monitor.
mariadbmon is the name of the module that implements the MariaDB Monitor.
user=MAXSCALE_USER sets the user parameter to the database user account that MaxScale uses to monitor the ColumnStore nodes.
Routers control how MaxScale balances the load between Enterprise ColumnStore nodes. Each router uses a different approach to routing queries. Consider the specific use case of your application and database load and select the router that best suits your needs.
Use to route connections to replica servers for a read-only pool.
On the MaxScale node, use to create a router:
In this example:
connection_router_service is an arbitrary name that is used to identify the new service.
readconnroute is the name of the module that implements the Read Connection Router.
user=MAXSCALE_USER sets the user parameter to the database user account that MaxScale uses to connect to the ColumnStore nodes.
These instructions reference TCP port 3308. You can use a different TCP port. The TCP port used must not be bound by any other listener.
On the MaxScale node, use the command to configure MaxScale to use a listener for the :
In this example:
connection_router_service is the name of the readconnroute service that was previously created.
connection_router_listener is an arbitrary name that is used to identify the new listener.
3308 is the TCP port.
MaxScale performs query-based load balancing. The router routes write queries to the primary and read queries to the replicas.
On the MaxScale node, use the maxctrl create service command to configure MaxScale to use the :
In this example:
query_router_service is an arbitrary name that is used to identify the new service.
readwritesplit is the name of the module that implements the Read/Write Split Router.
user=MAXSCALE_USER sets the user parameter to the database user account that MaxScale uses to connect to the ColumnStore nodes.
These instructions reference TCP port 3307. You can use a different TCP port. The TCP port used must not be bound by any other listener.
On the MaxScale node, use the command to configure MaxScale to use a listener for the :
In this example:
query_router_service is the name of the readwritesplit service that was previously created.
query_router_listener is an arbitrary name that is used to identify the new listener.
3307 is the TCP port.
protocol=MariaDBClient
To start the services and monitors, on the MaxScale node use :
Navigation in the procedure "Deploy ColumnStore Object Storage Topology":
This page was step 7 of 9.
MariaDB Enterprise ColumnStore supports backup and restore. If Enterprise ColumnStore uses shared local storage for the DB Root directories, the DB Root directories and the MariaDB data directory must be backed up separately.
MariaDB Enterprise ColumnStore supports multiple .
This page discusses how to backup and restore Enterprise ColumnStore when it uses (such as NFS) for the .
Any file can become corrupt due to hardware issues, crashes, power loss, and other reasons. If the Enterprise ColumnStore data or metadata become corrupt, Enterprise ColumnStore could become unusable, resulting in data loss.
If Enterprise ColumnStore is your , it should be backed up regularly.
If Enterprise ColumnStore uses for the , the following items must be backed up:
The MariaDB Data directory is backed up using
The must be backed up
Each must be backed up
See the instructions below for more details.
Use the following process to take a backup:
Determine which node is the primary server using curl to send the status command to the CMAPI Server:
The output will show dbrm_mode: master for the primary server:
Connect to the primary server using MariaDB Client as a user account that has privileges to lock the database:
Lock the database with the statement:
Ensure that the client remains connected to the primary server, so that the lock is held for the remaining steps.
Make a copy or snapshot of the . By default, it is located at /var/lib/columnstore/storagemanager.
For example, to make a copy of the directory with rsync:
Make a copy or snapshot of the . By default, they are located at /var/lib/columnstore/dataN, where the N in dataN represents a range of integers that starts at 1 and stops at the number of nodes in the deployment.
For example, to make a copy of the directories with rsync in a 3-node deployment:
Use to backup the :
Use to prepare the backup:
Ensure that all previous operations are complete.
In the original client connection to the primary server, unlock the database with the statement:
Use the following process to restore a backup:
, so that you can restore the backup to an empty deployment.
Ensure that all services are stopped on each node:
Restore the backup of the . By default, it is located at /var/lib/columnstore/storagemanager.
For example, to restore the backup with rsync:
Restore the backup of the DB Root directories. By default, they are located at /var/lib/columnstore/dataN, where the N in dataN represents a range of integers that starts at 1 and stops at the number of nodes in the deployment.
For example, to restore the backup with rsync in a 3-node deployment:
Use to restore the backup of the MariaDB data directory:
Start the services on each node:
Step 3: Start and Configure Enterprise ColumnStore
This page details step 3 of a 5-step procedure for deploying Single-Node Enterprise ColumnStore with Object storage.
This step starts and configures MariaDB Enterprise Server and MariaDB Enterprise ColumnStore.
Interactive commands are detailed. Alternatively, the described operations can be performed using automation.
Mandatory system variables and options for Single-Node Enterprise ColumnStore include:
Configure Enterprise ColumnStore S3 Storage Manager to use S3-compatible storage by editing the /etc/columnstore/storagemanager.cnf configuration file:
The S3-compatible object storage options are configured under [S3]:
The bucket option must be set to the name of the bucket that you created in "Create an S3 Bucket".
The endpoint option must be set to the endpoint for the S3-compatible object storage.
The aws_access_key_id and aws_secret_access_key options must be set to the access key ID and secret access key for the S3-compatible object storage.
The local cache options are configured under [Cache]:
The cache_size option is set to 2 GB by default.
The path option is set to /var/lib/columnstore/storagemanager/cache by default.
Ensure that the specified path has sufficient storage space for the specified cache size.
Start and enable the MariaDB Enterprise Server service, so that it starts automatically upon reboot:
Start and enable the MariaDB Enterprise ColumnStore service, so that it starts automatically upon reboot:
Enterprise ColumnStore requires a mandatory utility user account to perform cross-engine joins and similar operations.
Create the user account with the statement:
Grant the user account SELECT privileges on all databases with the statement:
Configure Enterprise ColumnStore to use the utility user:
Set the password:
For details about how to encrypt the password, see "".
Passwords should meet your organization's password policies. If your MariaDB Enterprise Server instance has a password validation plugin installed, then the password should also meet the configured requirements.
The specific steps to configure the security module depend on the operating system.
Configure SELinux for Enterprise ColumnStore:
To configure SELinux, you have to install the packages required for audit2allow. On CentOS 7 and RHEL 7, install the following:
On RHEL 8, install the following:
Allow the system to run under load for a while to generate SELinux audit events.
After the system has taken some load, generate an SELinux policy from the audit events using audit2allow:
If no audit events were found, this will print the following:
If audit events were found, the new SELinux policy can be loaded using semodule:
Set SELinux to enforcing mode by setting SELINUX=enforcing in /etc/selinux/config.
For example, the file will usually look like this after the change:
Set SELinux to enforcing mode:
For information on how to create a profile, see on ubuntu.com.
Navigation in the Single-Node Enterprise ColumnStore topology with Object storage deployment procedure:
This page was step 3 of 5.
To remove a node from Enterprise ColumnStore, perform the following procedure.
The server object for the node must be unlinked from the service using :
Unlink the server object from the service using the unlink service command.
As the first argument, provide the name of the service.
As the second argument, provide the name of the server.
To confirm that the server object was properly unlinked from the service, the service should be checked using :
Show the services using the show services command, like this:
The server object for the node must be unlinked from the monitor using :
Unlink a server object from the monitor using the unlink monitor command.
As the first argument, provide the name of the monitor.
As the second argument, provide the name of the server.
To confirm that the server object was properly unlinked from the monitor, the monitor should be checked using :
Show the monitors using the show monitors command, like this:
The server object for the node must also be removed from MaxScale using :
Use or another supported REST client.
Remove the server object using the destroy server command.
As the first argument, provide the name for the server.
For example:
To confirm that the server object was properly removed, the server objects should be checked using :
Show the server objects using the show servers command, like this:
The Enterprise Server. Enterprise ColumnStore, and CMAPI services can be stopped using the systemctl command.
Perform the following procedure on the node:
Stop the MariaDB Enterprise Server service:
Stop the MariaDB Enterprise ColumnStore service:
Stop the CMAPI service:
The node must be removed from Enterprise ColumnStore using :
Remove the node using the endpoint path.
Use a , such as curl .
Format the JSON output using jq for enhanced readability.
For example, if the primary node's host name is mcs1 and the IP address for the node to remove is 192.0.2.3:
In ES 10.5.10-7 and later:
In ES 10.5.9-6 and earlier:
Example output:
To confirm that the node was properly removed, the status of Enterprise ColumnStore should be checked using :
Check the status using the endpoint path.
For example, if the primary node's host name is mcs1:
Example output:
Learn how to import data into MariaDB ColumnStore. This section covers various methods and tools for efficiently loading large datasets into your columnar database for analytical workloads.
MariaDB Enterprise ColumnStore supports very efficient bulk data loads.
MariaDB Enterprise ColumnStore performs bulk data loads very efficiently using a variety of mechanisms, including the cpimport tool, specialized handling of certain SQL statements, and minimal locking during data import.
MariaDB Enterprise ColumnStore includes a bulk data loading tool called cpimport, which provides several benefits:
Bypasses the SQL layer to decrease overhead
Does not block read queries
Requires a write metadata lock on the table, which can be monitored with the
Appends the new data to the table. While the bulk load is in progress, the newly appended data is temporarily hidden from queries. After the bulk load is complete, the newly appended data is visible to queries.
MariaDB Enterprise ColumnStore enables batch insert mode by default.
When batch insert mode is enabled, MariaDB Enterprise ColumnStore has special handling for the following statements:
[[|load-data-infileLOAD DATA [ LOCAL ] INFILE]]
Enterprise ColumnStore uses the following rules:
If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. It executes cpimport using a wrapper called cpimport.bin.
If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.
Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable to OFF. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.
MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.
When a bulk data load is running:
Read queries will not be blocked.
Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.
The write metadata lock (MDL) can be monitored with the .
MariaDB Enterprise ColumnStore automatically translates INSERT INTO .. SELECT FROM .. statements into bulk data loads. By default, it translates the statement into a bulk data load that uses cpimport.bin, which is an internal wrapper around the cpimport tool.
You can load data using INSERT INTO .. SELECT FROM .. in the following cases:
You are loading data into a ColumnStore table by querying one or more local tables.
MariaDB Enterprise ColumnStore enables batch insert mode by default.
When batch insert mode is enabled, MariaDB Enterprise ColumnStore has special handling for statements.
Enterprise ColumnStore uses the following rules:
If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. Enterprise ColumnStore executes cpimport using a wrapper called cpimport.bin.
If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.
Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable to OFF. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.
MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.
When a bulk data load is running:
Read queries will not be blocked.
Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.
The write metadata lock (MDL) can be monitored with the .
Before data can be imported into the tables, the schema must be created.
Connect to the primary server using :
After the command is executed, it will prompt you for a password.
For each database that you are importing, create the database with the statement:
For each table that you are importing, create the table with the statement:
When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read. Appending data reduces the I/O requirements of bulk data loads, so that larger data sets can be loaded very efficiently.
While the bulk load is in progress, the newly appended data is temporarily hidden from queries.
After the bulk load is complete, the newly appended data is visible to queries.
When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read.
The order of data can have a significant effect on performance with Enterprise ColumnStore. If your data is not already sorted, it can be helpful to sort the query results using an ORDER BY clause.
For example:
For additional information, see "".
Before importing a table's data into MariaDB Enterprise ColumnStore, confirm that the field delimiter is not present in the data.
The field delimiter is determined by the columnstore_import_for_batchinsert_delimiter system variable. By default, Enterprise ColumnStore sets the field delimiter to the ASCII value 7, which corresponds to the BEL character.
To use a different delimiter, you can set the field delimiter.
When the data is passed to cpimport, each value is separated by a field delimiter. The field delimiter is determined by the columnstore_import_for_batchinsert_delimiter system variable.
By default, Enterprise ColumnStore sets the field delimiter to the ASCII value 7, which corresponds to the BEL character. In general, setting the field delimiter is only required if your data contains this value.
Set the field delimiter by setting the columnstore_import_for_batchinsert_delimiter system variable to the ASCII value for the desired delimiter character.
For example, if you want to use a comma (,) as the field delimiter, you would set columnstore_import_for_batchinsert_delimiter to 44:
When the data is passed to cpimport, each value is enclosed by a quote character. The quote character is determined by the columnstore_import_for_batchinsert_enclosed_by system variable.
By default, Enterprise ColumnStore sets the quote character to the ASCII value 17, which corresponds to the DC1 character. In general, setting the quote character is only required if your data contains this value.
Set the quote character by setting the columnstore_import_for_batchinsert_enclosed_by system variable to the ASCII value for the desired quote character.
For example, if you want to use a double quote (") as the quote character, you would set columnstore_import_for_batchinsert_enclosed_by to 34:
Step 1: Prepare ColumnStore Nodes
MariaDB ColumnStore automatically creates logical horizontal partitions across every column. For ordered or semi-ordered data fields such as an order date this will result in a highly effective partitioning scheme based on that column. This allows for increased performance of queries filtering on that column since partition elimination can be performed. It also allows for data lifecycle management as data can be disabled or dropped by partition cheaply. Caution should be used when disabling or dropping partitions as these commands are destructive.
It is important to understand that a Partition in ColumnStore terms is actually 2 extents (16 million rows) and that extents & partitions are created according to the following algorithm in 1.0.x:
The ColumnStore engine does not fully support recursive Common Table Expressions (CTEs). Attempting to use recursive CTEs directly against ColumnStore tables typically results in an error.
The purpose of the following examples is to demonstrate three potential workarounds for this issue. The best fit for your organization will depend on your specific needs and ability to refactor queries and adjust your approach.
It simulates a simple organizational chart with employees and managers to illustrate the problem and the workarounds.
First, an InnoDB table for comparison:
MariaDB Query Accelerator is an Alpha release. Do not use it in production environments. Query Accelerator works only in ColumnStore 25.10.0 and with MariaDB Enterprise Server 11.8.3+.
Query Accelerator allows MariaDB to use ColumnStore to execute queries that are otherwise executed by InnoDB. Under the hood, Columnstore:
sudo systemctl start mariadb
sudo systemctl enable mariadb
sudo systemctl start mariadb-columnstore-cmapi
sudo systemctl enable mariadb-columnstore-cmapisudo mcs cluster set api-key --key <your-api-key-here>sudo mcs node add --node <private-ip-of-rw-node>sudo mcs node add --read-replica --node <private-ip-of-replica>sudo mcs cluster statussudo mcs node add --read-replica --node <private-ip>sudo mcs node remove --node <private-ip>sudo mcs cluster statuswget https://dlm.mariadb.com/enterprise-release-helpers/mariadb_es_repo_setup ;
chmod +x mariadb_es_repo_setup;
./mariadb_es_repo_setup --token="xxxxx" --apply --mariadb-server-version="11.4"sudo dnf install -y \
MariaDB-server MariaDB-columnstore-engine MariaDB-columnstore-cmapisudo apt update
sudo apt install -y mariadb-server mariadb-plugin-columnstore mariadb-columnstore-cmapi$ sudo yum install --enablerepo=PowerTools glusterfs-server$ sudo yum install centos-release-gluster
$ sudo yum install glusterfs-server$ wget -O - https://download.gluster.org/pub/gluster/glusterfs/LATEST/rsa.pub | apt-key add -
$ DEBID=$(grep 'VERSION_ID=' /etc/os-release | cut -d '=' -f 2 | tr -d '"')
$ DEBVER=$(grep 'VERSION=' /etc/os-release | grep -Eo '[a-z]+')
$ DEBARCH=$(dpkg --print-architecture)
$ echo deb https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/${DEBID}/${DEBARCH}/apt ${DEBVER} main > /etc/apt/sources.list.d/gluster.list$ sudo apt update$ sudo apt install glusterfs-server$ sudo apt update$ sudo apt install glusterfs-server$ sudo systemctl start glusterd$ sudo systemctl enable glusterd$ sudo gluster peer probe mcs2$ sudo gluster peer probe mcs3$ sudo gluster peer probe mcs1peer probe: Host mcs1 port 24007 already in peer list$ sudo gluster peer statusNumber of Peers: 2
Hostname: mcs2
Uuid: 3c8a5c79-22de-45df-9034-8ae624b7b23e
State: Peer in Cluster (Connected)
Hostname: mcs3
Uuid: 862af7b2-bb5e-4b1c-8311-630fa32ed451
State: Peer in Cluster (Connected)$ sudo mkdir -p /brick/storagemanager$ sudo gluster volume create storagemanager \
replica 3 \
mcs1:/brick/storagemanager \
mcs2:/brick/storagemanager \
mcs3:/brick/storagemanager \
force$ sudo gluster volume start storagemanager$ sudo mkdir -p /var/lib/columnstore/storagemanager127.0.0.1:storagemanager /var/lib/columnstore/storagemanager glusterfs defaults,_netdev 0 0$ sudo mount -a$ sudo yum install --enablerepo=PowerTools glusterfs-server$ sudo yum install centos-release-gluster
$ sudo yum install glusterfs-server$ wget -O - https://download.gluster.org/pub/gluster/glusterfs/LATEST/rsa.pub | apt-key add -
$ DEBID=$(grep 'VERSION_ID=' /etc/os-release | cut -d '=' -f 2 | tr -d '"')
$ DEBVER=$(grep 'VERSION=' /etc/os-release | grep -Eo '[a-z]+')
$ DEBARCH=$(dpkg --print-architecture)
$ echo deb https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/${DEBID}/${DEBARCH}/apt ${DEBVER} main > /etc/apt/sources.list.d/gluster.list$ sudo apt update$ sudo apt install glusterfs-server$ sudo apt update$ sudo apt install glusterfs-server$ sudo systemctl start glusterd$ sudo systemctl enable glusterd$ sudo gluster peer probe mcs2$ sudo gluster peer probe mcs3$ sudo gluster peer probe mcs1
peer probe: Host mcs1 port 24007 already in peer list$ sudo gluster peer statusHostname: mcs2
Uuid: 3c8a5c79-22de-45df-9034-8ae624b7b23e
State: Peer in Cluster (Connected)
Hostname: mcs3
Uuid: 862af7b2-bb5e-4b1c-8311-630fa32ed451
State: Peer in Cluster (Connected)$ sudo mkdir -p /brick/storagemanager$ sudo gluster volume create storagemanager \
replica 3 \
mcs1:/brick/storagemanager \
mcs2:/brick/storagemanager \
mcs3:/brick/storagemanager \
force$ sudo gluster volume start storagemanager$ sudo mkdir -p /var/lib/columnstore/storagemanager127.0.0.1:storagemanager /var/lib/columnstore/storagemanager glusterfs defaults,_netdev 0 0$ sudo mount -a# minimize swapping
vm.swappiness = 1
# Increase the TCP max buffer size
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# Increase the TCP buffer limits
# min, default, and max number of bytes to use
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
# for 1 GigE, increase this to 2500
# for 10 GigE, increase this to 30000
net.core.netdev_max_backlog = 2500$ sudo sysctl --load=/etc/sysctl.d/90-mariadb-enterprise-columnstore.conf$ sudo setenforce permissive# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=permissive
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targetedsudo getenforce
Permissive$ sudo systemctl disable apparmor$ sudo aa-statusapparmor module is loaded.
0 profiles are loaded.
0 profiles are in enforce mode.
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.$ sudo yum install glibc-locale-source glibc-langpack-en$ sudo localedef -i en_US -f UTF-8 en_US.UTF-8vm.overcommit_memory=1
vm.dirty_background_ratio=5
vm.dirty_ratio=10
vm.vfs_cache_pressure=50
net.core.netdev_max_backlog=2500
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.tcp_timestamps=0sudo sysctl -pcat /proc/sys/kernel/threads-max
cat /proc/sys/kernel/pid_max
cat /proc/sys/vm/max_map_count
# Rhel /etc/sysctl.conf
sudo echo "vm.max_map_count=4262144" >> /etc/sysctl.conf
sudo echo "kernel.pid_max = 4194304" >> /etc/sysctl.conf
sudo echo "kernel.threads-max = 2000000" >> /etc/sysctl.conf
# There may be a file called 50-pid-max.conf or perhaps something similar. If so, modify it
sudo echo "vm.max_map_count=4262144" > /usr/lib/sysctl.d/50-max_map_count.conf
sudo echo "kernel.pid_max = 4194304" > /usr/lib/sysctl.d/50-pid-max.conf
sudo sysctl -p