MariaDB Enterprise ColumnStore Storage Architecture
This page is part of MariaDB's MariaDB Documentation.
The parent of this page is: MariaDB Enterprise ColumnStore Architecture
Topics on this page:
MariaDB Enterprise ColumnStore's storage architecture is designed to provide great performance for analytical queries.
Columnar Storage Engine
MariaDB Enterprise ColumnStore is a columnar storage engine for MariaDB Enterprise Server. MariaDB Enterprise ColumnStore enables ES to perform analytical workloads, including online analytical processing (OLAP), data warehousing, decision support systems (DSS), and hybrid transactional-analytical processing (HTAP) workloads.
Most traditional relational databases use row-based storage engines. In row-based storage engines, all columns for a table are stored contiguously. Row-based storage engines perform very well for transactional workloads, but are less performant for analytical workloads.
Columnar storage engines store each column separately. Columnar storage engines perform very well for analytical workloads. Analytical workloads are characterized by ad hoc queries on very large data sets by relatively few users.
MariaDB Enterprise ColumnStore automatically partitions each column into extents, which helps improve query performance without using indexes.
MariaDB Enterprise ColumnStore enables MariaDB Enterprise Server to perform analytical or online analytical processing (OLAP) workloads.
OLAP workloads are generally characterized by ad hoc queries on very large data sets. Some other typical characteristics are:
Each query typically reads a subset of columns in the table
Most activity typically consists of read-only queries that perform aggregations, window functions, and various calculations
Analytical applications typically require only a few concurrent queries
Analytical applications typically require scalability of large, complex queries
Analytical applications typically require efficient bulk loads of new data
OLAP workloads are typically required for:
Business intelligence (BI)
Historical data mining
Row-based storage engines have a disadvantage for OLAP workloads. Indexes are not usually very useful for OLAP workloads, because the large size of the data set and the ad hoc nature of the queries preclude the use of indexes to optimize queries.
Columnar storage engines are much better suited for OLAP workloads. MariaDB Enterprise ColumnStore is a columnar storage engine that is designed for OLAP workloads:
When a query reads a subset of columns in the table, Enterprise ColumnStore is able to reduce I/O by reading those columns and ignoring all others, because each column is stored separately
When most activity consists of read-only queries that perform aggregations, window functions, and various calculations, Enterprise ColumnStore is able to efficiently execute those queries using extent elimination, distributed query execution, and massively parallel processing (MPP) techniques
When only a few concurrent queries are required, Enterprise ColumnStore is able to maximize the use of system resources by using multiple threads and multiple nodes to perform work for each query
When scalability of large, complex queries is required, Enterprise ColumnStore is able to achieve horizontal and vertical scalability using distributed query execution and massively parallel processing (MPP) techniques
When efficient bulk loads of new data is required, Enterprise ColumnStore is able to bulk load new data without affecting existing data using automatic partitioning with the extent map
MariaDB Enterprise Server has had excellent performance for transactional or online transactional processing (OLTP) workloads since the beginning.
OLTP workloads are generally characterized by a fixed set of queries using a relatively small data set. Some other typical characteristics are:
Each query typically reads and/or writes many columns in the table
Most activity typically consists of small transactions that only read and/or write a small number of rows
Transactional applications typically require many concurrent transactions
Transactional applications typically require a fast response time and low latency
Transactional applications typically require ACID properties to protect data
OLTP workloads are typically required for:
Financial transactions performed by financial institutions and e-commerce sites
Store inventory changes performed by brick-and-mortar stores and e-commerce sites
Account metadata changes performed by many sites that stores personal data
Row-based storage engines have several advantages for OLTP workloads:
When a query reads and/or writes many columns in the table, row-based storage engines can find all columns on a single page, so the I/O costs of the operation are low
When a transaction reads/writes a small number of rows, row-based storage engines can use an index to find the page for each row without a full table scan
When many concurrent transactions are operating, row-based storage engines can implement transactional isolation by storing multiple versions of changed rows
When a fast response time and low latency are required, row-based storage engines can use indexes to optimize the most common queries
When ACID properties are required, row-based storage engines can implement consistency and durability with fewer performance trade-offs, since each row's columns are stored contiguously
InnoDB is ES's default storage engine, and it is a highly performant row-based storage engine.
MariaDB Enterprise ColumnStore enables MariaDB Enterprise Server to function as a single-stack solution for Hybrid transactional-analytical processing (HTAP) workloads.
Hybrid workloads are characterized by a mix of transactional and analytical queries. Hybrid workloads are also known as "Smart Transactions", "Augmented Transactions" "Translytical", or "Hybrid Operational-Analytical Processing (HOAP)".
Hybrid workloads are typically required for applications that require real-time analytics that lead to immediate action:
Financial institutions use transactional queries to handle financial transactions and analytical queries to analyze the transactions for business intelligence
Insurance companies use transactional queries to accept/process claims and analytical queries to analyze those claims for business opportunities or risks
Health providers use transactional queries to track electronic health records (EHR) and analytical queries to analyze the EHRs to discover health trends or prevent adverse drug interactions
MariaDB Enterprise Server provides multiple components to perform hybrid workloads:
For analytical queries, the Enterprise ColumnStore storage engine can be used.
For transactional queries, row-based storage engines, such as InnoDB, can be used.
For queries that reference both analytical and transactional data, ES's cross-engine join functionality can be used to join Enterprise ColumnStore tables with InnoDB tables.
MariaDB MaxScale is a high-performance database proxy, which can dynamically route analytical queries to Enterprise ColumnStore and transactional queries to the transactional storage engine.
MariaDB Enterprise ColumnStore supports multiple storage types:
Non-Shared Local Storage
S3-Compatible Object Storage
MariaDB Enterprise ColumnStore supports S3-compatible object storage.
S3-compatible object storage is optional, but highly recommended. If S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use shared local storage (such as NFS) for high availability.
S3-compatible object storage is:
Compatible: Many object storage services are compatible with the Amazon S3 API.
Economical: S3-compatible object storage is often very low cost.
Flexible: S3-compatible object storage is available for both cloud and on-premises deployments.
Limitless: S3-compatible object storage is often virtually limitless.
Resilient: S3-compatible object storage is often low maintenance and highly available, since many services use resilient cloud infrastructure.
Scalable: S3-compatible object storage is often highly optimized for read and write scaling.
Secure: S3-compatible object storage is often encrypted-at-rest.
Many S3-compatible object storage services exist. MariaDB Corporation cannot make guarantees about all S3-compatible object storage services, because different services provide different functionality.
If you have any questions about using specific S3-compatible object storage with MariaDB Enterprise ColumnStore, contact us.
MariaDB Enterprise ColumnStore can use any object store that is compatible with the Amazon S3 API.
Many object storage services are compatible with the Amazon S3 API, and compatible object storage services are available for cloud deployments and on-premises deployments, so vendor lock-in is not a concern.
MariaDB Enterprise ColumnStore's Storage Manager enables remote S3-compatible object storage to be efficiently used. The Storage Manager uses a persistent local disk cache for read/write operations, so that network latency has minimal performance impact on Enterprise ColumnStore. In some cases, it will even perform better than local disk operations.
Enterprise ColumnStore only uses the Storage Manager when S3-compatible storage is used for data.
Storage Manager is configured using storagemanager.cnf.
Storage Manager Directory
MariaDB Enterprise ColumnStore's Storage Manager directory is at the following path by default:
To enable high availability when S3-compatible object storage is used, the Storage Manager directory should use shared local storage and be mounted on every ColumnStore node.
Configure the S3 Storage Manager
When you want to use S3-compatible storage for Enterprise ColumnStore, you must configure Enterprise ColumnStore's S3 Storage Manager to use S3-compatible storage.
To configure Enterprise ColumnStore to use S3-compatible storage, edit
[ObjectStorage] … service = S3 … [S3] bucket = your_columnstore_bucket_name endpoint = your_s3_endpoint aws_access_key_id = your_s3_access_key_id aws_secret_access_key = your_s3_secret_key # iam_role_name = your_iam_role # sts_region = your_sts_region # sts_endpoint = your_sts_endpoint # ec2_iam_mode=enabled # port_number = your_port_number [Cache] cache_size = your_local_cache_size path = your_local_cache_path
The S3-compatible object storage options are configured under
bucketoption must be set to the name of the bucket.
endpointoption must be set to the endpoint for the S3-compatible object storage.
aws_secret_access_keyoptions must be set to the access key ID and secret access key for the S3-compatible object storage.
To use a specific IAM role, you must uncomment and set
To use the IAM role assigned to an EC2 instance, you must uncomment
To use a non-default port number, you must set
port_numberto the desired port.
The local cache options are configured under
cache_sizeoption is set to 2 GB by default.
pathoption is set to
Ensure that the specified path has sufficient storage space for the specified cache size.
Recommended Storage Options
For best results, MariaDB Corporation would recommend the following storage options:
Shared Local Storage
For Storage Manager
Amazon S3 storage
Google Object Storage (S3-compatible)
Any S3-compatible object storage
MariaDB Enterprise ColumnStore's storage format is optimized for analytical queries.
DB Root Directories
MariaDB Enterprise ColumnStore stores data in DB Root directories when S3-compatible object storage is not configured.
In multi-node Enterprise ColumnStore, each node has its own DB Root directory.
The DB Root directories are at the following path by default:
dataN represents a range of integers that starts at
1 and stops at the number of nodes in the deployment. For example, with a 3-node Enterprise ColumnStore deployment, this would refer to the following directories:
To enable high availability for the DB Root directories, each directory should be mounted on every ColumnStore node using shared local storage.
Each column in a table is stored in units called extents.
By default, each extent contains the column values for 8 million rows. The physical size of each extent can range from 8 MB to 64 MB. When an extent reaches the maximum number of column values, Enterprise ColumnStore creates a new extent.
Each extent is stored in 8 KB blocks, and each block has a logical block identifier (LBID).
If a string column is longer than 8 characters, the value is stored in a separate dictionary file, and a pointer to the value is stored in the extent.
A segment file is used to store Enterprise ColumnStore data within a DB Root directory.
A segment file always contains two extents. When a segment file reaches its maximum size, Enterprise ColumnStore creates a new segment file.
The relevant configuration options are:
Configures the maximum number of extents that can be stored in each segment file.
Default value is 2.
For example, to configure Enterprise ColumnStore to store more extents in each segment file using the mcsSetConfig utility:$ mcsSetConfig ExtentMap ExtentsPerSegmentFile 4
Enterprise ColumnStore automatically groups a column's segment files into column partitions.
On disk, each column partition is represented by a directory in the DB Root. The directory contains the segment files for the column partition.
By default, a column partition can contain four segment files, but you can configure Enterprise ColumnStore to store more segment files in each column partition. When a column partition reaches the maximum number of segment files, Enterprise ColumnStore creates a new column partition.
The relevant configuration options are:
Configures the maximum number of segment files that can be stored in each column partition.
Default value is 4.
For example, to configure Enterprise ColumnStore to store more segment files in each column partition using the mcsSetConfig utility:$ mcsSetConfig ExtentMap FilesPerColumnPartition 8
Enterprise ColumnStore maintains an Extent Map to determine which values are located in each extent.
The Extent Map identifies each extent using its logical block identifier (LBID) values, and it maintains the minimum and maximum values within each extent.
The Extent Map is used to implement a performance optimization called Extent Elimination.
The primary node has a master copy of the Extent Map. When Enterprise ColumnStore is started, the primary node copies the Extent Map to the replica nodes.
While Enterprise ColumnStore is running, each node maintains a copy of the Extent Map in its main memory, so that it can be accessed quickly without additional I/O.
If the Extent Map gets corrupted, the
mcsRebuildEM utility can rebuild the Extent Map from the contents of the database file system. The
mcsRebuildEM utility is available starting in MariaDB Enterprise ColumnStore 6.2.2.
Enterprise ColumnStore automatically compresses all data on disk using either snappy or LZ4 compression. See the columnstore_
Since Enterprise ColumnStore stores a single column's data in each segment file, the data in each segment file tends to be very similar. The similar data usually allows for excellent compressibility. However, the specific data compression ratio will depend on a lot of factors, such as the randomness of the data and the number of distinct values.
Enterprise ColumnStore's compression strategy is tuned to optimize the performance of I/O-bound queries, because the decompression rate is optimized to maximize read performance.
For additional information, see "Configure Compression".
Enterprise ColumnStore uses the version buffer to store blocks that are being modified.
The version buffer is used for multiple tasks:
It is used to rollback a transaction.
It is used for multi-version concurrency control (MVCC). With MVCC, Enterprise ColumnStore can implement read snapshots, which allows a statement to have a consistent view of the database, even if some of the underlying rows have changed. The snapshot for a given statement is identified by the system change number (SCN).
The version buffer is split between data structures that are in-memory and on-disk.
The in-memory data structures are hash tables that keep track of in-flight transaction. The hash tables store the LBIDs for each block that is being modified by a transaction. The in-memory hash tables start at 4 MB, and they grow as-needed. The size of the hash tables increases as the number of modified blocks increases.
An on-disk version buffer file is stored in each DB Root. By default, the on-disk version buffer file is 1 GB, but you can configure Enterprise ColumnStore to use a different file size. The relevant configuration options are:
Configures the size of the on-disk version buffer in each DB Root.
Default value is 1 GB.
For example, to configure Enterprise ColumnStore to use a larger on-disk version buffer file using the mcsSetConfig utility:$ mcsSetConfig VersionBuffer VersionBufferFileSize 2GB