Architectural Overview
Overview
MariaDB ColumnStore is a collumnar storage engine designed for distributed massively parallel processing. It consists of a number of different components, working to gather. These components include:
- User Module:
- The User Module is made up of the front end MariaDB Server instance and a number of processes specific to MariaDB ColumnStore that handle concurrency scaling. The storage engine plugin for MariaDB ColumnStore, hands over query to one of these process which then then further breaks down SQL requests and distributing the various parts to one or more Performance Modules to process the query. Finally, the User Module assembles all the query results from the various participating Performance Modules to form the complete query result set that is returned to the user.
- Performance Module:
- The Performance Module is responsible for storing, retrieving, and managing data, processing block requests for query operations, and passing it back to the User Module(s) to finalize query requests. The Performance Module selects data from disk and caches it in a shared nothing data cache that is part of the server on which the Performance Module resides. MPP is accomplished by allowing the user to configure as many Performance Modules as they would like; each additional Performance Module adds more cache to the overall database as well as more processing power.
- Storage:
- MariaDB ColumnStore is extremely flexible with respect to the storage system. When running on premise, it can use either local storage or shared storage (e.g. SAN) to store data. In the Amazon EC2 environment, it can use ephemeral or Elastic Block Store (EBS) volumes. When data redundancy is required for a shared-nothing deployment, it is built to integrate with GlusterFS and the Apache Hadoop Distributed File System (HDFS). In the first alpha release only local storage, shared storage and EBS configuration have been tested.
User Module
User Module manages and controls the operation of end user queries. It maintains the state of each query, issues requests to one or more Performance Modules to perform work on the behalf of a query, and performs end resolution of a query by aggregating the various result sets from all participating Performance Modules into one that is ultimately returned to the end user.
The User Module contains several processes:
- The MariaDB Server process: mysqld
- This mysqld is the modified MariaDB Server binaries that comes specifically with MariaDB ColumnStore.
- It does the same normal tasks as MariaDB Server - Connection validation, Parsing of SQL statements, General SQL plan generation, Final result set distribution activities.
- Additional it converts the MariaDB Server query plan into MariaDB ColumnStore query plan format. The MariaDB ColumnStore query plan format is still essentially a parse tree, but execution hints from the optimizer are added to assist the User Module in converting the parse tree to a Job List.
- Execution Manager: ExeMgr
- ExeMgr listens on a TCP/IP port for query parse trees from the mysqld. ExeMgr is responsible for converting the query parse tree into a job list, which is a construct in MariaDB ColumnStore that represent the sequence of instructions necessary to answer the query. ExeMgr walks the query parse tree and iteratively generates job steps, optimizing and re-optimizing the job list as it goes. The major categories of job steps are application of a column filter, processing a table join, and projection of returned columns. Each operation in a query plan is executed in parallel by the job list itself and has the capability of running entirely on the User Module, entirely on the Performance Module or in some combination. Each node uses the Extent Map to determine which Performance Modules to send work orders to (see the later section on the Extent
- DML, DDL and import distribution managers (DMLProc, DDLProc and cpimport).
- DMLProc and DDLProc distribute DML and DDL to the appropriate Performance Module. Cpimport, when run on the User Module, distributes source files to the Performance Modules.
- The Process Manager: ProcMgr
- The ProcMgr is responsible for starting, monitoring and re-starting all MariaDB ColumnStore processes. It utilizes another process, called Process Monitor (ProcMon) on each machine to keep track of MariaDB ColumnStore processes.
The User Module performs these core functions for MariaDB ColumnStore :
- Transform MariaDB plan into an ColumnStore Job List
- Perform InfiniDB OID (object ID) lookups from the MariaDB ColumnStore system catalog
- Inspects the Extent Map (described in detail later in this document) to reduce I/O, which is accomplished via the elimination of unnecessary extents
- Issues instructions (sometimes referred to as ‘primitive operations’) to the Performance Modules
- Executes hash joins as needed (depending on size of smaller table in the join). Helps manage distributed hash joins by sending any hash maps needing processing to Performance Modules
- Execute cross-table-scope functions and expressions that happen after a hash join
- Receives data from the Performance Modules and re-transmits back to them if needed
- Executes follow-up steps for all aggregation and distinct processing
- Return data back to the MySQL interface The primary job of the User Module is to handle concurrency scaling. It never directly touches database files and does not require visibility to them. It uses a machine’s RAM in a transitory manner to assemble partial query results into a complete answer that is ultimately returned to a user.
Performance Module
The Performance Module is responsible for performing I/O operations in support of query and write processing. It receives its instructions from a User Module with respect to the work that it does. A Performance Module doesn’t see the query itself, but only the set of instructions given it by a User Module. The Performance Module delivers three critical behaviors key to scaling out database behavior: distributed scans, distributed hash joins, and distributed aggregation. The combination of these three behaviors enables true MPP behavior for query intensive environments.
Query Processing
- The PrimProc handles query execution. The instructions sent by User module are received and executed by PrimProc as block oriented I/O operations to perform predicate filtering, join processing, initial aggregation of data. And then PrimProc sends data back to the User Module.
Shared Nothing Data Cache
All Performance Modules utilize a shared nothing data cache . When data is first accessed a Performance Module acts upon the amount of data that it has been instructed to by a User Module and caches them in an LRU-based cache for subsequent access. On dedicated servers running Performance Module, the majority of the box’s RAM can be dedicated to a Performance Module’s data cache. As Performance Module cache is a shared nothing design
- There is no data block pinging between participating Performance Module nodes as sometimes occurs in other multi-instance/ shared disk database systems.
- As more Performance Module nodes are added to a system, the overall cache size for the database is greatly increased
Load and Write Processing
A Performance Module node is given the task of performing loads and writes to the underlying persistent storage. There are two processes for handling write operations on Performance Module.
- WriteEngineServer: WriteEngineServer is responsible for coordinating DML, DDL and imports on each Performance Module. DDL changes are persisted within the MariaDB ColumnStore System Catalog which keeps track of all ColumnStore metadata.
- cpimport: This performs the database file updates, when bulk data is loaded. cpimport is aware of which module It is running on and, when running on the Performance Module, handles the actual updates of the database disk files. In this manner, MariaDB ColumnStore supports fully parallel load capabilities.
Storage