ColumnStoreストレージアーキテクチャー

You are viewing an old version of this article. View the current version here.

ストレージアーキテクチャー

MariaDB ColumnStoreにてテーブルが作られる際、カラムごとに少なくとも1つのファイルが作成されます。例えば、3カラムのテーブルが作成される場合には、最低3個のファイルがSANやディスクに作成されます。 datastorage-diagram

  • 各カラムはエクステントと呼ばれる800万行のファイルとして保存されます。1byteのデータ型のエクステントは8MBとなり、2byteなら16MB、4byteなら32MB、8byteなら64MB、また可変データ型の場合も、64MBとなります。Extentがいっぱいになると、新しいエクステントが自動生成されます。
  • エクステントは一連のブロックとして物理的に保存されます。
  • ブロックは8KBです。全てのデータベースブロックは、論理ブロック識別子(LBID)により一意に識別されます。
  • 8文字以上のStringカラムはメインのカラムファイルにインデックスを保存し、実際の値は別の辞書ファイルに格納されます。
  • Segmentファイルはカラムのデータを保持するディスク上の物理ファイルです。セグメントファイルがエクステントの最大数に達すると、新しいセグメントファイルが自動的に作られます。セグメントファイル内の最大エクステント数は、ColumnStore.xmlファイル内のExtentsPerSegmentFileに記載されており、Db Rootsの倍数を設定する必要があります。デフォルト値は4です。
  • 集合的に、1つ以上のエクステントに相当する全てのカラムセグメントファイルはパーティションとなります。これはカラムストアにおける水平パーティショニングです。
  • パーティションはフォルダなどのような階層構造に保存されます。
  • MariaDB ColumnStoreメタストアは、パーティションに使用される情報と同様に、DBスキーマへのファイル構造・ファイル一をマップします。パーティションごとの最大ファイル数は、ColumnStore.xmlファイルのFilesPerColumnPartitionに記載しており、デフォルト時は2です。
  • デフォルト値は2です。

. The maximum number of files per partition is configured as FilesPerColumnPartition in ColumnStore.xml. Default value is 2.

  • By default the data is compressed.

Extent Map

MariaDB ColumnStore utilizes a smart structure known as the Extent Map which provides logical range partitioning and removes the need for indexing, manual table partitioning, materialized views, summary tables, and other structures and objects that row-based databases must implement for query performance.

An extent is a logical block of space that exists within a physical segment file, and is anywhere from 8-64MB in size. Each extent supports the same number of rows, with smaller data types using less space on disk.

The Extent Map catalogs all extents and their corresponding blocks (LBID’s) along with minimum and maximum values for a column’s data within an extent.

A master copy of the Extent Map exists on the primary Performance Module. Upon system startup, the file is read into memory and then physically copied to all other participating User and Performance Modules for disaster recovery and failover purposes. All nodes keep the Extent Map resident in memory for quick access. As extents are modified, updates are broadcast to all participating nodes as well.

How Extent Map works

Extent Map provides the ability to do logical range partitioning and the ability to only retrieve the blocks needed to satisfy a query. This is accomplished via what is called "extent elimination" in MariaDB ColumnStore. 'Extent Elimination' is the process of eliminating those extents for the purpose of I/O operation that do not meet the join and filter condition of a query.

Extent elimination is first accomplished in MariaDB ColumnStore via only scanning columns in join and filter conditions. Then the logical horizontal partitioning information of each extent along with the minimum and maximum value for the column is used for further elimination. To eliminate an extent when a column scan involves a filter, that filter value is compared to the minimum and maximum values stored in each extent for the column: If the filter value is outside of a extent's min-max value range, the extent is eliminated.

This automatic extent elimination behavior is well suited for series, ordered, patterned, or time-based data where data is loaded frequently and often referenced by time. Any column with clustered values is a good candidate for extent elimination.

Example:

extent-elimination

Compression with real-time decompression

Columnar storage offers excellent compressibility because similar data is stored within each column file. Most data sets will show excellent compression rates, saving between 65% and 95% of space, however the actual space savings depends on the randomness of the data and the number of distinct values that exist.

MariaDB ColumnStore's compression strategy is optimized for read performance from disk while still offering excellent compression. It is tuned to accelerate the decompression rate, maximizing the performance benefits when reading from disk. This allows for systems that are I/O bound when reading from disk to improve performance.

Compression mode

Compression is on by default, but can be turned on or off at the table level or the column level and in addition can be controlled at a session level by setting a variable: infinidb_compression_type. When turned on, MariaDB ColumnStore uses snappy compression.

Version Buffer

MariaDB ColumnStore utilizes a structure termed a Version Buffer to store the disk blocks which are being modified. It is also used to manage rollback activities for transactions as well as service the MVCC (multi-version concurrency control) or “snapshot read” function of the database so that it can offer a query consistent view of the database. All statements in MariaDB ColumnStore run at a particular version (or snapshot) of the database, which the system refers to as the System Change Number (SCN). Although it is called the Version Buffer, it is composed of both memory and disk structures.

How the Version Buffer files work

The Version Buffer utilizes in-memory hash tables to supply memory access to in-flight transaction information. The initial size upon startup is 4MB with the memory region growing from that amount to handle blocks that are being modified by a transaction. Each entry in the hash table is a 40-byte reference to an 8K block that is being modified.

The number of rows being updated is not the limiting factor for the Version Buffer, rather the number of disk blocks that are being updated. The size can be increased but caution should be used since updating more disk blocks can allow the update/delete statements to run for long periods of time and if a problem is encountered a rollback would also take a long period of time.

The Version Buffer files default to a 1GB-sized file on each DBRoot, which is configurable via the VersionBufferFileSize parameter. The Version Buffer files are spread across each DBRoot in the system.

NOTE: While MariaDB ColumnStore on HDFS has not been tested in the current release, it inherits this behavior from InfiniDB when configured to run as a Hadoop query engine over HDFS. In this configuration the MVCC function (and hence usage of the Version Buffer files) is disabled. HDFS is a write-only file system and hence block-level versioning in the MVCC model is not operationally practical. MariaDB ColumnStore supports statement level tracking and rollback capability for DML operations when running over HDFS. Queries over HDFS operate in more of an “eventual consistency” mode where queries will converge to the proper result at a point in time when updates are completed, but queries issued concurrently with updates will use whatever block happens to be current at the time the block-level execution occurs.

Transaction Log

MariaDB ColumnStore supports logging committed transactions to MariaDB Server's binary log.

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.