
Sample storagemanager.cnf

# Sample storagemanager.cnf

[ObjectStorage]
service = S3
object_size = 5M
metadata_path = /var/lib/columnstore/storagemanager/metadata
journal_path = /var/lib/columnstore/storagemanager/journal
max_concurrent_downloads = 21
max_concurrent_uploads = 21
common_prefix_depth = 3

[S3]
region = us-west-1
bucket = my_columnstore_bucket
endpoint = s3.amazonaws.com
aws_access_key_id = AKIAR6P77BUKULIDIL55
aws_secret_access_key = F38aR4eLrgNSWPAKFDJLDAcax0gZ3kYblU79

[LocalStorage]
path = /var/lib/columnstore/storagemanager/fake-cloud
fake_latency = n
max_latency = 50000

[Cache]
cache_size = 2g
path = /var/lib/columnstore/storagemanager/cache

Note: A region is required even when using an on-premises solution like ActiveScale due to header expectations within the API.

Certified S3 Object Storage Providers

Hardware (On Premises)

  • IBM Cloud Object Storage (formerly known as CleverSafe)

  • DELL EMC

  • Quantum ActiveScale

Cloud (IaaS)

  • AWS S3

  • Google GCS

Software-Based

Due to the frequent code changes and deviation from the AWS standards, none are approved at this time.

StorageManager

The ColumnStore StorageManager manages columnar data storage and retrieval, optimizing analytical queries.

Using StorageManager With IAM Role

AWS IAM Role Configuration

Starting with ColumnStore 5.5.2, you can use AWS IAM roles to connect to S3 buckets without explicitly entering credentials into the storagemanager.cnf configuration file.

You need to modify the IAM role of your Amazon EC2 instance to allow for this. Please follow the AWS documentation before beginning this process.

It is important to note that you must update the AWS S3 endpoint based on your chosen region; otherwise, you might face propagation delays, as discussed in the AWS documentation.

For a complete list of AWS service endpoints, visit the AWS reference guide.

Sample Configuration

Edit your Storage Manager configuration file, located at /etc/columnstore/storagemanager.cnf, so that it looks similar to the example below, replacing the values in the [S3] section with your own:

Note: This is an AWS-only feature. For other deployment methods, see the sample storagemanager.cnf above.
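
A minimal excerpt of the [S3] section with IAM mode enabled (the bucket, region, and endpoint values are placeholders; the complete sample storagemanager.cnf for this setup is reproduced later in this section):

[S3]
ec2_iam_mode = enabled
bucket = my_mcs_bucket
region = us-west-2
endpoint = s3.us-west-2.amazonaws.com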

Clients & Tools

MariaDB ColumnStore supports standard MariaDB tools, BI connectors (e.g., Tableau, Power BI), data ingestion (cpimport, Kafka), and REST APIs for admin.


Data Import

Learn how to import data into MariaDB ColumnStore. This section covers various methods and tools for efficiently loading large datasets into your columnar database for analytical workloads.

Overview

MariaDB Enterprise ColumnStore supports very efficient bulk data loads, using a variety of mechanisms including the cpimport tool, specialized handling of certain SQL statements, and minimal locking during data import.

cpimport

MariaDB Enterprise ColumnStore includes a bulk data loading tool called cpimport, which provides several benefits:

  • Bypasses the SQL layer to decrease overhead

  • Does not block read queries

  • Requires a write metadata lock on the table, which can be monitored with the METADATA_LOCK_INFO plugin

  • Appends the new data to the table. While the bulk load is in progress, the newly appended data is temporarily hidden from queries. After the bulk load is complete, the newly appended data is visible to queries.

Batch Insert Mode

MariaDB Enterprise ColumnStore enables batch insert mode by default.

When batch insert mode is enabled, MariaDB Enterprise ColumnStore has special handling for the following statements:

  • LOAD DATA [ LOCAL ] INFILE

  • INSERT INTO ... SELECT FROM ...

Enterprise ColumnStore uses the following rules:

  • If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. It executes cpimport using a wrapper called cpimport.bin.

  • If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.

Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable to OFF. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.
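
For example, to disable batch insert mode for the current session only (a sketch; the variable also has a global scope):

SET SESSION columnstore_use_import_for_batchinsert = OFF;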

Locking

MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

When a bulk data load is running:

  • Read queries will not be blocked.

  • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

  • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.
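
For example, once the METADATA_LOCK_INFO plugin is installed, the lock can be observed from SQL while a bulk load is running (a minimal sketch; the plugin ships with MariaDB but is not loaded by default):

INSTALL SONAME 'metadata_lock_info';
SELECT * FROM information_schema.METADATA_LOCK_INFO;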

Choose a Data Load Method

  • cpimport (shell interface): imports text files, binary files, or standard input (stdin) from the server file system. Benefits: lowest latency, bypasses the SQL layer, non-blocking. Performance: fastest.

  • LOAD DATA INFILE (SQL interface): imports text files from the server or client file system. Benefits: translates the operation to a cpimport command, non-blocking. Performance: fast.

  • INSERT INTO ... SELECT (SQL interface): imports from other table(s) on the same MariaDB server. Benefits: translates the operation to a cpimport command, non-blocking. Performance: slow.

  • columnstore_info.load_from_s3 (SQL interface): imports text files from S3-compatible object storage. Benefits: loads data from the cloud, translates the operation to a cpimport command, non-blocking. Performance: fast.

Data Ingestion Methods & Tools

Learn about data ingestion for MariaDB ColumnStore. This section covers various methods and tools for efficiently loading large datasets into your columnar database for analytical workloads.

ColumnStore provides several mechanisms to ingest data:

  • cpimport provides the fastest performance for inserting data and the ability to route data to particular PrimProc nodes. Normally, this should be the default choice for loading data.

  • LOAD DATA INFILE provides another means of bulk inserting data.

For reference, the following sample storagemanager.cnf shows the IAM-enabled S3 configuration described under "Using StorageManager With IAM Role" above:

[ObjectStorage]
service = S3
object_size = 5M
metadata_path = /var/lib/columnstore/storagemanager/metadata
journal_path = /var/lib/columnstore/storagemanager/journal
max_concurrent_downloads = 21
max_concurrent_uploads = 21
common_prefix_depth = 3

[S3]
ec2_iam_mode=enabled
bucket = my_mcs_bucket
region = us-west-2
endpoint = s3.us-west-2.amazonaws.com

[LocalStorage]
path = /var/lib/columnstore/storagemanager/fake-cloud
fake_latency = n
max_latency = 50000

[Cache]
cache_size = 2g
path = /var/lib/columnstore/storagemanager/cache

In more detail, the cpimport tool:

  • Inserts each row in the order the rows are read from the source file. Users can optimize data loads for Enterprise ColumnStore's automatic partitioning by loading presorted data files. For additional information, see "Load Ordered Data in Proper Order".

  • Supports parallel distributed bulk loads

  • Imports data from text files

  • Imports data from binary files

  • Imports data from standard input (stdin)

Additional notes on LOAD DATA INFILE:

  • By default, with autocommit on, it internally streams the data to an instance of the cpimport process.

  • In transactional mode, DML inserts are performed, which is significantly slower and also consumes both binlog transaction files and ColumnStore VersionBuffer files.

Other considerations:

  • DML, i.e. INSERT, UPDATE, and DELETE, provides row-level changes. ColumnStore is optimized for bulk modifications, so these operations are slower than they would be in, for instance, InnoDB.

    • Currently, ColumnStore does not support operating as a replication replica target.

    • Bulk DML operations will in general perform better than multiple individual statements.

      • INSERT INTO ... SELECT with autocommit behaves similarly to LOAD DATA INFILE because, internally, it is mapped to cpimport for higher performance.

      • Bulk update operations based on a join with a small staging table can be relatively fast, especially if updating a single column.

  • Data can also be ingested using the ColumnStore Bulk Write SDK or the ColumnStore Streaming Data Adapters.

    Data Loading with cpimport

    Overview

    MariaDB Enterprise ColumnStore includes a bulk data loading tool called cpimport, which bypasses the SQL layer to decrease the overhead of bulk data loading.

    Refer to the cpimport modes for additional information and to ColumnStore Bulk Data Loading.

    The cpimport tool:

    • Bypasses the SQL layer to decrease overhead;

    • Does not block read queries;

    • Requires a write metadata lock on the table, which can be monitored with the METADATA_LOCK_INFO plugin;

    • Appends the new data to the table. While the bulk load is in progress, the newly appended data is temporarily hidden from queries. After the bulk load is complete, the newly appended data is visible to queries;

    • Inserts each row in the order the rows are read from the source file. Users can optimize data loads for Enterprise ColumnStore's automatic partitioning by loading presorted data files;

    • Supports parallel distributed bulk loads;

    • Imports data from text files;

    • Imports data from binary files;

    • Imports data from standard input (stdin).

    Intended Use Cases

    You can load data using the cpimport tool in the following cases:

    • You are loading data into a ColumnStore table from a text file stored on the primary node's file system.

    • You are loading data into a ColumnStore table from a binary file stored on the primary node's file system.

    • You are loading data into a ColumnStore table from the output of a command running on the primary node.
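
    For example, output generated on the primary node can be piped straight into cpimport (a sketch; the gzipped file and the inventory.products table are illustrative):

    $ zcat /data/inventory-products.txt.gz | cpimport inventory products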

    Locking

    MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

    When a bulk data load is running:

    • Read queries will not be blocked.

    • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

    • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.

    Importing the Schema

    Before data can be imported into the tables, the schema must be created.

    1. Connect to the primary server using the MariaDB client:

    After the command is executed, it prompts for a password.

    2. For each imported database, create the database with the CREATE DATABASE statement:

    3. For each imported table, create the table with the CREATE TABLE statement:

    To get the best performance from Enterprise ColumnStore, make sure to follow Enterprise ColumnStore's best practices for schema design.

    Appending Data

    When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read. Appending data reduces the I/O requirements of bulk data loads, so that larger data sets can be loaded very efficiently.

    While the bulk load is in progress, the newly appended data is temporarily hidden from queries.

    After the bulk load is complete, the newly appended data is visible to queries.

    Sorting the Input File

    When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read.

    The order of data can have a significant effect on performance with Enterprise ColumnStore, so it can be helpful to sort the data in the input file prior to importing it.

    For additional information, see "Load Ordered Data in Proper Order".

    Confirming the Field Delimiter

    Before importing a file into MariaDB Enterprise ColumnStore, confirm that the field delimiter is not present in the data.

    The default field delimiter for the cpimport tool is a pipe (|).

    To use a different delimiter, you can set the field delimiter.

    Importing from Text Files

    The cpimport tool can import data from a text file if a file is provided as an argument after the database and table name.

    For example, to import the file inventory-products.txt into the products table in the inventory database:

    Importing from Binary Files

    The cpimport tool can import data from a binary file if the -I1 or -I2 option is provided and a file is provided as an argument after the database and table name.

    For example, to import the file inventory-products.bin into the products table in the inventory database:

    The -I1 and -I2 options allow two different binary import modes to be selected:

    • -I1: Numeric fields containing NULL will be treated as NULL unless the column has a default value

    • -I2: Numeric fields containing NULL will be saturated

    The binary file should use the following format for data:

    • TINYINT: Little-endian integer format. Signed NULL: 0x80. Unsigned NULL: 0xFE

    • SMALLINT: Little-endian integer format. Signed NULL: 0x8000. Unsigned NULL: 0xFFFE

    • INT: Little-endian integer format. Signed NULL: 0x80000000. Unsigned NULL: 0xFFFFFFFE

    • BIGINT: Little-endian integer format. Signed NULL: 0x8000000000000000ULL. Unsigned NULL: 0xFFFFFFFFFFFFFFFEULL

    • FLOAT: Native IEEE floating point format. NULL: 0xFFAAAAAA

    • DOUBLE: Native IEEE floating point format. NULL: 0xFFFAAAAAAAAAAAAAULL

    • CHAR, VARCHAR: String padded with '\0' to match the length of the field. NULL: '\0' for the full length of the field

    • DATE: Use the format represented by the struct Date (see "Binary DATE Format" below). NULL: 0xFFFFFFFE

    • DATETIME: Use the format represented by the struct DateTime (see "Binary DATETIME Format" below). NULL: 0xFFFFFFFFFFFFFFFEULL

    • DECIMAL: Use an integer representation of the value without the decimal point. Sizing depends on precision: 1-2: use 2 bytes; 3-4: use 3 bytes; 4-9: use 4 bytes; 10+: use 8 bytes. Signed and unsigned NULL: see the equivalent-sized integer

    Binary DATE Format

    In binary input files, the cpimport tool expects DATE columns to use the layout of the struct Date shown in the code listings below.

    Binary DATETIME Format

    In binary input files, the cpimport tool expects DATETIME columns to use the layout of the struct DateTime shown in the code listings below.

    Importing from Standard Input

    The cpimport tool can import data from standard input (stdin) if no file is provided as an argument.

    Importing from standard input is useful in many scenarios.

    One scenario is when you want to import data from a remote database. You can use the mariadb client to query the table with a SELECT statement, and then pipe the results into the standard input of the cpimport tool:
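
    A sketch of this pattern, assuming a remote source server at 192.168.0.100 and the inventory.products table used in the other examples (mariadb batch output is tab-separated, hence -s '\t'):

    $ mariadb --quick --skip-column-names \
         --host 192.168.0.100 --user db_user --password \
         --execute="SELECT * FROM inventory.products" \
         | cpimport -s '\t' inventory products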

    Importing from S3 Using AWS CLI

    The cpimport tool can import data from a file stored in a remote S3 bucket.

    You can use the AWS CLI to copy the file from S3, and then pipe the contents into the standard input of the cpimport tool:
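
    A sketch, assuming an illustrative bucket name and a comma-delimited file ('-' tells the AWS CLI to write the object to standard output):

    $ aws s3 cp --quiet s3://my-columnstore-bucket/inventory-products.csv - \
         | cpimport -s ',' inventory products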

    Alternatively, the columnstore_info.load_from_s3 stored procedure can import data from S3-compatible cloud object storage.

    Setting the Field Delimiter

    The default field delimiter for the cpimport tool is a pipe sign (|).

    If your data file uses a different field delimiter, you can specify the field delimiter with the -s option.

    To load a TSV (tab-separated values) or a CSV (comma-separated values) file:
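
    A sketch of the corresponding commands (database, table, and file names are illustrative):

    $ sudo cpimport -s '\t' inventory products inventory-products.tsv
    $ sudo cpimport -s ',' inventory products inventory-products.csv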

    Setting the Quoting Style

    By default, the cpimport tool does not expect fields to be quoted.

    If your data file uses quotes around fields, you can specify the quote character with the -E option.

    To load a TSV (tab-separated values) file that uses double quotes, or a CSV (comma-separated values) file that uses optional single quotes:
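
    A sketch of the corresponding commands (database, table, and file names are illustrative):

    $ sudo cpimport -s '\t' -E '"' inventory products inventory-products.tsv
    $ sudo cpimport -s ',' -E "'" inventory products inventory-products.csv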

    Logging

    The cpimport tool writes logs to different directories, depending on the Enterprise ColumnStore version:

    • In Enterprise ColumnStore 5.5.2 and later, logs are written to /var/log/mariadb/columnstore/bulk/

    • In Enterprise ColumnStore 5 releases before 5.5.2, logs are written to /var/lib/columnstore/data/bulk/

    • In Enterprise ColumnStore 1.4, logs are written to /usr/local/mariadb/columnstore/bulk/

    Special Handling

    Column Order

    The cpimport tool requires column values to be in the same order in the input file as the columns in the table definition.

    Date Format

    The cpimport tool requires values to be specified in the format YYYY-MM-DD.

    Transaction Log

    The cpimport tool does not write bulk data loads to the transaction log, so they are not transactional.

    Binary Log

    The cpimport tool does not write bulk data loads to the binary log, so they cannot be replicated using MariaDB replication.

    EFS Storage

    When Enterprise ColumnStore uses object storage and the Storage Manager directory uses EFS in the default Bursting Throughput mode, the cpimport tool can have performance problems if multiple data load operations are executed consecutively. The performance problems can occur because the Bursting Throughput mode scales the rate relative to the size of the file system, so the burst credits for a small Storage Manager volume can be fully consumed very quickly.

    When this problem occurs, some solutions are:

    • Avoid using burst credits by using Provisioned Throughput mode instead of Bursting Throughput mode

    • Monitor burst credit balances in AWS and run data load operations when burst credits are available

    • Increase the burst credit balance by increasing the file system size (for example, by creating a dummy file)

    Additional information is available in the AWS documentation on Amazon EFS throughput modes.

    Data Loading with LOAD DATA INFILE

    Overview

    MariaDB Enterprise ColumnStore automatically translates LOAD DATA [ LOCAL ] INFILE statements into bulk data loads. By default, it translates the statement into a bulk data load that uses cpimport.bin, which is an internal wrapper around the cpimport tool.

    Data Loading with INSERT .. SELECT

    Overview

    MariaDB Enterprise ColumnStore automatically translates INSERT INTO .. SELECT FROM .. statements into bulk data loads. By default, it translates the statement into a bulk data load that uses cpimport.bin, which is an internal wrapper around the cpimport tool.

    Intended Use Cases

    You can load data using the LOAD DATA [ LOCAL ] INFILE statement in the following cases:

    • You are loading data into a ColumnStore table from a text file stored on the primary node's file system.

    • You are loading data into a ColumnStore table from a text file stored on the client's file system. In this case, LOAD DATA LOCAL INFILE must be used.

    Batch Insert Mode


    MariaDB Enterprise ColumnStore enables batch insert mode by default.

    When batch insert mode is enabled, MariaDB Enterprise ColumnStore has special handling for LOAD DATA [ LOCAL ] INFILE statements.

    Enterprise ColumnStore uses the following rules:

    • If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. Enterprise ColumnStore executes cpimport using a wrapper called cpimport.bin.

    • If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.

    Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable to OFF. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.

    Insert Cache

    Starting with MariaDB Enterprise ColumnStore 6, an insert cache can be enabled to speed up INSERT statements.

    The insert cache is disabled by default, but it can be enabled by configuring columnstore_cache_inserts=ON:

    The cache is flushed to ColumnStore in the following scenarios:

    • When the number of cached rows exceeds the value of columnstore_cache_flush_threshold

    • When a statement other than INSERT or LOAD DATA INFILE is executed

    When columnstore_cache_use_import=ON is configured, cpimport is used to flush the insert cache to ColumnStore.
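
    A minimal configuration sketch (the threshold value is illustrative; columnstore_cache_use_import=ON makes the flush go through cpimport):

    [mariadb]
    columnstore_cache_inserts = ON
    columnstore_cache_use_import = ON
    columnstore_cache_flush_threshold = 500000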

    Locking

    MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

    When a bulk data load is running:

    • Read queries will not be blocked.

    • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

    • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.

    Importing the Schema

    Before data can be imported into the tables, the schema must be created.

    1. Connect to the primary server using the MariaDB client:

    After the command is executed, it will prompt you for a password.

    2. For each database that you are importing, create the database with the CREATE DATABASE statement:

    3. For each table that you are importing, create the table with the CREATE TABLE statement:

    To get the best performance from Enterprise ColumnStore, make sure to follow Enterprise ColumnStore's best practices for schema design.

    Appending Data

    When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read. Appending data reduces the I/O requirements of bulk data loads, so that larger data sets can be loaded very efficiently.

    While the bulk load is in progress, the newly appended data is temporarily hidden from queries.

    After the bulk load is complete, the newly appended data is visible to queries.

    Sorting the Input File

    When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read.

    The order of data can have a significant effect on performance with Enterprise ColumnStore, so it can be helpful to sort the data in the input file prior to importing it.

    For additional information, see "Load Ordered Data in Proper Order".

    Confirming the Field Delimiter

    Before importing a file into MariaDB Enterprise ColumnStore, confirm that the field delimiter is not present in the data.

    The field delimiter is determined by the columnstore_import_for_batchinsert_delimiter system variable. By default, Enterprise ColumnStore sets the field delimiter to the ASCII value 7, which corresponds to the BEL character.

    To use another delimiter, you can set the field delimiter.

    Loading a Local Input File on the Client

    If you are loading a file located on the client, you can use the LOAD DATA LOCAL INFILE statement. However, for this statement to work, the client must explicitly enable usage of a local infile.

    If you are using the mariadb command-line client, the --local-infile option can be used:

    If you are using MariaDB Connector/C, the MYSQL_OPT_LOCAL_INFILE option can be set with the mysql_optionsv() function:

    If you are using MariaDB Connector/J, the allowLocalInfile parameter can be set for the connection:

    If you are using MariaDB Connector/Node.js, the permitLocalInfile parameter can be set for the connection:

    If you are using MariaDB Connector/Python, the local_infile parameter can be set for the connection:

    Setting the Field Delimiter

    The default field delimiter for the LOAD DATA INFILE statement is a tab.

    If your data file uses a different field delimiter, you can specify the field delimiter with the FIELDS TERMINATED BY clause.

    To load a TSV (tab-separated values) or a CSV (comma-separated values) file:
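
    A sketch of the corresponding statements (table and file names are illustrative; the TSV case needs no FIELDS clause because tab is the default delimiter):

    LOAD DATA INFILE 'inventory-products.tsv'
    INTO TABLE inventory.products;

    LOAD DATA LOCAL INFILE 'inventory-products.csv'
    INTO TABLE inventory.products
    FIELDS TERMINATED BY ',';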

    Setting the Quoting Style

    By default, the LOAD DATA INFILE statement does not expect fields to be quoted.

    If your data file uses quotes around fields, you can specify the quote character with the FIELDS [OPTIONALLY] ENCLOSED BY clause.

    To load a TSV (tab-separated values) file that uses double quotes:

    Limitations

    MariaDB Enterprise ColumnStore ignores the ON DUPLICATE KEY clause.

    Ensure that duplicate data is removed prior to performing a bulk data load.

    Intended Use Cases

    You can load data using INSERT INTO ... SELECT FROM ... in the following cases:

    • You are loading data into a ColumnStore table by querying one or more local tables.

    Batch Insert Mode

    MariaDB Enterprise ColumnStore enables batch insert mode by default.

    When batch insert mode is enabled, MariaDB Enterprise ColumnStore has special handling for INSERT INTO ... SELECT FROM ... statements.

    Enterprise ColumnStore uses the following rules:

    • If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. Enterprise ColumnStore executes cpimport using a wrapper called cpimport.bin.

    • If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.

    Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable to OFF. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.

    Locking

    MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

    When a bulk data load is running:

    • Read queries will not be blocked.

    • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

    • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.

    Importing the Schema

    Before data can be imported into the tables, the schema must be created.

    1. Connect to the primary server using the MariaDB client:

    After the command is executed, it will prompt you for a password.

    2. For each database that you are importing, create the database with the CREATE DATABASE statement:

    3. For each table that you are importing, create the table with the CREATE TABLE statement:

    To get the best performance from Enterprise ColumnStore, make sure to follow Enterprise ColumnStore's best practices for schema design.

    Appending Data

    When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read. Appending data reduces the I/O requirements of bulk data loads, so that larger data sets can be loaded very efficiently.

    While the bulk load is in progress, the newly appended data is temporarily hidden from queries.

    After the bulk load is complete, the newly appended data is visible to queries.

    Sorting the Query Results

    When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read.

    The order of data can have a significant effect on performance with Enterprise ColumnStore. If your data is not already sorted, it can be helpful to sort the query results using an ORDER BY clause.

    For example:
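
    A sketch, assuming an InnoDB staging table innodb_inventory.orders with an order_date column:

    INSERT INTO inventory.orders
    SELECT *
    FROM innodb_inventory.orders
    ORDER BY order_date;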

    For additional information, see "Load Ordered Data in Proper Order".

    Confirming the Field Delimiter

    Before importing a table's data into MariaDB Enterprise ColumnStore, confirm that the field delimiter is not present in the data.

    The field delimiter is determined by the columnstore_import_for_batchinsert_delimiter system variable. By default, Enterprise ColumnStore sets the field delimiter to the ASCII value 7, which corresponds to the BEL character.

    To use a different delimiter, you can set the field delimiter.

    Setting the Field Delimiter

    When the data is passed to cpimport, each value is separated by a field delimiter. The field delimiter is determined by the columnstore_import_for_batchinsert_delimiter system variable.

    By default, Enterprise ColumnStore sets the field delimiter to the ASCII value 7, which corresponds to the BEL character. In general, setting the field delimiter is only required if your data contains this value.

    Set the field delimiter by setting the columnstore_import_for_batchinsert_delimiter system variable to the ASCII value for the desired delimiter character.

    For example, if you want to use a comma (,) as the field delimiter, you would set columnstore_import_for_batchinsert_delimiter to 44:
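
    A sketch (session scope shown; 44 is the ASCII code for a comma):

    SET SESSION columnstore_import_for_batchinsert_delimiter = 44;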

    Setting the Quoting Style

    When the data is passed to cpimport, each value is enclosed by a quote character. The quote character is determined by the columnstore_import_for_batchinsert_enclosed_by system variable.

    By default, Enterprise ColumnStore sets the quote character to the ASCII value 17, which corresponds to the DC1 character. In general, setting the quote character is only required if your data contains this value.

    Set the quote character by setting the columnstore_import_for_batchinsert_enclosed_by system variable to the ASCII value for the desired quote character.

    For example, if you want to use a double quote (") as the quote character, you would set columnstore_import_for_batchinsert_enclosed_by to 34:
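
    A sketch (session scope shown; 34 is the ASCII code for a double quote):

    SET SESSION columnstore_import_for_batchinsert_enclosed_by = 34;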

    ColumnStore Streaming Data Adapters

    The ColumnStore Bulk Data API enables the creation of high-performance adapters for ETL integration and data ingestion. The Streaming Data Adapters are out-of-the-box adapters that use this API for specific data sources and use cases:

    • The MaxScale CDC Data Adapter integrates MaxScale CDC streams into MariaDB ColumnStore.

    • The Kafka Data Adapter integrates Kafka streams into MariaDB ColumnStore.

    MaxScale CDC Data Adapter

    The MaxScale CDC Data Adapter has been deprecated.

    The MaxScale CDC Data Adapter allows streaming change data events (binary log events) from a MariaDB master hosting non-ColumnStore engines (InnoDB, MyRocks, MyISAM) to MariaDB ColumnStore. In other words, it replicates data from a MariaDB master server to MariaDB ColumnStore. It acts as a CDC client for MaxScale and uses the events received from MaxScale as input to the MariaDB ColumnStore Bulk Data API to push the data to MariaDB ColumnStore.

    It registers with MariaDB MaxScale as a CDC client using the MaxScale CDC Connector, receiving change data records from MariaDB MaxScale (converted from binlog events received from the master on MariaDB TX) in JSON format. Then, using the MariaDB ColumnStore bulk write SDK, it converts the JSON data into API calls and streams it to a MariaDB PM node. The adapter has options to insert all the events in the same schema as the source database table, or to insert each event with metadata as well as table data. The event metadata includes the event timestamp, the GTID, the event sequence, and the event type (insert, update, delete).

    Installation

    Prerequisites:

    • Download and install the MaxScale CDC Connector API.

    • Download and install the MariaDB ColumnStore bulk write SDK (see columnstore-bulk-write-sdk.md).

    CentOS 7

    Debian 9/Ubuntu Xenial:

    Debian 8:

    Usage

    Streaming Multiple Tables

    To stream multiple tables, use the -f parameter to define a path to a TSV formatted file. The file must have one database and one table name per line. The database and table must be separated by a TAB character and the line must be terminated in a newline (\n).

    Here is an example file with two tables, t1 and t2 both in the test database:

    Automated Table Creation on ColumnStore

    You can have the adapter automatically create the tables on the ColumnStore instance with the -a option. In this case, the user used for cross-engine queries (the values in Columnstore.CrossEngineSupport) will be used to create the table. This user requires CREATE privileges on all streamed databases and tables.

    Data Transformation Mode

    The -z option enables the data transformation mode. In this mode, the data is converted from historical, append-only data to the current version of the data. In practice, this replicates changes from a MariaDB master server to ColumnStore via the MaxScale CDC.

    This mode is not as fast as the append-only mode and might not be suitable for heavy workloads. This is due to the fact that the data transformation is done via various DML statements.

    Quick Start

    Download and install both MaxScale and the MaxScale CDC Data Adapter.

    Copy the Columnstore.xml file from /usr/local/mariadb/columnstore/etc/Columnstore.xml on one of the ColumnStore PrimProc nodes to the server where the adapter is installed.

    Configure MaxScale according to the CDC tutorial.

    Create a CDC user by executing the following MaxAdmin command on the MaxScale server. Replace <service> with the name of the avrorouter service, and <user> and <password> with the credentials that are to be created.

    Then we can start the adapter by executing the following command.

    The <database> and <table> define the table that is streamed to ColumnStore. This table should exist on the master server where MaxScale is reading events from. If the table is not created on ColumnStore, the adapter will print instructions on how to define it in the correct way.

    The <user> and <password> are the credentials created for the CDC user, <host> is the MaxScale address, and <port> is the port where the CDC service listener is listening.

    The -c flag is optional if you are running the adapter on the server where ColumnStore is located.

    Kafka to ColumnStore Adapter

    The Kafka data adapter streams all messages published to Apache Kafka topics in Avro format to MariaDB ColumnStore automatically and continuously, enabling data from many sources to be streamed and collected for analysis without complex code. The Kafka adapter is built using librdkafka and the MariaDB ColumnStore bulk write SDK.

    A tutorial for using the Kafka adapter to ingest Avro-formatted data is available in the kafka-to-columnstore-data-adapter documentation.

    ColumnStore - Pentaho Data Integration - Data Adapter

    Starting with MariaDB ColumnStore 1.1.4, a data adapter for Pentaho Data Integration (PDI) / Kettle is available to import data directly into ColumnStore’s WriteEngine. It is built on MariaDB’s rapid-paced Bulk Write SDK.

    Compatibility notice

    The plugin was designed for the following software composition:

    • Operating system: Windows 10 / Ubuntu 16.04 / RHEL/CentOS 7+

    • MariaDB ColumnStore >= 1.1.4

    • Pentaho Data Integration >= 7 (not officially supported by Pentaho)

    • MariaDB Java Database client* >= 2.2.1

    • Java >= 8

    *Only needed if you want to execute DDL.

    Installation

    The following steps are necessary to install the ColumnStore Data adapter (bulk loader plugin):

    1. Build the plugin from source or download it from our website.

    2. Extract the archive mariadb-columnstore-kettle-bulk-exporter-plugin-*.zip into your PDI installation directory $PDI-INSTALLATION/plugins.

    3. Copy mariadb-java-client-2.2.x.jar into PDI's lib directory $PDI-INSTALLATION/lib.

    4. Install the additional library dependencies

    Ubuntu dependencies

    CentOS dependencies

    Windows 10 dependencies

    On Windows, the installation of the Visual Studio 2015/2017 C++ Redistributable (x64) is required.

    Configuration

    Each MariaDB ColumnStore Bulk Loader block needs to be configured. On the one hand, it needs to know how to connect to the underlying Bulk Write SDK to inject data into ColumnStore, and on the other hand, it needs to have a proper JDBC connection to execute DDL.

    Both configurations can be set in each block’s settings tab.

    The database connection configuration follows PDI’s default schema.

    By default, the plugin tries to use ColumnStore's default configuration /usr/local/mariadb/columnstore/etc/Columnstore.xml to connect to the ColumnStore instance through the Bulk Write SDK. In addition, individual paths or variables can be used too.

    Information on how to prepare the Columnstore.xml configuration file can be found here.

    Usage

    Once a block is configured and all inputs are connected in PDI, the inputs have to be mapped to ColumnStore’s table format.

    One can either choose “Map all inputs”, which sets target columns of adequate type, or choose a custom mapping based on the structure of the existing table.

    The SQL button can be used to generate DDL based on the defined mapping and to execute it.

    Limitations

    This plugin is a beta release.

    In addition, it can't handle blob data types and only supports multiple inputs to one block if the input field names are equal for all input sources.

    $ mariadb --host 192.168.0.100 --port 5001 \
              --user db_user --password \
              --default-character-set=utf8
    CREATE DATABASE inventory;
    CREATE TABLE inventory.products (
       product_name VARCHAR(11) NOT NULL DEFAULT '',
       supplier VARCHAR(128) NOT NULL DEFAULT '',
       quantity VARCHAR(128) NOT NULL DEFAULT '',
       unit_cost VARCHAR(128) NOT NULL DEFAULT ''
    ) ENGINE=Columnstore DEFAULT CHARSET=utf8;
    $ sudo cpimport \
       inventory products \
       inventory-products.txt
    $ sudo cpimport -I1 \
       inventory products \
       inventory-products.bin
    struct Date
    {
      unsigned spare : 6;
      unsigned day : 6;
      unsigned month : 4;
      unsigned year : 16
    };
    struct DateTime
    {
      unsigned msecond : 20;
      unsigned second : 6;
      unsigned minute : 6;
      unsigned hour : 6;
      unsigned day : 6;
      unsigned month : 4;
      unsigned year : 16
    };
    $ mariadb --quick \
       --skip-column-names \
       --execute="SELECT * FROM inventory.products" \
       | cpimport -s '\t' inventory products
    $ aws s3 cp --quiet s3://columnstore-test/inventory-products.csv - \
       | cpimport -s ',' inventory products
    $ sudo cpimport -s '\t' \
       inventory products \
       inventory-products.tsv
    $ sudo cpimport -s ',' \
       inventory products \
       inventory-products.csv
    $ sudo cpimport -s '\t' -E '"' \
       inventory products \
       inventory-products.tsv
    $ sudo cpimport -s ',' -E "'" \
       inventory products \
       inventory-products.csv
    [mariadb]
    ...
    columnstore_cache_inserts=ON
    $ mariadb --host 192.168.0.100 --port 5001 \
          --user db_user --password \
          --default-character-set=utf8
    CREATE DATABASE inventory;
    CREATE TABLE inventory.products (
       product_name VARCHAR(11) NOT NULL DEFAULT '',
       supplier VARCHAR(128) NOT NULL DEFAULT '',
       quantity VARCHAR(128) NOT NULL DEFAULT '',
       unit_cost VARCHAR(128) NOT NULL DEFAULT ''
    ) ENGINE=Columnstore DEFAULT CHARSET=utf8;
    $ mariadb --host 192.168.0.1 \
          --user db_user --password \
          --default-character-set=utf8 \
          --local-infile
    /* enable local infile */
    unsigned int enable_local_infile = 1;
    mysql_optionsv(mysql, MYSQL_OPT_LOCAL_INFILE, (void *) &enable_local_infile);
    Connection connection = DriverManager.getConnection("jdbc:mariadb://192.168.0.1/test?user=test_user&password=myPassword&allowLocalInfile=true");
    mariadb.createConnection({
       host: '192.168.0.1',
       user:'test_user',
       password: 'myPassword',
       permitLocalInfile: 'true'
     });
    conn = mariadb.connect(
       user="test_user",
       password="myPassword",
       host="192.168.0.1",
       port=3306,
       local_infile=true)
    Set the Field Delimiter
    LOAD DATA INFILE 'inventory-products.tsv'
    INTO TABLE inventory.products;
    To load a CSV (comma-separated values) file:
    LOAD DATA LOCAL INFILE 'inventory-products.csv'
    INTO TABLE accounts.contacts
    FIELDS TERMINATED BY ',';
    LOAD DATA INFILE 'inventory-products.tsv'
    INTO TABLE inventory.products
    FIELDS ENCLOSED BY '"';
    To load a CSV (comma-separated values) file that uses optional single quotes:
    LOAD DATA LOCAL INFILE 'inventory-products.csv'
    INTO TABLE accounts.contacts
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\'';
    $ mariadb --host 192.168.0.100 --port 5001 \
          --user db_user --password \
          --default-character-set=utf8
    CREATE DATABASE inventory;
    CREATE TABLE inventory.products (
       product_name VARCHAR(11) NOT NULL DEFAULT '',
       supplier VARCHAR(128) NOT NULL DEFAULT '',
       quantity VARCHAR(128) NOT NULL DEFAULT '',
       unit_cost VARCHAR(128) NOT NULL DEFAULT ''
    ) ENGINE=Columnstore DEFAULT CHARSET=utf8;
    # Perform import from other table
    # and insert in sorted order
    INSERT INTO inventory.orders
    SELECT *
    FROM innodb_inventory.orders
    ORDER BY order_date;
    # Set field delimiter
    SET SESSION columnstore_import_for_batchinsert_delimiter=44;
    
    # Perform import from other table
    INSERT INTO inventory.products
    SELECT *
    FROM innodb_inventory.products;
    # Set quote character
    SET SESSION columnstore_import_for_batchinsert_enclosed_by=34;
    
    # Perform import from other table
    INSERT INTO inventory.products
    SELECT *
    FROM innodb_inventory.products;
    sudo yum -y install epel-release
    sudo yum -y install <data adapter>.rpm
    sudo apt-get update
    sudo dpkg -i <data adapter>.deb
    sudo apt-get -f install
    sudo echo "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free" >> /etc/apt/sources.list
    sudo apt-get update
    sudo dpkg -i <data adapter>.deb
    sudo apt-get -f install
    Usage: mxs_adapter [OPTION]... DATABASE TABLE
    
     -f FILE      TSV file with database and table names to stream (must be in `database TAB table NEWLINE` format)
      -h HOST      MaxScale host (default: 127.0.0.1)
      -P PORT      Port number where the CDC service listens (default: 4001)
      -u USER      Username for the MaxScale CDC service (default: admin)
      -p PASSWORD  Password of the user (default: mariadb)
      -c CONFIG    Path to the Columnstore.xml file (default: '/usr/local/mariadb/columnstore/etc/Columnstore.xml')
      -a           Automatically create tables on ColumnStore
      -z           Transform CDC data stream from historical data to current data (implies -n)
      -s           Directory used to store the state files (default: '/var/lib/mxs_adapter')
      -r ROWS      Number of events to group for one bulk load (default: 1)
      -t TIME      Connection timeout (default: 10)
      -n           Disable metadata generation (timestamp, GTID, event type)
      -i TIME      Flush data every TIME seconds (default: 5)
      -l FILE      Log output to FILE instead of stdout
      -v           Print version and exit
      -d           Enable verbose debug output
    test	t1
    test	t2
    maxadmin call command cdc add_user <service> <user> <password>
    mxs_adapter -u <user> -p <password> -h <host> -P <port> -c <path to Columnstore.xml> <database> <table>
    sudo apt-get install libuv1 libxml2 libsnappy1v5
    sudo yum install epel-release
    sudo yum install libuv libxml2 snappy

    ColumnStore Bulk Data Loading

    Overview

    cpimport is a high-speed bulk load utility that imports data into ColumnStore tables in a fast and efficient manner. It accepts as input any flat file containing a delimiter between fields of data (i.e., columns in a table). The default delimiter is the pipe (‘|’) character, but other delimiters such as commas may be used as well. The data values must be in the same order as the CREATE TABLE statement, i.e., column 1 matches the first column in the table and so on. Date values must be specified in the format 'yyyy-mm-dd'.

    cpimport performs the following operations when importing data into a MariaDB ColumnStore database:

    • Data is read from specified flat files.

    • Data is transformed to fit ColumnStore’s column-oriented storage design.

    • Redundant data is tokenized and logically compressed.

    • Data is written to disk.

    It is important to note that:

    • The bulk loads are an append operation to a table, so they allow existing data to be read and remain unaffected during the process.

    • The bulk loads do not write their data operations to the transaction log; they are not transactional in nature but are considered an atomic operation at this time. Information markers, however, are placed in the transaction log so the DBA is aware that a bulk operation did occur.

    • Upon completion of the load operation, a high-water mark in each column file is moved in an atomic operation that allows any subsequent queries to read the newly loaded data. This append operation provides for a consistent read but does not incur the overhead of logging the data.

    There are two primary steps to using the cpimport utility:

    1. Optionally create a job file that is used to load data from a flat file into multiple tables.

    2. Run the cpimport utility to perform the data import.

    Syntax

    The simplest form of cpimport command is

    The full syntax is like this:

    cpimport modes

    Mode 1: Bulk Load from a central location with single data source file

    In this mode, you run the cpimport from your primary node (mcs1). The source file is located at this primary location and the data from cpimport is distributed across all the nodes. If no mode is specified, then this is the default.

    Example:
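
    A sketch of a mode 1 invocation (database, table, and file names are illustrative):

    cpimport -m1 mytest mytable mytable.tbl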

    Mode 2: Bulk load from central location with distributed data source files

    In this mode, you run cpimport from your primary node (mcs1). The source data is in already partitioned data files residing on the PMs. Each PM should have a source data file of the same name, containing the partitioned data for that PM.

    Example:
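
    A sketch of a mode 2 invocation (the -l path refers to the file as it exists on each PM; names are illustrative):

    cpimport -m2 mytest mytable -l /home/mydata/mytable.tbl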

    Mode 3: Parallel distributed bulk load

    In this mode, you run cpimport from the individual nodes independently, which will import the source file that exists on that node. Concurrent imports can be executed on every node for the same table.

    Example:
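
    A sketch of a mode 3 invocation, run locally on each node (names are illustrative):

    cpimport -m3 mytest mytable /home/mydata/mytable.tbl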

    Note:

    • The bulk loads are an append operation to a table, so they allow existing data to be read and remain unaffected during the process.

    • The bulk loads do not write their data operations to the transaction log; they are not transactional in nature but are considered an atomic operation at this time. Information markers, however, are placed in the transaction log so the DBA is aware that a bulk operation did occur.

    • Upon completion of the load operation, a high-water mark in each column file is moved in an atomic operation that allows any subsequent queries to read the newly loaded data. This append operation provides for a consistent read but does not incur the overhead of logging the data.

    Bulk loading data from STDIN

    Data can be loaded from STDIN into ColumnStore by simply not including the loadFile parameter

    Example:

    Bulk loading from AWS S3

    Similarly the AWS cli utility can be utilized to read data from an s3 bucket and pipe the output into cpimport allowing direct loading from S3. This assumes the aws cli program has been installed and configured on the host:

    Example:

    For troubleshooting connectivity problems, remove the --quiet option, which suppresses client logging, including permission errors.

    Bulk loading output of SELECT FROM Table(s)

    Standard input can also be used to directly pipe the output from an arbitrary SELECT statement into cpimport. The SELECT statement may select from non-ColumnStore tables such as InnoDB or MyISAM. In the example below, db2.source_table is selected from, using the -N flag to remove non-data formatting. The -q flag tells the mariadb client not to cache results, which avoids possible timeouts causing the load to fail.

    Example:

    Bulk loading from JSON

    Let's create a sample ColumnStore table:

    Now let's create a sample products.json file like this:

    We can then bulk load data from JSON into ColumnStore by first piping the data to jq and then to cpimport, using a one-line command.

    Example:

    In this example, the JSON data comes from a static JSON file, but the same method will work for output streamed from any data source that produces JSON, such as an API or a NoSQL database. For more information on jq, please view its manual.

    Bulk loading into multiple tables

    There are two ways multiple tables can be loaded:

    1. Run multiple cpimport jobs simultaneously. Tables per import should be unique, or the PMs for each import should be unique if using mode 3.

    2. Use the colxml utility: colxml creates an XML job file for your database schema before you can import data. Multiple tables may be imported by either importing all tables within a schema or listing specific tables using the -t option in colxml. Then run cpimport, using the job file generated by colxml. Here is an example of how to use colxml and cpimport to import data into all the tables in a database schema.

    colxml syntax

    Example usage of colxml

    The following tables comprise a database named ‘tpch2’:

    1. First, put the delimited input data file for each table in /usr/local/mariadb/columnstore/data/bulk/data/import. Each file should be named <tablename>.tbl.

    2. Run colxml for the load job for the ‘tpch2’ database as shown here:

    3. Now run cpimport, using the job file generated by the colxml execution:

    Handling Differences in Column Order and Values

    If there are some differences between the input file and table definition then the colxml utility can be utilized to handle these cases:

    • Different order of columns in the input file from table order

    • Input file column values to be skipped / ignored.

    • Target table columns to be defaulted.

    In this case, run the colxml utility (the -t argument can be useful for producing a job file for one table, if preferred) to produce the job XML file, then use this as a template for editing, and subsequently use that job file for running cpimport.

    Consider the following simple table example:

    This would produce a colxml file with the following table element:

    If your input file had the data such that hire_date comes before salary then the following modification will allow correct loading of that data to the original table definition (note the last 2 Column elements are swapped):

    The following example would ignore the last entry in the file and default salary to its default value (in this case, NULL):

    • IgnoreFields instructs cpimport to ignore and skip the particular value at that position in the file.

    • DefaultColumn instructs cpimport to default the current table column and not move the column pointer forward to the next delimiter.

    Both instructions can be used independently and as many times as makes sense for your data and table definition.

    Binary Source Import

    It is possible to import using a binary file instead of a CSV file, using fixed-length rows in binary data. This can be done using the -I flag, which has two modes:

    • -I1: binary mode with NULLs accepted. Numeric fields containing NULL will be treated as NULL unless the column has a default value.

    • -I2: binary mode with NULLs saturated. NULLs in numeric fields will be saturated.

    The following shows how to represent the data in the binary format:

    • INT/TINYINT/SMALLINT/BIGINT: Little-endian format for the numeric data

    • FLOAT/DOUBLE: IEEE format native to the computer

    • CHAR/VARCHAR: Data padded with '\0' for the length of the field. An entry that is all '\0' is treated as NULL

    • DATE: Using the Date struct below

    • DATETIME: Using the DateTime struct below

    • DECIMAL: Stored using an integer representation of the DECIMAL without the decimal point. With precision/width of 2 or less, 2 bytes should be used; 3-4 should use 3 bytes; 4-9 should use 4 bytes; and 10+ should use 8 bytes

    For NULL values, the following representations should be used (signed NULL / unsigned NULL):

    • BIGINT: 0x8000000000000000ULL / 0xFFFFFFFFFFFFFFFEULL

    • INT: 0x80000000 / 0xFFFFFFFE

    • SMALLINT: 0x8000 / 0xFFFE

    • TINYINT: 0x80 / 0xFE

    • DECIMAL: as the equivalent-sized integer (signed and unsigned)

    • FLOAT: 0xFFAAAAAA (unsigned: N/A)

    • DOUBLE: 0xFFFAAAAAAAAAAAAAULL (unsigned: N/A)

    • DATE: 0xFFFFFFFE (unsigned: N/A)

    • DATETIME: 0xFFFFFFFFFFFFFFFEULL (unsigned: N/A)

    • CHAR/VARCHAR: fill with '\0' (unsigned: N/A)

    Date Struct

    The layout is the struct Date shown earlier in this document; the spare bits in the Date struct must be set to 0x3E.

    DateTime Struct

    The layout is the struct DateTime shown earlier in this document.

    Working Folders & Logging

    As of version 1.4, cpimport uses the /var/lib/columnstore/bulk folder for all work being done. This folder contains:

    1. Logs

    2. Rollback info

    3. Job info

    4. A staging folder

    The log folder typically contains:

    A typical log might look like this:

    Prior to version 1.4, this folder was located at /usr/local/mariadb/columnstore/bulk.


    cpimport dbName tblName [loadFile]
    cpimport dbName tblName [loadFile]
    [-h] [-m mode] [-f filepath] [-d DebugLevel]
    [-c readBufferSize] [-b numBuffers] [-r numReaders]
    [-e maxErrors] [-B libBufferSize] [-s colDelimiter] [-E EnclosedByChar]
    [-C escChar] [-j jobID] [-p jobFilePath] [-w numParsers]
    [-n nullOption] [-P pmList] [-i] [-S] [-q batchQty]
    
    positional parameters:
    	dbName     Name of the database to load
    	tblName    Name of table to load
    	loadFile   Optional input file name in current directory,
    			unless a fully qualified name is given.
    			If not given, input read from STDIN.
    Options:
    	-b	Number of read buffers
    	-c	Application read buffer size(in bytes)
    	-d	Print different level(1-3) debug message
    	-e	Max number of allowable error per table per PM
    	-f	Data file directory path.
    			Default is current working directory.
    			In Mode 1, -f represents the local input file path.
    			In Mode 2, -f represents the PM based input file path.
    			In Mode 3, -f represents the local input file path.
    	-l	Name of import file to be loaded, relative to -f path. (Cannot be used with -p)
    	-h	Print this message.
    	-q	Batch Quantity, Number of rows distributed per batch in Mode 1
    	-i	Print extended info to console in Mode 3.
    	-j	Job ID. In simple usage, default is the table OID.
    			unless a fully qualified input file name is given.
    	-n	NullOption (0-treat the string NULL as data (default);
    			1-treat the string NULL as a NULL value)
    	-p	Path for XML job description file.
    	-r	Number of readers.
    	-s	The delimiter between column values.
    	-B	I/O library read buffer size (in bytes)
    	-w	Number of parsers.
    	-E	Enclosed by character if field values are enclosed.
    	-C	Escape character used in conjunction with 'enclosed by'
    			character, or as part of NULL escape sequence ('\N');
    			default is '\'
    	-I	Import binary data; how to treat NULL values:
    			1 - import NULL values
    			2 - saturate NULL values
    	-P	List of PMs ex: -P 1,2,3. Default is all PMs.
    	-S	Treat string truncations as errors.
    	-m	mode
    			1 - rows will be loaded in a distributed manner across PMs.
    			2 - PM based input files loaded onto their respective PM.
    			3 - input files will be loaded on the local PM.
    cpimport -m1 mytest mytable mytable.tbl
    cpimport -m2 mytest mytable -l /home/mydata/mytable.tbl
    cpimport -m3 mytest mytable /home/mydata/mytable.tbl
    cpimport db1 table1
    aws s3 cp --quiet s3://dthompson-test/trades_bulk.csv - | cpimport test trades -s ","
    mariadb -q -e 'select * from source_table;' -N <source-db> | cpimport -s '\t' <target-db> <target-table>
    CREATE DATABASE `json_columnstore`;
    
    USE `json_columnstore`;
    
    CREATE TABLE `products` (
      `product_name` VARCHAR(11) NOT NULL DEFAULT '',
      `supplier` VARCHAR(128) NOT NULL DEFAULT '',
      `quantity` VARCHAR(128) NOT NULL DEFAULT '',
      `unit_cost` VARCHAR(128) NOT NULL DEFAULT ''
    ) ENGINE=Columnstore DEFAULT CHARSET=utf8;
    [{
      "_id": {
        "$oid": "5968dd23fc13ae04d9000001"
      },
      "product_name": "Sildenafil Citrate",
      "supplier": "Wisozk Inc",
      "quantity": 261,
      "unit_cost": "$10.47"
    }, {
      "_id": {
        "$oid": "5968dd23fc13ae04d9000002"
      },
      "product_name": "Mountain Juniperus Ashei",
      "supplier": "Keebler-Hilpert",
      "quantity": 292,
      "unit_cost": "$8.74"
    }, {
      "_id": {
        "$oid": "5968dd23fc13ae04d9000003"
      },
      "product_name": "Dextromethorphan HBR",
      "supplier": "Schmitt-Weissnat",
      "quantity": 211,
      "unit_cost": "$20.53"
    }]
    cat products.json | jq -r '.[] | [.product_name,.supplier,.quantity,.unit_cost] | @csv' | cpimport json_columnstore products -s ',' -E '"'
    colxml mytest -j299
    cpimport -m1 -j299
    Usage: colxml [options] dbName
    
    Options: 
       -d Delimiter (default '|')
       -e Maximum allowable errors (per table)
       -h Print this message
       -j Job id (numeric)
       -l Load file name
       -n "name in quotes"
       -p Path for XML job description file that is generated
       -s "Description in quotes"
       -t Table name
       -u User
       -r Number of read buffers
       -c Application read buffer size (in bytes)
       -w I/O library buffer size (in bytes), used to read files
       -x Extension of file name (default ".tbl")
       -E EnclosedByChar (if data has enclosed values)
       -C EscapeChar
       -b Debug level (1-3)
    MariaDB [tpch2]> show tables;
    +-----------------+
    | Tables_in_tpch2 |
    +-----------------+
    | customer        |
    | lineitem        |
    | nation          |
    | orders          |
    | part            |
    | partsupp        |
    | region          |
    | supplier        |
    +-----------------+
    8 rows in set (0.00 sec)
    /usr/local/mariadb/columnstore/bin/colxml tpch2 -j500
    Running colxml with the following parameters:
    2015-10-07 15:14:20 (9481) INFO :
    Schema: tpch2
    Tables:
    Load Files:
    -b 0
    -c 1048576
    -d |
    -e 10
    -j 500
    -n
    -p /usr/local/mariadb/columnstore/data/bulk/job/
    -r 5
    -s
    -u
    -w 10485760
    -x tbl
    File completed for tables:
    tpch2.customer
    tpch2.lineitem
    tpch2.nation
    tpch2.orders
    tpch2.part
    tpch2.partsupp
    tpch2.region
    tpch2.supplier
    Normal exit.
    /usr/local/mariadb/columnstore/bin/cpimport -j 500
    Bulkload root directory : /usr/local/mariadb/columnstore/data/bulk
    job description file : Job_500.xml
    2015-10-07 15:14:59 (9952) INFO : successfully load job file /usr/local/mariadb/columnstore/data/bulk/job/Job_500.xml
    2015-10-07 15:14:59 (9952) INFO : PreProcessing check starts
    2015-10-07 15:15:04 (9952) INFO : PreProcessing check completed
    2015-10-07 15:15:04 (9952) INFO : preProcess completed, total run time : 5 seconds
    2015-10-07 15:15:04 (9952) INFO : No of Read Threads Spawned = 1
    2015-10-07 15:15:04 (9952) INFO : No of Parse Threads Spawned = 3
    2015-10-07 15:15:06 (9952) INFO : For table tpch2.customer: 150000 rows processed and 150000 rows inserted.
    2015-10-07 15:16:12 (9952) INFO : For table tpch2.nation: 25 rows processed and 25 rows inserted.
    2015-10-07 15:16:12 (9952) INFO : For table tpch2.lineitem: 6001215 rows processed and 6001215 rows inserted.
    2015-10-07 15:16:31 (9952) INFO : For table tpch2.orders: 1500000 rows processed and 1500000 rows inserted.
    2015-10-07 15:16:33 (9952) INFO : For table tpch2.part: 200000 rows processed and 200000 rows inserted.
    2015-10-07 15:16:44 (9952) INFO : For table tpch2.partsupp: 800000 rows processed and 800000 rows inserted.
    2015-10-07 15:16:44 (9952) INFO : For table tpch2.region: 5 rows processed and 5 rows inserted.
    2015-10-07 15:16:45 (9952) INFO : For table tpch2.supplier: 10000 rows processed and 10000 rows inserted.
    CREATE TABLE emp (
      emp_id INT,
      dept_id INT,
      name VARCHAR(30),
      salary INT,
      hire_date DATE
    ) ENGINE=columnstore;
    <Table tblName="test.emp" 
          loadName="emp.tbl" maxErrRow="10">
       <Column colName="emp_id"/>
       <Column colName="dept_id"/>
       <Column colName="name"/>
       <Column colName="salary"/>
       <Column colName="hire_date"/>
     </Table>
    <Table tblName="test.emp" 
          loadName="emp.tbl" maxErrRow="10">
       <Column colName="emp_id"/>
       <Column colName="dept_id"/>
       <Column colName="name"/>
       <Column colName="hire_date"/>
       <Column colName="salary"/>
     </Table>
    <Table tblName="test.emp"        
               loadName="emp.tbl" maxErrRow="10">
          <Column colName="emp_id"/>
          <Column colName="dept_id"/>
          <Column colName="name"/>
          <Column colName="hire_date"/>
          <IgnoreField/>
          <DefaultColumn colName="salary"/>
        </Table>
    Example
    cpimport -I1 mytest mytable /home/mydata/mytable.bin
    // Bit layout of a packed DATE value (32 bits in total)
    struct Date
    {
      unsigned spare : 6;
      unsigned day : 6;
      unsigned month : 4;
      unsigned year : 16;
    };
    // Bit layout of a packed DATETIME value (64 bits in total)
    struct DateTime
    {
      unsigned msecond : 20;
      unsigned second : 6;
      unsigned minute : 6;
      unsigned hour : 6;
      unsigned day : 6;
      unsigned month : 4;
      unsigned year : 16;
    };
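    For illustration, the following minimal Python sketch shows how DATE and DATETIME values could be packed into the layouts above for use with cpimport -I1. It is not part of the ColumnStore tooling: the helper names and output file are hypothetical, and it assumes the bit fields are packed least-significant-bit first (as GCC does on x86), so verify the byte layout against your build before relying on it.

    # Hypothetical helpers that pack values into the DATE/DATETIME bit layouts shown above.
    # Assumption: the first field declared in the struct occupies the lowest-order bits.
    import struct

    def pack_date(year, month, day, spare=0):
        # spare(6) | day(6) | month(4) | year(16) = 32 bits
        value = (spare & 0x3F) | (day << 6) | (month << 12) | (year << 16)
        return struct.pack("<I", value)

    def pack_datetime(year, month, day, hour, minute, second, msecond=0):
        # msecond(20) | second(6) | minute(6) | hour(6) | day(6) | month(4) | year(16) = 64 bits
        value = (msecond | (second << 20) | (minute << 26) | (hour << 32)
                 | (day << 38) | (month << 42) | (year << 46))
        return struct.pack("<Q", value)

    # Example: write one binary row for a hypothetical table defined as (d DATE, dt DATETIME).
    with open("dates.bin", "wb") as f:
        f.write(pack_date(2020, 12, 29) + pack_datetime(2020, 12, 29, 6, 41, 44))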
    -rw-r--r--. 1 root  root        0 Dec 29 06:41 cpimport_1229064143_21779.err
    -rw-r--r--. 1 root  root     1146 Dec 29 06:42 cpimport_1229064143_21779.log
    2020-12-29 06:41:44 (21779) INFO : Running distributed import (mode 1) on all PMs...
    2020-12-29 06:41:44 (21779) INFO2 : /usr/bin/cpimport.bin -s , -E " -R /tmp/columnstore_tmp_files/BrmRpt112906414421779.rpt -m 1 -P pm1-21779 -T SYSTEM -u388952c1-4ab8-46d6-9857-c44827b1c3b9 bts flights
    2020-12-29 06:41:58 (21779) INFO2 : Received a BRM-Report from 1
    2020-12-29 06:41:58 (21779) INFO2 : Received a Cpimport Pass from PM1
    2020-12-29 06:42:03 (21779) INFO2 : Received a BRM-Report from 2
    2020-12-29 06:42:03 (21779) INFO2 : Received a Cpimport Pass from PM2
    2020-12-29 06:42:03 (21779) INFO2 : Received a BRM-Report from 3
    2020-12-29 06:42:03 (21779) INFO2 : BRM updated successfully
    2020-12-29 06:42:03 (21779) INFO2 : Received a Cpimport Pass from PM3
    2020-12-29 06:42:04 (21779) INFO2 : Released Table Lock
    2020-12-29 06:42:04 (21779) INFO2 : Cleanup succeed on all PMs
    2020-12-29 06:42:04 (21779) INFO : For table bts.flights: 374573 rows processed and 374573 rows inserted.
    2020-12-29 06:42:04 (21779) INFO : Bulk load completed, total run time : 20.3052 seconds
    2020-12-29 06:42:04 (21779) INFO2 : Shutdown of all child threads Finished!!

    Data Loading With load_from_s3

    Overview

    MariaDB Enterprise ColumnStore includes a stored procedure called columnstore_info.load_from_s3, which can load data from a plain-text file containing delimiter-separated values (such as CSV or TSV) stored on S3-compatible cloud object storage.

    Compatibility

    Available from MariaDB Enterprise ColumnStore 23.02.

    System Variables

    Before you import data with the columnstore_info.load_from_s3 stored procedure, configure the authentication credentials and the region using the following system variables:

    • columnstore_s3_key

    • columnstore_s3_secret

    • columnstore_s3_region

    For example, the following statements show how to set the system variables for your current session:

    SET columnstore_s3_key='S3_KEY';
    SET columnstore_s3_secret='S3_SECRET';
    SET columnstore_s3_region='S3_REGION';

    Import Data

    To import data with the columnstore_info.load_from_s3 stored procedure, use the CALL statement:

    CALL columnstore_info.load_from_s3('BUCKET_URL',
          'FILE_NAME',
          'DATABASE_NAME',
          'TABLE_NAME',
          'TERMINATED_BY',
          'ENCLOSED_BY',
          'ESCAPED_BY');

    • Replace 'BUCKET_URL' with the URL of your bucket. The protocol in the URL must be s3: for AWS S3 or gs: for Google Cloud Storage

    • Replace 'FILE_NAME' with the file name to load from. The file must be a plain-text file containing delimiter-separated values, such as a comma-separated values (CSV) or tab-separated values (TSV) file. The supported file format is similar to the plain-text file formats supported by cpimport and LOAD DATA [LOCAL] INFILE. Please note that this stored procedure can't load dump files created by mariadb-dump

    • Replace 'DATABASE_NAME' with the database to import into

    • Replace 'TABLE_NAME' with the table name to import into

    • Replace 'TERMINATED_BY' with the field terminator used in the file, similar to the -s command-line option for cpimport

    • Replace 'ENCLOSED_BY' with the quotes used in the file, similar to the -E command-line option for cpimport

    • Replace 'ESCAPED_BY' with the escape character used in the file, similar to the -C command-line option for cpimport

    All parameters are mandatory.

    For example, to load a comma-separated values (CSV) file from AWS S3:

    CALL columnstore_info.load_from_s3('s3://mariadb-columnstore-test-data/',
       'test-data-db1-tab1.csv',
       'db1',
       'tab1',
       ',',
       '"',
       '\\');

    When the stored procedure completes, it returns JSON containing the status of the operation. If the JSON shows an error or "success": false, check your table to see if some or all of your data was loaded, because many errors are non-fatal.

    Permissions

    When the data file is stored in Amazon S3, the AWS user only requires the s3:GetObject action on the bucket.

    For example, the AWS user can use a user policy like the following:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "readBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::my-bucket",
                    "arn:aws:s3:::my-bucket/*"
                ]
            }
        ]
    }

    This page is: Copyright © 2025 MariaDB. All rights reserved.
