Bulk Data Loading
cpimport is a high-speed bulk load utility that imports data into ColumnStore tables quickly and efficiently. It accepts as input any flat file in which the fields of data (i.e. columns in a table) are separated by a delimiter. The default delimiter is the pipe (‘|’) character, but other delimiters, such as commas, may be used as well (see the examples after the list below). cpimport performs the following operations when importing data into a MariaDB ColumnStore database:
- Data is read from specified flat files
- Data is transformed to fit ColumnStore’s column-oriented storage design
- Redundant data is tokenized and logically compressed
- Data is written to disk
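For example, a basic import might look like the following sketch. The database name (mydb), table name (mytable), and file paths are placeholders, and the -s option shown for overriding the field delimiter should be verified against the cpimport help output for your release.

```sh
# Load a pipe-delimited flat file into mydb.mytable (default delimiter).
cpimport mydb mytable /tmp/mytable.tbl

# Load a comma-delimited file by specifying the delimiter explicitly.
cpimport mydb mytable /tmp/mytable.csv -s ','
```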
The two most common ways to use cpimport are: 1) from the UM, where cpimport distributes the rows across all Performance Modules; and 2) from a PM, where cpimport loads the imported rows only on the PM from which it was invoked.
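As a rough sketch, the command itself can be identical in both cases; what differs is the node on which it is run (the names below are placeholders):

```sh
# Run on the UM: the rows are distributed across all Performance Modules.
cpimport mydb mytable /tmp/mytable.tbl

# Run on a single PM: the rows are loaded only on that PM, so the source
# file must be present locally on that module.
cpimport mydb mytable /tmp/mytable.tbl
```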
There are two primary steps to using the cpimport utility:
- Optionally create a job file that is used to load data from a flat file into multiple tables
- Run the cpimport utility to perform the data import (a sketch of both steps follows this list)
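A minimal sketch of the job-file workflow is shown below. It assumes the colxml helper utility and the -j job-id option behave as typically documented; the database name and job number are placeholders and should be adapted to your environment.

```sh
# Step 1: generate a job XML file describing the tables to be loaded
# from flat files for the mydb database.
colxml mydb -j 500

# Step 2: run cpimport against that job file to perform the import.
cpimport -j 500
```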
Note:
- Bulk loads are an append operation on a table, so existing data can be read and remains unaffected while the load is in progress.
- The bulk loads do not write their data operations to the transaction log; they are not transactional in nature but are considered an atomic operation at this time. Information markers, however, are placed in the transaction log so the DBA is aware that a bulk operation did occur.
- Upon completion of the load operation, a high water mark in each column file is moved in an atomic operation, allowing any subsequent queries to read the newly loaded data. This append operation provides consistent reads without incurring the overhead of logging the data.
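As a quick illustration of that visibility behavior (a sketch; database, table, and file names are placeholders), a row count taken immediately after cpimport completes already includes the newly appended rows:

```sh
cpimport mydb mytable /tmp/mytable.tbl
mysql mydb -e "SELECT COUNT(*) FROM mytable;"
```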