Bulk Data Loading

You are viewing an old version of this article. View the current version here.

Overview

cpimport is a high-speed bulk load utility that imports data into ColumnStore tables in a fast and efficient manner. It accepts as input any flat file containing data that contains a delimiter between fields of data (i.e. columns in a table). The default delimiter is the pipe (‘|’) character, but other delimiters such as commas may be used as well. cpimport – performs the following operations when importing data into an MariaDB ColumnStore database:

  • Data is read from specified flat files
  • Data is transformed to fit InfiniDB’s column-oriented storage design
  • Redundant data is tokenized and logically compressed
  • Data is written to disk

It is important to note that:

  • The bulk loads are an append operation to a table so they allow existing data to be read and remain unaffected during the process.
  • The bulk loads do not write their data operations to the transaction log; they are not transactional in nature but are considered an atomic operation at this time. Information markers, however, are placed in the transaction log so the DBA is aware that a bulk operation did occur.
  • Upon completion of the load operation, a high water mark in each column file is moved in an atomic operation that allows for any subsequent queries to read the newly loaded data. This append operation provides for consistent read but does not incur the overhead of logging the data.

There are two primary steps to using the cpimport utility:

  1. Optionally create a job file that is used to load data from a flat file into multiple tables
  2. Run the cpimport utility to perform the data import

Syntax

The simplest form of cpimport command is

cpimport dbName tblName [loadFile]

The full syntax is like this:

cpimport dbName tblName [loadFile]
[-h] [-m mode] [-f filepath] [-d DebugLevel]
[-c readBufferSize] [-b numBuffers] [-r numReaders]
[-e maxErrors] [-B libBufferSize] [-s colDelimiter] [-E EnclosedByChar]
[-C escChar] [-j jobID] [-p jobFilePath] [-w numParsers]
[-n nullOption] [-P pmList] [-i] [-S] [-q batchQty]

cpimport Modes

Mode 1: Bulk Load from a central location with Single Data Source File

In this mode, you run the cpimport from a central location. The source file is located at this central location and the data from cpimport is distributed across all the PM nodes.

cpimport-mode1

Example
cpimport -m1 mytest mytable mytable.tbl

Mode 2: Bulk Load from central location with Distributed Data Source Files

In this mode, you run the cpimport from a central location(either UM or PM). The source data is in already partitioned data files residing on the PMs. Each PM should have the source data file of the same name but containing the partitioned data for the PM

cpimport-mode2

Example
cpimport -m2 mytest mytable /home/mydata/mytable.tbl

Mode 3: Parallel Distributed Bulk Load

In this mode, you run the cpimport from individual PM node independently, which will import the source file that exists on that PM. Concurrent imports can be executed on every PM for the same table.

cpimport-mode3

Example
cpimport -m3 mytest mytable /home/mydata/mytable.tbl

Note:

  • The bulk loads are an append operation to a table so they allow existing data to be read and remain unaffected during the process.
  • The bulk loads do not write their data operations to the transaction log; they are not transactional in nature but are considered an atomic operation at this time. Information markers, however, are placed in the transaction log so the DBA is aware that a bulk operation did occur.
  • Upon completion of the load operation, a high water mark in each column file is moved in an atomic operation that allows for any subsequent queries to read the newly loaded data. This append operation provides for consistent read but does not incur the overhead of logging the data.

There are two primary steps to using the cpimport utility:

  1. Optionally create a job file that is used to load data from a flat file into multiple tables
  2. Run the cpimport utility to perform the data import

STDIN

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.