Data ingestion options for MariaDB ColumnStore: cpimport for fast bulk loads, LOAD DATA INFILE, batch insert mode, INSERT INTO .. SELECT, DML, plus Bulk Write SDK and streaming adapters.
sudo yum -y install epel-release
sudo yum -y install <data adapter>.rpmsudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f installsudo echo "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free" >> /etc/apt/sources.list
sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f installUsage: mxs_adapter [OPTION]... DATABASE TABLE
-f FILE TSV file with database and table names to stream (must be in `database TAB table NEWLINE` format)
-h HOST MaxScale host (default: 127.0.0.1)
-P PORT Port number where the CDC service listens (default: 4001)
-u USER Username for the MaxScale CDC service (default: admin)
-p PASSWORD Password of the user (default: mariadb)
-c CONFIG Path to the Columnstore.xml file (default: '/usr/local/mariadb/columnstore/etc/Columnstore.xml')
-a Automatically create tables on ColumnStore
-z Transform CDC data stream from historical data to current data (implies -n)
-s Directory used to store the state files (default: '/var/lib/mxs_adapter')
-r ROWS Number of events to group for one bulk load (default: 1)
-t TIME Connection timeout (default: 10)
-n Disable metadata generation (timestamp, GTID, event type)
-i TIME Flush data every TIME seconds (default: 5)
-l FILE Log output to FILE instead of stdout
-v Print version and exit
-d Enable verbose debug outputtest t1
test t2maxadmin call command cdc add_user <service> <user> <password>mxs_adapter -u <user> -p <password> -h <host> -P <port> -c <path to Columnstore.xml> <database><table>sudo apt-get install libuv1 libxml2 libsnappy1v5sudo yum install epel-release
sudo yum install libuv libxml2 snappy



cpimport dbName tblName [loadFile]cpimport dbName tblName [loadFile]
[-h] [-m mode] [-f filepath] [-d DebugLevel]
[-c readBufferSize] [-b numBuffers] [-r numReaders]
[-e maxErrors] [-B libBufferSize] [-s colDelimiter] [-E EnclosedByChar]
[-C escChar] [-j jobID] [-p jobFilePath] [-w numParsers]
[-n nullOption] [-P pmList] [-i] [-S] [-q batchQty]
positional parameters:
dbName Name of the database to load
tblName Name of table to load
loadFile Optional input file name in current directory,
unless a fully qualified name is given.
If not given, input read from STDIN.
Options:
-b Number of read buffers
-c Application read buffer size(in bytes)
-d Print different level(1-3) debug message
-e Max number of allowable error per table per PM
-f Data file directory path.
Default is current working directory.
In Mode 1, -f represents the local input file path.
In Mode 2, -f represents the PM based input file path.
In Mode 3, -f represents the local input file path.
-l Name of import file to be loaded, relative to -f path. (Cannot be used with -p)
-h Print this message.
-q Batch Quantity, Number of rows distributed per batch in Mode 1
-i Print extended info to console in Mode 3.
-j Job ID. In simple usage, default is the table OID.
unless a fully qualified input file name is given.
-n NullOption (0-treat the string NULL as data (default);
1-treat the string NULL as a NULL value)
-p Path for XML job description file.
-r Number of readers.
-s The delimiter between column values.
-B I/O library read buffer size (in bytes)
-w Number of parsers.
-E Enclosed by character if field values are enclosed.
-C Escape character used in conjunction with 'enclosed by'
character, or as part of NULL escape sequence ('\N');
default is '\'
-I Import binary data; how to treat NULL values:
1 - import NULL values
2 - saturate NULL values
-P List of PMs ex: -P 1,2,3. Default is all PMs.
-S Treat string truncations as errors.
-m mode
1 - rows will be loaded in a distributed manner across PMs.
2 - PM based input files loaded onto their respective PM.
3 - input files will be loaded on the local PM.cpimport -m1 mytest mytable mytable.tblcpimport -m2 mytest mytable -l /home/mydata/mytable.tblcpimport -m3 mytest mytable /home/mydata/mytable.tblcpimport db1 table1aws s3 cp --quiet s3://dthompson-test/trades_bulk.csv - | cpimport test trades -s ","mariadb -q -e 'select * from source_table;' -N <source-db> | cpimport -s '\t' <target-db> <target-table>CREATE DATABASE `json_columnstore`;
USE `json_columnstore`;
CREATE TABLE `products` (
`product_name` VARCHAR(11) NOT NULL DEFAULT '',
`supplier` VARCHAR(128) NOT NULL DEFAULT '',
`quantity` VARCHAR(128) NOT NULL DEFAULT '',
`unit_cost` VARCHAR(128) NOT NULL DEFAULT ''
) ENGINE=Columnstore DEFAULT CHARSET=utf8;[{
"_id": {
"$oid": "5968dd23fc13ae04d9000001"
},
"product_name": "Sildenafil Citrate",
"supplier": "Wisozk Inc",
"quantity": 261,
"unit_cost": "$10.47"
}, {
"_id": {
"$oid": "5968dd23fc13ae04d9000002"
},
"product_name": "Mountain Juniperus Ashei",
"supplier": "Keebler-Hilpert",
"quantity": 292,
"unit_cost": "$8.74"
}, {
"_id": {
"$oid": "5968dd23fc13ae04d9000003"
},
"product_name": "Dextromethorphan HBR",
"supplier": "Schmitt-Weissnat",
"quantity": 211,
"unit_cost": "$20.53"
}]cat products.json | jq -r '.[] | [.product_name,.supplier,.quantity,.unit_cost] | @csv' | cpimport json_columnstore products -s ',' -E '"'colxml mytest -j299
cpimport -m1 -j299Usage: colxml [options] dbName
Options:
-d Delimiter (default '|')
-e Maximum allowable errors (per table)
-h Print this message
-j Job id (numeric)
-l Load file name
-n "name in quotes"
-p Path for XML job description file that is generated
-s "Description in quotes"
-t Table name
-u User
-r Number of read buffers
-c Application read buffer size (in bytes)
-w I/O library buffer size (in bytes), used to read files
-x Extension of file name (default ".tbl")
-E EnclosedByChar (if data has enclosed values)
-C EscapeChar
-b Debug level (1-3)MariaDB[tpch2]> show tables;
+---------------+
| Tables_in_tpch2 |
+--------------+
| customer |
| lineitem |
| nation |
| orders |
| part |
| partsupp |
| region |
| supplier |
+--------------+
8 rows in set (0.00 sec)/usr/local/mariadb/columnstore/bin/colxml tpch2 -j500
Running colxml with the following parameters:
2015-10-07 15:14:20 (9481) INFO :
Schema: tpch2
Tables:
Load Files:
-b 0
-c 1048576
-d |
-e 10
-j 500
-n
-p /usr/local/mariadb/columnstore/data/bulk/job/
-r 5
-s
-u
-w 10485760
-x tbl
File completed for tables:
tpch2.customer
tpch2.lineitem
tpch2.nation
tpch2.orders
tpch2.part
tpch2.partsupp
tpch2.region
tpch2.supplier
Normal exit./usr/local/mariadb/columnstore/bin/cpimport -j 500
Bulkload root directory : /usr/local/mariadb/columnstore/data/bulk
job description file : Job_500.xml
2015-10-07 15:14:59 (9952) INFO : successfully load job file /usr/local/mariadb/columnstore/data/bulk/job/Job_500.xml
2015-10-07 15:14:59 (9952) INFO : PreProcessing check starts
2015-10-07 15:15:04 (9952) INFO : PreProcessing check completed
2015-10-07 15:15:04 (9952) INFO : preProcess completed, total run time : 5 seconds
2015-10-07 15:15:04 (9952) INFO : No of Read Threads Spawned = 1
2015-10-07 15:15:04 (9952) INFO : No of Parse Threads Spawned = 3
2015-10-07 15:15:06 (9952) INFO : For table tpch2.customer: 150000 rows processed and 150000 rows inserted.
2015-10-07 15:16:12 (9952) INFO : For table tpch2.nation: 25 rows processed and 25 rows inserted.
2015-10-07 15:16:12 (9952) INFO : For table tpch2.lineitem: 6001215 rows processed and 6001215 rows inserted.
2015-10-07 15:16:31 (9952) INFO : For table tpch2.orders: 1500000 rows processed and 1500000 rows inserted.
2015-10-07 15:16:33 (9952) INFO : For table tpch2.part: 200000 rows processed and 200000 rows inserted.
2015-10-07 15:16:44 (9952) INFO : For table tpch2.partsupp: 800000 rows processed and 800000 rows inserted.
2015-10-07 15:16:44 (9952) INFO : For table tpch2.region: 5 rows processed and 5 rows inserted.
2015-10-07 15:16:45 (9952) INFO : For table tpch2.supplier: 10000 rows processed and 10000 rows inserted.CREATE TABLE emp (
emp_id INT,
dept_id INT,
name VARCHAR(30),
salary INT,
hire_date DATE) ENGINE=columnstore;<Table tblName="test.emp"
loadName="emp.tbl" maxErrRow="10">
<Column colName="emp_id"/>
<Column colName="dept_id"/>
<Column colName="name"/>
<Column colName="salary"/>
<Column colName="hire_date"/>
</Table><Table tblName="test.emp"
loadName="emp.tbl" maxErrRow="10">
<Column colName="emp_id"/>
<Column colName="dept_id"/>
<Column colName="name"/>
<Column colName="hire_date"/>
<Column colName="salary"/>
</Table><Table tblName="test.emp"
loadName="emp.tbl" maxErrRow="10">
<Column colName="emp_id"/>
<Column colName="dept_id"/>
<Column colName="name"/>
<Column colName="hire_date"/>
<IgnoreField/>
<DefaultColumn colName="salary"/>
</Table>Example
cpimport -I1 mytest mytable /home/mydata/mytable.binstruct Date
{
unsigned spare : 6;
unsigned day : 6;
unsigned month : 4;
unsigned year : 16
};struct DateTime
{
unsigned msecond : 20;
unsigned second : 6;
unsigned minute : 6;
unsigned hour : 6;
unsigned day : 6;
unsigned month : 4;
unsigned year : 16
};-rw-r--r--. 1 root root 0 Dec 29 06:41 cpimport_1229064143_21779.err
-rw-r--r--. 1 root root 1146 Dec 29 06:42 cpimport_1229064143_21779.log2020-12-29 06:41:44 (21779) INFO : Running distributed import (mode 1) on all PMs...
2020-12-29 06:41:44 (21779) INFO2 : /usr/bin/cpimport.bin -s , -E " -R /tmp/columnstore_tmp_files/BrmRpt112906414421779.rpt -m 1 -P pm1-21779 -T SYSTEM -u388952c1-4ab8-46d6-9857-c44827b1c3b9 bts flights
2020-12-29 06:41:58 (21779) INFO2 : Received a BRM-Report from 1
2020-12-29 06:41:58 (21779) INFO2 : Received a Cpimport Pass from PM1
2020-12-29 06:42:03 (21779) INFO2 : Received a BRM-Report from 2
2020-12-29 06:42:03 (21779) INFO2 : Received a Cpimport Pass from PM2
2020-12-29 06:42:03 (21779) INFO2 : Received a BRM-Report from 3
2020-12-29 06:42:03 (21779) INFO2 : BRM updated successfully
2020-12-29 06:42:03 (21779) INFO2 : Received a Cpimport Pass from PM3
2020-12-29 06:42:04 (21779) INFO2 : Released Table Lock
2020-12-29 06:42:04 (21779) INFO2 : Cleanup succeed on all PMs
2020-12-29 06:42:04 (21779) INFO : For table bts.flights: 374573 rows processed and 374573 rows inserted.
2020-12-29 06:42:04 (21779) INFO : Bulk load completed, total run time : 20.3052 seconds
2020-12-29 06:42:04 (21779) INFO2 : Shutdown of all child threads Finished!!


SELECT, UPDATE, DELETE, INSERT ... SELECT)DELETEUPDATEINSERT INTO t1 VALUES ();INSERT INTO t1 VALUES (), ();columnstore_cache_inserts[mariadb]
loose-columnstore_cache_inserts=ONSHOW VARIABLES LIKE 'columnstore_cache_inserts';columnstore_cache_flush_thresholdcolumnstore_cache_use_import