Data Ingestion Methods & Tools

Learn about data ingestion for MariaDB ColumnStore. This section covers various methods and tools for efficiently loading large datasets into your columnar database for analytical workloads.

ColumnStore provides several mechanisms to ingest data:

  • cpimport provides the fastest performance for inserting data and ability to route data to particular PrimProc nodes. Normally, this should be the default choice for loading data .

  • LOAD DATA INFILE provides another means of bulk inserting data.

    • By default, with autocommit on, it internally streams the data to an instance of the cpimport process.

    • In transactional mode, DML inserts are performed, which is significantly slower and also consumes both binlog transaction files and ColumnStore VersionBuffer files.

  • DML, i.e. INSERT, UPDATE, and DELETE, provide row-level changes. ColumnStore is optimized towards bulk modifications, so these operations are slower than they would be in, for instance, InnoDB.

    • Currently ColumnStore does not support operating as a replication replica target.

    • Bulk DML operations will in general perform better than multiple individual statements.

      • INSERT INTO SELECT with autocommit behaves similarly to LOAD DATE INFILE because, internally, it is mapped to cpimport for higher performance.

      • Bulk update operations based on a join with a small staging table can be relatively fast, especially if updating a single column.

  • Using ColumnStore Bulk Write SDK or ColumnStore Streaming Data Adapters.

Last updated

Was this helpful?