Measuring cpimport Performance Baseline with ioping

The purpose of this guide is to detail the process for establishing a performance baseline for the cpimport bulk data loading utility and diagnosing storage subsystem bottlenecks using ioping. Storage design choices—such as local disk deployment versus High Availability Network File Systems (NFS HA)—profoundly impact chunk ingestion execution times.

The reference benchmarks documented here were established on a system containing AWS 4vCPUs and 16GB of RAM.

Baseline Performance Reference Metrics

The following matrix reflects baseline loading rates recorded for a standard testing profile consisting of 10,000 records with 250 columns across varying disk architectures:

Infrastructure Layout
Ingestion Runtime
Observed Storage Latency (ioping)

No NFS (Local Storage)

5 to 6 seconds

data1: 450 µs | data2: 475 µs | data3: 430 µs

With NFS HA

29 to 32 seconds

data1: 3.7 ms | data2: 3.1 ms | data3: 3.1 ms

With NFS HA & Low Latency Storage at bulkrollback

14 to 15 seconds

data1: 3.3 ms (bulkRollback: 590 µs)

data2: 3.2 ms (bulkRollback: 560 µs)

data3: 3.8 ms (bulkRollback: 490 µs)

Manual Benchmarking Procedure

To run a manual performance check and track read latencies across your column store directories, execute the following steps in sequence:

  1. Create the ColumnStore testing table: Initialize a test table (cptest) matching the 250-column layout using the COLUMNSTORE storage engine:

    CREATE TABLE cptest (
      group_name CHAR(2),
      number1 DECIMAL(12, 2),
      -- [Truncated for brevity; insert columns up to col250]
      col250 varchar(250)
    ) ENGINE=COLUMNSTORE;
  2. Generate data and populate the table: Seed the table with 10,000 randomized evaluation rows using the system sequence loop engine:

    INSERT INTO cptest
    SELECT 
      CAST(ROUND(RAND() * 10, 0) AS CHAR),
      ROUND(RAND() * 10000, 2),
      -- [Truncated for brevity; complete selections to match your column layout]
      substring(MD5(RAND()),1,2)
    FROM seq_1_to_10000;
  3. Export data to a local CSV file: Dump the resulting ColumnStore table fields out to a temporary flat comma-separated file:

    SELECT * FROM cptest INTO OUTFILE '/tmp/cptest.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';
  4. Benchmark the cpimport execution rate: Execute the ingestion tool directly to capture baseline loading speeds:

    cpimport -E '"' -s "," test cptest /tmp/cptest.csv
  5. Measure disk sub-system latency using ioping: Install the latency verification tool and run a sequence of 30 tests directly on your active database storage directories to analyze performance consistency:

    sudo yum install ioping
    cd /var/lib/columnstore/dataX/
    ioping -c 30 .

Evaluating ioping Output Summary

When the test completes, evaluate the statistical summary metrics:

  • Target Guideline: For consistent and dependable storage throughput, target a low AVG execution result under 700 µs, combined with a low standard deviation (mdev).

Automated Scripted Performance Testing

You can automate this benchmarking process by saving the comprehensive verification logic into an administrative script named cpimportBench.bash.

Script Implementation

Last updated

Was this helpful?