Advanced Bulk Insert

In this example we will insert 2 rows in a more complex table. This will demonstrate using different kinds of data types, chanined methods and getting the summary information at the end of a transaction.

You will need the following table in the test database to execute this:

example/advanced_bulk_insert.sql
1
2
3
4
5
6
7
CREATE TABLE `t2` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(40) DEFAULT NULL,
  `dob` date DEFAULT NULL,
  `added` datetime DEFAULT NULL,
  `salary` decimal(9,2) DEFAULT NULL
) ENGINE=Columnstore;
example/advanced_bulk_insert.py
26
27
28
29
30
import pymcsapi

try:
    driver = pymcsapi.ColumnStoreDriver()
    bulk = driver.createBulkInsert("test", "t2", 0, 0)

As with the basic example we create an instance of the driver and use it to create a bulk insert instance.

example/advanced_bulk_insert.py
31
32
33
34
35
36
    bulk.setColumn(0, 1)
    bulk.setColumn(1, "Andrew")
    bulk.setColumn(2, "1936-12-24")
    bulk.setColumn(3, "2017-07-07 15:14:12")
    bulk.setColumn(4, "15239.45");
    bulk.writeRow()

This demonstrates setting several different data types using strings of data. The second column (column 1) is a VARCHAR(40) and is set to “Andrew”. The third column is a DATE column and the API will automatically convert this into a binary date format before transmitting it to ColumnStore. The fourth is a DATETIME and the fifth a DECIMAL column which again the API will convert from the strings into the binary format.

example/advanced_bulk_insert.py
37
    bulk.setColumn(0, 2)[0].setColumn(1, "David")[0].setColumn(2, pymcsapi.ColumnStoreDateTime(1972, 5, 23))[0].setColumn(3, pymcsapi.ColumnStoreDateTime(2017, 7, 7, 15, 20, 18))[0].setColumn(4, pymcsapi.ColumnStoreDecimal(2347623, 2))[0].writeRow()

Many of the ColumnStoreBulkInsert methods return a pointer to the class and a return status which means multiple calls can be chained together in a similar way to ORM APIs. Here we use additional datatypes ColumnStoreDateTime and ColumnStoreDecimal.

ColumnStoreDateTime is used to create an entry for a DATE or DATETIME column. It can be used to define custom formats for dates and times using the strptime format.

A decimal is created using the ColumnStoreDecimal class. It can be set using a string, double or a pair of integers. The first integer is the precision and the second integer is the scale. So this number becomes 23476.23.

example/advanced_bulk_insert.py
38
39
40
41
42
43
44
    bulk.commit()
    summary = bulk.getSummary()
    print("Execution time: %s" % (summary.getExecutionTime(),))
    print("Rows inserted: %s" % (summary.getRowsInsertedCount(),))
    print("Truncation count: %s" %(summary.getTruncationCount(),))
    print("Saturated count: %s" %(summary.getSaturatedCount(),))
    print("Invalid count: %s" %(summary.getInvalidCount(),))

After a commit or rollback we can obtain summary information from the bulk insert class. This is done using the ColumnStoreBulkInsert.getSummary() method which will return a reference ColumnStoreSummary class. In this example we get the number of rows inserted (or would be inserted if there was a rollback) and the execution time from the moment the bulk insert class is created until the commit or rollback is complete.

example/advanced_bulk_insert.py
45
46
except RuntimeError as err:
    print("Error caught: %s" % (err,))

At the end we clean up in the same was as the basic bulk insert example.