Comments - on duplicate key work around

6 years, 2 months ago David Thompson

Since columnstore doesn't have unique indexes, then yes more or less. Another option is to load your inbound data into a staging table and then doing an insert select where you join and filter out the duplicates in what gets selected.

 
6 years, 2 months ago Mike Thibodeau

Thanks David. That is what I thought. I was going handle the inserts as you suggested and the updates with a left join. Any thoughts on weather that will provide better performance than deleting the range and re-inserting the whole set?

 
6 years, 2 months ago David Thompson

Doing the insert select should be faster even just purely from the I/O involved i think. The case where it might be faster would be if you were to truncate the table or use partition drop but i assume that you are planning to just do some subset.

 
6 years, 2 months ago Mike Thibodeau

My data comes in somewhat randomly for some id's. As an innodb table we used insert on dup key update... What I have found in testing various workarounds is that loading the new data into a column store staging table. select the staging table and join to the table to be updated. Then left join to the table to be updated for the inserts. Also it was faster to just do the update without a predicate to select only the rows with differences in the numeric_val column.

Deleting the rows via the join and re-inserting took about the same time as the update and insert.

One note... when I was doing these tests I observed that when I delete a row (via a join to another column store table) a row the server reports 0 rows affected even when rows are deleted.

Here's a sample...

-- version=10.2.10-MariaDB-log         
-- version_comment=Columnstore 1.1.2-1         

CREATE TABLE cs1 (
    id              INT(10) UNSIGNED NOT NULL,
    `timestamp_`    DATETIME,
    numeric_val     DOUBLE DEFAULT NULL
) ENGINE=columnstore;
INSERT INTO cs1 VALUES (1, '2018-01-09 21:59:02', 11.11)

CREATE TABLE cs2 (
    id              INT(10) UNSIGNED NOT NULL,
    `timestamp_`    DATETIME,
    numeric_val     DOUBLE DEFAULT NULL
) ENGINE=columnstore;

INSERT INTO cs2 VALUES (1, '2018-01-09 21:59:02', 22.22);
    
SELECT * FROM cs1;
DELETE FROM  cs1 WHERE id = 1;
-- Query: delete from cs1 where id = 1
-- 1 row(s) affected
SELECT * FROM cs1;
-- Query: select * from cs1 LIMIT 0, 5000
-- 0 row(s) affected

INSERT INTO cs1 VALUES (1, '2018-01-09 21:59:02', 11.11);

SELECT * FROM cs1;
SELECT * FROM cs2;
DELETE cs1 FROM cs1 
    JOIN cs2 USING(id,timestamp_);
-- Query: DELETE cs1 from cs1 join cs2 using(id,timestamp_)
-- 0 row(s) affected

SELECT * FROM cs1;
-- Query: SELECT * FROM cs1 LIMIT 0, 5000
-- 0 row(s) affected


SELECT * FROM cs2;
-- Query: SELECT * FROM cs2 LIMIT 0, 5000
-- 1 row(s) affected
 
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.