ColumnStore Disk-Based Joins

Overview

Joins are performed in memory unless disk-based joins are enabled through the AllowDiskBasedJoin setting in Columnstore.xml. When a join operation exceeds the memory allocated for query joins, the query is aborted with error code IDB-2001.

Disk-based joins let such queries store intermediate join data on disk when the memory needed for the join exceeds the memory limit. Although a disk-based join is slower than a fully in-memory join and is bounded by the temporary space available on disk, it allows these queries to complete.

Disk-based joins do not cover aggregations or DML joins.

The following variables in the HashJoin element of the Columnstore.xml configuration file relate to disk-based joins. Columnstore.xml resides in the etc directory of your installation (for example, /usr/local/mariadb/columnstore/etc).

  • AllowDiskBasedJoin: Option to use disk-based joins. Valid values are Y (enabled) or N (disabled). The default is disabled.

  • TempFileCompression: Option to use compression for disk join files. Valid values are Y (use compressed files) or N (use non-compressed files).

  • TempFilePath: The directory path used for disk joins. By default, this path is the tmp directory of your installation (i.e., /tmp/columnstore_tmp_files/joins/). Files in this directory are created and cleaned up as needed. ExeMgr removes and recreates the entire directory at startup.

When using disk-based joins, it is strongly recommended that TempFilePath reside on its own partition, because the partition may fill up as queries are executed.

Per-User Join Memory Limit

In addition to the system-wide settings above, the following system variable manages per-user memory limits for joins at the SQL global and session levels.

  • columnstore_um_mem_limit - The memory limit per user, in MB. When a join exceeds this limit, it switches to a disk-based join. By default, the limit is not set (value of 0).

For modification at the global level, set the variable in the my.cnf file (for example, /etc/my.cnf.d/server.cnf):

[mysqld]
...
columnstore_um_mem_limit = value

where value is the in-memory limit per user, in MB.

For modification at the session level, set the session variable as follows before issuing your join query from the SQL client:

SET columnstore_um_mem_limit = value;
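
A minimal sketch of the session-level approach. The 4096 MB limit and the orders table (with custkey and orderkey columns) are hypothetical; the customer table and its custkey column appear in other examples on these pages. Whether a given join actually spills to disk depends on its data volume, and this sketch assumes disk-based joins are enabled in Columnstore.xml as described above.

```sql
-- Hypothetical per-user join memory limit of 4096 MB, for this session only.
SET SESSION columnstore_um_mem_limit = 4096;

-- Check the value now in effect for the session.
SHOW SESSION VARIABLES LIKE 'columnstore_um_mem_limit';

-- Hypothetical join: if its memory use exceeds the limit above, it should
-- switch to a disk-based join (assuming AllowDiskBasedJoin = Y).
SELECT c.custkey, o.orderkey
FROM customer c
JOIN orders o ON o.custkey = c.custkey;

-- Reset the session value to the global value.
SET SESSION columnstore_um_mem_limit = DEFAULT;
```

Setting the limit per session keeps the change scoped to one client, which is useful for trying a value before putting it in my.cnf.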

ColumnStore INSERT

The INSERT statement allows you to add data to tables.

Syntax

INSERT 
 INTO tbl_name [(col,...)]
 {VALUES | VALUE} ({expr | DEFAULT},...),(...),...

The following statement inserts a row with all column values into the customer table:

INSERT INTO customer (custno, custname, custaddress, phoneno, cardnumber, comments)
  VALUES (12, 'JohnSmith', '100 First Street, Dallas', '(214) 555-1212', 100, 'On Time');

The following statement inserts two rows with all column values into the customer table:

INSERT INTO customer (custno, custname, custaddress, phoneno, cardnumber, comments) VALUES
  (12, 'JohnSmith', '100 First Street, Dallas', '(214) 555-1212', 100, 'On Time'),
  (13, 'John Q Public', '200 Second Street, Dallas', '(972) 555-1234', 200, 'LatePayment');

INSERT SELECT

With INSERT ... SELECT, you can quickly insert many rows into a table from one or more other tables.

  • ColumnStore ignores the ON DUPLICATE KEY clause.

  • Non-transactional INSERT ... SELECT is directed to ColumnStore's cpimport tool by default, which significantly increases performance.

  • Transactional INSERT ... SELECT statements (that is, with AUTOCOMMIT off or after a START TRANSACTION) are processed through normal DML processes.

AUTO_INCREMENT

Example for using AUTO_INCREMENT in ColumnStore, where the table option COMMENT 'autoincrement=id' makes id the auto-increment column:

CREATE TABLE autoinc_test(
id INT,
name VARCHAR(10))
ENGINE=columnstore COMMENT 'autoincrement=id';

INSERT INTO autoinc_test (name) VALUES ('John');
INSERT INTO autoinc_test (name) VALUES ('Doe');
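
To illustrate the INSERT ... SELECT form described earlier on this page, here is a minimal sketch. The customer_archive table and its column sizes are hypothetical; the source columns follow the customer examples above.

```sql
-- Hypothetical target table holding a subset of the customer columns.
CREATE TABLE customer_archive (
  custno   INT,
  custname VARCHAR(40),
  comments VARCHAR(40)
) ENGINE=columnstore;

-- Copy all matching rows in one statement. Run with AUTOCOMMIT on
-- (non-transactional), this is directed to cpimport by default.
INSERT INTO customer_archive (custno, custname, comments)
  SELECT custno, custname, comments
  FROM customer
  WHERE comments = 'LatePayment';
```

Running the same statement inside START TRANSACTION would instead go through the normal DML path, per the bullets above.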

ColumnStore UPDATE

The UPDATE statement changes data stored in rows.

Syntax

Single-Table Syntax

UPDATE  table_reference 
  SET col1={expr1|DEFAULT} [,col2={expr2|DEFAULT}] ...
  [WHERE where_condition]
  [ORDER BY ...]
  [LIMIT row_count]

Multiple-Table Syntax

Only one table in table_references can be updated, although multiple columns of that table can be updated.

UPDATE table_references
    SET col1={expr1|DEFAULT} [, col2={expr2|DEFAULT}] ...
    [WHERE where_condition]
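
A minimal sketch of both forms, reusing the customer columns from the INSERT examples. The payments table and its custno and status columns are hypothetical.

```sql
-- Single-table form: change one column for one customer.
UPDATE customer
  SET comments = 'On Time'
  WHERE custno = 13;

-- Multiple-table form: only customer is updated; payments (hypothetical)
-- supplies the new values through the join condition in WHERE.
UPDATE customer, payments
  SET customer.comments = payments.status
  WHERE customer.custno = payments.custno;
```

In the multiple-table form, every SET target belongs to the same table, which matches the one-table restriction described above.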

ColumnStore SELECT

The SELECT statement is used to query the database and display table data. You can add many clauses to filter the data.

Syntax

SELECT
[ALL | DISTINCT ]
    select_expr [, select_expr ...]
    [ FROM table_references
      [WHERE where_condition]
      [GROUP BY {col_name | expr | position} [ASC | DESC], ... [WITH ROLLUP]]
      [HAVING where_condition]
      [ORDER BY {col_name | expr | position} [ASC | DESC], ...]
      [LIMIT {[offset,] row_count | row_count OFFSET offset}]
      [PROCEDURE procedure_name(argument_list)]
      [INTO OUTFILE 'file_name' [CHARACTER SET charset_name] [export_options]
         | INTO DUMPFILE 'file_name' | INTO var_name [, var_name] ]

export_options:
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]

Projection List (SELECT)

If the same column needs to be referenced more than once in the projection list, each reference requires a unique name, given with a column alias. The total length of a column name in the projection list, including the length of any functions applied to it, must be 64 characters or less.

WHERE

The WHERE clause filters the retrieved data based on criteria. Note that a column_alias cannot be used in the WHERE clause. The following statement returns rows in the region table where name is 'ASIA':

SELECT * FROM region WHERE name = 'ASIA';

GROUP BY

GROUP BY groups data based on the values in one or more columns. The following statement returns rows from the lineitem table where orderkey is less than 1,000,000 and groups them by quantity:

SELECT quantity, COUNT(*) FROM lineitem WHERE orderkey < 1000000 GROUP BY quantity;

HAVING

HAVING is used in combination with the GROUP BY clause to filter the groups that GROUP BY returns. The following statement returns each shipping date with its number of line items, keeping only the dates that have 2500 or more:

SELECT shipdate, COUNT(*) FROM lineitem GROUP BY shipdate HAVING COUNT(*) >= 2500;

ORDER BY

The ORDER BY clause presents results in a specific order. Note that ORDER BY is post-processed by MariaDB. The following statement returns an ordered quantity column from the lineitem table:

SELECT quantity FROM lineitem WHERE orderkey < 1000000 ORDER BY quantity;

The following statement returns an ordered shipmode column from the lineitem table, referring to the column by its position in the projection list:

SELECT shipmode FROM lineitem WHERE orderkey < 1000000 ORDER BY 1;

NOTE: When ORDER BY is used in an inner query and LIMIT on an outer query, LIMIT is applied first and then ORDER BY is applied when returning results.

UNION

UNION combines the results of multiple SELECT statements into a single result set. UNION (or UNION DISTINCT) discards duplicate rows, while UNION ALL keeps them. The following statement returns the p_name rows from the part and partno tables and discards duplicates:

SELECT p_name FROM part UNION SELECT p_name FROM partno;

The following statement returns all the p_name rows from the part and partno tables, including duplicates:

SELECT p_name FROM part UNION ALL SELECT p_name FROM partno;

LIMIT

LIMIT constrains the number of rows returned by the SELECT statement. LIMIT takes one or two arguments: a required row count and an optional offset of the first row to return (the initial row is at offset 0).

The following statement returns 5 customer keys from the customer table:

SELECT custkey FROM customer LIMIT 5;

The following statement returns 5 customer keys from the customer table, beginning at offset 1000:

SELECT custkey FROM customer LIMIT 1000,5;

When LIMIT is used in a nested query and the inner query contains an ORDER BY clause, LIMIT is applied before ORDER BY.
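
Because of the nested-query behavior described in the notes above, one way to get a sorted top-N result is to keep ORDER BY and LIMIT at the same query level. This sketch reuses the lineitem table from the earlier examples; the row count of 5 and the descending order are arbitrary choices, and the alias t is hypothetical.

```sql
-- ORDER BY and LIMIT in the same query block: rows are sorted first,
-- then the first 5 sorted rows are returned.
SELECT quantity
FROM lineitem
WHERE orderkey < 1000000
ORDER BY quantity DESC
LIMIT 5;

-- By contrast, per the note above, in this nested form LIMIT is applied
-- before the inner ORDER BY, so these 5 rows are not guaranteed to be
-- the largest quantities.
SELECT quantity FROM (
  SELECT quantity FROM lineitem ORDER BY quantity DESC
) AS t
LIMIT 5;
```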

ColumnStore LOAD DATA INFILE

Overview

The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. The file name must be given as a literal string.

LOAD DATA [LOCAL] INFILE 'file_name' 
  INTO TABLE tbl_name
  [CHARACTER SET charset_name]
  [{FIELDS | COLUMNS}
    [TERMINATED BY 'string']
    [[OPTIONALLY] ENCLOSED BY 'char']
    [ESCAPED BY 'char']
  ]
  [LINES
    [STARTING BY 'string']
    [TERMINATED BY 'string']
  ]

  • ColumnStore ignores the ON DUPLICATE KEY clause.

  • Non-transactional LOAD DATA INFILE is directed to ColumnStore's cpimport tool by default, which significantly increases performance.

  • Transactional LOAD DATA INFILE statements (that is, with AUTOCOMMIT off or after a START TRANSACTION) are processed through normal DML processes.

  • Use cpimport for importing UTF-8 data that contains multi-byte values.
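
The simpletable example that follows needs a five-column table whose first four fields are integers and whose fifth field is text. A minimal sketch of such a table, with hypothetical column names and sizes:

```sql
-- Hypothetical definition matching the five '|'-delimited fields per row
-- in simpletable.tbl: four integers followed by a short string.
CREATE TABLE simpletable (
  id    INT,
  col2  INT,
  col3  INT,
  col4  INT,
  label VARCHAR(40)
) ENGINE=columnstore;
```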

The following example loads data into a simple five-column table. A file named /simpletable.tbl contains the following data:

1|100|1000|10000|Test Number 1|
2|200|2000|20000|Test Number 2|
3|300|3000|30000|Test Number 3|

The data can then be loaded into the simpletable table with the following syntax:

LOAD DATA INFILE 'simpletable.tbl' INTO TABLE simpletable FIELDS TERMINATED BY '|';

If the default mode is set to use cpimport internally, any output error files are written to the /var/log/mariadb/columnstore/cpimport/ directory, which can be consulted to troubleshoot any reported errors.

See Also

LOAD DATA INFILE

ColumnStore Data Manipulation Statements

Learn data manipulation statements for MariaDB ColumnStore. This section covers INSERT, UPDATE, DELETE, and LOAD DATA operations, optimized for efficient handling of large analytical datasets.

This page is: Copyright © 2025 MariaDB. All rights reserved.

ColumnStore DELETE

The DELETE statement is used to remove rows from tables.

Syntax

DELETE 
 [FROM] tbl_name 
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]

No disk space is recovered after a DELETE. To recover space, use TRUNCATE or DROP PARTITION, or create a new table with CREATE TABLE, load only the rows to keep, drop the original table with DROP TABLE, and rename the new table with RENAME TABLE.
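
A minimal sketch of the rebuild approach, assuming the customer table from the DELETE example on this page. The customer_new table and its two-column layout are hypothetical; a real rebuild must reproduce every column of the original table.

```sql
-- Hypothetical replacement table; copy the real column definitions of customer.
CREATE TABLE customer_new (
  custkey  INT,
  custname VARCHAR(40)
) ENGINE=columnstore;

-- Load only the rows to keep: the complement of the DELETE condition
-- custkey > 1000 AND custkey < 2000 (NULL keys would not be deleted, so keep them).
INSERT INTO customer_new (custkey, custname)
  SELECT custkey, custname
  FROM customer
  WHERE custkey <= 1000 OR custkey >= 2000 OR custkey IS NULL;

-- Replace the original table with the compacted copy.
DROP TABLE customer;
RENAME TABLE customer_new TO customer;
```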

Adding LIMIT restricts the number of rows deleted, so each statement completes more quickly. Running DELETE ... LIMIT repeatedly achieves the same effect as a single DELETE without LIMIT.

The following statement deletes customer records with a customer key between 1001 and 1999:

DELETE FROM customer
  WHERE custkey > 1000 AND custkey < 2000;
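
The repeated DELETE ... LIMIT approach described above can be sketched as follows for the same rows. The batch size of 10000 is an arbitrary assumption; ROW_COUNT() is the standard MariaDB function that reports the rows affected by the previous statement.

```sql
-- Delete the same range in batches of at most 10000 rows.
DELETE FROM customer
  WHERE custkey > 1000 AND custkey < 2000
  LIMIT 10000;

-- Number of rows removed by the previous DELETE; repeat the DELETE
-- until this returns 0.
SELECT ROW_COUNT();
```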
