xpand_import for MariaDB Xpand

Overview

xpand_import is the fast parallel import tool introduced in the 23.09 release.

xpand_import replaces clustrix_import and clx_import, which are now deprecated.

xpand_import reads SQL dump or CSV files and inserts rows into MariaDB Xpand rapidly by splitting the data set into chunks and spawning multiple threads to import concurrently.

xpand_import is only intended for use with SQL dump files generated by mysqldump or maria-dump, using the sqldump mode.

xpand_import has been tested with MariaDB Xpand versions 23.09 and later.

Requirements

  • xpand_import requires Python 3.6 and MariaDB Connector/Python

  • xpand_import is automatically included in all Xpand deployments (version 23.09+) but should be run from a separate server (not the database server) - simply copy the xpand_import script to a separate server

  • xpand_import is designed to work on SQL files generated by maria-dump or mysqldump, and requires:

    • Using fully-qualified names if the same table name is used in different databases (e.g., `db`.`table`)

    • No extraneous whitespace

    • SQL keywords are capitalized

    • Statements end at line endings (multi-line statements are supported)

    • Inserts to a given table are contiguous

    • Table and database names don't have actual backticks or quotes or periods in them

    • DELIMITER must always be exact ';'

Example Usage of xpand_import

You can use xpand_import by supplying a single host, and by default xpand_import will auto-detect the other nodes in the cluster and send traffic to all nodes. We recommend running xpand_import from a machine other than the database nodes.

For example:

./xpand_import -H zen015 --sql-file ~/import/test.sql --db test_insert

To get the full list of arguments:

./xpand_import --help
usage: xpand_import [-h] [--version] --host HOSTNAME[:PORT]
[--sql-file FILENAME] [--csv-dir CSV_DIR] [--ldi ldi_stmt]
[--csv-options csv_options] [--user USERNAME] [--passwd PW]
[--prompt-passwd] [--ssl] [--ssl-ca CA_PATH] [--ssl-verify-server-cert]
[--error-file SQL_FILE] [--log-file LOG_FILE] [--db database]
[--dry-run] [--sql-size-factor SQL_SIZE_FACTOR]
[--csv-size-factor CSV_SIZE_FACTOR] [--presql PRESQL]
[--insert-ignore] [--no-globals] [--allow-multi]
[--no-autodiscover-nodes] [--no-binlog]
[--no-auto-slice] [--skip-gui]

Required Arguments

The following arguments are required for xpand_import:

Argument

Description

--host <hostname>[:<port>]
-H <hostname>[:<port>]

A MariaDB Xpand host (all hosts in its cluster will be used)

--sql-file <filename>

SQL file to import

--csv-dir <CSV_dir>

A directory containing a ddl.sql and multiple <tablename>.csv files for CSV import

--ldi <LDI_stmt>

Similar to LOAD DATA LOCAL INFILE LDI_stmt. Must be enclosed in "" (quotes)

Optional Arguments

The following arguments are optional for xpand_import:

Argument

Description

--help
-h

Show this help message and exit

--version
-v

Show program's version number and exit

--csv-options

csv_options --ldi sub-clauses controlling import of --csv-dir

--user <username>
-u <username>

Database user

--passwd <pwd>

Database password

--prompt-passwd
-p

Prompt for database password

--ssl

Require SSL for database connections

--ssl-ca <CA_path>

Specify the path to your SSL CA file in PEM format. Implies --ssl

--ssl-verify-server-cert

Verify the Common Name attribute in the server's cert is the same as the hostname. Implies --ssl

--error-file <SQL_file>

File to write failed statements to. Defaults to creating a temporary file

--log-file <log_file

File to write log messages to. Defaults to creating a temporary file

--db <database>

Database to use. Do not specify this if the mysqldump contains any USE statements

--dry-run

Scan files and print preamble and postamble. Do not execute any SQL

--sql-size-factor <SQL_size_factor>

Conversion factor to estimate on-disk table size from the SQL input file

--csv-size-factor <CSV_size_factor>

Conversion factor to estimate on-disk table size from the CSV input file

--presql <PRESQL>

SQL file to run after all DDL, but before any inserts. Assumes statements end with ;\n

--insert-ignore

Use INSERT IGNORE statements rather than INSERT

--no-globals

Do not modify rebalancer global variables. Will cause a slower import with less impact to other workloads running on the cluster

--allow-multi

Allow multiple xpand_import processes run against one cluster concurrently. Implies --no-globals

--no-autodiscover-nodes
-N

Only connect to specified host, do not read node addresses from the database upon connection. Use this if you connect to the cluster through a load balancer

--no-binlog

Disable binlogging of this import (will not replicate to slaves)

--no-auto-slice

Do not try to predict slicing before import

--skip-gui

Disable UI