
Using CONNECT - Condition Pushdown

The CONNECT storage engine has been deprecated.


The ODBC, JDBC, MYSQL, TBL and WMI table types use engine condition pushdown in order to restrict the number of rows returned by the RDBMS source or the WMI component.

The CONDITION_PUSHDOWN argument used in old versions of CONNECT is no longer needed because CONNECT uses condition pushdown unconditionally.
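As an illustration, here is a minimal sketch with a CONNECT table of type MYSQL; the table, columns and connection string below are made up for this example:

-- Hypothetical CONNECT table mapped to a table on a remote server.
CREATE TABLE remote_orders (
  id INT NOT NULL,
  amount DOUBLE NOT NULL)
ENGINE=CONNECT table_type=MYSQL
connection='mysql://user@remotehost/test/orders';

-- The condition is pushed down: CONNECT adds the WHERE clause to the
-- SELECT it sends to the remote server, so only matching rows are returned.
SELECT id, amount FROM remote_orders WHERE amount > 100;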

This page is licensed: GPLv2

Using CONNECT - Exporting Data From MariaDB

The CONNECT storage engine has been deprecated.


Exporting data from MariaDB is obviously possible with CONNECT in particular for all formats not supported by the SELECT INTO OUTFILE statement. Let us consider the query:

SELECT
    plugin_name AS handler,
    plugin_version AS version,
    plugin_author AS author,
    plugin_description AS description,
    plugin_maturity AS maturity
FROM
    information_schema.plugins
WHERE
    plugin_type = 'STORAGE ENGINE';

Suppose you want to get the result of this query into a file handout.htm in XML/HTML format, so that it can be displayed in a web browser. This is how you can do it:

Just create the CONNECT table that will be used to make the file:

CREATE TABLE handout
ENGINE=CONNECT
table_type=XML
file_name='handout.htm'
header=yes
option_list='name=TABLE,coltype=HTML,attribute=border=1;cellpadding=5,headattr=bgcolor=yellow'
AS
SELECT
    plugin_name AS handler,
    plugin_version AS version,
    plugin_author AS author,
    plugin_description AS description,
    plugin_maturity AS maturity
FROM
    information_schema.plugins
WHERE
    plugin_type = 'STORAGE ENGINE';

Here the column definitions are not given; they will come from the SELECT statement following the CREATE. The CONNECT options are the same as we have seen previously. This statement does both actions: creating the matching handout CONNECT table and 'filling' it with the query result.

Note 1: This could not be done in only one statement if the table type had required explicit CONNECT column options. In that case, first create the table, then populate it with an INSERT statement.
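For instance, a minimal sketch of this two-step approach (the handout2 table and its columns are made up for illustration; no column options are actually required for this particular type):

CREATE TABLE handout2 (
  handler VARCHAR(64) NOT NULL,
  version VARCHAR(20) NOT NULL)
ENGINE=CONNECT table_type=XML file_name='handout2.htm' header=yes
option_list='name=TABLE,coltype=HTML';

INSERT INTO handout2
SELECT plugin_name, plugin_version
FROM information_schema.plugins
WHERE plugin_type = 'STORAGE ENGINE';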

Note 2: The source "plugins" table column "description" is a long text column, a data type not supported for CONNECT tables. It has been silently replaced internally by VARCHAR(256).

This page is licensed: GPLv2

Using CONNECT

The CONNECT storage engine has been deprecated.

Using CONNECT - Importing File Data Into MariaDB Tables

The CONNECT storage engine has been deprecated.


Directly using external (file) data has many advantages, such as working on "fresh" data produced, for instance, by cash registers, telephone switches, or scientific apparatus. However, in some cases you may want to import external data into your MariaDB database. This is extremely simple and flexible using the CONNECT handler. For instance, let us suppose you want to import the data of the xsample.xml XML file previously given in example into a MyISAM table called biblio belonging to the connect database. All you have to do is to create it by:

CREATE TABLE biblio ENGINE=myisam SELECT * FROM xsampall2;

This statement creates the MyISAM table and inserts the original XML data, translated to tabular format by the xsampall2 CONNECT table, into the MariaDB biblio table. Note that further transformations of the data could be achieved by using a more elaborate SELECT statement in the CREATE statement, for instance using filters, aliases, or applying functions to the data. However, because the CREATE TABLE process copies table data, later modifications of the xsample.xml file will not change the biblio table, and changes to the biblio table will not modify the xsample.xml file.
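For instance, a sketch of such a transforming import; the column names title, publisher and language are assumptions about the xsampall2 definition, not taken from it:

-- Hypothetical: filter and rename columns while importing.
CREATE TABLE biblio_fr ENGINE=MyISAM
SELECT title, UPPER(publisher) AS publisher
FROM xsampall2
WHERE language = 'fr';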

All these can be combined or transformed by further SQL operations. This makes working with CONNECT much more flexible than just using the LOAD DATA INFILE statement.

This page is licensed: GPLv2


Using CONNECT - General Information

The CONNECT storage engine has been deprecated.


The main characteristic of CONNECT is to enable accessing data scattered on a machine as if it were a centralized database. This, and the fact that locking is not used by CONNECT (data files are opened and closed for each query), makes CONNECT very useful for importing or exporting data into or from a MariaDB database, and also for all types of Business Intelligence applications. However, it is not suited for transactional applications.

For instance, the index type used by CONNECT is closer to bitmap indexing than to B-trees. It is very fast for retrieving results but not for updating. In fact, even if only one indexed value is modified in a big table, the index is entirely remade (this being nevertheless four to five times faster than for a B-tree index). But normally in Business Intelligence applications, files are not modified so often.

If you are using CONNECT to analyze files that can be modified by an external process, the indexes are of course not modified by it and become outdated. Use the OPTIMIZE TABLE command to update them before using the tables based on them.
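For instance, with the boys table used elsewhere in this documentation:

OPTIMIZE TABLE boys;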

This also means that CONNECT is not designed to be used by centralized servers, which are mostly used for transactions and often must run for a long time without human intervention.

Performance

Performance varies a great deal depending on the table type. For instance, ODBC tables are only retrieved as fast as the other DBMS can deliver them. If you have a lot of queries to execute, the best way to optimize your work can sometimes be to translate the data from one type to another. Fortunately this is very simple with CONNECT. Fixed formats like FIX, BIN or VEC tables can be created from slower ones by commands such as:

CREATE TABLE fastable table_specs SELECT * FROM slowtable;

FIX and BIN are often the better choice because the I/O functions are done on blocks of BLOCK_SIZE rows. VEC tables can be very efficient for tables having many columns only a few being used in each query. Furthermore, for tables of reasonable size, the MAPPED option can very often speed up many queries.

Create Table statement

Be aware of the two broad kinds of CONNECT tables:

Inward

Tables whose file name is not specified at create time. An empty file is given a default name (tabname.tabtype) and is populated like for other engines. They do not require the FILE privilege and can be used for testing purposes.

Outward

All other CONNECT tables, accessing external data sources or files. They are the truly useful tables but require the FILE privilege.

Drop Table statement

For outward tables, the DROP TABLE statement just removes the table definition but does not erase the table data. However, dropping an inward table erases the table data as well.

Alter Table statement

Be careful using the ALTER TABLE statement. Currently data compatibility is not tested and the modified definition can become incompatible with the data. In particular, ALTER modifies the table definition only but does not modify the table data. Consequently, the table type should not be modified this way, except to correct an incorrect definition. Also, adding, dropping or modifying columns may be wrong because the default offset values (when not explicitly given by the FLAG option) may be wrong when recompiled with missing columns.

Safe use of ALTER is for indexing, as we have seen earlier, and for changing options such as MAPPED or HUGE that do not impact the data format but just the way the data file is accessed. Modifying the BLOCK_SIZE option is all right with FIX, BIN, DBF and split VEC tables; however it is unsafe for VEC tables that are not split (only one data file) because at their creation the estimated size has been made a multiple of the block size. This can cause errors if this estimate is not a multiple of the new value of the block size.

In all cases, it is safer to drop and re-create the table (outward tables) or to make another one from the table that must be modified.

Update and Delete for File Tables

CONNECT can execute these commands using two different algorithms:

  • It can do it in place, directly modifying rows (update) or moving rows (delete) within the table file. This is a fast way to do it in particular when indexing is used.

  • It can do it using a temporary file to make the changes. This is required when updating variable record length tables and is more secure in all cases.

The choice between these algorithms depends on the connect_use_tempfile session variable.
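For instance, a minimal sketch; the values accepted by connect_use_tempfile (such as AUTO or YES) should be checked against your server version, and the table name is made up:

-- Force use of a temporary file for the following update.
SET connect_use_tempfile = YES;
UPDATE big_csv SET price = price * 1.1 WHERE category = 'food';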

This page is licensed: GPLv2


Using CONNECT - Virtual and Special Columns

The CONNECT storage engine has been deprecated.


CONNECT supports MariaDB virtual and persistent columns. It is also possible to declare a column as being a CONNECT special column. Let us see with an example how this can be done. The boys table we have seen previously can be recreated as:

CREATE TABLE boys (
  linenum INT(6) NOT NULL DEFAULT 0 special=ROWID,
  name CHAR(12) NOT NULL,
  city CHAR(12) NOT NULL,
  birth DATE NOT NULL date_format='DD/MM/YYYY',
  hired DATE NOT NULL date_format='DD/MM/YYYY' flag=36,
  agehired INT(3) AS (floor(datediff(hired,birth)/365.25)) virtual,
  fn CHAR(100) NOT NULL DEFAULT '' special=FILEID)
ENGINE=CONNECT table_type=FIX file_name='boys.txt' mapped=YES lrecl=47;

We have defined two CONNECT special columns. You can give them any name; it is the field SPECIAL option that specifies the special column functional name.

Note: the default values specified for the special columns do not mean anything. They are specified just to prevent getting warning messages when inserting new rows.

For the definition of the agehired virtual column, no CONNECT options can be specified as it has no offset or length, not being stored in the file.

The command:

SELECT * FROM boys WHERE city = 'boston';

will return:

linenum | name | city | birth | hired | agehired | fn
1 | John | Boston | 1986-01-25 | 2010-06-02 | 24 | d:\mariadb\sql\data\boys.txt
2 | Henry | Boston | 1987-06-07 | 2008-04-01 | 20 | d:\mariadb\sql\data\boys.txt
6 | Bill | Boston | 1986-09-11 | 2008-02-10 | 21 | d:\mariadb\sql\data\boys.txt

Existing special columns are listed in the following table:

Special Name | Type | Description of the column value
ROWID | Integer | The row ordinal number in the table. This is not quite equivalent to a virtual column with an auto increment of 1 because rows are renumbered when deleting rows.
ROWNUM | Integer | The row ordinal number in the file. This is different from ROWID for multiple tables, TBL/XCOL/OCCUR/PIVOT tables, XML tables with a multiple column, and for DBF tables where ROWNUM includes soft deleted rows.
FILEID FDISK FPATH FNAME FTYPE | String | FILEID returns the full name of the file this row belongs to. Useful in particular for multiple tables represented by several files. The other special columns can be used to retrieve only one part of the full name.
TABID | String | The name of the table this row belongs to. Useful for TBL tables.
PARTID | String | The name of the partition this row belongs to. Specific to partitioned tables.
SERVID | String | The name of the federated server or server host used by a MYSQL table. "ODBC" for an ODBC table, "JDBC" for a JDBC table and "Current" for all other tables.

Note: CONNECT does not currently support auto incremented columns. However, a ROWID special column will do the job of a column auto incremented by 1.

This page is licensed: GPLv2


Using CONNECT - Indexing

The CONNECT storage engine has been deprecated.


Indexing is one of the main ways to optimize queries. Key columns, in particular when they are used to join tables, should be indexed. But what should be done for columns that have only a few distinct values? If they are randomly placed in the table they should not be indexed, because reading many rows in random order can be slower than reading the entire table sequentially. However, if the values are sorted or clustered, indexing can be acceptable because CONNECT indexes store the values in the order they appear in the table, and this will make retrieving them almost as fast as reading them sequentially.

CONNECT provides four indexing types:

  1. Standard Indexing

  2. Block Indexing

  3. Remote Indexing

  4. Dynamic Indexing

Standard Indexing

CONNECT standard indexes are created and used like those of other storage engines, although they have a specific internal format. The CONNECT handler supports the use of standard indexes for most of the file based table types.

You can define them in the CREATE TABLE statement, or add them later using the CREATE INDEX statement or the ALTER TABLE statement. In all cases, the index files are automatically made. They can be dropped either using the DROP INDEX statement or the ALTER TABLE statement, and this erases the index files.

Indexes are automatically reconstructed when the table is created, modified by INSERT, UPDATE or DELETE commands, or when the SEPINDEX option is changed. If you have a lot of changes to make to a table at one time, you can use table locking to prevent the indexes from being reconstructed after each statement. The indexes are reconstructed when unlocking the table. For instance:

LOCK TABLE t1 WRITE;
INSERT INTO t1 VALUES(...);
INSERT INTO t1 VALUES(...);
...
UNLOCK TABLES;

If a table was modified by an external application that does not handle indexing, the indexes must be reconstructed to prevent returning false or incomplete results. To do this, use the OPTIMIZE TABLE command.

For outward tables, index files are not erased when dropping the table. This is the same as for the data file and preserves the possibility of several users using the same data file via different tables.

Unlike other storage engines, CONNECT constructs the indexes as files that are named by default from the data file name, not from the table name, and located in the data file directory. Depending on the SEPINDEX table option, indexes are saved in a unique file or in separate files (if SEPINDEX is true). For instance, if indexes are in separate files, the primary index of the table dept.dat of type DOS is a file named dept_PRIMARY.dnx. This makes it possible to define several tables on the same data file, with possibly different options such as mapped or not mapped, and to share the index files as well.

If the index file should have a different name, for instance because several tables are created on the same data file with different indexes, specify the base index file name with the XFILE_NAME option.
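For instance, a minimal sketch of two tables sharing one data file but keeping separate index files; the table, column and file names are made up:

CREATE TABLE emp1 (
  id INT NOT NULL,
  name CHAR(20) NOT NULL,
  INDEX XID(id))
ENGINE=CONNECT table_type=FIX file_name='emp.dat' xfile_name='emp1';

CREATE TABLE emp2 (
  id INT NOT NULL,
  name CHAR(20) NOT NULL,
  INDEX XNAME(name))
ENGINE=CONNECT table_type=FIX file_name='emp.dat' xfile_name='emp2';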

Note 1: Indexed columns must be declared NOT NULL; CONNECT doesn't support indexes containing null values.

Note 2: MRR is used by standard indexing if it is enabled.

Note 3: Prefix indexing is not supported. If specified, the CONNECT engine ignores the prefix and builds a whole index.

Handling index errors

The way CONNECT handles indexing is very specific. All table modifications are done regardless of indexing. Only after a table has been modified, or when an OPTIMIZE TABLE command is sent, are the indexes made. If an error occurs, the corresponding index is not made. However, CONNECT being a non-transactional engine, it is unable to roll back the changes made to the table. The main causes of indexing errors are:

  • Trying to index a nullable column. In this case, you can alter the table to declare the column as not nullable or, if the column is nullable indeed, make it not indexed.

  • Entering duplicate values in a column indexed by a unique index. In this case, if the index was wrongly declared as unique, alter its declaration to reflect this. If the column should really contain unique values, you must manually remove or update the duplicate values.

In both cases, after correcting the error, remake the indexes with the OPTIMIZE TABLE command.

Index file mapping

To accelerate the indexing process, CONNECT makes an index structure in memory from the index file. This can be done by reading the index file, or by using it as if it were in memory via "file mapping". On enabled versions, file mapping is used according to the boolean system variable connect_indx_map. Set it to 0 (file read) or 1 (file mapping).
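For instance, a minimal sketch (assuming the variable can be set dynamically at the global level on your build):

SET GLOBAL connect_indx_map = 1;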

Block Indexing

To accelerate input/output, CONNECT uses when possible a read/write mode by blocks of n rows, n being the value given in the BLOCK_SIZE option of the CREATE TABLE statement, or a default value depending on the table type. This is automatic for fixed files (FIX, BIN, DBF or VEC), but must be specified for variable files (DOS, CSV or FMT).

For blocked tables, further optimization can be achieved if the data values for some columns are "clustered", meaning that they are not evenly scattered in the table but grouped in some consecutive rows. Block indexing permits skipping blocks in which no rows fulfill a conditional predicate, without even having to read the block. This is true in particular for sorted columns.

You indicate this when creating the table by using the DISTRIB=d column option. The enum value d can be scattered, clustered, or sorted. In general only one column can be sorted. Block indexing is used only for clustered and sorted columns.
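For instance, a minimal sketch; the table, columns and data file are made up, and distrib is the column option just described:

CREATE TABLE sales_fix (
  id INT NOT NULL distrib=sorted,
  dept CHAR(8) NOT NULL distrib=clustered,
  amount DOUBLE(10,2) NOT NULL)
ENGINE=CONNECT table_type=FIX file_name='sales.txt' block_size=100;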

Difference between standard indexing and block indexing

  • Block indexing is internally handled by CONNECT while sequentially reading the table data. This means in particular that when standard indexing is used on a table, block indexing is not used.

  • In a query, only one standard index can be used. However, block indexing can combine the restrictions coming from a where clause implying several clustered/sorted columns.

  • The block index files are faster to make and much smaller than standard index files.

Notes for this Release:

  • On all operations that create or modify a table, CONNECT automatically calculates or recalculates and saves the mini/maxi or bitmap values for each block, enabling it to skip blocks containing no acceptable values. In the case where the optimize file no longer corresponds to the table, because it has been accidentally destroyed or because some column definitions have been altered, you can use the OPTIMIZE TABLE command to reconstruct the optimization file.

  • Sorted column special processing is currently restricted to ascending sort. Columns sorted in descending order must be flagged as clustered. Improper sorting is not checked in UPDATE or INSERT operations but is flagged when optimizing the table.

  • Block indexing can be done in two ways: keeping the min/max values existing for each block, or keeping a bitmap telling which distinct column values are met in each block. This second way often gives better optimization, except for sorted columns for which both are equivalent. The bitmap approach can be used only on columns having not too many distinct values. This is estimated by the MAX_DIST option value associated with the column when creating the table. Bitmap block indexing is used if this number is not greater than the MAXBMP setting for the database.

  • CONNECT cannot perform block indexing on case insensitive character columns. To force block indexing on a character column, specify its charset as not case insensitive, for instance as binary. However, this will also apply to all other clauses, this column now being case sensitive.

Remote Indexing

Remote indexing is specific to the MYSQL table type. It is equivalent to what the FEDERATED storage engine does. A MYSQL table does not support indexes per se. Because access to the table is handled remotely, it is the remote table that supports the indexes. What the MYSQL table does is just add a WHERE clause to the SELECT command sent to the remote server, allowing the remote server to use indexing when applicable. Note however that because CONNECT adds, when possible, all or part of the where clause of the original query, this happens often even if the remote indexed column is not declared locally indexed. The only, but very important, case where a column should be locally declared indexed is when it is used to join tables. Otherwise, the required where clause would not be added to the sent SELECT query.

See Indexing of MYSQL tables for more.
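For instance, a minimal sketch of that join case; the connection string, tables and columns are made up for illustration:

-- The id column is locally declared indexed so that the join sends
-- an indexed lookup (WHERE clause on id) to the remote server.
CREATE TABLE remote_emp (
  id INT NOT NULL,
  name CHAR(20) NOT NULL,
  INDEX XID(id))
ENGINE=CONNECT table_type=MYSQL
connection='mysql://user@remotehost/test/emp';

SELECT d.dept, e.name
FROM local_dept d JOIN remote_emp e ON e.id = d.emp_id;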

Dynamic Indexing

An index created as "dynamic" is a standard index which, in some cases, can be reconstructed for a specific query. This happens in particular for some queries where two tables are joined by an indexed key column. If the "from" table is big and the "to" table is reduced in size because of a where clause, it can be worthwhile to reconstruct the index on this reduced table.

Because of the time added by reconstructing the index, this is valuable only if the time gained by reducing the index size is more than this reconstruction time. This is why it should not be done if the "from" table is small, because there will not be enough row joining to compensate for the additional time. Otherwise, the gains of using a dynamic index are:

  • Indexing time is a little faster if the index is smaller.

  • The join process will return only the rows fulfilling the where clause.

  • Because the table is read sequentially when reconstructing the index, there is no need for MRR.

  • Constructing the index can be faster if the table is reduced by block indexing.

  • While constructing the index, CONNECT also stores in memory the values of other used columns.

This last point is particularly important. It means that after the index is reconstructed, the join is done on a temporary memory table.

Unfortunately, storage engines being called independently by MariaDB for each table, CONNECT has no global information to decide when it is good to use dynamic indexing. This is why you should use it only in cases where you see that some important join queries take a very long time, and only on columns used for joining the table. The way to declare an index to be dynamic is by using the boolean DYNAM index option. For instance, the query:

SELECT d.diag, COUNT(*) cnt FROM diag d, patients p WHERE d.pnb =
p.pnb AND ageyears < 17 AND county = 30 AND drg <> 11 AND d.diag
BETWEEN 4296 AND 9434 GROUP BY d.diag ORDER BY cnt DESC;

Such a query joining the diag table to the patients table may last a very long time if the tables are big. To declare the primary key on the pnb column of the patients table to be dynamic:

ALTER TABLE patients DROP PRIMARY KEY;
ALTER TABLE patients ADD PRIMARY KEY (pnb) COMMENT 'DYNAMIC' dynam=1;

Note 1: The comment is not mandatory here but useful to see that the index is dynamic if you use the SHOW INDEX command.

Note 2: There is currently no way to just change the DYNAM option without dropping and adding the index. This is unfortunate because it takes time.

Virtual Indexing

It applies only to the virtual tables of type VIR and must be made on a column specifying SPECIAL=ROWID or SPECIAL=ROWNUM.

This page is licensed: GPLv2

Using CONNECT - Offline Documentation

The CONNECT storage engine has been deprecated.


Note: You can download a PDF version of the CONNECT documentation (1.7.0003).

This page is licensed: CC BY-SA / Gnu FDL


    Using CONNECT - Partitioning and Sharding

    The CONNECT storage engine has been deprecated.


CONNECT supports the MySQL/MariaDB partition specification. It is done in a way similar to MyISAM or InnoDB, by using the PARTITION engine, which must be enabled for this to work. This type of partitioning is sometimes referred to as "horizontal partitioning".

    Partitioning enables you to distribute portions of individual tables across a file system according to rules which you can set largely as needed. In effect, different portions of a table are stored as separate tables in different locations. The user-selected rule by which the division of data is accomplished is known as a partitioning function, which in MariaDB can be the modulus, simple matching against a set of ranges or value lists, an internal hashing function, or a linear hashing function.

    CONNECT takes this notion a step further, by providing two types of partitioning:

    1. File partitioning. Each partition is stored in a separate file like in multiple tables.

    2. Table partitioning. Each partition is stored in a separate table like in TBL tables.

    Partition engine issues

Using partitions sometimes requires creating the tables in an unnatural way to avoid errors due to several partition engine bugs:

    1. Engine specific column and index options are not recognized and cause a syntax error when the table is created. The workaround is to create the table in two steps, a CREATE TABLE statement followed by an ALTER TABLE statement.

    2. The connection string, when specified for the table, is lost by the partition engine. The workaround is to specify the connection string in the option_list.

3. MySQL upstream bug #71095. In the case of list columns partitioning, it sometimes causes a false "impossible where" clause to be raised. This makes a wrong void result be returned when it should not be void. There is no workaround, but this bug should hopefully be fixed.

    The following examples are using the above workaround syntax to address these issues.

    File Partitioning

    File partitioning applies to file-based CONNECT table types. As with multiple tables, physical data is stored in several files instead of just one. The differences to multiple tables are:

    1. Data is distributed amongst the different files following the partition rule.

    2. Unlike multiple tables, partitioned tables are not read only.

    3. Unlike multiple tables, partitioned tables can be indexable.

    4. The file names are generated from the partition names.

The table file names are generated differently depending on whether the table is an inward or outward table. For inward tables, for which the file name is not specified, the partition file names are:

Data file name: table_name#P#partition_name.table_file_type
Index file name: table_name#P#partition_name.index_file_type

For instance for the table:

CREATE TABLE t1 (
id INT KEY NOT NULL,
msg VARCHAR(32))
ENGINE=CONNECT TABLE_TYPE=FIX
PARTITION BY RANGE(id) (
PARTITION first VALUES LESS THAN(10),
PARTITION middle VALUES LESS THAN(50),
PARTITION last VALUES LESS THAN(MAXVALUE));

CONNECT will generate in the current data directory the files:

t1#P#first.fix
t1#P#first.fnx
t1#P#middle.fix
t1#P#middle.fnx
t1#P#last.fix
t1#P#last.fnx

This is similar to what the partition engine does for other engines - CONNECT partitioned inward tables behave like other engines' partition tables do. Just the data format is different.

Note: If sub-partitioning is used, inward table files and index files are named:

table_name#P#partition_name#SP#subpartition_name.type
table_name#P#partition_name#SP#subpartition_name.index_type

    Outward Tables

The real problems occur with outward tables, in particular when they are created from already existing files. The first issue is to make the partition table use the correct existing file names. The second one, only for tables that already exist and are not void, is to be sure the partitioning function matches the distribution of the data already existing in the files.

The first issue is addressed by the way data file names are constructed. For instance, let us suppose we want to make a table from the fixed formatted files:

E:\Data\part1.txt
E:\Data\part2.txt
E:\Data\part3.txt

This can be done by creating a table such as:

CREATE TABLE t2 (
id INT NOT NULL,
msg VARCHAR(32),
INDEX XID(id))
ENGINE=connect table_type=FIX file_name='E:/Data/part%s.txt'
PARTITION BY RANGE(id) (
PARTITION `1` VALUES LESS THAN(10),
PARTITION `2` VALUES LESS THAN(50),
PARTITION `3` VALUES LESS THAN(MAXVALUE));

The rule is that for each partition the matching file name is internally generated by replacing the "%s" part of the given FILE_NAME option value by the partition name.

If the table was initially void, further inserts will populate it according to the partition function. However, if the files did exist and contained data, it is your responsibility to determine what partition function actually matches the data distribution in them. This means in particular that partitioning by key or by hash cannot be used (except in exceptional cases) because you have almost no control over what the underlying algorithm does.

In the example above, there is no problem if the table is initially void, but if it is not, serious problems can occur if the initial distribution does not match the table distribution. Supposing a row in which "id" has the value 12 was initially contained in the part1.txt file, it will be seen when selecting the whole table, but if you ask:

SELECT * FROM t2 WHERE id = 12;

    The result will have 0 rows. This is because according to the partition function query pruning will only look inside the second partition and will miss the row that is in the wrong partition.

One way to check for a wrong distribution is, for instance, to compare the results from queries such as:

SELECT partition_name, table_rows FROM
information_schema.partitions WHERE table_name = 't2';

And

SELECT CASE WHEN id < 10 THEN 1 WHEN id < 50 THEN 2 ELSE 3 END
AS pn, COUNT(*) FROM part3 GROUP BY pn;

If they match, the distribution can be correct, although this does not prove it. However, if they do not match, the distribution is surely wrong.

    Partitioning on a Special Column

There are some cases where the files of a multiple table do not contain columns that can be used for range or list partitioning. For instance, let's suppose we have a multiple table based on the following files:

tmp/boston.txt
tmp/chicago.txt
tmp/atlanta.txt

Each of them containing the same kind of data:

ID: int
First_name: varchar(16)
Last_name: varchar(30)
Birth: date
Hired: date
Job: char(10)
Salary: double(8,2)

A multiple table can be created on them, for instance by:

CREATE TABLE mulemp (
id INT NOT NULL,
first_name VARCHAR(16) NOT NULL,
last_name VARCHAR(30) NOT NULL,
birth DATE NOT NULL date_format='DD/MM/YYYY',
hired DATE NOT NULL date_format='DD/MM/YYYY',
job CHAR(10) NOT NULL,
salary DOUBLE(8,2) NOT NULL
) ENGINE=CONNECT table_type=FIX file_name='tmp/*.txt' multiple=1;

The issue is that if we want to create a partitioned table on these files, there are no columns to use for defining a partition function. Each city file can have the same kind of column values and there is no way to distinguish them.

However, there is a solution. It is to add to the table a special column that will be used by the partition function. For instance, the new table creation can be done by:

CREATE TABLE partemp (
id INT NOT NULL,
first_name VARCHAR(16) NOT NULL,
last_name VARCHAR(30) NOT NULL,
birth DATE NOT NULL date_format='DD/MM/YYYY',
hired DATE NOT NULL date_format='DD/MM/YYYY',
job CHAR(16) NOT NULL,
salary DOUBLE(10,2) NOT NULL,
city CHAR(12) DEFAULT 'boston' special=PARTID,
INDEX XID(id)
) ENGINE=CONNECT table_type=FIX file_name='E:/Data/Test/%s.txt';

ALTER TABLE partemp
PARTITION BY LIST COLUMNS(city) (
PARTITION `atlanta` VALUES IN('atlanta'),
PARTITION `boston` VALUES IN('boston'),
PARTITION `chicago` VALUES IN('chicago'));

    Note 1: we had to do it in two steps because of the column CONNECT options.

    Note 2: the special column PARTID returns the name of the partition in which the row is located.

    Note 3: here we could have used the FNAME special column instead because the file name is specified as being the partition name.

This may seem rather stupid because it means, for instance, that a row will be in partition boston if it belongs to the partition boston! However, it works because the partition engine doesn't know about special columns and behaves as if the city column were a real column.

What happens if we populate it by:

INSERT INTO partemp(id,first_name,last_name,birth,hired,job,salary) VALUES
(1205,'Harry','Cover','1982-10-07','2010-09-21','MANAGEMENT',125000.00);

INSERT INTO partemp VALUES
(1524,'Jim','Beams','1985-06-18','2012-07-25','SALES',52000.00,'chicago'),
(1431,'Johnny','Walker','1988-03-12','2012-08-09','RESEARCH',46521.87,'boston'),
(1864,'Jack','Daniels','1991-12-01','2013-02-16','DEVELOPMENT',63540.50,'atlanta');

The value given for the city column (explicitly or by default) is used by the partition engine to decide in which partition to insert the rows. It is ignored by CONNECT (a special column cannot be given a value) but will later return the matching value. For instance:

SELECT city, first_name, job FROM partemp WHERE id IN (1524,1431);

This query returns:

city | first_name | job
boston | Johnny | RESEARCH
chicago | Jim | SALES

Everything works as if the city column were a real column contained in the table data files.

    Partitioning of zipped tables

Two cases are currently supported:

If a table is based on several zipped files, partitioning is done the standard way as above, i.e. the file_name option specifying the name of the zip files shall contain the '%s' part used to generate the file names.

If a table is based on only one zip file containing several entries, this is indicated by placing the '%s' part in the entry option value.

Note: If a table is based on several zipped files each containing several entries, only the first case is possible. Using sub-partitioning to make partitions on each entry is not supported yet.
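A minimal sketch of the two cases; the tables, files and entry names are made up, and zipped and entry are the CONNECT options referred to above:

-- Case 1: several zip files, '%s' placed in the file name.
CREATE TABLE zemp1 (
id INT NOT NULL,
msg VARCHAR(32))
ENGINE=CONNECT table_type=CSV file_name='emp_%s.zip' zipped=1
PARTITION BY RANGE(id) (
PARTITION `1` VALUES LESS THAN(10),
PARTITION `2` VALUES LESS THAN(MAXVALUE));

-- Case 2: one zip file, '%s' placed in the entry option value.
CREATE TABLE zemp2 (
id INT NOT NULL,
msg VARCHAR(32))
ENGINE=CONNECT table_type=CSV file_name='emp.zip' zipped=1
option_list='Entry=part%s.csv'
PARTITION BY RANGE(id) (
PARTITION `1` VALUES LESS THAN(10),
PARTITION `2` VALUES LESS THAN(MAXVALUE));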

    Table Partitioning

    With table partitioning, each partition is physically represented by a sub-table. Compared to standard partitioning, this brings the following features:

    1. The partitions can be tables driven by different engines. This relieves the current existing limitation of the partition engine.

    2. The partitions can be tables driven by engines not currently supporting partitioning.

    3. Partition tables can be located on remote servers, enabling table sharding.

    4. Like for TBL tables, the columns of the partition table do not necessarily match the columns of the sub-tables.

The way it is done is to create the partition table with a table type referring to other tables: PROXY, MYSQL, ODBC or JDBC. Let us see how this is done with a simple example. Supposing we have created the following tables:

CREATE TABLE xt1 (
id INT NOT NULL,
msg VARCHAR(32))
ENGINE=myisam;

CREATE TABLE xt2 (
id INT NOT NULL,
msg VARCHAR(32)); /* engine=innoDB */

CREATE TABLE xt3 (
id INT NOT NULL,
msg VARCHAR(32))
ENGINE=connect table_type=CSV;

We can for instance create a partition table using these tables as physical partitions by:

CREATE TABLE t3 (
id INT NOT NULL,
msg VARCHAR(32))
ENGINE=connect table_type=PROXY tabname='xt%s'
PARTITION BY RANGE COLUMNS(id) (
PARTITION `1` VALUES LESS THAN(10),
PARTITION `2` VALUES LESS THAN(50),
PARTITION `3` VALUES LESS THAN(MAXVALUE));

Here the name of each partition sub-table is made by replacing the '%s' part of the tabname option value by the partition name. Now if we do:

INSERT INTO t3 VALUES
(4, 'four'),(7,'seven'),(10,'ten'),(40,'forty'),
(60,'sixty'),(81,'eighty one'),(72,'seventy two'),
(11,'eleven'),(1,'one'),(35,'thirty five'),(8,'eight');

The rows are distributed in the different sub-tables according to the partition function. This can be seen by executing the query:

SELECT partition_name, table_rows FROM
information_schema.partitions WHERE table_name = 't3';

This query replies:

partition_name | table_rows
1 | 4
2 | 4
3 | 3

Query pruning is of course automatic, for instance:

EXPLAIN PARTITIONS SELECT * FROM t3 WHERE id = 81;

This query replies:

id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | part5 | 3 | ALL |  |  |  |  | 22 | Using where

When executing this select query, only sub-table xt3 is used.

    Indexing with Table Partitioning

Using the PROXY table type seems natural. However, in this current version, the issue is that PROXY (and ODBC) tables are not indexable. This is why, if you want the table to be indexed, you must use the MYSQL table type. The CREATE TABLE statement is almost the same:

CREATE TABLE t4 (
id INT KEY NOT NULL,
msg VARCHAR(32))
ENGINE=connect table_type=MYSQL tabname='xt%s'
PARTITION BY RANGE COLUMNS(id) (
PARTITION `1` VALUES LESS THAN(10),
PARTITION `2` VALUES LESS THAN(50),
PARTITION `3` VALUES LESS THAN(MAXVALUE));

The column id is declared as a key, and the table type is now MYSQL. Sub-tables are now accessed by calling a MariaDB server, as MYSQL tables do. Note that this modifies only the way CONNECT sub-tables are accessed.

However, indexing here just makes the partitioned table use "remote indexing", the way FEDERATED tables do. This means that when sending the query to retrieve the table data, a where clause is added to the query. For instance, let's suppose you ask:

SELECT * FROM t4 WHERE id = 7;

The query sent to the server is:

SELECT `id`, `msg` FROM `xt1` WHERE `id` = 7

    On a query like this one, it does not change much because the where clause could have been added anyway by the cond_push function, but it does make a difference in case of joins. The main thing to understand is that real indexing is done by the called table and therefore that it should be indexed.

This also means that the xt1, xt2, and xt3 table indexes should be made separately, because creating the t4 table as indexed does not create the indexes on the sub-tables.

    Sharding with Table Partitioning

Using table partitioning can have one more advantage. Because the sub-tables can address a table located on another server, it is possible to shard a table on separate servers and hardware machines. This may be required to access, as one table, data already located on several remote machines, such as the servers of a company's branches. Or it can just be used to split a huge table for performance reasons. For instance, supposing we have created the following tables:

CREATE TABLE rt1 (id INT KEY NOT NULL, msg VARCHAR(32))
ENGINE=federated connection='mysql://root@host1/test/sales';

CREATE TABLE rt2 (id INT KEY NOT NULL, msg VARCHAR(32))
ENGINE=federated connection='mysql://root@host2/test/sales';

CREATE TABLE rt3 (id INT KEY NOT NULL, msg VARCHAR(32))
ENGINE=federated connection='mysql://root@host3/test/sales';

Creating the partition table accessing all these is almost like what we did with the t4 table:

CREATE TABLE t5 (
id INT KEY NOT NULL,
msg VARCHAR(32))
ENGINE=connect table_type=MYSQL tabname='rt%s'
PARTITION BY RANGE COLUMNS(id) (
PARTITION `1` VALUES LESS THAN(10),
PARTITION `2` VALUES LESS THAN(50),
PARTITION `3` VALUES LESS THAN(MAXVALUE));

The only difference is the tabname option now referring to the rt1, rt2, and rt3 tables. However, even if it works, this is not the best way to do it. This is because accessing a table via the MySQL API is done twice per table: once by CONNECT to access the FEDERATED table on the local server, then a second time by the FEDERATED engine to access the remote table.

The CONNECT MYSQL table type being used anyway, you'd rather use it to directly access the remote tables. Indeed, the partition names can also be used to modify the connection URLs. For instance, in the case shown above, the partition table can be created as:

CREATE TABLE t6 (
id INT KEY NOT NULL,
msg VARCHAR(32))
ENGINE=connect table_type=MYSQL
option_list='connect=mysql://root@host%s/test/sales'
PARTITION BY RANGE COLUMNS(id) (
PARTITION `1` VALUES LESS THAN(10),
PARTITION `2` VALUES LESS THAN(50),
PARTITION `3` VALUES LESS THAN(MAXVALUE));

    Several things can be noted here:

    1. As we have seen before, the partition engine currently loses the connection string. This is why it was specified as “connect” in the option list.

2. For each partition sub-table, the "%s" part of the connection string has been replaced by the partition name.

3. It is no longer necessary to define the rt1, rt2, and rt3 tables (even if it does no harm), and the FEDERATED engine is no longer used to access the remote tables.

This is a simple case where the connection string is almost the same for all the sub-tables. But what if the sub-tables are accessed by very different connection strings? For instance:

For rt1: connection='mysql://root:tinono@127.0.0.1:3307/test/xt1'
For rt2: connection='mysql://foo:foopass@denver/dbemp/xt2'
For rt3: connection='mysql://root@huston :5505/test/tabx'

There are two solutions. The first one is to use the differing parts of the connection strings as partition names:

CREATE TABLE t7 (
id INT KEY NOT NULL,
msg VARCHAR(32))
ENGINE=connect table_type=MYSQL
option_list='connect=mysql://%s'
PARTITION BY RANGE COLUMNS(id) (
PARTITION `root:tinono@127.0.0.1:3307/test/xt1` VALUES LESS THAN(10),
PARTITION `foo:foopass@denver/dbemp/xt2` VALUES LESS THAN(50),
PARTITION `root@huston :5505/test/tabx` VALUES LESS THAN(MAXVALUE));

The second one, which avoids overly complicated partition names, is to create federated servers to access the remote tables (if they do not already exist, otherwise just use them). For instance, the first one could be:

CREATE SERVER `server_one` FOREIGN DATA WRAPPER 'mysql'
OPTIONS
(HOST '127.0.0.1',
DATABASE 'test',
USER 'root',
PASSWORD 'tinono',
PORT 3307);

Similarly, "server_two" and "server_three" would be created, and the final partition table would be created as:

CREATE TABLE t8 (
id INT KEY NOT NULL,
msg VARCHAR(32))
ENGINE=connect table_type=MYSQL
option_list='connect=server_%s'
PARTITION BY RANGE COLUMNS(id) (
PARTITION `one/xt1` VALUES LESS THAN(10),
PARTITION `two/xt2` VALUES LESS THAN(50),
PARTITION `three/tabx` VALUES LESS THAN(MAXVALUE));

    It would be even simpler if all remote tables had the same name on the remote databases, for instance if they all were named xt1, the connection string could be set as “server_%s/xt1” and the partition names would be just “one”, “two”, and “three”.

    Sharding on a Special Column

The technique we have seen above with file partitioning is also available with table partitioning. Companies willing to use, as one table, data sharded on the company branch servers can, as we have seen, add a special column to the table create definition. For instance:

CREATE TABLE t9 (
id INT NOT NULL,
msg VARCHAR(32),
branch CHAR(16) DEFAULT 'main' special=PARTID,
INDEX XID(id))
ENGINE=connect table_type=MYSQL
option_list='connect=server_%s/sales'
PARTITION BY LIST COLUMNS(branch) (
PARTITION `main` VALUES IN('main'),
PARTITION `east` VALUES IN('east'),
PARTITION `west` VALUES IN('west'));

This example assumes that federated servers have been created named "server_main", "server_east" and "server_west", and that all remote tables are named "sales". Note also that in this example, the column id is no longer a key.

    Current Partition Limitations

Because the partition engine was written before some other engines were added to MariaDB, the way it works is sometimes incompatible with these engines, in particular with CONNECT.

    Update statement

With the sample tables above, you can do update statements such as:

UPDATE t2 SET msg = 'quatre' WHERE id = 4;

It works perfectly and is accepted by CONNECT. However, let us consider the statement:

UPDATE t2 SET id = 41 WHERE msg = 'four';

This statement is not accepted by CONNECT. The reason is that the column id, being part of the partition function, may require the modified row to be moved to another partition when its value changes. The way this is done by the partition engine is to delete the old row and to re-insert the new modified one. However, this is done in a way that is not currently compatible with CONNECT (remember that CONNECT supports UPDATE in a specific way, in particular for the table type MYSQL). This limitation could be temporary. Meanwhile, the workaround is to do manually what the partition engine does, deleting the row to modify and inserting the modified row:

DELETE FROM t2 WHERE id = 4;
INSERT INTO t2 VALUES(41, 'four');

    Alter Table statement

For all CONNECT outward tables, the ALTER TABLE statement does not make any change in the table data. This is why ALTER TABLE should not be used, in particular to modify the partition definition, except of course to correct a wrong definition. Note that using ALTER TABLE to create a partition table in two steps (because column options would otherwise be lost) is valid, as it applies to a table that is not yet partitioned.

    As we have seen, it is also safe to use it to create or drop indexes. Otherwise, a simple rule of thumb is to avoid altering a table definition and better drop and re-create a table whose definition must be modified. Just remember that for outward CONNECT tables, dropping a table does not erase the data and that creating it does not modify existing data.

    Rowid special column

    Each partition being handled separately as one table, the ROWID special column returns the rank of the row in its partition, not in the whole table. This means that for partition tables ROWID and ROWNUM are equivalent.

    This page is licensed: CC BY-SA / Gnu FDL
