Using CONNECT - Partitioning and Sharding
- Partition engine issues
- File Partitioning
- Table Partitioning
- Current Partition Limitations
CONNECT supports the MySQL/MariaDB partition specification. It is done similar to the way MyISAM or InnoDB do by using the PARTITION engine that must be enabled for this to work. This type of partitioning is sometimes referred as “horizontal partitioning”.
Partitioning enables you to distribute portions of individual tables across a file system according to rules which you can set largely as needed. In effect, different portions of a table are stored as separate tables in different locations. The user-selected rule by which the division of data is accomplished is known as a partitioning function, which in MariaDB can be the modulus, simple matching against a set of ranges or value lists, an internal hashing function, or a linear hashing function.
CONNECT takes this notion a step further, by providing two types of partitioning:
- File partitioning. Each partition is stored in a separate file like in multiple tables.
- Table partitioning. Each partition is stored in a separate table like in TBL tables.
Partition engine issues
Using partitions sometimes requires creating the tables in an unnatural way to avoid some error due to several partition engine bugs:
- Engine specific column and index options are not recognized and cause a syntax error when the table is created. The workaround is to create the table in two steps, a CREATE TABLE statement followed by an ALTER TABLE statement.
- The connection string, when specified for the table, is lost by the partition engine. The workaround is to specify the connection string in the option_list.
- MySQL upstream bug #71095. In case of list columns partitioning it sometimes causes a false “impossible where” clause to be raised. This makes a wrong void result returned when it should not be void. There is no workaround but this bug should be hopefully fixed.
The following examples are using the above workaround syntax to address these issues.
File partitioning applies to file-based CONNECT table types. As with multiple tables, physical data is stored in several files instead of just one. The differences to multiple tables are:
- Data is distributed amongst the different files following the partition rule.
- Unlike multiple tables, partitioned tables are not read only.
- Unlike multiple tables, partitioned tables can be indexable.
- The file names are generated from the partition names.
- Query pruning is automatically made by the partition engine.
The table file names are generated differently depending on whether the table is an inward or outward table. For inward tables, for which the file name is not specified, the partition file names are:
Data file name: table_name#P#partition_name.table_file_type Index file name: table_name#P#partition_name.index_file_type
For instance for the table:
CREATE TABLE t1 ( id INT KEY NOT NULL, msg VARCHAR(32)) ENGINE=CONNECT TABLE_TYPE=FIX partition by range(id) ( partition first values less than(10), partition middle values less than(50), partition last values less than(MAXVALUE));
CONNECT will generate in the current data directory the files:
| t1#P#first.fix | t1#P#first.fnx | t1#P#middle.fix | t1#P#middle.fnx | t1#P#last.fix | t1#P#last.fnx
This is similar to what the partition engine does for other engines - CONNECT partitioned inward tables behave like other engines partition tables do. Just the data format is different.
Note: If sub-partitioning is used, inward table files and index files are named:
| table_name#P#partition_name#SP#subpartition_name.type | table_name#P#partition_name#SP#subpartition_name.index_type
The real problems occur with outward tables, in particular when they are created from already existing files. The first issue is to make the partition table use the correct existing file names. The second one, only for already existing not void tables, is to be sure the partitioning function match the distribution of the data already existing in the files.
The first issue is addressed by the way data file names are constructed. For instance let us suppose we want to make a table from the fixed formatted files:
E:\Data\part1.txt E:\Data\part2.txt E:\Data\part3.txt
This can be done by creating a table such as:
create table t2 ( id int not null, msg varchar(32), index XID(id)) engine=connect table_type=FIX file_name='E:/Data/part%s.txt' partition by range(id) ( partition `1` values less than(10), partition `2` values less than(50), partition `3` values less than(MAXVALUE));
The rule is that for each partition the matching file name is internally generated by replacing in the given FILE _ NAME option value the “%s” part by the partition name.
If the table was initially void, further inserts will populate it according to the partition function. However, if the files did exist and contained data, this is your responsibility to determine what partition function actually matches the data distribution in them. This means in particular that partitioning by key or by hash cannot be used (except in exceptional cases) because you have almost no control over what the used algorithm does.
In the example above, there is no problem if the table is initially void, but if it is not, serious problems can be met if the initial distribution does not match the table distribution. Supposing a row in which “id” as the value 12 was initially contained in the part1.txt file, it will be seen when selecting the whole table but if you ask:
select * from t2 where id = 12;
The result will have 0 rows. This is because according to the partition function query pruning will only look inside the second partition and will miss the row that is in the wrong partition.
One way to check for wrong distribution if for instance to compare the results from queries such as:
SELECT partition_name, table_rows FROM information_schema.partitions WHERE table_name = 't2';
SELECT CASE WHEN id < 10 THEN 1 WHEN id < 50 THEN 2 ELSE 3 END AS pn, COUNT(*) FROM part3 GROUP BY pn;
If they match, the distribution can be correct although this does not prove it. However, if they do not match, the distribution is surely wrong.
Partitioning on a Special Column
There are some cases where the files of a multiple table do not contain columns that can be used for range or list partitioning. For instance, let’s suppose we have a multiple table based on the following files:
tmp/boston.txt tmp/chicago.txt tmp/atlanta.txt
Each of them containing the same kind of data:
ID: int First_name: varchar(16) Last_name: varchar(30) Birth: date Hired: date Job: char(10) Salary: double(8,2)
A multiple table can be created on them, for instance by:
create table mulemp ( id int NOT NULL, first_name varchar(16) NOT NULL, last_name varchar(30) NOT NULL, birth date NOT NULL date_format='DD/MM/YYYY', hired date NOT NULL date_format='DD/MM/YYYY', job char(10) NOT NULL, salary double(8,2) NOT NULL ) engine=CONNECT table_type=FIX file_name='tmp/*.txt' multiple=1;
The issue is that if we want to create a partitioned table on these files, there are no columns to use for defining a partition function. Each city file can have the same kind of column values and there is no way to distinguish them.
However, there is a solution. It is to add to the table a special column that will be used by the partition function. For instance, the new table creation can be done by:
create table partemp ( id int NOT NULL, first_name varchar(16) NOT NULL, last_name varchar(30) NOT NULL, birth date NOT NULL date_format='DD/MM/YYYY', hired date NOT NULL date_format='DD/MM/YYYY', job char(16) NOT NULL, salary double(10,2) NOT NULL, city char(12) default 'boston' special=PARTID, index XID(id) ) engine=CONNECT table_type=FIX file_name='E:/Data/Test/%s.txt'; alter table partemp partition by list columns(city) ( partition `atlanta` values in('atlanta'), partition `boston` values in('boston'), partition `chicago` values in('chicago'));
Note 1: we had to do it in two steps because of the column CONNECT options.
Note 2: the special column PARTID returns the name of the partition in which the row is located.
Note 3: here we could have used the FNAME special column instead because the file name is specified as being the partition name.
This may seem rather stupid because it means for instance that a row will be in partition boston if it belongs to the partition boston! However, it works because the partition engine doesn’t know about special columns and behaves as if the city column was a real column.
What happens if we populate it by?
insert into partemp(id,first_name,last_name,birth,hired,job,salary) values (1205,'Harry','Cover','1982-10-07','2010-09-21','MANAGEMENT',125000.00); insert into partemp values (1524,'Jim','Beams','1985-06-18','2012-07-25','SALES',52000.00,'chicago'), (1431,'Johnny','Walker','1988-03-12','2012-08-09','RESEARCH',46521.87,'boston'), (1864,'Jack','Daniels','1991-12-01','2013-02-16','DEVELOPMENT',63540.50,'atlanta');
The value given for the city column (explicitly or by default) will be used by the partition engine to decide in which partition to insert the rows. It will be ignored by CONNECT (a special column cannot be given a value) but later will return the matching value. For instance:
select city, first_name, job from partemp where id in (1524,1431);
This query returns:
Everything works as if the city column was a real column contained in the table data files.
Partitioning of zipped tables
Two cases are currently supported:
If a table is based on several zipped files, portioning is done the standard way as above. This is the file_name option specifying the name of the zip files that shall contain the ‘%s’ part used to generate the file names.
If a table is based on only one zip file containing several entries, this will be indicated by placing the ‘%s’ part in the entry option value.
Note: If a table is based on several zipped files each containing several entries, only the first case is possible. Using sub-partitioning to make partitions on each entries is not supported yet.
With table partitioning, each partition is physically represented by a sub-table. Compared to standard partitioning, this brings the following features:
- The partitions can be tables driven by different engines. This relieves the current existing limitation of the partition engine.
- The partitions can be tables driven by engines not currently supporting partitioning.
- Partition tables can be located on remote servers, enabling table sharding.
- Like for TBL tables, the columns of the partition table do not necessarily match the columns of the sub-tables.
The way it is done is to create the partition table with a table type referring to other tables, PROXY, MYSQL ODBC or JDBC. Let us see how this is done on a simple example. Supposing we have created the following tables:
create table xt1 ( id int not null, msg varchar(32)) engine=myisam; create table xt2 ( id int not null, msg varchar(32)); /* engine=innoDB */ create table xt3 ( id int not null, msg varchar(32)) engine=connect table_type=CSV;
We can for instance create a partition table using these tables as physical partitions by:
create table t3 ( id int not null, msg varchar(32)) engine=connect table_type=PROXY tabname='xt%s' partition by range columns(id) ( partition `1` values less than(10), partition `2` values less than(50), partition `3` values less than(MAXVALUE));
Here the name of each partition sub-table will be made by replacing the ‘%s’ part of the tabname option value by the partition name. Now if we do:
insert into t3 values (4, 'four'),(7,'seven'),(10,'ten'),(40,'forty'), (60,'sixty'),(81,'eighty one'),(72,'seventy two'), (11,'eleven'),(1,'one'),(35,'thirty five'),(8,'eight');
The rows will be distributed in the different sub-tables according to the partition function. This can be seen by executing the query:
select partition_name, table_rows from information_schema.partitions where table_name = 't3';
This query replies:
Query pruning is of course automatic, for instance:
explain partitions select * from t3 where id = 81;
This query replies:
When executing this select query, only sub-table xt3 will be used.
Indexing with Table Partitioning
Using the PROXY table type seems natural. However, in this current version, the issue is that PROXY (and ODBC) tables are not indexable. This is why, if you want the table to be indexed, you must use the MYSQL table type. The CREATE TABLE statement will be almost the same:
create table t4 ( id int key not null, msg varchar(32)) engine=connect table_type=MYSQL tabname='xt%s' partition by range columns(id) ( partition `1` values less than(10), partition `2` values less than(50), partition `3` values less than(MAXVALUE));
The column id is declared as a key, and the table type is now MYSQL. This makes Sub-tables accessed by calling a MariaDB server as MYSQL tables do. Note that this modifies only the way CONNECT sub-tables are accessed.
However, indexing just make the partitioned table use “remote indexing” the way FEDERATED tables do. This means that when sending the query to retrieve the table data, a where clause will be added to the query. For instance, let’s suppose you ask:
select * from t4 where id = 7;
The query sent to the server will be:
SELECT `id`, `msg` FROM `xt1` WHERE `id` = 7
On a query like this one, it does not change much because the where clause could have been added anyway by the cond_push function, but it does make a difference in case of joins. The main thing to understand is that real indexing is done by the called table and therefore that it should be indexed.
This also means that the xt1, xt2, and xt3 table indexes should be made separately because creating the t2 table as indexed does not make the indexes on the sub-tables.
Sharding with Table Partitioning
Using table partitioning can have one more advantage. Because the sub-tables can address a table located on another server, it is possible to shard a table on separate servers and hardware machines. This may be required to access as one table data already located on several remote machines, such as servers of a company branches. Or it can be just used to split a huge table for performance reason. For instance, supposing we have created the following tables:
create table rt1 (id int key not null, msg varchar(32)) engine=federated connection='mysql://root@host1/test/sales'; create table rt2 (id int key not null, msg varchar(32)) engine=federated connection='mysql://root@host2/test/sales'; create table rt3 (id int key not null, msg varchar(32)) engine=federated connection='mysql://root@host3/test/sales';
Creating the partition table accessing all these will be almost like what we did with the t4 table:
create table t5 ( id int key not null, msg varchar(32)) engine=connect table_type=MYSQL tabname='rt%s' partition by range columns(id) ( partition `1` values less than(10), partition `2` values less than(50), partition `3` values less than(MAXVALUE));
The only difference is the tabname option now referring to the rt1, rt2, and rt3 tables. However, even if it works, this is not the best way to do it. This is because accessing a table via the MySQL API is done twice per table. Once by CONNECT to access the FEDERATED table on the local server, then a second time by FEDERATED engine to access the remote table.
The CONNECT MYSQL table type being used anyway, you’d rather use it to directly access the remote tables. Indeed, the partition names can also be used to modify the connection URL’s. For instance, in the case shown above, the partition table can be created as:
create table t6 ( id int key not null, msg varchar(32)) engine=connect table_type=MYSQL option_list='connect=mysql://root@host%s/test/sales' partition by range columns(id) ( partition `1` values less than(10), partition `2` values less than(50), partition `3` values less than(MAXVALUE));
Several things can be noted here:
- As we have seen before, the partition engine currently loses the connection string. This is why it was specified as “connect” in the option list.
- For each partition sub-tables, the “%s” part of the connection string has been replaced by the partition name.
- It is not needed anymore to define the rt1, rt2, and rt3 tables (even it does not harm) and the FEDERATED engine is no more used to access the remote tables.
This is a simple case where the connection string is almost the same for all the sub-tables. But what if the sub-tables are accessed by very different connection strings? For instance:
For rt1: connection='mysql://root:email@example.com:3307/test/xt1' For rt2: connection='mysql://foo:foopass@denver/dbemp/xt2' For rt3: connection='mysql://root@huston :5505/test/tabx'
There are two solutions. The first one is to use the parts of the connection string to differentiate as partition names:
create table t7 ( id int key not null, msg varchar(32)) engine=connect table_type=MYSQL option_list='connect=mysql://%s' partition by range columns(id) ( partition `root:firstname.lastname@example.org:3307/test/xt1` values less than(10), partition `foo:foopass@denver/dbemp/xt2` values less than(50), partition `root@huston :5505/test/tabx` values less than(MAXVALUE));
The second one, allowing avoiding too complicated partition names, is to create federated servers to access the remote tables (if they do not already exist, else just use them). For instance the first one could be:
create server `server_one` foreign data wrapper 'mysql' options (host '127.0.0.1', database 'test', user 'root', password 'tinono', port 3307);
Similarly, “server_two” and “server_three” would be created and the final partition table would be created as:
create table t8 ( id int key not null, msg varchar(32)) engine=connect table_type=MYSQL option_list='connect=server_%s' partition by range columns(id) ( partition `one/xt1` values less than(10), partition `two/xt2` values less than(50), partition `three/tabx` values less than(MAXVALUE));
It would be even simpler if all remote tables had the same name on the remote databases, for instance if they all were named xt1, the connection string could be set as “server_%s/xt1” and the partition names would be just “one”, “two”, and “three”.
Sharding on a Special Column
The technique we have seen above with file partitioning is also available with table partitioning. Companies willing to use as one table data sharded on the company branch servers can, as we have seen, add to the table create definition a special column. For instance:
create table t9 ( id int not null, msg varchar(32), branch char(16) default 'main' special=PARTID, index XID(id)) engine=connect table_type=MYSQL option_list='connect=server_%s/sales' partition by range columns(id) ( partition `main` values in('main'), partition `east` values in('east'), partition `west` values in('west'));
This example assumes that federated servers had been created named “server_main”, “server_east” and “server_west” and that all remote tables are named “sales”. Note also that in this example, the column id is no more a key.
Current Partition Limitations
Because the partition engine was written before some other engines were added to MariaDB, the way it works is sometime incompatible with these engines, in particular with CONNECT.
With the sample tables above, you can do update statements such as:
update t2 set msg = 'quatre' where id = 4;
It works perfectly and is accepted by CONNECT. However, let us consider the statement:
update t2 set id = 41 where msg = 'four';
This statement is not accepted by CONNECT. The reason is that the column id being part of the partition function, changing its value may require the modified row to be moved to another partition. The way it is done by the partition engine is to delete the old row and to re-insert the new modified one. However, this is done in a way that is not currently compatible with CONNECT (remember that CONNECT supports UPDATE in a specific way, in particular for the table type MYSQL) This limitation could be temporary. Meanwhile the workaround is to manually do what is done above,
Deleting the row to modify and inserting the modified row:
delete from t2 where id = 4; insert into t2 values(41, 'four');
Alter Table statement
For all CONNECT outward tables, the ALTER TABLE statement does not make any change in the table data. This is why ALTER TABLE should not be used; in particular to modify the partition definition, except of course to correct a wrong definition. Note that using ALTER TABLE to create a partition table in two steps because column options would be lost is valid as it applies to a table that is not yet partitioned.
As we have seen, it is also safe to use it to create or drop indexes. Otherwise, a simple rule of thumb is to avoid altering a table definition and better drop and re-create a table whose definition must be modified. Just remember that for outward CONNECT tables, dropping a table does not erase the data and that creating it does not modify existing data.
Rowid special column
Each partition being handled separately as one table, the ROWID special column returns the rank of the row in its partition, not in the whole table. This means that for partition tables ROWID and ROWNUM are equivalent.