CONNECT Zipped File Tables

You are viewing an old version of this article. View the current version here.

This is a new implementation that will be distributed in the next versions of MariaDB. It must be regarded as beta. It applies when the table file or files are compressed in one or several zip files.

The specific options used when creating tables based on zip files are:

Table OptionTypeDescription
ZIPPEDBooleanRequired. To be set as true.
ENTRY*StringThe optional name or pattern of the zip entry or entries to be used with the table. If not specified, all entries or only the first one will be used depending on the mulentry option setting.
MULENTRY*BooleanTrue if several entries are part of the table. If not specified, it defaults to false if the entry option is not specified. If the entry option is specified, it defaults to true is the entry name contains wildcard characters or false if it does not.

Options marked with a ‘*’ must be specified in the option list.

Examples of use: Let's suppose you have a CSV file from which you would create a table by:

create table emp
... optional column definition
engine=connect table_type=CSV file_name='E:/Data/employee.csv'
sep_char=';' header=1;

If the CSV file is included in a ZIP file, the CREATE TABLE becomes:

create table empzip
... optional column definition
engine=connect table_type=CSV file_name='E:/Data/employee.zip'
sep_char=';' header=1 zipped=1 option_list='Entry=emp.csv';

The file_name option is the name of the zip file. The entry option is the name of the entry inside the zip file. If there is only one entry file inside the zip file, this option can be omitted.

If the table is made from several files such as emp01.csv, emp02.csv, etc., the standard create table would be:

create table empmul (
... required column definition
) engine=connect table_type=CSV file_name='E:/Data/emp*.csv' 
sep_char=';' header=1 multiple=1;

But if these files are all zipped inside a unique zip file, it becomes:

create table empzmul
... required column definition
engine=connect table_type=CSV file_name='E:/Data/emp.zip'
sep_char=';' header=1 zipped=1 option_list='Entry=emp*.csv';

Here the entry option is the pattern that the files inside the zip file must match. If all entry files are ok, the entry option can be omitted but the Boolean option mulentry must be specified as true.

If the table is created on several zip files, it is specified as for all other multiple tables:

create table zempmul (
... required column definition
) engine=connect table_type=CSV file_name='E:/Data/emp*.zip' 
sep_char=';' header=1 multiple=1 zipped=yes 
option_list='Entry=employee.csv';

Here again the entry option is used to restrict the entry file(s) to be used inside the zip files and can be omitted if all are Ok.

The column descriptions can be retrieved by the discovery process for table types allowing it. It cannot be done for multiple table or multiple entries. Catalog table can be created by adding catfunc=columns. This can be used to show the column definitions of multiple tables. Multiple must be set to false and the column definitions will be the ones of the first table or entry. This first implementation has some restrictions:

1. This is a read only implementation. No insert, update or delete. 2. The inside files are decompressed into memory. Memory problems may arise with huge files. 3. Only file types that can be handled from memory are eligible for this. This includes DOS, FIX, BIN, CSV, FMT, JSON, and XML table types.

Optimization by indexing or block indexing is possible for table types supporting it. However, it applies on the uncompressed table. This means that the whole table is always uncompressed.

Partitioning is also supported. See how to do it in the chapter about partitioning.

ZIP Table Type

A ZIP table type is also available. It is not meant to read the inside files but to display information about the zip file contain. For instance:

create table xzipinfo2 (
fn varchar(256)not null,
cmpsize bigint not null flag=1,
uncsize bigint not null flag=2,
method int not null flag=3)
engine=connect table_type=ZIP file_name='E:/Data/Json/cities.zip';

This will display the name, compressed size, uncompressed size, and compress method of all entries inside the zip file. Column names are irrelevant, this is the flag value that mean what information to retrieve.

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.