CONNECT Zipped File Tables

MariaDB starting with 10.2.4

This is a new implementation, distributed starting with CONNECT 1.05.0001 (MariaDB 10.2.4). It must be regarded as beta. It applies when the table file or files are compressed in one or several zip files.

The specific options used when creating tables based on zip files are:

Table Option | Type | Description
ZIPPED | Boolean | Required to be set as true.
ENTRY* | String | The optional name or pattern of the zip entry or entries to be used with the table. If not specified, all entries or only the first one will be used, depending on the mulentries option setting.
MULENTRIES* | Boolean | True if several entries are part of the table. If not specified, it defaults to false if the entry option is not specified. If the entry option is specified, it defaults to true if the entry name contains wildcard characters or false if it does not.
LOAD* | String | Used when creating new zipped tables (see below).

Options marked with a ‘*’ must be specified in the option list.

Examples: CONNECT CSV for Zipped File Tables

A generic table definition, showing some of the most common table options used with the CONNECT engine and TABLE_TYPE=CSV when dealing with zipped file tables, would be as follows:

     ... optional column definition
     [HEADER ={0|1|NO|YES}]
     [QCHAR = '{"|''|other_quotation_marks}']

Note that zipped_file_path can contain the wildcard character "*" when used with MULTIPLE={1|3}. An example file path would be: C:/SubFolder/Folder/Filename*.zip

Multiple tables are specified by the option MULTIPLE=n, which can take four values:

0: Not a multiple table (the default). This can be used in an alter table statement.
1: The table is made from files located in the same directory. The FILE_NAME option is a pattern such as 'cash*.log' that all the table file path/names match.
2: The FILE_NAME gives the name of a file that contains the path/names of all the table files. This file can be made using a DIR table.
3: The table is made from files located in the same directory and all its sub-directories (sub-folders). The FILE_NAME option is a pattern such as 'cash*.log' that all the table file path/names match.

The column descriptions can be retrieved by the discovery process for table types allowing it. Discovery cannot be done for multiple tables or multiple entries. However, a catalog table can be created by adding catfunc=columns, and this can be used to show the column definitions of multiple tables: multiple must be set to false, and the column definitions will be those of the first table or entry.
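As a rough illustration of the catalog table mentioned above, the following sketch assumes a hypothetical zip file E:/Data/archive.zip containing an entry emp.csv; the table name empzip_cols is also made up for the example. It only shows where catfunc=columns would go and has not been checked against a particular CONNECT version.

```sql
-- Hypothetical file and table names; multiple is left at its default (0).
create table empzip_cols
engine=connect table_type=CSV file_name='E:/Data/archive.zip'
sep_char=';' header=1 zipped=1 catfunc=columns
option_list='Entry=emp.csv';

-- Each row of the catalog table describes one column found by discovery.
select * from empzip_cols;
```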

This first implementation has some restrictions:

  1. This is a read-only implementation. No insert, update or delete.
  2. The inside files are decompressed into memory. Memory problems may arise with huge files.
  3. Only file types that can be handled from memory are eligible for this. This includes DOS, FIX, BIN, CSV, FMT, JSON, and XML table types.

Optimization by indexing or block indexing is possible for table types supporting it. However, it applies to the uncompressed table. This means that the whole table is always uncompressed.

Partitioning is also supported. See how to do it in the section about partitioning.

Examples of use:

Example 1: Single CSV File included in a Single ZIP File

Let's suppose you have a CSV file from which you would create a table by:

create table emp
... optional column definition
engine=connect table_type=CSV file_name='E:/Data/employee.csv'
sep_char=';' header=1;

If the CSV file is included in a ZIP file, the CREATE TABLE becomes:

create table empzip
... optional column definition
engine=connect table_type=CSV file_name='E:/Data/'
sep_char=';' header=1 zipped=1 option_list='Entry=emp.csv';

The file_name option is the name of the zip file. The entry option is the name of the entry inside the zip file. If there is only one entry file inside the zip file, this option can be omitted.

Example 2: Several CSV Files included in a Single ZIP File

If the table is made from several files such as emp01.csv, emp02.csv, etc., the standard create table would be:

create table empmul (
... required column definition
) engine=connect table_type=CSV file_name='E:/Data/emp*.csv' 
sep_char=';' header=1 multiple=1;

But if these files are all zipped inside a single zip file, it becomes:

create table empzmul (
... required column definition
) engine=connect table_type=CSV file_name='E:/Data/'
sep_char=';' header=1 zipped=1 option_list='Entry=emp*.csv';

Here the entry option is the pattern that the files inside the zip file must match. If all entry files are to be used, the entry option can be omitted, but the Boolean option mulentries must then be specified as true.
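The case where the entry option is omitted might look like the sketch below. The zip file name emp_all.zip, the table name and the two columns are assumptions made for illustration; every entry in the archive is assumed to be a CSV file with the same layout.

```sql
-- Hypothetical archive and columns: all entries are read as one table.
create table empzall (
  name char(12) not null,
  salary double(8,2) not null
) engine=connect table_type=CSV file_name='E:/Data/emp_all.zip'
sep_char=';' header=1 zipped=1 option_list='mulentries=YES';
```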

Example 3: Single CSV File included in Multiple ZIP Files (Without considering subfolders)

If the table is created on several zip files, it is specified as for all other multiple tables:

create table zempmul (
... required column definition
) engine=connect table_type=CSV file_name='E:/Data/emp*.zip' 
sep_char=';' header=1 multiple=1 zipped=yes 

Here again the entry option is used to restrict the entry file(s) to be used inside the zip files and can be omitted if all are ok.
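If the zip files are also spread over sub-directories, the same definition could presumably use multiple=3 instead of multiple=1, as described in the list of MULTIPLE values. The sketch below reuses the file pattern of this example; the table name and the columns are hypothetical.

```sql
-- Variant of Example 3 that also scans the sub-directories of E:/Data.
-- Table name and columns are placeholders.
create table zemptree (
  name char(12) not null,
  salary double(8,2) not null
) engine=connect table_type=CSV file_name='E:/Data/emp*.zip'
sep_char=';' header=1 multiple=3 zipped=yes;
```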

Creating new zipped tables

Tables can be created to access already existing zip files. However, it is also possible to make the zip file from an existing file or table. Two ways are available to make the zip file:

Insert method

INSERT can be used to make the table file for table types based on records (this excludes XML, and JSON when pretty is not 0). However, the current implementation of the package used (minizip) does not support adding to an already existing zip entry. This means that when an insert statement is executed, the inserted records are not added to the existing ones but replace them. Therefore, only three ways are available to make the table:

  1. Using only one insert statement to make the table. This is possible only for small tables and is principally useful when making tests.
  2. Making the table from the data of another table. This can be done by executing an “insert into table select * from another_table” or by specifying “as select * from another_table” in the create table statement.
  3. Making the table from a file whose format allows the “load data infile” statement to be used.
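The second way can be sketched as follows, taking the emp table of Example 1 as the source. The target table name empzcopy and the zip file name emp_copy.zip are assumptions made for the example.

```sql
-- Create the zipped table and fill it in a single statement, so that the
-- zip entry is written only once. File and table names are hypothetical.
create table empzcopy
engine=connect table_type=CSV file_name='E:/Data/emp_copy.zip'
sep_char=';' header=1 zipped=1 option_list='Entry=emp.csv'
as select * from emp;
```

Because a later insert would replace the entry rather than add to it, all rows have to come from this one statement.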

File zipping method

This method makes the zip file from another file when the table is created. It applies to all table types, including XML and JSON. It is specified in the create table statement with the load option. For example:

create table XSERVZIP (
NUMERO varchar(4) not null,
LIEU varchar(15) not null,
CHEF varchar(5) not null,
FONCTION varchar(12) not null,
NOM varchar(21) not null)
engine=CONNECT table_type=XML file_name='E:/Xml/' zipped=1

When executing this statement, the serv2.xml file will be zipped as / The entry name must be specified as well as the column descriptions that cannot be retrieved from the zip entry file that does not exist yet. To add a new entry in an existing zip file specify “append=YES” in the option list.

It is even possible to create a multi-entries table from several files:

CREATE TABLE znewcities (
  _id char(5) NOT NULL,
  city char(16) NOT NULL,
  lat double(18,6) NOT NULL `FIELD_FORMAT`='loc:[0]',
  lng double(18,6) NOT NULL `FIELD_FORMAT`='loc:[1]',
  pop int(6) NOT NULL,
  state char(2) NOT NULL
) ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='E:/Json/' ZIPPED=1 LRECL=1000 OPTION_LIST='Load=E:/Json/city_*.json,mulentries=YES,pretty=0';

Here the files to load are specified with wildcard characters, and the mulentries option must be specified. However, the entry option must not be specified; entry names will be made from the file names.

ZIP table type

A ZIP table type is also available. It is not meant to read the inside files but to display information about the zip file contents. For instance:

create table xzipinfo2 (
fn varchar(256) not null,
cmpsize bigint not null flag=1,
uncsize bigint not null flag=2,
method int not null flag=3)
engine=connect table_type=ZIP file_name='E:/Data/Json/';

This will display the name, compressed size, uncompressed size, and compression method of all entries inside the zip file. Column names are irrelevant; it is the flag values that determine which information is retrieved.
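Once created, such a table can be queried like any other. The query below is a small sketch using the xzipinfo2 table defined above; the computed pct column is an addition for illustration only.

```sql
-- List the entries, largest first, with the compressed size as a
-- percentage of the uncompressed size (NULL for empty entries).
select fn, cmpsize, uncsize, method,
       round(100 * cmpsize / nullif(uncsize, 0), 1) as pct
from xzipinfo2
order by uncsize desc;
```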

