CONNECT

The CONNECT storage engine has been deprecated.

Note: You can download a PDF version of the CONNECT documentation (1.7.0003): connect_1_7_03.pdf (PDF, 1MB).

Connect Version: Connect 1.07.0002

The CONNECT storage engine enables MariaDB to access external local or remote data (MED). This is done by defining tables based on different data types, in particular files in various formats, data extracted from other DBMS or products (such as Excel or MongoDB) via ODBC or JDBC, or data retrieved from the environment (for example DIR, WMI, and MAC tables).

This storage engine supports table partitioning, MariaDB virtual columns and permits defining special columns such as ROWID, FILEID, and SERVID.

No precise definition of maturity exists. Because CONNECT handles many table types, each type has a different maturity depending on whether it is old and well-tested, less well-tested or newly implemented. This is indicated for each table type.

Introduction to the CONNECT Engine

CONNECT is not just a new “YASE” (Yet Another Storage Engine) that provides another way to store data with additional features. It brings a new dimension to MariaDB, already one of the best products for traditional transactional database applications, taking it further into the world of business intelligence and data analysis, including NoSQL facilities. Indeed, BI is the set of techniques and tools for transforming raw data into meaningful and useful information. And where is this data?

"It's amazing in an age where relational databases reign supreme when it comes to managing data that so much information still exists outside RDBMS engines in the form of flat files and other such constructs. In most enterprises, data is passed back and forth between disparate systems in a fashion and speed that would rival the busiest expressways in the world, with much of this data existing in common, delimited files. Target systems intercept these source files and then typically proceed to load them via ETL (extract, transform, load) processes into databases that then utilize the information for business intelligence, transactional functions, or other standard operations. ETL tasks and data movement jobs can consume quite a bit of time and resources, especially if large volumes of data are present that require loading into a database. This being the case, many DBAs welcome alternative means of accessing and managing data that exists in file format."

Robin Schumacher

What he describes is known as MED (Management of External Data), which enables handling data not stored in a DBMS database as if it were stored in tables. An ISO standard exists that describes one way to implement and use MED in SQL by defining foreign tables for which an external FDW (Foreign Data Wrapper) has been developed in C.

However, since this was written, a new source of data has emerged: the “cloud”. Data exist worldwide and, in particular, can be obtained in JSON or XML format in answer to REST queries. From , it is possible to create JSON, XML or CSV tables based on data retrieved from such REST queries.

MED as described above is a rather complex way to achieve this goal and MariaDB does not support the ISO SQL/MED standard. But, to cover the need, possibly in transactional but mostly in decision support applications, the CONNECT storage engine supports MED in a much simpler way.

The main features of CONNECT are:

No need for additional SQL language extensions.
Embedded wrappers for many external data types (files, data sources, virtual).
NoSQL query facilities for , , HTML files and using JSON UDFs.
NoSQL data obtained from REST queries (requires cpprestsdk).

With CONNECT, MariaDB has one of the most advanced implementations of MED and NoSQL, without the need for complex additions to the SQL syntax (foreign tables are "normal" tables using the CONNECT engine).

Giving MariaDB easy and natural access to external data enables the use of all of its powerful functions and SQL-handling abilities for developing business intelligence applications.

With version 1.07 of CONNECT, retrieving data from REST queries is available in all binary distributed versions of MariaDB, and, from 1.07.0002, CONNECT allows workspaces greater than 4GB.

Robin Schumacher is Vice President Products at DataStax and former Director of Product Management at MySQL. He has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines.

Using CONNECT

Using CONNECT - Condition Pushdown

The ODBC, JDBC, MYSQL, TBL and WMI table types use engine condition pushdown to restrict the number of rows returned by the RDBMS source or the WMI component.

The CONDITION_PUSHDOWN argument used in old versions of CONNECT is no longer needed because CONNECT uses condition pushdown unconditionally.

This page is licensed: GPLv2

Using CONNECT - Exporting Data From MariaDB

Exporting data from MariaDB is obviously possible with CONNECT, in particular for all formats not supported by the statement. Let us consider the query:

Suppose you want to write the result of this query into a file handlers.htm in XML/HTML format, so that it can be displayed in a web browser. This is how you can do it:

Just create the CONNECT table that will be used to make the file:

Here the column definition is not given and will come from the SELECT statement following the CREATE. The CONNECT options are the same as those seen previously. This statement does both actions: it creates the matching handlers CONNECT table and 'fills' it with the query result.
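As a sketch, assuming the query lists the installed storage engines from information_schema.PLUGINS, such a statement could look like the one below. The column aliases and the OPTION_LIST values (Name=TABLE, Coltype=HTML) are illustrative assumptions to check against the XML table type documentation.

```sql
-- Sketch: create the CONNECT table and fill it with the query result in one statement.
-- No column list is given: the columns come from the SELECT that follows.
CREATE TABLE handlers
  ENGINE=CONNECT TABLE_TYPE=XML FILE_NAME='handlers.htm'
  HEADER=1                                -- assumed: write a header row with the column names
  OPTION_LIST='Name=TABLE,Coltype=HTML'   -- assumed: produce an HTML <TABLE> element
SELECT plugin_name        AS handler,
       plugin_version     AS version,
       plugin_description AS description
FROM information_schema.PLUGINS
WHERE plugin_type = 'STORAGE ENGINE';
```

Opening handlers.htm in a browser should then show the result as an HTML table.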

Using CONNECT - General Information

The main characteristic of CONNECT is to enable accessing data scattered on a machine as if it were a centralized database. This, and the fact that CONNECT does not use locking (data files are opened and closed for each query), makes CONNECT very useful for importing or exporting data into or from a MariaDB database and for all types of Business Intelligence applications. However, it is not suited for transactional applications.

For instance, the index type used by CONNECT is closer to bitmap indexing than to B-trees. It is very fast for retrieving results but not for updates. In fact, even if only one indexed value is modified in a big table, the index is entirely rebuilt (although this is four to five times faster than for a B-tree index). But in Business Intelligence applications, files are normally not modified very often.

If you are using CONNECT to analyze files that can be modified by an external process, the indexes are of course not modified by it and become outdated. Use the OPTIMIZE TABLE command to update them before using the tables based on them.

This also means that CONNECT is not designed for centralized servers, which mostly run transactions and often must run for a long time without human intervention.

Performance

Performance varies a great deal depending on the table type. For instance, ODBC tables can only be retrieved as fast as the other DBMS can deliver them. If you have many queries to execute, the best way to optimize your work can sometimes be to convert the data from one type to another. Fortunately, this is very simple with CONNECT. Fixed formats like FIX, BIN or VEC tables can be created from slower ones by commands such as:

FIX and BIN are often the better choice because I/O is done on blocks of BLOCK_SIZE rows. VEC tables can be very efficient for tables that have many columns, of which only a few are used in each query. Furthermore, for tables of reasonable size, the MAPPED option can often speed up many queries.
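For example, a table that is slow to read, here a hypothetical table named slowtab, could be copied into a FIX file with a CREATE TABLE ... SELECT statement. The table and file names and the BLOCK_SIZE value are assumptions chosen for illustration.

```sql
-- Copy the rows of a slower table into a fixed-format file.
-- The column definitions are taken from the SELECT result.
CREATE TABLE fasttab
  ENGINE=CONNECT TABLE_TYPE=FIX FILE_NAME='fasttab.fix'
  BLOCK_SIZE=1000   -- rows read or written per I/O block (illustrative value)
  MAPPED=YES        -- access the file through memory mapping
SELECT * FROM slowtab;
```

Subsequent queries can then use fasttab instead of slowtab, as long as the source data does not change.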

Create Table statement

Be aware of the two broad kinds of CONNECT tables:

Drop Table statement

For outward tables, the statement just removes the table definition but does not erase the table data. However, dropping an inward table also erases the table data.

Alter Table statement

Be careful when using this statement. Currently, data compatibility is not tested, and the modified definition can become incompatible with the data. In particular, ALTER modifies only the table definition, not the table data. Consequently, the table type should not be modified this way, except to correct an incorrect definition. Adding, dropping or modifying columns may also be wrong, because the default offset values (when not explicitly given by the FLAG option) may be wrong when recalculated with missing columns.

ALTER can be used safely for indexing, as we have seen earlier, and to change options such as MAPPED or HUGE that do not affect the data format but only the way the data file is accessed. Modifying the BLOCK_SIZE option is fine with FIX, BIN, DBF and split VEC tables; however, it is unsafe for VEC tables that are not split (only one data file) because, at their creation, the estimated size was made a multiple of the block size. This can cause errors if this estimate is not a multiple of the new block size.
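The safe uses of ALTER listed above can be sketched as follows, assuming a hypothetical FIX table fasttab with a column region declared NOT NULL (CONNECT indexes are assumed here to require non-nullable columns).

```sql
-- Adding an index does not change the data file layout.
ALTER TABLE fasttab ADD INDEX xregion (region);

-- Changes only the way the file is accessed, not its format.
ALTER TABLE fasttab MAPPED=NO;

-- Acceptable for FIX, BIN, DBF and split VEC tables,
-- but not for single-file VEC tables.
ALTER TABLE fasttab BLOCK_SIZE=500;
```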

In all cases, it is safer to drop and re-create the table (outward tables) or to make another one from the table that must be modified.

Update and Delete for File Tables

CONNECT can execute these commands using two different algorithms:

It can do it in place, directly modifying rows (update) or moving rows (delete) within the table file. This is a fast way to do it in particular when indexing is used.
It can do it using a temporary file to make the changes. This is required when updating variable record length tables and is more secure in all cases.

The choice between these algorithms depends on the session variable .

This page is licensed: GPLv2

Using CONNECT - Offline Documentation

Note: You can download a PDF version of the CONNECT documentation (1.7.0003).

This page is licensed: CC BY-SA / Gnu FDL

Using CONNECT - Virtual and Special Columns

CONNECT supports MariaDB virtual and persistent columns. It is also possible to declare a column as being a CONNECT special column. Let us see how this can be done with an example. The boys table we have seen previously can be recreated as:

We have defined two CONNECT special columns. You can give them any name; it is the field SPECIAL option that specifies the special column functional name.

Note: the default values specified for the special columns do not mean anything. They are specified just to prevent getting warning messages when inserting new rows.

For the definition of the agehired virtual column, no CONNECT options can be specified because it is not stored in the file and therefore has no offset or length.
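A sketch of such a definition follows. The layout of boys.txt and the column lengths are assumptions; a real FIX file may also need explicit offsets (FLAG) for its stored columns. It declares two special columns, linenum (ROWID) and fn (FILEID), with dummy defaults, plus the agehired virtual column, which carries no CONNECT options.

```sql
CREATE TABLE boys (
  linenum  INT(6)    NOT NULL DEFAULT 0  SPECIAL=ROWID,   -- row number, not stored in the file
  name     CHAR(12)  NOT NULL,
  city     CHAR(12)  NOT NULL,
  birth    DATE      NOT NULL DATE_FORMAT='DD/MM/YYYY',
  hired    DATE      NOT NULL DATE_FORMAT='DD/MM/YYYY',
  agehired INT(3) AS (FLOOR(DATEDIFF(hired, birth) / 365.25)) VIRTUAL,  -- computed, not stored
  fn       CHAR(100) NOT NULL DEFAULT '' SPECIAL=FILEID     -- file the row was read from
) ENGINE=CONNECT TABLE_TYPE=FIX FILE_NAME='boys.txt';

SELECT linenum, name, city, birth, hired, agehired FROM boys;
```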

The command:

will return a result with the columns linenum, name, city, birth, hired, and agehired.

Existing special columns are listed in the following table:

Special Name | Type | Description of the column value

Note: CONNECT does not currently support auto-increment columns. However, a ROWID special column will do the job of a column auto-incremented by 1.

This page is licensed: GPLv2

Installing CONNECT

The storage engine must be installed before it can be used.
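In MariaDB the CONNECT plugin library is named ha_connect. A typical installation from SQL, followed by a check, might look like this; on some Linux distributions the plugin is shipped in a separate package that must be installed first.

```sql
-- Load the CONNECT plugin into the running server.
INSTALL SONAME 'ha_connect';

-- Check that the engine is now available.
SELECT ENGINE, SUPPORT
FROM information_schema.ENGINES
WHERE ENGINE = 'CONNECT';
```

Alternatively, the plugin can be loaded at startup with plugin_load_add = ha_connect in the server option file.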

CONNECT Table Types

CONNECT DBF Table Type

Overview

A table of type DBF is physically a dBASE III or IV formatted file (used by many products like dBASE, Xbase, FoxPro, etc.). This format is similar to the type format, with in addition a prefix giving the characteristics of the file, describing in particular all the fields (columns) of the table.

Because DBF files have a header that contains metadata about the file, in particular the column description, it is possible to create a table based on an existing DBF file without giving the column description, for instance:

To see what CONNECT has done, you can use the DESCRIBE or SHOW CREATE TABLE commands and, if needed, modify some options with the ALTER TABLE command.
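A minimal sketch, assuming a hypothetical existing dBASE file named customer.dbf:

```sql
-- No column list: CONNECT reads the column descriptions from the DBF header.
CREATE TABLE customer
  ENGINE=CONNECT TABLE_TYPE=DBF FILE_NAME='customer.dbf';

-- See what CONNECT generated.
SHOW CREATE TABLE customer;
```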

Deleted lines are handled in a specific way for DBF tables. They are not removed from the file but are "soft deleted", meaning they are marked as deleted. In particular, the number of lines recorded in the file header also counts soft-deleted lines. This is why, if you execute these two commands on a DBF table named tabdbf:

They can give different results: the first (fast) one gives the number of physical lines in the file, and the second one gives the number of lines that are not (soft) deleted.
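One pair of statements that can show this difference is sketched below. The assumption is that a bare COUNT(*) may be answered from the header count, while adding a WHERE clause forces the rows to be read.

```sql
-- May be answered from the record count stored in the DBF header,
-- which also counts soft-deleted lines.
SELECT COUNT(*) FROM tabdbf;

-- Forces the rows to be read; soft-deleted lines are skipped.
SELECT COUNT(*) FROM tabdbf WHERE 1;
```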

The commands UPDATE, INSERT, and DELETE can be used with DBF tables. The DELETE command marks the deleted lines as suppressed but keeps them in the file. The INSERT command, if it is used to populate a newly created table, constructs the file header before inserting new lines.

Note: For DBF tables, column name length is limited to 11 characters and field length to 256 bytes.

Conversion of dBASE Data Types

CONNECT handles only types that are stored as characters.

Symbol | DBF Type | CONNECT Type | Description

For the N numeric type, CONNECT converts it to TYPE_DOUBLE if the decimals value is not 0, to TYPE_BIGINT if the length value is greater than 10, else to TYPE_INT.

For M, B, and G types, CONNECT just returns the DBT number.

Reading soft deleted lines of a DBF table

It is possible to read these lines by changing the read mode of the table. This is specified by an option READMODE that can take the values:

For example, to read all lines of the tabdbf table, you can do:

To come back to normal mode, specify READMODE=0.
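The option is set through OPTION_LIST. In the sketch below, the value 1 is assumed to mean "read all lines, including soft-deleted ones". Note that ALTER TABLE ... OPTION_LIST replaces the whole option list of the table.

```sql
-- Assumed: READMODE=1 reads all lines, including soft-deleted ones.
ALTER TABLE tabdbf OPTION_LIST='Readmode=1';
SELECT COUNT(*) FROM tabdbf WHERE 1;

-- Return to the normal mode.
ALTER TABLE tabdbf OPTION_LIST='Readmode=0';
```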

This page is licensed: CC BY-SA / Gnu FDL

CONNECT - External Table Types

Because so many ODBC and JDBC drivers exist and only the main ones have been heavily tested, these table types cannot be ranked as stable. Use them with care in production applications.

These types can be used to access tables belonging to the current or another database server. Six types are currently provided:

ODBC: To be used to access tables from a database management system providing an ODBC connector. ODBC is a standard of Microsoft and is currently available on Windows. On Linux, it can also be used provided a specific application emulating ODBC is installed. Currently only unixODBC is supported.

JDBC: To be used to access tables from a database management system providing a JDBC connector. JDBC is an Oracle standard implemented in Java and principally meant to be used by Java applications. Using it directly from a C or C++ application seems to be almost impossible due to an Oracle bug that is still not fixed. However, this can be achieved using a Java wrapper class used as an interface between C++ and JDBC. On the other hand, JDBC is available on all platforms and operating systems.

: To access MongoDB collections as tables via their MongoDB C Driver. Because this requires both MongoDB and the C Driver to be installed and operational, this table type is not currently available in binary distributions but only when compiling MariaDB from source.

: This type is the preferred way to access tables belonging to another MySQL or MariaDB server. It uses the MySQL API to access the external table. Even though this can be obtained using the FEDERATED(X) plugin, this specific type is used internally by CONNECT because it also makes it possible to access tables belonging to the current server.

: Internally used by some table types to access other tables from one table.

External Table Specification

The four main external table types – odbc, jdbc, mongo and mysql – are specified giving the following information:

The data source. This is specified in the connection option.
The remote table or view to access. This can be specified within the connection string or using specific CONNECT options.
The column definitions. These can also be left to CONNECT, which finds them using the MariaDB discovery feature.
The optional Quoted option. Can be set to 1 to quote the identifiers in the query sent to the remote server. This is required if columns or table names can contain blanks.
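A minimal MYSQL table following this pattern might look as follows. The host, credentials, database and table names are hypothetical; no columns are listed, so CONNECT discovers them from the remote table.

```sql
-- Data source and remote table are both given in the connection string:
-- mysql://user:password@host:port/database/table
CREATE TABLE rcust
  ENGINE=CONNECT TABLE_TYPE=MYSQL
  CONNECTION='mysql://reader:secret@192.168.1.10:3306/sales/customers';

DESCRIBE rcust;  -- shows the discovered column definitions
```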

This works by establishing a connection to the external data source and sending it an SQL statement (or its equivalent using API functions for MONGO) that enables it to execute the original query. To enhance performance, the remote data source should do as much of the processing as possible. This is needed in particular to reduce the amount of data returned by the data source.

This is why, for SELECT queries, CONNECT uses the MariaDB feature to retrieve as much of the WHERE clause of the original query as can be added to the query sent to the data source. This is automatic and does not require anything to be done by the user.

However, more can be done. In addition to accessing a remote table, CONNECT offers the possibility to specify what the remote server must do. This is done by specifying it as a view in the srcdef option:

This way, the GROUP BY clause is executed by the remote server, considerably reducing the amount of data sent back over the connection.
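A sketch of such a table, assuming a hypothetical remote database sales whose customers table has a country column; the host and credentials are also hypothetical.

```sql
CREATE TABLE custnum
  ENGINE=CONNECT TABLE_TYPE=MYSQL DBNAME='sales'
  OPTION_LIST='host=192.168.1.10,user=reader,password=secret'
  SRCDEF='SELECT country, COUNT(*) AS customers FROM customers GROUP BY country';

-- Only one row per country travels back over the connection.
SELECT * FROM custnum;
```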

This reduction can go even further by adding to the srcdef the “compatible” part of the query's WHERE clause, as is done for table-based tables. Note that, for MariaDB, this table has two columns, country and customers. Supposing the original query is:

How can the WHERE clause be added to the srcdef that is sent? There are several problems:

Where to include the additional information.
What about the use of alias.
How to know which conditions belong to a WHERE clause and which to a HAVING clause.

The first problem is solved by preparing the srcdef view to receive clauses. The above example srcdef becomes:

The %s in the srcdef are placeholders for any compatible parts of the original query's WHERE clause. If the SELECT query does not specify a WHERE clause, or gives an unacceptable one, the placeholders are filled with dummy clauses (1=1).

The other problems must be solved by adding to the CREATE TABLE statement a list of columns that must be translated because they are aliases, in particular aliases of aggregate functions whose conditions must become a HAVING clause. For example, in this case:

This is specified by the alias option, to be used in the option list. It is made of a semicolon-separated list of items containing:

The local column name (alias in the remote server)
An equal sign.
An optional ‘*’ indicating that this column corresponds to an aggregate function.
The remote column name.

With this information, CONNECT is able to build the query sent to the remote data source:

Note: Some data sources, including MySQL and MariaDB, accept aliases in the having clause. In that case, the alias option could have been specified as:

Another option, phpos, makes it possible to specify which placeholders are present and in what order. It can be specified as “W”, “WH”, “H”, or “HW”. It is rarely used because, by default, CONNECT can set it from the srcdef content. It is needed only when the srcdef contains only a HAVING placeholder, or when the HAVING placeholder occurs before the WHERE placeholder, which can happen in queries containing joins. CONNECT cannot handle more than one placeholder of each type.
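Putting these options together, a hedged sketch of a custnum table prepared to receive clauses could be as follows. The server details are hypothetical, and the remote query in the final comment is only the approximate form CONNECT is expected to build, not captured output.

```sql
DROP TABLE IF EXISTS custnum;
CREATE TABLE custnum
  ENGINE=CONNECT TABLE_TYPE=MYSQL DBNAME='sales'
  -- Alias: the local column customers is the aggregate count(*) on the remote side.
  OPTION_LIST='host=192.168.1.10,user=reader,password=secret,Alias=customers=*count(*)'
  -- The %s placeholders receive the compatible WHERE and HAVING conditions (or 1=1).
  SRCDEF='SELECT country, COUNT(*) AS customers FROM customers WHERE %s GROUP BY country HAVING %s';

SELECT * FROM custnum WHERE (country = 'UK' OR country = 'USA') AND customers > 5;
-- The query sent to the remote server is expected to be roughly:
--   SELECT country, COUNT(*) AS customers FROM customers
--   WHERE (country = 'UK' OR country = 'USA') GROUP BY country HAVING count(*) > 5
```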

SRCDEF is not available for MONGO tables, but other ways of achieving this exist and are described in the MONGO table type chapter.

This page is licensed: CC BY-SA / Gnu FDL

CONNECT - NoSQL Table Types

NoSQL table types are based on files that do not match the relational format but often represent hierarchical data. CONNECT can handle JSON, INI-CFG, XML, and some HTML files.

The way it is done is different from what MySQL or PostgreSQL do. In addition to storing, in some table columns, values of a specific data format (JSON, XML) to be handled by specific functions, CONNECT can directly use JSON, XML or INI files produced by other applications, and it is the table definition that describes where and how the contained information must be retrieved.

This is also different from what MariaDB does with dynamic columns, which is close to what MySQL and PostgreSQL do with the JSON column type.

Note: The LEVEL option used with these tables should, from Connect 1.07.0002, be specified as DEPTH. Also, what was specified with the FIELD_FORMAT column option should now also be specified using JPATH or XPATH.

This page is licensed: CC BY-SA / Gnu FDL

CONNECT PROXY Table Type

A PROXY table is a table that accesses and reads the data of another table or view. To create a table based on the boys FIX table:

Or, more simply, because PROXY is the default type when TABNAME is specified:

Because the boys table can be used directly, what is the use of a proxy table? Its main use is to be used internally by other table types such as , , , or . Indeed, PROXY tables are CONNECT tables, meaning that they can be based on tables of any engine and accessed by table types that need to access CONNECT tables.
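Both forms of declaration can be sketched as follows; no columns are listed, so they are assumed to be discovered from the boys table.

```sql
-- Explicit PROXY table on the boys table.
CREATE TABLE xboy ENGINE=CONNECT TABLE_TYPE=PROXY TABNAME='boys';

-- Same thing: PROXY is the default type when TABNAME is given.
CREATE TABLE xboy2 ENGINE=CONNECT TABNAME='boys';
```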

Proxy on non-CONNECT Tables

When the sub-table is a view or not a CONNECT table, CONNECT internally creates a temporary CONNECT table of type to access it. This connection uses the same default parameters as for a MYSQL table. It is also possible to specify them in the PROXY declaration using the same OPTION_LIST options as for a MYSQL table. Of course, it is simpler and more natural to use the MYSQL type directly in this case.

Normally, the default parameters should enable the PROXY table to reconnect to the server. However, there is an issue when the current user logged in with a password. The security protocol prevents CONNECT from retrieving this password, so it must be given in the PROXY table CREATE statement, for instance by adding to it:

However, it is often not advisable to write in clear text a password that can be seen by all users able to see the table declaration with SHOW CREATE TABLE, in particular if the table is used when the current user is root. To avoid this, a specific user should be created on the local host and used by proxy tables to retrieve local tables. This user can have minimal privileges, for instance SELECT on the desired databases, and needs no password. Supposing ‘proxy’ is such a user, the option list to add is:
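A sketch of this setup follows. The user name proxy comes from the text; the database test, the source table customers and the granted privileges are hypothetical and should be adapted.

```sql
-- A local user with no password and read-only access.
CREATE USER 'proxy'@'localhost';
GRANT SELECT ON test.* TO 'proxy'@'localhost';

-- The PROXY table connects as that user instead of the current one,
-- so no password appears in SHOW CREATE TABLE.
CREATE TABLE xcust
  ENGINE=CONNECT TABLE_TYPE=PROXY DBNAME='test' TABNAME='customers'
  OPTION_LIST='user=proxy';
```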

Using a PROXY Table as a View

A PROXY table can also be used by itself to modify the way a table is viewed. For instance, a proxy table does not use the indexes of the object table. It is also possible to define its columns with different names or types, to use only some of them, or to change their order. For instance:

This will display a result with the columns city, boy, and birth.

Here we did not have to specify a column format or offset because the data are retrieved from the boys table, not directly from the boys.txt file. The flag option of the boy column indicates that it corresponds to the first column of the boys table, the name column.
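A definition consistent with this description might be the following; the column types are assumptions.

```sql
CREATE TABLE city (
  city  VARCHAR(11),
  boy   CHAR(12) FLAG=1,   -- FLAG=1: the first column of boys (name)
  birth DATE
) ENGINE=CONNECT TABNAME='boys';   -- PROXY is implied by TABNAME

SELECT * FROM city;
```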

Avoiding PROXY table loop

CONNECT is able to test whether a PROXY, or PROXY-based, table refers directly or indirectly to itself. A direct reference can be tested at table creation, but an indirect reference can only be tested when executing a query on the table. However, this is possible only for local tables. When using remote tables or views, a problem can occur if the remote table or the view refers back to one of the local tables of the chain. The same caution should be used as when using tables.

Note: All PROXY or PROXY-based tables are read-only in this version.

Modifying Operations

All / / operations can be used with proxy tables. However, the same restrictions applying to the source table also apply to the proxy table.

Note: All PROXY and PROXY-based table types are not indexable.

This page is licensed: CC BY-SA / Gnu FDL

CONNECT Table Types - OEM: Implemented in an External LIB

Although CONNECT provides a rich set of table types, specific applications may need to access data organized in a way that is not handled by its existing foreign data wrappers (FDW). To handle these cases, CONNECT features an interface that enables developers to implement the required table wrapper in C++ and use it as if it were part of the standard CONNECT table type list. CONNECT can use these additional handlers provided the corresponding external module (DLL or shared library) is available.

To create such a table on an existing handler, use a Create Table statement as shown below.

The module option gives the name of the DLL or shared library implementing the OEM wrapper for the table type. This library must be located in the plugin directory, like all other plugins or UDFs.
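A hedged sketch, assuming a hypothetical library libmytype.so in the plugin directory that implements a wrapper named MYTYPE; SUBTYPE is assumed to be the option that names the wrapper inside the module, and the column list is hypothetical.

```sql
CREATE TABLE mytab (
  id    INT NOT NULL,
  label CHAR(20)
) ENGINE=CONNECT TABLE_TYPE=OEM
  MODULE='libmytype.so'   -- hypothetical shared library (a .dll on Windows)
  SUBTYPE='MYTYPE';       -- hypothetical wrapper name implemented by the module
```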

This library must export a function GetMYTYPE

CONNECT - Using the TBL and MYSQL Table Types Together

Used together, these types lift all the limitations of the FEDERATED and MERGE engines.

MERGE: Its limitation is obvious: the merged tables must be identical MyISAM tables, and MyISAM is not even the default engine for MariaDB. TBL, by contrast, accesses a collection of CONNECT tables, but because these tables can be user-specified or internally created MYSQL tables, there is no limitation on the type of the tables that can be merged.

TBL is also much more flexible. The merged tables do not need to be "identical"; they just need to have the columns defined in the TBL table. If the type of a column in a merged table differs from that of the corresponding column of the TBL table, the column value is converted. As we have seen, if a column of the TBL table does not exist in one of the merged tables, the corresponding values are set to NULL. If columns in a sub-table have a different name, they can be accessed by position using the FLAG column option of CONNECT.

However, one limitation of the TBL type compared with MERGE is that TBL tables are currently read-only; INSERT is not supported by TBL. Also, keep using MERGE to access a list of identical MyISAM tables, because it is faster, not going through the MySQL API.
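A minimal TBL table over heterogeneous sub-tables might look as follows. The tables t1, t2 (of any engine) and db2.t3 are hypothetical.

```sql
-- A sub-table that lacks the name column returns NULL for it.
CREATE TABLE allt (
  id   INT NOT NULL,
  name CHAR(20)
) ENGINE=CONNECT TABLE_TYPE=TBL TABLE_LIST='t1,t2,db2.t3';

-- Read-only: SELECT works, INSERT does not.
SELECT * FROM allt WHERE id > 100;
```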

FEDERATED(X): The main limitation of FEDERATED is that it accesses only MySQL/MariaDB tables. The MYSQL table type of CONNECT has the same limitation, but CONNECT provides the and that can access tables of any RDBMS providing an ODBC or JDBC driver (including MySQL, even if this is not really useful!)

Another major limitation of FEDERATED is that it accesses only one table. By combining TBL and MYSQL tables, CONNECT makes it possible to access a collection of local or remote tables as one table. Of course, the sub-tables can be on different servers. With one SELECT statement, a company manager is able to query results coming from all of the subsidiaries' computers. This is great for distribution, banking, and many other industries.

Remotely executing complex queries

Many companies or administrations must deal with distributed information. CONNECT makes it possible to deal with it efficiently without having to copy it to a centralized database. Let us suppose that some remote network machines m1, m2, … mn hold information contained in two tables t1 and t2.

Suppose we want to execute on all servers a query such as:

This raises many problems. Returning the column values of the t1 and t2 tables from all servers can generate a lot of network traffic. The group by on the possibly huge resulting tables can be a long process. In addition, the join on the t1 and t2 tables may be relevant only if the joined tuples belong to the same machine, which requires adding a condition on an additional tabid or servid special column.

All this can be avoided and optimized by forcing the query to be locally executed on each server and retrieving only the small results of the group by queries. Here is how to do it. For each remote machine, create a table that will retrieve the locally executed query. For instance for m1:

Note the alias for the functional column. An alias would be required for the c1 column if its name were different on some machines. The t1 and t2 table names may also differ on the remote machines; the true names must be used in the SRCDEF parameter. This will create a set of tables with two columns named c1 and sc2.
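Since the query itself is not reproduced here, the sketch for m1 assumes a join of t1 and t2 on c1 with a sum over a hypothetical column c2; the database name mydb is also hypothetical.

```sql
-- The join and GROUP BY run on m1; only the grouped rows come back.
CREATE TABLE rt1
  ENGINE=CONNECT TABLE_TYPE=MYSQL DBNAME='mydb'
  OPTION_LIST='host=m1'
  SRCDEF='SELECT t1.c1 AS c1, SUM(t2.c2) AS sc2 FROM t1 JOIN t2 ON t1.c1 = t2.c1 GROUP BY t1.c1';
```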

Then create the table that will retrieve the result of all these tables:

Now you can retrieve the desired result by:

Almost all the work is done on the remote machines, simultaneously thanks to the thread option, making this query very fast even on big tables placed on many remote machines.

Thread is currently experimental. Use it only for test and report any malfunction on .
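A sketch of the combining table and the final query, assuming three per-machine tables rt1, rt2 and rt3 built as described for m1; the column types are hypothetical.

```sql
CREATE TABLE rtall (
  c1  CHAR(20) NOT NULL,   -- hypothetical types
  sc2 DOUBLE
) ENGINE=CONNECT TABLE_TYPE=TBL TABLE_LIST='rt1,rt2,rt3'
  OPTION_LIST='thread=yes';   -- experimental: query the sub-tables in parallel

-- Combine the small per-machine results locally.
SELECT c1, SUM(sc2) AS total FROM rtall GROUP BY c1;
```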

Providing a list of servers

An interesting case is when the query to run on remote machines is the same for all of them. It is then possible to avoid declaring all sub-tables. In this case, the table list option is used to specify the list of servers to which the SRCDEF query must be sent. This is a list of URLs and/or federated server names.

For instance, supposing that federated servers srv1, srv2, … srvn were created for all remote servers, it is possible to create a TBL table that gets the result of a query executed on all of them by:

For instance:

This returns a result whose only column is @@version.

Here the server list specifies a void server corresponding to the local running MariaDB and a federated server named server_one.
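A sketch of this case; the connection details of server_one are hypothetical.

```sql
-- Hypothetical federated server definition for a remote MariaDB instance.
CREATE SERVER server_one FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '192.168.1.20', DATABASE 'test', USER 'reader', PASSWORD 'secret', PORT 3306);

-- The empty first entry of TABLE_LIST stands for the local server.
CREATE TABLE verall
  ENGINE=CONNECT TABLE_TYPE=TBL
  SRCDEF='SELECT @@version'
  TABLE_LIST=',server_one';

SELECT * FROM verall;
```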

To generate the columns from the SRCDEF query, CONNECT must execute it, which also checks that the query is valid. However, if the remote server is not connected yet, or the remote table does not exist yet, you can alternatively specify the columns in the CREATE TABLE statement.

This page is licensed: GPLv2

CONNECT VEC Table Type

Warning: Avoid using this table type in production applications. This file format is specific to CONNECT and may not be supported in future versions.

Tables of type VEC are binary files that in some cases can provide good performance on read-intensive query workloads. CONNECT organizes their data on disk as columns of values from the same attribute, as opposed to storing it as rows of tabular records. This organization means that when a query needs to access only a few columns of a particular table, only those columns need to be read from disk. Conversely, in a row-oriented table, all values in a table are typically read from disk, wasting I/O bandwidth.

CONNECT provides two integral VEC formats, in which each column's data is adjacent.

Integral vector formats

In these true vertical formats, the VEC files are made of all the data of the first column, followed by all the data of the second column, etc. All this can be in one physical file, or each column's data can be in a separate file. In the first case, the option max_rows=m, where m is the estimate of the maximum size (number of rows) of the table, must be specified to be able to insert new records. This leaves an empty space after each column area in which new data can be inserted. In the second case, the “Split” option can be specified at table creation, and each column is stored in a file named sequentially from the table file name followed by the rank of the column. Such a table can grow freely as new lines are inserted.

Differences between vector formats

These formats correspond to different needs. The integral vector format provides the best performance gain and should be chosen when the speed of decision-support queries must be optimized.

With a single file, inserting new data is limited, but only one file needs to be opened and closed. However, the size of the table cannot be calculated from the file size because of possible unused space in the file. It must be kept in a header containing the maximum number of rows and the current number of valid rows in the table. To achieve this, specify the option Header=n when creating the table. If n=1, the header is placed at the beginning of the file; if n=2, it is a separate file with the type ‘.blk’; and if n=3, the header is placed at the end of the file. This last value is provided because batch inserting is sometimes slower when the header is at the beginning of the file. If not specified, the header option defaults to 2 for this table type.

On the other hand, the "Split" format with separate files has none of these issues and is a much safer solution when the table is frequently inserted into or shared among several users.

For instance:

This table, split by default, will have the column values in files vt1.vec and vt2.vec.
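The naming rule for split files can be sketched as follows. Since the example statement is not reproduced here, the sketch assumes (hypothetically) that the table was declared with file_name='vt.vec' and two columns; the function name is illustrative only.

```python
import os


def split_file_names(file_name, ncols):
    """One file per column: table file name + column rank (1-based) + extension."""
    root, ext = os.path.splitext(file_name)
    return [f"{root}{rank}{ext}" for rank in range(1, ncols + 1)]


print(split_file_names("vt.vec", 2))  # ['vt1.vec', 'vt2.vec']
```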

For vector tables, the option block_size=n is used for block reading and writing; however, to have a file made of blocks of equal size, the internal value of the max_rows=m option may be increased to become a multiple of n.
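Reading "increased to become a multiple of n" as rounding max_rows up to the next multiple of block_size, the adjustment can be sketched like this. The exact rounding CONNECT performs internally is an assumption here, and the function name is hypothetical.

```python
def effective_max_rows(max_rows, block_size):
    """Round max_rows up to the next multiple of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return -(-max_rows // block_size) * block_size  # ceiling division, then scale back


print(effective_max_rows(1000, 64))  # 1024 (16 blocks of 64 rows)
```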

As for BIN tables, numeric values are stored using the platform's internal layout, and the correspondence between column types and internal format is the same as the defaults given above for BIN. However, field formats are not available for VEC tables.

Header option

This applies to VEC tables that are not split. Because the file size depends on the MAX_ROWS value, CONNECT cannot know how many valid records exist in the file. Depending on the value of the HEADER option, this information is stored in a header that can be placed at the beginning of the file, at the end of the file or in a separate file called fn.blk. The valid values for the HEADER option are:

The value 2 can be used when dealing with files created by another application with no header. The value 3 sometimes makes inserting into the file faster than when the header is at the beginning of the file.

Note: Because VEC is a file format specific to CONNECT, no big-endian/little-endian conversion is provided. These files are not portable between machines using different byte orders.
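The portability problem can be illustrated with Python's struct module: the same 4-byte integer has a different byte sequence on little-endian and big-endian machines, and reading one layout with the other's rules silently produces a different number. This only illustrates byte order; it does not read a real VEC file.

```python
import struct

value = 258  # 0x00000102

little = struct.pack("<i", value)  # byte order written on a little-endian (e.g. x86-64) machine
big = struct.pack(">i", value)     # byte order a big-endian machine would use

# Reading little-endian bytes with big-endian rules silently yields another number.
misread = struct.unpack(">i", little)[0]
print(little.hex(), big.hex(), misread)  # 02010000 00000102 33619968
```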

This page is licensed: CC BY-SA / Gnu FDL

Inward and Outward Tables


There are two broad categories of file-based CONNECT tables: inward and outward. They are described below.

Outward Tables

Tables are "outward" when their file name is specified in the CREATE TABLE statement using the file_name option.

Firstly, remember that CONNECT implements MED (Management of External Data). This means that the "true" CONNECT tables – "outward tables" – are based on data that belongs to files that can be produced by other applications or data imported from another DBMS.

Therefore, their data is "precious" and should not be modified except by specific commands such as INSERT, UPDATE, or DELETE. Other commands, such as CREATE, DROP, or ALTER, never modify or erase their data.

Outward tables can be created on existing files or external tables. When they are dropped, only the local description is dropped; the file or external table is not dropped or erased. Also, DROP TABLE does not erase the indexes.

produces the following warning, as a reminder:

If the specified file does not exist, it is created when data is inserted into the table. If a SELECT is issued before the file is created, the following error is produced:

Altering Outward Tables

When an ALTER TABLE is issued, it just modifies the table definition accordingly without changing the data. ALTER TABLE can be used safely to, for instance, modify options such as MAPPED, HUGE or READONLY, but use extreme care when modifying column definitions or column order, because some column options, such as FLAG, may also need to be modified or may become wrong.

Changing the table type with ALTER TABLE often makes no sense. However, many suspicious-looking alterations can be acceptable if they are only meant to correct an existing wrong definition.

Converting a CONNECT table to another engine is fine, but the opposite is forbidden when the target CONNECT table is not file-based or when its data file already exists (because the target table data cannot be changed, and if the source table were dropped, the table data would be lost). However, it can be done to create a new file-based table when its file does not exist or is empty.

Creating or dropping indexes is accepted because it does not modify the table data. However, it is often unsafe to do it in an ALTER TABLE statement that makes other modifications.

Of course, all changes are acceptable for empty tables.

Note: Using outward tables requires the FILE privilege.

Inward Tables

A special type of file-based CONNECT table is the “inward” table. Inward tables are file-based tables whose file name is not specified in the CREATE TABLE statement (no file_name option).

Their file is located in the current database directory, and its name defaults to tablename.type, where tablename is the table name and type is the table type folded to lower case. When they are created without using a CREATE TABLE ... SELECT ... statement, an empty file is made at creation time, and they can be populated by further inserts.
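The default naming rule for an inward table's file can be expressed as a small helper. The data directory path, database name, and function name below are hypothetical, and Unix-style paths are assumed.

```python
import posixpath


def inward_file_path(datadir, database, table_name, table_type):
    """Default data file of an inward table: <table>.<type in lower case>,
    placed in the database directory (Unix-style paths assumed)."""
    return posixpath.join(datadir, database, f"{table_name}.{table_type.lower()}")


print(inward_file_path("/var/lib/mysql", "test", "people", "CSV"))
# /var/lib/mysql/test/people.csv
```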

They behave like tables of other storage engines and, unlike outward CONNECT tables, their file is erased when the table is dropped. Of course, they must not be read-only to be usable. Even though their utility is limited, they can be used for testing purposes or when the user does not have the FILE privilege.

Altering Inward Tables

Because CONNECT builds indexes in a specific way, all index modifications are done using an "in-place" algorithm, meaning without a temporary table. This is why, when indexing is specified in an ALTER TABLE statement containing other changes that cannot be done "in-place", the statement cannot be executed and raises an error.

Converting an inward table to an outward table, using an ALTER TABLE statement specifying a new file name and/or a new table type, is restricted the same way it is when converting a table from another engine to an outward table. However there are no restrictions to convert another engine table to a CONNECT inward table.

This page is licensed: CC BY-SA / Gnu FDL

CONNECT Security


The use of the CONNECT engine requires the FILE privilege for outward tables. This should not be an important restriction. Using CONNECT outward tables on a remote server is of limited interest without knowing which files exist on it, and such access must be protected anyway. On the other hand, using it on the local client machine is not an issue, because it is always possible to create a local user with the FILE privilege.

This page is licensed: GPLv2

Adding the REST Feature as a Library Called by an OEM Table


If you are using a version of MariaDB that does not support REST, this is how the REST feature can be added as a library called by an OEM table.

Before making the REST OEM module, the Microsoft Casablanca package must be installed as for compiling MariaDB from source.

Even if this module is to be used with a binary distribution, you need some CONNECT source files in order to build it. It is built from four files present in version 1.06.0010 of CONNECT: tabrest.cpp, restget.cpp, tabrest.h and mini-global.h. It also needs the CONNECT header files included by tabrest.cpp and the headers they include in turn. These can be obtained by downloading the MariaDB source tar.gz of a recent version that includes the REST feature and extracting the CONNECT source files into a directory, which must be added to the additional source directories if it is not the directory containing the above files.

Compiling JSON UDFs in a Separate Library


Although the JSON UDFs can be nicely included in the CONNECT library module, there are cases when you may need to have them in a separate library.

This is the case when CONNECT is compiled embedded, or when you want to test or use these UDFs with other MariaDB versions that do not include them.

To make it, you need to have access to the most recent MariaDB source code. Then, make a project containing these files:

Current Status of the CONNECT Handler


The CONNECT handler is a GA (stable) release. It was written starting both from an aborted project written for MySQL in 2004 and from the “DBCONNECT” program. It was tested on all the examples described in this document and is distributed with a set of 53 test cases. Here is a non-exhaustive list of future developments:

Adding more table types.

CONNECT System Variables


This page documents system variables related to the CONNECT storage engine. See Server System Variables for instructions on setting them.

`connect_class_path`

Description: Java class path
Command line: --connect-class-path=value
Scope: Global
Dynamic:

`connect_cond_push`

Description: Enable condition pushdown
Command line: --connect-cond-push={0|1}
Scope: Global, Session
Dynamic: Yes

`connect_conv_size`

Description: The size of the VARCHAR created when converting from a TEXT type. See connect_type_conv.
Command line: --connect-conv-size=#
Scope: Global, Session
Dynamic: Yes

`connect_default_depth`

Description: Default depth used by Json, XML and Mongo discovery.
Command line: --connect-default-depth=#
Scope: Global, Session
Dynamic: Yes

`connect_default_prec`

Description: Default precision used for doubles.
Command line: --connect-default-prec=#
Scope: Global, Session
Dynamic: Yes

`connect_enable_mongo`

Description: Enable the MONGO table type.
Command line: --connect-enable-mongo={0|1}
Scope: Global, Session
Dynamic:

`connect_exact_info`

Description: Whether the CONNECT engine should return an exact record number value to information queries. It is OFF by default because this information can take a very long time to obtain for large variable record length tables or for remote tables, especially if the remote server is not available. It can be set to ON when exact values are desired, for instance when querying the distribution of rows in a partitioned table.
Command line: --connect-exact-info={0|1}
Scope: Global, Session

`connect_force_bson`

Description: Force using BSON for JSON tables. Starting with these releases, the internal way JSON is parsed and handled was changed. The main advantage of the new way is that it reduces the memory required to parse JSON (from 6 to 10 times the size of the JSON source down to only 2 to 4 times). However, the new mode is in beta, and JSON tables are still handled using the old mode by default. To use the new mode, create tables with TABLE_TYPE=BSON, or set this session variable to 1 or ON, in which case all JSON tables are handled as BSON. This is temporary until the new way replaces the old way by default.
Command line: --connect-force-bson={0|1}
Scope: Global, Session

`connect_indx_map`

Description: Enable file mapping for index files. To accelerate the indexing process, CONNECT builds an index structure in memory from the index file. This can be done by reading the index file, or by using it as if it were in memory through “file mapping”. Set to 0 (file read, the default) or 1 (file mapping).
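The two strategies correspond to the general difference between reading a file and memory-mapping it. The Python sketch below shows that difference only; the function names are hypothetical and this is an analogy, not CONNECT's C++ index code. Note that mapping an empty file raises an error in Python.

```python
import mmap


def load_index_by_read(path):
    """Like connect_indx_map=0: copy the whole index file into process memory."""
    with open(path, "rb") as f:
        return f.read()


def load_index_by_mapping(path):
    """Like connect_indx_map=1: map the file; the OS loads pages on demand.
    The mapping stays valid after the file object is closed."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Reading pays the full I/O cost up front, while mapping lets the operating system bring in only the pages that are actually touched.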
Command line: --connect-indx-map=#
Scope: Global
Dynamic: Yes

`connect_java_wrapper`

Description: Java wrapper.
Command line: --connect-java-wrapper=val
Scope: Global, Session
Dynamic: Yes

`connect_json_all_path`

Description: If ON (the default), discovery generates a JSON path for all columns; if OFF, it does not generate one when the path is the column name.
Command line: --connect-json-all-path={0|1}
Scope: Global, Session
Dynamic: Yes

`connect_json_grp_size`

Description: Maximum number of rows for JSON aggregate functions.
Command line: --connect-json-grp-size=#
Scope: Global, Session
Dynamic: Yes

`connect_json_null`

Description: Representation of JSON null values.
Command line: --connect-json-null=value
Scope: Global, Session
Dynamic: Yes

`connect_jvm_path`

Description: Path to JVM library.
Command line: --connect-jvm-path=value
Scope: Global
Dynamic:

`connect_type_conv`

Description: Determines the handling of TEXT columns.
- NO: The default until Connect 1.06.005. No conversion takes place, and a TYPE_ERROR is returned, resulting in a “not supported” message.
- YES: The default from Connect 1.06.006. The column is internally converted to a column declared as VARCHAR(n), n being the value of connect_conv_size.

`connect_use_tempfile`

Description:
- NO: The first algorithm is always used. Because it can cause errors when updating variable record length tables, this value should be set only for testing.
- AUTO: This is the default value. It leaves CONNECT to choose the algorithm to use. Currently it is equivalent to NO, except when updating variable record length tables (, or ) with file mapping forced to OFF.

`connect_work_size`

Description: Size of the CONNECT work area used for memory allocation. Permits allocating a larger memory sub-allocation space when dealing with very large if sub-allocation fails. If the specified value is too big and memory allocation fails, the size of the work area remains but the variable value is not modified and should be reset.
Command line: --connect-work-size=#
Scope: Global, Session (Session-only from CONNECT 1.03.005)

`connect_xtrace`

Description: Console trace value. Set to 0 (no trace), or to other values if a console tracing is desired. Note that to test this handler, MariaDB should be executed with the parameter because CONNECT prints some error and trace messages on the console. In some Linux versions, this is re-routed into the error log file. Console tracing can be set on the command line or later by names or values. Valid values (from Connect 1.06.006) include:
- 0: No trace
- YES

This page is licensed: CC BY-SA / Gnu FDL

CONNECT MYSQL Table Type: Accessing MySQL/MariaDB Tables


This table type uses the libmysql API to access a MySQL or MariaDB table or view. The accessed table must exist on the current server or on another local or remote server. This is similar to what the FederatedX storage engine provides, with some differences.

Currently the Federated-like syntax can be used to create such a table, for instance:

The connection string can have the same syntax as that used by FEDERATED.

However, it can also be mixed with standard CONNECT options. For instance:

It can also be specified as a reference to a federated server:

The pure (deprecated) CONNECT syntax is also accepted:

The specific connection items are:

Option

Default value

Description

- When the host is specified as “localhost”, the connection is established on Linux using Unix sockets. On Windows, the connection is established by default using shared memory if it is enabled; if not, the TCP protocol is used. Alternatively, specify the host as “.” to use a named pipe connection (if it is enabled). This makes it possible to use these table types with a server running with networking disabled (skip-networking).
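The transport rules above can be summarised as a decision function. This is a Python sketch of the prose only, not CONNECT's code; the function name is hypothetical, it assumes any host other than “localhost” or “.” is reached over TCP, and it does not check whether named pipes are enabled.

```python
def connection_method(host, on_windows, shared_memory_enabled=False):
    """Transport chosen for a MYSQL table's host, following the rules above."""
    if on_windows:
        if host == ".":
            return "named pipe"
        if host == "localhost":
            return "shared memory" if shared_memory_enabled else "tcp"
        return "tcp"
    # Linux: "localhost" means a Unix socket; other host names are assumed to use TCP.
    return "unix socket" if host == "localhost" else "tcp"
```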

Caution: Take care not to refer to the MYSQL table itself to avoid an infinite loop!

A MYSQL table can refer to the current server as well as to another server. Views can be referred to by name or by directly giving a source definition, for instance:

When specified, the columns of the mysql table must exist in the accessed table with the same name, but can be only a subset of them and specified in a different order. Their type must be a type supported by CONNECT and, if it is not identical to the type of the accessed table matching column, a conversion can be done according to the rules given in .

Note: For columns prone to be targeted by a where clause, keep the column type compatible with the source table column type (numeric or character) to have a correct rephrasing of the where clause.

If you do not want to restrict or change the column definitions, do not provide them, and let CONNECT get the column definitions from the remote server. For instance:

This will create the essai table with the same columns as the people table. If the target table contains columns of types incompatible with CONNECT, see to know how these columns can be converted or skipped.

Charset Specification

When accessing the remote table, CONNECT sets the connection character set to the default character set of the local table, as the FEDERATED engine does.

Do not specify a column character set if it is different from the table default character set, even when it is so on the remote table. This is because the remote column is translated to the local table character set when reading it. This is the default, but it can be modified by setting the variable of the target server. If a column must keep its character set, for instance UTF8 when it contains Unicode characters, set the local table's default character set to that character set.

This means that it is not possible to correctly retrieve a remote table if it contains columns having different character sets. A solution is to retrieve it by several local tables, each accessing only columns with the same character set.

Indexing of MYSQL tables

Indexes are rarely useful with MYSQL tables. This is because CONNECT tries to access only the requested rows. For instance if you ask:

CONNECT will construct and send to the server the query:
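The idea of this rewriting step (only the needed columns plus the pushed-down condition) can be sketched in Python. The column names, the backquoted output format, and the function names are assumptions for illustration; the exact text CONNECT generates may differ.

```python
def quote_ident(name):
    """Backquote an identifier, doubling any embedded backquote."""
    return "`" + name.replace("`", "``") + "`"


def build_remote_query(remote_table, columns, pushed_where=None):
    """Statement sent to the remote server: only the needed columns,
    plus the pushed-down WHERE clause when the condition can be passed on."""
    sql = "SELECT " + ", ".join(quote_ident(c) for c in columns)
    sql += " FROM " + quote_ident(remote_table)
    if pushed_where:
        sql += " WHERE " + pushed_where
    return sql


print(build_remote_query("people", ["num", "name"], "`num` < 17"))
# SELECT `num`, `name` FROM `people` WHERE `num` < 17
```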

If the people table is indexed on num, the index is used on the remote server. In all cases, this limits the amount of data retrieved over the network.

However, an index can be specified for columns that are prone to be used to join another table to the MYSQL table. For instance:

If the id column of the remote table addressed by the cnc_tab MYSQL table is indexed (which is likely if it is a key) you should also index the id column of the MYSQL cnc_tab table. If so, using “remote” indexing as does FEDERATED, only the useful rows of the remote table are retrieved during the join process. However, because these rows are retrieved by separate statements, this is useful only when retrieving a few rows of a big table.

In particular, you should not specify an index for columns not used for joining and above all DO NOT index a joined column if it is not indexed in the remote table. This would cause multiple scans of the remote table to retrieve the joined rows one by one.

Data Modifying Operations

The CONNECT MYSQL type supports and and a somewhat limited form of and . These are described below.

The MYSQL type uses methods similar to those of the ODBC type to implement the , and commands. Refer to the ODBC chapter for the restrictions concerning them.

For the and commands, there are fewer restrictions: because the remote server is a MySQL server, the syntax of the command is always acceptable to it.

For instance, you can freely use keywords like IGNORE or LOW_PRIORITY as well as scalar functions in the SET and WHERE clauses.

However, there is still an issue on multi-table statements. Let us suppose you have a t1 table on the remote server and want to execute a query such as:

When parsed locally, you will have errors if no t1 table exists or if it does not have the referenced columns. When t1 does not exist, you can overcome this issue by creating a local dummy t1 table:

This satisfies the local parser and allows the command to be executed on the remote server. Note, however, that having a local MYSQL table defined on the remote t1 table does not solve the problem unless it is also named t1 locally.

This is why, to allow all types of commands to be executed by the data source without any restriction, CONNECT provides a specific MYSQL table subtype, described next.

Sending commands to a MariaDB Server

This can be done, as for ODBC or JDBC tables, by defining a specific table that is used to send commands and get the result of their execution.

The key points in this create statement are the EXECSRC option and the column definition.

The EXECSRC option specifies that this table is used to send commands to the MariaDB server. Most of the sent commands do not return a result set. Therefore, the table columns are used to specify the command to be executed and to get the result of the execution. The names of these columns can be chosen arbitrarily; their function comes from the FLAG value:

How to use this table and specify the command to send? By executing a command such as:

This will send the command specified in the WHERE clause to the data source and return the result of its execution. The syntax of the WHERE clause must be exactly as shown above. For instance:

This command returns:

command

warnings

number

message

Sending several commands in one call

Sending several commands in one call can be faster, because only one connection is used for all of them. To send several commands in one call, use the following syntax:

When several commands are sent, the execution stops at the end of them or after a command that is in error. To continue after n errors, set the option maxerr=n (0 by default) in the option list.
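One reading of the MAXERR rule (execution stops after a failing command once more than maxerr errors have occurred) can be modelled in Python. The executor callback and function name are hypothetical; this models the stopping rule only, not how CONNECT talks to the server.

```python
def run_commands(commands, execute, maxerr=0):
    """Run commands in order; stop right after a failing command
    once the error count exceeds maxerr."""
    results, errors = [], 0
    for cmd in commands:
        ok = execute(cmd)  # hypothetical callback: True on success
        results.append((cmd, ok))
        if not ok:
            errors += 1
            if errors > maxerr:
                break
    return results
```

With the default maxerr=0, the first error ends the batch; maxerr=1 lets one error pass and stops at the second.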

Note 1: It is possible to specify the SRCDEF option when creating an EXECSRC table. It is the command sent by default when no WHERE clause is specified.

Note 2: Backslashes inside commands must be escaped. Single quotes must be escaped if the command is specified between single quotes, and double quotes if it is specified between double quotes.

Note 3: Sent commands apply in the specified database. However, they can address any table within this database.

Note 4: Currently, all commands are executed in AUTOCOMMIT mode.
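Note 2's escaping rules are easy to get wrong when commands are generated by a program. The Python sketch below quotes commands between single quotes, escaping backslashes first and then single quotes. The table name send, the column name command, and the `= ...` / `IN (...)` query shapes are assumptions for illustration, since the exact statements are not reproduced here.

```python
def quote_sql_string(text):
    """Single-quote text, escaping backslashes first, then single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def execsrc_query(table, command_column, commands):
    """Build a SELECT that hands one or more commands to an EXECSRC table
    (hypothetical table and column names)."""
    if not commands:
        raise ValueError("at least one command is required")
    if len(commands) == 1:
        return f"SELECT * FROM {table} WHERE {command_column} = {quote_sql_string(commands[0])}"
    quoted = ", ".join(quote_sql_string(c) for c in commands)
    return f"SELECT * FROM {table} WHERE {command_column} IN ({quoted})"


print(execsrc_query("send", "command", ["drop table t1", "select 'x'"]))
```

Escaping backslashes before quotes matters: doing it the other way round would double the backslash just added in front of each quote.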

Retrieving Warnings and Notes

If a sent command causes warnings to be issued, it is useless to send a “show warnings” command afterwards, because the connection to the MariaDB server is opened and closed when sending commands. Therefore, getting warnings requires a specific (and tricky) method.

To indicate that warning text must be added to the returned result, you must send a multi-command query containing “pseudo” commands that are not sent to the server but directly interpreted by the EXECSRC table. These “pseudo” commands are:

Note that they must be spelled (case insensitive) exactly as above, no final “s”. For instance:

This can return something like this:

command

warnings

number

message

The execution continued after the command in error because of the MAXERR option. Normally this would have stopped the execution.

Of course, the last “select” command is useless here because it cannot return the table contents. Another MYSQL table, without the EXECSRC option and with a proper column definition, should be used instead.

Connection Engine Limitations

Data types

There is a maximum key/index length of 255 bytes. You may be able to declare the table without an index and rely on engine condition pushdown and the remote schema.

The following types can't be used:

, , ,
, ,

Note: TEXT is allowed. However, the handling depends on the values given to the connect_type_conv and connect_conv_size system variables, and by default no conversion of TEXT columns is permitted.

SQL Limitations

The following SQL queries are not supported

CONNECT MYSQL versus FEDERATED

The CONNECT MYSQL table type should not be regarded as a replacement for the FEDERATED engine. The main use of the MYSQL type is to access other engine tables as if they were CONNECT tables. This was necessary when accessing tables from some CONNECT table types such as , , , or that are designed to access CONNECT tables only. When their target table is not a CONNECT table, these types silently use an intermediate MYSQL table internally.

However, there are cases where you can use MYSQL CONNECT tables yourself, for instance:

When the table is used by a table. This enables you to specify the connection parameters for each sub-table and is more efficient than using a local FEDERATED sub-table.
When the desired returned data is directly specified by the SRCDEF option. This is great to let the remote server do most of the job, such as grouping and/or joining tables. This cannot be done with the FEDERATED engine.
To take advantage of the push_cond facility that adds a where clause to the command sent to the remote table. This restricts the size of the result set and can be crucial for big tables.
For tables with the EXECSRC option on.

If you need multi-table updating, deleting, or bulk inserting on a remote table, you can alternatively use the FEDERATED engine or a “send” table specifying the EXECSRC option on.

CONNECT Data Types


Many data types make little or no sense when applied to plain files. This is why CONNECT supports only a restricted set of data types. However, ODBC, JDBC or MYSQL source tables may contain data types not supported by CONNECT. In this case, CONNECT makes an automatic conversion to a similar supported type when possible.

The data types currently supported by CONNECT are:

Type name

Description

Used for

TYPE_STRING

This type corresponds to what is generally known as CHAR or VARCHAR by database users, or as strings by programmers. Columns containing characters have a maximum length, but the character string is of fixed or variable length depending on the file format.

The DATA_CHARSET option must be used to specify the character set used in the data source or file. Note that, unlike usually with MariaDB, when a multi-byte character set is used, the column size represents the number of bytes the column value can contain, not the number of characters.

TYPE_INT

The INTEGER type contains 4-byte integer values (the int of the C language) ranging from –2,147,483,648 to 2,147,483,647 for the signed type and from 0 to 4,294,967,295 for the unsigned type.

TYPE_SHORT

The SHORT data type contains 2-byte integer values (the short integer of the C language) ranging from –32,768 to 32,767 for the signed type and from 0 to 65,535 for the unsigned type.

TYPE_TINY

The TINY data type contains 1-byte integer values (the char of the C language) ranging from –128 to 127 for the signed type and from 0 to 255 for the unsigned type. For some table types, TYPE_TINY is used to represent Boolean values (0 is false, anything else is true).

TYPE_BIGINT

The BIGINT data type contains 8-byte integer values (the long long of the C language) ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 for the signed type and from 0 to 18,446,744,073,709,551,615 for the unsigned type.
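All four ranges follow from the storage size: a signed n-byte two's-complement integer spans -2^(8n-1) to 2^(8n-1)-1, and an unsigned one spans 0 to 2^(8n)-1. The Python helper below (a hypothetical name) reproduces the figures listed above.

```python
def int_range(nbytes, unsigned=False):
    """(min, max) of a two's-complement integer stored in nbytes bytes."""
    bits = 8 * nbytes
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, size in [("TYPE_TINY", 1), ("TYPE_SHORT", 2), ("TYPE_INT", 4), ("TYPE_BIGINT", 8)]:
    print(name, int_range(size), int_range(size, unsigned=True))
```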

Inside tables, the coding of all integer values depends on the table type. In tables represented by text files, the number is written in characters, while in tables represented by binary files (BIN or VEC) the number is directly stored in the binary representation corresponding to the platform.

The length (or precision) specification corresponds to the length of the table field in which the value is stored for text files only. It is used to set the output field length for all table types.

TYPE_DOUBLE

The DOUBLE data type corresponds to the C language double type, a double-precision floating-point value coded with 8 bytes. As for integers, the internal coding in tables depends on the table type: characters for text files, and the platform binary representation for binary files.

The length specification corresponds to the length of the table field in which the value is stored, for text files only. The scale (formerly called precision) is the number of decimal digits written into text files. For binary table types (BIN and VEC), this does not apply. The length and scale specifications are used to set the output field length and number of decimals for all types of tables.

TYPE_DECIM

The DECIMAL data type corresponds to what MariaDB or ODBC data sources call NUMBER, NUMERIC, or DECIMAL: a numeric value with a maximum number of digits (the precision), some of which may be decimal digits (the scale). The internal coding in CONNECT is a character representation of the number. For instance:

This defines a column colname as a number having a precision of 14 and a scale of 6. Supposing it is populated by:

Its internal representation is the character string -2658.740000. The way it is stored in a file table depends on the table type. The length field specification corresponds to the length of the table field in which the value is stored and is calculated by CONNECT from the precision and the scale values. This length is the precision, plus 1 if the scale is not 0 (for the decimal point), plus 1 if the column is not unsigned (for a possible minus sign). In fixed-format tables, the number is right-justified in a field of width length; in variable-format tables, such as CSV, the field is just the representing character string.
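The length rule can be checked on the example value: precision 14 and scale 6 on a signed column give 14 + 1 + 1 = 16, and the 12-character string -2658.740000 is right-justified in that field. The Python sketch below applies the rule as stated; the function names are hypothetical.

```python
from decimal import Decimal


def decimal_field_length(precision, scale, unsigned=False):
    """Precision, +1 for the decimal point when scale > 0,
    +1 for a possible minus sign when the column is signed."""
    length = precision
    if scale:
        length += 1
    if not unsigned:
        length += 1
    return length


def fixed_field(value, precision, scale, unsigned=False):
    """Character representation right-justified in a fixed-format field."""
    text = f"{value:.{scale}f}"
    return text.rjust(decimal_field_length(precision, scale, unsigned))


print(repr(fixed_field(Decimal("-2658.74"), 14, 6)))  # '    -2658.740000'
```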

Because this type is mainly used by CONNECT to handle numeric or decimal fields of ODBC, JDBC and MySQL table types, CONNECT does not provide decimal calculations or comparison by itself. This is why decimal columns of CONNECT tables cannot be indexed.

DATE Data type

Internally, date/time values are stored by CONNECT as a signed 4-byte integer. The value 0 corresponds to 01 January 1970 12:00:00 am Coordinated Universal Time (UTC). All other date/time values are represented by the number of seconds elapsed since or before midnight (00:00:00), 1 January 1970, to that date/time value. Date/time values before midnight 1 January 1970 are represented by a negative number of seconds.

CONNECT handles dates from 13 December 1901, 20:45:52 to 19 January 2038, 03:14:07 (UTC), the limits of a signed 4-byte count of seconds.
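The supported range follows directly from the signed 4-byte representation: -2^31 and 2^31-1 seconds from the 1970 epoch. The Python sketch below (a hypothetical helper, computed in UTC) converts such a stored value to a date and rejects values outside that range.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def connect_seconds_to_datetime(seconds):
    """Convert a signed 4-byte count of seconds since the epoch to a UTC datetime."""
    if not -(2**31) <= seconds <= 2**31 - 1:
        raise OverflowError("value does not fit in a signed 4-byte integer")
    return EPOCH + timedelta(seconds=seconds)


print(connect_seconds_to_datetime(-(2**31)))  # 1901-12-13 20:45:52+00:00
print(connect_seconds_to_datetime(2**31 - 1))  # 2038-01-19 03:14:07+00:00
```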

Although date and time information can be represented in both CHAR and INTEGER data types, the DATE data type has special associated properties. For each DATE value, CONNECT can store all or only some of the following information: century, year, month, day, hour, minute, and second.

Date Format in Text Tables

Internally, date/time values are handled as a signed 4-byte integer. But in text tables (type DOS, FIX, CSV, FMT, and DBF) dates are most of the time stored as a formatted character string (although they also can be stored as a numeric string representing their internal value). Because there are infinite ways to format a date, the format to use for decoding dates, as well as the field length in the file, must be associated to date columns (except when they are stored as the internal numeric value).

Note that this associated format is used only to describe the way the temporal value is stored in the table file. It is used both for output, to decode the date in a SELECT statement, and for input, to encode the date in INSERT or UPDATE statements. However, what is kept in this value depends on the data type used in the column definition (all the MariaDB temporal types can be specified). When creating a table, the format is associated with a date column using the DATE_FORMAT option in the column definition, for instance:

The SELECT query returns:

Name

Bday

Btime

The values of the INSERT statement must be specified using the standard MariaDB syntax, and these values are displayed as MariaDB temporal values. The column formats apply only to the way these values are represented inside the CSV file. Here, the inserted record is:

Note: The field_length option exists because the MariaDB syntax does not allow specifying the field length between parentheses for temporal column types. If not specified, the field length is calculated from the date format (sometimes as a max value) or made equal to the default length value if there is no date format. In the above example it could have been removed as the calculated values are the ones specified. However, if the table type would have been DOS or FIX, these values could be adjusted to fit the actual field length within the file.

A CONNECT format string consists of a series of elements that represent a particular piece of information and define its format. The elements are recognized in the order they appear in the format string. Date and time format elements are replaced by the actual date and time as they appear in the source string. They are defined by the following groups of characters:

Element | Description

Usage Notes

To match the source string, you can add body text to the format string, enclosing it in single quotes or double quotes if it would be ambiguous. Punctuation marks do not need to be quoted.
The hour information is regarded as 12-hour format if a “t” or “tt” element follows the “hh” element in the format or as 24-hour format otherwise.
The "MM", "DD", "hh", "mm", and "ss" elements can be specified with one or two letters (e.g. "MM" or "M"). This makes no difference on input, but on output the two-letter elements add a leading zero to one-digit values.
If the format contains the DDD or DDDD element, the day-of-week name is skipped on input and ignored when calculating the internal date value. On output, the correct day-of-week name is generated and displayed.
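To make these rules concrete, here is a minimal Python sketch of a format string driving both directions. It is not CONNECT's implementation: it supports only an assumed subset of elements (YYYY, MM/M, DD/D, DDD/DDDD, hh/h, mm/m, ss/s, tt), since the element table is not reproduced here, and decode and encode are hypothetical names. decode models input (reading a value from the file, returning None when the value does not match the format) and encode models output (writing a value to the file).

```python
import re
from datetime import datetime
from typing import Optional

# Assumed subset of CONNECT format elements; longest alternatives come first.
_TOKEN = re.compile(r"YYYY|DDDD|DDD|MM|DD|hh|mm|ss|tt|M|D|h|m|s|'[^']*'|\"[^\"]*\"|.", re.S)
_FIELD = {"MM": "month", "M": "month", "DD": "day", "D": "day",
          "hh": "hour", "h": "hour", "mm": "minute", "m": "minute",
          "ss": "second", "s": "second"}
_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def _literal(tok: str) -> str:
    """Quoted body text loses its quotes; punctuation is used as-is."""
    if len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "'\"":
        return tok[1:-1]
    return tok


def decode(text: str, fmt: str) -> Optional[datetime]:
    """Input: parse a value read from the file (None if it does not match)."""
    pattern, names = "", []
    for tok in _TOKEN.findall(fmt):
        if tok == "YYYY":
            pattern += r"(\d{4})"
            names.append("year")
        elif tok in _FIELD:  # one or two letters: no difference on input
            pattern += r"(\d{1,2})"
            names.append(_FIELD[tok])
        elif tok in ("DDD", "DDDD"):  # day name is skipped on input
            pattern += r"[A-Za-z]+"
        elif tok == "tt":
            pattern += r"([AaPp][Mm])"
            names.append("tt")
        else:
            pattern += re.escape(_literal(tok))
    match = re.fullmatch(pattern, text)
    if match is None:
        return None
    parts = dict(zip(names, match.groups()))
    values = {k: int(v) for k, v in parts.items() if k != "tt"}
    if "tt" in parts:  # simplification: any tt element makes hh a 12-hour clock
        values["hour"] = values.get("hour", 0) % 12 + (12 if parts["tt"].upper() == "PM" else 0)
    for key in ("year", "month", "day"):
        values.setdefault(key, 1970 if key == "year" else 1)
    try:
        return datetime(**values)
    except ValueError:  # e.g. February 31
        return None


def encode(dt: datetime, fmt: str) -> str:
    """Output: two-letter elements add a leading zero to one-digit values."""
    tokens = _TOKEN.findall(fmt)
    out = []
    for tok in tokens:
        if tok == "YYYY":
            out.append(f"{dt.year:04d}")
        elif tok in _FIELD:
            value = getattr(dt, _FIELD[tok])
            if _FIELD[tok] == "hour" and "tt" in tokens:
                value = value % 12 or 12
            out.append(f"{value:02d}" if len(tok) == 2 else str(value))
        elif tok in ("DDD", "DDDD"):  # day name is generated on output
            name = _DAYS[dt.weekday()]
            out.append(name if tok == "DDDD" else name[:3])
        elif tok == "tt":
            out.append("PM" if dt.hour >= 12 else "AM")
        else:
            out.append(_literal(tok))
    return "".join(out)
```

For instance, decode("Friday, 03/07/2024", "DDDD, MM/DD/YYYY") returns March 7, 2024 even though that day was a Thursday, because the day name is skipped on input.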

Handling dates that are out of the range of supported CONNECT dates

If you want to make a table containing, for instance, historical dates that cannot be converted into CONNECT dates, declare the column as CHAR or VARCHAR and store the dates in the MariaDB format. All date functions applied to these strings convert them to MariaDB dates and work as if they were real dates. Of course, these values must be inserted, and are displayed, using the MariaDB format.
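The range limitation follows from the internal representation, a signed 4-byte integer. Assuming, as for a Unix time_t, that this integer counts seconds since 1970-01-01 00:00:00 UTC (this section does not say so explicitly), the representable range can be computed with a short Python sketch; fits_internal_date is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

# Assumption: the internal value is a signed 32-bit count of seconds since EPOCH.
LOWEST = EPOCH + timedelta(seconds=INT32_MIN)
HIGHEST = EPOCH + timedelta(seconds=INT32_MAX)


def fits_internal_date(dt: datetime) -> bool:
    """True if the timezone-aware dt fits in a signed 4-byte seconds count."""
    return LOWEST <= dt <= HIGHEST


if __name__ == "__main__":
    print(LOWEST)   # 1901-12-13 20:45:52+00:00
    print(HIGHEST)  # 2038-01-19 03:14:07+00:00
```

Under that assumption, a date such as 1815-06-18 cannot be stored as a CONNECT date, which is exactly the case where a CHAR or VARCHAR column is the way out.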

NULL handling

CONNECT handles null values for data sources able to produce nulls. Currently this mainly concerns table types such as INI, JSON, MONGO, and XML. For INI, JSON, MONGO or XML types, null values are returned when the key is missing in the section (INI) or when the corresponding node does not exist in a row (XML, JSON, MONGO).

For other file tables, the issue is to define what a null value is. In a numeric column, 0 can sometimes be a valid value, while in other cases it makes no sense. The same applies to character columns: is a blank field a valid value or not?

A special case is DATE columns with a DATE_FORMAT specified: any value not matching the format can be regarded as NULL.

CONNECT leaves the decision to you. When declaring a column in the CREATE TABLE statement, if it is declared NOT NULL, blank or zero values are considered valid values; otherwise they are considered NULL values. In all cases, nulls are replaced on insert or update by pseudo-null values: a zero-length character string for text types or a zero value for numeric types. Once converted to pseudo-null values, they are recognized as NULL only for columns declared as nullable.

For instance:

The SELECT query returns:

As expected, the value 0 entered on the first row is regarded as NULL for a nullable column. However, if we execute the query:

This returns no rows, because NULL is not equal to 0 in an SQL WHERE clause.

Now let us see what happens with not null columns:

The insert statement will produce a warning saying:

Level | Code | Message

The NULL value is replaced by a pseudo-null 0 on the fourth row. Let us see the result:

The first query returns no rows: the 0 values are valid values, not NULL. The second query returns:

It shows that the NULL inserted value was replaced by a valid 0 value.
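The rule can be summarized in a small Python model. It is only a sketch of the behavior described above, not CONNECT code: None stands for SQL NULL, to_file, from_file and sql_equals are hypothetical names, and only numeric and character values are modeled.

```python
from typing import Optional, Union

Value = Union[int, float, str]


def to_file(value: Optional[Value], numeric: bool) -> Value:
    """INSERT/UPDATE: a NULL is written as a pseudo-null value."""
    if value is None:
        return 0 if numeric else ""
    return value


def from_file(raw: Value, nullable: bool) -> Optional[Value]:
    """Read: zero or blank values count as NULL only in nullable columns."""
    if isinstance(raw, str):
        pseudo_null = raw.strip() == ""
    else:
        pseudo_null = raw == 0
    return None if nullable and pseudo_null else raw


def sql_equals(a: Optional[Value], b: Optional[Value]) -> bool:
    """In a WHERE clause, a comparison involving NULL is never true."""
    return a is not None and b is not None and a == b


if __name__ == "__main__":
    stored = [to_file(v, numeric=True) for v in (0, 5, None)]
    print(stored)                                          # [0, 5, 0]
    print([from_file(v, nullable=True) for v in stored])   # [None, 5, None]
    print([from_file(v, nullable=False) for v in stored])  # [0, 5, 0]
```

Reading back the stored values gives NULLs for the nullable column and valid zeros for the NOT NULL one, which is why a WHERE x = 0 query finds nothing in the first case and two rows in the second.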

Unsigned numeric types

Unsigned numeric types are supported by CONNECT since version 1.01.0010 for fixed numeric types (TINY, SHORT, INTEGER, and BIGINT).

Data type conversion

CONNECT is able to convert data from one type to another in most cases. These conversions are done without warning even when this leads to truncation or loss of precision. This is true, in particular, for tables of type ODBC, JDBC, MYSQL and PROXY (via MySQL) because the source table may contain some data types not supported by CONNECT. They are converted when possible to CONNECT types.

MariaDB types are converted as follows:

MariaDB Types | CONNECT Type | Remark

For ENUM, the length of the column is the length of the longest value of the enumeration. For SET, the length is enough to contain all the set values concatenated with a comma separator.
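As a worked example of these two rules (the column definitions are hypothetical): ENUM('small','medium','large') gives a length of 6, the length of 'medium', and SET('red','green','blue') gives 14, the length of 'red,green,blue'. The same arithmetic in Python:

```python
from typing import Sequence


def enum_length(values: Sequence[str]) -> int:
    """ENUM: length of the longest enumeration value."""
    return max(len(v) for v in values)


def set_length(values: Sequence[str]) -> int:
    """SET: length of all values concatenated with a comma separator."""
    return len(",".join(values))


if __name__ == "__main__":
    print(enum_length(["small", "medium", "large"]))  # 6
    print(set_length(["red", "green", "blue"]))       # 14
```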

In the case of TEXT columns, the handling depends on the values given to the connect_type_conv and connect_conv_size system variables.

Note: BLOB is currently not converted by default, until a TYPE_BIN type is added to CONNECT. However, the FORCE option (from Connect 1.06.006) can be specified for BLOB columns containing text, and the SKIP option also applies to ODBC BLOB columns.

ODBC SQL types are converted as follows:

SQL Types | CONNECT Type | Remark

JDBC SQL types are converted as follows:

JDBC Types | CONNECT Type | Remark

Note: The SKIP option also applies to ODBC and JDBC tables.

Note: Here, input and output mean, respectively, decoding the date to get its numeric value from the data file and encoding a date to write it in the table file. Input is performed within SELECT queries; output is performed in INSERT or UPDATE queries.

This page is licensed: GPLv2

CONNECT MONGO Table Type: Accessing Collections from MongoDB


Classified as a NoSQL database program, MongoDB uses JSON-like documents (BSON) grouped in collections. The MONGO type is used to directly access MongoDB collections as tables.

Accessing MongoDB from CONNECT

Accessing MongoDB from CONNECT can be done in different ways:

As a MONGO table via the MongoDB C Driver.
As a MONGO table via the MongoDB Java Driver.
As a JDBC table using some commercially available MongoDB JDBC drivers.
As a JSON table via the MongoDB C or Java Driver.

Using the MongoDB C Driver

This is currently not available in binary distributions, only in versions compiled from source. The preferred version of the MongoDB C Driver is 1.7, because it provides package recognition. What must be done is:

Install libbson and the MongoDB C Driver 1.7.
Configure, compile and install MariaDB.

With earlier versions of the Mongo C Driver, the additional include directories and libraries will have to be specified manually when compiling.

When possible, this is the preferred means of access because it does not require all the Java path settings etc. and is faster than using the Java driver.

Using the Mongo Java Driver

This is possible with all distributions that include JDBC support, or when compiling from source. With a binary distribution that does not enable the MONGO table type, it is possible to access MongoDB using an OEM module (see the CONNECT OEM module documentation for details). The only additional steps are:

Install the MongoDB Java Driver by downloading its jar file. Several versions are available; if possible, use the latest version 3 release.
Add its path to the CLASSPATH environment variable or to the connect_class_path variable, as is done to declare JDBC drivers.

Connection is established by new Java wrappers Mongo3Interface and Mongo2Interface. They are available in a JDBC distribution in the Mongo2.jar and Mongo3.jar files (previously JavaWrappers.jar). If version 2 of the Java Driver is used, specify “Version=2” in the option list when creating tables.

Using JDBC

See the documentation of the existing commercial JDBC Mongo drivers.

Using JSON

See the specific chapter of the JSON Table Type.

The following describes the MONGO table type.

CONNECT MONGO Tables

Creating and running MONGO tables requires a connection to a running local or remote MongoDB server.

A MONGO table is defined to access a MongoDB collection. The table rows are the collection documents. For instance, to create a table based on the MongoDB sample collection restaurants, you can do something such as the following:

Note: By default, the driver used is the C driver if only the MongoDB C Driver is installed, and the Java driver if only the MongoDB Java Driver is installed. If both are available, the driver can be chosen with the DRIVER option in the option list; it defaults to C.

Here we did not define all the items of the collection documents, only those that are JSON values. The database is test by default. The connection value is the URI used to establish a connection to a local or remote MongoDB server. The value shown in this example corresponds to a local server started with its default port. It is the default connection value for MONGO tables, so we could have omitted it.

Discovery is also available. This table could have been created by:

Here “depth=-1” is used to create only columns that are simple values (no array or object). Without it, with the default value “depth=0”, the table would have been created as:
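The difference between the two Depth values can be pictured with a Python sketch applied to one sample document. It models only the -1 and 0 values discussed here and is an illustration rather than CONNECT's discovery code; discover is a hypothetical name, and the document is an invented, abridged restaurants-style document (its _id is shown as a plain string instead of an ObjectId).

```python
import json
from typing import Any, Dict


def discover(doc: Dict[str, Any], depth: int) -> Dict[str, str]:
    """Return {column name: displayed text} for one sample document."""
    if depth not in (-1, 0):
        raise ValueError("only depth -1 and 0 are modeled in this sketch")
    columns: Dict[str, str] = {}
    for key, value in doc.items():
        if not isinstance(value, (dict, list)):
            columns[key] = str(value)         # simple value: always a column
        elif depth == 0:
            columns[key] = json.dumps(value)  # object/array shown as JSON text
        # depth == -1: objects and arrays produce no column
    return columns


# Hypothetical, abridged document in the style of the restaurants collection.
sample = {
    "_id": "58ada47de5a51ddfcd5ed51c",
    "address": {"building": "1007", "street": "Morris Park Ave"},
    "borough": "Bronx",
    "grades": [{"grade": "A", "score": 2}],
    "name": "Example Bakery",
}

if __name__ == "__main__":
    print(list(discover(sample, -1)))  # ['_id', 'borough', 'name']
    print(list(discover(sample, 0)))   # ['_id', 'address', 'borough', 'grades', 'name']
```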

Fixing Problems With mariadb-dump

In some cases, or on some platforms, when CONNECT is set up for use with JDBC table types, mariadb-dump with the --all-databases option fails.

This was reported by Robert Dyas, who found the cause and how to fix it.

This occurs when the Java JRE “Usage Tracker” is enabled. In that case, Java creates a directory #mysql50#.oracle_jre_usage in the mysql data directory that shows up as a database but cannot be accessed via MySQL Workbench nor apparently backed up by mariadb-dump --all-databases.

Per the Oracle documentation, the “Usage Tracker” is disabled by default and is enabled only when the properties file /lib/management/usagetracker.properties is created. This turns out to be wrong on some platforms: the file exists by default on a new installation, and its existence enables the usage tracker.

The solution on CentOS 7 with the Oracle JVM is to rename or delete the usagetracker.properties file (to disable it) and then delete the bogus folder it created in the mysql database directory, then restart.

For example, the following works:

If this is not done, the Oracle JVM starts the usage tracker, which creates the hidden folder .oracle_jre_usage in the mysql home directory and causes a mariadb-dump of the server to fail.

In this collection, the address column is a JSON object and the grades column is a JSON array. Unlike with JSON tables, specifying only the column name with no Jpath displays the JSON representation of these values. For instance:

name | address

MongoDB Dot Notation

To address the items inside objects or arrays, specify the Jpath in MongoDB syntax (if using discovery, set the Depth option accordingly):

name | street | score | date
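MongoDB dot notation walks into embedded documents by field name and into arrays by numeric position (for example address.street or grades.0.score). The minimal Python resolver below, with the hypothetical name resolve and an invented document, shows how such a Jpath selects a value. A missing path yields None, matching the NULL returned by MONGO tables when a node does not exist. MongoDB's own handling of a field name applied to a whole array (such as grades.score) is richer and is not modeled.

```python
from typing import Any, Optional


def resolve(doc: Any, path: str) -> Optional[Any]:
    """Follow a MongoDB-style dot path such as 'grades.0.score'."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict):
            current = current.get(part)
        elif isinstance(current, list) and part.isdigit():
            index = int(part)
            current = current[index] if index < len(current) else None
        else:
            return None  # cannot descend into a scalar value
        if current is None:
            return None
    return current


# Hypothetical document with an embedded object and an array of objects.
doc = {
    "name": "Example Diner",
    "address": {"street": "Main Street", "zipcode": "10001"},
    "grades": [
        {"date": "2014-03-03", "grade": "A", "score": 2},
        {"date": "2013-09-11", "grade": "B", "score": 6},
    ],
}
```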

MONGO Specific Options

The MongoDB syntax for Jpath does not allow the CONNECT-specific items on arrays. The same effect can still be obtained in a different way. For this, additional options are used when creating MONGO tables.

Option | Type | Description

* To be specified in the option list.

Note: For the content of these options, refer to the MongoDB documentation.

Colist Option

Used to pass different options when making the MongoDB cursor used to retrieve the collection documents. One of them is the projection, which limits the items retrieved in documents. It is hardly useful, because CONNECT applies this limitation automatically. However, it can be used with discovery to eliminate the _id (or another) column when you do not want to keep it:

In this example, we added another cursor option, the limit option that works like the limit SQL clause.

This additional option works only with the C driver. When using the Java driver, colist should be:

The limit would then be specified in SELECT statements.

Note: When used with a JSON table, specifying the projection list (or ‘all’ to get all columns) makes the JPATH values CONNECT JSON paths rather than MongoDB ones, allowing JPATH options that are not available in MongoDB.

Filter Option

This option is used to specify a “filter” that works as a WHERE clause on the table. Supposing we want to create a table restricted to restaurants serving English cuisine that are not located in the Manhattan borough, we can do it by:

And if we ask:

This query will return:

_id | borough | name | restaurant_id
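The filter itself is an ordinary MongoDB query document. For the example above it would presumably be {"cuisine": "English", "borough": {"$ne": "Manhattan"}} (an assumption, since the CREATE TABLE statement is not shown). The toy Python evaluator below understands only equality and $ne, and illustrates why the filter behaves like a WHERE clause applied before any row reaches MariaDB; it is not the MongoDB query engine, and matches and apply_filter are hypothetical names.

```python
from typing import Any, Dict, Iterable, List


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """True if doc satisfies every condition (conditions are ANDed)."""
    for field, cond in flt.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            for op, operand in cond.items():
                if op != "$ne":
                    raise ValueError(f"operator {op} is not modeled here")
                if value == operand:
                    return False
        elif value != cond:
            return False
    return True


def apply_filter(docs: Iterable[Dict[str, Any]], flt: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only the documents the filter accepts."""
    return [d for d in docs if matches(d, flt)]
```

Usage on invented data: with the filter above, only the English restaurant outside Manhattan is kept.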

Pipeline Option

When this option is specified as true (by YES or 1), the Colist option contains a MongoDB pipeline applying to the table collection. This is a powerful means of doing things such as expanding arrays, as we do with JSON tables. For instance:

In this pipeline, “$match” is an early filter, “$unwind” means that the grades array is expanded (one document for each array value), and “$project” eliminates the _id and cuisine columns and gives the Jpath for the date, grade and score columns.

This query replies:

name | grade | score | date
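The effect of the $unwind and $project stages can be reproduced in a few lines of Python. This is a simplified model on invented data, not MongoDB itself: unwind emits one copy of the document per element of the array (and, like MongoDB's default behavior, nothing for a missing or empty array), and project keeps name and lifts the grade, score and date of the unwound element to top-level fields.

```python
from typing import Any, Dict, List


def unwind(docs: List[Dict[str, Any]], field: str) -> List[Dict[str, Any]]:
    """One output document per array element; missing/empty arrays yield nothing."""
    out = []
    for doc in docs:
        for item in doc.get(field) or []:
            copy = dict(doc)
            copy[field] = item
            out.append(copy)
    return out


def project(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Drop _id and cuisine; flatten the unwound grade into columns."""
    grade = doc["grades"]
    return {"name": doc["name"], "grade": grade["grade"],
            "score": grade["score"], "date": grade["date"]}


docs = [{"_id": 1, "name": "Example Diner", "cuisine": "English",
         "grades": [{"grade": "A", "score": 5, "date": "2014-03-03"},
                    {"grade": "B", "score": 12, "date": "2013-09-11"}]},
        {"_id": 2, "name": "No Grades Yet", "cuisine": "English", "grades": []}]
rows = [project(d) for d in unwind(docs, "grades")]
```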

This makes it possible to do things as we do with JSON tables:

This can be used to get the average score inside the grades array.

name | average

Fullarray Option

This option, like the Depth option, is only interpreted when creating a table with Discovery (meaning not specifying the columns). It tells CONNECT to generate a column for all existing values in the array. For instance, let us see the MongoDB collection tar by:


The format ‘*’ indicates that we want to see the JSON documents. This small collection is:

Collection

The Fullarray option can be used here to generate enough columns to see all the prices of the document prices array.

The table has been created as:


And is displayed as:

item | prices_0 | prices_1 | prices_2 | prices_3 | prices_4
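With Fullarray, discovery needs as many columns as the longest array found. The Python sketch below derives column names in the prices_0, prices_1, ... style shown above and lays out each row, padding shorter arrays with None (NULL). The documents are invented, the function names are hypothetical, and how CONNECT samples the collection during discovery is not modeled.

```python
from typing import Any, Dict, List, Optional


def fullarray_columns(docs: List[Dict[str, Any]], field: str) -> List[str]:
    """One column per position of the longest array found in the sample."""
    width = max((len(d.get(field) or []) for d in docs), default=0)
    return [f"{field}_{i}" for i in range(width)]


def fullarray_rows(docs: List[Dict[str, Any]], key: str, field: str) -> List[List[Optional[Any]]]:
    """Rows of [key value, array values...], padded with None (NULL)."""
    width = len(fullarray_columns(docs, field))
    rows = []
    for doc in docs:
        values = list(doc.get(field) or [])
        rows.append([doc.get(key)] + values + [None] * (width - len(values)))
    return rows


docs = [
    {"item": "journal", "prices": [87, 45, 63, 12, 78]},
    {"item": "notebook", "prices": [123, 456, 789]},
]
```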

Create, Read, Update and Delete Operations

All modifying operations are supported. However, inserting into arrays must be done in a specific way. Like with the Fullarray option, we must have enough columns to specify the array values. For instance, we can create a new table by:


Now it is possible to populate it by:

The results are:

surname | name | age | price_1 | price_2 | price_3
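Each array position therefore gets its own column, whose Jpath ends with the array index. The Python sketch below rebuilds the document a row would produce, assuming the price_1, price_2 and price_3 columns have the Jpaths price.0, price.1 and price.2 (the actual table definition is not shown, so these paths and the row values are hypothetical). It models only one level of nesting, and CONNECT's own mapping may differ in details.

```python
from typing import Any, Dict


def row_to_document(row: Dict[str, Any], jpaths: Dict[str, str]) -> Dict[str, Any]:
    """Build a document from column values using 'field' or 'field.N' Jpaths."""
    doc: Dict[str, Any] = {}
    for column, value in row.items():
        field, _, index = jpaths[column].partition(".")
        if not index:
            doc[field] = value  # plain scalar column
            continue
        array = doc.setdefault(field, [])
        position = int(index)
        array.extend([None] * (position + 1 - len(array)))  # grow array as needed
        array[position] = value
    return doc


jpaths = {"surname": "surname", "name": "name", "age": "age",
          "price_1": "price.0", "price_2": "price.1", "price_3": "price.2"}
row = {"surname": "Doe", "name": "John", "age": 35,
       "price_1": 10.5, "price_2": 20, "price_3": None}
```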

Note: If the collection does not exist yet when creating the table and inserting in it, MongoDB creates it automatically.

It can be updated by queries such as:

To see how the array is generated, let us create another table:


This table is displayed as:

name | prices

Note: This last table can be used to make array calculations like with JSON tables using the JSON UDF functions. For instance:

This query returns:

name | sum_prices | avg_prices

Note: When calculating on arrays, null values are ignored.
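Ignoring nulls affects both the total and the count. In this Python sketch (array_sum and array_avg are hypothetical names; returning None for an all-null array is an assumption), the average of [10, None, 20] is 15, not 10:

```python
from typing import Iterable, Optional


def array_sum(values: Iterable[Optional[float]]) -> float:
    """Sum of the non-null array values."""
    return sum(v for v in values if v is not None)


def array_avg(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average over non-null values only; None if every value is null."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None
```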

Status of MONGO Table Type

This table type is still under development. It has significant advantages over the JSON type for accessing MongoDB collections. Firstly, because access is direct, tables are always up to date, even if the collection has been modified by another application. Performance-wise, it can be faster than JSON, because most processing is done by MongoDB on BSON, its internal representation of JSON data, which is designed to optimize all operations. Note that using the MongoDB C Driver can be faster than using the MongoDB Java Driver.

Current Restrictions

Option “CATFUNC=tables” is not implemented yet.
Options SRCDEF and EXECSRC do not apply to MONGO tables.

This page is licensed: CC BY-SA / Gnu FDL

CONNECT ODBC Table Type: Accessing Tables From Another DBMS


ODBC (Open Database Connectivity) is a standard API for accessing database management systems (DBMS). CONNECT uses this API to access data contained in other DBMS without having to implement a specific application for each one. The exception is access to MySQL, which should be done using the MYSQL table type.

Note: On Linux, unixODBC must be installed.

These tables are given the type ODBC. For example, if a "Customers" table is contained in an Access™ database you can define it with a command such as:

Tabname option defaults to the table name. It is required if the source table name is different from the name of the CONNECT table. Note also that for some data sources this name is case sensitive.

Often, because CONNECT can retrieve the table description using ODBC catalog functions, the column definitions can be omitted. For instance, this table can simply be created as:

The BLOCK_SIZE specification is used later to set the RowsetSize when retrieving rows from the ODBC table. A reasonably large RowsetSize can greatly accelerate the fetching process.

If you specify the column description, the column names of your table must exist in the data source table. However, you are not obliged to define all the data source columns and you can change the order of the columns. Some type conversion can also be done if appropriate. For instance, to access the FireBird sample table EMPLOYEE, you could define your table as:

This definition ignores the FIRST_NAME, LAST_NAME, JOB_CODE, and JOB_GRADE columns. It places FULL_NAME, the last column of the original table, in second position. The type of the HIRE_DATE column was changed from timestamp to date, and the type of the DEPT_NO column was changed from char to integer.

Currently, some restrictions apply to ODBC tables:

Cursor type is forward only (sequential reading).
No indexing of ODBC tables (do not specify any columns as key). However, because CONNECT can often add a WHERE clause to the query sent to the data source, indexing is used by the data source if it supports it. (Remote indexing is available from version 1.04.)
CONNECT ODBC supports SELECT and INSERT. UPDATE and DELETE are also supported in a somewhat restricted way (see below). For other operations, use an ODBC table with the EXECSRC option (see below) to directly send proper commands to the data source.

Random Access of ODBC Tables

In CONNECT version 1.03, ODBC tables are not indexable. Version 1.04 adds a remote indexing facility to the ODBC table type.

However, some queries require random access to an ODBC table, for instance when it is joined to another table or used in ORDER BY queries on long columns or large tables.

There are several ways to enable random (positional) access to a CONNECT ODBC table. They depend on the following table options:

Option | Type | Used For

* - To be specified in the option_list.

When dealing with small tables, the simplest way to enable random access is to specify a rowset size equal to or larger than the table size (or the result set size if a pushed-down WHERE clause is used). This means that the whole result is in memory after the first fetch, and CONNECT uses it for further positional accesses.
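The arithmetic behind this advice: under a simplified model that ignores network and driver details, fetching N rows with a rowset size of R takes ceil(N / R) fetch calls, and when R is at least N the whole result arrives in the first fetch and can be reused for positional access. The helper names in this Python sketch are hypothetical:

```python
import math


def fetch_calls(result_rows: int, rowset_size: int) -> int:
    """Number of fetch calls needed to read the whole result set."""
    if rowset_size < 1:
        raise ValueError("rowset size must be positive")
    return max(1, math.ceil(result_rows / rowset_size))


def fully_in_memory(result_rows: int, rowset_size: int) -> bool:
    """True if the first fetch already holds the whole result set."""
    return rowset_size >= result_rows


if __name__ == "__main__":
    print(fetch_calls(10_000, 1))       # 10000
    print(fetch_calls(10_000, 500))     # 20
    print(fetch_calls(10_000, 10_000))  # 1
```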

Another way to have the result set in memory is to use the memory option. This option can be set to the following values:

0. No memory used (the default). Best when the table is read sequentially, as in SELECT statements with only optional WHERE clauses.
1. The memory size required is calculated during the first sequential table read. The allocated memory is filled during the second sequential read. Then the table rows are retrieved from memory. This should be used when the table is accessed several times randomly, such as in sub-selects or as the target table of a join.
2. A first query is executed to get the result set size, and the needed memory is allocated. It is filled on the first sequential read. Then random access to the table is possible. This can be used in the case of ORDER BY clauses, when MariaDB uses positional reads.

Note that the best way to handle ORDER BY is to set the max_length_for_sort_data variable to a larger value (its default value, 1024, is fairly small). Indeed, this requires less memory, particularly when a WHERE clause limits the retrieved data set. This is because, for an ORDER BY query, MariaDB first retrieves the result set sequentially, together with the position of each record. Often the sort can be done from the result set if it is not too big. But if it is too big, or if it includes some “long” columns, only the positions are sorted and MariaDB retrieves the final result by reading the table in random order. If setting the max_length_for_sort_data variable is not feasible or does not work, the memory option must be set to 2 so that table data can be retrieved from memory after the first sequential read.

For tables too large to be stored in memory, another possibility is to make your table use a scrollable cursor. In this case each randomly accessed row can be retrieved from the data source by specifying its cursor position, which is reasonably fast. However, scrollable cursors are not supported by all data sources.

With CONNECT version 1.04, another way to provide random access is to specify some columns to be indexed. This should be done only when the corresponding column of the source table is also indexed. It should be used for tables too large to be stored in memory, and is similar to the remote indexing used by other CONNECT table types such as MYSQL.

There remains the possibility of extracting data from the external table and constructing another table of any file format from the data source. For instance, to construct a fixed formatted DOS table containing the CUSTOMER table data, create the table as:

Now you can use custfix for fast database operations on the copied customer table data.

Retrieving data from a spreadsheet

ODBC can also be used to create tables based on tabular data belonging to an Excel spreadsheet:

This supposes that a tabular zone of the sheet including column headers is defined as a table named CONTACT or using a “named reference”. Refer to the Excel documentation for how to specify tables inside sheets. Once done, you can ask:

This will extract the data from Excel and display:

Nom | Fonction | Societe

Here again, the column description was left to CONNECT when creating the table.

Multiple ODBC tables

The concept of multiple tables can be extended to ODBC tables when they are physically represented by files, for instance to Excel or Access tables. The condition is that the connect string for the table must contain a field DBQ=filename, in which wildcard characters can be included as for multiple=1 tables in their filename. For instance, a table contained in several Excel files CA200401.xls, CA200402.xls, ...CA200412.xls can be created by a command such as:

This works provided that, in each file, the relevant data is internally defined for Excel as a table named "bank account". This extension to ODBC does not support multiple=2. The qchar option was specified to quote the identifiers in the SELECT statement sent to ODBC, in particular when the table or column names contain blanks, to avoid SQL syntax errors.
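Assuming the wildcards behave like shell-style patterns (* for any run of characters, ? for exactly one character), the Python sketch below lists which of a set of invented file names a DBQ pattern such as CA2004??.xls would select. matching_files is a hypothetical helper; the real matching is done by CONNECT and may differ, for instance in case sensitivity on Windows.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matching_files(pattern: str, names: Iterable[str]) -> List[str]:
    """File names selected by a shell-style wildcard pattern, sorted."""
    return sorted(n for n in names if fnmatchcase(n, pattern))


names = ["CA200401.xls", "CA200402.xls", "CA200412.xls", "CA2005.xls", "CB200401.xls"]

if __name__ == "__main__":
    print(matching_files("CA2004??.xls", names))
    # ['CA200401.xls', 'CA200402.xls', 'CA200412.xls']
```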

Caution: Avoid accessing tables belonging to the currently running MariaDB server via the MySQL ODBC connector. This may not work and may cause the server to be restarted.

Performance consideration

To avoid extracting entire tables from an ODBC source, which can be a lengthy process, CONNECT extracts the "compatible" part of query WHERE clauses and adds it to the ODBC query. Compatible means that it must be understood by the data source. In particular, clauses involving scalar functions are not kept, because the data source may have different functions than MariaDB or use a different syntax. Of course, clauses involving sub-selects are also skipped. This lets the data source use its own indexes, if any.

Take care with clauses involving string items because you may not know whether they are treated by the data source as case sensitive or case insensitive. If in doubt, make your queries as if the data source was processing strings as case sensitive to avoid incomplete results.
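The split between the part of a WHERE clause sent to the data source and the part evaluated locally can be sketched in Python. Each condition of an AND-only WHERE clause is represented by a hypothetical Condition record, flagged when it uses a scalar function or a subquery; CONNECT works on MariaDB's parsed condition tree, not on strings like these.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Condition:
    sql: str
    uses_function_or_subquery: bool = False


def split_where(conditions: List[Condition]) -> Tuple[str, List[Condition]]:
    """Return (WHERE text for the data source, conditions kept local)."""
    remote = [c.sql for c in conditions if not c.uses_function_or_subquery]
    local = [c for c in conditions if c.uses_function_or_subquery]
    return " AND ".join(remote), local
```

Conditions returned in the second element are left for MariaDB to evaluate on the rows returned by the data source.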

Using ODBC Tables inside correlated sub-queries

Unlike non-correlated subqueries, which are executed only once, correlated subqueries are executed many times. This is what ODBC calls a "requery". CONNECT can use several methods to deal with this, depending on the setting of the MEMORY or SCROLLABLE Boolean options:

Option | Description

Note: the MEMORY and SCROLLABLE options must be specified in the OPTION_LIST.

Because the table is accessed several times, this can make queries take a very long time, except for small tables, and is almost unacceptable for big tables. However, if it cannot be avoided, using the memory method is the best choice and can be more than four times faster than the default method. If it is supported by the driver, using a scrollable cursor is slightly slower than using memory but can be an alternative to avoid memory problems when the sub-query returns a huge result set.

If the result set is of reasonable size, it is also possible to specify the block_size option equal or slightly larger than the result set. The whole result set being read on the first fetch, can be accessed many times without having to do anything else.

Another good workaround is to replace within the correlated sub-query the ODBC table by a local copy of it because MariaDB is often able to optimize the query and to provide a very fast execution.

Accessing specified views

Instead of specifying a source table name via the TABNAME option, it is possible to retrieve data from a “view” whose definition is given in a new option SRCDEF. For instance:

Or simply, because CONNECT can retrieve the returned column definition:

Then, when executing for instance:

The processing of the group by is done by the data source, which returns only the generated result set on which only the where clause is performed locally. The result:

country | customers

This makes it possible to let the data source do complicated operations, such as joining several tables or executing procedures returning a result set. This minimizes the data transfer through ODBC.

Data Modifying Operations

The only data modifying operations are the INSERT, UPDATE and DELETE commands. They can be executed successfully only if the data source database or tables are not read-only.

INSERT Command

When inserting values into an ODBC table, local values are used and sent to the ODBC table. This makes no difference when the values are constants, but consider a query such as:

where t1 is an ODBC table and t2 is a locally defined table that must exist on the local server. This is also a good way to create a remote ODBC table from local data.

CONNECT does not directly support INSERT commands such as:

Indeed, the “on duplicate key update” part of it is ignored, and the statement results in an error if the key value is duplicated.

UPDATE and DELETE Commands

Unlike the INSERT command, UPDATE and DELETE are supported in a simplified way. Only simple table commands are supported; CONNECT does not support multi-table commands, commands sent from a procedure, or commands issued via a trigger. These commands are simply rephrased to match the data source syntax and sent to the data source for execution. Let us suppose we created the table:

We can populate it by:

The now() function is executed by MariaDB and its returned value is sent to the ODBC table.

Let us see what happens when updating the table. If we use the query:

CONNECT will rephrase the command as:

What it did is simply replace the local table name with the remote table name and change all the back ticks to blanks, or to the data source identifier quoting characters if QUOTED is specified. Then this command is sent to the data source to be executed by it.
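The rephrasing is essentially textual. The rough Python sketch below (rephrase is a hypothetical name) replaces the local table name with the remote one and turns back ticks into blanks, or into the data source quote character when QUOTED is specified. Unlike a real implementation, it does not protect string literals that happen to contain the table name.

```python
import re
from typing import Optional


def rephrase(sql: str, local: str, remote: str, quote: Optional[str] = None) -> str:
    """Swap the local table name for the remote one and rewrite back ticks."""
    # Match `local` or a bare local name, but only as a whole identifier.
    pattern = rf"`{re.escape(local)}`|\b{re.escape(local)}\b"
    sql = re.sub(pattern, lambda _m: f"`{remote}`", sql)
    # Back ticks become blanks, or the data source quote if QUOTED is set.
    return sql.replace("`", quote if quote is not None else " ")
```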

This is simpler and can be faster than doing a positional update using a cursor and commands such as “select ... for update of ...” that are not supported by all data sources. However, there are some restrictions that must be understood due to the way it is handled by MariaDB.

MariaDB does not know about any of the above. The command is parsed as if it were to be executed locally. Therefore, it must respect the MariaDB syntax.
Being executed by the data source, the (rephrased) command must also respect the data source syntax.
All data referenced in the SET and WHERE clause belongs to the data source.

This is possible because both MariaDB and the data source use the SQL language. But you must use only the basic features that are part of the core SQL language. For instance, keywords like IGNORE or LOW_PRIORITY will cause syntax errors with many data sources.

Scalar function names can also differ, which severely restricts their use. For instance:

This will not work with SQLite3, the data source returning an “unknown scalar function” error message. Note that in this particular case, you can rephrase it to:

This is understood by both parsers, and even if this function would return NULL when executed by MariaDB, it does return the current date when executed by SQLite3. But this is becoming too tricky, so to overcome all these restrictions and allow all types of commands to be executed by the data source, CONNECT provides a specific ODBC table subtype, described now.

Sending commands to a Data Source

This can be done using a special subtype of ODBC table. Let us see this in an example:

The key points in this create statement are the EXECSRC option and the column definition.

The EXECSRC option tells CONNECT that this table is used to send a command to the data source. Most of the sent commands do not return a result set. Therefore, the table columns are used to specify the command to be executed and to get the result of the execution. The names of these columns can be chosen arbitrarily; their function comes from the FLAG value:

How to use this table and specify the command to send? By executing a command such as:

This will send the command specified in the WHERE clause to the data source and return the result of its execution. The syntax of the WHERE clause must be exactly as shown above. For instance:

This command returns:

command | number | message

Now we can create a standard ODBC table on the newly created table:

We can populate it directly using the supported statement:

And see the result:

name | birth | rem

Any command can be executed from the crlite table, for instance:

This command returns:

command | number | message

Let us verify it:

name | birth | rem

The syntax to send a command is rather strange and may seem unnatural. It is possible to use an easier syntax by defining a stored procedure such as:

Now you can send commands like this:

This is possible only when sending one single command.

Sending several commands together

Grouping commands uses an easier syntax and is faster, because only one connection is made for all of them. To send several commands in one call, use the following syntax:

When several commands are sent, the execution stops at the end of them or after a command that is in error. To continue after n errors, set the option maxerr=n (0 by default) in the option list.
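The stopping rule can be modeled as follows. In this Python sketch, run_batch and the execute callback are hypothetical; the callback stands for sending one command to the data source and returns an error message, or None on success. The model assumes maxerr counts failed commands, so with the default of 0 execution stops after the first error.

```python
from typing import Callable, List, Optional, Tuple


def run_batch(commands: List[str],
              execute: Callable[[str], Optional[str]],
              maxerr: int = 0) -> List[Tuple[str, Optional[str]]]:
    """Return (command, error) pairs for the commands actually executed."""
    results: List[Tuple[str, Optional[str]]] = []
    errors = 0
    for command in commands:
        error = execute(command)
        results.append((command, error))
        if error is not None:
            errors += 1
            if errors > maxerr:  # continue only after up to maxerr errors
                break
    return results
```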

Note 1: It is possible to specify the SRCDEF option when creating an EXECSRC table. It is the command sent by default when a WHERE clause is not specified.

Note 2: Most data sources do not allow sending several commands separated by semi-colons.

Note 3: Quotes inside commands must be escaped. This can be avoided by using a different quoting character than the one used in the command.

Note 4: The sent command must obey the data source syntax.

Note 5: Sent commands apply in the specified database. However, they can address any table within this database, or belonging to another database using the syntax schema.tabname.

Connecting to a Data Source

There are two ways to establish a connection to a data source:

Using SQLDriverConnect and a Connection String
Using SQLConnect and a Data Source Name (DSN)

The first way uses a Connection String whose components describe what is needed to establish the connection. It is the most complete way to do it and by default CONNECT uses it.

The second way is a simplified one, in which ODBC is just given the name of a DSN that must have been defined to ODBC or unixODBC and that contains the necessary information to establish the connection. Only the user name and password can be specified outside the DSN specification.

Defining the Connection String

Using the first way, the connection string must be specified. This is sometimes the most difficult task when creating ODBC tables because, depending on the operating system and the data source, this string can widely differ.

The format of the ODBC Connection String is:

In this grammar, character-string has zero or more characters; identifier has one or more characters; attribute-keyword is not case-sensitive; and attribute-value may be case-sensitive. Because of the connection string grammar, keywords and attribute values that contain the characters []{}(),;?*=!@ should be avoided. The value of the DSN keyword cannot consist only of blanks and should not contain leading blanks. Because of the grammar of the system information, keywords and data source names cannot contain the backslash (\) character. Applications do not have to add braces around the attribute value after the DRIVER keyword unless the attribute contains a semicolon (;), in which case the braces are required. If the attribute value that the driver receives includes the braces, the driver should not remove them; they should be part of the returned connection string.

ODBC Defined Connection Attributes

The ODBC defined attributes are:

DSN - the name of the data source to connect to. You must create this before attempting to refer to it. You create new DSNs through the ODBC Administrator (Windows), ODBCAdmin (unixODBC's GUI manager) or in the odbc.ini file.
DRIVER - the name of the driver to connect to. You can use this in DSN-less connections.
FILEDSN - the name of a file containing the connection attributes.
UID/PWD - any username and password the database requires for authentication.

Other attributes are DSN-dependent. The connection string can give the name of the driver in the DRIVER field or the name of the data source in the DSN field (pay attention to spelling and case). It also has other fields that depend on the data source. When specifying a file, the DBQ field must give the full path and name of the file containing the table. Refer to the specific ODBC connector documentation for the exact syntax of the connection string.
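The grammar rules above can be checked mechanically before a string is passed to CONNECT. The following is a minimal sketch, not part of CONNECT. The helper name build_connection_string is hypothetical. It also takes the conservative option of bracing only DRIVER values, as the rule quoted above describes, and rejects semicolons elsewhere.

```python
from typing import Mapping

# Characters that the ODBC grammar says keywords and values should avoid.
SPECIAL_CHARS = set("[]{}(),;?*=!@")


def build_connection_string(attrs: Mapping[str, str]) -> str:
    """Join keyword=value pairs with ';' following the rules described above."""
    parts = []
    for keyword, value in attrs.items():
        # Keywords must not contain the backslash or the special characters.
        if "\\" in keyword or SPECIAL_CHARS & set(keyword):
            raise ValueError(f"invalid keyword: {keyword!r}")
        # The DSN value must not be blank or start with a blank.
        if keyword.upper() == "DSN" and (not value.strip() or value[0].isspace()):
            raise ValueError("DSN value is blank or has leading blanks")
        if ";" in value:
            # This sketch braces only DRIVER values, as in the rule quoted
            # above; a semicolon in any other value is rejected.
            if keyword.upper() != "DRIVER":
                raise ValueError(f"';' not allowed in value of {keyword}")
            value = "{" + value + "}"
        parts.append(f"{keyword}={value}")
    return ";".join(parts)


# Hypothetical DSN-less connection; keywords after DRIVER depend on the driver.
print(build_connection_string({"DRIVER": "SQLite3", "Database": "/tmp/test.db"}))
# DRIVER=SQLite3;Database=/tmp/test.db
```

The driver name and the Database keyword are placeholders. Use the names documented for your own ODBC driver.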

Using a Predefined DSN

This is done by specifying in the option list the Boolean option “UseDSN” as yes or 1. In addition, string options “user” and “password” can be optionally specified in the option list.

When doing so, the connection string just contains the name of the predefined Data Source. For instance:

Note: the connection data source name (limited to 32 characters) should not be preceded by “DSN=”.

ODBC Tables on Linux/Unix

In order to use ODBC tables, you will need to have unixODBC installed. Additionally, you will need the ODBC driver for your foreign server's protocol. For example, for MS SQL Server or Sybase, you will need to have FreeTDS installed.

Make sure the user running mysqld (usually the mysql user) has access to the ODBC data source configuration and the ODBC drivers. If you get an error on Linux/Unix when using TABLE_TYPE=ODBC:

Check that the user running mysqld has enough permission to load the ODBC driver library. The driver file may lack read permission (use chmod to fix this), or loading may be prevented by the SELinux configuration (see below).

To check whether the driver has enough permission, try this command in a shell:

SELinux

SELinux can cause various problems. If you think SELinux is causing problems, check the system log (e.g. /var/log/messages) or the audit log (e.g. /var/log/audit/audit.log).

mysqld can't load some executable code, so it can't use the ODBC driver.

Example error:

Audit log:

mysqld can't open TCP sockets on some ports, so it can't connect to the foreign server.

Example error:

Audit log:

ODBC Catalog Information

Depending on the version of the ODBC driver used, additional information may be available about tables. Examples are the table QUALIFIER or OWNER in old versions, renamed CATALOG and SCHEMA since version 3.

CATALOG is apparently rarely used by most data sources, but SCHEMA (formerly OWNER) is and corresponds to the DATABASE information of MySQL.

The issue is that if no schema name is specified, some data sources return information for all schemas while some others only return the information of the “default” schema. In addition, the used “schema” or “database” is sometimes implied by the connection string and sometimes is not. Sometimes, it also can be included in a data source definition.

CONNECT offers two ways to specify this information:

When specified, the DBNAME create table option is regarded by ODBC tables as the SCHEMA name.
Table names can be specified as “cat.sch.tab”, which sets the catalog and schema information.

When both are used, the qualified table name has precedence over DBNAME. For instance:

Tabname

DBname

Description
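The precedence rule can be sketched as follows. This is a minimal illustration only. The helper resolve_source_table is hypothetical, and reading a two-part name as schema.table is an assumption. As the note below says, some drivers ignore both settings in favour of the schema implied by the connection.

```python
from typing import Optional, Tuple


def resolve_source_table(tabname: str, dbname: Optional[str] = None
                         ) -> Tuple[Optional[str], Optional[str], str]:
    """Return (catalog, schema, table) for an ODBC source table.

    A qualified TABNAME wins over DBNAME; DBNAME is used as the schema
    only when TABNAME is unqualified.
    """
    parts = tabname.split(".")
    if len(parts) == 3:                 # cat.sch.tab
        catalog, schema, table = parts
    elif len(parts) == 2:               # sch.tab (assumed interpretation)
        catalog = None
        schema, table = parts
    elif len(parts) == 1:               # tab: fall back on DBNAME
        catalog, schema, table = None, dbname, parts[0]
    else:
        raise ValueError(f"too many qualifiers in {tabname!r}")
    return catalog, schema, table


print(resolve_source_table("cat.sch.tab", "other"))  # ('cat', 'sch', 'tab')
print(resolve_source_table("tab", "mydb"))           # (None, 'mydb', 'tab')
```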

When creating a standard ODBC table, you should make sure only one source table is specified. Specifying more than one source table must be done only for CONNECT catalog tables (with CATFUNC=tables or columns).

In particular, suppose column definition is left to the discovery feature, tables with the same name exist in several schemas, and the schema name is not specified. In that case, several columns with the same name are generated, and the creation fails with an unhelpful error message.

Note: With some ODBC drivers, the DBNAME option or qualified table name is useless. The schema implied by the connection string or by the data source definition has priority over the specified DBNAME.

Table name case

Another issue when dealing with ODBC tables is how table and column names are handled with regard to case.

For instance, Oracle follows the SQL standard here: it converts non-quoted identifiers to upper case. PostgreSQL does not follow the standard: it converts them to lower case. MySQL/MariaDB does not follow it either: identifiers are preserved on Linux and converted to lower case on Windows.

Keep this in mind if a table or column of an ODBC data source cannot be found.
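The folding rules just described can be summarised in a small sketch, useful for predicting the name under which a source table is stored. The function is hypothetical and simplified. Treating the MariaDB/MySQL rule as independent of quoting is an assumption based on default table-name handling.

```python
def stored_identifier(name: str, dbms: str, quoted: bool = False,
                      windows: bool = False) -> str:
    """Return the case under which a DBMS stores an identifier (simplified)."""
    dbms = dbms.lower()
    if dbms in ("mariadb", "mysql"):
        # Preserved on Linux, lower-cased on Windows, quoted or not
        # (default table-name behaviour; an assumption in this sketch).
        return name.lower() if windows else name
    if quoted:
        return name              # quoted identifiers keep their case
    if dbms == "oracle":
        return name.upper()      # SQL standard: fold to upper case
    if dbms == "postgresql":
        return name.lower()      # PostgreSQL folds to lower case
    raise ValueError(f"unknown DBMS: {dbms}")


# An unquoted Oracle table created as Customers is stored as CUSTOMERS.
print(stored_identifier("Customers", "oracle"))
```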

Non-ASCII Character Sets with Oracle

When connecting through ODBC, the MariaDB Server operates as a client to the foreign database management system. As such, it requires that you configure MariaDB as you would configure native clients for the given database server.

When connecting to Oracle with non-ASCII character sets, you need to set the NLS_LANG environment variable properly before starting the MariaDB Server.

For instance, to test this on Oracle, create a table that contains a series of special characters:

Then create a connecting table on MariaDB and attempt the same query:

Although the character set is defined in a way that satisfies MariaDB, it has not been defined for Oracle (that is, the NLS_LANG environment variable is not set). As a result, Oracle does not return the expected characters to MariaDB and CONNECT. The method for setting NLS_LANG varies with your operating system or distribution. If you experience this issue, check your OS documentation for how to set environment variables properly.
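Before setting NLS_LANG, it can help to check what a given value actually specifies. The sketch below splits a value following Oracle's documented LANGUAGE_TERRITORY.CHARSET layout, where each part may be omitted. The helper names and the example value are illustrative. The right character set for your data is your decision, not something this sketch determines.

```python
from typing import NamedTuple, Optional


class NlsLang(NamedTuple):
    language: Optional[str]
    territory: Optional[str]
    charset: Optional[str]


def parse_nls_lang(value: str) -> NlsLang:
    """Split an NLS_LANG value laid out as LANGUAGE_TERRITORY.CHARSET."""
    head, _, charset = value.partition(".")
    language, _, territory = head.partition("_")
    return NlsLang(language or None, territory or None, charset or None)


def is_unicode(nls: NlsLang) -> bool:
    """True when the client character set is one of Oracle's UTF-8 sets."""
    return (nls.charset or "").upper() in ("AL32UTF8", "UTF8")


setting = parse_nls_lang("AMERICAN_AMERICA.AL32UTF8")  # illustrative value
print(setting, is_unicode(setting))
```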

Using systemd

With Linux distributions that use systemd, you need to set the environment variable in the service file (systemd doesn't read the /etc/environment file).

This is done by setting the Environment variable in the [Service] unit. For instance,

Then restart MariaDB,

You can now retrieve the appropriate characters from Oracle tables:

Using Windows

Microsoft Windows doesn't ignore environment variables the way systemd does on Linux, but you still need to set the NLS_LANG environment variable on your system. To do so, open an elevated command prompt (that is, Cmd.exe with administrative privileges).

From here, you can use the Setx command to set the variable. For instance,

`OPTION_LIST` Values Supported by the ODBC Tables

The following options can be given as a comma-separated string to the OPTION_LIST value in the CREATE TABLE statement.

Name

Default

Description
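An option list such as the one used above for UseDSN, user and password is a comma-separated series of name=value items. The sketch below shows one way to read it. It is hypothetical and not CONNECT's parser. Three behaviours are assumptions: names are treated case-insensitively, a value cannot itself contain a comma, and yes/1 (and similar words) count as true.

```python
from typing import Dict


def parse_option_list(option_list: str) -> Dict[str, str]:
    """Split 'UseDSN=yes,user=joe' into {'usedsn': 'yes', 'user': 'joe'}."""
    options: Dict[str, str] = {}
    for item in option_list.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")   # split on the first '=' only
        options[name.strip().lower()] = value.strip()
    return options


def option_is_true(options: Dict[str, str], name: str) -> bool:
    """Interpret a Boolean option such as UseDSN (accepted words assumed)."""
    return options.get(name.lower(), "").lower() in ("yes", "y", "1", "true", "on")


opts = parse_option_list("UseDSN=yes, user=joe,password=secret")
print(opts, option_is_true(opts, "UseDSN"))
```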

This page is licensed: GPLv2

CONNECT XML Table Type

The CONNECT storage engine has been deprecated.

Overview

CONNECT supports tables represented by XML files. For these tables, the standard input/output functions of the operating system are not used; parsing and processing of the file is delegated to a specialized library. Two such libraries are currently supported: libxml2, part of the GNOME framework (GNOME itself is not required), and, on Windows, MS-DOM (DOMDOC), Microsoft's standard XML support.

DOMDOC is the default for the Windows version of CONNECT, and libxml2 is always used on other systems. On Windows, the choice can be specified using the XMLSUP list option, for instance by specifying option_list='xmlsup=libxml2'.

Creating XML tables

First of all, it must be understood that XML is a very general language used to encode data having any structure. In particular, the tag hierarchy in an XML file describes a tree structure of the data. For instance, consider the file:

It represents data having the structure:

At first sight, this structure seems far from tabular. However, modern database management systems, including MariaDB, implement something close to the relational model. They work on tables that are not hierarchical but tabular, with rows and columns.

Nevertheless, CONNECT can handle it. It cannot guess what you want to extract from the XML structure, but it lets you specify this when you create the table.

Let us take a first example. Suppose you want to make a table from the above document, displaying the node contents.

For this, you can define a table xsamptag as:

It is displayed as:

AUTHOR

TITLE

TRANSLATOR

PUBLISHER

DATEPUB

Let us try to understand what happened. By default, column names correspond to tag names. Because this file is rather simple, CONNECT was able to use the root node <BIBLIO> of the file as the table's top tag, and its <BOOK> children as the row tags. In a more complex file, these would have to be specified, as we will see later. Note that we didn't have to worry about sub-tags such as <FIRSTNAME> or <LASTNAME>, because CONNECT automatically retrieves the entire text contained in a tag and its sub-tags.

Only the first author of the first book appears. This is because only the first occurrence of a column tag has been retrieved so the result has a proper tabular structure. We will see later what we can do about that.
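The default mapping just described can be imitated with Python's ElementTree. In this mapping the table node is the root, the row nodes are its children, each column takes the first matching child tag, and a tag's value is the text of the tag and all its sub-tags. This is a sketch for understanding only, not CONNECT's libxml2 or DOMDOC code. The sample document is hypothetical and only borrows the tag names used above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample using the tag names above (not the original file).
SAMPLE = """<BIBLIO>
  <BOOK ISBN="111" LANG="fr">
    <AUTHOR><FIRSTNAME>Jean</FIRSTNAME><LASTNAME>Dupont</LASTNAME></AUTHOR>
    <AUTHOR><FIRSTNAME>Anne</FIRSTNAME><LASTNAME>Martin</LASTNAME></AUTHOR>
    <TITLE>Premier livre</TITLE>
  </BOOK>
  <BOOK ISBN="222" LANG="en">
    <AUTHOR><FIRSTNAME>John</FIRSTNAME><LASTNAME>Smith</LASTNAME></AUTHOR>
    <TITLE>Second book</TITLE>
  </BOOK>
</BIBLIO>"""


def node_text(elem):
    """Text of a node and all its sub-nodes, blank-separated."""
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


def xml_rows(xml_text, columns):
    """Yield one dict per row node; a missing column tag gives None."""
    root = ET.fromstring(xml_text)   # table node: the root
    for row in root:                 # row nodes: children of the table node
        values = {}
        for col in columns:
            node = row.find(col)     # first occurrence of the tag only
            values[col] = node_text(node) if node is not None else None
        yield values


for r in xml_rows(SAMPLE, ["AUTHOR", "TITLE", "TRANSLATOR"]):
    print(r)
```

As with the table above, the second author of the first book does not appear, because only the first AUTHOR tag of each row is read.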

How can we retrieve the values specified by attributes? By using a Coltype table option to specify the default column type. The value ‘@’ means that column names match attribute names. Therefore, we can retrieve them by creating a table such as:

This table returns the following:

ISBN

LANG

SUBJECT

Now, to define a table that gives us all the previous information, we must specify the type of each column. Because in the next statement the column type defaults to Node, the field_format column parameter is used to indicate which columns are attributes:

From Connect 1.7.0002

Before Connect 1.7.0002

Once done, we can enter the query:

This will return the following result:

SUBJECT

LANG

TITLE

AUTHOR

Note that we were lucky: unlike SQL, XML is case sensitive, and the column names matched the node names only because they were given in upper case. Note also that the order of the columns in the table could have differed from the order in which the nodes appear in the XML file.

Using Xpaths with XML tables

Xpath is used by XML to locate and retrieve nodes. The Xpath of the table's main node is specified by the tabname option. If just the node name is given, CONNECT constructs an Xpath such as ‘//BIBLIO’ in the example above, which retrieves the BIBLIO node wherever it is within the XML file.

By default, the row nodes are the children of the table node. However, the row node name can be specified using the rownode sub-option of the option_list option, for instance to exclude child nodes that are not real row nodes.

The field_format options we used above can be specified to locate more precisely where and what information to retrieve using an Xpath-like syntax. For instance:

From Connect 1.7.0002

Before Connect 1.7.0002

This very flexible column parameter serves several purposes:

To specify the tag name, or the attribute name if different from the column name.
To specify the type (tag or attribute) by a prefix of '@' for attributes.
To specify the path for sub-tags using the '/' character.

This path is always relative to the current context (the column top node) and cannot be specified as an absolute path from the document root, so a leading '/' cannot be used. The path cannot vary in node names or depth, so '//' is not allowed either.
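The three purposes of the column path listed above can be illustrated with ElementTree. The path can name a tag, an '@' attribute, or a '/'-separated sub-tag path, and it must be relative and fixed. The function resolve_field and the sample row are hypothetical. The sketch imitates the described rules rather than CONNECT's actual implementation.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def resolve_field(row: ET.Element, field_format: str) -> Optional[str]:
    """Return the value a column path designates relative to a row node."""
    # Paths are relative to the row node and fixed in depth.
    if field_format.startswith("/") or "//" in field_format:
        raise ValueError(f"absolute or variable path: {field_format!r}")
    *steps, last = field_format.split("/")
    node = row
    for step in steps:
        if step.startswith("@"):
            raise ValueError("an attribute can only be the last step")
        node = node.find(step)          # first occurrence only
        if node is None:
            return None
    if last.startswith("@"):
        return node.get(last[1:])       # attribute value, or None
    target = node.find(last)
    if target is None:
        return None
    # Whole text of the tag and its sub-tags, blank-separated.
    return " ".join(t.strip() for t in target.itertext() if t.strip())


# Hypothetical row node modelled on the BOOK elements discussed above.
book = ET.fromstring(
    '<BOOK ISBN="123"><TITLE>Some title</TITLE>'
    '<TRANSLATOR PREFIX="adapted by"><FIRSTNAME>Jim</FIRSTNAME>'
    '<LASTNAME>Beam</LASTNAME></TRANSLATOR></BOOK>')
print(resolve_field(book, "TRANSLATOR/FIRSTNAME"))  # Jim
print(resolve_field(book, "@ISBN"))                 # 123
```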

The query:

replies:

ISBN

TITLE

TRANSLATED

TRANFN

TRANLN

LOCATION

Libxml2 default name space issue

An issue with libxml2 is that some files declare a default namespace in their root node. Because Xpath then searches only in that namespace, nodes are not found if they are not prefixed. If this happens, specify the tabname option as an Xpath that ignores the current namespace:

This must also be done for the default or specified Xpath of non-attribute columns. For instance:

Note: This raises an error (and is useless anyway) with DOMDOC.
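The underlying namespace problem is easy to reproduce outside CONNECT, here with Python's ElementTree on a hypothetical document. Once the root declares a default namespace, a plain tag name no longer matches. ElementTree's `{*}` wildcard (Python 3.8 and later) ignores the namespace. This plays the same role as a local-name() test in an XPath 1.0 expression such as `*[local-name()='BOOK']`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose root declares a default namespace.
DOC = """<BIBLIO xmlns="http://example.com/biblio">
  <BOOK><TITLE>First</TITLE></BOOK>
  <BOOK><TITLE>Second</TITLE></BOOK>
</BIBLIO>"""

root = ET.fromstring(DOC)

# Unprefixed names do not match: every element is in the default namespace.
plain = root.findall("BOOK")

# '{*}' matches the local name in any namespace.
any_ns = [b.find("{*}TITLE").text for b in root.findall("{*}BOOK")]

print(root.tag)        # {http://example.com/biblio}BIBLIO
print(plain, any_ns)   # [] ['First', 'Second']
```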

Direct access on XML tables

Direct access is available on XML tables. This means that XML tables can be sorted and used in joins, even in the one-side of the join.

However, building a permanent index is not yet implemented, and it is unclear whether an index would be useful. The DOM implementation used to access these tables first parses the whole file and constructs a node tree in memory. This is often the longest part of the process, so an index would be of little value. This also limits XML files to a reasonable size. When speed is important, this table type is not the best choice. It is probably better to insert the rows of the XML table into a table of a more efficient type.

Accessing tags with namespaces

With the Windows DOMDOC support, this can be done using the prefix in the tabname table option and/or the xpath column option. For instance, given the file gns.xml:

and the defined CONNECT table:

Displays:

lon

lat

ele

time

Only the prefixed ‘ele’ tag is recognized.

However, this does not work with the libxml2 support. The solution is then to use a function that ignores the namespace:

Then:

Displays:

lon

lat

ele

time

This time, all ‘ele’ tags are recognized. This solution does not work with DOMDOC.

Having Columns defined by Discovery

It is possible to let the MariaDB discovery process do the job of column specification. When columns are not defined in the statement, CONNECT endeavours to analyze the XML file and to provide the column specifications. This is possible only for true XML tables, not for HTML tables.

For instance, the xsamp table could have been created specifying:

Let’s check how it was actually specified using the SHOW CREATE TABLE statement:

It is equivalent except for the column sizes that have been calculated from the file as the maximum length of the corresponding column when it was a normal value. Also, all columns are specified as type because XML does not provide information about the node content data type. Nullable is set to true if the column is missing in some rows.
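The sizing and nullability rules just described can be written down as a small function. A column's size is the longest value found, and the column is nullable when it is missing from at least one row. The helper infer_columns is hypothetical. It only models these two rules and returns no type, since the text above does not rely on the file for one.

```python
from typing import Dict, List, Optional, Tuple


def infer_columns(rows: List[Dict[str, Optional[str]]]
                  ) -> Dict[str, Tuple[int, bool]]:
    """Return {column: (size, nullable)} from the values found in rows."""
    names: List[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)          # keep first-seen order
    result = {}
    for name in names:
        present = [row[name] for row in rows if row.get(name) is not None]
        size = max((len(v) for v in present), default=0)
        result[name] = (size, len(present) < len(rows))
    return result


# Hypothetical rows: only the first book has a translator.
sample = [{"TITLE": "Abc", "TRANSLATOR": "Jo"}, {"TITLE": "Longer"}]
print(infer_columns(sample))  # {'TITLE': (6, False), 'TRANSLATOR': (2, True)}
```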

If a more complex definition is desired, you can ask CONNECT to analyse the XPATH up to a given level using the level option in the option list. The level value is the number of nodes that are taken in the XPATH. For instance:

This will define the table as:

From Connect 1.7.0002

Then if we ask:

Everything seems correct when we get the result:

SUBJECT

AUTHOR

TITLE

TRANSLATOR

PUBLISHER

However if we enter the apparently equivalent query on the xsampall table, based on the same file:

this returns an apparently wrong answer:

SUBJECT

AUTHOR

TITLE

TRANSLATOR

PUBLISHER

What happened here? Because we used the xsamp table to do the insert, what was inserted into the XML file has the structure described by xsamp:

CONNECT cannot "invent" sub-tags that are not part of the xsamp table. Because these sub-tags do not exist, the xsampall table cannot retrieve the information that should be attached to them. If we want to be able to query the XML file by all the defined tables, the correct way to insert a new book to the file is to use the xsampall table, the only one that addresses all the components of the original document:

Now the added book, in the XML file, will have the required structure:

Note: We used a column list in the Insert statements when creating the table to avoid generating a <TRANSLATOR> node with sub-nodes, all containing null values (this works on Windows only).

Multiple nodes in the XML document

Let us come back to the above example XML file. We have seen that the author node can be "multiple" meaning that there can be more than one author of a book. What can we do to get the complete information fitting the relational model? CONNECT provides you with two possibilities, but is restricted to only one such multiple node per table.

The first and most challenging is to return as many rows as there are authors, the other columns being repeated as if we had made a join between the author column and the rest of the table. To achieve this, simply specify the “multiple” node name and the “expand” option when creating the table. For instance, we can create the xsamp2 table like this:

In this statement, the Limit option specifies the maximum number of values that are expanded. If not specified, it defaults to 10. Any values above the limit are ignored and a warning message is issued. Now you can enter a query such as:

This will retrieve and display the following result:

ISBN

SUBJECT

AUTHOR

TITLE

In this case, this is as if the table had four rows. However if we enter the query:

this time the result is:

ISBN

SUBJECT

TITLE

PUBLISHER

Because the author column does not appear in the query, the corresponding row was not expanded. This is somewhat strange because this would have been different if we had been working on a table of a different type. However, it is closer to the relational model for which there should not be two identical rows (tuples) in a table. Nevertheless, you should be aware of this somewhat erratic behavior. For instance:

This last query replies:

ISBN

SUBJECT

TITLE

PUBLISHER

Even though the author column does not appear in the result, the corresponding row was expanded because the multiple column was used in the where clause.
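The behaviour observed in these three queries can be modelled in a few lines. A row is expanded only when the query uses the multiple column, in the select list or in the WHERE clause. Expansion stops at Limit values and emits a warning. The function expand_rows and its data are hypothetical. It models the described behaviour, not CONNECT's code.

```python
import warnings
from typing import Dict, Iterator, List, Set


def expand_rows(books: List[Dict], multiple: str, used: Set[str],
                limit: int = 10) -> Iterator[Dict]:
    """Yield table rows from records whose `multiple` key holds a list."""
    for book in books:
        values = book.get(multiple) or [None]
        if multiple not in used:
            values = values[:1]         # not referenced: no expansion
        elif len(values) > limit:
            warnings.warn(f"{len(values) - limit} value(s) beyond Limit ignored")
            values = values[:limit]
        for value in values:
            row = dict(book)            # the other columns are repeated
            row[multiple] = value
            yield row


# Hypothetical records: the first book has two authors.
books = [{"ISBN": "1", "AUTHOR": ["Author A", "Author B"]},
         {"ISBN": "2", "AUTHOR": ["Author C"]}]
print(len(list(expand_rows(books, "AUTHOR", {"ISBN", "AUTHOR"}))))  # 3
print(len(list(expand_rows(books, "AUTHOR", {"ISBN"}))))            # 2
```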

Intermediate multiple node

The "multiple" node can be an intermediate node. If we want to do the same expansion with the xsampall table, there is nothing more to do. The xsampall2 table can be created with:

From Connect 1.7.0002

Before Connect 1.7.0002

The only difference is that the "multiple" node is an intermediate node in the path. The resulting table can be seen with a query such as:

This query displays:

SUBJECT

LANG

TITLE

FIRST

LAST

YEAR

These composite tables, half array and half tree, hold some surprises when updating, deleting from or inserting into them. Insert simply cannot generate this structure: if two rows that differ only by author are inserted, two book nodes are generated in the XML file. Delete always deletes one book node and all its child nodes, even if specified against only one author. Update is more complicated:

After these three updates, the first two responding "Affected rows: 1" and the last one responding "Affected rows: 2", the last query answers:

subject

lang

title

first

last

year

What must be understood here is that the Update modifies node values in the XML file, not cell values in the relational table. The first update worked normally. The second update changed the year value of the book and this shows for the two expanded rows because there is only one DATEPUB node for that book. Because the third update applies to a row having a certain date value, both author names were updated.

Making a List of Multiple Values

Another way to see multiple values is to ask CONNECT to make a comma separated list of the multiple node values. This time, it can only be done if the "multiple" node is not intermediate. For example, we can modify the xsamp2 table definition by:

This time 'Expand' is not specified, and Limit gives the maximum number of items in the list. Now if we enter the query:

We will get the following result:

ISBN

SUBJECT

AUTHOR(S)

TITLE

Note that updating the "multiple" column is not possible because CONNECT does not know which of the nodes to update.

This could not have been done with the xsampall2 table because the author node is intermediate in the path, and making two lists, one of first names and another one of last names would not make sense anyway.
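The list form amounts to joining the values of the multiple node, keeping at most Limit of them. The sketch below is hypothetical. The ", " separator is an assumption, since the text only says "comma separated". The joined string cannot be mapped back to individual nodes, which is consistent with updates of such a column being refused.

```python
from typing import List, Optional


def list_value(values: List[str], limit: int = 10,
               sep: str = ", ") -> Optional[str]:
    """Join the values of a "multiple" node, keeping the first `limit`."""
    if not values:
        return None
    return sep.join(values[:limit])


print(list_value(["Author A", "Author B", "Author C"], limit=2))
# Author A, Author B
```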

What if a table contains several multiple nodes

This can be handled by creating several tables on the same file, each containing only one multiple node and constructing the desired result using joins.

Support of HTML Tables

Most tables included in HTML documents cannot be processed by CONNECT because HTML is often not compatible with XML syntax. In particular, XML requires every opening tag to be matched by a closing tag, which is sometimes optional in HTML. This is often the case for column tags.

However, you may encounter tables that respect the XML syntax but have some features of HTML tables. For instance:

Here the different column tags are included in <td></td> tags as for HTML tables. You cannot just add this tag in the Xpath of the columns, because the search is done on the first occurrence of each tag, and this would cause this search to fail for all columns except the first one. This case is handled by specifying the Colnode table option that gives the name of these column tags, for example:

From Connect 1.7.0002

Before Connect 1.7.0002

The table is displayed as:

Name

Origin

Description

However, you can deal with tables even closer to the HTML model. For example the coffee.htm file:

Here column values are directly represented by the TD tag text. You cannot declare them as tags nor as attributes. In addition, they are not located using their name but by their position within the row. Here is how to declare such a table to CONNECT:

You specify that columns are located by position by setting the Coltype option to 'HTML'. The position (0-based) of each column is the value of its flag column parameter, which by default is set in sequence. Now we are able to display the table:

Name

Cups

Type

Sugar

Note 1: We specified 'header=n' in the create statement to indicate that the first n rows of the table are not data rows and should be skipped.

Note 2: In this last example, we did not specify the node names using the Rownode and Colnode options because when Coltype is set to 'HTML' they default to 'Rownode=TR' and 'Colnode=TD'.
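Positional access with these defaults can be sketched with ElementTree on an XML-compliant table. Rows are the TR nodes and cells the TD nodes, each column is identified by its 0-based position, and the first header rows are skipped. The function and the sample table are hypothetical. They mirror the behaviour described in the two notes above.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical XML-compliant HTML table (every tag is closed).
HTML = """<TABLE>
  <TR><TH>Name</TH><TH>Cups</TH></TR>
  <TR><TD>Ann</TD><TD>16</TD></TR>
  <TR><TD>Bob</TD><TD>4</TD></TR>
</TABLE>"""


def html_rows(xml_text: str, header: int = 0, rownode: str = "TR",
              colnode: str = "TD") -> List[List[str]]:
    """Return each row's cell texts; index i is the column at position i."""
    table = ET.fromstring(xml_text)
    rows = table.findall(rownode)[header:]    # skip `header` rows
    return [[(cell.text or "").strip() for cell in row.findall(colnode)]
            for row in rows]


print(html_rows(HTML, header=1))  # [['Ann', '16'], ['Bob', '4']]
```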

Note 3: The Coltype option is a word only the first character of which is significant. Recognized values are:

New file setting

Some create options are used only when creating a table on a new file, i.e. when inserting into a file that does not exist yet. When specified, the 'Header' option creates a header row with the names of the table columns. This is chiefly useful for HTML tables to be displayed in a web browser.

Some new list-options are used in this context:

Let us see for instance, the following create statement:

Supposing the table file does not exist yet, the first insert into that table, for instance by the following statement:

will generate the following file:

This file can be used to display the table in a web browser (the encoding should be ISO-8859-x).

handler

version

author

description

maturity

Note: The XML document encoding is generally specified in the XML header node and can be different from the DATA_CHARSET, which is always UTF-8 for XML tables. Therefore the table DATA_CHARSET character set should be unspecified, or specified as UTF8. The Encoding specification is useful only for new XML files and ignored for existing files having their encoding already specified in the header node.

Notes

CONNECT does not claim to be able to deal with every XML document. Besides, documents that can usefully be processed for data analysis are likely to have a structure that is easily transformed into a table.
With libxml2, sub-tag text can be separated by zero or more blanks, depending on the structure and indentation of the data file.
This may cause some rows to be lost, because a WHERE clause on the “multiple” column is applied only to the limited number of retrieved rows.

This page is licensed: CC BY-SA / Gnu FDL

CONNECT JSON Table Type

The CONNECT storage engine has been deprecated.

Overview

JSON (JavaScript Object Notation) is a lightweight data-interchange format widely used on the Internet. Many applications, generally written in JavaScript or PHP, use and produce JSON data, which is exchanged as files of various physical formats. JSON data is often returned from REST queries.

It is also possible to query, create or update such information in a database-like manner. MongoDB does it using a JavaScript-like language. PostgreSQL includes these facilities by using a specific data type and related functions like dynamic columns.

The CONNECT engine adds this facility to MariaDB by supporting tables based on JSON data files. As for XML tables, this is done by creating tables that describe what should be retrieved from the file and how it should be processed.

Starting with 1.07.0002, the internal way JSON is parsed and handled was changed. The main advantage of the new way is that it reduces the memory needed to parse JSON: this used to be 6 to 10 times the size of the JSON source and is now only 2 to 4 times. However, the new way is in Beta, and JSON tables are still handled using the old mode. To use the new mode, tables should be created with TABLE_TYPE=BSON. Another way is to set the session variable to 1 or ON; then all JSON tables are handled as BSON. This is temporary: once successfully tested, the new way will replace the old one and all tables will be created as JSON.

Let us start from the file “biblio3.json” that is the JSON equivalent of the XML Xsample file described in the XML table chapter:

This file contains the different items existing in JSON.

Arrays: They are enclosed in square brackets and contain a list of comma separated values.
Objects: They are enclosed in curly brackets. They contain a comma separated list of pairs, each pair composed of a key name between double quotes, followed by a ‘:’ character and followed by a value.
Values: Values can be an array or an object. They can also be a string between double quotes, an integer or float number, a Boolean value or a null value.

The simplest way for CONNECT to locate a table in such a file is by an array containing a list of objects (this is what MongoDB calls a collection of documents). Each array value is a table row, and each pair of the row objects represents a column, the key being the column name and the value the column value.

A first attempt to create a table on this file is to take the outer array as the table:

If we execute the query:

We get the result:

isbn

author

title

publisher

Note that by default, column values that are objects have been set to the concatenation of all the string values of the object separated by a blank. When a column value is an array, only the first item of the array is retrieved (This will change in later versions of Connect).
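These defaults can be modelled as follows. The outer array gives one row per object. An object value becomes the blank-separated concatenation of its string values, and an array value yields its first item. The sketch and its sample document are hypothetical. Rendering non-string scalars in JSON form is an assumption.

```python
import json
from typing import Any, Dict, List, Optional


def default_value(value: Any) -> Optional[str]:
    """Render a JSON value following the default rules described above."""
    if value is None:
        return None
    if isinstance(value, list):              # arrays: first item only
        return default_value(value[0]) if value else None
    if isinstance(value, dict):              # objects: their string values
        return " ".join(v for v in value.values() if isinstance(v, str))
    if isinstance(value, str):
        return value
    return json.dumps(value)                 # numbers, true/false


def json_table(text: str) -> List[Dict[str, Optional[str]]]:
    """Treat a top-level array of objects as a table: one row per object."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected an array of objects")
    return [{key: default_value(val) for key, val in obj.items()} for obj in data]


# Hypothetical document with the same shape as the book example.
DOC = ('[{"ISBN": "123", "AUTHOR": [{"FIRSTNAME": "Jean", "LASTNAME": "Dupont"},'
       ' {"FIRSTNAME": "Anne", "LASTNAME": "Martin"}],'
       ' "PUBLISHER": {"NAME": "Acme", "PLACE": "Paris"}}]')
print(json_table(DOC))
```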

However, things are generally more complicated. JSON files do not contain attributes (although object pairs are similar to attributes), but they contain a new item: arrays. We have seen that arrays can be used like XML multiple nodes, here to specify several authors. However, they are more general, because they can contain objects of different types, even though it may not be advisable to do so.

This is why CONNECT enables the specification of a column field_format option “JPATH” (FIELD_FORMAT until Connect 1.6) that describes exactly where the items to display are and how to handle arrays.

Here is an example of a new table that can be created on the same file. It chooses the column names, retrieves some sub-objects and specifies how to handle the author array.

Until Connect 1.5:

From Connect 1.6:

From Connect 1.07.0002

Given the query:

The result is:

title

author

publisher

location

Note: The JPATH was not specified for column ISBN because it defaults to the column name.

Here is another example showing that one can choose what to extract from the file and how to “expand” an array, meaning to generate one row for each array value:

Until Connect 1.5:

From Connect 1.6:

From Connect 1.06.006:

From Connect 1.07.0002

It is displayed as:

ISBN

Title

AuthorFN

AuthorLN

Year

Note: The example above shows that the ‘$.’, that means the beginning of the path, can be omitted.

The Jpath Specification

From Connect 1.6, the Jpath specification was changed to match that of the native JSON functions and to be more compatible with common usage. It is close to the standard definition and compatible with what MongoDB and other products do. The ‘:’ separator is replaced by ‘.’. Positions in arrays are accepted MongoDB-style, without square brackets. Array specifications specific to CONNECT are still accepted, but [*] is used for expanding and [x] for multiplying. Tables created with the previous syntax can still be used by adding SEP_CHAR=’:’ (this can be done with ALTER TABLE). The path can now also be specified as JPATH (formerly FIELD_FORMAT), but FIELD_FORMAT is still accepted.

Until Connect 1.5, it is the description of the path to follow to reach the required item. Each step is the key name (case sensitive) of the pair when crossing an object, and the number of the value between square brackets when crossing an array. Each specification is separated by a ‘:’ character.

From Connect 1.6, it is the description of the path to follow to reach the required item. Each step is the key name (case sensitive) of the pair when crossing an object, and the position number of the value when crossing an array. Key specifications are separated by a ‘.’ character.

For instance, in the above file, the last name of the second author of a book is reached by:

$.AUTHOR[1].LASTNAME (standard style)
$AUTHOR.1.LASTNAME (MongoDB style)
AUTHOR:[1]:LASTNAME (old style, when SEP_CHAR=’:’ or until Connect 1.5)

The ‘$’ or “$.” prefix specifies the root of the path and can be omitted with CONNECT.
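The three equivalent spellings above can all be reduced to the same list of steps: keys for objects and 0-based positions for arrays. The sketch below is hypothetical, not CONNECT's parser. It handles only plain positions. CONNECT's processing specifications such as [*] or [x] are deliberately rejected, and an all-digit key is read as a position.

```python
import re
from typing import Any, List, Union

Step = Union[str, int]


def parse_jpath(path: str, sep: str = ".") -> List[Step]:
    """Split a Jpath into keys (str) and 0-based array positions (int)."""
    if path.startswith("$."):
        path = path[2:]
    elif path.startswith("$"):
        path = path[1:]
    steps: List[Step] = []
    for part in path.split(sep):
        m = re.fullmatch(r"([^\[\]]*)((?:\[\d+\])*)", part)
        if m is None or not part:
            raise ValueError(f"unsupported step {part!r} in {path!r}")
        key, brackets = m.groups()
        if key.isdigit():
            steps.append(int(key))           # MongoDB-style position
        elif key:
            steps.append(key)                # object key (case sensitive)
        steps.extend(int(n) for n in re.findall(r"\d+", brackets))
    return steps


def follow(doc: Any, steps: List[Step]) -> Any:
    """Walk a parsed JSON value; return None when a step does not exist."""
    node = doc
    for step in steps:
        if isinstance(step, int):
            if not isinstance(node, list) or step >= len(node):
                return None
        elif not isinstance(node, dict) or step not in node:
            return None
        node = node[step]
    return node


# Hypothetical book object.
book = {"AUTHOR": [{"FIRSTNAME": "Jean", "LASTNAME": "Dupont"},
                   {"FIRSTNAME": "Anne", "LASTNAME": "Martin"}]}
for p, s in [("$.AUTHOR[1].LASTNAME", "."), ("$AUTHOR.1.LASTNAME", "."),
             ("AUTHOR:[1]:LASTNAME", ":")]:
    print(p, "->", follow(book, parse_jpath(p, s)))   # Martin each time
```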

The array specification can also indicate how it must be processed:

Specification

Array Type

Limit

Description

Note 1: When the LIMIT restriction is applicable, only the first m array items are used, m being the value of the LIMIT option (to be specified in option_list). The LIMIT default value is 10.

Note 2: An alternative way to indicate what is to be expanded is to use the expand option in the option list, for instance:

AUTHOR is here the key of the pair that has the array as a value (case sensitive). Expand is limited to only one branch (expanded arrays must be under the same object).

Let us take as an example the file expense.json. The table jexpall expands everything under and including the week array:

From Connect 1.07.0002

From Connect 1.6:

Until Connect 1.5:

WHO

WEEK

WHAT

AMOUNT
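Expanding everything under and including the week array amounts to a nested loop. It produces one row per innermost expense, with the person's name and the week repeated on each row. Because the content of expense.json is not reproduced here, the key names NUMBER and EXPENSE and all the data below are assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical data; the layout of expense.json is assumed, not quoted.
EXPENSES: List[Dict[str, Any]] = [
    {"WHO": "Joe", "WEEK": [
        {"NUMBER": 3, "EXPENSE": [{"WHAT": "Beer", "AMOUNT": 18.0},
                                  {"WHAT": "Food", "AMOUNT": 12.0}]},
        {"NUMBER": 4, "EXPENSE": [{"WHAT": "Beer", "AMOUNT": 19.0}]}]},
    {"WHO": "Beth", "WEEK": [
        {"NUMBER": 3, "EXPENSE": [{"WHAT": "Beer", "AMOUNT": 16.0}]}]},
]


def expand_all(people: List[Dict[str, Any]]) -> List[Tuple[str, int, str, float]]:
    """One (WHO, WEEK, WHAT, AMOUNT) row per expense, outer values repeated."""
    rows = []
    for person in people:
        for week in person.get("WEEK", []):          # first expanded array
            for expense in week.get("EXPENSE", []):  # nested expanded array
                rows.append((person["WHO"], week["NUMBER"],
                             expense["WHAT"], expense["AMOUNT"]))
    return rows


for row in expand_all(EXPENSES):
    print(row)
```

Both expanded arrays lie on one branch (WEEK, then EXPENSE inside it), which is consistent with the one-branch restriction on expansion noted above.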

The table jexpw shows what was bought and the sum and average of amounts for each person and week:

From Connect 1.07.0002:

From Connect 1.6:

Until Connect 1.5:

WHO | WEEK | WHAT | SUM | AVERAGE

Let us see what the table jexpz does:

From Connect 1.07.0002:

From Connect 1.6:

Until Connect 1.5:

WHO | WEEKS | SUMS | SUM | AVGS | SUMAVG | AVGSUM | AVERAGE

For all persons:

Column 1 shows the person name.
Column 2 shows the weeks for which values are calculated.
Column 3 lists the sums of expenses for each week.
Column 4 calculates the sum of all expenses by person.

It would be very difficult, if possible at all, to obtain this result from table jexpall using an SQL query.
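For one person, the first four jexpz columns described above amount to the following Python computation. The nested layout used here (a WEEK array whose items hold a NUMBER and an EXPENSE array of WHAT/AMOUNT pairs) and the figures are hypothetical, since the content of expense.json is not reproduced in this section; `person_summary` is likewise a made-up name.

```python
def person_summary(person):
    """Compute WHO, WEEKS, SUMS and SUM for one person (model of jexpz)."""
    weeks = person["WEEK"]
    week_numbers = [week["NUMBER"] for week in weeks]
    # One sum per week, then the sum of all expenses of the person.
    week_sums = [sum(e["AMOUNT"] for e in week["EXPENSE"]) for week in weeks]
    return {"WHO": person["WHO"], "WEEKS": week_numbers,
            "SUMS": week_sums, "SUM": sum(week_sums)}


joe = {"WHO": "Joe", "WEEK": [
    {"NUMBER": 3, "EXPENSE": [{"WHAT": "Beer", "AMOUNT": 18},
                              {"WHAT": "Food", "AMOUNT": 12}]},
    {"NUMBER": 4, "EXPENSE": [{"WHAT": "Beer", "AMOUNT": 19},
                              {"WHAT": "Car", "AMOUNT": 20}]},
]}
print(person_summary(joe))
# {'WHO': 'Joe', 'WEEKS': [3, 4], 'SUMS': [30, 39], 'SUM': 69}
```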

Handling of NULL Values

JSON has an explicit null value that can appear in arrays or as the value of an object key. When regarding JSON as a relational table, a column value can be null either explicitly, because the corresponding JSON item is null, or implicitly, because the corresponding item is missing from an array or object. CONNECT does not distinguish between explicit and implicit nulls.

However, it is possible to specify how nulls are handled and represented. This is done by setting the string session variable connect_json_null. The default value of connect_json_null is “”; it can be changed, for instance, by:

This changes its representation when a column displays the text of an object or the concatenation of the values of an array.

It is also possible to tell CONNECT to ignore nulls by:

When doing so, nulls do not appear in object text or array lists. However, this changes neither the behavior of array calculations nor the result of array counts.
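The two display options can be modeled as follows when the values of an array are concatenated with a blank separator. In this Python sketch, `array_text` is a hypothetical helper and the representation string is passed explicitly rather than read from connect_json_null; it shows only the display side, not array calculations.

```python
import json


def array_text(values, null_repr, ignore_nulls=False):
    """Concatenate array values with blanks, rendering or skipping nulls."""
    parts = []
    for value in values:
        if value is None:
            if ignore_nulls:
                continue  # nulls vanish from the displayed list
            parts.append(null_repr)
        else:
            parts.append(value if isinstance(value, str) else json.dumps(value))
    return " ".join(parts)


print(array_text([1, None, 3], "NULL"))                     # 1 NULL 3
print(array_text([1, None, 3], "NULL", ignore_nulls=True))  # 1 3
```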

Having Columns defined by Discovery

It is possible to let the MariaDB discovery process do the job of column specification. When columns are not defined in the create table statement, CONNECT endeavors to analyze the JSON file and to provide the column specifications. This is possible only for tables represented by an array of objects, because CONNECT retrieves the column names from the object pair keys and their definitions from the object pair values. For instance, the jsample table could be created with:

Let’s check how it was actually specified using the show create table statement:

It is equivalent except for the column sizes, which have been calculated from the file as the maximum length of the corresponding column when it holds a simple value. Columns that are JSON arrays or objects are specified as varchar strings of length 256, supposedly big enough to contain the sub-object's concatenated values. Nullable is set to true if the column is null or missing in some rows, or if its JPATH contains arrays.
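The sizing and nullability rules above can be sketched in Python as follows. This is a simplified model, not CONNECT's discovery code: it assumes the rows are already parsed into dictionaries, ignores type inference and JPATH generation, and uses a hypothetical `discover` helper and sample data.

```python
import json

JSON_COLUMN_SIZE = 256  # size given to columns holding arrays or objects


def discover(rows):
    """Return {column: {"size": n, "nullable": bool}} for an array of objects."""
    columns = {}
    for row in rows:
        for key in row:
            columns.setdefault(key, {"size": 0, "nullable": False})
    for key, col in columns.items():
        holds_json = False
        for row in rows:
            value = row.get(key)
            if value is None:
                col["nullable"] = True  # null or missing in this row
            elif isinstance(value, (list, dict)):
                holds_json = True
                if isinstance(value, list):
                    col["nullable"] = True  # the path contains an array
            else:
                # Simple values: keep the maximum text length seen.
                text = value if isinstance(value, str) else json.dumps(value)
                col["size"] = max(col["size"], len(text))
        if holds_json:
            col["size"] = JSON_COLUMN_SIZE
    return columns


rows = [{"ISBN": "9782212090819", "TITLE": "XML en Action",
         "AUTHOR": [{"LASTNAME": "Knab"}], "YEAR": 1999},
        {"ISBN": "0123", "TITLE": "Short", "AUTHOR": [{"LASTNAME": "Doe"}]}]
print(discover(rows))
```

Here YEAR comes out nullable because it is missing from the second row, and AUTHOR because it is an array.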

If a more complex definition is desired, you can ask CONNECT to analyze the JPATH up to a given depth using the DEPTH or LEVEL option in the option list. Its default value is 0 but can be changed by setting the session variable (in future versions the default will be 5). The depth value is the number of sub-objects that are taken in the JPATH (this is different from what is defined and returned by the native function).

For instance:

This will define the table as:

From Connect 1.07.0002

From Connect 1.6:

Until Connect 1.5:

For columns that are a simple value, the Json path is the column name. This is the default when the Jpath option is not specified, so it was not specified for such columns. However, you can force discovery to specify it by setting the connect_all_path variable to 1 or ON. This can be useful if you plan to change the name of such columns, and it relieves you of manually specifying the path (otherwise the path would default to the new name, causing the column not to be found or to be found wrongly).

Another problem is that CONNECT cannot guess what you want to do with arrays. Here the AUTHOR array is set to 0, which means that only its first value is retrieved unless you also specify “Expand=AUTHOR” in the option list. Of course, you can replace it with anything else.

This method can be used as a quick way to make a “template” table definition that can later be edited into the desired definition. In particular, column names are constructed from all the object keys of their path so that they are distinct. These names can be manually edited, provided the JPATH key names are not modified.

DEPTH can also be given the value -1 to create only columns that are simple values (no arrays or objects). It normally defaults to 0, but this can be modified by setting the variable.

Note: Since version 1.6.4, CONNECT eliminates columns that are “void” or whose type cannot be determined. For instance, given the file sresto.json:

Previously, when the table was created using discovery with:

The table was previously created as:

The column “grades” was added because of the void array in line 2. Now this column is skipped and does not appear anymore (unless the option Accept=1 is added in the option list).

JSON Catalogue Tables

Another way to see JSON table column specifications is to use a catalogue table. For instance:

which returns:

From Connect 1.07.0002:

column_name | type | size | jpath

From Connect 1.6:

column_name | type | size | jpath

Until Connect 1.5:

column_name | type | size | jpath

All this is mostly useful when creating a table on a remote file that you cannot easily see.

Finding the table within a JSON file

Given the file “facebook.json”:

The table we want to analyze is represented by the array value of the “data” object. Here is how this is specified in the create table statement:

From Connect 1.07.0002:

From Connect 1.6:

Until Connect 1.5:

Here, the object option gives the Jpath of the table. Note also the alternate way of declaring the array to be expanded, using the expand option of the option_list.

Because some string values contain a date representation, the corresponding columns are declared as datetime and the date format is specified for them.

The Jpath of the object option has the same syntax as the column Jpath but of course all array steps must be specified using the [n] (until Connect 1.5) or n (from Connect 1.6) format.

Note: This applies to the whole document for tables having PRETTY=2 (see below). Otherwise, it applies to the document object of each file record.
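The role of the object option can be modeled by first walking to the table inside the document and then treating what is found there as the rows. In the Python sketch below, `locate_table` and the sample document are hypothetical (only the “data” key comes from the example above). Keys and bare positions are separated by ‘.’ as in the Connect 1.6 syntax, and a single object found at the end of the path is wrapped in a one-value array, the way CONNECT handles single-row tables.

```python
def locate_table(doc, object_path):
    """Follow object_path (keys and positions separated by '.') to the table."""
    node = doc
    for step in object_path.split("."):
        node = node[int(step)] if isinstance(node, list) else node[step]
    # A single row is handled as a one-value array.
    return node if isinstance(node, list) else [node]


doc = {"data": [{"id": "1", "message": "first"},
                {"id": "2", "message": "second"}],
       "paging": {"next": "page-2"}}
print(len(locate_table(doc, "data")))  # 2
print(locate_table(doc, "data.1"))     # [{'id': '2', 'message': 'second'}]
```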

JSON File Formats

The examples we have seen so far are files that, even though they can be formatted in different ways (blanks, tabs, carriage returns and line feeds are ignored when parsing them), respect the JSON syntax and are made of only one item (object or array). As for XML files, they are entirely parsed, and a memory representation is built and used to process them. This implies that they must be of reasonable size to avoid an out-of-memory condition. Tables based on such files are recognized by the option Pretty=2, which we did not specify above because it is the default.

An alternate format, which is the format of exported MongoDB files, is a file where each row is physically stored in one file record. For instance:

The original file, “cities.json”, has 29352 records. To base a table on this file we must specify the option Pretty=0 in the option list. For instance:

From Connect 1.07.0002:

From Connect 1.6:

Until Connect 1.5:

Note the use of [n] (until Connect 1.5) or n (from Connect 1.6) array specifications for the longitude and latitude columns.

When using this format, the table is processed by CONNECT like a DOS, CSV or FMT table. Rows are retrieved and parsed record by record, and the table can be very large. Another advantage is that such a table can be indexed, which can be of great value for very large tables. The “distrib” option of the “state” column tells CONNECT to use block indexing when possible.

For such tables, as well as for pretty=1 ones, the record size must be specified using the LRECL option. Be sure not to specify it too small, as it is used to allocate the read/write buffers and the memory used for parsing the rows. If in doubt, be generous, as it does not cost much in memory allocation.

Another format exists, noted by Pretty=1, which is similar to this one but has some additions to represent a JSON array. A header record and a trailer record are added, containing the opening and closing square brackets, and all records but the last are followed by a comma. It has the same advantages for reading and updating, but inserting and deleting are executed in the pretty=2 way.
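Record-oriented reading of Pretty=0 and Pretty=1 files can be sketched in Python as follows. `read_records` is a hypothetical helper and the sample lines are made up; it models only the record layout described above (one row per record, plus the bracket header and trailer and the trailing commas of Pretty=1), not LRECL buffering or updates.

```python
import json


def read_records(lines, pretty):
    """Parse one JSON row per record of a pretty=0 or pretty=1 file."""
    rows = []
    for line in lines:
        record = line.strip()
        if pretty == 1:
            if record in ("[", "]"):
                continue  # header and trailer records
            record = record.rstrip(",")  # all records but the last end with ','
        if record:
            rows.append(json.loads(record))
    return rows


pretty0 = ['{"city": "A", "pop": 10}', '{"city": "B", "pop": 20}']
pretty1 = ["[", '{"city": "A", "pop": 10},', '{"city": "B", "pop": 20}', "]"]
print(read_records(pretty0, 0) == read_records(pretty1, 1))  # True
```

Because each record is parsed on its own, memory use depends on the record size rather than on the file size.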

Alternate Table Arrangement

We have seen that the most natural way to represent a table in a JSON file is to base it on an array of objects. However, other possibilities exist. A table can be an array of arrays, a one-column table can be an array of values, or a one-row table can be just one object or one value. Single-row tables are internally handled by wrapping them in a one-value array.

Let us see how to handle, for instance, a table that is an array of arrays. The file:

A table can be created on this file as:

From Connect 1.07.0002:

From Connect 1.6:

Until Connect 1.5:

Columns are specified by their position in the row arrays. By default, this is zero-based but for this table the base was set to 1 by the Base option of the option list. Another new option in the option list is Jmode=1. It indicates what type of table this is. The Jmode values are:

0: An array of objects. This is the default.
1: An array of arrays, like this one.
2: An array of values.

When reading, this is not required as the type of the array items is specified for the columns; however, it is required when inserting new rows so CONNECT knows what to insert. For instance:

After this, it is displayed as:

Unspecified array values are represented by their first element.
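What the Base and Jmode options control can be sketched in Python. The helpers `column_value` and `make_row` and the sample row are hypothetical. `make_row` shows why the table type matters when inserting: the same column values must become an object, an array or a bare value depending on Jmode (numbered here 0, 1 and 2 in the order of the list above).

```python
def column_value(row, position, base=0):
    """Return the item at a column position of an array row, or None."""
    index = position - base  # Base=1 makes the first item position 1
    return row[index] if 0 <= index < len(row) else None


def make_row(values, jmode, keys=None):
    """Build the JSON item inserted for one new row."""
    if jmode == 0:
        return dict(zip(keys, values))  # array of objects (the default)
    if jmode == 1:
        return list(values)             # array of arrays
    if jmode == 2:
        (value,) = values               # array of values: a single column
        return value
    raise ValueError(f"unknown Jmode {jmode}")


row = [56, "text", 500.0]
print(column_value(row, 1, base=1))  # 56
print(column_value(row, 1))          # text
```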

Getting and Setting JSON Representation of a Column

We have seen that columns corresponding to a Json object or array are retrieved by default as the concatenation of all their values separated by a blank. It is also possible to retrieve and display the contents of such columns as the full JSON string corresponding to them in the JSON file. This is specified in the JPATH by a “*” where the object or array would be specified.

Note: When columns are generated by discovery, this can be specified by setting the STRINGIFY option to ON or 1 in the option list.

For instance:

From Connect 1.07.0002:

From Connect 1.6:

Until Connect 1.5:

Now the query:

will return and display:

json_Author

Note: Prefixing the column name with json_ is optional but is useful when using the column as an argument to CONNECT UDF functions, as it ensures the column is recognized as valid JSON without aliasing.

This also works on input: a column specified this way can be directly set to a valid JSON string.
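The difference between the default retrieval and the “*” notation can be modeled in Python: by default the simple values are joined with a blank, while “*” keeps the JSON text. The helper names are hypothetical, and the compact JSON formatting chosen here is an assumption about how the text is rendered.

```python
import json


def leaf_values(item):
    """Yield the simple values of a JSON item, depth first."""
    if isinstance(item, dict):
        for value in item.values():
            yield from leaf_values(value)
    elif isinstance(item, list):
        for value in item:
            yield from leaf_values(value)
    else:
        yield item


def column_text(item, stringify=False):
    """Render a column holding an array or object."""
    if stringify:  # the "*" case: the JSON text itself
        return json.dumps(item, ensure_ascii=False, separators=(",", ":"))
    return " ".join(v if isinstance(v, str) else json.dumps(v)
                    for v in leaf_values(item))


authors = [{"FIRSTNAME": "Jane", "LASTNAME": "Doe"},
           {"FIRSTNAME": "François", "LASTNAME": "Knab"}]
print(column_text(authors))  # Jane Doe François Knab
print(column_text(authors, stringify=True))
```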

This feature is of great value as we will see below.

Create, Read, Update and Delete Operations on JSON Tables

The SQL commands INSERT, UPDATE and DELETE are fully supported for JSON tables except those returned by REST queries. For INSERT and UPDATE, if the target values are simple values, there are no problems.

However, there are some issues when the added or modified values are objects or arrays.

Concerning objects, the same problems exist that we have already seen with the XML type. The added or modified object will have the format described in the table definition, which can be different from the one of the JSON file. Modifications should be done using a file specifying the full path of modified objects.

New problems are raised when trying to modify the values of an array. Only updates can be done on the original table. First of all, for the values of the array to be distinct values, all update operations concerning array values must be done using a table expanding this array.

For instance, to modify the authors of the biblio.json based table, the jsampex table must be used. Doing so, updating and deleting authors is possible using standard SQL commands. For example, to change the first name of Knab from François to John:

However, it would be wrong to do:

Because this would change the first name of both authors as they share the same ISBN.

Where things become more difficult is when trying to delete or insert an author of a book. Indeed, a delete command will delete the whole book and an insert command will add a new complete row instead of adding a new author in the same array. Here we are penalized by the SQL language that cannot give us a way to specify this. Something like:

However, this does not exist in SQL. Does this mean that it is impossible to do it? No, but it requires us to use a table specified on the same file but adapted to this task. One way to do it is to specify a table for which the authors are no longer an expanded array. Suppose we want to add an author to the “XML en Action” book. We will do it on a table containing just the author(s) of that book, which is the second book of the table.

From Connect 1.6:

Until Connect 1.5:

The command:

replies:

FIRSTNAME | LASTNAME

It is a standard JSON table that is an array of objects in which we can freely insert or delete rows.

We can check that this was done correctly by:

This will display:

ISBN | Title | AuthorFN | AuthorLN | Year

Note: If this table were a big table with many books, it would be difficult to know what the order of a specific book is in the table. This can be found by adding a special ROWID column in the table.

However, an alternate way to do it is by using direct JSON column representation as in the JSAMPLE2 table. This can be done by:

Here, we didn't have to find the index of the sub array to modify. However, this is not quite satisfying because we had to manually write the whole JSON value to set to the json_Author column.

Therefore, we need specific functions to do so. They are introduced in the next section.

JSON User Defined Functions

Although such functions written by other parties do exist, CONNECT provides its own UDFs that are specifically adapted to the JSON table type and easily available because, being inside the CONNECT library or DLL, they require no additional module to be loaded (see to make these functions in a separate library module).

Here is the list of the CONNECT functions; more can be added if required.

Name | Type | Return | Description | Added

String values are mapped to JSON strings. These strings are automatically escaped to conform to the JSON syntax. The automatic escaping is bypassed when the value has an alias beginning with ‘json_’. This is automatically the case when a JSON UDF argument is another JSON UDF whose name begins with “json_” (not case sensitive). This is why all functions that do not return a Json item are not prefixed by “json_”.

Argument string values, for some functions, can alternatively be json file names. When this is ambiguous, alias them as jfile_. A full path should be used because UDF functions have no means of knowing what the current database is. Apparently, when the file name path is not full, it is based on the MariaDB data directory, but I am not sure this is always true.

Numeric values are (big) integers, double floating point values or decimal values. Decimal values are character strings containing a numeric representation and are treated as strings. Floating point values contain a decimal point and/or an exponent. Integers are written without decimal points.

To install these functions, execute the following commands:

Note

Json function names are often written on this page with leading upper case letters for clarity. It is possible to do so in SQL queries because function names are case insensitive. However, when creating or dropping them, their names must match the case they are in the library module, which is in lower case.

On Unix systems (from Connect 1.7.02):

On Unix systems (from Connect 1.6):

On Unix systems (until Connect 1.5):

On Windows (from Connect 1.7.02):

On Windows (from Connect 1.6):

On Windows (until Connect 1.5):

Jfile_Bjson

MariaDB starting with

JFile_Bjson was introduced in MariaDB.

Converts the pretty=0 json file given as first argument to a Bjson file. B(inary)json is a pre-parsed json format. It is described below in the Performance chapter (available in next Connect versions).

Jfile_Convert

MariaDB starting with

JFile_Convert was introduced in MariaDB.

Converts the first argument json file to another pretty=0 json file. The third integer argument is the record length to use. This is often required to process huge json files that would be very slow if they were in pretty=2 format.

This is done without completely parsing the file, is very fast and requires no big memory.

Jfile_Make

Jfile_Make was added in CONNECT 1.4

The first argument must be a json item (if it is just a string, Jfile_Make will try its best to see if it is a json item or an input file name). The following arguments are a string file name and an integer pretty value (defaulting to 2) in any order. This function creates a json file containing the first argument item.

The returned string value is the created file name. If not specified as an argument, the file name can in some cases be retrieved from the first argument; in such cases the file itself is modified.

This function can be used to create or format a json file. For instance, supposing we want to format the file tb.json, this can be done with the query:

The tb.json file is changed to:

Json_Array_Add

Note: The following describes this function for CONNECT version 1.4 only. The first argument must be a JSON array. The second argument is added as member of this array:

Array

Note: The first array is not escaped because its (alias) name begins with ‘json_’.

Now we can see how adding an author to the JSAMPLE2 table can alternatively be done:

Note: Giving a column that returns JSON a name prefixed by json_ (like json_author here) is good practice and removes the need to give it an alias to prevent escaping when it is used as an argument.

Additional arguments: If a third integer argument is given, it specifies the position (zero based) of the added value:

Array

If a string argument is added, it specifies the Json path to the array to be modified. For instance:

Json_Array_Add('{"a":1,"b":2,"c":[3, 4]}' json_, 5, 1, 'c')
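The behavior just described can be modeled on Python lists and dictionaries. `array_add` is a hypothetical stand-in, not the UDF: position and path are keyword arguments here, whereas the UDF distinguishes them by type, the path is a plain dot-separated list of keys or positions, and appending when no position is given is an assumption.

```python
import copy


def array_add(doc, value, position=None, path=None):
    """Return a copy of doc with value added to the array reached by path."""
    result = copy.deepcopy(doc)
    target = result
    if path:
        for step in path.split("."):
            target = target[int(step)] if isinstance(target, list) else target[step]
    if position is None:
        target.append(value)            # assumed: no position means append
    else:
        target.insert(position, value)  # zero-based position
    return result


print(array_add({"a": 1, "b": 2, "c": [3, 4]}, 5, position=1, path="c"))
# {'a': 1, 'b': 2, 'c': [3, 5, 4]}
```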

Json_Array_Add_Values

Json_Array_Add_Values added in CONNECT 1.4 replaces the function Json_Array_Add of CONNECT version 1.3.

The first argument must be a JSON array string. Then all other arguments are added as members of this array:

Array

Json_Array_Delete

The first argument should be a JSON array. The second argument is an integer indicating the rank (0 based conforming to general json usage) of the element to delete:

Array

Now we can see how to delete the second author from the JSAMPLE2 table:

A Json path can be specified as a third string argument.

Json_Array_Grp

This is an aggregate function that makes an array filled from values coming from the rows retrieved by a query. Let us suppose we have the pet table:

name | race | number

The query:

will return:

name

One problem with the JSON aggregate functions is that they construct their result in memory and cannot know the needed amount of storage, not knowing the number of rows of the used table.

Therefore, the number of values for each group is limited. This limit is the value of JsonGrpSize whose default value is 10 but can be set using the JsonSet_Grp_Size function. Nevertheless, working on a larger table is possible, but only after setting JsonGrpSize to the ceiling of the number of rows per group for the table. Try not to set it to a very large value to avoid memory exhaustion.
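The grouping, and the group size needed to stay within the JsonGrpSize limit, can be modeled in Python. `array_grp` and `required_grp_size` are hypothetical helpers and the pet rows are made up; the sketch does not reproduce what CONNECT does when the limit is exceeded.

```python
from collections import Counter, defaultdict


def array_grp(rows, key, value):
    """Collect row[value] into one array per distinct row[key]."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return dict(groups)


def required_grp_size(rows, key):
    """Largest number of rows in any group: a safe JsonGrpSize value."""
    return max(Counter(row[key] for row in rows).values(), default=0)


pets = [{"name": n, "race": r, "number": k} for n, r, k in [
    ("John", "dog", 2), ("Mary", "dog", 1), ("Mary", "cat", 1),
    ("Kevin", "cat", 2), ("Kevin", "bird", 6)]]
print(array_grp(pets, "name", "race"))
# {'John': ['dog'], 'Mary': ['dog', 'cat'], 'Kevin': ['cat', 'bird']}
print(required_grp_size(pets, "name"))  # 2
```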

JsonContains

This function can be used to check whether an item is contained in a document. Its arguments are the same as those of the JsonLocate function; only the return value changes. The integer returned value is 1 if the item is contained in the document, or 0 otherwise.

JsonContains_Path

This function can be used to check whether a Json path is contained in the document. The integer returned value is 1 if the path is contained in the document, or 0 otherwise.

Json_File

The first argument must be a file name. This function returns the text of the file that is supposed to be a json file. If only one argument is specified, the file text is returned without being parsed. Up to two additional arguments can be specified:

A string argument is the path to the sub-item to be returned. An integer argument specifies the pretty format value of the file.

This function is chiefly used to get the json item argument of other json functions from a json file. For instance, supposing the file tb.json is:

Extracting a value from it can be done with a query such as:

This query returns:

Type

However, we’ll see that, most of the time, it is better to use Jbin_File or to directly specify the file name in queries. In particular this function should not be used for queries that must modify the json item because, even if the modified json is returned, the file itself would be unchanged.

Json_Get_Item

Json_Get_Item was added in CONNECT 1.4.

This function returns a subset of the json document passed as first argument. The second argument is the json path of the item to be returned and should be one returning a json item (terminated by a ‘*’). If not, the function will try to make it right but this is not foolproof. For instance:

The correct path should have been ‘second.*’, but in this simple case the function was able to make it right. The returned item:

item

Note: The array is aliased “json_second” to indicate it is a json item and avoid escaping it. However, the “json_” prefix is skipped when making the object and must not be added to the path.

JsonGet_Grp_Size

This function returns the JsonGrpSize value.

JsonGet_String / JsonGet_Int / JsonGet_Real

JsonGet_String, JsonGet_Int and JsonGet_Real were added in CONNECT 1.4.

The first argument should be a JSON item. If it is a string with no alias, it is converted to a json item. The second argument is the path of the item to be located in the first argument and returned, converted if necessary according to the function used:

This query returns:

String | Int | Real

The function JsonGet_Real can be given a third argument to specify the number of decimal digits of the returned value. For instance:

This query returns:

String

The given path can specify all operators for arrays except the “expand” [*] operator. For instance:

The result:

Rank | Number | Concat | Sum | Avg

Json_Item_Merge

This function merges two arrays or two objects. For arrays, this is done by adding to the first array all the values of the second array. For instance:

The function returns:

Result

For objects, the pairs of the second object are added to the first object if the key does not yet exist in it; otherwise the pair of the first object is set with the value of the matching pair of the second object. For instance:

The function returns:

Result
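Both merge rules can be written down as a short Python model. `item_merge` is a hypothetical name, and raising an error for mixed argument types is an assumption of the sketch, not documented CONNECT behavior.

```python
def item_merge(first, second):
    """Merge two arrays or two objects as described above (model only)."""
    if isinstance(first, list) and isinstance(second, list):
        return first + second  # values of the second array are appended
    if isinstance(first, dict) and isinstance(second, dict):
        merged = dict(first)
        merged.update(second)  # new keys are added, existing keys get new values
        return merged
    raise TypeError("both arguments must be arrays or both objects")


print(item_merge([1, 2], [3]))                          # [1, 2, 3]
print(item_merge({"a": 1, "b": 2}, {"b": 3, "c": 4}))   # {'a': 1, 'b': 3, 'c': 4}
```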

JsonLocate

The first argument must be a JSON tree. The second argument is the item to be located. The item to be located can be a constant or a json item. Constant values must be equal in type and value to be found. This is "shallow equality": a string, an integer and a double do not match one another, even if they represent the same value.

This function returns the json path to the located item or null if it is not found:

This query returns:

Path

The path syntax is the same used in JSON CONNECT tables.

By default, the path of the first occurrence of the item is returned. The third parameter can be used to specify the occurrence whose path is to be returned. For instance:

first | second | wrong type | json

For string items, the comparison is case sensitive by default. However, it is possible to specify a string to be compared case insensitively by giving it an alias beginning with “ci”:

Path

Json_Locate_All

The first argument must be a JSON item. The second argument is the item to be located. This function returns the paths to all locations of the item as an array of strings:

This query returns:

All paths

Other functions can be applied to the returned array. For instance, to get the number of occurrences of an item in a json tree, you can do:

The displayed result:

Nb of occurs

If specified, the third integer argument sets the depth to search in the document, that is, the maximum number of items in the paths. This value defaults to 10 but can be increased for complex documents or reduced to limit the depth of the returned paths.
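The search performed by Json_Locate_All, with its type-and-value comparison and its depth limit, can be modeled in Python. The name `locate_all` and the “$.key[n]” notation used for the returned paths are assumptions of this sketch. The first element of the result corresponds to what JsonLocate returns by default.

```python
def locate_all(doc, item, max_depth=10):
    """Return the paths of all occurrences of item, at most max_depth steps deep."""
    found = []

    def same(a, b):
        # Equal in type and value: 1, 1.0 and "1" never match each other.
        return type(a) is type(b) and a == b

    def walk(node, path, depth):
        if same(node, item):
            found.append(path)
            return
        if depth >= max_depth:
            return
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}", depth + 1)
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]", depth + 1)

    walk(doc, "$", 0)
    return found


doc = {"a": 1, "b": [1, "1", 1.0, {"c": 1}]}
print(locate_all(doc, 1))               # ['$.a', '$.b[0]', '$.b[3].c']
print(locate_all(doc, 1, max_depth=2))  # ['$.a', '$.b[0]']
print(len(locate_all(doc, "1")))        # 1
```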

Json_Make_Array

Json_Make_Array returns a string denoting a JSON array with all its arguments as members:

Json_Make_Array(56, 3.1416, 'My name is "Foo"', NULL)

Note: The argument list can be void. If so, a void array is returned.

Json_Make_Object

Json_Make_Object returns a string denoting a JSON object. For instance:

The object is filled with pairs corresponding to the given arguments. The key of each pair is made from the argument (default or specified) alias.

Json_Make_Object(56, 3.1416, 'machin', NULL)

When needed, it is possible to specify the keys by giving an alias to the arguments:

Json_Make_Object(56 qty,3.1416 price,'machin' truc, NULL garanty)

If the alias is prefixed by ‘json_’ (to prevent escaping), that prefix is stripped from the key name.

This function is chiefly useful when entering values retrieved from a table, the key being by default the column name:

Json_Make_Object(matricule, nom, titre, salaire)

Json_Object_Add

The first argument must be a JSON object. The second argument is added as a pair to this object:

newobj

Note: If the specified key already exists in the object, its value is replaced by the new one.

The third string argument is a Json path to the target object.

Json_Object_Delete

The first argument must be a JSON object. The second argument is the key of the pair to delete:

newobj

The third string argument is a Json path to the object to be the target of deletion.

Json_Object_Grp

This function works like Json_Array_Grp. It makes a JSON object filled with value pairs whose keys are passed from its first argument and values are passed from its second argument.

This can be seen with the query:

This query returns:

name | json_object_grp(number,race)

Json_Object_Key

Returns a string denoting a JSON object. For instance:

The object is filled with pairs made from each key/value arguments.

Json_Object_Key('qty', 56, 'price', 3.1416, 'truc', 'machin', 'garanty', NULL)

Json_Object_List

The first argument must be a JSON object. This function returns an array containing the list of all keys existing in the object:

Key List

Json_Object_Nonull

This function works like but “null” arguments are ignored and not inserted in the object. Arguments are regarded as “null” if they are JSON null values, void arrays or objects, or arrays or objects containing only null members.

It is mainly used to avoid constructing useless null items when converting tables (see later).

Json_Object_Values

The first argument must be a JSON object. This function returns an array containing the list of all values existing in the object:

Value List

JsonSet_Grp_Size

This function is used to set the JsonGrpSize value. This value is used by the following aggregate functions as a ceiling value of the number of items in each group. It returns the JsonGrpSize value that can be its default value when passed 0 as argument.

Json_Set_Item / Json_Insert_Item / Json_Update_Item

These functions insert or update data in a JSON document and return the result. The value/path pairs are evaluated left to right. The document produced by evaluating one pair becomes the new value against which the next pair is evaluated.

Json_Set_Item replaces existing values and adds non-existing values.
Json_Insert_Item inserts values without replacing existing values.
Json_Update_Item replaces only existing values.

Example:

This query returns:

Set | Insert | Update
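The three update policies can be summarized in a Python model. It is deliberately simplified: the helper names are hypothetical, paths are plain dot-separated object keys (no arrays), the value/path pairs are passed as tuples, and a missing parent object raises KeyError instead of being created.

```python
import copy


def _modify(doc, pairs, mode):
    """Apply (value, path) pairs left to right under one update policy."""
    result = copy.deepcopy(doc)
    for value, path in pairs:
        *parents, last = path.split(".")
        target = result
        for key in parents:
            target = target[key]
        exists = last in target
        if (mode == "set"
                or (mode == "insert" and not exists)
                or (mode == "update" and exists)):
            target[last] = value
    return result


def set_item(doc, *pairs):     # replace existing values, add new ones
    return _modify(doc, pairs, "set")


def insert_item(doc, *pairs):  # add new values only
    return _modify(doc, pairs, "insert")


def update_item(doc, *pairs):  # replace existing values only
    return _modify(doc, pairs, "update")


doc = {"a": 1}
print(set_item(doc, (9, "a"), (2, "c")))     # {'a': 9, 'c': 2}
print(insert_item(doc, (9, "a"), (2, "c")))  # {'a': 1, 'c': 2}
print(update_item(doc, (9, "a"), (2, "c")))  # {'a': 9}
```

Because each pair is applied to the result of the previous one, `insert_item(doc, (2, "c"), (3, "c"))` keeps the value 2: the second pair already finds the key "c".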

JsonValue

Returns a JSON value as a string, for instance:

JsonValue(3.1416)

The “JBIN” return type

Almost all functions returning a json string - whose name begins with Json_ - have a counterpart with a name beginning with Jbin_. This is both for performance (speed and memory) as well as for better control of what the functions should do.

This is due to the way CONNECT UDFs work internally. The Json functions, when receiving json strings as parameters, parse them and construct a binary tree in memory. They work on this tree and, before returning, serialize it to return a new json string.

If the json document is large, this can take up a large amount of time and storage space. It is all right when one simple json function is called – it must be done anyway – but is a waste of time and memory when json functions are used as parameters to other json functions.

To avoid multiple serializing and parsing, the Jbin functions should be used as parameters to other functions. Indeed, they do not serialize the memory document tree, but return a structure allowing the receiving function to have direct access to the memory tree. This saves the serialize-parse steps otherwise needed to pass the argument and removes the need to reallocate the memory of the binary tree, which by the way is 6 to 7 times the size of the json string. For instance:

This query returns:

Result

Here the binary json tree allocated by Jbin_Array is completed by Jbin_Array_Add and Json_Object and serialized only once to make the final result string. It would be serialized and parsed two more times if using “Json” functions.

Note that Jbin results are recognized as such because they are aliased beginning with “Jbin_”. This is why in the Json_Object function the alias is specified as “Jbin_foo”.

What happens if it is not recognized as such? These functions are declared as returning a string and to take care of this, the returned structure begins with a zero-terminated string. For instance:

This query replies:

Jbin_Array('a','b','c')

Note: When testing, the tree returned by a “Jbin” function can be seen using the Json_Serialize function whose unique parameter must be a “Jbin” result. For instance:

This query returns:

Json_Serialize(Jbin_Array('a','b','c'))

Note: For this simple example, this is equivalent to using the Json_Array function.

Using a file as json UDF first argument

We have seen that many json UDFs can have an additional argument not yet described. This applies when the json item argument refers to a file: the additional integer argument is then the pretty value of the json file. It matters only when the first argument is just a file name (to make the UDF understand this argument is a file name, it should be aliased with a name beginning with jfile_) or if the function modifies the file, in which case the file is rewritten with this pretty format.

The json item is created by extracting the required part from the file. This can be the whole file but more often only some of it. There are two ways to specify the sub-item of the file to be used:

Specifying it in the Json_File or Jbin_File arguments.
Specifying it in the receiving function (not possible for all functions).

It doesn’t make any difference when the Jbin_File is used but it does with Json_File. For instance:

The second query returns:

Json_Array_Add(Json_File('test.json', 'b'), 66)

It just returns the modified subset returned by the Json_File function, while the query:

returns what was received from Json_File with the modification made on the subset.

Json_Array_Add(Json_File('test.json'), 66, 'b')

Note that in both cases the test.json file is not modified. This is because the Json_File function returns a string representing all or part of the file text but no information about the file name. This is fine for checking what the effect of the modification on the file would be.

However, to have the file modified, use the Jbin_File function or directly give the file name. Jbin_File returns a structure containing the file name, a pointer to the file parsed tree and possibly a pointer to the subset when a path is given as a second argument:

This query returns:

Json_Array_Add(Jbin_File('test.json', 'b'), 66)

This time the file is modified. This can be checked with:

Json_File('test.json', 3)

The first argument is returned by such a query because of tables such as:

In this table, the jfile_cols column just contains a file name. If we update it by:

This is the test.json file that must be modified, not the jfile_cols column. This can be checked by:

JsonGet_String(jfile_cols, '[1]:*')
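
The following sketches such a table. The table name tb, the key column n and the added value 77 are hypothetical. It assumes an existing test.json file whose "b" member is an array:

```sql
-- Hypothetical table whose second column stores a file name, not JSON text;
-- the jfile_ prefix tells the JSON UDFs to treat its value as a file name
CREATE TABLE tb (
  n INT PRIMARY KEY,
  jfile_cols CHAR(10) NOT NULL
);
INSERT INTO tb VALUES (1, 'test.json');

-- Json_Array_Add rewrites test.json and returns its first argument,
-- so the column keeps the value 'test.json'
UPDATE tb SET jfile_cols = Json_Array_Add(jfile_cols, 77, 'b') WHERE n = 1;

-- Check the file content through the column value (path as given in the text)
SELECT JsonGet_String(jfile_cols, '[1]:*') FROM tb;
```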

Note: Giving such a column a name beginning with “jfile_” is an important facility. It tells the JSON functions that its value is a file name, without requiring an alias in the queries.

Using “Jbin” to control what the query execution does

This applies in particular when acting on JSON files. We have seen that a file is not modified when the Json_File function is used as an argument to a modifying function, because the modifying function just receives a copy of the JSON file. This is not true with the Jbin_File function, which does not serialize the binary document and makes it directly accessible. Also, as we have seen earlier, JSON functions that modify their first file parameter modify the file and return the file name. They do this by directly serializing the internal binary document to the file.

However, the “Jbin” counterparts of these functions do not serialize the binary document and thus do not modify the JSON file. For example, let us compare these two queries:
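
Here is a sketch of the two queries. It assumes bt2.json holds a JSON object and adds a hypothetical key d with value 4 (the key is taken from the alias). The object-building UDF is written Json_Make_Object, the name assumed here for the CONNECT function the text calls Json_Object, because recent MariaDB versions also have a native JSON_OBJECT function:

```sql
/* First query: Jbin_Object_Add changes only the in-memory tree;
   Json_Make_Object serializes the result and bt2.json is unchanged */
SELECT Json_Make_Object(Jbin_Object_Add(Jbin_File('bt2.json'), 4 AS d)) AS "Result";

/* Second query: Json_Object_Add rewrites bt2.json and returns the file
   name, which Json_Make_Object then reads and parses again */
SELECT Json_Make_Object(Json_Object_Add(Jbin_File('bt2.json'), 4 AS d)) AS "Result";
```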

Both queries return the same result.

In the first query, Jbin_Object_Add does not serialize the document (no “Jbin” function does) and Json_Object just returns a serialized modified tree. Consequently, the file bt2.json is not modified. This query is suitable for getting a modified copy of the JSON file without modifying the file itself.

However, in the second query, Json_Object_Add does modify the JSON file and returns the file name. The Json_Object function receives this file name, reads and parses the file, makes an object from it and returns the serialized result. This modification may be intended, but it can also be an unwanted side effect of the query.

Therefore, using “Jbin” functions as arguments is faster and uses less memory. It is also safer when dealing with JSON files that should not be modified.

Using JSON as Dynamic Columns

JSON has all the features needed to serve as an alternative to dynamic columns. For instance, a typical dynamic columns example creates rows with dynamic columns, removes a column and adds a column. It can also list all columns, or get them together with their values in JSON format.
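
Here is a sketch of such an example using MariaDB's native dynamic column functions. The assets table and its data are illustrative:

```sql
-- Illustrative table: attributes that vary per item go into a BLOB
CREATE TABLE assets (
  item_name VARCHAR(32) PRIMARY KEY, -- a common attribute for all items
  dynamic_cols BLOB                  -- dynamic columns are stored here
);
INSERT INTO assets VALUES
  ('MariaDB T-shirt', COLUMN_CREATE('color', 'blue', 'size', 'XL')),
  ('Thinkpad Laptop', COLUMN_CREATE('color', 'black', 'price', 500));

SELECT item_name, COLUMN_GET(dynamic_cols, 'color' AS CHAR) AS color FROM assets;

/* Remove a column: */
UPDATE assets SET dynamic_cols = COLUMN_DELETE(dynamic_cols, 'price')
  WHERE COLUMN_GET(dynamic_cols, 'color' AS CHAR) = 'black';

/* Add a column: */
UPDATE assets SET dynamic_cols = COLUMN_ADD(dynamic_cols, 'warranty', '3 years')
  WHERE item_name = 'Thinkpad Laptop';

/* List all columns, or get them together with their values in JSON format: */
SELECT item_name, COLUMN_LIST(dynamic_cols) FROM assets;
SELECT item_name, COLUMN_JSON(dynamic_cols) FROM assets;
```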

The same result can be obtained with JSON columns using the JSON UDFs. The JSON equivalent performs the same steps: it removes a column and adds a column. It can also list all columns, or get them together with their values in JSON format.
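
The following sketches a possible JSON equivalent with illustrative data. It makes three assumptions. Json_Make_Object and Json_Object_Add take each key from the alias of the value. A column name beginning with json_ marks its content as JSON text, by analogy with the jfile_ prefix. The plain path 'color' selects a top-level key:

```sql
-- Illustrative table: the varying attributes are kept as JSON text
CREATE TABLE jassets (
  item_name VARCHAR(32) PRIMARY KEY,
  json_cols VARCHAR(512)  -- assumed: the json_ prefix marks the value as JSON
);
-- Each key is taken from the alias of the corresponding value
INSERT INTO jassets VALUES
  ('MariaDB T-shirt', Json_Make_Object('blue' AS color, 'XL' AS size)),
  ('Thinkpad Laptop', Json_Make_Object('black' AS color, 500 AS price));

SELECT item_name, JsonGet_String(json_cols, 'color') AS color FROM jassets;

/* Remove a column: */
UPDATE jassets SET json_cols = Json_Object_Delete(json_cols, 'price')
  WHERE JsonGet_String(json_cols, 'color') = 'black';

/* Add a column: */
UPDATE jassets SET json_cols = Json_Object_Add(json_cols, '3 years' AS warranty)
  WHERE item_name = 'Thinkpad Laptop';

/* List all keys, or get them together with their values in JSON format: */
SELECT item_name, Json_Object_List(json_cols) FROM jassets;
SELECT item_name, json_cols FROM jassets;
```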

However, using JSON brings features not existing in dynamic columns:

Use of a language known to many implementations and developers.
Full support of arrays, currently missing from dynamic columns.
Access to subparts of JSON documents by JPATH, which can include calculations on arrays.
Possible references to json files.

With more experience, additional UDFs can be easily written to support new needs.

New Set of BSON Functions

All these functions have been rewritten using the new way of handling JSON. They are temporarily available under names in which the leading J is replaced by B. For example, the new-style version of Json_Make_Array is called Bson_Make_Array. Some functions, such as Bson_Item_Delete, are new, and some fix bugs found in their Json counterparts.
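
For illustration, here is the same call in both naming styles (the argument values are arbitrary):

```sql
-- Original UDF
SELECT Json_Make_Array(56, 3.1416, 'foo', NULL);

-- New-style counterpart: same arguments, leading J replaced by B
SELECT Bson_Make_Array(56, 3.1416, 'foo', NULL);
```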

Converting Tables to JSON

The JSON UDFs and the direct Jpath “*” facility are powerful tools for converting tables and files to the JSON format. For instance, the file biblio3.json used previously can be obtained by converting the xsample.xml file. To do this, create a JSON table, xj1, whose rows are whole JSON objects. Then insert into it the objects built by a SELECT statement.
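
Here is a sketch of this conversion. The names xj1 and biblio3.json, the Jpath “*” and Jmode=2 come from the text. The column name doc, the source table xsampall and its ISBN, TITLE and DATEPUB columns are assumptions; a real conversion would include all the book fields:

```sql
-- From Connect 1.07.0002 the column path is given with JPATH;
-- '*' maps the column to the whole row, so each row is one JSON object
CREATE TABLE xj1 (
  doc VARCHAR(500) JPATH='*'
) ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='biblio3.json'
  OPTION_LIST='Jmode=2';

-- Before Connect 1.07.0002, FIELD_FORMAT replaces JPATH:
--   doc VARCHAR(500) FIELD_FORMAT='*'

-- Fill the table with one JSON object per book (source columns assumed);
-- each key is taken from the column name
INSERT INTO xj1
  SELECT Json_Make_Object(ISBN, TITLE, DATEPUB) FROM xsampall;
```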

The rows of the xj1 table directly receive the JSON objects made by the SELECT statement used in the INSERT statement. The table file is written accordingly (xj1 is pretty=2 by default). Its mode is Jmode=2 because the inserted values are strings, even if they denote JSON objects.

Another way to do this is to create a table describing the desired file format before the biblio3.json file exists. Then populate it from the xsampall table.
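
The following is a minimal sketch with a hypothetical table name, jsampall3. It uses only three top-level fields, whose names are assumed to match columns of xsampall. Columns named like top-level keys need no path. Nested keys would need a JPATH option (from Connect 1.07.0002) or FIELD_FORMAT (before that version):

```sql
-- Created before biblio3.json exists: the definition describes the file layout
CREATE TABLE jsampall3 (
  ISBN    CHAR(15),
  TITLE   CHAR(32),
  DATEPUB INT(4)
) ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='biblio3.json';

-- Populate it from the XML table; CONNECT writes the JSON file
INSERT INTO jsampall3 SELECT ISBN, TITLE, DATEPUB FROM xsampall;
```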

This is a simpler method. However, it cannot handle multiple column values, which is why we inserted from xsampall and not from xsampall2. How can we add the missing multiple authors to this table? Here again, we must create a utility table able to handle JSON strings.

Voilà!

Converting json files

We have seen that JSON files can be formatted differently depending on the pretty option. In particular, big data files should be formatted with pretty equal to 0 when used by a CONNECT JSON table. The best and simplest way to convert a file from one format to another is to use the Jfile_Make function. This function writes a JSON item to a file in the specified format. Besides the JSON item, its arguments are a file name and a pretty value.

The file name is optional when the JSON document comes from a Jbin_File function, because the returned structure makes it available. For instance, converting the JSON file tb.json back to pretty=0 can simply be done by:
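
Here is a sketch of both uses. The output file name tb_copy.json is hypothetical, and the order of the file name and pretty arguments is assumed:

```sql
-- Rewrite tb.json with pretty=0; no file name is given because the
-- structure returned by Jbin_File already carries it
SELECT Jfile_Make(Jbin_File('tb.json'), 0);

-- With an explicit file name, a pretty=2 copy is written to another file
SELECT Jfile_Make(Jbin_File('tb.json'), 'tb_copy.json', 2);
```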

Performance Consideration

MySQL and PostgreSQL have JSON data types (JSON in MySQL, JSONB in PostgreSQL) that are not just text but an internal encoding of JSON data. This saves parsing time when executing JSON functions. Of course, parsing must still be done when the data is created, and serialization must be done to output the result.

CONNECT works directly on character strings representing JSON values. These must be parsed every time, but this has the advantage of working easily on external data. Generally, this is not too penalizing because JSON data are usually of small or reasonable size. The only case where it can be a serious problem is when working on a big JSON file.

In that case, the file should be formatted as, or converted to, pretty=0.

From Connect 1.7.002, this is easily done using the Jfile_Convert function, for instance:
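
Here is a sketch, assuming the argument order is input file, output file, record length. The file names and the 350-byte record length are illustrative:

```sql
-- Convert bibdoc.json into bibdoc0.json with pretty=0 (one row per line);
-- the third argument is the record length (lrecl) used for the output
SELECT Jfile_Convert('bibdoc.json', 'bibdoc0.json', 350);
```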

Such a json file should not be used directly by JSON UDFs because they parse the whole file, even when only a subset is used. Instead, it should be used by a JSON table created on it. Indeed, JSON tables do not parse the whole document but just the item corresponding to the row they are working on. In addition, indexing can be used by the table as explained previously on this page.
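
For example, a hypothetical JSON table on such a converted file could be declared as follows. The table and column names are assumptions, and LRECL is assumed to match the record length used for the conversion:

```sql
-- With pretty=0 each line is one row, so only that row's JSON text
-- is parsed when the row is read
CREATE TABLE bibdoc0 (
  ISBN  CHAR(15) NOT NULL,
  TITLE VARCHAR(64)
) ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='bibdoc0.json'
  LRECL=350 OPTION_LIST='Pretty=0';
```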

Generally speaking, CONNECT offers maximum flexibility when JSON tables and JSON UDFs are used together. Some things are better handled by tables, others by UDFs. The tools are there, but it is up to you to discover the best way to solve your problems.

Bjson files

Starting with Connect 1.7.002, pretty=0 json files can be converted to a binary format that is a pre-parsed representation of json. This can be done with the Jfile_Bjson UDF function, for instance:
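
Here is a sketch, assuming the argument order is input file, output file, record length. The file names are illustrative, and 3500 is ten times an assumed source lrecl of 350:

```sql
-- Convert a pretty=0 JSON file into a binary (pre-parsed) Bjson file
SELECT Jfile_Bjson('bibdoc0.json', 'bibdoc0.bjson', 3500);
```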

Here the third argument, the record length, must be 6 to 10 times larger than the lrecl of the initial JSON file, because the parsed representation is bigger than the original JSON text.

Tables using such Bjson files must specify ‘Pretty=-1’ in the option list.
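
The following sketches such a table. The table and column names are hypothetical, and LRECL is assumed to match the record length given to Jfile_Bjson:

```sql
-- Table reading a Bjson file: Pretty=-1 tells CONNECT the file is binary
CREATE TABLE bibbin (
  ISBN  CHAR(15) NOT NULL,
  TITLE VARCHAR(64)
) ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='bibdoc0.bjson'
  LRECL=3500 OPTION_LIST='Pretty=-1';
```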

This format is probably similar to the BSON used by MongoDB and the JSONB used by PostgreSQL. It allows queries to be processed up to 10 times faster than on text JSON files. Indexing is also available for tables using this format, which improves performance further. For instance, some queries on a JSON table of half a million rows that previously took more than 10 seconds took only 0.1 second once the file was converted and indexed.

Here again, this has been reimplemented using the new way JSON is handled. The files made with the Bfile_Bjson function are only two to four times the size of the source files. This new representation is not compatible with the old one; therefore, these files must be used with BSON tables only.

Specifying a JSON table Encoding

An important feature of JSON is that strings should be in Unicode. As a matter of fact, all the examples we found on the Internet seemed to be plain ASCII. This is because Unicode is generally encoded in JSON files using UTF-8, UTF-16 or UTF-32, and UTF-8 encodes ASCII characters as plain ASCII bytes.

To specify the required encoding, just use the data_charset CONNECT option or the native DEFAULT CHARSET option.
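
Here are two sketches of a table on a UTF-8 file. The table name, file name and columns are hypothetical:

```sql
-- Using the CONNECT DATA_CHARSET option
CREATE TABLE jcities (
  name    VARCHAR(64),
  country CHAR(2)
) ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='cities.json'
  DATA_CHARSET='utf8';

-- Using the native table character set instead
CREATE TABLE jcities2 (
  name    VARCHAR(64),
  country CHAR(2)
) ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='cities.json'
  DEFAULT CHARSET=utf8;
```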

Retrieving JSON data from MongoDB

Classified as a NoSQL database program, MongoDB uses JSON-like documents (BSON) grouped in collections. The simplest way to access MongoDB data, and the only one available before Connect 1.6, was to export a collection to a JSON file. This produces a file in the pretty=0 format. Viewed as SQL, a collection is a table and documents are table rows.

Since CONNECT version 1.6, it is possible to access MongoDB collections directly via the MongoDB C Driver. This is the purpose of the MONGO table type described later. However, JSON tables can also do it in a somewhat different way (provided MONGO support is installed as described for MONGO tables).

This is achieved by specifying the MongoDB connection URI when creating the table.
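
Here is a sketch under stated assumptions. The table name bongo, the restaurants collection and its name, borough and cuisine fields, the local URI and the LRECL value are all illustrative. Version-specific MONGO options are left out:

```sql
-- JSON table reading a MongoDB collection: CONNECTION replaces FILE_NAME
-- and TABNAME gives the collection name
CREATE TABLE bongo (
  name    VARCHAR(64),
  borough VARCHAR(32),
  cuisine VARCHAR(64)
) ENGINE=CONNECT TABLE_TYPE=JSON
  TABNAME='restaurants'
  CONNECTION='mongodb://localhost:27017'
  LRECL=1024;  -- record length for a serialized document (value assumed)

-- CONNECT tries to reduce the transfer for a reduced column list and a WHERE clause
SELECT name, cuisine FROM bongo WHERE borough = 'Bronx';
```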

In this statement, the file_name option is replaced by the connection option, the URI used to retrieve data from a local or remote MongoDB server. The tabname option gives the name of the MongoDB collection to use. The dbname option could have been used to indicate the database containing the collection (it defaults to the current database).

The documents retrieved from MongoDB are serialized, and CONNECT uses them as if they were read from a file. This implies serializing by MongoDB and parsing by CONNECT, which is not the best choice performance-wise. CONNECT does its best to reduce the data transfer when a query contains a reduced column list and/or a WHERE clause. This way makes all the possibilities of the JSON table type available, such as calculated arrays.

However, to work on large JSON collections, using the MONGO table type is generally the normal way.

Note: JSON tables using MongoDB access also accept the MONGO-specific options, which are described in the MONGO table chapter.

Summary of Options and Variables Used with Json Tables

Options and variables that can be used when creating Json tables are listed here:

Table Option

Type

Description

(*) For Json tables connected to MongoDB, Mongo specific options can also be used.

Other options must be specified in the option list:

Table Option

Type

Description

Column options:

Column Option

Type

Description

Variables used with Json tables are:

Notes

The value n can be 0-based or 1-based depending on the base table option. The default is 0, matching current usage in the JSON world, but it can be set to 1 for tables created with old versions.
This does not work when CONNECT is compiled embedded.

This page is licensed: CC BY-SA / Gnu FDL
