MariaDB starting with 10.0

The CONNECT handler was introduced in MariaDB 10.0.

Many data types make no or little sense when applied to plain files. This why CONNECT supports only a restricted set of data types. However, ODBC, JDBC or MYSQL source tables may contain data types not supported by CONNECT. In this case, CONNECT makes an automatic conversion to a similar supported type when it is possible.

The data types currently supported by CONNECT are:

Type nameDescriptionUsed for
TYPE_STRINGZero ended stringchar, varchar, text
TYPE_INT4 bytes integerint, mediumint, integer
TYPE_SHORT2 bytes integersmallint
TYPE_TINY1 byte integertinyint
TYPE_BIGINT8 bytes integerbigint, longlong
TYPE_DOUBLE8 bytes floating pointdouble, float, real
TYPE_DECIMNumeric valuedecimal, numeric, number
TYPE_DATE4 bytes integerdate, datetime, time, timestamp, year

TYPE_STRING

This type corresponds to what is generally known as CHAR or VARCHAR by database users, or as strings by programmers. Columns containing characters have a maximum length but the character string is of fixed or variable length depending on the file format.

The DATA_CHARSET option must be used to specify the character set used in the data source or file. Note that, unlike usually with MariaDB, when a multi-byte character set is used, the column size represents the number of bytes the column value can contain, not the number of characters.

TYPE_INT

The ]INTEGER type contains signed integer numeric 4-byte values (the int/ of the C language) ranging from –2,147,483,648 to 2,147,483,647 for signed type and 0 to 4,294,967,295 for unsigned type.

TYPE_SHORT

The SHORT data type contains signed integer numeric 2-byte values (the short integer of the C language) ranging from –32,768 to 32,767 for signed type and 0 to 65,535 for unsigned type.

TYPE_TINY

The TINY data type contains integer numeric 1-byte values (the char of the C language) ranging from –128 to 127 for signed type and 0 to 255 for unsigned type.

TYPE_BIGINT

The BIGINT data type contains signed integer 8-byte values (the long long of the C language) ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 for signed type and from 0 to 18,446,744,073,709,551,615 for unsigned type.

Inside tables, the coding of all integer values depends on the table type. In tables represented by text files, the number is written in characters, while in tables represented by binary files (BIN or VEC) the number is directly stored in the binary representation corresponding to the platform.

The length (or precision) specification corresponds to the length of the table field in which the value is stored for text files only. It is used to set the output field length for all table types.

TYPE_DOUBLE

The DOUBLE data type corresponds to the C language double type, a floating-point double precision value coded with 8 bytes. Like for integers, the internal coding in tables depends on the table type, characters for text files, and platform binary representation for binary files.

The length specification corresponds to the length of the table field in which the value is stored for text files only. The scale (was precision) is the number of decimal digits written into text files. For binary table types (BIN and VEC) this does not apply. The length and scale specifications are used to set the output field length and number of decimals for all types of tables.

TYPE_DECIM

The DECIMAL data type corresponds to what MariaDB or ODBC data sources call NUMBER, NUMERIC, or DECIMAL: a numeric value with a maximum number of digits (the precision) some of them eventually being decimal digits (the scale). The internal coding in CONNECT is a character representation of the number. For instance:

colname decimal(14,6)

This defines a column colname as a number having a precision of 14 and a scale of 6. Supposing it is populated by:

insert into xxx values (-2658.74);

The internal representation of it will be the character string -2658.740000. The way it is stored in a file table depends on the table type. The length field specification corresponds to the length of the table field in which the value is stored and is calculated by CONNECT from the precision and the scale values. This length is precision plus 1 if scale is not 0 (for the decimal point) plus 1 if this column is not unsigned (for the eventual minus sign). In fix formatted tables the number is right justified in the field of width length, for variable formatted tables, such as CSV, the field is the representing character string.

Because this type is mainly used by CONNECT to handle numeric or decimal fields of ODBC, JDBC and MySQL table types, CONNECT does not provide decimal calculations or comparison by itself. This is why decimal columns of CONNECT tables cannot be indexed.

DATE Data type

Internally, date/time values are stored by CONNECT as a signed 4-byte integer. The value 0 corresponds to 01 January 1970 12:00:00 am coordinated universal time (UTC). All other date/time values are represented by the number of seconds elapsed since or before midnight (00:00:00), 1 January 1970, to that date/time value. Date/time values before midnight 1 January 1970 are represented by a negative number of seconds.

CONNECT handles dates from 13 December 1901, 20:45:52 to 18 January 2038, 19:14:07.

Although date and time information can be represented in both CHAR and INTEGER data types, the DATE data type has special associated properties. For each DATE value, CONNECT can store all or only some of the following information: century, year, month, day, hour, minute, and second.

Date Format in Text Tables

Internally, date/time values are handled as a signed 4-byte integer. But in text tables (type DOS, FIX, CSV, FMT, and DBF) dates are most of the time stored as a formatted character string (although they also can be stored as a numeric string representing their internal value). Because there are infinite ways to format a date, the format to use for decoding dates, as well as the field length in the file, must be associated to date columns (except when they are stored as the internal numeric value).

Note that this associated format is used only to describe the way the temporal value is stored internally. This format is used both for output to decode the date in a SELECT statement as well as for input to encode the date in INSERT or UPDATE statements. However, what is kept in this value depends on the data type used in the column definition (all the MariaDB temporal values can be specified). When creating a table, the format is associated to a date column using the DATE_FORMAT option in the column definition, for instance:

create table birthday (
  Name varchar(17),
  Bday date field_length=10 date_format='MM/DD/YYYY',
  Btime time field_length=8 date_format='hh:mm tt')
engine=CONNECT table_type=CSV;

insert into birthday values ('Charlie','2012-11-12','15:30:00');

select * from birthday;

The SELECT query returns:

NameBdayBtime
Charlie2012-11-1215:30:00

The values of the INSERT statement must be specified using the standard MariaDB syntax and these values are displayed as MariaDB temporal values. Sure enough, the column formats apply only to the way these values are represented inside the CSV files. Here, the inserted record will be:

Charlie,11/12/2012,03:30 PM

Note: The field_length option exists because the MariaDB syntax does not allow specifying the field length between parentheses for temporal column types. If not specified, the field length is calculated from the date format (sometimes as a max value) or made equal to the default length value if there is no date format. In the above example it could have been removed as the calculated values are the ones specified. However, if the table type would have been DOS or FIX, these values could be adjusted to fit the actual field length within the file.

A CONNECT format string consists of a series of elements that represent a particular piece of information and define its format. The elements will be recognized in the order they appear in the format string. Date and time format elements will be replaced by the actual date and time as they appear in the source string. They are defined by the following groups of characters:

ElementDescription
YYThe last two digits of the year (that is, 1996 would be coded as "96").
YYYYThe full year (that is, 1996 could be entered as "96" but displayed as “1996”).
MMThe one or two-digit month number.
MMMThe three-character month abbreviation.
MMMMThe full month name.
DDThe one or two-digit month day.
DDDThe three-character weekday abbreviation.
DDDDThe full weekday name.
hhThe one or two-digit hour in 12-hour or 24-hour format.
mmThe one or two-digit minute.
ssThe one or two-digit second.
tThe one-letter AM/PM abbreviation (that is, AM is entered as "A").
ttThe two-letter AM/PM abbreviation (that is, AM is entered as "AM").

Usage Notes

  • To match the source string, you can add body text to the format string, enclosing it in single quotes or double quotes if it would be ambiguous. Punctuation marks do not need to be quoted.
  • The hour information is regarded as 12-hour format if a “t” or “tt” element follows the “hh” element in the format or as 24-hour format otherwise.
  • The "MM", "DD", "hh", "mm", "ss" elements can be specified with one or two letters (e.g. "MM" or "M") making no difference on input, but placing a leading zero to one-digit values on output[1] for two-letter elements.
  • If the format contains elements DDD or DDDD, the day of week name is skipped on input and ignored to calculate the internal date value. On output, the correct day of week name is generated and displayed.
  • Temporal values are always stored as numeric in BIN and VEC tables.

Handling dates that are out of the range of supported CONNECT dates

If you want to make a table containing, for instance, historical dates not being convertible into CONNECT dates, make your column CHAR or VARCHAR and store the dates in the MariaDB format. All date functions applied to these strings will convert them to MariaDB dates and will work as if they were real dates. Of course they must be inserted and will be displayed using the MariaDB format.

NULL handling

CONNECT handles null values for data sources able to produce nulls. Currently this concerns mainly the ODBC, JDBC, MYSQL, XML, JSON and INI table types. For INI or XML types, null values are returned when the key is missing in the section (INI) or when the corresponding node does not exist in a row (XML or JSON).

For other file tables, the issue is to define what a null value is. In a numeric column, 0 can sometimes be a valid value but, in some other cases, it can make no sense. The same for character columns; is a blank field a valid value or not?

A special case is DATE columns with a DATE_FORMAT specified. Any value not matching the format can be regarded as NULL.

CONNECT leaves the decision to you. When declaring a column in the CREATE TABLE statement, if it is declared NOT NULL, blank or zero values will be considered as valid values. Otherwise they will be considered as NULL values. In all cases, nulls are replaced on insert or update by pseudo null values, a zero-length character string for text types or a zero value for numeric types. Once converted to pseudo null values, they will be recognized as NULL only for columns declared as nullable.

For instance:

create table t1 (a int, b char(10)) engine=connect;
insert into t1 values (0,'zero'),(1,'one'),(2,'two'),(null,'???');
select * from t1 where a is null;

The select query replies:

ab
NULLzero
NULL???

Sure enough, the value 0 entered on the first row is regarded as NULL for a nullable column. However, if we execute the query:

select * from t1 where a = 0;

This will return no line because a NULL is not equal to 0 in an SQL where clause.

Now let us see what happens with not null columns:

create table t1 (a int not null, b char(10) not null) engine=connect;
insert into t1 values (0,'zero'),(1,'one'),(2,'two'),(null,'???');

The insert statement will produce a warning saying:

LevelCodeMessage
Warning1048Column 'a' cannot be null

It is replaced by a pseudo null 0 on the fourth row. Let us see the result:

select * from t1 where a is null;
select * from t1 where a = 0;

The first query returns no rows, 0 are valid values and not NULL. The second query replies:

ab
0zero
0???

It shows that the NULL inserted value was replaced by a valid 0 value.

Unsigned numeric types

They are supported by CONNECT since version 1.01.0010 for fixed numeric types (TINY, SHORT, INTEGER, and BITINT).

Data type conversion

CONNECT is able to convert data from one type to another in most cases. These conversions are done without warning even when this leads to truncation or loss of precision. This is true, in particular, for tables of type ODBC, JDBC, MYSQL and PROXY (via MySQL) because the source table may contain some data types not supported by CONNECT. They are converted when possible to CONNECT types.

MariaDB types are converted as:

MariaDB TypesCONNECT TypeRemark
integer, medium integerTYPE_INT4 byte integer
small integerTYPE_SHORT2 byte integer
tiny integerTYPE_TINY1 byte integer
char, varcharTYPE_STRINGSame length
double, float, realTYPE_DOUBLE8 byte floating point
decimal, numericTYPE_DECIMLength depends on precision and scale
all date related typesTYPE_DATEDate format can be set accordingly
bigint, longlongTYPE_BIGINT8 byte integer
Other typesTYPE_ERRORNot supported, no conversion provided.

In the case of TEXT columns, the handling depends on the values given to the connect_type_conv and connect_conv_size system variables. By default no conversion is permitted.

Note: BLOB is currently not converted until a TYPE_BIN type is added to CONNECT. However, the SKIP option also applies to BLOB columns.

ODBC SQL types are converted as:

SQL TypesConnect TypeRemark
SQL_CHAR, SQL_VARCHARTYPE_STRING
SQL_LONGVARCHARTYPE_STRINGlen = min(abs(len), 255) If the column is generated by discovery (columns not specified) its length is connect_conv_size.
SQL_NUMERIC, SQL_DECIMALTYPE_DECIM
SQL_INTEGERTYPE_INT
SQL_SMALLINTTYPE_SHORT
SQL_TINYINT, SQL_BITTYPE_TINY
SQL_FLOAT, SQL_REAL, SQL_DOUBLETYPE_DOUBLE
SQL_DATETIMETYPE_DATElen = 10
SQL_INTERVALTYPE_STRINGlen = 8 + ((scale) ? (scale+1) : 0)
SQL_TIMESTAMPTYPE_DATElen = 19 + ((scale) ? (scale +1) : 0)
SQL_BIGINTTYPE_BIGINT
Other typesTYPE_ERRORNot supported.

JDBC SQL types are converted as:

JDBC TypesConnect TypeRemark
CHAR, VARCHARTYPE_STRING
LONGVARCHARTYPE_STRINGlen = min(abs(len), connect_conv_size) If the column is generated by discovery (columns not specified), its length is connect_conv_size
NUMERIC, DECIMAL, VARBINARYTYPE_DECIM
INTEGERTYPE_INT
SMALLINTTYPE_SHORT
TINYINT, BITTYPE_TINY
FLOAT, REAL, DOUBLETYPE_DOUBLE
DATETYPE_DATElen = 10
TIMETYPE_DATElen = 8 + ((scale) ? (scale+1) : 0)
TIMESTAMPTYPE_DATElen = 19 + ((scale) ? (scale +1) : 0)
BIGINTTYPE_BIGINT
Other typesTYPE_ERRORNot supported.

Note: The connect_type_conv SKIP option also applies to ODBC and JDBC tables.


  1. Here input and output are used to specify respectively decoding the date to get its numeric value from the data file and encoding a date to write it in the table file. Input is performed within SELECT queries; output is performed in UPDATE or INSERT queries.

Comments

Comments loading...