CONNECT JSON Table Type

You are viewing an old version of this article. View the current version here.

Overview

MariaDB starting with 10.0.16

The JSON table type was introduced in MariaDB 10.0.16.

Caution: This is a new table type implementation that should be used for testing, preferably on local servers. DO NOT use it on production servers.

JSON (JavaScript Object Notation) is a lightweight data-interchange format widely used on the Internet. Many applications, generally written in JavaScript or PHP use and produce JSON data, which are exchanged as files of different physical formats.

It is also possible to query, create or update such information in a database-like manner. MongoDB does it using a JavaScript-like language. PostgreSQL includes these facilities by using a specific data type and related functions like dynamic columns.

The CONNECT engine adds this facility to MariaDB by supporting tables based on JSON data files. This is done like for XML tables by creating tables describing what should be retrieved from the file and how it should be processed.

Let us start from the file “biblio3.json” that is the JSON equivalent of the XML Xsample file we have described in the XML table chapter:

[
  {
    "ISBN": "9782212090819",
    "LANG": "fr",
    "SUBJECT": "applications",
    "AUTHOR": [
      {
        "FIRSTNAME": "Jean-Christophe",
        "LASTNAME": "Bernadac"
      },
      {
        "FIRSTNAME": "François",
        "LASTNAME": "Knab"
      }
    ],
    "TITLE": "Construire une application XML",
    "PUBLISHER": {
      "NAME": "Eyrolles",
      "PLACE": "Paris"
    },
    "DATEPUB": 1999
  },
  {
    "ISBN": "9782840825685",
    "LANG": "fr",
    "SUBJECT": "applications",
    "AUTHOR": [
      {
        "FIRSTNAME": "William J.",
        "LASTNAME": "Pardi"
      }
    ],
    "TITLE": "XML en Action",
    "TRANSLATED": {
       "PREFIX": "adapté de l'anglais par",
       "TRANSLATOR": {
          "FIRSTNAME": "James",
          "LASTNAME": "Guerin"
    },
    "PUBLISHER": {
      "NAME": "Microsoft Press",
      "PLACE": "Paris"
    },
    "DATEPUB": 1999
  }
]

This file contains the different items existing in JSON.

  • Arrays: They are enclosed in square brackets and contain a list of comma separated values.
  • Objects: They are enclosed in curly brackets. They contain a comma separated list of pairs, each pair composed of a key name between double quotes, followed by a ‘:’ character and followed by a value.
  • Values: Values can be an array or an object. They also can be a string between double quotes, an integer or float number, a Boolean value or a null value. The simplest way for CONNECT to locate a table in such a file is by an array containing a list of objects. Each array value will be a table row and each pair of the row objects will represent a column, the key being the column name and the value the column value.

A first try to create a table on this file will be to take the outer array as the table:

create table jsample (
ISBN char(15),
LANG char(2),
SUBJECT char(32),
AUTHOR char(128),
TITLE char(32),
TRANSLATED char(80),
PUBLISHER char(20),
DATEPUB int(4))
engine=CONNECT table_type=JSON
File_name='biblio3.json';

If we execute the query:

select isbn, author, title, publisher from jsample;

We get the result:

isbnauthortitlepublisher
9782212090819Jean-Christophe BernadacConstruire une application XMLEyrolles Paris
9782840825685William J. PardiXML en ActionMicrosoft Press Pari

Note that by default, column values that are objects have been set to the concatenation of all the string values of the object separated by a blank. When a column value is an array, only the first item of the array is retrieved.

However, things are generally more complicated. If JSON files do not contain attributes (although object pairs are similar to attributes) they contain a new item, arrays. We have seen that they can be used like XML multiple nodes, here to specify several authors, but they are more general because they can contain objects of different types, even it may not be advisable to do so.

This is why CONNECT enables the specification of a column field_format option “JPATH” that is used to describe exactly where the items to display are and how to handles arrays.

Here is an example of a new table that can be created on the same file, allowing choosing the column names, to get some sub-objects and to specify how to handle the author array.

create table jsampall (
ISBN char(15),
Language char(2) field_format='LANG',
Subject char(32) field_format='SUBJECT',
Author char(128) field_format='AUTHOR:[" and "]',
Title char(32) field_format='TITLE',
Translation char(32) field_format='TRANSLATOR :PREFIX',
Translator char(80) field_format='TRANSLATOR',
Publisher char(20) field_format='PUBLISHER:NAME',
Location char(16) field_format='PUBLISHER:PLACE',
Year int(4) field_format='DATEPUB')
engine=CONNECT table_type=JSON File_name='biblio3.json';

Given the query:

select title, author, publisher, location from jsampall;

The result is:

titleauthorpublisherlocation
Construire une application XMLJean-Christophe Bernadac and François KnabEyrollesParis
XML en ActionWilliam J. PardiMicrosoft PressParis

Here is another example showing that one can choose what to extract from the file and how to “expand” an array, meaning to generate one row for each array value:

create table jsampex (
ISBN char(15),
Title char(32) field_format='TITLE',
AuthorFN char(128) field_format='AUTHOR:[X]:FIRSTNAME',
AuthorLN char(128) field_format='AUTHOR:[X]:LASTNAME',
Year int(4) field_format='DATEPUB')
engine=CONNECT table_type=JSON File_name='biblio3.json';

It is displayed as:

ISBNTitleAuthorFNAuthorLNYear
9782212090819Construire une application XMLJean-ChristopheBernadac1999
9782212090819Construire une application XMLFrançoisKnab1999
9782840825685XML en ActionWilliam J.Pardi1999

The Jpath Specification

This is the description of the path to follow to reach the required item. Each step is the key name (case sensitive) of the pair when crossing an object, and the number of the value between square brackets when crossing an array. Each specification is separated by a ‘:’ character.

For instance, in the above file, the last name of the second author of a book is reached by:

AUTHOR:[2]:LASTNAME

The array specification can also indicate how it must be processed:

SpecificationArray TypeLimitDescription
[n]AllN.ATake the nth value of the array. Ignore it if n is 0.
[X] or [x]AllExpand. Generate one row for each array value.
["string"]StringConcatenate all values separated by the specified string.
[+]NumericMake the sum of all the array values.
[*]NumericMake the product of all array values.
[!]NumericMake the average of all the array values.
[>] or [<]AllReturn the greatest or least value of the array.
[#]AllN.AReturn the number of values in the array.
[]AllExpand if under an expanded object. Otherwise sum if numeric, else concatenation separated by “, “.
AllN.AIf an array, expand it if under an expanded object or take the first value of it.

Note 1: When the LIMIT restriction is applicable, only the first m array items are used, m being the value of the LIMIT option (to be specified in option_list). The LIMIT default value is 10.

Note 2: Expand is limited to only one branch (expanded arrays must be under the same object).

Let us take as an example the file expense.json shown in Appendix A. The table jexpall expands all under and including the week array:

create table jexpall (
WHO char(12),
WEEK int(2) field_format='WEEK:[x]:NUMBER',
WHAT char(32) field_format='WEEK:[x]:EXPENSE:[x]:WHAT',
AMOUNT double(8,2) field_format='WEEK:[x]:EXPENSE:[x]:AMOUNT')
engine=CONNECT table_type=JSON File_name='expense.json';
WHOWEEKWHATAMOUNT
Joe3Beer18.00
Joe3Food12.00
Joe3Food19.00
Joe3Car20.00
Joe4Beer19.00
Joe4Beer16.00
Joe4Food17.00
Joe4Food17.00
Joe4Beer14.00
Joe5Beer14.00
Joe5Food12.00
Beth3Beer16.00
Beth4Food17.00
Beth4Beer15.00
Beth5Food12.00
Beth5Beer20.00
Janet3Car19.00
Janet3Food18.00
Janet3Beer18.00
Janet4Car17.00
Janet5Beer14.00
Janet5Car12.00
Janet5Beer19.00
Janet5Food12.00

The table jexpw shows what was bought and the sum and average of amounts for each person and week:

create table jexpw (
WHO char(12) not null,
WEEK int(2) not null field_format='WEEK:[x]:NUMBER',
WHAT char(32) not null field_format='WEEK::EXPENSE:[", "]:WHAT',
SUM double(8,2) not null field_format='WEEK::EXPENSE:[+]:AMOUNT',
AVERAGE double(8,2) not null field_format='WEEK::EXPENSE:[!]:AMOUNT')
engine=CONNECT table_type=JSON File_name='expense.json';
WHOWEEKWHATSUM= AVERAGE
Joe3Beer, Food, Food, Car69.0017.25
Joe4Beer, Beer, Food, Food, Beer83.0016.60
Joe5Beer, Food26.0013.00
Beth3Beer16.0016.00
Beth4Food, Beer32.0016.00
Beth5Food, Beer32.0016.00
Janet3Car, Food, Beer55.0018.33
Janet4Car17.0017.00
Janet5Beer, Car, Beer, Food57.0014.25

Finding the table within a JSON file

Given the file “facebook.jsn”:

{
   "data": [
      {
         "id": "X999_Y999",
         "from": {
            "name": "Tom Brady", "id": "X12"
         },
         "message": "Looking forward to 2010!",
         "actions": [
            {
               "name": "Comment",
               "link": "http://www.facebook.com/X999/posts/Y999"
            },
            {
               "name": "Like",
               "link": "http://www.facebook.com/X999/posts/Y999"
            }
         ],
         "type": "status",
         "created_time": "2010-08-02T21:27:44+0000",
         "updated_time": "2010-08-02T21:27:44+0000"
      },
      {
         "id": "X998_Y998",
         "from": {
            "name": "Peyton Manning", "id": "X18"
         },
         "message": "Where's my contract?",
         "actions": [
            {
               "name": "Comment",
               "link": "http://www.facebook.com/X998/posts/Y998"
            },
            {
               "name": "Like",
               "link": "http://www.facebook.com/X998/posts/Y998"
            }
         ],
         "type": "status",
         "created_time": "2010-08-02T21:27:44+0000",
         "updated_time": "2010-08-02T21:27:44+0000"
      }
   ]
}

The table we want to analyze is represented by the array value of the “data” object. Here is how this is specified in the create table statement:

create table jfacebook (
`ID` char(10) field_format='id',
`Name` char(32) field_format='from:name',
`MyID` char(16) field_format='from:id',
`Message` varchar(256) field_format='message',
`Action` char(16) field_format='actions::name',
`Link` varchar(256) field_format='actions::link',
`Type` char(16) field_format='type',
`Created` datetime date_format='YYYY-MM-DD\'T\'hh:mm:ss' field_format='created_time',
`Updated` datetime date_format='YYYY-MM-DD\'T\'hh:mm:ss' field_format='updated_time')
engine=connect table_type=JSON file_name='facebook.jsn' option_list='Object=data,Expand=actions';

This is the object option that gives the Jpath of the table. Note also an alternate way to declare the array to be expanded by the expand option of the option_list.

Because some string values contain a date representation, the corresponding columns are declared as datetime and the date format is specified for them.

The Jpath of the object option has the same syntax than the column Jpath but of course all array step must be specified using the [n] format.

JSON File Formats

The examples we have seen so far are files that, even they can be formatted in different ways (blanks, tabs, carriage return and line feed are ignored when parsing them), respect the JSON syntax and are made of only one item (Object or Array). Like for XML files, they are entirely parsed and a memory representation is made used to process them. This implies that they are of reasonable size to avoid an out of memory condition. Tables based on such files are recognized by the option Pretty=2 that we did not specify above because this is the default.

An alternate format, which is the format of exported MongoDB files, is a file where each row is physically stored in one file record. For instance:

{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }
{ "_id" : "01005", "city" : "BARRE", "loc" : [ -72.1083540000001, 42.409698 ], "pop" : 4546, "state" : "MA" }
{ "_id" : "01007", "city" : "BELCHERTOWN", "loc" : [ -72.4109530000001, 42.275103 ], "pop" : 10579, "state" : "MA" }
…
{ "_id" : "99929", "city" : "WRANGELL", "loc" : [ -132.352918, 56.433524 ], "pop" : 2573, "state" : "AK" }
{ "_id" : "99950", "city" : "KETCHIKAN", "loc" : [ -133.18479, 55.942471 ], "pop" : 422, "state" : "AK" }

The original file, “cities.jsn”, has 29352 records. To base a table on this file we must specify the option Pretty=0 in the option list. For instance:

create table cities (
`_id` char(5) key,
`city` char(32),
`long` double(12,6) field_format='loc:[1]',
`lat` double(12,6) field_format='loc:[2]',
`pop` int(8),
`state` char(2) distrib='clustered')
engine=CONNECT table_type=JSON file_name='cities.jsn' lrecl=128 option_list='pretty=0';

Note the use of [n] array specifications for the longitude and latitude columns.

When using this format, the table is processed by CONNECT like a DOS, CSV or FMT table. Rows are retrieved and parsed by records and the table can be very large. Another advantage is that such a table can be indexed, which can be of great value for very large tables.

Another format exists, noted by Pretty=1, which is similar to this one but has some additions to represent a JSON array. A header and a trailer records are added containing the opening and closing square bracket, and all records but the last are followed by a comma. It has the same advantages for reading and updating, but inserting and deleting must be executed in the other way.

Alternate Table Arrangement

We have seen that the most natural way to represent a table in a JSON file is to make it on an array of objects. However, other possibilities exist. A table can be an array of arrays, a one column table can be an array of values, or a one row table be just one object or one value. One row tables are internally handled by adding a one value array around them.

Let us see how to handle, for instance, a table that is an array of arrays. The file:

[
  [56, "Coucou", 500.00],
  [[2,0,1,4], "Hello World", 2.0316],
  ["1784", "John Doo", 32.4500],
  [1914, ["Nabucho","donosor"], 5.12],
  [7, "sept", [0.77,1.22,2.01]],
  [8, "huit", 13.0],
]

A table can be created on this file as:

create table xjson (
`a` int(6) field_format='[1]',
`b` char(32) field_format='[2]',
`c` double(10,4) field_format='[3]')
engine=connect table_type=JSON file_name='test.jsn' option_list='Pretty=1,Jmode=1' lrecl=128;

Columns are specified by their position in the row arrays. A new option in the option list is Jmode=1. It indicates what type of table this is. The Jmode values are:

  1. An array of objects. This is the default.
  2. An array of Array. Like this one.
  3. An array of values.

When reading, this is not required as the type of the array items is specified for the columns; however, it is required when inserting new rows so CONNECT knows what to insert. For instance:

insert into xjson values(25, 'Breakfast', 1.414);

After this, It is displayed as:

abc
56Coucou500.0000
2Hello World2.0316
1784John Doo32.4500
1914Nabucho5.1200
7sept0.7700
8huit13.0000
25Breakfast1.4140

Unspecified array values are represented by their first element.

Write Operations on JSON Tables

The SQL commands INSERT, UPDATE and DELETE are fully supported for JSON tables. For INSERT and UPDATE, if the target values are simple values, there are no problems.

However, there are some issues when the added or modified values are objects or arrays.

Concerning objects, the same problems exist that we have already seen with the XML type. The added or modified object will have the format described in the table definition, which can be different from the one of the JSON file. Modifications should be done using a file specifying the full path of modified objects.

New problems are raised when trying to modify the values of an array. First of all, for the values of the array to be distinct values, all write operation concerning an array values must be done using a table expanding this array.

For instance, to modify the authors of the biblio.jsn base table, the jsampex table must be used. Doing so, updating and deleting authors is possible using standard SQL commands. For example to change the first name of Knab from François to John:

update jsampex set authorfn = 'John' where authorln = 'Knab';

However It would be wrong to do:

update jsampex set authorfn = 'John' where isbn = '9782212090819';

Because this would change the first name of both authors as they share the same ISBN.

Deleting can be done the same way with a standard delete SQL command.

Where things become more difficult is when trying to insert a new author of a book. Indeed, an insert command will add a new complete row instead of adding a new author in the same array. Here we are penalized by the SQL language that cannot give us a way to specify this. Something like:

update jsampex add authorfn = 'Charles', authorln = 'Dickens'
where title = 'XML en Action';

However this does not exist in SQL. Does this mean that it is impossible to do it? No, but it requires us to specify a table for which the authors are no more an expanded array. Supposing we want to add an author to the “XML en Action” book. We will do it on a table containing just the author(s) of that book, which is the second book of the table.

create table jauthor (
FIRSTNAME char(64),
LASTNAME char(64))
engine=CONNECT table_type=JSON File_name='biblio3.jsn' option_list='Object=[2]:AUTHOR';

The command:

select * from jauthor;

replies:

FIRSTNAMELASTNAME
William J.Pardi

It is a standard JSON table that is an array of object in which we can freely insert a new row.

insert into jauthor values('Charles','Dickens');

We can check that this was done correctly by:

select * from jsampex;

This will display:

ISBNTitleAuthorFNAuthorLNYear
9782212090819Construire une application XMLJean-ChristopheBernadac1999
9782212090819Construire une application XMLJohnKnab1999
9782840825685XML en ActionWilliam J.Pardi1999
9782840825685XML en ActionCharlesDickens1999

Note: If this table were a big table with many books, it would be difficult to know what the order of a specific book is in the table. This can be found by adding a special ROWID column in the table.

Specifying a JSON table Encoding

An important feature of JSON is that strings should in UNICODE. As a matter of facts, all examples we have found on the Internet seemed to be just ASCII. This because UNICODE is generally encoded in JSON files using UTF8 or UTF16 or UTF32.

To specify the required encoding, just use the data_charset CONNECT option.

Comments

Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.