Dynamic columns
Contents
Dynamic columns allows you to have a different set of "virtual columns" for each row in your table.
You can at any time add or remove columns from a row.
This allows you to to solve problems like a store application, where you have a lot of different things — like t-shirts and phones — and you want to store different attributes for each item. This is something that traditionally is very hard to do in a relational database.
Dynamic columns works by storing the extra columns in a blob and having a small set of functions to manipulate it. It was first implemented in MariaDB 5.3.
You can either manipulate the dynamic columns in SQL or in the client.
The original architecture of dynamic columns was done in MWL#34. This includes information on how the data is stored.
The first implementation of dynamic columns is meant to be highly efficient for programs that generate sql code (which is what we belive most store applications today that try to handle objects with different attributes use). This is why access to dynamic columns is via by numbers. See the TODO section for more information on the future roadmap of dynamic columns.
SQL functions
Creating a dynamic column object
COLUMN_CREATE(column_nr, value [as type], [column_nr, value [as type]]...)
- '
type
' is not required and is mostly there for testing except in cases like date/time/datetime where you can't create literals of the type you need.
Adding or updating dynamic columns
COLUMN_ADD(dynamic_col, column_nr, value [as type], [column_nr, value [as type]]...)
- '
dynamic_col
' must either be data created byCOLUMN_CREATE()
or an empty string. - If '
dynamic_col
' isNULL
then the function returnsNULL
. - If you add a column that already exists, the new value will replace it.
- If you add
NULL
for a column, the entry will be deleted. - The type is not required (will be detected automatically) and is there mostly for testing.
Typical usage:
UPDATE t1 set dynamic_col=COLUMN_ADD(dynamic_col, 5, "test") WHERE id=1;
The above adds or replaces dynamic_column
'5
', with value "test"
,
stored in the 'dynamic_col
' column, for the row with id=1
The reason you have dynamic_col
twice is the COLUMN_ADD()
is a normal
function, like CONCAT()
and you want to replace the old value in
the 'dynamic_col
' value with the new value from the COLUMN_ADD()
function.
Deleting a dynamic column
COLUMN_DELETE(dynamic_column, column_nr, column_nr...);
Querying a dynamic column
COLUMN_GET(dynamic_column, column_nr as type)
Type here can be one of the same ones that you would use in CAST
or CONVERT
:
type | dynamic columns internal type |
---|---|
BINARY[(N)] | DYN_COL_STRING (variable length string with binary charset) |
CHAR[(N)] | DYN_COL_STRING (variable length string with charset) |
DATE | DYN_COL_DATE (date - 3 bytes) |
DATETIME[(D)] | DYN_COL_DATETIME (date and time (with microseconds) - 9 bytes) |
DECIMAL[(M[,D])] | DYN_COL_DECIMAL (variable length binary decimal representation with MariaDB limitation) |
DOUBLE[(M[,D])] | DYN_COL_DOUBLE (64 bit double-precision floating point) |
INTEGER | DYN_COL_INT (64 bit signed integer) |
SIGNED [INTEGER] | DYN_COL_INT (64 bit signed integer) |
TIME[(D)] | DYN_COL_TIME (time (with microseconds, may be negative) - 6 bytes) |
UNSIGNED [INTEGER] | DYN_COL_UINT (64bit unsigned integer) |
The type is required because SQL is a strongly typed language and the rest of the functions (and the client) need to know what the type is before fetching any data.
If the dynamic column has a different type than the requested one, the value will be converted to the requested type.
Note that if you are using CHAR
or BINARY
without a length, MariaDB
will handle it as a blob with length up to 536870911
(the max length of
data in a dynamic column). This buffer is a dynamically allocated in the server
but because some clients may allocate buffers up to the max field length, it's
best to always supply a length.
Checking if a column exists
COLUMN_EXISTS(dynamic_column, column_nr);
This returns 1
if the column exists, 0
if not, and NULL
if
dynamic_column
is NULL
.
COLUMN_LIST(dynamic_column);
Returns a string of all columns in the dynamic_column
separated with
',
'.
Example Using SQL with Dynamic Columns
First create the table. Dynstr will be our place where we store the dynamic columns:
create table t1 (id int auto_increment primary key, name varchar(40), type enum ("shirt", "phone", "computer"), price decimal(10,2), dynstr mediumblob);
Insert some random items, each with some common and own attributes.
insert into t1 (name, type, price, dynstr) values ("Funny shirt", "shirt", 10.0, COLUMN_CREATE(1, "blue", 10, "XL")), ("nokia", "phone", 649, COLUMN_CREATE(1, "black", 2, "touchscreen")), ("htc Desire hd", "phone", 579, COLUMN_CREATE(1, "black", 3, "Android")), ("BM/Lenovo Thinkpad X60s", "computer", 419, COLUMN_CREATE(1, "black", 3, "Linux"));
Lets check the data:
select id, name, type, price, length(dynstr) as len, column_list(dynstr) as list from t1;
+----+-------------------------+----------+--------+------+------+ | id | name | type | price | len | list | +----+-------------------------+----------+--------+------+------+ | 1 | Funny shirt | shirt | 10.00 | 17 | 1,10 | | 2 | nokia | phone | 649.00 | 27 | 1,2 | | 3 | htc Desire hd | phone | 579.00 | 23 | 1,3 | | 4 | BM/Lenovo Thinkpad X60s | computer | 419.00 | 21 | 1,3 | +----+-------------------------+----------+--------+------+------+
SELECT name FROM t1 WHERE COLUMN_GET(dynstr, 1 as char(10)) = "black";
+-------------------------+ | name | +-------------------------+ | nokia | | htc Desire hd | | BM/Lenovo Thinkpad X60s | +-------------------------+
SELECT name, COLUMN_GET(dynstr, 1 as char(10)) FROM t1 WHERE COLUMN_EXISTS(dynstr, 1);
+-------------------------+--------+ | name | colour | +-------------------------+--------+ | Funny shirt | blue | | nokia | black | | htc Desire hd | black | | BM/Lenovo Thinkpad X60s | black | +-------------------------+--------+
Lets update all items with colour="black"
to red
UPDATE t1 set dynstr=COLUMN_ADD(dynstr, 1, "red") where COLUMN_GET(dynstr, 1 as char(10)) = "black"; SELECT name, COLUMN_GET(dynstr, 1 as char(10)) FROM t1 WHERE COLUMN_EXISTS(dynstr, 1);
+-------------------------+-----------------------------------+ | name | COLUMN_GET(dynstr, 1 as char(10)) | +-------------------------+-----------------------------------+ | Funny shirt | blue | | nokia | red | | htc Desire hd | red | | BM/Lenovo Thinkpad X60s | red | +-------------------------+-----------------------------------+
SELECT COLUMN_GET(dynstr, 1 as char(10)) as colour, count(*) FROM t1 group by COLUMN_GET(dynstr, 1 as char(10));
+--------+----------+ | colour | count(*) | +--------+----------+ | blue | 1 | | red | 3 | +--------+----------+
Adding a new column:
UPDATE t1 set dynstr=COLUMN_ADD(dynstr, 15, "4G ram") where type="computer"; SELECT name, type, price, length(dynstr) as len, column_list(dynstr) as list from t1 where type="computer";
+-------------------------+----------+--------+------+--------+ | name | type | price | len | list | +-------------------------+----------+--------+------+--------+ | BM/Lenovo Thinkpad X60s | computer | 419.00 | 29 | 1,3,15 | +-------------------------+----------+--------+------+--------+
Dynamic Column Integrity
Dynamic columns try to pack data so that it has the smallest disk footprint while still having reasonable access time. Any damage to the string can lead to complete loss of all the data in the dynamic column record.
Such a situation could occur when updating dynamic column records when the results do not fit in the storage. When MariaDB encounters a situation like this it (by default) truncates the data and issues a warning. Writing truncated data to a dynamic column record could lead to losing all of the data in that record (updated or not). To prevent this, if you unsure about your storage capacity, use strict SQL_MODE
which preserves the old data and makes the update operation abort.
The C client functions
The dynamic column functions are included in libmysqld.a
in MariaDB 5.3
This gives you the option to retrive the full dynamic column object (normally a
blob) from MariaDB and then unpack it in the client.
The definitions for the structures and functions are in
include/ma_dyncol.h
. The implementation can be found in
mysys/ma_dyncol.c
. You can find a C usage example at:
unittest/mysys/ma_dyncol-t.c
.
value object
When asking for a value with dynamic_column_get()
you get the result in the
following structure:
enum enum_dynamic_column_type { DYN_COL_NULL= 0, DYN_COL_INT, DYN_COL_UINT, DYN_COL_DOUBLE, DYN_COL_STRING, DYN_COL_DECIMAL, DYN_COL_DATETIME, DYN_COL_DATE, DYN_COL_TIME }; typedef enum enum_dynamic_column_type DYNAMIC_COLUMN_TYPE; struct st_dynamic_column_value { DYNAMIC_COLUMN_TYPE type; union { long long long_value; unsigned long long ulong_value; double double_value; struct { LEX_STRING string_value; CHARSET_INFO *charset; }; struct { decimal_digit_t decimal_buffer[DECIMAL_BUFF_LENGTH]; decimal_t decimal_value; }; MYSQL_TIME time_value; }; };
The 'type
' tells you what kind of data is in the object and the
date is then stored in the respective union part.
Function results
All functions return an enum
with one of the following results
enum enum_dyncol_func_result { ER_DYNCOL_OK= 0, ER_DYNCOL_YES= 1, /* For functions returning 0/1 */ ER_DYNCOL_FORMAT= -1, /* Wrong format of the encoded string */ ER_DYNCOL_LIMIT= -2, /* Some limit reached */ ER_DYNCOL_RESOURCE= -3, /* Out of resourses */ ER_DYNCOL_DATA= -4, /* Incorrect input data */ ER_DYNCOL_UNKNOWN_CHARSET= -5 /* Unknown character set */ };
Initializing and freeing a dynamic column object
The following should be used to initialize your DYNAMIC_COLUMN
object (It
just resets all bits of the object to 0
):
void dynamic_column_initialize(DYNAMIC_COLUMN *str);
When you are done with the object, you should free the memory associated with it:
void dynamic_column_column_free(DYNAMIC_COLUMN *str);
Creating a dynamic column object
Creating an object with one value:
enum enum_dyncol_func_result dynamic_column_create(DYNAMIC_COLUMN *str, uint column_nr, DYNAMIC_COLUMN_VALUE *value);
Creating an object with many values:
enum enum_dyncol_func_result dynamic_column_create_many(DYNAMIC_COLUMN *str, uint column_count, uint *column_numbers, DYNAMIC_COLUMN_VALUE *values);
In this case you create an array of column numbers and another array of values.
'str
' is either an object initialized with
dynamic_column_initialize(DYNAMIC_COLUMN *str)
or an old
DYNAMIC_COLUMN *
object that is reused.
Deleting a dynamic column
enum enum_dyncol_func_result dynamic_column_delete(DYNAMIC_COLUMN *org, uint column_nr);
Checking which columns exists
Checking if one column exists:
enum enum_dyncol_func_result dynamic_column_exists(DYNAMIC_COLUMN *org, uint column_nr);
Getting a list of all columns that exist:
enum enum_dyncol_func_result dynamic_column_list(DYNAMIC_COLUMN *org, DYNAMIC_ARRAY *array_of_uint);
In the later case, the DYNAMIC_ARRAY
object will be initialized as part of
the call. The functions to manipualte the DYNAMIC_ARRAY
are:
Free data in the array delete_dynamic(DYNAMIC_ARRAY *array); Number of elements in the array: array.elements Accessing one element: unsigned int column= dynamic_element(array, index, unsigned int)
Using dynamic columns with your own objects
If you read a blob from MariaDB and want to use this with the above functions you probably don't want to duplicate it just to access data in it. Then you can do the following trick to create a DYNAMIC_COLUMN that you use for those functions that will not change it:
char *blob; /* Your data */ size_t blob_length; /* blob length */ DYNAMIC_STRING dynstr; struct st_dynamic_column_value value; dynamic_column_initialize(DYNAMIC_COLUMN &str); dynstr.str= blog; dynstr.length= length; if (dynamic_column_get(&str, 1, &value)) { /* process value for column 1 */ }
Implementation goals:
The following are the goals we had when implementing dynamic columns:
- One should be able to use dynamic columns anywhere you use a normal column. (In ORDER BY, GROUP BY etc.)
- Dynamic columns should be implemented trough functions (not operators) to make it easy to use them from any SQL client (like ODBC).
- Very low storage overhead compared to a normal column (Overhead is now 4-6 bytes per column)
- Optimized for read; Read speed should be close to a normal column.
- We don't have to support having NULL values in a dynamic columns; If you don't have a value for a column, you don't store it in the dynamic column. This is why we delete a column if you change it's value to NULL.
- Small set of functions to make the functionally easy to use and understand.
- Free license (BSD) to make it easy to adopt to other languages than C.
- Storage format should be possible to extend in the future without having to modify old data. (Done by having a 'flag' byte first in the storage format).
- First version optimized for computer generated SQL and then if the feature gets adoption add new things like column names.
Limitations:
- Columns have to be referenced by numbers (see also TODO)
- You can use any number 0-65535 for a column
- Max size of an dynamic column object is 536870911 bytes. (29 bits).
- The only way to add an index on a dynamic column is by creating a virtual column. However in this case the virtual column will be instantiated.
- You can't specify a column number twice in COLUMN_CREATE() or COLUMN_ADD(). If you try you will get the not that informative error "Illegal value used as argument of dynamic column function". This will be fixed in the future so that the last argument will be used.
TODO:
- Add a
COLUMN_CHECK()
function to check if the context of the dynamic column is correct. (Good for checking integrity and finding errors). - Allow one to specify same column number twice in COLUMN_CREATE() / COLUMN_ADD().
- Change type syntax for COLUMN_CREATE() to be same as for COLUMN_GET().
- Add functional indexes. This will partly solve the problem of adding an index on virtual columns (partly because adding a functional index is not 'instant').
- Provide a way to use names for the virtual columns. The problem with this is to decide where to store the name: In each row, in the .frm file or in an extra file. All solutions have different space/speed advantages and we are waiting for a bit more usage of dynamic functions until we decide which way to go.