Cassandra storage engine
This page is a documentation for Cassandra storage engine which is a work in progress.
This page describes a feature that's under development. The feature has not been released (even in beta), its interface and function may change, etc.
Contents
If you prefer a hands-on approach, cassandra-storage-engine-use-example shows an example session. Otherwise, there is a manual-style description below.
Data mapping
MariaDB's table represents a column family in Cassandra. The table must follow this pattern:
create table cassandra_tbl -- name can be chosen at will ( rowkey char(N), -- First field must be named 'rowkey'. This -- is what Cassandra's row key is mapped to. column1 varchar(N), -- columns go here. Names should match -- Cassandra's column names ... PRIMARY KEY(rowkey) -- Primary key over 'rowkey' is mandatory ) engine=cassandra thrift_host='192.168.1.0' keyspace= 'cassandra_key_space' column_family='column_family_name';
MariaDB's table is a view of Cassandra's column family.
- column family must exist before the MariaDB table can be created
- dropping MariaDB's table will not drop the column family.
Mapping for columns
Cassandra has three kinds of columns
- The row key. It is not considered to be a column by Cassandra, but MySQL/MariaDB can only store data in columns, so we consider it a special kind of column.
- Columns that were present in static column family declaration. These columns have a defined name/data type.
- "Ad-hoc" columns that can be encountered in individual rows.
note: Cassandra's supercolumns are not handled at the moment. If we need to handle them, we'll make them dynamic-column blob. Cassandra's counter columns will be handled, and will be mapped to regular SQL columns.
When these three kinds of columns are mapped to SQL:
- The row key is mapped into a regular column named
rowkey
- Static column family members are mapped to regular SQL columns (i.e, column named 'foo' in Cassandra will map to column named 'foo' in SQL. If SQL's CREATE TABLE doesn't match Cassandra's columns, a warning is issued.
- Ad-hoc columns are all put into a regular SQL column which has a Dynamic Columns blob. This allows us to return arbitrary sets of columns within one row.
Mapping for datatypes
There is no direct 1-to-1 mapping between Cassandra data types and MySQL datatypes. Also, Cassandra's limitations are generally more lax than MySQL/MariaDB's: for example, Cassandra's limit on rowkey is about 2G, while MySQL's is about 1.5Kb.
Mapping between Cassandra types and MySQL types are as follows:
Cassandra | MariaDB |
---|---|
blob | VARBINARY(n) |
ascii | VARCHAR(n) .. use charset=latin1 |
text | VARCHAR(n) .. use charset=utf8 |
varint | BIGINT, VARCHAR(N) |
int | INT |
bigint | BIGINT |
uuid, | ?? todo: figure out thrift representation?? |
timestamp | DATETIME, TIMESTAMP |
boolean | BOOL, BOOLEAN |
float | FLOAT |
double | DOUBLE |
decimal | ?? todo: figure out thrift representation?? |
counter | BIGINT |
SQL's NULL means that "corresponding column is not present in Cassandra column family".
(TODO: what about the scenario where we already have an SQL DDL and just want to use Cassandra for storage? Perhaps, we could allow arbitrary SQL types, which are stored as strings (or even as binary) in cassandra's blob
columns?)
{TODO: Mapping for rowkey.}
Command mapping
INSERT
Cassandra doesn't provide any practical way to make INSERT different from UPDATE. Therefore, INSERT works as INSERT-or-UPDATE, it will overwrite the data, if necessary.
INSERT ... SELECT
and multi-line INSERT will try to write data in batches. Batch size is controlled by @@cassandra_insert_batch_size
system variable, which specifies max. batch size in records.
Status variables Cassandra_row_inserts
and Cassandra_row_insert_batches
allow one to see whether inserts are actually batched.
UPDATE
Currently, all UPDATE commands will fail (you can use INSERT instead) (TODO: make UPDATE work)
DELETE
DELETE FROM cassandra_table
will map totruncate(column_family)
- DELETE with WHERE clause will do regular row deletions, and, just like
del
command in cassandra-cli, may leave a row in a weird state where the row is present but all columns are NULL. (TODO: does this live till the nearest compaction?) (TODO: CQL will ignore rows that do not have all needed columns. Maybe, we should do so, too?)
SELECTs
Generally, all SELECT statement should like one expects in SQL. Conditions in form rowkey=....
allow the server to construct query plan with efficient lookups.
Batched Key Access support
Cassandra supports Batched Key Access in no-association mode. This means that it requires the SQL layer to do hashing, which means the following settings are required:
- optimizer_switch='join_cache_hashed=on'
- join_cache_level=7|8
Cassandra SE is unable to use parts of join buffer (size controlled by @@join_buffer_size). This buffer will be used exclusively by the SQL layer. It is also possible (and recommended) to limit size of read batches by setting @@cassandra_multiget_batch_size
. It specifies max. read size in records.
It is possible to track number of batches, how many keys were looked-up, and how many results were produced with these status variables:
Variable_name Value Cassandra_multiget_reads 0 Cassandra_multiget_keys_scanned 0 Cassandra_multiget_rows_read 0
How to try it out
The code is at lp:maria-captains/maria/5.5-cassandra .
Build process is not fully streamlined yet. You'll need to
- Install Cassandra (I used 1.1.2, judging from changelog, 1.1.3 should work too)
- Install Thrift (I used 0.8.0), only C++ backend is needed.
- edit
storage/cassandra/CMakeLists.txt
and modify INCLUDE_DIRECTORIES directive to point to Thrift's include directory. export LIBS="-lthrift"
, on another machine it was "-lthrift -ldl"export LDFLAGS=-L/path/to/thrift/libs
- Build the server
Testing can be done as follows:
- Start Cassandra on localhost
- Look through
mysql-test/t/cassandra.test
see the required CQL initialization commands and run them incqlsh -3
. - Run
./mysql-test-run t/cassandra.test
See also
- MDEV-431 - jira task for the cassandra storage engine
- cassandra-storage-engine-future-plans
- cassandra-storage-engine-use-example
- cassandra-storage-engine-issues
- hbase-storage-engine