Cassandra storage engine


This page documents the Cassandra storage engine, which is a work in progress.

This page describes a feature that is under development. The feature has not been released (even in beta), and its interface and behavior may change.

If you prefer a hands-on approach, cassandra-storage-engine-use-example shows an example Cassandra storage engine session. This page is more exhaustive and up-to-date, though.

Data mapping

A MariaDB table represents a column family in Cassandra. The table must follow this pattern:

create table cassandra_tbl      -- Table name can be chosen at will
(
  rowkey  type PRIMARY KEY,     -- represents Column Family's row key. Primary key
                                -- must be defined over the column.

  column1 type,                 -- Names of columns must match the names of CQL's 
  column2 type,                 -- columns; datatypes must also "match", see below
                                -- for what this means
  ... 
) engine=cassandra
  thrift_host='192.168.1.0' 
  keyspace= 'cassandra_key_space'
  column_family='column_family_name';

A MariaDB table represents a view of a Cassandra Column Family. That is,

  • Creating the table will not create a Column Family. The Column Family must exist in Cassandra before the MariaDB table can be created (see the example below).
  • Dropping the table will not drop the Column Family.
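
A minimal sketch of this order of operations, assuming a keyspace named mariadbtest already exists and Cassandra is reachable on 127.0.0.1 (the column family, column names, and host below are all illustrative):

-- In cqlsh (CQL 3), create the column family first:
CREATE COLUMNFAMILY cf1 (rowkey varchar PRIMARY KEY, data1 varchar, data2 bigint);

-- Then, in MariaDB, declare the table that maps onto it:
create table cf1
(
  rowkey  varchar(36) primary key,
  data1   varchar(60),
  data2   bigint
) engine=cassandra
  thrift_host='127.0.0.1'
  keyspace='mariadbtest'
  column_family='cf1';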

Mapping for columns

Cassandra has the following kinds of columns:

  1. The row key. The Cassandra API does not consider it to be a column, but CQL and MariaDB do.
    • If the Column Family defines a name for the row key (always true when using CQL), then the name of the primary key column in MariaDB must match it.
    • If the Column Family doesn't define a name for the row key, MariaDB's column must be named "rowkey" (see the sketch after this list).
  2. Columns that are defined as part of Cassandra's "static column family" (this includes columns that one can define with CQL). These columns should have their counterparts in MariaDB: a static column named 'foo' in Cassandra must have a counterpart named 'foo' in MariaDB. The types must also match; type mapping is covered below.
  3. "Ad-hoc" columns that can be encountered in individual rows. These will be mapped into one blob column with Dynamic Columns content. This is not yet implemented, but is being actively worked on.
  4. Counter columns are not supported yet, but will be supported very soon.
  5. Cassandra's SuperColumns are not supported, and we have no plans to support them.
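
To illustrate the row key naming rule, here is a sketch assuming a column family legacy_cf whose row key has no name defined; in that case the MariaDB primary key column must literally be called rowkey (all other names are hypothetical):

create table legacy_cf
(
  rowkey varchar(36) primary key,  -- the CF defines no row key name, so the column must be named "rowkey"
  col1   varchar(100)              -- static column 'col1' must exist under the same name in the CF
) engine=cassandra
  thrift_host='127.0.0.1'
  keyspace='cassandra_key_space'
  column_family='legacy_cf';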

Mapping for datatypes

There is no direct 1-to-1 mapping between Cassandra's data types and MySQL datatypes. Also, Cassandra's size limits are often more relaxed than MySQL/MariaDB's; for example, Cassandra's limit on a row key is about 2GB, while MySQL's is about 1.5KB.

The mapping between Cassandra types and MySQL types is as follows:

Cassandra  | MariaDB                                      | Already supported?
blob       | VARBINARY(n)                                 | Yes, to [VAR]CHAR or [VAR]BINARY*
ascii      | VARCHAR(n), use charset=latin1               | Yes, to [VAR]CHAR or [VAR]BINARY*
text       | VARCHAR(n), use charset=utf8                 | Yes, to [VAR]CHAR or [VAR]BINARY*
varint     | BIGINT, VARCHAR(N)                           |
int        | INT                                          | Yes
bigint     | BIGINT                                       | Yes, also to TINY and SHORT
uuid       | CHAR(36), text representation                | Not yet, high priority
timestamp  | TIMESTAMP                                    | Yes, to TIMESTAMP with seconds or microseconds precision (Cassandra stores milliseconds)
boolean    | BOOL, BOOLEAN                                |
float      | FLOAT                                        | Yes
double     | DOUBLE                                       | Yes
decimal    | ?? (todo: figure out thrift representation)  |
counter    | BIGINT                                       | Not yet, high priority

(*) - charsets/collations are not strictly enforced, yet.
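
As a sketch of how several of the mappings above could look in a single declaration (the column family and all column names are hypothetical; the charset clauses follow the table above):

create table typed_cf
(
  rowkey  varchar(36) primary key,      -- Cassandra 'text' row key
  code    varchar(50) charset latin1,   -- Cassandra 'ascii' column
  name    varchar(100) charset utf8,    -- Cassandra 'text' column
  amount  bigint,                       -- Cassandra 'bigint' column
  ratio   double,                       -- Cassandra 'double' column
  created timestamp                     -- Cassandra 'timestamp' column
) engine=cassandra
  thrift_host='127.0.0.1'
  keyspace='cassandra_key_space'
  column_family='typed_cf';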

SQL's NULL means "the corresponding column is not present in the Cassandra column family". There are two possible approaches to handling this; see cassandra-storage-engine-issues.

(TODO: what about the scenario where we already have SQL DDL and just want to use Cassandra for storage? Perhaps we could allow arbitrary SQL types, which would be stored as strings (or even as binary) in Cassandra's blob columns?)

Command mapping

INSERT

Cassandra doesn't provide any practical way to make INSERT different from UPDATE. Therefore, INSERT works as INSERT-or-UPDATE and will overwrite existing data, if necessary.

INSERT ... SELECT and multi-row INSERTs will try to write data in batches. The batch size is controlled by the @@cassandra_insert_batch_size system variable, which specifies the maximum batch size in records.

The status variables Cassandra_row_inserts and Cassandra_row_insert_batches allow one to see whether inserts are actually batched.
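
A sketch of controlling and observing the batching, using the hypothetical cassandra_tbl from the mapping example above and assuming the variable has session scope (the value 100 is illustrative):

-- Write up to 100 records per batch
SET cassandra_insert_batch_size=100;

INSERT INTO cassandra_tbl (rowkey, column1, column2)
VALUES ('key1', 'val1a', 'val1b'), ('key2', 'val2a', 'val2b');

-- See how many rows were inserted and in how many batches
SHOW STATUS LIKE 'Cassandra_row_insert%';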

UPDATE

Currently, all UPDATE commands will fail (you can use INSERT instead). (TODO: make UPDATE work)
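
Until UPDATE works, an overwrite can be expressed as an INSERT, as in this sketch (names and values are the hypothetical ones used above):

-- Overwrites column1 of the row with key 'key1' instead of updating it
INSERT INTO cassandra_tbl (rowkey, column1) VALUES ('key1', 'new value');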

DELETE

  • DELETE FROM cassandra_table will map to truncate(column_family)
  • DELETE with a WHERE clause will do regular row deletions and, just like the del command in cassandra-cli, may leave a row in a weird state where the row is present but all of its columns are NULL. (TODO: does this live until the nearest compaction?) (TODO: CQL will ignore rows that do not have all needed columns. Maybe we should do so, too?) Both forms are illustrated in the sketch below.
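
For illustration, with the hypothetical cassandra_tbl used above:

-- Maps to truncate(column_family): removes all rows
DELETE FROM cassandra_tbl;

-- Regular deletion of a single row by its key
DELETE FROM cassandra_tbl WHERE rowkey='key1';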

SELECTs

Generally, SELECT statements work as one expects in SQL. Conditions of the form primary_key=... allow the server to construct query plans that access Cassandra's rows with key lookups.
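
For example (again with the hypothetical cassandra_tbl), a primary-key equality condition lets the plan use a Cassandra key lookup rather than a scan; EXPLAIN can be used to check which access method was chosen:

SELECT * FROM cassandra_tbl WHERE rowkey='key1';

EXPLAIN SELECT * FROM cassandra_tbl WHERE rowkey='key1';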

Full table scan

These are done in a memory-efficient way. Cassandra SE splits a table scan into batches that retrieve @@cassandra_rnd_batch_size records at a time.
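
A sketch of adjusting the scan batch size before a full scan (the value is illustrative, and session scope for the variable is assumed):

-- Fetch 10000 records per call during the scan
SET cassandra_rnd_batch_size=10000;

SELECT COUNT(*) FROM cassandra_tbl;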

Batched Key Access support

Cassandra SE supports Batched Key Access in no-association mode. This means it requires the SQL layer to do the hashing, so the following settings are required:

  • optimizer_switch='join_cache_hashed=on'
  • join_cache_level=7|8

Cassandra SE is unable to use parts of the join buffer (whose size is controlled by @@join_buffer_size); that buffer will be used exclusively by the SQL layer. It is also possible (and recommended) to limit the size of read batches by setting @@cassandra_multiget_batch_size, which specifies the maximum read batch size in records.
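
Putting these settings together, a session could be prepared roughly as follows before joining an ordinary table to a Cassandra table (local_tbl with its id and ref_key columns, and the batch size value, are hypothetical):

SET optimizer_switch='join_cache_hashed=on';
SET join_cache_level=7;

-- Limit how many keys are sent to Cassandra per multiget request
SET cassandra_multiget_batch_size=100;

SELECT l.id, c.column1
FROM local_tbl l
JOIN cassandra_tbl c ON c.rowkey = l.ref_key;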

It is possible to track the number of batches, how many keys were looked up, and how many result rows were produced with these status variables:

Variable_name	Value
Cassandra_multiget_reads	0
Cassandra_multiget_keys_scanned	0
Cassandra_multiget_rows_read	0
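
The counters above can be inspected at any point with, for example:

SHOW STATUS LIKE 'Cassandra_multiget%';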

How to try it out

The code is at lp:maria-captains/maria/5.5-cassandra .

The build process is not fully streamlined yet. You'll need to:

  • Install Cassandra (I used 1.1.2; judging from the changelog, 1.1.3 should work too)
  • Install Thrift (I used 0.8.0); only the C++ backend is needed.
  • Edit storage/cassandra/CMakeLists.txt and modify the INCLUDE_DIRECTORIES directive to point to Thrift's include directory.
  • export LIBS="-lthrift" (on another machine it was "-lthrift -ldl")
  • export LDFLAGS=-L/path/to/thrift/libs
  • Build the server

Testing can be done as follows:

  • Start Cassandra on localhost
  • Look through mysql-test/t/cassandra.test to see the required CQL initialization commands and run them in cqlsh -3.
  • Run ./mysql-test-run t/cassandra.test
