History Source Flag as Spam / Inappropriate Translate

Created: 11 years, 11 months ago
Modified: 11 years, 11 months ago
Type: article
Status: active
License: CC BY-SA / Gnu FDL

Attachments

Localized Versions

Cassandra [it]

Product Versions

MariaDB starting with 10.0.1
MariaDB 5.5.27

Cassandra storage engine

You are viewing an old version of this article. View the current version here.

This page is a documentation for Cassandra storage engine which is a work in progress.

This page describes a feature that's under development. The feature has not been released (even in beta), its interface and function may change, etc.

Data mapping
1. Mapping for columns ===
2. Mapping for datatypes
Command mapping
System and status variables
TODO
How to try it out
See also

If you prefer a hands-on approach, cassandra-storage-engine-use-example shows an example session. Otherwise, there is a manual-style description below.

Data mapping

MariaDB's table represents a column family in Cassandra. The table must follow this pattern:

create table cassandra_tbl               -- name can be chosen at will
(
  rowkey  char(N),                       -- First field must be named 'rowkey'. This
                                         --   is what Cassandra's row key is mapped to.

  column1    varchar(N),                 -- columns go here. Names should match
                                         -- Cassandra's column names 
  ... 
  PRIMARY KEY(rowkey)                    -- Primary key over 'rowkey' is mandatory
) engine=cassandra 
  thrift_host='192.168.1.0' 
  keyspace= 'cassandra_key_space'
  column_family='column_family_name';

MariaDB's table is a view of Cassandra's column family.

column family must exist before the MariaDB table can be created
dropping MariaDB's table will not drop the column family.

Mapping for columns

Cassandra has three kinds of columns

The row key. It is not considered to be a column by Cassandra, but MySQL/MariaDB can only store data in columns, so we consider it a special kind of column.
Columns that were present in static column family declaration. These columns have a defined name/data type.
"Ad-hoc" columns that can be encountered in individual rows.

note: Cassandra's supercolumns are not handled at the moment. If we need to handle them, we'll make them dynamic-column blob. Cassandra's counter columns will be handled, and will be mapped to regular SQL columns.

When these three kinds of columns are mapped to SQL:

The row key is mapped into a regular column named rowkey
Static column family members are mapped to regular SQL columns (i.e, column named 'foo' in Cassandra will map to column named 'foo' in SQL. If SQL's CREATE TABLE doesn't match Cassandra's columns, a warning is issued.
Ad-hoc columns are all put into a regular SQL column which has a Dynamic Columns blob. This allows us to return arbitrary sets of columns within one row.

Mapping for datatypes

There is no direct 1-to-1 mapping between Cassandra data types and MySQL datatypes. Also, Cassandra's limitations are generally more lax than MySQL/MariaDB's: for example, Cassandra's limit on rowkey is about 2G, while MySQL's is about 1.5Kb.

Mapping between Cassandra types and MySQL types are as follows:

Cassandra	MariaDB
blob	VARBINARY(n)
ascii	VARCHAR(n) .. use charset=latin1
text	VARCHAR(n) .. use charset=utf8
varint	BIGINT, VARCHAR(N)
int	INT
bigint	BIGINT
uuid,	?? todo: figure out thrift representation??
timestamp	DATETIME, TIMESTAMP
boolean	BOOL, BOOLEAN
float	FLOAT
double	DOUBLE
decimal	?? todo: figure out thrift representation??
counter	BIGINT

SQL's NULL means that "corresponding column is not present in Cassandra column family".

(TODO: what about the scenario where we already have an SQL DDL and just want to use Cassandra for storage? Perhaps, we could allow arbitrary SQL types, which are stored as strings (or even as binary) in cassandra's blob columns?)

{TODO: Mapping for rowkey.}

Command mapping

Cassandra (unlike HBase) doesn't have any distinction between INSERT and UPDATE. We support INSERTs currently.
- TODO: batch the INSERTs together (we're already using batch_mutate so it should be easy)

SELECT ... WHERE rowkey=... makes lookup.
- TODO: Support Multi-Range-read

SELECT * FROM cassandra_table runs a full scan.

System and status variables

TODO: need Ha_xxx counters for # of various Cassandra calls. TODO: need a system variable to control how many records should be written/read in one batch.

TODO

More details here.

How to try it out

The code is at lp:maria-captains/maria/5.5-cassandra .

Build process is not fully streamlined yet. You'll need to

Install Cassandra (I used 1.1.2, judging from changelog, 1.1.3 should work too)
Install Thrift (I used 0.8.0), only C++ backend is needed.
edit storage/cassandra/CMakeLists.txt and modify INCLUDE_DIRECTORIES directive to point to Thrift's include directory.
export LIBS="-lthrift", on another machine it was "-lthrift -ldl"
export LDFLAGS=-L/path/to/thrift/libs
Build the server

Testing can be done as follows:

Start Cassandra on localhost
Look through mysql-test/t/cassandra.test see the required CQL initialization commands and run them in cqlsh -3.
Run ./mysql-test-run t/cassandra.test

Comments

Comments loading...

Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.

Cassandra storage engine

Contents

Data mapping

Mapping for columns

Mapping for datatypes

Command mapping

System and status variables

TODO

How to try it out

See also

Comments

Products

Services

Pricing

Resources

About Us

Download MariaDB