Cassandra storage engine future plans
Unlike MySQL/MariaDB, Cassandra is not suitable for transaction processing. (the only limited scenario that they handle well is append-only databases)
Instead, they focus on ability to deliver real-time analytics. This is achieved via the following combination:
1. Insert/update operations are reduced to inserting a newer version, their implementation (SSTree) allows to make lots of updates at a low cost
2. The data model is targeted at denormalized data, cassandra docs and user stories all mention the practice of creating one column family (=table) for each type of query you're going to have.
In other words, Cassandra encourages creation/use of materialized VIEWs. Having lots of materialized VIEWs makes updates expensive, but their low cost and non-conflicting nature should offset that.
How does one use Cassandra together with an SQL database? I can think of these use cases:
use case 1
- The "OLTP as in SQL" is kept in the SQL database. - Data from logging (such as web views, or clicks) is stored in Cassandra, but can also be accessed from SQL.
As an example, one can think of a web shop, which provides real-time peeks into analytics data, like amazon's "people who looked at this item, also looked at ..." , or "*today*'s best sellers in this category are ...", etc
Generally, CQL (Cassandra Query Language) allows to query Cassandra's data in an SQL-like fashion.
Access through a storage engine will additionally allow: - to get all of the data from one point, instead of rwo - joins betwen SQL and Cassandra's data *might* be more efficent due to Batched Key Access (this remains to be seen) - ??
use case 2
Suppose, all of the system's data is actually stored in an OLTP SQL database.
Cassandra is only used as an accelerator for analytical queries. Cassandra won't allow arbitrary, ad-hoc dss-type queries, it will require the DBA to create and maintain appropriate column families (however, it is supposed to give a nearly-instant answers to analytics-type questions).
Tasks that currently have no apparent solutions: - There is no way to replicate data from MySQL/MariaDB into Cassandra. It would be nice if one could update data in MySQL and that would cause appropriate inserts made into all relevant column families in Cassandra. - ...