Comments - Cassandra Storage Engine Overview

9 years, 8 months ago Andrew Galewsky

I have built a simple example using a dictionary of words, with word as the primary key.

In MariaDB, if I use this query:

select * from words where word='hello';

1 row in set (0.07 sec)

This works as expected, using the primary key.

Then I run:

select * from words where word > 'hello' and word < 'help';

54 rows in set (4.62 sec)

That is roughly the same time as a full table scan (based on other experiments), so it appears the primary key is not being used in this case.

Am I doing something wrong, or misunderstanding something fundamental?

-Andy

 
9 years, 8 months ago Sergei Petrunia

What does EXPLAIN show for the word > 'hello' query? I would guess it shows type=ALL, which means a full table scan.

Internally, Cassandra's Thrift API for primary key lookups is:

list<ColumnOrSuperColumn> get_slice(binary key, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)

(see http://wiki.apache.org/cassandra/API10). Note that you have to provide a single key value, not a range of values.
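To make the consequence concrete, here is a minimal sketch, assuming a backend whose only keyed read is a single-key fetch in the spirit of get_slice(). The class SingleKeyStore and the functions select_eq and select_between are hypothetical illustrations, not the real Thrift client or the Cassandra storage engine code. An equality predicate maps to one fetch; a range predicate cannot be expressed as key lookups, so every row has to be read and filtered on the SQL side.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class SingleKeyStore:
    """Hypothetical backend whose only keyed read takes exactly one key,
    loosely modelled on Thrift get_slice(key, ...). Not the real API."""

    def __init__(self, rows: Dict[str, dict]):
        self._rows = rows
        self.rows_read = 0  # rows handed back to the SQL layer

    def get_slice(self, key: str) -> Optional[dict]:
        # One exact key in, at most one row out.
        row = self._rows.get(key)
        if row is not None:
            self.rows_read += 1
        return row

    def scan_all(self) -> Iterator[Tuple[str, dict]]:
        # No key range is accepted, so the only alternative is every row.
        for key, row in self._rows.items():
            self.rows_read += 1
            yield key, row


def select_eq(store: SingleKeyStore, word: str) -> List[dict]:
    # word = 'hello' becomes a single point lookup.
    row = store.get_slice(word)
    return [] if row is None else [row]


def select_between(store: SingleKeyStore, low: str, high: str) -> List[dict]:
    # word > low AND word < high cannot be turned into get_slice() calls,
    # so all rows are read and filtered here: a full table scan.
    return [row for key, row in store.scan_all() if low < key < high]
```

With six words loaded, select_eq(store, "hello") reads one row, while select_between(store, "hello", "help") reads all six to return two, which matches the timing pattern in the original question.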

There is a get_range_slices() call, which looks as if it returns records in a range, but there is a catch. The Cassandra docs say:

"Note that when using RandomPartitioner, keys are stored in the order of their MD5 hash, making it impossible to get a meaningful range of keys between two endpoints."

Since RandomPartitioner is the partitioner typically used with Cassandra, we have to conclude that ordered scans on the primary key are not possible in Cassandra (at least not efficiently).
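The effect of hash ordering can be illustrated with a small sketch. It uses a simplified token model, which is an assumption: the MD5 digest of the key read as an unsigned big-endian integer. RandomPartitioner also derives its tokens from MD5, but its exact conversion differs in detail. The helper names md5_token, token_order and range_query are hypothetical.

```python
import hashlib
from typing import List, Tuple


def md5_token(key: str) -> int:
    # Simplified token (assumption): MD5 digest as an unsigned big-endian int.
    # Cassandra's RandomPartitioner is MD5-based, but its conversion differs.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def token_order(keys: List[str]) -> List[str]:
    # Order in which a hash-partitioned store lays out its keys.
    return sorted(keys, key=md5_token)


def range_query(keys: List[str], low: str, high: str) -> Tuple[List[str], int]:
    # Keys with low < key < high can land anywhere in token order, so no
    # contiguous token range covers them: every key must be examined.
    # Returns (matching keys, number of keys examined).
    matches: List[str] = []
    examined = 0
    for key in token_order(keys):
        examined += 1
        if low < key < high:
            matches.append(key)
    return matches, examined
```

For example, the lexical order of "a", "abc" and "hello" is a < abc < hello, but their MD5 digests start with 0c, 90 and 5d respectively, so token_order returns them as a, hello, abc. Neighbouring words are scattered, which is why a range like word > 'hello' and word < 'help' degrades to a scan.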

 
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.