LevelDB storage engine

Basic feature list

  • single-statement transactions
  • secondary indexes
  • HANDLER implementation with extensions to support atomic multi-put (kind of like multi-statement transactions)
  • binlog XA on the master to be crash safe
  • crash-proof slave replication state
  • (almost) non-blocking schema change
  • full test coverage via mysql-test-run
  • hot backup
  • possible options are one LevelDB instance per mysqld, per schema, or per table

Implementation overview

One leveldb instance

The current solution is to use one LevelDB instance per mysqld process, and to prefix LevelDB keys with 'dbname.table_name', 'dbname.table_name.index_name', or some shorter equivalent.

That way, we'll be able to store an arbitrary number of tables/indexes in one LevelDB instance.
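
Below is a minimal sketch of how such prefixed keys might be built. The helper name and the plain string concatenation are illustrations only; the actual engine would likely use a shorter binary prefix:

  #include <string>
  #include "leveldb/db.h"

  // Hypothetical helper: build a LevelDB key by prefixing the engine's
  // key tuple with "dbname.table_name.index_name".
  static std::string make_key(const std::string &db_name,
                              const std::string &table_name,
                              const std::string &index_name,
                              const std::string &key_tuple)
  {
    return db_name + "." + table_name + "." + index_name + "." + key_tuple;
  }

  // Usage: make_key("test", "t1", "PRIMARY", encoded_pk);
  // Keys for different tables/indexes end up in disjoint key ranges
  // of the single LevelDB instance.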

Transactions

LevelDB supports

  • read snapshots
  • batch updates

When you have just those, there is no easy way to support the full transactional semantics required of a MySQL table engine.

The hard requirement is only to have single-statement transactions. Perhaps, we could also put limits on the size of the transaction.

(Note: the "Test implementation" is non-transactional and uses LevelDB batches, which it applies at the end of the statement. The Storage Engine API allows changes made to the table to become visible only at the end of the statement.)
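
A minimal sketch of that approach, assuming the statement's changes are collected in a leveldb::WriteBatch and applied when the statement ends (function and variable names are illustrative):

  #include "leveldb/db.h"
  #include "leveldb/write_batch.h"

  // Sketch: all changes made by one statement go into a WriteBatch,
  // which is applied atomically at the end of the statement.
  leveldb::Status apply_statement(leveldb::DB *db,
                                  const leveldb::Slice &ins_key,
                                  const leveldb::Slice &ins_val,
                                  const leveldb::Slice &del_key)
  {
    leveldb::WriteBatch stmt_batch;
    stmt_batch.Put(ins_key, ins_val);   // row written by the statement
    stmt_batch.Delete(del_key);         // row removed by the statement

    leveldb::WriteOptions wo;
    wo.sync = true;                     // make the batch durable
    // Either the whole batch is applied, or none of it is.
    return db->Write(wo, &stmt_batch);
  }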

Data encoding

LevelDB compresses its data blocks with Snappy (the SnappyCompressor).

Does this mean we can just give it keys in KeyTupleFormat, and all other columns in whatever format they happen to have in table->record[0] (except blobs)? Or do we need to design something more compact?

(Note: the datatypes in the provided benchmark are composite primary/secondary keys, INTs, and VARCHARs (are they latin1 or utf-8?).)

Secondary Indexes

Unique secondary indexes

Unique secondary indexes are KEY->VALUE mappings that use the index columns as the KEY and the Primary Key columns as the VALUE. This is the traditional approach; with it, we get

  • possible "index-only" scans
  • non-index-only scans, which will be a two-step process (access the index, then access the primary index).

(TODO: Is this needed at all? It is expensive to maintain a unique index. Having an "INSERT is REPLACE" policy for the primary key is tolerable, but having this policy for multiple indexes will cause a mess!)
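
A sketch of the two-step read path through a unique secondary index, assuming the index entry's value is the encoded primary key (names and key encoding are illustrative):

  #include <string>
  #include "leveldb/db.h"

  // Sketch: read a row through a unique secondary index.
  // The index entry maps index columns -> primary key, so a
  // non-index-only read needs a second Get() on the primary key.
  leveldb::Status read_via_sec_index(leveldb::DB *db,
                                     const std::string &sec_index_key,
                                     std::string *row)
  {
    leveldb::ReadOptions ro;
    std::string pk;                              // value = encoded primary key
    leveldb::Status s = db->Get(ro, sec_index_key, &pk);
    if (!s.ok())
      return s;                                  // an index-only scan would stop here
    return db->Get(ro, pk, row);                 // step 2: fetch the row by primary key
  }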

Non-unique secondary indexes

.

Non-blocking schema changes

  • There is a requirement that doing schema changes does not block other queries from running.
  • Reclaiming space immediately after some parts of data were dropped is not important.

Possible approaches we could use:

  • A record format that supports multiple versions. That way, adding/modifying/dropping a non-indexed column may be instant. Note that this is applicable to records, not keys. (A sketch of such a format follows this list.)
  • Background creation/dropping of indexes.
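
One way the multi-version record format could look. This layout is an assumption for illustration, not a decided-on format:

  #include <cstdint>
  #include <string>

  // Assumed layout: one leading format-version byte, then the columns
  // packed according to that version of the table definition.
  struct DecodedRecord {
    uint8_t format_version;   // which table definition wrote this row
    std::string columns;      // packed column data for that version
  };

  // Rows written before an ALTER keep their old version byte; columns
  // added later are filled in with defaults when such a row is read.
  static DecodedRecord decode_record(const std::string &value)
  {
    DecodedRecord rec;
    rec.format_version = static_cast<uint8_t>(value[0]);
    rec.columns = value.substr(1);
    return rec;
  }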

Hot backup

Hot backup will be made outside of this project. The idea is to hard-link the files so that they can't be deleted by the compaction process, and then copy them over.
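
A sketch of the hard-link idea using POSIX link(). The paths and the helper name are illustrative, and hard links only work when the backup directory is on the same filesystem as the data:

  #include <unistd.h>
  #include <cerrno>
  #include <cstdio>

  // Sketch: link an SSTable into a backup directory so compaction
  // cannot physically delete its data, then copy the linked file later.
  static int pin_file_for_backup(const char *sst_path, const char *backup_path)
  {
    if (link(sst_path, backup_path) != 0) {
      perror("link");
      return errno;
    }
    return 0;   // the data survives even if compaction unlinks the original
  }

  // Usage: pin_file_for_backup("/data/leveldb/000123.ldb",
  //                            "/backup/000123.ldb");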

SQL Command mapping for LevelDB

INSERT

INSERTs will translate into DB::Put() calls. This means one will not get an error when inserting data that is already there; INSERT will work like REPLACE. (Q: is this ok?)

A single-row INSERT will translate into a db->Put() call with WriteOptions(sync=true).

Multi-row INSERTs will be batched. The storage engine API provides start_bulk_insert()/end_bulk_insert() methods to aid insert batching. Batches will be no bigger than @@leveldb_max_batch_size; if the INSERT is bigger, it will use multiple batches.

If an INSERT uses multiple batches and encounters an error on a non-first batch, it is not possible to fully roll it back. In other words, INSERTs bigger than a certain size become non-transactional.
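
A sketch of both cases, with max_batch_size standing in for @@leveldb_max_batch_size. The row count is used as the batch limit here purely for illustration; the real limit might well be in bytes:

  #include <cstddef>
  #include <string>
  #include <utility>
  #include <vector>
  #include "leveldb/db.h"
  #include "leveldb/write_batch.h"

  // Single-row INSERT: one synchronous Put(). It silently overwrites an
  // existing key, i.e. it behaves like REPLACE.
  leveldb::Status insert_one(leveldb::DB *db,
                             const leveldb::Slice &key,
                             const leveldb::Slice &value)
  {
    leveldb::WriteOptions wo;
    wo.sync = true;
    return db->Put(wo, key, value);
  }

  // Multi-row INSERT: rows are grouped into batches of at most
  // max_batch_size. A failure on a later batch cannot undo batches
  // that were already written.
  leveldb::Status insert_many(leveldb::DB *db,
                              const std::vector<std::pair<std::string, std::string> > &rows,
                              std::size_t max_batch_size)
  {
    leveldb::WriteOptions wo;
    wo.sync = true;
    leveldb::WriteBatch batch;
    std::size_t in_batch = 0;
    for (std::size_t i = 0; i < rows.size(); i++) {
      batch.Put(rows[i].first, rows[i].second);
      if (++in_batch == max_batch_size || i + 1 == rows.size()) {
        leveldb::Status s = db->Write(wo, &batch);
        if (!s.ok())
          return s;       // earlier batches stay applied (non-transactional)
        batch.Clear();
        in_batch = 0;
      }
    }
    return leveldb::Status::OK();
  }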

SELECT

UPDATE

UPDATEs will be run as reads-followed-by-writes.
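
A sketch of that read-modify-write cycle; apply_set_clause() is a hypothetical stand-in for the server-side code that applies the SET clause to the row image:

  #include <string>
  #include "leveldb/db.h"

  void apply_set_clause(std::string *row);   // hypothetical: applies the SET clause

  // Sketch: an UPDATE reads the current row image, modifies it, and
  // writes the new image back under the same key.
  leveldb::Status update_row(leveldb::DB *db, const leveldb::Slice &key)
  {
    std::string row;
    leveldb::Status s = db->Get(leveldb::ReadOptions(), key, &row);
    if (!s.ok())
      return s;                     // row not found, or read error

    apply_set_clause(&row);         // modify the row image in memory

    leveldb::WriteOptions wo;
    wo.sync = true;
    return db->Put(wo, key, row);   // write the updated image back
  }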

DELETE

?

ALTER TABLE

.

Other details

  • The target version is MySQL 5.6 (good, because LevelDB API uses STL and 5.6-based versions support compiling with STL).
  • It is ok to make changes to LevelDB itself.
