LevelDB storage engine development
Items that are considered for development.
Contents
AUTO_INCREMENT
We need to support AUTO_INCREMENT.
(I am investigating what storage engine needs to implement for it).
memcmp'able keys
Current way to compare keys (find the table DDL in the hash, then use ha_key_cmp()) is likely to be slow.
The advantages of current scheme are
- It is possible to restore field values from index tuple, which means index-only scans are possible.
- Keys are "packed" - for example, endspace is removed from CHAR(n) fields.
If we switched to keys that were comparable with memcmp(), one could expect comparisons to become faster.
Making keys comparable
Falcon SE
Falcon did use memcmp() to compare index tuples. Looking into the source (it is available for download still), one can see the comparison being somewhere around:
void Index::makeKey(Field *field, Value *value, int segment, IndexKey *indexKey, bool highKey) void Index::makeMultiSegmentKey(int count, Value **values, IndexKey *indexKey, bool highKey) ... void IndexKey::appendNumber(double number) ^^ makes double numbers memcmp'able...
unfortunately, there is no single, isolated part of code that we could copy. (Or may be there is, but we were not able to find it yet).
Field::make_sort_key
Found this in the source:
/** Writes a copy of the current value in the record buffer, suitable for sorting using byte-by-byte comparison. Integers are always in big-endian regardless of hardware architecture. At most length bytes are written into the buffer. @param buff The buffer, assumed to be at least length bytes. @param length Number of bytes to write. */ virtual void make_sort_key(uchar *buff, uint length) = 0;
Looks like this is exactly what we needed?
Use table/index numbers as prefixes
Currently, keys are prefixed with
dbname.table_name$INDEX_NO\0
where INDEX_NO is one byte with the number of index, counting from 1.
With this:
- Renaming a table is very slow. This is a problem, because ALTER TABLE assumes table rename is a fast, O(1) operation.
- DROP TABLE needs to delete every row, it is not possible to do the deletions later in the background. If one wants to run
DROP TABLE t; CREATE TABLE t; ...
then CREATE TABLE will have to wait until DROP has truly finished.
- Dropping a table and creating another table with different DDL causes deleted records of the first table to be compared with DDL of the second. This particular issue (but not others) can be avoided if we switch to keys that are compared with memcmp().
Proposed solution
For keys, store
[table-nr] table_record
the data dictionary will store mappings:
{ (table_name -> table-nr) // 1 (table-nr -> table-DDL) // 2 }
- #2 will be used for comparisons.
- #1 will tell us what rows to read when SQL layer accesses table by its name.
- DROP TABLE will remove the (table_name -> table-nr) entry, and then delete rows.
- RENAME TABLE will add an entry with(new_table_name -> table-nr) and remove (old_table_name -> table-nr) entry. TODO: do we need this to be crash-safe?
- TRUNCATE TABLE will remove the (table_name -> table-nr) entry and add a new (table_name -> new_table-nr) entry. then, it will delete rows.
Fewer mutexes
Current code initializes/uses a mutex for every row lock taken. According to Monty, having lots of mutexes that are spread out all over the memory will slow things down, and we should switch to fewer mutexes (this is a basic description).
Maybe, it makes sense to use mutex/condition from waiter's struct st_my_thread_var.