HBase storage engine

Data mapping from HBase to SQL

HBase data model and operations

1.1 HBase data model

  • An HBase table consists of rows, which are identified by row key.
  • Each row has an arbitrary (potentially, very large) number of columns.
  • Columns are grouped into column families. A column family defines how its columns are stored, so not reading column families you don't need is an optimization.
  • Each (row, column) combination can have multiple versions of the data, identified by timestamp.
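The data model above can be pictured as a nested map: row key, then column, then timestamp. The following is a minimal in-memory sketch in Python, not HBase code. The helper names `put` and `latest`, the use of a (family, qualifier) pair to name a column, and the sample rows are all assumptions made for illustration.

```python
# Toy model of the HBase data model (illustrative only, not the HBase API):
#   table[row_key][(family, qualifier)][timestamp] = value

def put(table, row_key, family, qualifier, timestamp, value):
    """Store one version of a cell; rows and columns are created on demand."""
    cells = table.setdefault(row_key, {})
    versions = cells.setdefault((family, qualifier), {})
    versions[timestamp] = value

def latest(table, row_key, family, qualifier):
    """Return the most recent version of a cell, or None if it does not exist."""
    versions = table.get(row_key, {}).get((family, qualifier), {})
    if not versions:
        return None
    return versions[max(versions)]

table = {}
# Rows need not share columns: each row has its own, arbitrary column set.
put(table, "user1", "info", "name", 100, "Ann")
put(table, "user1", "info", "name", 200, "Anna")   # a newer version of the same cell
put(table, "user1", "stats", "logins", 150, "7")
put(table, "user2", "info", "email", 120, "bob@example.com")

print(latest(table, "user1", "info", "name"))
```

Here the cell ("user1", info:name) holds two versions, and `latest` picks the one with the highest timestamp.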

1.2 HBase read operations

The HBase API defines two ways to read data:

  • Point lookup: get record for a given row_key.
  • Range scan: read all records in the [startRow, stopRow) range.

Both kinds of reads allow specifying:

  • A column family we're interested in
  • A particular column we're interested in
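The two read operations and their optional column restrictions can be sketched against a toy table. This is a hedged illustration in plain Python rather than the HBase client API: the functions `get` and `scan`, their parameters, and the sample data are hypothetical, and row keys are modeled as strings kept in sorted order.

```python
from bisect import bisect_left

def get(table, row_key, family=None, qualifier=None):
    """Point lookup: newest value of each matching column in a single row."""
    result = {}
    for (fam, qual), versions in table.get(row_key, {}).items():
        if family is not None and fam != family:
            continue  # restricted to one column family
        if qualifier is not None and qual != qualifier:
            continue  # restricted to one particular column
        if versions:
            result[(fam, qual)] = versions[max(versions)]
    return result

def scan(table, start_row, stop_row, family=None, qualifier=None):
    """Range scan: yield (row_key, columns) for start_row <= row_key < stop_row."""
    keys = sorted(table)
    lo = bisect_left(keys, start_row)   # first key >= start_row
    hi = bisect_left(keys, stop_row)    # first key >= stop_row (excluded)
    for row_key in keys[lo:hi]:
        columns = get(table, row_key, family, qualifier)
        if columns:  # in this toy, rows with no matching column are skipped
            yield row_key, columns

# table[row_key][(family, qualifier)][timestamp] = value
table = {
    "row1": {("cf1", "a"): {10: "x", 20: "y"}, ("cf2", "b"): {5: "z"}},
    "row2": {("cf1", "a"): {7: "p"}},
    "row3": {("cf2", "b"): {9: "q"}},
}

print(get(table, "row1", family="cf1"))
print(list(scan(table, "row1", "row3")))
```

Note that the stop row is exclusive, so the scan over ["row1", "row3") visits only row1 and row2.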

The default behavior for versioned columns is to return only the most recent version. The HBase API also allows asking for:

  • versions of columns that were valid at a specific timestamp value;
  • all versions that were valid within a specified [minStamp, maxStamp) interval;
  • the N most recent versions.

We'll refer to the above as [VersionedDataConds].
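The three version conditions can be illustrated on the versions of a single cell, modeled as a map from timestamp to value. This is a sketch under assumptions, not HBase code: the function names are hypothetical, and "valid at a timestamp" is read here as "the newest version written at or before that timestamp" (an exact-timestamp match is another possible reading).

```python
def as_of(versions, timestamp):
    """Newest version written at or before `timestamp` (assumed reading of 'valid at')."""
    eligible = [ts for ts in versions if ts <= timestamp]
    return versions[max(eligible)] if eligible else None

def in_range(versions, min_stamp, max_stamp):
    """All versions with min_stamp <= ts < max_stamp, newest first."""
    return [(ts, versions[ts])
            for ts in sorted(versions, reverse=True)
            if min_stamp <= ts < max_stamp]

def most_recent(versions, n=1):
    """The n newest versions, newest first; n=1 mirrors the default behavior."""
    return [(ts, versions[ts]) for ts in sorted(versions, reverse=True)[:n]]

# Versions of one (row, column) cell: timestamp -> value
versions = {100: "v1", 200: "v2", 300: "v3"}

print(as_of(versions, 250))
print(in_range(versions, 100, 300))
print(most_recent(versions, 2))
```

The upper bound of the interval is exclusive, so `in_range(versions, 100, 300)` leaves out the version written at timestamp 300.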

One can see two ways to map HBase tables to SQL tables:
