Temporal Tables Part 3: Managing Historical Data Growth

spacer

This is part 3 of a 5-part series, if you want to start from the beginning see Temporal Tables Part 1: Introduction & Use Case Example

Up until now, we haven’t crisply defined what is meant by SYSTEM_TIME in the above examples.  With the DDL statement above, the time that is recorded is the time that the change arrived at the database server.  This suffices for many use cases, but in some cases, particularly when debugging the behavior of queries at specific points of time, it is more important to know when the change was committed to the database.  Only at that point does the data become visible to other users of the database.  MariaDB can record temporal information based on the commit time by using transaction-precise system version.  Two extra columns, start_trxid and end_trxid must be manually declared on the table:

CREATE TABLE purchaseOrderLines(
    purchaseOrderID              INTEGER NOT NULL
  , LineNum                      SMALLINT NOT NULL
  , status                       VARCHAR(20) NOT NULL
  , itemID                       INTEGER NOT NULL
  , supplierID                   INTEGER NOT NULL
  , purchaserID                  INTEGER NOT NULL
  , quantity                     SMALLINT NOT NULL
  , price                        DECIMAL (10,2) NOT NULL
  , discountPercent              DECIMAL (10,2) NOT NULL
  , amount                       DECIMAL (10,2) NOT NULL
  , orderDate                    DATETIME
  , promiseDate                  DATETIME
  , shipDate                     DATETIME
  , start_trxid                  BIGINT UNSIGNED GENERATED ALWAYS AS ROW START
  , end_trxid                    BIGINT UNSIGNED GENERATED ALWAYS AS ROW END
  , PERIOD FOR SYSTEM_TIME(start_trxid, end_trxid) 
  , PRIMARY KEY (purchaseOrderID, LineNum)
) WITH SYSTEM VERSIONING;

The rows now contain columns that represent the start and end the transaction ids for the change as recorded in the TRANSACTION_REGISTRY table in the system schema.

Temporal table: example 1

If you need to return the transaction commit time information from your temporal queries, you will need to join with this TRANSACTION_REGISTRY table, returning the commit_timestamp:

SELECT 
    commit_timestamp
  , begin_timestamp
  , purchaseOrderID
  , LineNum
  , status
  , itemID
  , supplierID
  , purchaserID
  , quantity
  , price
  , amount
FROM purchaseOrderLines, mysql.transaction_registry
WHERE start_trxid = transaction_id;

This will show when the change became visible to all sessions in the database (the most common scenario), or the begin_timestamp if you care about the beginning of the transaction that made the change.

Temporal table: example 2

Capturing the history of changes to a table does not come without some cost.  As we showed earlier, one insert with three subsequent updates results in 4 rows being stored in the database.

Temporal table: example 3

For smaller tables, or tables that have infrequent changes to their rows, this may not be a problem.  The storage and performance impact of additional rows might be insignificant compared to other activity.  However, high-volume tables with many changes to rows may want to consider techniques for managing the growth of the historical data.

The first option is to disable temporal track for specific columns when appropriate.  This is accomplished by using the WITHOUT SYSTEM VERSIONING modified on specific columns:

CREATE TABLE PurchaseOrderLines (
    purchaseOrderID         INTEGER NOT NULL
  , LineNum                 SMALLINT NOT NULL
  , status                  VARCHAR(20) NOT NULL
  , itemID                  INTEGER NOT NULL
  , supplierID              INTEGER NOT NULL
  , purchaserID             INTEGER NOT NULL
  , quantity                SMALLINT NOT NULL
  , price                   DECIMAL(10,2) NOT NULL
  , discountPercent         DECIMAL(10,2) NOT NULL
  , amount                  DECIMAL(10,2) NOT NULL
  , orderDate               DATETIME
  , promiseDate             DATETIME
  , shipDate                DATETIME
  , comments                VARCHAR(2000) WITHOUT SYSTEM VERSIONING
  , PRIMARY KEY (purchaseOrderID, LineNum)
) WITH SYSTEM VERSIONING;

Partitioning is another popular technique for managing the growth of historical data in temporal tables.  The CURRENT keyword is understood by the partitioning logic when used on temporal tables with system versioning.  Isolating the historical versions of the rows into their own partition is as simple as:

CREATE TABLE PurchaseOrderLines (
    purchaseOrderID         INTEGER NOT NULL
  , LineNum                 SMALLINT NOT NULL
  , status                  VARCHAR(20) NOT NULL
  , itemID                  INTEGER NOT NULL
  , supplierID              INTEGER NOT NULL
  , purchaserID             INTEGER NOT NULL
  , quantity                SMALLINT NOT NULL
  , price                   DECIMAL (10,2) NOT NULL
  , discountPercent         DECIMAL (10,2) NOT NULL
  , amount                  DECIMAL (10,2) NOT NULL
  , orderDate               DATETIME
  , promiseDate             DATETIME
  , shipDate                DATETIME
  , comments                VARCHAR(2000) WITHOUT SYSTEM VERSIONING
  , PRIMARY KEY (purchaseOrderID, LineNum)
) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME (
        PARTITION p_hist HISTORY
      , PARTITION p_cur CURRENT
);

This technique is especially powerful because partitions will be pruned when executing queries.  Queries that access the current information will quickly skip historical data and only interact with the smaller data and associated indexes on the current partition.

Partitioning becomes an even more powerful tool when combined with interval definitions, dividing historical data into buckets that can then be managed individually.

CREATE TABLE PurchaseOrderLines (
    purchaseOrderID          INTEGER NOT NULL
  , LineNum                  SMALLINT NOT NULL
  , status                   VARCHAR(20) NOT NULL
  , itemID                   INTEGER NOT NULL
  , supplierID               INTEGER NOT NULL
  , purchaserID              INTEGER NOT NULL
  , quantity                 SMALLINT NOT NULL
  , price                    DECIMAL (10,2) NOT NULL
  , discountPercent          DECIMAL (10,2) NOT NULL
  , amount                   DECIMAL (10,2) NOT NULL
  , orderDate                DATETIME
  , promiseDate              DATETIME
  , shipDate                 DATETIME
  , comments                 VARCHAR(2000) WITHOUT SYSTEM VERSIONING
  , PRIMARY KEY (purchaseOrderID, LineNum)
) WITH SYSTEM VERSIONING
    PARTITION BY SYSTEM_TIME INTERVAL 1 WEEK (
        PARTITION p0 HISTORY
      , PARTITION p1 HISTORY
      , PARTITION p2 HISTORY
      ...
      , PARTITION p_cur CURRENT
);

Once a temporal table is partitioned based on intervals, administrators can use the Transportable Tablespaces feature of the InnoDB storage engine and the EXCHANGE PARTITION command syntax to manage table growth.  Copying, dropping, and restoring partitions become simple data definition language (DDL) commands and file system operations, avoiding performance impact of changing individual rows.

Continue to Temporal Tables Part 4: Application Time to learn more.