Temporal Tables Part 3: Managing Historical Data Growth
This is part 3 of a 5-part series, if you want to start from the beginning see Temporal Tables Part 1: Introduction & Use Case Example
Up until now, we haven’t crisply defined what is meant by SYSTEM_TIME in the above examples. With the DDL statement above, the time that is recorded is the time that the change arrived at the database server. This suffices for many use cases, but in some cases, particularly when debugging the behavior of queries at specific points of time, it is more important to know when the change was committed to the database. Only at that point does the data become visible to other users of the database. MariaDB can record temporal information based on the commit time by using transaction-precise system version. Two extra columns, start_trxid and end_trxid must be manually declared on the table:
CREATE TABLE purchaseOrderLines( Â Â purchaseOrderID INTEGER NOT NULL Â , LineNum SMALLINT NOT NULL Â , status VARCHAR(20) NOT NULL Â , itemID INTEGER NOT NULL Â , supplierID INTEGER NOT NULL Â , purchaserID INTEGER NOT NULL Â , quantity SMALLINT NOT NULL Â , price DECIMAL (10,2) NOT NULL Â , discountPercent DECIMAL (10,2) NOT NULL Â , amount DECIMAL (10,2) NOT NULL Â , orderDate DATETIME Â , promiseDate DATETIME Â , shipDate DATETIME Â , start_trxid BIGINT UNSIGNED GENERATED ALWAYS AS ROW START Â , end_trxid BIGINT UNSIGNED GENERATED ALWAYS AS ROW END Â , PERIOD FOR SYSTEM_TIME(start_trxid, end_trxid) Â , PRIMARY KEY (purchaseOrderID, LineNum) ) WITH SYSTEM VERSIONING;
The rows now contain columns that represent the start and end the transaction ids for the change as recorded in the TRANSACTION_REGISTRY table in the system schema.
If you need to return the transaction commit time information from your temporal queries, you will need to join with this TRANSACTION_REGISTRY table, returning the commit_timestamp:
SELECT   commit_timestamp  , begin_timestamp  , purchaseOrderID  , LineNum  , status  , itemID  , supplierID  , purchaserID  , quantity  , price  , amount FROM purchaseOrderLines, mysql.transaction_registry WHERE start_trxid = transaction_id;
This will show when the change became visible to all sessions in the database (the most common scenario), or the begin_timestamp if you care about the beginning of the transaction that made the change.
Capturing the history of changes to a table does not come without some cost. As we showed earlier, one insert with three subsequent updates results in 4 rows being stored in the database.
For smaller tables, or tables that have infrequent changes to their rows, this may not be a problem. The storage and performance impact of additional rows might be insignificant compared to other activity. However, high-volume tables with many changes to rows may want to consider techniques for managing the growth of the historical data.
The first option is to disable temporal track for specific columns when appropriate. This is accomplished by using the WITHOUT SYSTEM VERSIONING modified on specific columns:
CREATE TABLE PurchaseOrderLines ( Â Â purchaseOrderID INTEGER NOT NULL Â , LineNum SMALLINT NOT NULL Â , status VARCHAR(20) NOT NULL Â , itemID INTEGER NOT NULL Â , supplierID INTEGER NOT NULL Â , purchaserID INTEGER NOT NULL Â , quantity SMALLINT NOT NULL Â , price DECIMAL(10,2) NOT NULL Â , discountPercent DECIMAL(10,2) NOT NULL Â , amount DECIMAL(10,2) NOT NULL Â , orderDate DATETIME Â , promiseDate DATETIME Â , shipDate DATETIME Â , comments VARCHAR(2000) WITHOUT SYSTEM VERSIONING Â , PRIMARY KEY (purchaseOrderID, LineNum) ) WITH SYSTEM VERSIONING;
Partitioning is another popular technique for managing the growth of historical data in temporal tables. The CURRENT keyword is understood by the partitioning logic when used on temporal tables with system versioning. Isolating the historical versions of the rows into their own partition is as simple as:
CREATE TABLE PurchaseOrderLines ( Â Â purchaseOrderID INTEGER NOT NULL Â , LineNum SMALLINT NOT NULL Â , status VARCHAR(20) NOT NULL Â , itemID INTEGER NOT NULL Â , supplierID INTEGER NOT NULL Â , purchaserID INTEGER NOT NULL Â , quantity SMALLINT NOT NULL Â , price DECIMAL (10,2) NOT NULL Â , discountPercent DECIMAL (10,2) NOT NULL Â , amount DECIMAL (10,2) NOT NULL Â , orderDate DATETIME Â , promiseDate DATETIME Â , shipDate DATETIME Â , comments VARCHAR(2000) WITHOUT SYSTEM VERSIONING Â , PRIMARY KEY (purchaseOrderID, LineNum) ) WITH SYSTEM VERSIONING Â Â PARTITION BY SYSTEM_TIME ( Â Â Â Â PARTITION p_hist HISTORY Â Â Â , PARTITION p_cur CURRENT );
This technique is especially powerful because partitions will be pruned when executing queries. Queries that access the current information will quickly skip historical data and only interact with the smaller data and associated indexes on the current partition.
Partitioning becomes an even more powerful tool when combined with interval definitions, dividing historical data into buckets that can then be managed individually.
CREATE TABLE PurchaseOrderLines ( Â Â purchaseOrderID INTEGER NOT NULL Â , LineNum SMALLINT NOT NULL Â , status VARCHAR(20) NOT NULL Â , itemID INTEGER NOT NULL Â , supplierID INTEGER NOT NULL Â , purchaserID INTEGER NOT NULL Â , quantity SMALLINT NOT NULL Â , price DECIMAL (10,2) NOT NULL Â , discountPercent DECIMAL (10,2) NOT NULL Â , amount DECIMAL (10,2) NOT NULL Â , orderDate DATETIME Â , promiseDate DATETIME Â , shipDate DATETIME Â , comments VARCHAR(2000) WITHOUT SYSTEM VERSIONING Â , PRIMARY KEY (purchaseOrderID, LineNum) ) WITH SYSTEM VERSIONING Â Â PARTITION BY SYSTEM_TIME INTERVAL 1 WEEK ( Â Â Â Â PARTITION p0 HISTORY Â Â Â , PARTITION p1 HISTORY Â Â Â , PARTITION p2 HISTORY Â Â Â ... Â Â Â , PARTITION p_cur CURRENT );
Once a temporal table is partitioned based on intervals, administrators can use the Transportable Tablespaces feature of the InnoDB storage engine and the EXCHANGE PARTITION command syntax to manage table growth. Copying, dropping, and restoring partitions become simple data definition language (DDL) commands and file system operations, avoiding performance impact of changing individual rows.
Continue to Temporal Tables Part 4: Application Time to learn more.