The place between transactions and analytics – and what it means for you
We’ve always classified database queries as transactional or analytical. If our application has transactional queries, we use an OLTP database. If it has analytical queries, we use an OLAP database (or data warehouse). However, with analytics fast becoming a competitive differentiator for customer-facing web and mobile applications, businesses are discovering the place between transactions and analytics.
On one hand you have standard CRUD operations – view a product, add it to a cart and complete a purchase. All of these interactions are performed with transactional queries as part of a customer interaction. They typically need most or all columns in a row, and use indexes to improve performance. They’re row oriented, and are executed on an OLTP database.
On the other, you have a data analyst performing a market basket analysis to improve product recommendations. It’s performed with analytical queries as part of an employee interaction. The queries typically need a small number of columns in a row, and do not use indexes. They’re column oriented, and they would be executed on a separate OLAP database.
Then there is the place between – queries within customer-facing applications that have to sort, filter and aggregate data without the help of indexes. These queries may need to aggregate a column (or a small number of columns) for many rows and use the results to look up a small number of rows in a separate table. They are both column oriented and row oriented.
For example, say you want to alert customers of the top five soon-to-be-sold-out products based on the quantity in active shopping carts, the quantity in recent purchases (say, the last 24 hours) and current inventory. This query will use sorting, filtering and aggregation to find the product IDs for the top five soon-to-be-sold our products – column oriented and analytical. It may access many thousands, if not millions, of shopping carts and purchases – all to get the IDs of five products. It will then use them to look up five rows in the products table by product ID (i.e., the primary key) – row oriented and transactional. It’s the space between transactional and analytical, and today it’s a critical component of innovative, customer-facing applications.
What’s needed from the database
To enable ad hoc, near-real-time analytical queries on transactional data, you need a database that does three things – each of which is enabled by MariaDB’s approach to hybrid transactional/analytical processing.
First, it needs to store the same data in two different tables: one with a columnar format to support sorting, filtering and aggregating data without indexes; and one with a row format to support looking up rows based on primary keys or secondary indexes.
MariaDB Platform is based on workload-optimized storage engines, so it uses the InnoDB or MyRocks storage engines to store data in rows, optimized for transactional queries. And it uses the ColumnStore storage engine to store data in a columnar format, optimized for analytical queries.
Second, it needs to synchronize the two tables. All changes to the row-based table must be replicated to the column-based table. The database uses streaming change-data-capture with micro batching to synchronize the two.
MariaDB Platform supports cross-engine joins, or the capability to join rows stored in columnar tables to rows in row-based tables. In our example of soon-to-be-sold-out products, it would perform the aggregations using tables with the columnar format, and the lookups on tables using the row format.
Third, it needs to know which tables to use and when. MariaDB Platform uses rules-based query routing to determine which table the query should be sent to. In some cases, it’s a binary choice. If there’s a CRUD operation, the query is routed to the row-based table. If the query cannot use indexes or is performing an aggregate on a small number of columns, it is routed to the column-based table.
For more on hybrid transactional/analytical processing, watch this on-demand webinar: The Role of Databases in a Hybrid-Everything Future.