Dissecting the architecture of Google AlloyDB, Amazon Aurora, and MariaDB Xpand

Welcome AlloyDB

Recently, Google announced their AlloyDB database-as-a-service (DBaaS) product, whose design looks strikingly similar to that of Amazon Aurora for PostgreSQL. The announcement made several claims of superiority over Aurora: multi-level caching, an elastic storage layer, vectorized processing of columnar data for online analytical processing (OLAP) queries, and so on. But fundamentally, the design is similar to that of Aurora, which Amazon describes as “the log is the database”.

For those unfamiliar with database internals, the vast majority of database systems use a write-ahead log (WAL) to durably record all update operations, for both crash recovery and replication. In addition to the WAL, conventional database systems also have data pages that contain the actual data and/or metadata such as system catalogs and indexes. The main idea behind both Aurora and AlloyDB is to make the WAL the primary persistence layer of the database, and to delegate the materialization and caching of data pages to a specialized storage service that does this work asynchronously. This may work well for a write-intensive workload, or for a read-intensive workload with a high buffer cache hit ratio (a highly localized access pattern). However, for the many real-world applications that fit neither of those profiles, performance is likely to suffer, because materializing data pages asynchronously incurs significant overhead and lag. This is referred to as the hysteresis effect.
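
To make the design concrete, here is a deliberately simplified Python sketch of the “log is the database” idea. This is not AlloyDB or Aurora code; every name in it is invented. The write path only ever appends to the WAL, and a separate materializer replays log records into pages some time later:

```python
# Toy model of "the log is the database" (all names invented, not vendor code).
from collections import defaultdict

class LogIsTheDatabase:
    def __init__(self):
        self.wal = []                        # durable truth: ordered log records
        self.pages = {}                      # asynchronously materialized pages
        self.applied_lsn = defaultdict(int)  # how far each page has been replayed

    def write(self, page_id, change):
        """The write path appends to the WAL only; no page is touched."""
        lsn = len(self.wal) + 1
        self.wal.append((lsn, page_id, change))
        return lsn

    def materialize(self, up_to_lsn):
        """The storage service's background job: replay log records into pages.
        Because it runs asynchronously, pages lag behind the log."""
        for lsn, page_id, change in self.wal:
            if self.applied_lsn[page_id] < lsn <= up_to_lsn:
                self.pages.setdefault(page_id, []).append(change)
                self.applied_lsn[page_id] = lsn

db = LogIsTheDatabase()
db.write("page-1", "insert row A")
lsn = db.write("page-1", "update row A")
print(db.pages)      # {} -- committed writes exist only in the log so far
db.materialize(lsn)  # the storage service catches up later
print(db.pages)      # {'page-1': ['insert row A', 'update row A']}
```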

As a metaphor for “the log is the database” approach, it’s as if you set out to read a book from start to finish, but instead were presented with a timestamped list of edits to the original manuscript as changes were made by the author. Most authors do not write a book from start to finish, but instead write different chapters at different times, and then make numerous edits. Similarly, it would be as if you went to a concert to hear a symphony, expecting it to be played from start to finish – but instead were forced to listen to all the changes to the musical score in the order that they were made. With the possible exception of Mozart, most composers did not write their masterpieces straight through in one sitting. Rather, they wrote a few notes here and there, and later made numerous edits and worked on different parts of the overall piece at different times. The rather challenging exercise for the reader or listener would be to reassemble the complete book or symphony from the time-ordered set of edits.

The problem that both Aurora and AlloyDB face when asked to retrieve a piece of the database that is not already materialized in cache is that it must be reconstructed from the WAL. This is a costly operation: the relevant log records must be fetched by log sequence number (LSN) and data page number, then applied in order to produce the page. This piecing together of data pages before they can be used to satisfy queries takes time, and that causes lag compared to a conventional database system, which would simply read the data page from storage if it were not already in cache.
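
Continuing the toy model above, the read path makes that cost visible. A cache hit behaves like an ordinary buffer cache; a miss forces a scan-and-replay of the log where a conventional engine would issue a single page read. Again, all names here are invented for illustration:

```python
# Toy read path for "the log is the database" (illustration only).
def get_page(pages, wal, page_id, read_lsn):
    if page_id in pages:   # hit: no different from a normal buffer cache
        return pages[page_id]
    # Miss: rebuild the page by replaying every relevant log record up to
    # the reader's LSN. A conventional engine would do one page read here.
    page = [change for lsn, pid, change in wal
            if pid == page_id and lsn <= read_lsn]
    pages[page_id] = page  # cache the freshly materialized page
    return page

wal = [(1, "page-1", "insert row A"), (2, "page-2", "insert row B")]
print(get_page({}, wal, "page-1", read_lsn=2))  # ['insert row A']
```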

While Google touts AlloyDB as a good example of “disaggregation of compute and storage,” its implementation is an anti-pattern for this type of architecture. There are cases where “intelligent” or “application-aware” storage can beneficially offload database operations, such as pushing predicates down into the storage layer. But materializing data pages asynchronously is generally not a good design decision for supporting a broad class of workload types, such as those where writes are immediately followed by reads of the most recently updated data, which is a common access pattern.
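
For contrast, here is a toy Python sketch of predicate pushdown, the kind of storage offload that does pay off. Everything here is invented for illustration; the point is simply that the storage layer evaluates the filter itself, so only matching rows cross the storage/compute boundary:

```python
# Toy illustration of predicate pushdown (all names invented).
STORAGE = [  # rows living in the storage layer
    {"id": 1, "region": "EU", "amount": 40},
    {"id": 2, "region": "US", "amount": 75},
    {"id": 3, "region": "EU", "amount": 90},
]

def query_without_pushdown(predicate):
    shipped = list(STORAGE)                # every row crosses the boundary
    return [r for r in shipped if predicate(r)]

def query_with_pushdown(predicate):
    # Storage applies the predicate itself; only matching rows are shipped.
    return [r for r in STORAGE if predicate(r)]

pred = lambda r: r["region"] == "EU" and r["amount"] > 50
assert query_without_pushdown(pred) == query_with_pushdown(pred)
print(query_with_pushdown(pred))  # [{'id': 3, 'region': 'EU', 'amount': 90}]
```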

Put together, this makes me think – somewhat sadly – that Aurora and AlloyDB are not only alike, they are LemmingDBs: the blind leading the blind, joining a long list of database products that position themselves as “PostgreSQL-compatible” in one way or another. As one of the original PostgreSQL developers, I have some perspective on the goals of the Postgres project at the University of California at Berkeley in the mid- to late 1980s, and on what it has evolved into. See my recent blog on Weighing the Pros and Cons of PostgreSQL. While many of the current offerings claim some degree of PostgreSQL compatibility in the name of “openness”, they are hindered by many of the problems, disadvantages, and baggage inherent in PostgreSQL. Some have managed to overcome a few of those problems, but none have solved them all.

If you are interested in buying a database from a long line of lemming-like database vendors, by all means consider using AlloyDB, Aurora, or any one of several PostgreSQL-compatible products.

A Modern Alternative: Distributed SQL

Personally, I recently made the decision to join the MariaDB Xpand project, with a focus on geo-distribution capabilities. Xpand is a mature distributed SQL database built on a shared-nothing architecture. Unlike AWS Aurora and Google’s AlloyDB, which replicate at the storage layer beneath a single writer, Xpand is a true multi-writer database: any node can accept writes, and adding nodes adds scale for both reads and writes.
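
As a hedged sketch of what multi-writer means in practice: since Xpand speaks the MariaDB protocol, a client can point MariaDB Connector/Python at any node and write through it. The host names, credentials, and table below are placeholders, not a reference deployment:

```python
# Hedged sketch: spreading writes across Xpand nodes (placeholder details).
import mariadb  # MariaDB Connector/Python

NODES = ["xpand-node-1.example.com", "xpand-node-2.example.com"]  # hypothetical

for i, host in enumerate(NODES):
    conn = mariadb.connect(host=host, port=3306, user="app",
                           password="secret", database="shop")
    cur = conn.cursor()
    # The same table accepts writes on every node -- there is no single primary.
    cur.execute("INSERT INTO orders (id, note) VALUES (?, ?)",
                (i, f"written via {host}"))
    conn.commit()
    conn.close()
```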

Rather than delivering performance that is highly dependent on the characteristics of an application’s workload, MariaDB Xpand offers best-in-class price/performance for a wide range of customer workload types, regardless of the workload’s data access patterns. It has highly efficient parallel query execution and parallelized I/O paths, and no need to “materialize” data pages. This stands in contrast to the approach taken by AlloyDB and Aurora, which suffer from the hysteresis effect for certain access patterns.

It is nice to see AlloyDB recognizing the need to support operational analytics queries with a columnar model. Xpand was actually the first modern distributed SQL database to implement columnar indexes for OLAP queries. Xpand uses columnar indexes for real-time operational analytics directly on transactional data, without losing consistency or missing the latest transactions.
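
As a hedged sketch of what this looks like from the application side, here is a standard SQL aggregate run against live transactional rows. The connection details are placeholders, and I assume a columnar index already exists on orders(region, amount); consult the Xpand documentation for the exact index DDL:

```python
# Hedged sketch: OLAP-style aggregate over live transactional data.
# Assumes a columnar index on orders(region, amount) -- see the Xpand docs
# for the exact DDL; connection details are placeholders.
import mariadb

conn = mariadb.connect(host="xpand-node-1.example.com", port=3306,
                       user="app", password="secret", database="shop")
cur = conn.cursor()

# Runs directly on transactional rows: the result reflects the latest
# committed writes rather than an asynchronously refreshed copy.
cur.execute("SELECT region, SUM(amount) AS total FROM orders GROUP BY region")
for region, total in cur:
    print(region, total)
conn.close()
```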

More resources
  • Xpand can be installed on premises or in a public cloud, and is available as a fully managed cloud database service in MariaDB SkySQL.
  • Take the Distributed SQL Challenge: If you are evaluating distributed SQL for an upcoming project, take our challenge to find the best-performing distributed SQL database.