SkySQL and Life After Amazon Redshift, Part 1

If you needed an on-premises data warehouse today, would you choose a version of Postgres released in 2005 (Postgres 8.0.2)? How about a version of MySQL released in 2005 (MySQL 5.0.15)? Perhaps you’d ask Oracle if you could use Oracle Database 10g Release 2 even though it was released in 2005 and reached end-of-life a decade ago?

I didn’t think so. You know better, and so do I.

Amazon? Not so much.

In 2013 they built Redshift on ParAccel Analytic Database (PADB), which was itself built on Postgres 8.0.2. ParAccel would go on to be acquired by Actian. PADB would become Actian Matrix. And Redshift? Seven years later, it’s still based on Postgres 8.0.2. That’s right. Redshift is, and always has been, based on a 15-year-old version of Postgres.

I get it. I still have a fleece I wore some 20 years ago. It’s falling apart, but I tell myself it can still do the job (it can’t). And if I wear it, I get an earful from my wife. But here’s the thing. It didn’t stop me from getting new fleeces, and it’s no longer my go-to fleece when it’s cold out.

As an aside, I thought Greenplum was bad (Postgres 9.4), but it has nothing on Redshift. Then again, at least it hasn’t ended up at Micro Focus, where software goes to die. Sorry, Vertica.

So, if you wouldn’t use a 15-year-old database for an on-premises data warehouse, why would you use one for a cloud data warehouse?

To be fair, there weren’t a lot of options back in 2013. When it comes to a data warehouse, you want a distributed, columnar database. It has much better compression and far less disk I/O, and it scales out from gigabytes to petabytes of data. But if you wanted one in the cloud, Redshift was the only game in town.
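
If the columnar claim sounds abstract, here is a minimal sketch of the idea in Python. The sample orders and the helper name are made up for illustration, and a real columnar engine is far more sophisticated, but it shows why an aggregate only has to read one column and why a column of similar values compresses well.

```python
from itertools import groupby

# Hypothetical sample data: (order_id, region, amount)
rows = [
    (1, "EU", 10.0),
    (2, "EU", 12.5),
    (3, "EU", 7.5),
    (4, "US", 20.0),
    (5, "US", 5.0),
]

# Columnar layout: one list per column instead of one tuple per row.
columns = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# An aggregate over one column reads only that column,
# not every field of every row.
total_amount = sum(columns["amount"])

# Similar values sit next to each other in a column, so simple
# schemes such as run-length encoding shrink them considerably.
encoded_region = run_length_encode(columns["region"])
```

Run-length encoding is just one simple compression scheme, picked here because it is easy to read. It is not a statement about what Redshift or ColumnStore use internally.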

Things change.

A few years ago MariaDB introduced ColumnStore, a distributed and columnar storage engine. Enterprise customers and community users alike have been deploying MariaDB as a data warehouse ever since, replacing Greenplum and Vertica, or MySQL once they reached the limits of its analytical capabilities.

Enter SkySQL. In the context of Amazon, it’s RDS and Redshift combined. It can deploy databases for transactions (row data), data warehouses for analytics (columnar data) or hybrid databases for smart transactions (row + columnar data).

There are three things you should know about SkySQL vs. Redshift:

1. SkySQL is updated regularly
MariaDB releases a new version of ColumnStore every year, adding new features and improvements. We’re doing the same with SkySQL. We’re not going to prop up a 15-year-old database and hold it there for as long as we can.

2. SkySQL separates compute from storage
Despite Amazon’s claims, Redshift does not separate storage from compute. If you have 128TB of data, you need a cluster with four nodes because a) the data is replicated, so you actually need 256TB of storage, and b) the only way to add storage is to add nodes, and a node adds at most 64TB of storage. SkySQL, on the other hand, separates storage from compute. The data is stored on object storage (e.g., S3), buffered on a local SSD and cached in memory. You can scale out storage as much as you like without adding nodes. In fact, storage is unlimited.

3. No ETL is needed with SkySQL
You don’t have to worry about creating complex ETL pipelines to copy RDS data into Redshift. SkySQL does it for you. If you launch a hybrid database, data will be stored in both row and columnar formats. The row format is used by transactional queries, the columnar format by analytical queries. Easy peasy. No more ETL. No more batch processing.
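
To make the Redshift sizing arithmetic from the second point concrete, here is a small Python sketch. The function name and parameters are hypothetical. The defaults (data stored twice, at most 64TB of storage per node) are simply the figures quoted above, not values read from any Amazon API.

```python
import math


def nodes_required(data_tb, replication_factor=2, max_storage_per_node_tb=64):
    """Estimate node count when storage can only grow by adding nodes.

    Defaults are assumptions taken from the figures in this post:
    data is replicated (stored twice) and a node adds at most 64TB.
    """
    stored_tb = data_tb * replication_factor
    return math.ceil(stored_tb / max_storage_per_node_tb)


# 128TB of data becomes 256TB once replicated, and 256 / 64 = 4 nodes.
nodes_for_128tb = nodes_required(128)
```

The point of the sketch is that node count is driven by storage alone here. Even if four nodes is more compute than you need, you still have to buy them to hold the data.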

Amazon launched cloud data warehouses the easy way. With SkySQL, MariaDB did it the right way, and at a significantly lower price. As in half the price.

There is life after Redshift, and it begins now.

Continue with part 2 of this blog
