The unbundling of cloud data warehouses

December 7, 2023

Tom Nats
Director of Customer Solutions
Starburst

I was listening to the excellent “Data Engineering Podcast” yesterday, specifically the episode “Surveying The Market Of Database Products” with Tanya Bragin, who is the VP of Product at ClickHouse. If you haven’t heard this episode or even this podcast, you are really missing out. Tanya does a wonderful job summarizing the state of databases, mostly focusing on where the analytical side is going.

One phrase Tanya said a few times in this episode was the “unbundling of cloud data warehouses”. This simply means separating storage from compute, which data lakes have provided since day one. What’s changed is that more and more use cases can be served from an open data lake by numerous engines, with features such as ACID transactions, higher performance (SSD caching and indexing) and enterprise-grade security (RBAC, ABAC, masking, etc.), thanks to table formats like Iceberg, Delta Lake and Hudi. This means you aren’t stuck with a single vendor for all of your analytics, and that includes avoiding storage lock-in.
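To make the storage/compute split concrete, here is a minimal sketch using only the Python standard library. It is a toy, not how any real lake engine works: the “lake” is a temporary directory, the open format is CSV rather than Parquet with a table format such as Iceberg, and the two “engines” (`engine_a_total_by_region` and `engine_b_total_by_region`) are hypothetical names I made up for illustration. The point is only that two independent pieces of compute can answer the same question from one copy of the data.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def write_to_lake(lake_dir: Path) -> Path:
    """Land a small dataset in the 'lake' (assumption: CSV stands in for an open format)."""
    path = lake_dir / "orders.csv"
    rows = [("us", 10.0), ("eu", 5.5), ("us", 4.5)]
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "amount"])
        writer.writerows(rows)
    return path


def engine_a_total_by_region(path: Path) -> dict[str, float]:
    """Engine A: plain Python aggregation read straight off the file."""
    totals: dict[str, float] = {}
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    return totals


def engine_b_total_by_region(path: Path) -> dict[str, float]:
    """Engine B: a SQL engine (in-memory sqlite3) that reads the same file."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?)",
            ((r["region"], float(r["amount"])) for r in reader),
        )
    result = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
    conn.close()
    return result


# One copy of the data, two independent engines querying it.
with tempfile.TemporaryDirectory() as tmp:
    lake_file = write_to_lake(Path(tmp))
    totals_a = engine_a_total_by_region(lake_file)
    totals_b = engine_b_total_by_region(lake_file)
```

Both engines return the same totals because neither one owns the data. Swapping one engine out, or adding a third, does not require moving or rewriting the file.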

What Tanya was careful to say, though, was that you should start by landing your data in a lake/object store, and that there will be times when you need to choose another technology to meet certain use cases. One example is OLAP databases, which have been modified to handle constant ingestion of data while providing sub-second analytical queries to power applications and real-time ad-hoc queries. Other use cases are high-performance search and industry-specific solutions. In the diagram below, I attempt to illustrate where things appear to be heading into 2024.

Some points:

Previously closed architectures like Snowflake, BigQuery, etc. are starting to get on board and support external customer storage.
Data sources can be batch or real-time, with more and more systems hooking directly into the source systems to “mirror” their data into lake storage.
In addition to OLAP, we have other technologies such as search and purpose-built applications like Datadog, etc.
There are many other lake engines, but they need to support the table formats I listed above in order to be considered as supporting an “open” architecture.
One gotcha is who is going to “own” your table metadata. I outlined this in a recent post which can be found here. This can be considered lock-in as well. We’ll call it “metadata lock-in”…

To sum it up, 2024 is going to be an exciting time for companies that can finally build more open analytical environments without the fear of lock-in. If you get some time, I highly recommend the Data Engineering Podcast and especially this episode. Tanya is a wealth of information and knowledge on database technologies, and I agree with her assessment of the future of database engines.

And remember… data lakes aren’t any risk to your company… just silliness.
