A first-class experience for Iceberg, Delta Lake, and Hudi in Starburst Galaxy

Company | June 22, 2023

Emma Lullo
Senior Product Marketing Manager, Starburst Galaxy
Starburst

The rising popularity of the data lakehouse has led many to compare the merits of the open table formats underpinning this architecture: Apache Iceberg, Delta Lake, and Apache Hudi. Read between the lines, though, and the conversation is mostly driven by hype, making it hard to separate reality from marketing jargon.

This article isn’t going to solve that problem. Instead, the goal is to introduce you to a new way of thinking about table formats – as a use case-level choice rather than an organization-level decision.

Choosing a table format

When deciding between table formats, it’s important to understand the similarities and differences that may impact performance and scalability.

For example, Iceberg is currently the only table format with partition evolution support. This allows a table’s partitioning scheme to be changed without rewriting the table, and queries can still take advantage of every partition scheme the table has used.
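To make partition evolution concrete, here is a minimal sketch in Trino-style SQL. The table and column names are hypothetical, and the `partitioning` property and `ALTER TABLE ... SET PROPERTIES` statement are assumed to behave as they do in the Trino Iceberg connector; check the Starburst Galaxy documentation before relying on them.

```sql
-- Hypothetical table, for illustration only.
-- Assumption: the Iceberg "partitioning" property is available alongside type.
CREATE TABLE orders (
    order_id bigint,
    order_date date,
    region varchar
)
WITH (
    type = 'iceberg',
    partitioning = ARRAY['month(order_date)']
);

-- Later, switch to daily partitions without rewriting the table.
-- Data already written keeps its monthly layout; new writes use the daily scheme.
ALTER TABLE orders SET PROPERTIES partitioning = ARRAY['day(order_date)'];
```

Queries against the table do not change after the switch, since the partition layout is tracked in table metadata rather than in the SQL you write.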

On the other hand, Iceberg’s streaming support lags behind that of Delta Lake and Hudi. So choosing a table format comes down to a question: which is more important to your business, partitioning or streaming?

Now, any seasoned data engineer knows that it’s not that simple. You don’t just have a single type of data in your systems or a single way you’re looking to interact with that data. Instead, you’re dealing with streaming pipelines, batch jobs, ad hoc queries, and more – all at the same time. And you don’t get to control what is added to that mix in the future.

All of these factors make the binary decision (partitioning or streaming, Iceberg or Delta Lake) almost impossible to get right at the organization level. But most vendors require you to do just that.

Starburst’s approach

With Starburst, everything is built with openness in mind. We designed Starburst Galaxy to be interoperable with nearly any data environment, including first-class support for all modern open table formats.

This means that you can use the table format that is right for each of your workloads and change it when new needs emerge. You don’t need to worry about limited support for external tables or being locked into an old table format when new ones come along (and they will).

How it works

We wanted to make it as easy as possible to write to and read from different table formats, so we built Great Lakes connectivity – an under-the-hood process that abstracts away the details of using different table formats and file types.

This connectivity is built into Starburst Galaxy and is available to all users who are working with the following data sources:

Amazon S3
Azure Data Lake Storage
Google Cloud Storage

To create a table with one of these formats, you simply provide a “type” in the table DDL. Here is a simple example of creating an Iceberg table:

CREATE TABLE customer (
    name varchar,
    address varchar
)
WITH (type = 'iceberg');

That’s it! An Iceberg table has been created.
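The same pattern should carry over to the other formats by changing only the type value. The sketch below assumes that `'delta'` is the value Great Lakes connectivity accepts for Delta Lake, and the table name is hypothetical; confirm the accepted values in the Starburst Galaxy documentation.

```sql
-- Hypothetical table name; same columns as the Iceberg example.
-- Assumption: 'delta' is the type value for a Delta Lake table.
CREATE TABLE customer_delta (
    name varchar,
    address varchar
)
WITH (type = 'delta');
```

Because the format is a per-table property, two workloads in the same schema can each use a different table format.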

To read a table using Great Lakes connectivity, you simply issue a SQL SELECT query against it:

SELECT * FROM customer;

Again… that’s it! End users shouldn’t need to worry about file types or table formats. They just want to query their data.
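As a rough illustration of what this means in practice, the query below joins the Iceberg customer table with a second table that is assumed to be stored in a different format. The `customer_events` table and its columns are hypothetical. Nothing in the query names a table format.

```sql
-- customer: the Iceberg table created earlier.
-- customer_events: hypothetical table, assumed to use another table format.
SELECT
    c.name,
    count(*) AS event_count
FROM customer AS c
JOIN customer_events AS e
    ON e.customer_name = c.name
GROUP BY c.name;
```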

Try Starburst Galaxy today

The analytics platform for your data lake
