Barton Wright

Information Engineer, Technical Lead

Starburst

Manfred Moser

Director of Technical Content

Starburst

Batch processing, Iceberg, and Delta Lake for Starburst Galaxy

Last Updated: March 22, 2024

Releases Starburst Galaxy

Starburst Galaxy includes two exciting new features:

Great Lakes connectivity
Batch mode clusters

These features greatly expand the supported use cases for Starburst Galaxy, and bring new performance benefits to everyone.

Beyond Hive with Delta Lake or Iceberg

Great Lakes connectivity abstracts the details of using different table formats and file types when using certain write access statements for object storage systems. This connectivity is built into Starburst Galaxy, and is available to all users.

You can now use Starburst Galaxy object storage catalogs with the modern table formats Delta Lake and Iceberg.

This allows you to seamlessly migrate from the legacy Hive system to Iceberg or Delta Lake. You can perform the entire migration from within Starburst Galaxy, one table at a time if desired. Once you are using Iceberg or Delta Lake, all the advantages of these modern table formats are available, such as improved performance, snapshots, and more. At the same time, you can continue to query all your tables just as before.

Great Lakes connectivity lets you configure the table format with a single type parameter for CREATE TABLE or CREATE TABLE AS statements. Find more details for the different formats in the documentation.

For example, use a CREATE TABLE statement like the following to specify using the Iceberg table format and the Apache Parquet file format:

CREATE TABLE customer (name varchar, address varchar) WITH (type = 'iceberg', format = 'parquet');
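
The one-table-at-a-time migration described above can be sketched with a CREATE TABLE AS statement. This is a minimal illustration, not a prescribed migration procedure: the table names customer_hive and customer_iceberg are hypothetical, and the sketch assumes both tables live in the same object storage catalog and schema.

```sql
-- Hypothetical names: customer_hive is an existing Hive table,
-- customer_iceberg is the new Iceberg copy created from it.
CREATE TABLE customer_iceberg
WITH (type = 'iceberg', format = 'parquet')
AS
SELECT name, address
FROM customer_hive;

-- Compare row counts before pointing queries at the new table.
SELECT
  (SELECT count(*) FROM customer_hive) AS hive_rows,
  (SELECT count(*) FROM customer_iceberg) AS iceberg_rows;
```

Copying into a new table leaves the original Hive table untouched, so existing queries keep working until you decide to switch them over.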

Batch processing and ETL processing for everyone

In an ideal world, analytics is instant and data is clean and consistent at the source. In the real world, however, the amount of data you deal with every day is huge. The data is distributed and includes inconsistencies, and running any analytics can take a long time. That is why Extract, Transform, Load (ETL) processes are still important. These long-running queries are often critical, and failures have to be avoided.

With the new batch mode for Starburst Galaxy clusters, you can now enable fault-tolerant query execution with a simple cluster type selection.

Cluster batch mode selection

With batch mode enabled, all queries on the cluster benefit from fault-tolerant execution. Problems such as network issues accessing the source data, memory overflows, or even partial cluster outages no longer cause the query to fail. Even parts of a query can be reprocessed as necessary.

With this feature enabled on your cluster, you can run tools such as dbt on a regular basis. This allows you to automate further parts of your analytics pipeline. With Starburst Galaxy you don’t have to worry about managing the infrastructure or operating the clusters, and your query processing can be automated as well. You can spend more time on understanding the data, for example with the help of the built-in query editor.

Try it out. You are going to love it! And keep an eye on the release notes for Starburst Galaxy, because more great features are on the way.

Barton and Manfred
