Introducing Great Lakes Connectivity for Starburst Galaxy

Connect to numerous data lakehouse file and table formats that are available today including Hive, Delta Lake, and the quickly growing Apache Iceberg table format.

May 4, 2022

Tom Nats
Director of Customer Solutions
Starburst



Great Lakes connectivity for Starburst Galaxy is now available. This feature connects Starburst Galaxy to the data lakehouse file and table formats in use today, including Hive, Delta Lake, and the quickly growing Apache Iceberg table format. You choose the format of a new table with a simple SQL statement, and querying is transparent to end users, so they don’t need to worry about what’s “under the hood.”

You simply configure your object storage catalog (Amazon S3, Azure Data Lake Storage, or Google Cloud Storage) and Starburst Galaxy handles the rest, allowing you to instantly leverage the powerful open table formats of Iceberg and Delta Lake.

Open table formats like Iceberg and Delta Lake allow users to interact with the data lake as easily as they would with a database, using SQL. Alongside Hive, these open table formats aim to solve problems typically associated with data lakes. Apache Iceberg and Delta Lake allow more analytics to be served directly out of the lake and reduce the need for data movement and migration, which provides substantial cost savings.

Additionally, these open table formats provide performance benefits over traditional formats such as ORC and Parquet. Data skipping and improved partition handling are just some of the many benefits offered by Iceberg and Delta Lake. Combined with Starburst’s best-in-class query engine, these open table formats can unlock the data lakehouse for organizations.

How it works

We wanted to make working with different file and table formats seamless by providing unified connectivity that handles all of these formats (and whatever comes in the future) while still letting you query them with regular SQL. In Starburst Galaxy, you simply choose which object store to connect to and we handle the rest behind the scenes.

Create a table

To create a table with one of these formats, you simply provide a type property in the table DDL. Here is a simple example of creating an Iceberg table:

    CREATE TABLE customer (
      name varchar,
      address varchar
    )
    WITH (type = 'iceberg');

  

That’s it! An Iceberg table has been created.

Read data

To read a table using Great Lakes connectivity, you simply issue a SQL SELECT query against it:

    SELECT * FROM customer;

Again… that’s it! End users shouldn’t need to worry about file types or table formats; they just want to query their data.
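As a slightly fuller sketch of the read path, the statements below load a couple of rows into the customer table created earlier and then filter them. The sample values are made up for illustration; nothing in the query refers to the table type.

```sql
-- Hypothetical sample rows; any values matching the two varchar columns would do.
INSERT INTO customer (name, address)
VALUES
  ('Ada Example', '1 Main Street'),
  ('Grace Sample', '22 Harbor Road');

-- The query is plain SQL and looks the same whether the table
-- was created with type = 'iceberg', 'delta', or 'hive'.
SELECT name, address
FROM customer
WHERE name LIKE 'A%';
```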

For Hive and Iceberg tables, you can also optionally specify the file format. For Iceberg, the format can be parquet or orc:

    WITH (type = 'iceberg', format = 'parquet')

For Hive, the format can be parquet, orc, json, textfile, avro, rcbinary, rctext, csv, or sequencefile:

    WITH (type = 'hive', format = 'parquet')
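Put together, a complete statement with an explicit file format might look like the sketch below. The table names customer_orc and customer_text are hypothetical; the type and format values are taken from the lists above.

```sql
-- Iceberg table whose data files are written as ORC.
CREATE TABLE customer_orc (
  name varchar,
  address varchar
)
WITH (type = 'iceberg', format = 'orc');

-- Hive table stored as plain text files.
CREATE TABLE customer_text (
  name varchar,
  address varchar
)
WITH (type = 'hive', format = 'textfile');
```

If format is omitted, as in the first Iceberg example, the table is created with whatever default the platform applies.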

More examples

Create a Hive table stored as Parquet and partitioned by day_id:

    CREATE TABLE events (
      id bigint,
      -- other columns… (Hive partition columns must come last)
      day_id varchar
    )
    WITH (
      type = 'hive',
      format = 'parquet',
      partitioned_by = ARRAY['day_id']
    );

  

Create a Delta Lake table partitioned by day_id

    CREATE TABLE events (
      id bigint,
      day_id varchar
      -- other columns…
    )
    WITH (
      type = 'delta',
      partitioned_by = ARRAY['day_id']
    );

  

You don’t have to maintain separate catalogs for each format, or train your users on which catalog to query based on file type and table format. All the power of SQL and Great Lakes connectivity, with Delta Lake and Iceberg support, is now available in Starburst Galaxy.
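To illustrate, the query below runs unchanged against either version of the events table from the examples above. The day_id value is a made-up sample, and the exact output of SHOW CREATE TABLE is not shown here because it depends on the platform.

```sql
-- Filtering on the partition column lets the engine skip
-- partitions that cannot contain matching rows.
SELECT day_id, count(*) AS event_count
FROM events
WHERE day_id = '2022-05-04'  -- hypothetical value
GROUP BY day_id;

-- Standard Trino statement; the returned DDL should include the
-- table properties, which is one way to check a table's type.
SHOW CREATE TABLE events;
```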

If there are any new file types or table formats in the future, we’ll incorporate them into the Great Lakes connectivity in Starburst Galaxy.

If you haven’t heard of Starburst Galaxy, it’s our new fully managed platform, deployable on all three major clouds (AWS, Azure, and Google Cloud). To learn more, and for $500 in free credits, visit our website!