Tutorial: Using Starburst Galaxy’s materialized views with Apache Iceberg

May 12, 2023

Lester Martin
Developer Adocate
Starburst

Lester Martin
Developer Adocate
Starburst

More deployment options

Request Enterprise trial license key →

Materialized views have become available in Starburst Galaxy for catalogs using Great Lakes connectivity. For folks who are NOT already using Starburst Galaxy — come sign up — it’s FREE — especially if you want to exercise the content in this blog post. To test this all out you will need to connect to one of the cloud object stores, too. Feel to look at my querying data in the cloud post if you need a jump start on that.

Please check out training opportunities from Starburst Academy where an enhanced version of this information will soon be available. And yes, enough with all of that, let’s DO SOMETHING!

Define Table and Query

Create a couple of schemas in a catalog aligned to your cloud object store.

CREATE SCHEMA mycloud.mv_storage;
CREATE SCHEMA mycloud.mvtest;
USE mycloud.mvtest;

Create an Apache Iceberg table and populate it.

CREATE TABLE dune_characters (
  id integer,
  name varchar(55),
  notes varchar(255)
)
WITH (type = 'iceberg');

INSERT INTO dune_characters 
  (id, name, notes)
VALUES
  (101, 'Leto', 'Ruler of House Atreides'),
  (102, 'Jessica', 'Concubine of the Duke');

Run a very simple query to base a materialized view on.

SELECT name, notes 
  FROM dune_characters;

Create/Load the Materialized View

Create a materialized view with the previous defining query.

CREATE MATERIALIZED VIEW mat_view
WITH (storage_schema = 'mv_storage')
AS SELECT name, notes 
     FROM dune_characters;

The WITH clause is optional, but the default behavior for the Iceberg connector writes the underlying “storage table” to the same schema. In practice, it makes sense to locate the storage table somewhere else (if only to limit consumer confusion).

If you queried mat_view now the results would be as expected, but the query would actually access the table in the defining query. The following command will (re)build the underlying storage table.

REFRESH MATERIALIZED VIEW mat_view;

You can now see the dune_characters table and the mat_view materialized view in the mvtest schema as well as the storage table located in the mv_storage schema.

Running a query on the materialized view returns the expected results.

SELECT * FROM mat_view;

To fully verify that the storage table was accessed instead of the original table, click on Query details from the screen shot above and then the Advanced tab under Query ID heading on the page that renders.

As you can see above, the storage table that starts with st_ from the mv_storage schema is being accessed instead of the dune_characters table from the defining query.

What about Starburst Enterprise?

Great news!! This all works on Starburst Enterprise, too, as called out in the Starburst Iceberg connector!

This tutorial was originally published by Lester Martin.

Why choose Apache Iceberg over Databricks’ Delta Lake
Full story

Manage Consent Preferences

Essential/Strictly Necessary Cookies

Required

These cookies are essential in order to enable you to move around the website and use its features, such as accessing secure areas of the website.

Analytical/Performance Cookies

These are analytics cookies that allow us to collect information about how visitors use a website, for instance which pages visitors go to most often, and if they get error messages from web pages.

Functional/Preference Cookies

These cookies allow our website to properly function and in particular will allow you to use its more personal features.

Targeting/Advertising Cookies

These cookies are used by third parties to build a profile of your interests and show you relevant adverts on other sites.

Starburst’s mission is to free our customers to see the invisible and achieve the impossible