
Today marks a major step forward in democratizing the data lakehouse. We are thrilled to announce our collaborative work with Google Cloud to integrate Starburst Enterprise with the BigLake metastore, which will soon be generally available.

This powerful integration enables a truly unified experience, allowing you to access a single, governed Iceberg metadata layer for all your data, whether you are querying with Starburst or BigQuery.

Why data lakehouse metadata matters

The promise of the data lakehouse is flexibility: the ability to use the best engine for any workload without having to copy data. However, as organizations adopt more tools, managing metadata across different compute engines becomes an operational burden.

Previously, complex, manual processes were often required to help ensure that data updates or schema changes made by one engine (like Trino) were properly reflected for others (like BigQuery). This friction limited interoperability.

BigLake metastore solves this problem by serving as a fully managed, single source of truth for your Iceberg metadata. Based on the industry-standard Apache Iceberg REST Catalog Spec, it provides a common, scalable interface that removes the need for custom, brittle ETL pipelines.

Starburst meets BigLake

Our collaboration means that Starburst users can now directly leverage the BigLake metastore. This is more than just access; it’s about unified, flexible data management:

Unified Metadata

Both Starburst and BigQuery can read and write to the same Iceberg tables, ensuring every engine is working with the most up-to-date schema and data state.

Serverless and Scalable 

BigLake metastore is a fully managed, serverless service on Google Cloud that provides a single source of truth for metadata across data lakes and warehouses, ensuring your metadata scales with your data. It spans multiple engines, including BigQuery, Apache Spark, and Apache Flink, allowing you to share metadata for Apache Iceberg tables without copying files.

Configuring Starburst Enterprise to use BigLake metastore

To query your BigLake-managed tables, you must first register BigLake metastore as a catalog within Starburst Enterprise. This configuration specifies where the REST endpoint is located and how to authenticate with Google Cloud.

You can perform this registration in one of two ways, depending on your environment’s requirements.

Option 1: Using a file-based configuration

For production environments using a file-based configuration, you can define the catalog by creating a properties file in your cluster’s catalog directory. 

To connect Starburst Enterprise to BigLake metastore, create an Iceberg catalog properties file, for example, etc/catalog/biglake.properties:

connector.name=iceberg
iceberg.catalog.type=rest
# BigLake REST endpoint and warehouse
iceberg.rest-catalog.uri=https://biglake.googleapis.com/iceberg/v1beta/restcatalog
iceberg.rest-catalog.warehouse=gs://my-iceberg-warehouse
iceberg.unique-table-location=false
# BigLake-specific security
iceberg.rest-catalog.security=GOOGLE
iceberg.rest-catalog.google-project-id=my-gcp-project-id
# Views currently not exposed via BigLake REST
iceberg.rest-catalog.view-endpoints-enabled=false
# Use Trino’s native GCS filesystem for table data
fs.native-gcs.enable=true
gcs.json-key-file-path=/etc/starburst/gcs-key.json

Option 2: Configuring the BigLake Iceberg REST Catalog using dynamic catalogs

If your cluster has Dynamic Catalogs enabled, you can skip the properties file and cluster restart entirely. By running the following SQL statement, the catalog is created instantly and becomes available to all users on the cluster immediately.

CREATE CATALOG biglake
USING iceberg
WITH (
    "iceberg.catalog.type" = 'rest',
    "iceberg.rest-catalog.uri" = 'https://biglake.googleapis.com/iceberg/v1beta/restcatalog',
    "iceberg.rest-catalog.warehouse" = 'gs://my-iceberg-warehouse',
    "iceberg.unique-table-location" = false,
    "iceberg.rest-catalog.security" = 'GOOGLE',
    "iceberg.rest-catalog.google-project-id" = 'my-gcp-project-id',
    "iceberg.rest-catalog.view-endpoints-enabled" = false,
    "fs.native-gcs.enable" = true,
    "gcs.json-key-file-path" = '/etc/starburst/gcs-key.json'
);

Unpacking configuration parameters

Regardless of which method you use, the underlying parameters remain the same. These settings establish the secure handshake between Starburst Enterprise and BigLake. 

  • connector.name=iceberg and iceberg.catalog.type=rest tell Starburst Enterprise to use the Iceberg REST Catalog implementation.
  • iceberg.rest-catalog.uri points at the BigLake Iceberg REST endpoint.
  • iceberg.rest-catalog.security=GOOGLE and iceberg.rest-catalog.google-project-id enable Google‑style auth for the REST client.
  • fs.native-gcs.enable and gcs.json-key-file-path configure GCS I/O for the table data itself; this reuses existing Starburst Enterprise Google Cloud Storage support.

Once this catalog is configured and the cluster reloaded, you can treat BigLake like any other Iceberg catalog in Starburst Enterprise.
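As a quick sanity check, you can browse the new catalog with standard SQL. This sketch assumes the catalog was registered under the name biglake as shown above, and uses a hypothetical retail schema:

```sql
-- List namespaces exposed by the BigLake metastore
SHOW SCHEMAS FROM biglake;

-- List Iceberg tables in a given namespace
SHOW TABLES FROM biglake.retail;
```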

Creating and querying Iceberg tables in BigLake from Starburst Enterprise

Once the BigLake catalog is defined, Starburst Enterprise serves as the primary interface for managing the entire data lifecycle in your Google Cloud environment. This integration allows you to leverage Starburst’s high-performance SQL engine to execute DDL and DML operations that are immediately reflected across the BigLake metastore. 

By using standard SQL syntax, you can initialize schemas, build optimized Iceberg tables, and begin ingesting data without ever leaving the Starburst console or manually syncing metadata between services.

Let’s see how this works in practice. 

Create a schema and table

To get started, you can define your storage boundaries and table structures directly. This helps ensure that your Iceberg metadata is properly registered with the BigLake REST Catalog while the physical data remains organized in your designated Google Cloud Storage buckets.

-- Create a schema backed by the BigLake warehouse location
CREATE SCHEMA biglake.retail
WITH (location = 'gs://my-iceberg-warehouse/retail');
-- Create an Iceberg table in BigLake via REST catalog
CREATE TABLE biglake.retail.orders (
    order_id      BIGINT,
    customer_id   BIGINT,
    order_ts      TIMESTAMP,
    status        VARCHAR,
    total_amount  DECIMAL(18, 2)
)
WITH (
    format_version = 2,          -- BigLake does not yet support V3 features
    partitioning  = ARRAY['status']
);

The schema and table definitions are registered in BigLake metastore via the Iceberg REST Catalog, while the data files land in the configured Google Cloud Storage warehouse.
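One way to see this registration in action is through the hidden metadata tables that Trino's Iceberg connector exposes. A hedged sketch, assuming the orders table above exists:

```sql
-- Inspect the snapshots recorded in the BigLake-managed Iceberg metadata
SELECT snapshot_id, committed_at, operation
FROM biglake.retail."orders$snapshots";
```

Each commit from any engine sharing the catalog appears here, which makes these tables a convenient way to confirm that writes are flowing through the shared metadata layer.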

Insert and query data

With the table structure in place, you can perform standard data manipulation and analytical queries. 

Because Starburst Enterprise handles the heavy lifting of query planning and execution, you can interact with these tables as if they were in a local database, even while they remain part of a much larger, federated ecosystem.

INSERT INTO biglake.retail.orders VALUES
    (1, 1001, TIMESTAMP '2025-01-01 10:00:00', 'NEW', 59.99),
    (2, 1002, TIMESTAMP '2025-01-01 11:15:00', 'NEW', 129.50),
    (3, 1001, TIMESTAMP '2025-01-02 09:45:00', 'CANCELLED', 59.99);
-- Run analytics from Starburst Enterprise
SELECT status, COUNT(*) AS orders, SUM(total_amount) AS revenue
FROM biglake.retail.orders
GROUP BY status
ORDER BY revenue DESC;

Understanding the power of open data architecture 

This workflow highlights the true power of an open architecture. Because you have used a standardized Iceberg format through the BigLake REST Catalog, the same table is now immediately available for a variety of use cases across different teams:

  • Analysts using BigQuery can query it as a native BigLake Iceberg table.
  • Data engineers can use Spark for complex ETL or machine learning workflows.
  • Platform owners can use Starburst Enterprise to federate this data alongside other catalogs, such as BigQuery, Postgres, or on-premises systems.

Considering three interoperability scenarios

Let’s consider three scenarios where you can use Starburst to enhance compute engine interoperability. These architectural patterns demonstrate how a shared Iceberg foundation allows organizations to move away from rigid, single-vendor stacks and toward a more flexible, open ecosystem. 

By leveraging the BigLake REST Catalog, teams can ensure that data remains accessible and governed across disparate tools without the need for constant migration or duplication.

1. Spark writes, Starburst Enterprise and BigQuery read

Using this multi-engine architecture, Spark serves as the primary data engineering platform and writes Iceberg tables to Cloud Storage, while using BigLake metastore as the central catalog. 

This setup enables seamless, cross-platform consumption, allowing BigQuery analysts to query the data directly through the native BigLake Iceberg integration. Simultaneously, Starburst Enterprise accesses the same datasets via the BigLake Iceberg REST Catalog. By serving as a high-performance federation layer, Starburst can join these governed Iceberg tables with external data from other systems, such as operational databases, without requiring additional data movement or duplication.
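To make the federation layer concrete, here is a hedged sketch of a cross-catalog join in Starburst. The postgresql catalog and its customers table are hypothetical stand-ins for an operational database you have separately registered:

```sql
-- Join a BigLake-governed Iceberg table with an operational database,
-- in a single Starburst query and without moving data
SELECT o.order_id, o.total_amount, c.customer_name
FROM biglake.retail.orders AS o
JOIN postgresql.public.customers AS c
  ON o.customer_id = c.customer_id
WHERE o.status = 'NEW';
```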

2. Starburst Enterprise‑managed tables consumed by BigQuery

In this second architecture, Starburst Enterprise leads the data lifecycle by creating and maintaining Iceberg tables directly within BigLake. This includes managing critical operational tasks such as DDL execution, background compaction, and complex schema evolution to ensure the data remains optimized for performance. 

BigQuery users then consume these tables directly as native BigLake Iceberg assets, benefiting from centralized Dataplex governance and security policies across both engines.
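For example, the maintenance tasks mentioned above map to the Iceberg connector's table procedures in Trino-based engines. A sketch, assuming the orders table from earlier; exact procedure availability may vary by Starburst Enterprise release:

```sql
-- Compact small data files into larger ones
ALTER TABLE biglake.retail.orders EXECUTE optimize;

-- Expire snapshots older than the retention threshold
ALTER TABLE biglake.retail.orders
EXECUTE expire_snapshots(retention_threshold => '7d');
```

Because these operations go through the shared BigLake metastore, BigQuery readers automatically see the compacted state on their next query.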

3. One governance layer, many engines

Starburst operates within this shared ecosystem as a high-performance engine that respects and enforces a single, centralized governance layer across multiple compute environments.

This approach treats interoperability as a core capability by establishing a shared foundation through BigLake and Dataplex. By centralizing essential functions such as data lineage and quality, the ecosystem removes friction caused by inconsistent rules across different platforms. 

This allows engines such as BigQuery, Spark, and Starburst Enterprise to coexist seamlessly, giving teams the flexibility to choose the right tool for their workload while maintaining a single, reliable standard for their data.

Getting started with Starburst Enterprise and Google Cloud Storage

It’s easy to get started with Starburst Enterprise and Google Cloud. To try the integration:

  1. Set up BigLake metastore and the Iceberg REST Catalog for your GCS data, following BigLake documentation.
  2. Configure the Starburst Enterprise 480-e release with an Iceberg REST catalog pointing at the BigLake endpoint, using iceberg.rest-catalog.security=GOOGLE and a Cloud Storage filesystem configuration as shown above.
  3. Create or register Iceberg tables via Starburst Enterprise, Spark, or BigQuery, and confirm you can see the same tables from all engines.
  4. Layer in governance using Dataplex Universal Catalog for lineage, quality, and discovery across your BigLake Iceberg tables.

By combining BigLake metastore with Starburst Enterprise, you get a fully open, interoperable Iceberg lakehouse on Cloud Storage with a managed metadata plane, multiple best‑of‑breed engines, and a clear path to AI workloads on top of shared, governed data.

Starburst Galaxy and BigLake

Already operating entirely in the cloud? You might be a good fit for Starburst Galaxy. While Galaxy and BigLake don’t integrate today, they will soon. 

If this interests you, please reach out to our team for more information.

 

Start for Free with Starburst Galaxy

Try our free trial today and see how you can improve your data performance.
Start Free