
Today marks a major step forward in democratizing the data lakehouse. We are thrilled to announce our collaborative work with Google Cloud to integrate Starburst Enterprise with the BigLake metastore, which will soon be generally available.
This powerful integration enables a truly unified experience, allowing you to access a single, governed Iceberg metadata layer for all your data, whether you are querying with Starburst or BigQuery.
Why data lakehouse metadata matters
The promise of the data lakehouse is flexibility: the ability to use the best engine for any workload without having to copy data. However, as organizations adopt more tools, managing metadata across different compute engines becomes an operational burden.
Previously, complex, manual processes were often required to help ensure that data updates or schema changes made by one engine (like Trino) were properly reflected for others (like BigQuery). This friction limited interoperability.
BigLake metastore solves this problem by serving as a fully managed, single source of truth for your Iceberg metadata. Based on the industry-standard Apache Iceberg REST Catalog Spec, it provides a common, scalable interface that removes the need for custom, brittle ETL pipelines.
Starburst meets BigLake
Our collaboration means that Starburst users can now directly leverage the BigLake metastore. This is more than just access; it’s about unified, flexible data management:
Unified Metadata
Both Starburst and BigQuery can read and write to the same Iceberg tables, ensuring every engine is working with the most up-to-date schema and data state.
Serverless and Scalable
BigLake metastore is a fully managed, serverless service on Google Cloud that provides a single source of truth for metadata across data lakes and warehouses, ensuring your metadata scales with your data. It spans multiple engines, including BigQuery, Apache Spark, and Apache Flink, allowing you to share Apache Iceberg metadata without having to copy files.
Configuring Starburst Enterprise to use BigLake metastore
To query your BigLake-managed tables, you must first register BigLake metastore as a catalog within Starburst Enterprise. This configuration specifies where the REST endpoint is located and how to authenticate with Google Cloud.
You can perform this registration in one of two ways, depending on your environment’s requirements.
Option 1: Using a file-based configuration
For production environments using a file-based configuration, you can define the catalog by creating a properties file in your cluster’s catalog directory.
To connect Starburst Enterprise to BigLake metastore, create an Iceberg catalog properties file, for example, etc/catalog/biglake.properties:
connector.name=iceberg
iceberg.catalog.type=rest

# BigLake REST endpoint and warehouse
iceberg.rest-catalog.uri=https://biglake.googleapis.com/iceberg/v1beta/restcatalog
iceberg.rest-catalog.warehouse=gs://my-iceberg-warehouse
iceberg.unique-table-location=false

# BigLake-specific security
iceberg.rest-catalog.security=GOOGLE
iceberg.rest-catalog.google-project-id=my-gcp-project-id

# Views currently not exposed via BigLake REST
iceberg.rest-catalog.view-endpoints-enabled=false

# Use Trino's native GCS filesystem for table data
fs.native-gcs.enabled=true
gcs.json-key-file-path=/etc/starburst/gcs-key.json
Option 2: Configuring the BigLake Iceberg REST Catalog using dynamic catalogs
If your cluster has Dynamic Catalogs enabled, you can skip the properties file and cluster restart entirely. By running the following SQL statement, the catalog is created instantly and becomes available to all users on the cluster immediately.
CREATE CATALOG biglake USING iceberg
WITH (
    "iceberg.catalog.type" = 'rest',
    "iceberg.rest-catalog.uri" = 'https://biglake.googleapis.com/iceberg/v1beta/restcatalog',
    "iceberg.rest-catalog.warehouse" = 'gs://my-iceberg-warehouse',
    "iceberg.unique-table-location" = 'false',
    "iceberg.rest-catalog.security" = 'GOOGLE',
    "iceberg.rest-catalog.google-project-id" = 'my-gcp-project-id',
    "iceberg.rest-catalog.view-endpoints-enabled" = 'false',
    "fs.native-gcs.enabled" = 'true',
    "gcs.json-key-file-path" = '/etc/starburst/gcs-key.json'
);
Unpacking configuration parameters
Regardless of which method you use, the underlying parameters remain the same. These settings establish the secure handshake between Starburst Enterprise and BigLake.
- connector.name=iceberg and iceberg.catalog.type=rest tell Starburst Enterprise to use the Iceberg REST Catalog implementation.
- iceberg.rest-catalog.uri points at the BigLake Iceberg REST endpoint.
- iceberg.rest-catalog.security=GOOGLE and iceberg.rest-catalog.google-project-id enable Google‑style auth for the REST client.
- fs.native-gcs.enabled and gcs.json-key-file-path configure GCS IO for the table data itself; this reuses existing Starburst Enterprise Google Cloud Storage support.
Once this catalog is configured and the cluster restarted, you can treat BigLake as any other Iceberg catalog in Starburst Enterprise.
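For example, you can browse the catalog with standard metadata statements. A brief sketch; the retail schema and orders table names are hypothetical placeholders:

```sql
-- List schemas exposed through the BigLake REST catalog
SHOW SCHEMAS FROM biglake;

-- List Iceberg tables in a schema (hypothetical example name)
SHOW TABLES FROM biglake.retail;

-- Inspect a table's columns and types
DESCRIBE biglake.retail.orders;
```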
Creating and querying Iceberg tables in BigLake from Starburst Enterprise
Once the BigLake catalog is defined, Starburst Enterprise serves as the primary interface for managing the entire data lifecycle in your Google Cloud environment. This integration allows you to leverage Starburst’s high-performance SQL engine to execute DDL and DML operations that are immediately reflected across the BigLake metastore.
By using standard SQL syntax, you can initialize schemas, build optimized Iceberg tables, and begin ingesting data without ever leaving the Starburst console or manually syncing metadata between services.
Let’s see how this works in practice.
Create a schema and table
To get started, you can define your storage boundaries and table structures directly. This helps ensure that your Iceberg metadata is properly registered with the BigLake REST Catalog while the physical data remains organized in your designated Google Cloud Storage buckets.
-- Create a schema backed by the BigLake warehouse location
CREATE SCHEMA biglake.retail
WITH (location = 'gs://my-iceberg-warehouse/retail');

-- Create an Iceberg table in BigLake via REST catalog
CREATE TABLE biglake.retail.orders (
    order_id BIGINT,
    customer_id BIGINT,
    order_ts TIMESTAMP,
    status VARCHAR,
    total_amount DECIMAL(18, 2)
)
WITH (
    format_version = 2, -- BigLake does not yet support V3 features
    partitioning = ARRAY['status']
);
The schema and table definitions are registered in BigLake metastore via the Iceberg REST Catalog, while the data files land in the configured Google Cloud Storage warehouse.
Insert and query data
With the table structure in place, you can perform standard data manipulation and analytical queries.
Because Starburst Enterprise handles the heavy lifting of query planning and execution, you can interact with these tables as if they were in a local database, even while they remain part of a much larger, federated ecosystem.
INSERT INTO biglake.retail.orders
VALUES
    (1, 1001, TIMESTAMP '2025-01-01 10:00:00', 'NEW', 59.99),
    (2, 1002, TIMESTAMP '2025-01-01 11:15:00', 'NEW', 129.50),
    (3, 1001, TIMESTAMP '2025-01-02 09:45:00', 'CANCELLED', 59.99);

-- Run analytics from Starburst Enterprise
SELECT
    status,
    COUNT(*) AS orders,
    SUM(total_amount) AS revenue
FROM biglake.retail.orders
GROUP BY status
ORDER BY revenue DESC;
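Because these are standard Iceberg tables, the Iceberg connector's metadata tables and time travel also work against them. A hedged sketch using the example table above; the cutoff timestamp is illustrative, and the results depend on your table's snapshot history:

```sql
-- Inspect the table's Iceberg snapshots (each commit, such as the INSERT above, creates one)
SELECT snapshot_id, committed_at, operation
FROM biglake.retail."orders$snapshots"
ORDER BY committed_at DESC;

-- Query the table as of an earlier point in time (time travel)
SELECT COUNT(*)
FROM biglake.retail.orders
FOR TIMESTAMP AS OF TIMESTAMP '2025-01-01 12:00:00 UTC';
```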
Understanding the power of open data architecture
This workflow highlights the true power of an open architecture. Because you have used a standardized Iceberg format through the BigLake REST Catalog, the same table is now immediately available for a variety of use cases across different teams:
- Analysts using BigQuery can query it as a native BigLake Iceberg table.
- Data engineers can use Spark for complex ETL or machine learning workflows.
- Platform owners can use Starburst Enterprise to federate this data alongside other catalogs, such as BigQuery, Postgres, or on-premises systems.
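To illustrate the platform-owner case, a federated query can join the governed Iceberg table with an operational database registered as a separate Starburst catalog. A sketch only; the pg catalog and its customers table are hypothetical:

```sql
-- Join a BigLake-governed Iceberg table with an operational Postgres table
-- (the "pg" catalog and customers table are hypothetical examples)
SELECT
    c.customer_name,
    SUM(o.total_amount) AS lifetime_revenue
FROM biglake.retail.orders AS o
JOIN pg.public.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY lifetime_revenue DESC;
```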
Considering three interoperability scenarios
Let’s consider three scenarios where you can use Starburst to enhance compute engine interoperability. These architectural patterns demonstrate how a shared Iceberg foundation allows organizations to move away from rigid, single-vendor stacks and toward a more flexible, open ecosystem.
By leveraging the BigLake REST Catalog, teams can ensure that data remains accessible and governed across disparate tools without the need for constant migration or duplication.
1. Spark writes; Starburst Enterprise and BigQuery read
Using this multi-engine architecture, Spark serves as the primary data engineering platform and writes Iceberg tables to Cloud Storage, while using BigLake metastore as the central catalog.
This setup enables seamless, cross-platform consumption, allowing BigQuery analysts to query the data directly through the native BigLake Iceberg integration. Simultaneously, Starburst Enterprise accesses the same datasets via the BigLake Iceberg REST Catalog. By serving as a high-performance federation layer, Starburst can join these governed Iceberg tables with external data from other systems, such as operational databases, without requiring additional data movement or duplication.
2. Starburst Enterprise‑managed tables consumed by BigQuery
In this second architecture, Starburst Enterprise leads the data lifecycle by creating and maintaining Iceberg tables directly within BigLake. This includes managing critical operational tasks such as DDL execution, background compaction, and complex schema evolution to ensure the data remains optimized for performance.
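In Starburst Enterprise, these maintenance tasks use the Iceberg connector's standard table procedures and DDL. A brief sketch against the example table from earlier; the thresholds shown are illustrative values, not recommendations:

```sql
-- Compact small data files (file size threshold is an illustrative value)
ALTER TABLE biglake.retail.orders EXECUTE optimize(file_size_threshold => '128MB');

-- Expire snapshots older than the retention window
ALTER TABLE biglake.retail.orders EXECUTE expire_snapshots(retention_threshold => '7d');

-- Evolve the schema in place (hypothetical new column)
ALTER TABLE biglake.retail.orders ADD COLUMN discount DECIMAL(18, 2);
```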
BigQuery users then consume these tables directly as native BigLake Iceberg assets, benefiting from centralized Dataplex governance and security policies across both engines.
3. One governance layer, many engines
Starburst operates within this shared ecosystem as a high-performance engine that respects and enforces a single, centralized governance layer across multiple compute environments.
This approach treats interoperability as a core capability by establishing a shared foundation through BigLake and Dataplex. By centralizing essential functions such as data lineage and quality, the ecosystem removes friction caused by inconsistent rules across different platforms.
This allows engines such as BigQuery, Spark, and Starburst Enterprise to coexist seamlessly, giving teams the flexibility to choose the right tool for their workload while maintaining a single, reliable standard for their data.
Getting started with Starburst Enterprise and Google Cloud Storage
It’s easy to get started with Starburst Enterprise and Google Cloud. To try the integration:
- Set up BigLake metastore and the Iceberg REST Catalog for your GCS data, following BigLake documentation.
- Configure Starburst Enterprise 480-e release with an Iceberg REST catalog pointing at the BigLake endpoint, using iceberg.rest-catalog.security=GOOGLE and a Cloud Storage filesystem configuration as shown above.
- Create or register Iceberg tables via Starburst Enterprise, Spark, or BigQuery, and confirm you can see the same tables from all engines.
- Layer in governance using Dataplex Universal Catalog for lineage, quality, and discovery across your BigLake Iceberg tables.
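As a quick cross-engine check for the third step, you can list what Starburst Enterprise sees in the catalog and compare it with the table lists in BigQuery and Spark; the retail schema name is a hypothetical example:

```sql
-- Confirm the tables registered in BigLake metastore are visible from Starburst
SELECT table_schema, table_name
FROM biglake.information_schema.tables
WHERE table_schema = 'retail';
```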
By combining BigLake metastore with Starburst Enterprise, you get a fully open, interoperable Iceberg lakehouse on Cloud Storage with a managed metadata plane, multiple best‑of‑breed engines, and a clear path to AI workloads on top of shared, governed data.
Starburst Galaxy and BigLake
Already operating entirely in the cloud? You might be a good fit for Starburst Galaxy. While Galaxy and BigLake don’t integrate today, they will soon.
If this interests you, please reach out to our team for more information.