
A couple of weeks ago at the Databricks Data + AI Summit, I gave a joint session with Alex Jiang, a product manager from the Databricks Unity Catalog team. The topic covered a problem that the two of us have spent a lot of time on from opposite sides. How does a query engine like Starburst honor the fine-grained access control policies defined in Unity Catalog, without re-implementing any of that policy logic in the engine?
It’s a challenging problem, sitting at the intersection of two products that many enterprises run side by side. I want to walk through how we solved it, because the answer is more interesting than a simple connector integration. It is built on an open standard, the Iceberg REST catalog API, and it is a concrete example of how Starburst has been championing the vision of an open lakehouse.
The state of the open lakehouse
As the lakehouse architecture has been widely adopted across the industry, data has moved from rigid and locked-in warehouses to open table formats, revolutionizing how data is accessed and managed. Open table formats like Apache Iceberg and Delta Lake, paired with a shared catalog, allow many engines to read and write from a single copy of data rather than each keeping its own. That is genuine interoperability at the data layer, and it is the foundation on which everything here builds.
The governance problem
A high degree of openness comes with downsides, however. Traditional governance is disrupted when a single engine does not have full control over an entire organization’s data. The development of open table formats was not paired with a shared policy language, so each engine might implement governance differently, or not at all. Engines cannot be trusted to enforce a catalog’s policies.
Because of this, policies stayed siloed inside individual engines. You could point five engines at the same Iceberg table and still have five different ideas of who was allowed to see which rows. That is the gap Unity Catalog set out to close by centralizing governance, ensuring that policies are respected by any engine using its Iceberg REST catalog API, and eliminating duplicated policy layers and compliance risks.
Why coarse-grained policies are not enough
It’s worth taking a moment to consider why other approaches, namely coarse-grained access strategies, don’t work. The mechanism many catalogs use to enforce access policies on external engines is credential vending. When a user runs a query with an engine, the catalog checks that user’s permissions and selectively vends storage credentials for the underlying data files. That works cleanly to control table-level access per user.
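Table-level credential vending can be pictured with a minimal Python sketch. Everything named here is an assumption for illustration: the `TABLE_GRANTS` mapping, the `vend_credentials` function, the bucket path, and the token format are hypothetical and are not how any particular catalog implements it.

```python
import secrets
import time
from dataclasses import dataclass

# Hypothetical grants: which users may read which tables.
TABLE_GRANTS = {
    "sales.customers": {"alice", "bob"},
    "sales.orders": {"alice"},
}


@dataclass(frozen=True)
class VendedCredential:
    """A short-lived token scoped to one table's storage prefix."""
    prefix: str
    token: str
    expires_at: float


def vend_credentials(user: str, table: str, ttl_seconds: int = 900) -> VendedCredential:
    """Return a credential for the table's files, or raise if the user has no grant."""
    if user not in TABLE_GRANTS.get(table, set()):
        raise PermissionError(f"{user} may not read {table}")
    return VendedCredential(
        prefix=f"s3://example-bucket/{table.replace('.', '/')}/",  # assumed layout
        token=secrets.token_hex(16),
        expires_at=time.time() + ttl_seconds,
    )
```

The credential in this sketch is scoped to a storage prefix, so the decision it encodes is all or nothing for the table's files.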
Fine-grained access control is difficult
The harder question is fine-grained access control, including policies that filter out rows or mask columns based on the user accessing the data. Credential vending alone cannot support that, as credentials are coarse-grained. They grant access to files, not to a subset of rows or columns inside them. You cannot create a credential that means “you may read this table, but only the non-EU rows, and with the address column masked.”
Additionally, not every engine can be trusted to enforce a catalog’s policies. Policy languages differ widely across engines and catalogs, and there is no guarantee that a policy defined in one system will be applied by another engine. Until a shared policy model exists, catalogs need a hook into a query’s read path to redact the data a user sees.
This brings us to centralized enforcement. To ensure policies are applied properly, Unity Catalog must only surface data that has already been sanitized. Fortunately, a recent update to the Iceberg REST catalog API offers a solution.
Supporting server-side scan planning
Scan planning is the process of reading Iceberg metadata files, and using that information to determine which data files need to be read to satisfy a query. Until now, this has been done locally on the client, which has direct access to an Iceberg table’s raw files.
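As a rough illustration of client-side planning, the Python sketch below prunes data files using per-column lower and upper bounds, which is the kind of statistic Iceberg manifests record for each data file. The `DataFile` class, the `plan_scan` function, and the file paths are hypothetical simplifications, and real planning handles far more than a single equality predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """A simplified stand-in for one manifest entry."""
    path: str
    lower: dict  # column name -> minimum value stored in the file
    upper: dict  # column name -> maximum value stored in the file


def plan_scan(files, column, value):
    """Keep only files whose [lower, upper] range for `column` could contain `value`."""
    return [f for f in files if f.lower[column] <= value <= f.upper[column]]


# Hypothetical manifest contents for a table with a `region` column.
manifest = [
    DataFile("s3://example-bucket/t/a.parquet", {"region": "AP"}, {"region": "EU"}),
    DataFile("s3://example-bucket/t/b.parquet", {"region": "NA"}, {"region": "SA"}),
]

# A query with `region = 'EU'` only needs the first file.
tasks = plan_scan(manifest, "region", "EU")
```

Note that the client in this sketch sees every file path in the manifest, which is exactly why client-side planning gives the catalog no opportunity to redact anything.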
The Iceberg REST catalog API added an endpoint to perform scan planning on the catalog server, bypassing the need for the client to read Iceberg metadata files. More importantly, the plan returned by the catalog server can reference any data files it chooses, not necessarily the raw data itself.
POST /v1/{prefix}/namespaces/{ns}/tables/{table}/plan
{
  "select": ["name", "address"],
  "filter": { "type": "eq", "term": "region", "value": "EU" }
}

{
  "plan-id": "<guid>",
  "status": "completed",
  "file-scan-tasks": [ ... ],
  "storage-credentials": [ ... ]
}
Using this endpoint, a client can request a set of columns to select and filters to apply, and it then receives a list of file scan tasks to execute, along with vended credentials.
This results in several benefits over client-side scan planning:
- The catalog can control which data is surfaced, enabling more granular access controls
- Planning is offloaded to the catalog server, which often has more context than the client
- Vended credentials keep permissions tightly scoped to the user running the query
- The result is execution-ready, as it follows the Iceberg specification
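The request and response shown earlier can be assembled and read with nothing more than the Python standard library. This is a sketch under stated assumptions: the helper names `build_plan_request` and `completed_tasks`, the base URL, and the `main`, `sales`, and `customers` identifiers are hypothetical, and the sketch treats any status other than `completed` as an error rather than handling it.

```python
import json
from urllib.parse import quote


def build_plan_request(base_url, prefix, namespace, table, select, filter_expr):
    """Return the (url, json_body) pair for a scan planning request."""
    path = "/v1/{}/namespaces/{}/tables/{}/plan".format(
        quote(prefix, safe=""), quote(namespace, safe=""), quote(table, safe="")
    )
    body = json.dumps({"select": select, "filter": filter_expr})
    return base_url.rstrip("/") + path, body


def completed_tasks(response_text):
    """Extract file scan tasks from a response whose status is 'completed'."""
    response = json.loads(response_text)
    if response.get("status") != "completed":
        raise RuntimeError(f"planning not completed: {response.get('status')}")
    return response["file-scan-tasks"]


# Hypothetical catalog location and table identifiers.
url, body = build_plan_request(
    "https://catalog.example.com/api",
    "main",
    "sales",
    "customers",
    select=["name", "address"],
    filter_expr={"type": "eq", "term": "region", "value": "EU"},
)
```

The `url` and `body` values can be sent with any HTTP client. The sketch stops short of sending them so that it makes no claims about authentication or about the contents of individual file scan tasks.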
Fine-grained access control with server-side scan planning
Using this endpoint gives Unity Catalog a hook into the read path of the query. It can identify the user (through Starburst’s existing authentication model), determine what data that user is authorized to see, and redact it as needed. Unity Catalog will generate temporary sanitized data files and return file scan tasks pointing to them. Starburst will then read those sanitized files directly.
From the moment Starburst reads data from the server-side scan planning response, the data is already sanitized. No additional logic is required to enforce Unity Catalog’s policies. Unity Catalog can centralize its governance, and Starburst can be trusted to respect those policies, regardless of how they’re implemented by Databricks.
This trust will extend to any other REST catalog that implements this endpoint as well.
What actually happens when Starburst runs the query
Here is the end-to-end flow, which ships in Starburst Enterprise Platform 481-e STS.

On the Starburst side
As our engine begins planning the stages and splits of a SQL query, we use the Iceberg client to begin scan planning. Instead of reading metadata files locally and creating our own scan tasks to farm out to workers, we invoke Unity Catalog’s ScanAPI endpoint. We then farm the returned file scan tasks out to our workers and execute them in exactly the same way as locally planned tasks.
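The hand-off from returned tasks to workers can be pictured with a short Python sketch. Starburst's actual split scheduling is not described here, so the round-robin strategy, the `assign_tasks` name, and the task and worker labels are all assumptions made for illustration.

```python
from itertools import cycle


def assign_tasks(tasks, workers):
    """Distribute tasks across workers in round-robin order (an assumed strategy)."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignments = {worker: [] for worker in workers}
    for task, worker in zip(tasks, cycle(workers)):
        assignments[worker].append(task)
    return assignments


# Tasks are opaque here: the function never inspects whether they were
# planned locally or returned by the catalog.
assignments = assign_tasks(["task-1", "task-2", "task-3"], ["worker-a", "worker-b"])
```

The point of the sketch is that the distribution step is indifferent to where the tasks came from, which matches the description of catalog-planned tasks running the same way as locally planned ones.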
On the Unity Catalog side
On the other side of that call, Unity Catalog examines the table metadata and the policies that apply to the specific user behind the query, and generates temporary data files that already have row filters and column masks applied. It returns a scan plan whose file-scan tasks point to those sanitized temporary files, along with the credentials needed to read them.
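Conceptually, the catalog-side step is a row filter followed by column masks, applied before anything is written to the temporary files. The Python sketch below illustrates that idea only. Unity Catalog's internals are not described in this post, so the `Policy` class, the `sanitize` function, the `analyst` user, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Policy:
    """A per-user policy: which rows survive, and how columns are masked."""
    row_filter: Callable[[Row], bool]
    column_masks: Dict[str, Callable[[str], str]]


def sanitize(rows: List[Row], policy: Policy) -> List[Row]:
    """Drop filtered rows, then mask columns in the rows that remain."""
    sanitized = []
    for row in rows:
        if not policy.row_filter(row):
            continue
        masked = {}
        for column, value in row.items():
            mask = policy.column_masks.get(column)
            masked[column] = mask(value) if mask else value
        sanitized.append(masked)
    return sanitized


# Hypothetical policy: only non-EU rows, with the address column masked.
policies = {
    "analyst": Policy(
        row_filter=lambda row: row["region"] != "EU",
        column_masks={"address": lambda _value: "****"},
    )
}

raw_rows = [
    {"name": "Ana", "region": "EU", "address": "1 Main St"},
    {"name": "Bo", "region": "US", "address": "2 Oak Ave"},
]

# What would be written to the temporary files for this user.
visible = sanitize(raw_rows, policies["analyst"])
```

In this sketch, `visible` contains a single row for Bo with the address replaced by `****`. The engine reading that output never sees the EU row or the original address.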
How it comes together
The data is already sanitized by the time it reaches Starburst. There is no masking or filtering happening inside the Starburst query engine. We are not interpreting Unity Catalog’s policy language or re-applying its rules. We are reading data that the catalog has already filtered based on the user.
Why this matters, from a real customer
This is not a hypothetical. One Starburst customer in travel and hospitality runs exactly the multi-vendor data lakehouse that this is built for. They are modernizing a stack that includes Starburst as a query engine, Okta for identity, and both Unity Catalog and AWS Glue as Iceberg catalogs. They also run Apache Ranger as an extra governance layer on top of Starburst, precisely because their Unity Catalog policies were not being honored by Starburst queries.
Before server-side scan planning, their options were limited:
- They could duplicate every Unity Catalog policy in Apache Ranger, resulting in policy drift, auditing gaps, and inconsistency across engines.
- They could only use Databricks, giving up federation across all of Starburst’s connectors.
- They could just use coarse-grained access controls, and deal with overly restrictive access controls or compliance risk.
With server-side scan planning, the policy they already defined in Unity Catalog applies to their Starburst queries as well. Unity Catalog stays the single source of governance for that data. There is a unified audit trail coming from Unity Catalog, as no other policies are being applied. The redundant Ranger layer is no longer necessary for Starburst queries against Unity-governed tables, allowing them to delete a whole layer of duplicated policy rather than maintain it.
Seeing the workflow in action
In the Databricks session, the live demo was the clearest way to make this functionality concrete. We started with a test table full of the usual personally identifiable information.

Querying it from Starburst with client-side scan planning returned every row and every column in the clear, as we’d expect from a table with no policies.

Then, over in Databricks, we masked the Social Security number column and added a row filter to exclude anyone in Washington or California.

Without server-side scan planning, the data would appear unmasked and unfiltered on the Starburst side. However, using server-side scan planning, the results have the Social Security numbers masked and the filtered rows gone!

This works for any masks, filters, or governed tags applied to your data, and the results are consistent with what Databricks itself returns.
The power of interoperability
The key takeaway here is the value that interoperability of compute can bring to real, production scenarios. It would be easy to frame two query engines that overlap as rivals in this competitive market. But the thing we keep coming back to, and the reason this collaboration was so successful, is that enterprises do not run just one tool. They run many, choosing the right tool for the job deliberately. The best engine for a given workload is not the same across an entire organization, and the freedom to choose is worth protecting.
What makes that freedom possible are open standards. Server-side scan planning works because the Iceberg REST catalog API is an open specification that any catalog can implement, not a private handshake between two vendors. Unity Catalog enforcing policy at the catalog layer, and Starburst federating across the whole estate while honoring that policy, is the open data lakehouse working as it was intended to. Define your governance once. Query your data with the engine that fits the job. Trust that the rules hold either way.
That is the same idea behind everything Starburst has built around Unity Catalog interoperability, and it is consistent with what we showed at last year’s summit, where the theme was Choice. Choice is only meaningful if it is governed. Server-side scan planning is how the governance follows you across engines.
Starburst is built for interoperable compute
Server-side scan planning for Iceberg REST catalogs is now available in Starburst Enterprise Platform 481-e STS. If you want the deeper technical detail, the 481 documentation covers configuration, and our write-up on Starburst’s Unity Catalog integration is the right place to start if you are setting this up. For the broader picture of why we lean so hard on the Iceberg REST catalog and query federation as the basis for an open data lakehouse, those explainers go further than I can here.
If you are at a point where Unity Catalog governs your data and you want to query it across multiple engines without giving up your policies, this is for you. Define the policy once, and let it follow your data wherever the query runs.



