Error opening Hive split

We are consuming data in Power BI from Starburst and noticed the following error:
Encountered user gateway exception ODBC: Error [HY000] [Starburst] [Trino] (1060) Trino query error: Error opening Hive split…

When you track it down into the Query Insights UI, can you find the query that was submitted from Power BI? It would be a good thing to try to run it w/o ODBC in the mix first as the error sure sounds like something that would happen on the same SQL regardless of who submitted it.

It would probably be easier to investigate that way.

1 Like

I did verify that HY000 is ODBC-speak for “General Error” so let’s focus on what SQL was sent into Starburst and see what’s up.

1 Like

Handling Partition Metadata Changes in Starburst for ADLS Data

When working with Starburst on top of ADLS or Hive-based tables, you might encounter errors if the underlying data files have changed like when files are deleted, updated, or moved. This happens because Starburst relies on Hive Metastore metadata for table partitions. If the files on storage no longer match what the metastore expects, queries can fail.

Best practices:

  • For tables that change frequently, consider running this periodically.
  • For very large tables, refreshing all partitions may take time. You can also add partitions incrementally if you know the exact paths.
  • Ensure that the Starburst user has access to the storage location so that changes can be detected.

Using system.sync_partition_metadata Procedure (recommended)

Starburst also offers a flexible procedure to manage partition metadata more precisely:
CALL <catalog_name>.system.sync_partition_metadata(
schema_name => ‘<schema_name>’,
table_name => ‘<table_name>’,
mode => ‘FULL’,
case_sensitive => true
);

Parameters explained in docs

  • catalog_name, schema_name, table_name: Specify your catalog, schema, and table.
  • mode:
    • ADD → Add partitions that exist in storage but not in the metastore.
    • DROP → Remove partitions that exist in the metastore but no longer on storage.
    • FULL → Performs both add and drop operations.
  • case_sensitive: Use true to match Hive’s lowercase convention for partition folder names.

you can find more details about call procedures here: https://trino.io/docs/current/connector/hive.html”

This approach ensures that your table metadata always reflects the actual data on ADLS, preventing errors and keeping queries consistent.

Refreshing Partition Metadata

To sync the metadata with the current state of the data, you can use:
MSCK REPAIR TABLE <table_name>; ( Is this is recommended to use on a big table… it is a expensive operation @lester )
How it works:

  • The command scans the table’s storage location (for example, in ADLS) and identifies any partitions that are missing or new.
  • It then updates the Hive Metastore, so Starburst has an accurate view of all partitions.

In order to solve the issue we have handled the partition metadata changes and doing sync of the metadata.

1 Like