Hi,
What we have done:
- created a new metastore using a new postgres database
- copied all objects from one directory in a bucket to a new bucket
- created a new catalog pointing to the new bucket
Discovery failed on “Procedure not registered: system.register_table”
The new bucket URL is s3://new-bucket.
From the command "CALL hive_new_metastore_catalog.system.register_table(schema_name => 'discovered_schema', table_name => 'c434af5c59e4542345234068750', table_location => 's3://old_bucket/data_catalog/granite/ise/oct_vlanpool_b-3c434af5c59e45f1a4ed955473068750')" we noticed that table_location still points to the old bucket.
Looks like we need to update table_name and table_location somewhere to fix this.
Can somebody help?
Thank you
From the hive_new_metastore_catalog name, I'm making the assumption that you are setting up the Hive connector, not the Iceberg connector (especially with the long _abc123XYZ!@# suffix on the oct_vlanpool table name suggesting we are talking about Iceberg tables).
If that's true (i.e. using the Hive, not Iceberg, connector) then the error message is telling you the register_table procedure is NOT available in the Hive connector. It is available in the Iceberg connector, as documented at Iceberg connector — Trino 475 Documentation.
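For reference, a minimal sketch of what that call looks like against an Iceberg catalog (the catalog, schema, table, and location names below are placeholders, not your actual values):

    CALL iceberg_catalog.system.register_table(
        schema_name => 'discovered_schema',
        table_name => 'my_table',
        table_location => 's3://new-bucket/path/to/table');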
Could that be the problem?
To add to what lester is saying, it's important to know whether the hive_new_metastore_catalog is using connector.name=hive or connector.name=iceberg.
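In other words, check the catalog's properties file. A rough sketch of the two possibilities (the property values here are examples, not your actual settings):

    # etc/catalog/hive_new_metastore_catalog.properties -- Hive connector
    connector.name=hive
    hive.metastore.uri=thrift://example-metastore:9083

    # ...or the Iceberg connector, which does expose system.register_table
    connector.name=iceberg
    iceberg.catalog.type=hive_metastore
    hive.metastore.uri=thrift://example-metastore:9083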
Is it possible that there are both Hive table objects and Iceberg table objects in the S3 bucket location that is being searched? That could also explain this kind of situation. If you don’t intend to migrate the Iceberg tables, it’s best to run multiple discoveries, setting the URI to include the Hive table objects but avoid the Iceberg ones.
If you do want to migrate the Iceberg table objects into the new bucket, you'll need to do that differently. Iceberg metadata files contain absolute paths that explicitly point to the original location/bucket, which is likely why you are seeing the old location in table_location.
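One way to confirm this (a hedged example; the catalog, schema, and table names are placeholders, and it assumes the table is still registered somewhere, e.g. in your old catalog) is to query the Iceberg $files metadata table in Trino and look at the absolute paths in file_path:

    SELECT file_path
    FROM old_iceberg_catalog.my_schema."my_table$files"
    LIMIT 10;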
Hello lester and joechu,
Thank you so much for your help. You are right; the correct setting should be connector.name=iceberg.
We updated all the .json files and made the table location point to the new bucket.
We now get different errors, like:
hive1.helios-dev.svc.cluster.local:9083: java.net.SocketTimeoutException: Read timed out.
e5e6543bc65c4c69b98d25b522d1a6c6 is not a valid object name.
We want to keep those .json files to save some time when we do discovery against a large amount of data.
Now we are not sure whether this migration process works or not.
Has anyone done this before?
Thank you
Starburst/Trino do not yet have a tool to rewrite the file locations, but you can tackle this with Spark using the rewrite_table_path procedure, as documented at Procedures - Apache Iceberg™.
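A rough sketch of what that could look like in Spark SQL, going by the linked docs (the catalog, table, and prefixes below are assumptions, not your exact values; the procedure rewrites the metadata to use the new prefix and reports a file list you then copy over):

    CALL spark_catalog.system.rewrite_table_path(
        table => 'discovered_schema.oct_vlanpool',
        source_prefix => 's3://old_bucket/data_catalog',
        target_prefix => 's3://new-bucket/data_catalog');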
As an FYI… in the future, the v4 spec might take care of this problem automatically by supporting relative paths, as identified in Relative Path Support In Table Spec · Issue #13141 · apache/iceberg · GitHub (note: v3 was just finalized, so we are talking about the future).
Thank you so much for your help.