This post is part of the Iceberg blog series. Read the entire series:
- Introduction to Apache Iceberg in Trino
- Iceberg Partitioning and Performance Optimizations in Trino
- Apache Iceberg DML (update/delete/merge) & Maintenance in Trino
- Apache Iceberg Schema Evolution in Trino
- Apache Iceberg Time Travel & Rollbacks in Trino
- Automated maintenance for Apache Iceberg tables in Starburst Galaxy
- Improving performance with Iceberg sorted tables
- How to migrate your Hive tables to Apache Iceberg
Time travel in Trino using Iceberg is a handy feature to “look back in time” at a table’s history. As we covered in this blog, each change to an Iceberg table creates a new “snapshot” which can be referred to by using standard sql.
As you can see from the diagram below, a new snapshot is created for the table creation, insert and update.
To see the snapshots on a table, you can use the handy metadata table that exists for each table:
SELECT * FROM "customer_iceberg$snapshots";
The above snapshot table shows the create, insert and update operations on the customer_iceberg table. You can see what type of operation was performed and when it was executed.
To select a certain snapshot, you use the “for version as of” syntax. In the following two examples, we show the customer name before and after an update:
FROM customer_iceberg FOR VERSION AS OF 5043425904354141100 where custkey = 2732;
FROM customer_iceberg FOR VERSION AS OF 3117754680069542695 where custkey = 2732;
You can also specify a timeframe to retrieve an older snapshot of a table. For example, the following query brings back the data for the first snapshot on or before a given timestamp:
FROM s3lakehouse.demo_tpch.customer_iceberg FOR TIMESTAMP AS OF TIMESTAMP '2022-09-18 07:18:09.002 America/New_York' where custkey = 2732;
Another great feature of Iceberg is the ability to roll back a table to a previous snapshot. Sometimes this is used when a row was accidentally deleted or updated. As long as the snapshot exists, (it hasn’t been cleaned up yet) then you can roll back to any existing snapshot.
For example, in the scenario above, if I wanted to roll back to the state of the table before the update on the customer, then I would issue the following command:
CALL iceberg.system.rollback_to_snapshot('demo_tpch', 'customer_iceberg', 5043425904354141100)
Then we can query the tablea again to see the customer’s name was “rolled back” to the previous version before the update:
FROM s3lakehouse.demo_tpch.customer_iceberg where custkey = 2732;
Both time travel and rolling back are database functions that are now available in your data lakehouse. This is a game changer as it allows database type functionality to objects stores which were not available until now.