This post is part of the Iceberg blog series. Read the entire series:
- Introduction to Apache Iceberg in Trino
- Iceberg Partitioning and Performance Optimizations in Trino
- Apache Iceberg DML (update/delete/merge) & Maintenance in Trino
- Apache Iceberg Schema Evolution in Trino
- Apache Iceberg Time Travel & Rollbacks in Trino
- Automated maintenance for Apache Iceberg tables in Starburst Galaxy
- Improving performance with Iceberg sorted tables
- Hive vs. Iceberg: Choosing the best table format for your analytics workload
Iceberg Time Travel
Time travel in Trino using Iceberg is a handy feature to “look back in time” at a table’s history. As we covered in this blog, each change to an Iceberg table creates a new “snapshot” which can be referred to by using standard sql.
As you can see from the diagram below, a new snapshot is created for the table creation, insert and update.
To see the snapshots on a table, you can use the handy metadata table that exists for each table:
SELECT * FROM "customer_iceberg$snapshots";
The above snapshot table shows the create, insert and update operations on the customer_iceberg table. You can see what type of operation was performed and when it was executed.
To select a certain snapshot, you use the “for version as of” syntax. In the following two examples, we show the customer name before and after an update:
SELECT custkey,name
FROM customer_iceberg FOR VERSION AS OF 5043425904354141100 where custkey = 2732;
SELECT custkey,name
FROM customer_iceberg FOR VERSION AS OF 3117754680069542695 where custkey = 2732;
You can also specify a timeframe to retrieve an older snapshot of a table. For example, the following query brings back the data for the first snapshot on or before a given timestamp:
SELECT custkey,name
FROM s3lakehouse.demo_tpch.customer_iceberg FOR TIMESTAMP AS OF TIMESTAMP '2022-09-18 07:18:09.002 America/New_York' where custkey = 2732;
Rolling Back
Another great feature of Iceberg is the ability to roll back a table to a previous snapshot. Sometimes this is used when a row was accidentally deleted or updated. As long as the snapshot exists, (it hasn’t been cleaned up yet) then you can roll back to any existing snapshot.
For example, in the scenario above, if I wanted to roll back to the state of the table before the update on the customer, then I would issue the following command:
CALL iceberg.system.rollback_to_snapshot('demo_tpch', 'customer_iceberg', 5043425904354141100)
Then we can query the tablea again to see the customer’s name was “rolled back” to the previous version before the update:
SELECT custkey,name
FROM s3lakehouse.demo_tpch.customer_iceberg where custkey = 2732;
Both time travel and rolling back are database functions that are now available in your data lakehouse. This is a game changer as it allows database type functionality to objects stores which were not available until now.
What are some next steps you can take?
Below are three ways you can continue your journey to accelerate data access at your company
- 1
- 2
Automate the Icehouse: Our fully-managed open lakehouse platform
- 3
Follow us on YouTube, LinkedIn, and X(Twitter).