eBook

Optimizing Your Apache Iceberg Lakehouse - Early Release Chapters

Improving Performance & Scalability

Chapter 2: Exploring the Apache Iceberg Architecture

A deep dive into the architecture that makes Iceberg unique. This chapter explores the reference architecture in depth and focuses on the following topics:

Metadata and Iceberg – Why metadata is key. The relationship between catalogs and metadata. Snapshots & manifests. Metadata pruning. Inspecting metadata.
Table Maintenance – What is table maintenance? Why is it needed? Why is it not always easy? Types of table maintenance.
Compute Engine Interoperability – Understanding this feature. Interoperability challenges. Multi-engine use cases.

Chapter 3: Primary Features of Iceberg

This chapter details the primary features of Iceberg and shows how Iceberg tables can be structured and modified. In particular, it walks through the snapshot process and shows how to reference specific points in time. Topics covered:

Mutability Options – Append-only tables. ACID-compliant transactional tables. Copy-on-Write vs Merge-on-Read.
Schema Evolution – Supported changes. Operationally fast. Enforces correctness.
Hidden Partitioning – Benefits of partitions. Transform functions. Partition evolution.
Snapshot Features – Time travel. Rollbacks. Tagging. Branching.

If you manage an Iceberg data lakehouse at scale and want to understand what is happening below the surface, as well as how to use all the features and benefits this modern table format offers, these chapters lay the foundation on which query optimization, partitioning strategy, and data governance depend.

