×
×

ACID Transactions

ACID transactions are methods for ensuring database integrity.

Developed for financial systems and similar use cases where data accuracy and consistency are mission-critical, ACID-compliant transactions prevent accidental data loss or corruption. Database systems use ACID transactions to ensure data integrity. Increasingly, data lakehouses are bringing ACID compliance to the table, making them useful for applications beyond analytics.

Iceberg & Trino

Icehouse Resource Center

The Icehouse is an open lakehouse with Trino as the open query engine and Apache Iceberg as the open table format.

This introduction to ACID transactions explains how ACID properties protect data integrity and how ACID compliance has become a feature of modern data lakehouse platforms.

ACID properties

The acronym ACID stands for Atomicity, Consistency, Isolation, and Durability. These four ACID properties define how database transactions should behave to ensure data remains in a consistent state even in the event of a system failure.

Atomicity

Atomicity treats an entire transaction as a single unit of work. Database transactions typically involve multiple database operations. If any of them fails, partly or completely, then the transaction’s final result may be incorrect. 

Atomic transactions require the successful completion of each operational step. If part of the transaction fails, the entire transaction fails and must run again.

For example, transferring funds from one bank account to another involves subtracting the transfer amount from the first and adding it to the second. If either step fails, then at least one account balance will be wrong. An atomic transaction facing a similar operational failure would not go through at all.

Consistency

Consistency is the enforcement of business rules and data integrity constraints that govern how transactions change a database’s state. The database would become unreliable if transactions were allowed to violate these restrictions. There would be no way to tell whether any two data values were comparable.

For example, a withdrawal amount exceeding an account balance would violate a bank’s rule banning overdrafts. Consistency prevents the transaction from completing, rolling the system back to its previous state.

Isolation

Concurrent transactions, those interacting with the same data at the same time, could undermine data integrity. The ANSI/ISO SQL standard describes three situations that occur depending on the timing of two transactions’ operations:

Dirty reads: Transaction A has updated a row but has not yet committed the change when transaction B retrieves the row.

Non-repeatable reads: Transaction B retrieves a row, transaction A commits updates to the row, and transaction B retrieves the row again.

Phantom reads: Transaction B retrieves a set of rows, transaction A inserts or removes rows from that set, and transaction B retrieves the set of rows again.

ACID-compliant systems use concurrency controls to isolate transactions from each other. Lock-based controls force a new transaction to wait for the current transaction to complete. Multiversion controls use snapshot isolation, letting the new transaction act on the current state while the other transaction works. The control commits the new transaction’s changes provided there’s no conflict; otherwise, it rejects the transaction.

Database systems offer different isolation levels that balance isolation and performance. These levels dictate how read and write operations occur and whether to allow phantom, non-repeatable, or dirty reads.

Durability

Durability simply means that once a transaction commits its changes, those changes become part of the database’s permanent record, even in the event of a power outage or other system failures. Database systems usually achieve durability by moving in-memory data to non-volatile storage.

Analytical databases

ACID compliance is a feature of online transactional processing (OLTP) systems as well as database management systems (DBMS) like MongoDB, MySQL, or Oracle Database. However, these systems store data in row-based schemas that are not conducive to query efficiency. This performance issue was one of the drivers of analytical databases like data warehouses. By using columnar schemas, these systems could read data quickly.

Since data warehouses evolved from on-premises databases, they inherited ACID properties. Even with new, cloud-based methods for managing enterprise data, data warehouse vendors could promote the advantages of ACID compliance.

Data lake vs data lakehouse

Data warehouses have not kept up with the expanding diversity and volumes of data enterprises need to power increasingly sophisticated analytics. Open-source technologies and commodity-priced cloud storage services like AWS let data lakes provide cost-effective alternatives to proprietary warehouses.

Still, data lakes by themselves cannot match a warehouse’s data management and analytics capabilities, leading to complex and expensive workarounds that often involve layering one or more data warehouses between the data lake and users.

Data lakehouses solve this problem by combining the full capabilities of a data warehouse with a data lake’s cost-effective scalability. 

Related reading: Data warehouse architecture vs lake architecture vs lakehouse architecture

Open table formats: Why are ACID transactions a good thing to have?

Open source file formats like Apache Parquet and table formats like Apache Iceberg let companies build performant, scalable data lakehouse analytics platforms with data portability not found in proprietary warehouse solutions.

Modern open table formats support ACID properties, allowing query engines to add, remove, or change tables without compromising data integrity.

Transactional support and ACID compliance

ACID compliance lets a data lakehouse support use cases beyond pure analytics. The lakehouse won’t replace OLTP systems. However, they can replace database systems used in decision-making processes subject to audits where data consistency, accuracy, and integrity are essential.

Icehouse architecture 

The Icehouse architecture is one reason open data lakehouses are on the upswing. It combines two open technologies, Iceberg and Trino, to create ACID-compliant data lakehouses that go beyond analytics to support mission-critical data management and decision-making processes.

Trino

Trino is a massively parallel processing query engine that can access petabyte-scale datasets from distributed sources. Originally designed for petabyte-scale interactive data analytics, Trino’s support for ANSI-standard SQL and dozens of connectors to enterprise data sources have made the query engine an essential data management resource.

Iceberg 

Apache Iceberg’s metadata layer turns a data lake into an ACID-compliant data lakehouse. For example, Iceberg creates snapshots of data tables that isolate concurrent users. This open table format also provides schema evolution, time travel, and performance features usually associated with a traditional DBMS.

Iceberg & Trino

Icehouse Resource Center

The Icehouse is an open lakehouse with Trino as the open query engine and Apache Iceberg as the open table format.

Start Free with
Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise

Please fill in all required fields and ensure you are using a valid email address.