
For years, the promise of the Open Data Lakehouse has been undercut by a harsh reality: data ingestion is often the bottleneck, especially for AI workloads. While Apache Iceberg has emerged as the gold standard for table formats, getting data into it remains an engineering nightmare, especially at the breakneck speed of modern streaming and the massive scale of raw batch ingestion.
In this context, data teams are often forced to choose between fast but fragile streaming data stacks or slow but stable batch processing, all while being buried under the operational tax of Iceberg maintenance.
How Starburst helps with data ingestion
The revolution has arrived. At Starburst, we have spent the last year dismantling the barriers to data ingestion and data maintenance to ensure that your data is ready for action in near real time.
To this end, we have moved our core ingestion innovations on Starburst Galaxy from experimental to General Availability (GA), allowing you to deploy production-grade pipelines for:
Streaming Ingest from Kafka Sources
Ingest up to 100 GB/second into managed Iceberg tables with built-in exactly-once delivery and sub-minute latency.
File Ingest
Automatically discover, schematize, and load massive volumes of data from Amazon S3 directly into Iceberg tables without writing custom code or external orchestration.
Serverless Table Maintenance
Automated compaction, snapshot expiration, and orphan file removal are now fully available to keep your tables healthy and query-ready by default.
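For context, these are the same operations that Trino's Iceberg connector exposes as table procedures; on a self-managed cluster you would have to schedule them yourself (the table name and retention windows below are illustrative):

```sql
-- Compact small data files into larger, query-optimized ones
ALTER TABLE sales.orders EXECUTE optimize;

-- Expire snapshots older than the retention threshold
ALTER TABLE sales.orders EXECUTE expire_snapshots(retention_threshold => '7d');

-- Delete files no longer referenced by any table snapshot
ALTER TABLE sales.orders EXECUTE remove_orphan_files(retention_threshold => '7d');
```

With serverless maintenance, this scheduling and tuning happens automatically instead of living in your orchestration layer.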
By unifying these capabilities on a serverless architecture, we have eliminated the persistent overhead of infrastructure management. Starburst scales compute resources instantly to match your fluctuating data volumes, ensuring you only pay for the capacity you actually consume. This approach guarantees that your data lands in Managed Iceberg reliably and cost-effectively, fully optimized for the performance of Trino, the fastest engine for the Iceberg lakehouse.
How data ingestion drives real-world results
This adoption has translated into real-world results, with customers trusting these features for their most critical production workloads. We are seeing Starburst power high-impact operations across industries like cybersecurity, where it handles rapid endpoint detection, and marketing, where it drives real-time campaigns. It is also the engine behind seamless database replication and comprehensive internal analytics, providing a unified source of truth across the enterprise.
The numbers below highlight our 2025 momentum, showcasing the scale and reliability our customers achieved across these diverse sectors this past year.
- >2 trillion total records ingested
- ~10 TB of raw data ingested
- ~90% data compression rate
- 550 TB of Iceberg data compacted

Benchmarking
Beyond our internal success, independent validation has confirmed the strength of our architecture. A recent benchmarking project by Concurrency Labs put our ingestion capabilities to the test, comparing Starburst Galaxy against AWS Data Firehose and Confluent Tableflow for streaming data into Iceberg. Using a standard TPC-DS dataset across both Amazon MSK and Confluent Cloud, the tests evaluated performance and cost under identical conditions.
Starburst Galaxy achieves a 7x higher ingestion rate
The results were definitive. Starburst Galaxy delivered an approximately 7x higher record ingestion rate compared to both AWS Data Firehose and Confluent Tableflow.
The cost efficiencies were equally stark. Concurrency Labs estimated that Starburst ingestion costs were 87% lower than AWS Data Firehose and 81% lower than Confluent Tableflow at the tested throughput. Furthermore, Starburst produced significantly smaller Iceberg tables due to superior data compression.
Independent benchmark validation
This third-party analysis reinforces what our customers see in production. Starburst is not just a faster way to query your data. It is also the most cost-effective way to land it, manage it, and turn that data into insights.

What’s New: Expanding the Ingestion Toolkit
We are thrilled by the positive reception these features have received, and we are continuously improving the product based on your feedback.
To that end, we are excited to announce the Public Preview of Avro support for streaming ingest. As part of this update, we have also launched the Confluent Schema Registry integration, allowing you to seamlessly manage and evolve your schemas while Starburst handles the heavy lifting of deserialization and ingestion into Iceberg.
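For context, messages produced with Confluent's Avro serializer carry a small envelope that a deserializer must unpack before decoding the Avro body: a magic byte followed by a 4-byte, big-endian schema ID used to look up the schema in the registry. A minimal, stdlib-only sketch of parsing that envelope (the function name and sample payload are illustrative):

```python
import struct

def parse_confluent_envelope(message: bytes):
    """Split a Confluent-framed Kafka message into its schema ID and payload.

    Confluent's wire format: 1 magic byte (0x00), a 4-byte big-endian
    schema-registry ID, then the Avro-encoded record body.
    """
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not a Confluent schema-registry framed message")
    (schema_id,) = struct.unpack(">I", message[1:5])
    return schema_id, message[5:]

# Example: a framed message referencing schema ID 42 with a dummy Avro body.
framed = b"\x00" + struct.pack(">I", 42) + b"\x02hi"
schema_id, payload = parse_confluent_envelope(framed)
# schema_id == 42; payload is the Avro-encoded bytes still to be decoded
```

This envelope parsing, the registry lookup, and the Avro decode itself are the heavy lifting that the integration now performs for you on the way into Iceberg.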
On the file ingestion front, we are expanding beyond JSON with Public Preview support for CSV and other common delimited text formats.
By continuing to broaden our format support, we are making it easier than ever to consolidate your data into high-performance Iceberg ingestion pipelines optimized for Trino.
What’s Next: Bringing the Revolution to Starburst Enterprise
We hear you. The feedback from our customers has been overwhelming: you want these same managed ingestion and maintenance capabilities in Starburst Enterprise. We have prioritized this work and are systematically bringing these components to Starburst Enterprise, with the goal of delivering them by the end of the year.
Our mission is to ensure that every Starburst customer, regardless of their deployment model, can leverage the high-velocity Kafka ingestion, automated file loading, and serverless maintenance that currently powers Starburst Galaxy. We are committed to making the Open Data Lakehouse accessible and manageable for everyone. By providing a consistent experience across the entire Starburst ecosystem, we are removing the complexity from the data lifecycle so you can focus on the value of the data itself.
