
For years, the promise of the Open Data Lakehouse has been undercut by a harsh reality: data ingestion is often the bottleneck, especially for AI workloads. While Apache Iceberg has emerged as the gold standard for table formats, getting data into it remains an engineering nightmare, especially at the breakneck speed of modern streaming and the massive scale of raw batch ingestion.
In this context, data teams are often forced to choose between fast but fragile streaming data stacks or slow but stable batch processing, all while being buried under the operational tax of Iceberg maintenance.
How Starburst helps with data ingestion
The revolution has arrived. At Starburst, we have spent the last year dismantling the barriers to data ingestion and data maintenance to ensure that your data is ready for action in near real time.
To this end, we have moved our core ingestion innovations on Starburst Galaxy from experimental to General Availability (GA), allowing you to deploy production-grade pipelines for:
Streaming Ingest from Kafka Sources
Ingest up to 100 GB/second into managed Iceberg tables with built-in exactly-once delivery and sub-minute latency.
File Ingest
Automatically discover, schematize, and load massive volumes of data from Amazon S3 directly into Iceberg tables without writing custom code or external orchestration.
Serverless Table Maintenance
Automated compaction, snapshot expiration, and orphan file removal are now fully available to keep your tables healthy and query-ready by default.
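For context, these are the same operations that Trino's Iceberg connector exposes as table procedures; on a self-managed cluster you would have to schedule them yourself (the table name and retention windows below are illustrative):

```sql
-- Compact small data files into larger, query-optimized ones
ALTER TABLE sales.orders EXECUTE optimize;

-- Expire snapshots older than the retention threshold
ALTER TABLE sales.orders EXECUTE expire_snapshots(retention_threshold => '7d');

-- Delete files no longer referenced by any table snapshot
ALTER TABLE sales.orders EXECUTE remove_orphan_files(retention_threshold => '7d');
```

With serverless maintenance, this scheduling and tuning happens automatically instead of living in your orchestration layer.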
By unifying these capabilities on a serverless architecture, we have eliminated the persistent overhead of infrastructure management. Starburst scales compute resources instantly to match your fluctuating data volumes, ensuring you only pay for the capacity you actually consume. This approach guarantees that your data lands in Managed Iceberg reliably and cost-effectively, fully optimized for the performance of Trino, the fastest engine for the Iceberg lakehouse.
How data ingestion drives real-world results
This adoption has translated into real-world results, with customers trusting these features for their most critical production workloads. We are seeing Starburst power high-impact operations across industries like cybersecurity, where it handles rapid endpoint detection, and marketing, where it drives real-time campaigns. It is also the engine behind seamless database replication and comprehensive internal analytics, providing a unified source of truth across the enterprise.
The numbers below highlight our 2025 momentum, showcasing the scale and reliability our customers achieved across these diverse sectors this past year.
- >2 trillion total records ingested
- ~10 TB of raw data ingested
- ~90% data compression rate
- 550 TB of Iceberg data compacted

Benchmarking
Beyond our internal success, independent validation has confirmed the strength of our architecture. A recent benchmarking project by Concurrency Labs put our ingestion capabilities to the test, comparing Starburst Galaxy against AWS Data Firehose and Confluent Tableflow for streaming data into Iceberg. Using a standard TPC-DS dataset across both Amazon MSK and Confluent Cloud, the tests evaluated performance and cost under identical conditions.
Starburst Galaxy achieves a 7x higher ingestion rate
The results were definitive. Starburst Galaxy delivered an approximately 7x higher record ingestion rate compared to both AWS Data Firehose and Confluent Tableflow.
The cost efficiencies were equally stark. Concurrency Labs estimated that Starburst ingestion costs were 87% lower than AWS Data Firehose and 81% lower than Confluent Tableflow at the tested throughput. Furthermore, Starburst produced significantly smaller Iceberg tables due to superior data compression.
Independent benchmark validation
This third-party analysis reinforces what our customers see in production. Starburst is not just a faster way to query your data. It is also the most cost-effective way to land it, manage it, and turn that data into insights.

What’s New: Expanding the Ingestion Toolkit
We are thrilled by the positive reception these features have received, and we are continuously improving the product based on your feedback.
To that end, we are excited to announce the Public Preview of Avro support for streaming ingest. As part of this update, we have also launched the Confluent Schema Registry integration, allowing you to seamlessly manage and evolve your schemas while Starburst handles the heavy lifting of deserialization and ingestion into Iceberg.
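For context, messages produced with Confluent's Avro serializer carry a small envelope that a deserializer must unpack before decoding the Avro body: a magic byte followed by a 4-byte, big-endian schema ID used to look up the schema in the registry. A minimal, stdlib-only sketch of parsing that envelope (the function name and sample payload are illustrative):

```python
import struct

def parse_confluent_envelope(message: bytes):
    """Split a Confluent-framed Kafka message into its schema ID and payload.

    Confluent's wire format: 1 magic byte (0x00), a 4-byte big-endian
    schema-registry ID, then the Avro-encoded record body.
    """
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not a Confluent schema-registry framed message")
    (schema_id,) = struct.unpack(">I", message[1:5])
    return schema_id, message[5:]

# Example: a framed message referencing schema ID 42 with a dummy Avro body.
framed = b"\x00" + struct.pack(">I", 42) + b"\x02hi"
schema_id, payload = parse_confluent_envelope(framed)
# schema_id == 42; payload is the Avro-encoded bytes still to be decoded
```

This envelope parsing, the registry lookup, and the Avro decode itself are the heavy lifting that the integration now performs for you on the way into Iceberg.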
On the file ingestion front, we are expanding beyond JSON with Public Preview support for CSV and other common delimited text formats.
By continuing to broaden our format support, we are making it easier than ever to consolidate your data into high-performance Iceberg ingestion pipelines optimized for Trino.
What’s Next: Bringing the Revolution to Starburst Enterprise
We hear you. The feedback from our customers has been overwhelming: you want these same managed ingestion and maintenance capabilities in Starburst Enterprise. We have prioritized this work and are systematically bringing these components to Starburst Enterprise, with the goal of delivering them by the end of the year.
Our mission is to ensure that every Starburst customer, regardless of their deployment model, can leverage the high-velocity Kafka ingestion, automated file loading, and serverless maintenance that currently powers Starburst Galaxy. We are committed to making the Open Data Lakehouse accessible and manageable for everyone. By providing a consistent experience across the entire Starburst ecosystem, we are removing the complexity from the data lifecycle so you can focus on the value of the data itself.
