
BestSecret’s data journey: Moving beyond Snowflake

Jay Chen
Vice President, Product Marketing
Starburst
Charles Frisbie
SA Manager
Starburst
In a new report, Cloud Data Warehouse vs. Cloud Data Lakehouse: A Snowflake vs. Starburst TCO and Performance Comparison, the technology research firm GigaOm compares the cost, time, and effort required to adopt a Snowflake cloud data warehouse versus a data lakehouse powered by Starburst. The report concludes that a Starburst lakehouse architecture can achieve superior price-performance and significantly faster time-to-insight at a much lower total cost of ownership (TCO).
In this blog, we take a deeper look at these results and explain how Starburst's approach can help data teams achieve more at a lower total cost. First, some background on how the test was designed.
The test was designed to simulate an actual migration project that an enterprise might take. The migration process was broken down into four distinct steps: Planning, Migration, Path-to-Production, and Post-Migration. These processes were further broken down into individual tasks, then measured and documented for each scenario in the comparison report.
The hypothetical enterprise performing the migration has a few different data sources. A traditional Oracle data warehouse represents a legacy on-premises OLAP system. Data is piped into Oracle from upstream transactional systems that represent traditional channels like brick-and-mortar sales. In addition to legacy OLAP, a couple of cloud data sources were also included: a cloud Postgres database represents OLTP data from e-commerce sales, and JSON data stored in S3 represents weblog data that can be used to analyze customer lifecycles.
For the Snowflake test scenario, per Snowflake requirements, all data must first be moved to the cloud and into Snowflake before any queries can be run. After the initial lift-and-shift migration, ongoing ETL pipelines must be maintained to continually ingest data into Snowflake for analytical use.
For the Starburst test scenario, a few different options were considered. Because Starburst can access data at the source, the report considered various combinations of lift-and-shift migration and ongoing federation. The lowest TCO option was a migration of cloud sources to an Iceberg data lake with ongoing federation to the on-prem data source.
Despite significant interest in data lakes and lakehouses within data engineering communities in recent years, as well as the rise of open source data lake formats, such as Apache Iceberg, Delta Lake, and Apache Hudi, many data engineers and architects still express hesitancy around the performance and manageability of data lakes today. Some have the battle scars from their experiences during the Hadoop era to justify it. Legacy data lakes were seen as slow, cumbersome, and difficult to manage.
The GigaOm report shows that a modern data lake powered by Starburst is just as performant and cost-effective as a cloud data warehouse for standard BI queries, as measured by the industry standard TPC-DS benchmark.
To achieve similar query performance on the benchmark across both systems, the following setup was used:
| | Snowflake | Starburst |
|---|---|---|
| Cloud | AWS | AWS |
| Storage | Native tables | S3 Iceberg |
| Performance features | Clustering | Cache Service, Warp Speed |
| Pricing tier | Enterprise | EC2 3-year Reserved Instances |
| Cluster size / instance type | Large | 10 nodes of m6gd.8xlarge (320 vCPU) |
Using list prices for both Starburst software and AWS infrastructure, the two tests returned nearly identical query response times at similar cost:
| TPC-DS | Criteria | Snowflake | Starburst |
|---|---|---|---|
| Single-user stream execution time | < 900 sec. | 707 sec. | 751 sec. |
| Geometric mean | < 5 sec. | 3.81 sec. | 4.23 sec. |
| Single-user price-per-performance ($/hr × execution time ÷ 3,600 sec/hr) | – | $4.71 | $4.88 |
| 20-user execution time | – | 9,366 sec. | 9,686 sec. |
| 20-user price-per-performance ($/hr × execution time ÷ 3,600 sec/hr) | – | $62.44 | $62.94 |
Based on this test, the report concludes that price-performance on traditional warehouse workloads for BI (as represented by TPC-DS) is essentially identical between Starburst and Snowflake.
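The price-per-performance metric is simply the hourly cluster cost multiplied by the execution time expressed in hours, i.e., the dollar cost of one benchmark run. The sketch below assumes an hourly rate of about $24/hr for a Snowflake Large warehouse on Enterprise pricing (8 credits/hr at $3/credit — an assumption based on Snowflake's published sizing and list pricing, not a figure stated in the report); the reported $4.71 single-user result is consistent with that rate.

```python
def price_per_performance(dollars_per_hour: float, execution_time_sec: float) -> float:
    """Cost of one benchmark run: hourly cluster rate x run time in hours."""
    return dollars_per_hour * execution_time_sec / 3600.0

# Assumed rate for a Snowflake Large warehouse on Enterprise pricing:
# 8 credits/hr x $3/credit = $24/hr (assumption, not from the report).
snowflake_rate = 24.0

# Single-user TPC-DS stream from the report ran in 707 seconds.
cost = price_per_performance(snowflake_rate, 707)
print(round(cost, 2))  # matches the reported $4.71
```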
Beyond BI workloads, the GigaOm report goes on to test more specialized queries involving semi-structured data. These tests were designed to be representative of typical complex workloads, such as customer analytics, log analytics, clickstream analytics, and security analytics, that often need to incorporate data outside of a data warehouse.
For testing purposes, GigaOm simulated website log data in JSON format. For the comparison, the web data was loaded into Snowflake as a VARIANT column and also into an AWS S3 bucket for Starburst to query. The web data could then be analyzed on its own for user traffic patterns such as page views, and could be further joined with the main datasets to represent a customer 360 or conversion analysis. Several of the benchmark queries were tailored to join and analyze the JSON web data together with one or both of the other sources.
Here, the ability to analyze semi-structured data without having to build and maintain pipelines also adds significant value to the Starburst option.
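Conceptually, a customer 360 or conversion analysis pairs semi-structured weblog events with structured sales records on a shared customer key. The Python sketch below illustrates that join logic with hypothetical field names (the report's actual JSON schema is not published in this excerpt); in practice, both Snowflake and Starburst express this as a SQL join over the JSON data.

```python
import json
from collections import defaultdict

# Hypothetical JSON-lines weblog events (field names assumed for illustration).
weblog_jsonl = """
{"customer_id": 1, "event": "page_view", "page": "/shoes"}
{"customer_id": 1, "event": "page_view", "page": "/cart"}
{"customer_id": 2, "event": "page_view", "page": "/shoes"}
""".strip()

# Hypothetical order rows from the warehouse-side sales dataset.
orders = [{"customer_id": 1, "total": 120.0}]

# Aggregate page views per customer from the semi-structured source.
views = defaultdict(int)
for line in weblog_jsonl.splitlines():
    views[json.loads(line)["customer_id"]] += 1

# Join views with orders: did each browsing customer convert to a purchase?
conversion = {
    cid: {"views": n, "purchased": any(o["customer_id"] == cid for o in orders)}
    for cid, n in views.items()
}
print(conversion)
```

The same shape of analysis — aggregate the clickstream, then join on the customer key — is what the benchmark's tailored queries perform at terabyte scale.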
| TPC-DS 1 TB + JSON 1 TB | Criteria | Snowflake | Starburst |
|---|---|---|---|
| Single-user stream execution time | < 900 sec. | 3,686 sec. | 814 sec. |
| Geometric mean | < 5 sec. | 4.97 sec. | 4.63 sec. |
| Single-user price-per-performance ($/hr × execution time ÷ 3,600 sec/hr) | – | $24.57 | $5.29 |
In the JSON web data test, Starburst outperformed Snowflake in query speed by more than 4x and in price-performance by nearly 5x.
With any data migration, project effort and business disruption are often two major concerns.
The sheer effort required to move vast amounts of data can be overwhelming. This includes ensuring data integrity, managing dependencies between applications, and addressing potential compatibility issues. Additionally, businesses must ensure that their teams are adequately trained to handle new cloud technologies, which can be a steep learning curve for some.
Downtime during the migration (which can often take multiple years) can disrupt business operations, making it crucial to plan major migrations in phases. Moreover, unforeseen challenges, such as data discrepancies or integration hiccups, can extend the migration timeline and further disrupt the business.
The GigaOm field test revealed that a Starburst lakehouse architecture requires between 47% and 67% less migration effort compared to migrating data into Snowflake.
In particular, the report found that Starburst’s data federation capability made a significant impact by accelerating some integration tasks and eliminating others altogether.
For a typical data migration project, Starburst reduces overall migration time by up to 67%, which directly translates to a reduction in time-to-insight, allowing businesses to make faster, data-driven decisions.
Assuming performance and architectural requirements are met by either Starburst or Snowflake, the total cost of ownership can be the deciding factor when choosing a long-term technology investment. The GigaOm report summarizes its findings into a side-by-side TCO comparison.
In general, the GigaOm report further solidifies Starburst's position as a leading data lake analytics platform, delivering similar or better query performance at a fraction of the cost of traditional data centralization architectures. This is because traditional data warehousing models require adherence to a rigid data architecture, with significant data movement and duplication that leads to data lock-in and unpredictably high costs.
Starburst’s Modern Data Lake approach delivers warehouse-like query performance and capability directly on the data lake.