Starburst Galaxy is a price-performant multi-cloud open data lakehouse powered by Trino, a leading open-source distributed MPP SQL query engine. Starburst Galaxy is used for interactive ad-hoc analytics, long-running workloads like batch and ETL/ELT, and streaming, and it offers high scalability and query completion rates even as data volume (petabytes of data), query volume, and query complexity increase. Starburst runs federated queries across data lakes, cloud data warehouses, on-premise databases, and relational database management systems like PostgreSQL and MySQL. Galaxy also supports enhanced fault-tolerant execution, smart indexing and caching, Data Products, machine learning (PyStarburst and integration with Ibis), and universal search and schema discovery, while truly separating compute and storage between Starburst and your cloud data lake.
Snowflake is a data warehouse in the cloud brought to life in the Snowflake Data Cloud. It is built on top of Amazon Web Services and also runs on Microsoft Azure and Google Cloud. Snowflake offers a fully managed solution with no hardware or software to install, configure, or manage. Similar to Starburst, Snowflake powers multiple data workloads, including data warehousing, data engineering, AI and machine learning, data applications, and cybersecurity, across multiple cloud providers and regions from anywhere in the organization. Snowflake also separates compute and storage, but both are managed, billed, and executed within the Snowflake platform.
“We decided against keeping Snowflake because it [raises] complexity and cost. [Starburst] Trino will be a lasting piece of our tech stack, because it solves the problems associated with a monolithic data warehouse. Now, we can support many emerging projects, which can be onboarded in the data platform without needing much of our attention.”
— Lutz Künneke, Director of Engineering, BestSecret
“We considered Snowflake, but its cost for storing our 8 petabytes of data was impractical. Using a data warehouse would have been redundant and 10X more expensive. Starburst offers superior performance at a fraction of the cost.”
— Richard Teachout, CTO, El Toro
“In 6 months, we’ve successfully migrated more than 50% of our workloads from Snowflake into Starburst. Users are reporting 10X increases in query speed and we’ve saved more than $2M from reduced data warehousing costs. This has a tremendous impact on our bottom line and allows us to invest in other areas of the business.”
– Anonymous, Sr Software Developer Manager, Online Travel Agency
“Migrating to Snowflake would have required 2 FTEs and ongoing maintenance, not to mention the initial time required to stand up the new pipeline. With Starburst, we are able to query the data where it lives and focus on more important projects for our business.”
– Anonymous, Engineering Manager, Leading Food Delivery Company
“The amount of clickstream data we wanted to analyze would have been cost-prohibitive on Snowflake. In a side by side evaluation of Starburst and Snowflake, [we] found that these high complexity queries would have cost 4X as much to run on Snowflake compared to Starburst.”
– Anonymous, Director of Engineering, American Freelancing Platform
“We faced challenges with disparate data systems from acquisitions, hindering customer behavior tracking. Deploying Starburst enabled us to federate data sources, including S3 Data Lake, Snowflake, and Redshift. This centralized access empowered cross-sell and up-sell campaigns.”
– Anonymous, Sr Director Of Engineering, Leading Cloud-Based Financial Operations Company
Don’t take our word for it. Starburst is named #1 for Quality of Support and Ease of Use in G2 Crowd’s Grid Report based on real customer reviews. Additionally, customers said Starburst beat out Snowflake in all of these categories:
Going beyond platform governance and management capabilities, a modern data and analytics platform empowers data teams with easy-to-use functionality that increases productivity without adding complexity. It allows teams to use a range of existing investments in just a few clicks. It helps to break down data silos and build federated data products from distributed data sets to support use cases and scale self-service usage and adoption across the organization.
Starburst Galaxy
Snowflake
Native data security
Automated cluster management
Built-in real-time usage monitoring
Internal data product sharing and marketplace
GenAI text-to-SQL *
Federated Data Products
Automated data maintenance
Easy to get started
Comparison based on publicly available information as of March 18, 2024.
* In preview. Contact us to learn more.
True data access empowers data teams with the ability to use all their data, no matter where it lives, across data lakes, data warehouses, and databases while having confidence in security and governance controls. True access is about meeting business needs on time while adhering to regulatory data sovereignty requirements. Your open data lakehouse should free your data sources for analytics, not confine them in another way.
Starburst Galaxy
Snowflake
Data Observability
RBAC/ABAC
Column/Row Masking
Time-based policies
SOC 2 Type 2 compliance and ISO 27001 certified
Private Link for AWS, Azure, and Google *
Near real-time data ingestion of streaming data *
Cross region analytics without data movement
On-premise data federation
Performant Cross-cloud analytics without data movement
Universal search and schema discovery
Comparison based on publicly available information as of March 18, 2024.
* In preview. Contact us to learn more.
Today’s data teams need to manage performance and costs. Internet scale matters in an internet-powered world but not every workload needs the highest levels of power and performance – especially as costs go up at a faster rate than performance. An open data lakehouse powers data and analytics platforms by putting control in your hands to ensure high-performance scalability is available at the click of a button or automatically when you need it most while optimizing price-to-performance for all analytics workloads.
Starburst Galaxy
Snowflake
Interactive query performance
Consistently execute long-running batch queries
Fault Tolerant Execution
Materialized views
Smart indexing and caching (results and query)
Autoscaling by adding/removing incremental nodes
Customizable scaling for cost and performance optimization
Comparison based on publicly available information as of March 18, 2024.
* In preview. Contact us to learn more.
Open file and table formats are table stakes in providing optionality. An open data lakehouse for your modern data and analytics platform goes beyond the fundamentals to ensure your business has full control over your data by accessing data where it lives, by allowing choice in cloud providers, security, and BI tools. Optionality also means that your SQL scripts are easily transferable to new data architectures and platforms when you need them to move without requiring massive undertakings in conversion and rewriting.
Starburst Galaxy
Snowflake
Runs on multiple clouds
Supports popular open file formats
Supports Python
Supports first and third-party data catalogs
Standard ANSI SQL
OSS MPP SQL query engine
Supports Iceberg, Delta Lake, Hudi, and Hive table formats
Supports Apache Ranger
Comparison based on publicly available information as of March 18, 2024.
* In preview. Contact us to learn more.
Access and analyze your data with the elastic scale and high performance your business demands. Take Starburst Galaxy for a free test drive, watch the on-demand demo (no form fill needed), or contact us.
At its core, Snowflake’s data cloud architecture is a data warehouse that is optimized for a cloud-based architecture. It allows the movement and transformation of data from storage (S3) to its proprietary and closed cloud data warehouse. This provides a holistic data cloud for elastic data management and consumption. Snowflake is not just a storage solution but a tool that provides performance, relational querying, security, and governance for data that resides within the confines of its digital walls. It can be used to consolidate structured and semi-structured data and power transformations, analytics, and reporting using a highly customized SQL dialect.
In addition to handling structured and semi-structured data, Snowflake recently announced support for unstructured data in a data warehouse architecture. By incorporating unstructured data, Snowflake also aims to expand into data science and machine learning use cases within a data warehouse architecture.
Yes, Snowflake is indeed a cloud data platform. It provides a data warehouse as a service designed for the cloud. This platform allows businesses to store and analyze data using cloud-based hardware and software with both storage and compute within the Snowflake account. With Snowflake, businesses can handle several aspects of data storage, including performance, scalability, and security, once the data is loaded into Snowflake’s proprietary and closed storage.
The Snowflake platform is similar to a combination of AWS services but does not provide the full breadth of functionality of all of AWS combined. Consider Snowflake similar to combining Amazon S3, Amazon Redshift, AWS Glue, Amazon EMR, and Amazon Athena, along with several other security, governance, and management services, in a highly closed and proprietary platform.
Snowflake operates a cloud data warehouse architecture. At the heart of this are virtual warehouses, which are essentially clusters of compute resources. In order to use Snowflake, there are multiple checkpoints. It’s typically recommended to first stage your data in a cloud data lake and then ingest it into Snowflake. Once your data is in Snowflake’s proprietary storage, you can transform the data within the platform for your analytical needs.
Within the Snowflake cloud data warehouse, compute resources are separate from storage, meaning you can scale them independently based on your needs. This separation ensures that large queries won’t slow down smaller, more urgent ones and vice versa. It also means that you only pay for the compute resources you use. When queries require more nodes for processing, instead of adding more nodes to the existing cluster, Snowflake opts to add another cluster, doubling the total compute resources whether or not all the resources are needed.
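The difference between scaling by whole clusters and scaling by individual nodes can be illustrated with a small calculation. The sketch below is a toy model only: the function names, the cluster size of 8 nodes, and the demand of 9 nodes are hypothetical values chosen for illustration, and it does not reproduce either vendor's actual autoscaling or billing logic.

```python
import math


def nodes_with_cluster_scaling(required_nodes: int, cluster_size: int) -> int:
    """Nodes provisioned when capacity is added one whole cluster at a time."""
    # Always run at least one cluster; round demand up to a full cluster.
    clusters = max(1, math.ceil(required_nodes / cluster_size))
    return clusters * cluster_size


def nodes_with_incremental_scaling(required_nodes: int, min_nodes: int) -> int:
    """Nodes provisioned when capacity grows one node at a time."""
    # Never drop below the baseline size; otherwise match demand exactly.
    return max(min_nodes, required_nodes)


def idle_fraction(provisioned: int, required: int) -> float:
    """Share of provisioned nodes that the workload does not need."""
    return (provisioned - required) / provisioned


# Hypothetical scenario: a baseline of 8 nodes, and a workload needing 9.
CLUSTER_SIZE = 8
REQUIRED = 9

by_cluster = nodes_with_cluster_scaling(REQUIRED, CLUSTER_SIZE)
by_node = nodes_with_incremental_scaling(REQUIRED, CLUSTER_SIZE)

print(f"whole-cluster scaling: {by_cluster} nodes, "
      f"{idle_fraction(by_cluster, REQUIRED):.0%} idle")
print(f"incremental scaling:   {by_node} nodes, "
      f"{idle_fraction(by_node, REQUIRED):.0%} idle")
```

In this toy scenario, a workload that needs one node more than a single cluster provides ends up with a second full cluster under whole-cluster scaling, while incremental scaling adds only the one extra node.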
Data outside of Snowflake can be queried using external tables, unmanaged Iceberg Tables, and Snowflake’s proprietary managed Iceberg Tables.
Similar to Starburst, the Snowflake architecture is designed to be secure, fast, and easy to use.
The Snowflake data cloud offers its users many benefits that align with the approach taken by Starburst.io. For Snowflake customers, the ability to handle all types of data in one place simplifies data management and allows for more comprehensive analytics, a benefit that is also central to Starburst’s approach.
Metadata handling is another significant advantage of Snowflake, with automatic collection and management of metadata making it easier for users to understand and use their data effectively. Similarly, Starburst also emphasizes the importance of metadata in its approach.
The on-demand nature of Snowflake’s compute resources is a major advantage, allowing users to scale up or down instantly based on their needs. This mirrors Starburst’s emphasis on flexibility and cost-effectiveness.
Finally, Snowflake’s proprietary architecture allows for high levels of concurrency, meaning multiple queries can be run simultaneously without affecting each other’s performance. This is a capability that is also highlighted in Starburst’s approach, with its platform built for analyzing large amounts of distributed data with high concurrency using an open-source foundation.
In summary, both Snowflake and Starburst.io offer powerful solutions for data management with many similar benefits, including handling all types of data, effective metadata management, on-demand resources, and high levels of concurrency.
While Snowflake’s data cloud offers many benefits, it also presents certain challenges for SQL workloads. First, getting started with Snowflake is difficult and excessively time-consuming. It requires that all your data first be ingested into its closed and proprietary storage, which can take years. Furthermore, general Snowflake best practices have you first move your data into a cloud data lake staging environment before it is ingested into Snowflake. This means you’re paying for data movement twice.
For Snowflake customers, one of the key challenges is building effective data pipelines. The scalability of Snowflake can lead to the ingestion of excessive amounts of data, which can increase storage costs and potentially degrade the quality of the data.
Another challenge is related to workload management. Snowflake’s multi-cluster architecture allows for high levels of concurrency, but it can also lead to increased complexity in managing and optimizing workloads. For instance, joining large tables of data directly from raw to presentation layers can cause workloads to run for hours and add significant costs to the warehouse.
Next, while Snowflake provides a lot of automation, it still requires some manual intervention. For example, decisions around whether to use Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) need to be made. Additionally, while Snowflake automates many aspects of data management, it still requires DBA effort.
Snowflake customers of all sizes also regularly cite costs quickly exceeding forecasts as data volumes increase, premium features are activated, and inefficient performance practices take effect, e.g., doubling clusters for autoscaling rather than adding only the necessary number of worker nodes to existing clusters.
Lastly, Snowflake is a highly closed and proprietary platform that is not only difficult to get all your data into, but once in the platform, the proprietary storage and custom SQL make it extremely difficult and costly to offload workloads from Snowflake as data and query volumes grow.
Yes, Snowflake is ANSI SQL compliant. This means that all of the most common operations are usable within Snowflake. Snowflake also supports the operations that enable data warehousing, like create, update, and insert. In addition, Snowflake supports a subset of ANSI SQL:1999 and the SQL:2003 analytic extensions. However, it’s important to note that although Snowflake is ANSI SQL compliant, it operates its own highly proprietary version of SQL built on top of ANSI standards with 100+ custom functions, making scripts difficult to migrate out of Snowflake as volumes increase.
Data sharing is limited to within the Snowflake ecosystem, which allows you to share selected objects in a database in your account with other Snowflake accounts. Here’s how you can do it:
In addition to the above, Snowflake offers other options for data sharing:
These features enable an ecosystem of data sharing and collaboration exclusively within Snowflake. At a high platform cost, they allow for the creation of data applications that leverage shared data for reporting.
Snowflake cloud data warehouse consists of three core layers:
Beyond the three core components, there is also a built-in visualization component and data marketplace.
© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC