
A complete comparison of Starburst and Snowflake

Discover how Starburst and Snowflake compare across platform access, scalability, simplicity, and optionality, including real customer reviews and G2 Crowd ratings.

Snowflake is too expensive for all data. With Starburst, get up to a 55% improved TCO and 90% faster time to insight without the Snowflake lock-in.

What is Starburst Galaxy?

Starburst Galaxy is a price-performant multi-cloud data lake analytics platform powered by Trino, a leading open-source distributed MPP SQL query engine. Starburst Galaxy is used for both interactive ad-hoc analytics and long-running workloads like batch and ETL/ELT, and offers high scalability and query completion rates even as data volume (terabytes of data), query volume, and query complexity increase. The service runs federated queries across the data lake, cloud data warehouses, on-premises databases, and relational database management systems like PostgreSQL and MySQL. Galaxy also supports fault-tolerant execution, smart indexing and caching, Data Products, machine learning (PyStarburst and integration with Ibis), and universal search and schema discovery, while truly separating compute and storage between Starburst and your cloud data lake.
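To make the idea of a federated query concrete, here is a minimal sketch in Python that builds one SQL statement joining a data lake table with a PostgreSQL table. It relies only on the fact that Trino addresses tables as catalog.schema.table. The catalog, schema, table, and column names (`lake`, `pg`, `orders`, `customers`, `customer_id`) and both helper functions are hypothetical and chosen for illustration.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Trino table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(lake_table: str, db_table: str, join_key: str) -> str:
    """Build a single query that joins tables living in two different catalogs.

    The names passed in are assumptions for this sketch; no identifier
    quoting or validation is performed.
    """
    return (
        f"SELECT l.{join_key}, count(*) AS row_count\n"
        f"FROM {lake_table} AS l\n"
        f"JOIN {db_table} AS d ON l.{join_key} = d.{join_key}\n"
        f"GROUP BY l.{join_key}"
    )


if __name__ == "__main__":
    # Hypothetical catalogs: "lake" for the data lake, "pg" for PostgreSQL.
    print(
        federated_join_sql(
            qualified("lake", "sales", "orders"),
            qualified("pg", "public", "customers"),
            "customer_id",
        )
    )
```

The point of the sketch is that both sources appear in one statement, so no data has to be copied into a single store before the join.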

What is Snowflake?

Snowflake is a cloud data warehouse, delivered as the Snowflake Data Cloud, that runs on Amazon Web Services, Microsoft Azure, and Google Cloud. Snowflake offers a fully managed solution with no hardware or software to install, configure, or manage. Similar to Starburst, Snowflake powers multiple data workloads, from Data Warehousing and Data Lake to Data Engineering, AI and machine learning, Applications, and Cybersecurity, across multiple cloud providers and regions from anywhere in the organization. Snowflake also separates compute and storage, but both are managed, billed, and executed within the Snowflake platform.

Starburst is a High Performer in Data Warehousing

Don’t take our word for it. Starburst is named #1 for Quality of Support and Ease of Use in G2 Crowd’s Grid Report based on real customer reviews. Additionally, customers said Starburst beat out Snowflake in all of these categories: 

  • Meets Requirements
  • Ease of Use
  • Ease of Admin
  • Product Direction
  • Data Integration
  • Quality of Support
  • Data Visualization
  • Multi-Source Analysis
  • BI Tool Integration
  • Data Lake Analytics 

Simplicity

Going beyond key platform governance and management capabilities, a modern data analytics platform empowers data teams with easy-to-use functionality that increases productivity without adding complexity. It allows you to use a range of existing investments in just a few clicks. It allows you to break down data silos and build federated data products from distributed data sets to support use cases and create and scale self-service usage and adoption across the organization.

Capabilities compared between Starburst Galaxy and Snowflake:

  • Native data security
  • Automated cluster management
  • Built-in real-time usage monitoring
  • External marketplace
  • Internal data product sharing and marketplace
  • Federated Data Products
  • Automated data maintenance
  • Easy to get started

    *Comparison based on publicly available information as of September 20, 2023

    Access

    True data access empowers end users with the ability to use all their data, no matter where it lives, across data lakes, data warehouses, and databases while having confidence in security and governance controls. True access is about meeting business needs on time while adhering to regulatory data sovereignty requirements. Your modern data lake analytics platform/lakehouse should free your data sources for analytics purposes, not confine them in another way.

    Capabilities compared between Starburst Galaxy and Snowflake:

    • Data Observability
    • RBAC/ABAC
    • Column/Row Masking
    • Time-based policies
    • SOC 2 Type 2 compliance and ISO 27001 certified
    • Private Link for AWS, Azure, and Google
    • Near real-time data ingestion of streaming data
    • Cross region analytics without data movement
    • On-premise data federation
    • Performant Cross-cloud analytics without data movement
    • Universal search and schema discovery

    *Comparison based on publicly available information as of September 20, 2023

    Scalability

    Internet scale matters in an internet-powered world, but not every workload needs that power and performance. A modern data lake analytics platform puts the control in your hands: high-performance scalability is available at the click of a button, or automatically when you need it most, while price-to-performance is optimized for all analytics workloads. This helps you maintain confidence that queries will execute as scheduled and within the forecasted budget.

    Capabilities compared between Starburst Galaxy and Snowflake:

    • Interactive query performance
    • Consistently execute long-running batch queries
    • Single tenant
    • Fault Tolerant Execution
    • Materialized views
    • Performance on traditional warehouse structured data workloads**: Starburst Galaxy is similar to Snowflake
    • Performance on semi-structured JSON workloads**: Starburst Galaxy outperforms Snowflake
    • Smart indexing and caching (results and query)
    • Autoscaling by adding/removing incremental nodes
    • Customizable scaling for cost and performance optimization

    *Comparison based on publicly available information as of September 20, 2023.
    ** See GigaOm TCO Benchmark report

    Optionality

    Open file and table formats are table stakes in providing optionality. A modern data lake analytics platform goes beyond the fundamentals to give your business full control over your data: you access data where it lives and choose your own cloud providers, security, and BI tools. Optionality also means that your SQL scripts transfer easily to new data architectures and platforms when you need to move, without a massive conversion and rewriting effort.

    Capabilities compared between Starburst Galaxy and Snowflake:

    • Run on multiple clouds
    • Supports popular open file formats
    • Supports first and third-party data catalogs
    • Supports Python
    • Standard SQL
    • OSS MPP SQL query engine
    • Supports Iceberg, Delta Lake, Hudi, and Hive table formats
    • Supports Apache Ranger

     

    *Comparison based on publicly available information as of September 20, 2023

    Free test drive | Watch | Contact us

    Access and analyze your data with the elastic scale and high performance your business demands. Take Starburst Galaxy for a free test drive, watch the on-demand demo (no form fill needed), or contact us.

    Some additional exploration

     

    What does the Snowflake data cloud do?

    At its core, Snowflake’s data cloud architecture is a data warehouse optimized for the cloud. It moves and transforms data from storage (S3) into its proprietary and closed cloud data warehouse. This provides a holistic data cloud for elastic data management and consumption. Snowflake is not just a storage solution but a tool that provides performance, relational querying, security, and governance for data that resides within the confines of its digital walls. It can be used to consolidate structured and semi-structured data and power transformations, analytics, and reporting using a highly customized SQL dialect.

    In addition to handling structured and semi-structured data, Snowflake recently announced support for unstructured data in a data warehouse architecture. By incorporating unstructured data, Snowflake also aims to expand into data science and machine learning use cases within a data warehouse architecture.

    Is Snowflake a cloud data platform?

    Yes, Snowflake is indeed a cloud data platform. It provides a data warehouse as a service designed for the cloud. This platform allows businesses to store and analyze data using cloud-based hardware and software with both storage and compute within the Snowflake account. With Snowflake, businesses can handle several aspects of data storage, including performance, scalability, and security, once the data is loaded into Snowflake’s proprietary and closed storage.

    Is Snowflake the same as AWS?

    The Snowflake platform is similar to a combination of AWS services but does not provide the full breadth of functionality of all of AWS combined. Think of Snowflake as combining Amazon S3, Amazon Redshift, AWS Glue, Amazon EMR, and Amazon Athena, along with several other security, governance, and management services, in a highly closed and proprietary platform.

    How does Snowflake work?

    Snowflake operates a cloud data warehouse architecture. At the heart of this are virtual warehouses, which are essentially clusters of compute resources. Using Snowflake involves multiple steps. It’s typically recommended to first stage your data in a cloud data lake and then ingest it into Snowflake. Once your data is in Snowflake’s proprietary storage, you can transform the data within the platform for your analytical needs.

    Within the Snowflake cloud data warehouse, compute resources are separate from storage, meaning you can scale them independently based on your needs. This separation ensures that large queries won’t slow down smaller, more urgent ones and vice versa. It also means that you only pay for the compute resources you use. When queries require more nodes for processing, instead of adding more nodes to the existing cluster, Snowflake opts to add another cluster, doubling the total compute resources whether or not all the resources are needed.
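    The difference between scaling by whole clusters and scaling by individual nodes can be illustrated with a small arithmetic sketch. This is a deliberately simplified model, not a description of either vendor's actual scheduler or billing: it assumes capacity is measured in identical nodes, and the function names and the example numbers are hypothetical.

```python
import math


def nodes_with_cluster_scaling(required_nodes: int, cluster_size: int) -> int:
    """Nodes provisioned when capacity can only grow in whole clusters."""
    if required_nodes <= 0:
        return 0
    clusters = math.ceil(required_nodes / cluster_size)
    return clusters * cluster_size


def nodes_with_incremental_scaling(required_nodes: int, min_nodes: int) -> int:
    """Nodes provisioned when single nodes can be added to one cluster."""
    return max(required_nodes, min_nodes)


def idle_nodes(provisioned: int, required: int) -> int:
    """Provisioned nodes that the workload does not actually need."""
    return max(provisioned - required, 0)


if __name__ == "__main__":
    required = 10  # hypothetical peak demand, in nodes
    size = 8       # hypothetical cluster size, in nodes

    by_cluster = nodes_with_cluster_scaling(required, size)
    by_node = nodes_with_incremental_scaling(required, min_nodes=size)

    print("cluster scaling:", by_cluster, "idle:", idle_nodes(by_cluster, required))
    print("incremental scaling:", by_node, "idle:", idle_nodes(by_node, required))
```

    Under these assumptions, a workload that needs 10 nodes on 8-node clusters provisions 16 nodes with whole-cluster scaling and 10 with incremental scaling, so the first approach leaves 6 nodes idle.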

    Data outside of Snowflake can be queried using External Tables, unmanaged Iceberg Tables, and Snowflake's proprietary managed Iceberg Tables.

    Similar to Starburst, the Snowflake architecture is designed to be secure, fast, and easy to use.

    What are the benefits of the Snowflake data cloud?

    The Snowflake data cloud offers its users many benefits that align with the approach taken by Starburst.io. For Snowflake customers, the ability to handle all types of data in one place simplifies data management and allows for more comprehensive analytics, a benefit that is also central to Starburst’s approach.

    Metadata handling is another significant advantage of Snowflake, with automatic collection and management of metadata making it easier for users to understand and use their data effectively. Similarly, Starburst also emphasizes the importance of metadata in its approach.

    The on-demand nature of Snowflake’s compute resources is a major advantage, allowing users to scale up or down instantly based on their needs. This mirrors Starburst’s emphasis on flexibility and cost-effectiveness.

    Finally, Snowflake’s proprietary architecture allows for high levels of concurrency, meaning multiple queries can be run simultaneously without affecting each other’s performance. This is a capability that is also highlighted in Starburst’s approach, with its platform built for analyzing large amounts of distributed data with high concurrency using an open-source foundation.

    In summary, both Snowflake and Starburst.io offer powerful solutions for data management with many similar benefits, including handling all types of data, effective metadata management, on-demand resources, and high levels of concurrency.

    What challenges exist with the Snowflake data cloud for SQL workloads?

    While Snowflake’s data cloud offers many benefits, it also presents certain challenges for SQL workloads. First, Snowflake is difficult and excessively time-consuming to get started with. It requires that all your data first be ingested into its closed and proprietary storage, which can take years. Furthermore, general Snowflake best practices have you first move your data into a cloud data lake staging environment before it is loaded into Snowflake. This means you’re paying for data movement twice.

    For Snowflake customers, one of the key challenges is building effective data pipelines. The scalability of Snowflake can lead to the ingestion of excessive amounts of data, which can increase storage costs and potentially degrade the quality of the data.

    Another challenge is related to workload management. Snowflake’s multi-cluster architecture allows for high levels of concurrency, but it can also lead to increased complexity in managing and optimizing workloads. For instance, joining large tables of data directly from raw to presentation layers can cause workloads to run for hours and add significant costs to the warehouse.

    Next, while Snowflake provides a lot of automation, it still requires some manual intervention. For example, decisions around whether to use Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) need to be made. Additionally, while Snowflake automates many aspects of data management, it still requires DBA efforts.

    Snowflake customers of all sizes also regularly cite costs that quickly exceed forecasts as data volume increases, premium features are activated, and inefficient performance enhancement practices take effect, e.g., doubling clusters for autoscaling rather than adding only the necessary number of worker nodes to existing clusters.

    Lastly, Snowflake is a highly closed and proprietary platform that is not only difficult to get all your data into, but once in the platform, the proprietary storage and custom SQL make it extremely difficult and costly to offload workloads from Snowflake as data and query volumes grow.

    Is Snowflake ANSI SQL compliant?

    Yes, Snowflake is ANSI SQL compliant. This means that all of the most common operations are usable within Snowflake. Snowflake also supports all of the operations that enable data warehousing operations, like create, update, insert, etc. In addition to that, Snowflake supports a subset of ANSI SQL:1999 and the SQL:2003 analytic extensions. However, it’s important to note that though Snowflake is ANSI SQL compliant, Snowflake operates its own highly proprietary version of SQL built on top of ANSI standards with 100+ custom functions, making the scripts difficult to migrate out of Snowflake as volumes increase.
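    One way to gauge how portable a script is would be to scan it for vendor-specific function calls before a migration. The sketch below does this with a regular expression. The function list is a small illustrative sample of Snowflake functions, not an exhaustive or official list, and `find_vendor_functions` is a hypothetical helper. It is a rough heuristic that does not skip string literals or comments.

```python
import re
from typing import FrozenSet, Set

# Illustrative sample only; a real audit would need a much longer list.
SNOWFLAKE_SPECIFIC: FrozenSet[str] = frozenset(
    {"IFF", "FLATTEN", "PARSE_JSON", "OBJECT_CONSTRUCT", "TRY_TO_NUMBER"}
)

# Matches an identifier immediately followed by an opening parenthesis.
_CALL_PATTERN = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def find_vendor_functions(
    sql: str, vendor_functions: FrozenSet[str] = SNOWFLAKE_SPECIFIC
) -> Set[str]:
    """Return the vendor-specific function names that appear to be called in sql."""
    called = {match.group(1).upper() for match in _CALL_PATTERN.finditer(sql)}
    return called & {name.upper() for name in vendor_functions}


if __name__ == "__main__":
    script = (
        "SELECT IFF(amount > 0, 'credit', 'debit'), parse_json(payload) "
        "FROM t WHERE COALESCE(x, 0) = 1"
    )
    print(sorted(find_vendor_functions(script)))
```

    Each function reported this way is a candidate for rewriting when the script moves to another engine.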

    How to do data sharing with Snowflake?

    Data sharing is limited to within the Snowflake ecosystem, which allows you to share selected objects in a database in your account with other Snowflake accounts. Here’s how you can do it:

    1. Create a Share: The provider creates a share of a database in their account and grants access to specific objects in the database. The provider can also share data from multiple databases, as long as these databases belong to the same account.
    2. Add Accounts to the Share: One or more accounts are then added to the share, which can include your own accounts (if you have multiple Snowflake accounts).
    3. Use Reader Accounts: If you want to share with people who don’t have Snowflake accounts, you can use Reader Accounts.
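    The first two steps above can be sketched as a short sequence of Snowflake SQL statements, generated here from Python. The share, database, schema, table, and account names are hypothetical, and `build_share_statements` is an illustrative helper rather than part of any Snowflake client library. The sketch performs no identifier quoting or validation and only builds the statements; it does not execute them.

```python
from typing import List


def build_share_statements(
    share: str,
    database: str,
    schema: str,
    tables: List[str],
    consumer_accounts: List[str],
) -> List[str]:
    """Build the SQL a provider would run to create and populate a share."""
    statements = [
        # Step 1: create the share and grant access to specific objects.
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
    ]
    for table in tables:
        statements.append(
            f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO SHARE {share};"
        )
    # Step 2: add one or more consumer accounts to the share.
    if consumer_accounts:
        accounts = ", ".join(consumer_accounts)
        statements.append(f"ALTER SHARE {share} ADD ACCOUNTS = {accounts};")
    return statements


if __name__ == "__main__":
    for statement in build_share_statements(
        share="sales_share",
        database="sales_db",
        schema="public",
        tables=["orders"],
        consumer_accounts=["org1.acct1", "org1.acct2"],
    ):
        print(statement)
```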

    In addition to the above, Snowflake offers other options for data sharing:

    • Listing: You can offer a listing privately to specific accounts, or publicly on the Snowflake Marketplace.
    • Direct Share: Use a Direct Share to share data with one or more accounts in the same Snowflake region.
    • Data Exchange: If creating listings that you offer privately to specific accounts isn’t an option, you can use a data exchange to share data with a selected group of accounts that you invite.

    These features enable an ecosystem of data sharing and collaboration exclusively within Snowflake. At a high platform cost, they allow for the creation of data applications that leverage shared data for reporting.

    What are the layers of the Snowflake architecture?

    The Snowflake cloud data warehouse consists of three core layers:

    1. Database Storage Layer: When data is loaded into Snowflake from your cloud data lake staging environment, it reorganizes that data into its proprietary internal optimized, compressed, columnar format that is built on top of a cloud data lake. Snowflake stores this optimized data in cloud storage. All aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are controlled by Snowflake.
    2. Query Processing Layer: This layer is responsible for executing SQL queries. Similar to Starburst and its use of open-source Trino, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale benefits of a shared-nothing architecture.
    3. Cloud Services Layer: This layer provides services such as infrastructure management, metadata management, query parsing and optimization, access control, and more. It coordinates and handles all transactions and sessions, ensuring that all operations are secure and ACID-compliant.

    Beyond the three core components, there is also a built-in visualization component and data marketplace.

    Image source: Snowflake documentation

     

    Start Free with
    Starburst Galaxy

    Up to $500 in usage credits included

    • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
    • Get up and running in less than 5 minutes
    • Easily deploy clusters in AWS, Azure and Google Cloud
    For more deployment options:
    Download Starburst Enterprise
