
Starburst vs. Databricks

Discover how Starburst and Databricks compare across platform access, scalability, simplicity, and optionality, including real customer reviews and G2 Crowd ratings.

A real open data lakehouse lets you pick your open format, works with your data in and around the data lake, on-premises and in the cloud, and is easy to use.

 

 

What is Starburst Galaxy?

Starburst Galaxy brings the open data lakehouse architecture to life using open file and table formats and an optimized, open MPP SQL query engine built on open-source Trino. Starburst Galaxy is used for both interactive ad-hoc analytics and long-running workloads like batch and ETL/ELT, and offers high scalability and query completion rates even as data volume (terabytes of data), query volume, and query complexity increase. It runs federated queries across data lakes, cloud data warehouses, on-premises databases, and relational database management systems like PostgreSQL and MySQL. Galaxy also supports enhanced fault-tolerant execution, smart indexing and caching, Data Products, universal search, and schema discovery, while truly separating compute and storage between Starburst and your cloud storage.
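To make the idea of a federated query concrete, here is a minimal sketch in Python. It is an analogy only, not Starburst Galaxy or Trino code: it uses the standard-library sqlite3 module and attaches a second in-memory database so that one SQL statement can join tables living in two separate databases. The names `run_federated_demo`, `warehouse`, `customers`, and `orders` are hypothetical and chosen for illustration.

```python
import sqlite3


def run_federated_demo():
    """Join tables from two separate databases with one SQL statement.

    Analogy only: "main" and "warehouse" stand in for two independent
    data sources. A real federated engine would reach remote systems
    through connectors instead of attaching local databases.
    """
    # The first database is addressed as "main".
    conn = sqlite3.connect(":memory:")
    # Attach a second, independent in-memory database as "warehouse".
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

    # Each table lives in a different database.
    conn.execute("CREATE TABLE main.customers (id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE warehouse.orders (customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO warehouse.orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.5), (2, 7.0)],
    )

    # A single query spans both databases via qualified table names.
    rows = conn.execute(
        """
        SELECT c.name, SUM(o.amount) AS total
        FROM main.customers AS c
        JOIN warehouse.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, total in run_federated_demo():
        print(name, total)
```

The point of the sketch is the qualified table names: the query author writes one join, and the engine resolves each table against a different source.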

What is Databricks SQL?

Databricks, traditionally known for its Apache Spark origins and its data science and machine learning capabilities, introduced Databricks SQL, available as a serverless and non-serverless data warehouse in the Databricks Data Intelligence Platform (formerly known as the Lakehouse Platform), to run SQL and BI applications using its proprietary Photon engine. It offers a governance model via a proprietary data catalog, use of open formats (Delta Lake being the preferred format for a first-class experience), APIs, and integrations with third-party tools. It provides general compute resources for SQL queries, visualizations, and dashboards that are executed against tables stored in cloud object stores. It also lets users find and share insights with the built-in SQL editor, visualizations, and dashboards.

Starburst is a Leader in Big Data Processing and Distribution

Don’t take our word for it. Starburst is named #1 for Quality of Support and Ease of Use in G2 Crowd’s Grid Report based on real customer reviews. Additionally, customers said Starburst beat out Databricks in all of these categories: 

  • Quality of Support
  • Ease of Admin
  • Ease of Use
  • Meets Requirements
  • Breadth of Partner Applications
  • Product Direction
  • Data Visualization
  • Performance and Reliability

Simplicity

A simple and intuitive experience covers everything from simple pricing models, ease of governance and management, and minimal overhead and time to get started, to a simplified customization and configuration experience. Most importantly, it should be easy for any SQL user: a PhD should be optional.

Starburst Galaxy

Databricks SQL

Built-in data security

Automated cluster management

Built-in real-time usage monitoring

Automated data maintenance*

Get started within 10 minutes from first sign-on

Simple pricing

Comparison based on publicly available information as of October 30, 2023

*In preview. Contact us to learn more.

Access

Empower data teams with the ability to securely use all their data assets, no matter where they live, across data lakes, data warehouses, and databases. With your modern data lake analytics platform, you can easily discover, create, govern, share, and collaborate on curated data sets by connecting your data silos before, during, and after your modernization journey.

Starburst Galaxy

Databricks SQL

Universal search

Schema discovery

Role and attribute-based access controls

Data lineage

Column/row masking

Automatic end-to-end encryption

Private link for AWS, Azure, and Google*

SSO for client connectivity*

Integration with AWS Lake Formation

Supports popular data sources for federation

Supports hybrid data architectures

Optimized connectors for federation

First-class federation for multiple data catalogs

Federated data products

Data product sharing

Time-based access control policies

Data observability

Data profiling

Metadata tagging

Automatic data classification

Comparison based on publicly available information as of October 30, 2023

*In preview. Contact us to learn more.

Scalability

A modern data lake analytics platform with high concurrency puts the control in your hands to ensure performant scalability is available when you need it most while optimizing price-to-performance for all analytics workloads.

Starburst Galaxy

Databricks SQL

High concurrency

Fault tolerant execution (FTE)

Ad-hoc and interactive queries

Materialized views

Autoscaling by adding/removing incremental nodes

Customizable scaling for cost and performance optimization

Smart caching

Results and repeated subquery caching*

Smart indexing

Index and cache resilience

End-to-end streaming ingest with low latency

Comparison based on publicly available information as of October 30, 2023

*In preview. Contact us to learn more.

Optionality

A modern data lake analytics platform takes the lakehouse architecture beyond the basics of open file and table formats. It provides choice in hybrid or cloud environments, more data federation, seamless cross-cloud and cross-region analytics, and choice in data catalogs without compromising the user experience, and it offers an enhanced MPP SQL query engine based on open standards and supported by the largest internet companies in the world.

Starburst Galaxy

Databricks SQL

Run on multiple clouds

Supports popular open file formats

Standard ANSI SQL

Supports Python

Open-source MPP SQL query engine

First-class data federation with first and third-party data catalogs

Natively run SQL on Iceberg, Delta Lake, Hudi, and Hive table formats

Built-in cross region querying

Built-in cross cloud querying*

Supports hybrid cloud architecture*

Supports Apache Ranger

In platform capability to migrate Hive and Delta tables to Iceberg

Dataframe API for Python

Comparison based on publicly available information as of October 30, 2023

*In preview. Contact us to learn more.

Contact us | Tutorial | Try

Access and analyze your data with the elastic scale and high performance your business demands. Get started with a free Galaxy trial, watch this tutorial on Starburst Galaxy, or contact us.

Some additional exploration

What is Databricks SQL used for?

Unlike traditional SQL warehouses, Databricks SQL is a platform that provides SQL data warehousing capabilities on data stored in a cloud data lake. It is primarily used for data exploration, ad hoc analytics (without the need for data pipelines), and interactive big data analytics. The platform allows users to query data from a limited number of federated data sources using non-optimized connectors, and it requires the use of Databricks' proprietary data catalog, Unity Catalog, for an optimal data federation experience.

Databricks SQL also offers its own visualization tooling and integration with various BI visualization tools such as Tableau, Power BI, ThoughtSpot, Looker, and others to make it easier for data analysts to consume the data.

What is a Databricks workspace?

A workspace within the Databricks platform is an environment in which data engineers, data science teams, and others access Databricks resources. You can run a single workspace or multiple workspaces within your Databricks account.

What type of SQL is used in Databricks?

Similar to Starburst, Databricks SQL uses SQL that follows the American National Standards Institute (ANSI) standard, which is the common standard for most SQL databases.

Can I write SQL in Databricks?

Yes. Similar to Starburst Galaxy, within Databricks SQL there is an environment via SQL endpoints for running SQL queries, creating dashboards with visualization tools, and sharing query results.

Can I write Databricks SQL in a text editor?

Yes, similar to Starburst Galaxy, you can write your SQL queries or scripts in any text editor of your choice and then copy and paste them into Databricks SQL or a Databricks notebook for execution.

Is Databricks SQL a program?

No, Databricks SQL is not a program. It is a SQL tool within the Databricks Lakehouse platform.

How does Azure Databricks differ from Databricks on AWS or GCP?

Databricks on Azure, or Azure Databricks, is the result of a collaboration between Microsoft and Databricks to bring Databricks to Microsoft Azure as a first-party service. The core functionality of Databricks remains the same across the three hyperscale cloud providers, but there are some differences in how it is integrated with each cloud provider’s services.

  • Azure Databricks
    • Azure treats Databricks as a first-party service, and Azure Databricks is built and supported in collaboration with the Databricks team.
    • Azure Databricks can be integrated into the Azure portal and can be provisioned with a single click.
    • As a result of the shared engineering, Azure Databricks is highly optimized for the Azure environment and tightly integrated with other Azure services.
    • Time will tell how Azure Databricks evolves with Azure Synapse and Microsoft Fabric.
  • Databricks on AWS and GCP
    • Databricks on AWS or GCP is not a first-party service like Azure Databricks. Therefore, similar to other SaaS tools, it requires a bit more setup and understanding of AWS and GCP services like IAM (for AWS) and networking.

 

Contact Us to Learn More

We’ll send you a free download of Starburst, and a Starburst expert will reach out to schedule a call.

Start Free with
Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise
