Starburst Galaxy is a price-performant, fully managed, multi-cloud data lake analytics platform powered by Trino, a leading open-source distributed MPP SQL query engine. Starburst Galaxy is used for both interactive ad-hoc analytics and long-running workloads like batch and ETL/ELT, and offers high scalability and query completion rates even as data volume, query volume, and query complexity increase. The service runs federated queries across the data lake, cloud data warehouses, on-premises databases, and relational database management systems like PostgreSQL and MySQL. Galaxy also supports fault-tolerant execution, smart indexing and caching, Data Products, and universal search and schema discovery.
Amazon Athena, available in serverless and dedicated versions, is a query service that analyzes data in Amazon Web Services (primarily Amazon S3) using standard SQL for ad-hoc analytics. Amazon Athena serverless has no infrastructure for customers to manage, and customers pay only for the queries they run. Amazon Athena was originally built on a fork of Presto (PrestoDB version 0.217), released in January 2019.
“The bottom line is that Starburst Galaxy is a huge force multiplier for us. Based on my experience in previous roles, I’ve been able to accomplish what would’ve taken two to three engineers in half the time and one tenth of the cost [compared to Athena].”
– Anonymous, Director of Software and Engineering
“With Starburst, we can maximize the value of our data. We are now able to run queries on tables with terabytes of data in just a few seconds.”
– Staff Engineer, A Fortune 100 Cloud Computing Company
“We were using data in the way we could. It was getting more expensive, slower, and feeble. We had to change our approach and look for other ways of enabling our users without infrastructure penalties. We were over-run by the limitations of our latest solution [Athena]… Starburst gives us a single platform to explore more data through connectivity, maintain data quality and governance, and provide the data to all of our employees using their visualization tools of choice.”
– André Gortari, Data Engineering Manager at Banco Inter
Don’t take our word for it. Starburst is named #1 for Quality of Support and Ease of Use in G2 Crowd’s Grid Report based on real customer reviews. Additionally, customers said this about Starburst:
Going beyond key platform governance and management capabilities, a modern data analytics platform empowers data teams with easy-to-use functionality that increases productivity without adding complexity. It lets you connect a range of existing investments in just a few clicks, and build federated data products from distributed data sets to support business use cases and scale self-service usage and adoption across the organization.
Starburst Galaxy vs. Amazon Athena (Serverless)
Automated AWS compute plane set-up
Multi-cloud platform
Built-in data security (Athena: requires use of Lake Formation)
Data Products
First and third-party data catalogs
Automated cluster management
Built-in real-time usage monitoring
Built-in query scheduler (Athena: requires Lambda functions)
Query sharing
Data profiling
Data lineage
True data access empowers organizations with the ability to use all their data, no matter where it lives, across data lakes, data warehouses, and databases while having confidence in security and governance controls. True access is about meeting business needs on time while adhering to regulatory data sovereignty requirements. Your modern data lake analytics platform/lakehouse should free your data sources for analytics purposes, not confine them in another way.
Starburst Galaxy vs. Amazon Athena (Serverless)
Cloud data federation
On-premise data federation
Cross-cloud/cross-region analytics
RBAC/ABAC
Column/Row Masking
Time-based policies
AWS Service account
SOC 2 Type 2 compliance and ISO 27001 certified
In platform universal search and schema discovery (Athena: Glue crawler available at additional charge)
SSO via AWS IAM, Okta, Azure AD, and Google
Internet scale matters in an internet-powered world, but not every workload needs that power and performance. A modern data lake analytics platform puts the control in your hands: high-performance scalability is available at the click of a button, or automatically when you need it most. It also optimizes price-to-performance for all analytics workloads and gives you confidence that queries will execute as scheduled.
Starburst Galaxy vs. Amazon Athena (Serverless)
Ad-hoc and interactive queries
Multi-tenant
Fault Tolerant Execution
Built-in data catalog
Cluster configuration: cluster per account (Starburst Galaxy); cluster per query (Athena)
Autoscales by adding more nodes per cluster (Athena: adds more clusters)
Customizable scaling for cost and performance optimization
Consistently executes long-running batch queries
Smart indexing and caching
Open file and table formats are table stakes in providing optionality. A modern data lake analytics platform goes beyond the fundamentals to ensure your business has full control over your data by accessing data where it lives digitally and physically, by allowing choice in cloud providers, security, and BI tools, and ensuring expert Trino support is available if and when your teams need it most.
Starburst Galaxy vs. Amazon Athena (Serverless)
OSS query engine: Trino (Starburst Galaxy); Trino and Legacy Presto (Athena)
Supports popular open table formats
Supports popular open file formats
Supports Python
Supports data catalogs beyond AWS Glue
Run on multiple clouds
Expert in-house Trino support
Access and analyze your data with the elastic scale and high performance your business demands.
Amazon Athena is AWS’s analytics engine that allows you to run queries on terabytes and petabytes of data in and around S3. You can use Athena to execute data warehouse-like SQL queries on data in your lake, access data from federated sources, prepare data for machine learning models, build distributed data reconciliation engines, and perform multi-cloud data analysis, although Athena itself runs only on Amazon Web Services.
Amazon Athena is not an ETL tool in the traditional sense, but it can be used to simplify ETL data pipelines using its federated SQL queries and user-defined functions. However, it is not uncommon for long-running queries like ETL jobs to fail without warning.
Amazon S3 and Amazon Athena are not the same. Amazon Simple Storage Service (S3) is a cloud storage service that allows you to store and retrieve data (a cloud data lake). Amazon Athena, on the other hand, is what you would use to run queries against S3 data using standard ANSI SQL.
No, Amazon Athena is not a SQL server. It is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Athena is built on top of Presto, a distributed SQL query engine, and can process large amounts of data in parallel. It supports a wide range of data formats, including CSV, JSON, ORC, Avro, and Parquet.
Athena allows you to create a ‘data warehouse’ like experience on Amazon S3. By defining schemas and running queries, you can efficiently organize and retrieve your data. Furthermore, AWS APIs make it possible to visualize the query results in business intelligence tools.
Amazon Athena lacks much of the native, built-in functionality you would expect of an analytics platform, so using and managing it effectively requires several other AWS services. Here are some of the services that are required to make Athena work effectively.
Amazon S3 – Amazon S3 (Simple Storage Service) is an object storage service that offers scalability, data availability, security, and performance. It allows you to store and retrieve any amount of data from anywhere on the web. You can accomplish these tasks using the simple and intuitive web interface of the AWS Management Console. Data is stored in S3 buckets, which are containers for objects that you store in Amazon S3. S3 is the primary data source for Amazon Athena.
AWS Glue – a fully managed, serverless data integration service that makes it easy to prepare and load data for analytics. Its Glue Data Catalog is the primary data catalog for Amazon Athena. Starburst Galaxy gives you options in which data catalogs you use, including Starburst Gravity and AWS Glue.
AWS Lake Formation – a fully managed service that makes it easy to build, secure, and manage data lakes. Unlike Starburst Galaxy, which has these security and governance capabilities built in, Amazon Athena requires you to use AWS Lake Formation to define and enforce database, table, and column-level access policies when using Athena queries to read data stored in Amazon S3.
Amazon Redshift – this is AWS’s data warehouse. Similar to Starburst Galaxy, Amazon Athena also allows you to access data within Amazon Redshift.
AWS Lambda – You can use Lambda to execute SQL queries on Amazon Athena. You can create a Lambda function that uses the AWS SDK for Python (Boto3) to execute SQL queries on Amazon Athena. The Lambda function can be triggered by an event, such as an API Gateway invocation, to execute the query and return the query results.
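As a rough illustration of the Lambda pattern described above, the following Python sketch starts an Athena query from a handler. It is a sketch under stated assumptions, not a definitive implementation: the event fields (`sql`, `database`), the fallback database name, and the results location `s3://example-athena-results/` are hypothetical, while `start_query_execution` is the Boto3 Athena client call. The client is accepted as an optional argument so the handler can be exercised without AWS credentials.

```python
import json

# Hypothetical S3 location for query results; replace with your own bucket.
RESULTS_LOCATION = "s3://example-athena-results/"


def lambda_handler(event, context, athena_client=None):
    """Start an Athena query and return its execution ID.

    Assumption: the triggering event carries "sql" and, optionally,
    "database". Neither field name comes from AWS; both are illustrative.
    """
    if athena_client is None:
        # Imported lazily so the module also loads where boto3 is absent.
        import boto3
        athena_client = boto3.client("athena")

    response = athena_client.start_query_execution(
        QueryString=event["sql"],
        QueryExecutionContext={"Database": event.get("database", "default")},
        ResultConfiguration={"OutputLocation": RESULTS_LOCATION},
    )
    # Return the ID immediately instead of blocking until the query finishes.
    return {
        "statusCode": 202,
        "body": json.dumps({"queryExecutionId": response["QueryExecutionId"]}),
    }
```

Returning the execution ID rather than waiting keeps the function short-lived; a caller that needs the rows would still have to check the query state and read the results once the query succeeds.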
Amazon DynamoDB – this is a fully managed NoSQL database service. The Amazon Athena DynamoDB connector (also available in the Starburst self-managed software offering) enables Athena to communicate with DynamoDB so that you can execute SQL queries on your tables.
AWS IAM – this is AWS Identity and Access Management, the access control service from AWS. Unlike the built-in capabilities with Starburst Galaxy, Amazon Athena uses AWS IAM policies as the primary means to restrict access to Athena operations. Users can create policies that grant or deny access to specific resources and configure permissions based on user roles or groups.
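To make the policy-based approach concrete, here is an illustrative IAM policy written as a Python dictionary and serialized to JSON. The region, account ID, and workgroup name in the resource ARN are placeholders, and scoping access to a single workgroup is an assumption made for the example. A working setup would typically also need permissions on the underlying S3 buckets and data catalog, which are omitted here.

```python
import json

# Illustrative policy: allow running and inspecting queries in one workgroup.
# The ARN below uses a placeholder region, account ID, and workgroup name.
athena_query_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowQueriesInOneWorkgroup",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:StopQueryExecution",
            ],
            "Resource": "arn:aws:athena:us-east-1:123456789012:workgroup/primary",
        }
    ],
}

# Serialize to the JSON document form that IAM expects.
policy_json = json.dumps(athena_query_policy, indent=2)
print(policy_json)
```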
AWS Command Line Interface (CLI) – You can use the AWS CLI to interact with Amazon Athena. For example, you can use the aws athena start-query-execution command to run a query. You will then need to poll with aws athena get-query-execution until the query is finished. When that is the case, the result of that call will also contain the location of the query result on S3, which you can then download with aws s3 cp.
You cannot save results from the AWS CLI directly, but you can specify a Query Result Location, and Amazon Athena will automatically save a copy of the query results in an Amazon S3 location that you specify. You could then use the AWS CLI to download that results file.
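The same start, poll, and locate sequence can be sketched with Boto3 instead of the CLI. This is a minimal sketch rather than a definitive implementation: the function name `run_query_and_locate_results`, the poll interval, and the timeout are illustrative assumptions, while `start_query_execution` and `get_query_execution` are the Boto3 counterparts of the CLI commands named above.

```python
import time

# States in which Athena has stopped working on a query.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query_and_locate_results(client, sql, database, output_location,
                                 poll_seconds=1.0, timeout_seconds=300.0):
    """Run a query, wait for it to finish, and return the S3 URI of the results.

    `client` is expected to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = client.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {query_id} still {state} after timeout")
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = execution["Status"].get("StateChangeReason", "no reason given")
        raise RuntimeError(f"query {query_id} ended as {state}: {reason}")

    # Athena reports where it wrote the result file for this execution.
    return execution["ResultConfiguration"]["OutputLocation"]
```

With a real client, a call might look like `run_query_and_locate_results(boto3.client("athena"), "SELECT 1", "default", "s3://example-bucket/results/")`, where the bucket name is a placeholder. The returned S3 URI is the file you would then download with aws s3 cp.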
Amazon QuickSight – this is AWS’s business intelligence (BI) service. Just as it does with Starburst Galaxy, Amazon QuickSight retrieves data from Athena to visualize the results of Amazon Athena SQL queries.
Amazon EMR (formerly known as Amazon Elastic MapReduce) – a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. You would use EMR and Athena together when building an Apache Iceberg data lake. You can use Amazon EMR Spark to create an Iceberg table and load sample data, then use Athena to query the table, perform schema evolution, and more with the AWS Glue Data Catalog.
Unlike Starburst, Amazon Athena does not offer capabilities to build, manage, secure, and share curated data sets in the form of federated data products.
Amazon Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run.
Amazon Athena also recently introduced the ability to provision dedicated capacity for your Athena queries. With provisioned capacity, you can reserve a dedicated set of compute resources to run your queries. This shifts capacity management to customers, creating a risk of wasted resources and rising costs if that capacity is managed poorly.
Up to $500 in usage credits included
© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC