Tom Nats

Director of Customer Solutions

Starburst

Databricks and Starburst, a true unified open analytics platform

Last Updated: March 18, 2024

I remember it clearly: it was 2 o’clock in the morning back in the ancient year of 2000. I was working for a .com, loading an Oracle data warehouse sitting on a single Sun Microsystems server. If one thing went wrong, it would delay the nightly load and cost four hours of rollback. There was no such thing as elasticity or a public cloud.

Fast forward almost 20 years. While the need for loading, moving, and analyzing data continues, tremendous advances abound in hardware, networking, and the commoditization of infrastructure started by Amazon’s AWS (VMware deserves credit here as well). One of the most important advances is the separation of storage and compute, driven by low-cost object stores such as Amazon S3. Companies can finally take complete ownership of their data using open storage formats such as Parquet and ORC. Together, these advances let us process data in a fraction of the time it used to take, and with a variety of tools. We can at last free ourselves…

Trino and Spark: Open Analytics Platform

The diagram below illustrates the flexibility of an open data store. Using different processing tools, users and departments in multiple locations can add, remove, update and select data without storage limits or bandwidth restrictions. We’re no longer locked into a vendor-controlled ecosystem.

What does an open analytical ecosystem look like? Here at Starburst, we believe this architecture combines open source technology with enterprise features and security. It all begins with a distributed storage system such as Amazon S3. The separate storage layer stays under your control, in your own account. You can access your data with different compute technologies, freeing you from data lock-in.

Two relatively recent technologies take advantage of this separation of storage and compute: Spark and Trino (formerly PrestoSQL). Spark came out of UC Berkeley’s AMPLab and is like a Swiss army knife: it can handle tasks such as ingestion, ETL (Extract, Transform & Load) and machine learning. Trino was created by engineers at Facebook and provides a highly concurrent SQL engine that can query large amounts of data across a variety of data platforms.

Together, these two technologies make up something we call a true unified Open Analytics Platform. In a typical analytical environment, you ingest data, process it and make it available to users. Used together, Trino and Spark handle these tasks with ease and work on a variety of platforms. The simple diagram below illustrates how both technologies work together to ingest, catalog, process, enrich and analyze different types and volumes of data.

We can break down these traditional tasks:

Data ingestion

Databricks’ Spark-based platform makes ingestion a breeze, whether in batch or with structured streaming.
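To make the batch versus streaming distinction concrete, here is a minimal plain-Python sketch. It does not use Spark or any Databricks API. It only mimics the idea behind structured streaming, where an unbounded source is consumed in small micro-batches and each micro-batch goes through the same logic as a one-off batch load. The names `clean`, `ingest_batch` and `ingest_stream`, and the record fields, are hypothetical and chosen for illustration.

```python
from itertools import islice
from typing import Iterable


def clean(record: dict) -> dict:
    """Normalize one raw record (hypothetical schema: user, clicks)."""
    return {"user": record["user"].strip().lower(), "clicks": int(record["clicks"])}


def ingest_batch(records: Iterable[dict], table: list) -> int:
    """Batch ingestion: transform a bounded set of records and append it in one pass."""
    rows = [clean(r) for r in records]
    table.extend(rows)
    return len(rows)


def ingest_stream(source: Iterable[dict], table: list, batch_size: int = 2) -> int:
    """Streaming ingestion, sketched as micro-batches.

    Pulls up to `batch_size` records at a time from the source and reuses
    the batch logic for each chunk. Returns the number of micro-batches.
    """
    iterator = iter(source)
    batches = 0
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            break
        ingest_batch(chunk, table)
        batches += 1
    return batches
```

The point of the sketch is that the transformation is written once and shared by both paths, so only the way records arrive differs.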

Data lake management

Data lakes can quickly turn into ugly data swamps without strict governance and tools that allow for seamless management. Using Databricks and their open-sourced Delta Lake finally gives companies a method to build clean data lakes.

Machine & Deep learning

Spark and machine learning are usually found in the same sentence, so it’s no surprise that many companies use Databricks to expand their ML use cases.

SQL query engine

Trino provides a massively parallel SQL engine that can join data from many different systems at scale, on any cloud and on-premises. Use cases range from ad hoc data analysis to highly concurrent BI reporting. Starburst adds enterprise features such as fine-grained access control, certified ODBC/JDBC drivers, enterprise-tested connectors such as Oracle, and 24/7 support by actual Trino committers.
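As a rough illustration of what joining data from different systems in one SQL statement looks like, the sketch below uses Python's built-in `sqlite3` module as a stand-in. This is not Trino: two separate SQLite databases are attached to one connection and joined in a single query, which loosely mirrors how Trino addresses tables in different catalogs. The database alias `crm`, the tables, and the sample rows are all made up for the example.

```python
import sqlite3


def federated_revenue_by_region() -> list:
    """Join tables that live in two separate databases with one SQL query."""
    # The primary database stands in for data in the lake (hypothetical).
    conn = sqlite3.connect(":memory:")
    # A second, independent database stands in for an operational system.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [(1, 120.0), (2, 80.0), (1, 30.0)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC")],
    )

    # One statement spans both databases, each table qualified by its source.
    rows = conn.execute(
        """
        SELECT c.region, SUM(o.amount) AS revenue
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()
    conn.close()
    return rows
```

In Trino the same pattern would qualify each table by catalog and schema, with each catalog backed by a connector to the underlying system. SQLite runs everything in a single process, so the sketch shows only the shape of the query, not the parallel execution.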

Databricks and Starburst: Modern, open analytical platform

As companies migrate their on-premises systems to the cloud, they want to avoid any mistakes made in the past. They want low-cost storage separated from compute in their own account, complete ownership of their data using open formats such as Parquet/ORC and the ability to scale up and down based on their requirements.

The combination of Databricks and Starburst meets all current requirements for a modern, open analytical platform, helping companies future-proof their investments for whatever lies ahead.

© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC
