Cookie Notice

This site uses cookies for performance, analytics, personalization and advertising purposes.

For more information about how we use cookies please see our Cookie Policy.

Manage Consent Preferences

Essential/Strictly Necessary Cookies

Required

These cookies are essential in order to enable you to move around the website and use its features, such as accessing secure areas of the website.

Analytical/ Performance Cookies

These are analytics cookies that allow us to collect information about how visitors use a website, for instance which pages visitors go to most often, and if they get error messages from web pages. This helps us to improve the way the website works and allows us to test different ideas on the site.

Functional/ Preference Cookies

These cookies allow our website to properly function and in particular will allow you to use its more personal features.

Targeting/ Advertising Cookies

These cookies are used by third parties to build a profile of your interests and show you relevant adverts on other sites. You should check the relevant third party website for more information and how to opt out, as described below.

Blog

Resources

Documentation

Tom Nats

Director of Customer Solutions

Starburst

How to Optimize Trino Performance with Different Storage Layers

Last Updated: March 25, 2024

Customers sometimes ask how they can make Trino even faster. One of my first questions back to them is “what are you using for your storage layer?” I often get a puzzled look back, then I explain that Trino is like having a very fast race car but if you don’t give it a nice, fast road to travel on, then you are severely limiting it. Let’s go into further details.

Since Trino is an in-memory query engine, it can only process data as fast as the storage can serve up that data. Sometimes this is under your control and other times it’s not. The diagram above illustrates the different types of storage that your data may reside in which can be queried by Trino.

When we talk about storage performance, it usually falls into two categories which are latency and throughput. Latency is the time it takes to get the data to the end user and throughput is the amount of data the query is reading over time. Think of throughput as the size of hose so the larger the hose, the more water can get through. Latency would be how fast the water can get from one side of the hose to the other.

When Trino users submit queries, they want them to return as fast as possible no matter how much data is involved. This of course depends upon numerous factors which the underlying storage plays a crucial role in determining the time it takes to execute queries.

The diagram below illustrates the different throughput storage layers. Distributed storage such as AWS S3, Azure ADLS and HDFS (Hadoop Filesystem) generally provide the highest throughput for queries which can span terabytes of data.

throughput storage layers

For latency, of course memory (or RAM) is the lowest followed by NoSQL/Document type systems with distributed storage being the highest.

latency storage layers

Depending on the use case and performance requirements, there are options to get data to the appropriate storage to meet these demands. The following are a few use cases that our customers have encountered and how they solved them.

usecase1

Use Case #1 – Low latency, real-time queries for custom application

This customer created an analytical application which mostly executed queries over many terabytes of data. However, there were a few cases where a user would type in a search criteria that would query a very small dataset. An example would be a list of states or regions. The service level agreement (SLA) was < 1 second for these types of queries. Although the data was on S3 storage, the response time wasn’t meeting the SLA so they moved these small data sets to indexed PostgreSQL database tables.

The result was the ability to provide response times for both types of queries ensuring their end-user experience was within their SLAs. This also satisfied the ability to keep their architecture simple by continuing to use Starburst Trino as the main entry point for the application.

usecase2

Use Case #2 – Frequent querying of low-performing database system

One of our customers had just adopted Starburst Enterprise in their environment and wanted to add a legacy Oracle database to their list of data sources available to their users. The only issue was they were afraid this extra usage would degrade performance too much on this system. Since they had a very large Hadoop system, it made more sense to copy the relevant tables using simple “create table as” statements using Trino from that Oracle system to Hadoop a few times a day. Users would query the data in the Hadoop system which can take advantage of Trino’s MPP parallel engine and return results much quicker as well as reduce the extra load on the Oracle system.

The result was providing a much needed data source to users without hurting the performance of the Oracle source system. This is a great example of using the power of Trino and distributed storage to offload access from a slower system by easily replicating data.

usecase3

Use Case #3 – Short term storage during ETL process

In our third use case, a company has a morning process where they create many rollup/aggregation tables and then execute numerous reports using Power BI. The data for these reports come from numerous systems and there are many temporary tables used in this process.

The solution is to build these temporary tables in the fastest storage available. In Trino, this is the memory connector or any distributed storage that is available. (note the diagram above) If the data is small enough, the memory connector can be used to provide lightning fast, temporary table storage (note – the memory connector isn’t meant for long term storage). If the tables are larger, then distributed storage can be used to get the fastest performance while writing and reading this data.

Based on your needs, there are plenty of knobs available in Trino to improve performance. When our customers implement Starburst Enterprise as their “query fabric”, it is important for them to look at their environments in a holistic manner. In order to meet strict SLAs and provide the ultimate user experience for their users, it’s important to consider the many different methods to meet these expectations using a variety of different storage layers available to Trino.

A single point of access to all your data

Stay in the know - Sign up for our newsletter!

Resources

Quick Links

Get In Touch

© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC

Start Free with
Starburst Galaxy

Up to $500 in usage credits included

Query your data lake fast with Starburst's best-in-class MPP SQL query engine
Get up and running in less than 5 minutes
Easily deploy clusters in AWS, Azure and Google Cloud

For more deployment options:

Download Starburst Enterprise

Essential/Strictly Necessary Cookies

Analytical/ Performance Cookies

Functional/ Preference Cookies

Targeting/ Advertising Cookies

By Use Cases

By Industry

Documentation

Connect

Education

Blog

Resources

Pages

Documentation

How to Optimize Trino Performance with Different Storage Layers

Last Updated: March 25, 2024

Related posts

Use Case #1 – Low latency, real-time queries for custom application

Use Case #2 – Frequent querying of low-performing database system

Use Case #3 – Short term storage during ETL process

A single point of access to all your data

Stay in the know - Sign up for our newsletter!

Resources

Quick Links

Get In Touch

Start Free with
Starburst Galaxy

For more deployment options:

Essential/Strictly Necessary Cookies

Analytical/ Performance Cookies

Functional/ Preference Cookies

Targeting/ Advertising Cookies

By Use Cases

By Industry

Documentation

Connect

Education

Starburst Galaxy

Starburst Enterprise

By Use Cases

By Industry

Documentation

Connect

Education

Filter:

Blog

Resources

Pages

Documentation

How to Optimize Trino Performance with Different Storage Layers

Last Updated: March 25, 2024

Related posts

Starburst Enterprise LTS Backport Releases

Introducing New Data Observability Features in Starburst Galaxy – Now in Public Preview

Automating the “Icehouse” – Fully-managed Open Lakehouse Platform on Starburst Galaxy

What’s New in Starburst Galaxy – April 2024

Use Case #1 – Low latency, real-time queries for custom application

Use Case #2 – Frequent querying of low-performing database system

Use Case #3 – Short term storage during ETL process

A single point of access to all your data

Stay in the know - Sign up for our newsletter!

Resources

Quick Links

Get In Touch

Start Free withStarburst Galaxy

For more deployment options:

Start Free with
Starburst Galaxy