Last updated: June 1, 2023, Published: December 2, 2020
Trino is an open source distributed SQL engine for running fast analytic queries against various data sources ranging in size from gigabytes to petabytes. Trino was designed and built from scratch for interactive analytics. It approaches the speed of commercial data warehouses while scaling to the size of very large organizations.
In fall 2012, a small team of four engineers at Facebook started working on Presto. By spring 2013, the first version was successfully rolled out within Facebook. Later that year, Facebook open sourced Presto under the Apache License. In 2018, Martin Traverso, Dain Sundstrom, and David Phillips left Facebook to pursue building an open source community full-time, under the new name PrestoSQL.
In December 2020, PrestoSQL was rebranded as Trino. Trino (formerly PrestoSQL) serves a broad array of companies in varying stages of cloud adoption that need faster access to all of their data. Companies like LinkedIn, Lyft, Netflix, GrubHub, Slack, Comcast, FINRA, Condé Nast, Nordstrom and thousands of others use Trino today.
In 2012, Martin Traverso, David Phillips, Dain Sundstrom and Eric Hwang at Facebook started the development of Presto to address performance, scalability and extensibility needs for analytics at Facebook. Before Presto existed at Facebook, all data analysis relied on Hive, which was not suitable for interactive queries at Facebook’s scale. Facebook’s Hive data warehouse was 250 petabytes in size and needed to handle hundreds of users issuing tens of thousands of queries each day. Hive started to hit its limit and did not provide the ability to query other data sources within Facebook.
Presto was designed from the ground up to run fast queries at scale. Instead of creating a new system to move data into, Presto was designed to read the data from where it is stored via its pluggable connector system. In 2013, the initial version of Presto was rolled out in production at Facebook and, by the fall of the same year, Presto was officially open sourced by Facebook. Following its success at Facebook, Presto was adopted by other large web-scale companies like Netflix, LinkedIn, Treasure Data and more.
In 2015, Teradata announced a large commitment of 20 engineers contributing to open source Presto and focused on adding enterprise features like security enhancements and ecosystem tool integration. In the same year, Amazon added Presto to its AWS Elastic MapReduce (EMR) offering. In 2016, Amazon announced Athena, in which Presto serves as a major foundational piece. Finally, in 2017, Starburst was founded to drive the success and adoption of Presto everywhere.
At the end of 2018, the original creators of Presto left Facebook and founded the Presto Software Foundation to ensure the project remains collaborative and independent. The project became known as PrestoSQL. The community of contributors and users moved to the PrestoSQL codebase with the founders and maintainers of the project. Since then, the innovation and growth of the project have accelerated even further.
Towards the end of 2020, PrestoSQL was renamed Trino to reduce confusion between PrestoSQL, the legacy PrestoDB project and other versions. The foundation was renamed the Trino Software Foundation. The last release under the PrestoSQL name was 350.
Today, the original creators of Presto, Martin Traverso, Dain Sundstrom, David Phillips and Eric Hwang, are core members of Starburst’s team and are driving the development of the open source Trino project. The project is maintained by a flourishing community of developers and contributors from many companies including Amazon, Bloomberg, Eventbrite, Gett, Google, Line, LinkedIn, Lyft, Netflix, Pinterest, Red Hat, Salesforce, Shopify, Starburst, Treasure Data, Varada, Zuora and many more.
Related reading: The differences between PrestoDB, PrestoSQL, and Trino.
Since Trino is being called a database by many members of the community, it makes sense to define what Trino is not. Do not mistake the fact that Trino understands SQL for it providing the features of a standard database. Trino is not a general-purpose relational database and is not a replacement for databases like MySQL, PostgreSQL, or Oracle. Moreover, like other databases designed and optimized for data warehousing or analytics, Trino was not designed to handle Online Transaction Processing (OLTP).
A data lake is a single store of data that can include structured data from relational databases, semi-structured data and unstructured data. It can include raw copies of data from source systems, sensor data, social data and more. The structure of the data is not typically defined when the data is captured. Data is typically dumped into a data lake without much thought about accessing it.
Trino has become the choice for querying the data lake due to its high performance at scale. Unlike other options available today, Trino’s concurrency is limited only by the size of your cluster, which can be scaled up and down as required. Trino also has connectors to the most popular data sources, allowing for data federation across multiple data sources and providing the user with a holistic view of their entire data ecosystem. These connectors allow you to query the data where it resides, shortening the data pipeline for your organization.
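To make the idea of data federation concrete, here is a minimal sketch of a single SQL statement joining tables that live in two separate stores. It does not use Trino: as an assumption for illustration, two SQLite in-memory databases attached under the hypothetical names `crm` and `lake` stand in for two Trino catalogs, and the tables, columns and the `federated_totals` function are invented for this example.

```python
import sqlite3


def federated_totals():
    """Join tables from two separate databases in one SQL statement.

    SQLite attached databases are only a stand-in for Trino catalogs;
    all names and rows here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    # Two independent stores, addressed by a prefix much like a catalog name.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS lake")

    conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE lake.orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO lake.orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )

    # One query spans both stores; no data is copied into a third system first.
    rows = conn.execute(
        """
        SELECT c.name, SUM(o.amount) AS total
        FROM crm.customers AS c
        JOIN lake.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(federated_totals())
```

In Trino the same shape of query would address each table through a connector-backed catalog rather than an attached SQLite database, but the point is the same: the join is expressed once, against the data where it already resides.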
To learn more about Trino, check out our Trino FAQ page.
Trino is a distributed system that runs on one or more machines to form a cluster. An installation will include one Trino Coordinator and any number of Trino Workers. The Trino Coordinator is the machine to which users submit their queries. The Coordinator is responsible for parsing, planning, and scheduling query execution across the Trino Workers. Adding more Trino Workers allows for more parallelism and faster query processing.
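The division of labor between the Coordinator and the Workers can be sketched as a toy model. This is not Trino's scheduler: the functions `plan_splits`, `worker_partial_sum` and `run_query` are hypothetical names, the "query" is a simple sum, and Python threads stand in for worker machines, so the sketch shows the structure of the work rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_splits(rows, num_workers):
    """Coordinator step: divide the input into one split per worker."""
    return [rows[i::num_workers] for i in range(num_workers)]


def worker_partial_sum(split):
    """Worker step: compute a partial aggregate over its own split only."""
    return sum(split)


def run_query(rows, num_workers):
    """Coordinator: plan splits, schedule them on workers, merge the partials."""
    splits = plan_splits(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_partial_sum, splits))
    # Final merge of the per-worker partial results.
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1, 101))
    print(run_query(data, num_workers=4))
```

The result is the same whatever the number of workers; what changes is how many splits are processed at the same time, which is the sense in which adding Workers adds parallelism.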
Discover how well the Trino distributed SQL engine performs on different platforms, under different workloads, and against various alternatives.
These test results can help you make informed decisions on whether Trino is a good fit for your project, and how to configure a Trino deployment to handle different size workloads.
Starburst developers ran the TPC-DS benchmark on Starburst Enterprise vs. AWS EMR Presto.
Starburst ran 12x faster at 1/7th the cost.
Starburst Enterprise vs. AWS EMR performance benchmark
Concurrency Labs compared Starburst Enterprise and Redshift, using the TPC-H benchmark.
For comparable performance, the monthly cost of Starburst Enterprise was 45% lower.
Presto vs. Redshift performance benchmark
Starburst developers ran the TPC-DS benchmark on Starburst Enterprise (with cost-based query optimization) vs. the “vanilla” Presto 195.
Starburst Enterprise ran TPC-DS queries up to 13x faster.
Related reading: Cost-based query optimization white paper