A query engine takes a request for data, translates it from human to machine language, and then fulfills the request by retrieving specific data. Query engines are the interface through which downstream users interact with databases, and they are essential to the work of data scientists, BI analysts, and various decision-makers.
[CALLOUT] A query engine in action:
A user wants to scan data from a table in some database and filter out the results per their requirements. They tell the query engine what they want in simple, semantic terms. It then builds a plan to perform the scan, determines where and how to filter the results, and devises the order of operations to execute the query.
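The scan, filter, and ordering steps described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular engine is implemented: the `ORDERS` table, the function names, and the predicate are all hypothetical.

```python
# Hypothetical in-memory "table"; a real engine would read from a database.
ORDERS = [
    {"id": 1, "region": "EU", "total": 120.0},
    {"id": 2, "region": "US", "total": 80.0},
    {"id": 3, "region": "EU", "total": 45.0},
]


def scan(table):
    """Read every row of the table, one at a time."""
    for row in table:
        yield row


def filter_rows(rows, predicate):
    """Keep only the rows that satisfy the user's requirements."""
    return (row for row in rows if predicate(row))


def project(rows, columns):
    """Return only the requested columns from each row."""
    return ({col: row[col] for col in columns} for row in rows)


def run_query(table, predicate, columns):
    # The "plan" is the order of operations: scan, then filter, then project.
    plan = project(filter_rows(scan(table), predicate), columns)
    return list(plan)


result = run_query(
    ORDERS,
    predicate=lambda r: r["region"] == "EU" and r["total"] > 100,
    columns=["id", "total"],
)
print(result)
```

Filtering before projecting is the kind of ordering decision the engine makes on the user's behalf. The user only states what they want.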
Similar to a Google search, a query engine turns a complicated process of information retrieval into a simple request. However, unlike Google, which pulls information from a pared-down index, query engines go down to the databases themselves. That makes query engines vital tools for running analytics and uncovering insights.
More than just a data tool or yet another IT initiative, query engines are among today’s most important business tools. The technical specifications (how the query engine understands queries and interacts with data) are important to understand, but the business impact is ultimately what matters. That impact depends on the features the query engine provides.
As the primary connection between end users and the data their decision-making processes require, the choice of a query engine matters. There are two critical features for modern query engines: a distributed architecture and SQL.
A distributed query engine uses a distributed architecture similar to that of massively parallel processing (MPP) databases. Trino is an example of a distributed query engine.
When a query arrives, a coordinator (or orchestrator) develops a plan to fulfill it, and distributed “workers” execute that plan. This approach splits data across multiple nodes and processes the pieces in parallel rather than one after another. The result is faster and more efficient query execution.
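As a rough sketch of that coordinator and worker split, the Python below divides rows into chunks, has each worker compute a partial result, and lets the coordinator merge them. The names `split`, `worker_task`, and `coordinator` are hypothetical, and threads on one machine stand in for separate nodes. The sketch shows the shape of the approach, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(rows, n_workers):
    """Partition rows into n_workers chunks (round-robin)."""
    return [rows[i::n_workers] for i in range(n_workers)]


def worker_task(chunk):
    """Each worker filters its own chunk and returns a partial aggregate."""
    matching = [r for r in chunk if r["region"] == "EU"]
    return sum(r["total"] for r in matching), len(matching)


def coordinator(rows, n_workers=3):
    """Plan the work, hand chunks to workers, then merge the partial results."""
    chunks = split(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_task, chunks))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


# Hypothetical sample data: even-numbered rows belong to the "EU" region.
rows = [{"region": "EU" if i % 2 == 0 else "US", "total": float(i)} for i in range(10)]
total, count = coordinator(rows)
print(total, count)
```

Because each worker returns only a small partial aggregate, the coordinator merges a handful of numbers instead of reprocessing every row.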
Most traditional databases and data warehouses use SQL query engines that follow a SQL standard for the query language. Not all databases implement the same dialect, though. Some SQL dialects are “looser” than others, resulting in unpredictable variations that make it harder to query the data source correctly or get back the desired data.
A SQL query engine like Trino acts as a translation layer, translating SQL queries into an execution plan and distributing this plan among workers. A metadata store holds the definitions of tables, functions, and similar objects, which the coordinator uses to interpret the SQL.
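The idea of turning SQL into a plan with the help of stored metadata can be seen on a small scale with SQLite, which ships with Python. This is an analogy rather than Trino itself: SQLite is a single-process engine, `sqlite_master` plays the role of the metadata store here, and the `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")

# The engine's catalog records which tables exist; the planner consults it
# to resolve the names that appear in a SQL statement.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Ask the engine for the plan it would use, without running the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, total FROM orders WHERE region = 'EU'"
).fetchall()
plan_details = [row[3] for row in plan]  # the last column is a readable step

print(catalog)
print(plan_details)
```

The exact wording of the plan steps varies between SQLite versions, so treat the printed text as descriptive output rather than a stable format.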
Query engines bring a big data strategy to life by serving as the final piece of the data pipeline and facilitating access to abundant data across sources.
Query engines enable teams to:
A good query engine reduces the need to carefully organize, secure, and architect data, which becomes especially important as data volume and velocity both grow. A good query engine also allows you to query the data where it lives, reducing the number of data pipelines and improving business agility.
Instead of asking someone else to provide data, users can run their own queries, including ad-hoc requests to explore whenever they may want. Meanwhile, the data team has more time and resources to focus on improving data engineering or building data products.
Query engines improve decision-making by connecting people with more data in less time and without the need for technical expertise. They further improve decision-making by translating data queries into a context that’s familiar and relevant for business purposes.
The right query engine is a solution to many problems facing data creators and data consumers alike. The first challenge, therefore, is trying to orchestrate a data strategy without the aid of a query engine. When that becomes untenable, the challenge shifts to evaluating, selecting, and implementing the right one.
Any query engine will be better than nothing. That said, query engines that limit access to data sources, duplicate or complicate data, fail at federated queries, or increase time to insight simply replace old challenges with new ones.
Trino, an open-source query engine first developed at Facebook, solves a common problem with data retrieval: having to go to the storage layer to query a database where it’s stored. Accessing data this way means the query engine has to move large data volumes through the pipeline, which is slow and error-prone.
Trino provides an elegant solution by removing the storage layer from the equation. The query engine sits on top of other query engines, intelligently distributing requests between them before integrating the results to reflect the original query. This query federation unlocks access to new mission-critical insights for businesses.
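A minimal sketch of query federation, assuming two independent sources that each answer their own part of a request: here two in-memory SQLite databases stand in for separate systems, and a hypothetical `federated_totals` function plays the federating engine. The table names and sample data are invented for illustration, and this is not Trino's API.

```python
import sqlite3

# Source 1: a hypothetical CRM system that owns customer records.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Source 2: a hypothetical sales system that owns order records.
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
sales.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (1, 30.0), (2, 80.0)]
)


def federated_totals(crm_conn, sales_conn):
    """Send each source its own sub-query, then integrate the results."""
    # Sub-query 1: the CRM resolves customer ids to names.
    names = dict(crm_conn.execute("SELECT id, name FROM customers"))
    # Sub-query 2: the sales system aggregates orders before returning them.
    totals = sales_conn.execute(
        "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"
    )
    # Integration step: join the two partial answers on the customer id.
    return {names[customer_id]: total for customer_id, total in totals}


result = federated_totals(crm, sales)
print(result)
```

Each source does the work it is best placed to do (the sales system aggregates its own orders), so only small intermediate results travel to the federating layer.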
Trino has the fastest speed in its class plus a community built around it to foster continual improvement — and Starburst is the commercial distribution of Trino. See how Starburst can simplify your information pipeline by scheduling a demo today.