
Starburst is built on Trino for one key reason—it delivers unparalleled query performance. This focus has been there since the beginning of the Trino project, and it continues to drive the project forward. The latest chapter in Trino’s performance story is the Trino spooling protocol, which enables parallelizing the retrieval of large result sets. This shift pushes Trino performance to a whole new level. In a real-world example, that enhancement dramatically improved performance with Starburst’s ODBC driver.
But Trino isn’t just about performance. It’s also about choice, particularly choice around access to data in different data sources.
Continuing the theme of optionality and choice
At Starburst, we call this choice optionality, and it’s one of the founding principles of our mission. In the context of the spooling protocol, choice is also a game-changer. Optionality around data access and usage extends to the configuration of the spooling protocol itself: multiple options exist to balance simplicity and compliance across networking, security, performance, and scalability concerns.
Let’s jump in and see how this works.
Spooling protocol overview
First, let’s review the spooling protocol to make sure we’re all on the same page. The spooling protocol is a new, enhanced method for extracting data from a Starburst cluster. Rather than streaming results over traditional ODBC/JDBC connections, it persists query output and allows client tools to download those data segments directly. This approach significantly improves the performance of large result set extractions and decouples query completion from client-side result retrieval. It also reduces contention on the Trino coordinator, since large queries no longer need to be streamed through it.
With the spooling protocol, Trino workers write encrypted result segments directly to object storage and send references to those objects to the coordinator. The coordinator then returns those references to clients as pre-signed S3 URIs. Clients can then retrieve the segments either directly from S3 or via Starburst cluster nodes acting as a network proxy, depending on the selected retrieval mode.
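To make the write path concrete, the following is a minimal sketch of a file system spooling configuration, assuming an S3 bucket dedicated to spooling. The bucket name is a placeholder, and exact property names can vary by Trino/Starburst version, so verify them against the configuration documentation.

```properties
# etc/spooling-manager.properties (sketch): where workers write encrypted segments
spooling-manager.name=filesystem
fs.s3.enabled=true
# Placeholder bucket; use a location dedicated to this cluster's spooling
fs.location=s3://example-spooling-bucket/segments/
```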
Want more architectural information? Check out this visualization of a typical query from start to finish, and compare it with the traditional direct protocol approach.
The spooling protocol, built by Starburst engineers, is an example of our commitment to the open-source project that forms the foundation of our engine. All Trino users can take advantage of it, and Starburst engineers and consultants have deep expertise in its configuration and use.
Client support
Once enabled at the cluster level, the spooling protocol becomes the default when the connecting client supports it. If a client does not support spooling, the cluster automatically falls back to the traditional direct protocol. This negotiation occurs transparently at the start of each user session.
The following are the supported clients for the spooling protocol.
To reiterate, older clients can still issue requests to clusters that have the spooling protocol enabled; the cluster simply uses the direct protocol for those sessions. Additionally, the spooling protocol can be used in multiple ways by Starburst users: Starburst Galaxy includes it by default, while Starburst Enterprise administrators can follow the configuration documentation to enable it.
The full set of configuration properties is available here.
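For Starburst Enterprise and Trino deployments, turning spooling on is a small, cluster-wide change. The snippet below is a hedged sketch: protocol.spooling.enabled and the shared secret key requirement follow the current Trino documentation, but confirm the exact property names and key format for your version.

```properties
# etc/config.properties on the coordinator and all workers (sketch)
protocol.spooling.enabled=true
# Base64-encoded 256-bit key securing spooled segment metadata (placeholder value;
# verify the exact property name for your version)
protocol.spooling.shared-secret-key=<base64-encoded-256-bit-key>
```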
Client retrieval mode options
The focus of this article is the protocol.spooling.retrieval-mode property options. These options govern how clients fetch each segment and which component bears the input/output (I/O) burden. The following descriptions can help you pick the mode that best fits your network security posture, performance targets, and operational risk.
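All four modes are selected with a single cluster-level property. The following is a minimal sketch; the value shown is the default, and the exact file where the property lives may vary by Starburst/Trino version, so check the configuration documentation.

```properties
# etc/config.properties: selects how clients retrieve spooled segments
# Allowed values: STORAGE | COORDINATOR_STORAGE_REDIRECT | COORDINATOR_PROXY | WORKER_PROXY
protocol.spooling.retrieval-mode=STORAGE
```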
Storage (the default)
STORAGE – The client downloads each segment directly from object storage using a pre-signed URI; one client HTTP request per segment. This is the fastest option and places the smallest load on the coordinator, but it requires client-to-storage network access and appropriate TLS trust for the storage endpoint. Because it is also the simplest and most straightforward option, it is the default setting.

Coordinator storage redirect
COORDINATOR_STORAGE_REDIRECT – The client first contacts the coordinator to obtain a pre-signed URI, then downloads from storage; two client requests per segment. Slightly slower than STORAGE due to the extra hop, but maintains the same data path (client→storage). Useful when you need very short-lived, just-in-time pre-signed URLs minted by the coordinator.

This option differs from STORAGE in that each segment reference points to the coordinator, which then generates a pre-signed URI for the underlying storage object. The coordinator-issued URIs have shorter TTLs than the underlying storage objects, which can be beneficial in security-sensitive environments.
It is a good fit when you want to ensure the pre-signed URI expires well before the object itself does.
Coordinator proxy
COORDINATOR_PROXY – The client downloads segments through the coordinator, which fetches from storage and streams to the client. Simplifies client trust and avoids distributing storage CA bundles, but adds I/O load on the coordinator. This option can bottleneck at scale due to the network throughput available to the coordinator and HTTP concurrency limits on the node.

As shown in the sequence diagram above, the coordinator acts as a Layer 7 (application) proxy, creating an additional I/O burden on the coordinator. The coordinator simply relays bytes; it performs no CPU-intensive work, since the segments are already encoded in their final format when the workers write them to storage.
This approach can be suitable for small-scale use, but it becomes a bottleneck at scale.
Worker proxy
WORKER_PROXY – The client is redirected to a worker that retrieves from storage and relays data to the client. Distributes I/O across workers for better scalability, but requires client network access to the worker nodes. This option is often not feasible in Kubernetes deployments where workers are not exposed.

Much like COORDINATOR_PROXY, this option provides an application proxy to object storage when clients cannot access it directly. Because the load is spread across many workers rather than a single coordinator, it scales better, and adding workers to the cluster further increases retrieval throughput.
For these requests, the worker acts as a simple proxy: it does not process the segment, it just reads bytes from storage and returns them to the client without any transformation.
Ideal for higher-scale use cases than COORDINATOR_PROXY can satisfy, but requires network access to the worker nodes.
Topology-driven recommendations
The option descriptions and general recommendations above might be enough to help you make the best selection for your cluster configurations.
As we have been using the spooling protocol with Starburst customers, the following recommendations have emerged that may help in your decision-making process.
Clients can reach object storage directly
This covers the bulk of typical cloud deployments in use today.
- Prefer STORAGE for the lowest coordinator/worker load and the highest throughput.
- Consider COORDINATOR_STORAGE_REDIRECT if you want shorter‑TTL URLs minted by the coordinator (at the cost of an extra hop per segment).
- Keep fs.segment.direct.ttl aligned with expected client download windows; increase it if very large results are pulled over slower links (see the sketch after this list).
- Ensure fs.location is unique to this cluster’s spooling use and not shared with catalogs or other clusters.
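As a concrete illustration of the TTL and location recommendations above, here is a hedged sketch; the values are placeholders, and the duration format and property availability should be checked against your version’s documentation.

```properties
# etc/spooling-manager.properties (sketch of the recommendations above)
# Pre-signed URI lifetime: align with realistic client download windows;
# raise it when very large results are pulled over slower links (placeholder value)
fs.segment.direct.ttl=1h
# Location dedicated to this cluster's spooling, not shared with catalogs or other clusters
fs.location=s3://example-spooling-bucket/cluster-a/
```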
Clients cannot reach storage
Some environments are purposefully restrictive, such as locked-down security zones or on-prem S3-compatible storage that requires distributing CA certificates to clients.
- Use COORDINATOR_PROXY when clients cannot directly reach storage or when you want to avoid distributing storage CA bundles to all clients. Monitor the coordinator’s network bandwidth and HTTP request limits, and scale them up if needed (see the sketch after this list).
- Use WORKER_PROXY if clients can reach workers; this distributes network load and scales better than COORDINATOR_PROXY. It does require network exposure of the workers, which is typically restricted when Starburst is deployed on Kubernetes.
- For on-premises S3 deployments, where the storage CA bundle differs from the Starburst CA bundle, or differs by environment and cannot be merged into a single truststore on the client side, the proxy modes let you avoid breaking changes for users. This is a solid reason to choose COORDINATOR_PROXY or WORKER_PROXY over STORAGE.
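When clients cannot reach storage, the change is the same single property; the sketch below assumes the coordinator can absorb the proxy load (switch to WORKER_PROXY if clients can reach the workers).

```properties
# etc/config.properties: route segment downloads through the cluster
# Use COORDINATOR_PROXY for simplicity; WORKER_PROXY if workers are reachable by clients
protocol.spooling.retrieval-mode=COORDINATOR_PROXY
```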
Kubernetes environment considerations
Often, deploying with Kubernetes (k8s) is the best solution, and the following considerations apply.
- Default to STORAGE or COORDINATOR_STORAGE_REDIRECT when possible.
- WORKER_PROXY often isn’t feasible because workers are not routable from client networks, so use COORDINATOR_PROXY when you need application-layer proxying.
Options, because one size does NOT fit all
The spooling protocol’s ability to parallelize the retrieval of large result sets is a game-changer. With this approach, clients inherently play a more active role in pulling their data. Given the level of optionality in Starburst deployments, multiple retrieval modes are available.
This article presented details on these options, along with real-world recommendations for their use. We would love to hear your feedback and questions on the Starburst Forum – and don’t hesitate to share your success stories in the Showcase category.



