5 considerations when configuring a cluster in Starburst Galaxy

September 21, 2023

Bo Myers
Senior Product Manager
Starburst
Guy Mast
Product Manager
Starburst

Starburst Galaxy offers convenience, power, flexibility, and scale. This software as a service (SaaS) cloud offering allows users to get started querying a variety of workloads in minutes. It connects to over 50 data sources, providing a single, universal point of access for data analytics, data applications, and AI workloads, all while utilizing an open data lakehouse powered by the Icehouse architecture of Trino and Iceberg. Recently, we’ve introduced numerous new features in Starburst Galaxy that enable you to tailor your cluster’s performance to the specific needs of your workload. In this article, we will walk you through the five things to consider when setting up your cluster for the first time:

Cluster execution modes
Cluster sizing
Autoscaling
Auto suspend
Cluster scheduling

Choosing a cluster execution mode

When configuring your cluster, you choose one of three execution modes: Standard, Fault-tolerant, or Accelerated. One of the benefits of Galaxy is that each cluster can be tailored to a specific workload through its size, access controls, and execution mode.

The three execution modes in Starburst Galaxy are:

Accelerated (Warp Speed): Warp Speed is the technology behind our accelerated cluster mode. This is the execution mode we recommend for interactive analytics, such as queries behind user-facing applications or dashboards. Warp Speed uses smart indexing and caching to increase the performance of these workloads automatically.
Fault-tolerant: This mode is our enhanced version of Trino’s fault-tolerant execution (FTE). At its core, FTE enables the execution of complex, long-running, and memory-intensive queries where reliability is critical. This mode is ideal for write operations and transformation (ELT/ETL) workloads.
Standard: This is the most basic execution mode, and it is the mode enabled for free clusters. For production use cases, we recommend defaulting to accelerated clusters (Warp Speed) unless you require autoscaling or FTE, or are operating in a region where Warp Speed is not yet available.
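
The guidance above can be condensed into a small decision helper. This is only an illustrative sketch in Python: the function name, its parameters, and the returned mode strings are hypothetical and are not part of any Starburst Galaxy API. It simply encodes the recommendations in this section.

```python
def choose_execution_mode(
    *,
    long_running_or_etl: bool,
    needs_autoscaling: bool,
    warp_speed_available: bool,
) -> str:
    """Suggest an execution mode (hypothetical helper, not a Galaxy API)."""
    # Long-running, memory-intensive, or write-heavy (ETL/ELT) work favors FTE.
    if long_running_or_etl:
        return "fault-tolerant"
    # Accelerated is the suggested default unless autoscaling is required
    # or Warp Speed is not offered in the cluster's region.
    if needs_autoscaling or not warp_speed_available:
        return "standard"
    return "accelerated"


if __name__ == "__main__":
    # An interactive dashboard workload in a region with Warp Speed.
    print(choose_execution_mode(
        long_running_or_etl=False,
        needs_autoscaling=False,
        warp_speed_available=True,
    ))
```

Real workloads are often mixed, so in practice you may end up with several clusters, one per workload type, rather than a single choice.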

Detailed Warp Speed considerations

Warp Speed is a good fit for interactive workloads that focus on querying the data lake, especially when querying 500M rows or more per table, or for data lake queries that take a few seconds or longer in incumbent architectures.

While, on average, you can expect a 4x improvement in query latency and a 7x improvement in CPU time, the improvement you actually see depends on how selective your queries are. The more data a query filters out, the more Warp Speed accelerates it. Examples of data sets that are well suited to Warp Speed include the following:

Fraud and anomaly detection
IoT and telemetry data
Geospatial analytics
Clickstream data (e.g. Customer 360 analytics)
Log analysis (e.g. cybersecurity logs)

On the other hand, queries that perform full table scans with no filters applied will not see the performance benefits of Warp Speed. Warp Speed will not negatively impact the performance of those queries, however.

Note: At the time of this blog, Warp Speed is generally available on AWS and in private preview on GCP, with Azure regions coming soon.

Detailed fault-tolerant considerations

Fault Tolerant Execution (FTE) mode means that a cluster will retry queries or parts of a query in the event of a failure without having to restart the entire query from the beginning. Intermediate exchange data is spooled and can be reused by another worker in the same cluster.

This is especially useful when queries require more memory than is currently available in the cluster. With FTE, those queries are still able to succeed. Multiple queries can share resources fairly and make steady progress.
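
To make the retry idea concrete, here is a toy Python simulation. It is not how Trino or Galaxy implements FTE; the names `run_with_task_retries` and `make_flaky`, the in-memory `spool` dictionary, and the retry limit are all assumptions made for illustration. The point it shows is that a failed step is retried on its own, reusing the stored output of earlier steps instead of recomputing them.

```python
def make_flaky(fn, failures):
    """Wrap fn so that it raises on its first `failures` calls (simulated crashes)."""
    remaining = {"n": failures}

    def wrapped(spool):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("simulated worker failure")
        return fn(spool)

    return wrapped


def run_with_task_retries(tasks, max_attempts=3):
    """Run (name, fn) tasks in order, retrying only the task that failed."""
    spool = {}      # stands in for spooled intermediate exchange data
    attempts = {}   # how many tries each task needed
    for name, fn in tasks:
        for attempt in range(1, max_attempts + 1):
            attempts[name] = attempt
            try:
                spool[name] = fn(spool)   # later tasks read earlier results
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise                 # give up after the last attempt
    return spool, attempts


if __name__ == "__main__":
    tasks = [
        ("scan", lambda spool: [1, 2, 3]),
        ("aggregate", make_flaky(lambda spool: sum(spool["scan"]), failures=1)),
    ]
    spool, attempts = run_with_task_retries(tasks)
    print(spool["aggregate"], attempts)
```

In this run the scan executes once and only the aggregation is attempted twice, which mirrors why FTE avoids restarting an entire query after a partial failure.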

In addition to the core fault-tolerant architecture found in open source Trino, Galaxy FTE clusters contain further enhancements that enable them to scale to support queries of up to 60 TB in size (whereas open source Trino is limited to ~3 TB).

It is important to note that query processing in fault-tolerant execution mode can be slightly slower than normal operation. To address this condition, FTE will adaptively determine the optimal number of partitions to ensure that only the largest and most complex queries use the maximum of 1000 partitions. This adaptive partitioning functionality helps to preserve the overall efficiency of the engine. However, it is not recommended to use fault-tolerant execution mode if the majority of your queries are short-running.

Selecting a cluster size

The size of a Starburst Galaxy cluster determines the number of server nodes, including one coordinator and many workers, used to process queries. A larger cluster, comprising more nodes, is capable of processing more complex queries, handling a greater number of concurrent users, and delivering higher performance by utilizing additional resources.

Available cluster sizes include Free, X-Small, Small, Medium, Large, X-Large, and 2X-Large. Best practice is to choose a cluster size based on your initial needs and then resize the cluster as those needs change. Memory failures or slow query processing are typical signs that it’s time to size up to a larger cluster.

Starburst cluster autoscaling

As you get comfortable with the needs of your individual workloads, you can choose to create custom cluster sizes with autoscaling by setting the minimum and maximum number of workers to scale between.

When you create a custom cluster with different minimum and maximum values, the cluster automatically scales up toward the configured maximum number of workers when the combined CPU usage of all workers exceeds 60%. Autoscaling adds one or more workers to bring the combined CPU usage of all workers below 60%.

The autoscaling process takes approximately four minutes to make the first adjustment. If the CPU usage continues to climb and exceeds 60%, the process repeats until the maximum number of workers is reached.

Conversely, clusters automatically scale down toward the minimum number of workers when the combined CPU usage of all workers drops below 60%. Autoscaling removes one or more workers until CPU usage approaches 60%.
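
The threshold behavior described above can be sketched with a few lines of Python. This is a simplified model, not Galaxy's actual autoscaling algorithm: the function name `next_worker_count` is hypothetical, and the model assumes that total load stays constant and spreads evenly across workers, so that adding or removing workers changes the average CPU usage proportionally.

```python
import math


def next_worker_count(current, cpu_pct, min_workers, max_workers, threshold=60.0):
    """Estimate the worker count after one adjustment (illustrative model only).

    current:   workers running now
    cpu_pct:   combined (average) CPU usage across workers, 0-100
    threshold: the 60% trigger described in the text
    """
    if cpu_pct == threshold:
        return current  # neither above nor below the trigger
    # Total load in "worker-percent" units, assumed constant during the change.
    total_load = current * cpu_pct
    # Smallest worker count that puts average usage strictly below the threshold.
    target = math.floor(total_load / threshold) + 1
    # Never leave the configured minimum/maximum range.
    return max(min_workers, min(target, max_workers))


if __name__ == "__main__":
    print(next_worker_count(4, 90.0, 2, 10))  # scale up under heavy load
    print(next_worker_count(8, 20.0, 2, 10))  # scale down when mostly idle
```

Under these assumptions, four workers at 90% CPU would grow to seven (about 51% each), and eight workers at 20% CPU would shrink to three (about 53% each). The real service adjusts in steps over several minutes, so treat the numbers as intuition rather than a prediction.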

Note: At the time of this blog, autoscaling is not supported with accelerated clusters.

Auto suspend (or idle shutdown)

In Starburst Galaxy, you can easily configure your cluster to suspend after a specified amount of time has passed while idle. A running cluster is classified as idle when no queries are submitted and all query processing has been completed.

A suspended cluster comprises a small configuration set and a mechanism for listening to incoming user requests. It does not include any actively running server nodes, and no costs are incurred. Therefore, this is an ideal feature for optimizing your cluster costs.

Available idle shutdown times include 1 minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour. You can also configure your cluster to never suspend if you don’t want to wait for your cluster to warm up. The average warm-up time is ~5 minutes, so it is essential to consider the cost and performance trade-offs of the different idle shutdown times.
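
One way to reason about this trade-off is to replay a typical day of query arrivals against different idle shutdown settings. The Python sketch below does that under simplifying assumptions: the function name `idle_tradeoff` is hypothetical, queries are treated as instantaneous, arrival times are sorted minutes, and a gap exactly equal to the timeout is treated as not yet suspended. It counts how many queries would hit a suspended cluster (and pay the warm-up time) and how many minutes the cluster would sit running but idle.

```python
def idle_tradeoff(query_times_min, idle_timeout_min):
    """Return (cold_starts, idle_running_minutes) for a sorted list of arrival times."""
    if not query_times_min:
        return 0, 0
    cold_starts = 1   # the first query finds the cluster suspended
    idle_running = 0  # minutes spent running with no work to do
    for prev, cur in zip(query_times_min, query_times_min[1:]):
        gap = cur - prev
        if gap > idle_timeout_min:
            # The cluster suspended during the gap; the next query warms it up.
            cold_starts += 1
            idle_running += idle_timeout_min
        else:
            # The cluster stayed up and idled for the whole gap.
            idle_running += gap
    # After the last query the cluster idles until the timeout expires.
    idle_running += idle_timeout_min
    return cold_starts, idle_running


if __name__ == "__main__":
    arrivals = [0, 3, 20, 22, 90]        # minutes since the first query
    print(idle_tradeoff(arrivals, 5))    # (3, 20)
    print(idle_tradeoff(arrivals, 30))   # (2, 82)
```

With this made-up arrival pattern, a 5-minute timeout leads to three warm-ups and 20 idle minutes, while a 30-minute timeout leads to two warm-ups and 82 idle minutes. Substituting your own query history gives a rough feel for which setting suits a given cluster.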

Starburst cluster scheduling

Finally, Starburst Galaxy enables administrators to prevent an idle cluster from auto-suspending by creating a cluster schedule. Your cluster must be in a running or suspended status to be affected by scheduling. A cluster that is stopped must be started manually and is therefore not impacted by any defined scheduling.

Cluster scheduling timeframes override idle shutdown times to ensure your cluster remains operational. Multiple days of the week and multiple time intervals per day may be configured to keep the cluster always on, ensuring you avoid the warm-up time.

It is also important to note that once a cluster schedule has been created, it can be applied to one or more clusters. This is especially useful for customers who want to keep their clusters running during business hours while saving costs overnight.
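
As a rough mental model, a cluster schedule is a set of time intervals per weekday during which idle shutdown is overridden. The Python sketch below represents one that way and checks whether a given moment falls inside it. The data structure, the `BUSINESS_HOURS` example, and the function name `is_kept_running` are assumptions for illustration; they do not reflect how Galaxy stores or evaluates schedules, and time zones are ignored.

```python
from datetime import datetime, time

# Hypothetical schedule: weekday number (Monday=0) -> list of (start, end) intervals.
BUSINESS_HOURS = {
    day: [(time(8, 0), time(18, 0))]
    for day in range(5)  # Monday through Friday
}


def is_kept_running(schedule, moment):
    """True if the schedule overrides idle shutdown at the given moment."""
    intervals = schedule.get(moment.weekday(), [])
    # Intervals are treated as half-open: start inclusive, end exclusive.
    return any(start <= moment.time() < end for start, end in intervals)


if __name__ == "__main__":
    print(is_kept_running(BUSINESS_HOURS, datetime(2023, 9, 21, 10, 0)))  # Thursday morning
    print(is_kept_running(BUSINESS_HOURS, datetime(2023, 9, 21, 22, 0)))  # Thursday night
    print(is_kept_running(BUSINESS_HOURS, datetime(2023, 9, 23, 10, 0)))  # Saturday morning
```

Outside the scheduled intervals, the cluster's idle shutdown setting applies again, which is where the overnight cost savings come from.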
