Introducing Enhancements to Starburst Galaxy’s Autoscaler

Starburst Galaxy's autoscaler now considers several new metrics for proactive and cost-effective autoscaling

May 14, 2024

Bo Myers

Senior Product Manager

Starburst

Tyler Shapiro

Product Marketing Manager

Starburst

Since its launch, Starburst Galaxy has automatically scaled compute resources for customers to keep performance steady across varying workloads.

Today we launched several enhancements for cluster autoscaling in Starburst Galaxy, now available in private preview. Key metrics now factored into the autoscaling process include:

CPU load
Estimated runtime
Queue length
Insights from completed queries

These improvements aim to handle a wider range of workloads and provision resources more efficiently, resulting in faster query execution without increasing costs. In this post, we take a closer look at the enhancements and review the results of a real-world customer test comparing the legacy and enhanced autoscalers.

A proactive approach to autoscaling

Previously, if CPU utilization reached or exceeded 60%, the cluster would scale up to its maximum number of allowed workers. However, because automatic resource scaling was triggered solely based on CPU utilization, workloads constrained by other factors didn’t receive additional resources as quickly as needed, if at all.
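As a rough illustration of the legacy rule described above, here is a minimal Python sketch. The function and parameter names are hypothetical and are not part of any Starburst Galaxy API; only the 60% threshold and the jump to the maximum worker count come from the description above.

```python
# Hypothetical model of the legacy rule; names are illustrative, not a Galaxy API.
LEGACY_CPU_THRESHOLD = 0.60  # from the post: scale up at or above 60% CPU utilization


def legacy_target_workers(cpu_utilization: float, current_workers: int, max_workers: int) -> int:
    """Return the worker count the legacy rule would aim for.

    CPU utilization is the only signal: at or above the threshold the
    cluster goes to its maximum size, otherwise nothing changes.
    """
    if cpu_utilization >= LEGACY_CPU_THRESHOLD:
        return max_workers
    return current_workers


# A workload constrained by something other than CPU (here, 40% CPU) gets no extra workers.
print(legacy_target_workers(0.40, current_workers=2, max_workers=10))  # 2
print(legacy_target_workers(0.60, current_workers=2, max_workers=10))  # 10
```

The first call shows the gap the enhanced autoscaler targets: with CPU as the only trigger, a cluster that is busy but below 60% CPU stays at its current size.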

The image below shows the new proactive autoscaling behavior in Galaxy. In the top graph, CPU usage peaks at around 40%. The bottom graph shows that workers are added at this peak, even though CPU consumption never reaches the legacy 60% threshold.

Smarter resource allocation

When a workload needs additional capacity but doesn’t receive it, either because the bottleneck is a resource other than CPU or because the additional resources are slow to activate, customers encounter slow query response times or failures. A common workaround is to overcommit compute resources, which increases costs.

With the improved autoscaler, Galaxy estimates computation time based on a broader set of metrics, which lets it serve a wider range of workloads. The autoscaling decision is also made earlier and more quickly: typically within two minutes, compared to at least four minutes previously for large queries. These enhancements are designed to deliver faster query execution and remove the need for manual resource adjustments.
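To make the idea of a multi-metric decision concrete, here is a minimal Python sketch that combines CPU load, estimated runtime, queue length, and the average runtime of completed queries. This post does not describe how Galaxy actually weighs these signals, so everything below is an assumption made for illustration: the `ClusterSnapshot` type, the `proactive_target_workers` function, the even division of work across workers, and the 120-second drain target are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterSnapshot:
    """Hypothetical inputs; field names are illustrative, not a Galaxy API."""
    cpu_load: float                 # 0.0 to 1.0
    queued_queries: int             # queue length
    est_remaining_runtime_s: float  # estimated runtime left for running queries, at current size
    avg_completed_runtime_s: float  # insight from completed queries, at current size
    current_workers: int
    min_workers: int
    max_workers: int


def proactive_target_workers(s: ClusterSnapshot,
                             target_drain_s: float = 120.0,
                             cpu_threshold: float = 0.60) -> int:
    """Pick a worker count so the estimated backlog drains within target_drain_s.

    Assumption: work divides evenly across workers, so doubling the workers
    halves the time needed to clear the backlog.
    """
    # CPU remains a signal: a saturated cluster still goes straight to its maximum.
    if s.cpu_load >= cpu_threshold:
        return s.max_workers

    # Backlog in seconds at the current cluster size: running work plus queued work,
    # with queued queries assumed to take the average runtime of completed ones.
    backlog_s = s.est_remaining_runtime_s + s.queued_queries * s.avg_completed_runtime_s

    # Workers needed to clear that backlog within the target, clamped to the allowed range.
    needed = math.ceil(s.current_workers * backlog_s / target_drain_s)
    return max(s.min_workers, min(s.max_workers, needed))


snapshot = ClusterSnapshot(cpu_load=0.40, queued_queries=2,
                           est_remaining_runtime_s=120.0, avg_completed_runtime_s=60.0,
                           current_workers=4, min_workers=2, max_workers=10)
print(proactive_target_workers(snapshot))  # 8
```

In this example the cluster sits at 40% CPU, so a CPU-only rule would leave it at 4 workers, while the backlog estimate asks for 8. The sketch is only meant to show why queue length and runtime estimates can trigger scaling before CPU does.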

The results are in

In a recent customer test, we compared Starburst Galaxy’s enhanced autoscaler with the legacy autoscaler across several workload sizes. With schema sf10000, query execution time fell from 5.86 to 4.52 minutes, a reduction of roughly 23%. With the larger sf100000 schema, the improvement was more pronounced: execution time dropped from 24.35 to 13.80 minutes, a reduction of roughly 43%.
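The relative improvement implied by these timings can be checked with a few lines of Python. The timings are the ones reported above; the helper name `percent_reduction` is simply chosen for this example.

```python
def percent_reduction(before_min: float, after_min: float) -> float:
    """Percentage by which execution time dropped, relative to the legacy time."""
    return (before_min - after_min) / before_min * 100


# (legacy minutes, enhanced minutes) as reported in the customer test
results = {"sf10000": (5.86, 4.52), "sf100000": (24.35, 13.80)}

for schema, (legacy, enhanced) in results.items():
    reduction = percent_reduction(legacy, enhanced)
    speedup = legacy / enhanced
    print(f"{schema}: {reduction:.1f}% less time, {speedup:.2f}x speedup")

# sf10000: 22.9% less time, 1.30x speedup
# sf100000: 43.3% less time, 1.76x speedup
```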

Getting started with autoscaling in Galaxy is as simple as creating your free account today.
