
Build an open data lake architecture with dbt Cloud and Starburst Galaxy

Bo Myers
Senior Product Manager
Starburst
Guy Mast
Product Manager
Starburst
Starburst Galaxy offers convenience, power, flexibility, and scale. This software as a service (SaaS) cloud offering allows users to get started querying a variety of workloads in minutes. It connects to over 50 data sources, providing a single, universal point of access for data analytics, data applications, and AI workloads, all while utilizing an open data lakehouse powered by the Icehouse architecture of Trino and Iceberg. Recently, we’ve introduced numerous new features in Starburst Galaxy that enable you to tailor your cluster’s performance to the specific needs of your workload. In this article, we will walk you through the five things to consider when setting up your cluster for the first time: execution mode, cluster size, autoscaling, idle shutdown, and cluster scheduling.
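As a quick way to get oriented before tuning anything, here is a minimal sketch of connecting to a Galaxy cluster with the open source trino Python client and running a query. The hostname, user, catalog, and schema below are placeholders; substitute the values from your own cluster.

```python
import trino

# Placeholder hostname and credentials: copy the real values from your
# cluster's connection info in Starburst Galaxy.
conn = trino.dbapi.connect(
    host="my-cluster.trino.galaxy.starburst.io",
    port=443,
    http_scheme="https",
    user="user@example.com/accountadmin",
    auth=trino.auth.BasicAuthentication("user@example.com/accountadmin", "<password>"),
    catalog="tpch",    # placeholder catalog
    schema="tiny",     # placeholder schema
)

cur = conn.cursor()
cur.execute("SELECT name, regionkey FROM nation LIMIT 5")
for row in cur.fetchall():
    print(row)
```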
One of the benefits of Galaxy is the ability to tailor clusters to specific workloads in terms of size, access controls, and execution mode. When configuring your cluster, you can choose between three execution modes: Standard, Fault Tolerant, and Accelerated.
Standard is the default mode for general-purpose workloads. Accelerated clusters are powered by Warp Speed, Starburst’s smart indexing and caching technology, while Fault Tolerant clusters add automatic retries for long-running batch queries, as described below.
Warp Speed is a good fit for interactive workloads that focus on querying the data lake, especially when querying 500M rows or more per table, or for data lake queries that take a few seconds or longer in incumbent architectures.
While, on average, you can expect a 4x improvement in query latency and a 7x improvement in CPU time, the improvement you actually see depends on how much data your queries filter. The more data that is filtered out by query predicates, the more Warp Speed will accelerate those queries (see the sketch after the next paragraph).
On the other hand, queries that leverage full table scans with no filters applied will not see the performance benefits of Warp Speed. However, Warp Speed will not negatively impact your query’s performance.
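To make the filtering point concrete, here is a hedged sketch of the kind of selective query that benefits most from Warp Speed, next to a full scan that does not. The orders table and its columns are hypothetical, and the connection object is reused from the sketch above.

```python
# Reuses the `conn` object from the connection sketch above.
# Table and column names are hypothetical.
cur = conn.cursor()

# Selective predicates let Warp Speed's indexes and cache skip most of the table,
# so queries like this see the largest acceleration.
cur.execute("""
    SELECT customer_id, sum(total_price) AS revenue
    FROM orders
    WHERE order_date >= DATE '2024-01-01'
      AND region = 'EMEA'
    GROUP BY customer_id
""")
print(cur.fetchmany(10))

# A full scan with no filters reads every row, so there is nothing for
# Warp Speed to skip; the query simply runs at normal speed.
cur.execute("SELECT count(*) FROM orders")
print(cur.fetchone())
```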
Note: At the time of this blog, Warp Speed is generally available on AWS, in private preview on GCP, and with Azure regions coming soon.
Fault Tolerant Execution (FTE) mode means that a cluster will retry queries or parts of a query in the event of a failure without having to restart the entire query from the beginning. Intermediate exchange data is spooled and can be reused by another worker in the same cluster.
This is especially useful when queries require more memory than is currently available in the cluster. With FTE, those queries are still able to succeed. Multiple queries can share resources fairly and make steady progress.
In addition to the core fault-tolerant architecture found in open source Trino, Galaxy FTE clusters contain further enhancements that enable them to scale to support queries of up to 60 TB in size (whereas open source Trino is limited to roughly 3 TB).
It is important to note that query processing in fault-tolerant execution mode can be slightly slower than normal operation. To limit this overhead, FTE adaptively determines the optimal number of partitions so that only the largest and most complex queries use the maximum of 1,000 partitions. This adaptive partitioning helps preserve the overall efficiency of the engine. However, fault-tolerant execution mode is not recommended if the majority of your queries are short-running.
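Because FTE trades a little per-query latency for resilience, a common pattern is to point short, interactive traffic at a Standard cluster and long-running ETL at a Fault Tolerant cluster. The sketch below illustrates that routing; the cluster hostnames, credentials, catalog, and table names are hypothetical.

```python
import trino

# Hypothetical Galaxy clusters: one Standard for interactive BI, one Fault Tolerant for ETL.
CLUSTERS = {
    "interactive": "bi-standard.trino.galaxy.starburst.io",
    "batch": "etl-fte.trino.galaxy.starburst.io",
}

def galaxy_connection(workload: str) -> trino.dbapi.Connection:
    """Open a connection to the cluster suited to the workload type."""
    return trino.dbapi.connect(
        host=CLUSTERS[workload],
        port=443,
        http_scheme="https",
        user="user@example.com/accountadmin",
        auth=trino.auth.BasicAuthentication("user@example.com/accountadmin", "<password>"),
        catalog="lake",       # hypothetical Iceberg catalog
        schema="analytics",   # hypothetical schema
    )

# Short, latency-sensitive query: Standard execution mode.
galaxy_connection("interactive").cursor().execute(
    "SELECT count(*) FROM events WHERE event_date = current_date"
)

# Long-running, memory-hungry transformation: Fault Tolerant execution mode,
# so a worker failure retries only the affected tasks instead of the whole query.
galaxy_connection("batch").cursor().execute("""
    CREATE TABLE daily_rollup AS
    SELECT event_date, count(*) AS events
    FROM events
    GROUP BY event_date
""")
```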
The size of a Starburst Galaxy cluster determines the number of server nodes, including one coordinator and many workers, used to process queries. A larger cluster, comprising more nodes, is capable of processing more complex queries, handling a greater number of concurrent users, and delivering higher performance by utilizing additional resources.
Available cluster sizes include free, X-Small, Small, Medium, Large, X-Large, and 2X-Large. Best practice is to choose a cluster size based on initial needs and then resize your cluster as needs change. Memory failures or slow query processing are typical signs that it’s time to size up to a larger cluster.
As you get comfortable with the needs of your individual workloads, you can choose to create custom cluster sizes with autoscaling by setting the minimum and maximum number of workers to scale between.
When you create a custom cluster with different minimum and maximum values, the cluster automatically scales up toward the configured maximum when the combined CPU usage of all workers exceeds 60%. Autoscaling adds one or more workers to bring combined CPU usage back below 60%.
The scale-up process takes approximately four minutes to make its first adjustment. If CPU usage continues to exceed 60%, the process repeats until the maximum number of workers is reached.
Inversely, clusters automatically scale down toward the minimum number of workers when the combined CPU usage of all workers drops below 60%. Autoscaling removes one or more workers as long as combined CPU usage remains safely under 60%, and this scale-down process takes approximately 15 minutes to make its first adjustment.
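The following sketch is a simplified model of the scaling policy just described: add workers when combined CPU usage is over 60%, remove them when it is safely under. It is illustrative only, not Starburst Galaxy’s actual implementation, and the even-load assumption is ours.

```python
# A minimal model of the autoscaling policy described above (illustrative only,
# not Starburst Galaxy's actual implementation).

def target_worker_count(current_workers: int, combined_cpu_pct: float,
                        min_workers: int, max_workers: int) -> int:
    """Return the next worker count under a 60% combined-CPU target."""
    TARGET = 60.0
    if combined_cpu_pct > TARGET and current_workers < max_workers:
        # Estimate how many workers bring combined CPU below 60%,
        # assuming the load spreads evenly across workers.
        needed = int(current_workers * combined_cpu_pct / TARGET) + 1
        return min(needed, max_workers)
    if combined_cpu_pct < TARGET and current_workers > min_workers:
        # Remove a worker only while the projected CPU stays under the 60% target.
        projected = combined_cpu_pct * current_workers / (current_workers - 1)
        if projected < TARGET:
            return current_workers - 1
    return current_workers

# Example: an 8-worker cluster at 75% combined CPU scales up.
print(target_worker_count(8, 75.0, min_workers=4, max_workers=16))   # -> 11
```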
Note: At the time of this blog, auto scaling is not supported with accelerated clusters.
In Starburst Galaxy, you can easily configure your cluster to suspend after a specified amount of time has passed while idle. A running cluster is classified as idle when no queries are submitted and all query processing has been completed.
A suspended cluster comprises a small configuration set and a mechanism for listening to incoming user requests. It does not include any actively running server nodes, and no costs are incurred. Therefore, this is an ideal feature for optimizing your cluster costs.
Available idle shutdown times include 1 minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour. You can also configure your cluster to never suspend if you don’t want to wait for your cluster to warm up. The average warm-up time is ~5 minutes, so it is essential to consider the cost and performance trade-offs of the different idle shutdown times.
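To weigh those trade-offs, a rough back-of-the-envelope calculation can help. All of the numbers below (hourly cluster cost, number and length of idle gaps) are hypothetical placeholders, not Galaxy pricing.

```python
# Back-of-the-envelope trade-off for idle shutdown settings.
# Every constant here is a hypothetical placeholder, not Galaxy pricing.

IDLE_SHUTDOWN_OPTIONS_MIN = [1, 5, 15, 30, 60]
CLUSTER_COST_PER_HOUR = 10.0      # placeholder $/hour for a running cluster
WARMUP_MINUTES = 5                # average resume time cited above
IDLE_GAPS_PER_DAY = 12            # hypothetical number of idle gaps between query bursts
AVG_GAP_MINUTES = 45              # hypothetical average gap length

for shutdown_min in IDLE_SHUTDOWN_OPTIONS_MIN:
    # A gap longer than the shutdown setting suspends the cluster, saving the
    # rest of the gap but costing a warm-up wait on resume.
    suspended = AVG_GAP_MINUTES > shutdown_min
    billed_idle = shutdown_min if suspended else AVG_GAP_MINUTES
    daily_idle_cost = IDLE_GAPS_PER_DAY * billed_idle / 60 * CLUSTER_COST_PER_HOUR
    daily_warmup_wait = IDLE_GAPS_PER_DAY * (WARMUP_MINUTES if suspended else 0)
    print(f"{shutdown_min:>2} min shutdown: ~${daily_idle_cost:6.2f}/day idle cost, "
          f"~{daily_warmup_wait} min/day waiting on warm-up")
```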
Finally, Starburst Galaxy enables administrators to prevent an idle cluster from auto-suspending by creating a cluster schedule. Your cluster must be in a running or suspended status to be affected by scheduling. A cluster that is stopped must be started manually and is therefore not impacted by any defined scheduling.
Cluster scheduling timeframes override idle shutdown times to ensure your cluster remains operational. Multiple days of the week and multiple time intervals per day may be configured to keep the cluster always on, ensuring you avoid the warm-up time.
It is also important to note that once the cluster schedule has been created, it can be applied to one or multiple clusters. This is especially useful for customers who want to keep their clusters running during business hours while saving costs overnight.
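The sketch below models a “keep the cluster on during business hours” schedule to show how scheduling overrides idle shutdown. It is an illustration only; actual schedules are configured in Galaxy itself rather than in code, and the hours chosen are hypothetical.

```python
from datetime import datetime, time

# Illustrative model of a "keep running during business hours" schedule.
# Not the Galaxy API; real schedules are defined in Starburst Galaxy.
BUSINESS_HOURS = {
    # weekday (0 = Monday) -> (start, end) intervals during which the cluster stays on
    0: [(time(8, 0), time(18, 0))],
    1: [(time(8, 0), time(18, 0))],
    2: [(time(8, 0), time(18, 0))],
    3: [(time(8, 0), time(18, 0))],
    4: [(time(8, 0), time(18, 0))],
}

def schedule_keeps_cluster_on(now: datetime) -> bool:
    """True if the schedule overrides idle shutdown at this moment."""
    return any(start <= now.time() <= end
               for start, end in BUSINESS_HOURS.get(now.weekday(), []))

print(schedule_keeps_cluster_on(datetime(2024, 6, 3, 9, 30)))   # Monday 09:30 -> True
print(schedule_keeps_cluster_on(datetime(2024, 6, 8, 9, 30)))   # Saturday    -> False
```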