Data teams are increasingly challenged to keep reports and dashboards responsive as business users request larger and larger datasets. The data lake architecture was designed to deliver flexibility, agility, and quick time-to-market. But when it comes to meeting performance and cost requirements in BI tools, data teams are held back by heavy DataOps work and inefficient queries.
That’s why we are excited to announce that Warp Speed is now available in public preview in Starburst Galaxy. Warp Speed is a smart indexing and caching solution that autonomously accelerates your interactive workloads by an average of 40%.
What is Warp Speed?
Warp Speed leverages patented indexing technology that autonomously identifies and caches the most used or most relevant data based on usage patterns, while the rest of the data remains close to the source, optimizing data lake performance.
This autonomous acceleration eliminates the manual burden of selecting which data in the data lake to optimize and cache. Data engineers don’t need to worry about efficient partitioning strategies. It’s all handled for them. The autonomous nature of this solution is particularly useful for teams that lack the time or expertise to invest in query optimization.
“We are trying to leverage Starburst not only with engineers but also with program managers and business analysts. Partition keys are a foreign concept for our business folks. We are excited about Warp Speed because that implementation detail would be abstracted away from the end user,” said Congrui Ji, Head of Data at FalconX.
Warp Speed Performance
To highlight the performance benefits of Warp Speed, we ran query #96 from the TPC-DS benchmark at scale factor 1000 on Iceberg tables.
The result was a 5x performance improvement and a 10x reduction in CPU time when compared to Galaxy’s Standard execution mode cluster.
However, we know benchmark performance is often misleading, which is why we are excited to share that early customer testing is showing an average 40% improvement in interactive workloads.
How does Warp Speed work?
Check out the query below for an example of how Warp Speed analyzes queries in real time and automatically creates the appropriate index and cache elements to accelerate performance:
As you can see, indexes are automatically created based on query patterns, minimizing the execution of full table scans. This is especially useful for workloads that have an extensive set of filtering predicates or selective filters in queries that return a relatively small number of rows.
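To make the idea of usage-driven indexing more concrete, here is a toy Python sketch. It is not Warp Speed's implementation (the real index format is proprietary); the class name `ToyAutoIndexer`, the `threshold` parameter, and the in-memory row format are all hypothetical and chosen only for illustration. The sketch counts how often a column is used in an equality filter and, once that column looks "hot", builds an index so later queries can skip the full scan.

```python
from collections import Counter, defaultdict


class ToyAutoIndexer:
    """Toy model: index a column once it has been filtered on often enough.

    Hypothetical illustration only; not how Warp Speed is implemented.
    """

    def __init__(self, rows, threshold=2):
        self.rows = rows                # list of dicts standing in for table data
        self.threshold = threshold      # assumed cutoff for a "hot" filter column
        self.filter_counts = Counter()  # how often each column appears in a filter
        self.indexes = {}               # column -> {value: [row positions]}
        self.rows_examined = 0          # rows touched by the most recent query

    def _build_index(self, column):
        index = defaultdict(list)
        for pos, row in enumerate(self.rows):
            index[row[column]].append(pos)
        self.indexes[column] = index

    def query(self, column, value):
        """Return all rows where row[column] == value."""
        self.filter_counts[column] += 1

        if column in self.indexes:
            # Index hit: touch only the matching rows.
            positions = self.indexes[column].get(value, [])
            self.rows_examined = len(positions)
            return [self.rows[p] for p in positions]

        # No index yet: fall back to a full scan.
        self.rows_examined = len(self.rows)
        result = [row for row in self.rows if row[column] == value]

        # Once the column has been filtered on often enough, index it
        # so that subsequent queries avoid the full scan.
        if self.filter_counts[column] >= self.threshold:
            self._build_index(column)
        return result
```

With a table of 10,000 rows spread evenly across 100 device IDs and the default threshold of 2, the first two equality lookups on the device column each examine all 10,000 rows. By the third lookup the index exists, and only the 100 matching rows are touched.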
Under the hood, Warp Speed manages index and cache elements on the cluster’s SSDs in a proprietary format. When you enable Warp Speed, the most resource-intensive query operator – ScanFilterProject – is offloaded to Warp Speed. This significantly reduces the resources consumed by the operator, and the query as a whole, leading to faster query execution times.
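For readers unfamiliar with the operator name, a scan-filter-project step conceptually reads rows, keeps the ones that match the query's predicates, and returns only the requested columns. The following is a minimal Python sketch of that idea, not Starburst or Warp Speed code; the function name and the dict-based row format are assumptions made for illustration.

```python
def scan_filter_project(rows, predicate, columns):
    """Conceptual scan-filter-project: read rows, filter them, keep chosen columns."""
    for row in rows:                            # scan: read every row
        if predicate(row):                      # filter: apply the WHERE condition
            yield {c: row[c] for c in columns}  # project: keep only needed columns
```

Without an index, the scan has to visit every row, which is why this step tends to dominate the cost of selective queries over large tables.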
When to use Warp Speed
The performance improvements of Warp Speed benefit data teams in one of two ways: teams can either speed up their interactive workloads to meet dashboard SLAs, or query more of their data lake at the same latency as their current workloads.
Warp Speed is most beneficial in use cases involving multi-dimensional data that needs to be filtered by many dimensions. Key examples include:
- Fraud/anomaly detection
- IoT/telemetry data
- Geospatial analytics
- Clickstream and Customer 360 analytics
- Log analysis/cybersecurity
How to get started with Warp Speed in Galaxy
It’s super easy to get started with Warp Speed. Simply select “accelerated” as the execution mode in the cluster creation dialog. Once you start running queries, Warp Speed will begin creating index and cache elements for you.
To understand how Warp Speed impacted your workload, navigate to the query details page. You can view index and cache usage stats to better understand Warp Speed’s contribution in the execution process.
As this feature is in public preview, support is limited to AWS S3 and Tabular catalogs.
New to Starburst? Get started today by signing up for Starburst Galaxy.