The Starburst Enterprise 407-e LTS release provides Starburst customers with exciting new capabilities alongside more advanced connectivity, improved performance, and enhanced security. As always, this major release combines features that have been contributed back to the open source Trino project with features curated specifically for Starburst Enterprise customers.
There’s never been a better time for existing customers to upgrade their clusters, and for new prospects to start their journey with Starburst.
To experience this latest release firsthand, please visit our download site.
Starburst Warp Speed
Warp Speed (previously “Smart indexing and caching”) is now available as a GA feature! Warp Speed adds an indexing and caching layer for object storage catalogs to dramatically improve query performance on your data lake. This LTS release also adds REST endpoints for Warp Speed, and introduces the new index and cache resiliency feature as a public preview.
Warp Speed transparently adds an indexing and caching layer to catalogs using the Hive, Iceberg, or Delta Lake connectors. It automatically learns query patterns and identifies frequently accessed data to create optimal indexes and caches, while keeping infrequently accessed data where it is.
Warp Speed improves data lake query performance by up to 7x and reduces cloud compute costs by up to 40%
Warp Speed helps customers take their data lake analytics to the next level, allowing them to move faster with critical decision-making while reducing data management costs. Data teams can better serve the business, providing the right data, right now, to the right people. Each query is executed in the optimal way to accommodate performance requirements using a combination of different acceleration technologies.
- Creates appropriate index types (bitmap, dictionary, tree) and tailors each one to every block
- Accelerates joins, filters, and searches
- Stored on SSD
- Proprietary SSD columnar block caching
- Optimal performance based on the frequency of data usage and its business priority
- Eliminates unnecessary table scanning
Warp Speed is a performance enhancement that goes beyond the engine itself. Warp Speed is a proprietary, patented technology. It offers a data lake analytics solution that autonomously accelerates query workloads without the need to move or model your data. The indexing breaks data into blocks and automatically chooses the most effective index for each block. This speeds up queries significantly.
Warp Speed also detects which data to send to cache for even faster performance. It not only speeds up queries but also dynamically and autonomously updates the cache based on analytical workload patterns and business priority.
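To make the per-block idea concrete, here is a minimal Python sketch of choosing an index type for each block of a column. Warp Speed is proprietary, so this is not its actual logic: the `Block` class, the `choose_index` and `index_column` functions, and the cardinality thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for illustration.
BITMAP_MAX_DISTINCT = 4


@dataclass
class Block:
    values: list


def choose_index(block: Block) -> str:
    """Pick an index type for one block from its value distribution."""
    distinct = len(set(block.values))
    if distinct <= BITMAP_MAX_DISTINCT:
        return "bitmap"      # few distinct values: one bitmap per value
    if distinct <= len(block.values) // 2:
        return "dictionary"  # many repeats: encode values as small ids
    return "tree"            # mostly unique values: ordered lookup


def index_column(values: list, block_size: int) -> list:
    """Split a column into fixed-size blocks and choose an index for each."""
    blocks = [
        Block(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]
    return [choose_index(b) for b in blocks]
```

The point of the sketch is only that the decision is made block by block, so one column can end up with different index types in different blocks.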
The performance enhancements don’t stop with Warp Speed in the 407-e LTS release… there’s more!
Managed statistics (public preview)
The 407-e LTS release introduces the new managed statistics feature, which allows Starburst Enterprise to collect table and column statistics from select data sources that currently expose limited or no statistics. This feature allows queries to Oracle, PostgreSQL, and Teradata to take advantage of query optimizations that may dramatically improve query runtime and efficiency.
Managed statistics allow the cost-based optimizer to make smarter decisions for join pushdown, join reordering, and most other statistics-driven optimizations, resulting in faster performance across numerous connectors.
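As a rough illustration of why statistics matter, the following Python sketch shows one statistics-driven decision: which table of a hash join should be the in-memory build side. It is a simplified model, not the optimizer's actual code. The function name `choose_build_side` and the row-count dictionary are hypothetical, and the sketch assumes the table written on the right is the default build side.

```python
from typing import Dict, Optional, Tuple


def choose_build_side(
    left: str, right: str, row_counts: Dict[str, Optional[int]]
) -> Tuple[str, str]:
    """Return (probe, build) for a hash join of two tables.

    The build side is held in memory, so the smaller table is preferred.
    Without statistics for both tables, the order as written is kept.
    """
    left_rows, right_rows = row_counts.get(left), row_counts.get(right)
    if left_rows is None or right_rows is None:
        return left, right  # no statistics: keep the written order
    if left_rows < right_rows:
        return right, left  # smaller left table becomes the build side
    return left, right
```

With no row counts available, the sketch cannot reorder anything. Once row counts exist for both tables, it can move the smaller table to the build side.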
The Starburst Enterprise REST API now includes endpoints that allow you to programmatically manage built-in access control for your Starburst Enterprise cluster. Nobody enjoys configuring policies “by hand” on any platform. With no additional configuration required, administrators can use the built-in access control API to grant and deny privileges to roles through simple REST commands, significantly expediting the onboarding process. The API allows customers to create, update, and delete roles, policies, masks, and filters.
Built-in access control privileges have also been enhanced in this release with table properties and schema properties entity types. Rather than letting users set table and schema properties as they see fit, these new capabilities let you grant or deny privileges on the specific table and schema properties that users can set. For example, you can limit users to creating tables in ORC format only, not Parquet. This capability is also available through the built-in access control API.
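The ORC-versus-Parquet example can be modeled in a few lines of Python. This is a toy model, not the Starburst API: the `PropertyRule` class and `may_set` function are hypothetical, and the sketch assumes that a deny overrides a grant and that a property value with no matching rule is not allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyRule:
    role: str
    prop: str    # table property name, for example "format"
    value: str   # property value the rule applies to
    effect: str  # "GRANT" or "DENY"


def may_set(role: str, prop: str, value: str, rules: list) -> bool:
    """Return True if the role may set prop=value.

    Assumptions of this sketch: deny wins over grant, and a value
    with no matching rule is not allowed.
    """
    matching = [
        r for r in rules
        if (r.role, r.prop, r.value) == (role, prop, value)
    ]
    if any(r.effect == "DENY" for r in matching):
        return False
    return any(r.effect == "GRANT" for r in matching)
```

Granting an `analyst` role the `format` property with value `ORC`, and nothing else, means a `CREATE TABLE` with ORC passes the check while one with Parquet does not.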
Beyond our robust native security, Starburst consistently invests in integrations with numerous other security and access protocols. In this release, we’ve expanded support for AWS Lake Formation access control with more granular control options, such as Glue user impersonation and selecting AWS roles per catalog.
Lake Formation data filters let customers maintain more granular access control over their data at the row, column, and cell level. Starburst Enterprise Lake Formation support also now includes tag-based access control with Lake Formation tag support. Customers can map to and impersonate different AWS users with tightly scoped permissions.
Often overlooked in our LTS releases are the incremental capabilities added to our existing connectors. Let’s be clear: the majority of Starburst connectivity is far more than merely reading data. The enhancements to connectors mean more support for SQL functions that improve performance and enable data transformations for more advanced analytics. The Starburst Enterprise 407-e LTS release adds the following notable enhancements:
Expanded fault-tolerant execution to support write operations on the MongoDB and BigQuery connectors, as well as exchange spooling in HDFS. These capabilities enable lakehouse use cases that include building large rollup tables, preparing datasets for machine learning models, and wrangling data that feeds into data applications. Starburst customers now have a super fast and easy-to-use solution for both interactive and longer-running data pipeline queries. Fault-tolerant execution runs data pipeline queries more intelligently, letting you reliably run much larger data pipeline queries, save costs by running non-latency-sensitive queries on much smaller clusters, and execute more queries concurrently.
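The core idea behind fault-tolerant execution, retrying only the failed piece of work while keeping intermediate results that were already produced, can be sketched in Python. This is a conceptual toy, not the engine's scheduler: `run_with_task_retries`, `make_flaky`, and the in-memory `spooled` list (standing in for spooled exchange data) are all hypothetical.

```python
from typing import Callable, Dict, List


def make_flaky(value: int, failures: int) -> Callable[[], int]:
    """Build a task that raises `failures` times before returning `value`."""
    state = {"left": failures}

    def task() -> int:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated worker loss")
        return value

    return task


def run_with_task_retries(
    tasks: List[Callable[[], int]], max_attempts: int = 3
) -> Dict[str, object]:
    """Run tasks in order, retrying only the task that failed."""
    spooled: List[int] = []  # stands in for spooled exchange data
    attempts = 0
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            attempts += 1
            try:
                spooled.append(task())
                break  # task succeeded; earlier results are kept
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up after the last allowed attempt
    return {"results": spooled, "attempts": attempts}
```

In this sketch a single failure costs one extra task attempt rather than a rerun of the whole query, which is why longer-running pipeline queries can tolerate smaller or less stable clusters.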
The new Starburst Cosmos DB connector (public preview) uses the API for NoSQL to read data stored in Azure Cosmos DB for NoSQL. Azure Cosmos DB is a fully managed platform-as-a-service with its NoSQL API at its core. Cosmos DB users can create databases of the type they choose (NoSQL, MongoDB, Cassandra, PostgreSQL, Table, and so on). The new connector supports reads from the NoSQL API.
Data lakes enable a wide range of solutions, including raw data collection, flexible data access for users, and fast, efficient data warehouses and lakehouses. From a data and analytics perspective, a data lake can act as a staging ground that transforms raw data into a format suited to analysis and reporting, or it can operate as something closer to a data warehouse with a built-in query engine.
Open table formats like Apache Iceberg and Delta Lake allow users to interact with the data lake as easily as they would with a database, using SQL. Coupled with Hive, these open table formats allow more analytics to be served out of the lake and reduce the need for data movement and migration, which provides substantial cost savings.
The saved queries pane allows users to save recent query tabs for easy access at a later time. This pane contains two tabs: Recent, which lists all query tabs run during the past seven days, and Saved Queries, which lists any query tabs that have been saved. This means customers no longer need to worry about closing worksheets without saving. Starburst Enterprise takes care of that for you! Whether users are leveraging built-in access control, Apache Ranger, or any other access control, they are set for saving and reopening their query worksheets.
To further improve our user experience, gathering troubleshooting information just got a whole lot easier. The query editor now adds ‘Run and troubleshoot’ to the query run options. This option runs a query and downloads an archive that contains diagnostic files. Starburst prides itself on its customer support and service. The ‘Run and troubleshoot’ run option makes it easier for customers to capture and package query metadata, query plans, stack traces, and more to help us work with them to troubleshoot, improve performance, drive adoption, and increase customer satisfaction.
These are just some of the highlights that have made it into this quarterly LTS release. For a complete list of features and changes, read the 407-e LTS release notes.
If you’re interested in hearing more, please register for our Starburst Warp Speed: Setting A New Standard of Data Lake Analytics webinar on March 28th.