What’s New in Starburst Galaxy – re:Invent 2023 Edition

Announcing new features that help you build and scale interactive data applications directly on the data lake

Last Updated: December 1, 2023

At AWS re:Invent 2023, we announced several features that help simplify and accelerate development on the data lake. In this post, we will look at streaming ingest, automatic data classification, automatic data maintenance, secure data sharing, and natural language processing (NLP) in Starburst Galaxy. 

The data lake analytics platform

As the amount of data processed for application analytics continues to grow, more and more developers are turning to the data lake as a scalable and cost-efficient solution. However, building, governing, maintaining, and scaling a data lake requires specialized expertise and supporting technologies to get it right. For instance, a single use case could require an ingestion process, object storage, a compute engine, a governance tool, and a data catalog. And most data lakes support more than one use case.

Piecing these different tools together creates a complex and brittle data lake architecture that is not feasible for many data teams. 

Starburst Galaxy was built to address these challenges by providing an all-in-one, open data lake analytics platform, allowing you to remove the burden of learning, integrating, and maintaining separate systems, all while retaining ownership of your data.

Streaming ingest

Streaming ingest in Starburst Galaxy enables you to continuously ingest data from Kafka into your data lake in near real time, so your data is ready for analysis within minutes of initial collection. This is especially important for latency-sensitive use cases that require fast ingestion and processing to accelerate anomaly detection and decision making.

With streaming ingest in Starburst Galaxy, engineers no longer need to write expensive custom code and stitch together commercial and open-source tools to land streaming data in their lake. Streaming ingest also automatically transforms and partitions the data into Iceberg tables, enhancing query performance, enforcing flexible schema evolution, and ensuring transactional consistency.

Learn more.

ABAC with AI-powered data classification

Once new data lands in the lake, you need to be able to quickly identify, secure, and govern that data. With attribute-based access controls (ABAC) now generally available in Gravity (Starburst Galaxy’s universal governance layer), you can govern your data down to the row and column level by using tags. However, tagging is often a tedious, manual process.

Automatic data classification in Galaxy aims to remove that burden by proactively suggesting relevant tags from 20+ classifications, which administrators can choose to accept or reject. This automation is particularly useful for teams handling sensitive data like personally identifiable information (PII). Now, as soon as PII lands in the lake, Galaxy can identify it and restrict access to it.
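To make the suggest, review, and enforce flow concrete, here is a minimal Python sketch. It is not Galaxy's implementation: simple regular expressions stand in for the AI-powered classifier, and every name in it (`CLASSIFIERS`, `suggest_tags`, `visible_columns`, the tag names, and the sample data) is hypothetical.

```python
import re

# Hypothetical classifiers: tag name -> pattern every sample value must match.
# Regexes are a stand-in here; the post describes an AI-powered classifier.
CLASSIFIERS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def suggest_tags(samples_by_column):
    """Suggest a tag for each column whose sample values all match one classifier."""
    suggestions = {}
    for column, samples in samples_by_column.items():
        for tag, pattern in CLASSIFIERS.items():
            if samples and all(pattern.match(value) for value in samples):
                suggestions[column] = tag
                break
    return suggestions


def visible_columns(columns, accepted_tags, restricted_tags):
    """Return the columns whose accepted tag is not restricted for a role."""
    return [c for c in columns if accepted_tags.get(c) not in restricted_tags]


# Hypothetical sample values drawn from three columns of a newly landed table.
samples = {
    "contact": ["ana@example.com", "bo@example.org"],
    "tax_id": ["123-45-6789"],
    "city": ["Boston", "Austin"],
}

suggested = suggest_tags(samples)

# An administrator accepts or rejects each suggestion; here all are accepted.
accepted = dict(suggested)

# A role that may not see PII-tagged columns only sees the untagged column.
analyst_view = visible_columns(list(samples), accepted, {"email", "us_ssn"})
```

The point of the sketch is the separation of concerns: classification only proposes tags, an administrator decides which ones apply, and access rules are written against tags rather than against individual columns.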

Learn more.

Automatic data optimization

Modern table formats like Apache Iceberg have made the aspiration of data warehouse-like performance within a data lake an exciting reality. The Iceberg table format natively supports a series of maintenance operations for efficient storage and fast query performance.

The new data optimization features in Starburst Galaxy allow you to configure and execute these maintenance tasks either on demand or on a schedule, leveraging the Iceberg API under the hood.
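For a sense of what these maintenance tasks look like when run by hand, the sketch below builds the statements that the Trino Iceberg connector exposes for compaction, snapshot expiration, and orphan file removal. It assumes a Trino-compatible SQL endpoint; the helper name `maintenance_statements`, the table name, and the threshold values are illustrative, and Galaxy's scheduled optimization may be configured differently.

```python
def maintenance_statements(table, retention="7d", file_size_threshold="128MB"):
    """Build Trino Iceberg maintenance statements for one table.

    The table name and thresholds are illustrative assumptions.
    """
    return [
        # Compact small data files into larger ones.
        f"ALTER TABLE {table} EXECUTE optimize"
        f"(file_size_threshold => '{file_size_threshold}')",
        # Drop snapshots older than the retention window.
        f"ALTER TABLE {table} EXECUTE expire_snapshots"
        f"(retention_threshold => '{retention}')",
        # Delete files that no snapshot references any longer.
        f"ALTER TABLE {table} EXECUTE remove_orphan_files"
        f"(retention_threshold => '{retention}')",
    ]


# Hypothetical fully qualified table name.
statements = maintenance_statements("lake.sales.orders")
for statement in statements:
    print(statement + ";")
```

Running the three statements in this order compacts files first, then expires old snapshots, and only then removes files that are no longer referenced.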

Learn more.

Universal data sharing

Data teams are increasingly faced with the daunting task of integrating third-party data into their analytics. This process is complex and time-consuming because of the technical limitations and security concerns involved.

With Starburst Galaxy, you can easily package data sets into shareable data products to power end-user applications, regardless of source, format, or cloud provider. New functionality now allows users to securely share these high-quality data products with third parties, such as partners, suppliers, or customers.

Learn more.

Self-service analytics powered by AI

Not only are data lakes notoriously hard to manage, but the majority of data teams are understaffed. New AI-powered experiences in Galaxy, like text-to-SQL processing, will enable data teams to offload basic exploratory analytics to business users, freeing up their time to build and scale data pipelines.

Learn more.

Why wait?

With a host of new features designed to make it easier to build, manage, and scale a data lake, Starburst Galaxy is the perfect choice for companies looking to power data-intensive applications at a fraction of the cost of a typical warehouse model. 

Whether you’d like guidance on getting started with Starburst Galaxy, would like to join one of our private preview programs, or just want to get up and running fast, we have a path for you.

