Introducing AI-powered data lake analytics in Starburst Galaxy

Leveraging generative AI for text-to-SQL and SQL-to-text workflows

Last Updated: January 26, 2024

Generative AI took the world by storm in 2023, and there has been a tremendous amount of hype around the possibilities it brings to businesses and consumers. The pace at which products in this space are appearing is staggering.

At Starburst, our AI strategy is still in its early stages. We are starting this journey as the enterprise Trino company, focusing on what we do best: enabling you to use Trino across your organization at scale.

Today, we are opening up the power of Trino to business users with the introduction of two new AI-powered features in Starburst Galaxy:

  • SQL statement generation from business questions (text-to-SQL)
  • Query explanations (SQL-to-text)

These will be available for demo at re:Invent and in public preview this week.

Getting more out of your catalog with text-to-SQL

When data sources are connected to Galaxy, the amount of data available to data consumers can explode. Data literacy becomes a major issue. Data products can be a critical aid to literacy, but consumption can still be a challenge unless example use cases are well documented. The same is true of the rest of your catalog.

The new text-to-SQL functionality lets users ask natural-language questions about the data in a schema or table and returns a SQL statement intended to answer the question. As part of the prompting process, we provide additional metadata about the source to ground the LLM so that it returns more pertinent results.
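
To make the idea of grounding concrete, here is a minimal sketch of how table metadata could be folded into a text-to-SQL prompt. It is an illustration only and does not show how Galaxy builds its prompts: the `Table` and `Column` types, the `build_text_to_sql_prompt` function, the prompt wording, and the `sales.public.orders` table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    data_type: str
    comment: str = ""


@dataclass
class Table:
    catalog: str
    schema: str
    name: str
    columns: list[Column]


def describe_table(table: Table) -> str:
    """Render one table's metadata as plain text for the prompt."""
    lines = [f"Table {table.catalog}.{table.schema}.{table.name}:"]
    for col in table.columns:
        note = f" -- {col.comment}" if col.comment else ""
        lines.append(f"  {col.name} {col.data_type}{note}")
    return "\n".join(lines)


def build_text_to_sql_prompt(question: str, tables: list[Table]) -> str:
    """Combine source metadata and the user's question into one prompt.

    Hypothetical prompt layout; a real system would likely add more context.
    """
    context = "\n\n".join(describe_table(t) for t in tables)
    return (
        "You write Trino SQL. Use only the tables and columns listed below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )


# Hypothetical table used purely for illustration.
orders = Table(
    catalog="sales",
    schema="public",
    name="orders",
    columns=[
        Column("customer_id", "bigint"),
        Column("order_date", "date"),
        Column("total_price", "double", "Order value in USD"),
    ],
)

question = "What was the total revenue per customer last month?"
prompt = build_text_to_sql_prompt(question, [orders])
print(prompt)
```

The resulting string would then be sent to whichever LLM is in use. Listing column names, types, and comments gives the model concrete identifiers to work with, which is the sense in which the metadata grounds its answer.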

This is an experimental feature, and foundation model limitations such as hallucinations still apply. We are actively researching other approaches that will let you safely use additional business and technical context to generate more accurate results.

Improving business continuity and documentation with SQL-to-text

SQL-to-text is being introduced as a complementary feature to text-to-SQL. With it, Galaxy can not only explain the query that was generated and executed, but also let you provide additional context and dig deeper into any questions you have.

As a chatbot, SQL-to-text can generate whatever type of output you need, from a simple technical or domain-specific answer to a question, to a summary or comprehensive documentation that can then be used with your data assets or data products.

This also means business continuity can be maintained: critical but undocumented transformations can quickly be explained, and with added context provided by the user during the Q&A process, businesses can more effectively derisk staff turnover.

How do I get started?

Because sending information to LLMs is a sensitive topic, these features are disabled by default. To enable them, the account privilege “Generative AI features” must first be granted to the roles that will be allowed to use this feature set.

From the query editor, you can access text-to-SQL from the schema or table menu, or SQL-to-text from the context menu of a highlighted query.

These features are in public preview and available for you to explore as soon as you sign up and connect a catalog to your free account!

*Please note that these are experimental features and a human touch is necessary to achieve the outcomes you are looking for.

A look ahead

Foundation models get us a long way toward the goal of efficiency, but they do not take into account the richness of metadata and information available from Galaxy, which often leads to generic or inaccurate answers. While we will continue to improve the generative AI capabilities within Galaxy, we are actively researching how we can leverage this richness to provide more accurate results that reflect the semantic context of your data.

If you’re passionate about generative AI and have thoughts on how this field is evolving, we’d love to chat with you over a virtual cup of coffee (or any other eBeverage of your choice)! Just reach out via our in-product chat and ask to speak with Ryo.

Start for Free with Starburst Galaxy

Up to $500 in usage credits included
