My Journey with Starburst AI Tools

Using Starburst to solve Starburst's own business needs


I’m Moa, and I work in our Go-to-Market (GTM) organization. I’m going to dive into a key topic that’s generated a lot of interest internally and externally—using Starburst’s own product to solve our business needs. 

Why Starburst uses Starburst 

Starburst on Starburst has a long history. Drinking our own champagne is part of our mindset here, and it aids our GTM motions by inspiring confidence internally and externally.

Specifically, we’ve always used our own product to facilitate the core of our data processing while integrating with a variety of data sources. This includes classical transformation pipelines, medallion architectures, and data products.

For my GTM team, two of the data sources we use regularly are Homerun Presales and Salesforce. Both systems have their own UI and API, but we wanted copies of these well-formed, high-quality datasets stored as Iceberg tables in our data lake. For Homerun Presales, we leveraged their API to extract data, and we used the Starburst Salesforce connector to query the underlying Salesforce tables.

Project Starlake

At Starburst, we call this internal data lake analytics platform Starlake. 

Catchy, isn’t it? And it’s growing all the time. 

Starlake already includes classical BI reports and dashboards, but we wanted to learn more from our data. I took the lead in building an internal GTM agent, leveraging some of our AI features. Specifically, I couldn’t wait to get hands-on with our AI functions and our Starburst AI agent and apply them to the Starlake project.

Let’s delve into some of the ways that my team used Starlake and Starburst’s AI features to solve real problems. 

Using Starburst AI functions

My goal for enhancing Starlake centered on AI, but I needed to get a few things in order before I could start.

Solving for data access

My first problem was data access. Specifically, I needed to access particular datasets and data sources, and to become familiar with what was available to me through data discovery.

Luckily, this is the bread-and-butter of what Starburst can do. 

Our platform provides a single point of access to any organization’s entire spectrum of data sources. What’s more, we can easily use query federation to join these sources and treat them as a single source, while providing built-in security controls. 

How I used Starburst to access data

Fortunately, I already had access to the silver-layer tables from the external data sources we regularly use. 

For this reason, I started there. I accessed each of the primary tables from the Homerun Presales sources using the SQL below. Starting with a generic query helped me figure out which columns I should use.

Image showing a SQL query that accesses the dataset in the homerun repository.
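The screenshot isn’t reproduced here, but an exploratory query of this kind might look like the sketch below. The catalog, schema, and table names are my own placeholder assumptions, not the actual Starlake names:

```sql
-- Placeholder names: "starlake.silver.homerun_activities" is hypothetical.
-- First, inspect the available columns before deciding which ones to use.
DESCRIBE starlake.silver.homerun_activities;

-- Then sample a few rows to get a feel for the data itself.
SELECT *
FROM starlake.silver.homerun_activities
LIMIT 10;
```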

Deciding what question to ask my dataset

But first, I needed to decide what question I ultimately wanted to ask my dataset, and then work backwards. 

This is always a pivotal moment in any data project, and I’m sure many of you have been there. At first, I was stuck thinking, “What should I ask?” After all, with access to AI models via Starburst, I could ask anything. Just like when using an off-the-shelf AI chat agent trained on the public internet’s contents, the possibilities are endless.

It’s a problem that many data engineers and data analysts encounter, especially with AI tools. When you can ask any question, it’s hard to come up with the BEST question.

In the end, I decided to start with a specific question: “Looking at the pre-sales activities, should we hire any new pre-sales employees?”

OK, first step done. But I still needed to get there. Specifically, I needed to write the SQL that would access the data.

Access the data

Next, it was time to retrieve the data. Using Starburst, I wrote a simple SQL query to aggregate pre-sales activities over the last 90 days.

Image depicting a SQL statement that accesses the specific parts of customer data pertaining to selected employee hours.
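The screenshot isn’t reproduced here; a 90-day aggregation of this shape could look like the following sketch, where every table and column name is an assumption:

```sql
-- Hypothetical names throughout; aggregates pre-sales activity per employee
-- over the trailing 90 days.
SELECT
  employee_name,
  count(*)            AS activity_count,
  sum(duration_hours) AS total_hours
FROM starlake.silver.homerun_activities
WHERE activity_date >= current_date - INTERVAL '90' DAY
GROUP BY employee_name
ORDER BY total_hours DESC;
```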

I wrapped that query up as a Common Table Expression (CTE) so I could reference its values as a logical table in the next steps.

Formatting the data

Ultimately, I wanted to supply the prior CTE as input in a RAG request to an LLM. This led me to build a second CTE that rolls up all the rows from the activities table into a single JSON object, using the SQL query below.

Image depicting a SQL statement that formats customer data.
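In place of the screenshot, a second CTE of that shape might be sketched as follows. It uses Trino’s `map_agg` and a cast to JSON; the names, and the choice to key the object by employee, are my assumptions:

```sql
WITH recent_activity AS (
  -- Hypothetical silver-layer table and columns.
  SELECT
    employee_name,
    sum(duration_hours) AS total_hours
  FROM starlake.silver.homerun_activities
  WHERE activity_date >= current_date - INTERVAL '90' DAY
  GROUP BY employee_name
),
activity_json AS (
  -- Roll all rows up into one JSON object: {"employee": hours, ...}
  SELECT json_format(CAST(map_agg(employee_name, total_hours) AS JSON)) AS payload
  FROM recent_activity
)
SELECT payload
FROM activity_json;
```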

Leveraging an LLM

Finally, I could use the ai.prompt() function to ask my question and leverage any of the configured LLM models I had access to. 

The prompt function allows me to ask a question in plain text, like “Looking at the pre-sales activities, should we hire any new pre-sales employees?” It is probably the most powerful function available to a Starburst user: it can read the data we pass in and reason its way to an answer. I won’t cover it here, but our Role-Based Access Controls allow organizations to enable or disable specific functions, ensuring that this capability is always controlled.

In the SQL shown below, you can see the structure where I’ve added my question, the formatted data from the prior CTE, and the LLM model I’m using.

Send JSON object record for analysis.
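Putting the pieces together, the structure might look like the sketch below. The exact signature of ai.prompt and the way a model is selected vary by deployment, so treat both as assumptions rather than the literal SQL from the screenshot:

```sql
WITH activity_json AS (
  -- Single JSON object built from the earlier CTE (hypothetical names).
  SELECT json_format(CAST(map_agg(employee_name, total_hours) AS JSON)) AS payload
  FROM starlake.silver.homerun_activities_90d
)
-- Model selection is configured per deployment; some setups accept the
-- model identifier as an additional argument to ai.prompt.
SELECT ai.prompt(
         'Looking at the pre-sales activities, should we hire any new '
         || 'pre-sales employees? Here is the activity data as JSON: '
         || payload
       ) AS recommendation
FROM activity_json;
```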

The specific prompt I used in place of “My question goes here” in the SQL above is shown in the screenshot below.

Image depicting a specific prompt sent from Starburst to an LLM.

The LLM provided the following results.

Results

From here, I was then able to review the hiring recommendations generated by the AI model and ultimately present them to leadership for consideration. 

Image depicting the results of Starburst querying an LLM using a prompt on Starlake.
Image depicting the second half of the results of an LLM query.

Cool, this stuff works!

Lather, rinse, repeat

This was a great first exercise, but now I wanted more.

Specifically, I wanted to use Salesforce data. A few Starburst All-Stars and I decided that a great use case would be to understand how long it takes for a new account executive to close their first deal.

I took the same approach as above:

  1. Identify the tables and columns to query
  2. Format the additional data into a JSON object
  3. Augment the LLM request with the internal data 
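Applied to the time-to-first-deal question, those three steps might be sketched as one query. The Salesforce-derived table names, fields, and the 'Closed Won' stage value are all assumptions:

```sql
-- Step 1: identify the tables and columns (hypothetical names).
WITH first_deal AS (
  SELECT owner_id, min(close_date) AS first_close_date
  FROM salesforce.sfdc.opportunity
  WHERE stage_name = 'Closed Won'
  GROUP BY owner_id
),
ramp AS (
  SELECT u.name,
         date_diff('day', u.start_date, f.first_close_date) AS days_to_first_deal
  FROM first_deal f
  JOIN salesforce.sfdc.user u ON u.id = f.owner_id
)
-- Steps 2-3: format the result as a JSON object, ready to augment
-- the LLM request with internal data.
SELECT json_format(CAST(map_agg(name, days_to_first_deal) AS JSON)) AS payload
FROM ramp;
```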

I then shared the results with colleagues, and in every interaction, they would ask me a follow-up question like:

    • Oh, I want to see when the Account Executive closes their first new logo.
    • Can you filter to only show Mid-Market?
    • And so on, and so on.

Optimizing query performance using the Starburst AI agent

At this point, I was ready to use our AI agent and to address a performance issue: every question I asked sent SQL back to Salesforce, which was also getting costly. It was query optimization time.

The Starlake team was happy to help me create an Iceberg table from a join of the opportunity table and a few others. I’m glad I was talking with the experts, as they knew the exact column names and the best query filters to use to get all the data I needed.

It was at this point that our first AI agent was ready for testing.

Creating a data product

With my table created in Iceberg, the next step was to produce a data product and configure it to use my table as a source.

Image depicting the creation of a data product.

One way to improve our AI agent’s responses is to use additional metadata. Additional metadata helps drive contextual accuracy, and AI thrives on context. As has always been the case, I could have manually added descriptions for the data product, the tables, and the columns themselves.

Adding additional comments for improved metadata.

Generating metadata with AI

Instead, I used AI integrated into our product to generate descriptions for my dataset, including descriptions of each column. As with the underlying SQL functions, I could select from the configured LLMs I had access to.

Before

Generating metadata using an AI prompt in the Starburst UI.

After

Showing the impact of generated metadata, created using AI in the Starburst UI.

Following our product team’s advice, I reviewed the generated metadata to see if I needed to enhance it further before saving it. With that done, it was time to ask questions!

Chat with the agent

This is how the agent looks—a pretty typical chat interface.

Image showing the prompt for the Starburst AI Agent.

It allows me to ask questions about the metadata and the data in the text box at the bottom of the screen. I can also change the type of response I receive by selecting Data Engineer, Analyst, or Executive, just below the input text area. Right next to that, I can also try different LLMs.

NOTE: Since this is an internal use case, the following is demo data.

My first question

I started by asking for suggestions on how to launch a new marketing campaign. 

Showing the prompt for the Starburst AI Agent.

Generated SQL

When I select the Analyst persona, the AI agent returns the SQL it has generated and allows me to make changes if I want. Otherwise, I can ask the agent to execute it.

Image showing generated SQL.

Results

Showing the results of the Starburst AI agent.

The AI agent really helped me scale up to the questions I was coming up with on my own. I also invited other All-Stars to ask their questions of our AI agent accessing this data product, thus allowing me to scale even more with the power of self-service.

Closing the loop

After I transformed my first use case into a data product, the leadership team began using it as part of our AI tool portfolio.

The second use case is in production with active users. The next step is to continue applying other AI functions to help with churn, analyzing the sentiment of our customer conversations and identifying keywords using the classification function.
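As a sketch of where this could go, sentiment and classification calls might look like the following. The function names mirror the AI functions discussed above, but the exact names, signatures, label choices, and the conversations table are all assumptions:

```sql
-- Hypothetical conversations table; function names and signatures may
-- differ by deployment.
SELECT
  conversation_id,
  ai.analyze_sentiment(transcript) AS sentiment,
  ai.classify(transcript,
              ARRAY['pricing', 'support', 'product gap', 'churn risk']) AS topic
FROM starlake.silver.customer_conversations;
```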

Why using Starburst on Starburst was a great success

Starburst’s AI tools have transformed how my team works with data. By combining data federation, Iceberg tables, and AI-powered SQL and metadata generation, we can move from questions to insights in minutes rather than hours. 

What began as an internal experiment with Project Starlake has become a self-service data experience for our entire GTM team. We no longer worry about where the data lives or how to access it. Instead, we focus on asking better questions and letting Starburst handle the rest. Using Starburst to solve our own business needs has brought us speed, confidence, and discovery. 

And the best part? We’re using our own platform. The benefits of drinking our own champagne continue to add up. 

For more information on this project, check out my recent webinar on the topic. 
