
At Starburst, we leverage Kafka topics for stream event processing, and one of the ways we use them is to track user events. Like most engineering teams, we want to turn that stream into something useful without signing up to maintain yet another data pipeline. This post is the story of how we built that functionality using Starburst Galaxy’s data ingestion feature and AIDA, Galaxy’s conversational AI interface.
This story also comes with a twist.
The data we’re analyzing is usage telemetry pertaining to Galaxy’s data ingestion feature itself. So yes, we’re using data ingestion to understand how people use data ingestion. It’s as meta as it sounds, and has been a really fun project to work on!
Understanding how users adopt data ingestion
Setting up Kafka ingestion in Galaxy is a straightforward, guided process. But as the team was building this feature, we had a set of questions the setup wizard couldn’t answer.
These included:
Feature adoption
The guiding question here was how quickly users are discovering and trying data ingestion after it becomes available to them.
Usage friction
For this, we considered the setup flow from the user's perspective. Where do users get stuck or give up entirely?
Common errors
Next, we asked what goes wrong most often and whether there is something we can do in the UX to prevent it.
Time-to-value
Finally, we considered how long it actually takes someone to go from first discovering the feature to having a successfully running live table.
These are the kinds of questions that product and engineering teams are always asking. And if you’ve ever tried to answer them with static dashboards, you know how it goes. By the time you’ve built the dashboard that answers today’s question, the team has already moved on to three new ones.
What we track (and what we don’t)
Our application emits tracking events to a Kafka topic. We used Galaxy data ingestion to stream those events into a Starburst Galaxy live table, which is essentially a managed, continuously updated Iceberg table backed by Amazon S3.
Here’s what the events capture:
Ingestion source CRUD events
These events cover creating, viewing, updating, and deleting ingestion source configurations.
Live table CRUD events
These events track the same lifecycle operations for live tables themselves.
Verify events
These events capture test-connection-style actions, where a user validates their ingestion source or live table configuration before committing to it.
Partitioning and sorting configuration
These events record whether users apply custom partition columns or sort orders to their tables.
A note on user identity
It’s worth noting that one thing we deliberately don’t track is a user’s identity. Every event carries an anonymized user ID, so we can analyze behavior patterns and cohorts without ever knowing which person took which action. Privacy by design. It’s important to us to learn from usage patterns without compromising anyone’s identity.
On the Iceberg side, data first lands in a raw table, which directly represents the Kafka messages. From there, a transform table reshapes things into a query-friendly schema. This step flattens nested fields, casts types, and filters out unnecessary columns. Once that's in place, the data is ready to query.
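As a rough sketch of what that transform step looks like in Trino SQL (the JSON paths and one-shot CREATE TABLE AS are illustrative; the live table applies the same reshaping continuously):

```sql
-- Illustrative sketch only: reshape raw Kafka messages into a flat,
-- typed, query-friendly table. The JSON paths are hypothetical.
CREATE TABLE usage_events AS
SELECT
  json_extract_scalar(payload, '$.event.name')         AS name,
  json_extract_scalar(payload, '$.event.user_id')      AS anonymized_user_id,
  from_iso8601_timestamp(
    json_extract_scalar(payload, '$.event.timestamp')) AS time
FROM raw_usage_events
-- filter out messages that aren't tracking events
WHERE json_extract_scalar(payload, '$.event.name') IS NOT NULL;
```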
Here’s a quick final check to confirm events are flowing:
```sql
SELECT *
FROM usage_events
WHERE name LIKE '%ingestion%'
ORDER BY time DESC
LIMIT 10;
```
Events show up within minutes of being produced to Kafka. No batch jobs to schedule, no orchestrator to keep an eye on – it just works.
AIDA provides conversational analytics, not dashboards
AIDA is Galaxy’s AI-powered data agent. You point it at a catalog, ask a question in plain English, and it generates SQL, runs it, and hands you the results within the flow of a conversation.
If you’ve ever needed to answer a product question the traditional way, you know the drill:
- Write some SQL (or find someone who can)
- Run it
- Export the results
- Maybe build a chart in a BI tool
- Share it in Slack
- Get a follow-up question that requires a slightly different query, and repeat the whole cycle.
With AIDA, you skip all of that and just ask the question using natural language.
Example: AIDA in action
Let’s walk through a real example. Here are some of the actual prompts we used to explore data ingestion usage.
Prompt: “What are the top 10 most common errors during ingestion source verification this month?”

Result: The images below show the results. AIDA generated and ran the SQL itself, using the schema context available to it, and returned a ranked list of the most common verification errors.
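For a sense of what happens under the hood, AIDA writes and runs Trino SQL against the usage_events table. The query it generates for this prompt would look roughly like the following sketch (the error_message column and the exact event name are our guesses for illustration, not the generated SQL verbatim):

```sql
-- Roughly the kind of query AIDA generates for the prompt above.
-- error_message and the event name value are illustrative.
SELECT error_message, count(*) AS occurrences
FROM usage_events
WHERE name = 'ingestion_source_verify_failed'
  AND time >= date_trunc('month', current_date)
GROUP BY error_message
ORDER BY occurrences DESC
LIMIT 10;
```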

Digging deeper with AIDA
Next, we asked a follow-up question that built on those results and dug deeper. AIDA handles this kind of depth naturally, letting users follow a hypothesis or line of inquiry.
Prompt: “I want to understand friction points when users try to create ingestion sources and live tables.”

Result: AIDA returns information on the friction points as asked, including relevant details about ingestion sources and live tables.

Changing the direction of questioning
Sometimes one answer begets an entirely different question. To help with this, AIDA allows users to change direction with their questions. Let’s look at the following example.
Prompt: “How many users have customized table partitioning or sort order?”

Result: AIDA returns a count of users who have customized partitioning or sort order. Because it has access to all of the schema and event context it needs, it delivers results of this type far more easily than conventional methods.

Prompt: “Are there any correlations between the Kafka source errors and the authentication type (such as SASL/PLAIN or SASL/SCRAM)?”
How AIDA lets you move faster and more dynamically using conversation
Getting each of these answers would have been a real pain in the neck in the old world. I’ll be honest, I don’t write SQL often enough to remember syntax off the top of my head, so the process usually involves a bunch of searching through the Trino SQL reference, followed by trial-and-error until the query works.
With AIDA, the loop from question to answer is measured in seconds, and I can immediately iterate and narrow things down with follow-up questions, like “now break that down by month” or “exclude Starburst internal accounts” without having to start over. The difference in speed and scale is significant.
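To make that concrete, a follow-up like “now break that down by month” simply becomes a revised query on AIDA’s side, roughly like this sketch (the error_message column is illustrative):

```sql
-- A follow-up prompt becomes a revised query, e.g. adding a
-- monthly breakdown to the error counts (error_message is illustrative).
SELECT date_trunc('month', time) AS month,
       error_message,
       count(*) AS occurrences
FROM usage_events
GROUP BY 1, 2
ORDER BY month, occurrences DESC;
```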
How Starburst leverages this power using a meta loop
Now for the meta part. There’s something genuinely satisfying about using your own product to understand itself. We built data ingestion so that users could stream Kafka data into a high-performance Iceberg data lake without managing infrastructure. Then we used that exact same feature to stream our own usage telemetry into Iceberg. And then we used AIDA to ask questions about the data. These questions directly informed the next round of improvements to the ingestion feature itself.
How the meta loop works in practice
Our meta loop surfaced things we wouldn’t have thought to include in dashboards.
Here’s a good example.
- While poking around in AIDA, we noticed that a surprising number of users needed 10+ verification attempts to successfully create a live table.
- A few follow-up questions helped us dig in, and it turned out the verify step wasn’t giving clear enough feedback on Kafka connection failures.
- Users couldn’t tell whether it was a network connectivity issue or incorrect credentials, so they just kept retrying.
- That insight led directly to a UX improvement, and the whole thing started with a casual question in a chat window, not a pre-planned dashboard panel.
Each of these insights is immensely valuable, and we’re already shipping iterations and improvements based on them.
AIDA’s next step is visual, and that has big implications for dashboards
There’s something else. AIDA already eliminates the need to write SQL or configure dashboards for exploratory analysis. Today, results come back as text.
That’s about to change.
AIDA visualization support
Visualization support is in active development. The idea is that you’ll soon be able to ask a question and get a chart back, not just numbers and text. When a conversational AI can both query and visualize, it collapses the following traditional BI tasks into a single conversational thread:
- Write a query
- Build a chart
- Tweak filters
- Share a dashboard
- Field requests for changes
That’s a pretty significant shift.
What AIDA means for dashboards
AIDA is already disrupting dashboard workloads, and that trend is only going to continue. Now, this doesn’t mean dashboards disappear overnight. Recurring KPIs and shared team views still benefit from a pinned dashboard that everyone can glance at.
But the balance is shifting.
The exploratory, iterative, “I just have a quick question” kind of work (which is most of what engineers and PMs actually do day to day) is moving to conversational interfaces. BI tools become the static reporting layer. The AI interface becomes where the actual thinking happens.
How we used AIDA to improve Starburst itself
In summary, we ingested a stream of usage-tracking events into Kafka, pointed Galaxy data ingestion at it, and had a queryable Iceberg table in minutes. No infrastructure to manage, no jobs to monitor. Then we used AIDA to ask questions about that data in plain English and got answers immediately – answers that directly shaped the product we were building.
If you’re sitting on event streams and find yourself spending more time building pipelines and dashboards than analyzing your data, this is worth a look. The Galaxy data ingestion docs are a good place to start.
Excited about AIDA?
We are too. We have a lot more coming, so stay tuned.
We’re working on a follow-up post covering best practices for using conversational AI agents for data exploration. It will include insights into prompt strategies, common pitfalls, building trust in the answers, and how to get the most out of tools like AIDA.



