Why Structured Data is the Ground Truth of AI

June 22, 2026

Starburst Team

A Future Look at Data Systems for Agents

Most enterprise AI projects do not fail because the model is weak. They fail because the data underneath the model cannot be accessed or understood. That is why engineering the AI leap, rather than improving the model, is becoming the next bottleneck in AI, and the need is growing more urgent. Organizations are buying and building frontend AI tools while the data foundation beneath them stays broken. The model is rarely the weak link. It is waiting on data it cannot properly reach.

This topic was recently the focus of Starburst CEO Justin Borgman’s keynote at AI & Datanova Miami.

That reframing matters because it changes where enterprise AI needs to spend its effort. If the bottleneck were the model, the answer would be to wait for the next release. It isn’t, so the work moves to the foundation, and specifically to structured data and the business context wrapped around it.

The phrase comes from the people building the hardware

The idea that structured data is the ground truth of AI is not a marketing slogan. It surfaced at NVIDIA GTC, where structured data was presented in exactly those terms, inside a 120-billion-dollar data ecosystem. In a later conversation with Michael Dell, Jensen Huang described structured data as its own AI bottleneck, one that controls access to the context layer that makes AI accurate, governed, and valuable.

The distinction Huang draws is the one that matters. Having the data is not the same as having the meaning behind it. Structured data is the operational ground truth of a business, and most enterprises are still failing to unlock it. That gap, not model quality, is what stalls AI in production.

Why meaning is the hard part

Consider a single word: revenue. Ask three teams, and you get three answers. Finance means recognized GAAP revenue, the number that is earned and lives on the income statement. Sales means booked ARR, the annualized value of signed contracts. Product means gross billings, the raw amount invoiced before discounts or refunds. None of these is sloppy. Each reflects a genuinely different question about the business, and each is correct from the perspective of the team using it.
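To make the ambiguity concrete, here is a minimal sketch of the three definitions applied to the same records. The `Contract` fields, the sample figures, and the function names are all hypothetical and chosen for illustration; real recognition rules are far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    # Hypothetical fields, for illustration only.
    annual_value: int  # annualized value of the signed contract
    invoiced: int      # raw amount invoiced, before discounts or refunds
    recognized: int    # portion already earned under the recognition policy


def recognized_revenue(contracts):
    """Finance: revenue that has been earned."""
    return sum(c.recognized for c in contracts)


def booked_arr(contracts):
    """Sales: annualized value of signed contracts."""
    return sum(c.annual_value for c in contracts)


def gross_billings(contracts):
    """Product: raw amount invoiced before discounts or refunds."""
    return sum(c.invoiced for c in contracts)


# One question ("what is revenue?") maps to three different computations.
REVENUE_DEFINITIONS = {
    "finance": recognized_revenue,
    "sales": booked_arr,
    "product": gross_billings,
}

CONTRACTS = [
    Contract(annual_value=120_000, invoiced=60_000, recognized=30_000),
    Contract(annual_value=48_000, invoiced=48_000, recognized=12_000),
]


def revenue_by_team(contracts):
    """Apply every team's definition to the same contracts."""
    return {team: fn(contracts) for team, fn in REVENUE_DEFINITIONS.items()}


print(revenue_by_team(CONTRACTS))
```

The same two contracts yield three different totals, and none of them is a bug. An agent asked for "revenue" has to be told which function it is being asked to run.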

For years, a human analyst absorbed that ambiguity. They knew which definition to use, and if they got it wrong, they corrected it on the next pass. That translation step was invisible because it worked, but it was doing more work than anyone wrote down. The analyst was not a bottleneck. They were a filter and the business translator for everyone downstream.

Remove the analyst, and the problem changes shape. An agent receives a question, picks one definition on its own, and runs it across thousands of queries before anyone notices the answer is wrong. This is not a model problem. Every company is running on roughly the same foundational models, and a more capable model still reasons correctly against the wrong definition of revenue. The model knows what revenue means in general. It does not know what your revenue means.

Context has to be curated, not dumped

The instinct is to fix this by handing the agent everything: every table, every definition, every piece of metadata in the organization. That is the wrong approach. Too little context and the agent fills the gap with a guess. Too much and it gets distracted by noise, deprecated tables, and conflicting sources. The right amount of context is curated, specific, and certified. Precision is what makes an agent trustworthy instead of confidently wrong.
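As a rough sketch of what curation could look like in practice, the snippet below filters a catalog down to certified, non-deprecated entries for a single domain before anything reaches the agent. The `ContextEntry` shape, the catalog contents, and `curate_context` are assumptions made for illustration, not a description of any product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEntry:
    # Hypothetical catalog record, for illustration only.
    name: str
    domain: str
    certified: bool
    deprecated: bool


def curate_context(entries, domain, limit=5):
    """Return at most `limit` certified, non-deprecated entries for one domain."""
    selected = [
        e for e in entries
        if e.domain == domain and e.certified and not e.deprecated
    ]
    # Sort by name so the agent sees the same context on every run.
    return sorted(selected, key=lambda e: e.name)[:limit]


CATALOG = [
    ContextEntry("recognized_revenue", "finance", certified=True, deprecated=False),
    ContextEntry("revenue_v1_old", "finance", certified=True, deprecated=True),
    ContextEntry("revenue_scratch", "finance", certified=False, deprecated=False),
    ContextEntry("booked_arr", "sales", certified=True, deprecated=False),
]

for entry in curate_context(CATALOG, domain="finance"):
    print(entry.name)
```

Of the three finance entries, only the certified, current one survives. The deprecated table and the uncertified scratch table never enter the agent's context, so it cannot be distracted by them.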

How Starburst can help

This is a problem Starburst has solved before, for human consumers, under a different name. Data products are the curated, trusted layer that serves agreed definitions to everyone downstream, defined once and reused across every team that touches them.

What changes now is that AI is both a new consumer of those products and a new producer of them, which raises the bar. Definitions can no longer sit in a wiki page or an analyst’s head. They have to travel with the data, encoded rather than remembered.

Solving the data access problem

Access to business context is what separates AI that informs from AI you can act on. An enterprise context layer does three things. It holds the structure of your definitions, metrics, and rules, written the way your business works. It adds the meaning that connects them, so agents can reason across domains rather than look up isolated facts. And it applies governance at the moment context is served, so every answer stays auditable and traceable back to the person who certified the definition behind it. The goal is not capability but trustworthiness at scale.
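The three responsibilities above can be sketched in a few lines. This is a toy model under stated assumptions: `Definition`, `ContextLayer`, and the certifier and requester names are hypothetical, and a real context layer would sit on a governed catalog rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Definition:
    # Structure: a named business rule (hypothetical shape).
    name: str
    rule: str
    certified_by: Optional[str] = None


class ContextLayer:
    def __init__(self):
        self._definitions = {}  # structure: definitions by name
        self._links = {}        # meaning: how definitions relate to each other
        self.audit_log = []     # governance: who was served what, and who certified it

    def register(self, definition, related=()):
        self._definitions[definition.name] = definition
        self._links[definition.name] = tuple(related)

    def serve(self, name, requester):
        """Serve a definition only if certified, recording the request."""
        definition = self._definitions[name]  # raises KeyError if unknown
        if definition.certified_by is None:
            raise PermissionError(f"{name} has no certifier and cannot be served")
        self.audit_log.append({
            "served": name,
            "requester": requester,
            "certified_by": definition.certified_by,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return {"definition": definition, "related": self._links[name]}


layer = ContextLayer()
layer.register(
    Definition("recognized_revenue", "revenue earned in the period", "finance-lead"),
    related=("booked_arr",),
)
layer.register(Definition("booked_arr", "annualized value of signed contracts"))
```

Calling `layer.serve("recognized_revenue", requester="agent-7")` returns the definition with its related terms and appends an audit record naming the certifier. Asking for `booked_arr`, which nobody has certified in this sketch, raises `PermissionError` instead of letting the agent proceed on an unvetted rule.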

The data foundation for AI has to be fast and federated

How do we get there? The answer starts with data access.

Context is only useful if you can reach the data without moving all of it first. Enterprise data has never lived in one place. It spans clouds, operational systems, and regulated environments, and Gartner expects 90 percent of organizations to adopt hybrid cloud by 2027. Forcing everything into one rigid platform creates friction at the exact moment AI demands flexibility.

A flexible data foundation lets you query data in place and move only what you choose. That foundation has to perform, too. Starburst now delivers up to 2x faster query performance than open-source Trino while supporting up to 180 concurrent queries under the same conditions, and the Icehouse pairs the openness of the lakehouse with the federation and performance enterprises need at scale. Access in place, governed context, and speed are not separate initiatives. They are the same foundation seen from three angles.
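To illustrate the idea of querying data in place, here is a toy stand-in that uses Python's built-in `sqlite3` module: two separately attached in-memory databases play the role of two source systems, and one query joins them without first copying either into a shared store. This is not Starburst or Trino code, and the `crm` and `billing` names, tables, and rows are invented for the example.

```python
import sqlite3


def build_sources():
    """Create one connection with two independent in-memory databases attached."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a separate database, standing in
    # for a separate source system.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    conn.execute("CREATE TABLE crm.accounts (account_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (account_id INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO crm.accounts VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 100), (1, 250), (2, 75)],
    )
    return conn


def invoiced_by_account(conn):
    """Join across both sources in a single query and total invoices per account."""
    return conn.execute(
        """
        SELECT a.name, SUM(i.amount)
        FROM crm.accounts AS a
        JOIN billing.invoices AS i ON i.account_id = a.account_id
        GROUP BY a.name
        ORDER BY a.name
        """
    ).fetchall()


conn = build_sources()
print(invoiced_by_account(conn))
```

The point of the sketch is the shape of the query: the caller addresses each source by name and the engine resolves the join, so only the result set moves.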

From answers to action

Once the data is accessible and the context is certified, the payoff is analytics that act rather than merely inform. AIDA reasons against your governed data products and explains its logical path, instead of guessing at user intent the way a generic text-to-SQL tool does. That guessing is what drives hallucination in a real business environment, and curated context is what removes it.

It also closes a distance that traditional dashboards never did, namely, the gap between knowing something and doing something about it. A dashboard surfaces a number and then hands the interpretation, the decision, and the action back to a human. The structured data and context layer underneath AIDA are what let the system carry that work forward, in a governed and auditable way, rather than stopping at a chart.

How do you take the next step?

None of this happens automatically. The leap has to be engineered, and the structured data underneath it is the ground truth that decides whether everything above it holds. The approach is to stop waiting for a backend overhaul and start activating the data foundation you already have as a context layer for production enterprise AI.

Starburst is built to solve the context layer problem. Try Starburst for free, and begin imagining how to put your data foundation to work.
