
You’re under pressure to deliver data to fuel AI-driven business decision-making. And quite often, the demands placed on that data feel competing and contradictory.
In software, we say, “fast, good, or cheap – pick two.”
This is exactly the problem faced by those preparing AI projects. The data in question needs to be high-volume and drawn from across the organization. It has to be accurate. And, more than anything, data for AI has to be governed. Without strong data governance, your AI projects won’t ever see the light of day. Noncompliance will make it a nonstarter.
Pick two? You can’t, because you need all three. That is the problem facing AI adoption. Everything needs to be working correctly for it to provide real value.
The good news is that you don’t have to compromise on data access, governance, or compliance. Data products provide a method for packaging and shipping datasets that simplifies the governance of large volumes of data.
Even better? Thanks to AI, data products are easier to build than ever. You can build them using AI, creating a virtuous circle, fuelled by AI innovation.
In this article, I’ll show you why data products are necessary for AI, and how to develop them for AI technology.
Data products: A proactive approach to data governance
A data product is a packaged, reusable data asset that includes comprehensive metadata, clear data lineage, and domain context.
I always say, “data products apply product thinking to data.” This means that instead of being “just a table,” a data product is treated a bit like a SaaS product. It’s self-describing, versioned, and shows clear data lineage.
All of this makes data products strongly suited for data management in AI workflows. In this paradigm, data producers can manage changes in release cycles to ensure quality and prevent downstream breakages. Meanwhile, data consumers can see clearly where the data has come from, who the data product owner is, and what it means in context.
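To make the idea concrete, here is a minimal sketch of a data product descriptor and a release check, written in Python. The `DataProduct` fields and the `breaking_changes` and `release_is_safe` helpers are hypothetical names chosen for illustration, not part of any particular product or standard. The sketch assumes semantic versioning, where a breaking change calls for a new major version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Illustrative descriptor; the field names are assumptions, not a standard."""
    name: str
    version: str          # semantic version string, e.g. "1.2.0"
    owner: str            # the accountable data product owner
    description: str      # domain context for data consumers
    schema: dict          # column name -> type name
    lineage: tuple = ()   # upstream sources this product is built from


def breaking_changes(old, new):
    """List schema changes that would break existing consumers."""
    problems = []
    for column, col_type in old.schema.items():
        if column not in new.schema:
            problems.append(f"removed column: {column}")
        elif new.schema[column] != col_type:
            problems.append(
                f"type change on {column}: {col_type} -> {new.schema[column]}"
            )
    return problems


def release_is_safe(old, new):
    """A release is safe if nothing breaks, or the major version was bumped."""
    old_major = int(old.version.split(".")[0])
    new_major = int(new.version.split(".")[0])
    return not breaking_changes(old, new) or new_major > old_major
```

A producer could run a check like this before each release. Adding a column passes, while dropping or retyping one fails unless the release is marked as a new major version, which gives consumers a clear signal before anything breaks.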
With data products, data governance isn’t reactive. It’s proactive. You don’t sit around and wait for data consumers to break something. Rather, you package and roll out dataset changes in a deliberate, planned manner.
The video below walks through some of the basics around data products.
How data products both leverage AI and help build it
Now, this is the part where things get interesting. Turning raw data into data products requires more than just purchasing a set of tools.
It requires changing how your company works with data at an organizational level.
There are two distinct workflows here:
- Developing data products for AI: This involves changing the way you approach working with data, where it’s stored, and how you govern it.
- Developing data products with AI: This involves operationalizing the creation and maintenance of data products, using AI agents to streamline delivery.
Let’s look at each one by one.
How to develop data products for AI
Building data products for AI should center on three core principles. Let’s look at how each of those plays out.
Automate data governance
The universal truth about AI is that AI is only as good as the data it uses.
Bad inputs lead to bad outputs. This is true of all technology, but it’s especially true of AI. Here, context is king, and the quality of that context depends on the quality of your data. AI’s unique nature makes data quality more important than ever.
There are a few reasons for this:
- AI is probabilistic, not deterministic. That means it’s not always easy (or indeed possible) to trace answers back to the data that produced them. This “black box” nature of AI differs from data analytics and complicates data governance.
- The unique security challenges of GenAI make data control critical. Prompt injection attacks, data poisoning, and the use of “Shadow AI” can compromise sensitive data if not strictly monitored and controlled.
- The data that feeds AI is spread across multiple distributed data systems. For most companies, a strictly centralized approach to compliance won’t work.
These challenges can’t be overcome purely with manual processes. They require an agile, federated, and automated approach to data governance. This means implementing automatic policy enforcement on top of manual gating built into your data workflows.
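As a rough illustration of automatic policy enforcement, the Python sketch below filters the columns a role may read based on tags attached to each column. The `Column` class, the `POLICY` mapping, the tag names, and the role names are all assumptions made up for this example. In practice this logic would live in your governance platform rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    tags: set = field(default_factory=set)  # governance tags, e.g. {"pii"}


# Hypothetical policy: which roles may read columns carrying each tag.
# Tags that do not appear here are treated as unrestricted.
POLICY = {
    "pii": {"compliance", "data_steward"},
}


def can_read(column, role, policy=POLICY):
    """A role may read a column only if every restricted tag permits it."""
    return all(role in policy[tag] for tag in column.tags if tag in policy)


def allowed_columns(columns, role, policy=POLICY):
    """Return the names of the columns this role is allowed to see."""
    return [col.name for col in columns if can_read(col, role, policy)]
```

The point of the sketch is that the rule is written once and applied on every request, so enforcement doesn’t depend on someone remembering to check. Manual gating can then focus on the cases the automated rules can’t decide.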
Maintain universal access, centralize selectively
In the past, companies tried to make large volumes of data accessible by centralizing it.
Chances are you’ve gone down this path, too.
The problem? It doesn’t work. Indiscriminate centralization is expensive and produces months-long delays, if not outright failure.
A better approach is centralization by choice.
Instead of default centralization, you centralize the data you want and leave the other data where it is. Over time, you can taper workloads using data products, adding more data sources as it makes sense for you. Instead of your data architecture driving you, you drive your data architecture.
This has many benefits.
Enable cross-team collaboration
Better collaboration is one of the major benefits of using data products to improve data access. The traditional problem with distributed data is that some data is so distributed that no one knows how to find it.
These data silos are disconnected islands adrift from the mainland of your corporate data estate. As a result, the data within them is often fractured, hard to use, and of dubious quality.
Data products break down data silos by being interoperable and encouraging collaboration. They foster an environment where teams can share their data, consume and build upon others’ work, and implement their own use cases. This means greater data reuse and less friction.
Because they’re self-documenting, data products solve many of the governance issues created by data silos. Consumers can see exactly where data is from, when it was last refreshed, and who owns it.
Put another way, data products improve governance by fostering accountability.
Developing data products with AI
The volume of data your company handles exceeds your capacity to process it manually.
That’s why automation in data governance is key. Without automation, you can’t scale.
But what does that mean for producing new data products?
Data products require rich metadata and documentation to work effectively. That’s especially true if the data is feeding AI. But building out that metadata itself takes time.
Thankfully, you can leverage AI to help build AI.
Documenting data products
Generating the metadata for new data products 100% manually requires time that no one has.
You can significantly reduce this time by leveraging an AI agent. The agent takes your existing documentation and data, using it to generate an initial rough cut of your metadata.
After an automatic pass, a human expert who’s knowledgeable in the data domain can make the revisions necessary to ship a production-grade data product. Once published, the user community can further correct and refine this data to increase discoverability and accuracy.
Importantly, all of this happens within a secure workflow governed by strict access controls. All changes are vetted before you release them to production. Only authorized personnel can make and validate changes to metadata.
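The draft, review, and publish steps described above can be sketched as a small state machine in Python. Everything here is hypothetical: `generate_draft` is a stand-in for the AI agent and simply truncates the source text instead of calling a model, and `AUTHORIZED_REVIEWERS` stands in for whatever access control system you actually use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    PUBLISHED = "published"


@dataclass
class MetadataDraft:
    product: str
    description: str
    status: Status = Status.DRAFT
    reviewed_by: Optional[str] = None


# Hypothetical allow-list standing in for real access controls.
AUTHORIZED_REVIEWERS = {"alice"}


def generate_draft(product, source_docs):
    """Stand-in for the AI agent: produce a rough first cut from existing docs."""
    summary = " ".join(source_docs)[:200]
    return MetadataDraft(product=product, description=summary)


def approve(draft, reviewer, revised_description=None):
    """Only an authorized domain expert can revise and approve a draft."""
    if reviewer not in AUTHORIZED_REVIEWERS:
        raise PermissionError(f"{reviewer} is not authorized to approve metadata")
    if revised_description is not None:
        draft.description = revised_description
    draft.status = Status.APPROVED
    draft.reviewed_by = reviewer
    return draft


def publish(draft):
    """Refuse to publish anything that has not been through review."""
    if draft.status is not Status.APPROVED:
        raise ValueError("only approved metadata can be published")
    draft.status = Status.PUBLISHED
    return draft
```

The design choice worth noting is that `publish` checks the status itself. An automatically generated draft can never reach production without passing through `approve` first.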
Searching data products
Data isn’t useful if no one can find it.
Finding data generally requires a basic knowledge of SQL. But not all data stakeholders are SQL-conversant. Even if they are, they might not be in a position to use advanced features, such as complex joins or aggregate and window functions.
GenAI Large Language Models (LLMs) excel at parsing existing human language artifacts. They can also create their own new, unique text artifacts, including code.
By leveraging LLMs, an AI agent can enable natural language searches of metadata-rich data products. Users can describe what they want to see. The LLM translates this into syntactically correct SQL code automatically.
Natural language search eliminates the technical barrier to working with data. Executives, analysts, and data engineers alike can leverage AI to get the business insights they need immediately, without futzing over SQL syntax.
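Here is one way the natural language flow could be wired together, sketched in Python. It shows why rich metadata matters: the data product descriptions and schemas become the context the LLM works from. The `build_sql_prompt` and `question_to_sql` functions are hypothetical, and `complete` is an assumed callable that sends a prompt to whichever LLM you use and returns its text response. No specific model API is implied.

```python
from typing import Callable


def build_sql_prompt(question: str, products: list) -> str:
    """Assemble a prompt that grounds the LLM in data product metadata."""
    lines = [
        "Translate the question into one SQL query.",
        "Available data products:",
    ]
    for product in products:
        columns = ", ".join(
            f"{name} {col_type}" for name, col_type in product["schema"].items()
        )
        lines.append(f"- {product['name']} ({columns}): {product['description']}")
    lines.append(f"Question: {question}")
    lines.append("SQL:")
    return "\n".join(lines)


def question_to_sql(question: str, products: list,
                    complete: Callable[[str], str]) -> str:
    """Ask the LLM for SQL, then apply a simple read-only guardrail."""
    sql = complete(build_sql_prompt(question, products)).strip()
    # Guardrail (an assumption of this sketch): accept only SELECT statements.
    if not sql.lower().startswith("select"):
        raise ValueError("expected a read-only SELECT statement")
    return sql
```

Generated SQL should still run under the user’s own permissions, so the access controls on the underlying data products apply no matter what the model produces.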
Simplifying data product development and maintenance
With data products, you don’t treat data as an undifferentiated sea of information. Instead, you treat data as a product.
Each product is a shippable production unit, with its own metadata, interface, and lifecycle.
By treating data as a product, you can deliver more data changes with higher quality, greater frequency, and improved scalability.
AI and data products belong together
By leveraging AI to create data products, you can significantly reduce the time needed to generate business value and make data-driven decisions.
This is more than a technological shift. It’s a culture shift.
However, choosing tools that integrate seamlessly into your existing data architecture can shorten your journey.
Starburst + data products + AI
AI is changing the world, and data products are part of that change. They provide the necessary data governance to make AI workloads actually function as you envision them, and you can leverage AI to make them as well.
Starburst is here to help. We’ve built our product to make it easy to build scalable data products for AI.
With Starburst, you can:
- Leverage data sources from across your organization and publish them in a secure, governed manner with strict access controls
- Build data products iteratively and collaboratively
- Scale easily with automated data governance
- Cut data product development time significantly with the Starburst AI Agent
To learn more about how to ship data products for AI easily, schedule a call with us today.



