How to Build Data Products

A guide to development for AI workflows

October 29, 2025

Evan Smith

Technical Content Manager

Starburst Data



Why Enterprise AI Success Comes Down to Data Access

You’re under pressure to deliver data to fuel AI-driven business decision-making. And quite often, those pressures feel competing and contradictory.

In software, we say, “fast, good, or cheap – pick two.”

This is exactly the problem faced by those preparing AI projects. The data in question needs to be high-volume and drawn from across the organization. It has to be accurate. And, more than anything, data for AI has to be governed. Without strong data governance, your AI projects won’t ever see the light of day. Noncompliance will make them a nonstarter.

Pick two? You can’t, because you need all three. That is the problem facing AI adoption: everything needs to be working correctly for AI to provide real value.

The good news is that you don’t have to compromise on data access, governance, or compliance. Data products provide a way to package and ship datasets that simplifies governing large volumes of data.

Even better, AI makes data products easier to build than ever. Using AI to build the data products that feed AI creates a virtuous circle, fuelled by AI innovation.

In this article, I’ll show you why data products are necessary for AI, how to develop them for AI, and how to develop them with AI.

Data products: A proactive approach to data governance

A data product is a packaged, reusable data asset that includes comprehensive metadata, clear data lineage, and domain context.

I always say, “data products apply product thinking to data”. This means that instead of being “just a table,” data products treat data a bit like a SaaS product. They’re self-describing, versioned, and show clear data lineage.

All of this makes data products well suited for data management in AI workflows. In this paradigm, data producers can manage changes in release cycles to ensure quality and prevent downstream breakages. Meanwhile, data consumers can see clearly where the data has come from, who the data product owner is, and what it means in context.
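To make this concrete, here is a minimal sketch of the kind of self-describing record a data product might carry: a version, an accountable owner, a domain, lineage, and documented columns. The class and field names (`DataProduct`, `Column`, `sources`, and so on) are hypothetical and not taken from any particular platform; real catalogs, including Starburst’s, define their own schemas.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    name: str
    type: str
    description: str


@dataclass(frozen=True)
class DataProduct:
    # Hypothetical descriptor: every field here is illustrative, not a real catalog schema.
    name: str
    version: str              # release version, e.g. "1.2.0"
    owner: str                # accountable data product owner
    domain: str               # business domain that gives the data its context
    description: str
    sources: tuple[str, ...]  # upstream tables or products (lineage)
    columns: tuple[Column, ...] = field(default_factory=tuple)

    def summary(self) -> str:
        """Return a one-line, human-readable description for consumers."""
        cols = ", ".join(c.name for c in self.columns)
        return (f"{self.name} v{self.version} (owner: {self.owner}, domain: {self.domain}) "
                f"built from {', '.join(self.sources)}; columns: {cols}")


orders = DataProduct(
    name="customer_orders",
    version="1.2.0",
    owner="sales-data-team@example.com",
    domain="sales",
    description="One row per order, joined to customer region.",
    sources=("crm.customers", "erp.orders"),
    columns=(
        Column("order_id", "bigint", "Unique order identifier"),
        Column("region", "varchar", "Customer sales region"),
    ),
)
print(orders.summary())
```

Making the record immutable (`frozen=True`) reflects the idea that a published version should not change in place; a change means a new release.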

With data products, data governance isn’t reactive. It’s proactive. You don’t sit around and wait for data consumers to break something. Rather, you package and roll out dataset changes in a deliberate, planned manner.
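One way to make releases deliberate is to classify each schema change before publishing it. The sketch below borrows semantic versioning as an assumed convention: removing or retyping a column is a breaking change that requires a major version bump, while adding a column is a minor bump. The function names `classify_change` and `next_version` are hypothetical.

```python
def classify_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Compare two schemas ({column: type}) and return 'major', 'minor', or 'none'."""
    removed = old.keys() - new.keys()
    retyped = {c for c in old.keys() & new.keys() if old[c] != new[c]}
    added = new.keys() - old.keys()
    if removed or retyped:
        return "major"   # existing consumers may break
    if added:
        return "minor"   # additive, backward-compatible
    return "none"


def next_version(version: str, change: str) -> str:
    """Bump a 'MAJOR.MINOR.PATCH' version string according to the change class."""
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    return version


v1 = {"order_id": "bigint", "region": "varchar"}
v2 = {"order_id": "bigint", "region": "varchar", "channel": "varchar"}
v3 = {"order_id": "varchar", "channel": "varchar"}

print(classify_change(v1, v2), next_version("1.2.0", classify_change(v1, v2)))  # minor 1.3.0
print(classify_change(v2, v3), next_version("1.3.0", classify_change(v2, v3)))  # major 2.0.0
```

A check like this could run as a gate in the release pipeline, so a breaking change cannot ship under a minor version number.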


How data products both leverage AI and help build it

Now, this is the part where things get interesting. Turning raw data into data products requires more than just purchasing a set of tools.

It requires changing how your company works with data at an organizational level.

There are two distinct workflows here:

Developing data products for AI

This involves changing the way you approach working with data, where it’s stored, and how you govern it.

Developing data products with AI

This involves operationalizing the creation and maintenance of data products, using AI agents to streamline delivery.

Let’s look at each in turn.

How to develop data products for AI

Building data products for AI should center on three core principles. Let’s look at how each of those plays out.

Automate data governance

There is one universal truth about AI: it is only as good as the data it uses.

Bad inputs lead to bad outputs. This is true of all technology, but it’s perhaps especially true of AI. Here, context is king. And more specifically, data quality is king. AI’s unique nature makes data quality more important than ever.

There are a few reasons for this:

AI is probabilistic, not deterministic. That means it’s not always easy (or even possible) to trace answers back to the data that produced them. This “black box” nature sets AI apart from traditional data analytics and complicates data governance.
The unique security challenges of GenAI make data control critical. Prompt injection attacks, data poisoning, and the use of “Shadow AI” can compromise sensitive data if not strictly monitored and controlled.
The data that feeds AI is spread across multiple distributed data systems. For most companies, a strictly centralized approach to compliance won’t work.

These challenges can’t be overcome purely with manual processes. They require an agile, federated, and automated approach to data governance. In practice, this means building automatic policy enforcement into your data workflows, layered on top of manual review gates.
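As a rough illustration of automatic policy enforcement, the sketch below masks any column tagged as sensitive unless the caller holds a role allowed to see that tag. The tag names, roles, and the `enforce` function are hypothetical; production platforms typically enforce policies like this inside the query engine rather than in application code.

```python
# Hypothetical policy: which roles may see columns carrying each tag.
POLICIES = {"pii": {"privacy_officer"}, "financial": {"finance", "privacy_officer"}}

# Hypothetical column tags for one data product.
COLUMN_TAGS = {"email": {"pii"}, "revenue": {"financial"}, "region": set()}


def enforce(rows: list[dict], roles: set[str]) -> list[dict]:
    """Return copies of rows with every column the caller may not see replaced by '***'."""
    def allowed(column: str) -> bool:
        # Visible only if the caller's roles satisfy every tag on the column.
        # Columns with no tags (or not listed) are visible to everyone in this sketch.
        return all(POLICIES[tag] & roles for tag in COLUMN_TAGS.get(column, set()))

    return [{col: (val if allowed(col) else "***") for col, val in row.items()} for row in rows]


rows = [{"email": "ana@example.com", "revenue": 1200, "region": "EMEA"}]
print(enforce(rows, {"analyst"}))  # [{'email': '***', 'revenue': '***', 'region': 'EMEA'}]
print(enforce(rows, {"finance"}))  # [{'email': '***', 'revenue': 1200, 'region': 'EMEA'}]
```

Because the rule is attached to tags rather than to individual tables, a newly tagged column is covered by the policy without anyone writing a new rule for it.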

Maintain universal access, centralize selectively

In the past, companies tried to make large volumes of data accessible by centralizing it.

Chances are you’ve gone down this path, too.

The problem? It doesn’t work. Indiscriminate centralization is expensive and produces months-long delays, if not outright failure.

A better approach is centralization by choice.

Instead of centralizing by default, you centralize the data you want and leave the rest where it is. Over time, you can shift workloads incrementally using data products, adding more data sources as it makes sense for you. Instead of your data architecture driving you, you drive your data architecture.
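The idea of querying data where it lives, rather than copying it into a central store first, can be sketched with Python’s built-in `sqlite3` module. Attaching a second database lets one query join tables across both without moving any rows. This is only a small-scale analogy for what a federated query engine does across real systems; the database and table names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the "central" store: data you chose to centralize
# A second, separate database left where it is; ATTACH makes it queryable without copying rows.
# (SQLite refuses ATTACH inside an open transaction, so it runs before any inserts.)
conn.execute("ATTACH DATABASE ':memory:' AS erp")

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EMEA"), (2, "AMER")])

conn.execute("CREATE TABLE erp.orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO erp.orders VALUES (?, ?, ?)",
                 [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)])

# One query spans both databases.
totals = conn.execute("""
    SELECT c.region, SUM(o.amount) AS total
    FROM customers AS c
    JOIN erp.orders AS o ON o.customer_id = c.id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(totals)  # [('AMER', 75.0), ('EMEA', 350.0)]
```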

This has many benefits.

Enable cross-team collaboration

Better collaboration is one of the major benefits of using data products to improve data access. The traditional problem with distributed data is that some of it is so scattered that no one knows how to find it.

These data silos are disconnected islands adrift from the mainland of your corporate data estate. As a result, the data within them is often fractured, hard to use, and of dubious quality.

Data products break down data silos by being interoperable and encouraging collaboration. They foster an environment where teams can share their data, consume and build upon others’ work, and implement their own use cases. This means greater data reuse and less friction.

Because they’re self-documenting, data products solve many of the governance issues created by data silos. Consumers can see exactly where data is from, when it was last refreshed, and who owns it.
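Because each data product records its sources, questions like “where does this data come from?” can be answered mechanically. The sketch below walks a hypothetical lineage map breadth-first to list everything a product ultimately draws from; the product and table names are illustrative.

```python
from collections import deque

# Hypothetical lineage map: each data product or table -> its direct upstream sources.
LINEAGE = {
    "revenue_forecast": ["customer_orders", "marketing_spend"],
    "customer_orders": ["crm.customers", "erp.orders"],
    "marketing_spend": ["ads.campaigns"],
}


def upstream(product: str) -> list[str]:
    """Return every transitive upstream source of `product`, in breadth-first order."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(LINEAGE.get(product, []))
    while queue:
        source = queue.popleft()
        if source in seen:
            continue  # shared upstream sources (or cycles) are visited only once
        seen.add(source)
        order.append(source)
        queue.extend(LINEAGE.get(source, []))
    return order


print(upstream("revenue_forecast"))
# ['customer_orders', 'marketing_spend', 'crm.customers', 'erp.orders', 'ads.campaigns']
```

The same traversal run in the opposite direction (from a source to its consumers) tells a producer which products a change would affect.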

Put another way, data products improve governance by fostering accountability.

Developing data products with AI

The volume of data your company handles exceeds your capacity to process it manually.

That’s why automation in data governance is key. Without automation, you can’t scale.

But what does that mean for producing new data products?

Data products require rich metadata and documentation to work effectively. That’s especially true if the data is feeding AI. But building out that metadata itself takes time.

Thankfully, you can leverage AI to help build AI.

Documenting data products

Generating the metadata for new data products 100% manually requires time that no one has.

You can significantly reduce this time by using an AI agent. The agent takes your existing documentation and data and uses them to generate a first draft of your metadata.

After this automatic pass, a human expert who knows the data domain can make the revisions needed to ship a production-grade data product. Once it’s published, the user community can further correct and refine the metadata to improve discoverability and accuracy.

Importantly, all of this happens within a secure workflow governed by strict access controls. All changes are vetted before you release them to production, and only authorized personnel can make and validate changes to metadata.
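The draft, review, and publish flow can be modeled as a small state machine. In this sketch the AI-generated draft is passed in as plain data, since the agent itself is out of scope; the `MetadataDraft` class, its status names, and the `AUTHORIZED` set are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical set of users allowed to change and approve metadata.
AUTHORIZED = {"maria"}


@dataclass
class MetadataDraft:
    product: str
    descriptions: dict[str, str]  # column -> description, e.g. an AI agent's first pass
    status: str = "draft"
    history: list[str] = field(default_factory=list)

    def revise(self, editor: str, column: str, text: str) -> None:
        """Record a human edit to a column description; only drafts can be revised."""
        if editor not in AUTHORIZED:
            raise PermissionError(f"{editor} is not authorized to edit metadata")
        if self.status != "draft":
            raise ValueError(f"cannot revise metadata in status {self.status!r}")
        self.descriptions[column] = text
        self.history.append(f"{editor} revised {column}")

    def approve(self, reviewer: str) -> None:
        """Move a draft to 'approved'; only an authorized reviewer may do this."""
        if reviewer not in AUTHORIZED:
            raise PermissionError(f"{reviewer} is not authorized to approve metadata")
        if self.status != "draft":
            raise ValueError(f"cannot approve metadata in status {self.status!r}")
        self.status = "approved"
        self.history.append(f"{reviewer} approved")

    def publish(self) -> None:
        """Release approved metadata to consumers."""
        if self.status != "approved":
            raise ValueError("metadata must be approved before publishing")
        self.status = "published"
        self.history.append("published")


# Stand-in for the agent's rough cut; a domain expert then refines it.
draft = MetadataDraft("customer_orders", {"region": "Region code"})
draft.revise("maria", "region", "Customer sales region (EMEA, AMER, or APAC)")
draft.approve("maria")
draft.publish()
print(draft.status)  # published
```

Keeping an append-only `history` alongside the status gives reviewers an audit trail of who changed what before release.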

Searching data products

Data isn’t useful if no one can find it.

Finding data generally requires a basic knowledge of SQL. But not all data stakeholders are SQL-conversant. Even those who are may not be comfortable with advanced features, such as complex joins or aggregate and window functions.

Generative AI large language models (LLMs) excel at parsing existing human language. They can also generate new text, including code.

By leveraging LLMs, an AI agent can enable natural language searches of metadata-rich data products. Users can describe what they want to see. The LLM translates this into syntactically correct SQL code automatically.
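A common pattern, sketched here under stated assumptions, is to give the LLM the data product metadata as context and then check the SQL it returns before running it. The model call itself is omitted: `build_prompt` and `validate_sql` are hypothetical helpers, and the query string below stands in for model output. SQLite’s `EXPLAIN` compiles a statement without running it, which catches syntax errors and references to tables or columns that don’t exist.

```python
import sqlite3


def build_prompt(question: str, products: dict[str, dict[str, str]]) -> str:
    """Assemble schema context from data product metadata for an LLM (call not shown)."""
    lines = ["Answer with a single read-only SQLite SELECT statement.", "Available data products:"]
    for table, columns in products.items():
        cols = "; ".join(f"{name}: {desc}" for name, desc in columns.items())
        lines.append(f"- {table} ({cols})")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def validate_sql(conn: sqlite3.Connection, sql: str) -> bool:
    """Accept only a single SELECT that compiles against the real schema."""
    body = sql.strip().rstrip(";")
    if ";" in body or not body.lower().startswith("select"):
        return False  # naive guard: one statement, read-only
    try:
        conn.execute(f"EXPLAIN {body}")  # compiles the query without executing it
    except sqlite3.Error:
        return False  # syntax error, unknown table, or unknown column
    return True


products = {"customer_orders": {"order_id": "unique order id", "region": "sales region",
                                "amount": "order value in USD"}}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_orders (order_id INTEGER, region TEXT, amount REAL)")

prompt = build_prompt("Total sales by region?", products)  # would be sent to the LLM
candidate = "SELECT region, SUM(amount) FROM customer_orders GROUP BY region"  # stand-in output
print(validate_sql(conn, candidate))                               # True
print(validate_sql(conn, "SELECT revenue FROM customer_orders"))  # False: no such column
print(validate_sql(conn, "DROP TABLE customer_orders"))           # False: not read-only
```

The richer the metadata in the prompt, the more context the model has for choosing the right tables and columns; the validation step is a safety net, not a guarantee that the query answers the question.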

Natural language search eliminates the technical barrier to working with data. Executives, analysts, and data engineers alike can leverage AI to get the business insights they need immediately, without futzing over SQL syntax.

Simplifying data product development and maintenance

With data products, you don’t treat data as an undifferentiated sea of information. Instead, you treat data as a product.

Each product is a shippable production unit, with its own metadata, interface, and lifecycle.
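One way to give a data product a stable interface is to expose it through a view, so the underlying storage can change without breaking consumers. This `sqlite3` sketch renames a storage column while the hypothetical `customer_orders_v1` view keeps serving the original column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_raw (id INTEGER, rgn TEXT)")
conn.execute("INSERT INTO orders_raw VALUES (1, 'EMEA')")

# The published interface: consumers query the view, never the raw table.
conn.execute("CREATE VIEW customer_orders_v1 AS SELECT id AS order_id, rgn AS region FROM orders_raw")

# Internal refactor: rename the storage column, then redefine the view behind the same interface.
conn.execute("DROP VIEW customer_orders_v1")
conn.execute("ALTER TABLE orders_raw RENAME COLUMN rgn TO sales_region")
conn.execute("CREATE VIEW customer_orders_v1 AS "
             "SELECT id AS order_id, sales_region AS region FROM orders_raw")

print(conn.execute("SELECT order_id, region FROM customer_orders_v1").fetchall())  # [(1, 'EMEA')]
```

Consumers keep querying `customer_orders_v1` unchanged; a change they would notice belongs in a new, separately versioned interface.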

By treating data as a product, you can deliver more data changes with higher quality, greater frequency, and improved scalability.

AI and data products belong together

By leveraging AI to create data products, you can significantly reduce the time needed to generate business value and make data-driven decisions.

This is more than a technological shift. It’s a culture shift.

However, choosing tools that integrate seamlessly into your existing data architecture can shorten your journey.

Starburst + data products + AI

AI is changing the world, and data products are part of that change. They provide the data governance needed to make AI workloads function as you envision them, and you can use AI to build them, too.

Starburst is here to help. We’ve built our product to make it easy to build scalable data products for AI.

With Starburst, you can:

Leverage data sources from across your organization and publish them in a secure, governed manner with strict access controls
Build data products iteratively and collaboratively
Scale easily with automated data governance
Cut data product development time significantly with the Starburst AI Agent

To learn more about how to ship data products for AI easily, schedule a call with us today.
