AI Needs Both Data Access and Data Governance

Unpacking two key ingredients in AI production success


What could topple all of your best-laid AI plans? The potential list is long. Implementing AI projects in enterprise environments is, after all, not always straightforward.

Coming to terms with the technological demands alone can seem like the main point of failure. But there’s far more to AI than technology. In fact, context has rapidly emerged as the key bottleneck in AI success. And context is really a connected problem made up of two factors:

  • Poor data access
  • Lack of data governance   

In this article, I’ll explain why data access is the best pathway to data governance and why this requires an organizational shift in mindset, then outline the four principles you can use to make your data available and governable for AI without a massive data centralization effort.

Why universal data access is the bedrock of successful AI

The numbers are clear. Enterprises are attempting to adopt AI, but not always succeeding on the first try. In fact, according to an MIT report, 95% of generative AI projects are failing.

Despite this, AI remains a key priority for most businesses. As Bain reports, 74% of all businesses consider AI a top-three priority.

The role of data access in AI project success

One reason for this disconnect? A lack of access to data, and specifically a lack of access to governed, contextual data. 

Making AI workflows a reality requires a lot of data, including contextual data, and not all of it is easily accessible or well governed with traditional technology. For most organizations, making that data accessible also means making it governable, and that has traditionally meant moving it – a step that is notoriously difficult from a governance standpoint.

Because of this, data access and data governance quickly become parallel problems. This was already the case with analytics, and AI is only extending this need. 

Universal data access is achievable today

Luckily, there’s another approach that spares you the risk: universal data access. We’ve said it repeatedly: AI is only as good as the data you feed it. That means access to both high-volume and high-quality data. Importantly, this extends to data context.

Universal access ends the historic practice of indiscriminate data centralization and replaces it with choice. You can access data where it lives if that makes sense, or centralize it when a use case demands it. But centralization isn’t the default – it becomes a choice rather than a mandate.

AI data governance best practices to enable universal data access

What about data governance? All of this directly connects to the data governance needed for AI.

With universal data access, all organizational data is available via a centralized hub using distributed query tools. This approach enables federated computational governance, allowing distributed data governance that balances local control with centralized monitoring, audit capabilities, and access controls. In other words, data access runs in parallel with data governance.

How should you get started? Here are some practical jumping-off points. 

Federate for data exploration

Everyone starts with data exploration, and this task is perfectly suited to data federation. Again, this approach doesn’t mean that you should never centralize your data. It means that you should start with decentralization as the default during data discovery. Later, if your use case requires it, you can migrate and centralize your data in a modern table format such as Apache Iceberg for better performance, governance, and maintainability.
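To make the federation idea concrete, here is a minimal sketch in Python. It uses sqlite3’s `ATTACH` as a stand-in for a federated query engine such as Trino/Starburst: two independent “sources” are queried with one SQL statement, without first copying either into a central store. The table names and data are hypothetical.

```python
import sqlite3

# Two independent "sources": in a real deployment these would be separate
# systems (e.g. a warehouse and an operational database). sqlite's ATTACH
# merely stands in for a federated query engine here.
conn = sqlite3.connect(":memory:")          # source 1: "warehouse"
conn.execute("ATTACH ':memory:' AS lake")   # source 2: "data lake"

conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE lake.orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO lake.orders VALUES (?, ?)",
                 [(1, 250.0), (1, 100.0), (2, 75.0)])

# One query spans both sources without moving data into a single store first.
rows = conn.execute("""
    SELECT c.name, SUM(o.total) AS spend
    FROM customers c JOIN lake.orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY spend DESC
""").fetchall()
print(rows)  # [('Acme', 350.0), ('Globex', 75.0)]
```

The point is the query shape, not the engine: exploration happens where the data lives, and migration to a format like Iceberg comes later, only if the use case earns it.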

Adopt a hybrid AI data strategy

Next, think of all your data – both cloud data and on-premises data. On-premises data is hardly a thing of the past. If you work in a highly regulated industry – finance, health care, and the like – you likely have some, or even the bulk, of your data running in secure data centers under your direct management. You may also be bound by data sovereignty laws that prevent moving data across national borders.

When considering universal data access for AI, it’s important to adopt a hybrid AI data strategy that enables your data architecture to work with data that must remain within your local networks while maintaining regulatory compliance. This might include:

  • Hosting large language models (LLMs) on-premises rather than using remotely hosted cloud services.
  • Hosting retrieval-augmented generation (RAG) processes on-premises to keep sensitive data within the network boundary and maintain data protection.
  • Running AI agents and AI systems locally.
  • Using metadata and security controls to identify and limit access to sensitive data, implement role-based access, and establish proper safeguards.
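The last point – metadata-driven, role-based access – can be sketched in a few lines. Everything below is hypothetical (the tag names, roles, and fields are invented for illustration): columns are tagged with a sensitivity level, and a role’s clearance determines which fields it may see.

```python
# Hypothetical column-level metadata: each field carries a sensitivity tag,
# and a role-to-clearance map decides what a caller may see.
COLUMN_TAGS = {
    "patient_id": "restricted",
    "diagnosis": "restricted",
    "admission_date": "internal",
    "ward": "public",
}
ROLE_CLEARANCE = {
    "analyst": {"public", "internal"},
    "clinician": {"public", "internal", "restricted"},
}

def redact(record: dict, role: str) -> dict:
    """Return only the fields the role's clearance allows."""
    allowed = ROLE_CLEARANCE.get(role, {"public"})
    return {k: v for k, v in record.items() if COLUMN_TAGS.get(k) in allowed}

row = {"patient_id": "P-17", "diagnosis": "flu",
       "admission_date": "2024-05-01", "ward": "B"}
print(redact(row, "analyst"))   # {'admission_date': '2024-05-01', 'ward': 'B'}
```

In production this enforcement lives in the query engine or catalog, not in application code, but the shape is the same: sensitivity metadata plus role mappings yields a filtered view of the data.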

Enable AI to gather context in a compliant manner

The quality of data you feed to AI models makes all the difference. Part of that is making sure that the meaning and intended usage of tables and fields are clear and unambiguous – ensuring explainability in your AI systems.

To address this, review your data structures to ensure they’re clearly labeled and described. Leverage metadata to provide additional context around data’s origin and usage, including a full description of how each table and field functions and what its data is used for. You can also use AI agents to automate parts of this documentation work.

Another useful contextual tool is end-to-end data lineage. Data lineage traces the movement of data across your company from upstream producers through all downstream consumers, providing visibility into the full data lifecycle. Documenting the connections between data sources provides additional context for LLMs to discern when and how specific training data should be used, reducing AI risk and improving decision-making outputs.
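Lineage is, at heart, a graph walk. This small sketch (dataset names are invented) records each dataset’s direct upstream sources and then answers “where did this data come from?” by walking the graph transitively:

```python
# Minimal lineage graph: each dataset maps to its direct upstream sources.
LINEAGE = {
    "churn_features": ["crm.accounts", "billing.invoices"],
    "billing.invoices": ["erp.raw_invoices"],
}

def upstream(dataset: str) -> set:
    """All transitive upstream sources of a dataset."""
    sources = set()
    for parent in LINEAGE.get(dataset, []):
        sources.add(parent)
        sources |= upstream(parent)   # walk further up the graph
    return sources

print(sorted(upstream("churn_features")))
# ['billing.invoices', 'crm.accounts', 'erp.raw_invoices']
```

Real lineage tools capture these edges automatically from query logs and pipelines; the value for AI is the same either way – the model (or the human vetting its inputs) can see the full provenance of any dataset before relying on it.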

Use data products as your enabler

This all sounds great. But how do you implement it?

You still need a way to package, document, deploy, manage, and govern that data as it evolves over time. This is where data products come in.

A data product is a curated, reusable dataset engineered to deliver trusted, AI-ready data for downstream consumers, with associated metadata to make it easy to find and consume. It consists of three components:

  • Data
  • Metadata
  • Data access patterns
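The three components above can be sketched as a simple record type. This is illustrative only – the field names are hypothetical, not a Starburst API – but it shows how the data, its metadata, and its access patterns travel together as one governed unit:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """A curated dataset bundled with the context needed to find and use it."""
    name: str
    version: str                # versioned for backward compatibility
    data_location: str          # where the curated data lives
    metadata: dict = field(default_factory=dict)         # owner, description, lineage
    access_patterns: list = field(default_factory=list)  # e.g. SQL view, REST endpoint

product = DataProduct(
    name="customer_360",
    version="1.2.0",
    data_location="s3://lake/products/customer_360/",
    metadata={"owner": "crm-team", "description": "Unified customer profile"},
    access_patterns=["sql:marts.customer_360", "rest:/api/v1/customer-360"],
)
print(product.name, product.version)  # customer_360 1.2.0
```

Because the metadata and access patterns are part of the object itself, a catalog can index data products for discovery, and consumers never have to guess who owns the data or how to reach it.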

For data consumers, data products make it easy to find and use high-quality data culled from multiple sources. For data producers, they strengthen governance by improving data quality, security, integration, interoperability, collaboration, and transparency across workflows.


Once created, data products become their own self-describing data assets with built-in validation and auditability. They can be versioned to maintain backward compatibility with existing data workflows.

Additionally, other teams across the enterprise can find and use them, either as end products or as components in their own data products. Creating data products enables modularity, encourages reuse, and reduces duplicated effort across the enterprise – helping you operationalize AI technologies while streamlining governance programs. 

Replacing BI Dashboards with AI interfaces

Leveraging all of these techniques together is powerful. It gives you universal access to all of your data, under governed conditions, underwritten by data products.

What can you do with all of that power? 

Many things, but one of the most powerful production use cases is the Starburst AI Data Assistant (AIDA), which replaces static, unresponsive BI dashboards with dynamic, conversational AI workflows.

If you want to know more about AIDA, check out this demo.

Better data governance with Starburst 

AI data governance presents both business and technical challenges. But now, with the combination of universal data access and data products, your company can make data AI-ready faster than ever, aligning AI initiatives with business objectives and building trust with stakeholders.

Succeeding with universal data access means choosing the right platform. Starburst is designed to simplify the data foundations needed for AI and analytics, with strong governance capabilities. Use Starburst to easily manage data products at scale and provide a single point of access to data across your organization – all built on open standards with zero vendor lock-in, allowing you to move beyond business intelligence dashboards.

Contact us now to learn how Starburst can help you prepare today for the AI future of tomorrow.
