
AI is no longer a lab experiment. It’s moving into production, and with that come challenges. In every industry, data leaders are under pressure to operationalize AI in a way that is governed, cost-effective, and tied directly to business value.
Increasingly, it’s that connection between AI and concrete business value that is coming into clearer focus.
A recent MIT study, “The GenAI Divide: State of AI in Business,” reported that 95% of enterprise AI projects failed to deliver measurable returns. This headline statistic has been repeated everywhere, but the nuance matters. The failures were mostly task-specific, custom AI pilots designed for rapid ROI, which proved brittle and difficult to integrate into workflows. Meanwhile, general-purpose LLMs (like GPT) have seen far higher adoption and success.
For most enterprises, those early failures were not wasted effort; they were the learning curve that revealed the real bottleneck: fragmented, ungoverned data. The lesson is clear. The challenge is not the AI model itself. It’s the data foundation, governance, and workflow integration that determine success.
In that context, four questions consistently surface in my conversations with enterprise data leaders:
TL;DR
- Unified structured and unstructured data access
- SQL-native AI functions inside data platforms
- Vector search for unstructured enterprise context
- Federated queries minimizing data movement
- Built-in model governance and access controls
- Agentic workflows enabling intelligent business action
1. What real-world problems are enterprises solving today, and what’s next?
Data-driven approaches can address many problems, but across organizations certain use cases stand out: fraud detection, supply chain optimization, and customer 360 dominate the landscape. A key reason is that these problems blend structured and unstructured data to deliver better decisions, drawing on the entirety of an organization’s data rather than isolated slices of it.
For example:
- Fraud teams unify transactions, alerts, and case notes to spot hidden patterns.
- Supply chain teams combine ERP, telemetry, and vendor data for accurate ETAs.
- Customer teams mix clickstream, support, and product data to personalize engagement and learn customer sentiment.
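As a concrete illustration of the first bullet, here is a minimal sketch of how a structured signal (transaction amount) and an unstructured one (keywords in an analyst’s case note) can combine into a single decision. All field names, keywords, and thresholds below are hypothetical, not a real fraud model:

```python
# Illustrative sketch: blending structured transaction fields with
# unstructured case notes to flag suspicious activity. Names and
# thresholds are invented for this example.

SUSPICIOUS_TERMS = {"chargeback", "mismatch", "unverified"}

def fraud_score(txn: dict, case_note: str) -> float:
    """Combine a structured signal (amount) with an unstructured one
    (keywords in the analyst's case note) into a single score."""
    score = 0.0
    if txn["amount"] > 10_000:            # structured: unusually large transfer
        score += 0.5
    note_terms = set(case_note.lower().split())
    score += 0.25 * len(SUSPICIOUS_TERMS & note_terms)  # unstructured: note keywords
    return min(score, 1.0)

txn = {"id": "t-42", "amount": 15_000}
note = "Customer reported a chargeback and the address looked unverified"
print(fraud_score(txn, note))  # 1.0
```

Neither signal alone would flag this transaction as strongly; together they do, which is the pattern behind each of the examples above.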
Why context is key
In each case, structured and unstructured data are used to create a comprehensive picture. From this, we can draw one key conclusion: context is key, and to get that context, unlocking data of all types is necessary.
How do we improve on this and take the next step? In a word, agency.
Why context helps drive agency
Agentic workflows are the next piece of the puzzle, but not just any agents will do. To succeed, they need the same attention to context. Organizations need agents that don’t just answer, but act. Whether that means rerouting a shipment, drafting an offer, or creating a governed data product automatically, agency is the next big area of expansion in unlocking true business value.
2. What are the biggest pain points in making data AI-ready?
Today, nearly all organizations want to embrace AI in one way or another, but the pathway to achieving real results isn’t always clear. Enterprises face recurring hurdles, and it’s worth reflecting on those pain points to understand what works and what doesn’t.
They include:
Discoverability debt
Undocumented data often holds significant untapped value, but using it effectively is a challenge.
Solution: AI itself can help, through AI-assisted documentation and governed data products.
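As a rough sketch of what AI-assisted documentation can look like, the snippet below drafts catalog descriptions from schema metadata and sampled values. `draft_description` is a stand-in for a real LLM call; all names and the prompt shape are hypothetical:

```python
# Sketch of AI-assisted documentation: draft column descriptions from
# schema metadata plus a few sampled values, then collect them into a
# catalog. In a real system, draft_description would call an LLM.

def draft_description(column: str, sample_values: list) -> str:
    """Placeholder for an LLM call that turns a column name and a few
    sampled values into a human-readable description."""
    preview = ", ".join(map(str, sample_values[:3]))
    return f"Column '{column}' (sample values: {preview}); description pending review."

schema = {"cust_id": [1001, 1002], "signup_ts": ["2024-01-03", "2024-02-11"]}
catalog_docs = {col: draft_description(col, vals) for col, vals in schema.items()}
print(catalog_docs["cust_id"])
```

The point is the workflow, not the stub: generate a first draft automatically, then route it to a human for review before it lands in the catalog.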
Addressing fragmentation
Organizations today typically operate across multiple ecosystems, including multi-cloud and on-premises environments. This creates sprawl and raises barriers to implementing AI.
Solution: This can be solved using specific data architectural approaches, such as federated queries and open formats like Iceberg.
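As a toy analogue of federation, the sketch below runs one SQL statement across two separate databases without first copying either into a warehouse. sqlite3’s `ATTACH` stands in for a federated engine such as Trino; the table names and file path are made up for illustration:

```python
# Toy analogue of a federated query: one SQL statement joining two
# independent data stores, with no bulk copy into a central warehouse.
import os
import sqlite3
import tempfile

# Source 1: an in-memory "sales" system.
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (id INTEGER, vendor_id INTEGER, total REAL)")
sales.execute("INSERT INTO orders VALUES (1, 10, 99.0), (2, 11, 45.5)")

# Source 2: a separate on-disk database playing the role of an ERP system.
vendors_path = os.path.join(tempfile.gettempdir(), "vendors_demo.db")
v = sqlite3.connect(vendors_path)
v.execute("DROP TABLE IF EXISTS vendors")
v.execute("CREATE TABLE vendors (id INTEGER, name TEXT)")
v.execute("INSERT INTO vendors VALUES (10, 'Acme'), (11, 'Globex')")
v.commit()
v.close()

# One query spans both sources; neither dataset is moved first.
sales.execute(f"ATTACH DATABASE '{vendors_path}' AS vendor_db")
rows = sales.execute(
    "SELECT v.name, o.total FROM orders o "
    "JOIN vendor_db.vendors v ON v.id = o.vendor_id ORDER BY o.id"
).fetchall()
print(rows)  # [('Acme', 99.0), ('Globex', 45.5)]
```

A federated engine generalizes this idea across clouds, object stores, and open table formats like Iceberg, pushing compute to where the data already lives.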
Unlocking unstructured data
Unstructured data is like the dark matter of the data world. It holds tremendous power, but can be hard to access and operationalize. Nonetheless, leveraging the collective context contained in an organization’s PDFs, logs, and tickets holds the key to making AI more contextually aware and more valuable.
Solution: This can be solved by using vector and full-text search in SQL.
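The retrieval step behind vector search can be sketched as ranking snippets by cosine similarity to a query embedding. The toy 3-dimensional vectors below are hand-made; a real system would use an embedding model and an index rather than a list scan:

```python
# Minimal sketch of vector search retrieval: rank document snippets by
# cosine similarity to a query embedding. Embeddings here are invented.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = {
    "ticket-17": [0.9, 0.1, 0.0],   # "payment failed at checkout"
    "pdf-03":    [0.0, 0.8, 0.6],   # "vendor onboarding guide"
    "log-88":    [0.7, 0.2, 0.1],   # "card declined, retry loop"
}
query = [1.0, 0.1, 0.0]             # embedding of "checkout payment errors"

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # ticket-17
```

Semantic similarity, not keyword overlap, is what surfaces the relevant ticket here, which is why this works on the PDFs, logs, and tickets that full-text search alone struggles with.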
Filling data governance gaps
Data governance needs to scale in line with AI adoption itself to ensure compliance, both internal and external. Knowing who can call which model, how often, and at what cost is a serious barrier to adopting AI at many organizations, particularly those operating in high-compliance environments like finance.
Solution: To address this, data governance needs to be implemented at the foundational level through model access management.
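A minimal sketch of model access management: before any model call, check the caller’s role and remaining quota, and write an audit record either way. The roles, quotas, and model names are illustrative placeholders, not a real policy:

```python
# Sketch of model access governance: role check + quota check + audit
# log on every attempted model call. All names and limits are invented.
from collections import defaultdict

ALLOWED = {"analyst": {"summarize-small"},
           "ml-engineer": {"summarize-small", "gpt-large"}}
QUOTA = {"analyst": 2, "ml-engineer": 100}   # calls per day, illustrative

usage = defaultdict(int)
audit_log = []

def call_model(user: str, role: str, model: str) -> bool:
    """Gate a model invocation; record the attempt whether or not it passes."""
    allowed = model in ALLOWED.get(role, set()) and usage[user] < QUOTA.get(role, 0)
    audit_log.append({"user": user, "model": model, "allowed": allowed})
    if allowed:
        usage[user] += 1
    return allowed

print(call_model("dana", "analyst", "summarize-small"))  # True
print(call_model("dana", "analyst", "gpt-large"))        # False: role lacks access
```

Because every attempt is logged, including denials, the same structure answers the compliance questions above: who called which model, how often, and at what cost.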
Reducing tool sprawl
Organizations often accumulate bespoke tools, creating complex data stacks that compound operational difficulty.
Solution: To solve this, businesses should adopt SQL-first AI workflows that analysts can run directly.
3. How do privacy and governance evolve with AI?
Regulation and risk management demand that governance extend beyond tables to models and prompts. For many organizations, these aren’t afterthoughts. They’re starting points.
To help with this, it’s important to see governance, compliance, and security as operating hand in hand from the beginning. They are foundational and need to be considered as such. This involves:
Minimize data movement
Traditional strategies centralized data from the outset, but this is inefficient. Instead, bring the compute to the data using federation.
Govern models like data
Governance and AI shouldn’t push against each other. Manage your models the same way you would manage datasets: apply Role-Based Access Control (RBAC), quotas, and audit logs to AI endpoints.
Meet customers where they are
AI projects succeed when everyone is comfortable, and this is especially true of your customers. To help build comfort, meet your customers where they are by using private endpoints or air-gapped deployments.
Build on open standards
Openness and interoperability are important building blocks of AI success. You can leverage them using orchestration layers like the MCP Server to plug AI safely into enterprise ecosystems.
4. What breakthroughs in data platforms shifted thinking this past year?
AI has advanced significantly in a short time. In the last year alone, it has reached a number of milestones, including:
- AI SQL Functions: Summarize, classify, and prompt directly from SQL.
- Vector search in Iceberg: Treat unstructured data as a first-class citizen.
- Model governance: Role-based access, usage telemetry, cost control.
- AI Agents: Agent-assisted discovery and documentation.
- MCP Server: Opening the platform to multi-agent, multi-product AI orchestration.
Together, these innovations shift the conversation from experimentation to governed, operational AI inside the data platform.
Embracing the challenge of making AI operationally successful
The next wave of enterprise AI will be led by organizations that move from experimentation to execution. Success will come to those who unify their data, govern it consistently, and design for real-world action.
With capabilities like vector search, model governance, and agentic workflows now embedded directly into the data platform, the foundation for scalable, responsible AI is here. The opportunity is no longer about testing what is possible, but building what is practical. The companies that start now will be the ones turning data and AI into measurable business impact.
The early failures documented by MIT weren’t wasted. They were the necessary experiments that showed what doesn’t work. Now, with stronger data foundations, governance, and agentic workflows, enterprises are entering the stage where AI projects can succeed at scale.
The real question is no longer “Can AI query my data?” but “How can my data platform power safe, intelligent action across the business?”
At Starburst, we’re addressing exactly these challenges through our federated query engine, open Iceberg-based data access, and built-in AI capabilities like vector search, SQL AI functions, context layer for agents, and model access governance — all designed to make AI operational, governed, and scalable.
Note: This blog post was originally published on LinkedIn. It has been expanded and republished with additional thoughts.
FAQs about AI data platforms
What capabilities are essential for an AI data platform?
An AI data platform must do more than host models. It must operationalize them. At its core, it unifies access to both structured and unstructured data so AI systems can work with full business context rather than isolated datasets. Essential capabilities include SQL-native AI functions, vector and full-text search for unstructured data, and federated query engines that minimize data movement. Equally important is built-in governance, including role-based access to models, usage tracking, cost controls, and auditability. Together, these features ensure AI is powerful, governed, scalable, and directly tied to measurable business outcomes instead of fragile, one-off experiments.
How does a modern data platform support enterprise AI adoption?
A modern data platform unifies fragmented datasets to feed AI models with reliable, governed information. By separating compute from storage and utilizing open formats, these platforms allow organizations to run advanced analytics and machine learning workflows directly on the data source without unnecessary movement or duplication. This operational approach ensures that AI projects move beyond experimental pilots to deliver scalable, measurable business value based on a single source of truth.
What is the benefit of using federated queries for AI data strategy?
Federated queries allow data teams to access and analyze data across disparate environments without physically moving or centralizing the data first. In doing so they significantly reduce the complexity and latency associated with traditional ETL pipelines. This enables real-time access to fresh data for AI models. By analyzing data where it lives, enterprises can accelerate the development of agentic workflows and ensure that models always operate on the most current business context.
How should data governance evolve to accommodate AI and LLMs?
Expand beyond static tables to encompass the management of AI models, prompts, and access controls. This involves treating models with the same rigor as datasets. Implement Role-Based Access Control (RBAC) and comprehensive audit logs to track who invokes specific models and at what cost. Effective governance ensures that agentic AI workflows operate securely within compliance boundaries, mitigating risks associated with unauthorized data exposure or model misuse.
Why is vector search important for a modern AI data platform?
Vector search capabilities allow data platforms to index and retrieve unstructured data by understanding the semantic meaning behind the content. Integrating this feature directly into the data platform enables access to complex data types, turning them into first-class citizens alongside structured rows and columns. This unification is critical for providing Large Language Models (LLMs) with the necessary context to generate accurate, domain-specific insights for business applications.
How do data platforms resolve “discoverability debt” in AI projects?
Data platforms resolve discoverability debt by using AI-assisted documentation and agentic workflows to identify and catalog dark data that often sits unused in silos. By making this undocumented data visible and accessible through a unified interface, organizations can leverage their complete informational assets rather than relying on partial datasets. Visibility like this is essential for training robust AI models that require deep organizational context to function effectively.



