Why Context is the Missing Element in Your AI Data Strategy

And why Starburst is the context layer that your AI needs


What’s the most important part of any AI data strategy?

Many companies don’t know, or assume the question doesn’t matter. After all, AI is supposed to make life simpler, and it’s tempting to assume that this extends to thinking about the data that feeds an AI workflow. But to succeed in production, AI requires not only a data strategy but also an awareness of which types of data matter most in separating AI success from failure.

We’re not making this up. Some surveys contend that as many as 95% of AI projects never make it to production. Gartner predicts that by 2027, 40% of current AI projects may end up in the dustbin. The projects that survive will be the ones that grasp one key ingredient: context. In the realm of AI, all roads lead to context.

Why? Simple. AI solutions often work well on the general problems they’re trained on, but hit the buffers when they enter production. The difference comes down to context. Real-world users confront the software with a wide range of questions, exposing how little the system truly knows about its domain. That’s exactly where a lack of context becomes a major drawback. Generative AI solutions need the right context, data quality, and data sources to succeed.

In this article, I’ll dig into exactly why context is so important and how to set up your AI agent for success without rebuilding your data architecture from the basement up.

Why context is the non-negotiable element of any AI data strategy

There are several factors that go into an AI data strategy. All of them are important, but each in different ways. For example, you can’t successfully launch an AI agent within an enterprise business without solid data governance and access controls.

Context, however, is the one non-negotiable element of an AI data strategy that often goes overlooked. I define “context” as any data that provides critical information on your business or AI use cases.

Why is context important?

A Large Language Model (LLM) brings a lot to the table, including its ability to process natural-language queries and generate unique, human-consumable outputs. But AI models only possess general knowledge derived from their primary training datasets. They lack specific details about your business and use case. The newer and more innovative your use case is, the less the model knows. And because much of your business is unique to your organization and how it operates, your AI agents will struggle without that context.

In practice, context can come in various forms, including:

  • Existing data stores, such as data warehouses
  • Semantic layers that define centralized metrics using standardized terminology
  • Unstructured data, such as PDFs (annual reports, financial filings, case law, academic studies, etc.)

Just as importantly, such context often can’t be consumed raw. Instead, it’s first converted into a searchable format, such as vector embeddings, that supports proximity-based techniques like nearest-neighbor search. The retrieved data is then sent to the AI model alongside each query so it can deliver better, more accurate outputs.
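To make the retrieval step concrete, here is a minimal, self-contained sketch of the pattern described above: documents are stored as vectors, the nearest neighbors to a query vector are retrieved, and that context is prepended to the question sent to the model. The document texts and three-dimensional vectors are hypothetical toy values; real systems use an embedding model and a vector store.

```python
import math

# Toy "embeddings": in practice these come from an embedding model.
# All document texts and vectors here are hypothetical examples.
DOCS = {
    "q3 revenue grew 12% year over year": [0.9, 0.1, 0.0],
    "employee onboarding checklist": [0.1, 0.8, 0.2],
    "annual report risk factors": [0.7, 0.2, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    """Return the k documents nearest to the query vector."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

def build_prompt(question, query_vec):
    """Prepend retrieved context to the user's question before calling the model."""
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

A query vector close to the revenue document retrieves it first, so a question about revenue reaches the model with the relevant business context attached rather than relying on the model’s general training data.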

What enables context in AI

Sounds simple enough. The problem is, context exists across your company. It’s often scattered and hidden in places you may not even know exist, that is, if it’s written down at all.

Enabling context means enabling access to data across your enterprise. It means setting down how things operate and the logic behind them, and then providing access to all of that context from one central point. 

And that means enabling federated data access.

How data federation provides the necessary access to context

Federated data access frees you from the task of centralizing all of your data. Centralization, after all, is often a fraught endeavor that seldom ends well. We’ve written on that topic elsewhere.

In contrast, data federation provides a centralized point of access to all your data, no matter where it lives, allowing you to discover, test, and consume it across your enterprise. This is the data foundation on which scalable AI workflows depend.

Lighting up federated data access means that:

  • Everyone can find the high-quality data they need to power their AI use cases
  • Important context doesn’t remain lost in hard-to-find data sources
  • Data is available today, not after a long, expensive, and (likely) doomed data centralization effort
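The federation idea above can be sketched in a few lines: one query interface over multiple sources that stay exactly where they are, with no copying into a central store. The source names and records below are hypothetical stand-ins; a real federation layer is a SQL engine querying live systems.

```python
# Minimal sketch of federated access: one entry point over multiple
# data sources. Source names and records are hypothetical examples.
SOURCES = {
    "warehouse": [{"customer": "acme", "region": "emea"}],
    "crm": [{"customer": "acme", "tier": "gold"}],
    "lake": [{"customer": "zenith", "region": "apac"}],
}

def federated_find(field, value):
    """Scan every registered source in place; nothing is centralized or copied."""
    hits = []
    for name, rows in SOURCES.items():
        for row in rows:
            if row.get(field) == value:
                hits.append({"source": name, **row})
    return hits
```

A single lookup for a customer returns matching records from the warehouse and the CRM together, which is exactly the kind of cross-silo context an AI agent needs at query time.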

Why companies don’t pay enough attention to context (and why this fails)

Context is really the core of the AI problem today. It’s why an AI-powered agent might seem to work under limited, artificial testing but fail in production. It’s why AI agents fail at their tasks and why models hallucinate. Solving the context problem, therefore, is critical to AI success.

How do we get there? The first step is to think of context in the right way. In general, companies tend to fall into one of three traps when thinking about context. 

“We have all the data we need”

It can often seem like your AI projects already have all the data they need. After all, if an agent responds well to your test questions, you’re all set, right? Not necessarily.

Many teams, buoyed by strong relevance scores during testing, rush to production without considering how the agent will operate in real-world conditions. Once real users demand context the system doesn’t have, things begin to fail.

The truth is, if you’re not using data federation to access data from across the company, you won’t have all the data you need. Only federated access ensures you can unlock valuable data previously hidden in data silos that have remained isolated and ignored. Without this, AI-driven decision-making is built on an incomplete data foundation.

“We’ll put everything we need all in one big bucket”

This is really the data centralization problem all over again, and it carries all of the same problems we’ve seen in analytics.

In and of itself, data centralization isn’t a bad thing. Some data does need to be centralized for performance reasons. But mass, indiscriminate data centralization projects are risky and failure-prone. What’s more, they often suffer from a kind of infinite regress, requiring more and more data to provide adequate value to get started. In the end, they delay the launch of new AI initiatives as everyone waits for data to become available.

And there’s another point. Context is constantly being created by the humans who work in your organization. Context proliferates and propagates. This represents an even larger problem for data centralization, which can never catch up to the moving target of new contextual data being created.

Infinite regress returns once more. 

“It’s an organizational problem”

It can be tempting to see context as a problem that will one day be solved through adequate systematization. After all, the existence of data silos is often a sign of some sort of structural problem in the organization. As a result, companies might sometimes conclude that getting the data they need is a matter of shuffling around deck chairs and restructuring the business.

The problem with this way of thinking is that organizational change is not really a problem to be solved but rather a state of being. Organizations are living, breathing things, and change is part of their ongoing state. That means contextual problems will constantly be created. There is no endpoint to context, and it can no more be solved as an organizational problem than an enterprise can be stopped entirely and held in a suspended state. As the business moves forward, context will be created one way or another.

Enabling access to the AI context you need today

So if you can’t eliminate context, and you can’t use AI without it, you need a way to capture that context as it’s created. It’s a problem the AI industry is only now waking up to. 

Data federation is the answer to AI’s context problem

Luckily, there’s a solution in the form of federated data access. With federation, you can start building and vetting your AI agents today using federated data. As time goes on and you gain a better sense of usage patterns and performance, you can move business-critical data into formats such as Apache Iceberg. The Iceberg format offers superior performance compared to legacy table formats like Hive, while also providing advanced features such as rollback and snapshots.
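To illustrate what snapshots and rollback buy you, here is a toy Python sketch of a versioned table. It loosely mimics the behavior that formats like Apache Iceberg provide natively; this is not the Iceberg API, and the class and records below are hypothetical.

```python
# Toy sketch of table snapshots and rollback, loosely mimicking what
# formats like Apache Iceberg provide natively. Not the Iceberg API.
class VersionedTable:
    def __init__(self):
        self.snapshots = [[]]  # snapshot 0: empty table

    @property
    def rows(self):
        """The current state of the table is always the latest snapshot."""
        return self.snapshots[-1]

    def append(self, *new_rows):
        """Each write creates a new immutable snapshot instead of mutating in place."""
        self.snapshots.append(self.snapshots[-1] + list(new_rows))

    def rollback(self, snapshot_id):
        """Restore the table to an earlier snapshot after a bad write."""
        self.snapshots.append(list(self.snapshots[snapshot_id]))

t = VersionedTable()
t.append({"id": 1})
t.append({"id": 2})  # suppose this write was a mistake
t.rollback(1)        # the table is back to containing only {"id": 1}
```

Because every write is a new snapshot, a bad write never destroys data: rolling back is just pointing the table at an earlier version, which is what makes it safe to promote vetted federated data into a managed table format over time.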

Starburst is the context layer your AI needs

Starburst is here to help. We’re built to enable exactly this two-tiered approach to AI projects. You can quickly access your data across the enterprise, enabling seamless collaboration and enterprise-grade data governance. Starburst handles everything from data ingestion (batch and streaming) to table maintenance and capacity management, enabling you to power the context your AI agents or other AI workloads need with minimal overhead.

Starburst is the enterprise data platform for managing data at scale in the AI age, and that means we are the context layer you need to power your AI. Your data teams can easily package data into data products, making it easier to discover and govern than ever before. AI agents can consume these data products directly, eliminating manual data pipeline bottlenecks and accelerating the lifecycle from raw data to business outcomes.

Discover Starburst AI Data Assistant (AIDA)

You can also leverage the Starburst AI Data Assistant (AIDA) to prepare data for AI at a fraction of the time and cost it would take to prepare it by hand. Check out the video below to learn more. 

Find out more about how Starburst can get your AI agents moving from prototype to production. Contact us for a demo today.
