Assembling your AI data strategy
Evan Smith
Technical Content Manager
Starburst Data


The steady rise of AI presents new and exciting opportunities for companies in every industry. The future potential is dramatic. According to McKinsey, organizations reap the greatest value from AI when CEOs oversee efforts to rewire the company’s DNA. In this sense, AI is both foundational and transformational, which makes crafting an AI data strategy a boardroom-level requirement in most organizations.
But where do you start? With each passing month, more options become available. This is a huge opportunity but also a crossroads, requiring a reassessment of existing organizational strategy and solutions.
We’re here to help.
AI technology meets business strategy
It all starts with your business. A successful AI data strategy requires a data architecture that matches the needs of your business. This means one that’s fast and flexible enough to access data wherever it lives, drawing insights from across your organization. Today, the best technology to achieve this is the open data lakehouse. A data lakehouse combines the analytics and governance capabilities of a data warehouse with the flexibility of a data lake. Importantly, it already provides the ideal foundation for data analytics. Now, it also extends that foundation to include AI. This makes the lakehouse the ideal technology for an AI strategy that meets the needs of your business.
But how do you dig into this problem in your own business? In this article, we’ll look at the pillars of a successful AI data strategy, taking you from business need to technological solution. We’ll show you how the data lakehouse fulfills these needs and how this can make your AI data strategy fit your overall data strategy, all without rebuilding everything from the ground up.
What is an AI data strategy?
Let’s start at the beginning. What is an AI data strategy? In essence, an AI data strategy is a plan. It translates your business use cases into concrete actions and maps them to AI technologies that will deliver results. It turns the requirements you’ve identified into actionable steps you can take to transform, discover, query, and govern the large volumes of high-quality data you’ll need to gather for AI from across your enterprise.
Why you need an AI data strategy today
Generative AI (GenAI) is driving the rise in AI use cases in every industry. In enterprise businesses, this usually means agentic AI, an architecture in which AI agents interact autonomously with datasets without direct human oversight. Increasingly, AI agents also interact with other agents, creating positive feedback loops where AI can exponentially increase the value it adds.
The importance of AI models in agentic AI
Most agentic AI applications are powered by large language models (LLMs). These massive neural network models are trained on large amounts of data and operate by making probabilistic predictions based on the patterns they learned during training. When people talk about AI, this is usually what they mean.
But an LLM on its own isn’t enough to generate value from AI. For that, you need to look at the importance of context in AI workflows.
Why context is king in agentic AI
AI is powerful, but it’s also fundamentally different from most data technologies. Its outputs are not deterministic query results but probabilistic predictions, shaped by the model’s training and by the context it receives. Ultimately, context is the key.
Due to their probabilistic nature, agentic AI systems work better the more relevant, high-quality data you feed them. While LLMs excel at recognizing and mapping general patterns, they know nothing about the specific context of your business. To bridge that gap, you need to add context unique to your own situation, and your own business problems. In data terms, this means adding proprietary data to the mix – data that’s high-quality, fresh, and contextually relevant to your use cases. Once in place, you can use this data either to:
- Supply context to the LLM in queries using retrieval-augmented generation (RAG).
- Fine-tune an existing model to incorporate updated information or adjust its responses to echo your enterprise’s style and tone.
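To make the first option concrete, here is a minimal sketch of the retrieval step in RAG. It is an illustration under stated assumptions, not a production design: the documents are hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding model and vector store a real system would typically use. The helper names (`retrieve`, `build_prompt`) are made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical proprietary documents; in practice these would come from
# your own governed data sources.
DOCUMENTS = [
    "Refunds are issued within 14 days of a return request.",
    "Enterprise customers receive support through a dedicated account manager.",
    "Invoices are generated on the first business day of each month.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Assemble a prompt that supplies retrieved context to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("How many days do refunds take?", DOCUMENTS)
print(prompt)
```

The resulting prompt would then be sent to whichever LLM you use. The point of the sketch is that the model only sees your proprietary context if the retrieval step can reach it, which is why data access comes first in the strategy.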
OK, so you know you need to add context to your AI data strategy. What’s the best way to do that?
The three pillars of a comprehensive AI data strategy
Delivering an AI data strategy isn’t easy. In many ways, it requires rewiring the company, not just technologically, but around a common mission and set of AI use cases that the company believes will have the highest impact on revenue and productivity.
With those use cases in hand, you can craft an AI data strategy that spans the business, breaking down data silos and creating frameworks for sourcing high-quality data in a responsible manner.
Overall, a successful AI data strategy should address the following three pillars:
Access
If you’re like most organizations, your data doesn’t just live in one place. Instead, it lives in dozens or even hundreds of locations, systems, and platforms spread across your organization. These might be cloud systems, on-premises storage, or Software as a Service (SaaS) business apps like Salesforce, Workday, or ServiceNow. To be useful for business, data from these sources needs to be integrated and transformed. In the past, organizations handled this via mandatory data centralization – a one-size-fits-all solution that resulted in significant project delays and restricted who had access to data.
A successful, modern AI data strategy requires guaranteeing fast access to data where it lives. This means employing data connectors to prevent the formation of disconnected data silos. With data connectors, you can enable cross-system access to data using a SQL query engine such as Trino or Presto.
Using this approach, you start with fast, decentralized access to your data. After that, you can choose to centralize critical workloads for better performance or governance. The choice is always yours. Open table formats, such as Apache Iceberg, enable fast data ingestion using either bulk file loading or streaming protocols.
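The idea of querying data where it lives can be sketched in a few lines. The example below is a stand-in, not Trino: it uses Python's built-in `sqlite3` module, with two attached in-memory databases playing the role of separate source systems, so that it runs anywhere without a cluster. The source names (`crm`, `billing`) and tables are hypothetical. What carries over to a federated query engine is the shape of the query, where one SQL statement joins tables that are addressed by source name.

```python
import sqlite3

# One connection plays the role of the query engine.
conn = sqlite3.connect(":memory:")

# Each attached database stands in for a separate source system
# (hypothetical names, chosen for illustration only).
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.accounts (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (account_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.accounts VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 75.0)],
)


def revenue_by_account(connection):
    """Join data from both sources in a single SQL statement."""
    query = """
        SELECT a.name, SUM(i.amount) AS total
        FROM crm.accounts AS a
        JOIN billing.invoices AS i ON i.account_id = a.id
        GROUP BY a.name
        ORDER BY a.name
    """
    return connection.execute(query).fetchall()


rows = revenue_by_account(conn)
for name, total in rows:
    print(name, total)
```

Neither source had to be copied into the other before the join. In a real deployment, connectors would replace the attached databases, and you could later move the hottest tables into a centralized store without rewriting the consumers' SQL beyond the table names.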
Collaboration
Curating data for AI requires all hands on deck. For that to work, it’s important to make publishing and discovering curated sets of high-quality data as easy as possible.
The best solution to this problem is to use data products. Data products wrap around datasets, providing attribute-based and role-based access controls. This enables the teams closest to the data to package and share it easily and securely with others. Teams can then share and collaborate as they choose, creating a more scalable data infrastructure that works at higher velocity.
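As a rough illustration of what "wrapping" a dataset means, the sketch below models a data product as a small Python class that combines a role check with an attribute check before returning rows. Everything here is hypothetical (the `DataProduct` and `User` classes, the role names, the `region` attribute) and is not Starburst's data product API; it only shows how role-based and attribute-based controls can be evaluated together.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """A hypothetical consumer of data products."""
    name: str
    roles: set
    attributes: dict = field(default_factory=dict)


@dataclass
class DataProduct:
    """A dataset packaged with an owner and its access rules."""
    name: str
    owner: str
    rows: list
    allowed_roles: set
    required_attributes: dict = field(default_factory=dict)

    def can_read(self, user):
        # Role-based check: the user needs at least one allowed role.
        has_role = bool(self.allowed_roles & user.roles)
        # Attribute-based check: every required attribute must match.
        has_attributes = all(
            user.attributes.get(key) == value
            for key, value in self.required_attributes.items()
        )
        return has_role and has_attributes

    def read(self, user):
        if not self.can_read(user):
            raise PermissionError(f"{user.name} may not read {self.name}")
        return list(self.rows)


product = DataProduct(
    name="eu_customer_orders",
    owner="sales-analytics",
    rows=[{"order_id": 1, "total": 40.0}],
    allowed_roles={"analyst"},
    required_attributes={"region": "EU"},
)

eu_analyst = User("dana", {"analyst"}, {"region": "EU"})
us_analyst = User("lee", {"analyst"}, {"region": "US"})

print(product.can_read(eu_analyst))
print(product.can_read(us_analyst))
```

Because the rules travel with the product, the owning team can publish it once and let other teams, or AI agents acting on their behalf, discover and consume it without renegotiating access each time.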
Governance
AI is only as good as the data that feeds it. Agentic AI networks require highly curated data produced at scale to be successful. That requires a governance strategy to ensure high data quality and well-regulated access to sensitive data.
Strong data governance has always been important. Today, it takes on a new dimension in the world of agentic AI. Businesses need to ensure that sensitive data is handled compliantly, even without direct human intervention.
Several technologies are key to addressing governance at scale in an AI data strategy:
- Data products provide trackable lineage and verifiable data quality, which are particularly useful when creating agentic AI workflows.
- Role-based access control (RBAC) limits access to sensitive data based on job role – another essential element for agentic AI use cases.
- Automated compliance tooling keeps data compliant at scale and verifies that agents interact with it appropriately, for example by using automated tagging to identify and label personally identifiable information (PII) and by auditing AI use of data.
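The automated tagging and auditing ideas in this list can be sketched briefly. The example below is a simplified assumption of how such a control might work, not a description of any product feature: two regular expressions stand in for a real PII classifier, and an in-memory list stands in for a durable audit log. The function names and sample data are hypothetical.

```python
import re
from datetime import datetime, timezone

# Deliberately simple patterns; a real classifier would cover far more cases.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Stand-in for a durable audit store.
AUDIT_LOG = []


def tag_pii_columns(rows):
    """Return {column: sorted PII tags} for columns whose values match a pattern."""
    tags = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            for tag, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    tags.setdefault(column, set()).add(tag)
    return {column: sorted(found) for column, found in tags.items()}


def read_for_agent(agent_id, rows, pii_tags, allow_pii=False):
    """Return rows for an agent, masking tagged columns and recording the access."""
    visible = [
        {
            column: "[REDACTED]" if column in pii_tags and not allow_pii else value
            for column, value in row.items()
        }
        for row in rows
    ]
    AUDIT_LOG.append({
        "agent": agent_id,
        "time": datetime.now(timezone.utc).isoformat(),
        "masked_columns": [] if allow_pii else sorted(pii_tags),
        "row_count": len(rows),
    })
    return visible


customers = [
    {"customer": "Ada", "contact": "ada@example.com", "tax_id": "123-45-6789"},
]
pii_tags = tag_pii_columns(customers)
agent_view = read_for_agent("support-agent-1", customers, pii_tags)
print(pii_tags)
print(agent_view)
```

Every read leaves an audit entry whether or not anything was masked, which is the property that matters when agents access data without a human in the loop.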
How an AI data strategy fits into your overall data strategy
One of the best ways to think about implementing an AI data strategy is as an extension of your overall data strategy. Agentic AI, like AI models themselves, is built on data. As such, you can think of AI as the newest tool in your data strategy. In this sense, you could see your data architecture strategy as yielding three different business outcomes: analytics, data applications, and AI and ML.
Each of these use cases has its own specific requirements. At the same time, they also share basic access, collaboration, and governance needs.
The emergence of AI doesn’t obviate the need for analytics and data applications; it’s another requirement we need to incorporate into our architecture. Your data strategy needs to evolve to support all three use cases.
How Icehouse architecture will power your AI data strategy
Existing data architectures, like data warehouses and data lakes, can’t handle data at the scale and velocity demanded today. They’re built for the analytics era and lack key data governance features needed to scale AI effectively.
Starburst’s Icehouse architecture offers a solution. This data lakehouse architecture combines the Iceberg open table format with the Trino SQL query engine. These two technologies enable fast, decentralized access to data with the option of centralizing mission-critical workloads. The Icehouse also provides a superior solution for data governance, thanks to Iceberg’s support for rich metadata.
For many enterprise customers, an Icehouse architecture is ideally suited to handle both the current generation of AI use cases and future ones down the road. Most importantly of all, this solution works in tandem with your analytics needs, extending a solid foundation from BI to AI.
How Starburst makes your AI data strategy real
Starburst knows data best. Our Icehouse architecture unlocks the value of Apache Iceberg for AI and analytics. It provides a single point of access, collaboration, and governance via:
- A fast and scalable query engine and ingestion framework powered by Iceberg and Trino.
- Data products that power self-service pipelines built around collaboration.
- Powerful data governance, allowing you to create AI on-premises, in the cloud, across clouds, or a combination of all three. This helps to drive AI workloads in the most sensitive and high-compliance industries.
We’re also rolling out new features that will make scaling your AI data strategy even easier. Stay tuned!
Starburst and your AI data strategy
A successful AI data strategy requires a data architecture that can scale and evolve to meet the shifting needs in this rapidly growing space. With Starburst and the Icehouse architecture, you can provide a single location for data access, collaboration, and governance. Starburst fits into your existing data architecture, future-proofing it for AI.
To learn more about how Starburst can accelerate your AI data strategy, contact us today.