Dremio Acquired By SAP

What does it mean for customers?

Dremio has always been one of the choices available for enterprise customers looking for a compute engine centered on data lake and data lakehouse architecture. Its promise was fast SQL on Iceberg, solid reflection-based caching, and a clean interface for analysts working primarily in a lakehouse. 

But there have also been problems. Its federation breadth has never matched its acceleration story, and its concurrency at enterprise scale has been a documented weak point. Now there is a new problem in the mix. Dremio is being acquired by SAP, a move that puts its entire strategic direction in question at exactly the moment when agentic workloads are moving from pilot to production.

Let’s unpack this news and dive into some of the implications this could hold for Dremio customers. 

Dremio’s roadmap now belongs to SAP, not its customers

The first point to note is that acquisitions themselves change things. Like all acquisitions, this one comes with an inevitable change in focus, direction, and level of independence. Until now, Dremio has been an independent company with its own engineering priorities, setting its own product direction based on customer feedback and competitive pressure in the data platform market. After the acquisition closes, those priorities belong to SAP, a company with a very different set of existing products, partner relationships, and reasons to steer customers toward consolidation on its own platform.

What might this mean?

It’s too early to say what this might mean overall, but we know from looking at previous acquisitions that it can impact the product roadmap and direction. For example, features and integrations that Dremio was building independently will now have to compete for budget and attention inside a much larger organization where Dremio is one piece of a much bigger portfolio. Strategic shifts are already visible, with core cloud roadmaps feeling the impact; for instance, Dremio’s Cloud Enterprise edition has effectively been put on hold while the deal works through approvals.

Furthermore, this shifting corporate focus is already creating friction at the product level. As of Dremio 26.0, enterprise customers deploying on Kubernetes automatically transmit operational telemetry back to Dremio’s corporate endpoint over port 443. Dremio’s own documentation describes this tracking as on by default and notes that disabling it is “not considered a best practice” because it directly drives usage-based billing calculations. For organizations with strict data governance requirements, sovereign cloud requirements, or entirely air-gapped environments, this default introduces a significant compliance burden at the exact moment they need architectural stability.

Dremio’s “open” positioning gets harder to defend under SAP

One of Dremio’s core appeals has always been its positioning around open formats and open ecosystems. The idea has always been that it gives customers long-term flexibility without locking them in. That pitch resonated because it addressed a real problem. Proprietary data platforms extract switching costs that compound over time, and openness was a meaningful differentiator in a market where lock-in is the norm.

That pitch is harder to make as part of SAP, a large proprietary software vendor with its own data products, its own cloud, and strong financial incentives to encourage customers to consolidate on SAP technology over time. Dremio’s query engine was never fully open to begin with. It runs on a proprietary engine and supports limited open formats beyond Apache Iceberg, which already narrowed the openness story considerably. 

The open lakehouse model only works if your query engine, table format, and catalog are interoperable across the ecosystem, not just adjacent to open-source components. Dremio’s 28 native connectors also fall well short of alternatives like Starburst (52+), which limits how open your data federation story can be in practice when you’re trying to reach the full range of modern cloud, warehouse, and SaaS systems your organization actually runs.

Dremio’s federation engine struggles with the concurrency that agentic workloads demand

Before the acquisition, Dremio was already facing documented questions about how well its federated query engine scaled under real enterprise conditions. At high concurrency on large datasets, users frequently encounter memory errors and query failures. Standardized performance testing demonstrates that Dremio’s architecture routinely fails to complete queries at a concurrency of just two or more on SF1000 datasets. These instabilities surface quickly when multiple users or workloads hit the engine at the same time, and they become much harder to manage as usage grows.

Data federation is only useful if it holds up under load, and the promise of querying across your entire data estate from a single engine only matters if the engine can handle the concurrency demands of real workloads. 

For AI use cases, this problem is more pronounced because an AI agent querying your data via text-to-SQL may be running dozens of complex federated queries in parallel as part of a single task. If the underlying engine struggles at a baseline concurrency of two users, that failure mode compounds quickly when parallelized AI agents need to consume data products across many sources simultaneously, and the agents your business is counting on start returning memory errors and timeouts instead of answers.
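To make the concurrency concern concrete, here is a minimal sketch of how a team might probe any SQL engine with an agent-style burst of parallel queries and count how many complete. It is not taken from any vendor's tooling: `probe_concurrency` and the `run_query` callable are hypothetical names, and `run_query` is assumed to be whatever function your own driver exposes for executing one SQL statement and raising on failure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def probe_concurrency(
    run_query: Callable[[str], object],
    queries: Iterable[str],
    concurrency: int,
) -> dict:
    """Run `queries` with up to `concurrency` in flight and tally outcomes.

    `run_query` is an assumption: any callable that executes one SQL
    statement against the engine under test and raises on failure.
    """
    queries = list(queries)
    failures = []  # (sql, error) pairs; list.append is thread-safe in CPython

    def attempt(sql: str) -> bool:
        try:
            run_query(sql)
            return True
        except Exception as exc:  # a real probe would catch the driver's own error types
            failures.append((sql, repr(exc)))
            return False

    # Each worker thread stands in for one parallel query issued by an agent.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(attempt, queries))

    succeeded = sum(results)
    return {
        "concurrency": concurrency,
        "succeeded": succeeded,
        "failed": len(results) - succeeded,
        "failures": failures,
    }
```

Running the same query set at concurrency 1, 2, 8, and 32 and comparing the `failed` counts gives a rough picture of where an engine starts to degrade. A fuller evaluation would also record latency and use a representative federated workload.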

Dremio’s semantic layer labels data but doesn’t enforce business rules

Dremio has been building toward a semantic layer that helps standardize how data is described and accessed across teams, and the vision behind it is useful: a shared vocabulary for data that reduces the time analysts spend reconciling definitions and helps AI agents understand what they’re querying. 

The problem is that Dremio’s current semantic layer merely organizes and labels data without packaging metadata, access rules, or business constraints into a productized, enforceable framework. Because the definitions it provides function as suggestions rather than hard constraints, different teams and different AI agents can still interpret the same underlying data in incompatible ways, and the platform has no mechanism to stop them.
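The distinction between labeling and enforcing can be illustrated with a small sketch. It does not represent Dremio's or Starburst's actual API. `GovernedMetric`, `PolicyViolation`, and `compile_metric` are hypothetical names, and the metric, table, and role values are invented for the example. The point is only that an enforced definition carries its business rule and access policy with it, so every caller gets the same SQL or gets refused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernedMetric:
    """A metric definition that carries its rules with it (hypothetical model)."""
    name: str
    sql_expression: str       # the single agreed calculation
    required_filter: str      # business constraint every query must apply
    allowed_roles: frozenset  # roles permitted to query this metric


class PolicyViolation(Exception):
    """Raised when a caller is not permitted to use a metric."""


def compile_metric(metric: GovernedMetric, role: str, table: str) -> str:
    """Return the one sanctioned query for `metric`, or refuse the caller."""
    if role not in metric.allowed_roles:
        raise PolicyViolation(
            f"role {role!r} may not query metric {metric.name!r}"
        )
    # Callers cannot drop the filter or redefine the expression; both are
    # injected here rather than left to each analyst or agent.
    return (
        f"SELECT {metric.sql_expression} AS {metric.name} "
        f"FROM {table} WHERE {metric.required_filter}"
    )
```

A label-only layer would stop at the `name` and a description, leaving each analyst or agent free to write its own expression and filter. In this sketch, two agents asking for the same metric cannot diverge, because neither writes the calculation itself.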

That gap matters more as AI becomes a first-class user of your data. Agentic workloads require a governed foundation: a consistent, enforced understanding of what data means, how it should be used, and who can see it. Without that active governance, AI agents produce outputs that look authoritative but reflect whatever interpretation the agent happened to apply, which is the kind of inconsistency that erodes trust in AI-generated analysis. SAP has talked about a “knowledge graph” approach to solving this problem, but that remains on a roadmap rather than a shipped product that customers can evaluate today.

Acquisitions like this tend to slow roadmaps and deepen lock-in

Large software companies acquire data platforms to add technology to an existing suite, to acquire a customer base, or to push a strategic bet into a faster lane. In SAP’s case, all three likely apply, and none of those motivations are necessarily bad for Dremio customers in the long run. But the near-term pattern with acquisitions in the data space is consistent. The acquired product’s roadmap slows while integration work gets prioritized, and customers who signed up for one product’s vision often find themselves being guided toward consolidation with the acquirer’s broader platform over time, sometimes by design and sometimes simply because that’s where the engineering investment goes.

This is the same dynamic that drives the TCO and vendor lock-in problem that data teams have been navigating for years. The costs of building on a platform compound in both directions. When the platform is investing heavily, you benefit from that momentum, but when investment slows or redirects, the switching costs you’ve accumulated make it hard to leave. GenAI workloads in particular require an open data foundation that can reliably reach your full data estate, and that access can’t depend on a single vendor’s continued investment in the connectors, concurrency handling, and governance tooling that make it work.

New Dremio customers should wait until the acquisition settles

For most organizations evaluating a data platform today, now is not the right time to make a new commitment to Dremio. Buying into a platform at this stage means your contract terms, your support relationships, and your renewal pricing are all subject to change as SAP rationalizes its product portfolio, integrates Dremio’s team, and decides which parts of the Dremio roadmap align with SAP’s own priorities. 

The on-premises path now pushes teams heavily toward Kubernetes container deployments, and there is no documented upgrade route from v25 to v26, which means existing on-premises customers are already navigating uncertainty that new customers would be walking into knowingly. If your use case is narrow, say fast SQL acceleration on an Iceberg lakehouse where you own all the data and have no meaningful need to query across external systems, Dremio’s core engine is still capable and the acquisition may not change much for you in the near term. Most enterprise data teams, though, need to reach databases, warehouses, SaaS tools, and on-premises systems, and they have AI workloads that need to query across all of it reliably at scale. For those teams, limited connector coverage, concurrency weaknesses, and an unresolved acquisition make a new Dremio commitment difficult to justify when solid alternatives are available today.

Existing Dremio customers have a short window to evaluate alternatives on their own terms

The SAP acquisition hasn’t fully closed. The deal is moving toward an expected close in Q3 2026, and Dremio operates in an interim state until then. That interim period creates a window, probably a matter of months, during which existing customers can run an unhurried evaluation of alternatives without the added complexity of a post-merger support org, without waiting to see how SAP reprices the product, and without the negotiating dynamic that comes with trying to exit a platform that a large vendor has an interest in keeping you on. Once the deal closes and SAP begins integrating Dremio into its broader platform, switching costs go up in ways that are hard to predict in advance.

The more important reason to look now is that agentic workloads don’t wait for procurement timelines, and the organizations already running AI agents in production are doing so on platforms that were chosen months ago. The data platform decisions you make in the next few months will determine whether your AI agents have access to governed, consistent, high-concurrency data, or whether they’re constrained by a platform that wasn’t designed for that kind of demand. If you’ve been considering moving to an open data lakehouse anyway, this is a reasonable forcing function. If you’re not sure whether you’re ready, there are clear signals that tell you when a lakehouse move makes sense. And if cost is part of the conversation, reducing data warehouse costs with an open Icehouse foundation doesn’t require a full migration overnight.

Starburst offers broader federation, better concurrency, and a governed context layer

When evaluating alternatives, the things that actually matter are connector coverage across your full data estate, query performance at production concurrency levels, a semantic layer that enforces business rules rather than just labeling data, and a vendor whose roadmap is their own. 

Starburst is built on the fastest version of the Trino engine, a massively parallel, distributed SQL query engine designed specifically for federated, multi-source workloads at enterprise scale. It offers 52+ native connectors compared to Dremio’s 28, runs 30–40% faster in head-to-head federated benchmarks, and costs up to 55% less over a comparable period.

Crucially, Starburst’s Context Layer and AI-Ready Data Products govern how every team and AI agent reasons over your data, not just what it’s called. Backed by active features like AIDA and native Model Context Protocol (MCP) server endpoints, Starburst bundles metadata, business rules, and access policies into a single package. This guarantees that your autonomous AI models and human analysts are always deriving answers from a consistent, trusted, and fully independent ground truth.

The SAP acquisition may work out fine for Dremio customers who are already committed to the SAP ecosystem and see Dremio as part of a broader SAP consolidation. For everyone else, it’s a reasonable moment to pressure-test your assumptions about where your data platform is headed.

FAQ

Does the SAP acquisition mean Dremio’s product is going away?

Not immediately, but the roadmap is no longer Dremio’s to control. SAP will set Dremio’s development priorities in the context of its own product portfolio, which means features Dremio was building independently are now contingent on SAP’s investment decisions rather than on Dremio’s own competitive roadmap. Key offerings like Dremio’s Cloud Enterprise edition are already on hold during this transition.

What are the most immediate practical concerns for current Dremio customers?

On-premises deployments now require Kubernetes, there’s no documented migration path from v25 to v26, and, as of version 26.0, on-premises Kubernetes deployments transmit telemetry data back to Dremio’s corporate endpoint by default to calculate usage-based billing.

Why does federation breadth matter more now than it did three years ago?

Agentic workloads query across your entire data estate: databases, warehouses, SaaS tools, and data lakes, often in parallel and at high concurrency. A federated query engine with 28 connectors and documented instability at scale isn’t built for that demand, and the gap between what Dremio covers and what a modern enterprise needs to reach has become harder to work around.

What’s the difference between a semantic layer that labels data and one that governs it?

Labeling tells users and agents what data is called. Governing means packaging metadata, precise business logic, and role-based access controls into a productized, enforced layer. Without active governance, different teams and AI agents will continue to draw different, non-compliant conclusions from the same data, and the platform has no mechanism to enforce consistency.

Is Dremio a good fit for agentic workloads?

In most enterprise environments, no. Agentic workloads generate high volumes of parallel, multi-source text-to-SQL queries, which is where Dremio’s structural concurrency limitations and connector gaps show up most clearly. Add to that a semantic layer that labels data without enforcing business rules, and you have a platform where agents can access data but can’t be guaranteed to interpret it consistently. The result is timeouts or authoritative-sounding hallucinations that undermine the value of the agents themselves.

Is now a good time to become a Dremio customer?

For most organizations, no. Committing to a platform mid-acquisition means accepting contract terms, support structures, and pricing that are all subject to change as SAP integrates Dremio into its portfolio. Unless your use case is narrow enough that Dremio’s data lake-centric acceleration covers everything you need, the combination of product uncertainty and available alternatives makes a new commitment hard to justify right now.

Should we consider Dremio alternatives?

Yes, and the timing is better now than it will be after the acquisition closes. While Dremio is still operating independently, you can run a clean evaluation without post-merger complexity or pricing changes. More importantly, if your organization is moving AI agents toward production, the data platform you choose in the next few months will directly shape whether those agents have access to governed, high-concurrency data or whether they’re constrained by a platform that wasn’t designed for that demand.

Is Starburst a drop-in replacement for Dremio?

For Iceberg and data lake or lakehouse acceleration workloads, the migration path is straightforward. For teams that need broader federation across 52+ sources, enterprise concurrency that scales beyond two concurrent users, and fully productized AI data governance via an open, independent platform, Starburst covers more ground than Dremio does, without a pending acquisition complicating the roadmap.

 
