
Cloud data warehouses (CDWs) became the default analytics platform because they made structured BI fast and easy. Snowflake, BigQuery, and Redshift still power huge parts of modern data stacks, and for many teams they still do exactly what they were bought to do. But the moment your data stops looking like “clean tables for dashboards,” the warehouse can become an expensive center of gravity rather than a helpful one.
The real question isn’t “Is my warehouse bad?” It’s whether a warehouse still makes sense as the default platform for the shape of data and workloads you have now and expect to have next.
TL;DR
- Keep your cloud data warehouse if most of your value comes from structured BI/reporting and costs stay predictable.
- Re-evaluate when raw/semi-structured data is growing fast, ad-hoc concurrency is spiking, or ML/streaming teams keep working around the warehouse.
- Open lakehouse setups (object storage + open table formats + separated compute) can reduce lock-in and let you use different engines on the same data.
- Migration risk is driven more by proprietary features and org readiness than by SQL itself; phased moves beat all-in-one migrations.
Understanding the current landscape
CDWs emerged as the answer to on-premises infrastructure limits, offering:
- Elastic compute
- Managed operations
- Consumption pricing
They’re excellent at what they were built for, namely structured analytics with clear, repeatable SQL workloads.
But data has become messier. Modern teams work with semi-structured and unstructured data, event streaming data, iterative ML and feature engineering, and “explore first, model later” workflows. These patterns don’t naturally fit the model of loading data into a proprietary warehouse and then querying it.
This mismatch is why the lakehouse pattern took off. The idea is straightforward. Data engineers keep data in low-cost object storage, add reliability through open table formats, and run compute engines as needed. If you want a deeper primer on that architecture, see Starburst’s neutral overview of the data lakehouse model.
Open table formats, such as Apache Iceberg, Delta Lake, or Apache Hudi, make this possible by bringing transaction support, schema evolution, and time travel to data lakes. For a quick explainer on what “open table formats” actually add, this open table formats guide is a useful reference.
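To make those capabilities concrete, here is a deliberately toy Python sketch of the core idea behind table formats: an append-only snapshot log over immutable data, which is what makes time travel possible. This is a conceptual illustration under simplified assumptions, not how Iceberg, Delta, or Hudi are actually implemented.

```python
# Toy model of what an open table format layers over raw files:
# an append-only snapshot log, enabling time travel.
# Conceptual sketch only; not Iceberg/Delta/Hudi internals.

class ToyTable:
    def __init__(self):
        self.snapshots = []  # each snapshot is the full list of visible rows

    def commit(self, rows):
        """Commit a new snapshot containing prior rows plus the new ones."""
        current = self.snapshots[-1] if self.snapshots else []
        self.snapshots.append(current + list(rows))

    def read(self, snapshot_id=None):
        """Read the latest snapshot, or travel back to an older one."""
        if not self.snapshots:
            return []
        if snapshot_id is None:
            snapshot_id = len(self.snapshots) - 1
        return self.snapshots[snapshot_id]

table = ToyTable()
table.commit([{"id": 1, "amount": 10}])
table.commit([{"id": 2, "amount": 25}])

latest = table.read()        # sees both commits
as_of_first = table.read(0)  # time travel: only the first commit
```

Real table formats track metadata files and manifests rather than full row copies, but the reader-visible contract is similar: every commit is a consistent snapshot you can query as of a point in time.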
Once your lake can behave like a warehouse, the question shifts from whether you can make the switch to when you should. If you’re comparing formats, see Iceberg vs. Delta Lake and the broader Iceberg, Delta, and Hudi comparison.
Signs your current solution may be falling short
Replacing a warehouse is a big move. Don’t do it because it feels like the default. Instead, do it because the signals are already there. Here are the most common patterns that suggest it’s time to reconsider.
1. Costs rising faster than value
This isn’t just about your cloud compute or storage bill going up. Watch for spending that grows faster than your data volume or query count. If your team fights the metering model through aggressive caching, retention hacks, or query gymnastics, that’s a red flag. And when exploratory analysis gets labeled “too expensive to run here,” you’ve stopped using your warehouse for what warehouses do best: helping people understand data.
When you’re paying premium warehouse prices to store and scan raw history, a lakehouse storage layer often becomes the more rational default. For more information on this model, check out Starburst’s article on reducing warehouse costs with an Icehouse. It provides a concrete view of how teams effectively shift cold/raw data.
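One way to make the “costs rising faster than value” signal concrete is to compare spend growth against growth in the work the warehouse actually does over the same period. A rough sketch with made-up numbers and an arbitrary threshold:

```python
# Compare growth in warehouse spend vs. growth in the work it does
# (queries served, data scanned). All numbers are illustrative.

def growth(series):
    """Total growth multiple from first to last value."""
    return series[-1] / series[0]

monthly_spend = [40_000, 52_000, 70_000]    # USD, hypothetical
monthly_queries = [90_000, 99_000, 108_000]
monthly_tb_scanned = [120, 130, 142]

spend_growth = growth(monthly_spend)  # 1.75x over the period
work_growth = max(growth(monthly_queries), growth(monthly_tb_scanned))

# If spend grows much faster than the work, pricing friction rather
# than workload growth is driving the bill. 1.25x is an arbitrary cutoff.
cost_outpacing_value = spend_growth > 1.25 * work_growth
```

The exact threshold matters less than tracking the ratio over time: a widening gap between spend growth and work growth is the signal.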
2. Concurrency and latency friction
CDWs can scale, but they usually scale by costing more. You’ll see this as analyst queueing or “warehouse wars” during peak hours, followed by periodic performance-tuning sprints just to stay afloat. Large bursts of ad-hoc users force constant resizing decisions. These symptoms suggest your workload shape has outgrown the warehouse’s scaling and pricing model.
3. Workload mismatch (especially ML and streaming)
Warehouses shine for BI workloads but struggle when you force everything through them. Their economics assume a usage pattern that justifies the warehouse’s premium, and much of modern data work (ML training, streaming, open-ended exploration) doesn’t fit that pattern.
For example, if your ML teams export or duplicate data into Spark, Flink, or feature stores anyway, you’re already admitting the warehouse isn’t the right tool. Similarly, if real-time pipelines have become a patchwork of pre-aggregation, separate streaming stores, and shadow copies, you’re maintaining multiple systems because no single platform covers batch, streaming, and BI cleanly.
A lakehouse doesn’t magically fix streaming or ML—but it gives you a shared, open foundation to run the right engine for each job. For mixed-engine setups, this open data lakehouse overview is a good map of the pieces.
4. Data gravity drifting to object storage
If your “landing zone” is already cloud object storage, whether Amazon S3, ADLS, or GCS, and the warehouse only holds a curated subset, you’re halfway to a lakehouse, whether you call it that or not. At that point, ask yourself why the warehouse is still the primary home of the data when most of it lives elsewhere.
5. Lock-in blocking strategy
Vendor lock-in is fine until it isn’t. You’ll feel this when migrations seem prohibitively expensive because of proprietary formats or functions, when you avoid best-of-breed tools because they don’t integrate cleanly, or when multi-cloud or M&A requirements are looming. Open formats and engines are specifically designed to keep data portable across compute choices.
If you want a focused explanation of portability and why Iceberg is often the center of that discussion, see what Apache Iceberg is and why teams use it.
The case for open architectures
Open lakehouse architectures separate storage from compute. Storage sits in object stores using open table formats, while compute is elastic, swappable, and workload-specific. One dataset can serve interactive SQL for BI, batch processing for heavy transforms, and specialized ML frameworks for training and scoring—all without duplication.
This flexibility is why lakehouses are common in mixed workload environments. Cost is the other driver. Object storage is priced for scale, not for boutique analytics. When you store raw and historical data in a lakehouse and spin compute up only when needed, your economics often get simpler and cheaper over time.
For a concrete illustration of how open lakehouse parts fit together (Iceberg plus a SQL engine), Starburst’s Icehouse Manifesto lays out the concept.
If you’re not ready to replace, shift the default
Plenty of organizations don’t rip out their warehouse. They rebalance instead. Keep the CDW as a high-polish BI serving layer. Move raw, semi-structured, or cold history into open table formats on the lake. Query across both while you migrate the workloads that benefit most. That hybrid posture lowers risk and lets you prove value before committing.
Evaluating migration complexity
Migration difficulties usually come from two sources.
First is technical dependency on proprietary features. If your warehouse usage is mostly standard SQL, moving data and translating queries is manageable. But if you rely heavily on platform-specific SQL functions, specialized caching or indexing behavior, or stored procedures and proprietary ML features, expect more refactoring. This is the real technical debt.
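A quick way to size that technical debt is to scan your SQL for platform-specific constructs before committing to a migration. Here is a hedged sketch; the pattern list is a small, hypothetical sample, and a real audit would build it from your specific warehouse’s documentation:

```python
# Rough sketch: estimate migration surface by scanning SQL text for
# platform-specific constructs. The pattern list below is a small,
# hypothetical sample, not an exhaustive or vendor-accurate inventory.
import re

PROPRIETARY_PATTERNS = [
    r"\bFLATTEN\s*\(",    # semi-structured helpers
    r"\bPARSE_JSON\s*\(",
    r"\bQUALIFY\b",       # non-ANSI filtering clause on some platforms
    r"\bCALL\s+\w+",      # stored-procedure invocations
]

def migration_flags(sql_text):
    """Return the proprietary constructs found in a SQL string."""
    hits = []
    for pattern in PROPRIETARY_PATTERNS:
        if re.search(pattern, sql_text, re.IGNORECASE):
            hits.append(pattern)
    return hits

query = (
    "SELECT id, PARSE_JSON(payload) FROM events "
    "QUALIFY row_number() OVER (PARTITION BY id ORDER BY ts) = 1"
)
flags = migration_flags(query)  # flags two non-portable constructs
```

Run something like this across your query logs and dbt models: the hit rate per model is a reasonable first proxy for refactoring effort.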
Second is organizational readiness. Open architectures can add operational complexity, especially for teams used to fully managed CDWs. Managed lakehouse services are improving fast, but they aren’t identical to the warehouse experience. Be honest about your team’s appetite for change, their ability to support new engines, catalogs, and governance models, and whether you can run two systems in parallel for a while.
A pragmatic, phased migration path
The safest way to decide is to migrate the minimum meaningful thing. Start by proving the concept. Pick one high-cost or high-friction workload, land its data in an open table format, and run it with a second engine in parallel.
Once that works, expand to adjacent workloads, standardize your metadata and catalog approach, and validate governance and SLAs. Finally, rebalance by keeping the warehouse where it excels (dashboards, polished BI, niche features) while letting the lakehouse become the default home for everything else.
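The “run it with a second engine in parallel” step ultimately comes down to proving both engines return the same answer. A minimal validation sketch, assuming you can fetch result rows from each engine’s client (the stand-in data below is hypothetical):

```python
# Dual-run validation sketch: run the same logical query on the old and
# new engines, then compare row counts and an order-insensitive content
# checksum before cutting over. Result rows here are stand-ins for what
# real warehouse/lakehouse client calls would return.
import hashlib

def checksum(rows):
    """Order-insensitive checksum of a result set."""
    digests = sorted(
        hashlib.sha256(repr(row).encode()).hexdigest() for row in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def validate(warehouse_rows, lakehouse_rows):
    return {
        "count_match": len(warehouse_rows) == len(lakehouse_rows),
        "content_match": checksum(warehouse_rows) == checksum(lakehouse_rows),
    }

# Hypothetical results: same rows, returned in a different order.
warehouse = [("2024-01", 1200), ("2024-02", 1350)]
lakehouse = [("2024-02", 1350), ("2024-01", 1200)]
report = validate(warehouse, lakehouse)
```

In practice you would also compare latency and cost per run, but row-level parity is the gate that lets you retire the warehouse copy with confidence.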
Making the decision
Keep your CDW if your core workloads are structured BI and reporting, performance is stable without constant tuning, costs are predictable, and vendor lock-in isn’t strategically painful.
Shift toward a lakehouse if raw and semi-structured data is ballooning, concurrency and ad-hoc growth are driving spend, ML and streaming workloads keep escaping to other systems, or portability matters to your roadmap.
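The keep-vs-shift criteria above can be turned into a lightweight scorecard. This is a deliberately simple sketch with arbitrary tie-breaking, not a formula; the value is in forcing an explicit count of which signals actually apply to you.

```python
# Tally the keep vs. shift signals from this section into a rough lean.
# The signal lists mirror the article's criteria; the tie-breaking
# logic is an arbitrary starting point, not a prescriptive formula.

KEEP_SIGNALS = [
    "core workloads are structured BI/reporting",
    "performance is stable without constant tuning",
    "costs are predictable",
    "lock-in is not strategically painful",
]

SHIFT_SIGNALS = [
    "raw/semi-structured data is ballooning",
    "concurrency and ad-hoc growth are driving spend",
    "ML/streaming workloads keep escaping to other systems",
    "portability matters to the roadmap",
]

def lean(keep_count, shift_count):
    """Return a rough lean based on how many signals apply."""
    if shift_count > keep_count:
        return "shift default toward lakehouse"
    if keep_count > shift_count:
        return "keep the warehouse as default"
    return "hybrid: rebalance workload by workload"

decision = lean(keep_count=1, shift_count=3)
```

A tied score is itself informative: it usually points at the hybrid posture described earlier, keeping the warehouse for BI while the lake absorbs everything else.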
In practice, the best decision is usually not a binary swap. It’s a deliberate shift in what you treat as your default data platform.
Conclusion
There’s no universal answer. Cloud data warehouses are still great for structured BI and managed simplicity. But when your workload mix changes—more raw data, more engines, more real-time, more ML—the warehouse can become a costly bottleneck.
That’s why many teams are moving their default storage to an open data lakehouse foundation while keeping data warehouses for what they do best.
From Starburst’s perspective, this open approach is often described as an Icehouse architecture: a lakehouse built on Apache Iceberg and Trino, delivered in a way that lets teams keep data open while getting warehouse-like performance. If you want the quick “what is it?” framing, see Icehouse 101.
If you’re exploring this path, the best first step is the pragmatic one. Begin by offloading one high-cost or high-friction workload, prove the economics and performance, then expand from there.
For hands-on engine context, Starburst’s Trino + Iceberg primer is a good follow-up.



