
5 Challenges of data warehouses

“Our previous legacy data warehouse was cumbersome, with over-engineering and constant failures. Starburst and Iceberg helped us eliminate these challenges, providing a sturdier, more reliable platform. We’ve moved from constant firefighting to a relaxed and easy-to-use environment.”
– Shawn Crenshaw, Director of Data, Yello

Data warehouses are an important part of the data landscape. 

Many companies choose to use data warehouses, either on their own or alongside other data solutions. Whether a data warehouse is the best solution depends on the specifics of the use case.


Some things to consider when using a data warehouse:

  • How much your company wants to invest in infrastructure
  • The types of questions your company needs to answer and the business processes it needs to support, and how much those may change in the future
  • Where your company is in its data analysis journey
  • What data needs to be in a central repository

What are the benefits of data warehouses?

  1. Data warehouses allow data consumers to quickly and efficiently access data once it has been loaded.
  2. The data in data warehouses can be queried by end users of many different skill levels because it is structured in a pre-defined schema.

What are the challenges of data warehouses?

1. The data in data warehouses must be structured

Data must be processed into a structured form before it can be loaded into the data warehouse. This can be both time- and resource-intensive.
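
As a rough illustration of that processing step, the sketch below coerces raw records into a fixed set of typed columns and sets aside any record that does not fit. The schema, the field names (`order_id`, `amount`, `order_date`), and the helper functions are hypothetical and chosen only for this example; a real pipeline would typically use a dedicated ETL tool.

```python
from datetime import date

# Hypothetical target schema: column name -> converter to the column's type.
SCHEMA = {
    "order_id": int,
    "amount": float,
    "order_date": date.fromisoformat,
}


def to_row(raw: dict) -> tuple:
    """Coerce one raw record into the fixed column order, or raise ValueError."""
    try:
        return tuple(convert(raw[col]) for col, convert in SCHEMA.items())
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"record does not fit schema: {raw!r}") from exc


def prepare(raw_records):
    """Split raw records into load-ready rows and rejected records."""
    rows, rejected = [], []
    for raw in raw_records:
        try:
            rows.append(to_row(raw))
        except ValueError:
            rejected.append(raw)
    return rows, rejected


raw_records = [
    {"order_id": "1001", "amount": "19.99", "order_date": "2024-01-15"},
    {"order_id": "1002", "amount": "n/a", "order_date": "2024-01-16"},  # bad amount
    {"order_id": "1003", "order_date": "2024-01-17"},  # missing amount
]

rows, rejected = prepare(raw_records)
```

Only the first record survives as a load-ready row. The other two need cleanup or a decision about how to handle them, which is where much of the time and resource cost tends to come from.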

2. Data warehouses typically hold historical data

Keeping historical data can cause a data warehouse to grow so large that its storage costs become hard to justify. As a result, older historical data may be discarded even though it might still have some value.

3. Data warehouses must be designed before they are built

This means that they are not flexible enough to easily accommodate new use cases that arise after they are built.

4. Data warehouses rarely become a single source of truth

A data warehouse is often intended to be an organization's single source of truth. However, there are always new sources of data. Given the schema-on-write nature of a data warehouse, significant effort is required to add new data to it. This constant battle between new data sources coming in and the effort needed to add them means a data warehouse rarely achieves true “single source of truth” status.
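
To make the schema-on-write constraint concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a data warehouse. The `customers` table, the `loyalty_tier` field, and the `load` helper are hypothetical. The point is only that a record from a new source is rejected until someone changes the schema.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (an assumption of
# this sketch, not a claim about any particular product).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Ada"))

# A new data source arrives with a field the schema does not know about.
new_record = {"id": 2, "name": "Grace", "loyalty_tier": "gold"}


def load(record: dict) -> None:
    """Insert a record, using its keys as column names (trusted input only)."""
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO customers ({columns}) VALUES ({placeholders})",
        tuple(record.values()),
    )


try:
    load(new_record)
    loaded_before_migration = True
except sqlite3.OperationalError:
    # The table has no loyalty_tier column, so the write is refused.
    loaded_before_migration = False

# The schema has to be changed before the new data can be written.
conn.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")
load(new_record)

rows = conn.execute(
    "SELECT id, name, loyalty_tier FROM customers ORDER BY id"
).fetchall()
```

Existing rows end up with an empty `loyalty_tier` after the migration. In a production warehouse the equivalent change would usually also involve updating upstream pipelines and downstream reports, which is the effort the paragraph above describes.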

5. Data warehouses do not work well with all data types

For example, video content, audio content, and data contained in document form are not amenable to data warehouse storage.

Related reading: Unstructured data 

Types of data stored in a data warehouse

Some types of data lend themselves well to storage within a data warehouse. For example, financial transaction data, operational data, customer relationship data, and enterprise resource planning data are typically stored in a data warehouse. 

However, organizations don’t typically store all of the data they collect in a data warehouse. To do so would be cost-prohibitive in terms of both volume and database administrator bandwidth. 

Social media data, documents, and sensor data are some examples of unstructured data that might not be stored in data warehouses because they cannot be easily consolidated or structured. Data of this type is typically handled by other technologies, such as data lakes or data lakehouses, that do not restructure data before it is stored.

Some organizations use a data warehouse as their only analytical data repository. In these organizations, data analysts would only have access to data stored in a data warehouse. This could be limiting because data warehouses might not store all of the data the organization collects. 

Whether this is a problem depends on the questions that the organization needs to answer. If new questions need to be answered or new data becomes available, it can be difficult to adjust the data warehouse. In that case, the organization might consider using a data lake alongside its data warehouse, or using a data lakehouse, to improve the lifecycle of its data.
