Data Fundamentals

What is Data Mesh?

Data mesh is a new approach based on a modern, distributed architecture for analytical data management. It enables end users to easily access and query data where it lives without first transporting it to a data lake or data warehouse. The decentralized strategy of data mesh distributes data ownership to domain-specific teams that manage, own, and serve the data as a product.

The main objective of data mesh is to eliminate the challenges of data availability and accessibility at scale. Data mesh allows business users and data scientists alike to access, analyze, and operationalize business insights from virtually any data source, in any location, without intervention from expert data teams.

Simply put, data mesh makes data accessible, available, discoverable, secure, and interoperable. Because data can be queried in place rather than transported first, faster access translates directly into faster time to value.

Why Data Mesh and Why Now?

Global data creation is projected to exceed 180 zettabytes in the next five years. Current data platforms have several architectural failures that hinder enterprise data processing and inhibit business growth.

3 Problems of Current Data Platforms

Problem #1: Until now, enterprises have relied on a centralization strategy to process large volumes of data spanning many sources, types, and use cases. However, centralization requires users to transport data from edge locations to a central data lake before it can be queried for analytics, which is time-consuming and expensive.

How Data Mesh Solves It: The distributed architecture of data mesh treats data as a product, with each business unit owning the data for its own domain. This decentralized data ownership model reduces time-to-insight and time-to-value by empowering business units and operational teams to access and analyze “non-core” data quickly and easily.
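
To make the "data as a product" idea concrete, here is a minimal Python sketch of a domain-owned data product and a catalog that makes products discoverable. Every name in it (`DataProduct`, `Catalog`, the fields, and the example locations and contacts) is hypothetical and chosen for illustration; it does not reflect any particular vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A dataset published by a domain team (all field names are illustrative)."""
    name: str
    domain: str      # business unit that owns and serves the data
    owner: str       # contact for the owning team
    location: str    # where the data lives; it is not copied elsewhere
    tags: frozenset = frozenset()


class Catalog:
    """In-memory registry that makes data products discoverable."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        key = (product.domain, product.name)
        if key in self._products:
            raise ValueError(f"{product.domain}.{product.name} is already registered")
        self._products[key] = product

    def by_domain(self, domain):
        return [p for p in self._products.values() if p.domain == domain]

    def search(self, tag):
        # Return fully qualified names of every product carrying the tag.
        return sorted(
            f"{p.domain}.{p.name}" for p in self._products.values() if tag in p.tags
        )


catalog = Catalog()
catalog.register(DataProduct("orders", "sales", "sales-data@example.com",
                             "postgres://sales-db/orders",
                             frozenset({"core", "revenue"})))
catalog.register(DataProduct("clickstream", "web", "web-data@example.com",
                             "s3://web-logs/clickstream/",
                             frozenset({"non-core", "behavior"})))

print(catalog.search("non-core"))  # ['web.clickstream']
```

The catalog stores only metadata and a pointer to where each product lives. Ownership and the data itself stay with the domain team.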

Problem #2: As global data volumes continue to increase, a centralized management model struggles to respond at scale because each new query requirement forces changes across the entire data pipeline. As the number of sources grows, the response time to new consumers and data sources slows, which hurts the business's ability to get value from data and respond to change.

How Data Mesh Solves It: Data mesh delegates dataset ownership from the central team to the domains (individual teams or business users) to enable business agility and change at scale. Data mesh architecture steers enterprises towards real-time decision-making by closing the gap in time and space between an event happening and its consumption or processing for analysis.

Problem #3: Data transfer is often subject to data residency and privacy regulations that prohibit moving data stored in particular geographies or legal jurisdictions, for example data stored in an EU country that needs to be accessed by a user in North America. Complying with these regulations is time-consuming and tedious, and can significantly delay the data processing and analysis that teams need for the business intelligence that helps them maintain a competitive advantage.

How Data Mesh Solves It: In decentralized data management, the domains are responsible for the quality, security, and transfer of their data products. Data mesh provides a connectivity layer that lets technical and non-technical users access and query data sets directly where they reside, avoiding costly data transfers and residency concerns.
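
Querying data where it resides can be sketched as follows. This illustrative Python example uses in-memory SQLite databases as stand-ins for regional data stores: each aggregation runs inside its own region, and only the totals are combined. The region names, table layout, and helper functions are assumptions made for the example, and it further assumes that aggregate results (unlike raw rows) are permitted to leave a region.

```python
import sqlite3


def make_region_db(rows):
    """Create an in-memory database standing in for one region's data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Hypothetical regional stores; row-level data never leaves its region.
regions = {
    "eu": make_region_db([("alice", 120.0), ("bruno", 80.0)]),
    "us": make_region_db([("carol", 200.0)]),
}


def regional_summary(conn):
    """Run the aggregation where the data resides; return only totals."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders"
    ).fetchone()
    return {"orders": count, "revenue": total}


def global_summary(region_dbs):
    """Combine per-region aggregates without moving any raw rows."""
    summaries = {name: regional_summary(conn) for name, conn in region_dbs.items()}
    return {
        "orders": sum(s["orders"] for s in summaries.values()),
        "revenue": sum(s["revenue"] for s in summaries.values()),
        "by_region": summaries,
    }


result = global_summary(regions)
print(result["orders"], result["revenue"])  # 3 400.0
```

Only two numbers per region cross the boundary here. Whether that satisfies a given residency rule depends on the regulation in question.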

Moving Towards Swift Data Access

Data mesh connects siloed data to help enterprises move towards automated analytics at scale. It allows businesses to escape the consumptive trap of monolithic data architectures and save themselves from massive operational and storage costs. This new distributed approach aims to clear the data access bottlenecks of centralized data ownership by giving data management and ownership to domain-specific business teams.

Benefits of Data Mesh

Business Agility and Scalability

Data mesh powers decentralized data operations, independent team performance, and the provision of data infrastructure as a service, resulting in improved time-to-market, scalability, and business domain agility. It removes process complexity and IT backlogs, reducing operating and storage costs.

Faster Access and Accurate Data Delivery

Data mesh offers an easily governable, centralized infrastructure based on a self-service model that hides the underlying complexity, enabling faster data access and accurate delivery. Businesses can access data from anywhere using SQL queries, with much lower latency. The distributed architecture reduces the processing and intervention layers that delay time to insight.
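
As a rough illustration of one SQL statement spanning separately owned sources, the Python sketch below attaches two independent in-memory SQLite databases to a single connection and joins across them. SQLite's `ATTACH DATABASE` is only a small-scale stand-in for a distributed query engine, and the `sales` and `web` schemas, tables, and sample rows are invented for the example.

```python
import sqlite3

# One connection plays the role of the query layer; each attached database
# stands in for a separately owned source system.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales")
conn.execute("ATTACH DATABASE ':memory:' AS web")

conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE web.clicks (customer_id INTEGER, page TEXT)")
conn.executemany("INSERT INTO sales.orders VALUES (?, ?)",
                 [(1, 50.0), (1, 25.0), (2, 40.0)])
conn.executemany("INSERT INTO web.clicks VALUES (?, ?)",
                 [(1, "/home"), (1, "/pricing"), (2, "/home")])

# A single SQL statement joins data owned by two different domains.
# Each side is aggregated first so the join does not multiply rows.
query = """
    SELECT o.customer_id, o.revenue, c.page_views
    FROM (SELECT customer_id, SUM(amount) AS revenue
          FROM sales.orders GROUP BY customer_id) AS o
    JOIN (SELECT customer_id, COUNT(*) AS page_views
          FROM web.clicks GROUP BY customer_id) AS c
      ON c.customer_id = o.customer_id
    ORDER BY o.customer_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 75.0, 2), (2, 40.0, 1)]
```

The consumer writes ordinary SQL against qualified table names and does not need to know how or where each source is stored.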

Flexibility and Independence

Enterprises that adopt data mesh architecture become vendor-agnostic and are not locked in to a single data platform. Connectors to many systems give companies flexibility and choice in how they build their distributed infrastructure.

Platform Connectivity and Data Security

The decentralized framework allows cloud applications to connect to sensitive on-site data, whether it is streaming live or stored on devices. Data mesh queries and analyzes data where it resides, instead of requiring users to make a copy and route it through a public network to a data warehouse.

Through platform connectivity in a distributed model, it reduces the risk of data breaches and information loss, which improves security, and it reduces data latency, which improves overall performance in use cases such as live streaming, online gaming, and financial trading.

Robust Data Governance for End-to-End Compliance

Distributed architecture reconciles data ingestion with its sources, formats, and volumes to allow businesses to control their security at the source system. The decentralized data operations simplify compliance with global data governance guidelines for quality data delivery and ease of data access.

Cross-Functional Teams for Improved Transparency

The centralized data ownership of traditional data platforms isolates expert teams, creates a lack of transparency, and fails to provide contingency against loss of data control or ownership. Through its domain-oriented approach, data mesh distributes data ownership among cross-functional domain teams (domain experts, business teams, IT, and agile virtual teams) for improved transparency and data quality.

Data Mesh in Action – Get More Out of Distributed Data

Data mesh unlocks possibilities for businesses in various consumption scenarios, including behavior modeling, analytics, and data-intensive applications. Whether the data is core data such as business sales data, non-core data such as web and clickstream data, or both, the distributed architecture enables easy data access and faster delivery without vendor lock-in to an expensive enterprise warehouse.
