What is data mesh?

Why data mesh is driven by data federation

April 8, 2024

Starburst Team

Data Mesh Architecture: Domain-oriented Ownership

Data Mesh is a strategic approach to strengthening an organization’s digital transformation journey, centered on serving up valuable and secure data products. Data Mesh evolves beyond traditional, monolithic, and centralized data management methods that rely on data warehouses and data lakes. To do this, it uses data federation to access data wherever it lives.

Data Mesh enhances organizational agility by empowering data producers and consumers to access and manage big data themselves, eliminating the need to delegate tasks to the data lake or data warehouse team. As a solution for data silos and data integration, Data Mesh allocates data ownership to domain-oriented groups or business units that serve, own, and manage data as a product. All of this improves data-driven decision-making for data leaders.

What are the 4 principles of data mesh?

Core Principles of Data Mesh

#1 Domain-oriented data ownership and architecture

To understand what domain-driven data is, we must know what a domain is. A domain is an aggregation of people organized around a common functional business purpose.

Data Mesh proposes that each domain owns and manages the data, metadata, and policies created by its business function. The domains are responsible for the assimilation, transformation, and provision of data to the end-users. Ultimately, the domain exposes its data as data products, whose entire lifecycle is owned by that domain.

#2 Data as a product

Data products are produced by the domain and consumed by downstream domains or users to create business value. Data products differ from traditional data marts in that they are self-contained and responsible for aspects such as security, provenance, and infrastructure, as well as for keeping the data up to date. Data products establish a clear line of ownership and responsibility, and can be consumed by other data products or by end consumers directly to support business intelligence and machine learning activities.
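To make the idea of a self-contained data product more concrete, here is a minimal sketch of a product descriptor. The `DataProduct` class, its field names, and the example values are all hypothetical; the article does not prescribe any particular schema, and a real platform would carry far richer metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""

    name: str
    domain: str                # owning domain, e.g. "marketing"
    owner: str                 # accountable person or team
    sources: list[str]         # provenance: upstream systems or products
    allowed_roles: set[str]    # security: roles permitted to read
    freshness_sla: timedelta   # how stale the data is allowed to become
    last_refreshed: datetime

    def is_fresh(self, now: datetime) -> bool:
        """True while the product is within its freshness SLA."""
        return now - self.last_refreshed <= self.freshness_sla

    def can_read(self, role: str) -> bool:
        """True if the given role is permitted to consume the product."""
        return role in self.allowed_roles


# Example values are invented for illustration only.
now = datetime(2024, 4, 8, 12, 0, tzinfo=timezone.utc)
customer_360 = DataProduct(
    name="customer_360",
    domain="marketing",
    owner="marketing-data-team",
    sources=["crm.customers", "payments.transactions"],
    allowed_roles={"analyst", "data_scientist"},
    freshness_sla=timedelta(hours=24),
    last_refreshed=now - timedelta(hours=3),
)

print(customer_360.is_fresh(now))       # True
print(customer_360.can_read("intern"))  # False
```

Keeping ownership, provenance, access rules, and freshness on the product itself is one way to express the "self-contained" property described above: a consumer can inspect the product without asking a central team.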

Whitepaper: A guide to data products: creating and managing reusable data assets

#3 Self-serve data platform

The concept of a self-serve data infrastructure is that it is made up of numerous capabilities that members of the domains can easily use to create and manage their data products. The self-serve data platform is supported by an infrastructure engineering team, whose primary concern is the management and operation of the various technologies in use. This illustrates the separation of concerns; domains are concerned with data, and the self-serve data platform team is concerned with technology. The measure of success of the self-serve data platform is the autonomy of the domains.

#4 Federated computational governance

Traditional data governance and access controls can be seen as an inhibitor to producing value through data. Data Mesh enables a different approach by embedding governance concerns into the workflow of the domains. There are numerous aspects to data governance; however, when considering Data Mesh, it is imperative that usage metrics and reporting become part of this definition. How data is shared and how it is used are key data points for understanding the value and, hence, the success of individual data products.
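As a rough illustration of what "computational" governance can mean in practice, the sketch below checks product metadata against shared standards and derives simple usage metrics from an access log. The required fields, classification values, and log format are assumptions made for this example, not rules taken from Data Mesh or from any specific product.

```python
from collections import Counter

# Hypothetical global standards agreed across domains.
REQUIRED_FIELDS = ("owner", "domain", "classification")
ALLOWED_CLASSIFICATIONS = {"public", "internal", "restricted"}


def policy_violations(product: dict) -> list[str]:
    """Return the standards that a data product's metadata fails to meet."""
    violations = [f"missing {name}" for name in REQUIRED_FIELDS if not product.get(name)]
    classification = product.get("classification")
    if classification and classification not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification: {classification}")
    return violations


def usage_report(access_log: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Summarize (product, consumer) access events per data product."""
    queries = Counter(product for product, _ in access_log)
    consumers: dict[str, set[str]] = {}
    for product, consumer in access_log:
        consumers.setdefault(product, set()).add(consumer)
    return {
        product: {"queries": queries[product], "distinct_consumers": len(consumers[product])}
        for product in queries
    }


products = [
    {"name": "orders", "owner": "sales-team", "domain": "sales", "classification": "internal"},
    {"name": "leads", "domain": "marketing", "classification": "secret"},
]
for product in products:
    print(product["name"], policy_violations(product))
# orders []
# leads ['missing owner', 'unknown classification: secret']

log = [("orders", "finance"), ("orders", "finance"), ("orders", "ops"), ("leads", "sales")]
print(usage_report(log))
# {'orders': {'queries': 3, 'distinct_consumers': 2}, 'leads': {'queries': 1, 'distinct_consumers': 1}}
```

Because the checks are plain functions, each domain could run them inside its own workflow, while the standards themselves stay shared across the mesh.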

What are the benefits of a data mesh?

Implementing Data Mesh promotes agility for organizations that want to thrive in an uncertain economic climate. All organizations need to be able to respond to changes in their environment with a low-cost, high-reward approach. Introducing new data sources, complying with changing regulatory requirements, or meeting new analytics requirements are all drivers that will precipitate changes to an organization’s data management activities. Current data management approaches are typically based on complex and heavily integrated data pipelines (ETL, ELT) and data ingestion between operational and analytical data systems, and they struggle to change quickly enough to support business needs in the face of these drivers. The purpose of Data Mesh is to provide a more resilient approach to data, enabling efficient responses to these changes.

Related reading: 10 benefits and challenges of data mesh

Why is data mesh a good thing? Data Mesh is a socio-technical approach involving people, processes, and technology

Data Mesh is a ‘socio-technical’ approach that requires changes to the organization across all three dimensions of people, process, and technology. Organizations adopting Data Mesh may allocate 70% of their efforts to people and processes, and 30% to the technology that enables the future Data Mesh state.

People: From the central data team to the decentralization of business domains

Embarking on a Data Mesh journey will result in significant organizational changes and adjustments to employees’ roles. Existing workers will be critical to the success of adopting a Data Mesh, as they have invaluable tacit knowledge to contribute to the journey. Therefore, the transition of data ownership from a central data team to decentralized, domain-driven teams should also be approached as a realignment of existing data-focused employees. There are also changes to management hierarchies and reward mechanisms.

Process: Changes within the organization

To promote a sustainable and agile data architecture, implementing Data Mesh will require process changes within the organization. If we consider data governance, new processes around data policy definition, implementation, and enforcement will be required. These will affect how data is accessed and managed, as well as how that data is exploited as part of business-as-usual (BAU) processes.

Technology capabilities to implement and operate a distributed data mesh

Technology capabilities are a key enabler to implement and operate a Data Mesh. New technology is likely to be required for a number of reasons:

Reduce the friction of exploiting data across technologies, and ensure the interoperability of those new technologies, which is likely to be critical.
Enable domains to be self-sufficient and focus on their first-class concern, which is data rather than technology.
Enable the purchase of new data platforms online, and allow the data they expose to be used seamlessly.
Enable automatic reporting of governance aspects across the data mesh, such as data product usage, compliance with standards, and data product feedback.

Whether you should adopt data mesh: Advice from Zhamak Dehghani, the founder of the Data Mesh paradigm and former director of technology at ThoughtWorks

The truth is that Data Mesh may not be the correct fit for every organization. Data Mesh is primarily aimed at larger organizations that encounter uncertainty and change in their operations and environment. If your organization has small data needs that remain constant over time, then Data Mesh is likely an unnecessary overhead.

Related reading:

Learn all about strategy, implementation, and execution of Data Mesh firsthand from Zhamak Dehghani.

Data lake vs data mesh

The data lake is a technology approach whose main objective has traditionally been to serve as a single repository into which data can be moved as simply as possible, with a central team responsible for managing it.

Data lakes do provide significant business value with raw and open file formats, and they reduce storage costs. However, they also suffer from several concerns, the primary one being that once data is moved to the lake, it loses context. For instance, we might have multiple files defining a customer, each from a different system: logistics, payments, or marketing. Which one is most accurate for real-time data analysis?

Furthermore, since the data in the data lake has not been pre-processed, data issues will inevitably arise. The data consumer will then typically have to liaise with the data lake team to understand and resolve data issues, which becomes a significant bottleneck to using the data to answer the initial business question.

In comparison, Data Mesh is more than just technology; it combines both technology and organizational aspects, including the ideas of data ownership, data quality, and autonomy. As a result, consumers of data have a clear line of sight into data quality and data ownership, and data issues can be discovered and resolved much more efficiently.

Ultimately, data can be used and trusted.

Related reading: Data Mesh vs Lake vs Warehouse vs Fabric

Data fabric vs data mesh

The key distinction between a data fabric and a data mesh lies in their focus: a data fabric is a technological approach, whereas a data mesh is about organization, people, and technology.

Data fabric focuses on a collection of various technological capabilities that collaborate to produce an interface for end-users who consume data. Many supporters of data fabric advocate for automation through technologies like ML to simplify data management tasks, enabling end users to access data more easily. For simple data usage, there is some value in this; however, for more complex situations or where business knowledge needs to be integrated into the data, the limitations of data fabric will become apparent.

Arguably, a data fabric could be used as part of a Data Mesh self-serve platform, where the data fabric exposes data to the domains that can then embed their business knowledge into a resulting data product.

Daniel Abadi, Darnell-Kanal Professor of Computer Science at the University of Maryland, College Park, notes that the distinction between a data fabric and a Data Mesh is not immediately apparent. He advises, “Ultimately, an optimal solution will likely take the best ideas from each of these approaches.”

What does the Data Mesh look like?

Data Mesh Architecture: How to integrate data mesh with your ecosystem

Organizations ready to implement Data Mesh will need help connecting their data sources for a quick win with Starburst. Below, we highlight how to:

#1 Connect to data sources where they reside

As you begin your Data Mesh journey, the first step is to connect to data sources. A key Data Mesh implementation principle is to connect your enterprise data by leveraging your existing investments: lakes or warehouses, on-premises or in the cloud, structured or unstructured. Unlike the single-source-of-truth approach, which centralizes all your data first, you query the data where it resides. This is the first Data Mesh win for many Starburst customers, as our 40+ connectors make it possible to connect to those data sources.
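The idea of querying data where it resides can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module and SQLite's `ATTACH DATABASE` purely as a stand-in for a federated query engine; it is not Starburst's API. The `crm` and `payments` databases, their tables, and the sample rows are invented for the example.

```python
import sqlite3

# One connection plays the role of the query engine.
engine = sqlite3.connect(":memory:")

# Each attached database stands in for a separate source system.
engine.execute("ATTACH DATABASE ':memory:' AS crm")
engine.execute("ATTACH DATABASE ':memory:' AS payments")

engine.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
engine.execute("CREATE TABLE payments.transactions (customer_id INTEGER, amount REAL)")
engine.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
engine.executemany(
    "INSERT INTO payments.transactions VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)

# A single query joins both sources without first copying them into a central store.
rows = engine.execute(
    """
    SELECT c.name, SUM(t.amount) AS total_spend
    FROM crm.customers AS c
    JOIN payments.transactions AS t ON t.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('Ada', 25.0), ('Grace', 12.5)]
```

In a real deployment the sources would be different systems reached through connectors, but the shape of the query is the point: the consumer addresses each source by name and the engine resolves the join.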

#2 Create logical domains

After establishing connectivity across all data sets, the next goal is to create an interface for business and analytics teams to find their data. In data mesh terms, we call that a logical domain. It’s called logical because we’re not moving data into a repository for data consumers to access. Instead, we’re creating a logical place, a dashboard that acts as a semantic layer, where they can log in to view the data that’s been made available to them.

All the data you need resides in your domain, alongside domain teams empowered to work autonomously. In essence, we’re promoting the concept of self-service, empowering data consumers to take more control independently.

#3 Enable teams to create data products

Once you provide domain teams with the necessary data, the next step is to teach them how to convert this data into useful products. Then, as data products accumulate, build a library or catalog of data products that you can share.

Starburst has a built-in data catalog that enables you to very quickly search, discover, and identify data products that might be of interest and improve the lives of data scientists and data engineers.

Creating data products is a powerful capability: because data products are created and used across the organization, your data consumers can move quickly from discovery to ideation and insight.
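A catalog of data products can be pictured as searchable metadata. The sketch below is a deliberately simple, hypothetical in-memory catalog with keyword search; the entries, field names, and the `search` function are assumptions for illustration and do not represent Starburst's built-in catalog.

```python
# Hypothetical catalog entries; a real catalog would hold much richer metadata.
CATALOG = [
    {
        "name": "customer_360",
        "domain": "marketing",
        "tags": {"customer", "profile"},
        "description": "Unified customer profile across CRM and payments",
    },
    {
        "name": "shipment_delays",
        "domain": "logistics",
        "tags": {"shipping", "sla"},
        "description": "Late shipments by carrier and region",
    },
    {
        "name": "payment_failures",
        "domain": "payments",
        "tags": {"payments", "risk"},
        "description": "Declined card payments per customer segment",
    },
]


def search(catalog: list[dict], term: str) -> list[str]:
    """Return names of products whose name, description, or tags match the term."""
    term = term.lower()
    return [
        entry["name"]
        for entry in catalog
        if term in entry["name"].lower()
        or term in entry["description"].lower()
        or term in entry["tags"]
    ]


print(search(CATALOG, "customer"))  # ['customer_360', 'payment_failures']
print(search(CATALOG, "shipping"))  # ['shipment_delays']
```

Even this toy version shows why discovery matters: a consumer searching for "customer" finds a product owned by another domain that they might not otherwise have known about.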

How to build and manage a data mesh approach

Those eager to embark on their Data Mesh journey for democratization and scalability will find the 90-Day Data Mesh Pathfinder helpful. In fact, many enlist a Pathfinder to help them with this ambitious endeavor. With the right strategy, it is not labor-intensive; it is a low-cost, low-risk, high-reward exercise.

Start by designing and building your data mesh pathfinder

The purpose of a pathfinder is to explore how Data Mesh will fit into your organization from a technology, people, and process perspective. You’ll also identify your strengths and weaknesses so that when you’re ready to begin your Data Mesh transformation program, we can curate all the learnings from the Pathfinder to accelerate in the areas where you can move quickly, and slow down in the areas where you need remedial work.

Key activities in your data mesh pathfinder workshop

Select the Pathfinder use case and agree on scope (e.g., 1 Domain, 6 Data Products, 3 Data Sources)
Establish Pre-MVP environment for early design and enablement activities
Data product design, refinement, and consumption
Domain Owner, Data Product Owner, and Consumer Enablement Training
Showcase the MVP
Integrate the Data Mesh into your Data Strategy

Related reading: The Data Mesh Pathfinder eBook

Related workshop: 2-hour Data Mesh Pathfinder workshop
