What is data mesh?

Why data mesh is driven by data federation
  • Starburst Team

Data Mesh is a strategic approach that strengthens an organization’s digital transformation journey by centering on valuable, secure data products. It evolves beyond traditional, monolithic, centralized data management methods that rely on data warehouses and data lakes. To do this, it uses data federation to access data wherever it lives.

Data Mesh enhances organizational agility by empowering data producers and consumers to access and manage big data themselves, eliminating the need to delegate tasks to the data lake or data warehouse team. As a solution for data silos and data integration, data mesh allocates data ownership to domain-oriented groups or business units that serve, own, and manage data as a product. All of this improves data-driven decision-making for data leaders.

Free Data Mesh books

We have lots of information on data mesh, including the assortment of data mesh resources below.

What are the 4 principles of data mesh?

Core Principles of Data Mesh

#1 Domain-oriented data ownership and architecture

To understand what domain-driven data is, we must know what a domain is. A domain is an aggregation of people organized around a common functional business purpose.

Data Mesh proposes that each domain owns and manages the data, metadata, and policies created by its business function. The domains are responsible for the ingestion, transformation, and provision of data to end users. Eventually, the domain exposes its data as data products, whose entire lifecycle is owned by that domain.

#2 Data as a product

Data products are produced by the domain and consumed by downstream domains or users to create business value. Data products differ from traditional data marts in that they are self-contained and responsible for aspects such as security, provenance, and infrastructure, ensuring the data remains up to date. Data products enable a clear line of ownership and responsibility and can be consumed by other data products or by end consumers directly to support business intelligence and machine learning activities.
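To make the idea of a self-contained data product more concrete, here is a minimal sketch in Python. It is an illustration only, not a Starburst API: the `DataProduct` class and all of its field names are hypothetical, chosen to mirror the aspects named above (ownership, security, provenance, and keeping data up to date).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product owned by one domain."""

    name: str
    domain: str               # owning domain, e.g. "marketing"
    owner: str                # accountable person or team
    sources: list[str]        # provenance: upstream systems or products
    allowed_roles: set[str]   # security: roles permitted to read
    max_staleness: timedelta  # freshness promise made to consumers
    last_refreshed: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def is_fresh(self, now: Optional[datetime] = None) -> bool:
        """True if the last refresh is within the promised staleness window."""
        now = now or datetime.now(timezone.utc)
        return now - self.last_refreshed <= self.max_staleness

    def can_read(self, role: str) -> bool:
        """True if the given role is allowed to consume this product."""
        return role in self.allowed_roles
```

Because the descriptor carries its own owner, access rules, and freshness promise, a consumer (or another data product) can check those properties without asking a central team.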

Related blogs:

Webinar: Empowering modern analytics strategies with data products

Whitepaper: A guide to data products: creating and managing reusable data assets

#3 Self-serve data platform

A self-serve data platform is made up of numerous capabilities that members of the domains can easily use to create and manage their data products. The platform is supported by an infrastructure engineering team, whose primary concern is the management and operation of the various technologies in use. This illustrates the separation of concerns: domains are concerned with data, and the self-serve data platform team is concerned with technology. The measure of success of the self-serve data platform is the autonomy of the domains.

#4 Federated computational governance

Traditional data governance and access controls can be seen as an inhibitor to producing value through data. Data Mesh enables a different approach by embedding governance concerns into the workflow of the domains. There are numerous aspects to data governance; however, when considering Data Mesh, it is imperative that usage metrics and reporting become part of this definition. Who a data product is shared with, how often it is used, and how it is used are key data points for understanding the value, and hence the success, of individual data products.
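As a rough illustration of what "computational" governance can mean, the sketch below expresses one global policy as code and derives a usage metric from an access log. Everything here is an assumption for illustration: the required metadata fields, the shape of the log entries, and the function names are hypothetical, not part of any specific product.

```python
from collections import Counter

# Hypothetical global policy agreed by the federation of domains:
# every data product must declare these metadata fields.
REQUIRED_FIELDS = ("owner", "domain", "description")


def policy_violations(product: dict) -> list[str]:
    """Return the required metadata fields that are missing or empty."""
    return [name for name in REQUIRED_FIELDS if not product.get(name)]


def usage_by_product(query_log: list[dict]) -> Counter:
    """Count queries per data product from a (hypothetical) access log."""
    return Counter(entry["product"] for entry in query_log)
```

Checks like these can run automatically whenever a domain publishes or updates a data product, so governance becomes part of the domain's workflow rather than a separate approval step.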

What are the benefits of a data mesh?

Implementing Data Mesh promotes agility for organizations that want to thrive in an uncertain economic climate. All organizations need to be able to respond to changes in their environment with a low-cost, high-reward approach. Introducing new data sources, complying with changing regulatory requirements, and meeting new analytics requirements are all drivers that will precipitate changes to an organization’s data management activities. Current data management approaches are typically based on complex, heavily integrated data pipelines (ETL, ELT) and data ingestion between operational and analytical data systems, and they struggle to change quickly enough to support business needs in the face of these drivers. The purpose of Data Mesh is to provide a more resilient approach to data, enabling efficient responses to these changes.

Related reading: 10 benefits and challenges of data mesh

Why is data mesh a good thing? Data Mesh is a socio-technical approach involving people, processes, and technology

Data Mesh is a ‘socio-technical’ approach that requires changes to the organization across all three dimensions of people, process, and technology. Organizations adopting Data Mesh may allocate 70% of their efforts to people and processes, and 30% to the technology that enables the future Data Mesh state.

People: From the central data team to the decentralization of business domains

Embarking on a Data Mesh journey will result in significant organizational changes and adjustments to employees’ roles. Existing workers will be critical to the success of adopting a Data Mesh, as they have invaluable tacit knowledge to contribute. Therefore, the transition of data ownership from a central data team to decentralized, domain-driven teams should also be approached as a realignment of existing data-focused employees. There are also changes to management hierarchies and reward mechanisms.

Process: Changes within the organization

To promote a sustainable and agile data architecture, implementing Data Mesh will require process changes within the organization. Consider data governance: new processes for defining, implementing, and enforcing data policy will be required. These will affect how data is accessed and managed, as well as how that data is exploited in business-as-usual (BAU) processes.

Technology capabilities to implement and operate a distributed data mesh

Technology capabilities are a key enabler to implement and operate a Data Mesh. New technology is likely to be required for a number of reasons:

  • Reduce the friction of exploiting data across technologies; ensuring the interoperability of those new technologies is likely to be critical.
  • Enable domains to be self-sufficient and focus on their first-class concern, which is data rather than technology.
  • Enable the purchase of new data platforms online, and allow the data they expose to be used seamlessly.
  • Enable automatic reporting of governance aspects across the data mesh, such as data product usage, compliance with standards, and data product feedback.

Whether you should adopt data mesh: Advice from Zhamak Dehghani, the founder of the Data Mesh paradigm and former director of technology at ThoughtWorks

The truth is that Data Mesh may not be the correct fit for every organization. Data Mesh is primarily aimed at larger organizations that encounter uncertainty and change in their operations and environment. If your organization has small data needs that remain constant over time, then Data Mesh is likely an unnecessary overhead.

Related reading:

Learn all about strategy, implementation, and execution of Data Mesh firsthand from Zhamak Dehghani.

Data lake vs data mesh

The data lake is a technology approach whose main objective has traditionally been to provide a single repository into which data can be moved as simply as possible, with a central team responsible for managing it.

Data lakes do provide significant business value through raw data, open file formats, and reduced storage costs. However, they also suffer from several concerns, the primary one being that once data is moved to the lake, it loses context. For instance, we might have multiple files defining a customer, each from a different system: logistics, payments, or marketing. Which one is most accurate for real-time data analysis?

Furthermore, since the data in the data lake has not been pre-processed, data issues will inevitably arise. The data consumer will then typically have to liaise with the data lake team to understand and resolve data issues, which becomes a significant bottleneck to using the data to answer the initial business question.

In comparison, Data Mesh is more than just technology; it combines both technology and organizational aspects, including the idea of data ownership, data quality, and autonomy. So consumers of data have a clear line of sight around data quality and data ownership, and data issues can be discovered and resolved much more efficiently. 

Ultimately, data can be used and trusted.

Related reading: Data Mesh vs Lake vs Warehouse vs Fabric

Data fabric vs data mesh

The key distinction between a data fabric and a data mesh lies in their focus: a data fabric is a technological approach, whereas a data mesh is about organization, people, and technology.

Data fabric focuses on a collection of various technological capabilities that collaborate to produce an interface for end-users who consume data. Many supporters of data fabric advocate for automation through technologies like ML to simplify data management tasks, enabling end users to access data more easily. For simple data usage, there is some value in this; however, for more complex situations or where business knowledge needs to be integrated into the data, the limitations of Data fabric will become apparent.

Arguably, a Data fabric could be used as part of a Data Mesh self-serve platform, where the data fabric exposes data to the domains that can then embed their business knowledge into a resulting data product.

Daniel Abadi, Darnell-Kanal Professor of Computer Science at the University of Maryland, College Park, notes that the distinction between a data fabric and a Data Mesh is not immediately apparent. He advises, “Ultimately, an optimal solution will likely take the best ideas from each of these approaches.”

Related reading:

What does the Data Mesh look like?

Data Mesh Architecture: How to integrate data mesh with your ecosystem

Organizations ready to implement Data Mesh will need help connecting their data sources for a quick win with Starburst. Below we highlight how:

#1 Connect to data sources where they reside

As you begin your Data Mesh journey, the first step is to connect to data sources. A key Data Mesh implementation principle is to connect your enterprise data by leveraging your existing investments: lakes or warehouses, cloud or on-premises, structured or unstructured. Unlike the single-source-of-truth approach, which centralizes all your data first, you query the data where it resides. This is the first Data Mesh win for many Starburst customers, as our 40+ connectors let them connect to their existing data sources.
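The idea of querying data where it resides can be sketched with a small, self-contained example. Note the assumptions: this uses Python's built-in `sqlite3` module purely as a stand-in, with two attached in-memory databases playing the role of separate source systems. It is not Starburst, and the `warehouse` and `lake` names, tables, and rows are hypothetical. The point is only the pattern: one query joins data held in two places without first copying it into a central store.

```python
import sqlite3

# One connection plays the role of the federated query engine.
engine = sqlite3.connect(":memory:")

# Each attached database stands in for a separate source system.
engine.execute("ATTACH DATABASE ':memory:' AS warehouse")
engine.execute("ATTACH DATABASE ':memory:' AS lake")

# Hypothetical data that already lives in each source.
engine.execute("CREATE TABLE warehouse.customers (id INTEGER, name TEXT)")
engine.execute("CREATE TABLE lake.payments (customer_id INTEGER, amount REAL)")
engine.executemany(
    "INSERT INTO warehouse.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
engine.executemany(
    "INSERT INTO lake.payments VALUES (?, ?)",
    [(1, 40.0), (1, 2.5), (2, 10.0)],
)

# A single query spans both sources, addressed by source-qualified names.
rows = engine.execute(
    """
    SELECT c.name, SUM(p.amount) AS total
    FROM warehouse.customers AS c
    JOIN lake.payments AS p ON p.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```

In a real deployment, the sources would be different systems reached through connectors rather than attached in-memory databases, but the consumer-facing experience is similar: source-qualified table names in one SQL statement.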

#2 Create logical domains

After generating connectivity across all the various data sets, the next goal is to create an interface for business and analytics teams to find their data. In data mesh terms, we call that a logical domain. It’s called logical because we’re not moving data into a repository where data consumers can access it. Instead, we’re creating a logical place where they can log into a dashboard as a semantic layer, to see the data that’s been made available to them.

All the data you need resides in your domain alongside domain teams that are empowered to work autonomously. In essence, we’re promoting the concept of self-service, empowering data consumers to take more control independently.
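A logical domain can be pictured as a mapping rather than a copy. The sketch below is a hypothetical illustration (the registry layout, domain names, users, and dataset locations are all invented for this example): each domain lists the datasets exposed to it as pointers to where the data already lives, and a user sees only what their domains expose.

```python
# Hypothetical registry: each logical domain lists its members and the
# datasets exposed to it, as pointers to where the data already resides.
LOGICAL_DOMAINS = {
    "marketing": {
        "members": {"ana", "raj"},
        "datasets": {
            "campaigns": "lake.marketing.campaigns",
            "customers": "warehouse.crm.customers",
        },
    },
    "finance": {
        "members": {"lee"},
        "datasets": {"payments": "warehouse.billing.payments"},
    },
}


def visible_datasets(user: str) -> dict[str, str]:
    """Map each dataset a user can see to the location where it resides."""
    visible: dict[str, str] = {}
    for domain, spec in LOGICAL_DOMAINS.items():
        if user in spec["members"]:
            for name, location in spec["datasets"].items():
                visible[f"{domain}.{name}"] = location
    return visible
```

Nothing is moved when a dataset is added to a domain; only the mapping changes, which is what makes the domain "logical".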

#3 Enable teams to create data products

Once you provide domain teams with the necessary data, the next step is to teach them how to convert this data into useful products. Then, as data products are created, build a library or catalog of data products that can be shared.

Starburst has a built-in data catalog that lets you quickly search, discover, and identify data products that might be of interest, improving the lives of data scientists and data engineers.

Creating data products is a powerful capability, as you’ve enabled your data consumers to quickly move from discovery to ideation and insight, because we’re creating and using data products across the organization.

How to build and manage a data mesh approach

Those eager to embark on their Data Mesh journey for democratization and scalability will find the 90-Day Data Mesh Pathfinder helpful. In fact, many enlist a Pathfinder to help them with this ambitious endeavor. With the right strategy, it is not labor-intensive: it is a low-cost, low-risk, high-reward exercise.

Start by designing and building your data mesh pathfinder

The purpose of a pathfinder is to explore how Data Mesh will fit into your organization from a technology, people, and process perspective. You’ll also identify your strengths and weaknesses so that when you’re ready to begin your Data Mesh transformation program, we can curate all the learnings from the Pathfinder to accelerate in the areas where you can move quickly, and slow down in the areas where you need remedial work.

Key activities in your data mesh pathfinder workshop

  1. Select the Pathfinder use case and agree on scope (e.g., 1 Domain, 6 Data Products, 3 Data Sources)
  2. Establish Pre-MVP environment for early design and enablement activities
  3. Data product design, refinement, and consumption
  4. Domain Owner, Data Product Owner, and Consumer Enablement Training
  5. Showcase the MVP
  6. Integrate the Data Mesh into your Data Strategy

Related reading: The Data Mesh Pathfinder eBook

Related workshop: 2-hour Data Mesh Pathfinder workshop

 
