Last Updated: May 26, 2023 Published: May 1, 2021
Data Mesh is a strategic approach to strengthen an organization’s digital transformation journey as it centers on serving up valuable and secure data products. Data Mesh evolves beyond the traditional, monolithic, and centralized data management methods of utilizing data warehouses and data lakes.
Data Mesh improves organizational agility by giving data producers and data consumers the ability to access and manage big data without having to delegate to the data lake or data warehouse team. As a solution for data silos and data integration, Data Mesh allocates data ownership to domain-oriented groups or business units that serve, own, and manage data as a product. All of this improves data-driven decision-making for data leaders.
To understand what domain-driven data is, we must know what a domain is. A domain is an aggregation of people organized around a common functional business purpose.
Data Mesh proposes that each domain owns, and is responsible for managing, the data, metadata, and policies created by the business function of that domain. The domains are responsible for the assimilation, transformation, and provision of data to the end-users. Ultimately, the domain exposes its data as data products, whose entire lifecycle is owned by that domain.
Data products are produced by the domain and consumed by downstream domains or users to create business value. Data products are different from traditional data marts, as they are self-contained, and are in themselves responsible for aspects such as security, provenance and infrastructure concerns related to ensuring that the data is kept up to date. Data products enable a clear line of ownership and responsibility and can be consumed by other data products or by end consumers directly to support business intelligence and machine learning activities.
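One way to picture a self-contained data product is as a small descriptor that carries its own ownership, provenance, access, and freshness information. The sketch below is illustrative only: the `DataProduct` class, its field names, and the sample values are assumptions made for this example, not part of any Data Mesh standard or Starburst API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Set


@dataclass
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    name: str
    owner_domain: str            # the domain accountable for the full lifecycle
    source_systems: List[str]    # provenance: where the data originates
    allowed_roles: Set[str]      # security: roles permitted to read the product
    max_staleness: timedelta     # freshness target the owning domain commits to
    last_refreshed: datetime
    upstream_products: List[str] = field(default_factory=list)

    def is_fresh(self, now: datetime) -> bool:
        # The product is "up to date" if it was refreshed within its target window.
        return now - self.last_refreshed <= self.max_staleness

    def can_read(self, role: str) -> bool:
        # Access is decided by the product itself, not by a central team.
        return role in self.allowed_roles


# Sample values invented for illustration.
now = datetime(2023, 5, 26, 12, 0, tzinfo=timezone.utc)
orders = DataProduct(
    name="sales.orders_daily",
    owner_domain="sales",
    source_systems=["order-management"],
    allowed_roles={"analyst", "data-scientist"},
    max_staleness=timedelta(hours=24),
    last_refreshed=now - timedelta(hours=2),
)
print(orders.is_fresh(now), orders.can_read("analyst"))
```

Keeping ownership, access rules, and the freshness target on the product itself is what distinguishes it from a data mart that depends on a central team for those answers.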
A self-serve data infrastructure is made up of numerous capabilities that members of the domains can easily use to create and manage their data products. The self-serve data platform is supported by an infrastructure engineering team, whose primary concern is the management and operation of the various technologies in use. This illustrates the separation of concerns: domains are concerned with data, and the self-serve data platform team is concerned with technology. The measure of success of the self-serve data platform is the autonomy of the domains.
Traditional data governance and access controls can be seen as an inhibitor to producing value through data. Data Mesh enables a different approach by embedding governance concerns into the workflow of the domains. There are numerous aspects to data governance; however, when considering Data Mesh, it is imperative that usage metrics and reporting become part of this definition. How data is shared and how it is used are key data points for understanding the value, and hence the success, of individual data products.
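To make the idea of usage metrics as part of governance concrete, here is a minimal sketch that turns a log of access events into a per-product usage report. The `usage_report` function, the event shape, and the sample events are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Each event is a (data product, consuming domain) pair.
AccessEvent = Tuple[str, str]


def usage_report(events: Iterable[AccessEvent]) -> Dict[str, Dict[str, int]]:
    """Count reads and distinct consuming domains for each data product."""
    reads = Counter()
    consumers = defaultdict(set)
    for product, consumer in events:
        reads[product] += 1
        consumers[product].add(consumer)
    return {
        product: {
            "reads": reads[product],
            "distinct_consumers": len(consumers[product]),
        }
        for product in reads
    }


# Sample events invented for illustration.
events = [
    ("sales.orders_daily", "marketing"),
    ("sales.orders_daily", "finance"),
    ("sales.orders_daily", "marketing"),
    ("logistics.shipments", "finance"),
]
report = usage_report(events)
for product, stats in report.items():
    print(product, stats)
```

A product that is read often by several domains is a reasonable signal of value; one that is never read may be a candidate for retirement.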
Implementing Data Mesh promotes agility for organizations that want to thrive in an uncertain economic climate. All organizations need to be able to respond to changes in their environment with a low-cost, high-reward approach. Introducing new data sources, complying with changing regulatory requirements, and meeting new analytics requirements are all drivers that will precipitate changes to an organization’s data management activities. Current data management approaches are typically based on complex, heavily integrated data pipelines (ETL, ELT) and data ingestion between operational and analytical data systems. In the face of these drivers, such pipelines struggle to change quickly enough to support business needs. The purpose of Data Mesh is to provide a more resilient approach to data, so that organizations can respond to these changes efficiently.
Data Mesh is a ‘socio-technical’ approach that requires changes to the organization across all three dimensions of people, process and technology. Organizations that adopt Data Mesh may spend 70% of their efforts on people and processes and 30% on the technology to enable the future Data Mesh state.
Embarking on a Data Mesh journey will result in significant organizational changes and adjustments to employees’ roles. Existing workers will be critical to the success of adopting a Data Mesh, as they have invaluable tacit knowledge to contribute to the journey. Therefore, the transition of data ownership from a central data team to decentralized, domain-driven ownership should also be approached as a realignment of existing data-focused employees. Management hierarchies and reward mechanisms will change as well.
To promote a sustainable and agile data architecture, implementing Data Mesh will require process changes within the organization. Data governance is one example: new processes for defining, implementing, and enforcing data policy will be required. These will affect how data is accessed and managed, as well as how that data is exploited as part of business-as-usual (BAU) processes.
Technology capabilities are a key enabler to implement and operate a Data Mesh. New technology is likely to be required for a number of reasons:
The truth is that Data Mesh may not be the correct fit for every organization. Data Mesh is primarily aimed at larger organizations that encounter uncertainty and change in their operations and environment. If your organization is small with respect to its data needs and those data needs don’t change over time, then Data Mesh is probably an unnecessary overhead.
The data lake is a technology approach whose main objective has traditionally been to act as a single repository into which data can be moved as simply as possible, with a central team responsible for managing it.
Data lakes do provide significant business value: they store data in raw, open file formats and reduce storage costs. They also suffer from a number of concerns, the primary one being that once data is moved to the lake, it loses context. For example, we may have many files containing a definition of customer: one from a logistics system, one from payments, and one from marketing. Which one is correct for real-time data analysis?
Furthermore, data in the data lake will not have been pre-processed, so data issues will inevitably arise. The data consumer will then typically have to liaise with the data lake team to understand and resolve those issues, which becomes a significant bottleneck to using the data to answer the initial business question.
In comparison, Data Mesh is more than just technology. It combines technology with organizational aspects, including data ownership, data quality, and autonomy. As a result, consumers of data have a clear line of sight on data quality and data ownership, and data issues can be discovered and resolved much more efficiently.
Ultimately data can be used and trusted.
The difference between a data fabric and a Data Mesh is that data fabric is a technological approach, whereas Data Mesh is about organization, people, and technology.
Data fabric concentrates on a collection of technological capabilities that work together to produce an interface for the end-users who consume data. Many supporters of data fabric advocate using technologies such as ML to automate many data management tasks, so that end users can access data more simply. For simple data usage there is some value in this. However, in more complex situations, or where business knowledge needs to be integrated into the data, the limitations of data fabric become apparent.
Arguably, a data fabric could be used as part of a Data Mesh self-serve platform, where the data fabric exposes data to the domains, which can then embed their business knowledge into a resulting data product.
As Daniel Abadi, Darnell-Kanal Professor of Computer Science at the University of Maryland, College Park, says, the difference between a data fabric and a Data Mesh is not obvious. He advises, “Ultimately, an optimal solution will likely take the best ideas from each of these approaches.”
Organizations that are ready to implement Data Mesh will need help connecting their data sources for a quick win with Starburst. Below we highlight how:
As you begin your Data Mesh journey, the first step is to connect to data sources. A key Data Mesh implementation principle is to connect your enterprise data by leveraging your existing investments: lakes or warehouses; cloud or on-premises; structured warehouse or unstructured lake. Unlike the single-source-of-truth approach, which centralizes all your data first, you leverage and query the data where it resides. This is the first Data Mesh win for many Starburst customers, as our 40+ connectors make it possible to connect to those data sources.
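The idea of querying data where it resides, rather than copying it into one repository first, can be sketched with a small stand-in. The example below does not use Starburst: it uses Python's built-in `sqlite3` module, with two separately attached in-memory databases playing the role of independent source systems. The schema names, tables, and rows are assumptions invented for this illustration.

```python
import sqlite3

# One connection acts as the single query interface.
conn = sqlite3.connect(":memory:")

# Two independent databases stand in for separate source systems.
conn.execute("ATTACH DATABASE ':memory:' AS logistics")
conn.execute("ATTACH DATABASE ':memory:' AS payments")

conn.execute("CREATE TABLE logistics.shipments (customer_id INTEGER, status TEXT)")
conn.execute("CREATE TABLE payments.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO logistics.shipments VALUES (?, ?)",
    [(1, "delivered"), (2, "in_transit")],
)
conn.executemany(
    "INSERT INTO payments.invoices VALUES (?, ?)",
    [(1, 120.0), (1, 30.0), (2, 75.5)],
)

# A single query joins across both sources without moving either table.
rows = conn.execute(
    """
    SELECT s.customer_id, s.status, SUM(i.amount) AS total_invoiced
    FROM logistics.shipments AS s
    JOIN payments.invoices AS i ON i.customer_id = s.customer_id
    GROUP BY s.customer_id, s.status
    ORDER BY s.customer_id
    """
).fetchall()

for row in rows:
    print(row)
```

In a real deployment the sources would be different systems reached through connectors; the point of the sketch is only that the join is expressed once, against data left in place.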
After generating connectivity across all the various data sets, the next goal is to create an interface for business and analytics teams to find their data. In data mesh terms, we call that a logical domain. It’s called logical, because we’re not moving data into a repository where data consumers can access it. Rather, we’re creating a logical place where they can log into a dashboard as a semantic layer, to see the data that’s been made available to them.
All the data you need resides in your domain, alongside domain teams that are empowered to work autonomously. In essence, we’re promoting the concept of self-service, where data consumers are empowered to do more on their own.
When you provide a domain team access to the data they need, the next step is to teach them how to convert domain data into data products. Then, with a data product, create a library or a catalog of data products that you can share.
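A catalog of data products can be pictured as a searchable list of entries, each published by its owning domain. The following is a minimal in-memory sketch; the `CatalogEntry` class, the `search` function, and the sample entries are hypothetical and do not represent any particular product's catalog.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record describing one published data product."""
    name: str
    domain: str
    description: str
    tags: List[str]


def search(catalog: List[CatalogEntry], keyword: str) -> List[str]:
    """Return names of products whose name, description, or tags match the keyword."""
    needle = keyword.lower()
    return [
        entry.name
        for entry in catalog
        if needle in entry.name.lower()
        or needle in entry.description.lower()
        or needle in (tag.lower() for tag in entry.tags)
    ]


# Sample entries invented for illustration.
catalog = [
    CatalogEntry("sales.orders_daily", "sales",
                 "Daily order totals per customer", ["orders", "revenue"]),
    CatalogEntry("marketing.campaign_reach", "marketing",
                 "Audience reached per campaign", ["campaigns"]),
    CatalogEntry("logistics.shipments", "logistics",
                 "Shipment status per customer order", ["orders", "delivery"]),
]

print(search(catalog, "customer"))
print(search(catalog, "campaign"))
```

Because each entry records its owning domain, a consumer who finds a product through search also knows whom to ask about it.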
Starburst has a built-in data catalog that enables you to very quickly search, discover, and identify data products that might be of interest and improve the lives of data scientists and data engineers.
Creating data products is a powerful capability as you’ve enabled your data consumers to very quickly move from discovery to ideation as well as to insight, because we’re quickly creating and then using data products across the organization.
Those who are eager to get started, or are just getting started, on their Data Mesh journey toward democratization and scalability will find the 90-Day Data Mesh Pathfinder helpful. In fact, many enlist a Pathfinder to help them with this ambitious endeavor. With the right strategy, it is not labor-intensive: it is a low-cost, low-risk, high-reward exercise.
A Pathfinder is an exercise in working out how Data Mesh will fit into your organization from a technology, people, and process perspective. You’ll also identify your strengths and weaknesses so that, when you’re ready to begin your Data Mesh transformation program, we can curate the learnings from the Pathfinder to accelerate in the areas where you can move quickly and slow down in the areas where you need remedial work.
Data Mesh TV is a monthly educational program for data leaders by data leaders about data monetization, optimizing data products, and accelerating digital transformation initiatives with Data Mesh.
Listen on the go: Apple Podcast | Spotify
“Data Mesh is certainly the future for our business, and probably for many others, particularly ones which have a legacy of acquisitions, and the need for merging of different data sets to form a new larger entity. Having the ability to query data where it resides using Starburst is enormously powerful and makes a huge impact on the ability for data to provide answers.”
“Decentralized access is definitely the future…We are currently in the process of creating data products, which Starburst is really helping with. Previously, without a single point of secure data access, creating a data product was not possible. With the abstraction layer that Starburst provides across different data sources, it has become our analytics engine for the Data Mesh.”
Data Mesh provides a strategic framework for the people, process, and technological aspects of that journey towards becoming data-driven.