Despite the investments and effort poured into next-generation data storage systems, data warehouses and data lakes have failed to provide data engineers, data analysts, and data leaders trustworthy and agile business insights to make intelligent business decisions. The answer is Data Mesh – a decentralized, distributed approach to enterprise data management.
Zhamak Dehghani, the founder of Data Mesh, defines it as “a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments – within or across organizations.” She’s authoring an O’Reilly book, Data Mesh: Delivering Data-Driven Value at Scale, and Starburst, the ‘Analytics Engine for Data Mesh,’ happens to be the sole sponsor. In addition to providing a complimentary copy of the book, we’re also sharing chapter summaries so we can read along and educate our readers about this (r)evolutionary paradigm. Enjoy Chapter Five: Principle of Data as a Product!
We’ve finally arrived at one of the most crucial principles of Data Mesh: data as a product, where organizations apply product thinking to domain-oriented data. Why is this significant? Put simply, when this principle is executed well, the business is poised to unlock an enormous amount of value from its data.
To recap how we got here: the first generation of data platforms were data warehouses, and ownership resided with the warehouse team, which limited access, usability and value for the analysts actually using data to create organizational value. Next came data lakes, which moved ownership to the data users, resulting in 45% of the data scientist’s time being devoted to data cleansing and organization. Now, Data Mesh shifts the responsibility as close to the source of the data as possible. This approach eliminates friction in access and usability, and it improves the overall experience of data users: data scientists, data analysts, data explorers and everyone in between. Frictionless access to data, together with the agility to respond to external and internal organizational changes, has a significant impact on the business’s bottom line through faster time-to-insight. The underlying idea isn’t unique to Data Mesh; over the last decade we’ve seen an industry-wide shift toward the view that addressing problems is cheaper and more effective when it’s done as close to the source as possible. Zhamak reminds us, “Data is not what gets shared on a mesh, it is only a data product that can be worthy of sharing on the mesh.”
What Successful Data Products Should Embody
Successful products should have these three common characteristics: feasibility, value, and usability. This chapter primarily focuses on the usability and value of a data product.
For a data product to be usable, there are baseline data usability characteristics that every data product must exhibit. It must be discoverable, understandable, addressable, secure, interoperable/composable, trustworthy, natively accessible, and valuable on its own. We highlight a few standouts below; you can read the rest in the pre-release copy of the book.
Discoverable
Traditionally, discoverability of centralized data takes the form of a catalog that lists available datasets, owners, locations, sample data, etc. In contrast, Data Mesh embraces a source-oriented solution, where discovery information is intentionally provided by the data product itself. This information may include “source of origin, owners, run-time information such as timeliness, quality metrics, sample datasets, and most importantly, information contributed by their consumers such as the top use cases and applications enabled by their data.” With this information, data consumers can easily explore the available data products, search for and find the datasets they need, and gain the confidence to cultivate a data-driven mindset.
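To make the idea of a self-describing data product more concrete, here is a minimal sketch in Python. It is not taken from the book: the `DataProductDescriptor` class, its field names, the `search` helper, and the sample values are all hypothetical, chosen only to mirror the kinds of information quoted above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class DataProductDescriptor:
    """Hypothetical self-description a data product publishes for discovery."""
    name: str
    source_of_origin: str
    owners: list[str]
    timeliness: timedelta                 # run-time information
    quality_metrics: dict[str, float]
    sample_rows: list[dict]
    # Contributed by consumers rather than by the owning team.
    top_use_cases: list[str] = field(default_factory=list)


def search(products, keyword):
    """Return descriptors whose name or use cases mention the keyword."""
    kw = keyword.lower()
    return [
        p for p in products
        if kw in p.name.lower() or any(kw in u.lower() for u in p.top_use_cases)
    ]


# Illustrative values only.
orders = DataProductDescriptor(
    name="orders.daily_summary",
    source_of_origin="order-management service",
    owners=["orders-domain-team"],
    timeliness=timedelta(hours=1),
    quality_metrics={"completeness": 0.98},
    sample_rows=[{"order_date": "2021-06-01", "order_count": 1200}],
    top_use_cases=["revenue forecasting"],
)
matches = search([orders], "forecast")
```

In this sketch the search runs over information the products supply themselves, so a consumer can find `orders.daily_summary` through a use case that another consumer contributed.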
Understandable
After the data consumer discovers a data product, the next step is to understand it: to get to know the semantics of the data, as well as the syntax in which the datasets are presented to the data user. Data consumers also need to understand how exactly the data is presented to them (e.g., data serialization, which SQL queries to execute) as well as the data schema (the underlying representation of the data). Clear data schemas let consumers understand the data product in a self-serve manner. Ultimately, understanding a data product and creating value from it should no longer require end-user “hand holding,” which is a baseline data usability quality.
Trustworthy and Truthful
While understandability and discoverability close the gap between what the data consumer knows and doesn’t know about the data, trusting the data requires far more. It’s crucial that the data represents the business accurately in terms of the events or transactions that have occurred, along with the probability of truthfulness of the aggregations and projections that the business has created. To reduce uncertainty surrounding the data, a service level agreement helps. These agreements may include details around interval of change (how often changes in the data are reflected), timeliness (the time between when a business fact occurs and when it is served to data users), completeness (availability of necessary information), statistical shape of data (distribution, range, volume), lineage (data journey from source to now), precision and accuracy over time (degree of business truthfulness as time passes), and operational qualities (freshness, general availability, performance).
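As a rough illustration of how two of these guarantees, timeliness and completeness, could be checked automatically, consider the sketch below. The `ServiceLevelObjectives` class, the `check_slo` function, the thresholds, and the sample rows are assumptions made for this example; the book does not prescribe this structure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ServiceLevelObjectives:
    """Hypothetical thresholds a data product might publish."""
    max_timeliness: timedelta   # business fact occurring -> served to users
    min_completeness: float     # share of rows with all required fields set


def completeness(rows, required_fields):
    """Fraction of rows in which every required field is present and not None."""
    if not rows:
        return 0.0
    complete = sum(
        all(row.get(f) is not None for f in required_fields) for row in rows
    )
    return complete / len(rows)


def check_slo(rows, required_fields, occurred_at, served_at, slo):
    """Return the names of the violated guarantees (empty list if none)."""
    violations = []
    if served_at - occurred_at > slo.max_timeliness:
        violations.append("timeliness")
    if completeness(rows, required_fields) < slo.min_completeness:
        violations.append("completeness")
    return violations


# Illustrative data: the second row is missing its amount.
rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
]
slo = ServiceLevelObjectives(
    max_timeliness=timedelta(hours=1), min_completeness=0.9
)
violations = check_slo(
    rows,
    required_fields=["order_id", "amount"],
    occurred_at=datetime(2021, 6, 1, 12, 0),
    served_at=datetime(2021, 6, 1, 12, 30),
    slo=slo,
)
```

Here the data is served within the one-hour window, but only half of the rows are complete, so `violations` contains just `"completeness"`. Publishing results like these alongside the data product is one way to let consumers judge trustworthiness for themselves.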
Natively Accessible
The usability of a data product often hinges on how easily data consumers can access it with their native tools. As a nod to empathetic design, provide the same data to data analysts and data scientists, but in a way that aligns with their skill sets and tools. For instance, some data analysts are only comfortable using SQL to generate data visualizations or reports. Meanwhile, data scientists who curate and structure the data to train their models typically expect file-based data, whilst analytical application developers might expect a real-time stream of events.
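The sketch below shows one way the same rows could be exposed in three access modes using only the Python standard library: a SQL table, file-style CSV text, and an iterator standing in for an event stream. The dataset, the table name `daily_orders`, and the function names are hypothetical, and an in-memory SQLite database and a generator are simplifications of a real query engine and a real streaming system.

```python
import csv
import io
import sqlite3

# Illustrative dataset shared by all three access modes.
COLUMNS = ("order_date", "order_count")
ROWS = [("2021-06-01", 1200), ("2021-06-02", 1350)]


def as_sql_connection(rows):
    """SQL access for analysts: load the rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_orders (order_date TEXT, order_count INTEGER)"
    )
    conn.executemany("INSERT INTO daily_orders VALUES (?, ?)", rows)
    return conn


def as_csv_text(rows):
    """File-based access for data scientists: serialize the rows as CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


def as_event_stream(rows):
    """Event-style access for application developers: yield one dict per row."""
    for row in rows:
        yield dict(zip(COLUMNS, row))


total = as_sql_connection(ROWS).execute(
    "SELECT SUM(order_count) FROM daily_orders"
).fetchone()[0]
csv_text = as_csv_text(ROWS)
first_event = next(as_event_stream(ROWS))
```

The point of the design is that each consumer reads the same underlying rows through the interface that suits their tools, rather than each group maintaining its own copy.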
Valuable on its own
Data products must be inherently valuable for the data users. If the data product owner can’t identify any value in a data product, it’s best not to create it. A data product should carry a dataset that is independently valuable and meaningful.
Creating Data Products: Empathy by Design
Interestingly, many Data Mesh early adopters say that the principles are rather intuitive and that they sincerely intend to implement them, yet their implementations remain strongly influenced by the familiar techniques of the past.
For a fresh start, begin by applying empathy to product thinking: collaborate with internal stakeholders on designing the experience, collect consumption metrics, and build valuable internal tools for developers as well as the business. Beyond that point, traditional product thinking ends and ‘data as a product’ takes on a life of its own.
In this section, Zhamak introduces a few considerations for realizing data as a product within Data Mesh. Again, we highlight a few vital points; feel free to examine the rest in the pre-release copy of the book.
Introducing the Domain Data Product Owner Role
The data product owner is responsible for the success of their domain’s data products by delivering value. How? They create a vision and roadmap for the data products. Then, they synthesize how datasets integrate with other datasets and transform them into decisive business insights and action. They also demonstrate always-on qualities by measuring and improving the quality and richness of their domain’s data. The lifecycle of the domain’s datasets is also under the purview of a data product owner: when to change, revise and retire data and schemas.
Data product owners must also define success criteria and business-aligned key performance indicators for their data products. Are data consumers happy and satisfied? Satisfaction can be measured through net promoter score (i.e., the rate of consumers who recommend the data products), shortened lead time for data product consumers to discover and use the data product successfully, and the growth of users.
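For readers who want to see how the first of these indicators is typically computed, here is a small sketch. It uses the standard net promoter score calculation, in which consumers rate on a 0 to 10 scale how likely they are to recommend the product, and the score is the percentage of promoters (9 or 10) minus the percentage of detractors (0 through 6). The function name and the survey ratings are made up for illustration.

```python
def net_promoter_score(ratings):
    """Standard NPS: percent promoters (9-10) minus percent detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Illustrative survey of eight data product consumers.
ratings = [10, 9, 9, 8, 7, 6, 3, 10]
nps = net_promoter_score(ratings)  # 4 promoters, 2 detractors, 8 responses
```

With these ratings the score is 25.0. Tracked over time next to lead time and user growth, it gives the data product owner a simple trend to watch.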
Introducing the Domain Data Product Developer Role
Each domain should include data product developer roles. They’re responsible for actually building, maintaining and serving one or multiple data products throughout their lifecycle. Data product developers “work closely with their collaborating application developers in defining the domain data semantic, mapping data from application context (data on the inside) to the data product context (data on the outside). It’s also possible to form new teams to serve data products that don’t naturally fit into an existing operational domain.”
Think of Data as a Product, Not Merely an Asset
For decades, we’ve referred to “data as an asset” or said that “data must be managed like an asset.” At first glance, these are harmless metaphors; however, they have shaped our perceptions and turned our focus towards metrics that are inconsequential to the business, for example, “the number of datasets and tables captured in the lake or warehouse, or the volume of data.” “Data as an asset” encouraged the storage of data rather than the act of sharing it.
When we change the catchphrase to “data as a product,” we measure our success through data usage, the number of data consumers, and their satisfaction with the data. This emphasizes sharing data over storing it, and providing the quality data product that the business deserves.
A Platform Shift to Support Data Mesh
Data as a product creates a new world view where data can be trusted, built, and served with deep empathy for data consumers. Its success is measured through the value delivered to the data consumers, which ultimately translates to the business’s bottom line. This ambitious shift requires an underlying supporting platform, so the next chapter looks at the self-serve data platform, which makes building data products feasible.
Read along with us!
Get your complimentary access now to pre-release chapters from the O’Reilly book, Data Mesh: Delivering Data-Driven Value at Scale, authored by Zhamak Dehghani.