
The data product lifecycle: getting the most out of your data investments

By: Sanjeev Mohan, Sajid Khan
March 2, 2023

Lately, many clients have learned about data products by way of data mesh, a concept coined by Zhamak Dehghani as a way for organizations to derive the most value from their data. While data as a product is a key pillar of data mesh, the two are not necessarily linked. One can’t have a data mesh without data products, but the reverse is not true: you can build data products without a data mesh. Sure, data mesh as a broader construct has a lot of value at an organizational level. However, Zhamak posits that the adoption of data products will surpass the adoption of data mesh.

That seems to be the trend, as there’s a lot of interest among our clients in the construct of a data product. It’s appealing to organizations because the concept is geared towards reaping faster business value from their data investments. One of the biggest promises of data products for our clients is that they are self-governed. They are also robust, because they empower the business to adapt to change in a more sophisticated way.

Business users have struggled to derive business value from data as they invest in building data assets within their IT ecosystems. And the promise of the data product is that it will liberate data, as users create a data product that can realize business value in a short period of time and in a self-serving manner. In fact, we’ve moved beyond collecting as much data as possible, or collecting the right data, and instead focused on asking the right questions about the data. 

Let’s examine what a data product is and see what a data product lifecycle looks like.

Data products turn data into a consumable output

Prior to data products, if the business needed a report and a user had access to data, the user would create temporary tables, then a report, and move on to the next project. The problem with this approach is that no one else in the business knows how to discover the report or its underlying tables. Even if users do discover them, they won’t know whether they’re current. Not to mention, the temporary data assets are never retired, so they linger and keep incurring costs and other overhead.

Instead, we want the ability to version data products, share them with others, annotate them, and apply product management discipline to them.

What is a data product?

A data product is simply an amalgamation of three things: deep knowledge of the data, combined with domain knowledge, overlaid with product management discipline. You can also combine existing data products to create new ones.

For those who need a simpler definition: a data product is a consumable output meant to solve a business problem. 

What is the data product life cycle? Ownership, tips, and definitions.

As there are stages in a data lifecycle, there are also stages in a data product lifecycle and it starts with the data product manager aligning data products with business goals and outcomes. 

The Data Product Manager Role 

If you look on LinkedIn, there are already thousands of data product managers and data product owners. Why is there a new role in the business, named data product manager or data product owner? There are several reasons why the Data Product Manager role is crucial. 

Let’s say a data consumer has two reports from different systems, but they don’t match. It appears that there’s a data quality problem, but who is responsible and who will fix it? Maybe the data source for one report was Oracle and for the other, SAP. Then there’s data from Salesforce, and when you combine it with HubSpot, you have different admins for different platforms. There is no clear notion of who the right data owner is.

#1 Start with business outcomes and ROI

The Data Product Manager owns and is responsible for the data, builds data products based on requirements gathered from data consumers, calculates the ROI, prioritizes consumer requests, and aligns resources.

#2 Align resources to attend to the data product lifecycle

A team is critical to attend to the entire data product lifecycle: build, maintain, operate, retire. 

This team will have a number of people involved. Today, when we build things, a team builds the deliverable and then moves on to the next project. With data products, the team can’t simply move on, because the data product has a lifecycle that needs ongoing attention. The Data Product Manager and team also decide when a data product is no longer useful and should be retired.
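To make the lifecycle concrete, here is a minimal Python sketch of the four stages as a small state machine. The stage names come from the list above; the allowed transitions (for example, moving back and forth between operate and maintain) are our own assumption for illustration, not a prescribed model.

```python
from enum import Enum


class Stage(Enum):
    BUILD = "build"
    MAINTAIN = "maintain"
    OPERATE = "operate"
    RETIRE = "retire"


# Assumed transitions, for illustration only: a product is built, then
# alternates between operating and maintenance until it is retired.
ALLOWED = {
    Stage.BUILD: {Stage.OPERATE},
    Stage.OPERATE: {Stage.MAINTAIN, Stage.RETIRE},
    Stage.MAINTAIN: {Stage.OPERATE, Stage.RETIRE},
    Stage.RETIRE: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move a data product to a new stage, rejecting transitions we do not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = advance(Stage.BUILD, Stage.OPERATE)
stage = advance(stage, Stage.RETIRE)
```

Retirement is modeled as a terminal stage here, which mirrors the point that someone has to make an explicit decision to retire a product rather than letting it linger.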

#3 Define data contract: attributes, schema, KPIs, SLAs

Deciding what data product to build brings you to the concept of data contracts. A data contract is what data consumers request from the data producer. A common definition of a data contract is a document that captures your schemas, SLAs, and KPIs.

Moreover, for a data product to be useful, there are baseline qualities that every data product must have. It must be: discoverable, understandable, addressable, secure, interoperable, trustworthy, natively accessible, and valuable on its own. All eight of these qualities need to be in the data contract. 
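As a rough illustration, a data contract could be captured as a small Python structure like the one below. The class, field names, and example values (`DataContract`, `customer_360`, the 24h freshness target) are hypothetical; only the ingredients (schema, SLAs, KPIs, and the eight baseline qualities) come from the discussion above.

```python
from dataclasses import dataclass, field

# The eight baseline qualities a data product must have.
BASELINE_QUALITIES = (
    "discoverable", "understandable", "addressable", "secure",
    "interoperable", "trustworthy", "natively accessible", "valuable on its own",
)


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract shape; field names are illustrative assumptions."""
    product_name: str
    version: str
    schema: dict   # attribute name -> type name
    slas: dict     # e.g. freshness or availability targets
    kpis: list     # metrics the consumer cares about
    qualities: dict = field(default_factory=dict)  # quality -> how it is met

    def missing_qualities(self) -> list:
        """Return the baseline qualities the contract does not yet address."""
        return [q for q in BASELINE_QUALITIES if not self.qualities.get(q)]


contract = DataContract(
    product_name="customer_360",
    version="1.0.0",
    schema={"customer_id": "string", "lifetime_value": "decimal"},
    slas={"freshness": "24h"},
    kpis=["monthly_active_consumers"],
    qualities={"discoverable": "listed in the data product catalog"},
)
```

Calling `contract.missing_qualities()` on this example lists the seven qualities the draft contract has not yet covered, which gives the Data Product Manager a simple checklist before the contract is signed off.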

The domain-based Data Steward

In a hybrid approach, organizations will still have some data that require centralized domain ownership and data product definition, but you’ll also have data that will be decentralized with the domain’s business experts or domain-based data stewards. 

#4 A metadata plane for data products

A platform layer, metadata plane, or data product catalog tracks the entire lifecycle of data products and their usage: who is using a product, what its version number is, how fresh its data is, and how highly it is rated. All of this metadata about data products is added to the metadata plane.
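The kind of record a metadata plane might keep for each data product can be sketched in a few lines of Python. This is an assumption-laden illustration: `CatalogEntry`, its fields, and the 1-to-5 rating scale are hypothetical names and choices, not the API of any particular catalog.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one data product."""
    name: str
    version: str
    last_refreshed: datetime
    consumers: set = field(default_factory=set)   # who is using it
    ratings: list = field(default_factory=list)   # consumer ratings, 1 to 5

    def record_usage(self, user: str) -> None:
        self.consumers.add(user)

    def rate(self, stars: int) -> None:
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings.append(stars)

    def average_rating(self) -> Optional[float]:
        return sum(self.ratings) / len(self.ratings) if self.ratings else None

    def age(self, now: datetime) -> timedelta:
        """How stale the data is relative to `now`."""
        return now - self.last_refreshed


now = datetime(2023, 3, 2, 12, 0, tzinfo=timezone.utc)
entry = CatalogEntry("customer_360", "1.0.0", last_refreshed=now - timedelta(hours=6))
entry.record_usage("analyst_a")
entry.record_usage("analyst_b")
entry.rate(4)
entry.rate(5)
```

With this record in place, the four questions above map directly to `entry.consumers`, `entry.version`, `entry.age(now)`, and `entry.average_rating()`.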

#5 Build a reusable framework

Building a reusable framework is a way of defining a standard. You’ll be addressing storage and compute: data lake, lakehouse, or data warehouse? What will be the analytical engine on top of that? Is the framework scalable? The idea of building a framework is very much like manufacturing and/or an assembly line, such that you can produce something that’s reusable. 

Data Engineer: business first, technology second

After the standards have been defined and created, you can integrate technology and involve engineering. You will notice how late the technology step shows up in this new data product lifecycle paradigm. This is the cultural shift we need to adopt.

#6 Build and test for the data contract

Because we’ve defined what the data consumer needs in the data contract, the contract becomes a focal point to build and test the data product. The data contract has checks, agreements, and conditions that allow us to meet the consumer’s needs.
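One way to picture testing against the contract is a check that runs over a batch of records and reports every violation. The sketch below is a simplified assumption: the contract is a plain dictionary with a schema and a freshness limit, and `check_contract` is a hypothetical helper rather than a function from any real tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract: expected attribute types plus a freshness SLA.
CONTRACT = {
    "schema": {"customer_id": str, "lifetime_value": float},
    "max_staleness": timedelta(hours=24),
}


def check_contract(rows, last_refreshed, now, contract=CONTRACT):
    """Return a list of human-readable contract violations (empty means pass)."""
    violations = []
    for i, row in enumerate(rows):
        for column, expected_type in contract["schema"].items():
            if column not in row:
                violations.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                violations.append(
                    f"row {i}: '{column}' is not {expected_type.__name__}"
                )
    if now - last_refreshed > contract["max_staleness"]:
        violations.append("freshness SLA exceeded")
    return violations


now = datetime(2023, 3, 2, tzinfo=timezone.utc)
rows = [
    {"customer_id": "c1", "lifetime_value": 120.5},
    {"customer_id": "c2"},  # missing lifetime_value
]
problems = check_contract(rows, last_refreshed=now - timedelta(hours=30), now=now)
```

Here `problems` contains one schema violation for the second row and one freshness violation, so the build would fail until both are fixed. In practice, a check like this would run as part of continuous testing.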

#7 Deploy with DataOps

Moreover, we need to hold DataOps accountable. That means automation everywhere, along with orchestration, observability, continuous testing, and version control; all of that goes into DataOps. Then, you’re ready to launch the data product.

#8 New Product Creation

At this point, the data product is created, but keep in mind, it also has a life of its own. Data products will change, so you’ll need a way to create new data products or modify existing data products. When that request comes in, you’ll need a way to update the data contracts, so a loop forms. 

The loop returns to the Data Product Manager

In this example, we have an arrow going back to data contracts. You’d create a new version of the data contract and store it in your data product catalog. The data product catalog is essentially a marketplace where users discover data products. And from that data product catalog, you’ll continue on your journey to create a new version of a data product.
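The loop can be sketched as a catalog that stores every version of a contract and hands back the next version number on each change request. Everything here is an illustrative assumption: `DataProductCatalog` is a hypothetical in-memory class, and the major/minor numbering rule (breaking changes bump the major version) is just one common convention.

```python
class DataProductCatalog:
    """Hypothetical in-memory catalog that keeps every contract version."""

    def __init__(self):
        self._versions = {}  # product name -> list of (version, contract)

    def publish(self, name, contract, breaking=False):
        """Store a new contract version and return its (major, minor) number."""
        history = self._versions.setdefault(name, [])
        if not history:
            version = (1, 0)
        else:
            major, minor = history[-1][0]
            version = (major + 1, 0) if breaking else (major, minor + 1)
        history.append((version, dict(contract)))  # copy so later edits don't leak in
        return version

    def latest(self, name):
        """Return the newest (version, contract) pair for a product."""
        return self._versions[name][-1]

    def history(self, name):
        """Return all published version numbers, oldest first."""
        return [version for version, _ in self._versions[name]]


catalog = DataProductCatalog()
v1 = catalog.publish("customer_360", {"schema": {"customer_id": "string"}})
v2 = catalog.publish(
    "customer_360",
    {"schema": {"customer_id": "string", "segment": "string"}},
)
v3 = catalog.publish(
    "customer_360",
    {"schema": {"customer_key": "string", "segment": "string"}},
    breaking=True,  # renaming a column breaks existing consumers
)
```

Keeping older versions in the catalog lets consumers see what changed and when, and lets the Data Product Manager decide when an old version can be retired.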

After reviewing this cycle, it’s clear that certain industries are better positioned to extract value from data products, primarily industries with heavy regulatory and governance requirements, like financial services. There are a few reasons why.

Why Financial Services is eager to get started with data products

When industries such as financial services are looking at data to answer regulatory questions, they have metrics that they need to report out.

Financial services organizations, especially banks and capital markets firms, have been struggling with very complex business questions surrounding fraud, risk management, anti-money laundering (AML), and Know Your Customer (KYC). All of these require organizational data in a form that is connected across the various systems and silos that exist in the organization. This is where speed matters.

That’s why data products are appealing to organizations: they provide a framework for bringing multiple data domains or data assets under one umbrella to answer business questions in a relatively short time.

And because building and optimizing data products takes a village, below we outline the Starburst and Deloitte partnership. 

The Starburst and Deloitte partnership: helping organizations derive value from their data investments with data products

Starburst and Deloitte share a similar mindset and approach in how our clients drive value from their data investments. In fact, we began our journey together by educating our clients on the value of data mesh and data products.

Second, in some of the industries where Deloitte is seeing greater demand for data products, such as Healthcare and Financial Services, we are partnering with Starburst to meet that demand together.

And lastly, Deloitte is leveraging Starburst to build its own data products: we are creating data and analytical products, and Starburst is the platform we use to bring them to market.

Sanjeev Mohan

Principal, SanjMo

As an established thought leader in the areas of cloud, big data and analytics, I research and advise on changing trends and technologies in the modern cloud data architectures. I started my data and analytics journey at Oracle where I worked on emerging technologies and built cutting-edge solutions. Until recently, I was a Gartner research vice president known for my prolific work and attention to detail. I am privileged to regularly present on topics pertaining to end-to-end data pipelines and am excited to help businesses discover what their data can do for them.

Sajid Khan

Principal, Deloitte

Sajid Khan is a Cloud AI and Analytics Leader delivering large scale digital transformation strategy with a focus on cross functional business value realization through data, analytics, and AI on Cloud. Practice builder for Cloud Data, Analytics and AI at Deloitte, focusing on demand generation, maturing partnership with Hyperscalers through joint GTM campaigns, developing account based strategies, building talent and ecosystems. Sajid has an MBA from The University of Chicago Booth School of Business and is a Google Cloud Certified Professional Cloud Architect.
