Starburst and Data Products: The Key to a Data Mesh

April 28, 2022

Colleen Tartow, Ph.D.

Engineering

Starburst

The key to success for any company is deriving business value from data in a robust, scalable, and timely fashion. A large part of a successful data strategy is treating data as a product, as evidenced by the interest and buzz around Data Mesh. Starburst’s new Data Products user experience is the ideal platform for creating and consuming data products when implementing a Data Mesh.

The Centralization Bottleneck

It’s 2022 and everybody knows that data is a key differentiating factor for successful businesses. But that said, there is often a dichotomy around how data is valued at companies: On the one hand, data should be valued at all levels, and that’s often articulated as the quest to be “data-driven.” On the other hand, it can be hard to get buy-in, and companies are often resistant to actually investing in data initiatives that they’re not confident will ultimately work. This contradiction often manifests itself as an effort to funnel all data to a central data team that is tasked with collecting, understanding, and curating data from all corners of the organization. This is, frankly, a herculean effort that only gets more complex at scale.

Centralized data efforts can be a challenge in several ways. First, building pipelines to move data around can be surprisingly complex. Second, no one can be an expert in all of the data across a modern company; it’s simply not feasible. Tasking a single team with owning expertise in data logic across business units is often the central challenge and can lead to significant bottlenecks. Lastly, combining data from disparate sources and in multiple formats in a common repository is difficult when there’s little to no forethought or cooperation from the data producers, who all have their own day jobs with their own milestones and goals to hit.

It helps to remember that the ultimate goal is to use data to make strategic decisions. Therefore, it is necessary to rethink how we have traditionally done things. The benefits of data-driven decision-making are clear, starting with increased transparency, consistency, and accountability for strategic decisions. Building a strategy with data can increase customer retention, help you enter new markets, and reduce overall costs.

Data Mesh: Embracing Decentralization

Data Mesh seeks to solve this bottleneck by embracing a decentralized data management paradigm, both from architectural and organizational aspects. Data Mesh has four pillars: domain-oriented ownership and architecture, data as a product, self-service data infrastructure, and federated computational governance.

The domains are by definition the subject matter experts who create the data and understand best how to curate it – treating data as a product. They do this work via a self-serve data platform built by a central IT organization and using a federated computational governance model.

The focus in a Data Mesh is on letting the subject matter experts who know the data best, the domains that produce it, create the data products themselves. There’s no more central data engineering team creating a bottleneck for all data curation because curation happens within the domains instead. Therefore, you need to adjust responsibilities and product ownership to reflect the fact that data is a full-fledged, first-class product that these teams are accountable for. In the end, this will serve to remove the bottlenecks and improve the flow of data from the source to the consumer.

Treating Data as a Product

Domain-driven design and data as a product go hand in hand because both espouse the idea of keeping data as the responsibility of the teams that produce the data and control its source. Data is a first-class citizen and a product of the domain. The teams then need to be aware of and aligned with the downstream consumers of their data and develop data products that ultimately drive business decisions and value. In a Data Mesh, each domain is responsible for ingesting, processing, and serving its data products to downstream consumers. What this means organizationally is that data engineering and software engineering become tightly aligned, and ideally part of the same functional team, so they can work with a data product owner to produce high-quality, curated data products.

Data products are the heart of the Data Mesh. A data product can range from a simple, cleansed list of transactions to a highly curated and complex group of datasets. In practice, data products frequently sit toward the complex end of that range, and can even be used to produce other data products within the same or different domains. For example, user profile information can be combined with product information to drive marketing efforts, which are in turn used to create a customer value data product.
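One way to picture this chaining is as a dependency graph of data products. The sketch below is illustrative only: the product names are made up from the example above, and it uses Python's standard graphlib module rather than anything specific to Starburst.

```python
from graphlib import TopologicalSorter

# Hypothetical data products, each mapped to the products it is built from.
depends_on = {
    "user_profiles": [],
    "product_information": [],
    "marketing_efforts": ["user_profiles", "product_information"],
    "customer_value": ["marketing_efforts"],
}

# A valid build order lists every product after the products it depends on.
build_order = list(TopologicalSorter(depends_on).static_order())
print(build_order)
```

Thinking of data products this way makes it easy to see which upstream domains a downstream product relies on.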

Starburst Data Products

Starburst recently introduced Data Products, our one-stop shop for creating, maintaining, and using data products. From a Starburst perspective, data products are the result of analytical queries that connect to multiple data sources, then aggregate and analyze that data to create uniquely valuable datasets. This approach has tremendous advantages:

In a world where you have thousands of data products, it is going to be much easier and cheaper to operate data products as queries.
You aren’t required to store the data independently.
You don’t need a team of engineers or architects to manage release cycles or data duplication across each product.

These are huge advantages and allow you to get up and running with Data Products in a Starburst environment quickly and without a huge infrastructure buy-in.
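To make the "data product as a query" idea concrete, here is a minimal sketch under some loud assumptions: two attached in-memory SQLite databases stand in for separate source systems, and the table names, the CUSTOMER_SPEND_SQL query, and the read_customer_spend helper are all hypothetical. It is not Starburst's implementation, only an illustration of a product that is defined by a query instead of a stored copy.

```python
import sqlite3

# Two attached in-memory SQLite databases stand in for separate source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS shop")
conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE shop.orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO shop.orders VALUES (?, ?)", [(1, 40.0), (1, 60.0), (2, 25.0)])

# Hypothetical data product: just a query, with no copy of the data stored.
CUSTOMER_SPEND_SQL = """
SELECT c.id, c.name, SUM(o.amount) AS total_spend
FROM crm.customers AS c
JOIN shop.orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.id
"""

def read_customer_spend(connection):
    """Run the data product's query against the live sources."""
    return connection.execute(CUSTOMER_SPEND_SQL).fetchall()

before = read_customer_spend(conn)
conn.execute("INSERT INTO shop.orders VALUES (2, 75.0)")  # a source changes
after = read_customer_spend(conn)  # the product reflects it with no reload step
print(before)
print(after)
```

Because the product is only a query, there is nothing to duplicate or re-release when a source changes; the next read simply picks up the new row.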

Our unique user experience allows data producers and data engineers to both curate data and define the relevant metadata to provide all of the context an end-user needs to use this data product. Additional important metadata such as usage metrics, bookmarks, commentary, sample queries, and more are automatically available as part of the data product.

What we provide with Starburst is the ability to both curate data as a product and consume those data products within a single technology. This is of course incredibly powerful, but we wanted a visual user experience to be part of this workflow as well, so data producers and consumers have a common interface where they can create and discover data products.
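As a rough mental model of a data product that carries its own context, the sketch below pairs a defining query with descriptive metadata and adds a simple keyword search for discovery. Everything here is an assumption made for illustration: the DataProduct fields, the discover function, and the sample entries are hypothetical and do not reflect Starburst's actual schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Hypothetical record pairing a defining query with descriptive metadata."""
    name: str
    domain: str
    owner: str
    description: str
    query: str
    sample_queries: list[str] = field(default_factory=list)

def discover(catalog, keyword):
    """Return the products whose name or description mentions the keyword."""
    needle = keyword.lower()
    return [
        p for p in catalog
        if needle in p.name.lower() or needle in p.description.lower()
    ]

# Made-up catalog entries, for illustration only.
catalog = [
    DataProduct(
        name="customer_value",
        domain="marketing",
        owner="marketing-data-team",
        description="Estimated value of each customer, for campaign targeting.",
        query="SELECT customer_id, total_spend FROM customer_spend",
        sample_queries=["SELECT * FROM customer_value LIMIT 10"],
    ),
    DataProduct(
        name="cleansed_transactions",
        domain="finance",
        owner="finance-data-team",
        description="Deduplicated list of transactions.",
        query="SELECT DISTINCT * FROM raw_transactions",
    ),
]

matches = discover(catalog, "customer")
print([p.name for p in matches])
```

Keeping the query and its metadata in one record is what lets a producer publish a product and a consumer find and understand it through the same interface.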

The beauty of using the Starburst Enterprise engine here is that this source data can be anywhere and we can easily connect to it, regardless of location — on-prem, in the cloud, or both. Particularly with Starburst Galaxy, we provide a fast and straightforward way to query data across multiple data sources without having to build complex pipelines. This forms the backbone of a self-service data infrastructure.

Starburst’s Data Products interface is targeted at the interactions between data producers and consumers, so they can use their common language, SQL, to define and consume data products. Then a larger fraction of their time can be spent on what is most important – creating and refining business insights and strategy based on reliable data. When data products are designed by people who understand the problems and opportunities that the business is trying to solve, magic happens. ✨