I had an opportunity to meet with Vincent in Singapore earlier this year to talk about his work at Red Hat with OS Climate (OS-C). Red Hat joined OS-C in 2021 with the goal of building the breakthrough technology and data platforms needed to more fully integrate the impacts of climate change into global financial decision-making and risk management. The global vision and data platform approach that the open source community is taking to address the climate agenda is inspiring, and I asked Vincent to join me on Data Mesh TV to share this with the audience.
From the OS-C website, “OS-C is establishing an Open Source collaboration community to build a data and software platform that will dramatically boost global capital flows into climate change mitigation and resilience. Our goal is to rapidly accelerate the shift of global investment away from relatively Green House Gas-intensive and climate-vulnerable companies, technologies, and infrastructure into mitigation, resilience, and adaptation that is financially sustainable and high-impact — especially in developing countries — as well as to enable design of better policy that effectively engages capital markets in addressing climate change and accelerating adoption.”
As I learned more about OS-C, I realized that the data problems that they are working to solve are not that different from the challenges that I see in companies every day.
We all have diverse data types sitting in many different repositories with different teams asking for access to that data so that they can execute their analytics use cases. There is an urgency to make that data available in a way that can accelerate analytics and drive actionable, trusted insight.
This is the same journey that many of us find ourselves on. We feel the frustration of collecting all that data, and we recognize the urgency of extracting its full value.
As a data leader, you are fortunate if you are even more excited and passionate about the business outcomes than you are about the data. I was fortunate to interview Vincent and share his passion in this episode; listen here.
A blueprint for an open data platform
We often hear the same response from our guests when we ask them how they started their platform design. Everyone starts with understanding their consumers' needs. Vincent noted that these consumers may not be the most technically skilled, and that they often lack access to the right tools or the right data. These consumers are working to solve challenging climate questions, and they might not have the time, budget, or capability to design their own platform or find the data on their own. To solve this challenge, we need to provide a simpler access layer that enables any consumer to rapidly access and analyze climate data. Many organizations could make the same statement about their diverse internal consumers and their need to access customer, finance, or product data.
OS-C has developed a federation layer with Trino that allows them to connect to over a thousand data sources. Vincent noted that they access over 35,000 different data sets. This is an incredible volume of data that changes at high velocity and comes in many different varieties. The federated data approach enables OS-C to keep up with the integration requirements. Another key part of the platform is the self-service infrastructure that empowers teams with greater autonomy in solving their own climate models or data challenges. OS-C is documenting their approach to create an open source blueprint that anyone can use to build their own data platform.
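To make the federation idea concrete, here is a minimal sketch of what a federated Trino query might look like. The catalog, schema, and table names below are hypothetical illustrations, not actual OS-C sources, and the connection details are assumptions; in Trino, each catalog maps to a different backend system, so one SQL statement can join data that physically lives in separate repositories.

```python
# Sketch of a federated query spanning two hypothetical Trino catalogs.
# "emissions_pg" might be a PostgreSQL connector and "financials_hive"
# a Hive/object-storage connector -- these names are illustrative only.
FEDERATED_QUERY = """
SELECT e.company_id,
       e.scope1_emissions,
       f.market_cap
FROM emissions_pg.public.company_emissions AS e
JOIN financials_hive.curated.company_financials AS f
  ON e.company_id = f.company_id
WHERE e.reporting_year = 2021
"""

def run_federated_query(host="trino.example.org", port=8080, user="analyst"):
    """Submit the query via the trino-python-client (pip install trino).

    Host, port, and user are placeholder assumptions; a real deployment
    would point at its own Trino coordinator.
    """
    import trino  # imported lazily so the sketch loads without the client
    conn = trino.dbapi.connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(FEDERATED_QUERY)
    return cur.fetchall()
```

Because Trino pushes work down to each connector, the consumer only sees one access layer, which is the "simpler access layer" the platform aims to provide.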
OS-C wants teams around the world to use this blueprint to accelerate their climate analytics journey. They don’t need to reinvent the wheel and they don’t need to discover the data on their own. The blueprint gives them everything they need to get started, so that they can focus their efforts on improving analytics models and taking action. The fundamental goal is to build a community that can collaborate on development.
A recipe for data products
Transparency is a critical part of the OS-C approach. Contributors and consumers are provided with code that they can use to reproduce and improve data products. Teams can see where the data comes from, how it is integrated, and where it is used. This is all codified in OS-C repositories so that different teams can test and reproduce the exact same outcomes. As teams improve the recipes, the improvements are shared across the OS-C community so that everyone advances together. This transparency and sharing create a powerful mechanism for accelerating climate analytics.
OS-C is looking to add a new data exchange capability to extend data access to non-contributors. Today, contributors share commercial data (under restricted license), university research data, public data, and even corporate data. A data exchange will make it easier for contributors and non-contributors to find and access the data they need. Contributors will have full visibility into where their data is used, which helps drive end-to-end accountability for the data product lifecycle.
Vincent noted that OS-C has created a blueprint for teams to accelerate their own self-service capabilities and solve their unique climate analytics challenges. OS-C is organizing data product recipes to simplify adoption, incentivize data sharing, and drive rapid ideation.
Addressing the challenges of digital sovereignty
OS-C has recently created a new stream that is focused on Environmental, Social, and Governance (ESG) disclosure. Many organizations report on ESG, and that information is often only available in unstructured formats, via formal reports. Many groups are working to collect all of the ESG information, but none of them have been successful, as this data is everywhere. Vincent noted that there are hundreds of databases with ESG data, and some of those databases reside in countries with strict sovereignty laws.
Using a federated approach, OS-C is able to connect to sources across sovereignty lines. When data is not allowed to leave the country, they perform the processing in-country and define how anonymized statistics or summary data can be shared. They rely on an open source policy engine and Trino to build rules that govern who is able to access the data. These policies determine what any consumer can actually see, ensuring that OS-C remains compliant with digital sovereignty rules.
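The in-country processing pattern can be sketched in a few lines. This is a simplified illustration, not the OS-C implementation: the country codes, rule set, and record shape below are hypothetical, and a real deployment would express these rules in a policy engine enforced at the Trino layer rather than in application code.

```python
from dataclasses import dataclass

@dataclass
class Record:
    country: str      # where the underlying data resides
    company_id: str
    emissions: float

# Hypothetical sovereignty rules: raw rows originating in these
# (placeholder) countries may not leave the country of origin.
RESTRICTED_COUNTRIES = {"XX", "YY"}

def visible_rows(rows, consumer_country):
    """Return only the raw rows this consumer is allowed to see."""
    return [
        r for r in rows
        if r.country == consumer_country or r.country not in RESTRICTED_COUNTRIES
    ]

def shareable_summary(rows, country):
    """Aggregate in-country, then expose only the anonymized statistic."""
    values = [r.emissions for r in rows if r.country == country]
    if not values:
        return None
    return {
        "country": country,
        "avg_emissions": sum(values) / len(values),
        "n": len(values),
    }
```

The key design point is that restricted raw rows never cross the policy boundary; consumers outside the country receive only the aggregated summary, which mirrors the "process in-country, share statistics" approach Vincent described.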
As noted on the OS-C website, “The OS-C technology platform will accelerate development of scenario-based predictive analytic tools and investment products that manage climate-related risk and finance climate solutions across every geography, sector, and asset class. The OS-C Open Source organization will enable alignment of the stakeholder community on priority data and modeling needs, focus shared resources on executing those priorities, and accelerate adoption.”
OS-C and the open source community are building a leading common data platform that is heavily influenced by data mesh concepts. As OS-C continues to mature, there is a lot that the rest of us can learn and emulate as we develop our own data mesh architectures. Watch the full episode for a review of these ideas and the many other insights that Vincent shared.
Create data products
Curated data sets enable self-service insights by creating standards across teams and business units for fast, repeatable use.