I participated in a panel discussion at Datanova 2022: The Data Mesh Summit with Karl Eklund, Principal Architect at Red Hat, and William Schnoeppner, Director of Research and Consulting at EMA, to discuss the current state of data. We offered a sneak peek into the results of the State of Data survey, a study conducted with Red Hat that is expected to be released on Tuesday, March 22. The 2021 State of Data report can be downloaded here.

At Datanova, we talked about how the industry has changed and where it is heading, including the need for more useful data and the move toward decentralization:

The need for more data

Survey results showed that over the last two years, because of the COVID-19 pandemic, companies have realized that fast access to accurate data is an essential business function. The top reason behind this shift is the desire for customer engagement. Organizations need speed to ramp up their digital transformation initiatives across the board. Companies are also looking to diversify the types of data they collect, including streaming data, videos, event data, and images, to name a few. What these all have in common is the ability to reach customers in a more digital format.

Companies are embracing this trend by changing their technology and increasing automation across the board. Furthermore, Karl has seen that many companies are increasing their data science presence because they need to consume more data and create more domains than ever before. Companies have new data sets and they need to consume that data at a rapid pace in order to iterate on models. And, because models don’t stay perfect forever, organizations need access to all their data at an appropriate cadence.

Moving toward decentralized data architecture

Survey respondents commonly have around four to six data platforms in their ecosystem, and many said that they are focusing on moving back to a decentralized model. Karl notes that many companies actually started out with a decentralized model but then made a huge effort to consolidate their data in a centralized store, such as a data warehouse. However, because organizations have always had data scattered across their enterprise, this centralized model could never work. Furthermore, companies are accumulating data at a much faster rate, so funneling everything through one bottleneck becomes problematic and inefficient.

Businesses want results quickly. In order to make the fastest business decisions with the most accurate data, companies realize that they need to embrace a decentralized data architecture. And that is why the Data Mesh has really hit a nerve.

However, the transition to a decentralized data architecture is not an easy task. The survey showed that many organizations experienced increased complexity, longer time to deployment, and higher costs when adopting new technologies and platforms. The panel agreed that although it might be hard to break the chains of centralized data, a decentralized approach will ultimately free users to access all of their data in a more timely manner and make better business decisions for their organizations.

The challenges of data pipelines

Among survey respondents, the main challenges with data pipelines included excessive time spent on break-fix work, too many data pipelines, and the difficulty of combining data in motion with data at rest. Large enterprises are still moving data back and forth as their architecture and tools change. But at its core, the one question is: how do you get users the data as quickly as possible without having to duplicate it? Organizations need to make data quality and data governance a priority. Furthermore, the data maturity of an organization does not track its size. Often the least mature organizations are the biggest ones, because they have the most data spread around their organization, which adds much more complexity.

Companies need to start thinking about how to consume the right data in the most productive way. The data pipeline is a means to an end, but it is still critically important. Therefore, it is crucial that organizations spend time developing their data strategy.

The rise of data science

“Data science is really about bringing automation to a decision-making process,” said Karl. It would be ideal to remove humans from the loop. As we progress through automated workloads, trying to get models into production, the traditional view of passing things over to every group no longer makes sense when you get to the actual business problem that you are trying to solve. Organizations need models that they can trust and that can respond quickly to new input signals.

Centralized data architecture models are dying off, and organizations are focusing more on data science, machine learning, and general ease of data access. We are now at a point where machine learning and data science are mature enough to put into practice, and many companies are discovering how useful that is to their business. Karl said, “What really moves the needle in organizations is a coherent data strategy.” Only 41% of teams had a concrete data strategy and knew how to deal with data sprawl. A business strategy with true business outcomes needs to come before embarking on a decentralized architecture journey.

Where is the technology going?

Karl and I watch which new companies are getting funded in the data space in order to gauge the future of technology. By doing this, you can see who is coming out with new ideas and trying to solve common pains within an organization. In particular, we are seeing an emphasis on the data governance, data cataloging, and data security spaces. It is crucial that organizations have a clear purpose in mind. More than picking the right tools for the job, they need to do the groundwork with data strategy and get the most value out of their data experts. “Evaluate where we are not getting the most value out of our greatest asset, our people.”

The Big Data world is constantly changing and evolving. Because of this, it is crucial to understand the architectures, paradigms, and practices companies are trending toward. This annual State of Data survey illustrates that data is more important than ever and that companies need to be able to access it quickly and accurately. A decentralized data architecture, a move many organizations are making, allows companies to keep their data where it lives but access it all together, giving businesses the ability to make the best decisions for their organizations.
