Last updated: December 8, 2023
Data lakes are centralized repositories designed to store vast amounts of data in a scalable and cost-effective manner.
Governance in a data lake may face challenges around security and data accessibility. For example, accessibility often faces challenges related to over-provisioned access and potential restrictions on data ownership.
Data mesh integrates federated computational governance, enabling each domain to have autonomy over its data while ensuring overall compliance. Data mesh ensures that each domain or business unit owns and manages its data, promoting self-serve data infrastructure and accessibility.
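As a minimal sketch of what federated computational governance can look like in practice, the Python below lets each domain own its data product while one shared, organization-wide policy is checked in code. The `DataProduct` class, the classification labels, and the masking rule are all hypothetical, chosen only for illustration; they are not part of any Starburst API.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A domain-owned data set plus the metadata governance needs (hypothetical model)."""
    domain: str
    name: str
    owner: str
    columns: dict  # column name -> classification, e.g. "public" or "pii"
    masked: set = field(default_factory=set)  # columns the domain has chosen to mask


def global_policy_violations(product):
    """Apply one organization-wide policy to any domain's data product."""
    violations = []
    if not product.owner:
        violations.append("missing owner")
    for column, classification in product.columns.items():
        # Assumed global rule: every PII column must be masked.
        if classification == "pii" and column not in product.masked:
            violations.append(f"unmasked pii column: {column}")
    return violations


# Each domain defines and manages its own product independently.
products = [
    DataProduct("sales", "orders", "sales-team",
                {"order_id": "public", "email": "pii"}, masked={"email"}),
    DataProduct("marketing", "leads", "marketing-team",
                {"lead_id": "public", "phone": "pii"}),
]

# The same policy runs across all domains, producing a compliance report.
report = {f"{p.domain}.{p.name}": global_policy_violations(p) for p in products}
print(report)
```

The design point is the split of responsibilities: domains decide what their products contain, while the policy function is shared and applied uniformly.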
While data lakes can scale, they may require constant attention and maintenance, and they can run into scalability issues as data volumes increase. A modern data lake brings quality to your data by adding key data warehousing capabilities such as transactions, schemas, and governance.
Data mesh addresses scalability by distributing data ownership, allowing each domain to independently optimize storage and compute, resulting in a more scalable solution.
In response to the challenges of data warehouses, the data lake architecture emerged. Many were thrilled with this new option because it supports access to data for data science and machine learning model training workflows, as well as parallelized access to data.
The data lake architecture is similar to a data warehouse in that the data gets extracted from the operational systems and is loaded into a central repository.
However, unlike a data warehouse, a data lake holds a vast amount (terabytes and petabytes) of structured, semi-structured, and unstructured data in its native format until it’s needed. Once the data becomes available in the lake, the architecture gets extended with elaborate transformation pipelines to model the higher-value data and store it in lakeshore marts. Essentially, we moved from ETL to ELT processing.
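The shift from ETL to ELT described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a real pipeline: the "lake" is just an in-memory list, and the function names, the `orders` records, and the per-region total are all hypothetical.

```python
import json
from collections import defaultdict


def extract_load(source_rows, lake):
    """E + L: land records in the lake unchanged, in their native (JSON) form."""
    for row in source_rows:
        lake.append(json.dumps(row))


def transform_to_mart(lake):
    """T: run later, on demand, against data that is already in the lake."""
    totals = defaultdict(float)
    for raw in lake:
        record = json.loads(raw)
        # Semi-structured data: skip records that lack the fields we model.
        if "region" in record and "amount" in record:
            totals[record["region"]] += float(record["amount"])
    return dict(totals)


# Hypothetical operational records with differing shapes.
orders = [
    {"region": "EMEA", "amount": 120.0},
    {"region": "APAC", "amount": 80.5, "note": "rush"},
    {"event": "heartbeat"},
    {"region": "EMEA", "amount": 30.0},
]

lake = []
extract_load(orders, lake)      # everything is stored, nothing is modeled yet
mart = transform_to_mart(lake)  # a small "lakeshore mart" built afterwards
print(mart)
```

In an ETL flow the filtering and aggregation would happen before loading, so the heartbeat record and the extra `note` field would never reach the central store; here they are kept and the modeling decision is deferred.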
The data lake architecture is often described in the following way:
You can see from the visual below that a data lake architecture generates complex, unwieldy data pipelines, resulting in unmanaged, untrustworthy, and inaccessible data sets. Also, as data lakes grow in size and usage, it becomes expensive to scale them and to meet the performance demands of the business. Unfortunately, this approach still relies on a centralized team to perform the ELT, so when business users request a change, they have to wait for the central team to respond. As with data warehouses, this limits the value of data to data analysts, which ultimately restricts the business in making informed, data-driven decisions.
Related reading: Data mesh architecture
“Data Mesh is certainly the future for our business, and probably for many others, particularly ones which have a legacy of acquisitions, and the need for merging of different data sets to form a new larger entity. Having the ability to query data where it resides using Starburst is enormously powerful and makes a huge impact on the ability for data to provide answers.” Richard Jarvis, CTO, EMIS Group
“The implementation of Starburst on the data lake allows analysts and data scientists quick and simple access to data that exists in the organization for business value and insights. ETL processes that took many months and at high costs have become extremely fast and accessible to analysts at negligible costs.” — Shlomi Cohen, EVP, Head of Business Data and Analytics, Bank Hapoalim