Organizations today gather data from many different sources. This means that data teams have to spend a lot of time learning about, managing, and keeping track of various systems.
Every day, these teams receive repetitive requests from their downstream partners, who want more information about the data. To fulfill these requests, data engineers have to find the right system, locate the specific dataset, and go through the related documentation. If all this information were stored in one place, it would save the engineers time and effort.
That’s why we’re excited to announce the public preview of Gravity in Starburst Galaxy. Gravity helps manage all the data catalogs connected to your Galaxy domain – from your cloud data lake to PostgreSQL, Elastic or Snowflake. It offers features to make searching, organizing, governing, and sharing of data assets easy and informative.
The difficult reality of data discovery and governance
With the overabundance of data available today, the easy part is collecting it. The hard part is understanding it. We’ve seen customers run into four key problems:
- Data fragmentation: Data is often scattered across multiple systems, databases, and platforms within an organization. This fragmentation creates silos and complicates the process of discovering and accessing relevant data.
- Inconsistent data governance: Data silos can lead to inconsistent data standards and definitions across different systems and departments. This inconsistency makes it challenging to establish uniform governance policies and ensure data integrity and consistency.
- Lack of metadata and documentation: Inadequate metadata and documentation about data assets make it arduous for data users to comprehend the meaning, origin, quality, and lineage of the data. Without proper documentation, it becomes harder to search for and effectively utilize the data.
- Evolving data landscape: The data landscape is constantly evolving, with new sources, technologies, and platforms emerging regularly. Keeping up with these changes and discovering relevant data becomes an ongoing challenge.
To address these difficulties, organizations are investing in data discovery tools that attempt to streamline the process of finding, understanding, and utilizing data effectively. However, each new tool is another system that teams need to learn, configure, manage, maintain, and document.
Gravity’s approach is simple – create a unified access and governance layer that lets you manage all your data in one place. By eliminating the need to juggle multiple access points, data engineers can focus on optimizing data pipeline availability and reliability.
This one centralized hub enables streamlined data management, ensures effective data governance, promotes collaboration, enhances scalability and flexibility, and simplifies data consumption for end-users.
Gravity achieves these goals with:
- Universal search and catalog – Quickly locate relevant datasets in and around your data lake using advanced search capabilities.
- Built-in access controls – Establish and maintain control over all your connected data assets with fine-grained RBAC and ABAC.
- Data product capabilities – Attach business context to curated data sets and then share those data sets with appropriate teams via data products.
- An open architecture – Connect to any data store, file format and table format and have the freedom to change your data stack as new requirements emerge over time.
Let’s take a closer look at how all these features work in Gravity.
Discovery and catalog
At the center of Gravity is an automatic data cataloging solution – simply connect your data sources, and Gravity pulls in all the relevant metadata (including tables, schemas, and descriptions). It provides a unified view of data across sources, systems, and teams, ensuring you have all the necessary information at your fingertips.
Key features include:
Catalog explorer lets users browse a catalog’s metadata to view schemas, tables, views, and columns. You can see which schemas and tables exist and what columns and data they contain, then use what you discover to write queries.
Schema discovery helps teams find and log new files – or any changes to old files – in a data lake. It analyzes a root object in an object storage location and returns the structure of any discovered tables.
Universal search helps you locate any data object connected to your Starburst Galaxy account, across 18+ data sources including cloud RDBMS, OLTP systems, and data warehouses. You can search for the names of catalogs, schemas, tables, views, columns, or data products. Search parses Galaxy’s cached description of your catalog, which means it does not require compute to operate.
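To make the idea of searching cached metadata concrete, here is a minimal Python sketch. It is not Galaxy's implementation or API: the `CatalogEntry` type, the `CACHED_METADATA` snapshot, and the `search` function are hypothetical names invented for illustration. The point it shows is that a name lookup over a cached description of your catalogs is a plain in-memory filter, so no query engine needs to be running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One cached metadata record (hypothetical shape, for illustration only)."""
    kind: str  # "catalog", "schema", "table", "view", "column", or "data_product"
    path: str  # fully qualified name, e.g. "lake.sales.orders"


# Hypothetical snapshot of metadata gathered when the data sources were connected.
CACHED_METADATA = [
    CatalogEntry("catalog", "lake"),
    CatalogEntry("schema", "lake.sales"),
    CatalogEntry("table", "lake.sales.orders"),
    CatalogEntry("column", "lake.sales.orders.customer_id"),
    CatalogEntry("table", "postgres.crm.customers"),
    CatalogEntry("column", "postgres.crm.customers.customer_id"),
]


def search(term, entries=CACHED_METADATA):
    """Return entries whose own name (the last path segment) contains the term.

    The match is case-insensitive and only reads the cached list; it never
    touches the underlying data sources.
    """
    needle = term.lower()
    return [e for e in entries if needle in e.path.rsplit(".", 1)[-1].lower()]


if __name__ == "__main__":
    for entry in search("customer"):
        print(entry.kind, entry.path)
```

In this toy snapshot, searching for "customer" returns the `customers` table and both `customer_id` columns, even though they live in different catalogs.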
Built-in access controls
Gravity brings fine-grained centralized governance to all your connected data assets across clouds, regions, and data sources. This means administrators can easily grant permissions to specific data assets or subsets of the data via a simple interface.
We’ve also added a powerful tagging feature that lets you control access to multiple data items at once based on attributes other than role or object to further simplify governance at scale. To learn more about Galaxy’s attribute-based access control, read our announcement blog and tutorial.
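As a rough illustration of how tag-based rules differ from per-object grants, consider the Python sketch below. It is a simplified model, not Galaxy's policy engine: `DataObject`, `TagPolicy`, `is_allowed`, and the tag names are all hypothetical. It assumes a policy grants one privilege to one role on every object that carries a given tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """A table or view plus the tags attached to it (hypothetical model)."""
    name: str
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class TagPolicy:
    """Grants `privilege` to `role` on every object tagged with `tag`."""
    role: str
    privilege: str
    tag: str


def is_allowed(role, privilege, obj, policies):
    """Return True if any policy grants the privilege to the role via a tag on obj."""
    return any(
        p.role == role and p.privilege == privilege and p.tag in obj.tags
        for p in policies
    )


# Illustrative data: one rule covers every object tagged "pii_free".
POLICIES = [TagPolicy(role="analyst", privilege="SELECT", tag="pii_free")]
ORDERS = DataObject("lake.sales.orders", frozenset({"pii_free"}))
CUSTOMERS = DataObject("postgres.crm.customers", frozenset({"pii"}))

if __name__ == "__main__":
    print(is_allowed("analyst", "SELECT", ORDERS, POLICIES))
    print(is_allowed("analyst", "SELECT", CUSTOMERS, POLICIES))
```

The design benefit is that tagging a new table `pii_free` brings it under the existing rule without anyone writing a new grant, which is what makes this style of governance easier to scale.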
Data product capabilities
Gravity also provides the ability to create and manage data products in Starburst Galaxy. Data products increase the discoverability of your data, and are assigned to the data domains that define your business.
You can use data products to accomplish a number of goals. For example, you can expose curated datasets to data consumers, providing high visibility to important data within your organization from a single interface.
Additionally, data products let you supply more detailed business context to users of your datasets with the data products details view. For example, you may decide to include syntax to execute a particular SQL statement that returns useful information, or attach images and links that support the dataset.
An open architecture
Lastly, like everything at Starburst, Gravity was built with openness in mind. Gravity connects to your cloud data sources and is interoperable with nearly any data environment, including all modern open file and table formats.
With Gravity, you also have the ability to bring your own AWS Glue or Hive metastore to give you more flexibility over your data architecture decisions.
Getting started with Gravity in Starburst Galaxy
Gravity in Starburst Galaxy brings all the components of your data together into a single access layer so you can spend less time accessing multiple systems and more time delivering organized, high-quality data assets to your stakeholders. With Gravity, discover where your data lives, catalog valuable metadata, and secure your data – all in one platform.
It’s easy to get started. Simply sign up for Starburst Galaxy today, connect your first data source, and let Gravity do the rest.