Last Updated: 2023-12-15

Background

Starburst Galaxy includes a universal search feature that helps you discover data across catalogs, schemas, tables, views, columns, data products, tags, owners, and contacts. This transparency is backed by role-based access control, which gives full visibility to users with the necessary privileges while restricting access for everyone else.

This helps break down organizational knowledge silos, so data consumers can find, query, and analyze datasets more efficiently. It also assists data engineers and platform administrators by providing a global view across the data pipeline.

Prerequisites

You need a Starburst Galaxy account to complete this tutorial. Please be sure to complete the tutorial titled Starburst Galaxy: Getting started before attempting this tutorial.

Learning outcomes

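Upon successful completion of this tutorial, you will be able to create tags, add metadata to tables and columns, and use universal search to find data entities in Starburst Galaxy.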

About Starburst tutorials

Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.

As you navigate through the tutorial you should follow along using your own Starburst Galaxy account. This will help consolidate the learning process by mixing theory and practice.

Tutorial scenario

The data engineers at Chryse Corp. aim to enhance the discoverability of datasets used by data analysts. To achieve this, they plan to add both tags and metadata. This will make it easier for data analysts to find and access the relevant data, improving overall data exploration and analysis processes.

You'll help them start this process by adding tags and metadata to their Starburst Galaxy datasets, focusing on the astronauts and missions tables that are included in the demo catalog. Then, you'll use Starburst Galaxy's search features to find these datasets.

Background

Universal search uses metadata to build a searchable index of the data assets across all catalogs and clusters connected to your account. The index is meant as a starting point for further data discovery and querying.

Notably, this index covers assets held in different clouds, whether AWS, Azure, or GCP. You can see where the data is located, but you cannot transfer data across clouds, and search results are based on metadata rather than on the data itself.

Universal search is updated continuously on a streaming basis, so any changes you make within Starburst Galaxy take effect immediately. Changes made outside of Starburst Galaxy are picked up less frequently, by a batch process that runs approximately once every 24 hours.

Video: Use tags in Starburst Galaxy

The following video walks through the first two sections in this tutorial. It shows you how to create tags and add metadata to tables and columns.

You can choose to watch the video and follow along using your own account. Alternatively, if you prefer, you can skip the video and proceed directly to the step-by-step instructions provided later in the tutorial.

Background

In Starburst Galaxy, you can add tags to data entities, including catalogs, schemas, tables, views, columns, data products, tags, owners, or contacts.

Universal search works in tandem with tags, allowing users to filter their results by tag.

In this section of the tutorial, you'll create a set of tags to help data consumers find data. Specifically, you'll use the astronaut dataset to create a missions tag and two additional tags nested under missions, called personnel and info.

Step 1: Sign in to Starburst Galaxy

Sign in to Starburst Galaxy in the usual way. If you have not already set up an account, create one before continuing.

Step 2: Verify that your role is set to accountadmin

Only the data entity owner can add metadata to data entities. In this tutorial, you'll add metadata from the accountadmin role.

Step 3: Open the Access control menu

In Starburst Galaxy, tags are part of the Access control menu.

Step 4: Create a missions tag

In Starburst Galaxy, you can create tags nested within other tags. This is exactly what you're going to do in this tutorial.

You'll begin by creating a top-level tag called missions. It will help data consumers identify data entities that contain mission information. Afterwards, you'll create two other tags nested inside the missions tag.

Step 5: Create a nested tag under the missions tag

Now it's time to explore how to create nested tags.

You're going to start by creating a personnel tag nested under the missions tag that you created in the previous step.

Step 6: Create a second nested tag under the missions tag

Now it's time to create your second nested tag, info. This tag identifies columns that hold general mission information.

Step 7: View tags

You've created three tags, but you can only see the missions tag listed in the tags section.

That's because missions is the only top-level tag you created; the other two are nested inside it.
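The relationship between top-level and nested tags can be sketched in a few lines of Python. This is only an illustration, not how Starburst Galaxy stores tags: it assumes nested tags are written with the dotted names the tutorial uses later (for example, missions.personnel), and the helper names top_level and children are hypothetical.

```python
# Hypothetical model: each tag is identified by its dotted path.
tags = ["missions", "missions.personnel", "missions.info"]


def top_level(tag_names):
    """Return tags that are not nested under another tag (no dot in the name)."""
    return [name for name in tag_names if "." not in name]


def children(tag_names, parent):
    """Return tags nested exactly one level below the given parent tag."""
    prefix = parent + "."
    return [
        name
        for name in tag_names
        if name.startswith(prefix) and "." not in name[len(prefix):]
    ]


print(top_level(tags))             # only the top-level tag is listed
print(children(tags, "missions"))  # the two tags nested under missions
```

Under these assumptions, listing only the top-level tags yields missions alone, which mirrors what the tags section shows.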

Background

Universal search works by using metadata, but not all of your tables and columns have metadata from the outset. Fortunately, Starburst Galaxy allows you to add metadata to tables and columns at any point.

In this section of the tutorial, you'll add metadata to the astronauts table and several of its columns. Later in this tutorial, you'll use this metadata with universal search.

Step 1: Select astronauts table in catalog explorer

To add metadata to a table, you need to select the table in the catalog explorer.

Remember that Starburst Galaxy uses the catalog.schema.table hierarchy. You're going to navigate down that hierarchy until you find the astronauts table.
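As a rough illustration of the three-level hierarchy, the following Python sketch splits a fully qualified name into its catalog, schema, and table parts. The names example_catalog and example_schema are placeholders, not the real names in your account, and the parser deliberately ignores quoted identifiers that contain dots.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_ref(name: str) -> TableRef:
    """Split 'catalog.schema.table' into its three levels.

    Simplification: quoted identifiers containing dots are not handled.
    """
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableRef(*parts)


# Placeholder catalog and schema names; only the table name comes from the tutorial.
ref = parse_table_ref("example_catalog.example_schema.astronauts")
print(ref.catalog, ref.schema, ref.table)
```

Navigating the catalog explorer follows the same order: open the catalog, then the schema, then the table.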

Step 2: View the metadata for the astronauts table

Starburst Galaxy displays important basic information about the astronauts table.

You can access additional information about the table's metadata by expanding these details.

Step 3: Add a description to the table

Starburst Galaxy shows you several metadata fields that you can edit. Each of these has a pencil icon next to it, allowing you to update or add additional metadata.

You're going to begin by adding to the table description, which is currently empty.

Step 4: Add a tag to the table

Now you're going to add additional metadata to the astronauts table by adding a tag.

Just like before, you can do this by selecting the corresponding pencil icon, this time in the Tags row.

Step 5: Add a contact to the table

You've now added two types of metadata to the table: a description and a tag.

Next, you're going to add a third by editing the Contacts field.

Step 6: Add a tag to a column

Now it's time to turn your attention to columns.

Starburst Galaxy lists each of the columns in the table, making it easy to add metadata at the column level. For this tutorial, you're going to add metadata to specific columns in the astronauts table.

Notice that each column in this table already has one tag listed. This is because you added the missions.personnel tag to the whole table, and each column inside the table has inherited it.

Let's add a tag to the mission_number column.

Step 7: Add a description to a column

You can add descriptions to columns just like you did with tables.

Let's add a description to the mission_number column to test it out.

Step 8: View the tags summary at table level

Now that you have added tags to the astronauts table and the columns inside it, it's time to explore how Starburst Galaxy reports tag usage.

You're going to start by looking at tags at the table level.

Step 9: View the tags summary at column level

Notice that the nested tags missions.info and missions.personnel are listed as being in use, each with a count of 1.

Even though every column in the astronauts table inherited the missions.personnel tag, the tag is counted as in use only once, because it was added directly to a single data entity: the table.
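The counting rule can be made concrete with a small Python sketch. It is a simplified model rather than Starburst Galaxy's implementation: it assumes usage is counted per direct assignment, that columns inherit their table's tags, and that missions.info was the tag added to the mission_number column. The function names are hypothetical.

```python
from collections import Counter

# Tags applied directly to a data entity (hypothetical model, not a Galaxy API).
direct_tags = {
    "astronauts": ["missions.personnel"],            # table-level tag
    "astronauts.mission_number": ["missions.info"],  # column-level tag (assumed)
}


def usage_counts(assignments):
    """Count each tag once per entity it was added to directly."""
    counts = Counter()
    for entity_tags in assignments.values():
        counts.update(set(entity_tags))
    return counts


def effective_tags(column, assignments):
    """Combine a column's own tags with the tags inherited from its table."""
    table = column.split(".")[0]
    own = assignments.get(column, [])
    inherited = assignments.get(table, [])
    return sorted(set(own) | set(inherited))


print(dict(usage_counts(direct_tags)))
print(effective_tags("astronauts.mission_number", direct_tags))
```

In this model, inherited tags show up on each column but do not increase the usage count, which stays at 1 for both nested tags.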

Step 10: Manage tag settings

Starburst Galaxy also allows you to manage tag settings for column-level tags. This allows you to see where the tag is being used, remove the tag, and edit any of its settings.

The following video guides you through the remaining steps in this tutorial. Specifically, it shows you more information about using universal search.

You can choose to watch the video and follow along using your own account. Alternatively, if you prefer, you can skip the video and proceed directly to the step-by-step instructions provided later in the tutorial.

Background

Universal search allows you to use keywords to find a number of different types of data entities. These include catalogs, schemas, tables, views, columns, data products, tags, owners, and contacts.

String matching

For a search to return a data entity, the search term must match the beginning of the entity's name. It can also match the characters immediately following an underscore in the name.

For example, the search term 'cust' returns a data entity named customer, and also an entity named profile_customer. However, a search for 'omer' or 'file' returns neither of these, because matching occurs only at the start of a name or at the start of a segment that follows an underscore.
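The matching rule described above can be approximated with the following Python sketch. It is not Starburst Galaxy's actual search code: the function name matches is hypothetical, and case-insensitive comparison is an assumption.

```python
def matches(search_term: str, entity_name: str) -> bool:
    """Return True if the term matches at the start of the name or
    immediately after any underscore (case-insensitive by assumption)."""
    term = search_term.lower()
    name = entity_name.lower()
    if not term:
        return False
    # Candidate start positions: the beginning of the name, plus the
    # character right after each underscore.
    starts = [0] + [i + 1 for i, ch in enumerate(name) if ch == "_"]
    return any(name.startswith(term, start) for start in starts)


names = ["customer", "profile_customer"]
for term in ("cust", "omer", "file"):
    print(term, [name for name in names if matches(term, name)])
```

With these two names, 'cust' matches both entities, while 'omer' and 'file' match neither, consistent with the example above.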

Filtering

When a keyword matches a data entity, you can further filter the results by asset type, catalog, tag, contact, or owner.

Universal search is improving rapidly, and keyword searches will match with more metadata in the future. For even more information, review the documentation.

Step 1: Access universal search menu

Universal search can be accessed in two different ways.

Step 2: Begin searching keywords

It's time to test out universal search using the tags you added earlier in this tutorial.

Like many other search systems, universal search is keyword-based. You can choose how you want to filter the results. The default filter is datasets, but you can choose to filter by data products, tags, owners, or contacts.

Step 3: Search for tags

Now it's time to search for one of the tags you added. This works in a similar way to searching for Datasets, but with one twist.

Let's explore that twist in more detail.

Step 4: Explore the search results menu

Universal search provides the best matches first, but sometimes it's necessary to dig deeper. This is where the Search results menu comes in, providing a number of search results and filter options.

Step 5: Change the search filter

Search filters can be updated. You can filter a search by asset type, catalog, tag, contact, and owner.

Right now you are filtering on the missions.personnel tag. Let's change the filter type to see how the results change.
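To make the effect of changing a filter concrete, here is a minimal Python sketch that applies optional filters to a list of search results. Everything in it is hypothetical: the SearchResult fields, the filter_results function, the example_catalog name, and the sample rows are illustrative stand-ins, not Starburst Galaxy's data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchResult:
    name: str
    asset_type: str  # for example "table" or "column"
    catalog: str
    tags: List[str] = field(default_factory=list)
    owner: str = ""
    contacts: List[str] = field(default_factory=list)


def filter_results(results, asset_type=None, catalog=None, tag=None,
                   owner=None, contact=None):
    """Keep only results that satisfy every filter that was supplied."""
    kept = []
    for r in results:
        if asset_type is not None and r.asset_type != asset_type:
            continue
        if catalog is not None and r.catalog != catalog:
            continue
        if tag is not None and tag not in r.tags:
            continue
        if owner is not None and r.owner != owner:
            continue
        if contact is not None and contact not in r.contacts:
            continue
        kept.append(r)
    return kept


# Made-up sample rows; only directly applied tags are recorded here.
results = [
    SearchResult("astronauts", "table", "example_catalog",
                 tags=["missions.personnel"], owner="accountadmin"),
    SearchResult("mission_number", "column", "example_catalog",
                 tags=["missions.info"], owner="accountadmin"),
    SearchResult("missions", "table", "example_catalog", owner="accountadmin"),
]

print([r.name for r in filter_results(results, tag="missions.personnel")])
print([r.name for r in filter_results(results, asset_type="table")])
```

Switching from a tag filter to an asset-type filter changes which of the same underlying results remain visible, which is the behavior this step explores in the UI.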

Step 6: Use the catalog explorer search field

You can also use the Catalog explorer to search for catalogs, schemas, tables, views, and columns.

This type of search works in a similar way to universal search, and uses the same matching process.

Step 7: Enter a catalog explorer search

Time to try your first search in this field.

Tutorial complete

Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.

Now that you've completed this tutorial, you should have a better understanding of just how easy and convenient it is to use universal search in Starburst Galaxy.

Continuous learning

At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.

Next steps

Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.

Tutorials available

Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!
