Last Updated: 2024-04-16

Background

Azure Private Link is a Microsoft Azure service that enables you to securely connect your Azure Virtual Network to Azure Platform as a Service (PaaS) resources, Azure Virtual Machine (VM) instances, and Azure Kubernetes Service (AKS) clusters. This approach provides a secure way to access these services over a private endpoint located inside your virtual network, eliminating the need to expose connections to the public internet.

Starburst Galaxy supports Azure Private Link for specific catalogs. This tutorial guides you through configuring Private Link for Azure Data Lake Storage (ADLS).

Scope of tutorial

In this tutorial, you will learn how to configure Azure Private Link for Azure Data Lake Storage (ADLS) using a Starburst Galaxy metastore. The tutorial does not cover the internal steps performed by the Starburst technical resource.

Although the tutorial is written for users of the Starburst Galaxy metastore, you can also use an external Hive metastore running on a virtual machine. If you choose this approach, you will also need to complete another tutorial, Configure Azure Private Link for a database or HMS running on a VM.

Learning objectives

Once you've completed this tutorial, you will be able to:

Prerequisites

About Starburst tutorials

Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.

Background

If you are configuring Private Link for the first time, you are encouraged to work with a Starburst technical resource. This person will work with you to set up the environment needed to complete the tutorial.

Contacting your technical resource

To be assigned a technical resource, contact your regular Starburst account team.

Working together

Once assigned, your Starburst technical resource will work with you to set up an environment where you can complete the tutorial.

Please review the following overview of this process before beginning the tutorial.

Your responsibilities:

Background

Understanding the Azure Private Link architecture is important when completing the steps in this tutorial. In this section you will learn about this architecture and the way that Starburst Galaxy uses it to securely connect private clouds.

This tutorial also follows a corresponding Azure quickstart on the same topic. It is recommended that you consult this documentation if you want to learn more about Azure Private Link.

Reference architecture

The following diagram illustrates a Private Link connection to an ADLS account.

Review the diagram to ensure that you understand the architecture that you will create in this tutorial.

Background

It's time to get started. In this section, you'll begin by confirming that your Azure storage account is configured for ADLS Gen2. After that, you'll obtain your Azure storage account ID.

You'll need to provide this information to the Starburst support team so that they can create the private endpoints.

Step 1: Sign in to Azure portal

You're going to start by signing in to the Azure portal. Remember to sign in to the account containing the Azure storage account that you would like to connect using Private Link. If you use multiple Azure accounts, ensure that you pick the correct one.

Step 2: Select storage account

Now it's time to find the right storage account. Depending on your workflow, you might have multiple storage accounts in the same Azure account. Once again, make sure you select the correct one.

Step 3: Confirm hierarchical namespace property

Azure Data Lake Storage Gen2 is a set of capabilities that you use with the Blob Storage service of your Azure Storage account. Notably, ADLS Gen2 includes hierarchical namespace functionality, allowing users to organize their data into directories and subdirectories. This facilitates better organization and management compared to the flat namespace of standard Blob Storage.

This step will show you how to confirm that a hierarchical namespace has been enabled on your ADLS account.
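If you prefer to check this property outside the portal, the following is a minimal sketch. It assumes you have already exported your storage account properties as JSON, for example with the Azure CLI command `az storage account show`, and that the output contains the `isHnsEnabled` field. The function name and the sample values are hypothetical.

```python
import json


def is_adls_gen2(account: dict) -> bool:
    """Return True only if the account JSON reports hierarchical namespace enabled."""
    # A missing or null value is treated as "not enabled".
    return account.get("isHnsEnabled") is True


# Hypothetical, trimmed sample of exported storage account properties.
sample = json.loads(
    '{"name": "examplelake", "kind": "StorageV2", "isHnsEnabled": true}'
)

if is_adls_gen2(sample):
    print("Hierarchical namespace is enabled.")
else:
    print("Hierarchical namespace is NOT enabled.")
```

A missing field counts as disabled here, so an incomplete export fails the check rather than passing silently.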

Step 4: Record storage account resource ID

Next, it's time to record your storage account resource ID. Starburst support will need this ID to create two private endpoints in the Starburst Galaxy VNet.
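Before you send the ID to support, it can help to check that you copied it in full. The sketch below validates the general shape of a storage account resource ID as the Azure portal displays it. The helper name, the sample subscription ID, the resource group, and the account name are all hypothetical, and the pattern assumes the portal's usual casing.

```python
import re
from typing import NamedTuple


class StorageAccountId(NamedTuple):
    subscription_id: str
    resource_group: str
    account_name: str


# Shape of a storage account resource ID as shown in the Azure portal.
# Storage account names are 3-24 lowercase letters and digits.
_RESOURCE_ID = re.compile(
    r"/subscriptions/(?P<sub>[^/]+)"
    r"/resourceGroups/(?P<rg>[^/]+)"
    r"/providers/Microsoft\.Storage"
    r"/storageAccounts/(?P<name>[a-z0-9]{3,24})"
)


def parse_storage_account_id(resource_id: str) -> StorageAccountId:
    """Split a resource ID into its parts, or raise ValueError if malformed."""
    match = _RESOURCE_ID.fullmatch(resource_id.strip())
    if match is None:
        raise ValueError(f"Not a storage account resource ID: {resource_id!r}")
    return StorageAccountId(match["sub"], match["rg"], match["name"])


# Hypothetical example values.
example = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg"
    "/providers/Microsoft.Storage/storageAccounts/examplelake"
)
print(parse_storage_account_id(example).account_name)
```

A truncated copy, such as one that stops at the resource group, raises `ValueError` instead of being passed along.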

Step 5: Open support ticket

You are going to use the automated assistant in Starburst Galaxy to open a support ticket and give support the storage account resource ID that you just copied. You will also need to provide your preferred Starburst Galaxy Private Link configuration name.

Background

When working with private endpoints in Azure Data Lake Storage Gen2, it is considered best practice to create two private endpoints:

  • One for the Data Lake Storage Gen2 resource (dfs)
  • One for the Blob Storage resource (blob)

This is because operations that target the Data Lake Storage Gen2 resource may be redirected to Blob Storage and vice versa. Creating two private endpoints ensures that all operations will complete successfully.

Starburst support will use the storage account resource ID that you provided to create these private endpoints. You will then need to manually accept the endpoint connections.
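To make the two-endpoint setup concrete, the sketch below lists the two storage sub-resources involved, `dfs` for Data Lake Storage Gen2 and `blob` for Blob Storage, along with the hostname and Private Link DNS zone that correspond to each. It assumes the Azure public cloud domain suffix `core.windows.net`. The account name and function name are hypothetical, and this is an illustration of the naming only, not a description of how Starburst support provisions the endpoints.

```python
from typing import Dict

# The two storage sub-resources that each get their own private endpoint.
SUBRESOURCES = ("dfs", "blob")


def private_link_targets(account_name: str) -> Dict[str, Dict[str, str]]:
    """Map each sub-resource to its service hostname and Private Link DNS zone.

    Assumes the Azure public cloud suffix (core.windows.net).
    """
    return {
        sub: {
            "host": f"{account_name}.{sub}.core.windows.net",
            "private_dns_zone": f"privatelink.{sub}.core.windows.net",
        }
        for sub in SUBRESOURCES
    }


# Hypothetical account name.
for sub, target in private_link_targets("examplelake").items():
    print(sub, target["host"], target["private_dns_zone"])
```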

Step 1: Access private endpoint connections settings

You're going to begin by opening your private endpoint connection settings. You can find these settings in the Networking section of the Azure portal.

Step 2: Accept connections

Once Starburst support has created the private endpoints, you will see the connections listed as Pending.
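If you want to confirm which connections still await approval without using the portal, the following sketch reads the same exported storage account JSON used elsewhere (for example, from `az storage account show`). It assumes the output includes a `privateEndpointConnections` list whose entries carry a `privateLinkServiceConnectionState.status` value. The function name and sample connection names are hypothetical.

```python
from typing import List


def pending_connections(account: dict) -> List[str]:
    """Return the names of private endpoint connections still marked Pending."""
    pending = []
    for conn in account.get("privateEndpointConnections") or []:
        state = conn.get("privateLinkServiceConnectionState") or {}
        if state.get("status") == "Pending":
            pending.append(conn.get("name", "<unnamed>"))
    return pending


# Hypothetical, trimmed sample with one pending and one approved connection.
sample_account = {
    "name": "examplelake",
    "privateEndpointConnections": [
        {
            "name": "examplelake.dfs-connection",
            "privateLinkServiceConnectionState": {"status": "Pending"},
        },
        {
            "name": "examplelake.blob-connection",
            "privateLinkServiceConnectionState": {"status": "Approved"},
        },
    ],
}

print(pending_connections(sample_account))
```

An empty list after you have accepted both connections suggests nothing is left to approve.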

Step 3: Create an ADLS catalog using Private Link

Creating an ADLS catalog in Starburst Galaxy using Private Link follows the same process as creating a non-Private Link catalog.

The details of this process are covered in the tutorial Configure an Azure Data Lake Storage (ADLS) catalog.

Tutorial complete

Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.

You're all set! Now you can use Private Link to configure access to data in your Azure Data Lake Storage account.

Continuous learning

At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.

Next steps

Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.

Tutorials available

Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!
