Last Updated: 2024-04-16

Background

Azure Private Link is a Microsoft Azure service that enables you to securely connect your Azure Virtual Network to Azure Platform as a Service (PaaS) resources, Azure Virtual Machine (VM) instances, and Azure Kubernetes Service (AKS) clusters. This approach provides a secure way to access these services over a private endpoint located inside your virtual network, eliminating the need to expose connections to the public internet.

Starburst Galaxy supports Azure Private Link for specific catalogs. This tutorial guides you through the process of configuring Private Link for Azure Data Lake Storage (ADLS).

Scope of tutorial

In this tutorial, you will learn how to configure Azure Private Link for Azure Data Lake Storage (ADLS). The tutorial does not cover the internal steps performed by Starburst technical support.

When you configure an ADLS catalog in Starburst Galaxy, you have two metastore options: the Starburst Galaxy Metastore and a Hive Metastore (HMS). If you plan to use your own Hive Metastore, you must also configure Private Link access to it. The steps for doing so are included in this tutorial for those who need them.

Learning objectives

Once you've completed this tutorial, you will be able to:

Prerequisites

About Starburst tutorials

Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.

Background

If you are configuring Private Link for the first time, you are encouraged to work with a Starburst technical resource. This individual will work with you to set up the environment needed to complete the tutorial.

Contacting your technical resource

To be assigned a technical resource, reach out to your regular Starburst account team.

Working together

Once assigned, your Starburst technical resource will work with you to set up an environment where you can complete the tutorial.

Please review the following overview of this process before beginning the tutorial.

Your responsibilities:

If you are using an HMS on a VM:

Background

Understanding the Azure Private Link architecture is important when completing the steps in this tutorial. In this section you will learn about this architecture and the way that Starburst Galaxy uses it to securely connect private clouds.

This tutorial follows a corresponding Azure quickstart on the same topic. Consult that documentation if you want to learn more about Azure Private Link.

Reference architecture

The following diagram illustrates a Private Link connection to an ADLS account.

Review the diagram to ensure that you understand the architecture that you will create in this tutorial.

Background

It's time to get started. In this section, you'll begin by confirming that your Azure storage account is configured for ADLS Gen2. After that, you'll obtain your Azure storage account ID.

You'll need to provide this information to the Starburst support team so that they can create the private endpoints.

Step 1: Sign in to Azure portal

You're going to start by signing in to the Azure portal. Remember to sign in to the account containing the Azure storage account that you would like to connect using Private Link. If you use multiple Azure accounts, ensure that you pick the correct one.

Step 2: Select storage account

Now it's time to find the right storage account. Depending on your workflow, you might have multiple storage accounts in the same Azure account. Once again, make sure you select the correct one.

Step 3: Confirm hierarchical namespace property

Azure Data Lake Storage Gen2 is a set of capabilities that you use with the Blob Storage service of your Azure Storage account. Notably, ADLS Gen2 includes hierarchical namespace functionality, which lets you organize data into directories and subdirectories. This makes data easier to organize and manage than in the flat namespace of standard Blob Storage.

This step will show you how to confirm that a hierarchical namespace has been enabled on your ADLS account.
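If you would rather double-check the setting outside the portal, the same property appears as `isHnsEnabled` in the JSON printed by `az storage account show`. The following is a minimal sketch that assumes you have saved that output as text; the helper name and the sample values are hypothetical.

```python
import json


def hierarchical_namespace_enabled(account_json: str) -> bool:
    """Return True when the storage account JSON reports isHnsEnabled as true."""
    account = json.loads(account_json)
    # A missing or null key is treated as "not enabled".
    return account.get("isHnsEnabled") is True


# Hypothetical, trimmed-down sample of `az storage account show` output.
sample = '{"name": "examplestorage", "kind": "StorageV2", "isHnsEnabled": true}'
print(hierarchical_namespace_enabled(sample))
```

A result of `False` suggests the account is a plain Blob Storage account rather than ADLS Gen2.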

Step 4: Determine public network access setting

ADLS accounts provide three options for public network access: Enabled from all networks, Enabled from selected virtual networks and IP addresses, and Disabled.

You need to determine the public network access configuration for the storage account you are about to configure for Private Link.
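As a rough illustration, the three portal options can be derived from two storage account properties: the public network access flag and the default action of the network rules. The mapping below is an assumption about how the portal labels relate to those properties, and the function name is hypothetical, so treat it as a sketch rather than a definitive rule.

```python
def access_option(public_network_access: str, default_action: str) -> str:
    """Translate two account properties into a portal-style label.

    public_network_access: "Enabled" or "Disabled"
    default_action: default action of the network rules, "Allow" or "Deny"
    """
    if public_network_access == "Disabled":
        return "Disabled"
    if default_action == "Deny":
        # Public access is on, but only allow-listed networks get through.
        return "Enabled from selected virtual networks and IP addresses"
    return "Enabled from all networks"


print(access_option("Enabled", "Deny"))
```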

Step 5: Record storage account resource ID

Next, it's time to record your storage account resource ID. Starburst support will need this ID to create two private endpoints in the Starburst Galaxy VNet.
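A storage account resource ID follows the pattern `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<account-name>`. Before pasting the ID into a support ticket, you may want to sanity-check what you copied. The sketch below is one way to do that; the function name and sample values are hypothetical, and the pattern is deliberately strict about letter case.

```python
import re

# Strict pattern for a storage account resource ID (illustrative, not exhaustive).
RESOURCE_ID_PATTERN = re.compile(
    r"^/subscriptions/(?P<subscription>[0-9a-fA-F-]{36})"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Storage/storageAccounts/(?P<account>[a-z0-9]{3,24})$"
)


def parse_storage_resource_id(resource_id: str) -> dict:
    """Split a storage account resource ID into its parts, or raise ValueError."""
    match = RESOURCE_ID_PATTERN.match(resource_id.strip())
    if match is None:
        raise ValueError(f"Not a storage account resource ID: {resource_id!r}")
    return match.groupdict()


# Hypothetical example values.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg"
    "/providers/Microsoft.Storage/storageAccounts/examplestorage"
)
print(parse_storage_resource_id(example_id)["account"])
```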

Step 6: Open support ticket

You are going to use the automated assistant in Starburst Galaxy to open a support ticket and give support the storage account resource ID that you just copied. You will also need to provide your preferred Starburst Galaxy Private Link configuration name.

Background

When working with private endpoints in Azure Data Lake Storage Gen2, it is considered best practice to create two private endpoints: one for the Data Lake Storage (dfs) resource and one for the Blob Storage (blob) resource.

This is because operations that target the Data Lake Storage Gen2 resource may be redirected to Blob Storage and vice versa. Creating two private endpoints ensures that all operations will complete successfully.
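To make the two endpoints concrete, the sketch below lists the hostnames that each one fronts for a given storage account. It assumes the Azure public cloud suffix `core.windows.net`; the function name and the account name are hypothetical.

```python
def adls_endpoint_hosts(account_name: str) -> dict:
    """Return the hostnames served by the dfs and blob private endpoints.

    The keys match the target sub-resource of each private endpoint.
    Assumes the Azure public cloud DNS suffix.
    """
    return {
        "dfs": f"{account_name}.dfs.core.windows.net",
        "blob": f"{account_name}.blob.core.windows.net",
    }


for sub_resource, host in adls_endpoint_hosts("examplestorage").items():
    print(f"{sub_resource}: {host}")
```

A request sent to the `dfs` hostname may be redirected to the `blob` hostname, which is why both need a private path.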

Starburst support will use the storage account resource ID that you provided to create these private endpoints. You will then need to manually accept the endpoint connections.

Recall that when you accept your first private endpoint connection to an ADLS storage account, existing access will be blocked. Please keep this in mind if you are about to accept the first endpoint connection.

Step 1: Access private endpoint connections settings

You're going to begin by opening your private endpoint connection settings. These are found in the Networking section of the Azure portal.

Step 2: Accept connections

Once Starburst support has created the private endpoints, you will see the connections listed as Pending.
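If you want to confirm which connections are still waiting for approval without clicking through the portal, the JSON from `az storage account show` includes a `privateEndpointConnections` list with a connection state for each entry. The sketch below filters that list; the helper name and sample data are hypothetical, and it assumes you have saved the command output as text.

```python
import json


def pending_connections(account_json: str) -> list:
    """Return the names of private endpoint connections whose status is Pending."""
    account = json.loads(account_json)
    pending = []
    for conn in account.get("privateEndpointConnections") or []:
        state = conn.get("privateLinkServiceConnectionState") or {}
        if state.get("status") == "Pending":
            pending.append(conn.get("name", "<unnamed>"))
    return pending


# Hypothetical, trimmed-down sample with one pending and one approved connection.
sample = json.dumps({
    "name": "examplestorage",
    "privateEndpointConnections": [
        {"name": "conn-dfs", "privateLinkServiceConnectionState": {"status": "Pending"}},
        {"name": "conn-blob", "privateLinkServiceConnectionState": {"status": "Approved"}},
    ],
})
print(pending_connections(sample))
```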

Background

You should only complete this section if both of the following are true:

If you meet both of the above criteria, you need to make sure that your HMS is set up to use Private Link.

In this section, you'll begin by determining if your HMS is already set up for Private Link. If it isn't, you can follow the steps provided to set it up. This will require you to configure both a public and internal load balancer for your HMS.

Step 1: Confirm HMS Private Link status

You can confirm the Private Link status of your HMS with a few quick clicks in the Azure portal.

Step 2: Add public load balancer

If this is your first time setting up a public load balancer for this VM, be aware that once it is configured, all outbound traffic is directed through the load balancer. By default, a public load balancer does not include an outbound rule. This means that from the moment the load balancer is added until the outbound rule is established, your HMS cannot access the public internet.

This matters because Azure relies on a public endpoint to authenticate the Service Principal Name (SPN) credentials or Access Key for your ADLS account. Any action initiated from your HMS during this interim period, such as a CREATE SCHEMA command, will fail.

The good news is that creating the outbound rule typically takes only a few minutes after the load balancer setup. Once this rule is in place, your HMS will no longer encounter issues with CREATE SCHEMA commands.

One final caution: Azure VMs cache credential verifications for a period of time by default. If you test the CREATE SCHEMA command after completing all of these steps, either run it against a completely new ADLS storage account or restart your HMS first to ensure the credential cache is cleared.

Step 3: Configure public load balancer

It's time to add configuration details for the load balancer.

Step 4: Add outbound rule to public load balancer

This step ensures that outbound traffic is allowed. It's OK if your environment requires a slight deviation to adhere to security protocols. The important thing is that you create an outbound rule that allows traffic on port 443 from the VM running your HMS.
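Once the outbound rule is in place, a quick way to check it from the HMS VM is to attempt a TCP connection on port 443 to a public endpoint. The sketch below does that with the Python standard library. The function name is hypothetical, and `login.microsoftonline.com` (the Microsoft Entra ID sign-in endpoint) is used only as an illustrative target; substitute whichever endpoint your security policy permits.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Illustrative target only; run this from the VM that hosts your HMS.
    print(can_connect("login.microsoftonline.com"))
```

A `False` result while the load balancer is attached suggests the outbound rule is missing or does not cover this VM.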

Step 5: Add internal load balancer for Private Link

Now it's time to add the internal load balancer that is required for Private Link. This follows a very similar process to adding a public load balancer.

Step 6: Configure internal load balancer

Step 7: Create Private Link service

Now that you have your load balancers set up, you're ready to create the Private Link service for your HMS.

Step 8: Configure Private Link outbound settings

Step 9: Complete Private Link service configuration

Step 10: Record Private Link service Alias
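A Private Link service alias generally has the form `<prefix>.<guid>.<region>.azure.privatelinkservice`. Before sending the alias to Starburst support, you may want to check that what you copied looks complete. The sketch below is a loose validation under that assumed format; the function name and sample alias are hypothetical.

```python
import re

# Assumed alias shape: prefix.GUID.region.azure.privatelinkservice
ALIAS_PATTERN = re.compile(
    r"^(?P<prefix>[^.]+)\."
    r"(?P<guid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\."
    r"(?P<region>[a-z0-9]+)\.azure\.privatelinkservice$"
)


def looks_like_alias(alias: str) -> bool:
    """Return True when the text matches the assumed alias format."""
    return ALIAS_PATTERN.match(alias.strip()) is not None


# Hypothetical alias for illustration.
sample_alias = "hms-pls.3fa85f64-5717-4562-b3fc-2c963f66afa6.eastus2.azure.privatelinkservice"
print(looks_like_alias(sample_alias))
```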

Step 11: Accept connection for Private Link service

Once Starburst support has created the private endpoint to your Private Link service, you will see the connection listed as Pending under Private endpoint connections.

The screenshots below show an example of two private endpoint connections. Yours will only have one.

Tutorial complete

Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.

You're all set! Now you can use Private Link to configure access to data in your Azure Data Lake Storage account.

Continuous learning

At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.

Next steps

Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.

Tutorials available

Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!
