Last Updated: 2024-03-04

Background

Google Sheets is a web-based spreadsheet application developed by Google as part of the Google Suite. Like other spreadsheet applications, it stores data in rows and columns.

Starburst Galaxy is able to connect to Google Sheets data using a Google Sheets catalog. Once connected, this data can be queried just like any other datasource.

Scope of tutorial

In this tutorial, you will learn how to configure a Google Sheets catalog in Starburst Galaxy. To facilitate testing, we've provided sample data in a CSV file.

The first part of this tutorial will walk you through the process of uploading the CSV file to a new Google Sheet. You will then create a catalog in Starburst Galaxy and connect it to your Google Sheet. Finally, you will run a few queries against that catalog to learn more about the nuances of querying a spreadsheet.

Learning objectives

Once you've completed this tutorial, you will be able to:

Prerequisites

About Starburst tutorials

Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.

As you navigate through the tutorial you should follow along using your own Starburst Galaxy account. This will help consolidate the learning process by mixing theory and practice.

Background

It's time to get started, and you're going to begin by getting familiar with the data. The sample data for this tutorial is stored as a CSV file in Google Drive.

You're going to open this CSV file using Google sheets.

Step 1: Open sample CSV file

You're going to begin by signing in to your Google account and accessing the sample CSV file using Google Drive.

Step 2: Record Google service account email

A Google service account JSON key is required to configure secure access between Starburst Galaxy and your Google Sheet. As part of this access configuration, you need to share your Google Sheet with your Google service account email address. This step will show you how to find that email address.

Step 3: Share Google Sheet with Google service account

Now that you have your service account email, it's time to share your new Google Sheet with your service account.

Background

Now it's time to pivot over to Starburst Galaxy to begin the process of connecting your Google Sheets spreadsheet.

Starburst Galaxy separates users by role. Configuring a new catalog will require access to a role with appropriate privileges. Today, you'll be using the accountadmin role.

This is a quick step, but an important one.

Step 1: Sign into Starburst Galaxy

Sign into Starburst Galaxy in the usual way. If you have not already set up an account, you can do that here.

Step 2: Set your role

Your current role is listed in the top right-hand corner of the screen.

Background

Now it's time to create a new Google Sheets catalog in Starburst Galaxy. This catalog will allow Starburst Galaxy to connect to the data held in your Google Sheet.

The steps below will show you how to start the process of configuring a new catalog.

Step 1: Create new catalog

Step 2: Select Google Sheets

Starburst Galaxy allows the creation of catalogs for a number of different data sources. In this case, you are going to create a new catalog in the Google Sheets category.

Step 3: Input name and description

The new catalog needs both a name and description. This ensures that you can find it later.

Background

When you connect Starburst Galaxy to a new datasource, it is necessary to undergo an authentication process. This helps ensure that you are connecting the right data source and that you have the appropriate permissions.

Step 1: Enter connection details

Now it's time to provide connection details. In this case, you'll need the JSON key from your Google service account to proceed.

Step 2: Test and Connect

You're almost there! Time to test the connection and then complete the process of creating your new Google Sheets catalog.

Step 3: Save access controls

Notice that the read-only option is already selected for you, and you are unable to change it. This is because Google Sheets is a read-only catalog.

Background

The catalog is now created, but you are not able to use it until you add it to a cluster. Clusters provide the resources to run queries against catalogs.

Step 1: Add catalog to cluster

You can add your catalog to an existing cluster or create a new cluster.

Background

Now it's time to test out your new catalog by writing queries to access data from Google Sheets.

When you query a Google Sheet catalog in Starburst Galaxy, the basic format of the query looks like this:

SELECT *
FROM
TABLE (catalog_name.system.sheet (id => ‘googleSheetId'));

The two pieces of information you must supply are the catalog_name and googleSheetId. Let's begin by locating your googleSheetId for the sample CSV file.

Step 1: Record Google Sheet ID

A Google Sheet ID is a unique string identifier associated with an individual Google Sheets file. Each Google Sheet has one unique ID, which can be used as a way of referencing that Google Sheet to query the data inside it.

The easiest way to find the Google Sheet ID is by inspecting the URL of the Google Sheet in question. The ID is a part of every Google Sheets URL. This string can then be copied directly from your browser.

Step 2: Query Google Sheet

You're all set to query your Google Sheets catalog. Let's begin with a simple SELECT * to return all columns.

SELECT *
FROM
TABLE (catalog_name.system.sheet (id => ‘googleSheetId'));

Step 3: Query range from Google Sheet

So far, so good. But in production environments, you're more likely to query a range of cells, rather than the entire dataset.

To do this, you will need to add the tab name to your SQL statement.

Let's write a query that will return cells A1 through B4 from the Google Sheet.

SELECT *
FROM
TABLE(catalog_name.system.sheet(
          id => 'googleSheetId',
          range => 'TabName!A1:B4'));

Step 4: Query second range

When specifying a range of cells in a query, it is important to be careful. The Google Sheets connector assumes that the first row in the range is the header row. This can sometimes have unintended consequences.

This is especially important when you are joining a Google Sheet to a table in another data source.

Let's take a look at an example for reference.

SELECT *
FROM
TABLE(catalog_name.system.sheet(
          id => 'googleSheetId',
          range => 'TabName!A2:B5'));

Step 5: Run federated query using Google Sheets and another data source

One of the most powerful things about Starburst Galaxy is its ability to easily join data in Google Sheets with other data sources.

Let's check out this functionality by joining data from the customer Google Sheet with data from the tpch catalog built into Starburst Galaxy.

SELECT t.name, t.phone, o.orderkey, o.orderstatus
FROM
TABLE(catalog_name.system.sheet(
          id => 'googleSheetId')) t
JOIN tpch.tiny.orders o ON t.custkey = t.custkey;

Step 6: Find top ten orders by price

Now it's time to run one more federated query, but this time you'll add some conditional logic to restrict the results to the top ten most expensive orders in descending order.

SELECT t.name, t.phone, o.orderkey, o.orderstatus, o.totalprice
FROM
TABLE(catalog_name.system.sheet(
          id => 'googleSheetId')) t
JOIN tpch.tiny.orders o ON t.custkey = t.custkey
WHERE o.orderstatus = 'O' 
AND o.totalprice > 100000
ORDER BY o.totalprice DESC
LIMIT 10;

Tutorial complete

Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.

You're all set! Now you can query the data in your Google Sheets spreadsheet.

Continuous learning

At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.

Next steps

Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.

Tutorials available

Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!

Start Free with
Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise

Please fill in all required fields and ensure you are using a valid email address.