Last Updated: 2024-04-05

Background

In this tutorial, you will learn how to add conditional logic to your SQL queries using two different expressions, IF and CASE. You will work through a series of scenario-based activities to facilitate your learning.

The IF expression

The IF expression allows you to conditionally return a value based on a specified condition. It is simple and concise, and is suitable for basic conditional checks with a single condition.

The CASE expression

The CASE expression evaluates multiple conditions in order and returns a result based on the first condition that is true. It also supports an ELSE clause to specify a default result if none of the conditions are met.

The CASE expression is more versatile and allows for multiple conditions and different result values for each condition. It can also be used in various parts of a query, such as in the SELECT clause, WHERE clause, or ORDER BY clause.

Choosing between IF and CASE

In queries where you have a single boolean condition to evaluate, with two possible results, the IF expression can make your query shorter and easier to read. Otherwise, it's better to use the CASE statement.

Scope of tutorial

This tutorial uses Starburst Galaxy to teach you how to use conditional logic in your SQL queries. It is suitable for those that already have a basic understanding of SQL.

Learning objectives

Once you've completed this tutorial, you will be able to:

  • Use the IF expression to add conditional logic to a query.
  • Use the CASE expression to add conditional logic to a query.
  • Choose between IF and CASE based on the query requirements.

Prerequisites

You need a Starburst Galaxy account to complete this tutorial. Please see Starburst Galaxy: Getting started for instructions on setting up a free account.

About Starburst tutorials

Starburst tutorials are designed to get you up and running quickly by providing bite-sized, hands-on educational resources. Each tutorial explores a single feature or topic through a series of guided, step-by-step instructions.

As you navigate through the tutorial you should follow along using your own Starburst Galaxy account. This will help consolidate the learning process by mixing theory and practice.

Background

You'll begin your exploration by learning how to use the IF expression. You can use the IF expression to construct a SQL query that behaves differently based on a given boolean condition.

The syntax for the IF expression is as follows:

IF(condition, true_value, false_value)

If the condition evaluates to true, the true_value argument will be returned. If it evaluates to false, false_value will be returned.

Activity scenario

Imagine you are a data analyst at a firm that sells parts to other businesses.

The company's customers are classified into one of five segments, based on the type of work they do: building, automotive, machinery, household, or furniture.

You have been asked to return a report that classifies the company's customers into one of two segment types, based on their segment. Customers in the building, automotive, and machinery segments will be assigned the "heavy" segment type indicating their connection to "heavy industry". Customers in the household or furniture segments should be assigned the "light" segment type.

This is the perfect opportunity to use the IF expression in a query.

Background

It's time to begin writing queries that use the IF expression. We're going to use the query editor in Starburst Galaxy to complete this activity.

Starburst Galaxy comes with some sample catalogs to get you started. We'll be using one of these, the tpch catalog, for this exercise.

Step 1: Sign into Starburst Galaxy

The tutorial requires you to have a Starburst Galaxy account set up.

  • Sign in to the account using the email associated with the account and password.

Step 2: Set cluster, catalog, and schema

When you write a query, Starburst Galaxy needs to know which datasets you're accessing, and which schemas and tables you need.

Usually this means using the fully-qualified table name, which is written in a specific format: ...

You can think of this as the full name of the table. But just like a full name for a person, the fully-qualified table name is long. To avoid having to write this out each time, you can easily set the cluster, catalog, and schema for your session ahead of time and operate using shorter queries.

  • Using the left-hand navigation menu, select Query>>Query editor.
  • In the query editor, select the appropriate Cluster, Catalog, and Schema using the drop down menus.
  • Set Cluster to free-cluster.
  • Set Catalog to tpch.
  • Set Schema to sf1.

Step 3: Run example query

To see the IF expression in action, let's run a sample query. This one will return the custkey and acctbal from the customer table. It will also return a third column called goodstanding that will have a value of 1 if the acctbal is less than 1000 and a value of 0 otherwise. We're going to limit the results to 5.

  • Copy the following query and paste it into your query editor:
SELECT custkey, acctbal, 
IF(acctbal < 1000, 1, 0) AS goodstanding 
FROM customer 
LIMIT 5;
  • Run the query by clicking the Run (limit 1000) button.

Step 4: Run query for segment type

Now that you have practiced with a simple IF expression, you will test the query that assigns a segment type based on the customer's market segment.

For testing purposes, you will limit the results to 10 rows, rather than returning all customers.

  • Copy the following query and paste it into your query editor:
SELECT custkey, mktsegment,
    IF 
        (mktsegment IN 
        ('BUILDING','AUTOMOBILE','MACHINERY'),
        'heavy','light') 
    AS seg_type
FROM customer
LIMIT 10;
  • Run the query by clicking the Run (limit 1000) button.

Background

Now that you've learned how to use the simple IF expression, it's time to move on to the more versatile CASE expression.

Simple CASE expressions

In the simple form of the CASE expression, a single input value is evaluated for equivalence against a series of possible matching values. When a matching value is found, Starburst returns the given result.

If none of the values equal the input value, the ELSE result will be returned. If there is no ELSE clause and no matching value, NULL is returned.

All the possible return values must be of the same data type.

If there is more than one possible match, Starburst Galaxy will stop at the first match it finds, reading from left to right (or top to bottom).

The syntax for the simple CASE expression is as follows:

CASE input_value
    WHEN match_value THEN result
    [ WHEN ... ]
    [ ELSE result ]
END

Example:

SELECT a,
    CASE a
        WHEN 1 THEN 'one'
        WHEN 2 THEN 'two'
        ELSE 'many'
    END AS b

In the example above, there will be two columns in the result set, a and b. The first column is the value of a. The second column uses a CASE expression to return one of three different results, depending on the value of a.

If a is equal to 1, the CASE expression will return the string one. If a is equal to 2, it will return two. If a is equal to any other value, the ELSE result will be returned – in this example, the string many.

Searched CASE expressions

The searched form of the CASE expression is more powerful and flexible than the simple form. With the simple form, you can only compare one value against the others, and you can only evaluate for equivalence. This is equivalent to asking"is this value equal to that other value?".

With a searched

CASE expression, you can evaluate a series of boolean expressions – in other words, any expression that results in TRUE or FALSE.

As with the simple form, searched CASE expressions are evaluated from left to right (or top to bottom). Evaluation stops and a result is returned for the first expression that evaluates to TRUE.

If no conditions are true, the ELSE result is returned (or NULL if there is no ELSE clause).

The syntax for the searched CASE expression is as follows:

CASE
    WHEN condition THEN result
    [ WHEN ... ]
    [ ELSE result ]
END

Example:

SELECT a, b,
    CASE
        WHEN a > b THEN 'more'
        WHEN a < b THEN 'less'
        ELSE 'same'
    END AS c

In the example above, the result will have three columns. The first column is the value of a, and the second column is the value of b.

The third column uses a searched CASE expression to return a string. If the value of a is greater than b, it returns more. If a is less than b, it returns less. The ELSE clause will only be invoked when a is equal to b and it returns the string same.

Activity scenario

Imagine that you are a data analyst at a retail firm.

You have been asked to return a report that classifies the company's customers into one of three segments based on the value of their customer ID key, which indicates how early in the company's life they became your customer.

The three segments are "innovators", "early adopters", and "majority." Customers with IDs equal to or below 10,000 will be classed as innovators, customers with IDs between 10,000 and 50,000 will be early adopters, and those with IDs above 50,000 will be in the majority segment.

You can use the CASE expression to achieve this goal.

Background

You're going to use the Starburst Galaxy query editor again for this activity. You'll continue using the built-in tcph catalog.

Before jumping into writing the query for segment classification, you're going to run some queries to experiment with the simple and searched forms of the CASE expression.

Step 1: Execute query with simple CASE expression

The following SQL queries the region table from the tpch catalog. It maps region names to their commonly-used abbreviations using the simple form of CASE.

  • Copy the following query and paste it into your query editor:
SELECT  regionkey,
        name,
        CASE name
            WHEN 'AFRICA' THEN 'EMEA'
            WHEN 'AMERICA' THEN 'AMER'
            WHEN 'ASIA' THEN 'APAC'
            WHEN 'EUROPE' THEN 'EMEA'
            WHEN 'MIDDLE EAST' THEN 'EMEA'
            ELSE 'UNKNOWN'
        END AS abbreviation
FROM region;
  • Run the query by clicking the Run (limit 1000) button.

Step 2: Execute query with searched CASE expression

Now you will try out the searched form of CASE.

You may have noticed in the previous query that some of the regions mapped to the same abbreviation. You can simplify that query by using a searched CASE expression instead.

  • Copy the following query and paste it into your query editor:
SELECT regionkey,
        name,
        CASE
            WHEN name IN ('AFRICA', 'EUROPE', 'MIDDLE EAST') THEN 'EMEA'
            WHEN name = 'AMERICA' THEN 'AMER'
            WHEN name = 'ASIA' THEN 'APAC'
            ELSE 'UNKNOWN'
        END AS abbreviation
    FROM region;
  • Run the query by clicking the Run (limit 1000) button.
  • Notice that this query returns the same results as the previous query.

Step 3: Run customer segment query

It's time to write the query that segments customers based on their time with the company.

Question: Given what you have learned so far about using CASE, which form do you think is more appropriate for this query – simple or searched?

If you answered "searched" you are correct! We need to use inequality comparisons in our query to segment customers, so the searched CASE expression is the most appropriate.

  • Copy the following query and paste it into your query editor. Note that we are using a WHERE clause to limit the number of records returned:
SELECT  custkey,
        name,
        CASE
            WHEN custkey <= 10000 THEN 'innovator'
            WHEN custkey > 10000 AND custkey <= 50000 THEN 'early adopter'
            WHEN custkey > 50000 THEN 'majority'
        END AS adopt_segment
FROM customer
WHERE custkey IN (1000,10000,15000,50000,100000)
ORDER BY custkey ASC;
  • Run the query by clicking the Run (limit 1000) button.

Tutorial complete

Congratulations! You have reached the end of this tutorial, and the end of this stage of your journey.

You have learned how to write queries that use powerful conditional logic. As a reminder, the IF expression is a good choice when you need to include a calculated column in your query results that will have one of two possible values, based on a single boolean condition.

For more complex conditional logic, the CASE expression is the recommended option.

Continuous learning

At Starburst, we believe in continuous learning. This tutorial provides the foundation for further training available on this platform, and you can return to it as many times as you like. Future tutorials will make use of the concepts used here.

Next steps

Starburst has lots of other tutorials to help you get up and running quickly. Each one breaks down an individual problem and guides you to a solution using a step-by-step approach to learning.

Tutorials available

Visit the Tutorials section to view the full list of tutorials and keep moving forward on your journey!

Start Free with
Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise

Please fill in all required fields and ensure you are using a valid email address.