Welcome to the Advanced SQL Features in Presto series. In this series you are going to cover a set of SQL features that expands beyond basic SQL commands. The features you are going to start with are foundational and you will use them throughout the rest of the series. These are generally useful in a whole bunch of situations, especially when finessing the output of your queries. This list of examples is in no way comprehensive so feel free to check the documentation for more examples.
Advanced SQL Features in Presto: Part 1
Format function
You often have numbers, dates, or strings that you want formatted into one string in a specific way in your output. Perhaps you are creating a report and you want to show a number in a certain way. The format function can let you do that. The format function returns a formatted string using the specified java.util.Formatter format specifiers and arguments.
The specifiers follow formats depending on the value types passed to them. To understand what each component does and the list of possible values, refer to this documentation. Notice, that the only required field (without brackets) is only the conversion specifier. All the others specifiers are optional.
- Numbers:
%[argument_index$][flags][width][.precision]conversion
- Dates:
%[argument_index$][flags][width]conversion
The first example formats the pi function. This function just returns the value of pi, formatted using the format conversion specifier %.5f. The dot means to put a decimal point in here and the five says to format it with five decimal places and the f says to format it as a floating point number, and you can see that it outputs pi with five decimal places.
presto> SELECT format(‘pi = %.5f’, pi());
_col0 |
Next, format can generate leading decimal digits using %03d. the zero is actually a format specifier that indicates to format with leading decimal digits or leading zeros . The three says to format with a minimum of three digits which outputs 007.
presto> SELECT format(‘agent %03d’, 7);
_col0 |
To format big numbers like monetary values with commas, you can use %, operator. Notice, if you check the format string documentation you see that the comma is a flag that has the result include locale-specific grouping separators. In this case, the grouping separators will default to the US locale and will place a comma every 3 numbers.
presto> SELECT format(‘$%,.2f’, 1234567.89);
_col0 |
You can use %-7, to print trailing spaces. And %7 (without the -) indicates to print leading spaces. This prints the string hello world with two trailing and leading spaces.
presto> SELECT format(‘%-7s,%7s’, ‘hello’, ‘world’);
_col0 |
To specify the location and frequency of the arguments passed to the format function. So unlike the hello world example, where both arguments were printed in the same order that they are in the arguments, you can format things in a different order and multiple times in the same string.
presto> SELECT format(‘%2$s %2$s %3$s %1$s’, ‘a’, ‘b’, ‘c’);
_col0 |
Finally, you can also format dates where you have only passed the date into the function one time, but then you format it multiple times by referencing the date argument multiple times here. Then you can print the date with the day of the week, and then the month of the year as a name.
presto> SELECT format(‘%1$tA, %1$tB %1$te, %1$tY’, date ‘2006-07-04’);
_col0 |
CASE expressions
Case expressions let you do conditional logic and return different values depending on what the input is. There’s two different forms of CASE expressions:
- Simple CASE,
- A more generic search CASE
Simple CASE expressions
In the first CASE expression you pass in a value and then when the value is equal to some value, you return a specific output. When column n is one, then it returns the string ‘one’, when n is two, it returns the string ‘two’, otherwise it returns the string ‘many’.
presto> SELECT n, CASE n WHEN 1 THEN ‘one’ WHEN 2 THEN ‘two’ ELSE ‘many’ END AS name FROM (VALUES 1, 2, 3, 4) AS t (n);n | name —+—— 1 | one 2 | two 3 | many 4 | many (4 rows) |
Searched CASE expressions
The other form of CASE expressions is more generic. Instead of passing in a value on the CASE keyword, WHEN conditions are used with arbitrary SQL expressions. When column n equals one then it returns the string ‘aaa’, when n is in a list the values, it returns the string ‘bbb’, otherwise it returns the string ‘ccc’. You can have any number of WHEN expressions here but it’s important to note that the WHEN expressions are evaluated in the order that they appear in the CASE expression. So if you have overlapping conditions that evaluate true, for example when n is 1 in the following example, the first condition in the list is the only one that will be evaluated. Notice, when n is 1 it returns ‘aaa’ not ‘bbb’.
presto> SELECT n, CASE WHEN n = 1 THEN ‘aaa’ WHEN n IN (1, 2, 3) THEN ‘bbb’ ELSE ‘ccc’ END AS name FROM (VALUES 1, 2, 3, 4) AS t (n);n | name —+—— 1 | aaa 2 | bbb 3 | bbb 4 | ccc (4 rows) |
IF expressions
Similar to CASE, there is an IF expression. The IF expression takes a boolean expression to evaluate, a return value when it’s true, and a return value when it’s false. In this example, you use the IF expression to pluralize some return values. So you want to make it singular when there is one item, otherwise you want to make it plural. You can do some really powerful formatting in your final output with a combination of IF and CASE and the format function.
presto> SELECT format(‘I have %s cat%s.’, n, IF(n = 1, ”, ‘s’)) AS text FROM (VALUES 0, 1, 2, 3) AS t (n); text —————- I have 0 cats. I have 1 cat. I have 2 cats. I have 3 cats. (4 rows) |
TRY Expressions
If you know that your data is a little bit dirty and you want to keep processing when you encounter bad input where most of the data is clean, the TRY expression can come in handy. Presto will fail the query whenever you hit bad data because it is designed to always return correct answers. The Presto philosophy holds returning the correct answer is of the utmost importance for a database because people have to trust your results.
In the following example you divide by zero, a bad cast from a string to an integer, and an integer overflow which all cause a query to fail. If you wrap these types of statements in a TRY, you get NULL as a response but the query will keep going. You can use CASE or COALESCE to turn those NULLs into some other default value if you want.
It is important to be careful with the TRY expression because whatever it wraps, it will suppress any errors for the entire statement. For example, if an exception occurs within a deeply nested expression and you wrap the entire expression in TRY, that exception will be caught and possibly cause unexpected missing data from your results. You should be mindful when using these TRY expressions.
presto> SELECT TRY(8 / 0) div_zero, TRY(CAST(‘abc’ AS integer)) not_integer, TRY(2000000000 + 2000000000) overflow;div_zero | not_integer | overflow ———-+————-+———- NULL | NULL | NULL (1 row) |
Lambda Functions
Presto also supports lambda functions, which behave similarly to those found in more modern programming languages. Lambda expression allows you to write a function inline and then pass it into another function, which then gets applied over your data. There are two forms of lambda functions; one input and multiple inputs. The lambda expression has the inputs on the left side, an arrow, and then the output expression on the right that’s gonna be evaluated using the inputs.
If you have a lambda function with multiple inputs, you have to put a parentheses around the input arguments. If you have a single input lambda function, parentheses are optional. Below you will use a transform function that takes an array and the single input lambda function that increases each array element by 8. To demonstrate the multiple input lambda expression, you will use a reduce function that applies the lambda over all the array elements to produce a final output.
presto> SELECT transform(ARRAY [5, 6], x -> x + 8); _col0 ———- [13, 14] (1 row)presto> SELECT reduce(ARRAY [5, 20, 50], 0, (s, x) -> s + x, s -> s); _col0 ——- 75 (1 row) |
This is first post of the Advanced SQL Features in Presto blog series. You just learned about a few of the many functions and SQL syntax that will be used throughout this series and is useful in general. You covered the format function, CASE and IF expressions, TRY expressions, and Lambda functions. This post should give you a taste for what is possible and you’ll definitely wanna read the documentation to learn about all the different ways that you can use the SQL functions. These are really powerful tools and really handy when you need to have formatted values in a report. In the next post, you will cover handling JSON using SQL in Presto.
What are some next steps you can take?
Below are three ways you can continue your journey to accelerate data access at your company
- 1
- 2
Automate the Icehouse: Our fully-managed open lakehouse platform
- 3
Follow us on YouTube, LinkedIn, and X(Twitter).