Published: April 7, 2023
Structured Query Language (SQL) is a query language used to access and manage data stored in database management systems (DBMS) and nearly all relational database management systems (RDBMS). The syntax differs slightly across each DBMS but all are fundamentally based on a standard version of SQL, known as ANSI SQL. This acts as a template document describing the official features of the language.
According to IEEE Spectrum, SQL is the second programming language everyone needs to know. Luckily, SQL is an easy language to learn. However, a drawback to learning SQL is the difficulty in finding an environment in which to practice.
That’s why learning more about SQL using Starburst Galaxy is powerful. Special emphasis will be placed on understanding how SQL operates in real usage conditions.
First, navigate to Galaxy and follow these step-by-step instructions to set up your account.
Before you begin writing SQL in Starburst Galaxy, there are a few terms you should know — cluster, catalog, and schema.
There is a hierarchical relationship between these constructs, described by the image below.
As you can see:
When you log in to your Galaxy account, you should see the sample cluster, sample catalog, and demo schema available for querying under Cluster explorer. These are the cluster, catalog, and schema you will use for the SQL exercises.
When you write a query to gather information from one or more tables, you have to let the system know which table(s) contain the data you require. Usually this means using the fully-qualified table name.
The fully-qualified table name is written in the format <catalog_name>.<schema_name>.<table_name>. To avoid having to write this out each time, you can easily set the cluster, catalog, and schema for your session ahead of time.
You can set the cluster, catalog, and schema for your session from the top right corner of your screen. The schema list will become available once you choose your catalog.
Set the cluster and catalog to sample and set the schema to demo. There are two tables within the demo schema — astronauts and missions.
SQL exists for many purposes. Think of SQL as coming in two distinct flavors: Data Definition Language (DDL) and Data Manipulation Language (DML).
DDL operations create and define both structures and objects in a database and define the metadata within a DBMS.
Some examples of DDL operations include CREATE TABLE, CREATE INDEX, ALTER TABLE, and DROP VIEW.
DML operations, the focus of this lesson, modify data within a database. DML is used to manipulate data. Records , or rows, are created, modified, and accessed via DML.
Some examples of DML operations are INSERT INTO table VALUES (‘val1’, ‘val2’, ‘val3’), SELECT…FROM…JOIN…WHERE.
SELECT
statement?SQL keywords and functions are not case-sensitive.
SELECT is the same as select. However, by convention, key words and functions are written in all uppercase letters.
SELECT
statement in SQL?Now, let’s take a closer look at an example of a DML command.
The SELECT statement specifies the output of a SQL query. In its simplest form, it identifies the columns to select and the table from which to select. The data returned from the SELECT statement is called the result-set.
The general syntax for a basic SELECT statement is:
SELECT column1, column2,…
FROM table_name;
In the Starburst Galaxy query editor, view the data that is available for analysis in the astronauts table. To do so, type the following query into the query editor and press Run (limit 1000).
SELECT * FROM astronauts;
SELECT*
mean?The * returns all columns from a table. To improve query performance, It is better practice to identify which columns you would like, rather than using the *.
Now, imagine that you wanted to read the data in specific columns. Selecting all of the data would be an inefficient way of doing that. Luckily, you can restrict the columns that are returned by your query. To do this, you will need to specify the columns by name that you would like to include. This is called projections. The example below shows projection in action. Notice that the column names are separated by commas:
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
astronauts;
This approach works well, but it returns a lot of results. Sometimes, this might present a problem. Luckily, you can also restrict the number of results displayed. To do this, use the LIMIT clause to specify the number of rows that you would like to return. In the example below, this is set to 10.
You can also use the AS
keyword to create an alias for each column. Sometimes the existing column names are very long or aren’t meaningful enough. Using AS is a simple way to set the column name in the results to something different. This alias only exists for the duration of the query.
SELECT
name AS full_name,
nationality,
mission_title,
mission_number,
hours_mission AS mission_duration
FROM
astronauts
LIMIT 10;
WHERE
used in SQL?Imagine that you wanted to restrict the results of your query so that only results that fit a given condition were returned. This would be an example of conditionality, and would apply logical conditions to your search results. If a record fits those conditions, it will be returned; if it does not fit those conditions, it will not be returned. To accomplish this, the WHERE
clause is used. The syntax for the command is shown below:
SELECT
column1,
column2, ...
FROM
table_name
WHERE
condition;
It’s time to try this out. Use the WHERE clause in the code below to query the missions table and only return missions that are classified as a success. Note that strings must be enclosed in single quotes.
SELECT
company_name,
status_rocket,
cost,
status_mission
FROM
missions
WHERE
status_mission = 'Success';
Do you notice anything interesting? There are many astronauts from the U.S. and U.S.S.R. You can use the WHERE clause with the NOT LIKE operator to only return the astronauts from countries that are not the U.S. or U.S.S.R. You’ll also need to include a wildcard, which substitutes characters in a string. Two of the most common wildcards used with the LIKE (or NOT LIKE) operator are the percent sign (%) and underscore (_).
If you are trying to search for countries that are not the U.S. or U.S.S.R. which wildcard do you think is needed? Notice that both countries start with U.S., but one has extra characters, while the other does not. For this reason, the % wildcard is suitable.
Run the following query to see all of this in action:
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%';
WHERE
clause operatorsThere are several operators that can be used in the WHERE clause. Use this table as your guide.
Operator |
Description |
> | greater than |
< | less than |
= | equal |
>= | greater than or equal |
<= | less than or equal |
<> | not equal |
IN | specifies the set of possible values |
BETWEEN | specifies a range of possible values |
LIKE | specifies a pattern |
Q: How many astronauts were born during or before the year 1930?
A: The answer is 70. To get this answer, run the following query:
SELECT
name,
year_of_birth
FROM
astronauts
WHERE
year_of_birth <= 1930
Imagine that you wanted to control the order in which results were displayed. To do this, you can use the ORDER BY clause to sort a result set by one or more output expressions. These results are returned in ascending order by default. To sort the records in descending order, use the DESC keyword, as shown in the example below.
It’s time to test out this command. Use ORDER BY to view the results of the previous query in descending order of nationality and name.
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%'
ORDER BY
nationality,
name DESC;
Q: Who are the two oldest astronauts in the database?
A: The answer is Georgi Beregovoi and John H. Glenn Jr. To get this answer, you can run the following query:
SELECT
name,
year_of_birth
FROM
astronauts
ORDER BY year_of_birth;
aggregate
function in SQL with an example?Aggregate functions operate on a set of values to compute a single result. Some examples are:
In the following example, you’ll use one query to find three different aggregate values: the number of trips, the longest mission in hours, and the smallest mission in hours from countries outside of the U.S. and U.S.S.R. Run the query to see the results:
SELECT
COUNT() as trips,
MAX(hours_mission) as longest_mission,
MIN(hours_mission) as shortest_mission
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%';
That’s great if you want to evaluate all the data together. Now, imagine that you want to increase the granularity level and evaluate the aggregates specifically for each country. You can accomplish this by using the GROUP BY clause, which groups rows that have the same values into summary rows. Try it out yourself:
SELECT
nationality,
COUNT() AS number_trips,
MAX(hours_mission) AS longest_time,
MIN(hours_mission) AS shortest_time
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%'
GROUP BY
nationality;
Now, change the query to restrict the order in which the results are returned. Add an ORDER BY clause and sort the results by most trips per country. In the case of a tie, sort additionally by longest mission.
SELECT
nationality,
COUNT() AS number_trips,
MAX(hours_mission) AS longest_time,
MIN(hours_mission) AS shortest_time
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%'
GROUP BY
nationality
ORDER BY
number_trips DESC,
longest_time DESC;
Time to try something new. Revisit the previous missions query, and add an aggregate function to determine which company spent the most money on successful missions. Order this query by the total highest cost:
SELECT
company_name,
SUM(cost),
status_rocket
FROM
missions
WHERE
status_mission = 'Success'
GROUP BY
company_name,
status_rocket
ORDER BY
SUM(cost) DESC;
This lesson introduced you to the basics of SQL, including SELECT, WHERE, ORDER BY, GROUP BY, and various aggregate functions. There is still plenty to explore!
Join Starburst Academy to register for the complete class and learn about a very important topic in SQL – joining tables. It’s free!