Cookie Notice
This site uses cookies for performance, analytics, personalization and advertising purposes.
For more information about how we use cookies please see our Cookie Policy.
Manage Consent Preferences
These cookies are essential in order to enable you to move around the website and use its features, such as accessing secure areas of the website.
These are analytics cookies that allow us to collect information about how visitors use a website, for instance which pages visitors go to most often, and if they get error messages from web pages. This helps us to improve the way the website works and allows us to test different ideas on the site.
These cookies allow our website to properly function and in particular will allow you to use its more personal features.
These cookies are used by third parties to build a profile of your interests and show you relevant adverts on other sites. You should check the relevant third party website for more information and how to opt out, as described below.
Luckily, SQL is an easy language to learn. However, a drawback to learning SQL is the difficulty in finding an environment in which to practice.
That’s why learning more about SQL using Starburst Galaxy is powerful. You can quickly and easily create a free Starburst Galaxy account and start learning right away. Let’s learn a bit more about SQL and then dive into some hands-on exercises.
Published: April 7, 2023 | Last Updated: August 4, 2023 | Author: Erin Rosas
SQL stands for structured query language.
SQL is a powerful language that plays a vital role in managing and analyzing data in relational databases, making it an essential skill for anyone involved in data analysis.
SQL is a query language used to access and manage data stored in database management systems (DBMS) and nearly all relational database management systems (RDBMS).
The syntax differs slightly across each DBMS but all are fundamentally based on a standard version of SQL, known as ANSI SQL. This acts as a template document describing the official features of the language.
The difficulty level of learning SQL for beginners can vary depending on the individual’s programming background and prior experience with databases. For those with a programming background, SQL may seem straightforward and relatively easy to grasp, while those without a programming background might find it challenging initially.
However, compared to many other programming languages, SQL is often considered more approachable for beginners. If you’re new to programming and want to start by working with databases and querying data, SQL might be more straightforward to learn.
Related reading: Starburst data analytics certification and training
As databases become more prevalent, SQL has emerged as the principal query language for efficiently accessing and handling data stored within them, especially in relational databases where data is organized in table structures with rows and columns.
Databases form the bedrock of numerous enterprise applications, making SQL proficiency essential for businesses and professionals alike. Mastering SQL enables and empowers data professionals to interact seamlessly with vast amounts of data, retrieve specific information, and manipulate data. This skill is especially critical at a time when data-driven organizations are striving to make informed decisions as a way to reduce costs and optimize productivity.
The ease of learning Python or SQL can vary depending on an individual’s background and learning preference
The differences:
SQL | Python | |
Use | SQL is primarily used for tasks related to database management, data manipulation (SELECT, INSERT, UPDATE, DELETE), data definition (CREATE, ALTER, DROP), and data control (GRANT, REVOKE). | Python is a general-purpose programming language used for a wide range of tasks, including web development, data analysis, scripting, automation, and more. |
Approach | SQL is primarily a declarative language for database operations. | Python is a multi-paradigm language that supports procedural, object-oriented, and functional programming styles. |
Application | SQL is specifically used for database-related tasks, such as managing data, querying databases, and defining database structures. | Python is used for a wide range of applications, from building web applications to data analysis, machine learning, and scientific computing. |
First, navigate to Galaxy and follow these step-by-step instructions to set up your account.
There is a hierarchical relationship between these constructs, described by the picture below.
When you log in to your Galaxy account, you should see the sample cluster, sample catalog, and demo schema available for querying under Cluster explorer. These are the cluster, catalog, and schema you will use for the SQL exercises.
When you write a query to gather information from one or more tables, you have to let the system know which table(s) contain the data you require. Usually this means using the fully-qualified table name.
The fully-qualified table name is written in the format <catalog_name>.<schema_name>.<table_name>. To avoid having to write this out each time, you can easily set the cluster, catalog, and schema for your session ahead of time.
You can set the cluster, catalog, and schema for your session from the top right corner of your screen. The schema list will become available once you choose your catalog.
Set the cluster and catalog to sample and set the schema to demo. There are two tables within the demo schema — astronauts and missions.
SQL exists for many purposes. Think of SQL as coming in two distinct flavors: Data Definition Language (DDL) and Data Manipulation Language (DML).
DDL operations create and define both structures and objects in a database and define the metadata within a DBMS.
Some examples of DDL operations include CREATE TABLE, CREATE INDEX, ALTER TABLE, and DROP VIEW.
DML operations, the focus of this lesson, modify data within a database. DML is used to manipulate data. Records , or rows, are created, modified, and accessed via DML.
Some examples of DML operations are INSERT INTO table VALUES (‘val1’, ‘val2’, ‘val3’), SELECT…FROM…JOIN…WHERE.
A SQL statement is a command or instruction written in SQL and used for managing and interacting with relational databases.
SQL statements allow users to perform various operations on the data stored in a database, such as querying data, inserting new records, updating existing records, deleting data, and managing the structure of the database itself.
Here are some common SQL commands:
Before we talk about the select statement in SQL, a word about syntax. SQL keywords and functions are not case-sensitive.
SELECT is the same as select. However, by convention, keywords and functions are written in all uppercase letters.
Now, let’s take a closer look at an example of a DML command.
The SELECT statement specifies the output of a SQL query. In its simplest form, it identifies the columns to select and the table from which to select. The data returned from the SELECT statement is called the result-set.
The general syntax for a basic SELECT statement is:
SELECT column1, column2,…
FROM table_name;
In the Starburst Galaxy query editor, view the data that is available for analysis in the astronauts table. To do so, type the following query into the query editor and press Run (limit 1000).
SELECT * FROM astronauts;
The * returns all columns from a table. To improve query performance, It is better practice to identify which columns you would like, rather than using the *.
Now, imagine that you wanted to read the data in specific columns. Selecting all of the data would be an inefficient way of doing that. Luckily, you can restrict the columns that are returned by your query. To do this, you will need to specify the columns by name that you would like to include. This is called projections. The example below shows projection in action. Notice that the column names are separated by commas:
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
astronauts;
This approach works well, but it returns a lot of results. Sometimes, this might present a problem. Luckily, you can also restrict the number of results displayed. To do this, use the LIMIT clause to specify the number of rows that you would like to return. In the example below, this is set to 10.
You can also use the AS keyword to create an alias for each column. Sometimes the existing column names are very long or aren’t meaningful enough. Using AS is a simple way to set the column name in the results to something different. This alias only exists for the duration of the query.
SELECT
name AS full_name,
nationality,
mission_title,
mission_number,
hours_mission AS mission_duration
FROM
astronauts
LIMIT 10;
Imagine that you wanted to restrict the results of your query so that only results that fit a given condition were returned. This would be an example of conditionality, and would apply logical conditions to your search results. If a record fits those conditions, it will be returned; if it does not fit those conditions, it will not be returned. To accomplish this, the WHERE clause is used. The syntax for the command is shown below:
SELECT
column1,
column2, ...
FROM
table_name
WHERE
condition;
It’s time to try this out. Use the WHERE clause in the code below to query the missions table and only return missions that are classified as a success. Note that strings must be enclosed in single quotes.
SELECT
company_name,
status_rocket,
cost,
status_mission
FROM
missions
WHERE
status_mission = 'Success';
Do you notice anything interesting? There are many astronauts from the U.S. and U.S.S.R. You can use the WHERE clause with the NOT LIKE operator to only return the astronauts from countries that are not the U.S. or U.S.S.R. You’ll also need to include a wildcard, which substitutes characters in a string. Two of the most common wildcards used with the LIKE (or NOT LIKE) operator are the percent sign (%) and underscore (_).
If you are trying to search for countries that are not the U.S. or U.S.S.R. which wildcard do you think is needed? Notice that both countries start with U.S., but one has extra characters, while the other does not. For this reason, the % wildcard is suitable.
Run the following query to see all of this in action:
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%';
There are several operators that can be used in the WHERE clause. Use this table as your guide.
Operator |
Description |
> | greater than |
< | less than |
= | equal |
>= | greater than or equal |
<= | less than or equal |
<> | not equal |
IN | specifies the set of possible values |
BETWEEN | specifies a range of possible values |
LIKE | specifies a pattern |
Q: How many astronauts were born during or before the year 1930?
A: The answer is 70. To get this answer, run the following query:
SELECT
name,
year_of_birth
FROM
astronauts
WHERE
year_of_birth <= 1930
Imagine that you wanted to control the order in which results were displayed. To do this, you can use the ORDER BY clause to sort a result set by one or more output expressions. These results are returned in ascending order by default. To sort the records in descending order, use the DESC keyword, as shown in the example below.
It’s time to test out this command. Use ORDER BY to view the results of the previous query in descending order of nationality and name.
SELECT
name,
nationality,
mission_title,
mission_number,
hours_mission
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%'
ORDER BY
nationality,
name DESC;
Q: Who are the two oldest astronauts in the database?
A: The answer is Georgi Beregovoi and John H. Glenn Jr. To get this answer, you can run the following query:
SELECT
name,
year_of_birth
FROM
astronauts
ORDER BY year_of_birth;
Aggregate functions operate on a set of values to compute a single result. Some examples are:
In the following example, you’ll use one query to find three different aggregate values: the number of trips, the longest mission in hours, and the smallest mission in hours from countries outside of the U.S. and U.S.S.R. Run the query to see the results:
SELECT
COUNT() as trips,
MAX(hours_mission) as longest_mission,
MIN(hours_mission) as shortest_mission
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%';
That’s great if you want to evaluate all the data together. Now, imagine that you want to increase the granularity level and evaluate the aggregates specifically for each country. You can accomplish this by using the GROUP BY clause, which groups rows that have the same values into summary rows. Try it out yourself:
SELECT
nationality,
COUNT() AS number_trips,
MAX(hours_mission) AS longest_time,
MIN(hours_mission) AS shortest_time
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%'
GROUP BY
nationality;
Now, change the query to restrict the order in which the results are returned. Add an ORDER BY clause and sort the results by most trips per country. In the case of a tie, sort additionally by longest mission.
SELECT
nationality,
COUNT() AS number_trips,
MAX(hours_mission) AS longest_time,
MIN(hours_mission) AS shortest_time
FROM
astronauts
WHERE
nationality NOT LIKE 'U.S.%'
GROUP BY
nationality
ORDER BY
number_trips DESC,
longest_time DESC;
Time to try something new. Revisit the previous missions query, and add an aggregate function to determine which company spent the most money on successful missions. Order this query by the total highest cost:
SELECT
company_name,
SUM(cost),
status_rocket
FROM
missions
WHERE
status_mission = 'Success'
GROUP BY
company_name,
status_rocket
ORDER BY
SUM(cost) DESC;
This lesson introduced you to the basics of SQL, including SELECT, WHERE, ORDER BY, GROUP BY, and various aggregate functions. There is still plenty to explore!
Join Starburst Academy to register for the complete class and learn about a very important topic in SQL – joining tables. It’s free!
Up to $500 in usage credits included
© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC
Up to $500 in usage credits included