In this hands-on lab, we walk you through building data lake analytics on Amazon Simple Storage Service (Amazon S3) with Starburst Galaxy, using Covid-19 data as our sample set.
After creating external tables from the Covid-19 public data lake, we run analytics such as grouping, filtering, and aggregating the data to answer the proposed business questions. We also highlight the Great Lakes Connectivity capabilities in Starburst Galaxy, which enable connectivity to many of the data lakehouse file and table formats available today, including Hive, Delta Lake, and the quickly growing Apache Iceberg table format.
Once the necessary analytics are complete, we utilize role-based access control to set the proper permissions for our consumer tables.
Note: To participate in the hands-on part of this lab, you’ll need access to an AWS account. If you don’t have one, follow the AWS instructions to create an account before you begin.