As more organizations look to the cloud to meet their operational and analytics needs, the center of gravity for their data is shifting rapidly away from on-premises legacy data warehouses.
Data in the cloud is growing rapidly, and technology continues to evolve to cope with these changes. Crucially, though, organizations still face some of the same challenges as before. These challenges are easier to overcome with the highly elastic, nearly infinitely scalable infrastructure resources that the public cloud provides.
Trino, the underlying open-source technology that powers Starburst, has been part of this story from the beginning by meeting two key needs:
- Providing a seamless analytics bridge between on-premises systems and the cloud. This bridge reduces the dependency on building ETL pipelines, since Starburst connects your cloud data to your on-premises data.
- Providing highly performant analytics on the cloud data lake itself.
Through Amazon Web Services (AWS), Starburst has been providing this functionality to its customers from the start. Starburst reduces vendor lock-in and simplifies access to data, no matter where it resides. It also reduces the time spent waiting on ETL/ELT and instead promotes a self-service environment for data and business analysts.
If you need the data, go get it; it is right there waiting for you!
Another key technology that has changed the infrastructure landscape dramatically is Kubernetes. Since its inception over six years ago, it has built a vast community and a platform for new technology innovations that support modern developers. Though Kubernetes has helped accelerate application delivery and improved the agility of developers, it is not without its challenges.
Organizations are often restricted in their ability to leverage Kubernetes by a lack of in-house knowledge and the limited talent available to their development and infrastructure teams. However, the industry is continually changing, and solutions like Amazon Elastic Kubernetes Service (Amazon EKS) and Starburst are emerging to help new users easily adopt, manage, and operate Kubernetes.
Amazon EKS helps customers bridge some of these technological gaps around Kubernetes. Users can quickly create a scalable Kubernetes cluster on AWS and integrate it natively with other AWS services. Together, Amazon EKS and Starburst give users access to a powerful, easy-to-use, reliable, and integrated Kubernetes management platform.
Starburst and Amazon EKS: Wait…you mean Kubernetes can be easy?
Let’s explore how Starburst and Amazon EKS together make it easier for users looking to adopt a cloud-native strategy for their infrastructure. Consider the following scenario where you are a data analyst in the midst of data migration:
Your data is in the process of being migrated from a legacy Hadoop or Data Warehouse system on-premises to Amazon Web Services. You are being asked to query and provide insights around that data along with the new data that is starting to fill up your Amazon Redshift cluster. Oh, and don’t forget the other data that is starting to gather in Amazon S3 that you haven’t yet figured out what to do with. What do you do?
No problem! Data Engineers can continue to work on building pipelines and optimizing the data storage needs. For you, it is business as usual using Starburst to gather the combined data across all these data sources (legacy data warehouse, Amazon Redshift and Amazon S3), all while still using your familiar BI tool of choice. So, you continue to delight the business with amazing insights and dashboards during a major data migration to the cloud!
Sit back and relax while the engineers complete the migration. As part of the migration the data engineers redefine the semantic views to point to the new data sources so you can work uninterrupted. Awesome.
This story highlights the foundation of Starburst as a single point of data access for your users. This access complies with current enterprise security standards for data authorization and access controls. In effect, Starburst makes it easier for organizations to offer self-service data access to their users while retaining security controls, without having to make several disparate technologies work together or being forced to consolidate data into silos for individual groups of users.
In AWS, the Starburst technology stack runs on Amazon Elastic Kubernetes Service (Amazon EKS). Starburst leverages the elasticity of Amazon EC2 Auto Scaling groups, provisioned via managed node groups on Amazon EKS, to provide an optimized infrastructure platform for the application to run on. Starburst’s Hive connector integrates with the AWS Glue catalog, providing structured access to data in Amazon S3. The enterprise Amazon Redshift connector allows the end user to federate queries between the cloud data warehouse and the cloud data lake in Amazon S3. This fills a gap by letting the user query data across these two key cloud data sources seamlessly. That matters because, within the data landscape, a “single source of truth” is becoming less relevant and the concept of a “query fabric” is becoming more important.
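As a rough illustration of what federating a query between the warehouse and the data lake can look like, the sketch below builds a single SQL statement that joins a table in one catalog to a table in another, using Trino's `catalog.schema.table` naming. The catalog names (`redshift`, `hive`), the schemas, tables, and columns, and the helper function itself are all hypothetical and would depend on how a given cluster is configured.

```python
def federated_orders_query(warehouse_catalog="redshift", lake_catalog="hive"):
    """Build one SQL statement joining a warehouse table to a data lake table.

    All catalog, schema, table, and column names here are illustrative
    assumptions, not names taken from a real deployment.
    """
    return (
        "SELECT c.region, sum(o.amount) AS total_amount\n"
        f"FROM {warehouse_catalog}.public.customers AS c\n"
        f"JOIN {lake_catalog}.sales.orders AS o\n"
        "  ON o.customer_id = c.customer_id\n"
        "GROUP BY c.region"
    )


# The analyst submits this as a single query; the engine reads from both sources.
query = federated_orders_query()
print(query)
```

The point of the sketch is that both sources appear in one statement, so no pipeline has to copy the data into a common store before it can be joined.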
This query fabric can then be extended to include Amazon Kinesis Data Streams and Amazon RDS instances, as well as a number of other relational or NoSQL databases. The end result is a powerful analytics engine that provides access to several disparate data systems through a simple SQL-based interface, accessible via a JDBC or ODBC driver.
These drivers can be used to integrate with a wide range of BI or SQL query applications. For example, a data analyst who is comfortable using Tableau does not have to change how they currently build reports or dashboards; they just use the Starburst driver to point to their cluster and continue working as before. This greatly simplifies onboarding, with near-zero disruption.
Furthermore, as illustrated in the scenario, once the users have been onboarded, simple SQL constructs like semantic views can further abstract away the underlying data sources. Even when a data source has shifted, as in the move from on-premises HDFS to Amazon S3, the analyst doesn’t notice.
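The view-repointing idea can be sketched in a few lines. The example below uses Python's built-in `sqlite3` module purely as a stand-in so that it runs anywhere; it is not Starburst, and the table and view names are hypothetical. On a Trino-based engine the repointing step would more likely be a single `CREATE OR REPLACE VIEW` statement, which SQLite does not support.

```python
import sqlite3

# In-memory database standing in for the query layer (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legacy_orders (region TEXT, amount REAL);  -- stands in for on-premises data
    CREATE TABLE cloud_orders  (region TEXT, amount REAL);  -- stands in for migrated data
    INSERT INTO legacy_orders VALUES ('east', 10.0), ('west', 5.0);
    INSERT INTO cloud_orders  VALUES ('east', 10.0), ('west', 5.0);
    CREATE VIEW orders AS SELECT region, amount FROM legacy_orders;
""")

# The analyst only ever queries the view, never the underlying tables.
ANALYST_QUERY = (
    "SELECT region, sum(amount) FROM orders GROUP BY region ORDER BY region"
)
before = conn.execute(ANALYST_QUERY).fetchall()

# The engineers repoint the view at the migrated data.
conn.executescript("""
    DROP VIEW orders;
    CREATE VIEW orders AS SELECT region, amount FROM cloud_orders;
""")
after = conn.execute(ANALYST_QUERY).fetchall()

print(before == after)
```

The analyst's query text never changes; only the view definition does, which is what lets the migration proceed without disrupting existing reports.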
In summary, the Starburst and Amazon EKS solution allows enterprises to overcome common technical challenges associated with Kubernetes, including learning and resourcing gaps, onboarding, and operational difficulties. Together, the solution creates a holistic environment for developers and infrastructure engineers to create a sustainable and innovative infrastructure strategy. In this post, we covered how Starburst Enterprise supports Amazon EKS users in their operational management of Kubernetes through full lifecycle management, including importation, provisioning, security and configuration of clusters.
Want to see Starburst in action? Try it for free here!