As more and more companies turn to low-cost object storage to hold the majority of their data, providing easy access to that data has become vitally important. Transforming and loading data into other systems that provide specific features is still necessary for certain requirements, but querying an object store directly is gaining popularity.
On the AWS platform, there are many options for landing, processing, and querying data. The diagram below illustrates our view of where Presto adds value. This open source SQL query engine not only lets you query data as soon as it lands on S3, but also provides connectors to other systems, which avoids moving data around just to query it.
The software products listed below are not an exhaustive list, but they are representative of what our AWS customers are using today. Depending on your business and data requirements, there are numerous solutions available, and most of them have become elastic, which makes them easier on the wallet.
We are seeing more and more AWS customers place all of their data on S3 and access it according to different business requirements. They like Presto because it lets them query data directly where it lives, whether that is S3 or another data source Presto can connect to. This is illustrated in the diagram below. Many of our customers keep "dimensional" data in a relational database (MySQL) and join it with raw data in S3. This saves them time and money because they do not have to move data into other products such as Redshift.
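To make the MySQL-plus-S3 pattern a little more concrete, here is a minimal sketch in Python that only builds the SQL text for such a join. It assumes the S3 data is exposed through a Presto catalog named `hive` (the Hive connector) and the MySQL database through a catalog named `mysql`; every schema, table, and column name is hypothetical. What it illustrates is that Presto addresses tables as `catalog.schema.table`, so a single statement can join tables that live behind different connectors without copying data first.

```python
# Hypothetical names throughout: the "hive" and "mysql" catalogs, the schemas,
# tables, and columns are illustrative assumptions, not a real deployment.


def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(fact_table: str, dim_table: str, join_key: str) -> str:
    """Build one query that joins raw fact data with a dimension table.

    Because each table name carries its own catalog, Presto can read the two
    sides through different connectors inside the same statement.
    """
    return (
        "SELECT d.region, count(*) AS events\n"
        f"FROM {fact_table} AS f\n"
        f"JOIN {dim_table} AS d ON f.{join_key} = d.{join_key}\n"
        "GROUP BY d.region"
    )


if __name__ == "__main__":
    sql = federated_join_sql(
        # Assumed: raw event files on S3, exposed via a Hive connector catalog.
        fact_table=qualified("hive", "web", "raw_events"),
        # Assumed: dimension table stored in MySQL, exposed via a MySQL catalog.
        dim_table=qualified("mysql", "crm", "customers"),
        join_key="customer_id",
    )
    print(sql)
```

The resulting string could then be submitted through any Presto client, such as the Presto CLI or JDBC driver; the sketch deliberately stops short of connecting to a cluster.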
In future blog posts, we'll share customer use cases showing how they have built data lakes on S3 object storage and deployed Presto to query and analyze that data, along with the challenges they encountered.