A data lakehouse is a new data management paradigm that combines the benefits of a data lake and a data warehouse. Object storage, the foundation of a data lake, is used for its flexibility, low cost, and capacity for large volumes of data. On top of the data lake, open table formats such as Apache Iceberg are used together with open file formats such as Parquet to improve reliability and provide data-warehouse-like functionality and performance. Combining a data lake with recent developments in open table formats enables low-cost, high-performance analytics at scale.
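To make the table-format idea concrete, here is a minimal Python sketch, assuming a toy in-memory model rather than Iceberg's real API. Data files are written once to a write-once object store, and the only mutable state is a catalog pointer to the current table metadata, which is swapped with a compare-and-swap check on commit. `ObjectStore`, `Catalog`, `TableMetadata`, `append`, and `scan` are hypothetical names, and rows are plain Python lists standing in for Parquet files.

```python
from dataclasses import dataclass


class ObjectStore:
    """Hypothetical write-once store standing in for S3-style object storage."""

    def __init__(self) -> None:
        self._objects: dict[str, list] = {}

    def put(self, key: str, rows: list) -> None:
        if key in self._objects:
            raise FileExistsError(key)  # existing objects are never edited in place
        self._objects[key] = list(rows)

    def get(self, key: str) -> list:
        return self._objects[key]


@dataclass(frozen=True)
class TableMetadata:
    """Immutable description of the table: a version and its live data files."""
    version: int
    data_files: tuple[str, ...]


class Catalog:
    """Holds the one mutable piece of state: a pointer to the current metadata."""

    def __init__(self, initial: TableMetadata) -> None:
        self.current = initial

    def commit(self, expected: TableMetadata, new: TableMetadata) -> None:
        # Compare-and-swap: reject the commit if another writer got there first.
        if self.current is not expected:
            raise RuntimeError("concurrent commit detected; retry on the new metadata")
        self.current = new


def append(store: ObjectStore, catalog: Catalog, key: str, rows: list) -> None:
    base = catalog.current
    store.put(key, rows)  # 1. write the new data file; readers cannot see it yet
    new = TableMetadata(base.version + 1, base.data_files + (key,))
    catalog.commit(base, new)  # 2. publish it atomically by swapping the pointer


def scan(store: ObjectStore, catalog: Catalog) -> list:
    # Readers only see files listed in the current metadata.
    return [row for key in catalog.current.data_files for row in store.get(key)]
```

Because readers only follow the metadata pointer, a newly written file stays invisible until its commit succeeds, which is one way a table format adds reliability on top of plain object storage.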
In this video, Starburst Developer Advocate Monica Miller and Director of Customer Solutions Tom Nats run through the basics of a Starburst data lakehouse, when companies should adopt a data lakehouse model, and how to get started building your own data lakehouse, along with a short demo of Starburst Galaxy.
Apache Iceberg is an open source table format that brings database functionality to object storage such as S3, Azure’s ADLS, Google Cloud Storage, and MinIO. This allows an organization to take advantage of low-cost, high-performance cloud storage while giving its end users data warehouse features and experience, without being locked into a single vendor.
Partitioning narrows the scope of the data that needs to be read for a query. When dealing with big data, this can be crucial for performance: it can be the difference between a query that takes minutes or even hours and one that finishes in seconds!
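The effect can be sketched in a few lines of Python, assuming a hypothetical layout where each distinct value of the partition column gets its own file. A query that filters on the partition column skips every other file without opening it. `write_partitioned`, `query`, and the `country`/`amount` columns are illustrative only.

```python
from collections import defaultdict


def write_partitioned(rows, partition_column):
    """Group rows into one 'file' per partition value (hypothetical layout)."""
    files = defaultdict(list)
    for row in rows:
        files[row[partition_column]].append(row)
    return dict(files)


def query(files, partition_value, predicate):
    """Open only the file for the requested partition, then apply the row filter."""
    files_read = 0
    result = []
    for value, file_rows in files.items():
        if value != partition_value:
            continue  # pruned: this file is never read
        files_read += 1
        result.extend(row for row in file_rows if predicate(row))
    return result, files_read


rows = [
    {"country": "US", "amount": 10},
    {"country": "CA", "amount": 5},
    {"country": "US", "amount": 7},
    {"country": "DE", "amount": 3},
]
files = write_partitioned(rows, "country")
result, files_read = query(files, "US", lambda row: row["amount"] > 8)
print(result, f"(read {files_read} of {len(files)} files)")
# prints: [{'country': 'US', 'amount': 10}] (read 1 of 3 files)
```

The row-level filter still runs, but only over the data that survived pruning; with real tables the saving grows with the number of partitions skipped.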
One key feature of the Apache Iceberg connector is Trino’s ability to modify data that resides on object storage. Objects in storage like AWS S3 are immutable, which means they cannot be modified in place. This was a challenge in the Hadoop era whenever data needed to be modified or removed at the individual row level. Trino supports full DML (data manipulation language) through the Iceberg connector, which means full support for UPDATE, DELETE, and MERGE.
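Row-level changes on immutable storage work by writing new files rather than editing old ones. Iceberg supports two strategies for this: copy-on-write, which rewrites the affected data files, and merge-on-read, which records deletions in separate delete files. The Python sketch below models only the simpler copy-on-write strategy, with a hypothetical `Table` class and a dict standing in for object storage; it is not a description of how Trino executes any particular DELETE.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical table over write-once storage, using copy-on-write deletes."""
    store: dict = field(default_factory=dict)  # object key -> tuple of rows
    files: list = field(default_factory=list)  # data files in the current snapshot
    next_id: int = 0

    def _write(self, rows):
        key = f"data/file-{self.next_id}"
        self.next_id += 1
        if key in self.store:
            raise FileExistsError(key)  # objects are never overwritten
        self.store[key] = tuple(rows)
        return key

    def insert(self, rows):
        self.files.append(self._write(rows))

    def delete_where(self, predicate):
        new_files = []
        for key in self.files:
            rows = self.store[key]
            kept = [row for row in rows if not predicate(row)]
            if len(kept) == len(rows):
                new_files.append(key)  # no matches: reuse the file untouched
            elif kept:
                new_files.append(self._write(kept))  # rewrite survivors to a new file
            # a file whose rows were all deleted is simply dropped from the snapshot
        self.files = new_files  # old objects remain in storage, unchanged

    def rows(self):
        return [row for key in self.files for row in self.store[key]]


table = Table()
table.insert([1, 2, 3])
table.insert([4, 5])
table.delete_where(lambda row: row == 2)
print(table.rows(), table.files)  # prints: [1, 3, 4, 5] ['data/file-2', 'data/file-1']
```

Note that `data/file-0` still exists in the store after the delete; only the table's list of live files changed, which is also what makes older versions of the table recoverable.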
Schema evolution simply means modifying tables as business rules and source systems change over time. Trino’s Iceberg connector supports several kinds of table modifications, including renaming the table itself and changing its columns and partitioning.
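One reason column changes can be cheap: Iceberg tracks columns by stable field IDs rather than by name, so data files never need to know a column's current name. The Python sketch below is a simplified model of that idea, assuming a hypothetical `Schema` class and stored rows keyed by field ID. Renames and drops touch only metadata, and a re-added column gets a fresh ID so it does not resurrect old values.

```python
class Schema:
    """Hypothetical schema mapping stable field IDs to current column names."""

    def __init__(self, names):
        self.fields = {i + 1: name for i, name in enumerate(names)}
        self.last_id = len(names)  # IDs are never reused, even after a drop

    def _id_of(self, name):
        for field_id, current in self.fields.items():
            if current == name:
                return field_id
        raise KeyError(name)

    def rename_column(self, old, new):
        self.fields[self._id_of(old)] = new  # metadata-only: data files untouched

    def add_column(self, name):
        self.last_id += 1
        self.fields[self.last_id] = name

    def drop_column(self, name):
        del self.fields[self._id_of(name)]


def read_row(stored_row, schema):
    """Project a stored row (keyed by field ID) onto the current schema."""
    return {name: stored_row.get(field_id) for field_id, name in schema.fields.items()}


schema = Schema(["id", "name", "amount"])
old_row = {1: 7, 2: "alice", 3: 9.5}  # written before any schema change
schema.rename_column("name", "customer")
schema.drop_column("amount")
schema.add_column("amount")  # same name, new field ID 4
print(read_row(old_row, schema))  # prints: {'id': 7, 'customer': 'alice', 'amount': None}
```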
Time travel in Trino using Iceberg is a handy feature to “look back in time” at a table’s history. As we covered in this blog, each change to an Iceberg table creates a new “snapshot”, which can be referenced using standard SQL.
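In Trino, the query side is a `FOR VERSION AS OF` or `FOR TIMESTAMP AS OF` clause on a table reference. Resolving a timestamp amounts to picking the latest snapshot committed at or before that moment. The Python sketch below models only that lookup, assuming a hypothetical `Snapshot` record and a history list sorted by commit time.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record: an ID, a commit time, and its data files."""
    snapshot_id: int
    committed_at_ms: int  # commit time as epoch milliseconds
    data_files: tuple


def snapshot_as_of(history, timestamp_ms):
    """Return the latest snapshot committed at or before timestamp_ms.

    `history` must be sorted by committed_at_ms, oldest first.
    """
    commit_times = [s.committed_at_ms for s in history]
    index = bisect_right(commit_times, timestamp_ms)
    if index == 0:
        raise LookupError(f"no snapshot exists at or before {timestamp_ms}")
    return history[index - 1]


history = [
    Snapshot(101, 1_000, ("f1",)),
    Snapshot(102, 2_000, ("f1", "f2")),
    Snapshot(103, 3_000, ("f2",)),  # f1 was deleted in this commit
]
print(snapshot_as_of(history, 2_500).data_files)  # prints: ('f1', 'f2')
```

Reading snapshot 102 still sees `f1` even though the current snapshot no longer lists it, because old data files are only dropped from newer snapshots, not rewritten.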
AWS S3 has become one of the most widely used storage platforms in the world. Companies store a variety of data on S3, from application data to event-based and IoT data. Often, this data is used for analytics, both for regular BI reporting and for ad hoc reporting.
Back in the Gentle introduction to the Hive connector blog, I discussed a commonly misunderstood architecture and the uses of our Hive connector. In short, while some may think the name indicates that Trino calls a running Hive instance, the Hive connector does not use the Hive runtime to answer queries.
Welcome back to this blog series discussing the awesome features of Apache Iceberg. The first post covered how Iceberg is a table format and not a file format. It demonstrated the benefits of hidden partitioning in Iceberg in contrast to exposed partitioning in Hive. There really is no such thing as “exposed partitioning.”
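Hidden partitioning means partition values are derived from a column by a transform, so users filter on the column itself and never reference a separate partition column. The Python sketch below models Iceberg's `day` transform (days since 1970-01-01) and shows how a filter on a timestamp range can be translated into the set of day partitions to read. `partition_rows` and `days_to_scan` are hypothetical helpers, and timestamps are assumed to be timezone-aware.

```python
from collections import defaultdict
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts):
    """Days since 1970-01-01 in UTC, modeled on Iceberg's day() transform."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


def partition_rows(rows):
    """Route each row to a partition derived from its 'ts' value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[day_transform(row["ts"])].append(row)
    return dict(partitions)


def days_to_scan(partitions, start, end):
    """Partitions that may hold rows with start <= ts < end.

    Conservative: it can include a day with no matching rows,
    but never skips a day that has some.
    """
    first, last = day_transform(start), day_transform(end)
    return sorted(day for day in partitions if first <= day <= last)


utc = timezone.utc
rows = [
    {"ts": datetime(2024, 1, 1, 10, tzinfo=utc)},
    {"ts": datetime(2024, 1, 2, 23, tzinfo=utc)},
    {"ts": datetime(2024, 1, 5, tzinfo=utc)},
]
partitions = partition_rows(rows)
query_start = datetime(2024, 1, 1, 12, tzinfo=utc)
query_end = datetime(2024, 1, 3, tzinfo=utc)
print(days_to_scan(partitions, query_start, query_end))  # prints: [19723, 19724]
```

The user only ever writes a predicate on `ts`; the mapping from that predicate to partition values lives in the table's partition spec, which is what makes the partitioning "hidden".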
Welcome back to this blog series discussing the amazing features of Apache Iceberg. In the last two blog posts, we’ve covered many of Iceberg’s feature improvements over the Hive model. I recommend you take a look at those if you haven’t yet.
Welcome back to the Trino on Ice blog series, which has so far covered some very interesting high-level concepts of the Iceberg model and how you can take advantage of them using the Trino query engine. This blog post dives into some of Iceberg’s implementation details by dissecting the files that result from various operations carried out using Trino.
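At a high level, an Iceberg table's files form a tree: a table metadata file (JSON) points to snapshots, each snapshot points to a manifest list, the manifest list names manifest files, and manifests list the data files; in real tables the manifest lists and manifests are Avro files. The Python sketch below models that hierarchy with hypothetical, heavily simplified dataclasses and paths, and shows scan planning as a walk down the tree. It also illustrates why a new snapshot is cheap: it can reuse existing manifests and add only a new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    record_count: int


@dataclass(frozen=True)
class Manifest:
    path: str
    data_files: tuple  # DataFile entries tracked by this manifest


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifest_list: str  # path of the manifest list file
    manifests: tuple  # Manifest entries named by that manifest list


@dataclass(frozen=True)
class TableMetadata:
    current_snapshot_id: int
    snapshots: tuple


def plan_scan(metadata):
    """Walk metadata -> snapshot -> manifest list -> manifests -> data files."""
    snapshot = next(s for s in metadata.snapshots
                    if s.snapshot_id == metadata.current_snapshot_id)
    return [f.path for m in snapshot.manifests for f in m.data_files]


m1 = Manifest("metadata/m1.avro", (DataFile("data/a.parquet", 100),))
m2 = Manifest("metadata/m2.avro", (DataFile("data/b.parquet", 50),))
s1 = Snapshot(1, "metadata/snap-1.avro", (m1,))
s2 = Snapshot(2, "metadata/snap-2.avro", (m1, m2))  # reuses m1, adds only m2
metadata = TableMetadata(current_snapshot_id=2, snapshots=(s1, s2))
print(plan_scan(metadata))  # prints: ['data/a.parquet', 'data/b.parquet']
```

Real manifests carry much more (per-file status, partition values, column statistics used for pruning); this sketch keeps only the pointers needed to show the shape of the tree.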
Starburst and Trino really enabled us to accelerate time to insight, improve our conversion rates, and enable robust modeling.
— Mitchell Posluns, Senior Data Scientist
We can make decisions faster based on the analytics. Instead of waiting for days, this happens in near real-time.
— Sachin Gopalakrishna Menon, Senior Director of Data
Starburst was very data-lake friendly. It was as if it was built for that model. That was a key differentiator for us. We were very invested in the data lake.
— Ivan Black, Director
Our data lake backbone was on a traditional Hadoop infrastructure. While that approach had its day, it’s not flexible. We needed to scale out and separate our compute from our storage without moving the data.
— Mike Prior, Principal IO Engineer
The data warehouse and data lake space is constantly evolving, and our enterprise focus means we have to support customer requirements across different platforms. Starburst gives us the ability to move quickly to support ever-changing use cases within complex enterprise environments.
— David Schulman, Head of Partner Marketing
Starburst powers the serving layer of Zalando’s data lake on S3. We have more than a thousand internal users and 100s of applications using Starburst daily, running 30k+ queries that are processing half a petabyte per day.
— Onur Y., Engineering Lead
© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC