A data lakehouse is a data management paradigm that combines the benefits of a data lake and a data warehouse. Object storage, also known as a data lake, is used for its flexibility, low cost, and ability to hold large volumes of data. On top of the data lake, open table formats like Apache Iceberg are used in conjunction with open file formats such as Parquet to improve reliability and provide data warehouse-like functionality and performance. The combination of a data lake with recent developments in open table formats enables low-cost, high-performance analytics at scale.
In this video, Starburst Developer Advocate Monica Miller and Director of Customer Solutions Tom Nats run through the basics of a Starburst data lakehouse, when companies should adopt the lakehouse model, and how to get started creating your own data lakehouse, along with a short demo of Starburst Galaxy.
Apache Iceberg is an open source table format that brings database functionality to object storage such as S3, Azure's ADLS, Google Cloud Storage, and MinIO. This allows an organization to take advantage of low-cost, high-performing cloud storage while providing data warehouse features and a data warehouse experience to end users, without being locked into a single vendor.
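As a minimal sketch of what this looks like in practice, the statement below creates an Iceberg table through Trino. The catalog, schema, and table names are hypothetical, and it assumes a Trino Iceberg catalog backed by object storage is already configured:

```sql
-- Hypothetical names: "iceberg" catalog, "analytics" schema.
-- Data lands as Parquet files on object storage (e.g. S3),
-- while Iceberg metadata provides the warehouse-like table layer.
CREATE TABLE iceberg.analytics.events (
    event_id   BIGINT,
    user_id    BIGINT,
    event_time TIMESTAMP(6),
    payload    VARCHAR
)
WITH (
    format = 'PARQUET',
    partitioning = ARRAY['day(event_time)']
);
```

End users then query this table with ordinary SQL, with no awareness of the files underneath.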
Partitioning is used to narrow the scope of the data that needs to be read for a query. When dealing with big data, this can be crucial for performance: it can bring a query that takes minutes or even hours down to seconds.
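To illustrate partition pruning, here is a hedged example assuming a hypothetical Iceberg table partitioned by day of an `event_time` column. Because the filter aligns with the partitioning, Trino reads only one day's partition instead of scanning the whole table:

```sql
-- Assumes iceberg.analytics.events is partitioned by day(event_time).
-- Only the 2023-06-01 partition is read; all other days are skipped.
SELECT count(*)
FROM iceberg.analytics.events
WHERE event_time >= TIMESTAMP '2023-06-01 00:00:00'
  AND event_time <  TIMESTAMP '2023-06-02 00:00:00';
```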
One key feature of the Apache Iceberg connector is Trino's ability to modify data that resides on object storage. Objects in storage like AWS S3 are immutable, meaning they cannot be changed in place. This was a challenge in the Hadoop era whenever data needed to be modified or removed at the individual row level. Trino allows full DML (data manipulation language) through the Iceberg connector, which means full support for UPDATE, DELETE, and MERGE.
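The three DML statements can be sketched as follows; the table and column names are hypothetical, and the statements assume an Iceberg catalog in Trino:

```sql
-- Row-level update: Iceberg writes new data/delete files rather than
-- mutating the immutable objects on storage.
UPDATE iceberg.crm.customers
SET email = 'redacted'
WHERE opted_out = true;

-- Row-level delete, e.g. for data-removal requests.
DELETE FROM iceberg.crm.customers
WHERE deletion_requested = true;

-- Upsert from a staging table in one statement.
MERGE INTO iceberg.crm.customers AS t
USING iceberg.crm.customer_updates AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN
    UPDATE SET email = s.email
WHEN NOT MATCHED THEN
    INSERT (customer_id, email) VALUES (s.customer_id, s.email);
```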
Schema evolution simply means modifying tables as business rules and source systems change over time. Trino's Iceberg connector supports a range of modifications, including renaming the table itself as well as column and partition changes.
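A few representative schema-evolution statements, using hypothetical table and column names; in Iceberg these are metadata-only changes that do not rewrite existing data files:

```sql
-- Rename the table itself.
ALTER TABLE iceberg.analytics.events RENAME TO iceberg.analytics.app_events;

-- Add and rename columns.
ALTER TABLE iceberg.analytics.app_events ADD COLUMN session_id VARCHAR;
ALTER TABLE iceberg.analytics.app_events RENAME COLUMN payload TO body;

-- Evolve the partition spec; new writes use the new layout.
ALTER TABLE iceberg.analytics.app_events
SET PROPERTIES partitioning = ARRAY['month(event_time)'];
```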
Time travel in Trino using Iceberg is a handy feature for "looking back in time" at a table's history. As we covered in this blog, each change to an Iceberg table creates a new "snapshot" that can be referred to using standard SQL.
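A sketch of time travel in Trino, assuming a hypothetical Iceberg table; the snapshot ID shown is a placeholder you would take from the table's own history:

```sql
-- Inspect the table's history via the hidden $snapshots metadata table.
SELECT snapshot_id, committed_at
FROM iceberg.analytics."events$snapshots"
ORDER BY committed_at DESC;

-- Query the table as it existed at a given snapshot or point in time.
SELECT * FROM iceberg.analytics.events
FOR VERSION AS OF 8954597067493422955;  -- placeholder snapshot_id

SELECT * FROM iceberg.analytics.events
FOR TIMESTAMP AS OF TIMESTAMP '2023-06-01 00:00:00 UTC';
```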
AWS S3 has become one of the most widely used storage platforms in the world. Companies store a variety of data on S3, from application data to event-based and IoT data. Often, this data is used for analytics in the form of regular BI reporting as well as ad hoc reporting.
Back in the Gentle introduction to the Hive connector blog, I discussed a commonly misunderstood aspect of the architecture and uses of our Hive connector. In short, while the name may suggest that Trino calls out to a running Hive instance, the Hive connector does not use the Hive runtime to answer queries.
Welcome back to this blog post series discussing the awesome features of Apache Iceberg. The first post covered how Iceberg is a table format and not a file format, and demonstrated the benefits of hidden partitioning in Iceberg in contrast to the exposed partitioning in Hive.
Welcome back to this blog series discussing the amazing features of Apache Iceberg. The last two blog posts covered a number of significant improvements Iceberg makes over the Hive model; I recommend taking a look at those if you haven't yet.
Welcome back to the Trino on Ice blog series, which has so far covered some very interesting high-level concepts of the Iceberg model and how you can take advantage of them using the Trino query engine. This post dives into some of the implementation details of Iceberg by dissecting the files that result from various operations carried out using Trino.
This webinar, hosted by Dain Sundstrom, co-founder of Trino and Chief Technology Officer at Starburst; Tom Nats, Director of Customer Solutions at Starburst; and Mike Marolda, Manager of Product Marketing at Starburst, dives into Starburst Galaxy and the data lakehouse.
Join us for an introduction to Starburst Galaxy on the data lake. In this session, we cover using the platform for data lake analytics and more.
Proprietary and patented indexing technology sets a new benchmark in data lake analytics, empowering organizations to more quickly and efficiently derive greater insights from their data.
Learn more about the Starburst data lakehouse solution.
Learn the value of decentralized data access, and how to use Starburst Enterprise as a lakehouse engine and a single point of secure access to data in any location or region.
This solution brief discusses how the Starburst Data Lakehouse may be right for you.
Learn how to create and share the high-quality data products vital to a lakehouse.
Spend less time managing your data and more time analyzing it. Starburst Galaxy is a fully managed platform providing the easiest way to access the power of Starburst's best-in-class MPP SQL engine for the data lake.
Learn how Starburst and Trino achieve lightning-fast performance and data transformations on the data lake.
From native fine-grained access control to robust integrations, learn how Starburst Enterprise can easily secure your data lake.
Leverage the power of Trino and Delta Lake to deliver fast performance and advanced data transformations that operationalize your lakehouse.
Your lakehouse doesn't need all its data in one place. Learn how Starburst Galaxy can federate queries across multiple sources and locations.
Starburst and Trino really enabled us to accelerate time to insight, improve our conversion rates, and enable robust modeling.
— Mitchell Posluns, Senior Data Scientist
We can make decisions faster based on the analytics. Instead of waiting for days, this happens in near real-time.
— Sachin Gopalakrishna Menon, Senior Director of Data
Starburst was very data-lake friendly. It was as if it was built for that model. That was a key differentiator for us. We were very invested in the data lake.
— Ivan Black, Director
Our data lake backbone was on a traditional Hadoop infrastructure. While that approach had its day, it’s not flexible. We needed to scale out and separate our compute from our storage without moving the data.
— Mike Prior, Principal IO Engineer
The data warehouse and data lake space is constantly evolving, and our enterprise focus means we have to support customer requirements across different platforms. Starburst gives us the ability to move quickly to support ever-changing use cases within complex enterprise environments.
— David Schulman, Head of Partner Marketing
Starburst powers the serving layer of Zalando’s data lake on S3. We have more than a thousand internal users and 100s of applications using Starburst daily, running 30k+ queries that are processing half a petabyte per day.
— Onur Y., Engineering Lead