Tag: Data Scientist

Showing 44 results

How fast access to data and quality ML code can enable competitive differentiation and innovation

February 16, 2023

2022 ended with many successful AI models being deployed, including OpenAI’s ChatGPT. There’s no doubt that there will be plenty more successes in 2023....

6 Reasons to Attend Datanova 2023: For Data Rebels

January 5, 2023

Over the past few weeks, we’ve shared a few examples of what it means to be a data rebel. Hopefully you’ve recognized yourself in...

Over 80 Data & Analytics Statistics, Data, Trends, and Facts

December 28, 2022

Most organizations have data and continue to generate and collect it on a daily basis, but have a far more difficult time in getting...

What Are The Different Types Of Data Products

December 16, 2022

As we’ve gone from Data Mesh theory to practice, organizations have been shifting their focus towards the central tenet of Data Mesh — building...

Apache Iceberg Time Travel & Rollbacks in Trino

December 7, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino Iceberg Partitioning and Performance Optimizations...

Apache Iceberg Schema Evolution in Trino

November 22, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino Iceberg Partitioning and Performance Optimizations...

Apache Iceberg DML (update/delete/merge) & Maintenance in Trino

November 17, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino Iceberg Partitioning and Performance Optimizations...

Explore A New Way Of Utilizing A Data Lakehouse

November 10, 2022

A data lakehouse combines the principles of a data lake and a data warehouse to include the best of both worlds. Data lakehouses are...

Iceberg Partitioning and Performance Optimizations in Trino

November 8, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino Iceberg Partitioning and Performance Optimizations...

Countdown to Trino Summit 2022

November 1, 2022

It’s finally here! We are closing in on the final countdown to Trino Summit 2022, and I can feel myself getting more excited with...

Accenture Master Class: How to Adopt a Data Product Mindset

October 17, 2022

Since Datanova: The Data Mesh Summit and our in-person executive discussions on data products and Data Mesh, we’ve been validating the data product approach...

Building Reporting Structures on S3 using Starburst Galaxy and Apache Iceberg

October 4, 2022

AWS S3 has become one of the most widely used storage platforms in the world. Companies store a variety of data on S3 from...

The Data Virtualization Evolution is Just Beginning

October 4, 2022

Data virtualization revolutionized the data infrastructure space by serving data consumers directly on top of data stores, without the need to move data elsewhere....

Accenture Master Class: Creating Data Products

September 27, 2022

Since Datanova: The Data Mesh Summit and our in-person executive discussions on data products and Data Mesh, we’ve been validating the data product approach...

Accenture Master Class: Why Organizations Should Create Data Products

September 6, 2022

Since Datanova: The Data Mesh Summit and our in-person executive discussions on data products and Data Mesh, we’ve been validating the data product approach...

AWS Dev Day Recap: Data Lake Analytics with Starburst Galaxy

August 5, 2022

On Wednesday, August 3rd, I had the opportunity to share a hands-on lab exploring Data Lake reporting structures with my AWS partner in crime,...

Starburst Lakehouse: Data Warehouse Functionality, Without The Cost

July 14, 2022

Next-Gen data management and analytics strategies We’ve all lived it. Heard it. Adapted to it. The next analytics strategy with numerous ‘modern’ technologies to...

A Better Solution For Managing and Maintaining Data Pipelines, Now In Public Preview

July 6, 2022

Customers who want a single, super fast and easy-to-use solution for both interactive and longer-running data pipeline queries now have a solution: take advantage...

Transforming Your Data Pipelines with Starburst

June 9, 2022

Current State of ETL/ELT Extract-transform-load, more commonly known by its street name “ETL”, has been around since the early days of computing. Bringing together...

The Past, Present, and Future of Trino

May 24, 2022

Recently, I had the pleasure of chatting with Ravit Jain on his show “The Ravit Show” to discuss the evolution of Trino and where...

ETL vs Interactive Queries: The Case for Both

May 5, 2022

This is Part 1 of a 2-part blog about how Trino can support both interactive and batch use cases. In Part 1, we will...

Enter the Starburst Space Quest League for a Chance to Win Big!

April 18, 2022

Calling all data pros! Are you ready for a $20k payday? Yes, you heard it right – you could be walking away with $20,000...

Faster Query Processing: CPU Time

March 25, 2022

A key engineering responsibility at Starburst is performance enhancement. One such enhancement is to reduce the amount of time that a CPU has to work...

What A SQL Query Engine Can Do For Big Data

February 16, 2022

Nod with me if you’ve suffered from the following problems with processing and analyzing Big Data via a centralized approach: different query languages, niche...

The Right Way to Query Across Data Sources in Tableau (or, The Cross-Database Join Is Not Always Your Friend)

January 13, 2022

Summary: Use the right tool for the right job. Not doing so means the difference between your Tableau viz rendering in seconds vs. minutes...

Achieving Lightning-Fast Analytics on the Salesforce Customer 360

January 6, 2022

Over the past twenty or so years, companies have experienced a Cambrian explosion of where their customer data resides. Cloud and on-premises enterprise applications aim...

5 Reasons to Sign Up for Starburst Galaxy

December 22, 2021

The original vision of Starburst was to make querying distributed data as simple, fast, and painless as possible. Starburst Galaxy, our serverless, fully-managed SaaS...

Intro to Trino for the Trinewbie

November 2, 2021

If you haven’t heard of Trino before, it is a query engine that speaks the language of many genres of databases. As such, Trino...

Data Mesh: Data as a Product

October 21, 2021

Data Mesh is based on four central concepts, the second of which is data as a product. In this blog, we’ll explore what that...

Data Mesh: Domain-oriented Ownership & Architecture

October 14, 2021

Insane in the domain! Insane in the brain! Crazy insane, got no domain! - Cypress Hill, sort of Data Mesh is based on four...

The Intelligent Edge

September 13, 2021

Today’s digital world is an expanding frontier of emerging technologies. There are endless innovations, inspired by data, informed by data, enabled by data, and...

How Assurance Unlocked More Business Value with Starburst

September 9, 2021

By leveraging Starburst, Assurance was able to improve conversion rates, reduce costs, and enable robust modeling. Read the full case study here. ...

What Data Mesh Means for Data Analysts and Data Scientists

September 7, 2021

Get early access to free early release chapters (including the newly released chapter!) of the O’Reilly book, Data Mesh: Delivering Data-Driven Value at Scale,...

Accelerating Data Science with Trino

August 31, 2021

At our Datanova for Data Scientists conference on July 14, I held a discussion with Dain Sundstrom and David Phillips, CTOs of Starburst, about...

Why Performance Matters: Parquet, Delta Lake, Dynamic Filtering

August 26, 2021

My fascination with SQL query performance started quite some time ago and I contributed a paper on efficient processing of data warehousing during my...

Data Mesh: The Answer to the Data Warehouse Hypocrisy

March 25, 2021

Note: I start this piece with some technical background that has nothing to do with the data mesh, and is only relevant to data...

A Gentle Introduction to the Hive Connector

February 12, 2021

One of the most confusing aspects when starting with the Hive connector comes from the complex Hive model and overlapping use cases of this...

Top 10 Reasons to Migrate from OS Presto on EMR to Starburst Enterprise Presto

November 13, 2020

In today’s data architecture economy, there are no shortages of options when it comes to choosing various distributions and deployment strategies for a given...

Presto & Data Science: Getting Data Into the Hands of Data Scientists (Faster)

June 26, 2020

A few days ago I read a Gartner report stating that data scientists spend 23% of their time on data collection and preparation. I...

Free Presto Book to Support the Community

April 14, 2020

As you probably know, Starburst is one of the main contributors and sponsors of the Presto open source project and the community around Presto....

How a Telecommunications Giant Established Universal Data Access

April 3, 2020

Our customer base has been growing quickly, and we’re excited to share a case study highlighting one of our largest clients, a telecommunications...

2019 in Review: Fueled up and ready to go

February 4, 2020

What happened in 2019? 2019 was a big year for Starburst. Today we shared some of our major accomplishments: ...

Starburst Enterprise & Databricks Delta Lake Support

June 13, 2019

TL;DR - There is now Starburst Enterprise Databricks Delta Lake compatibility. Delta Lake The big data ecosystem has many components but the one...

General SQL Features in Presto

January 1, 2019

Welcome to the Advanced SQL Features in Presto series. In this series you are going to cover a set of SQL features that expands...
