
Tag: Data Engineer

Showing 100 results

Starburst data lake certification and training

July 24, 2023

Data analytics certification program to learn about topics such as data lakes and data lakehouses, and modern table formats like Apache Iceberg.

Debating the future of data engineering

May 19, 2023

Data engineers have typically functioned as a central hub for engineering tasks. They work with multiple departments and business units across the enterprise. But as the decentralized, data-product-driven architecture of the data mesh approach becomes more popular, and more organizations find themselves on this decentralization journey, what happens to that centralized data team?

Fueling Trino large-scale geospatial analysis with Starburst Warp Speed

March 27, 2023

In our last post, we discussed two methods for running geospatial analysis with Trino and the Hive connector and explored a few optimization techniques...

Run optimized geospatial queries with Trino

March 23, 2023

The Trino open source distributed query engine is known as a choice for running ad-hoc analysis where there’s no need to model the data and...

Has the notion of a single data source for Financial Services run its course?

January 20, 2023

More than any other industry, Financial Services is likely to only partially realize the elusive utopian state of 'the single source of truth' for...

Building a federated data lakehouse with Starburst Galaxy

January 11, 2023

We are eleven days into the new year, and I have spent the past two weeks exerting unreasonable amounts of effort trying to make...

6 Reasons to Attend Datanova 2023: For Data Rebels

January 5, 2023

Over the past few weeks, we’ve shared a few examples of what it means to be a data rebel. Hopefully you’ve recognized yourself in...

Simplified Cloud Storage Governance with Starburst and Immuta

January 4, 2023

Accessing data in cloud storage has been an ongoing challenge for analysts, data engineers, and organizations as a whole. Additional work is required to...

Over 80 Data & Analytics Statistics, Data, Trends, and Facts

December 28, 2022

Most organizations have data and continue to generate and collect it on a daily basis, but have a far more difficult time in getting...

Tableau Cloud + Starburst: New Connector Supports Shift to Cloud-based SaaS

December 19, 2022

The shift to cloud-based software-as-a-service platforms is accelerating in just about every tech industry. So it wasn’t much of a surprise to the analytics...

What Are The Different Types Of Data Products

December 16, 2022

As we’ve gone from Data Mesh theory to practice, organizations have been shifting their focus towards the central tenet of Data Mesh — building...

Apache Iceberg Time Travel & Rollbacks in Trino

December 7, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino, Iceberg Partitioning and Performance Optimizations...

Building lakehouse with dbt and Trino

November 30, 2022

In this series, we demonstrate how to build data pipelines using dbt and Trino with data directly from your operational systems. They can use...

Apache Iceberg Schema Evolution in Trino

November 22, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino, Iceberg Partitioning and Performance Optimizations...

Reliving the Hype: Highlights from Trino Summit 2022

November 18, 2022

Last week in San Francisco was one for the Trino history books. After three years of planning, rescheduling, planning, and rescheduling some more, Starburst...

Apache Iceberg DML (update/delete/merge) & Maintenance in Trino

November 17, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino, Iceberg Partitioning and Performance Optimizations...

Explore A New Way Of Utilizing A Data Lakehouse

November 10, 2022

A data lakehouse combines the principles of a data lake and a data warehouse to include the best of both worlds. Data lakehouses are...

Iceberg Partitioning and Performance Optimizations in Trino

November 8, 2022

This post is part of the Iceberg blog series. Read the entire series: Introduction to Apache Iceberg in Trino, Iceberg Partitioning and Performance Optimizations...

Join the Team: Realizing the Promise Of Big Data

November 4, 2022

I have been in and around data since my days with Microsoft Access, Excel, and SQL Server circa 2000, and was fortunate to witness...

Countdown to Trino Summit 2022

November 1, 2022

It’s finally here! We are closing in on the final countdown to Trino Summit 2022, and I can feel myself getting more excited with...

Build a Data Lakehouse Reporting Structure with dbt and Starburst Galaxy

October 18, 2022

Since my first introduction to dbt, I was intrigued to say the least. Working as a data engineer, I was attempting to manage complicated...

Accenture Master Class: How to Adopt a Data Product Mindset

October 17, 2022

Since Datanova: The Data Mesh Summit and our in-person executive discussions on data products and Data Mesh, we’ve been validating the data product approach...

Data Sharing Culture With Your Data Lakehouse

October 11, 2022

Corporate data is no doubt a valuable asset. Except it’s an open secret that data alone isn’t inherently valuable, nor will it produce valuable...

Second Edition of Trino: The Definitive Guide

October 5, 2022

Starburst has played a key role in the Trino community for a long time now. We contribute to the success of Trino every day....

Building Reporting Structures on S3 using Starburst Galaxy and Apache Iceberg

October 4, 2022

Defining AWS S3: AWS S3 has become one of the most widely used storage platforms in the world. Companies store a variety of data...

The Data Virtualization Evolution is Just Beginning

October 4, 2022

Data virtualization revolutionized the data infrastructure space by serving data consumers directly on top of data stores, without the need to move data elsewhere....

Delivering Text Search Capabilities Directly on the Data Lake with Starburst

September 29, 2022

In the big data analytics world, enabling analytics on unstructured text is a powerful capability. For that reason, it would be of use that...

Accenture Master Class: Creating Data Products

September 27, 2022

Since Datanova: The Data Mesh Summit and our in-person executive discussions on data products and Data Mesh, we’ve been validating the data product approach...

Rethinking SIEM Solutions

September 13, 2022

As organizations strive to become more agile, there has been a mass movement jumping headfirst into what is called a security data lake. Gartner...

The Difference Between Micro-Partitioning vs. Indexing and a Better Way

September 8, 2022

When optimizing your analytics database performance, one of the most important decisions is to choose how data is stored and accessed. There are two...

Accenture Master Class: Why Organizations Should Create Data Products

September 6, 2022

Since Datanova: The Data Mesh Summit and our in-person executive discussions on data products and Data Mesh, we’ve been validating the data product approach...

Identify threats faster with a security data lake

August 26, 2022

The glory days of SIEM are over. Security teams are not only measured by their ability to collect as much data as possible, but...

Practical Security And Policy-Based Governance In A Data Mesh

August 11, 2022

Proponents of Data Mesh understand its many game-changing benefits for large scale organizations. For those who are new to this reimagined framework, Data Mesh...

The choice is yours: Open source Trino and Starburst Galaxy

August 9, 2022

A few months back when Starburst Galaxy launched on AWS, Google Cloud, and Azure, I wrote a blog on What Fully-Managed Means to Starburst....

AWS Dev Day Recap: Data Lake Analytics with Starburst Galaxy

August 5, 2022

On Wednesday, August 3rd, I had the opportunity to share a hands-on lab exploring Data Lake reporting structures with my AWS partner in crime,...

Near Real-Time Ingestion For Trino

August 4, 2022

It is quite popular in today's data climate for modern data architectures to have some sort of batch processing system to move data into...

Use Metabase as a Starburst BI tool

August 3, 2022

Metabase can now be used to connect to SEP, Starburst Galaxy, and Trino as a BI tool and client. Metabase excels at providing BI insights...

Starburst Lakehouse: Data Warehouse Functionality, Without The Cost

July 14, 2022

Next-Gen data management and analytics strategies: We’ve all lived it. Heard it. Adapted to it. The next analytics strategy with numerous ‘modern’ technologies to...

Scaling Up: When to Migrate from PostgreSQL to a Data Lake

July 13, 2022

One of the true pillars of the tech revolution, PostgreSQL is an OLTP database designed primarily to handle transactional workloads. The technology has been...

A Better Solution For Managing and Maintaining Data Pipelines, Now In Public Preview

July 6, 2022

Customers who want a single, super fast and easy-to-use solution for both interactive and longer-running data pipeline queries now have a solution: take advantage...

Confessions of a Space Quest League Advocate

July 6, 2022

Mission 2 Wrap and Mission 3 Launch: We all know at least one pandemic puzzler, a devoted crossworder, or a religious wordler who finds...

Starburst Acquires Varada To Deliver Faster (and Cheaper) Data Lake Analytics

June 23, 2022

I’m excited to announce the acquisition of Varada, a data analytics accelerator, based out of Tel Aviv, Israel. Varada offers a data lake analytics...

Employee Perspective: Accelerating Data-Driven Insights in AdTech

June 16, 2022

Before I joined Starburst, I worked in the AdTech industry where companies buy and sell user data for online targeting advertisement campaigns or ML/AI-based...

Let’s Get Granular: Why Granularity Impacts Role-Based Access Control

June 14, 2022

About a month into my first job I finished building my first data pipeline ever. I soaked in the “I Made THAT!” moment, and...

Transforming Your Data Pipelines with Starburst

June 9, 2022

Current State of ETL/ELT: Extract-transform-load, more commonly known by its street name “ETL”, has been around since the early days of computing. Bringing together...

Data Lake Analytics for Smart, Modern Data Management

May 27, 2022

Best-in-class organizations need fast, reliable data analytics that enable business leadership to identify patterns and key insights that will help them predict the best...

The Past, Present, and Future of Trino

May 24, 2022

Recently, I had the pleasure of chatting with Ravit Jain on his show “The Ravit Show” to discuss the evolution of Trino and where...

Part 2: How to Run Batch Processes Using Starburst Galaxy

May 19, 2022

This is Part 2 of a 2-part blog about how Trino can support both interactive and batch use cases. In Part 1, we explored...

New release of the dbt-trino adapter

May 9, 2022

dbt labs released version 1.1 of the dbt-core project in late April, 2022. This did not catch the maintainers of the dbt-trino project by...

ETL vs Interactive Queries: The Case for Both

May 5, 2022

This is Part 1 of a 2-part blog about how Trino can support both interactive and batch use cases. In Part 1, we will...

Starburst and Data Products: The Key to a Data Mesh

April 28, 2022

The key to success for any company is deriving business value from data in a robust, scalable, and timely fashion. A huge part of...

Enter the Starburst Space Quest League for a Chance to Win Big!

April 18, 2022

Calling all data pros! Are you ready for a $20k payday? Yes, you heard it right – you could be walking away with $20,000...

Faster Query Processing: CPU Time

March 25, 2022

A key engineering responsibility at Starburst is performance enhancement. One is to reduce the amount of time that a CPU has to work...

Simplifying Policy Enforcement for Your Data Mesh with Starburst Enterprise and Immuta

March 23, 2022

This blog was co-authored by Alex Breshears, Product Manager at Starburst In today’s global economy, it’s impossible to understate the importance of being able...

The 2022 State of Data: A Sneak Peek

March 10, 2022

I participated in a panel discussion with Karl Eklund, Principal Architect at Red Hat, and William Schnoeppner, Director of Research and Consulting at EMA...

What A SQL Query Engine Can Do For Big Data

February 16, 2022

Nod with me if you’ve suffered from the following problems with processing and analyzing Big Data via a centralized approach: different query languages, niche...

6 Reasons to Attend Datanova 2022: #5, Join our Immuta-Sponsored Data Product Lab

February 2, 2022

Let this summit be the one that will pull you out of your PJs and into the data products lab with your lab coat...

6 Reasons to Attend Datanova 2022: #3, Hear Customer Stories

January 28, 2022

Start off the year right by registering for our two-day virtual conference, Datanova: The Data Mesh Summit. As of late, with the rise of...

Top 6 Reasons to Migrate to the Cloud

January 25, 2022

Starburst released the 2021 State of Data market research report, conducted by Enterprise Management Associates (EMA), in collaboration with Red Hat, early last year....

6 Reasons to Attend Datanova 2022: #4, Accenture Master Class

January 19, 2022

So far, we’ve highlighted a few reasons why you should attend Datanova: The Data Mesh Summit: The Woz and Justin Borgman. The next reason...

The Right Way to Query Across Data Sources in Tableau (or, The Cross-Database Join Is Not Always Your Friend)

January 13, 2022

Summary: Use the right tool for the right job. Not doing so means the difference between your Tableau viz rendering in seconds vs. minutes...

6 Reasons to Attend Datanova 2022: #6, Zhamak Dehghani, Creator of Data Mesh

January 7, 2022

The self-professed “troublemaker” Zhamak Dehghani, who coined Data Mesh, will join us for not one but two sessions! ...

5 Reasons to Sign Up for Starburst Galaxy

December 22, 2021

The original vision of Starburst was to make querying distributed data as simple, fast, and painless as possible. Starburst Galaxy, our serverless, fully-managed SaaS...

Starburst Stargate: One Cluster to Rule Them All

December 9, 2021

I think of Starburst Stargate as the Lord of the Rings feature. Or the galactic empire feature. In a prior blog post, I introduced...

Part 2 of Current Data Patterns Blog Series: Data Lakehouse

December 6, 2021

As companies shift their analytical ecosystems from on-premise to cloud and try to avoid “data lock-in”, we’re noticing some very interesting data patterns. This...

Data Fabric vs. Data Mesh: What’s the Difference?

November 18, 2021

I am increasingly getting asked about the difference between the Data Fabric and the Data Mesh. They are both emerging paradigms designed to solve...

Rethinking the Modern Data Stack

November 17, 2021

Over the past few years the “modern data stack” has entered the vernacular of the data world, describing a standardized, cloud-based data and analytics...

Tableau is Just Better with Starburst

November 15, 2021

I’m one of those strange people who has always enjoyed doing performance testing. The thought of spinning up lots of machines to do my...

Intro to Trino for the Trinewbie

November 2, 2021

If you haven’t heard of Trino before, it is a query engine that speaks the language of many genres of databases. As such, Trino...

Data Mesh: Data as a Product

October 21, 2021

Data Mesh is based on four central concepts, the second of which is data as a product. In this blog, we’ll explore what that...

Data Mesh: Domain-oriented Ownership & Architecture

October 14, 2021

Insane in the domain! Insane in the brain! Crazy insane, got no domain! - Cypress Hill, sort of Data Mesh is based on four...

The Analytics Engine for Distributed Data

October 1, 2021

The idea of a single source of truth has been around since the beginning of big data. However, over the years, through the data...

Data Mesh: A Software Engineer’s Perspective

September 28, 2021

You might have heard about Data Mesh recently, which is a modern approach to managing data and analytics in a distributed, domain-driven fashion. At...

Data Mesh Book Bulletin: Principle of Domain Ownership

September 21, 2021

Despite the investments and effort poured into next-generation data storage systems, monolithic, centralized data warehouses and data lakes have failed to provide the line...

Does the data mesh make data integration harder?

September 17, 2021

Every five years, a small group of leaders in the data management research community get together to do a self assessment --- what are...

The Intelligent Edge

September 13, 2021

Today’s digital world is an expanding frontier of emerging technologies. There are endless innovations, inspired by data, informed by data, enabled by data, and...

How Assurance Unlocked More Business Value with Starburst

September 9, 2021

By leveraging Starburst, Assurance was able to improve conversion rates, reduce costs, and enable robust modeling. Read the full case study here. ...

Why Performance Matters: Parquet, Delta Lake, Dynamic Filtering

August 26, 2021

My fascination with SQL query performance started quite some time ago and I contributed a paper on efficient processing of data warehousing during my...

Part 1 of Current Data Patterns Blog Series: Hybrid Distributed Data Store and RDBMS

August 12, 2021

As companies shift their analytical ecosystems from on-premise to cloud and try to avoid “data lock-in”, we’re noticing some very interesting data patterns. This...

Kafka and Starburst: 3 Considerations for Accelerating Time to Value

July 27, 2021

Kafka was created at LinkedIn and open sourced into the Apache Software Foundation in early 2011. It was developed to optimize writes especially for...

Data Federation and Data Virtualization Never Worked in the Past But Now it’s Different

July 13, 2021

Thirty years ago it was already commonplace for large businesses to have hundreds --- even thousands of different database instances managing data from the...

Starburst Elements: A Holistic View of Your Cluster and Query Environment with Starburst Insights

July 1, 2021

This is the fifth episode in our video series, Starburst Elements, focused around anything and everything Starburst. In this episode, our Product Manager Vishal...

Query Federation Made Simple at Comcast

June 24, 2021

The media and telecommunications provider now known as Comcast began as a regional operator with just five channels and 12,000 customers. Today, Comcast has...

Rapid Controlled Access to Data with Starburst and Immuta

June 16, 2021

A growing number of enterprises are experiencing the benefits of the Starburst single point of access to all of their data that allows them...

Redefine Your Analytics Without ETL Using Starburst and Amazon EKS

June 14, 2021

As more and more organizations are looking to the cloud to help fulfill their operational and analytics needs; so is the data center...

Starburst Stargate: The Final Frontier in Analytics Anywhere

June 9, 2021

Today we announced Starburst Stargate, the industry’s first gateway for global cross-cloud analytics. I’m excited to share more behind why we built this and...

Trino on Ice IV: Deep Dive Into Iceberg Internals

June 8, 2021

Welcome back to the Trino on Ice blog series that has so far covered some very interesting high level concepts of the Iceberg model,...

Managing Secrets in Trino

June 3, 2021

Most companies want to follow good security practices. With the number of security breaches coming out daily, it almost feels like a matter of...

Starburst Supports Launch of Delta Sharing, the First Open Protocol for Secure Data Sharing

May 26, 2021

At Starburst, we believe in building optionality into your data architecture & strategy. To us, optionality means building for flexibility so that you don’t...

Trino on Ice III: Iceberg Concurrency Model, Snapshots, and the Iceberg Spec

May 25, 2021

Welcome back to this blog series discussing the amazing features of Apache Iceberg. In the last two blog posts, we’ve covered a lot of...

Starburst Elements: Start Fast with Starburst Galaxy

May 20, 2021

This is the fourth episode in our video series, Starburst Elements, focused around anything and everything Starburst. In this episode, our Product Manager Vishal...

Trino on Ice II: In-Place Table Evolution and Cloud Compatibility with Iceberg

May 11, 2021

In-place table evolution and cloud compatibility with Iceberg ...

Starburst Elements: Introduction to Starburst Galaxy

May 7, 2021

This is the third episode in our video series, Starburst Elements, focused around anything and everything Starburst. In this episode, our Product Manager Vishal...

Data Pandemic Stories: How Data Drives Digital Transformation in a Crisis, by Promethium

April 28, 2021

After debuting our blog series on data pandemic stories with a story from Tableau and a perspective from Privacera, we are excited to bring...

Trino On Ice I: A Gentle Introduction To Iceberg

April 27, 2021

We’re excited to debut this blog series ‘Trino on Ice’ with a gentle introduction to Iceberg. Stay tuned for future posts from the Trino...

Data Pandemic Stories: How Data Drives Digital Transformation in a Crisis, by Privacera

April 14, 2021

After debuting our blog series on data pandemic stories with a story from Tableau, we’re excited to bring you a viewpoint from Syed Mahmood,...

Data Mesh: The Answer to the Data Warehouse Hypocrisy

March 25, 2021

Note: I start this piece with some technical background that has nothing to do with the data mesh, and is only relevant to data...

A Gentle Introduction to the Hive Connector

February 12, 2021

One of the most confusing aspects when starting with the Hive connector comes from the complex Hive model and overlapping use cases of this...

The Future of Analytics: In Conversation With Matt Fuller

February 5, 2021

Datanova is just next week. More than 2,000 data and analytics leaders will join us to learn more about how to unlock the value...

Reasons to Attend Datanova 2021: # 6, Technical Training from the Creators of Trino

February 5, 2021

We love data engineers at Starburst. They are our people, even when their Starburst Data equivalents try to trick Marketing into pronouncing the data...
