How does data federation work?

Tutorial: Easily access all of your data, no matter where it lives

Last Updated: December 1, 2023

For data to be valuable, it has to be useful, and that means it needs to drive business insights. But in the modern data landscape, data often resides in multiple locations. For example, an organization might use a data warehouse for one use case and a data lake for another. Sometimes these choices follow divisions in the organizational structure, with different departments each creating their own data source.

This creates a siloing problem. In such a scenario, gaining insights becomes both complicated and costly, often requiring data to be moved from one system to another to create a single source of truth. But building that single source of truth can be an unending task, and many businesses either keep committing resources to it or never succeed at all. Think of Sisyphus rolling his rock up the hill for eternity.

Solving data silo issues

Solving this problem is really what data federation is all about. Federation unlocks the value of your data by creating a connection across multiple data sources. This powerful approach gives businesses more options and increased flexibility, opening up a world of possibilities. Using federation, your organization no longer needs to move data unnecessarily to a central source of truth. Instead, you can focus on creating insights and driving value.

How do you connect to disparate data sources?

Federation is best when it offers lots of options. This ensures that no matter where your data lives it can connect with data in other sources. Starburst’s connector ecosystem includes 50+ connectors, allowing connections to both cloud and on-prem data sources. 

The ecosystem includes many enhanced proprietary connectors, which further widens the options available. Overall, federation lowers costs, increases convenience, and improves versatility.

Getting started: query federation tutorial

Discover, locate, govern, and query your data from multiple data sources

Access tutorial

Who uses data federation?

Any data professional who manages or queries data from multiple sources benefits from data federation. This includes:

  • Data managers (e.g., data engineers, data architects) create catalogs to connect to their organization’s data sources.
  • Data consumers (e.g., data scientists, data analysts) write queries to federate data across data sources.

How does data federation work?

The Trino SQL query engine uses connectors to communicate with many data sources simultaneously, processing and joining data from disparate sources as needed to complete a query.

Supporting this, our connector ecosystem is broad, and we’re continuously adding and improving connectors. 

We connect to a variety of types of data sources, including NoSQL stores like Elasticsearch or MongoDB and relational databases like PostgreSQL. Additionally, we simplify data lake analytics by supporting all major table formats, including Iceberg and Delta Lake, persisted on Amazon S3, Azure Blob, and Google Cloud object stores.
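
To make the connector idea above concrete, here is a minimal toy sketch in Python. It is not how Trino is implemented. The `InMemoryConnector` class, the `federated_join` function, and the sample rows are all hypothetical names and data invented for illustration. The sketch only shows the general shape of the idea: each source sits behind its own connector, and the engine joins the rows it reads from both.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


class InMemoryConnector:
    """Hypothetical stand-in for a connector: exposes one source's tables as rows."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def scan(self, table: str) -> Iterable[Row]:
        # A real connector would talk to an external system; this one reads a dict.
        return iter(self._tables[table])


def federated_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """Hash join: index the left rows by key, then probe the index with the right rows."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)

    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Two independent "sources", each behind its own connector (sample data is made up).
relational_source = InMemoryConnector({"customers": [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]})
lake_source = InMemoryConnector({"orders": [
    {"customer_id": 1, "total": 40.0},
    {"customer_id": 1, "total": 15.5},
    {"customer_id": 3, "total": 99.0},
]})

result = federated_join(
    relational_source.scan("customers"),
    lake_source.scan("orders"),
    "customer_id",
)
for row in result:
    print(row)
```

Only the rows whose `customer_id` appears in both sources survive the join, and neither dataset had to be copied into the other system first.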

The following image displays some of the connectors included in our connector ecosystem. 

How do I federate data with Starburst platforms?

Federation is easy with Starburst Galaxy. To get started, simply create catalogs to connect to the data sources you’d like to include. 

Next, join tables from different data sources in the same way you would join tables from the same data source.
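
As a rough sketch of what such a join can look like, the Python snippet below assembles a federated SQL statement. It relies on Trino's `catalog.schema.table` naming, so a table in another data source is addressed by changing the catalog prefix. The catalog names (`pg_sales`, `lake`), schemas, tables, and columns are assumptions made up for this example. Substitute the catalogs you created.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(customers: str, orders: str) -> str:
    """Build a join across two tables that may live in different catalogs."""
    return (
        "SELECT c.name, sum(o.total) AS lifetime_value\n"
        f"FROM {customers} AS c\n"
        f"JOIN {orders} AS o ON o.customer_id = c.customer_id\n"
        "GROUP BY c.name\n"
        "ORDER BY lifetime_value DESC"
    )


# Hypothetical catalogs: one for a relational database, one for a data lake.
customers = qualified("pg_sales", "public", "customers")
orders = qualified("lake", "web", "orders")

query = build_federated_query(customers, orders)
print(query)
```

The join syntax is the same as for two tables in one database. The only visible difference is the catalog prefix on each table name.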

The following video walks through federation in more detail using a sample dataset. You can use the same dataset with Starburst Galaxy.

Want to try federating data for yourself? 

Starburst Academy has you covered. We’ve got several hands-on labs and tutorials to get you up and running quickly with federation. 

Tutorial: Federate multiple data sources

Practice federating data in Starburst Galaxy and using some of the other features available

Practice federating data

Course: Federate data with a simple query

Set up Starburst Galaxy and federate data with a simple query.

Practice on Galaxy

Start for Free with Starburst Galaxy

Up to $500 in usage credits included

  • Query your data lake fast with Starburst's best-in-class MPP SQL query engine
  • Get up and running in less than 5 minutes
  • Easily deploy clusters in AWS, Azure and Google Cloud
For more deployment options:
Download Starburst Enterprise
