
Model Context Protocol (MCP) is quickly becoming the most widely accepted standard for unlocking context and functionality in agentic applications. Integrating agentic AI workflows into our daily lives is common practice here at Starburst, and we dogfood our own products, using Starburst to power Starburst itself.

Starburst Galaxy + MCP

Starburst Galaxy takes MCP to the next level. When we set out to support MCP in Starburst Galaxy, we made a deliberate architectural choice: host the MCP server as a managed, multi-tenant service inside our existing control plane rather than shipping a local binary. As a result, MCP is available and simple to use on every Starburst Galaxy account, and we use it ourselves to ask natural language questions over a governed multi-cloud data lake.

This post unpacks the architecture behind Galaxy’s hosted MCP server, explains the tradeoffs we made, and details how we are building an enterprise-grade foundation for AI-driven data access.

A quick primer on MCP

MCP is an open standard that defines how AI agents discover and interact with external systems. An MCP server advertises capabilities like tools, resources, or prompts, and an MCP client (typically an AI agent) connects to the server and uses those capabilities on behalf of a user.
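At the wire level, an MCP client and server exchange JSON-RPC 2.0 messages over STDIO or HTTP. The sketch below shows the shape of a capability-discovery exchange; the tool name and schema are hypothetical illustrations, not Galaxy's actual tool list.

```python
import json

# An MCP client discovers a server's capabilities with a JSON-RPC request.
discover_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# A server might advertise a tool like this (name and schema are made up):
discover_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "run_query",  # hypothetical tool name
                "description": "Execute a read-only SQL query",
                "inputSchema": {
                    "type": "object",
                    "properties": {"sql": {"type": "string"}},
                    "required": ["sql"],
                },
            }
        ]
    },
}

# Messages travel as JSON text on the chosen transport.
wire = json.dumps(discover_request)
print(wire)
```

Once the client has the tool list, it invokes tools on the user's behalf with `tools/call` messages of the same JSON-RPC shape.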

Why MCP matters

The protocol has recently experienced rapid adoption, but most implementations focus solely on the client-server connection. What happens after the connection, involving authentication, authorization, governance, and observability, is left as an exercise for the implementer. However, this is where the most interesting architectural decisions live, because it’s where you determine whether an enterprise data platform can safely expose its capabilities to AI agents.

Why Starburst Galaxy overturns MCP deployment conventions

Starburst Galaxy's approach to MCP matters because it differs from most MCP server deployments.

In other MCP deployment models, a local MCP server, called a STDIO server, runs as a process on the user’s machine. The AI client communicates with it over STDIO or a local HTTP transport, and the server connects directly to a data source (e.g., a database, an API, or a file system). This traditional deployment model works well for proofs of concept or individual use cases, but it shifts critical concerns onto the user in the following ways.

Credentials live on the client machine

The local server needs database passwords, API keys, or other credentials. Those credentials are stored in config files, environment variables, or keychains with no centralized rotation or revocation.

There is no multi-tenancy

Each user runs their own server instance. There is no shared infrastructure to enforce consistent policies, audit access, or manage capacity.

Observability is fragmented

Each local instance logs to its own stdout. There is no unified view of which agents are running what queries, how much data they are reading, or whether they are behaving as expected.

Installation/management issues

STDIO servers require manual installation and configuration by each user. At Starburst, we expect STDIO MCP to quickly fall by the wayside in favor of robust remote MCP servers.

Maintenance problems

The STDIO server is managed by users themselves and may fall out of date. There is no way to ensure everyone is running the most recent version with current bug fixes and security patches.

Governance is an afterthought

Role-based access control, data masking, and row-level security all require server-side enforcement. In data discovery use cases, a local process that connects directly to a data source bypasses any governance layer in between.

Why Starburst Galaxy’s MCP deployment model is different

With Starburst Galaxy, none of this is necessary. Galaxy already manages multi-cloud Trino clusters with enterprise governance built in, so running the MCP server locally would mean re-implementing (or skipping) every control the platform already provides.

There is a better way. Let’s dive into what Galaxy’s hosted MCP offering provides out of the box:

Authentication, authorization, and query governance

Security is where the hosted model pays for itself. The MCP server implements a multi-layered security stack that would be impractical to replicate in a local deployment. We support MCP’s integration with the OAuth 2.1 protocol, allowing clients to authenticate and discover Galaxy’s functionality without managing sensitive credentials. We also integrate seamlessly with Starburst Gravity, providing Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC). As a result, AI agents using Galaxy’s hosted MCP can only perform the operations permitted to the logged-in user.

Additionally, we use Gravity’s Universal Search to ensure that your AI agents are set up to discover the highest-quality data first using data products.

Read-only by design

The query execution tool enforces a strict allowlist of SQL statement types. Every query is parsed with the Trino SQL parser before execution. Only read-only operations are permitted, such as SELECT, SHOW (catalogs, schemas, tables, columns, roles, grants, functions), and non-destructive EXPLAIN plans.

Any statement that modifies data is rejected at parse time, before it ever reaches a Trino cluster. While the MCP specification provides advisory annotations like `readOnlyHint` for clients (which Galaxy MCP does set), Galaxy goes further: read-only is enforced server-side through SQL parsing, not left to client cooperation.
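To illustrate the idea of parse-time enforcement (Galaxy uses the full Trino SQL parser; this deliberately simplified Python sketch only inspects the leading keyword):

```python
# Simplified sketch of a server-side read-only allowlist. A real
# implementation parses the full statement, as Galaxy does with the Trino
# parser, so that e.g. EXPLAIN ANALYZE of a DELETE is still caught.
ALLOWED_STATEMENTS = {"SELECT", "SHOW", "EXPLAIN", "DESCRIBE", "WITH"}

def check_read_only(sql: str) -> bool:
    """Return True if the statement's leading keyword is on the allowlist."""
    stripped = sql.strip()
    if not stripped:
        return False
    first = stripped.split(None, 1)[0].upper()
    return first in ALLOWED_STATEMENTS

assert check_read_only("SELECT * FROM orders LIMIT 10")
assert check_read_only("SHOW CATALOGS")
assert not check_read_only("DELETE FROM orders")        # rejected at parse time
assert not check_read_only("INSERT INTO t VALUES (1)")  # never reaches a cluster
```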

Audit logging + Observability

When an AI agent executes SQL against your data platform, you need to know what happened, when, and why. The MCP server integrates with Galaxy’s existing observability infrastructure to provide this visibility. 

This has two implications. 

First, every MCP operation is recorded through Galaxy’s audit log system. These audit events feed into the same compliance pipeline that covers all Galaxy portal operations — available to account administrators through the audit log interface and query insights view, and exportable for centralized compliance monitoring.

Second, every SQL query executed through the MCP server carries metadata that identifies its origin:

  • Source: Tagged as `mcp-query-tool`, allowing platform teams to filter and analyze MCP-originated queries separately from interactive or BI tool queries.
  • Client info: The AI client’s User-Agent header is forwarded, distinguishing queries made by Claude Desktop from those made by a custom agent framework.
  • Trace tokens: Clients can pass trace tokens that propagate through the entire execution path, from the MCP server through the Trino cluster to the data source.

This means you can answer questions like: “How many queries did AI agents run last week?”, “What is the p99 latency for MCP-originated queries?”, and “Which catalogs are agents querying most frequently?”
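For example, given query-event records shaped like the metadata above (the field names and values here are illustrative, not Galaxy's actual export format), a platform team can slice out agent traffic directly:

```python
# Hypothetical query-event records, tagged the way the post describes.
events = [
    {"source": "mcp-query-tool", "user_agent": "Claude-Desktop", "latency_ms": 420},
    {"source": "bi-tool",        "user_agent": "Tableau",        "latency_ms": 1800},
    {"source": "mcp-query-tool", "user_agent": "custom-agent/1.0", "latency_ms": 95},
]

# Because every MCP query carries the source tag, filtering is trivial.
mcp_queries = [e for e in events if e["source"] == "mcp-query-tool"]

print(len(mcp_queries))                              # → 2 agent-run queries
print(max(e["latency_ms"] for e in mcp_queries))     # → 420 (worst-case latency)
```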

No infrastructure for you to manage

Starburst Galaxy solves the difficult problems of scaling HTTP servers, administering governance across your data stack, and understanding the intricacies of the MCP protocol, so you can focus on your business logic. There are no processes to run, no servers to manage, and no infrastructure to upgrade; we handle the maintenance of your agentic workflows for you. Starburst is active in the MCP community, so you can be assured our hosted server follows the latest best practices.

Excellent integration with tools you already use

Remote MCP servers are quickly becoming the main way that major AI products integrate with one another. 

 

For example, in Anthropic’s Claude tools, MCP (also called “connectors”) is a well-supported option. With Starburst MCP, you don’t need to worry about integrating with Anthropic, Google, or other systems, as we’ve already done all the work to integrate with them.

How Starburst Galaxy achieves its MCP architecture

Galaxy is a multi-tenant platform. Organizations across industries share the same infrastructure, each seeing only their own clusters, catalogs, and data. The MCP server inherits this model directly.

The image below shows how Galaxy MCP architecture works. 

Image depicting the Starburst Galaxy MCP data architecture for a typical Starburst Hosted MCP deployment.

Unpacking the Starburst Galaxy MCP architecture 

The MCP server runs as its own Kubernetes deployment, on dedicated node pools, with independent scaling, health checks, and monitoring. It shares Galaxy’s central metadata store, authentication infrastructure, and access control engine, but runs as an isolated service with its own resource boundaries. 

We chose to host the server in the control plane this way because we anticipate agentic workloads will scale faster, and more independently, than the rest of our platform, and we want to absorb that increased demand without affecting other services.

Rate limiting is applied at the edge, scoped per host, so one noisy tenant cannot exhaust capacity for others. At the same time, we can scale out horizontally in a couple of minutes to meet additional demand.
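Galaxy's edge rate-limiting internals are not public, but the per-host idea can be sketched with a token bucket, one bucket per tenant host, so draining one bucket never touches another (the rates and burst sizes below are made up for illustration):

```python
import time

# Illustrative per-host token bucket; not Galaxy's actual implementation.
class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec        # refill rate
        self.capacity = burst           # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per tenant host: a noisy tenant only drains its own bucket.
buckets = {"tenant-a": TokenBucket(rate_per_sec=10, burst=2),
           "tenant-b": TokenBucket(rate_per_sec=10, burst=2)}

results = [buckets["tenant-a"].allow() for _ in range(3)]
b_ok = buckets["tenant-b"].allow()
print(results)  # the third rapid call exceeds tenant-a's burst
print(b_ok)     # tenant-b is unaffected
```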

Airlift library and Java on Starburst

Because Starburst is built on Trino, we have years of experience building high-performance Java services, and we leveraged that knowledge in our hosted MCP offering. In particular, because our MCP functionality is exposed over HTTP, we built our protocol support directly into the open-source Airlift library. Airlift is a tried-and-true foundation for Trino’s distributed system, and by leveraging its existing HTTP, concurrency, logging, and telemetry primitives, we can focus on delivering business value rather than reinventing the wheel. 

To editorialize a bit, as someone who used to write Java services using Spring, building them on Airlift is a much more pleasant experience.

Considering Starburst Galaxy MCP architectural tradeoffs

This hosted model is not without tradeoffs. Within the MCP server itself, query execution is bounded by result set size limits and query execution timeouts. If a query returns more data than the configured ceiling, the server cancels the query and returns a partial result with a clear warning. If a query exceeds the maximum execution time, it is terminated.
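The truncation behavior can be sketched as follows (the ceiling and response shape here are illustrative; Galaxy's actual limits and payload format may differ):

```python
# Sketch of the bounded-result behavior: if a query would return more rows
# than the configured ceiling, return a partial result plus an explicit
# warning instead of streaming unbounded data into the agent's context.
MAX_ROWS = 1000  # made-up ceiling for illustration

def bounded_result(rows):
    if len(rows) > MAX_ROWS:
        return {
            "rows": rows[:MAX_ROWS],
            "truncated": True,
            "warning": f"Result truncated to {MAX_ROWS} rows; refine your query.",
        }
    return {"rows": rows, "truncated": False}

result = bounded_result(list(range(2500)))
print(result["truncated"], len(result["rows"]))  # → True 1000
```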

We chose to implement query execution as a single tool call, rather than mimicking the polling pattern of the Trino protocol, for two reasons: nondeterministic LLMs can start polling for query results and then stop partway through, and rogue queries can pollute agent contexts with excess data and drive up costs.

Additionally, we know that data is extremely sensitive, so we do not store results in any interim data store. This introduces a temporary limitation, one we plan to resolve soon, especially with the proliferation of skills, to ensure a good customer experience.

Queries also introduce a network hop, though that latency is negligible compared to query execution time, which typically dominates the end-to-end response. The hosted model requires the MCP server to be highly available, and it means Starburst operates the infrastructure on behalf of the customer. But for enterprises that need governance, auditability, and security (the very organizations that Starburst Galaxy serves), those tradeoffs are clearly worth making.

Starburst Galaxy MCP in Claude Desktop and Claude Code

We have covered the architecture, but what does it look like from the client side? Our hosted MCP supports basic authentication (your username and password, or a service account), and we also support the OAuth flow, so you don’t have to worry about handling sensitive passwords as environment variables. Let’s walk through how to enable Claude to query your Galaxy account’s clusters using MCP.

Step 1: (OAuth only) Make an OAuth Client

The first step is to create an OAuth client in Galaxy. Because we’re using a client application like Claude.ai or Claude Code, we should create a public OAuth client to minimize the sensitive credentials we need to manage.

First, while assuming a role that has both the View public OAuth clients and Manage OAuth clients privileges, open up the OAuth clients sub-menu item, and then click Create new OAuth client.

Image depicting the Starburst UI when making an OAuth client for an MCP deployment.

Make sure Public is the OAuth client type and Custom is the application. Choose an appropriate name and description, and for the redirect URIs, add one or more URLs in the format http://localhost:PORT/callback (where PORT is an open port on your machine); you can register several callback URLs mapping to different open ports. For claude.ai or Claude Desktop, also add https://claude.ai/api/mcp/auth_callback as a valid callback URI.

Here’s an example configuration that will work for Claude Code.

Image showing the forms to fill out in the Starburst UI when creating an OAuth client for MCP.

Click Create OAuth client, and copy the new client ID. Done! Now we can move on to the next step.

Claude Desktop

Open Claude Desktop and navigate to the Customize screen. Click the Connect your apps tab, then search for “Starburst” and connect.

In the connection configuration, set the server URL to https://<account-name>.mcp.galaxy.starburst.io, then enter the client ID and (optionally) the client secret. Make sure you have added https://claude.ai/api/mcp/auth_callback to the redirect URIs in your OAuth client.

Click Connect to establish the connection. You can now use Claude Desktop to query your Galaxy data through the MCP server.

Claude Code

Adding a new MCP server using Claude Code is simple. Make sure you’ve taken note of your OAuth Client ID and a port from the callback URI configuration when you configured your client.

OAuth Client

Replace YOUR_CLIENT_ID and YOUR_PORT with the OAuth client ID (copied above) and a valid redirect URI port from your OAuth client, and replace MY_ACCOUNT with your Galaxy account domain. If you created a private OAuth client, also pass --client-secret with its value.

claude mcp add --transport http \
  --client-id YOUR_CLIENT_ID \
  --callback-port YOUR_PORT \
  my-starburst-galaxy-account https://MY_ACCOUNT.mcp.galaxy.starburst.io

That’s it! You now have a working Galaxy MCP server without managing any credentials, and the agent only sees the data each user is allowed to access.

Username/Password (including Service Accounts)

Adding a configuration with username/password authentication is easy; your own username and password or a service account both work. Replace MY_ACCOUNT with your Galaxy account domain, and set MY_TOKEN to a base64-encoded string of the form username:password. Claude can also read this sensitive value from an environment variable.

claude mcp add-json my-starburst-galaxy-account '{"type":"http","url":"https://MY_ACCOUNT.mcp.galaxy.starburst.io","headers":{"Authorization":"Basic MY_TOKEN"}}'
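If you need to produce MY_TOKEN yourself, it is simply the base64 encoding of username:password, as in standard HTTP Basic auth. A quick Python sketch (the credentials are placeholders):

```python
import base64

# MY_TOKEN for the Basic Authorization header is base64("username:password").
username, password = "alice", "s3cret"   # placeholders, not real credentials
token = base64.b64encode(f"{username}:{password}".encode()).decode()

header = {"Authorization": f"Basic {token}"}
print(header["Authorization"])
```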

Future Enhancements

The MCP specification is rapidly evolving. The most recent version (2025-11-25) of the specification introduces new utilities and access control patterns. 

Context Problems

MCP and AI integrations in general can suffer from context issues. AI agents rely on a limited context window to hold the information that shapes their responses. Protocols such as MCP can suffer from a problem commonly known as “context bloat,” in which the MCP server adds excessive context, leaving the agent confused or overwhelmed; alternatively, agents may lack sufficient context to deliver meaningful results.

Starburst is aware of the latest best practices regarding MCP and will continuously update the MCP server to avoid these context problems. The three most exciting changes for Galaxy are skills, apps, and authentication improvements. Let’s dive into how each of the three will supercharge agentic workflows in Starburst.

Skills

Complex analytical queries frequently involve long-running operations over large data sets. As mentioned above, queries on Galaxy MCP are subject to limits on result size and individual query duration. We will extend our toolset to accommodate long-running queries and tasks, and skills are a great way to provide a composable set of tools without bloating context windows. Skills will be added to MCP in the next revision of the protocol, scheduled for this June.

Apps

MCP apps provide protocol-level support for rich UIs in client MCP applications such as Claude Desktop. In the data world, the words “data” and “visualization” go together like code and coffee. Text-based agents summarizing data or performing investigations are incomplete without the ability to view trends and access auto-generated dashboards during each agent session. Providing rich visualizations is on our roadmap to expose via MCP.

Authentication

Previous versions of the MCP specification encouraged the use of OAuth Dynamic Client Registration (DCR). We opted against supporting DCR: because our hosted MCP is available on the public internet, allowing unauthenticated requests to register OAuth clients could create a confusing and insecure environment.

The MCP specification is moving away from DCR as the recommended approach for MCP client registration. Instead, the newest version of the spec recommends supporting OAuth Client ID Metadata Documents as a better way for dynamic clients to continue using OAuth authentication. As the Client ID Metadata Documents spec becomes an approved standard and clients support it, Galaxy will evolve in this direction, too. This will eliminate the need for preconfigured ports in OAuth clients.

What this means for enterprise AI

The conversation around MCP has largely focused on connecting AI agents to tools. That is important, but it is only half the story. The harder problem, and the one that determines whether enterprises actually adopt agent-driven data access, is governance.

Who ran the query? Were they allowed to? Did the agent access data it should not have? Can we prove it to an auditor? These are prerequisites for any organization operating under regulatory, compliance, or security obligations.

Starburst Galaxy’s hosted MCP server is designed from the ground up to answer them. It is not a bolt-on or an adapter. It is a governed, auditable, secure entry point for AI agents into your data platform — one that operates across AWS, Azure, and Google Cloud with the same access policies and governance regardless of where the data lives.

We are building the infrastructure so enterprises can say “yes” to AI-driven data access rather than “not yet.”

Start a Galaxy free trial today.

 
