Need help confirming and testing spooling + spill behavior in Starburst

knikhilreddy99 · December 1, 2025, 11:20pm

Hi Lester, thanks for the details!

We have two AMI Dev clusters: one in us-east and one in us-central-122. Spooling is enabled only on the us-central-122 cluster, and I can see the log message “spooling is enabled with …”, which suggests the configuration was loaded correctly.

Now I’d like to compare the two clusters (spooling ON vs OFF) to understand the real execution differences. I’m trying to validate:

How query execution changes with and without spooling
The impact on memory usage, performance, and result handling
Whether spooling is actually being triggered when running large result-set queries through the trino-python-client

Here is the test query I’m using:

WITH base AS (
    SELECT *
    FROM "ami_sb_insights"."ami_sbe_config"."biac_audit_session"
    LIMIT 400000
)
SELECT b.*, t.n
FROM base AS b
CROSS JOIN UNNEST(sequence(1,10000)) AS t(n);

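This should produce about 4 billion rows (400,000 base rows × 10,000 sequence values, assuming the table holds at least 400,000 rows), so it should be large enough to trigger spooling if it’s active.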
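As a sanity check on the size of that result set, here is a small back-of-the-envelope sketch in Python. The row count follows directly from the query. The 200-byte average row width is purely an assumption for illustration, since the real width of the table's rows isn't known here.

```python
# Back-of-the-envelope size of the test result set.
BASE_ROWS = 400_000       # LIMIT inside the CTE
SEQUENCE_LEN = 10_000     # sequence(1, 10000)
ASSUMED_ROW_BYTES = 200   # hypothetical average encoded row width, not measured


def expected_rows(base_rows: int = BASE_ROWS, seq_len: int = SEQUENCE_LEN) -> int:
    """CROSS JOIN UNNEST repeats every base row once per sequence element."""
    return base_rows * seq_len


def estimated_bytes(rows: int, row_bytes: int = ASSUMED_ROW_BYTES) -> int:
    """Rough result size under the assumed row width."""
    return rows * row_bytes


rows = expected_rows()
print(f"{rows:,} rows")
print(f"~{estimated_bytes(rows) / 1e9:,.0f} GB at {ASSUMED_ROW_BYTES} bytes/row (assumed)")
```

If the table has fewer than 400,000 rows, the LIMIT returns fewer base rows and the total shrinks proportionally.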

Could you guide me on the best way to validate spooling behavior across both clusters? For example:

Is this a suitable query to show a clear difference between “spooling enabled” and “spooling disabled”?
Should I run the same query on both clusters and compare worker memory, S3 spool segment creation, or runtime?
Are there specific metrics, logs, or UI pages you recommend monitoring while testing?
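To make the question concrete, this is a minimal sketch of how the comparison could be run from the trino-python-client. The hostnames, port, and user are placeholders, and authentication is left out. The helpers `build_query`, `fetch_all_timed`, `connect_cluster`, and `compare_clusters` are hypothetical names invented for this sketch. It also assumes the installed client version accepts the `encoding` argument on `connect` to request the spooled protocol.

```python
import time

# Table taken from the test query above; everything else here is a placeholder.
TABLE = '"ami_sb_insights"."ami_sbe_config"."biac_audit_session"'


def build_query(base_rows: int = 400_000, seq_len: int = 10_000) -> str:
    """Rebuild the test query; smaller values give a quicker dry run."""
    return (
        f"WITH base AS (SELECT * FROM {TABLE} LIMIT {base_rows}) "
        f"SELECT b.*, t.n FROM base AS b "
        f"CROSS JOIN UNNEST(sequence(1, {seq_len})) AS t(n)"
    )


def fetch_all_timed(cursor, sql: str, batch_size: int = 10_000) -> dict:
    """Execute sql on a DB-API cursor, drain it in batches, return rows and seconds."""
    start = time.perf_counter()
    cursor.execute(sql)
    rows = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        rows += len(batch)
    return {"rows": rows, "seconds": time.perf_counter() - start}


def connect_cluster(host: str, spooled: bool):
    """Open a connection; host, port, and user are placeholders (add auth as needed)."""
    from trino.dbapi import connect  # lazy import: only needed for a real run

    kwargs = {"host": host, "port": 443, "user": "test-user", "http_scheme": "https"}
    if spooled:
        # Assumption: the installed client accepts `encoding` to request
        # the spooled protocol; older versions may not.
        kwargs["encoding"] = ["json+zstd", "json"]
    return connect(**kwargs)


def compare_clusters(clusters: dict, sql: str) -> dict:
    """clusters maps a hostname to True (spooling on) or False (spooling off)."""
    results = {}
    for host, spooled in clusters.items():
        cursor = connect_cluster(host, spooled).cursor()
        results[host] = fetch_all_timed(cursor, sql)
    return results
```

A run might look like `compare_clusters({"east.example.internal": False, "central.example.internal": True}, build_query(seq_len=10))`, where both hostnames are made up. Draining all 4 billion rows through a Python client would likely take a very long time, so a reduced `seq_len` seems like a reasonable first pass before trying the full query.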

Thanks again for your help!