Well, the very simplest way to turn a PyStarburst DF into a parquet file (or files) would be with the save_as_table() function described at Dataframe write functions — PyStarburst.
To test this out, I first spun up the Jupyter notebook from the GitHub - starburstdata/pystarburst-examples project, then navigated into the tpch.ipynb notebook.

I used my Starburst Galaxy account details & credentials, but any Starburst cluster will do, and I ran all the cells up to, and including, this one.
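
For reference, connecting from the notebook looks roughly like the sketch below. The host, user, and password values are placeholders, and the exact config keys are what I remember from the PyStarburst docs and the examples repo, so double-check them against your version of the library:

# a minimal sketch, assuming Basic auth against a Starburst Galaxy cluster;
# the host/user/password values are placeholders
from pystarburst import Session
from trino.auth import BasicAuthentication

session = Session.builder.configs({
    "host": "example-cluster.trino.galaxy.starburst.io",
    "port": 443,
    "http_scheme": "https",
    "auth": BasicAuthentication("user@example.com/accountadmin", "my-password"),
}).create()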

Then I created a new cell with the code below, writing into a catalog.schema I already had set up.
# save existing DF into a new table (note: type "hive" isn't needed on SEP)
tli.write.save_as_table("mycloud.messinround.asparquet",
                        mode="overwrite",
                        table_properties={"format": "parquet", "type": "hive"})
I verified it was created in the Starburst UI (I ran some queries there, too).

I then ran the following in Jupyter just to make sure it was accessible from the API.
session.table("mycloud.messinround.asparquet").collect()
All of that did create a parquet file for me in S3 as you can see.

It was so small that it actually created just one “part” file, but if the DF were much larger it would likely have been spread across multiple files; that's just how the writer works, and it's a good thing, too, since it allows the data to be written in parallel.
Programmatically, you could do a few things now to get this into a local parquet file. One approach (sketched just after this list) would be to:
- Convert the PyStarburst DF into a Pandas DF using to_pandas() (see Dataframe — PyStarburst)
- Convert the Pandas DF to a file with something like pandas.DataFrame.to_parquet — pandas 2.2.3 documentation (or maybe fastparquet · PyPI)
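
Here's a minimal sketch of that, reusing the table from above; the local filename is just an example, and to_parquet() needs pyarrow or fastparquet installed:

# pull the table (or any PyStarburst DF) into memory as a Pandas DF
pdf = session.table("mycloud.messinround.asparquet").to_pandas()

# write it to a local parquet file (requires pyarrow or fastparquet)
pdf.to_parquet("asparquet_local.parquet", index=False)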
Of course, this is going to be a bad approach if the dataset is too big to comfortably fit in memory. Again, if that's the case, just create a new table and let Starburst/Trino write the parquet files for you, and then pull them from your bucket.
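
If you go that route, pulling the part files down is just ordinary S3 access; with boto3 it would look something like the sketch below (the bucket name and prefix are placeholders — use whatever location your catalog/schema actually writes to):

import os
import boto3

s3 = boto3.client("s3")
bucket = "my-datalake-bucket"        # placeholder bucket
prefix = "messinround/asparquet/"    # placeholder table location

# list the part files Trino wrote for the table and download each one locally
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in resp.get("Contents", []):
    local_name = os.path.basename(obj["Key"])
    s3.download_file(bucket, obj["Key"], local_name)
    print("downloaded", local_name)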
Hope that helps!