I believe that with the Delta Lake connector, we are taking advantage of the metadata in the DeltaLog files. I’d imagine that this works better than the Hive connector as there is more metadata info available than what’s in HMS.
I’ll ping around a bit to see I can find someone who might be able to answer your questions more accurately and with deeper depth.
in Hive catalogs backed by a Hive Metastore, does dynamic filtering push down fully into partitions?
Yes, it does. We use it to reduce unnecessary io.trino.metastore.HiveMetastore#getPartitionsByNames calls, avoid file listing in pruned partition directories and avoid generating unnecessary splits from coordinator, and use the dynamic filter on the workers to prune unnecessary row groups in parquet (stripes in orc).
With Delta Lake tables (on ADLS/S3), does dynamic filtering still rely on transaction logs, and are there any limitations compared to Hive?
In delta lake and iceberg, dynamic filters are used to prune files based on statistics in the table metadata as well as pushed down into parquet reader for row -group pruning. This works the same way as any static predicate in the query.
For ADLS object storage sources, how effective is dynamic filtering in pruning files vs. falling back to coordinator-level filtering?
Dynamic filtering is agnostic to specific object storage being used, just like normal predicate pushdown. Effectiveness of pruning files and row-groups depends on the selectivity of the min/max statistics per file and per row-group. This depends on how well the data is clustered, typically sorting by the columns used in filters helps.
best practices for enabling dynamic filtering
The most important thing is to make sure that table statistics are present for the relevant tables as the CBO decision to put the smaller table on the build side of join is a critical factor in making dynamic filtering effective. It also helps the CBO to make decisions about whether its worth delaying certain scans to allow dynamic filters to be collected first.
Also, regardless of the connector, the engine additionally uses dynamic filters to remove unnecessary rows from the output of the connector table scan on the workers.
Thanks for the answer. so, based on answer if i want to diagnose if dynamic filtering is effectively applied in query plans? I can use explain and explain analyze to see detailed query plain and there it should mentioned about it .correct? do we have any other way to diagnose ? @raunaqmorarka
Yep, you’ll see the dynamic filter being called out in the explain plan output. You can also see it in the visual representation of the DAG/plan and other places as you can see around 51:30, 55:45, 1:07:50, and 1:08:55 in this TCB video – https://www.youtube.com/watch?v=-2SjSM73l8g