Understanding planning

This query produces a scan on the dataset.

SELECT count(*)
FROM A
WHERE dir0 = (SELECT MAX(dir0) FROM A)

The result is 3445.

Thread    Setup Time  Process Time  Wait Time  Max Batches  Max Records  Peak Memory
00-00-14  0.074s      0.165s        0.990s     15           48,101       48KB

I’m surprised by the Max Records value of 48,101 in the plan. Did it read the entire dataset, or only the max dir0 partition? How can I tell?

@swarren any chance you can share the query plan?

The plan itself has the table and column definitions. I probably need to clean that before I can share the plan.

We ran into this too. Dremio is smart about pruning partitions when you filter on dir0 directly, but if you run a function such as MAX against the directory columns, Dremio actually creates a record containing just the directory name for every row in the partition. Most of our performance challenges were around getting partition pruning right (filters on dir0) rather than what you’re doing here, so I don’t recall whether this is being prioritized in any way… They do know about it, though.
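If it helps, here is a rough sketch of the workaround this implies: split the query in two, so that the final filter compares dir0 to a literal instead of a subquery. I have not verified this against your dataset. The value '2019-06' is a made-up placeholder rather than anything from this thread, and the EXPLAIN PLAN FOR statement at the end is only there on the assumption that your Dremio version supports it for comparing plans.

```sql
-- Step 1: find the latest partition directory.
-- Per the behaviour described above, this step may still touch every row.
SELECT MAX(dir0) AS latest_dir
FROM A;

-- Step 2: paste the value returned by step 1 in as a literal.
-- '2019-06' is a hypothetical placeholder; substitute your own value.
SELECT COUNT(*)
FROM A
WHERE dir0 = '2019-06';

-- Optional: inspect the plan for the literal-filter version and compare
-- its scan against the plan for the original subquery version.
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM A
WHERE dir0 = '2019-06';
```

If pruning kicks in for step 2, the Max Records figure on its scan should drop to roughly the size of that one partition, which would also answer the question of whether the original query read the whole dataset.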
