Blocked on upstream taking too long for some queries

Hi

I am running the same query each time, but there are moments when the "blocked on upstream" time is much longer than usual. I don't think any other jobs are running at those times. Could you help me figure out what is going on?

c49f2f85-1adb-473d-a0e9-0dc18e6082f5.zip (36.4 KB)

@lfcosio

Blocked on upstream means that Dremio is waiting for a phase upstream to complete. Data flows through the phases bottom up, from higher-numbered phases into lower-numbered ones, so for phase 1, phase 0 is downstream and phase 2 is upstream. When phase 0 and phase 1 say “blocked on upstream”, it simply means that they are ready to execute but an upstream phase has not finished its work or is still in progress.

In this case, if you look at the Operators table at PARQUET_ROW_GROUP_SCAN (02-xx-02), about 0.2 seconds are spent in SETUP, opening and closing files, and 0.6 seconds in wait time (this could be reading Parquet footers). The reason shows up if you open the operator metrics for PARQUET_ROW_GROUP_SCAN: there are NUM_CACHE_MISSES, and each miss can force the read to go back to the reflection files, which might be on a distributed store such as S3. To improve this, you want more CACHE_HITS and, ideally, zero CACHE_MISSES. Check whether this query was run before; if not, run it once more so the cache is built for this data. If it has already been run many times, the disk holding the cache may not be big enough.
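For anyone mapping this onto their own profiles, here is a minimal Python sketch of the two ideas above: which phases are upstream or downstream of a given phase, and what share of Parquet reads were served from the cache. It assumes a simple linear chain of phases like the 0/1/2 example (plans with joins can branch, so a real plan is not always a single chain), and it assumes you copy the cache hit and miss counts by hand from the operator metrics. The function names are hypothetical and nothing here calls a Dremio API.

```python
# Sketch only: assumes a linear chain of phases (0 <- 1 <- 2 ...) and
# hit/miss counts copied by hand from the operator metrics in a profile.

def upstream_phases(phase: int, max_phase: int) -> list[int]:
    """Phases that feed data into `phase` (higher numbers are upstream)."""
    return list(range(phase + 1, max_phase + 1))


def downstream_phases(phase: int) -> list[int]:
    """Phases that consume data from `phase` (lower numbers, ending at 0)."""
    return list(range(phase - 1, -1, -1))


def cache_hit_ratio(num_cache_hits: int, num_cache_misses: int) -> float:
    """Fraction of reads served from the cache; 1.0 means no misses."""
    total = num_cache_hits + num_cache_misses
    if total == 0:
        raise ValueError("no cache reads recorded")
    return num_cache_hits / total


# Example with illustrative numbers (not taken from the attached profile):
print(upstream_phases(1, max_phase=2))                        # [2]
print(downstream_phases(1))                                   # [0]
print(cache_hit_ratio(num_cache_hits=30, num_cache_misses=10))  # 0.75
```

A ratio below 1.0 means some reads went back to the reflection files; after a warm-up run you would hope to see it reach 1.0.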

Thanks
Bali


Thanks. I had things reversed; I thought it went from 0 to 2. Thanks for the explanation. I do have a follow-up, though.

This one seems to be blocked on others. What should I look out for?

e64f7c5f-672c-4b2c-a35c-49d8e6a477db.zip (35.3 KB)

@lfcosio

If you look at the query, the time was spent in “Starting: 422ms”. During this phase the fragments are propagated to the executors. Click the Planning tab and search for “Fragment Start RPCs (421 ms)”. It could be that the pods were busy at that time because other queries were running. Do you always see roughly 0.5 s spent in fragment assignment?
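One way to answer that question is to note the “Starting” time from several profiles of the same query and check whether 422 ms is typical or an outlier. A minimal sketch, assuming you copy the values by hand from each profile; the run labels, the other timings, and the 2× median threshold are hypothetical choices for illustration.

```python
from statistics import median

# Hypothetical "Starting" times (ms) noted by hand from several runs of the
# same query; only the 422 ms figure comes from this thread.
def slow_starts(starting_ms: dict[str, int], factor: float = 2.0) -> list[str]:
    """Return the runs whose Starting time exceeds `factor` x the median."""
    typical = median(starting_ms.values())
    return [run for run, ms in starting_ms.items() if ms > factor * typical]


runs = {"run-1": 45, "run-2": 50, "run-3": 422, "run-4": 48}
print(median(runs.values()))  # 49.0
print(slow_starts(runs))      # ['run-3']
```

If every run shows a similar Starting time, the cost is consistent fragment startup; if only a few runs stand out, that fits the idea that the pods were busy with other queries at those moments.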

Thanks
Bali