Blocked on upstream taking too long for some queries

lfcosio · February 18, 2021, 10:23am

Hi

I am running the same query, but there are moments when the “blocked on upstream” time is much too long. I don’t think any other jobs are running at those times. I was wondering if you could help me out.

lfcosio · February 18, 2021, 10:24am

c49f2f85-1adb-473d-a0e9-0dc18e6082f5.zip (36.4 KB)

balaji.ramaswamy · February 19, 2021, 7:21am

@lfcosio

Blocked on upstream means that a phase is ready to run but is waiting for an upstream phase to send it data. Data flows bottom up through the phases, so for phase 1, phase 0 is downstream whereas phase 2 is upstream. So when phase 0 and phase 1 say “blocked on upstream”, it simply means that they are ready for execution but some upstream phase has not completed its work or is still in progress.

In this case, if you look at the Operators table for PARQUET_ROW_GROUP_SCAN (02-xx-02), about 0.2 seconds are spent on SETUP (opening and closing files) and 0.6 seconds on wait time (possibly reading the Parquet footer). The reason shows up in the operator metrics for PARQUET_ROW_GROUP_SCAN: there are NUM_CACHE_MISSES, and a cache miss forces the read to go back to the reflection files, which might be on a distributed store like S3.

To improve this, you need more CACHE_HITS and zero CACHE_MISSES. Check whether this query was run before; if not, run it one more time so the cache is built for the data. If it has already been run many times, the disk holding the cache may not be big enough.
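To make the phase direction and the cache check concrete, here is a minimal Python sketch. It assumes a simplified linear plan (real plans are trees, and a join phase can have several upstream phases), reads operator IDs such as 02-xx-02 as phase-thread-operator with “xx” meaning all threads, and treats the hit/miss counts as numbers copied by hand from the scan’s operator metrics. The helper names parse_operator_id, phase_neighbors and cache_advice are hypothetical; none of them is a Dremio API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatorId:
    phase: int             # phase (major fragment) number
    thread: Optional[int]  # None when the profile shows "xx" (all threads)
    operator: int          # operator number within the phase


def parse_operator_id(op_id: str) -> OperatorId:
    """Hypothetical helper: split an ID such as "02-xx-02" into its parts."""
    phase, thread, operator = op_id.split("-")
    return OperatorId(int(phase), None if thread == "xx" else int(thread), int(operator))


def phase_neighbors(phase: int, max_phase: int) -> dict:
    """Assumes a linear plan: data flows from phase max_phase (the scans)
    down to phase 0 (the root). Upstream = producer, downstream = consumer."""
    upstream = phase + 1 if phase < max_phase else None
    downstream = phase - 1 if phase > 0 else None
    return {"upstream": upstream, "downstream": downstream}


def cache_advice(hits: int, misses: int) -> str:
    """Hypothetical check on hit/miss counts copied from the scan's operator metrics."""
    total = hits + misses
    if total == 0:
        return "no cached reads recorded"
    ratio = hits / total
    if misses == 0:
        return f"hit ratio {ratio:.0%}: all reads served from cache"
    return (f"hit ratio {ratio:.0%}: {misses} misses went back to the reflection files; "
            "re-run the query to warm the cache, or check the cache disk size")


scan = parse_operator_id("02-xx-02")
print(scan)                             # OperatorId(phase=2, thread=None, operator=2)
print(phase_neighbors(1, max_phase=2))  # {'upstream': 2, 'downstream': 0}
print(cache_advice(hits=30, misses=10)) # made-up counts, for illustration only
```

In this simplified model a phase that is “blocked on upstream” is waiting on the phase one number higher, which is why the scan in phase 02 is the place to look.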

Thanks
Bali

lfcosio · February 19, 2021, 8:57am

Thanks for the explanation. I had things reversed; I thought it went from 0 to 2. I do have a follow-up, though.

For this one, it seems to be blocked on something else. What should I look out for?

e64f7c5f-672c-4b2c-a35c-49d8e6a477db.zip (35.3 KB)

balaji.ramaswamy · February 20, 2021, 6:25am

@lfcosio

If you look at the query profile, time was spent in “Starting: 422ms”. During this phase, the fragments are propagated to the executors. Click on the Planning tab and search for “Fragment Start RPCs (421 ms)”. It could be that the pods were busy at this time because other queries were running. Do you always see 0.5s spent in fragment assignment?
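One way to answer that question is to collect the “Starting” time from several profiles of the same query and check whether it is high on every run or only occasionally. A minimal Python sketch, assuming the millisecond values are copied out of each profile by hand; classify_start_times and the 300 ms threshold are hypothetical choices, not Dremio settings. Only 422 ms comes from the profile in this thread; the other sample values are made up.

```python
from statistics import median


def classify_start_times(start_ms, threshold_ms=300.0):
    """Summarize "Starting" durations (ms) from several profiles of the same query.

    threshold_ms is a hypothetical cut-off for "slow", not a Dremio setting.
    Returns (label, median_ms, slow_runs).
    """
    if not start_ms:
        raise ValueError("need at least one profile")
    typical = median(start_ms)
    slow_runs = [t for t in start_ms if t > threshold_ms]
    if typical > threshold_ms:
        # Slow on most runs: fragment start-up looks slow in general.
        return "consistently slow", typical, slow_runs
    if slow_runs:
        # Slow only now and then: would fit executors busy with other queries.
        return "occasionally slow", typical, slow_runs
    return "fast", typical, slow_runs


# Only 422 comes from the profile discussed here; the other values are made up.
print(classify_start_times([35, 40, 422, 38]))  # ('occasionally slow', 39.0, [422])
```

Using the median rather than the mean keeps a single slow run from hiding the typical start-up time, so an occasional spike stands out instead of inflating the average.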

Thanks
Bali
