When we refresh our reflections, we notice that some are updated extremely fast while others take much longer to finish (a few seconds for some, around 30 minutes for others). When we analyzed the logs, we found that the main cause is almost always “Blocked on upstream”. Can you give us some advice on how to deal with this properly, perhaps some tips on optimizing the refresh configuration to prevent it from happening? It is worth mentioning that the PDSs are loaded from a Glue Catalog.
Blocked on downstream just means one of the Dremio phases is waiting on another phase to complete. This can be CPU or IO bound. The job profile should tell us where the bottleneck is. Are you able to share the query profile with us?
Of course. Thanks a lot for your attention.
You have a couple of issues:
- Wait time on Glue is about 9 minutes; for 420,000 records you have splits=), and since all of these are remote reads, there is significant IO wait (open the job profile and see the operator metrics on HIVE_SUB_SCAN). You have C3 turned on, but nothing is reading from the C3 cache: see the columns “NUM_CACHE_HITS” and “NUM_CACHE_MISSES” in the operator metrics on HIVE_SUB_SCAN. Has this been configured correctly? See the C3 documentation below.
- CPU contention: the second issue is significant sleep time (CPU) on phase 1. Expand phase 1 and see the column “waiting”. Are there other queries running? Do you have plans to add more executors? If this happens again, send us the new profile and the server.log from the Dremio executor.
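
If it helps when comparing profiles, the C3 check above can be turned into a quick calculation. This is only a rough sketch in Python, not a Dremio API: the function name and the sample numbers are hypothetical, and the two inputs are the NUM_CACHE_HITS and NUM_CACHE_MISSES values copied by hand from the HIVE_SUB_SCAN operator metrics in the job profile.

```python
from typing import Optional


def cache_hit_ratio(num_cache_hits: int, num_cache_misses: int) -> Optional[float]:
    """Return the fraction of reads served from cache, or None if there were no reads.

    Hypothetical helper: both inputs are copied manually from the
    HIVE_SUB_SCAN operator metrics; nothing here talks to Dremio.
    """
    total = num_cache_hits + num_cache_misses
    if total == 0:
        # No cache lookups were recorded at all for this operator.
        return None
    return num_cache_hits / total


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    ratio = cache_hit_ratio(num_cache_hits=0, num_cache_misses=120)
    if ratio is None:
        print("No cache lookups recorded")
    else:
        print(f"C3 hit ratio: {ratio:.0%}")
```

A ratio that stays at or near zero across repeated refreshes of the same dataset would be consistent with the scan not reading from C3, which is the situation described in the first point.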