Identical SQL Queries - Profile Comparison

Hello all, I have two queries (exactly the same) that were run one after the other. The first query took 13s, while the second took 6s.

I have turned caching off for the storage account, but I am struggling to understand where the difference is coming from. Can anyone help me determine where the speed-up (or delay) comes from?

6s.zip (29.9 KB)
13s.zip (30.6 KB)

A bit of background: the queries are submitted via the Python Arrow Flight client, and the table we are reading from is a Delta table.

I think the difference is due to your client consuming records at different speeds. Basically, Dremio is providing the data faster than your client can consume it, so it ends up blocking until the client asks for more records.

I can tell this from Phase 01-xx-xx and its high Blocked on Downstream timings, which are significantly higher in your 13s profile than in your 6s profile.
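To make the "blocked until the client asks for more" idea concrete, here is a toy model, not Dremio's implementation: a producer thread stands in for the server sending batches, the main thread stands in for the client, and a bounded `queue.Queue` stands in for the limited buffering between them. The function name `simulate` and its parameters are hypothetical. The time the producer spends stuck in `put()` plays the role of Blocked on Downstream, and it grows as the simulated client slows down.

```python
import queue
import threading
import time


def simulate(consumer_delay_s: float, n_batches: int = 20, buffer_size: int = 2) -> float:
    """Return the seconds a producer spends blocked waiting for a slow consumer.

    Toy model only: the producer plays the server sending batches, the consumer
    plays the client, and the bounded queue plays the limited buffering between
    them. This is not how Dremio is actually implemented.
    """
    buf = queue.Queue(maxsize=buffer_size)
    blocked_s = 0.0

    def producer() -> None:
        nonlocal blocked_s
        for batch in range(n_batches):
            start = time.perf_counter()
            buf.put(batch)  # blocks while the buffer is full
            blocked_s += time.perf_counter() - start
        buf.put(None)  # end-of-stream marker

    thread = threading.Thread(target=producer)
    thread.start()
    while buf.get() is not None:
        time.sleep(consumer_delay_s)  # simulated client-side work per batch
    thread.join()
    return blocked_s


if __name__ == "__main__":
    for delay in (0.0, 0.02):
        print(f"client delay {delay:.2f}s/batch -> producer blocked {simulate(delay):.2f}s")
```

The producer's work is identical in both runs; only the consumer's pace changes, which is the same pattern as the two profiles.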

@Benny_Chow thanks. For my own understanding, are the Blocked on Upstream and Blocked on Downstream timings purely an indication of one phase waiting for either the previous or the next phase?

In this case, I assume Phase 2 is the client, or the process that sends data to the client? Since I am using Arrow Flight as the client in my application, do you have any insight into what would cause this delay? It is the same query, fired immediately afterwards, yet it was not ‘blocked’ the second time.

Would you consider this behaviour just an anomaly in the networking between Dremio and the client?

Phase 2 → Phase 0 is how the data flows. In reality, though, data is requested starting at Phase 0: if Phase 0 doesn’t have data, then we ask Phase 1, and so on.
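The pull-based flow described above can be sketched with chained Python generators. This is an illustrative analogy, not Dremio code; the function names are hypothetical. Nothing runs until the "client" calls `next()` on Phase 0, and each phase only asks its upstream phase for data when it has none of its own.

```python
from typing import Iterator, List


def phase2_scan(log: List[str], n: int) -> Iterator[int]:
    """Hypothetical leaf phase: produces a batch only when asked for one."""
    for batch in range(n):
        log.append(f"phase2 produced {batch}")
        yield batch


def phase1_project(log: List[str], upstream: Iterator[int]) -> Iterator[int]:
    """Hypothetical middle phase: asks Phase 2 for data only when it has none."""
    for batch in upstream:
        log.append(f"phase1 forwarded {batch}")
        yield batch


def phase0_screen(log: List[str], upstream: Iterator[int]) -> Iterator[int]:
    """Hypothetical root phase: the client pulls batches from here."""
    for batch in upstream:
        log.append(f"phase0 sent {batch}")
        yield batch


log: List[str] = []
stream = phase0_screen(log, phase1_project(log, phase2_scan(log, 3)))
print(log)           # [] : nothing has run, the client has not asked yet
print(next(stream))  # 0  : the client requests one batch
print(log)           # ['phase2 produced 0', 'phase1 forwarded 0', 'phase0 sent 0']
```

The request travels Phase 0 → 1 → 2, while the log shows the data arriving back in the order Phase 2 → 1 → 0. If the client waits longer between `next()` calls, every phase simply sits idle for that time.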

The client gets its data from Phase 0’s screen operator. In your profiles, you are pumping 12.96M records to the client.
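As a rough, end-to-end sanity check using only the numbers in this thread (12.96M records, 13s vs 6s), the sketch below computes the average rate of each run and, assuming the whole 7s gap were client time spent on records, how much extra time per record that would be. The helper names are hypothetical, and the rates include all query time, not just transfer to the client.

```python
def throughput_m_per_s(records: float, seconds: float) -> float:
    """End-to-end average rate, in millions of records per second."""
    return records / seconds / 1e6


def extra_us_per_record(records: float, slow_s: float, fast_s: float) -> float:
    """Extra microseconds per record if the whole gap is attributed to the records."""
    return (slow_s - fast_s) / records * 1e6


RECORDS = 12.96e6  # record count reported from the profiles
print(f"13s run: {throughput_m_per_s(RECORDS, 13):.2f}M records/s")  # 1.00
print(f" 6s run: {throughput_m_per_s(RECORDS, 6):.2f}M records/s")   # 2.16
print(f"gap: {extra_us_per_record(RECORDS, 13, 6):.2f} us per record")  # 0.54
```

Under that assumption, roughly half a microsecond of extra client-side work per record is enough to explain the difference between the two runs.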

From the profiles, I can’t see any other reason why there was a delay except for the client spending more time between batch requests.
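One way to check this from the client side is to split the client's wall time into "waiting for the next batch" and "processing a batch". The helper below is a hypothetical sketch that works with any iterable of batches. With pyarrow Flight, you could feed it a small generator that calls `reader.read_chunk()` on the reader returned by `FlightClient.do_get` until it raises `StopIteration`, and pass your per-batch handling as `process`.

```python
import time
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def time_stream(batches: Iterable[T], process: Callable[[T], None]) -> Tuple[float, float, int]:
    """Return (seconds waiting for batches, seconds processing them, batch count).

    High wait time suggests the client is starved by the server; high processing
    time suggests the client is the bottleneck, which on the server side tends
    to show up as Blocked on Downstream.
    """
    wait_s = work_s = 0.0
    count = 0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here = waiting on the producer
        except StopIteration:
            wait_s += time.perf_counter() - t0
            break
        t1 = time.perf_counter()
        process(batch)  # time spent here = client-side work between requests
        work_s += time.perf_counter() - t1
        wait_s += t1 - t0
        count += 1
    return wait_s, work_s, count


if __name__ == "__main__":
    # Hypothetical fast source with a slow consumer: processing should dominate.
    wait, work, n = time_stream(range(5), lambda b: time.sleep(0.02))
    print(f"{n} batches: waiting {wait:.3f}s, processing {work:.3f}s")
```

Running it for both the slow and fast executions would show whether the extra seconds land in the processing column, as the profiles suggest.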

@nikhil.makan In addition to @Benny_Chow’s great explanation: if you try running this query via the Dremio UI, you should not see any "Blocked on Downstream" time on Phase 0.

Thanks @Benny_Chow and @balaji.ramaswamy this has been very insightful.