Exceptionally poor response time ( > 700 x slower) compared to underlying database

  • I have a query that performs three inner joins
  • 2 of the joins are to the same table
  • on snowflake, no caching, medium warehouse execution time is around 5 secs, just under 1million rows
  • dremio, all tables accelerated, run time is 1 HOUR and STILL GOING.

More worrying is that it spends all its time in a NESTED_LOOP_JOIN AND it’s only uses ONE core of the entire cluster the whole time

The query ultimately timed out - but that may be aqua data studio client - not too sure - either way, here’s the profile
c044da8c-4351-48b0-97e7-cbc6a00d22f8.zip (35.1 KB)

@mfarmer

Since you are operating only on 341,248 records, it is single threaded. I also see 15m blocked on downstream which is waiting on JDBC to consume records

Hmm - so was using aqua data studio as client - not sure why it would be causing a long wait for 15 mins. I have two questions:

  • how can you see the 15 min wait in the profile
  • I monitored the server with htop for a long time - there’s still over 45 mins of delay, and that one core was pegged at 100% for a long time

@mfarmer

how can you see the 15 min wait in the profile? Expand phase 00-xx-xx . Phase 0 is the last phase and downstream to Phase 0 is external like ODBC or JDBC etc. If you see under Phase 00-xx-xx and under “Blocked Downstream” you will see the time I am talking about

NL joins are expensive and since you are single threaded that core is probably getting used 100%

So - I am re-running the query from within dremio - 8 mins in and it’s still going… Questions:

  1. It’s still very slow compared to snowflake (other queries seem to work same or slightly quicker)
  2. why the decision to only use a single thread ? It pains me to see the cluster at 0% except one core for so long!
  3. What caused the nested loop and what could be changed to avoid it ?

Thanks!

#2 We use one thread per 100,000 records but limited by number of splits. In your case you only have one split so hence the single thread. Can you partition by a different key to increase number of splits?

#3 Why does it use a NL join, in the below query, are “lctryiso” and “mctryiso” columns in kldw.reference.kl_iata_all

One thing you can try is to first join only kldw.reference.kl_iata_all and kldw.reference.iata_purchased_db and see how the query performs and look at the plan. Then add “kldw.reference.country”

thanks - will try - just an FYI - query took 1hr 46 mins to complete - snowflake was about 5 seconds

@mfarmer

Would you be able to send us the profile of the completed query?

1d8785b7-9ea8-4787-b45b-509befc874a9.zip (719.4 KB)