Hive 3 First Query Always Slow

We’ve noticed a pattern where the first query each day against our Hive 3 data source is SUPER slow. A quick preview takes 10 minutes if it’s the first query of the day against Hive. Subsequent queries against Hive, even for completely different tables/criteria, are back to lightning speed, and the previously slow preview is fast again regardless of the criteria we add.

Is there a metadata refresh happening or a config change we might need to make?

@patricker

It is quite possible that we are refreshing metadata during the first run. Can you please send us the job profile of the first slow run?

Share a Dremio Query Profile

I can’t share a query profile. But I’d be happy to share almost anything in the UI/plan (where I can easily remove sensitive details before posting).

@patricker

Check whether the planning time during the first run is high. If it is, click the Planning tab and scroll down to the section below to see whether any of these phases took a significant amount of time:

: PERMISSION_CACHE_HIT (0 ms)
: PERMISSION_CACHE_HIT (0 ms)
: CACHED_METADATA (0 ms)

PERMISSION_CACHE_HIT (0 ms)

Slightly different than the name you suggested, but looks important:
PARTIAL_METADATA (1,201,139 ms) - which is roughly 20 minutes. The whole job took 20.02 minutes in total.

It looks like it spent almost the entire query time on “PARTIAL_METADATA”.
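For anyone else comparing phase timings without sharing a full profile, here is a minimal sketch that pulls `NAME (n ms)` entries out of text pasted from the Planning tab and reports the slowest one. The function names and the regular expression are my own assumptions based on the lines quoted above, not anything Dremio provides, and the total job time is just the 20.02 minutes reported here converted to milliseconds.

```python
import re

# Assumed line shape, based on the entries quoted in this thread,
# e.g. "PARTIAL_METADATA (1,201,139 ms)" or ": CACHED_METADATA (0 ms)".
PHASE_RE = re.compile(r"([A-Z_]+)\s*\(([\d,]+)\s*ms\)")


def parse_phases(text):
    """Return (phase_name, milliseconds) pairs found in pasted profile text."""
    return [(name, int(ms.replace(",", ""))) for name, ms in PHASE_RE.findall(text)]


def slowest_phase(text, total_ms):
    """Return (name, minutes, share_of_total) for the longest phase."""
    phases = parse_phases(text)
    if not phases:
        raise ValueError("no phase timings found in the pasted text")
    name, ms = max(phases, key=lambda phase: phase[1])
    return name, ms / 60_000, ms / total_ms


pasted = """
: PERMISSION_CACHE_HIT (0 ms)
: PERMISSION_CACHE_HIT (0 ms)
PARTIAL_METADATA (1,201,139 ms)
"""

# 20.02 minutes total job time, as reported above, expressed in milliseconds.
total_job_ms = int(20.02 * 60_000)

name, minutes, share = slowest_phase(pasted, total_job_ms)
print(f"{name}: {minutes:.2f} min ({share:.1%} of the job)")
```

If one phase accounts for nearly the whole job, as PARTIAL_METADATA does here, the time is going into planning rather than execution.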

@patricker

It looks like the metadata had expired, which is why we fetch it again during the first query. That is also why subsequent runs are fast.

What source configuration changes should I try for Metadata Refresh? Right now it’s set to “Only Queried Datasets”, Fetch 1 hour, Expire 3 hours.

I see that the option to fetch all is shown as deprecated?
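To make the expiry explanation concrete, here is a deliberately simplified sketch of the timing with a 3 hour expiry. It only checks whether more time has passed since the last refresh than the expiry window allows; it is not Dremio's actual logic, it ignores background refreshes, and the timestamps and function name are hypothetical.

```python
from datetime import datetime, timedelta


def metadata_expired(last_refresh, query_time, expire_after_hours):
    """Simplified model: metadata is stale once the expiry window has passed."""
    return query_time - last_refresh > timedelta(hours=expire_after_hours)


EXPIRE_AFTER_HOURS = 3  # the "Expire 3 hours" source setting mentioned above

# Hypothetical timestamps: last refresh the previous evening, first query next morning.
last_refresh = datetime(2024, 1, 1, 17, 0)
first_query = datetime(2024, 1, 2, 8, 0)
second_query = first_query + timedelta(minutes=30)

# 15 hours have passed, so the first query of the day finds expired metadata.
print(metadata_expired(last_refresh, first_query, EXPIRE_AFTER_HOURS))

# Assuming the first query refreshed the metadata, the next one is inside the window.
print(metadata_expired(first_query, second_query, EXPIRE_AFTER_HOURS))
```

Under this model, any gap longer than the expiry window (overnight, a weekend) would put the refresh cost on whoever runs the first query afterwards.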


@patricker

Fetch all can cause a long background refresh, so it is best to stay with “Only Queried Datasets”. Are you seeing long planning times very frequently?