This is for Dremio OSS 19.3.
We are storing CSV files in a folder in Azure Storage, and have promoted that folder to a Physical Dataset (PDS). We have then created a Virtual Dataset (VDS) from that PDS that has a raw reflection enabled.
Whenever we add a new CSV file, we need to run a Metadata Refresh (ALTER TABLE table REFRESH METADATA) and then refresh the reflection before we can query the new data.
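For anyone automating this two-step workflow, here is a minimal sketch, assuming the Dremio v3 REST API: POST /api/v3/sql to submit the ALTER TABLE job, GET /api/v3/job/{id} to poll it, and POST /api/v3/catalog/{id}/refresh to refresh the reflections that depend on the PDS. The base URL, the "_dremio" token header, the table path and the dataset id are assumptions; check them against the REST API docs for your version. The point it illustrates is ordering: the SQL endpoint returns a job id immediately, so the reflection refresh should only be sent after the metadata job reports COMPLETED.

```python
import json
import time
import urllib.request

# Assumption: coordinator address; adjust for your deployment.
BASE_URL = "http://localhost:9047"


def sql_request(token: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v3/sql request that submits a SQL job."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v3/sql",
        data=body,
        headers={"Content-Type": "application/json",
                 # Assumption: OSS auth token from /apiv2/login, prefixed with "_dremio".
                 "Authorization": f"_dremio{token}"},
        method="POST",
    )


def reflection_refresh_request(token: str, dataset_id: str) -> urllib.request.Request:
    """Build a request that refreshes all reflections depending on a physical dataset."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v3/catalog/{dataset_id}/refresh",
        data=b"",
        headers={"Authorization": f"_dremio{token}"},
        method="POST",
    )


def wait_for_job(get_state, job_id: str, poll_seconds: float = 1.0,
                 timeout: float = 300.0) -> None:
    """Poll a job until it completes. get_state is any callable mapping a job id
    to its jobState string (e.g. a wrapper around GET /api/v3/job/{id})."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state(job_id)
        if state == "COMPLETED":
            return
        # Assumption: these are the terminal failure states.
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"job {job_id} ended in state {state}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

Usage: send sql_request(token, "ALTER TABLE azure.csv_folder REFRESH METADATA") (hypothetical table path), read the returned "id", call wait_for_job with a function that fetches that job's jobState, and only then send reflection_refresh_request for the PDS. Injecting get_state keeps the polling logic testable without a live server.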
However, the Metadata Refresh can take up to a minute for us, and while it is running, queries that target the VDS do not run. In the Dremio UI they don't appear at all; once the refresh completes, they appear and run very quickly.
The end result is that during a metadata refresh, queries that normally take less than a second can appear to take up to a minute. Why does the query wait for the refresh? Couldn’t it run against the existing reflection?
@phillip Your VDS should run using the old reflection. When you say queries are not running during the metadata refresh, are they stuck in a particular phase? What does the job icon in the UI look like? One possibility is that your metadata has expired, so the datasets are all doing an inline refresh and filling up the command pool. Another is that your coordinator has only one core and the command pool slots (# of cores - 1) are full. Can you please send me the profile of a VDS job that hangs while the ALTER PDS job is running and then completes as soon as the ALTER PDS job is done?
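To make the command pool hypothesis concrete, here is a toy model, not Dremio's implementation: a fixed-size thread pool stands in for the coordinator's command pool, a long sleep stands in for the metadata refresh, and a short sleep stands in for planning a VDS query. The slot formula (cores - 1) comes from the reply; the floor of one slot and the helper names command_pool_size and run_scenario are hypothetical. When the pool has a single slot, the short query's latency is roughly the refresh duration, which matches the symptom of sub-second queries taking up to a minute.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def command_pool_size(cores: int) -> int:
    """Hypothetical slot count: cores - 1 as described in the reply,
    with an assumed floor of one slot."""
    return max(1, cores - 1)


def run_scenario(cores: int, refresh_seconds: float = 0.3,
                 query_seconds: float = 0.01) -> float:
    """Return how long a short 'query' task waits when a long 'refresh'
    task was submitted to the same pool just before it."""
    with ThreadPoolExecutor(max_workers=command_pool_size(cores)) as pool:
        pool.submit(time.sleep, refresh_seconds)        # stands in for the metadata refresh
        start = time.monotonic()
        query = pool.submit(time.sleep, query_seconds)  # stands in for a quick VDS query
        query.result()
        return time.monotonic() - start


if __name__ == "__main__":
    for cores in (2, 4):
        print(f"{cores} cores -> query latency {run_scenario(cores):.2f}s")
```

With 2 cores the model leaves one slot, so the query queues behind the refresh; with 4 cores it gets its own slot and returns almost immediately. A real job profile is still needed to tell this apart from the inline-refresh explanation.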
@balaji.ramaswamy I recorded a short video that demonstrates what I’m seeing.
Metadata refresh affecting query jobs - Dremio - 26 January 2022 (loom.com)
Here is the profile of the VDS job that is submitted after the refresh.
e47d334a-62c2-42b7-9407-5875628beedc.zip (14.7 KB)
Edit: In case it is relevant, here is the profile of the metadata refresh job:
74ff2c68-850a-4adb-b42a-ecd06fa751c7.zip (4.7 KB)