Dremio fails to recognize new folders

Hi,

In line with the question here (Dremio with reflection on updated (append) dataset) I wanted to ask the following:

I’m referencing the following structure:

dir0

  • date1
  • date2
    -…

And within each date folder I have the same parquet file.

I want to query for the most recent date using the query below, which works right after the dataset is set up. However, when I add new folders and rerun the query, the new folders are not taken into account; Dremio does not seem to recognize them. I’m not running any reflections.

SELECT rundate, variable, "value", source, dir1 AS dealkey, dir0 AS filetimestamp FROM mergeperdealkey WHERE dir0 IN (
SELECT TO_CHAR(timeconverted, 'YYYYMMDDHH24MISS') FROM (
SELECT TO_TIMESTAMP(timestring, 'YYYYMMDDHH24MISS') AS timeconverted FROM (SELECT DISTINCT(dir0) AS "timestring" FROM mergeperdealkey) ORDER BY timeconverted DESC LIMIT 1
)
)
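
As a side note, if every dir0 folder name is a fixed-width, zero-padded YYYYMMDDHH24MISS string (an assumption based on the format mask in the query above), then string order matches chronological order, and the round trip through TO_TIMESTAMP and TO_CHAR may not be needed. A minimal sketch of a shorter query under that assumption:

```sql
-- Assumes all dir0 values are 14-character timestamps (YYYYMMDDHH24MISS),
-- so the lexicographically largest value is also the most recent one.
SELECT rundate, variable, "value", source,
       dir1 AS dealkey,
       dir0 AS filetimestamp
FROM mergeperdealkey
WHERE dir0 = (SELECT MAX(dir0) FROM mergeperdealkey);
```

This only simplifies the query. It does not change which folders Dremio sees, so new folders still appear only after the dataset metadata has been refreshed.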

Regards,
Christian

@cklar Have you refreshed the dataset?

https://docs.dremio.com/sql-reference/sql-commands/datasets/

You can also wait for the background refresh to complete; check metadata_refresh.log on the coordinator to see when it runs.
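
For reference, a manual refresh could look roughly like the sketch below. The table name is taken from the query in the question; in practice it will probably need the full path to the dataset (source and folders), which is not shown in this thread.

```sql
-- Forces Dremio to re-read the dataset metadata so new folders are picked up.
-- Replace mergeperdealkey with the fully qualified dataset path as needed.
ALTER TABLE mergeperdealkey REFRESH METADATA;
```

Older Dremio releases used ALTER PDS in place of ALTER TABLE for the same command, so the exact keyword depends on the version in use; the documentation page linked above lists the supported syntax.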

Thanks - I’ll try it.

For now I have switched the refresh cycle to 1 minute, which seems to show me folder updates about every 90 seconds. Do you know if setting the refresh cycle this low has any performance impact?

@cklar Yes. Depending on how often the datasets change, the coordinator can come under severe heap pressure, causing full GC cycles.