I am using Dremio Community Edition 23.1.0 on my EKS cluster and I am having issues with a PDS not auto-refreshing. I have a folder of Parquet files in my AWS S3 source, which I formatted to create a dataset. The Parquet folder in the source is updated daily, but Dremio does not pick up the new files automatically, so I have missing data: I only see the data from the last time I formatted the folder. And it is not a matter of waiting, because on one occasion the data had not been updated in a month and I didn't notice until someone complained. In my previous Dremio installation (which I still have running, and I checked it), on version 11.0.0, this issue doesn't happen. The same Parquet folder formatted as a dataset gets updated automatically, without me having to reformat it or refresh the metadata or anything. What setting can I modify, and how, to ensure that datasets are refreshed automatically? In my source settings I have this configuration:
Hi! Thank you for the fast reply. The file metadata_refresh.log is empty. And yes, the command ALTER PDS <PDS_NAME> REFRESH METADATA does help; it is what I have been doing all this time to refresh the dataset manually. But the same dataset in my old Dremio v11.0 does not need me to refresh the metadata manually. Why is that? Which setting could be affecting this?
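For anyone else relying on the manual workaround in the meantime, here is a minimal sketch of how the refresh could be scripted so it can run on a schedule. The path components are hypothetical placeholders, and `submit_sql` assumes the Dremio REST SQL endpoint (`POST /api/v3/sql` with an `_dremio<token>` Authorization header); check both against your own version before relying on it.

```python
import json
import urllib.request


def quote_path(parts):
    """Double-quote each path component, doubling any embedded quotes."""
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


def refresh_statement(parts):
    """Build the same ALTER PDS ... REFRESH METADATA command used manually."""
    return f"ALTER PDS {quote_path(parts)} REFRESH METADATA"


def submit_sql(base_url, token, sql):
    """Submit a SQL statement as a job.

    Assumption: the coordinator exposes POST {base_url}/api/v3/sql and
    accepts a login token in the form '_dremio<token>'.
    """
    request = urllib.request.Request(
        f"{base_url}/api/v3/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"_dremio{token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical source, folder, and dataset names, for illustration only.
    print(refresh_statement(["my_s3_source", "daily", "events"]))
```

Calling `submit_sql` with the generated statement from a cron job or a Kubernetes CronJob would reproduce the manual refresh. It is a stopgap, not a fix for the background refresh itself.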
@jbaranda On 23.x, unlimited splits is turned on by default and requires your metadata to be moved to S3. In your dremio.conf, do you have a dist:/// setting? When the refresh runs in the background, there should be an internal refresh dataset job created for every PARQUET dataset. Can you please find that profile? In addition, the ALTER PDS command should also have generated an internal refresh dataset job. Can you please send those 2 profiles for the same dataset?
I was able to fix it by updating Dremio from version 23.1 to version 24.0. Dremio 23.1 had a lot of issues, and apparently this was one of them. Thank you for the help!