I’m running Dremio through the latest Docker container.
I’m mapping a data lake through the “NAS” option to a mounted drive (the drive is NTFS-formatted, from a Windows file server).
Everything works when I first create the data sources from the Parquet files; however, after 1-2 hours, seemingly at random, the data sources disappear.
I tried changing the reflection policy and the metadata policies. Nothing helped.
The mounted drive sometimes loses its connection (not something I can fix, but it comes back after a split second). I have a feeling that’s the issue, but I also tried switching off the “Remove dataset definitions if underlying data is unavailable” option, and that didn’t help either.
Also, not sure if it matters, but below is the docker command I’m using to run the container.
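One way to test the suspicion that the brief disconnects are the cause is to log exactly when the mount drops and compare those timestamps with when the datasets disappear. The sketch below is a minimal, hypothetical poller, not a Dremio tool. It assumes that an unavailable mount shows up either as a listing error or as an empty mount-point directory (true for a typical unmounted mount point, but not guaranteed for every setup). A stalled network mount can also make the listing call hang rather than fail, which this sketch does not guard against.

```python
import os
import time
from datetime import datetime
from typing import Optional


def mount_is_available(path: str) -> bool:
    """Return True if `path` can be listed and is non-empty.

    Assumption: an unmounted mount point usually appears as an empty
    directory, so an empty listing is treated as "unavailable".
    """
    try:
        return len(os.listdir(path)) > 0
    except OSError:
        # Missing path, "host is down", permission errors, etc.
        return False


def watch_mount(path: str, interval_s: float = 0.5,
                iterations: Optional[int] = None) -> None:
    """Poll `path` and print a timestamp whenever its availability changes.

    Runs forever when `iterations` is None.
    """
    last = None
    count = 0
    while iterations is None or count < iterations:
        ok = mount_is_available(path)
        if ok != last:
            state = "available" if ok else "UNAVAILABLE"
            print(f"{datetime.now().isoformat()} {path} {state}")
            last = ok
        count += 1
        time.sleep(interval_s)
```

Usage, with a hypothetical path standing in for the directory the NAS source points at: `watch_mount("/mnt/datalake")`. A half-second interval is a guess; a "split second" outage may need a shorter one to be caught.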
When you say the “datasources” disappear, do you mean the datasets change from a purple icon back to a folder? That can happen if your source goes offline. If your files are only Parquet/JSON (no CSV), try the two settings below and see if it helps:
Uncheck the first flag, “Remove dataset definitions if underlying data is unavailable”, so that when the source goes offline, the formatting on the PDS (physical dataset) is not lost.
Check the flag “Automatically format files into physical datasets when users issue queries”, so that even if a PDS loses its formatting and turns back into a folder, simply querying it will promote it again automatically. The only caution: do not check this flag if you have CSVs, because a CSV gets promoted with the default options. For example, if a CSV file uses “|” (pipe) as its field delimiter, auto-promotion will cause problems, because it promotes with “,” (comma) as the field delimiter. Parquet and JSON have no formatting options, so the flag is safe if you only have these two formats.
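Before enabling the auto-format flag, it may help to confirm that the data lake really contains no CSVs, and to see concretely why a pipe-delimited file breaks under comma defaults. The sketch below is a minimal illustration using Python's standard `csv` module, not Dremio's own parser. `find_csv_files` and `parse_with_comma_default` are hypothetical helpers, and the scan assumes CSVs can be recognized by a `.csv` extension (other text files such as `.txt` or `.tsv` would not be caught).

```python
import csv
import io
import os
from typing import List


def find_csv_files(root: str) -> List[str]:
    """Return sorted paths of all files under `root` ending in .csv (any case)."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".csv"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def parse_with_comma_default(text: str) -> List[List[str]]:
    """Parse CSV text with the default ',' delimiter (illustration only)."""
    return list(csv.reader(io.StringIO(text)))


pipe_data = "id|name\n1|alice\n"

# With the comma default, each pipe-delimited line collapses into one column.
print(parse_with_comma_default(pipe_data))  # [['id|name'], ['1|alice']]

# With the correct delimiter, the columns split as intended.
print(list(csv.reader(io.StringIO(pipe_data), delimiter="|")))  # [['id', 'name'], ['1', 'alice']]
```

If `find_csv_files` on the mounted data lake returns an empty list, the directory holds no `.csv` files, which matches the Parquet/JSON-only case described above.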