NAS datalake - datasources disappear

Hi -

I’m running Dremio through the latest Docker container.

I’m mapping a datalake through the “NAS” option to a mounted drive (the drive is NTFS-formatted and comes from a Windows file server).

Everything works when I first create data sources from the Parquet files; however, after 1-2 hours the data sources randomly disappear.

I tried changing the reflection policy and the metadata policies. Nothing helped.

The mounted drive sometimes loses its connection (not something I can fix, but it comes back after a split second). I have a feeling that is the issue, but I also tried switching off the “Remove dataset definitions if underlying data is unavailable” option and that didn’t help either.
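To test the suspicion that the short drops are the cause, one option is to log each drop with a timestamp on the host and compare those times with when the datasets vanish. The sketch below is a hypothetical diagnostic, not part of Dremio or Docker: `mount_ok` and `watch_mount` are made-up names, and it assumes a Linux host with util-linux `mountpoint` and the share mounted at `/mnt/fileshare`.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers (not part of Dremio or Docker).
# Assumes a Linux host where the Windows share is mounted at /mnt/fileshare.

# Succeeds only if the path is an active mount point AND a directory
# listing actually returns (a dropped share usually fails the listing).
mount_ok() {
  local path="$1"
  mountpoint -q "$path" && ls "$path" > /dev/null 2>&1
}

# Check the mount COUNT times, INTERVAL seconds apart, print a timestamped
# line for every failed check, and finish with the total failure count.
watch_mount() {
  local path="$1" interval="${2:-1}" count="${3:-3600}"
  local i failures=0
  for ((i = 0; i < count; i++)); do
    if ! mount_ok "$path"; then
      failures=$((failures + 1))
      echo "$(date '+%Y-%m-%d %H:%M:%S') $path unavailable"
    fi
    sleep "$interval"
  done
  echo "failures=$failures"
}
```

For example, `watch_mount /mnt/fileshare 0.2 18000 >> mount_drops.log` checks roughly every 0.2 seconds for about an hour (fractional intervals work with GNU `sleep`). A 1-second interval could miss drops that last only a split second, which is why a shorter interval is suggested here.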

I’m not sure if it matters, but below is the docker command I use to run the container.

docker run --rm --privileged=true -p 9047:9047 -p 31011:31010 -p 45678:45678 -v /home/user/Desktop/dremio:/opt/dremio/data -v /mnt/fileshare:/mnt/fileshare-v --name dremioContainer dremio/dremio-oss:latest
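Because Dremio runs inside the container, a NAS source has to point at the container-side path of a bind mount, not the host path. As written, the second `-v` maps the host path `/mnt/fileshare` to `/mnt/fileshare-v` inside the container. The hypothetical helper below (`list_volumes` is a made-up name, not a Docker command) simply prints each `-v` pair from the argument list so the mapping is explicit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each -v host:container pair from a docker run
# argument list, so the in-container path of every bind mount is explicit.
list_volumes() {
  local prev="" arg
  for arg in "$@"; do
    if [ "$prev" = "-v" ]; then
      # Text before the first ":" is the host path; the rest is the
      # container path (plus any options such as ":ro").
      echo "host=${arg%%:*} container=${arg#*:}"
    fi
    prev="$arg"
  done
}

# The command from this post, passed as plain arguments (nothing is run).
list_volumes docker run --rm --privileged=true -p 9047:9047 -p 31011:31010 \
  -p 45678:45678 -v /home/user/Desktop/dremio:/opt/dremio/data \
  -v /mnt/fileshare:/mnt/fileshare-v --name dremioContainer dremio/dremio-oss:latest
# prints:
# host=/home/user/Desktop/dremio container=/opt/dremio/data
# host=/mnt/fileshare container=/mnt/fileshare-v
```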

Any help would be appreciated.

Regards

@cklar

When you say “datasources” disappear, do you mean the datasets change from a purple icon back to a folder icon? This can happen if your source goes offline. If your files are only Parquet/JSON and not CSV, try the two settings below and see if they help.

Uncheck the first flag, “Remove dataset definitions if underlying data is unavailable”, so that when the source goes offline, the formatting on the PDS (physical dataset) is not lost.

Check the flag “Automatically format files into physical datasets when users issue queries”, so that even if a PDS loses its formatting and turns back into a folder, simply querying the dataset will promote it again automatically. The only caution: do not check this flag if you have CSV files, because a CSV gets promoted with default options. For example, if a CSV file uses “|” (pipe) as its delimiter, auto-promotion will cause problems because it promotes with “,” (comma) as the field delimiter. Parquet and JSON have no formatting options, so the flag is safe if you only have these two formats.
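To see why the comma default breaks a pipe-delimited file, the plain `awk` sketch below (an illustration only, not how Dremio parses files internally) counts the columns a reader gets when it splits a made-up sample row on each delimiter. `count_fields` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical illustration (plain awk, not Dremio): count the columns a
# reader would see when splitting one line on a given field delimiter.
count_fields() {
  local delim="$1" line="$2"
  printf '%s\n' "$line" | awk -F"$delim" '{ print NF }'
}

sample='1|Alice|Berlin'        # made-up pipe-delimited row
count_fields ',' "$sample"     # comma, as auto-promotion would use: prints 1
count_fields '|' "$sample"     # the file's real pipe delimiter: prints 3
```

With the comma delimiter the whole row lands in a single column, which is the kind of mis-promoted dataset the caution above warns about.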

By “disappear” I mean that the virtual data sources built on the Parquet files from the NAS disappear.

I actually tried unchecking the “Remove dataset…” option prior to writing the post but that didn’t help.

One question regarding your second suggestion: does that mean Dremio will create local copies of all the Parquet files?

@cklar Dremio will not create copies unless you turn on reflections.

Okay, thank you!

Christian