I’m wondering if this configuration is advisable. Having multiple nodes read and write the same NAS volume seems like it could create a bottleneck. I was under the impression that this sort of thing should be done with distributed object storage like S3 or HDFS. Doesn’t a single shared volume defeat the point of distributed data, especially for reflections and acceleration?
@hmarchman-jones For distributed (dist) storage, we recommend using a data lake source such as S3, Azure Storage, Google Cloud Storage, or HDFS.
You can use NAS, but performance may be an issue.
Thank you for the response.
What about local storage on the nodes?
@hmarchman-jones With version 21.x and above, storing reflections on local disk is no longer supported.
So would your professional recommendation be to use cloud object storage or a local Hadoop cluster?
@hmarchman-jones Where is the data?
Both the data lake and the Dremio data stores are on the same NAS volume.
Also, can you point me toward documentation on storage configurations and what is supported?
@hmarchman-jones NAS is supported, but it needs to be performant. The file scheme for NAS is slightly different; please see the documentation.
@balaji.ramaswamy Thank you!
So is it correct for all nodes to be configured with dist pointing at the same absolute NAS path?
@hmarchman-jones Yes, it has to be the same on all executor nodes (the client-side mapping).
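A small sanity check can catch the two mistakes discussed here before restarting the cluster: a node whose `paths.dist` in dremio.conf points at a different NAS path, or one that is not using the NAS file scheme. This is a minimal sketch, assuming the NAS scheme meant above is `file:///<absolute mount path>` (verify against the Dremio storage documentation) and that you have already collected each node's `paths.dist` value into a dict; the function name `check_dist_paths` and the node names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse


def check_dist_paths(dist_by_node: dict[str, str]) -> list[str]:
    """Return human-readable problems found in per-node paths.dist values.

    Assumption: NAS-backed dist storage uses the file:/// scheme with an
    absolute mount path, and that path must be identical on every node.
    """
    problems = []
    paths = set()
    for node, uri in dist_by_node.items():
        parsed = urlparse(uri)
        if parsed.scheme != "file":
            problems.append(f"{node}: expected file scheme, got {parsed.scheme or 'none'!r}")
        if parsed.netloc:
            # "file://mnt/nas" parses "mnt" as a host; three slashes are needed
            problems.append(f"{node}: {uri!r} has a host part; did you mean file:///?")
        if not parsed.path.startswith("/"):
            problems.append(f"{node}: path {parsed.path!r} is not absolute")
        paths.add(parsed.path)
    if len(paths) > 1:
        problems.append(f"nodes disagree on the dist path: {sorted(paths)}")
    return problems


# Hypothetical example: executor-3 mounts the NAS at a different path.
nodes = {
    "executor-1": "file:///mnt/nas/dremio",
    "executor-2": "file:///mnt/nas/dremio",
    "executor-3": "file:///mnt/nas2/dremio",
}
for problem in check_dist_paths(nodes):
    print(problem)
```

Comparing only the parsed path (not the full string) lets the check ignore cosmetic differences while still flagging a node that mounts the share somewhere else.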
What about for paths.local? Specifically, we are having issues with RocksDB operations resulting in filesystem corruption.
@hmarchman-jones Is your RocksDB on local disk or NAS? Can you send us your dremio.conf (minus passwords), please?
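To answer the "local or NAS?" question on a Linux node, you can look up which mount contains the `paths.local` directory (where the RocksDB files live) and check its filesystem type. This is a minimal sketch, assuming Linux `/proc/mounts` format; the set of network filesystem types and the example paths are illustrative assumptions, not an exhaustive or Dremio-specific list.

```python
from __future__ import annotations

import os

# Illustrative (not exhaustive) set of network/shared filesystem types.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "smbfs", "glusterfs", "ceph", "fuse.sshfs"}


def parse_mounts(text: str) -> list[tuple[str, str]]:
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts encodes spaces in paths as \040
            mounts.append((fields[1].replace("\\040", " "), fields[2]))
    return mounts


def fs_type_for(path: str, mounts: list[tuple[str, str]]) -> str | None:
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.normpath(path)
    best, best_type = "", None
    for mount_point, fs_type in mounts:
        mp = os.path.normpath(mount_point)
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            # >= so a later (stacked) mount on the same point wins, as in /proc/mounts
            if len(mp) >= len(best):
                best, best_type = mp, fs_type
    return best_type


# Hypothetical mount table; on a real node, read open("/proc/mounts").read()
# and pass os.path.realpath(<your paths.local>) to resolve symlinks first.
sample = """/dev/sda1 / ext4 rw 0 0
nas01:/export/dremio /mnt/nas nfs4 rw 0 0
"""
mounts = parse_mounts(sample)
print(fs_type_for("/mnt/nas/dremio/db", mounts))  # nfs4
print(fs_type_for("/var/lib/dremio/db", mounts))  # ext4
```

If the result is in `NETWORK_FS_TYPES`, the RocksDB files are sitting on shared storage rather than a local disk, which is worth mentioning alongside the dremio.conf.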