We’ve been running Dremio for a while now, and since the weekend (with no changes or logins during that period) we’ve been seeing a strange issue. Almost every query fails with the following error:
FileNotFoundException: /data/nas/results/20c51ec7-a8e4-77e6-31fd-e09cf37c0c00/1_1_0.dremarrow1 (Remote I/O error)
The filename and query ID change, but the structure remains the same. Repeating the query several times sometimes yields results, but most of the time the job is marked as cancelled in the logs. The master node logs are not informative at all, and the executor node logs at debug level only show the job as cancelled.
Memory and CPU usage during these queries look ordinary, and so do the disk space and the shared NAS storage.
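For reference, a minimal sketch of a read check that could be run on each node against the results directory, to see whether the I/O error reproduces outside Dremio. The default path is taken from the error above; the function name `probe_results_dir` and the 1 MiB chunk size are assumptions for illustration, and this is not a Dremio tool.

```python
import errno
import os
import sys


def probe_results_dir(root, chunk_size=1024 * 1024):
    """Try to read every file under root; return (path, errno name, message) per failure."""
    failures = []

    def record(path, exc):
        # Map the numeric errno to its symbolic name (e.g. ENOENT, EIO, EREMOTEIO).
        code = errno.errorcode.get(exc.errno, str(exc.errno))
        failures.append((path, code, exc.strerror or str(exc)))

    def on_walk_error(exc):
        # os.walk silently skips unreadable directories unless onerror is given.
        record(exc.filename, exc)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Read the whole file in chunks so errors past the first block surface too.
                with open(path, "rb") as handle:
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                record(path, exc)
    return failures


if __name__ == "__main__":
    # Default path is an assumption based on the error message above.
    root = sys.argv[1] if len(sys.argv) > 1 else "/data/nas/results"
    for path, code, message in probe_results_dir(root):
        print(f"{path}: {code} ({message})")
```

If this reports errors on some nodes but not others, that would point at the NAS mount on those nodes rather than at Dremio itself.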
We had a chance to restore the virtual machines from Friday’s backup and everything seems to be working fine now, but we would like to know what the root cause of the problem is and whether there are less drastic measures we could take to prevent or solve it.
Thanks in advance.