We got the error below from a Dremio executor, and the executor no longer responds to any queries until it is restarted. Is there any “retry” mechanism for an executor fetching data from a remote location? Would it help to disable the “async read” mode? Thank you very much.
Our Dremio cluster is deployed in Azure using the arm-templates from Dremio cloud-tools.
ERROR c.d.plugins.azure.AzureStoragePlugin - Error reading HTTP response asynchronously.
java.util.concurrent.TimeoutException: null
at io.reactivex.internal.operators.single.SingleTimeout$TimeoutMainObserver.run(SingleTimeout.java:115) ~[rxjava-2.2.0.jar:na]
at io.reactivex.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:38) ~[rxjava-2.2.0.jar:na]
at io.reactivex.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:26) ~[rxjava-2.2.0.jar:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_222]
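To make the "retry" question concrete, here is a minimal sketch of the kind of behaviour being asked about: a read with a per-attempt timeout that is retried when the attempt times out. This is not Dremio code and says nothing about how the Azure plugin is implemented; the class `RetryingRead`, the method `readWithRetry`, and its parameters are hypothetical names used only for illustration. It uses only the Java standard library and stays Java 8 compatible.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only; not part of Dremio.
class RetryingRead {
    // Runs `read` with a per-attempt timeout and retries when an attempt
    // times out. Any other failure (ExecutionException) is not retried and
    // propagates to the caller.
    static <T> T readWithRetry(Callable<T> read, int maxAttempts, long timeoutMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            TimeoutException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = pool.submit(read);
                try {
                    // Wait at most timeoutMillis for this attempt.
                    return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    // Abandon the slow attempt and try again.
                    future.cancel(true);
                    last = e;
                }
            }
            // Every attempt timed out: surface the last timeout.
            throw last;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With something like this, a single slow response would cost one timeout and a retry rather than leaving the caller stuck, which is the behaviour we were hoping the executor already had.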
We disabled the async mode and things got much better.
However, we also use ADLS Gen2 as distributed storage, and I think it uses async mode by default. Can we switch that from async to sync while you work on improving this functionality?
If this error happens while querying a dataset on Azure storage, please turn off (uncheck) “Enable asynchronous access when possible” under the source's advanced options.
If it happens while writing reflections to Azure storage, add the line below to dremio.conf on all executors, restart them, and then retry.