Timeout: reading HTTP response asynchronously from Azure Storage

We got the error below from a Dremio executor, and this executor no longer responds to any queries until we restart it. Is there any "retry" mechanism for an executor fetching data from a remote location? Would it help to disable the "async read" mode? Thank you very much.

The Dremio cluster is deployed in Azure using the ARM templates from Dremio cloud-tools.

ERROR c.d.plugins.azure.AzureStoragePlugin - Error reading HTTP response asynchronously.
java.util.concurrent.TimeoutException: null
at io.reactivex.internal.operators.single.SingleTimeout$TimeoutMainObserver.run(SingleTimeout.java:115) ~[rxjava-2.2.0.jar:na]
at io.reactivex.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:38) ~[rxjava-2.2.0.jar:na]
at io.reactivex.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:26) ~[rxjava-2.2.0.jar:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_222]

Hi @dli16

You are right; we are working on some fixes/improvements for when async is on. Can you please turn async off (uncheck it) and retry?

Thanks
@balaji.ramaswamy

Thank you for your response. How can I disable async mode if Azure Storage (ADLS Gen2) is set up as the distributed storage?

@dli16

Can you please send us the profile of the job that failed with a timeout exception?

We disabled the async mode and things got much better.

However, we also use ADLS Gen2 as distributed storage, and I think it uses async mode by default. Can we switch that from async to sync while you work on improving this functionality?

Hello @balaji.ramaswamy,

After upgrading to Dremio 4.0.5, we still see the "timeout error" from com.dremio.plugins.azure.AzureStorageFileSystem$AzureAsyncReader.

It seems the code flows to https://github.com/dremio/dremio-oss/commit/94319dd5d92f488df3290a1eeb7054e9d3155f05#diff-26960976704dd5e031481495d38c69a3R378 without any retries.
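The thread does not show how Dremio's AzureAsyncReader handles timeouts internally, so the following is only a generic sketch, not Dremio's code: a bounded retry around an asynchronous read with a per-attempt timeout. It uses plain Java 8 `java.util.concurrent` rather than the RxJava timeout operator (`SingleTimeout`) seen in the stack trace. The class `RetryingAsyncRead`, the method `readWithRetry`, and its parameters are hypothetical names chosen for illustration, not Dremio settings.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryingAsyncRead {

    // Hypothetical helper (not Dremio code): starts an async read via readOnce,
    // waits at most timeoutMillis for it, and retries up to maxAttempts times
    // with a doubling backoff whenever an attempt times out.
    public static <T> T readWithRetry(Supplier<CompletableFuture<T>> readOnce,
                                      int maxAttempts,
                                      long timeoutMillis,
                                      long initialBackoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TimeoutException last = null;
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<T> future = readOnce.get();
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Abandon the stalled attempt; for a CompletableFuture this marks it
                // cancelled but does not interrupt whatever was producing the result.
                future.cancel(true);
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2;
                }
            }
        }
        // Every attempt timed out: surface the last TimeoutException to the caller.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated read: the first two attempts never complete (like a stalled
        // HTTP response); the third completes immediately.
        Supplier<CompletableFuture<String>> flakyRead = () ->
                calls.incrementAndGet() < 3
                        ? new CompletableFuture<String>()
                        : CompletableFuture.completedFuture("data");
        String result = readWithRetry(flakyRead, 5, 100, 50);
        System.out.println(result + " after " + calls.get() + " attempts"); // prints "data after 3 attempts"
    }
}
```

Capping the number of attempts is the key design choice here: a transient stall gets retried, while a store that stays unreachable still fails with a TimeoutException instead of leaving the caller waiting indefinitely.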

@dli16

If this error happens while querying a dataset on Azure Storage, kindly turn off (uncheck) "Enable asynchronous access when possible" under the source's advanced options.

If it happens while writing reflections to Azure Storage, add the line below to dremio.conf on all executors, restart them, and then retry:

debug.dist.async.enabled: false

Note: We are working on a fix for this.

Thanks
@balaji.ramaswamy

Thank you so much for your response, and thanks to the team for actively working on this issue.