Timeout reading Parquet file from S3

I have some queries that are able to connect to S3 to check the accelerations. However, I have one particular query that seems to time out. Is there a config setting I can use to increase the timeout?

  (java.io.InterruptedIOException) Reopen at position 8013365 on s3a://staging-dremio/storage/accelerator/6baae6fd-2600-476a-9009-7f68343ffb77/d1b5100e-8890-4b9b-ae59-aef231abfa4f/0_0_0.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    org.apache.hadoop.fs.s3a.S3AUtils.translateException():125
    org.apache.hadoop.fs.s3a.S3AInputStream.reopen():155
    org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek():281
    org.apache.hadoop.fs.s3a.S3AInputStream.read():364
    java.io.DataInputStream.read():149
    com.dremio.exec.store.dfs.FSDataInputStreamWrapper$WrappedInputStream.read():247
    com.dremio.exec.store.dfs.FSDataInputStreamWithStatsWrapper$WrappedInputStream.read():127
    java.io.DataInputStream.read():100
    org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf():109
    com.dremio.parquet.pages.BaseReaderIterator.readFully():157
    com.dremio.parquet.pages.SmartPRI.getPage():100
    com.dremio.parquet.pages.MemoizingPageIterator.getPage():41
    com.dremio.parquet.pages.PageIterator.nextPage():118
    com.dremio.parquet.pages.PageIterator.hasNextPage():63
    com.dremio.parquet.reader.column.generics.BitSimpleReader.evalNextBatch():85
    com.dremio.parquet.reader.SimpleRowGroupReader.eval():39
    com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.next():296
    com.dremio.exec.store.parquet.UnifiedParquetReader.next():220
    com.dremio.sabot.op.scan.ScanOperator.outputData():208
    com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():518
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():82
    com.dremio.sabot.driver.Pipeline.pumpOnce():72
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():291
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():287
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1807
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():244
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$800():84
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():580
    com.dremio.sabot.task.AsyncTaskWrapper.run():107
    com.dremio.sabot.task.slicing.SlicingThread.run():71
  Caused By (com.amazonaws.SdkClientException) Unable to execute HTTP request: Timeout waiting for connection from pool
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException():1069
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper():1035
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute():742
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer():716
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute():699
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500():667
    com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute():649
    com.amazonaws.http.AmazonHttpClient.execute():513
    com.amazonaws.services.s3.AmazonS3Client.invoke():4221
    com.amazonaws.services.s3.AmazonS3Client.invoke():4168
    com.amazonaws.services.s3.AmazonS3Client.getObject():1378
    org.apache.hadoop.fs.s3a.S3AInputStream.reopen():148
    org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek():281
    org.apache.hadoop.fs.s3a.S3AInputStream.read():364
    java.io.DataInputStream.read():149
    com.dremio.exec.store.dfs.FSDataInputStreamWrapper$WrappedInputStream.read():247
    com.dremio.exec.store.dfs.FSDataInputStreamWithStatsWrapper$WrappedInputStream.read():127
    java.io.DataInputStream.read():100
    org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf():109
    com.dremio.parquet.pages.BaseReaderIterator.readFully():157
    com.dremio.parquet.pages.SmartPRI.getPage():100
    com.dremio.parquet.pages.MemoizingPageIterator.getPage():41
    com.dremio.parquet.pages.PageIterator.nextPage():118
    com.dremio.parquet.pages.PageIterator.hasNextPage():63
    com.dremio.parquet.reader.column.generics.BitSimpleReader.evalNextBatch():85
    com.dremio.parquet.reader.SimpleRowGroupReader.eval():39
    com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.next():296
    com.dremio.exec.store.parquet.UnifiedParquetReader.next():220
    com.dremio.sabot.op.scan.ScanOperator.outputData():208
    com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():518
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():82
    com.dremio.sabot.driver.Pipeline.pumpOnce():72
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():291
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():287
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1807
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():244
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$800():84
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():580
    com.dremio.sabot.task.AsyncTaskWrapper.run():107
    com.dremio.sabot.task.slicing.SlicingThread.run():71
  Caused By (org.apache.http.conn.ConnectionPoolTimeoutException) Timeout waiting for connection from pool
    org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection():286
    org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get():263
    sun.reflect.GeneratedMethodAccessor8.invoke():-1
    sun.reflect.DelegatingMethodAccessorImpl.invoke():43
    java.lang.reflect.Method.invoke():498
    com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke():70
    com.amazonaws.http.conn.$Proxy46.get():-1
    org.apache.http.impl.execchain.MainClientExec.execute():190
    org.apache.http.impl.execchain.ProtocolExec.execute():184
    org.apache.http.impl.client.InternalHttpClient.doExecute():184
    org.apache.http.impl.client.CloseableHttpClient.execute():82
    org.apache.http.impl.client.CloseableHttpClient.execute():55
    com.amazonaws.http.apache.client.impl.SdkHttpClient.execute():72
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest():1190
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper():1030
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute():742
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer():716
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute():699
    com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500():667
    com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute():649
    com.amazonaws.http.AmazonHttpClient.execute():513
    com.amazonaws.services.s3.AmazonS3Client.invoke():4221
    com.amazonaws.services.s3.AmazonS3Client.invoke():4168
    com.amazonaws.services.s3.AmazonS3Client.getObject():1378
    org.apache.hadoop.fs.s3a.S3AInputStream.reopen():148
    org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek():281
    org.apache.hadoop.fs.s3a.S3AInputStream.read():364
    java.io.DataInputStream.read():149
    com.dremio.exec.store.dfs.FSDataInputStreamWrapper$WrappedInputStream.read():247
    com.dremio.exec.store.dfs.FSDataInputStreamWithStatsWrapper$WrappedInputStream.read():127
    java.io.DataInputStream.read():100
    org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf():109
    com.dremio.parquet.pages.BaseReaderIterator.readFully():157
    com.dremio.parquet.pages.SmartPRI.getPage():100
    com.dremio.parquet.pages.MemoizingPageIterator.getPage():41
    com.dremio.parquet.pages.PageIterator.nextPage():118
    com.dremio.parquet.pages.PageIterator.hasNextPage():63
    com.dremio.parquet.reader.column.generics.BitSimpleReader.evalNextBatch():85
    com.dremio.parquet.reader.SimpleRowGroupReader.eval():39
    com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.next():296
    com.dremio.exec.store.parquet.UnifiedParquetReader.next():220
    com.dremio.sabot.op.scan.ScanOperator.outputData():208
    com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():518
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():82
    com.dremio.sabot.driver.Pipeline.pumpOnce():72
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():291
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():287
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1807
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():244
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$800():84
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():580
    com.dremio.sabot.task.AsyncTaskWrapper.run():107
    com.dremio.sabot.task.slicing.SlicingThread.run():71

Bumping this to see if anyone knows what's up.

Hey there,

Have you checked out our docs: https://docs.dremio.com/data-sources/s3.html

They describe the setting you need: fs.s3a.connection.maximum. It controls the maximum number of simultaneous connections in the S3A connection pool, and the error you're seeing ("Timeout waiting for connection from pool") means the client gave up waiting for a free connection from that pool, so raising it should help.
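
As a minimal sketch, here is what that looks like in core-site.xml form (depending on your deployment it can also be entered as a connection property in the source's advanced settings; the value 100 is illustrative, not a tuned recommendation):

    <property>
      <!-- Maximum number of simultaneous S3 connections in the S3A pool.
           Raise this when you see "Timeout waiting for connection from pool". -->
      <name>fs.s3a.connection.maximum</name>
      <value>100</value>
    </property>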

Christy

I have seen this, thanks. However, this isn't for a data source; it's a timeout reading an acceleration (reflection) file stored on S3.

This could potentially be a similar issue to this thread: Reflection in AWS S3 is slow? store in EBS?

We are currently tracking an internal improvement for S3 storage. However, it may be worth checking on your side that your Dremio cluster and the S3 bucket holding the reflections are in the same (or at least a nearby) AWS region, for improved network topology.
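
In case it helps others hitting this on the acceleration store rather than a source: because the reflection store is pointed at S3 via the paths.dist entry in dremio.conf (an s3a:// URI), the property from Christy's reply has to be supplied in a core-site.xml in Dremio's conf directory on every coordinator and executor node. A minimal sketch, assuming static credentials (all values are illustrative placeholders):

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>fs.s3a.access.key</name>
        <value>ACCESS_KEY_HERE</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>SECRET_KEY_HERE</value>
      </property>
      <property>
        <!-- Enlarge the S3A connection pool so concurrent Parquet page reads
             are less likely to starve waiting for a free connection. -->
        <name>fs.s3a.connection.maximum</name>
        <value>100</value>
      </property>
    </configuration>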

This post may be related to that one: Parquet metadata error - is Parquet v2.0 file format supported?