DATA_READ ERROR: AsyncTimeoutException

I am getting this error on the column primary_key.

The column is an MD5() column

00:19:58    Runtime Error in model fct_daily_hk_stock_shareholding_history (models/l1-hk-trading/common/fact/stock_shareholding/fct_daily_hk_stock_shareholding_history.sql)
  ERROR: DATA_READ ERROR: Failed to decode column primary_key::varchar
 
  Total records decoded and sent upstream 1023744
  Normal value encoded pages read 410
  DICTIONARY encoded pages read 0
  Current page encoding PLAIN
  Total records decoded in current page and sent upstream after passing filter 3968
  File path /finance/l0/hkex_ccass_participant_stock_shareholding_history/195ae62a-35e0-e624-3161-336394cb7d00/2_10_0.parquet
  Rowgroup index 0
  SqlOperatorImpl TABLE_FUNCTION
  Location 2:6:11
  Fragment 2:0
 
  [Error Id: 6d21ce02-9c8b-4f8b-b3d1-2a91b627ad54 on dremio-executor-0.dremio-cluster-pod.dremio-v2-prod.svc.cluster.local:0]
 
    (java.io.IOException) com.dremio.io.AsyncByteReaderWithTimeout$AsyncTimeoutException
      com.dremio.parquet.pages.async.SlidingWindowReader.readFully():231
      com.dremio.parquet.pages.BaseReaderIterator.readFully():249
      com.dremio.parquet.pages.IncrementalPageReaderIterator.getPage():180
      com.dremio.parquet.pages.MemoizingPageIterator.getPage():45
      com.dremio.parquet.pages.PageIterator.nextPage():184
      com.dremio.parquet.pages.PageIterator.hasNextPage():110
      com.dremio.parquet.reader.column.generics.VarCharSimpleReader.evalNextBatch():142
      com.dremio.parquet.reader.SimpleRowGroupReader.eval():36
      com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.next():1098
      com.dremio.exec.store.parquet.UnifiedParquetReader.readEnsuringReadersReturnSameNumberOfRecords():447
      com.dremio.exec.store.parquet.UnifiedParquetReader.next():409
      com.dremio.exec.store.parquet.TransactionalTableParquetReader.next():200
      com.dremio.exec.store.parquet.ParquetCoercionReader.next():138
      com.dremio.exec.store.parquet.ScanTableFunction.processRow():217
      com.dremio.sabot.op.tablefunction.TableFunctionOperator.outputData():114
      com.dremio.sabot.driver.SmartOp$SmartSingleInput.outputData():243
      com.dremio.sabot.driver.StraightPipe.pump():55
      com.dremio.sabot.driver.Pipeline.doPump():134
      com.dremio.sabot.driver.Pipeline.pumpOnce():124
      com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():647
      com.dremio.sabot.exec.fragment.FragmentExecutor.run():556
      com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():1213
      com.dremio.sabot.task.AsyncTaskWrapper.run():130
      com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():278
      com.dremio.sabot.task.slicing.SlicingThread.run():185
    Caused By (com.dremio.io.AsyncByteReaderWithTimeout.AsyncTimeoutException) null
      com.dremio.io.AsyncByteReaderWithTimeout.lambda$within$1():84
      java.util.concurrent.FutureTask.run():264
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run():304
      java.util.concurrent.ThreadPoolExecutor.runWorker():1128
      java.util.concurrent.ThreadPoolExecutor$Worker.run():628
      java.lang.Thread.run():829

Attached is the profile.

26b59923-8114-4875-a1ce-3bd149be7f0c.zip (88.9 KB)

======================

Is there a configuration setting I can use to extend the timeout here?

dremio-oss/common/legacy/src/main/java/com/dremio/io/AsyncByteReaderWithTimeout.java at commit 12d7a954966ee5782921a4fbe32cc4568cbf631b (dremio/dremio-oss on GitHub)

I more or less fixed the issue by unchecking "Enable asynchronous access when possible".

At least the error no longer shows up after that change.

I found the root cause: I had not allocated enough memory to the VM running MinIO, the object store that holds the Iceberg tables read by Dremio.

After increasing the VM’s memory from 4GB to 8GB, the error is gone.

@Ken The "Failed to decode" message is just the top-level exception. Please enable async access again, because leaving it off will cause performance issues on large scans, as it incurs wait times. The reason you got the error is that MinIO did not respond to a scan request within 5 seconds. Also, 8 GB of memory is very low for an executor; queries may run out of memory with 8 GB of RAM.
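To illustrate the mechanism described above, here is a minimal Python sketch of an asynchronous read that is abandoned when the storage backend does not answer in time. It is not Dremio's implementation: `AsyncReadTimeout`, `slow_object_store_read`, `read_with_timeout` and `demo` are hypothetical names, and the 5-second default is simply the figure quoted in the reply.

```python
import asyncio

# Assumed value, taken from the reply above; not read from any Dremio setting.
READ_TIMEOUT_SECONDS = 5.0


class AsyncReadTimeout(IOError):
    """Hypothetical stand-in for the AsyncTimeoutException seen in the log."""


async def slow_object_store_read(delay_seconds: float, payload: bytes) -> bytes:
    # Simulates an object store (for example MinIO) that needs
    # delay_seconds before it returns the requested bytes.
    await asyncio.sleep(delay_seconds)
    return payload


async def read_with_timeout(read_coro, timeout_seconds: float = READ_TIMEOUT_SECONDS) -> bytes:
    # Wait for the read, but give up once timeout_seconds has elapsed
    # and surface the failure as an I/O error to the caller.
    try:
        return await asyncio.wait_for(read_coro, timeout=timeout_seconds)
    except asyncio.TimeoutError as exc:
        raise AsyncReadTimeout(f"no response within {timeout_seconds} s") from exc


def demo(delay_seconds: float, timeout_seconds: float) -> str:
    # Returns "ok" when the simulated store answers in time, "timeout" otherwise.
    try:
        asyncio.run(
            read_with_timeout(slow_object_store_read(delay_seconds, b"page"), timeout_seconds)
        )
        return "ok"
    except AsyncReadTimeout:
        return "timeout"
```

In this sketch, `demo(0.01, 0.2)` completes normally, while `demo(0.5, 0.05)` hits the timeout. A memory-starved object store behaves like the second case: the data is intact, but the response arrives too late and the reader reports an I/O failure.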

Thanks @balaji.ramaswamy!

My Dremio executor has always had at least 32GB to work with.


What I meant was the VM hosting MinIO, the object store that Dremio reads its data from.

After I increased the MinIO VM’s memory to 8GB, the error went away.

Based on your explanation, I suppose MinIO now responds faster because it has more memory available.

I have also re-enabled the async access option.

It is all good now.

Thanks!