Errors reading parquet files

We are seeing errors (see below) when queries/reflections try to read particular partitions on S3.
If I filter those partitions out of the queries, they run successfully.

The only difference I am noticing is that the failing partitions have larger files than the working ones (~500 MB compared to ~200 MB).
Does Dremio have some sort of limit on file sizes? Reading these files with other tools (Thrift, Trino, etc.) works just fine.
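For reference, here is roughly how I checked the files outside Dremio. This is just a minimal sketch with pyarrow (not one of the tools listed above, but the same idea), and the paths are hypothetical placeholders for local copies pulled down from S3:

```python
# Compare Parquet footer metadata for a failing vs. a working file.
# The paths below are placeholders for local copies of the S3 objects.
import pyarrow.parquet as pq

files = {
    "failing": "failing_partition/part-00000.parquet",
    "working": "working_partition/part-00000.parquet",
}

for label, path in files.items():
    md = pq.ParquetFile(path).metadata
    print(f"{label}: rows={md.num_rows}, row_groups={md.num_row_groups}, "
          f"columns={md.num_columns}, created_by={md.created_by}")
    # The full read also succeeds outside Dremio:
    table = pq.read_table(path)
    print(f"{label}: read back {table.num_rows} rows without errors")
```

I included num_row_groups in the comparison since the exception below comes out of a row-group lookup; both files read back cleanly this way.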

If you have any idea of what is going on here, advice is greatly appreciated.


```
SqlOperatorImpl PARQUET_ROW_GROUP_SCAN
Location 2:12:4
Fragment 2:0

[Error Id: f3ec9ab1-5101-4ced-9862-306514879705 on 64f5aa287589:0]

  (java.lang.IndexOutOfBoundsException) Index: 2, Size: 1
    java.util.ArrayList.rangeCheck():659
    java.util.ArrayList.get():435
    com.dremio.exec.store.parquet.ParquetSplitReaderCreator.lambda$createInputStreamProvider$3():211
    com.dremio.exec.store.dfs.SplitReaderCreator.handleEx():165
    com.dremio.exec.store.parquet.ParquetSplitReaderCreator.createInputStreamProvider():195
    com.dremio.exec.store.parquet.ParquetSplitReaderCreator.lambda$new$0():96
    com.dremio.exec.store.parquet.ParquetSplitReaderCreator.createRecordReader():234
    com.dremio.exec.store.dfs.PrefetchingIterator.next():65
    com.dremio.exec.store.dfs.PrefetchingIterator.next():37
    com.dremio.sabot.op.scan.ScanOperator.outputData():352
    com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():521
    com.dremio.sabot.driver.StraightPipe.pump():56
    com.dremio.sabot.driver.Pipeline.doPump():108
    com.dremio.sabot.driver.Pipeline.pumpOnce():98
    com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():345
    com.dremio.sabot.exec.fragment.FragmentExecutor.run():294
    com.dremio.sabot.exec.fragment.FragmentExecutor.access$1600():94
    com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():747
    com.dremio.sabot.task.AsyncTaskWrapper.run():112
    com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop():243
    com.dremio.sabot.task.slicing.SlicingThread.run():171


SqlOperatorImpl PARQUET_ROW_GROUP_SCAN
Location 2:12:4
Fragment 2:0

...(:0)
com.dremio.exec.store.parquet.ParquetSplitReaderCreator(ParquetSplitReaderCreator.java:211)
com.dremio.exec.store.dfs.SplitReaderCreator(SplitReaderCreator.java:165)
com.dremio.exec.store.parquet.ParquetSplitReaderCreator(ParquetSplitReaderCreator.java:195)
com.dremio.exec.store.parquet.ParquetSplitReaderCreator(ParquetSplitReaderCreator.java:96)
com.dremio.exec.store.parquet.ParquetSplitReaderCreator(ParquetSplitReaderCreator.java:234)
com.dremio.exec.store.dfs.PrefetchingIterator(PrefetchingIterator.java:65)
com.dremio.exec.store.dfs.PrefetchingIterator(PrefetchingIterator.java:37)
com.dremio.sabot.op.scan.ScanOperator(ScanOperator.java:352)
com.dremio.sabot.driver.SmartOp$SmartProducer(SmartOp.java:521)
com.dremio.sabot.driver.StraightPipe(StraightPipe.java:56)
com.dremio.sabot.driver.Pipeline(Pipeline.java:108)
com.dremio.sabot.driver.Pipeline(Pipeline.java:98)
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper(FragmentExecutor.java:345)
com.dremio.sabot.exec.fragment.FragmentExecutor(FragmentExecutor.java:294)
com.dremio.sabot.exec.fragment.FragmentExecutor(FragmentExecutor.java:94)
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl(FragmentExecutor.java:747)
com.dremio.sabot.task.AsyncTaskWrapper(AsyncTaskWrapper.java:112)
com.dremio.sabot.task.slicing.SlicingThread(SlicingThread.java:243)
com.dremio.sabot.task.slicing.SlicingThread(SlicingThread.java:171)
```

I am not seeing these errors anymore. No changes were made; it just started working out of the blue…
I am still interested in knowing what could have caused this issue.

@GoldenGoose

Is it possible the files that Dremio was reading were regenerated?

@balaji.ramaswamy No, our files are only written to once; consider them as if they were in WORM storage.

I tested this on version 20.0.0-202201050826310141-8cc7162b and am not able to replicate it there; however, I am able to replicate it on version 20.1.0-202202061055110045-36733c65 on a completely new cluster.
We are in the process of downgrading our prod cluster now.

@GoldenGoose Do you have both profiles: the successful one from 20.0 and the failed one from 20.1?