Dremio cannot read Parquet produced by Arrow

Hi there,

It appears that Dremio 4.1.8 cannot read a Parquet file produced by Arrow 0.15.0 when the file contains array columns. See the error below.

2020-03-27 15:19:23,071 [FABRIC-rpc-event-queue] INFO  c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 2181e684-e42b-9e78-7a9c-e7350dccee00:0:0
2020-03-27 15:19:23,104 [e0 - 2181e684-e42b-9e78-7a9c-e7350dccee00:frag:0:0] INFO  c.d.p.r.c.g.ColumnDecodingTracer - User Error Occurred [ErrorId: 2b018206-03be-49ab-8c54-67b4b1c478be]
com.dremio.common.exceptions.UserException: Failed to decode column bigint::int64
        at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:776) ~[dremio-common-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.parquet.reader.column.generics.ColumnDecodingTracer.addStatusAndPrepareException(ColumnDecodingTracer.java:95) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.parquet.reader.column.generics.BigIntSimpleReader.evalNextBatch(BigIntSimpleReader.java:129) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.parquet.reader.column.generics.UnionColumnReaderWrapper.evalNextBatch(UnionColumnReaderWrapper.java:33) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.parquet.reader.SimpleRowGroupReader.eval(SimpleRowGroupReader.java:39) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.next(ParquetVectorizedReader.java:297) [dremio-ce-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.exec.store.parquet.UnifiedParquetReader.next(UnifiedParquetReader.java:237) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.op.scan.ScanOperator.outputData(ScanOperator.java:236) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:521) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:56) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:109) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:99) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:336) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:285) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1200(FragmentExecutor.java:92) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:674) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:104) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:226) [dremio-ce-sabot-scheduler-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:156) [dremio-ce-sabot-scheduler-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
Caused by: java.lang.RuntimeException: Error decoding column bigint::int64 at index 0.
        at com.dremio.parquet.reader.column.generics.BigIntSimpleDictionaryPageReader.read(BigIntSimpleDictionaryPageReader.java:160) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.parquet.reader.column.generics.BigIntSimpleReader.evalNextBatch(BigIntSimpleReader.java:108) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        ... 16 common frames omitted
Caused by: java.lang.IndexOutOfBoundsException: index: 16, length: 8 (expected: range(0, 16))
        at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335) ~[arrow-memory-0.15.0-20200308131102-f6b34ee198-dremio.jar:4.1.45.Final]
        at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322) ~[arrow-memory-0.15.0-20200308131102-f6b34ee198-dremio.jar:4.1.45.Final]
        at io.netty.buffer.ArrowBuf.getLong(ArrowBuf.java:348) ~[arrow-memory-0.15.0-20200308131102-f6b34ee198-dremio.jar:4.1.45.Final]
        at com.dremio.parquet.reader.column.generics.ValueWriters.readWriteValue(ValueWriters.java:124) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.parquet.reader.column.generics.BigIntSimpleDictionaryPageReader.writeFromRLEFromValuesDecoder(BigIntSimpleDictionaryPageReader.java:328) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.parquet.reader.column.generics.BigIntSimpleDictionaryPageReader.writeFromRLE(BigIntSimpleDictionaryPageReader.java:297) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        at com.dremio.parquet.reader.column.generics.BigIntSimpleDictionaryPageReader.read(BigIntSimpleDictionaryPageReader.java:146) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
        ... 17 common frames omitted
2020-03-27 15:19:23,134 [out-of-band-observer] INFO  query.logger - {"queryId":"2181e684-e42b-9e78-7a9c-e7350dccee00","schema":"[NAS]","queryText":"SELECT * FROM \"parquet_cpp_example2.parquet\"","start":1585322362958,"finish":1585322363127,"outcome":"FAILED","username":"akravchenko"}
2020-03-27 15:19:23,154 [FABRIC-rpc-event-queue] WARN  c.d.exec.work.foreman.AttemptManager - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
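
For what it's worth, the innermost exception looks like an 8-byte read at byte offset 16 of a 16-byte buffer, i.e. one slot past the end. Since the frames go through BigIntSimpleDictionaryPageReader, my guess (only a guess) is that the buffer is a dictionary holding two int64 values and the decoder produced dictionary id 2. The Python sketch below only illustrates that arithmetic; it is not Dremio or Arrow code, and `read_int64`, `lookup` and the two-entry dictionary are hypothetical names and data made up for the example.

```python
import struct


def read_int64(buf: bytes, index: int) -> int:
    """Read a little-endian int64 at byte offset `index`.

    The bounds check and its message are modelled on the log line above;
    this is an illustration, not the actual ArrowBuf implementation.
    """
    length = 8
    if index < 0 or index + length > len(buf):
        raise IndexError(
            f"index: {index}, length: {length} (expected: range(0, {len(buf)}))"
        )
    return struct.unpack_from("<q", buf, index)[0]


# Hypothetical dictionary page: two int64 values, 16 bytes in total.
dictionary = struct.pack("<2q", 10, 20)


def lookup(dictionary_id: int) -> int:
    """Resolve a dictionary id to its int64 value (8 bytes per entry)."""
    return read_int64(dictionary, dictionary_id * 8)


print(lookup(0))  # first entry
print(lookup(1))  # second entry
try:
    lookup(2)  # id 2 maps to byte offset 16, past the end of the buffer
except IndexError as exc:
    print(exc)
```

If that reading is right, the decoded ids and the dictionary size disagree for this column, which might point at how the repetition/definition levels of the array column are handled, but I have not verified that.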

Here is a minimal Parquet file (created with https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer2.cc) that reproduces the issue: https://www.dropbox.com/s/swd3s98v0onavlj/parquet_cpp_example2.parquet?dl=0 .

Also, I suspect that the Arrow 0.15.0 memory leak reported in ARROW-6874 is related to this issue and affects Dremio 4.1.8.

Apache Spark 2.4.4 reads this Parquet file without issue.

Anton