Hi there,
It appears that Dremio 4.1.8 cannot read a Parquet file produced by Arrow 0.15.0 if the file contains array columns. See the error below.
2020-03-27 15:19:23,071 [FABRIC-rpc-event-queue] INFO c.d.sabot.exec.FragmentExecutors - Received remote fragment start instruction for 2181e684-e42b-9e78-7a9c-e7350dccee00:0:0
2020-03-27 15:19:23,104 [e0 - 2181e684-e42b-9e78-7a9c-e7350dccee00:frag:0:0] INFO c.d.p.r.c.g.ColumnDecodingTracer - User Error Occurred [ErrorId: 2b018206-03be-49ab-8c54-67b4b1c478be]
com.dremio.common.exceptions.UserException: Failed to decode column bigint::int64
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:776) ~[dremio-common-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.parquet.reader.column.generics.ColumnDecodingTracer.addStatusAndPrepareException(ColumnDecodingTracer.java:95) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.parquet.reader.column.generics.BigIntSimpleReader.evalNextBatch(BigIntSimpleReader.java:129) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.parquet.reader.column.generics.UnionColumnReaderWrapper.evalNextBatch(UnionColumnReaderWrapper.java:33) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.parquet.reader.SimpleRowGroupReader.eval(SimpleRowGroupReader.java:39) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.extra.exec.store.dfs.parquet.ParquetVectorizedReader.next(ParquetVectorizedReader.java:297) [dremio-ce-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.exec.store.parquet.UnifiedParquetReader.next(UnifiedParquetReader.java:237) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.op.scan.ScanOperator.outputData(ScanOperator.java:236) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:521) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:56) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:109) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:99) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:336) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:285) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$1200(FragmentExecutor.java:92) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:674) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:104) [dremio-sabot-kernel-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.task.slicing.SlicingThread.mainExecutionLoop(SlicingThread.java:226) [dremio-ce-sabot-scheduler-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:156) [dremio-ce-sabot-scheduler-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
Caused by: java.lang.RuntimeException: Error decoding column bigint::int64 at index 0.
at com.dremio.parquet.reader.column.generics.BigIntSimpleDictionaryPageReader.read(BigIntSimpleDictionaryPageReader.java:160) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.parquet.reader.column.generics.BigIntSimpleReader.evalNextBatch(BigIntSimpleReader.java:108) [dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
... 16 common frames omitted
Caused by: java.lang.IndexOutOfBoundsException: index: 16, length: 8 (expected: range(0, 16))
at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335) ~[arrow-memory-0.15.0-20200308131102-f6b34ee198-dremio.jar:4.1.45.Final]
at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322) ~[arrow-memory-0.15.0-20200308131102-f6b34ee198-dremio.jar:4.1.45.Final]
at io.netty.buffer.ArrowBuf.getLong(ArrowBuf.java:348) ~[arrow-memory-0.15.0-20200308131102-f6b34ee198-dremio.jar:4.1.45.Final]
at com.dremio.parquet.reader.column.generics.ValueWriters.readWriteValue(ValueWriters.java:124) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.parquet.reader.column.generics.BigIntSimpleDictionaryPageReader.writeFromRLEFromValuesDecoder(BigIntSimpleDictionaryPageReader.java:328) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.parquet.reader.column.generics.BigIntSimpleDictionaryPageReader.writeFromRLE(BigIntSimpleDictionaryPageReader.java:297) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
at com.dremio.parquet.reader.column.generics.BigIntSimpleDictionaryPageReader.read(BigIntSimpleDictionaryPageReader.java:146) ~[dremio-ce-parquet-plugin-4.1.8-202003120636020140-9c2a6b13.jar:4.1.8-202003120636020140-9c2a6b13]
... 17 common frames omitted
2020-03-27 15:19:23,134 [out-of-band-observer] INFO query.logger - {"queryId":"2181e684-e42b-9e78-7a9c-e7350dccee00","schema":"[NAS]","queryText":"SELECT * FROM \"parquet_cpp_example2.parquet\"","start":1585322362958,"finish":1585322363127,"outcome":"FAILED","username":"akravchenko"}
2020-03-27 15:19:23,154 [FABRIC-rpc-event-queue] WARN c.d.exec.work.foreman.AttemptManager - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
Here is a minimal Parquet file that reproduces the issue (created with https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer2.cc): https://www.dropbox.com/s/swd3s98v0onavlj/parquet_cpp_example2.parquet?dl=0
I also suspect that the Arrow 0.15.0 memory leak reported in ARROW-6874 is related to this issue and affects Dremio 4.1.8.
Apache Spark 2.4.4 reads this Parquet file without issue.
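For reference, the innermost exception ("index: 16, length: 8 (expected: range(0, 16))") is consistent with an 8-byte read at byte offset 16 of a 16-byte buffer, i.e. asking for a third int64 from a buffer that holds only two. The sketch below just reproduces that arithmetic in plain Python. It is not Dremio or Arrow code: get_long and read_entry are hypothetical helpers, the buffer contents are made up, and the idea that the buffer is the column's dictionary is an assumption based only on the reader class names in the trace.

```python
import struct

INT64_WIDTH = 8  # bytes per BIGINT (int64) value


def get_long(buf: bytes, index: int) -> int:
    """Read a little-endian int64 at byte offset `index`.

    Hypothetical stand-in for a bounds-checked buffer read; the error text
    is modelled on the message in the log above.
    """
    if index < 0 or index + INT64_WIDTH > len(buf):
        raise IndexError(
            f"index: {index}, length: {INT64_WIDTH} "
            f"(expected: range(0, {len(buf)}))"
        )
    return struct.unpack_from("<q", buf, index)[0]


def read_entry(buf: bytes, entry: int) -> int:
    """Read the int64 stored at position `entry` (byte offset entry * 8)."""
    return get_long(buf, entry * INT64_WIDTH)


# Made-up 16-byte buffer holding exactly two int64 values.
buf = struct.pack("<2q", 10, 20)

first = read_entry(buf, 0)   # byte offset 0, in range
second = read_entry(buf, 1)  # byte offset 8, in range

# Position 2 maps to byte offset 16, one value past the end of the buffer.
try:
    read_entry(buf, 2)
    message = None
except IndexError as exc:
    message = str(exc)

print(message)
```

If that reading is right, the decoder is asking for an entry position one past what the buffer holds, which would point at how index values or levels are interpreted for this file rather than at the file being truncated. That is only a guess from the trace.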
Anton