IndexOutOfBoundsException while reading a few parquet tables

Our Dremio setup is as follows:
Dremio build: 2.0.5-201806021755260067-767cfb5-mapr, Community Edition

We are able to fetch data from several Hive external tables with underlying parquet data in HDFS. However, when querying a few Hive external tables we see an IndexOutOfBoundsException being thrown; stack trace details are below. On the same tables that produce this exception, count(*) works fine, and the same queries run successfully from the Hive shell. Please let us know if any further information is needed to rectify this issue.
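To make the symptom concrete, here is a minimal sketch of the behavior we see (table and column names are hypothetical placeholders, not our real ones):

-- Succeeds on the affected tables:
SELECT count(*) FROM mydb.my_table;

-- Fails with "Failed to read data from parquet file",
-- caused by java.lang.IndexOutOfBoundsException (stack trace below):
SELECT * FROM mydb.my_table;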

Questions:

  1. Are there any memory buffer configurations that could overcome this issue?
  2. Could a mismatch between the parquet jar versions used by Dremio and those used to write the parquet files underlying the Hive external table cause this issue?

2018-09-05 15:21:14,014 [e0 - 24700b96-5149-d5b1-b210-df1fcef3aa00:frag:0:0] INFO c.d.e.s.parquet2.ParquetRowiseReader - User Error Occurred [ErrorId: f763bb78-bfde-4ed7-b69a-ecacdeabe072]
com.dremio.common.exceptions.UserException: Failed to read data from parquet file
at com.dremio.common.exceptions.UserException$Builder.build(UserException.java:746) ~[dremio-common-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.exec.store.parquet2.ParquetRowiseReader.next(ParquetRowiseReader.java:380) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.exec.store.parquet.UnifiedParquetReader.next(UnifiedParquetReader.java:220) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.exec.store.hive.exec.FileSplitParquetRecordReader.next(FileSplitParquetRecordReader.java:178) [dremio-hive-plugin-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.next(AdditionalColumnsRecordReader.java:83) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.op.scan.ScanOperator.outputData(ScanOperator.java:208) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.driver.SmartOp$SmartProducer.outputData(SmartOp.java:518) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.driver.StraightPipe.pump(StraightPipe.java:56) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.driver.Pipeline.doPump(Pipeline.java:82) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.driver.Pipeline.pumpOnce(Pipeline.java:72) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:291) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run(FragmentExecutor.java:287) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_181]
at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_181]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1703.jar:na]
at com.dremio.sabot.exec.fragment.FragmentExecutor.run(FragmentExecutor.java:244) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.exec.fragment.FragmentExecutor.access$800(FragmentExecutor.java:84) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run(FragmentExecutor.java:580) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.task.AsyncTaskWrapper.run(AsyncTaskWrapper.java:107) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
at com.dremio.sabot.task.slicing.SlicingThread.run(SlicingThread.java:71) [dremio-extra-sabot-scheduler-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
Caused by: java.lang.IndexOutOfBoundsException: null
at io.netty.buffer.EmptyByteBuf.checkIndex(EmptyByteBuf.java:1054) ~[netty-buffer-4.1.17.Final.jar:4.1.17.Final]
at io.netty.buffer.EmptyByteBuf.setBytes(EmptyByteBuf.java:487) ~[netty-buffer-4.1.17.Final.jar:4.1.17.Final]
at io.netty.buffer.DuplicatedByteBuf.setBytes(DuplicatedByteBuf.java:337) ~[netty-buffer-4.1.17.Final.jar:4.1.17.Final]
at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:478) ~[netty-buffer-4.1.17.Final.jar:4.1.17.Final]
at io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34) ~[arrow-memory-0.8.0-201804280314010062-9a17ead-dremio203.jar:4.1.17.Final]
at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:937) ~[arrow-memory-0.8.0-201804280314010062-9a17ead-dremio203.jar:4.1.17.Final]
at org.apache.arrow.vector.BaseNullableVariableWidthVector.setBytes(BaseNullableVariableWidthVector.java:1203) ~[arrow-vector-0.8.0-201804280314010062-9a17ead-dremio203.jar:0.8.0-201804280314010062-9a17ead-dremio203]
at org.apache.arrow.vector.BaseNullableVariableWidthVector.fillHoles(BaseNullableVariableWidthVector.java:1190) ~[arrow-vector-0.8.0-201804280314010062-9a17ead-dremio203.jar:0.8.0-201804280314010062-9a17ead-dremio203]
at org.apache.arrow.vector.BaseNullableVariableWidthVector.setValueCount(BaseNullableVariableWidthVector.java:879) ~[arrow-vector-0.8.0-201804280314010062-9a17ead-dremio203.jar:0.8.0-201804280314010062-9a17ead-dremio203]
at org.apache.arrow.vector.complex.MapVector.setValueCount(MapVector.java:312) ~[arrow-vector-0.8.0-201804280314010062-9a17ead-dremio203.jar:0.8.0-201804280314010062-9a17ead-dremio203]
at org.apache.arrow.vector.complex.impl.SingleMapWriter.setValueCount(SingleMapWriter.java:339) ~[arrow-vector-0.8.0-201804280314010062-9a17ead-dremio203.jar:0.8.0-201804280314010062-9a17ead-dremio203]
at org.apache.arrow.vector.complex.impl.VectorContainerWriter.setValueCount(VectorContainerWriter.java:90) ~[dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:0.8.0-201804280314010062-9a17ead-dremio203]
at com.dremio.exec.store.parquet2.ParquetRowiseReader.next(ParquetRowiseReader.java:360) [dremio-sabot-kernel-2.0.5-201806021755260067-767cfb5-mapr.jar:2.0.5-201806021755260067-767cfb5-mapr]
… 18 common frames omitted

  1. Can you provide the query profile when you try to read the table?
  2. Can you share the output of DESCRIBE FORMATTED <table> from the Hive CLI?
  3. Can you describe the underlying parquet files? Compression, page size, etc.?

Thanks for the response, Anthony.

The Hive table has 119 columns + 1 partition column.

It has 2 partitions. The information below was obtained with the parquet-tools jar:
partition 1: 119 columns + 1 partition column
partition 2: 116 columns + 1 partition column

Given these schema differences between partitions, does the Dremio parquet reader support this table? Is there any advanced setting available to fix this?
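As a cross-check from the Hive side, the partition-level schemas can be compared directly; a sketch, with hypothetical table name and partition values:

SHOW PARTITIONS mydb.my_table;

-- If the metastore recorded partition-level schemas, a difference in
-- column count between the two partitions should show up here:
DESCRIBE FORMATTED mydb.my_table PARTITION (dt='partition1_value');
DESCRIBE FORMATTED mydb.my_table PARTITION (dt='partition2_value');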

Reader state observed while debugging:

Rowgroup index: 0
Delta vector present, size (if present): no
Max no. of rows trying to read: 1023
No. of rows read so far in current iteration: 1023
No. of rows read so far in current rowgroup: 1023
Max no. of rows in current rowgroup: 109780

Log from query profile:

[Error Id: 671461da-ee6e-4f55-9050-e060c4ef059e on 10.244.10.204:-1]

(java.lang.IndexOutOfBoundsException) null
io.netty.buffer.EmptyByteBuf.checkIndex():1054
io.netty.buffer.EmptyByteBuf.setBytes():487
io.netty.buffer.DuplicatedByteBuf.setBytes():337
io.netty.buffer.WrappedByteBuf.setBytes():478
io.netty.buffer.UnsafeDirectLittleEndian.setBytes():34
io.netty.buffer.ArrowBuf.setBytes():937
org.apache.arrow.vector.BaseNullableVariableWidthVector.setBytes():1203
org.apache.arrow.vector.BaseNullableVariableWidthVector.fillHoles():1190
org.apache.arrow.vector.BaseNullableVariableWidthVector.setValueCount():879
org.apache.arrow.vector.complex.MapVector.setValueCount():312
org.apache.arrow.vector.complex.impl.SingleMapWriter.setValueCount():339
org.apache.arrow.vector.complex.impl.VectorContainerWriter.setValueCount():90
com.dremio.exec.store.parquet2.ParquetRowiseReader.next():360
com.dremio.exec.store.parquet.UnifiedParquetReader.next():220
com.dremio.exec.store.hive.exec.FileSplitParquetRecordReader.next():178
com.dremio.exec.store.dfs.implicit.AdditionalColumnsRecordReader.next():83
com.dremio.sabot.op.scan.ScanOperator.outputData():208
com.dremio.sabot.driver.SmartOp$SmartProducer.outputData():518
com.dremio.sabot.driver.StraightPipe.pump():56
com.dremio.sabot.driver.Pipeline.doPump():82
com.dremio.sabot.driver.Pipeline.pumpOnce():72
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():291
com.dremio.sabot.exec.fragment.FragmentExecutor$DoAsPumper.run():287
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
com.dremio.sabot.exec.fragment.FragmentExecutor.run():244
com.dremio.sabot.exec.fragment.FragmentExecutor.access$800():84
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():580
com.dremio.sabot.task.AsyncTaskWrapper.run():107
com.dremio.sabot.task.slicing.SlicingThread.run():71

Can you reattach the query profile? It didn’t go through.
Also, sorry, but I’m a bit confused about the schema changes you mention - can you clarify what works and what doesn’t work? Dremio does support (multiple) partitions.

The Hive parquet table has a different schema in each partition: as mentioned, partition 1 has 119 columns and partition 2 has 116. If I remove one of the partitions, Dremio is able to read the Hive parquet table.
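A related workaround we could try, instead of dropping a partition outright, is to rewrite the 116-column partition in Hive so its files are regenerated with the full 119-column table schema, with the missing columns coming back as NULL. A sketch only, with hypothetical table and partition names; INSERT OVERWRITE replaces the partition's data, so it would need a backup first:

-- Enable dynamic partition overwrite for the rewrite:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Rewrite the mismatched partition using the table's current schema;
-- SELECT * ends with the partition column, which dynamic partitioning
-- uses to route the rows back into the same partition. Columns absent
-- from the old files are written as NULL.
INSERT OVERWRITE TABLE mydb.my_table PARTITION (dt)
SELECT * FROM mydb.my_table WHERE dt = '<mismatched_partition_value>';

After such a rewrite, both partitions would carry the same file schema, removing the per-partition drift that the reader appears to trip over.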

Hope this helps you understand the issue.

Thanks for explaining this on our call last week. We are currently investigating internally.