The executor logs contain a large number of WARN messages containing: Exception from createPageListFromColumnAndOffsetIndex java.lang.ArrayIndexOutOfBoundsException
I traced the error to the closed-source plugin in dremio-ce-parquet-plugin-13.0.0-202101272034330307-20fb9275.jar. (The code is unchanged in dremio-ce-parquet-plugin-17.0.0-202107060524010627-31b5222b.jar.)
This error occurs when the UnifiedParquetReader attempts to gather statistics on the Parquet file by calling RowGroups.getRowGroupMetadataFor(). The exception is thrown from RowGroups.java:163, statistics.setMinMaxFromBytes(minValue, maxValue);, when a column has no min/max statistics where they are expected. This happens when the first page contains only nulls: the line ++columnArrayIndex; is inside the if block, so columnArrayIndex does not advance when page-0 is all nulls.
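Since the plugin is closed source, the following is only a minimal sketch of the suspected pattern, not the actual Dremio code. It assumes the min values are stored one entry per page, with an empty byte array for an all-null page, and that decoding indexes the byte array directly. The class NullPageIndexSketch and its methods are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction; none of these names come from the Dremio plugin.
public class NullPageIndexSketch {

    // Decode a little-endian double by indexing the array directly.
    // An empty array fails here with ArrayIndexOutOfBoundsException.
    static double decodeDouble(byte[] bytes) {
        long bits = 0;
        for (int i = 7; i >= 0; i--) {
            bits = (bits << 8) | (bytes[i] & 0xFFL);
        }
        return Double.longBitsToDouble(bits);
    }

    // Encode a double as 8 little-endian bytes (test data helper).
    static byte[] encodeDouble(double value) {
        long bits = Double.doubleToLongBits(value);
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (bits >>> (8 * i));
        }
        return out;
    }

    // Suspected pattern: the index advances only inside the if block,
    // so after a null page it still points at that page's empty entry.
    static List<Double> minsBuggy(boolean[] nullPages, byte[][] minValues) {
        List<Double> mins = new ArrayList<>();
        int columnArrayIndex = 0;
        for (int page = 0; page < nullPages.length; page++) {
            if (!nullPages[page]) {
                mins.add(decodeDouble(minValues[columnArrayIndex]));
                ++columnArrayIndex;
            }
        }
        return mins;
    }

    // Possible fix: advance the index for every page, null or not.
    static List<Double> minsFixed(boolean[] nullPages, byte[][] minValues) {
        List<Double> mins = new ArrayList<>();
        int columnArrayIndex = 0;
        for (int page = 0; page < nullPages.length; page++) {
            if (!nullPages[page]) {
                mins.add(decodeDouble(minValues[columnArrayIndex]));
            }
            ++columnArrayIndex;
        }
        return mins;
    }

    public static void main(String[] args) {
        // Mirrors the reported layout: page-0 all nulls, page-1 and page-2 with min 1.0.
        boolean[] nullPages = {true, false, false};
        byte[][] minValues = {new byte[0], encodeDouble(1.0), encodeDouble(1.0)};

        System.out.println("fixed loop mins: " + minsFixed(nullPages, minValues));
        try {
            minsBuggy(nullPages, minValues);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("buggy loop failed: " + e);
        }
    }
}
```

Under these assumptions, the buggy loop reads the empty entry belonging to page-0 while processing page-1, which would match an ArrayIndexOutOfBoundsException raised from inside setMinMaxFromBytes.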
This was the output from parquet-tools column-index someParquetFile.parquet for the page that throws the error:
column index for column someColumn:
Boudary order: ASCENDING
        null count  min  max
page-0       20000
page-1       19589  1.0  200.0
page-2        8916  1.0  200.0