Parquet Logical Type Support

@balaji.ramaswamy I think I’ve found how to recreate the issue.

I built commit d4c98e76e29d84e13508f6802df4acc5e38b1008 locally, enabled DEBUG output, and found that the file's partitioning columns were being reordered when the data was cleansed. Also, since store.parquet.partition_column_limit defaults to 25, the file's many partition columns get trimmed, and unluckily, in the data I most recently sent you, the trimming happened to remove the columns that were causing the issue.

So, the first thing to do is set store.parquet.partition_column_limit = 300. This should cover the file I’m attaching here that reproduces the error.
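If it helps, this is roughly how I set the key (assuming Dremio's support-key syntax; it can also be changed in the Admin UI under Support Keys):

```sql
-- Raise the partition-column limit so none of the repro file's
-- partition columns are trimmed away.
ALTER SYSTEM SET "store.parquet.partition_column_limit" = 300;
```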

Again, the issue comes from the com.dremio.exec.store.dfs.MetadataUtils.toPartitionValue method being unable to process a partition column that is a SMALLINT.

I think this is a bug, but I'll leave that up to the experts. If it is, I can make the necessary change by adding SMALLINT handling to the com.dremio.exec.store.dfs.MetadataUtils.toPartitionValue method, but I'm unsure what that might break. So, some guidance would be appreciated.

problem.snappy.parquet.zip (250.2 KB)

@balaji.ramaswamy I know you’re busy, but any thoughts on this?

Hi @mitchell.davis

It seems we do not support SMALLINT types. I tested via Hive, but Hive converts the type internally: parquet-tools shows the column's schema as INT. In your Parquet file the column shows as INT16, and that reproduces the error. I assume these files are generated by Spark. I will open an internal request on this.

Thanks
@balaji.ramaswamy

Thank you @balaji.ramaswamy.

Any update on a timeline for this fix?

Hi @mitchell.davis

We’re assessing the different options on how to handle this particular type. I’ll update you ASAP when we have a firmer timeline for a possible code fix.

Thanks,
@ben


Any update on this @ben?