@balaji.ramaswamy I think I’ve found a way to recreate the issue.
I built commit d4c98e76e29d84e13508f6802df4acc5e38b1008 locally, got DEBUG output, and found that the file’s partition columns were being reordered when we cleansed the data. Also, because store.parquet.partition_column_limit defaults to 25, the file’s many partition columns get trimmed, and unluckily the data I most recently sent you had exactly the problematic columns trimmed out.
So, the first thing to do is set store.parquet.partition_column_limit = 300. That should cover the file I’m attaching here, which reproduces the error.
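To show why the reordering mattered: if the limit simply keeps the first N partition columns in the order they come out of the file (an assumption on my part; I haven’t traced exactly how Dremio applies store.parquet.partition_column_limit), then moving a column past position 25 silently drops it. Here’s a toy sketch with hypothetical column names, not Dremio code:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionLimitDemo {
    // Hypothetical stand-in for the limit: keep only the first `limit` columns, in order.
    static List<String> applyLimit(List<String> columns, int limit) {
        return new ArrayList<>(columns.subList(0, Math.min(limit, columns.size())));
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<>();
        original.add("small_col"); // hypothetical SMALLINT column, first in the original order
        for (int i = 1; i < 30; i++) {
            original.add("col_" + i); // 29 other hypothetical partition columns
        }

        // Assume "cleansing" moved the SMALLINT column to the end (position 30).
        List<String> reordered = new ArrayList<>(original.subList(1, original.size()));
        reordered.add("small_col");

        System.out.println(applyLimit(original, 25).contains("small_col"));   // true
        System.out.println(applyLimit(reordered, 25).contains("small_col"));  // false: trimmed away
        System.out.println(applyLimit(reordered, 300).contains("small_col")); // true with the raised limit
    }
}
```

With the default of 25 the problem column never reaches toPartitionValue, so the error disappears; raising the limit to 300 brings it back into play.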
Again, the issue comes from the com.dremio.exec.store.dfs.MetadataUtils.toPartitionValue method being unable to process a partition column of type SMALLINT.
I think this is a bug, but I’ll leave that call to the experts. If it is, I can make the fix myself by adding SMALLINT handling to the com.dremio.exec.store.dfs.MetadataUtils.toPartitionValue method, but I’m unsure what that might break, so some guidance would be appreciated.
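For discussion, here’s roughly the shape of change I had in mind. This is not Dremio’s actual code: the MinorType enum, the PartitionValue class and this toPartitionValue signature are simplified, hypothetical stand-ins, since I’m not sure of the real types. The idea is just to treat SMALLINT (and probably TINYINT) like INT by widening the value to an int instead of falling through to the unsupported case.

```java
public class ToPartitionValueSketch {
    // Hypothetical subset of column types; not Dremio's real type enum.
    enum MinorType { TINYINT, SMALLINT, INT, BIGINT, VARCHAR }

    // Hypothetical partition value holder; Dremio's real class differs.
    static final class PartitionValue {
        final String column;
        final Object value;

        PartitionValue(String column, Object value) {
            this.column = column;
            this.value = value;
        }
    }

    static PartitionValue toPartitionValue(String column, MinorType type, Object raw) {
        if (raw == null) {
            return new PartitionValue(column, null);
        }
        switch (type) {
            case TINYINT:
            case SMALLINT:
                // Proposed addition: widen 8/16-bit integers to int, matching the INT case.
                return new PartitionValue(column, ((Number) raw).intValue());
            case INT:
                return new PartitionValue(column, ((Number) raw).intValue());
            case BIGINT:
                return new PartitionValue(column, ((Number) raw).longValue());
            case VARCHAR:
                return new PartitionValue(column, raw.toString());
            default:
                // Roughly where a SMALLINT column fails today, as far as I can tell.
                throw new UnsupportedOperationException("Unsupported partition column type: " + type);
        }
    }

    public static void main(String[] args) {
        PartitionValue pv = toPartitionValue("small_col", MinorType.SMALLINT, (short) 7);
        System.out.println(pv.column + " = " + pv.value); // small_col = 7
    }
}
```

The part I’d most like guidance on is whether widening to int is safe for partition pruning, i.e. whether anything downstream compares these values against a 16-bit type and would now see a mismatch.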
problem.snappy.parquet.zip (250.2 KB)