Parquet Schema Error @ OSS-only build

There is a strange behavior / error that can be reproduced on Dremio OSS (v15.5.0, built with -Ddremio.oss-only=true).

Test scenario:

  • Dremio OSS (v15.5.0, built with -Ddremio.oss-only=true) running as a Docker container (coordinator + executor on one node)
  • Parquet files in an AWS S3 bucket
  • SQL queries using a WHERE clause

Error: New schema found. Please reattempt the query. Multiple attempts may be necessary to fully learn the schema.

Hints:

  • Multiple attempts do not help
  • ALTER PDS xxx REFRESH METADATA does not help
  • If I use the community edition (available on Docker Hub) or a build without the -Ddremio.oss-only flag, everything works (and it is faster, because the vectorized reader is used).

It seems to me that the OSS version does not really work with basic Parquet files. Are there any specific requirements / limits on Parquet file content that have to be fulfilled if I want to use the Dremio core (Apache 2.0 licensed)?

Thanks for your input in advance.
94d5f90b-d4a9-48c2-b53a-c6c2ab6890f8.zip (194.3 KB)

@schul_mi It seems like the Parquet files under this folder have a heterogeneous schema. Is that true?
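
One way to check this outside Dremio is to compare the footer schema of every file in the folder. The following is a minimal sketch, not a Dremio tool: it assumes the files have been copied locally and that pyarrow is installed for the optional `read_local_schemas` helper. The function names, file names, and columns are hypothetical. Nothing in this thread confirms that a schema difference of this kind is what triggers the error.

```python
from collections import defaultdict


def group_files_by_schema(schemas):
    """Group file names by their schema signature.

    `schemas` maps a file name to a list of (column_name, type_string) pairs.
    Returns a dict mapping each distinct signature to a sorted list of files.
    """
    groups = defaultdict(list)
    for name, columns in schemas.items():
        # Column order is part of the signature on purpose: files whose
        # columns are merely reordered are reported as a separate group.
        groups[tuple(columns)].append(name)
    return {signature: sorted(names) for signature, names in groups.items()}


def read_local_schemas(paths):
    """Read Parquet footers from local files (assumes pyarrow is installed)."""
    import pyarrow.parquet as pq  # lazy import: the grouping works without it

    result = {}
    for path in paths:
        schema = pq.read_schema(path)
        result[path] = [(n, str(t)) for n, t in zip(schema.names, schema.types)]
    return result


# Hypothetical file names and columns, for illustration only.
example = {
    "part-0.parquet": [("id", "int64"), ("ts", "timestamp[ms]")],
    "part-1.parquet": [("id", "int64"), ("ts", "timestamp[ms]")],
    "part-2.parquet": [("id", "int32"), ("ts", "timestamp[ms]")],
}

for signature, names in group_files_by_schema(example).items():
    print(len(names), "file(s)", names, "share schema", signature)
```

If this reports a single group, the files are homogeneous at the footer level, and the difference between the two builds would have to come from somewhere else.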

@balaji.ramaswamy I am sure that the Parquet files under this folder are homogeneous.
On the one hand, the community edition of Dremio can handle those files; on the other hand, the same version built with the OSS flag (-Ddremio.oss-only=true) cannot. Is this a bug, or a feature that differentiates the editions?

The strange behaviour from my point of view is:

  • if I perform the SQL command (ALTER PDS … REFRESH METADATA) → no error occurred
  • if I set up the table (format the folder as a PDS) → no error occurred
  • if I use a plain SQL command (SELECT * FROM table) → no error occurred
  • but if I add WHERE clauses to the SQL command → this error (heterogeneous schema) occurs

Are there any limitations in how Parquet files are handled?

@schul_mi Are the Community edition and the OSS build built on the same Dremio version?

@balaji.ramaswamy Both versions are the same (the latest on GitHub, dremio/dremio-oss → 15.5.0). The only difference is that the -Ddremio.oss-only=true flag is added to the Maven build for the OSS version.