Dremio is not able to infer the complete schema from gzip-compressed JSON files

Hey guys,

Today we ran into a strange issue with Dremio. We were trying to create a VDS on an S3 location that contains gzip-compressed files in JSON format.

The JSON records actually have 12 fields/columns, but Dremio infers only 8 columns from the dataset.

My assumption is this: Dremio infers the schema from a sample of the dataset, and the sampled JSON records may simply not contain the other 4 fields, because those fields are omitted whenever their values are null (by default, Gson does not serialize null values to JSON). That would explain why we only get 8 fields in our table.
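To make the hypothesis concrete, here is a minimal Python sketch, not Dremio's actual inference code. It assumes the schema is simply the union of keys seen in the first N records, and the field names, the 2,000-record file, and the sample size of 100 are all hypothetical. The first 1,000 records omit 4 of the 12 fields (as Gson does by default for null values), so a small sample only ever sees 8 columns.

```python
import gzip
import io
import json

# Hypothetical field names; the real dataset's columns are not known here.
ALL_FIELDS = [f"field_{i}" for i in range(12)]


def make_record(i, include_optional):
    """Build one record; the last 4 fields are left out entirely,
    mimicking Gson's default of not serializing null values."""
    rec = {name: i for name in ALL_FIELDS[:8]}
    if include_optional:
        rec.update({name: i for name in ALL_FIELDS[8:]})
    return rec


def write_gz_jsonl(records):
    """Write records as gzip-compressed, newline-delimited JSON in memory."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for rec in records:
            gz.write((json.dumps(rec) + "\n").encode("utf-8"))
    return buf.getvalue()


def infer_columns(gz_bytes, sample_size):
    """Simplified stand-in for sample-based schema inference (an assumption,
    not Dremio's algorithm): collect keys from the first `sample_size` records."""
    columns = []
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes), mode="rb") as gz:
        for n, line in enumerate(gz):
            if n >= sample_size:
                break
            for key in json.loads(line):
                if key not in columns:
                    columns.append(key)
    return columns


# The first 1,000 records lack the 4 "null" fields; later ones have all 12.
records = [make_record(i, include_optional=(i >= 1000)) for i in range(2000)]
data = write_gz_jsonl(records)

print(len(infer_columns(data, sample_size=100)))   # 8
print(len(infer_columns(data, sample_size=2000)))  # 12
```

If this is what is happening, the missing columns should show up once the scan reaches records where those fields are present.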

How can we force Dremio to infer the schema from all 12 fields?

Alternatively, can we create the virtual dataset externally with a DDL statement (a create VDS command) that specifies the schema?



What happens if you do a run of the dataset?