I have an S3 folder which contains 3 years' worth of data in JSON format. I tried to create a Physical Dataset based on this folder, but I can see only part of the fields that exist across the JSON files. Over time we added new fields to our data, so the number of keys differs between JSON files. We only ever appended new keys to our JSONs; we never removed any.
For instance, we introduced a new field in 2022 called visitor_id, but this field is not being picked up when Dremio infers the schema.
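To make the expected result concrete, here is a minimal Python sketch of what I mean by "all the keys in all the JSON files". The two records and the event and ts field names are made up for illustration; only visitor_id comes from our real data. It is not how Dremio infers the schema, just the union of keys I would expect to see as columns.

```python
import json

# Hypothetical sample records: one written before 2022, one after.
# Only visitor_id is a real field name; the others are placeholders.
older_record = json.loads('{"event": "page_view", "ts": "2021-05-01T10:00:00Z"}')
newer_record = json.loads(
    '{"event": "page_view", "ts": "2022-05-01T10:00:00Z", "visitor_id": "abc123"}'
)


def union_of_keys(records):
    """Return the sorted set of top-level keys seen across all records."""
    keys = set()
    for record in records:
        keys.update(record.keys())
    return sorted(keys)


# The columns I would expect the dataset to expose.
expected_columns = union_of_keys([older_record, newer_record])
print(expected_columns)  # ['event', 'ts', 'visitor_id']
```

Files that predate a field would simply have no value for that column.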
How can we get Dremio to pick up all the possible columns (meaning all the keys in all the JSON files)? I already tried refreshing the metadata of the physical dataset, but it did not work.
Is it because there is a split limitation in S3 for data formats other than Parquet, Iceberg and Delta Lake?