Add store.json or store.parquet option for omit_nulls

david.lee · March 21, 2018, 3:33pm

When reading JSON can we have an option to omit null values?

I think a lot of the schema change issues are the result of having null values in JSON files.

Example: Two JSON files with addresses

a.jsonl
{"address": "1 Lombard Street", "city": "San Francisco", "phone": "415-111-1111"}
{"address": "2 Market Street", "city": "San Francisco", "phone": "415-222-2222"}

b.jsonl
{"address": "3 Kearny Street", "city": "San Francisco", "phone": null}
{"address": "4 Bush Street", "city": "San Francisco", "phone": null}

I believe if a.jsonl is converted to a.parquet that phone will end up as a string column in the parquet file.
I believe if b.jsonl is converted to b.parquet that phone will end up as an int column in the parquet file.

Then if you try to read both files at the same time from the same directory, you get a schema change / inconsistency issue.

Having the option to exclude null values when reading JSON files should at least get rid of column data type inconsistencies.
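Until such an option exists, one possible workaround is to strip the nulls from the JSON lines before they are loaded. The Python sketch below is only an illustration of that idea and is not a Dremio feature: the helper names `omit_nulls` and `strip_nulls_from_jsonl` are hypothetical, and it assumes that a missing field is acceptable wherever a null appeared.

```python
import json


def omit_nulls(value):
    """Recursively drop dict keys whose value is None (JSON null).

    Nulls that are direct elements of a list are left in place, because
    removing them would shift the positions of the other elements.
    """
    if isinstance(value, dict):
        return {k: omit_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [omit_nulls(v) for v in value]
    return value


def strip_nulls_from_jsonl(lines):
    """Yield one cleaned JSON string per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(omit_nulls(json.loads(line)))


# The two records from b.jsonl above.
b_jsonl = [
    '{"address": "3 Kearny Street", "city": "San Francisco", "phone": null}',
    '{"address": "4 Bush Street", "city": "San Francisco", "phone": null}',
]
cleaned = list(strip_nulls_from_jsonl(b_jsonl))
```

After this step the `phone` key is simply absent from the b.jsonl records, so no data type has to be guessed for a column that contains only nulls.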

dealercrm · March 21, 2018, 4:04pm

Yes please!

The alternative is storing all null values as empty strings, which then poses a problem if other values are integers. That also ends up reporting a schema change inconsistency error.

I have been chasing and fighting this ghost with the convert_from(data, 'JSON') AS data function.

Also note that this function expects the schema to be exactly the same for every record. Therefore, if you have two records like these, the query and any acceleration will fail.

{"address": "3 Kearny Street", "city": "San Francisco", "phone": null}
{"address": "4 Bush Street", "city": "San Francisco"}
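To find out ahead of time whether a file mixes record shapes like the two above, the records can be grouped by their set of field names. This is a rough Python sketch of such a check, run outside Dremio; `group_by_field_set` is a hypothetical helper name, and it only compares top-level keys, not value types or nested objects.

```python
import json
from collections import defaultdict


def group_by_field_set(lines):
    """Map each distinct set of top-level field names to the 1-based
    line numbers of the records that use it."""
    groups = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no record
        record = json.loads(line)
        groups[frozenset(record)].append(number)
    return dict(groups)


# The two records shown above: one with a null phone, one with no phone.
records = [
    '{"address": "3 Kearny Street", "city": "San Francisco", "phone": null}',
    '{"address": "4 Bush Street", "city": "San Francisco"}',
]
shapes = group_by_field_set(records)

# More than one entry means the records do not all share the same fields.
has_mixed_shapes = len(shapes) > 1
```

Here `shapes` has two entries, one per field set, so the mismatch is visible before the data is queried.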
