Null values are not supported in lists by default. Please set `store.json.all_text_mode` to true to read lists containing nulls. Be advised that this will treat JSON null values as a string containing the word 'null'

I downloaded this file (about 3GB):
https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.json?accessType=DOWNLOAD

Then I tried to open it in Dremio, and this error message was shown:
Null values are not supported in lists by default. Please set store.json.all_text_mode to true to read lists containing nulls. Be advised that this will treat JSON null values as a string containing the word ‘null’.

Can you please advise how I can resolve this issue?

If you scroll to the bottom of http://host:9047/admin/advanced under “Dremio Support”, there will be an area to copy/paste that setting > click Show > then toggle it

Thank you so much, I was able to turn this option on
However, Dremio is still unable to load this file; it shows the error message: “Error parsing JSON - Unable to expand the buffer”
Have you ever tried to open a 3GB file? Is a computer with 16GB of memory not enough?

How do you have Dremio deployed? Windows app? Linux server?

I used Linux, specifically Ubuntu 16

This may need further investigation. A 16GB single node is indeed a bit small, and keep in mind some of that goes to heap, not direct memory. Note that we have plenty of users working with larger files, though. If I have some free time, maybe I’ll try to load the dataset as well…

So is there any option to change the heap size for Dremio?

Yes, you can change the dremio-env file. Here is the documentation:
http://docs.dremio.com/deployment/dremio-config.html#environment-setup

Thank you, but this option doesn’t resolve the problem.
I changed
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=12000
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=12000

I checked the RAM, and it looks to me like Dremio isn’t using it for processing; MemFree stays at 12645252 kB

Did you restart the node after making the change? That is required.

Yes, I did, of course

My first comment here is that, given your 16GB of RAM, those settings are too high: together they ask for 24GB of memory on a 16GB machine. Try setting them to 8GB each and see what happens.
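For example, in dremio-env (values are in MB; this assumes you leave everything else at the defaults):

```
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192
DREMIO_MAX_DIRECT_MEMORY_SIZE_MB=8192
```

And remember to restart the node again after the change.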

If that still doesn’t work, it would be interesting to see what happens if you bisect the file into two 1.5GB pieces. I’d like to get a better idea of when this becomes an issue.

Would you mind spending a little time loading this file to make sure it works on your machine (a single machine)?
https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.json?accessType=DOWNLOAD
It doesn’t make sense that an analytics tool cannot process a 3GB file, since we’re working with big data
Thank you so much

Hey Hai,

I can see the issue. It’s due to a very deep and wide schema in the JSON file; whilst trying to schema-learn, it’s coming up against an internal buffer limit, which doesn’t seem to be settable via the UI.

Let me talk to support. I’ll report back once I have more news.

Christy

Hey Hai,

Looking at the JSON file again, I noticed that the file is actually a single Object.

Dremio is fundamentally a “row”-based technology. Essentially, it wants an array of objects, so it can treat each entry as a row. Here, Dremio is trying to fit the entire file into a single row and schema-learn across the entire file.

I would suggest that in this instance, some pre-processing of the file is required to extract the data you want and turn it into a new file that is an array of objects. For example, I notice there is a large meta section at the start of the file. You could strip this out and instead take the actual “row” data (found later in the file as the “data” property) and use that to create a new file, as in the sketch below.
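If it helps, here’s a minimal sketch of that pre-processing in Python. It assumes the file follows the usual Socrata rows.json layout, with column definitions under meta.view.columns and the rows as arrays under data; verify that against your copy before relying on it. It streams the file with the third-party ijson parser, so the full 3GB never has to sit in memory, and writes one JSON object per line, which Dremio can read as one record per row:

```python
import json
import ijson  # streaming JSON parser: pip install ijson

SRC = "rows.json"    # the downloaded file
DST = "rows.ndjson"  # output: one JSON object per line

# Pass 1: pull the column names out of the meta section.
with open(SRC, "rb") as f:
    columns = [c["name"] for c in ijson.items(f, "meta.view.columns.item")]

# Pass 2: stream the "data" array and emit each row as its own object,
# e.g. {"ID": ..., "Case Number": ..., ...} on a single line.
with open(SRC, "rb") as f, open(DST, "w") as out:
    for row in ijson.items(f, "data.item"):
        out.write(json.dumps(dict(zip(columns, row))) + "\n")
```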

Thank you so much, Christy; that’s very informative.
By the way, can you please advise which large public dataset I should use to show our data science team Dremio’s ability to work with big data files?
Thank you so much

I’m glad I could help :smiley:

I often use the Yelp data: https://www.yelp.com/dataset

It’s also a few GB in size and spread over multiple JSON files, so you can show joins too.
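For instance, once you’ve added the business and review files as datasets, a quick demo query could look something like this (the dataset paths are just illustrative; adjust them to wherever the files live in your source):

```sql
SELECT b.name, COUNT(*) AS review_count
FROM yelp.review AS r
JOIN yelp.business AS b ON r.business_id = b.business_id
GROUP BY b.name
ORDER BY review_count DESC
LIMIT 10
```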

Hope this helps :slight_smile:

Here is a tutorial that uses Yelp, including instructions on how to make the data available:

Also, here’s a discussion on the multi-record JSON file format Christy mentioned: Feature requests + Bug reporting

I changed the dataset and saw that Dremio on an AWS t2.medium instance can load a 4GB file perfectly
Thank you so much
