Then I tried to open it in Dremio and got this error message:
Null values are not supported in lists by default. Please set store.json.all_text_mode to true to read lists containing nulls. Be advised that this will treat JSON null values as a string containing the word ‘null’.
Can you please advise how I can resolve this issue?
If you scroll to the bottom of http://host:9047/admin/advanced, under “Dremio Support” there is a field where you can paste that setting name (store.json.all_text_mode). Click Show, then toggle it on.
Thank you so much, I was able to turn this option on.
However, Dremio is still unable to load this file. It shows the error message: “Error parsing JSON - Unable to expand the buffer”
Have you ever tried to open a 3 GB file? Is a computer with 16 GB of memory not enough to open it?
This may need further investigation. A single node with 16 GB is indeed a bit small, and keep in mind that some of that goes to heap, not direct memory. Please note that we have plenty of users working with larger files, though. If I have some free time, maybe I’ll try to load the dataset as well…
My first comment here is that, given your 16 GB of RAM, those settings are potentially too high. Try setting them to 8 GB each and see what happens.
If that still doesn’t work, it would be interesting to see what happens if you could, say, split the file into two 1.5 GB files. I’d like to get a better idea of when this becomes an issue.
I can see the issue. It’s due to a very deep and wide schema in the JSON file. While trying to learn the schema, Dremio is coming up against an internal buffer limit, which doesn’t seem to be settable via the UI.
Let me talk to support. I’ll report back once I have some more news.
Looking at the JSON file again, I noticed that the file is actually a single Object.
Dremio is fundamentally a “row” based technology. Essentially, it wants an Array of Objects, so it can treat each entry as a row. Here, Dremio is trying to fit the entire file into a single row and “schema” learn across the entire file.
I would suggest that in this instance, some pre-processing of the file is required to extract the data you want and turn it into a new file that is an array of objects. For example, I notice there is a large meta section at the start of the file. You could strip this out and instead take the actual “row” data (found later in the file as the “data” property) and use that to create a new file.
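To make that concrete, here is a minimal Python sketch of the kind of pre-processing I mean. It is only an illustration under a few assumptions: the function name extract_rows, the file paths and the col_0, col_1, … column names are hypothetical, and I am assuming the “data” property holds a list of rows. It also reads the whole file into memory with the standard json module, which may be tight for a 3 GB file on a 16 GB machine; a streaming JSON parser might be needed there.

```python
import json


def extract_rows(src_path, dst_path, rows_key="data"):
    """Copy the rows under `rows_key` into a new file as a JSON array of objects.

    Hypothetical helper: assumes the source file is one top-level object
    and that `rows_key` holds a list of rows.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        document = json.load(src)  # loads the entire file into memory

    records = []
    for row in document[rows_key]:
        if isinstance(row, dict):
            # Already an object, keep it as one row.
            records.append(row)
        elif isinstance(row, list):
            # Assumption: positional rows; the real column names are unknown,
            # so placeholder names col_0, col_1, ... are used.
            records.append({f"col_{i}": value for i, value in enumerate(row)})
        else:
            # Scalar row: wrap it so every entry is still an object.
            records.append({"value": row})

    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst)
    return len(records)
```

You would call it as something like extract_rows("original.json", "rows.json") and then point Dremio at the new file. If the positional rows have real column names somewhere in the meta section, you would want to use those instead of the placeholders.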
Thank you so much, Christy, that’s very informative.
By the way, can you please advise which large public dataset I should use to show our data science team Dremio’s capability to work with big data files?
Thank you so much