Root Object Cannot be Scalar Parsing Error

Hi all,

For some reason I get this error whenever I try to convert a CSV to JSON with the sample data provided by Dremio:

“Failure while parsing JSON. Found token of [VALUE_STRING]. Dremio currently only supports parsing json strings that contain either lists or maps. The root object cannot be a scalar.”

This happens every time I try to convert almost anything from one format to another. Can someone help me with this? Thanks.

Hi @shawnb,

The formatting window is used to parse the data according to the file's format. So if you have a CSV file and you try to parse/format it as a JSON file, it will give this error (though it would be nice to have a more descriptive error).
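To illustrate why a CSV file can look like a "scalar" to a JSON parser, here is a small Python sketch. It uses Python's standard `json` module, not Dremio's own parser, and the column names are made up for the example. It decodes only the first JSON value in a quoted CSV header line, and that value turns out to be a plain string rather than a list or map:

```python
import json


def first_json_value(text):
    """Decode only the first JSON value in text and report its Python type."""
    value, _end = json.JSONDecoder().raw_decode(text)
    return value, type(value).__name__


# Hypothetical quoted CSV header line (column names are invented for this example).
csv_line = '"pickup_datetime","passenger_count"'

value, kind = first_json_value(csv_line)
# The first value is a bare string, so the root is neither a list nor a map.
print(value, kind)
```

An unquoted CSV line would not even get this far in Python; it would fail as invalid JSON.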

So when you promote (format) a CSV file, just choose "Text (delimited)" and comma for the delimiter.

Or am I misunderstanding your problem?

Hi @ben, thank you for responding. I see what you mean now, but if I wanted to transform a CSV file into JSON format, how would I do that? I thought the way I was doing it was the way to transform files. If I need to clarify anything please let me know; I am new here and am kind of bad at wording questions sometimes. Thanks!

Hi @shawnb,

If you want to transform the file into JSON format for export to another tool, you can do this in Dremio, provided that the file is not too large (a few thousand rows should be OK). Add the CSV file and format it as a CSV file. Then go to this dataset in Dremio, run a SELECT * FROM your.table, and use the export tool to download the result in a different format (JSON in your case).
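If the file is small and you only need the converted output, the same kind of conversion can also be sketched outside Dremio with Python's standard library. This is just an illustration under assumptions: the sample data and the `csv_to_json` helper are made up, the first row is assumed to be a header, and every value stays a string because CSV carries no type information.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (first row = header) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes a map of column name -> string value.
    return json.dumps(list(reader), indent=2)


# Hypothetical sample data for the example.
sample = "name,instrument\njohn,guitar\nringo,drums\n"
print(csv_to_json(sample))
```

Note that the result has a list at the root, which is the shape the error message asks for.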

I am getting a similar error, @ben, but the steps that lead to the error are different from what @shawnb has listed in this issue.

  1. I have a folder “SampleFolder” with ~2000+ JSON files.
  2. I query on “SampleFolder” (after converting it to a Physical Data Set) and I see the following error - “Failure while parsing JSON. Found token of [VALUE_STRING]. Dremio currently only supports parsing json strings that contain either lists or maps. The root object cannot be a scalar”
  3. But when I query individual underlying JSON files at random (after converting them to Physical Data Sets), I don’t see any issue - the query works and I am able to see the underlying data.

Can you assist with this issue, please?


When you initially query the dataset, Dremio will sample some of the files to determine the schema of the overall dataset. It would seem that some of the JSON files have this problem, or there is a discrepancy in the type of some field between two files (so maybe “column_a” is a string in one file, while in the other “column_a” is a list type).

Are you sure that all of the JSON files have objects with the same properties?
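One way to narrow this down outside Dremio is to scan the folder for files whose root value is not a list or map. The following is a rough Python sketch, not anything Dremio provides; the `find_scalar_root_files` helper is hypothetical, and it assumes each file holds a single JSON document with a `.json` extension (newline-delimited JSON files would be flagged as unparseable).

```python
import json
from pathlib import Path


def find_scalar_root_files(folder):
    """Return names of .json files whose root value is not a list or map."""
    suspects = []
    for path in sorted(Path(folder).glob("*.json")):
        try:
            root = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # Not parseable as a single JSON document; worth a manual look.
            suspects.append(path.name)
            continue
        if not isinstance(root, (dict, list)):
            # Root is a scalar (string, number, boolean, or null).
            suspects.append(path.name)
    return suspects
```

Calling `find_scalar_root_files("SampleFolder")` would then list the files to inspect by hand.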

Thanks @ben - that was our original suspicion, and we have isolated those files - the query executes fine now. But if Dremio is reading the schema/layout anyway, why does a consistent layout matter?

With JSON files on a filesystem-like source (S3, Azure Storage, HDFS), there is no schema defined ahead of time that Dremio could use to determine the data types. So Dremio has to actually scan and parse a few files to determine a schema. If there is a big discrepancy in the implicit schema between two JSON objects, this can cause problems in determining the overall schema of the dataset (all the files).

I think what might be happening in your case is that a majority of objects have a property like:
"beatles" : [ "john", "paul", "george", "ringo" ]
But then some minority of objects have that property like:
"beatles": 4
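A quick way to check for this kind of mismatch is to collect the value types seen for each property across the objects. Here is a minimal Python sketch, assuming the objects have already been loaded as dictionaries; the `find_type_conflicts` helper is hypothetical and only looks at top-level properties, and it does not reproduce how Dremio itself infers a schema.

```python
from collections import defaultdict


def find_type_conflicts(objects):
    """Map each top-level property to its value types if more than one type is seen."""
    seen = defaultdict(set)
    for obj in objects:
        for key, value in obj.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


objects = [
    {"beatles": ["john", "paul", "george", "ringo"]},
    {"beatles": 4},
]
print(find_type_conflicts(objects))  # {'beatles': ['int', 'list']}
```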

What did the files that you isolated look like when compared to the majority of files?