Hi, I'm evaluating Dremio for the first time today. I'm looking to create a dataset from a drive full of XML files, where each file may contain thousands of records (like a daily XML log). Is there a simple way for Dremio to parse these XML files? I seem to be missing something. Like I said, first day!
Hey @Ben_Spencer, Dremio does not support parsing XML files. We’d recommend converting them to some other format before analyzing with Dremio.
Hey Can, thanks for responding. Is this on the cards at any point? XML seems like just one of a number of parsers/interpreters that would make sense and that Dremio doesn't cover.
You can’t really analyze XML in a structured manner without an XSD file…
Is Address a string or a list of strings???
<Address>1600 Pennsylvania Ave</Address> vs <Address>1600 Pennsylvania Ave</Address> <Address>10 Downing Street</Address>
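To make the ambiguity concrete, here is a minimal Python sketch (standard library only) of the "obvious" XML-to-dict conversion: a tag that appears once becomes a plain value, and a tag that repeats becomes a list. The `<Person>` wrapper and the `naive_to_dict` helper are hypothetical, and the helper only looks at direct children's text. The same field ends up with two different types depending on how many times it happens to occur in a given record.

```python
import xml.etree.ElementTree as ET


def naive_to_dict(elem):
    """Flatten an element's direct children into a dict the 'obvious' way:
    a tag seen once -> its text, a tag seen repeatedly -> a list of texts."""
    result = {}
    for child in elem:
        value = child.text
        if child.tag in result:
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                # Second occurrence: silently switch from str to list.
                result[child.tag] = [existing, value]
        else:
            result[child.tag] = value
    return result


# Hypothetical wrapper element around the two Address variants above.
one = ET.fromstring("<Person><Address>1600 Pennsylvania Ave</Address></Person>")
two = ET.fromstring(
    "<Person><Address>1600 Pennsylvania Ave</Address>"
    "<Address>10 Downing Street</Address></Person>"
)

print(naive_to_dict(one))  # {'Address': '1600 Pennsylvania Ave'}
print(naive_to_dict(two))  # {'Address': ['1600 Pennsylvania Ave', '10 Downing Street']}
```

Without a schema saying Address may repeat, nothing in the first document tells the converter to emit a list.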
I wrote something to convert XML to JSON.
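The converter mentioned here isn't shown in the thread. As a rough idea of what such a step could look like for large daily logs, here is a minimal Python sketch using only the standard library. It streams records with `xml.etree.ElementTree.iterparse` and clears each finished record, so memory stays roughly flat even with thousands of records per file, and it writes one JSON object per line. The function name `xml_records_to_jsonl`, the `Log`/`Entry` tags, and the `list_fields` parameter are all hypothetical. `list_fields` is one way around the Address ambiguity: the named tags are always emitted as lists, even when they occur once.

```python
import io
import json
import xml.etree.ElementTree as ET


def xml_records_to_jsonl(source, record_tag, out, list_fields=()):
    """Stream <record_tag> elements from `source`, writing one JSON object per line.

    Tags in `list_fields` always become lists (empty if absent), so every output
    record has the same shape for them. Other tags become plain strings; if one
    of those repeats, the last occurrence wins.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        record = {}
        for child in elem:
            if child.tag in list_fields:
                record.setdefault(child.tag, []).append(child.text)
            else:
                record[child.tag] = child.text
        for tag in list_fields:
            record.setdefault(tag, [])  # absent repeated field -> empty list
        out.write(json.dumps(record) + "\n")
        count += 1
        elem.clear()  # drop the finished record's children to limit memory use
    return count


# Hypothetical two-record log: one Address vs two Addresses.
log = io.BytesIO(b"""<Log>
  <Entry><Name>A</Name><Address>1600 Pennsylvania Ave</Address></Entry>
  <Entry><Name>B</Name><Address>1600 Pennsylvania Ave</Address><Address>10 Downing Street</Address></Entry>
</Log>""")
buf = io.StringIO()
n = xml_records_to_jsonl(log, "Entry", buf, list_fields=("Address",))
print(n)  # 2
print(buf.getvalue(), end="")
```

The catch is that `list_fields` has to come from somewhere. Here it is hard-coded, which is exactly the information an XSD would otherwise provide.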
Even after all this, JSON can still be problematic…
"Address":"1600 Pennsylvania Ave" "Zip":"20500" vs "Address":"10 Downing Street" "Zip": null
What datatype is Zip? string, int, boolean??
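One way to see the Zip problem is to scan the JSON records and collect, per field, which value types actually appear. This is a minimal Python sketch. The `observed_types` helper is hypothetical, and the third record (with Zip as a bare number) is a made-up example of a producer that emitted an integer. A `null` carries no type information at all, so a reader inferring the schema from data can only guess.

```python
import json


def observed_types(json_lines):
    """Collect, per field, the set of Python type names seen across records."""
    seen = {}
    for line in json_lines:
        for key, value in json.loads(line).items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return seen


records = [
    '{"Address": "1600 Pennsylvania Ave", "Zip": "20500"}',
    '{"Address": "10 Downing Street", "Zip": null}',
    # Hypothetical record from a producer that wrote Zip as a number.
    '{"Address": "Somewhere Else", "Zip": 20500}',
]

summary = {field: sorted(types) for field, types in observed_types(records).items()}
print(summary)  # {'Address': ['str'], 'Zip': ['NoneType', 'int', 'str']}
```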
The only nested data format that really works is Parquet, which includes a schema definition…
Someday I’ll write an XML-to-Parquet converter with the help of XSD files.
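A rough sketch of the schema-driven idea, assuming a hand-written schema dict as a stand-in for what an XSD would declare. Parsing a real XSD is not attempted here. `SCHEMA` and `apply_schema` are hypothetical names. Each field gets a fixed converter and a fixed "repeated or not" flag, so every row comes out with the same shape and types regardless of what a particular record contains. Fields not listed in the schema are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema standing in for an XSD:
# field name -> (converter, repeated?)
SCHEMA = {
    "Address": (str, True),   # like maxOccurs="unbounded": always a list
    "Zip": (str, False),      # string, so leading zeros survive; None if absent
}


def apply_schema(elem, schema):
    """Build a row whose shape and types come from `schema`, not from the data."""
    row = {}
    for name, (convert, repeated) in schema.items():
        values = [convert(c.text) for c in elem.findall(name) if c.text is not None]
        if repeated:
            row[name] = values
        elif len(values) > 1:
            raise ValueError(f"{name} is single-valued but occurs {len(values)} times")
        else:
            row[name] = values[0] if values else None
    return row


root = ET.fromstring(
    "<Log>"
    "<Entry><Address>1600 Pennsylvania Ave</Address><Zip>20500</Zip></Entry>"
    "<Entry><Address>1600 Pennsylvania Ave</Address>"
    "<Address>10 Downing Street</Address></Entry>"
    "</Log>"
)
rows = [apply_schema(e, SCHEMA) for e in root.findall("Entry")]
print(rows)
# [{'Address': ['1600 Pennsylvania Ave'], 'Zip': '20500'},
#  {'Address': ['1600 Pennsylvania Ave', '10 Downing Street'], 'Zip': None}]
```

Rows with a fixed shape like these could then be written to Parquet with an explicit schema, for example via pyarrow's `pyarrow.Table.from_pylist(rows, schema=...)` followed by `pyarrow.parquet.write_table(table, path)`. That step is left out above so the sketch needs only the standard library.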