Hi Dremio Team,
I’m facing an issue with schema discovery on nested JSON structures in MongoDB: the query works in Preview but fails in Run. The case is easy to reproduce. I add three JSON documents to a Mongo collection:
[
  {
    "o": {
      "text": "a",
      "num": 42
    }
  },
  {
    "o": {
      "text": "b",
      "num": 43
    }
  },
  {
    "o": {
      "text": "c",
      "num": 44
    }
  }
]
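For anyone who wants to recreate the collection, here is a minimal Python sketch that writes those three documents out as a JSON array. The file name `concat.json` and the function names are my own arbitrary choices. Assuming a standard MongoDB tools install, a file like this can be loaded with something like `mongoimport --db mydb --collection coll --file concat.json --jsonArray`.

```python
import json

# The three sample documents: each has a nested object "o"
# with a string field "text" and an integer field "num".
DOCS = [
    {"o": {"text": "a", "num": 42}},
    {"o": {"text": "b", "num": 43}},
    {"o": {"text": "c", "num": 44}},
]


def render_sample():
    """Return the sample documents serialized as one JSON array."""
    return json.dumps(DOCS, indent=2)


def write_sample(path="concat.json"):
    """Write the JSON array to `path` (the file name is an arbitrary choice)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(render_sample())
    return path


if __name__ == "__main__":
    print("wrote", write_sample())
```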
I then execute the following query in PREVIEW mode:
SELECT CONCAT("coll"."o"."text", 'myString')
FROM mysource.mydb.coll
This runs fine: it returns the three “text” values, each with “myString” appended.
When I switch to “RUN” or execute via JDBC, I get:
Error: SCHEMA_CHANGE ERROR: New schema found and recorded. Please reattempt the query. Multiple attempts may be necessary to fully learn the schema.
Original Schema schema(o::struct<text::varchar>)
New Schema schema(o::struct<text::varchar, num::int32>)
SqlOperatorImpl MONGO_SUB_SCAN
Location 0:0:2
SqlOperatorImpl MONGO_SUB_SCAN
Location 0:0:2
Fragment 0:0
[Error Id: a37fca37-5532-462c-8f5e-d65d6e00a67c on MYMACHINE:31010]
(org.apache.arrow.vector.util.SchemaChangeRuntimeException) Schema change error
com.dremio.common.exceptions.UserException.schemaChangeError():88
com.dremio.sabot.op.scan.ScanOperator.checkAndLearnSchema():261
com.dremio.sabot.op.scan.ScanOperator.setupReader():178
com.dremio.sabot.op.scan.ScanOperator.setup():163
com.dremio.sabot.driver.SmartOp$SmartProducer.setup():560
com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():79
com.dremio.sabot.driver.Pipe$SetupVisitor.visitProducer():63
com.dremio.sabot.driver.SmartOp$SmartProducer.accept():530
com.dremio.sabot.driver.StraightPipe.setup():102
com.dremio.sabot.driver.StraightPipe.setup():102
com.dremio.sabot.driver.Pipeline.setup():58
com.dremio.sabot.exec.fragment.FragmentExecutor.setupExecution():347
com.dremio.sabot.exec.fragment.FragmentExecutor.run():237
com.dremio.sabot.exec.fragment.FragmentExecutor.access$800():88
com.dremio.sabot.exec.fragment.FragmentExecutor$AsyncTaskImpl.run():594
com.dremio.sabot.task.AsyncTaskWrapper.run():103
com.dremio.sabot.task.slicing.SlicingThread.run():110
SQLState: null
ErrorCode: 0
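To make the difference between the two schemas in the error explicit, here is a small Python sketch. It derives a schema from one sample document twice: once keeping only the field the query references (`o.text`), and once keeping everything. The type names are copied from the error text. The `infer_schema` and `render` helpers are purely my own illustration and not Dremio internals. My guess, and it is only a guess, is that the Mongo scan starts from the projected shape and then meets the full one.

```python
# Illustrative only: reproduces the two schema strings from the error message.
# The Python-type-to-name mapping is an assumption made for this sketch.
TYPE_NAMES = {str: "varchar", int: "int32"}

DOC = {"o": {"text": "a", "num": 42}}


def infer_schema(doc, keep=None, prefix=""):
    """Map each field to a type name, recursing into nested objects.

    `keep` is an optional set of dotted paths to retain (a projection);
    None means keep every field.
    """
    schema = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            child = infer_schema(value, keep, path + ".")
            if child:  # drop structs that end up with no retained fields
                schema[key] = child
        elif keep is None or path in keep:
            schema[key] = TYPE_NAMES[type(value)]
    return schema


def render_fields(schema):
    """Format fields in the `name::type` style used by the error message."""
    parts = []
    for name, typ in schema.items():
        if isinstance(typ, dict):
            parts.append(f"{name}::struct<{render_fields(typ)}>")
        else:
            parts.append(f"{name}::{typ}")
    return ", ".join(parts)


def render(schema):
    return f"schema({render_fields(schema)})"


projected = infer_schema(DOC, keep={"o.text"})  # only what the query selects
full = infer_schema(DOC)                        # everything in the document

print("Original Schema", render(projected))
print("New Schema", render(full))
```

The two printed lines match the "Original Schema" and "New Schema" lines in the error, which is why I suspect the projection on the nested field is involved.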
When I run exactly the same query against a JSON source (i.e., the same data, but without MongoDB), I get no error.
Query Profile:
query_profile_concat_bug.zip (41.8 KB)
Sample JSON file to import into MongoDB:
concat.zip (262 Bytes)
Thanks, Tim