New schema found. Please reattempt the query. Multiple attempts may be necessary to fully learn the schema


My team and I sometimes encounter this error message when submitting queries through the REST API:

We suspect the error occurs because the physical datasets we are querying contain nested JSON fields, so several query attempts may be needed before Dremio has fully learned the schema.
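
One way to test this hypothesis is to sample some documents (for example, exported as one JSON object per line) and count how many distinct nested schemas they contain. The sketch below is a hypothetical diagnostic helper, not part of Dremio: `field_paths` and `schema_variants` are made-up names, and treating a type change (say `int` to `str`) as a different schema is an assumption about what matters for schema learning.

```python
import json
from collections import Counter


def field_paths(value, prefix=""):
    """Return the set of "path:type" strings found in a parsed JSON value.

    Nested keys are joined with dots; list elements are recorded under the
    list's path with a "[]" suffix (hypothetical convention for this sketch).
    """
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(f"{path}:{type(child).__name__}")
            paths |= field_paths(child, path)
    elif isinstance(value, list):
        item_path = prefix + "[]"
        for item in value:
            paths.add(f"{item_path}:{type(item).__name__}")
            paths |= field_paths(item, item_path)
    return paths


def schema_variants(json_lines):
    """Count how many documents share each distinct set of field paths."""
    counts = Counter()
    for line in json_lines:
        doc = json.loads(line)
        counts[frozenset(field_paths(doc))] += 1
    return counts


sample = [
    '{"id": 1, "payload": {"a": 1}}',
    '{"id": 2, "payload": {"a": 2}}',
    '{"id": 3, "payload": {"a": "x"}}',  # same key, different type
]
for schema, count in schema_variants(sample).items():
    print(count, sorted(schema))
```

If a sample turns up many variants, especially ones that only appear deep into the collection, that would be consistent with the schema having to be relearned several times.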

We noticed that sometimes a query that requires schema learning is automatically re-attempted by Dremio (as seen in the screenshot below), while at other times no automatic re-attempt is made and the error is thrown immediately (as seen in the previous screenshot). We find this inconsistency quite peculiar.

We are currently using Dremio 4.7.2 (Community Edition). According to the release notes for Dremio 4.5.0, this issue was resolved by pushing down only projections that are simple column references.

Hence, we would like to check whether this is a bug, and whether there is any way to allow a limited number of automatic re-attempts for schema learning before Dremio returns the error message.

Here are the job profiles for the two screenshots to help debug this issue:

Job profile of the failed query: (15.4 KB)

Job profile of the successful query after automatic schema learning re-attempt(s): (31.8 KB)

Thank you.

Same problem.

I’m trying to build a raw reflection over a MongoDB collection with millions of documents. One field contains a JSON structure that varies from document to document.

@fdellutri @edksk If your schema keeps changing, Dremio retries 10 times and then stops. However, it does not forget what it has learned, so rerunning the query will resume from where it left off.
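
Because the learned schema is kept between runs, a client that submits queries through the REST API could resubmit when it sees this specific error. Below is a minimal, hypothetical sketch: `run_with_schema_retries` and `run_once` are illustrative names, and `run_once` is assumed to wrap the actual REST calls (as we understand the v3 REST API, submitting SQL with `POST /api/v3/sql` and polling `GET /api/v3/job/{id}` until `jobState` is final). Matching on the "New schema found" prefix assumes the error text stays stable across versions.

```python
import time

# Assumed prefix of Dremio's schema-learning error message.
SCHEMA_LEARNING_MARKER = "New schema found"


def run_with_schema_retries(run_once, max_attempts=5, delay_seconds=0.0):
    """Resubmit a query only while it fails with the schema-learning error.

    `run_once` is a caller-supplied (hypothetical) function that submits the
    query, waits for the job to finish, and returns (job_state, error_message).
    Returns (job_state, error_message, attempts_used). Any other failure is
    returned immediately without retrying.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        state, error = run_once()
        is_schema_error = state == "FAILED" and SCHEMA_LEARNING_MARKER in (error or "")
        if not is_schema_error:
            return state, error, attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # optional back-off between resubmissions
    return state, error, max_attempts
```

Keeping `max_attempts` small bounds the extra load on the cluster, and retrying only on the schema-learning message means genuine errors (a missing table, a syntax error) still surface on the first attempt.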

Is there a reason the schemas are so heterogeneous?

In my case, the MongoDB collection has a field that stores a nested structure that varies from document to document. In such a case, is there any solution I could follow (e.g., a schema transformation)?

@fdellutri If the struct's schema keeps changing, there is currently not much we can do. By the end of Q3 we plan to release enhancements that let you define your own schema, which should address this.