New schema found. Please reattempt the query. Multiple attempts may be necessary to fully learn the schema

Hi,

My team and I sometimes encounter the error message quoted in the title when posting queries through the REST API.

We suspect the cause is that the physical datasets we are querying have nested JSON fields, so multiple query attempts might be needed for schema learning.

We noticed that Dremio sometimes re-attempts queries that require schema learning automatically (as seen in the screenshot below), while at other times it performs no automatic re-attempt and throws the error immediately (as seen in the previous screenshot). We find this behaviour quite peculiar.

We are currently using Dremio version 4.7.2 (Community Edition). We have read in the release notes for Dremio version 4.5.0 that this issue was resolved by only pushing down projections that are simple column references.

Hence, we would like to check whether the issue we faced is a bug, and whether there is any way to allow a limited number of automatic re-attempts for schema learning before Dremio returns the error message.

Here are the job profiles for the two queries shown in the screenshots, to aid in debugging this issue:

Job profile of the failed query:
389394d2-c16b-4219-b6da-afb3ac59ce26.zip (15.4 KB)

Job profile of the successful query after automatic schema learning re-attempt(s):
9aec65a0-c715-423f-8e04-7e214253537b.zip (31.8 KB)

Thank you.

Same problem.

I’m trying to build a raw reflection over a MongoDB collection with millions of documents. One field contains a variable JSON structure.

@fdellutri @edksk If you have a changing schema, Dremio tries 10 times and then stops. However, it does not forget what it has learnt, so rerunning the query will start from where it left off.
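Because a rerun continues from what Dremio has already learnt, a client can work around the error by re-submitting the query a bounded number of times. Below is a minimal sketch of such a client-side retry loop, not a Dremio feature. `run_with_schema_retries` and `submit` are hypothetical names: `submit` stands for whatever function your client uses to run the query (for example through the REST API) and is assumed to raise an exception carrying Dremio's error text. The marker string is the sentence common to both error messages quoted in this thread.

```python
import time

# Sentence shared by both error messages quoted in this thread.
SCHEMA_LEARNING_MARKER = "Please reattempt the query"


def run_with_schema_retries(submit, max_attempts=5, delay_seconds=0.0):
    """Call submit() until it succeeds or max_attempts is reached.

    submit is a hypothetical zero-argument callable that runs the query
    and raises an exception whose message contains the error text
    reported by Dremio when the job fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except Exception as exc:
            is_schema_error = SCHEMA_LEARNING_MARKER in str(exc)
            # Re-raise unrelated errors at once, and give up on the
            # schema-learning error after the last allowed attempt.
            if not is_schema_error or attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
```

Only the schema-learning error is retried; any other failure is raised immediately so that real problems are not hidden. If the schema of the field keeps changing, the loop will still hit `max_attempts`, so keep that limit small.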

Is there a reason the schemas are so heterogeneous?

In my case, the MongoDB collection has a field that stores a nested, variable structure. In such a case, is there any solution I could follow (e.g. a schema transformation)?

@fdellutri If the struct is constantly changing schema, there is currently not much we can do. By the end of Q3 we are coming out with enhancements that let you define your own schema, which should address this.

Is there any update on this? We keep running into this issue and were wondering whether there is a way to define a schema.

@OmarSultan85 Internal schema is a feature we are working on, and it will be available later this year.

Hi, I have a Nessie table, and this error pops up every time I try to access the JSON field. You can easily reproduce it by creating a table in a Nessie endpoint like this:

CREATE TABLE Nessie.my_catalog.my_schema.my_table AT BRANCH main
AS (
    SELECT 
        1 AS id, 
        'Example Name' AS name, 
        '[{"key": "value1"}, {"key": "value2"}]' AS json_data    
)

and then query like this:

SELECT CONVERT_FROM(json_data, 'JSON') FROM Nessie.my_catalog.my_schema.my_table

It always ends with the following error, no matter whether I run it twice or more often:

New field in the schema found. Please reattempt the query. Multiple attempts may be necessary to fully learn the schema.

Is this known behaviour? Is there any way to handle JSON fields using the Nessie endpoint?

Thank you in advance!

@styx0r Not sure if it is related to Nessie. Does the repro work with a regular non-Nessie table, such as Parquet or JSON files on S3 or Hive?

@balaji.ramaswamy Thanks for your answer. Yes, it works. I tried a regular MinIO S3 endpoint with Parquet and Iceberg, and both behave as expected. That’s why I concluded it’s related to Nessie.