Schema learning: sampling record size

Hi,
I am new to Dremio, and I have connected Dremio to MongoDB.
One of my MongoDB collections has more than 10K documents.
The most recently inserted documents have some extra attributes,
but Dremio does not show those extra attributes.
I think Dremio is learning the schema from only the first few hundred documents.
Is there any way to increase that sampling size?

Hi @Vaibhav_Agrawal,

Let me get back to you on this with an answer.


Hi @Vaibhav_Agrawal,

We take a small initial sample the very first time we query a table. For MongoDB, the limit is 100 records.

The problem you are seeing occurs when the table has columns that were not found during sampling. There is a workaround for this, which is to do a full scan of the table. Try the query below:

select * from mongo.table where RANDOM() = RANDOM()

The reason for the RANDOM() = RANDOM() filter is that it keeps us from returning a lot of results, and it is also a filter that we will not push down into the Mongo scan, so the whole collection still gets scanned. As the scan discovers the new columns, we update the schema in our kvstore.
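
To illustrate why the initial sample misses the newer attributes, here is a toy sketch in Python. This is not Dremio's actual code: `infer_schema`, `sample_size`, and `extra_attr` are made-up names for illustration, and only the 100-record limit comes from the description above.

```python
def infer_schema(docs, sample_size=None):
    """Return the union of field names seen in the first `sample_size`
    documents, or in all documents when sample_size is None."""
    sample = docs if sample_size is None else docs[:sample_size]
    fields = set()
    for doc in sample:
        fields.update(doc.keys())
    return fields


# 10,000 older documents, followed by a few newer ones with an extra attribute.
docs = [{"_id": i, "name": f"user{i}"} for i in range(10_000)]
docs += [
    {"_id": 10_000 + i, "name": f"user{10_000 + i}", "extra_attr": True}
    for i in range(5)
]

sampled = infer_schema(docs, sample_size=100)  # like the initial sample
full = infer_schema(docs)                      # like the full-scan workaround

# Fields that only a full scan discovers.
print(sorted(full - sampled))  # ['extra_attr']
```

Since the extra attributes only appear in the latest documents, a sample of the first 100 records never sees them, while a scan over every document does.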

Hope this helps,


Thanks a lot @balaji.ramaswamy for the reply…

I figured it out somehow… I always have to run the query again to update the schema…
If I don’t run it, it will show results from the last run job…