Schema Learning Sampling records Size

Vaibhav_Agrawal · November 3, 2017, 4:18pm

Hi,
I am new to Dremio, and I have connected Dremio to MongoDB.
One of my MongoDB collections has more than 10K documents.
The most recently inserted documents have some extra attributes,
but Dremio does not show those extra attributes.
I think Dremio is learning the schema from only the first few hundred documents.
Is there any way to increase that sampling size?
Regards,
Vaibhav

balaji.ramaswamy · November 6, 2017, 9:05pm

Hi @Vaibhav_Agrawal,

Let me get back to you on this with an answer

Thanks,
@balaji.ramaswamy

balaji.ramaswamy · November 6, 2017, 11:57pm

Hi @Vaibhav_Agrawal,

We take a small initial sample the very first time we query a table. For Mongo, the sample limit is 100 records.

The problem you are seeing occurs when the table has columns that were not found during sampling. There is a workaround for this, which is to do a full scan of the table. Try the query below:

select * from mongo.table where RANDOM() = RANDOM()

The reason for the RANDOM() = RANDOM() filter is that it prevents us from returning a lot of results, but it’s also a filter that we will not push down into the Mongo scan, so the whole table is still scanned. What should happen now is that when we discover the new columns, we will update the schema in our kvstore.
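
To illustrate why a 100-record sample can miss attributes that only appear in later documents, and why a full scan picks them up, here is a minimal Python sketch. It is not Dremio's implementation: `infer_schema`, the document layout, and the `email` field are all hypothetical, and the last step only mimics a filter that is evaluated per row after the row has been read.

```python
import random


def infer_schema(docs, sample_size=None):
    """Return the union of field names seen in the first `sample_size`
    documents (hypothetical helper; all documents if sample_size is None)."""
    sample = docs if sample_size is None else docs[:sample_size]
    fields = set()
    for doc in sample:
        fields.update(doc.keys())
    return fields


# 10,000 older documents, followed by a few newer ones with an extra attribute.
docs = [{"_id": i, "name": f"user{i}"} for i in range(10_000)]
docs += [
    {"_id": 10_000 + i, "name": f"user{10_000 + i}", "email": "x@example.com"}
    for i in range(5)
]

sampled = infer_schema(docs, sample_size=100)  # schema from the first 100 records
full = infer_schema(docs)                      # schema from a full scan

print(sorted(sampled))         # ['_id', 'name']
print(sorted(full - sampled))  # ['email']

# A filter like RANDOM() = RANDOM() still forces every row to be read,
# but it is almost never true, so very few rows (usually none) come back.
returned = [d for d in docs if random.random() == random.random()]
print(len(returned))
```

The sampled schema lacks the late-added field, while the full pass sees every field even though the filter lets almost nothing through.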

Hope this helps,

Thanks,
@balaji.ramaswamy

Vaibhav_Agrawal · November 14, 2017, 4:00am

Thanks a lot for the reply, @balaji.ramaswamy.

I figured it out somehow. I always have to run the query again to update the schema.
If I don’t run it, Dremio shows the results from the last job that was run.
Thanks,
@Vaibhav_Agrawal
