Hi guys! Is anyone using Dremio + Mongo who can help me?
My environment:
MongoDB replica set with 3 nodes, more than 120 million documents - version 4.2.3.
Dremio cluster with 2 nodes (1 coordinator and 1 executor) - version 4.9.1
And I can’t finish even one PDS reflection. The job always fails, and the message I receive is: "MongoCursorNotFoundException: Query failed with error code -5 and error message …
Looking for a solution, I realized that Mongo has a time limit for keeping a cursor open, which is 10 minutes, but I haven’t been able to get any further. I need help.
Timeout limit, which is 10 minutes by default. From the docs:
By default, the server will automatically close the cursor after 10 minutes of inactivity, or if client has exhausted the cursor.
Batch size, which is 101 documents or 16 MB for the first batch, and 16 MB, regardless of the number of documents, for subsequent batches (as of MongoDB 3.4). From the docs:
find() and aggregate() operations have an initial batch size of 101 documents by default. Subsequent getMore operations issued against the resulting cursor have no default batch size, so they are limited only by the 16 megabyte message size.
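For reference, this is roughly how those two knobs look when driven from the mongo shell; a minimal sketch, with mydb/mycollection and the host list as placeholders (noCursorTimeout() and batchSize() are standard cursor methods):

# Connect to the replica set and open a cursor the server will not expire for idleness,
# pulling 1000 documents per getMore batch instead of the defaults described above
mongo "mongodb://mongo1,mongo2,mongo3/mydb?replicaSet=rs0" --eval '
  var cur = db.mycollection.find().noCursorTimeout().batchSize(1000);
  while (cur.hasNext()) { cur.next(); }  // each getMore fetches up to 1000 documents
  cur.close();                           // close explicitly, since the server will not time it out
'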
Can you try setting “noCursorTimeout” to true via the Mongo source advanced property?
About the cursor, I found the same thing, and the noCursorTimeout parameter didn’t get me anywhere. But yesterday I found another parameter: “db.adminCommand( { setParameter: 1, cursorTimeoutMillis: 3600000 } )”. This way I increased the timeout to 1 hour.
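For anyone following along, the current value can be read back with getParameter (the host is a placeholder); keep in mind that cursorTimeoutMillis is a per-mongod setting, so it has to be applied on every replica set member that might serve the scan:

# Read back the current idle-cursor timeout (the default is 600000 ms = 10 minutes)
mongo --host mongo1:27017 --eval 'db.adminCommand({ getParameter: 1, cursorTimeoutMillis: 1 })'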
But the job on Dremio ran for about 18 hours and then failed.
The error was: “ExecutionSetupException: One or more nodes lost connectivity during query.” The query also showed: “This query was attempted 6 times due to schema learning 5”.
The schema learning attempts just mean that, as Dremio was processing the documents in the collection, it kept encountering different schemas between documents.
The other error, “One or more nodes lost connectivity during query”, is because the executor went unresponsive, probably due to a full GC. Send us the output of “ps -ef | grep dremio”, add “-XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC” to your dremio-env under “DREMIO_JAVA_SERVER_EXTRA_OPTS”, restart the executor, reproduce the issue, and send us the server.gc and server.gc.1 files, along with the profile and the server.log from the executor.
If this is not a VM but a K8s deployment, then I need to send different instructions.
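For a VM/standalone install, a minimal sketch of that dremio-env change on the executor (the path is an assumption, adjust it to your deployment, and append to any options you already set rather than overwriting them):

# <DREMIO_HOME>/conf/dremio-env on the executor
DREMIO_JAVA_SERVER_EXTRA_OPTS="${DREMIO_JAVA_SERVER_EXTRA_OPTS} -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC"

# restart the executor so the flags take effect (command may differ if Dremio runs as a service)
<DREMIO_HOME>/bin/dremio restart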
@victorbertoldo I do not see the histogram flags that I asked you to enable. Also, the heap size on the executor is 4 GB now; can you please increase it to 8 GB and retry?
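A sketch of that heap bump, assuming memory is sized via dremio-env on the executor (variable name taken from the stock dremio-env; if you size memory elsewhere, e.g. through Kubernetes values, change it there instead):

# <DREMIO_HOME>/conf/dremio-env on the executor
DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192    # was 4096; restart the executor afterwards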
The profile I have is from “2020-12-17 15:52:00”; does this issue still happen? Please send me the profile from when the problem happens now, along with the GC logs and the server.log, after the query fails with "ExecutionSetupException: One or more nodes lost connectivity during query. Identified nodes were "
Sorry to revive this thread, but I have a similar issue creating a reflection on a large MongoDB collection.
The collection has 90+ million documents, but Dremio always stops at exactly 65013696.
Looking at the processing rate in the MONGO_SUB_SCAN, this corresponds to between 30 and 40 minutes.
After between 2 and 3 hours (most of the time 2.5 hours), the reflection refresh fails with the MongoCursorNotFoundException: Command failed with error 43 (CursorNotFoundException)
I have set noCursorTimeout=true in the advanced properties, but it makes no difference.
The reflection refresh attempts have been running daily for 14 days and always fail at exactly the same number of documents.
Any ideas on how to solve this? As usual, unfortunately I cannot share profiles.
The first 4 stages all process the 65 million documents.
@dotjdk Looks like a server-side exception. Are you the DBA for the Mongo instance?
Adjust the server parameter cursorTimeoutMillis to a higher value. Not recommended, but a potential quick fix if you’re trying to do a one-time run on your local machine.
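If you do go that route, a minimal sketch (the value and host are examples; run it on every member that can serve the scan, and note the change does not survive a mongod restart unless it is also set at startup):

# Raise the idle-cursor timeout to 1 hour at runtime
mongo --host mongo1:27017 --eval 'db.adminCommand({ setParameter: 1, cursorTimeoutMillis: 3600000 })'

# To make it permanent, pass the same parameter when mongod starts, e.g.:
#   mongod --setParameter cursorTimeoutMillis=3600000 ...
# or add it under the setParameter section of mongod.conf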