Refreshing a reflection on a MongoDB PDS

Hi guys! Is anyone using Dremio with MongoDB who could help me?

My environment:

MongoDB replica set with 3 nodes, more than 120 million documents - version 4.2.3.
Dremio cluster with 2 nodes (1 coordinator and 1 executor) - version 4.9.1

I can’t get even one PDS reflection to finish. The job always fails, and the message I receive is: "MongoCursorNotFoundException: Query failed with error code -5 and error message …"

While looking for a solution, I realized that MongoDB has a time limit for keeping a cursor alive, which is 10 minutes, but I haven’t been able to get any further. I need help.

Here is the job profile:

dc371e72-137c-4714-93e8-25bac9eab779.zip (48.6 KB)

@victorbertoldo

From Stack Overflow:

This could be due to one of two reasons:

Timeout limit, which is 10 minutes by default. From the docs:

By default, the server will automatically close the cursor after 10 minutes of inactivity, or if client has exhausted the cursor.

Batch size, which is 101 documents or 16 MB for the first batch, and 16 MB, regardless of the number of documents, for subsequent batches (as of MongoDB 3.4). From the docs:

find() and aggregate() operations have an initial batch size of 101 documents by default. Subsequent getMore operations issued against the resulting cursor have no default batch size, so they are limited only by the 16 megabyte message size.
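To see how the two reasons interact, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which the idle timer restarts on every getMore, so the gap between two getMore calls is about the time the client spends processing the previous batch. The 10-minute timeout, the 101-document first batch and the 16 MB cap come from the docs quoted above; the average document size and per-document processing time are hypothetical inputs you would have to estimate for your own collection.

```python
import math

# Defaults taken from the MongoDB docs quoted above.
DEFAULT_CURSOR_TIMEOUT_S = 10 * 60     # 10 minutes of inactivity
FIRST_BATCH_DOCS = 101                 # initial find()/aggregate() batch
MAX_BATCH_BYTES = 16 * 1024 * 1024     # 16 MB cap on later getMore batches


def docs_per_getmore(avg_doc_bytes: int) -> int:
    """Documents that fit in one getMore batch under the 16 MB cap."""
    return max(1, MAX_BATCH_BYTES // avg_doc_bytes)


def getmore_calls(total_docs: int, avg_doc_bytes: int) -> int:
    """Number of getMore round trips needed after the first batch."""
    remaining = max(0, total_docs - FIRST_BATCH_DOCS)
    return math.ceil(remaining / docs_per_getmore(avg_doc_bytes))


def cursor_may_time_out(avg_doc_bytes: int, ms_per_doc: float,
                        timeout_s: int = DEFAULT_CURSOR_TIMEOUT_S) -> bool:
    """True if processing one full batch takes longer than the idle timeout.

    Simplified model (an assumption): the cursor counts as idle while the
    client works through a batch, and becomes active again on the next getMore.
    """
    batch_seconds = docs_per_getmore(avg_doc_bytes) * ms_per_doc / 1000
    return batch_seconds > timeout_s
```

For example, with a hypothetical 2 KB average document, each getMore carries up to 8192 documents, so 120 million documents need about 14,649 round trips. At 50 ms per document a full batch takes about 410 s and stays under the 10-minute limit; at 100 ms per document it takes about 819 s and the cursor could be closed between batches.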

Can you try setting “noCursorTimeout” to true via the advanced properties of the Mongo source?

Thanks for helping me out.

About the cursor: I found the same thing, but setting the noCursorTimeout parameter didn’t solve it. Yesterday, though, I found another parameter, “db.adminCommand( { setParameter: 1, cursorTimeoutMillis: 3600000} )”, and used it to increase the timeout to 1 hour.
But the Dremio job then ran for about 18 hours and failed.
The error was “ExecutionSetupException: One or more nodes lost connectivity during query.”, and the query reported “This query was attempted 6 times due to schema learning 5”.
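For reference, the setParameter command above can be built as a plain Python dict; this sketch only shows the unit conversion (3600000 ms is 1 hour, the default is 600000 ms). With PyMongo you could pass the dict to `client.admin.command(...)`, but that call is not exercised here. Keep in mind that a runtime setParameter applies only to the mongod it is run on and is not persisted across restarts, and that the limit is about inactivity between getMore calls, not total query runtime, so an 18-hour job does not by itself exceed a 1-hour cursor timeout. The helper name `cursor_timeout_command` is hypothetical.

```python
MS_PER_HOUR = 60 * 60 * 1000
DEFAULT_CURSOR_TIMEOUT_MS = 10 * 60 * 1000   # MongoDB default: 10 minutes


def cursor_timeout_command(hours: float) -> dict:
    """Build the admin command document that raises cursorTimeoutMillis.

    The command name must be the first key, which Python dicts preserve.
    """
    return {"setParameter": 1, "cursorTimeoutMillis": int(hours * MS_PER_HOUR)}


if __name__ == "__main__":
    # Same value as the mongo shell command in the post.
    print(cursor_timeout_command(1))
```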

Here’s the profile:

c385a100-bb76-4f61-8329-c56b06947cd2.zip (227.9 KB)

I don’t know whether my MongoDB data exceeds my Dremio cluster’s capacity, and I don’t know how to analyze that.

Any help would be great.

@victorbertoldo

The schema learning attempts just mean that, as Dremio was processing the collection, it kept encountering documents with different schemas.
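To get a feel for how much schema variety a collection has, here is a toy model of schema learning. It is an assumption for illustration, not Dremio's actual algorithm: the first document defines the schema, and every later document that adds a top-level field or a new type for an existing field counts as one schema change (which, in Dremio's case, would mean another attempt). Nested fields are ignored, and `schema_changes` is a hypothetical helper you could run over a sample of documents.

```python
from typing import Dict, Iterable, Set


def schema_changes(docs: Iterable[dict]) -> int:
    """Count documents that widen the running schema (top-level fields only)."""
    schema: Dict[str, Set[str]] = {}   # field name -> set of observed type names
    changes = 0
    for index, doc in enumerate(docs):
        grew = False
        for field, value in doc.items():
            kind = type(value).__name__
            seen = schema.setdefault(field, set())
            if kind not in seen:       # new field, or new type for a known field
                seen.add(kind)
                grew = True
        if grew and index > 0:         # the first document only seeds the schema
            changes += 1
    return changes


if __name__ == "__main__":
    sample = [{"a": 1}, {"a": 2}, {"a": 3, "b": "x"}, {"a": "4"}, {"a": 5, "b": "y"}]
    print(schema_changes(sample))  # "b" appears, then "a" turns into a string
```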

The other error, “One or more nodes lost connectivity during query”, occurs because the executor became unresponsive, probably due to a full GC. Send us the output of “ps -ef | grep dremio”, and also add “-XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC” to your dremio-env under “DREMIO_JAVA_SERVER_EXTRA_OPTS”. Then restart the executor, reproduce the issue, and send us the server.gc and server.gc.1 files along with the profile and the server.log from the executor.
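Before reproducing, it can help to confirm that the restarted executor actually picked up the histogram flags. A minimal sketch, assuming you paste a line of `ps -ef | grep dremio` output into it; `missing_flags` is a hypothetical helper that just splits the line on whitespace and looks for the two flags requested above.

```python
from typing import List, Sequence

# The two JVM flags requested for DREMIO_JAVA_SERVER_EXTRA_OPTS.
HISTOGRAM_FLAGS = (
    "-XX:+PrintClassHistogramBeforeFullGC",
    "-XX:+PrintClassHistogramAfterFullGC",
)


def missing_flags(ps_line: str, required: Sequence[str] = HISTOGRAM_FLAGS) -> List[str]:
    """Return the required JVM flags that do not appear in a `ps -ef` line."""
    args = set(ps_line.split())   # ps prints the command line space-separated
    return [flag for flag in required if flag not in args]


if __name__ == "__main__":
    # Hypothetical ps output for an executor started with only one of the flags.
    line = "dremio 4242 1 0 10:00 ? 00:00:42 java -Xmx4096m -XX:+PrintClassHistogramBeforeFullGC"
    print(missing_flags(line))
```

An empty list means both flags are on the executor's command line.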

If this is not a VM but a Kubernetes (K8s) deployment, I’ll need to send different instructions.

Thanks
Bali

dremio.zip - Google Drive

Sorry about the enormous delay, but I had to focus on other MongoDB problems.
Here are the files you asked for.

Thanks for helping me out.

@victorbertoldo I do not see the histogram flags that I asked you to enable. Also, the heap size on the executor is currently 4 GB; can you please increase it to 8 GB and retry?
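The executor's current heap can be read from the `-Xmx` option in the same `ps -ef` output. A minimal sketch, assuming a HotSpot JVM where the last `-Xmx` on the command line wins; `max_heap_gb` is a hypothetical helper. In a standard install the heap is usually set through `DREMIO_MAX_HEAP_MEMORY_SIZE_MB` in dremio-env, but check the Dremio documentation for your version.

```python
import re
from typing import Optional

# Size suffixes accepted by -Xmx (k, m, g, t; none means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def max_heap_gb(ps_line: str) -> Optional[float]:
    """Return the effective -Xmx from a `ps -ef` line in GiB, or None if absent."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgGtT]?)(?=\s|$)", ps_line)
    if not matches:
        return None
    number, unit = matches[-1]   # a later -Xmx overrides earlier ones
    return int(number) * _UNITS[unit.lower()] / 1024 ** 3


if __name__ == "__main__":
    print(max_heap_gb("java -Xms4096m -Xmx4096m"))   # heap before the change
```

After raising the heap and restarting, the same check should report 8.0.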

Sorry about that, I’ve updated the file at the link above.

@victorbertoldo

The profile I have is from “2020-12-17 15:52:00”. Does this issue still happen? If so, please send me the profile from a current failure, along with the GC logs and the server.log collected after the query fails with "ExecutionSetupException: One or more nodes lost connectivity during query. Identified nodes were "

Logs are needed from the