Refresh reflection on MongoDB PDS

Hi guys! Is anyone using Dremio + Mongo who can help me?

My environment:

MongoDB replica set with 3 nodes, more than 120 million documents - version 4.2.3.
Dremio cluster with 2 nodes (1 coordinator and 1 executor) - version 4.9.1.

And I can’t finish even one PDS reflection. The job always fails, and the message I receive is: “MongoCursorNotFoundException: Query failed with error code -5 and error message …”

Looking for a solution, I realized that Mongo has a time limit for keeping a cursor alive, which is 10 minutes, but I haven’t been able to get any further. I need help.

Here is the profile for that job.

dc371e72-137c-4714-93e8-25bac9eab779.zip (48.6 KB)

@victorbertoldo

From Stack Overflow:

This could be for one of two reasons:

Timeout limit, which is 10 minutes by default. From the docs:

By default, the server will automatically close the cursor after 10 minutes of inactivity, or if client has exhausted the cursor.

Batch size, which is 101 documents or 16 MB for the first batch, and 16 MB, regardless of the number of documents, for subsequent batches (as of MongoDB 3.4). From the docs:

find() and aggregate() operations have an initial batch size of 101 documents by default. Subsequent getMore operations issued against the resulting cursor have no default batch size, so they are limited only by the 16 megabyte message size.

Can you try setting “noCursorTimeout” to true via the Mongo source advanced property?
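
For reference, here is roughly what those two knobs look like at the mongo shell level (just a sketch; “test.bigcoll” is a placeholder collection, not from this thread):

    # Sketch: noCursorTimeout asks the server not to idle-kill the cursor,
    # and a large batchSize means fewer getMore round trips between batches.
    mongo --quiet --eval '
      var cur = db.getSiblingDB("test").bigcoll.find().noCursorTimeout().batchSize(10000);
      var n = 0;
      while (cur.hasNext()) { cur.next(); n++; }
      print("read " + n + " documents");
    '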

Tks for helping me out.

About the cursor, I found the same thing, and the noCursorTimeout parameter didn’t get me anywhere. But yesterday I found another parameter: “db.adminCommand( { setParameter: 1, cursorTimeoutMillis: 3600000 } )”. This way I increased the timeout to 1 hour.
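
For anyone trying the same thing, a set-and-verify round trip might look like this (a sketch; note that setParameter only affects the mongod it runs on, so in a replica set it has to be applied on each member):

    # Raise the server-wide cursor timeout to 1 hour, then read it back to confirm.
    mongo admin --eval '
      db.adminCommand({ setParameter: 1, cursorTimeoutMillis: 3600000 });
      printjson(db.adminCommand({ getParameter: 1, cursorTimeoutMillis: 1 }));
    '
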
But the job on Dremio ran for about 18 hours and then failed.
The error was: “ExecutionSetupException: One or more nodes lost connectivity during query.” The profile also noted: “This query was attempted 6 times due to schema learning”.

Here’s the profile:

c385a100-bb76-4f61-8329-c56b06947cd2.zip (227.9 KB)

I don’t know if my Mongo data exceeds my Dremio cluster’s capacity, and I don’t know how to analyze that.
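
In case it helps, one rough way to size the collection on the Mongo side is the collection stats (a sketch; “mydb.mycoll” is a placeholder):

    # Rough sizing: document count, average document size, and total data size in bytes.
    mongo --quiet --eval '
      var s = db.getSiblingDB("mydb").mycoll.stats();
      print("count=" + s.count + " avgObjSize=" + s.avgObjSize + " dataSize=" + s.size);
    '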

Any help would be great.

@victorbertoldo

The schema learning attempts just mean that, as Dremio was processing the documents in the collection, it kept encountering different schemas between documents.

The other error, “One or more nodes lost connectivity during query”, means the executor went unresponsive, probably due to a full GC. Send us the output of “ps -ef | grep dremio”, and add “-XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC” to your dremio-env under “DREMIO_JAVA_SERVER_EXTRA_OPTS” (a sketch of that edit is below). Then restart the executor, reproduce the issue, and send us the server.gc and server.gc.1 files, along with the profile and the server.log from the executor.
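
A sketch of what that addition to conf/dremio-env on the executor could look like (merge with any options you already set):

    # Print a class histogram before and after each full GC so we can see what fills the heap.
    DREMIO_JAVA_SERVER_EXTRA_OPTS="${DREMIO_JAVA_SERVER_EXTRA_OPTS} -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC"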

If this is not a VM but a K8s deployment, I need to send different instructions.

Thanks
Bali

dremio.zip - Google Drive

Sorry about the enormous delay, but I had to focus on other MongoDB problems. Here are the files you asked for.

Thx for helping me out.

@victorbertoldo I do not see the histogram flags that I asked you to enable. Also, the heap size on the executor is currently 4 GB; can you please increase it to 8 GB and retry?
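
For reference, the executor heap is also set in conf/dremio-env; the change might look like this (value in MB, a sketch):

    # Raise the executor JVM heap from 4 GB to 8 GB.
    DREMIO_MAX_HEAP_MEMORY_SIZE_MB=8192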

Sorry about that, I’ve updated the file at the link above.

@victorbertoldo

The profile I have is from “2020-12-17 15:52:00”. Does this issue still happen? If so, please send me the profile from a current failure, along with the GC logs and the server.log from after the query fails with “ExecutionSetupException: One or more nodes lost connectivity during query. Identified nodes were ”

Logs are needed from the executor.

Sorry to revive this thread, but I have a similar issue creating a reflection on a large MongoDB collection.

The collection has 90+ million documents, but Dremio always stops at exactly 65,013,696 documents.

Looking at the processing rate in the MONGO_SUB_SCAN, this corresponds to between 30 and 40 minutes.

After between 2 and 3 hours (most of the time around 2.5), the reflection refresh fails with “MongoCursorNotFoundException: Command failed with error 43 (CursorNotFoundException)”.
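
One way to confirm that the server side really killed the cursor is to look for the timeout message in the mongod log (a sketch; the path assumes a default package install):

    # Server-side cursor kills are typically logged as "Cursor id <id> timed out, idle since <time>".
    grep -i "timed out" /var/log/mongodb/mongod.log | grep -i cursor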

I have set noCursorTimeout=true in the advanced properties, but it makes no difference.

The reflection refresh has been attempted daily for 14 days, and it always fails at exactly the same number of documents.

Any ideas how to solve this? As usual, unfortunately I cannot share profiles :frowning:

The first 4 stages all process the 65 million documents:

MONGO_SUB_SCAN → PROJECT → PROJECT → EXTERNAL_SORT

but the data never reaches the remaining stages (the next one being PARQUET_WRITER).

This is all I can share. The 00-xx-08 - PARQUET_WRITER has Max Records = 0, while the four stages below it have Max Records = 65,013,696.

@dotjdk Looks like a server-side exception. Are you the DBA for the Mongo instance?

  1. Adjust the server parameter cursorTimeoutMillis to a higher value. Not recommended, but a potential quick fix if you’re trying to do a one-time run on your local machine.