Live cursors getting dropped (pymongo.errors.CursorNotFound)

Augustus Ostow

Jan 16, 2020, 4:36:41 PM

to mongodb-user

It seems like ever since we upgraded our server to 3.6.16, cursors are getting dropped even when we set no_cursor_timeout=True.

A couple of hours into a job I get:

pymongo.errors.CursorNotFound: cursor id 821814684691 not found

I'm using pymongo 3.9.0. I didn't see anything out of the ordinary in the mongo server logs. Any ideas?

Thanks

Shane Harvey

Jan 16, 2020, 5:22:58 PM

to mongodb-user

Starting in MongoDB 3.6, all operations are associated with a ClientSession, and a cursor's lifetime is tied to the lifetime of its session. Sessions are always discarded after 30 minutes of inactivity (see https://docs.mongodb.com/manual/reference/parameters/#param.localLogicalSessionTimeoutMinutes). This change means that a cursor will time out after 30 minutes of inactivity, even if it was created with "no_cursor_timeout=True".

We are working on documenting this limitation: DOCS-11255

In the meantime, to work around this change in behavior the application can:
- ensure that the cursor issues a getMore command at least once every 30 minutes, or
- create the cursor with an explicit ClientSession and call refreshSessions at least once every 30 minutes, as described in DOCS-11255.
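
The second option could look roughly like the sketch below. It is only an illustration under assumptions: the function names, the 5-minute refresh interval, and the "mydb"/"jobs" database and collection names are hypothetical, and the refresh only happens between documents, so it would not help if handling a single document took longer than the session timeout.

```python
import time

# Assumed value; anything comfortably under the 30-minute session timeout.
REFRESH_INTERVAL_SECONDS = 5 * 60


def iterate_with_refresh(client, session, cursor, handle,
                         interval=REFRESH_INTERVAL_SECONDS, clock=time.monotonic):
    """Pass each document to handle(), refreshing the session between documents."""
    last_refresh = clock()
    for doc in cursor:
        handle(doc)
        if clock() - last_refresh >= interval:
            # refreshSessions takes a list of session ids and resets their idle timers.
            client.admin.command("refreshSessions", [session.session_id], session=session)
            last_refresh = clock()


def run_job(uri, handle):
    """Hypothetical entry point showing how the cursor and session are created."""
    from pymongo import MongoClient  # local import keeps the helper above driver-agnostic

    client = MongoClient(uri)
    with client.start_session() as session:
        # "mydb" and "jobs" are placeholder names.
        cursor = client["mydb"]["jobs"].find({}, no_cursor_timeout=True, session=session)
        try:
            iterate_with_refresh(client, session, cursor, handle)
        finally:
            cursor.close()  # cursors opened with no_cursor_timeout should be closed explicitly
```

Passing the clock in as a parameter is just a convenience so the refresh logic can be exercised without waiting in real time.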

Augustus Ostow

Jan 17, 2020, 12:42:00 PM

to mongodb-user

After logging this error a little more carefully, it doesn't seem like this is a timeout from 30 minutes of inactivity. The pymongo CursorNotFound exception is thrown a few seconds after a document was successfully fetched from the cursor.

Another clue is that this exception pops up around the same place in the job -- usually on a particular document, but if I exclude that document, then on the one before it. I tried running a shorter job on just those several documents, and that worked fine.

Any other thoughts about what could cause this exception on an active cursor?

Thanks

Shane Harvey

Jan 17, 2020, 5:00:49 PM

to mongodb-user

If the cursor is killed a few seconds after a getMore (not 30 minutes), then it is likely a different issue. Many things can cause a CursorNotFound error, e.g. cursor timeouts, session timeouts, or sending a getMore to the wrong address (a problem often seen when load balancers are put in front of the MongoDB cluster). Does your app use a replica set or a sharded cluster?

> Another clue is that this exception pops up around the same place in the job -- usually on a particular document, but if I exclude that document, then on the one before it. I tried running a shorter job on just those several documents, and that worked fine.

Interesting. Does the server log any information about why the cursor is killed? You may need to increase the logging verbosity. If the server's log file is inconclusive, I recommend adding a pymongo "CommandLogger" to ease debugging: https://pymongo.readthedocs.io/en/stable/api/pymongo/monitoring.html.
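
As a rough sketch of what such a logger might record for this kind of problem (the names cursor_id_of, make_cursor_logger, and the "cursor_debug" logger are made up for illustration, not part of pymongo), the listener below logs the request id, the server address (connection_id), and the cursor id for cursor-related commands only. Comparing the address on the initial find with the address on the failing getMore would show whether the getMore went to a different server, and a cursor id of 0 in a reply means the server has already closed the cursor.

```python
import logging
from collections.abc import Mapping

log = logging.getLogger("cursor_debug")  # hypothetical logger name

CURSOR_COMMANDS = {"find", "aggregate", "getMore", "killCursors"}


def cursor_id_of(document):
    """Best-effort extraction of the cursor id(s) from a command or reply document."""
    if "getMore" in document:   # getMore command: {"getMore": <cursor id>, ...}
        return document["getMore"]
    if "cursors" in document:   # killCursors command: {"cursors": [<cursor id>, ...]}
        return document["cursors"]
    cursor = document.get("cursor")  # find/aggregate/getMore reply: {"cursor": {"id": ...}}
    if isinstance(cursor, Mapping):
        return cursor.get("id")
    return None


def make_cursor_logger():
    """Build a CommandListener that logs only cursor-related commands."""
    from pymongo import monitoring  # local import keeps cursor_id_of dependency-free

    class CursorCommandLogger(monitoring.CommandListener):
        def started(self, event):
            if event.command_name in CURSOR_COMMANDS:
                log.info("%s started: request=%s server=%s cursor=%s",
                         event.command_name, event.request_id,
                         event.connection_id, cursor_id_of(event.command))

        def succeeded(self, event):
            if event.command_name in CURSOR_COMMANDS:
                log.info("%s succeeded: request=%s server=%s cursor=%s micros=%s",
                         event.command_name, event.request_id,
                         event.connection_id, cursor_id_of(event.reply),
                         event.duration_micros)

        def failed(self, event):
            if event.command_name in CURSOR_COMMANDS:
                log.warning("%s failed: request=%s server=%s failure=%s",
                            event.command_name, event.request_id,
                            event.connection_id, event.failure)

    return CursorCommandLogger()
```

The listener would be attached when the client is created, for example MongoClient(uri, event_listeners=[make_cursor_logger()]), with logging configured to show INFO-level messages.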

If you can create a small repro for this issue, I suggest you open a PYTHON bug ticket according to: https://github.com/mongodb/mongo-python-driver/#bugs--feature-requests.

Augustus Ostow

Jan 20, 2020, 1:56:16 PM

to mongodb-user

It's a replica set.

The basic pymongo CommandLogger recipe didn't give me much more info beyond the request id that fails. Is there anything else I should print out in the command logger? I'm still waiting on Atlas support about a permissions issue that is preventing me from turning up the server logging verbosity.