pymongo and forking services

jona...@findmeon.com

Nov 4, 2016, 6:45:58 PM

to mongodb-user

According to the docs, pymongo is not fork-safe (http://api.mongodb.com/python/current/faq.html#is-pymongo-fork-safe), and clients should either be created after a fork or be constructed with `connect=False` so that the connection is established lazily.

I'm doing a pre-deploy audit and noticed that, while we do use `connect=False`, it is still possible (though unlikely) that we query mongo before a fork. I was hoping there is an appropriate API call I can make after a fork to reinitialize the connection pool object (and not have to replace it). Many systems with connection pools and crypto libraries support post-fork re-initialization like this. A quick reading of the docs suggests `close()` might work, but I haven't had time to read through the code.

Has anyone else looked into this before?

A. Jesse Jiryu Davis

Nov 6, 2016, 6:10:10 PM

to mongodb-user

Hi Jonathan. For the reasons described in the docs, we don't support reinitialization of PyMongo. If you create a MongoClient before forking and suspect it might be used for an operation before the fork, I suggest you simply discard it and create a new MongoClient after the fork:

    # Recreate the client in case we used it before fork.
    client = MongoClient(... configuration ....)

jona...@findmeon.com

Nov 7, 2016, 12:02:47 PM

to mongodb-user

Thanks Jesse. I was hoping all that would have been taken care of by `close()` -- we can hope, can't we ;)

I only had the connection info at startup, so I wanted to avoid the call for configuring a new client -- but it's easy to stash whatever info I can't pull out of the existing client.

A. Jesse Jiryu Davis

Nov 7, 2016, 12:23:22 PM

to mongodb-user

Can you tell me, for my education, some more about how you code and deploy your application? This "no fork" policy has bothered some of our users and we want to understand why. Why does your application fork? Why does it sometimes query MongoDB before forking, and sometimes not? What libraries are in your stack?

Feel free to write me directly at je...@mongodb.com if any details aren't public.

Appreciated,

Jesse

jona...@findmeon.com

Nov 7, 2016, 2:22:51 PM

to mongodb-user

On Monday, November 7, 2016 at 12:23:22 PM UTC-5, A. Jesse Jiryu Davis wrote:

> Can you tell me, for my education, some more about how you code and deploy your application? This "no fork" policy has bothered some of our users and we want to understand why. Why does your application fork? Why does it sometimes query MongoDB before forking, and sometimes not? What libraries are in your stack?

I'd be happy to share publicly.

Deployment Info:

This application is basically a Service Oriented Architecture of multiple independent, interconnected services deployed on 3 platforms: `Pyramid` web servers, `Celery` task managers, and `Twisted` task runners. Both the `Pyramid` and `Celery` services fork (`Pyramid` is deployed via a `uWSGI` container behind `nginx`, but the native `waitress` server forks as well). `SQLAlchemy` is used to talk to `Postgres`, and is made fork-safe by calling `engine.dispose()` in a post-fork hook (celery and waitress both have a hook). `Redis` is involved as well, and the Python client auto-detects a fork (it compares PIDs since version 2.4.12).
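The PID-comparison idea mentioned for the Redis client can be sketched generically. This is not the Redis client's actual code, only a minimal illustration of the technique: remember the PID that created a resource, and rebuild the resource when the current PID no longer matches. `PidCheckedResource` and `factory` are hypothetical names.

```python
import os


class PidCheckedResource:
    """Rebuilds a wrapped resource when the current PID differs from the creator's."""

    def __init__(self, factory):
        self._factory = factory
        self._pid = os.getpid()
        self._resource = factory()

    def get(self):
        # A changed PID means this is a forked child holding an inherited copy,
        # so replace it with a resource created in this process.
        if os.getpid() != self._pid:
            self._resource = self._factory()
            self._pid = os.getpid()
        return self._resource
```

The check costs one `os.getpid()` call per access, which is the trade-off for not needing a post-fork hook at all.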

There can be a couple of mongo clients/servers involved with a given application. The connection info and credentials are split between `.ini` files used by the application and python files managed by `blackbox` (https://github.com/StackExchange/blackbox).

Pre-Fork connection:

Connections happen pre-fork for 2 general reasons:

1. Ensure the data service is running. I want to ensure a read/write as part of the initial startup against all the backing data stores. For development/testing, failing fast will streamline a resolution. For production, it's better to fail at this stage before registering that server behind a load balancer.

2. Pre-load data before a fork, both to optimize copy-on-write behavior and avoid a bottleneck of 12+ instances on each node querying the backend for 10MB of info simultaneously.
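The two reasons above can be combined into one startup routine: check the backend and preload once in the parent, then fork workers that each build their own client. The sketch below is a hedged illustration only. `start_workers` and its callable parameters (`make_client`, `check`, `preload`, `worker_main`) are hypothetical names, and it assumes a Unix platform where `os.fork()` is available.

```python
import os


def start_workers(make_client, check, preload, n_workers, worker_main):
    """Check the backend and preload data once, then fork workers with fresh clients."""
    client = make_client()
    check(client)           # fail fast: an exception here stops startup before any fork
    data = preload(client)  # queried once; children inherit it instead of re-querying
    del client              # the pre-fork client is not used again in this function

    pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: build a new client rather than reusing the parent's.
            code = 1
            try:
                code = worker_main(make_client(), data)
            finally:
                os._exit(code)  # never fall back into the parent's code path
        pids.append(pid)
    return pids
```

The parent gets back the child PIDs and is expected to wait on them; `worker_main` is assumed to return an integer exit code.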

There were probably other concerns; these just popped to the top of my head.

Because of how the various frameworks handle configuration data, the info needed to create a new client may or may not be easy to access after a fork. In Pyramid's case, the information would be in the application registry, which is best accessed via an API object. There is a 'threadlocal' way to access this information (`get_current_registry`), but all the threadlocal functions are discouraged in production.

Anyway, it took me longer to write a fix than to write this email - I just wrapped the mongo client in a container with the credentials.
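The actual container isn't shown in this thread, so the following is only a guess at its general shape: an object that stashes the constructor arguments next to the client, so a post-fork hook can rebuild the client without going back to the framework's configuration. `ClientContainer`, `client_factory`, and `recreate` are hypothetical names; in real use the factory would presumably be `MongoClient`.

```python
class ClientContainer:
    """Keeps the constructor arguments next to the client so it can be rebuilt."""

    def __init__(self, client_factory, *args, **kwargs):
        self._factory = client_factory
        self._args = args
        self._kwargs = kwargs
        self.client = client_factory(*args, **kwargs)

    def recreate(self):
        # Meant to be called from a post-fork hook: discard the old client
        # and build a new one from the stashed configuration.
        self.client = self._factory(*self._args, **self._kwargs)
        return self.client
```

Code that holds a reference to the container, rather than to the client itself, picks up the new client automatically after `recreate()` is called.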

If you have any questions on anything, be sure to cc me directly and I'll try to answer anything I can.

A. Jesse Jiryu Davis

Nov 7, 2016, 3:29:37 PM

to mongodb-user

Lovely, thanks very much for those useful details.
