[Django] #20562: Docs: How to use django ORM with multiprocessing


Django

Jun 5, 2013, 3:01:16 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
-------------------------------+--------------------
Reporter: guettli | Owner: nobody
Type: Uncategorized | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------
There are several tickets closed as "invalid" which were submitted because
the user had problems using the Django ORM with the multiprocessing
library.

Please add some documentation on how to use them together.

Main part: restart the database connection after fork()...

--
Ticket URL: <https://code.djangoproject.com/ticket/20562>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Jun 6, 2013, 3:51:13 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
-------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: New feature | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 1 | UI/UX: 0
-------------------------------+------------------------------------
Changes (by bmispelon):

* cc: bmispelon@… (added)
* needs_better_patch: => 0
* needs_tests: => 0
* easy: 0 => 1
* needs_docs: => 0
* type: Uncategorized => New feature
* stage: Unreviewed => Accepted


Comment:

Seeing as we're already providing an example `views.py` and `urls.py` for
most views, I think it makes sense to include an example template when it
makes sense.

The only example template I could find was in
https://docs.djangoproject.com/en/1.5/topics/class-based-views/generic-display/#generic-views-of-objects.

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:1>

Django

Jun 6, 2013, 3:53:39 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
-------------------------------+--------------------------------------
Reporter: guettli | Owner: nobody
Type: Uncategorized | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Changes (by bmispelon):

* cc: bmispelon@… (removed)
* type: New feature => Uncategorized
* easy: 1 => 0
* stage: Accepted => Unreviewed


--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:2>

Django

Jun 6, 2013, 4:06:44 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
-------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: Uncategorized | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------
Changes (by akaariai):

* stage: Unreviewed => Accepted


Comment:

I believe the only thing we can document is "don't use fork()".
Closing the connection after fork() might be too late (who says that it is
safe to close a connection from the child?). You need to do it before
fork(). This might work. Or might not work. How about an in-memory SQLite
database, will that work? And so on...

The problem isn't that we aren't willing to make fork() work, or document
how you can use fork() with Django. The problem is that in general the
libraries used by Django aren't fork() safe. We can't work around that.

I am marking this as accepted. We should at least mention that you should
not use fork(). In addition we should maybe recommend alternatives to
fork(). I don't believe we should mention that "you can use fork() if you
do the following things". It will be nearly impossible to actually
guarantee that will be true.

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:3>

Django

Jun 6, 2013, 10:31:02 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
-------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: Uncategorized | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+------------------------------------

Comment (by guettli):

My rule of thumb: "fork() before connection.cursor is created. If it is
None, it is safe to fork()". The same goes for other connections (for
example memcached).

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:4>

Django

Oct 5, 2013, 2:12:35 PM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
--------------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by timo):

* type: Uncategorized => Cleanup/optimization


--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:5>

Django

Oct 17, 2014, 9:05:08 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
--------------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by ajendrex):

Hello guettli,

so, did you have anything in mind when you said "recommend alternatives to
fork"? I'm facing a problem with a for loop that accesses a Django model
queryset (and so makes DB queries) and takes too long. Each iteration makes
independent calculations, so is it possible to distribute them somehow?
Anyone?

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:6>

Django

Jun 25, 2015, 4:54:45 PM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
--------------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by nikolas):

I'm also curious if anyone is using the django orm from multiple
processes.

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:7>

Django

Jun 26, 2015, 2:28:45 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
--------------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by aaugustin):

Most users of the Django ORM use it from multiple processes, since most
production WSGI servers use multiple processes :-)

This question isn't specific to Django. The general problem is that you
can't carry sockets across fork.

If your Django process has network connections to remote data stores and
you want to fork, you need to close these connections before forking.
(Usually, they're reopened automatically on the next access.)

In practice, application servers fork before Django does anything, so
this issue only arises when you fork in a management command, typically
because you're trying to use `multiprocessing`.
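
A minimal sketch of the "close before forking, reopen on next access"
pattern described above, using only the standard library. `LazyConnection`
and `fork_with_fresh_connection` are hypothetical names, and SQLite stands in
for a remote data store. This is not Django's own connection handling, only
an illustration of the idea, and it assumes a Unix platform where `os.fork()`
is available.

```python
import os
import sqlite3
import tempfile


class LazyConnection:
    """Hypothetical wrapper that opens its connection on first access."""

    def __init__(self, path):
        self.path = path
        self._conn = None
        self.opened_in = None  # pid of the process that opened the connection

    def get(self):
        # (Re)open on demand, mirroring "reopened automatically on next access".
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
            self.opened_in = os.getpid()
        return self._conn

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


def fork_with_fresh_connection(db):
    """Close in the parent, fork, and report whether the child reconnected."""
    db.close()  # close BEFORE forking so nothing open is shared
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: the next access opens a connection of its own.
        status = b"0"
        try:
            os.close(read_fd)
            db.get().execute("SELECT 1")
            if db.opened_in == os.getpid():
                status = b"1"
        finally:
            os.write(write_fd, status)
            os._exit(0)  # never fall through into the parent's code path
    # Parent process: wait for the child's one-byte report.
    os.close(write_fd)
    flag = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return flag == b"1"
```

The parent's connection stays closed after the call and would be reopened by
its own next `db.get()`, so parent and child never share an open handle.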

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:8>

Django

Jun 26, 2015, 4:37:34 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
--------------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by MoritzS):

Replying to [comment:8 aaugustin]:

> This question isn't specific to Django. The general problem that you
> can't carry sockets across fork.

fork() copies the whole file descriptor table to the child process, so
sockets are definitely carried across to the child process.

It's just that you have to explicitly and carefully handle the sockets to
open and close them in the correct processes.
And that's where Django and probably most of the db driver libraries fall
short.
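
To illustrate the point about descriptors being copied, here is a small
standard-library sketch (Unix only, since it calls `os.fork()`). The function
name `child_inherits_socket` is made up for this example. A socket pair is
created before the fork, and the child then writes to its inherited copy.
Note that each process closes the end it does not use; if the parent kept its
copy of `child_end` open, the read loop would never see end-of-file.

```python
import os
import socket


def child_inherits_socket() -> bytes:
    """Show that a socket opened before fork() is usable in the child."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: it holds copies of BOTH descriptors after the fork.
        try:
            parent_end.close()  # not needed in this process
            child_end.sendall(b"hello from child")
            child_end.close()
        finally:
            os._exit(0)
    # Parent: close the copy of the child's end, then read until EOF.
    child_end.close()
    chunks = []
    while True:
        chunk = parent_end.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    parent_end.close()
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

This only demonstrates inheritance of the descriptor. It says nothing about
whether a given database driver tolerates two processes using one connection.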

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:9>

Django

Jun 26, 2015, 5:54:10 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
--------------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by aaugustin):

OK, I'm out of my depth there. All I know is that, if multiple children
attempt to use a connection established in the parent, you get timeouts,
probably because packets are sent back to the parent.

Is there something like a "pre-fork" signal that Django could react to?
I'm not aware of such a thing.

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:10>

Django

Jun 26, 2015, 8:04:59 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
--------------------------------------+------------------------------------
Reporter: guettli | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by guettli):

I created this ticket two years ago. I think a guideline is enough here.

I resolved my issues by checking that no ORM code gets executed before the
multiprocessing module spawns the workers.

In other words:

1. start the workers via multiprocessing.
2. connect to DB.

If you have N workers, you need N connections to the database.

I think no change to the django code base is necessary. Just docs.
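
The two steps above can be sketched roughly as follows. This is an
illustration under stated assumptions, not Django code: the standard-library
`sqlite3` module stands in for the ORM, the names `worker` and
`square_in_workers` are hypothetical, and the "fork" start method is
requested explicitly (Unix only) because that is the case under discussion.
Each worker opens its own connection after it has been started, so N workers
hold N connections.

```python
import multiprocessing
import os
import sqlite3
import tempfile


def worker(db_path, numbers, results):
    # Step 2: connect to the DB inside the worker, i.e. after the fork.
    conn = sqlite3.connect(db_path)
    try:
        for n in numbers:
            (square,) = conn.execute("SELECT ? * ?", (n, n)).fetchone()
            results.put((n, square))
    finally:
        conn.close()


def square_in_workers(numbers, worker_count=2):
    numbers = list(numbers)
    ctx = multiprocessing.get_context("fork")
    results = ctx.Queue()
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.sqlite3")
        # Step 1: start the workers. The parent has opened no DB connection.
        procs = [
            ctx.Process(
                target=worker,
                args=(db_path, numbers[i::worker_count], results),
            )
            for i in range(worker_count)
        ]
        for proc in procs:
            proc.start()
        # Collect results before joining so no worker blocks on the queue.
        collected = dict(results.get(timeout=30) for _ in numbers)
        for proc in procs:
            proc.join()
    return collected
```

For example, `square_in_workers([1, 2, 3, 4])` splits the four inputs between
two workers, each with a connection of its own.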

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:11>

Django

Jan 26, 2018, 3:56:05 AM
to django-...@googlegroups.com
#20562: Docs: How to use django ORM with multiprocessing
--------------------------------------+------------------------------------
Reporter: Thomas Güttler | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 1.5
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------

Comment (by Antony V. Badaykin):

Replying to [comment:8 Aymeric Augustin]:

> Most users of the Django ORM use it from multiple processes, since most
> production WSGI servers use multiple processes :-)
>
> This question isn't specific to Django. The general problem that you
> can't carry sockets across fork.
>
> If your Django process has network connections to remote data stores and
> you want to fork, you need to close these connections before forking.
> (Usually, they're reopened automatically on the next access.)
>
> In practice, applications servers fork before Django does anything, **so
> this issue only arises when you fork in a management command**, typically
> because you're trying to use `multiprocessing`.

What about adding a top-level wrapper, for example a
`MultiprocessingCommand`, that takes care of connections and so on?

--
Ticket URL: <https://code.djangoproject.com/ticket/20562#comment:12>
