Possible idea for removing global state in Django.


Jonathan Slenders

Oct 16, 2013, 4:24:48 PM
to django-d...@googlegroups.com
The global state problem is something that's been bothering me for a long while, but after seeing a presentation by Alex Gaynor [1] from last year, I started thinking about this again.

The main problem is that you'd need to have a DjangoProject object which contains the root configuration and you'd have to somehow pass that object around everywhere.

Maybe I'm not the first to think about this, but what's wrong with thread-local storage? In Django, we have a huge advantage over Node.js in that there's only one active request per thread, and that request should belong to only one DjangoProject instance. (We don't have unexpected context switches within the same thread, as in Twisted or Tulip, so we can make that assumption.)

Actually, we are already using thread locals for the currently active language [2], so I don't see any reason not to use the same approach to keep track of the currently active DjangoApplication object.

It's not a small project, but it's not impossibly huge either. The most important parts of the code that would need to change to use this thread local are:

* django.conf.settings:
We don't want to break that API, so the settings object should become a proxy that knows the current active project and returns the correct settings.

* The reverse() function (and a few others) for URL patterns:
URL patterns depend on the application.

* Django models:
That's the hardest one. MyModel.objects.get() needs to know the settings in order to know what the current database is.
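
A minimal sketch of how the settings proxy from the first bullet might work, assuming the active project is kept in a thread local. The names `_state`, `Project` and `SettingsProxy` are hypothetical and exist only for this illustration; nothing here is Django's actual implementation.

```python
import threading

_state = threading.local()  # hypothetical per-thread holder for the active project


class Project:
    """Hypothetical stand-in for the proposed DjangoProject."""

    def __init__(self, **settings):
        self.settings = settings


class SettingsProxy:
    """Forwards attribute reads to the settings of the project active in this thread."""

    def __getattr__(self, name):
        project = getattr(_state, "project", None)
        if project is None:
            raise RuntimeError("no project is active in this thread")
        try:
            return project.settings[name]
        except KeyError:
            raise AttributeError(name) from None


settings = SettingsProxy()  # keeps the familiar `settings.X` access pattern

_state.project = Project(DEBUG=True, DATABASE_NAME="site_a")
debug_a = settings.DEBUG
_state.project = Project(DEBUG=False, DATABASE_NAME="site_b")
debug_b = settings.DEBUG
```

The same `settings` object returns different values depending on which project is active, so existing code that reads `settings.DEBUG` would not need to change.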


I would propose a Python context manager for switching from one project to another, should you ever need to. Say you want to query a model from another project; you could do this:

other_django_project = DjangoProject.from_settings_file(my_project_settings)
with other_django_project.activate():
    MyModel.objects.get()
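
One way activate() could be built is as a per-thread stack that is pushed on entry and popped on exit. This is only a sketch under that assumption: `_local`, `_stack` and `current_project` are hypothetical helpers, and the `DjangoProject` class below is a toy stand-in for the proposed class, not something Django provides.

```python
import threading
from contextlib import contextmanager

_local = threading.local()  # hypothetical module-level holder


def _stack():
    # Each thread lazily gets its own list of active projects.
    if not hasattr(_local, "stack"):
        _local.stack = []
    return _local.stack


def current_project():
    stack = _stack()
    return stack[-1] if stack else None


class DjangoProject:
    """Toy stand-in for the proposed class; not part of Django."""

    def __init__(self, name):
        self.name = name

    @contextmanager
    def activate(self):
        stack = _stack()
        stack.append(self)
        try:
            yield self
        finally:
            stack.pop()  # runs even if the body raises


site_a, site_b = DjangoProject("site_a"), DjangoProject("site_b")
seen = []
with site_a.activate():
    seen.append(current_project().name)
    with site_b.activate():
        seen.append(current_project().name)
    seen.append(current_project().name)
```

Nested activations restore the outer project when the inner block ends, and no project is active once the outermost block is left.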

manage.py should look like this:

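import sys
from django.core.management import execute_from_command_line

# DjangoProject is the proposed class; it does not exist in Django today.
if __name__ == '__main__':
    django_project = DjangoProject.from_settings_file(my_project_settings)
    with django_project.activate():
        execute_from_command_line(sys.argv)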


And for the Flask lovers who don't like automatic code generation and singleton patterns, they can just use the DjangoProject constructor:

if __name__ == '__main__':
    django_project = DjangoProject(
            url_patterns=root_url_patterns,
            installed_apps=[ ... ],
            ....
    )
    with django_project.activate():
        execute_from_command_line(sys.argv)


For the why of all this, I refer to Alex's presentation, but the main advantages are that it becomes much easier to integrate Django projects into other Python projects and that unit testing becomes easier, because there is no global state.

What do you think? I don't see any real backward-compatibility issues that we can't solve. Am I forgetting something?

Cheers,
Jonathan



Russell Keith-Magee

Oct 16, 2013, 7:26:20 PM
to Django Developers
On Thu, Oct 17, 2013 at 4:24 AM, Jonathan Slenders <jonathan...@gmail.com> wrote:
> The global state problem is something that's been bothering me for a long while, but after seeing a presentation of Alex Gaynor [1] from last year, I started thinking about this again.
>
> The main problem is that you'd need to have a DjangoProject object which contains the root configuration and you'd have to somehow pass that object around everywhere.
>
> Maybe I'm not the first to think about this, but what's wrong with thread-local-storage?

What's wrong with thread local storage? Well, try this thought experiment.

Everywhere you see the words "thread local", replace them with "global variable". Now re-read your argument.

It doesn't matter how you gussy it up -- a thread local is a global variable, with all the software engineering consequences that follow from that, including increased coupling, decreased cohesion, complications for testing, and so on.
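
The testing consequence can be illustrated with a small sketch. The helper names below are hypothetical, and the two `case_*` functions stand in for test cases run one after the other in the same thread, as a test runner typically does.

```python
import threading

_active = threading.local()  # hypothetical thread-local holder


def activate_language(code):
    _active.language = code


def get_language():
    return getattr(_active, "language", "en")


def case_french_page():
    activate_language("fr")
    return get_language()


def case_default_page():
    # Expects the default, but sees whatever an earlier case left behind.
    return get_language()


first = case_french_page()
second = case_default_page()

# Only a different thread sees a fresh value.
seen_in_other_thread = []
worker = threading.Thread(
    target=lambda: seen_in_other_thread.append(get_language())
)
worker.start()
worker.join()
```

Within one thread the value set by the first case is still visible to the second, exactly as it would be with a module-level global.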
 
> In Django, we have a huge advantage over Node.js in that there's only one active request per thread, and that request should belong to only one DjangoProject instance. (We don't have unexpected context switches within the same thread, as in Twisted or Tulip, so we can make that assumption.)
 
Yet. :-)

I don't know that we can rely on this being true for all time.
 
(This is an admittedly weak argument -- I certainly wouldn't base any objection to thread locals on this alone)

> Actually, we are already using thread locals for the currently active language [2], so I don't see any reason not to use the same approach to keep track of the currently active DjangoApplication object.

You're correct -- however, I'd call this a wart, not a pattern to be followed. If we were in a position to remove these thread locals, I would.

Yours,
Russ Magee %-)

Jonathan Slenders

Oct 17, 2013, 1:12:50 AM
to django-d...@googlegroups.com




On Thursday, October 17, 2013 at 01:26:20 UTC+2, Russell Keith-Magee wrote:
> What's wrong with thread local storage? Well, try this thought experiment.
>
> Everywhere that you see the word "thread local", replace it with "global variable". Now re-read your argument.
>
> It doesn't matter how you gussy it up -- a thread local is a global variable, with all the software engineering consequences that follow from that, including increased coupling, decreased cohesion, complications for testing, and so on.
 

It is a safe convention, I think. Actually, every variable on the call stack is a thread local, because every thread has its own stack. In Python it is possible to read variables from another thread (frame inspection), but you just don't do that.

A thread local is a persistent global -- I know --, but if you let it behave like a call stack, using context managers in Python, then it's not that different from a real local variable. The difference is that a 'real' local is on the interpreter's stack, while a thread local is on a pure-Python stack. Both appear local.
The important part is that these stacks should look identical before and after execution of every function call, so that you don't have side effects. Python's "with" statement is amazing at handling this.

Yes, it is more implicit than passing objects around, but it is safe. I think that it's also the only option we have.

Personally, for the active language, I would make that a stack as well, and deprecate language.activate as it is now.

with language.activate('en'):
    do_something()
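
The requirement that the stack looks identical before and after every call, even when the call fails, could be met with try/finally. The sketch below is a standalone toy, not Django's translation module; `_languages`, `DEFAULT_LANGUAGE` and `render_in_dutch` are hypothetical names.

```python
import threading
from contextlib import contextmanager

_local = threading.local()
DEFAULT_LANGUAGE = "en"  # assumption for this sketch


def _languages():
    # Per-thread stack of activated language codes.
    if not hasattr(_local, "languages"):
        _local.languages = []
    return _local.languages


def get_language():
    stack = _languages()
    return stack[-1] if stack else DEFAULT_LANGUAGE


@contextmanager
def activate(code):
    stack = _languages()
    stack.append(code)
    try:
        yield
    finally:
        stack.pop()  # undo the push whether or not the body raised


def render_in_dutch():
    with activate("nl"):
        raise ValueError("template error")


with activate("fr"):
    before = get_language()
    try:
        render_in_dutch()
    except ValueError:
        pass
    after = get_language()
```

The failing call leaves no trace: the caller sees the same language before and after it, which is the "no side effects" property described above.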

About relying on the assumption that we won't have unexpected context switches in the same thread: that would be a safe assumption. There is one exception, called gevent. What gevent does is patch I/O routines and swap the current call stack for another using a C extension. That's a dangerous practice, which is unsafe by design (which is also why Guido van Rossum only wants an explicit 'yield' for coroutines). I'm not sure, but if we want to support gevent with thread locals, we might need to hook into gevent and swap our pure-Python stacks as well.

Hopefully that explains why I think thread locals are not as bad as they look.



 
 

Aymeric Augustin

Oct 17, 2013, 2:34:48 AM
to django-d...@googlegroups.com

On Oct 17, 2013 7:13 AM, "Jonathan Slenders" <jonathan...@gmail.com> wrote:
> There is one exception, called gevent. What gevent does is patch I/O routines and swap the current call stack for another using a C extension. That's a dangerous practice, which is unsafe by design (which is also why Guido van Rossum only wants an explicit 'yield' for coroutines). I'm not sure, but if we want to support gevent with thread locals, we might need to hook into gevent and swap our pure-Python stacks as well.

Thread locals are "greenlet locals" in gevent so this isn't an issue.

But like Russell, I doubt this proposal actually solves the problem.

For instance, thread locals are strictly equivalent to regular variables in tests, because tests are single-threaded (with a handful of exceptions). But allowing testing in isolation is a major goal of "removing global state".

--
Aymeric.

Shai Berger

Oct 17, 2013, 6:31:07 PM
to django-d...@googlegroups.com
On Thursday 17 October 2013 08:34:48 Aymeric Augustin wrote:
>
> For instance, thread locals are strictly equivalent to regular variables in
> tests because they are single threaded (with a handful of exceptions). But
> allowing testing in isolation is a major goal of "removing global state".

If I understand correctly, what Jonathan is suggesting is not your garden-
variety thread-local variables, but what Common Lisp calls "dynamically scoped
variables" -- you could call them "call-stack-local" or something like that.
In Python terms, every change to the values of such variables would be done in
the __enter__() of a context manager, and undone (that is, old value restored)
in the __exit__(). I think such variables are a great idea, and would indeed
help a lot with all the problems associated with global state. In terms of
software engineering, they are a lot like exceptions: A channel of
communications between functions on different levels of the call stack, that
does not require explicit acknowledgement on every level, and yet does not
completely break locality.
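
A rough sketch of such a "call-stack-local" variable, built from `__enter__` and `__exit__` as described. `DynamicVar`, `let` and `database_alias` are hypothetical names chosen for this illustration; this is one possible shape, not an established API.

```python
import threading


class DynamicVar:
    """Hypothetical dynamically scoped variable with a per-thread value."""

    def __init__(self, default):
        self._default = default
        self._local = threading.local()

    def get(self):
        return getattr(self._local, "value", self._default)

    def let(self, value):
        # Each call returns a fresh binding to be used in a with-statement.
        return _Binding(self, value)


class _Binding:
    def __init__(self, var, value):
        self._var = var
        self._value = value

    def __enter__(self):
        self._old = self._var.get()           # remember the outer value
        self._var._local.value = self._value
        return self._value

    def __exit__(self, exc_type, exc, tb):
        self._var._local.value = self._old    # restore it on the way out
        return False                          # never swallow exceptions


database_alias = DynamicVar("default")


def run_query():
    # Reads the value without it being passed through every caller.
    return database_alias.get()


results = [run_query()]
with database_alias.let("replica"):
    results.append(run_query())
    with database_alias.let("archive"):
        results.append(run_query())
    results.append(run_query())
results.append(run_query())
```

Even in this small form, reading needs `get()` and writing needs `let()`, which hints at the kind of API awkwardness that arises without language-level support.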

However, I don't think such variables can be used in a reliable and elegant
manner without language-level support, and sadly, Python does not support
them. Attempts I've made to get the functionality using context managers ended
with awkward APIs for either setting values, getting values, or both; as a
trivial example, Jonathan's suggestion of the context manager being returned
by django_project.activate() is of such low granularity that it is almost
equivalent to regular globals.

If a reasonable API for this can be defined, I'd be all for it. I suspect
that's a non-trivial "if".

Shai.

Anssi Kääriäinen

Oct 18, 2013, 7:27:21 AM
to django-d...@googlegroups.com


One possible improvement is to start collecting all global and thread-local state under one object (call it "env" or something like that). That way it would be a lot easier to actually see what global state you are using. Currently that isn't at all clear. We have a couple of different thread-local storages, and then we have a lot of cached values based on settings, import-time actions, etc.

The difficulty of doing this will likely be somewhere between really hard and impossible. And that environment object might end up as a God Object for Django (http://en.wikipedia.org/wiki/God_object). So maybe not such a great idea in practice...

 - Anssi

Jonathan Slenders

Oct 21, 2013, 4:03:39 AM
to django-d...@googlegroups.com
I'm in favour of everything that makes it easier to see where we still have global state.

About threads, I realised there's the case where you'd use a thread pool in a Django view to optimise things. The thread locals wouldn't work there, so that means we would have to offer an API for copying thread locals from one thread to another... Maybe very dirty, but I wonder whether Python has a concept of "parent thread" and whether thread locals could look in there?
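
The thread pool problem, and one possible shape for a copying API, can be sketched as follows. `carry_project` is a hypothetical helper invented for this example; it captures the value explicitly at submission time and does not rely on any notion of a parent thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # hypothetical holder for the active project


def get_project():
    return getattr(_local, "project", None)


def carry_project(func):
    """Capture the caller's active project and reinstall it in the worker thread."""
    captured = get_project()  # read in the submitting thread

    def wrapper(*args, **kwargs):
        previous = get_project()
        _local.project = captured
        try:
            return func(*args, **kwargs)
        finally:
            _local.project = previous  # pool threads are reused, so restore

    return wrapper


_local.project = "site_a"  # pretend a request activated this project

with ThreadPoolExecutor(max_workers=1) as pool:
    without_copy = pool.submit(get_project).result()
    with_copy = pool.submit(carry_project(get_project)).result()
    after_copy = pool.submit(get_project).result()
```

The plain task sees no active project in the worker thread, the wrapped task sees the caller's project, and the worker is left clean for whatever runs on it next.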

