That's my ticket, and I had mostly forgotten about it.
As you've found out, there is no way to hook into the bootstrap
process. For signals, we just stuff them somewhere (typically
__init__.py or models.py) where we expect them to be imported. I've
always felt that this is a bit of a kludge, but there's been no better
way to handle it.
Right now the primary solution is to patch Django itself, and there
are at least a few places where that could be done. Your approaches
(modifying the settings or models modules) rely on incidental behavior
-- these modules happen to be loaded in almost every case, so whatever
we throw in there will also get run.
The path I took was a more traditional approach to bootstrapping --
attaching the functionality to the code entry points: the various
request handlers and the management command module.
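To make the entry-point idea concrete, here is a rough sketch of the shape I mean. It is not the patch on the ticket: register(), bootstrap(), handle_request() and run_command() are hypothetical stand-ins, and nothing here is Django API. The only point is that every entry point calls the same idempotent bootstrap function before doing real work.

```python
import threading

_lock = threading.Lock()
_done = False
_hooks = []  # hypothetical registry of setup callables


def register(hook):
    """Queue a callable to run once, before the environment is used."""
    _hooks.append(hook)
    return hook


def bootstrap():
    """Run every registered hook exactly once, whichever entry point gets here first."""
    global _done
    with _lock:
        if _done:
            return
        for hook in _hooks:
            hook()
        _done = True


def handle_request(path):
    """Stand-in for a request handler entry point."""
    bootstrap()
    return "handled " + path


def run_command(name):
    """Stand-in for the management command entry point."""
    bootstrap()
    return "ran " + name


calls = []
register(lambda: calls.append("listeners registered"))
handle_request("/")
run_command("syncdb")
print(calls)  # ['listeners registered'] -- the hook ran once, not twice
```

The lock is only there so that two threads arriving at the same time don't both run the hooks.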
I'm still not happy with the implementation in the patch on that
ticket, but I think it's in the right direction.
Because of Django's initialization procedure, addressing the
bootstrapping process is actually a bit complex:
* Some things need to be run before any models are loaded.
Specifically... anything that wants to listen to the class_prepared
signal. This is quite hard because an accidental import of a model
creates all sorts of mess.
* Some things just need to be run before the Django environment is
used (to run a management command or handle a request)... registering
signal listeners, for example.
* Logging may need to come even earlier. If you truly want to log
everything, you'll want to run that code first.
The reason I opened #5685 was primarily because there is no way to
solve this problem without either:
1. Patching trunk
or
2. Writing a bunch of custom wrappers around the Django code to do the
setup... and these wrappers are tedious, easy to screw up and don't
play well with each other.
That means: I'm strongly in favor of any hook to allow code to be run
before the Django environment is set up, and I'm not tied to any
particular path of solving the problem.
Best,
- Ben
On Oct 12, 2009, at 11:04 AM, Simon Willison <si...@simonwillison.net>
wrote:
>
> On Oct 12, 9:03 am, Benjamin Slavin <benjamin.sla...@gmail.com> wrote:
>> That means: I'm strongly in favor of any hook to allow code to be run
>> before the Django environment is setup, and I'm not tied to any
>> particular path of solving the problem.
>
> a very useful step one would be to go
> through and document exactly how Django initialises itself at the
> moment - what loads in what order, when are settings evaluated, what
> bits of Django actually look at INSTALLED_APPS etc.
It's a bit dated now, but this might be a useful place to start.
http://code.djangoproject.com/wiki/DevModelCreation
>
Also, I bet Marty knows this area well from his book work.
Actually, I didn't research much on the initialization process as a
whole, if there indeed is such a beast. I started with what happens
when Python actually encounters a model definition, which occurs after
settings and INSTALLED_APPS have been taken into account, which is
pretty much where the DevModelCreation article starts as well. Like
most people, I've generally deferred to James on the startup issue
here.
http://www.b-list.org/weblog/2007/nov/05/server-startup/
-Gul
There really is no such thing as "INSTALLED_APPS loading". I think
you mean "model loading"... if so, it's not quite so simple. Maybe
there's a way to make this approach work, but it's at least not as
easy as "let's just add these two settings".
> Obviously the callables would be restricted in what they could do, but
> it need be no worse than the restrictions on e.g. what you can import
> in settings.py.
The big offender here is that __import__ (and thus "import ..." and
"from ... import ...") has consequences. Models are initialized the
first time the Python interpreter sees them. I'll illustrate why this
is a problem through an example:
Let's say you have myapp/bootstrap.py and bootstrap.py does an import
(from myapp.views import some_signal_i_care_about) and myapp.views
imports myapp.models. Right there... with two seemingly innocuous
imports, we've broken the contract that the environment is pristine.
Now, if we register a listener for class_prepared, it will never* get
called for anything in myapp.models, or anything imported there. (*
From memory... this is true unless there's a deferred processing of a
model relationship, in which case it's even harder to predict the
outcome)
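Here is a toy version of the effect, with no Django involved. Signal, ModelBase and Model below are simplified stand-ins that I made up for illustration (the real metaclass does far more); the only assumption carried over is that the "prepared" signal fires at the moment the class body is executed, i.e. at import time.

```python
class Signal:
    """Minimal stand-in for a signal: a list of receivers called on send()."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender):
        for receiver in self._receivers:
            receiver(sender)


class_prepared = Signal()


class ModelBase(type):
    """Stand-in metaclass: fires class_prepared as soon as a model class is created."""

    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        if bases:  # skip the bare Model base class itself
            class_prepared.send(sender=cls)
        return cls


class Model(metaclass=ModelBase):
    pass


seen = []


class Early(Model):  # plays the role of a model pulled in by a stray import
    pass


# The bootstrap code only gets around to connecting its listener now.
class_prepared.connect(lambda sender: seen.append(sender.__name__))


class Late(Model):  # defined after the listener is connected
    pass


print(seen)  # ['Late'] -- Early was prepared before anyone was listening
```

Note that nothing raises: the listener for Early is simply never called.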
So, given the current state of affairs it goes beyond just "what they
could do" to "what could be imported". Even a stray import in an
__init__.py could create a series of problems.
> Obviously this approach could be tweaked - e.g. instead of a tuple of
> callables, you could have a tuple of (callable, args, kwargs) - but
> that's detail. Is the basic idea workable?
If we're building a bootstrapping system, it's likely possible to
introduce some additional logic into the metaclass approach that
models use, and work around this case.... but it's non-trivial.
I'd love to take a stab at this again, but am not sure when I'm going
to have time... so I'd be quite happy for someone else to beat me to
it.
Best,
- Ben
Ok... I see what you're saying. I've always viewed that block as
simply expanding the wildcard entries in INSTALLED_APPS and didn't get
that you were referring to it. My apologies for the misunderstanding
there.
>> The big offender here is that __import__ (and thus "import ..." and
>> "from ... import ...") has consequences. Models are initialized the
>> first time the Python interpreter sees them. I'll illustrate why this
>> is a problem through an example:
>
> Yes, but that's also true if you just try to import models or some
> other django modules in settings.py. I thought I covered this by
> saying "Obviously the callables would be restricted in what they could
> do, but it need be no worse than the restrictions on e.g. what you
> can import in settings.py".
I agree it's the same problem as in settings.py. That said, because
it's just one module and most people (from my experience) don't do
imports (aside from standard library stuff) in settings.py, it's a
much less complex interaction.
My concern -- born of my own work to solve this problem -- is that
it gets very tricky to simply apply a set of coding practices to solve
the problem. I know that saying "you can't import anything that might
import a model" was difficult for my team. We got it to work, but it
took a good bit of effort.
> Aren't we talking about the same thing - having to work with how model
> loading is done? Certainly for the logging configuration and other
> configuration which isn't model-dependent, where is the problem?
> I agree this limitation would need to be well documented and perhaps
> have better error reporting.
>
> That is why I specifically appended "but it need be no worse than the
> restrictions on e.g. what you can import in settings.py" to "what they
> could do". Perhaps I should have specifically referred to "imports
> made from code imported and called from settings.py", but I thought
> that would be evident.
We are mostly aligned. The problem, again, comes from the risk of
spurious imports. Maybe we were doing something wrong when we tried to
solve this previously... but I can tell you that the import problem
was a real pain for us... even when we were trying to be careful.
So, I do understand the direction you're headed. My concern (from
experience) is that it's not as simple as the restrictions on
settings.py. In settings.py you have a single file to worry about and,
as I said before, we don't tend to see many imports there because it's
configuration not logic.
> Before I knew better, I once did
>
> from django.db.backends.utils import CursorDebugWrapper
>
> in settings.py, which seems innocuous and appears not to be model-
> dependent in any way, and yet it gave an error:
>
> "Error: Can't find the file 'settings.py' in the directory containing
> './manage.py'. It appears you've customized things.
> You'll have to run django-admin.py, passing it your settings module.
> (If the file settings.py does indeed exist, it's causing an
> ImportError somehow.)"
This is actually a different problem. It's a circular import:
django/db/__init__.py does an import of settings... which imports your
settings.py... which imports CursorDebugWrapper (which passes through
django/db/__init__.py)... and so on... so you're going to get an
ImportError.
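If you want to see the cycle in isolation, the sketch below recreates it with throwaway modules written to a temp directory. demo_settings and demo_db are hypothetical names standing in for settings.py and django/db; the assumption is only that the package's __init__ needs a value from settings at import time.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demonstrate_circular_import():
    """Build a tiny settings <-> db cycle on disk and return the ImportError text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "demo_db").mkdir()
        # The package __init__ needs a value from settings at import time.
        (root / "demo_db" / "__init__.py").write_text(
            "from demo_settings import DATABASE_NAME\n"
        )
        (root / "demo_db" / "util.py").write_text(
            "class CursorDebugWrapper:\n    pass\n"
        )
        # settings imports from the package *before* defining DATABASE_NAME.
        (root / "demo_settings.py").write_text(
            "from demo_db.util import CursorDebugWrapper\n"
            "DATABASE_NAME = 'example'\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_settings")
        except ImportError as exc:
            return str(exc)
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for name in ("demo_settings", "demo_db", "demo_db.util"):
                sys.modules.pop(name, None)
    return None


error_message = demonstrate_circular_import()
print(error_message)
```

Importing demo_db.util first runs demo_db/__init__.py, which asks the half-loaded settings module for a name it hasn't defined yet, hence the ImportError.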
The problem with doing the model importing is that in many cases it
won't generate an exception... it will just fail silently. For
example, signal listeners will be registered after the signal has
already fired.
However, I'm actually really glad you used this example. It's helped
me realize that the circular import problem actually means that we
can't explicitly or accidentally import our settings either... which
means even Simon's code would stop working.
settings.py
============
CALLBACKS = ['django.utils.log.setup']

django/utils/log.py
============
# I've extrapolated this from Simon's current patch to Settings.__init__()
def setup():
    from django.conf import settings  # THIS WILL FAIL
    log.configure_from_dict(settings.LOGGING)

Even passing the settings as an argument (to the callback) won't work
for many cases. For example, you can't register a listener for
class_prepared because django.db imports settings (as you saw).
I think that we need to keep settings as a discrete piece of
functionality -- establishing configuration.
It looks like I may actually be able to put a bit of time toward a new
implementation for this, so I'll keep the list posted.
- Ben
Is it really the case that we want to log everything? I believe that
logging after initialization is enough. And for my example of a logging
handler that uses the ORM, it's the only way it can work. Initialization,
by definition, shouldn't do anything interesting for an application
programmer to look for; it should either succeed or fail with an
exception saying that it "can't run your program, sorry".
As it stands now, loading and processing all the settings is the point
that marks the success of initialization. So I'm with Simon in putting
logging somewhere where all the other settings get processed.