Okay, here is the reason this is likely happening, along with a temporary workaround.
I am going to have to think about a bigger solution to the general problem, and/or whether I ask the Django folks to change the registry so that it cleans up properly after errors.
Anyway, what is happening is that when your server boots and Apache/mod_wsgi starts, the first request that comes into your Django application forces the loading of wsgi.py, and so get_wsgi_application() is called.
Unlike before Django 1.7, this now has a side effect of loading up all your applications. This appears to occur within Registry.populate().
If the loading of one of the applications experiences an error, it will raise an exception with the registry in a half-built state. It will release the lock on the registry acquired by that thread, but the ready flag on the registry will be left as False.
When mod_wsgi sees that the WSGI script raised an exception when loading, it will throw away the module and return a 500 response error. What it doesn't do is throw away all the other modules that may have been loaded such as Django and its registry.
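As a rough analogy only, the same thing can be seen with Python's regular import system. mod_wsgi loads the WSGI script file through its own mechanism, so this sketch does not reproduce what mod_wsgi does internally, and the module names toy_script and toy_shared are made up for the illustration. The point is just that a module which fails while loading is discarded, whereas the modules it had already imported stay cached along with whatever state they accumulated.

```python
import importlib
import os
import sys
import tempfile

# Write two throwaway modules into a temporary directory.
# toy_shared stands in for Django and its registry, toy_script for wsgi.py.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "toy_shared.py"), "w") as f:
    f.write("loaded_apps = []\n")
with open(os.path.join(tmp, "toy_script.py"), "w") as f:
    f.write(
        "import toy_shared\n"
        "toy_shared.loaded_apps.append('app_one')\n"
        "raise RuntimeError('transient failure')\n"
    )
sys.path.insert(0, tmp)
importlib.invalidate_caches()

try:
    importlib.import_module("toy_script")
except RuntimeError:
    pass  # the script failed part way through loading

# The failed script is gone, but the module it imported is still cached,
# complete with the state left behind by the partial load.
script_cached = "toy_script" in sys.modules
shared_cached = "toy_shared" in sys.modules
leftover = sys.modules["toy_shared"].loaded_apps
```

Loading the script a second time would run it from the top again, but against the already modified toy_shared module.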
On the next request, it will load wsgi.py again and when it calls into Registry.populate(), it finds the ready flag still False, but app_configs, which is what the reentrancy check looks at, is half populated, and so it raises an error for that instead.
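To make the sequence concrete, here is a deliberately simplified model of that behaviour. It is not Django's code: ToyRegistry, the loader functions and the error message are stand-ins written for this illustration, and only the ready flag, the app_configs container and the lock mirror what is described above.

```python
import threading


class ToyRegistry:
    """Simplified stand-in for an application registry (not Django's code)."""

    def __init__(self):
        self.ready = False
        self.app_configs = {}
        self._lock = threading.Lock()

    def populate(self, loaders):
        if self.ready:
            return
        # The lock is released when the block exits, even on an exception.
        with self._lock:
            if self.ready:
                return
            # Reentrancy check: a non-empty dict means an earlier call
            # started populating and never finished.
            if self.app_configs:
                raise RuntimeError("populate() isn't reentrant")
            for name, loader in loaders:
                self.app_configs[name] = loader()
            self.ready = True


def load_ok():
    return "config"


def load_broken():
    # Stands in for a transient failure such as a database not yet running.
    raise ConnectionError("database not ready")


registry = ToyRegistry()
loaders = [("app_one", load_ok), ("app_two", load_broken)]

# Two attempts, as if wsgi.py were loaded on two successive requests.
errors = []
for attempt in range(2):
    try:
        registry.populate(loaders)
    except Exception as exc:
        errors.append(type(exc).__name__)
```

The first attempt fails with the real cause and leaves app_configs half filled. The second attempt never gets as far as retrying the loaders, so the original error is masked by the reentrancy error.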
A few things to now sort out.
What was the transient error that the loading of your application encounters on system boot? Is your application possibly trying to preload data from a database, where the database server is on the same system and hasn't completely started up yet, and so is failing? This would explain why it only happens on system boot.
If there was a transient error which caused an exception to be raised, the details should have appeared in the Apache error log. Make sure you check the main Apache error log and not just the VirtualHost error log.
Next, what can be done.
Ideally, when it sees an exception while loading applications, Django would not leave the registry in an inconsistent state, and would clean up so that application loading can be attempted again. I imagine though that they may be of the opinion that it is too hard to try and load applications more than once, as the applications may have cached state which would make reloading difficult.
The only solution therefore would be to kill the process so that mod_wsgi will recreate it automatically and so throw all the state away.
Thus, do the following:
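import os
import signal
import sys

from django.core.wsgi import get_wsgi_application

try:
    application = get_wsgi_application()
except Exception:
    # Error loading applications. Under mod_wsgi, signal this process to
    # shut down so that it gets replaced with a fresh one.
    if 'mod_wsgi' in sys.modules:
        os.kill(os.getpid(), signal.SIGINT)
    raise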
Worth pointing out is that this could happen with any WSGI server. With a WSGI server like gunicorn though, the initial error would cause gunicorn as a whole to exit and fail to start up. If gunicorn was running under supervisord, then supervisord would automatically restart it, and if whatever resource wasn't ready the first time was ready by then, it would likely start okay.
I may be able to introduce an optional configuration directive to say restart the whole process on a failed load of the WSGI script. This has to be optional though, and would also only apply to daemon mode. It can't be on by default because in a system with multiple WSGI applications in the same process, you don't necessarily want to restart the whole process if one fails.
Another issue is that if you do restart and it fails to load again, then you end up in an endless loop of it continually restarting. Due to how processes are managed by Apache, it is pretty well impossible to have a system which throttles back and stops restarting.
So try that workaround, but watch it carefully, because if the error isn't recoverable, it will loop, continually killing the process and restarting it.
Better that you work out what isn't ready when the applications are being loaded so soon after system boot.
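If the suspect is a database server which hasn't finished starting, one way to confirm it is a small diagnostic script that polls for the service soon after boot. The following is only a sketch under that assumption: port_is_open and wait_until are helper names invented here, and the host and port you check depend entirely on your setup.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check, attempts=10, delay=1.0):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries.

    Returns True as soon as check() succeeds, or False if it never does.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, wait_until(lambda: port_is_open("127.0.0.1", 5432)) would poll a local service on port 5432 (used here purely as an example value) for up to about ten seconds. Running something like that right after boot and noting how long it takes to return True would tell you whether the database is the thing that isn't ready.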
Does that all make sense?
Graham