contrib.sites and multitenancy

746 views
Skip to first unread message

Carl Meyer

unread,
Nov 20, 2010, 11:41:31 PM11/20/10
to Django developers
Hi all,

I've recently been exploring simple multitenancy options in Django
using contrib.sites, and have some thoughts on how core could make it
easier.

First, let me make a quick distinction between static and dynamic
multitenancy. In the static case, you have a limited set of sites
running on the codebase, that set changes infrequently, and it's not a
problem to have separate settings files for each (which probably
requires some level of additional webserver configuration for each as
well). Django already supports this form just fine via contrib.sites
and the SITE_ID setting.

In dynamic multitenancy, the set of tenant sites changes often and
should be manageable via the database with no need for additional
configuration files or webserver reloads. In order to do this
currently using contrib.sites, you have to stuff the request object
into threadlocals and then monkeypatch contrib.sites to get the
hostname from the threadlocals request and return a Site object
accordingly. (This is the approach taken by Satchmo's django-threaded-
multihost [1] and its fork django-multihost [2]). Having to do this
makes me sad.

But there's good news! Thanks to the work of Gabriel Hurley and others
in r13980 [3], contrib.sites now sports a get_current_site() function
that takes the request as a parameter, and this function is used
throughout Django! Having access to the request gives us most of what
we need; all that's lacking is a way for a project to put in place an
alternative policy such as "return a Site object based on matching
request.get_host() to Site.domain, rather than using
settings.SITE_ID". (This is distinct from the current fallback to
RequestSite if contrib.sites is not installed. RequestSite uses
request.get_host(), but doesn't care what its value is, and doesn't
get a corresponding Site object from the database, so it doesn't let
you name the sites or limit the set of valid domains.)

A few options for how such a hook could be implemented:

1) A setting, something like
SITES_CURRENT_SITE="path.to.my.alternative.get_current_site_function".
This is quite flexible, but Yet Another Setting.

1a) Same as above, but reuse SITE_ID; i.e. it can either be an
integer, or a string path to a get_current_site function. It ends up a
bit misnamed.

2) get_current_site fires a signal first thing, and listeners can
return a Site object to short-circuit the default behavior. This is
also fully flexible, but seems like perhaps more than is necessary.

3) get_current_site checks the request for a "site" attribute, and
returns that if it exists. This allows the project to define its own
site-selection logic in a middleware, and just attach the chosen Site
object to the request. The issue here, I guess, could be backwards-
compatibility with people who may already be annotating their requests
with a "site" attribute. I'm not sure what guarantees Django offers in
terms of not encroaching on namespaces which aren't specifically
documented as open for user annotation, but I know annotating requests
is not uncommon.

Any thoughts on which option should be preferred here? Is there an
even better option I've overlooked?

Thanks,

Carl

[1] http://bitbucket.org/bkroeze/django-threaded-multihost
[2] https://github.com/jaddison/django-multihost
[3] http://code.djangoproject.com/changeset/13980

Russell Keith-Magee

unread,
Nov 21, 2010, 4:14:03 AM11/21/10
to django-d...@googlegroups.com


On 21/11/2010, at 12:41 PM, Carl Meyer <carl.j...@gmail.com> wrote:

Hi all,

I've recently been exploring simple multitenancy options in Django
using contrib.sites, and have some thoughts on how core could make it
easier.
...

A few options for how such a hook could be implemented:
...

1a) Same as above, but reuse SITE_ID; i.e. it can either be an
integer, or a string path to a get_current_site function. It ends up a
bit misnamed.

This approach - or a variant of it - seems like the best idea to me. Rather than providing a custom hook for get_current_site(), I'd argue that we should make SITE_ID a callable that is used internally by Site.objects.get_current().

My reason for this is to avoid confusion over Site.objects.get_current(). At present, it only returns the right answer if the Sites model is installed. However, it is a method on a model, so this makes a certain amount of sense. If the extension point is get_current_site(), we could end up in a situation where the method exists, but may give you the wrong answer if you have a custom get_current_site() method. 

So - I'd propose the following: 

 * add an optional request argument to Site.objects.get_current()
 * allow SITE_ID to be a callable that accepts the request as an argument
 * raise an error if SITE_ID is a callable, but Site.objects.get_current() is called without a request object being provided.
 * get_current_site() remains the preferred public interface for getting the current site.

Essentially, this just means changing line 18 of sites/models.py [1] to check for a callable, and invoke that callable if required.


The question this leaves is whether a SITE_ID callable returns an ID, or a full site object. I agree that returning an object from a method called _ID is less than ideal, but that's a relatively small wart given the corner that we're in, IMHO.

The alternative, of course, is to completely deprecate Site.objects.get_current(), and use a new setting to configure get_current_site(). To be honest, this really wouldn't bother me either. This is the sort of area where a setting is the right solution, and deprecating Site.objects.get_current() avoids the confusion over the "right" way to get the current site.

Yours,
Russ Magee %-)

bur...@gmail.com

unread,
Nov 21, 2010, 6:27:45 AM11/21/10
to django-d...@googlegroups.com
Hi Carl, Russell,

I think any settings.py option will help us a lot,

but doesn't the overall solution mean that one would still need to
have the Site model installed even if we use our custom callable?
I'd also like if someone could explain correct interfaces and if we're
going to change them.
Isn't Site/RequestSite the only possible interface right now?
The more I thinking on it, the more I see we have the same problem
like with the Users model:
the only way to write reusable 3rd party app is to do item_site =
ForeignKey(Site) which bounds us to the Site object.

Also, will the request be passed to SITE_ID if it become a callable?
If not, I'd like if we take get_current_site(request) instead of
Site.get_current(), and deprecate the last one.

> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-develop...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-developers?hl=en.
>

--
Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
MSN: bu...@live.com

legutierr

unread,
Jan 15, 2011, 2:19:07 PM1/15/11
to Django developers
After having spoken briefly to Carl and Russell about in irc earlier
this week, I created a new ticket that addresses the issues raised in
this thread:

http://code.djangoproject.com/ticket/15089

Would anyone mind accepting it and giving feedback?

Gabriel Hurley

unread,
Jan 16, 2011, 6:06:22 PM1/16/11
to django-d...@googlegroups.com
I would be more in favor of seeing a new setting to configure `get_current_site()` than seeing the existing `SITE_ID` setting shoehorned into a new purpose (and being permanently misnamed for legacy reasons). For the sake of allowing complete customization of the site lookup process I suppose a new settings makes sense here.

As I noted on the ticket, I'm not entirely sure that deprecating Site.objects.get_current_site() is necessary, and there are arguments to be made for both sides. Avoiding user confusion is a good thing, but there are cases where authors may want to make use of the Site model exclusively, in which case the shortcut and its associated caching is still useful.

In response to Yuri:

 1. You would not need the Sites Framework installed. The get_current_site function works by checking to see if contrib.sites is installed, and returning a RequestSite if not. RequestSite is just a generic object which implements a common interface with the Site model. Using a custom callable could similarly bypass any need for the Site model if so desired, or it could simply be a different way to retrieve a Site object.

 2. Allowing a custom callable solves the problem of being tied to a particular model, because your custom callable can return an object of any type you like so long as it implements the appropriate hooks (as RequestSite does). This practice is generally referred to as "duck typing".

The "correct interface"--as I understand it--is that all authors should be moving towards get_current_site() as the single point of entry for gaining access to a Site-like object, and that you should not expect the returned object to be a Site instance, but simply that it will behave like one and have an equivalent base set of attributes and methods.

All the best,

    - Gabriel

legutierr

unread,
Jan 16, 2011, 11:28:34 PM1/16/11
to Django developers
I have been researching changes required to implement this, and I
thought I would share one of the sticky points. Some of this is
relevant to Gabriel's recent post.

I have been reading through revisions 14141 [1] and 14142 [2], which
reversed some of the changes in revision 13980 [3], and I have
discovered a couple of challenges. There are a number of places in
contrib app code where the current site must be known, but where there
is no request available to be passed to get_current_site()
(contrib.sitemaps is the most important example, see below); in each
of these cases, Site.objects.get_current() must be used.

As Carl Meyer's above post so correctly described, Django as of
version 1.2.4 is pretty good at "static multitenancy", but Django is
very, very bad at "dynamic multitenancy" (i.e. true multitenancy).
The problem is that dynamic multitenancy is determinable by the
current request, whereas Django's entire multitenancy implementation
is currently tied to a static global SITE_ID value, and is therefore
locked into a static multitenancy mode.

It is for this reason that Site.objects.get_current() needs to be
deprecated. Site.objects.get_current() is 100% dependent on the
global SITE_ID value, and as a result will never permit dynamic
multitenancy to be implemented by any application that uses it. If
the objective is to implement true multitenancy in Django using the
sites framework, then the only alternative to eliminating
Site.objects.get_current() in favor of get_current_site() would be to
have Django store each request in thread locals, the way that Flask
does [4]. In other words, in order to implement true multitenancy,
either the request will have to be placed into thread locals so that
it may be accessed by Site.objects.get_current(), or the request will
have to be passed around to every corner of the code that needs to
know about contrib.sites, so that it can be passed to
get_current_site().

Now, maybe I am jumping to conclusions when I imply that using thread
locals a la Flask [4] is not a viable option. Perhaps it would make
sense to add a middleware that could manage thread locals data safely
and provide an interface to access the stored request. It might be
the easiest thing to do. But I for one do not think that it will be
necessary to do so.

PLEASE NOTE: the above referenced ticket does not NECESSARILY propose
modifying every library that might have a dependency on
Site.objects.get_current() [see below]. What this ticket is saying is
that any applications that implement multitenancy in such a way that
dynamic multitenancy is not an option should be modified during the
1.4 iteration so as to support dynamic/true multitenancy. It is also
proposing a tangible implementation and interface that can be used to
do so. The issues I am raising in this post can be delayed until
future versions if need be, as long as a migration path is specified
now.

At least, those are my two cents.

[1] http://www.google.com/url?sa=D&q=http://code.djangoproject.com/changeset/14141
[2] http://www.google.com/url?sa=D&q=http://code.djangoproject.com/changeset/14142
[3] http://www.google.com/url?sa=D&q=http://code.djangoproject.com/changeset/13980
[4] http://flask.pocoo.org/docs/design/#thread-locals

The list of locations where Site.objects.get_current is still in use
seems to be as follows:

$ grep -nr get_current\( django/ | grep -v svn
django/contrib/auth/tests/views.py:195: site =
Site.objects.get_current()
django/contrib/comments/feeds.py:12: self._site =
Site.objects.get_current()
django/contrib/comments/feeds.py:17: self._site =
Site.objects.get_current()
django/contrib/comments/feeds.py:22: self._site =
Site.objects.get_current()
django/contrib/comments/moderation.py:242: subject = '[%s] New
comment posted on "%s"' % (Site.objects.get_current().name,
django/contrib/sitemaps/__init__.py:33: current_site =
Site.objects.get_current()
django/contrib/sitemaps/__init__.py:68: site =
Site.objects.get_current()
django/contrib/sitemaps/__init__.py:89: current_site =
Site.objects.get_current()
django/contrib/sitemaps/tests/basic.py:166:
Site.objects.get_current()
django/contrib/sites/models.py:10: def get_current(self):
django/contrib/sites/models.py:92: current_site =
Site.objects.get_current()
django/contrib/sites/tests.py:19: # Make sure that
get_current() does not return a deleted Site object.
django/contrib/sites/tests.py:20: s =
Site.objects.get_current()
django/contrib/sites/tests.py:28: site =
Site.objects.get_current()
django/contrib/sites/tests.py:33: site =
Site.objects.get_current()
django/views/decorators/cache.py:19: sites.get_current().domain,
for example, as that is unique across a Django

legutierr

unread,
Jan 17, 2011, 2:42:17 PM1/17/11
to Django developers
I have added an initial, incomplete patch to the ticket for anyone who
would like to comment on it:

http://code.djangoproject.com/ticket/15089

legutierr

unread,
Jan 18, 2011, 10:02:20 PM1/18/11
to Django developers
To complement the above list of locations where
Site.objects.get_current is still in use, I have generated the
following list of locations in the code where settings.SITE_ID
continues to be in use:

$ grep -nr SITE_ID * | grep -v svn | grep -v pyc
conf/project_template/settings.py:39:SITE_ID = 1
contrib/admindocs/views.py:134: site_obj =
Site.objects.get(pk=settings_mod.SITE_ID)
contrib/admindocs/views.py:141: 'site_id':
settings_mod.SITE_ID,
contrib/admindocs/views.py:286: site_obj =
Site.objects.get(pk=settings_mod.SITE_ID)
contrib/admindocs/views.py:295: 'site_id':
settings_mod.SITE_ID,
contrib/comments/feeds.py:27: site__pk = settings.SITE_ID,
contrib/comments/forms.py:155: site_id =
settings.SITE_ID,
contrib/comments/templatetags/comments.py:82: site__pk
= settings.SITE_ID,
contrib/comments/views/moderation.py:21: comment =
get_object_or_404(comments.get_model(), pk=comment_id,
site__pk=settings.SITE_ID)
contrib/comments/views/moderation.py:47: comment =
get_object_or_404(comments.get_model(), pk=comment_id,
site__pk=settings.SITE_ID)
contrib/comments/views/moderation.py:74: comment =
get_object_or_404(comments.get_model(), pk=comment_id,
site__pk=settings.SITE_ID)
contrib/flatpages/templatetags/flatpages.py:22: flatpages =
FlatPage.objects.filter(sites__id=settings.SITE_ID)
contrib/flatpages/views.py:35: f = get_object_or_404(FlatPage,
url__exact=url, sites__id__exact=settings.SITE_ID)
contrib/gis/tests/__init__.py:106: # Saving original values of
INSTALLED_APPS, ROOT_URLCONF, and SITE_ID.
contrib/gis/tests/__init__.py:109: self.old_site_id =
getattr(settings, 'SITE_ID', None)
contrib/gis/tests/__init__.py:123: # SITE_ID needs to be set
contrib/gis/tests/__init__.py:124: settings.SITE_ID = 1
contrib/gis/tests/__init__.py:135: settings.SITE_ID =
self.old_site_id
contrib/redirects/middleware.py:11: r =
Redirect.objects.get(site__id__exact=settings.SITE_ID, old_path=path)
contrib/redirects/middleware.py:17: r =
Redirect.objects.get(site__id__exact=settings.SITE_ID,
contrib/sitemaps/tests/basic.py:124:
public.sites.add(settings.SITE_ID)
contrib/sitemaps/tests/basic.py:131:
private.sites.add(settings.SITE_ID)
Binary file contrib/sites/.models.py.swp matches
contrib/sites/models.py:26: Returns the current ``Site`` based
on the SITE_ID in the
contrib/sites/models.py:38: sid = settings.SITE_ID
contrib/sites/models.py:42: "the SITE_ID setting.
Create a site in your database and set "
contrib/sites/models.py:43: "the SITE_ID setting to fix
this error."
contrib/sites/models.py:154: elif hasattr(settings, 'SITE_ID'):
contrib/sites/models.py:155: return
Site.objects.get_site_by_id(settings.SITE_ID)
contrib/sites/tests.py:11: Site(id=settings.SITE_ID,
domain="staticsite.com", name="staticsite.com").save()
contrib/sites/tests.py:31: s2 =
Site.objects.get(id=settings.SITE_ID)
contrib/sites/tests.py:46: self.assertEqual(site.id,
settings.SITE_ID)
contrib/sites/tests.py:60: self.assertNotEqual(site.id,
settings.SITE_ID)

The reason that this list is pertinent is that it shows where the
Django code is locked into a "static multitenancy" model. Unless this
code is converted to use get_request_site, none of the above contrib
apps will work with a "dynamic multitenancy" model.

In a number of the above cases, a conversion will be trivial.
However, in any case where the SITE_ID is referenced inside an object
or function that is not a view, and therefore does not have reference
to the incoming request, there may be a problem in eliminating the
dependency.

Carl Meyer

unread,
Jan 19, 2011, 3:01:57 PM1/19/11
to Django developers
Thanks legutierr for all your work on this ticket and patch, and
everyone for the comments. I just took some time to review the patch
on #15089 and had a long conversation on IRC with legutierr, and
here's what I'm thinking:

It does appear that there is some code (CurrentSiteManager, for
instance), that wants a Site object and simply cannot pass in the
request (barring threadlocals, which is not an option for core/
contrib). I don't think the addition of multitenancy justifies
breaking currently-working code. So I think we need to add support for
multitenancy (which I'm using here as shorthand for a hook that takes
a request and returns a Site object or something compatible) in such a
way that it is explicitly enabled by setting, and calling code that
fails to provide the request only fails (and in a clear way) when
multitenancy has been explicitly enabled.

There's also the issue of what third-party code does when it depends
on contrib.sites and needs an actual Site model instance, not a
RequestSite or some other replacement. Contrib.flatpages is an example
of this -- the flatpage model has an FK to Site, so it doesn't make
sense for flatpages to call a method which might return a Site object
or might return something else. It needs a real Site object. In my
mind, this use case is why deprecating Site.objects.get_current() is
not a good option. The current patch on #15089 deprecates
Site.objects.get_current() and then adds a "require_site_object" flag
to get_current_site(), but this seems backwards to me. There's already
a well-established indicator that you want an instance (or queryset)
of a certain model, and that is to call a manager method on that
model. I'd rather keep Site.objects.get_current() as the supported API
for when the calling code depends on the Site model (because, e.g.,
it's querying the database for another model with an FK to Site), and
get_current_site() as the API for when the calling code doesn't care
whether contrib.sites is used or not and just wants something with a
domain name. (I think this distinction is already somewhat implied by
how the two methods are currently used in the Django codebase, but
could be better documented. And they could probably named more clearly
as well, but that ship may have sailed).

In my mind, the introduction of a multitenancy hook (in the 1.4
timeframe) could then look something like this:

* Introduce a SITES_BACKEND setting, which defaults to None but can be
set to a dotted path to a callable that takes the request and returns
a Site object (or something else entirely, which should quack like a
Site/RequestSite if the dev wants to use third-party code, but isn't
required to).

* Add an optional "request" argument to Site.objects.get_current(). If
SITES_BACKEND is set and Site.objects.get_current() is not passed the
request, it's an error. But existing code, with no SITE_BACKEND,
continues to work fine calling Site.objects.get_current() with no
request.

* Both Site.objects.get_current() and get_current_site() call
SITE_BACKEND, if its set, and return whatever it returns.
Site.objects.get_current() should check this return value and error if
its not an instance of the Site model. get_current_site() doesn't care
what it returns.

* If SITE_BACKEND is not set, both Site.objects.get_current() and
get_current_site() behave exactly as they do now.

The result is that it's fully backwards-compatible, with no need to
deprecate anything that works now. If you choose to set a
SITE_BACKEND, then you must use third-party code that's been updated
to either call get_current_site() or pass a request to
Site.objects.get_current().

Thoughts?

Carl

Gabriel Hurley

unread,
Jan 19, 2011, 5:55:13 PM1/19/11
to django-d...@googlegroups.com
On Wednesday, January 19, 2011 12:01:57 PM UTC-8, Carl Meyer wrote:
Contrib.flatpages is an example
of this -- the flatpage model has an FK to Site, so it doesn't make
sense for flatpages to call a method which might return a Site object
or might return something else. It needs a real Site object. In my
mind, this use case is why deprecating Site.objects.get_current() is
not a good option.The current patch on #15089 deprecates 

I'm very much in agreement here. This is what I ran into when I first tried to replace all the calls to Site.objects.get_current_site() with get_current_site()... there simply *are* cases where a Site object is required, and using the object manager is the right way to do it, as you pointed out.
 
In my mind, the introduction of a multitenancy hook (in the 1.4
timeframe) could then look something like this:

Also agreed on the 1.4 timeframe. While there is a valid concern for making sure that get_current_site() (introduced in 1.3) doesn't end up getting changed in 1.4, this whole task seems like it's snowballing a small feature whose main purpose was to be more DRY into a large feature with a totally different goal, and doing so very late in the game for the 1.3 release cycle.
 
* Add an optional "request" argument to Site.objects.get_current_site(). If
SITES_BACKEND is set and Site.objects.get_current() is not passed the
request, it's an error. But existing code, with no SITE_BACKEND,
continues to work fine calling Site.objects.get_current() with no
request.

I'm a little uneasy about this... I think it would be an error for Site.objects.get_current() to ever return an object which is not a Site instance, and if we're dealing with arbitrary callables from a SITES_BACKEND setting being responsible for returning a Site-like object it seems like this could break in non-obvious ways.

What that means to me is that while you might want Site.objects.get_current_site() to have multiple ways of retrieving the current Site object based on data in the request, I don't think those should be conflated with the arbitrary SITES_BACKEND callables. I'd rather see Site.objects.get_current_site() try more ways of getting the current site if a request is passed in, and/or have a separate method on the Site manager that can take a request and do request-based lookups. Expand the options on the built-in manager but keep them finite.

I'd say all the SITES_BACKEND functionality should remain tied to get_current_site() and out of the Site manager.
 
* Both Site.objects.get_current() and get_current_site() call
SITE_BACKEND, if its set, and return whatever it returns.
Site.objects.get_current() should check this return value and error if
its not an instance of the Site model. get_current_site() doesn't care
what it returns.

See my previous comment. Raising an exception if the SITES_BACKEND callable returns an object of the wrong type is one way to handle it, I'm just not sure it's my favorite solution. Seems brittle given that a developer using a custom callable might not think about how other apps (especially in contrib) are going to interact with the return value of their callable.
 

That's what I've got for now. Thanks to legutierr for the work on the patch so far.

Carl Meyer

unread,
Jan 19, 2011, 8:14:23 PM1/19/11
to Django developers
Hi Gabriel,

On Jan 19, 5:55 pm, Gabriel Hurley <gab...@gmail.com> wrote:
> > In my mind, the introduction of a multitenancy hook (in the 1.4
> > timeframe) could then look something like this:
>
> Also agreed on the 1.4 timeframe. While there is a valid concern for making
> sure that get_current_site() (introduced in 1.3) doesn't end up getting
> changed in 1.4, this whole task seems like it's snowballing a small feature
> whose main purpose was to be more DRY into a large feature with a totally
> different goal, and doing so very late in the game for the 1.3 release
> cycle.

Yes, I agree that it's worth thinking about what we'll want to do in
1.4, and consider making necessary changes to the new
get_current_site() API before 1.3. At this point I don't foresee
needing backwards-incompatible API changes to get_current_site() in
order to add multitenancy, so I don't think I see anything that needs
to be done pre-1.3.

> > * Add an optional "request" argument to Site.objects.get_current_site(). If
>
> > SITES_BACKEND is set and Site.objects.get_current() is not passed the
> > request, it's an error. But existing code, with no SITE_BACKEND,
> > continues to work fine calling Site.objects.get_current() with no
> > request.
>
> I'm a little uneasy about this... I think it would be an error for
> Site.objects.get_current() to ever return an object which is not a Site
> instance, and if we're dealing with arbitrary callables from a SITES_BACKEND
> setting being responsible for returning a Site-like object it seems like
> this could break in non-obvious ways.
>
> What that means to me is that while you might want
> Site.objects.get_current_site() to have multiple ways of retrieving the
> current Site object based on data in the request, I don't think those should
> be conflated with the arbitrary SITES_BACKEND callables. I'd rather see
> Site.objects.get_current_site() try more ways of getting the current site if
> a request is passed in, and/or have a separate method on the Site manager
> that can take a request and do request-based lookups. Expand the options on
> the built-in manager but keep them finite.
>
> I'd say all the SITES_BACKEND functionality should remain tied to
> get_current_site() and out of the Site manager.

Hmm. I agree that the interaction of SITES_BACKEND and
Site.objects.get_current() is a relatively weak spot in this proposal,
but I like this approach even less.

I very much want to confine all the new behaviors to SITES_BACKEND
(some common sample backends can be provided), and leave current
behavior untouched if you don't define SITES_BACKEND, rather than add
new builtin behaviors unrelated to SITES_BACKEND (as the current patch
does). Otherwise things get more confusing, it's harder to know how to
signal what behavior you want, and maintaining backwards compatibility
is trickier.

I also think it's important to maintain compatibility between
Site.objects.get_current() and get_current_site(); just have the
former be a stricter version of the latter. I think it's much better
for people who define an unusual SITES_BACKEND to get explicit
immediate errors if they have code calling Site.objects.get_current(),
telling them in plain terms that they can't use code that depends on
the Site model if their SITES_BACKEND returns something else, rather
than have a situation where people define SITES_BACKEND and then it
only takes effect some of the time (when get_current_site is called).
The latter is less explicit, doesn't fail fast, and most importantly
doesn't let you define a custom SITES_BACKEND that _does_ return a
Site object, and have it take effect when Site.objects.get_current()
is called, which you'd almost certainly want in that case.

> > * Both Site.objects.get_current() and get_current_site() call
> > SITE_BACKEND, if its set, and return whatever it returns.
> > Site.objects.get_current() should check this return value and error if
> > its not an instance of the Site model. get_current_site() doesn't care
> > what it returns.
>
> See my previous comment. Raising an exception if the SITES_BACKEND callable
> returns an object of the wrong type is one way to handle it, I'm just not
> sure it's my favorite solution. Seems brittle given that a developer using a
> custom callable might not think about how other apps (especially in contrib)
> are going to interact with the return value of their callable.

I think it's less brittle and less confusing than the alternatives.
It's a pretty clear distinction: the only code that should ever use
Site.objects.get_current() is code that explicitly depends on the Site
model being used. If you're defining a SITES_BACKEND that returns
something other than a Site model, you aren't using the Site model,
and therefore you can't use code that explicitly depends on the Site
model.

Carl

legutierr

unread,
Jan 19, 2011, 8:19:40 PM1/19/11
to Django developers
Carl and I had a long discussion in IRC about all the issues he raises
above. I am still digesting that conversation, but there are some
things that already spring to mind.

1. I can see the merits of defining a SITE_BACKEND api that would
allow users to choose from different site-retrieval implementations
that sites framework would contain. I wonder, would it make sense to
make SITE_BACKEND the central aspect of the framework? SITE_BACKEND
could be the required setting (instead of SITE_ID), and a
"get_orm_site_by_id" backend, a "get_orm_site_by_request" backend, and
a "get_request_site" backend could all be defined.

That way, the documentation could say: "Always define a SITE_BACKEND,
here are the different implementations to choose from." It would
provide a good opportunity to discuss the different architectures in
parallel, and would require users to explicitly choose from among
mutually-explicit specific strategies, with only one setting. As it
stands now with both of our proposals, documenting how all this works
requires describing a very intricate and confusing conditional tree.
Making the SITE_BACKEND control everything would simplify
documentation significantly.

2. One difference between our proposals is that I propose that the
framework's API define a *single*, unitary facade
function--"get_current_site()", which takes a an optional
"require_site_object" as a parameter--while Carl proposes that the
framework define two facade functions--"get_current_site()", and the
"Site.objects.get_current()" manager method.

I see these two approaches as logically equivalent. Both proposals
recognize the need to differentiate between the case where any generic
site object may be returned (RequestSite,
django.contrib.sites.models.Site, or some user-defined object), and
the case where only django.contrib.sites.models.Site may be returned.
Carl's proposal is more consistent with the Django practice of putting
orm-object-retrieval logic inside managers, and avoids having to
deprecate Site.objects.get_current(). My proposal more closely
follows the facade design pattern, and I believe simplifies the
documentation and thus simplifies things for users. I would favor my
proposal, obviously, but I can understand the merits of Carl's. Also,
Carl's proposal has the benefit of being what is already checked in.

3. In spite of the fact that having two facade functions does abrogate
the need to define a "require_site_object" parameter in
get_current_site, I think an argument can be made to include the
"require_site_object" parameter in the API for SITE_BACKEND. Carl
proposes verifying the object returned by SITE_BACKEND before
returning from the manager method, to make sure it is in fact a Site
object. If a user-defined SITE_BACKEND is coded to retrieve a custom
object from the db, then an unnecessary database call can be prevented
if "require_site_object" is passed in and used to return None at the
top of the function.

4. I think it is important to keep in mind what the original
motivation was behind #15089 and the post that started this thread: to
remedy architectural inadequacies that forced the Satchmo devs, in
their implementation of django-threaded-multihost, to monkey-patch
SiteManager. Although using thread-locals to store the current
request is something that Django frowns upon, it is the only way to
achieve certain multitenancy behaviors, and seems to be required by a
decent number of users (between them, django-threaded-multihost and
django-multihost have 37 followers and 4 forks, and I am sure there
are a number of other implementations out there).

If SITE_BACKEND is called only when a request is supplied as an
optional argument to Site.objects.get_current(), then the thread
locals strategy employed by django-threaded-multihost et al. will
still require monkey-patching SiteManager. If SITE_BACKEND is being
called inside Site.objects.get_current(), it *should* be called
*regardless* of whether a request is passed into it. If a particular
SITE_BACKEND requires that a request be passed in for it to do
properly retrieve a site, then that backend should raise an exception;
it should not be that such a SITE_BACKEND is simply ignored. This is
distinct from Carl's proposal, but I think not a significant deviation
from it.

Regards,

Eduardo

Jari Pennanen

unread,
Jan 25, 2011, 11:08:10 AM1/25/11
to Django developers
Hi!

I've started another thread since I was oblivious about this one.

One thing I noticed in my early experimentation is that caching Site
objects is a bit of waste. I don't know what is the overhead for
caching Site objects compared to just IDs but I suppose when we get to
hunderds of sites and most requests need *only the site id* it could
be significant.

Point is that most of the time view needs only site_id for filtering.
(sites__id__exact=site_id)

So currently I'm caching only site ids to dict by host name like this
(on first time it populates the cache from Site.objects.all()):

_CACHE_SITEIDS = {
'example.com' : 2,
...
}

And my MultiSitedMiddleware does pretty much this (following code does
not include function getters/setters to simplify):

request.site_id = _CACHE_SITEIDS.get(request.META.get('HTTP_HOST',
''), settings.SITE_ID)

So I propose to implement get_current_site_id(request=None) and
get_current_site(request=None) if middleware is out of question. If
request is not given the return settings.SITE_ID or current site
object. (get_current_site should *not* cache, it gets the Site object
on demand, it is very to need the site *object*)

I've reviewed my approach and so far only thing I'm missing is that I
also need to cache for settings.MEDIA_ROOT and MEDIA_URL and MEDIA[1]
by site (Site objects does not support this, so I'm going to do
extended Site model that creates a support for that). So I probably
will implement some additional cache getter get_media(request=None)
that returns tuple (media, media_url, media_root)

([1]Notice that I have one extra there "MEDIA" which is not part of
standard django, but it is just dict that initializes single Media
object (MEDIA={'css' : ...} in settings.py) that can be used sitewide.
I've found this as handy to have so I can update sitewide jquery.js
file/url from settings.py alone. Since it is per site setting I must
save it to database or somewhere and make it changable as per site
like MEDIA_ROOT and MEDIA_URL.)

Jari Pennanen

unread,
Jan 25, 2011, 11:20:08 AM1/25/11
to Django developers
This is the stuff I've come up on first try few days ago:

Here is my current cache implementation: https://gist.github.com/795135
And here is my middleware: https://gist.github.com/795138
Reply all
Reply to author
Forward
0 new messages