Multi Tenancy and HTML Filtering Levels

281 views
Skip to first unread message

Stephen McDonald

unread,
May 5, 2012, 9:37:59 PM5/5/12
to mezzani...@googlegroups.com
Hi all,

Don't want to detract from the current thread on template refactoring, as it's the biggest area in need at the moment, but I've pushed up some new changes that I'd love some feedback on.

A quick one to first get out of the way, I've added a setting that controls the filtering level on RichTextField instances. Filtering was hastily added to fix the possible XSS issue described recently. This has the downside of filtering out potentially dangerous yet useful/required HTML, particular for embedding videos. What I've done is provided a setting with 3 levels of filtering, high (filter all dangerous tags), low (allows some extra tags, like those for video, plus we'll add more to this over time), and None (disables filtering entirely - a valid option with only one or few trusted admin users). 

To support this I've also added a "choices" arg to the register_setting function, to provide a fixed set of choices for an editable setting. Format is a sequences of name/value pairs, just like choices for Django form/model fields.


Now for the biggie: multi-tenancy. We now have the ability to host multiple sites (django.contrib.sites) on a single running instance. This is the change I'm most keen on feedback for, particularly around actually running it, testing it, and trying to break it, as its approach is unorthodox at worst, by way of introducing threadlocals and a "current request". Here's the gist of how it works:

All site related models previously made use of django.contrib.sites's CurrentSiteManager which is where the tie to the SITE_ID setting occurred. These models now use Mezzanine's CurrentSiteManager, which uses a series of checks in looking for the site ID, the main one being matching the host of the current request to the domain of the site record. To support this, a new middleware is used that assigns and removes the request object to a threadlocal variable, which gives us global access to the current request. If this approach turns out fine, which I believe it will, it will lead the way for a clean implementation of multi-lingual content support, as well as some great refactoring around the multi-device support.

The series of checks in order looks like this:

- Check for a "site_id" in the session. This is first, and is used for bypassing the domain check, so that say in the admin for example, we can include a feature for flipping between sites, without leaving the domain the admin is running on.
- Check for the host in the current request against a site record's domain, as described above.
- Check for an environment variable MEZZANINE_SITE_ID. This allows you to specify a site to use when a current request wouldn't be available, for instance with management commands. I've also modified manage.py to look for (and remove) a --site=ID setting which will take care of this.
- Final fallback is the SITE_ID setting.

While fairly time consuming, this turned out to be quite an easy change thanks to the initial sites work that Eduardo Gutierrez did way back over a year ago, and all the tests he wrote still pass and are still very applicable to this new approach, so big thanks to him. Here's the implementation, would be very happy to have any flaws pointed out:





--
Stephen McDonald
http://jupo.org

Josh Cartmell

unread,
May 6, 2012, 11:21:34 PM5/6/12
to mezzani...@googlegroups.com
Hey Steve I wanted to let you know that I do plan on taking a look at this, I just haven't had a chance yet.  Thanks for all of the hard work on it and I think it is a great addition to Mezzanine.

Would I be correct to assume that I could test this in a local development environment using localhost and 127.0.0.1 to access the same site?  That being the case, which I think it is, I think it should be pretty easy for people to try this out and give feedback.

Thanks!

Stephen McDonald

unread,
May 6, 2012, 11:54:08 PM5/6/12
to mezzani...@googlegroups.com

Yep that's exactly what I did when building it.

POKOLI I PUNTO

unread,
May 7, 2012, 5:33:57 AM5/7/12
to mezzani...@googlegroups.com
First of all, Thanks for your Work Stephen. My response in-line ;)

Al 07/05/12 05:54, En/na Stephen McDonald ha escrit:
> All site related models previously made use of django.contrib.sites's
> CurrentSiteManager which is where the tie to the SITE_ID setting
> occurred. These models now use Mezzanine's CurrentSiteManager, which
> uses a series of checks in looking for the site ID, the main one being
> matching the host of the current request to the domain of the site
> record. To support this, a new middleware is used that assigns and
> removes the request object to a threadlocal variable, which gives us
> global access to the current request. If this approach turns out fine,
> which I believe it will, it will lead the way for a clean
> implementation of multi-lingual content support, as well as some great
> refactoring around the multi-device support.
When you say that it will lead the way for a clean implementation of
multi-lingual content support, you assume that every language has it's
own site and the user has to create the content for every site?

For example If i have to languages (English, Spanish):

I will create two sites: es.example.com en.example.com and every site
has the content in the corresponding language. But: How we can specify
in which content is the default site example.com?

It would be very grateful I you can explain how the multi-tenacy will
help to the multi-language content support.


Stephen McDonald

unread,
May 7, 2012, 5:54:51 AM5/7/12
to mezzani...@googlegroups.com
The approach you described is actually possible right now with multi-tenancy support. Even prior to this, one of the suggested approaches for multi-lingual content was to create separate sites for each langage, the different now is that you can set that up with a single instance given the multi-tenancy work.

But that's not what I was referring to. The new feature addresses a general problem for loading site related data - essentially a global variable (not quite, but for argument's sake) that can be applied to all queries that need to reference a site, without having to pass around a request object or site ID to every single piece of code that queries site related data.

What I was referring to was that general approach, of being able to globally filter data by a particular state, in the current case a site, and have all existing queries Just Work, without modifying all the pathways down through every function call in order to pass the site around. The same concept will be applicable to language. Once we actually have a method for querying content by language, we'll be able to again have One Spot in the code (most likely a model manager) that grabs the current language for the current request, and applies it to all model that have language specific data.

Hope that clears it up a bit.

POKOLI I PUNTO

unread,
May 7, 2012, 6:15:08 AM5/7/12
to mezzani...@googlegroups.com
Thanks for your comments. It clears all my doubts ;)

Al 07/05/12 11:54, En/na Stephen McDonald ha escrit:

Josh Cartmell

unread,
May 8, 2012, 1:30:41 PM5/8/12
to mezzani...@googlegroups.com
Hey Steve so my first comment is that I think there should be a way (and maybe there is and I'm missing it but I imagine the manytomany admin widget working well) to have a page be assigned to multiple sites.  Two sites may share a common contact page for example, or certain blog posts may be shared across sites and others not.  I also think it would be useful to be able to change the site a page is assigned to because an admin user may accidentally create a page while having the wrong site selected.

Also I noticed that the site selector in the admin (which so far works great) floats in a an awkward position covering up the titles of pages.

I will continue to update this thread if I have more feedback.  Thanks for putting this together.

Josh Cartmell

unread,
May 8, 2012, 2:41:41 PM5/8/12
to mezzani...@googlegroups.com
So site selector floating in a weird place appears to have been the result of my browser having cached mezzanine css because it has gone away without any code changing.

Here is my first attempt at sites being a ManyToManyField:


Things to note:
-I removed the save method because when you save a new page it doesn't have an id at the point that save was called so that was causing errors.
-I added in default to sites but it isn't working the way I would want and when you add a page all available sites are selected by default.  
-If you are using the admin as a site and remove it from the sites list and click save and continue editing you end up on a page that doesn't exist.  I imagine this would happen to the current implementation as well if there was a way to change the site of a page.

I think allowing content to be shared across sites is important to our implementation and I think that sharing content between multiple sites is at least part of the intention of the django Sites framework.

Any thoughts on my implementation?

Josh Cartmell

unread,
May 8, 2012, 2:47:15 PM5/8/12
to mezzani...@googlegroups.com
I forgot to mention but I didn't update the fixtures so if you test out my changes don't use --noinput when you create a db and tell it not install initial pages.

Alec Taylor

unread,
May 8, 2012, 2:54:26 PM5/8/12
to mezzani...@googlegroups.com
Sounds great.

The multi-tenant support in particular sounds like something which you
should remove the dependencies of and submit directly for Django. It's
a feature definitely needed but current absent.

Josh Cartmell

unread,
May 8, 2012, 2:57:53 PM5/8/12
to mezzani...@googlegroups.com
One more idea I just had was that in addition to making sites a manytomany it would be nice to have a boolean value that could make a page always show for all sites regardless of which sites were selected.  That way if there were pages that you always wanted to show you wouldn't have to keep associating it with more sites as you added them.

Stephen McDonald

unread,
May 8, 2012, 4:34:53 PM5/8/12
to mezzani...@googlegroups.com
This is the logical next step with this, making the site field many to many. I thought about it a bit and stopped short there as there are a lot of walls hit particular with pages.

- What happens when a child page gets assigned to a second site that doesn't have the parent?
- How can the page's _order field (for ordering) be used which it's relative to the other pages on that branch? We'd need to break ordering out into its own table that stored page IDs, site IDs and _order values - not sure if it's the right approach or not, but either way it's a big step from how it is now.

So I left that for now and just took it as far as the current state of it - single site field per site-related record in any model.

Josh Cartmell

unread,
May 8, 2012, 4:53:03 PM5/8/12
to mezzani...@googlegroups.com
Good points, I hadn't thought of those issues but I'm glad you did.  I think that making it a many to many will be a great feature but I see that it is not so simple as I had initially thought.

Stephen McDonald

unread,
May 8, 2012, 5:04:27 PM5/8/12
to mezzani...@googlegroups.com
Yeah it's definitely the next logical step. Just needs a lot of thought. So I thought for this feature we'd just take it this far for now, and solve the next step separately. 

One idea I did have was that we could customize the admin for sites, and have a drop-down list of existing sites you could choose from that lets you "copy" site related data for that site when you add a new site. But not too fussed on that one for now.

Gary Reynolds

unread,
May 8, 2012, 6:13:55 PM5/8/12
to mezzani...@googlegroups.com
So if this is the direction, I think it would be an idea to reconsider the differentiation between the site structure, and the what is displayed at any given point in the structure.

It's the content which people want to be reusable. Why not have (Content) items which can be associated to a Node. I bracket Content, because it could well be a Gallery or Product that someone wants "mounted" at a particular location in the site.

This is how my own Django based CMS works, and although I've not needed to re-use content across multiple sites, the data is separated for these reasons; moreso to allow dynamic application routing, which great advantage (although a custom {% url %} tag is required in certain cases).

One of my biggest reservations with Mezzanine is how tightly coupled site structure is to content - it removes a great deal of flexibility; too much for my purposes.

Gary

Stephen McDonald

unread,
May 8, 2012, 7:07:22 PM5/8/12
to mezzani...@googlegroups.com
This sounds like a good general direction for solving the content sharing problem. 

I'd like to mark taking the multi-tenant site setup to this next step as something for 1.2 or later, and go with the current approach for 1.1 - what do people think about that?

Josh Cartmell

unread,
May 10, 2012, 4:01:53 PM5/10/12
to mezzani...@googlegroups.com
That sounds like a sensible timeline to me.  I think as people get to use the new multi tenant features issues or feature requests that we haven't thought of may arise which will help shape the continuing direction.

Would separating content from structure essentially mean that we still have a menu that is constructed of Pages, but then the secondary content type associated with that Page (i.e. RichTextPage, Gallery, Form, etc...) could be changed at will or re-used?

That sounds very powerful and flexible which I like, I just think great care needs to be taken to not make it clunky.  For example right now even if you have a 100 pages it is still pretty easy to find a specific page in the admin because the page tree keeps things neatly organized.  If you were looking for a piece of RichText content among a mass of a hundred, or more, I could imagine it being a little more difficult to find since those pieces of RichText content would not have any inherent structure/ordering (if I am understanding things correctly).  Ease of use is one of the biggest "features" I like about Mezzanine and I would not want to detract from that.  In short, I do like the idea as long as it implemented well.  If Mezzanine's history is an indicator it will be implemented well =)
Reply all
Reply to author
Forward
0 new messages