Adding a URL tagging feature

Atul Bhouraskar

Sep 17, 2015, 10:02:11 PM
to django-d...@googlegroups.com
I've received a few comments on Ticket #25409 and so I'm opening up the discussion here.

The pull request is https://github.com/django/django/pull/5309

Apologies for the long post, I just wanted to be as clear as I could!

The objectives of the discussion are to determine:
1. Is this something that could be merged in before the other URL re-factoring work (https://groups.google.com/d/topic/django-developers/9AqFxIIW2Mk/discussion)? I personally think that it can, as the code changes are minimal.
2. Does this approach conflict with, complement, or replace the 'decorators' approach proposed in the URL rework project? There is a comparison of the pros and cons of each approach below. Looking at the code on that branch, it appears simple for me to merge it in there.

== Synopsis ==

Often (usually in middleware) processing has to be applied to certain URLs only, e.g. CORS.

The usual way to specify this is to create an additional set of regex patterns identifying these URLs, e.g.

CORS_URLS_REGEX = r'^/api/2/.*$'

JSONP_URLS = r'^/api/1/.*$'

PRIVATE_URLS = r'/(private|api)/.*$'

Each middleware then typically matches the incoming request URL to the regex and determines whether it is to be selected for processing by it.

This approach has several limitations including:
* It violates DRY as the regexes in the settings have to be synced with the actual URL patterns
* Matching multiple patterns requires either the user to create complex regexes or the app/middleware writer to essentially reinvent URL patterns, poorly.
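
To make the DRY problem concrete, here is a minimal sketch of the kind of check such a middleware typically performs. The helper names needs_cors and is_private are hypothetical; the regexes are the ones from the settings above.

```python
import re

# Settings-style regexes, copied from the examples above.
CORS_URLS_REGEX = r'^/api/2/.*$'
PRIVATE_URLS = r'/(private|api)/.*$'


def needs_cors(path):
    """Return True if the request path matches the CORS regex."""
    return re.match(CORS_URLS_REGEX, path) is not None


def is_private(path):
    """Return True if the request path matches the private-URL regex."""
    return re.match(PRIVATE_URLS, path) is not None
```

If the API is later moved from /api/2/ to /api/v2/ in the URLconf, needs_cors('/api/v2/list/books/') quietly returns False until someone remembers to update the setting as well.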

== The Proposal ==

Add an optional tags keyword argument to django.conf.urls.url allowing a URL to be optionally tagged with one or more tags which can then be retrieved via HttpRequest.resolver_match.tags in the middleware / view (or any code with access to urlpatterns - not necessarily in the context of a request). Probably easiest to explain via examples:


urlpatterns = [
    url(r'^$', views.home, name='home'),
    url(r'^private/', include(private_patterns), tags=['private']),
    url(r'^api/1/', include(api_v1_patterns), tags=[
        'api', 'private', 'jsonp',
    ]),
    url(r'^api/2/', include(api_v2_patterns), tags=[
        'api', 'cors', 'private',
    ]),
]

api_v1_patterns = [
    url(r'^list/books/$', views.list_books, name='list-books'),
    url(r'^list/articles/$', views.list_articles, name='list-articles', tags=['public']),
    ...
]

api_v2_patterns = [
    url(r'^list/books/$', views.list_books, name='v2-list-books'),
    url(r'^list/articles/$', views.list_articles, name='v2-list-articles'),
    ...
]

In the above patterns, all URLs under /private/ are tagged 'private', and all URLs under /api/1/ are tagged 'api', 'jsonp' and 'private'.


Some examples to show how you can access and use tags

Example Middleware:

class PrivatePagesMiddleware(object):
    def process_view(self, request, view_func, view_args, view_kwargs):
        """
        For any url tagged with 'private', check if the user is authenticated. The presence of a
        'public' tag overrides the 'private' tag and no check should be performed.
        Authentication depends on whether the URL is marked as 'cors' or not. 'cors' urls
        use HTTP header token authentication
        """
        tags = request.resolver_match.tags
        if 'private' in tags and 'public' not in tags:
            if 'cors' in tags:
                # CORS requests are authenticated via tokens in the headers
                # check auth tokens
                ...
                if not authenticated:
                    return HttpResponseForbidden()
            elif not request.user.is_authenticated():  # normal django auth
                return redirect('login')

class CorsMiddleware(object):
    def process_view(self, request, view_func, view_args, view_kwargs):
        if 'cors' in request.resolver_match.tags:
            # continue CORS processing
            ...

    def process_response(self, request, response):
        if 'cors' in request.resolver_match.tags:
            # continue CORS processing
            ...
        return response


Example Management command:

commands/exportapi.py

"""
Javascript API code generator
Iterate through urlpatterns, for each url tagged with 'api' export a Javascript function
that allows js code to call the api function. Depending on whether the pattern is tagged
'jsonp' or 'cors' write the corresponding type of function
"""

def get_api_urls(urlpatterns, api_type):
    for pattern in urlpatterns:
        # check if pattern has the 'api' tag and the api_type tag
        ...
        if is_api_type:
            yield pattern


class Command(BaseCommand):
    def handle(self, *args, **options):
        for api_pattern in get_api_urls(urlpatterns, 'jsonp'):
            # write JSONP javascript function to stdout
            ...

        for api_pattern in get_api_urls(urlpatterns, 'cors'):
            # write CORS javascript function to stdout
            ...

manage.py exportapi > api.js

---------------------------------------------------------------------

The actual code change required to enable the tags feature is about 10 lines. All that the urls code does is to make the tags (after combining included patterns) available to the match object (which is already available to the request object).
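
As a rough illustration of what "combining included patterns" means, the sketch below merges the tags of each include into the patterns beneath it. It is not the code from the pull request: Pattern, Include and iter_tagged are hypothetical stand-ins built from plain tuples so that the example runs without Django.

```python
from collections import namedtuple

# Hypothetical stand-ins for URL entries; these are not Django classes.
Pattern = namedtuple('Pattern', 'route name tags')
Include = namedtuple('Include', 'prefix children tags')


def iter_tagged(entries, prefix='', inherited=frozenset()):
    """Yield (full_route, name, tags) for every leaf pattern.

    Tags set on an include are merged into everything nested under it.
    """
    for entry in entries:
        tags = inherited | frozenset(entry.tags)
        if isinstance(entry, Include):
            yield from iter_tagged(entry.children, prefix + entry.prefix, tags)
        else:
            yield prefix + entry.route, entry.name, tags


api_v1_patterns = [
    Pattern('list/books/', 'list-books', ()),
    Pattern('list/articles/', 'list-articles', ('public',)),
]
urlpatterns = [
    Pattern('', 'home', ()),
    Include('api/1/', api_v1_patterns, ('api', 'private', 'jsonp')),
]

tags_by_name = {name: tags for _route, name, tags in iter_tagged(urlpatterns)}

# Select patterns without resolving a request, as a management command might.
jsonp_api_routes = [
    route for route, _name, tags in iter_tagged(urlpatterns)
    if {'api', 'jsonp'} <= tags
]
```

Here 'list-articles' ends up with the three tags from its parent include plus its own 'public' tag, while 'home' has none.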

As per the discussion in the ticket, the URLs rework project (https://groups.google.com/d/topic/django-developers/9AqFxIIW2Mk/discussion) also adds a feature that is at first glance similar to what I have proposed.

However I believe that the two approaches solve different sets of problems (though there is overlap).

The corresponding proposal there is to add a decorators keyword argument to django.conf.urls.url, allowing:

url(r'^private/', include(private_patterns), decorators=[login_required]),

This will apply the decorator login_required to all the URLs under /private/.

If what you wanted to do was to apply the decorator to all views then this is undoubtedly very convenient and does the job perfectly.

However decorators are not the most convenient mechanism for:

1. Whitelisting as opposed to Blacklisting where a group of URLs is by default private except for the ones marked public. Writing a login_required decorator is straightforward, however writing a login_exempt decorator will always involve using the decorator to 'tag' the view and then check the tag in the middleware (eg. the csrf_exempt decorator). Using a decorator to 'mark' a view is heavyweight and needs to be done carefully (using functools etc) to ensure that it works correctly in the presence of other decorators.

2. Selecting a URL on the basis of a combination of decorators is not straightforward. Applying multiple decorators effectively ANDs them; however, ORing or other logic is convoluted, if it is possible at all. With string tags this is trivial.

3. Decorators are most useful in the context of a request, as they are applied when the URL is actually resolved. On the other hand, checking whether a URL is tagged does not necessarily involve resolving the URL, which allows tags to be used more easily in management commands etc.
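
For instance, once the tags are available as a plain collection of strings, AND, OR and NOT conditions reduce to membership tests. The function names below are hypothetical and only illustrate the point.

```python
def requires_login(tags):
    """'private' AND NOT 'public': a public tag overrides a private one."""
    return 'private' in tags and 'public' not in tags


def needs_cross_origin_handling(tags):
    """'cors' OR 'jsonp': either flavour of cross-origin API."""
    return bool({'cors', 'jsonp'} & set(tags))
```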

In addition to the above:
* Tagging is more 'semantic' - tagging a URL as 'private' does not enforce the use of the login_required decorator - there could be a completely different mechanism used which could change over time.
* Tagging a URL has no side effects other than the tags being copied over to the match object. The urls mechanism does not have to care about if/how the tags are actually used.
* More lightweight when all you want to do is 'mark' the URL.

The linked pull request is fully functional and includes tests but not documentation - which I can add at short notice.

All comments welcome!


Atul

Marc Tamlyn

Sep 18, 2015, 7:57:40 AM
to django-d...@googlegroups.com
Some quick thoughts:

- I'm unconvinced that selecting urls by a given type is a common use case - can you expand on that?
- Implementing the detection and usage of tags as middleware seems not as nice as in decorators to me, especially as the middleware tree can be... unpredictable in its behaviour. I try to avoid writing middleware when possible. Decorators I know just apply at the last point to the view. I'd prefer a nicer way to interact with the tags than middleware, but maybe all I want is nicer middleware.
- With includes, are tags appended to as you go down? Could there be a way to remove "parent" tags? (For example, a whole area of the site could be marked private, but have a "shareable" preview page inside it with an obscure URL which can be shared.)

Shai Berger

Sep 28, 2015, 5:07:01 PM
to django-d...@googlegroups.com
Assume decorators, then, more or less:

def tagged(*tags):
    def decorate(view):
        @functools.wraps(view)
        def decorated(*args, **kw):
            return view(*args, **kw)
        decorated.tags = tags
        return decorated
    return decorate

(taking care to define a new function in the decorator so that you can use the
same view with different tags in different URLs)
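
A quick way to check that property is sketched below. It restates the decorator so that the snippet is self-contained, passing the view's return value through; list_articles is a hypothetical stand-in view that needs no Django.

```python
import functools


def tagged(*tags):
    def decorate(view):
        @functools.wraps(view)
        def decorated(*args, **kw):
            return view(*args, **kw)
        # The tags live on the new wrapper, not on the original view.
        decorated.tags = tags
        return decorated
    return decorate


def list_articles(request):
    # Hypothetical stand-in for a real view.
    return 'articles for %s' % request


public_view = tagged('public')(list_articles)
private_view = tagged('api', 'private')(list_articles)
```

Each call produces a separate wrapper, so public_view.tags and private_view.tags differ while list_articles itself gains no tags attribute.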

That doesn't seem too heavyweight to me. Am I missing something?

Assuming not, the decorator defined above can be used already with current
production versions of Django for single URLs, and after the URL rework also
on includes.

Your examples become

# Already today
api_v1_patterns = [
    url(r'^list/books/$', views.list_books, name='list-books'),
    url(r'^list/articles/$', tagged('public')(views.list_articles),
        name='list-articles'),
    ...
]

# After URL rework
urlpatterns = [
    url(r'^$', views.home, name='home'),
    url(r'^private/', include(private_patterns),
        decorators=[tagged('private')]),
    url(r'^api/1/', include(api_v1_patterns), decorators=[
        tagged('api', 'private', 'jsonp'),
    ]),
    url(r'^api/2/', include(api_v2_patterns), decorators=[
        tagged('api', 'cors', 'private'),
    ]),
]

Taking Marc's doubts (which I agree with) into account, and seeing as 1.9 is
already feature-frozen, I think that the proper way forward for this feature
is for it to live outside of core.

If I am missing something, and it is hard to implement it out of core, please
explain.

HTH,
Shai.

Atul Bhouraskar

Sep 29, 2015, 9:23:57 PM
to django-d...@googlegroups.com
Thanks Marc & Shai for taking the time to look at this. Apologies for the late response; I was busy.

Yes, I agree that decorators can be used to tag urls, however I think that the way that the URL rework code applies decorators can be improved.

The reason I've called the decorators approach 'heavyweight' is because the decorators are applied by the ResolverMatch object and so all of the decorator code (not just the decorated function) will run for every decorator for every request. Normally decorator code (if applied directly to the view) would only run at module import time and only the wrapped function code is executed during a request and there is no reason to look at performance. But placing all the functools wrapping helper code into the 'hot' request path should be avoided if possible.
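
The difference in cost can be illustrated with a small sketch that counts how often the decorator body itself runs. This only models the argument above and is not the ResolverMatch code from the URL rework branch; counted and view are hypothetical names.

```python
import functools

applications = 0  # how many times the decorator body itself has run


def counted(view):
    """Decorator that records each time it is applied to a view."""
    global applications
    applications += 1

    @functools.wraps(view)
    def wrapper(request):
        return view(request)
    return wrapper


def view(request):
    return 'ok'


# Applied once up front: three requests, one decorator application.
decorated_once = counted(view)
for _ in range(3):
    decorated_once('request')
after_upfront = applications

# Applied on every resolve: three requests, three more applications.
for _ in range(3):
    counted(view)('request')
after_per_request = applications
```

With the decorator applied up front the counter stays at 1 however many requests are served; applying it inside the request path adds one application (including the functools.wraps bookkeeping) per request.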

I've gone over the URL rework branch in some detail and I think we can/should move the application of decorators from inside the ResolverMatch to the ResolverEndpoint, so that the decorators are applied only once, when the resolver is instantiated using get_resolver(). I'm happy to submit a patch for that; however, I'm not entirely sure which tree is the reference one for the URL rework. Is it this one? https://github.com/jaddison/django/tree/gsoc2015_url_dispatcher

Atul
