URL dispatching framework: feedback requested

384 views
Skip to first unread message

Marten Kenbeek

unread,
Dec 28, 2015, 11:23:19 AM12/28/15
to Django developers (Contributions to Django itself)
Hi everyone,

This past week I've made some great progress in rewriting the URL dispatcher framework and iterating on my implementation. A big part of my effort to refactor my original code was to increase the performance of reverse() to a level similar to the legacy dispatcher, and to decouple the various parts of the code. I think I have now achieved both goals, so I'd like to get some feedback on the result. 

The current code can be found at https://github.com/django/django/pull/5578.

I will be cleaning up the code and shuffling around some of it. There's still a lot to be done, but the high-level design and public API are pretty much finished and ready for review. 

The API consists of 4 parts, most of which are extendible or replaceable: a Dispatcher, a set of Resolvers, Constraints and the URL configuration. 

The main entry point for users is the Dispatcher class. The dispatcher is responsible for resolving namespaces and reversing URLs, and handles some of the utility functions available to users (some more may be moved here, such as is_valid_path() or translate_url()). It is a thin wrapper around the root Resolver to allow a single entry point for both reversing and resolving URLs. It currently provides the following public API:
  • Dispatcher.resolve(path, request=None) -> Resolve path to a ResolverMatch, or raise Resolver404
  • Dispatcher.resolve_namespace(viewname, current_app=None) -> Resolve the namespaces in viewname, taking current_app into account. Returns resolved lookup in a list.
  • Dispatcher.reverse(lookup, *args, **kwargs) -> Reverse lookup, consuming *args and **kwargs. Returns a full URL path or raises NoReverseMatch. 
  • Dispatcher.resolve_error_handler(view_type) -> Get error handler for status code view_type from current URLConf. Fall back to default error handlers. 
  • Dispatcher.ready -> (bool) Whether the dispatcher is fully initialized. Used to warn about reverse() where reverse_lazy() must be used. 
I'm currently looking into possible thread-safety issues with Dispatcher.load_namespace(). There are some parts of Django that depend on private API's of Dispatcher and other parts of the dispatching framework. To maximize extensibility, I'll look if these can use public API's where appropriate, or gracefully fail if a swapped-in implementation doesn't provide the same private API. One example is admindocs, which uses the Dispatcher._is_callback() function for introspection. 

If a developer wishes to completely replace the dispatcher framework, this would be the place to do it. This will most likely be possible by setting request.dispatcher to a compatible Dispatcher class. 

The BaseResolver class currently has two implementations: Resolver and ResolverEndpoint. The resolver's form a tree structure, where each resolver endpoint is a leaf node that represents a view; it's job is to resolve a path and request to a ResolverMatch. Users will mostly use this through Dispatcher.resolve(), rather than using it directly. Its public API consists of two functions:
  • BaseResolver.resolve(path, request=None) -> Return a ResolverMatch or raise Resolver404.
  • BaseResolver.describe() -> Return a human-readable description of the pattern used to match the path. This is used in error messages. 
There is a slightly more extensive API that allows a resolver to "play nice" with Django's resolver implementations. This allows a developer to replace a single layer of resolvers to implement custom logic/lookups. For example, you can implement a resolver that uses the first hierarchical part of the path as a dict lookup, rather than iterating each pattern. To make this possible, a resolver should accept the following arguments in its constructor:
  • BaseResolver.__init__(pattern, decorators=None) -> pattern is a URLPattern instance (explained below). decorators is a list of decorators that should be applied to each view that's a "descendant" of this resolver. This list is passed down so the fully decorated view can be cached. 
I'm still looking how exactly we'd allow a developer to hook in a custom resolver, any ideas are welcome. 

Constraints are the building blocks of the current dispatcher framework. A Constraint can (partially) match a path and request, and extract arguments from them. It can also reconstruct a partial URL from a set of arguments. Current implementations are a RegexPattern, LocalizedRegexPattern, LocalePrefix and ScriptPrefix. This is the main extension point for developers. I envision that over time, Django will include more constraints into core for common use-cases. One can for example implement a DomainConstraint or MethodConstraint to match a domain name or request method in the URL, or implement a set of constraints based on the parse library for better performance than the built-in regex-based constraints. A Constraint currently has the following public API:
  • Constraint.match(path, request=None) -> Match against path and request, extracting arguments in the process. Returns new_path, args, kwargs or raises a Resolver404.
  • Constraint.construct(url_object, *args, **kwargs) -> Reconstruct a partial URL from *args and **kwargs, and add partial URL to the url_object. Returns url_object, args, kwargs or raises NoReverseMatch. Any arguments used by the constraint should be removed from the returned arguments -- if any arguments are left when all constraints are consumed, a NoReverseMatch is raised.
  • Constraint.describe() -> Return a human-readable description of the constraint used to match the path. This is used in error messages. 
The biggest API problem here is in Constraint.describe(). The current implementation is incredibly naive when it comes to the order of constraints. If any constraint implements a sort of lookahead/lookbehind assertion, the current API doesn't provide a method to properly communicate that in an error message. 

The last part of the puzzle is the URL configuration. There is little functionality here: it is mostly an effort to standardize and normalize the data structure in the URL configuration files, while keeping the configuration decoupled from the Resolver and Dispatcher classes. The API for Django users will remain unchanged (except for the new decorators option). The public API is mostly intended for developers who wish to implement their own dispatcher or resolver, while maintaining compatibility with Django's method of configuring URLs. It consists of 4 classes: URLPattern, Endpoint, Include and URLConf. To illustrate:

[
    url
(r'^$', views.home, name='home'),
    url
(r'^accounts/', include('accounts.urls', namespace='accounts')),
    url
(r'^admin/', admin.site.urls),
]


The above snippet is how you would configure a basic site with a home page, accounts section and admin. This URLConf would produce the following data structure:

[
   
URLPattern([RegexPattern(r'^$')], Endpoint(views.home, 'home')),
   
URLPattern([RegexPattern(r'^accounts/')], Include(URLConf('accounts.urls'), namespace='accounts'))
   
URLPattern([RegexPattern(r'^admin/')], Include(URLConf(admin.site.get_urls(), app_name='admin', decorators=[admin.site.admin_view]), namespace='admin'))),
]

As you can see, the latter is a lot more verbose, while I think the compact URL configuration is one of the great features of Django's URLConf. That's why the API for configuring URLs will remain as it was, and only the resulting data structure will change. The only reason one would need to use these classes directly is to bypass the additional checks in url() and include() -- as happens e.g. in admin.site.urls.

----------------------------------------------------------------------------------------------------------------

Here's a small overview of what still needs to be done:
  • General clean-up.
  • Investigate thread-safety in Dispatcher.load_namespace().
  • Investigate, and where possible, replace use of private APIs by code outside django.urls.
  • Provide hooks to replace the dispatcher and resolver. 
  • Expand on the describe() API to allow for more complex patterns.
  • Expand and rewrite the tests in urls_tests.
  • Document the new framework API. 
There's still plenty to do, but I feel it is finally nearing completion. Any help, feedback and testing is welcome to give it that final push to a merge. I will certainly need some help to extensively document the new API. 

Marten

Marten Kenbeek

unread,
Jan 3, 2016, 3:32:41 PM1/3/16
to Django developers (Contributions to Django itself)
A quick question: Aymeric suggested to merge django.conf.urls into django.urls as well. Does anyone think we should not make this move?

If no one disagrees, I will proceed with a new PR to make the move.

Tim Graham

unread,
Jan 4, 2016, 7:24:32 AM1/4/16
to Django developers (Contributions to Django itself)
It looks to me like the rationale for the existing organization is that everything in that module is designed to be used in a URLconf. Do you propose to make everything importable from "django.urls" or from "django.urls.conf" or some other organization?

If we remove urls from django.conf, then the next logical step could be to rename "django.conf" to "django.settings" as it's not left with much else. I'm not advocating for that, but it's something I noticed.

Aymeric Augustin

unread,
Jan 4, 2016, 8:01:55 AM1/4/16
to django-d...@googlegroups.com
On 4 janv. 2016, at 13:24, Tim Graham <timog...@gmail.com> wrote:

> It looks to me like the rationale for the existing organization is that everything in that module is designed to be used in a URLconf.

I believe it is. However, as I said on the pull request, most users won’t reverse engineer this subtlety. I think it would just be simpler for everyone if all URL-related functions were importable from django.urls.

--
Aymeric.


Florian Apolloner

unread,
Jan 7, 2016, 8:07:03 AM1/7/16
to Django developers (Contributions to Django itself)


On Monday, December 28, 2015 at 5:23:19 PM UTC+1, Marten Kenbeek wrote:
One can for example implement a DomainConstraint or MethodConstraint to match a domain name or request method in the URL, or implement a set of constraints based on the parse library for better performance than the built-in regex-based constraints.

A method constraint seems simple enough, how would the domain constraint work with regards to reversing -- do you have an example for that?

Cheers,
Florian

Marten Kenbeek

unread,
Jan 8, 2016, 11:46:39 AM1/8/16
to Django developers (Contributions to Django itself)
The first argument to Constraint.construct() is a URL helper object which allows you to set the scheme, host, path, querystring and fragment separately. So reversing a domain constraint is as simple as this:

class DomainConstraint(Constraint):
   
...
   
def construct(self, url_object, *args, **kwargs):
        url_object
.host = self.domain
       
return url_object, args, kwargs

James Addison

unread,
Jan 14, 2016, 2:06:52 PM1/14/16
to Django developers (Contributions to Django itself)
Marten,

As you likely remember, I've been running your code for a few months now, overall it's been pretty good. I mentioned some time ago that the list of URLs displayed with `DEBUG=True` when triggering a 404 is sometimes empty, or partially empty. This still happens for me, and I believe it has something to do with using scheme/domain constraints - here are mine:

class ConstraintBase(Constraint):
    """
    This exists purely as a way to ensure MRO mixin inheritance will work
    """
    def match(self, path, request=None):
       
return path, (), {}


   
def construct(self, url_object, *args, **kwargs):

       
return url_object, args, kwargs


class SchemeConstraintMixin(object):
    scheme
= ''

    def match(self, path, request=None):
       
if request and request.scheme == self.scheme:
           
return super().match(path, request)
       
raise Resolver404


   
def construct(self, url_object, *args, **kwargs):

        url_object
.scheme = self.scheme
       
return super().construct(url_object, *args, **kwargs)


class HostConstraintMixin(object):
    host
= ''

    def match(self, path, request=None):
       
if request and request.get_host() == self.host:
           
return super().match(path, request)
       
raise Resolver404


   
def construct(self, url_object, *args, **kwargs):

        url_object
.host = self.host
       
return super().construct(url_object, *args, **kwargs)


class SchemeHostConstraint(SchemeConstraintMixin, HostConstraintMixin, ConstraintBase):
   
def __init__(self, scheme, host):
       
self.scheme = scheme
       
self.host = host


www_urlpatterns
= [
    url
(r'^$', images_views.photos, {'template': 'images/homepage.html'}, name='homepage'),
    url
(r'^users/', include('users.urls.profiles')),
    url
(r'^blog/', include('writing.urls.content'), {'section_slug': 'blog'}),
    url
(r'^galleries/', include('writing.urls.content'), {'section_slug': 'galleries'}),
    url
(r'^writers/', include('writing.urls.authors')),
    url
(r'^', include('vendors.urls.www')),
    url
(r'^', include('images.urls')),
] + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)

my_urlpatterns
= [
    url
(r'^', include('users.urls.my')),
    url
(r'^businesses/', include('vendors.urls.my')),
    url
(r'^user/', include('users.urls.auth')),
    url
(r'^user/', include('users.urls.social')),
]

manage_urlpatterns
= [
    url
(r'^admin/', include(admin.site.urls)),
]

payments_urlpatterns
= [
    url
(r'^braintree/', include('payments.urls.payments_braintree')),
]

urlpatterns
= [
   
# non-logged in pages; public facing and indexable by search engines
    url(SchemeHostConstraint(*settings.SITE_DOMAIN_WWW), include(www_urlpatterns)),
   
# logged in users only, public facing and not indexable by search engines (perhaps some exceptions...)
    url(SchemeHostConstraint(*settings.SITE_DOMAIN_MY), include(my_urlpatterns)),
   
# staff only; logged in users only; Django admin + custom management pages
    url(SchemeHostConstraint(*settings.SITE_DOMAIN_MANAGE), include(manage_urlpatterns)),
   
# webhooks for payment processing
    url(SchemeHostConstraint(*settings.SITE_DOMAIN_PAYMENTS), include(payments_urlpatterns))
]

Relevant settings:

SITE_DOMAIN_MY = ('http', 'mylocal.project.com:9000')
SITE_DOMAIN_WWW
= ('http', 'wwwlocal.project.com:9000')
SITE_DOMAIN_MANAGE
= ('http', 'managelocal.project.com:9000')
SITE_DOMAIN_PAYMENTS
= ('http', 'paymentslocal.project.com:9000')

I'm running this within a vagrant/virtualbox setup, hence the `:9000` port. That said, this still happens on our preview server.

Just today, however, I was wondering why `APPEND_SLASH=True` was no longer having an effect - no redirection happens. I *think* it is due to the `CommonMiddleware` code here https://github.com/knbk/django/blob/dispatcher_api/django/middleware/common.py#L75. As both `request.get_full_path()` and `request.path_info` do not include the scheme or domain, using `is_valid_path()` will fail. Is this correct?

Cheers,
James


Tim Graham

unread,
Mar 7, 2016, 11:00:49 AM3/7/16
to Django developers (Contributions to Django itself)
Hi Marten,

How are things going? Do you plan to make a push to merge this for 1.10? The alpha is scheduled for May 16. Are you still waiting for feedback? I think writing documentation would certainly facilitate that. Also, if there's any chance to break the existing commit into some smaller logic chunks/features to make review and merging easier, that would certainly be welcome from my perspective.
...

Asif Saifuddin

unread,
May 11, 2016, 12:52:33 PM5/11/16
to Django developers (Contributions to Django itself)
Can we expect this to be merged on 1.10 alpha? after that the minor imporvements could be take place.

Thanks

Tim Graham

unread,
May 11, 2016, 1:23:03 PM5/11/16
to Django developers (Contributions to Django itself)
I'm not targeting this for 1.10. The patch hasn't been under review and is likely too much to review in a couple days. Also documentation remains outstanding.
Reply all
Reply to author
Forward
0 new messages