url patterns question


jjander...@gmail.com

Jun 19, 2017, 4:11:18 PM
to Django users

I'm trying to write a simple Django app and I have encountered a few problems handling URL patterns. Any help will be appreciated.

Problem 1
____________

My first problem is that I would like to parse a url that is known up until the last part of the URL. So for example, if I use the following pattern:

from django.conf.urls import url
from . import views

urlpatterns = [
    url(r'^blank/$', views.Blank, name='blank'),
]

I can submit the following URL in my browser and the 'Blank' view will be displayed. No other URL will work with this pattern.


If I use the following pattern:

urlpatterns = [
    url(r'^blank.*/$', views.Blank1, name='blank1'),
]

If I enter:
      localhost:8000/uA/blank/
or:
      localhost:8000/uA/blankdlkfajdf/

either URL will match and the 'Blank1' view gets invoked. I can enter any URL that starts with /uA/blank, thanks to the '.*' in the pattern.


But if I use the pattern:

urlpatterns = [
    url(r'^blank/.*$', views.BlankMore, name='blankMore'),
]

I can enter:
   
     localhost:8000/uA/blank/

and the pattern will match, but if I enter:

      localhost:8000/uA/blank/abc

I get an error message: "Page not found (404)"

I could be doing something wrong here, however I am more inclined to believe this is a bug. My error? Bug?


Problem 2 - Similar to problem 1 except I would like to capture a value in the URL

If I use the pattern:


urlpatterns = [
    url(r'^pass/val(?P<val>.*)/$', views.PassVal, name='passval'),
]

and I enter the url:

     http://localhost:8000/uA/pass/val12/

the browser displays the value for 'val' as 12.

But if I use a url pattern like this:

urlpatterns = [
    url(r'^pass/(?P<val>.*)/$', views.PassVal, name='passval'),
]

and I enter the url:

     http://localhost:8000/uA/pass/val12/

the browser displays the value 'val12', as expected.

but if I enter the url:

     http://localhost:8000/uA/pass/?abc=12/

I get an error message: "Page not found (404)"

What I would like to happen in this case is that the value "?abc=12" be passed to the view. I know the '?' is a special character here but not to the browser or Django.  My expectation is that the browser would pass it as part of the request and that Django would do likewise. Is this a bug? My error?


Jim A.






    


James Schneider

Jun 19, 2017, 7:43:34 PM
to django...@googlegroups.com

> But if I use the pattern:
>
> urlpatterns = [
>     url(r'^blank/.*$', views.BlankMore, name='blankMore'),
> ]
>
> I can enter:
>
>      localhost:8000/uA/blank/
>
> and the pattern will match, but if I enter:
>
>       localhost:8000/uA/blank/abc
>
> I get an error message: "Page not found (404)"
>
> I could be doing something wrong here, however I am more inclined to believe this is a bug. My error? Bug?

This appears to be working as designed. Your interpretation of the regex algorithm behavior is incorrect. In most cases, the matching algorithm will take the first and almost always shortest match (there are probably some exceptions nestled deep in the Python re module).

The .* modifier means "match any character (.) zero or more times (*)". Since blank/ matches the .* zero times, it is a match for your expression. 

What you are likely looking for is something like r'^blank/[\w-]+$', which matches one or more letters, digits, or underscores (all covered by \w), plus the hyphen (-). This is typically used to capture slugs, so you may not need the hyphen.
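
To make the behavior of that pattern concrete, here is a minimal sketch using only Python's re module, outside of Django. The sample paths are made up for illustration and are written the way the URL dispatcher would see them (no leading slash).

```python
import re

# Pattern suggested above: "blank/" followed by one or more
# word characters (\w) or hyphens, and nothing else.
pattern = re.compile(r'^blank/[\w-]+$')

# Hypothetical sample paths, for illustration only.
samples = ['blank/abc', 'blank/my-slug', 'blank/a_b', 'blank/', 'blank/abc/']
results = {path: bool(pattern.match(path)) for path in samples}

for path, matched in results.items():
    print(path, matched)
```

The first three samples match. The bare blank/ does not, because + requires at least one character, and blank/abc/ does not, because a slash is not in the character class.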

Also bear in mind that the Django setting APPEND_SLASH defaults to True, which can interfere with your regular expression matching: when nothing matches a URL that lacks a trailing slash, Django can automatically redirect to the same URL with a / appended. If your regexes are broad enough, it shouldn't be an issue, though.






> Problem 2 - Similar to problem 1 except I would like to capture a value in the URL
>
> If I use the pattern:
>
> urlpatterns = [
>     url(r'^pass/val(?P<val>.*)/$', views.PassVal, name='passval'),
> ]
>
> and I enter the url:
>
>      http://localhost:8000/uA/pass/val12/
>
> the browser displays the value for 'val' as 12.
>
> But if I use a url pattern like this:
>
> urlpatterns = [
>     url(r'^pass/(?P<val>.*)/$', views.PassVal, name='passval'),
> ]
>
> and I enter the url:
>
>      http://localhost:8000/uA/pass/val12/
>
> the browser displays the value 'val12', as expected.
>
> but if I enter the url:
>
>      http://localhost:8000/uA/pass/?abc=12/
>
> I get an error message: "Page not found (404)"
>
> What I would like to happen in this case is that the value "?abc=12" be passed to the view. I know the '?' is a special character here but not to the browser or Django.  My expectation is that the browser would pass it as part of the request and that Django would do likewise. Is this a bug? My error?


Here I believe Django is getting in the way, but for the right reasons. You're attempting to capture GET arguments to the URI request. The URL dispatcher in Django is not designed to capture these arguments. Instead, I'm guessing the request parser for Django first strips the GET arguments (the ?abc= portion), and passes the remaining portion of the URL to the URL dispatch process where your regexes can be processed. You should be retrieving your GET arguments via request.GET in a function-based view or self.request.GET in a class-based view. 
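
As a rough illustration of that split, here is a sketch that uses only the standard library (urllib.parse and re), not Django's actual request handling. It assumes the project-level urls.py strips the uA/ prefix via include(), so the app-level pattern is tested against pass/ only.

```python
import re
from urllib.parse import urlsplit, parse_qs

url = 'http://localhost:8000/uA/pass/?abc=12/'
parts = urlsplit(url)

# Only the path is compared against URL patterns.
print(parts.path)

# The query string is parsed separately; this is roughly the data
# a view would find in request.GET.
query = parse_qs(parts.query)
print(query)

# Assumption: "uA/" is consumed by an include() in the project urls.py,
# leaving "pass/" for the app-level pattern.
app_path = parts.path[len('/uA/'):]
match = re.search(r'^pass/(?P<val>.*)/$', app_path)
print(match)
```

Under that assumption the pattern sees only pass/, and since it requires a second slash after the captured value, nothing matches and a 404 is the result.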


https://docs.djangoproject.com/en/1.11/intro/tutorial01/#url-argument-regex - Explains how GET/POST values are swallowed by the request processor before the URL is sent to the dispatcher.
https://docs.djangoproject.com/en/1.11/intro/tutorial04/#write-a-simple-form - Example of using request.POST (which is equivalent to request.GET, just depends on the verb in use)


The broader question really revolves around what you are trying to accomplish? You should use either REST-style URL keywords that are captured by the URL dispatcher, or use the request.GET/POST request to gather GET/POST arguments. In general, I've seen the majority of developers default to using URL matching to control what objects are being looked at, and use GET/POST arguments for minor tweaks such as filtering, meaning that a majority of the time, no ?arguments are used within Django. It's perfectly acceptable to go that route though, and Django fully supports it. 

-James

Melvyn Sopacua

Jun 20, 2017, 5:21:52 PM
to django...@googlegroups.com

On Monday 19 June 2017 16:43:12 James Schneider wrote:

> But if I use the pattern:
>
> urlpatterns = [
>     url(r'^blank/.*$', views.BlankMore, name='blankMore'),
> ]
>
> I can enter:
>
> localhost:8000/uA/blank/
>
> and the pattern will match, but if I enter:
>
> localhost:8000/uA/blank/abc
>
> I get an error message: "Page not found (404)"
>
> I could be doing something wrong here, however I am more inclined to
> believe this is a bug. My error? Bug?
>
> This appears to be working as designed. Your interpretation of the
> regex algorithm behavior is incorrect. In most cases, the matching
> algorithm will take the first and almost always shortest match (there
> are probably some exceptions nestled deep in the Python re module).

 

What did you base this on? Certainly not the Python docs or the behavior of any PCRE-based library.

Regular expressions are capitalist: greedy by default.

The lazy quantifiers such as *? exist specifically to make a match non-greedy.

Quick test:

pcregrep -o '^.*:' /etc/passwd

versus:

pcregrep -o '^.*?:' /etc/passwd
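
The same comparison can be sketched with Python's re module, which is what Django uses for these patterns. The sample line below is made up in /etc/passwd format so the example does not depend on a real file.

```python
import re

# A made-up line in /etc/passwd format, for illustration only.
line = 'root:x:0:0:root:/root:/bin/bash'

# Greedy: .* consumes as much as possible, so the match ends at the LAST colon.
greedy = re.match(r'^.*:', line).group(0)

# Lazy: .*? consumes as little as possible, so the match ends at the FIRST colon.
lazy = re.match(r'^.*?:', line).group(0)

print(greedy)
print(lazy)
```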

 

> The .* modifier means "match any character (.) zero or more times
> (*)". Since blank/ matches the .* zero times, it is a match for your
> expression.

 

And his problem is that it does *not* match. Not that it does.

 

--

Melvyn Sopacua

James Schneider

Jun 20, 2017, 9:08:51 PM
to django...@googlegroups.com

 

> What did you base this on? Certainly not the python docs or behavior of any pcre based library.
>
> Regular expressions are capitalist: greedy by default.
>
> The qualifiers *? are there especially to make a match non-greedy.
>
> Quick test:
>
> pcregrep -o '^.*:' /etc/passwd
>
> versus:
>
> pcregrep -o '^.*?:' /etc/passwd
>
> > The .* modifier means "match any character (.) zero or more times
> > (*)". Since blank/ matches the .* zero times, it is a match for your
> > expression.


Yeah...you're right. This is why I should stop responding on my phone when I'm tired. No idea why I was thinking that non-greedy was default.

In [4]: bool(re.match(r'^blank/.*$', 'blank/abc'))
Out[4]: True

In [5]: bool(re.match(r'^blank/.*$', 'blank/'))
Out[5]: True

Thank you for catching that. Not sure what was going through my head. 

 

 

> And his problem is that it does *not* match. Not that it does.

 


And for that I think my statement still stands that Django is stripping the last portion of the URL as a GET argument. I'm betting that request.GET.get('abc') will return '12/' per the last example from the OP.

-James

Melvyn Sopacua

Jun 21, 2017, 5:10:12 AM
to django...@googlegroups.com

On Tuesday 20 June 2017 18:08:02 James Schneider wrote:

 

> > And his problem is that it does *not* match. Not that it does.
>
> And for that I think my statement still stands that Django is
> stripping the last portion of the URL as a GET argument. I'm betting
> that request.GET.get('abc') will return '12/' per the last example from
> the OP.

 

Yes, but for the first problem, r'^blank/.*$' should match blank/abc, yet it generates a 404. I think the matching isn't the problem and the 404 is caused either by an ordering issue in urlpatterns (first match wins, no others are tried) or by the view itself (get_object_or_404).
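
To illustrate the "first match wins" point, here is a simplified stand-in for the resolver, written with plain re. It is not Django's actual implementation, and resolve() along with both pattern lists are hypothetical names used only for this sketch.

```python
import re

def resolve(path, patterns):
    """Return the name of the first pattern that matches, or None (a 404)."""
    for regex, name in patterns:
        if re.search(regex, path):
            return name  # first match wins; later patterns are never tried
    return None

# Hypothetical pattern lists: same two patterns, different order.
specific_first = [
    (r'^blank/$', 'blank'),
    (r'^blank/.*$', 'blankMore'),
]
broad_first = [
    (r'^blank/.*$', 'blankMore'),
    (r'^blank/$', 'blank'),
]

print(resolve('blank/abc', specific_first))
print(resolve('blank/', specific_first))
print(resolve('blank/', broad_first))
print(resolve('other/', specific_first))
```

In both orderings blank/abc is matched by r'^blank/.*$', which supports the idea that the regex itself is not what produces the 404. The ordering only changes which view handles blank/.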

 

You are correct that urlpatterns are only matched against request.path. This excludes the query string.

 

--

Melvyn Sopacua
