URL encoding


Arun K.Rajeevan

Jan 1, 2011, 3:38:06 AM
to web...@googlegroups.com
My application has a search form: it submits to itself on whatever page it is on and redirects to the search function.
I take the values from the search form (e.g. term = request.args(0)) and redirect with redirect(c='..', f='..', args=[v]).

Now, in the search results I get underscores (_) if the search term includes a space
(e.g., 'top parts' is displayed as 'top_parts'). I want the space to be retained.

Also, if the term contains an apostrophe I get an invalid request page (e.g., I'm).

The former can be overcome by manually replacing underscores with spaces.
But what can I do about the latter?

pbreit

Jan 1, 2011, 10:59:14 AM
to web...@googlegroups.com
There's a short discussion here http://web2py.com/book/default/chapter/04#Dispatching

Which seems to suggest what you are seeing. Perhaps there's an ENV variable that can provide the un-manipulated args?

Jonathan Lundell

Jan 1, 2011, 11:23:27 AM
to web...@googlegroups.com

I'll be addressing this in the new URL rewriting mechanism. In the meantime, have a look at the raw_args option in routes.py:

# specify a list of apps that bypass args-checking and use request.raw_args
#
#routes_apps_raw=['myapp']
#routes_apps_raw=['myapp', 'myotherapp']

This is a hack to work around the problem. If you list your app in this routing variable, request.args will be set to None, and the (mostly) unprocessed args will appear as a string in request.raw_args. You'll need to split them yourself and do whatever processing you need.
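
A minimal sketch of the "split them yourself" step, written against a plain string rather than the real request object. The helper name split_raw_args is hypothetical, and the unquote step assumes the raw string still carries %xx escapes, which may depend on the server and the web2py version. It uses Python 3's urllib.parse; on the Python 2 of this thread the same function is urllib.unquote.

```python
from urllib.parse import unquote

def split_raw_args(raw_args):
    """Split a raw args string on '/' and percent-decode each piece.

    Hypothetical helper: assumes raw_args is a str such as
    'top%20parts/I%27m' with the escapes still present.
    """
    # Drop empty pieces produced by leading, trailing or doubled slashes.
    return [unquote(piece) for piece in raw_args.split('/') if piece]

print(split_raw_args("top%20parts/I%27m"))  # ['top parts', "I'm"]
```

Nothing in this sketch validates the pieces; with raw args that remains the application's job.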

Alternatively, experiment with using the query string (vars) instead of args.
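
To illustrate why the query string may be the easier route, here is a small round trip using only the standard library (Python 3 names; in Python 2 these live in urllib and urlparse). The variable name term is just an example. In web2py this would presumably correspond to passing vars to URL and reading request.vars, which is assumed here rather than shown.

```python
from urllib.parse import urlencode, parse_qs

term = "I'm looking for top parts"

# Encode the search term as a query string.
query = urlencode({"term": term})
print(query)  # term=I%27m+looking+for+top+parts

# Decode it again; parse_qs returns a list of values per key.
decoded = parse_qs(query)["term"][0]
print(decoded == term)  # True
```

Both the space and the apostrophe survive the round trip unchanged.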

Arun K.Rajeevan

Jan 1, 2011, 12:23:15 PM
to web...@googlegroups.com
Ok, I'm gonna experiment with raw_args.
I had been wondering what this #routes_apps_raw=['myapp'] thing was (I was customizing routes).

When I use this functionality, am I correct in assuming that I have to do both the encoding and the decoding?

I can decode request.raw_args.
But where do I encode?
I'm using the URL function.
So, should I encode the return value of the URL function, or the individual arguments and variables that I'm passing?

Also, could you explain the difference between urllib.quote() and urllib.quote_plus() and their use cases?
I don't have much experience with the urllib module.

Arun K.Rajeevan

Jan 1, 2011, 1:05:27 PM
to web...@googlegroups.com
I did just this and it seems to work:

args = request.raw_args
args = args.split('/')

But now the problem is with the download function.
It works by taking the filename from request.args;
now it should take the value from request.raw_args.

My download function is the following:

def download():
    return response.download(request.raw_args,db)

So I changed it to 

import os

def download():
    filename = os.path.join(request.folder,'uploads',request.raw_args.split('/')[0])
    return response.stream(open(filename,'rb'))

Now it shows images in the page. But before, the download button opened a save-file dialog;
now the file is displayed in the page (the picture shown as text).

How do I make the function open the save-file dialog?

Jonathan Lundell

Jan 1, 2011, 1:07:36 PM
to web...@googlegroups.com
On Jan 1, 2011, at 9:23 AM, Arun K.Rajeevan wrote:
> Ok, I'm gonna experiment with raw_args.
> I had been wondering what this #routes_apps_raw=['myapp'] thing was (I was customizing routes).

Don't forget to remove the #.


> When I use this functionality, am I correct in assuming that I have to do both the encoding and the decoding?
>
> I can decode request.raw_args.
> But where do I encode?
> I'm using the URL function.
> So, should I encode the return value of the URL function, or the individual arguments and variables that I'm passing?

Good question. 

URL encodes args with urllib.quote, and vars (the query string) with urllib.urlencode. From the Python docs:

urllib.quote(string[, safe])
Replace special characters in string using the %xx escape. Letters, digits, and the characters '_.-' are never quoted. By default, this function is intended for quoting the path section of the URL. The optional safe parameter specifies additional characters that should not be quoted; its default value is '/'.

Example: quote('/~connolly/') yields '/%7econnolly/'.


urllib.quote_plus(string[, safe])
Like quote(), but also replaces spaces by plus signs, as required for quoting HTML form values when building up a query string to go into a URL. Plus signs in the original string are escaped unless they are included in safe. It also does not have safe default to '/'.


urllib.urlencode(query[, doseq])
Convert a mapping object or a sequence of two-element tuples to a “percent-encoded” string, suitable to pass to urlopen() above as the optional data argument. This is useful to pass a dictionary of form fields to a POST request. The resulting string is a series of key=value pairs separated by '&' characters, where both key and value are quoted using quote_plus() above. When a sequence of two-element tuples is used as the query argument, the first element of each tuple is a key and the second is a value. The value element in itself can be a sequence and in that case, if the optional parameter doseq evaluates to True, individual key=value pairs separated by '&' are generated for each element of the value sequence for the key. The order of parameters in the encoded string will match the order of parameter tuples in the sequence. The urlparse module provides the functions parse_qs() and parse_qsl() which are used to parse query strings into Python data structures.

We do not pass either 'safe' or 'doseq', but we do the equivalent of doseq before we call urlencode().


> Also, could you explain the difference between urllib.quote() and urllib.quote_plus() and their use cases?
> I don't have much experience with the urllib module.

The quote() description above is pretty straightforward. Notice that it will encode spaces as %20 and (by default) not encode '/'. It's used for args, in our case.

quote_plus() is used indirectly by urlencode(), so it's relevant for query strings (vars). Spaces become '+' instead of %20 (notice that '+' in the string itself gets encoded).
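
A short side-by-side comparison of the three functions may help here. This sketch uses Python 3, where they live in urllib.parse; in the Python 2 of this thread they are urllib.quote, urllib.quote_plus and urllib.urlencode. The sample strings are made up for illustration.

```python
from urllib.parse import quote, quote_plus, urlencode

term = "top parts/I'm"

# quote: space becomes %20, '/' is left alone by default (path-style quoting).
print(quote(term))        # top%20parts/I%27m

# quote_plus: space becomes '+', and '/' is escaped as well.
print(quote_plus(term))   # top+parts%2FI%27m

# A literal '+' in the input is escaped so it cannot be confused with a space.
print(quote_plus("a+b"))  # a%2Bb

# urlencode applies quote_plus to each key and value of a mapping.
print(urlencode({"q": "top parts"}))  # q=top+parts
```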


So: don't encode anything; URL will handle it for you, even in the current system. At least that's the way I read it.

mdipierro

Jan 1, 2011, 1:07:45 PM
to web2py-users
This is unsafe and may open the door to directory traversal attacks.
Download security relies on URL regex validation, and you are
bypassing it.
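
For readers unfamiliar with directory traversal, the sketch below shows how an unchecked path segment can escape a base folder once it is joined and normalized. It is a generic containment check, not the regex validation web2py itself relies on; the helper name resolves_inside and the /app/uploads path are hypothetical, and posixpath is used so the result does not depend on the host OS. Symlinks are not considered.

```python
import posixpath

def resolves_inside(base_dir, user_segment):
    """Return True if base_dir joined with user_segment stays under base_dir."""
    base = posixpath.normpath(base_dir)
    # join() discards base when user_segment is absolute; normpath collapses '..'.
    target = posixpath.normpath(posixpath.join(base, user_segment))
    return target == base or target.startswith(base + "/")

print(resolves_inside("/app/uploads", "photo.png"))        # True
print(resolves_inside("/app/uploads", "../models/db.py"))  # False
print(resolves_inside("/app/uploads", "/etc/passwd"))      # False
```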

Arun K.Rajeevan

Jan 1, 2011, 1:16:42 PM
to web...@googlegroups.com
Thank you for your notes and to-the-point quotes. :)

Jonathan Lundell

Jan 1, 2011, 1:27:27 PM
to web...@googlegroups.com
On Jan 1, 2011, at 10:05 AM, Arun K.Rajeevan wrote:
> I did just this and it seems to work:
>
> args = request.raw_args
> args = args.split('/')
>
> But now the problem is with the download function.
> It works by taking the filename from request.args;
> now it should take the value from request.raw_args.
>
> My download function is the following:
>
> def download():
>     return response.download(request.raw_args,db)

I suggest this:

import re

file_match = re.compile(r'([\w@ -][=.]?)+$')

def download():
    # response.download reads the last arg, so take the last path segment
    file = request.raw_args.split('/')[-1]
    # reject anything that fails the arg-checking pattern
    if not file_match.match(file):
        raise HTTP(400, thread.routes.error_message % 'invalid request',
                   web2py_error='invalid args')
    request.args = [file]
    return response.download(request, db)

Notice that response.download will be looking at args[-1]; that's why we're taking [-1] above.

Massimo's point is important: when you use raw_args, it's your responsibility to validate each arg; otherwise you're opening yourself up to attack.

file_match above is the standard arg-checking pattern.
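
To see what the pattern accepts and rejects, here is a small standalone check. The regex is copied from the post above; the wrapper name is_valid_arg and the sample file names are made up for illustration.

```python
import re

# Pattern copied from the post above.
file_match = re.compile(r'([\w@ -][=.]?)+$')

def is_valid_arg(arg):
    """Return True if arg passes the arg-checking pattern."""
    return file_match.match(arg) is not None

for candidate in ["photo.png", "top parts.txt", "..", "../secret", "I'm.txt"]:
    print(repr(candidate), is_valid_arg(candidate))
```

The first two names pass; '..' and '../secret' fail because a dot may only follow an allowed character and a slash is never allowed. Note that a name containing an apostrophe is rejected by this pattern as well.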
