parseQueryString "bug"


stratboy

Feb 7, 2012, 3:43:19 AM
to MooTools Users
Hi! I've found that parseQueryString doesn't correctly parse query
string with &amp; entities, like this:

AWSAccessKeyId=1RZJ66V99R267YCDQSG2&amp;Expires=1330162525&amp;Signature=F
%2FbNMruOog2ejsspsaZTBKVkIHM%3D

The output of parseQueryString would be this:

Object { AWSAccessKeyId="1RZJ66V99R267YCDQSG2", amp=[2],
Expires="1330162525" ... }

where you can see amp=[2]

Any ideas?





Sanford Whiteman

Feb 7, 2012, 4:02:13 AM
to stratboy
> Hi! I've found that parseQueryString doesn't correctly parse query
> string with &amp; entities, like this:

Entities have no special meaning in URLs.

> AWSAccessKeyId=1RZJ66V99R267YCDQSG2&amp;Expires=1330162525&amp;Signature=F
> %2FbNMruOog2ejsspsaZTBKVkIHM%3D

> The output of parseQueryString would be this:

> Object { AWSAccessKeyId="1RZJ66V99R267YCDQSG2", amp=[2],
> Expires="1330162525" ... }

> where you can see amp=[2]

`amp` has no value (it is passed twice with just the name). I agree
that there's something weird about the [2] (which looks like [true +
true], haven't looked at the code). Having it be set to null or
undefined makes more sense, so you can find it on the object but with
no value. What are you expecting?

-- Sandy

stratboy

Feb 7, 2012, 5:55:13 AM
to MooTools Users
It would be nice if &amp; (the entity) were read as a plain '&'
and thus left out of the results, especially since '&amp;'
validates in HTML and a bare '&' does not.

Arian Stolwijk

Feb 7, 2012, 6:55:51 AM
to mootool...@googlegroups.com
That's only in your html. For example when you use a.get('href') it will return the value without the entities: http://jsfiddle.net/A5Naa/

stratboy

Feb 7, 2012, 8:31:44 AM
to MooTools Users
OK, but one can't always use get('href').

Anyway, guys, I pointed out a thing, that thing is true, and one can
almost consider it a bug. Stop.
I know I can circumvent this issue, probably in more than one way, but
that's not the point.

I only hope the next MooTools release will address it.

Thank you for the suggestions anyway,

bye

Sanford Whiteman

Feb 7, 2012, 10:03:37 AM
to mootool...@googlegroups.com
Your suggestion of ignoring the query item 'amp' is a non-starter. That would mean it would become a reserved word. Can't have that.

Since you can't have that, I ask again... what do you think is more correct than returning [2], which I agree is a bug?

-- S.

Tim Wienk

Feb 7, 2012, 10:10:59 AM
to mootool...@googlegroups.com
I don't believe ; has any meaning in URLs, does it?

In that case, it should return

{
    'AWSAccessKeyId': '1RZJ66V99R267YCDQSG2',
    'amp;Expires': '1330162525',
    'amp;Signature': 'F/bNMruOog2ejsspsaZTBKVkIHM='
}

You're using HTML entities in URL-encoded strings, if you don't want
the HTML entities in there, decode the HTML entities first. It is in
no way correct for parseQueryString to automagically decide to do more
than interpret the string as a query string. Feel free to make your
own function called `parseHTMLEntityEncodedQueryString` if you need
that.

--
Tim Wienk, Software Developer, MooTools Developer
E. timw...@gmail.com | W. http://tim.wienk.name

Sanford Whiteman

Feb 7, 2012, 10:15:45 AM
to mootool...@googlegroups.com
; is the RFC-recommended query item separator (it is intended to replace &, but, er, pretty slow going on that).

-- S.

Tim Wienk

Feb 7, 2012, 10:24:06 AM
to mootool...@googlegroups.com
Right, then in that case you get an object with twice the same key
with an empty value, making it:

{
    'AWSAccessKeyId': '1RZJ66V99R267YCDQSG2',
    'amp': ['', ''],
    'Expires': '1330162525',
    'Signature': 'F/bNMruOog2ejsspsaZTBKVkIHM='
}

(which is what it is now, judging from the "[2]"?)

Or there should be a way to specify the query item separator instead.
I guess one could argue that splitting on two different query item
separators in one string is, at least, weird.

I guess the fact that it does, though, has as added benefit that none
of the actual intended keys get messed up by having HTML entity
encoded ampersands in the query string. So changing it to using only
one separator may cause backward compatibility issues.

Sounds like we're in for some fun. :-)
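
For illustration only, here is a minimal sketch of what "a way to specify the query item separator" could look like. `splitQueryItems` is a hypothetical helper, not a MooTools function, and the default of splitting on both `&` and `;` is an assumption based on the behaviour described in this thread.

```javascript
// Hypothetical helper (not part of MooTools): split a query string into
// raw items using a caller-supplied separator.
function splitQueryItems(queryString, separator) {
  // Assumed default: treat both '&' and ';' as separators, as discussed above.
  var pattern = separator || /[&;]/;
  return queryString.split(pattern).filter(function (item) {
    return item.length > 0; // drop empty items such as those from '&&'
  });
}

var qs = 'a=1&amp;b=2';
splitQueryItems(qs);      // ['a=1', 'amp', 'b=2']
splitQueryItems(qs, '&'); // ['a=1', 'amp;b=2']
```

With both separators the intended keys survive and a stray `amp` item appears; with `&` alone the keys themselves pick up an `amp;` prefix.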

Tim Wienk

Feb 7, 2012, 10:36:24 AM
to mootool...@googlegroups.com
Oh, before suggesting to only preserve the last-defined value in the
query string as value, when the same key exists multiple times in a
string, that's not an option. The fact that PHP treats it that way
doesn't mean you'd want to lose the request data in any other language
(or even in PHP).

To handle keys that are specified multiple times in a better way (and
the parsing of query strings in general) we could think about
implementing something like Django's QueryDict class.
https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.QueryDict

Sanford Whiteman

Feb 7, 2012, 1:39:51 PM
to Tim Wienk
> 'amp': ['', ''],
> 'Expires': '1330162525',
> 'Signature': 'F/bNMruOog2ejsspsaZTBKVkIHM='
> }

> (which is what it is now, judging from the "[2]"?)

I presumed that now it's interpreting each name-no-value as if it were
a boolean true, then [true + true] = 2. Obviously that would please
few. :)

-- S.

Aaron Newton

Feb 7, 2012, 1:47:38 PM
to mootool...@googlegroups.com
parseQueryString DOES need a switch for what to do when it encounters the same key twice. PHP expects such things to be expressed as foo[]=bar&foo[]=baz and our query string utilities do this. But I've had to overwrite this function any time I'm using a backend other than PHP. The other way, which is really the proper way, is that everything but PHP expects duplicate keys: foo=bar&foo=baz which, like PHP, gets turned into an array on the back end. parseQueryString should have an option to behave this way (and the query encoding code we have as well).

Regarding the HTML entities, you should run your url through a replacer that swaps out &amp; for & before parsing it.
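
A minimal sketch of such a replacer, assuming the only entity that needs undoing is `&amp;` (numeric forms such as `&#38;` are not handled). `decodeAmpersands` is a hypothetical name used for illustration.

```javascript
// Hypothetical helper: undo HTML escaping of ampersands so the string is a
// plain query string again. Only the named entity &amp; is handled.
function decodeAmpersands(htmlEscapedQuery) {
  return htmlEscapedQuery.replace(/&amp;/g, '&');
}

var escaped = 'AWSAccessKeyId=1RZJ66V99R267YCDQSG2&amp;Expires=1330162525';
var plain = decodeAmpersands(escaped);
// plain is 'AWSAccessKeyId=1RZJ66V99R267YCDQSG2&Expires=1330162525'
// and can now be handed to the query string parser.
```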

Sanford Whiteman

Feb 7, 2012, 2:12:12 PM
to Aaron Newton
> other way, which is really the proper way, is that everything but PHP
> expects duplicate keys: foo=bar&foo=baz which, like PHP, gets turned into
> an array on the back end.

I wouldn't say it's standard to have duplicate keys turned into an
array *without* an array hint like []. Maybe some CGIs do it, but
others ignore duplicates and keep the first, ignore duplicates and
keep the last, or create a comma-delimited string. It really isn't
standard in any way I would feel safe generalizing to "the average"
back end, let alone userland code that can change, say, the JSP way to
the PHP way.

Any parsing of what is still "officially" an opaque string should
preserve as much data as possible unless the user provides a hint
saying "just give me what this kind of back end is expected to parse".
This would mean that without the hint, the returned object has to be
quite complex and precise. Probably have to return a deeper object
that always has arrays so the user doesn't have to keep typechecking.

{

string_value : ['hello']
, other_string_value : ['gbye1','gbye2']
, boolean_value : [undefined,undefined]

}
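
As a rough sketch of a parser that returns the shape above, where every key maps to an array: `parseToArrays` is a hypothetical function, it splits on `&` only, and it assumes values are percent-encoded in a way `decodeURIComponent` accepts (it does not turn `+` into a space and will throw on malformed escapes).

```javascript
// Hypothetical parser: every key maps to an array of values, so callers
// never have to typecheck. A name without '=' is recorded as undefined.
function parseToArrays(queryString) {
  var result = Object.create(null); // no inherited keys to collide with
  queryString.split('&').forEach(function (item) {
    if (item === '') return; // skip empty items
    var eq = item.indexOf('=');
    var name = decodeURIComponent(eq === -1 ? item : item.slice(0, eq));
    var value = eq === -1 ? undefined : decodeURIComponent(item.slice(eq + 1));
    if (!(name in result)) result[name] = [];
    result[name].push(value); // duplicates keep their original order
  });
  return result;
}

var parsed = parseToArrays(
  'string_value=hello&other_string_value=gbye1&other_string_value=gbye2' +
  '&boolean_value&boolean_value'
);
// parsed.string_value       is ['hello']
// parsed.other_string_value is ['gbye1', 'gbye2']
// parsed.boolean_value      is [undefined, undefined]
```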

-- S.

Aaron Newton

Feb 7, 2012, 2:15:20 PM
to mootool...@googlegroups.com
Ruby on Rails does this, Django does this, PlayFramework (Java MVC) does this, I believe Spring/Hibernate environments do this. It's pretty standard.

Sanford Whiteman

Feb 7, 2012, 2:21:30 PM
to Aaron Newton
> Ruby on Rails does this, Django does this, PlayFramework (Java MVC) does
> this, I believe Spring/Hibernate environments do this. It's pretty standard.

ASP, ASP.NET, CF do not do this and are definitely in-the-wild.
There's no standard.

-- S.

Aaron Newton

Feb 7, 2012, 2:42:39 PM
to mootool...@googlegroups.com
Vanilla HTML: 

<form method="get" action="http://google.com">
    <input name="foo" type="checkbox" value="1" checked>
    <input name="foo" type="checkbox" value="2" checked>
    
    <select name="bar" multiple>
        <option selected>1</option>
        <option selected>2</option>
    </select>
    <input type="submit">
</form>

submit it to anywhere:


your browser will submit:

https://www.google.com/?foo=1&foo=2&bar=1&bar=2

This isn't a backend doing the encoding; this is your browser sending a GET request. This is how things *normally* work.

Sanford Whiteman

Feb 7, 2012, 2:50:43 PM
to Aaron Newton
> your browser will submit:

> https://www.google.com/?foo=1&foo=2&bar=1&bar=2

> This isn't a backend doing the encoding; this is your browser sending a GET
> request. This is how things *normally* work.

Did I question this somewhere? I don't think so...

The question is whether a client-side QS parser should have one, two,
three, or perhaps zero `hints` available as to how it builds its
returned object. You said it's standard for CGIs other than PHP to
parse `foo` into an array. That is simply not true. There are three
major approaches (discard, explode, concatenate) and more servers
in-the-wild running CGIs that *don't* explode the above into an array
than CGIs that do.

-- S.


Aaron Newton

Feb 7, 2012, 3:03:58 PM
to mootool...@googlegroups.com
To clarify, what I'm saying is that it's standard for a URL to submit multiple values as foo=1&foo=2 rather than the non-standard PHP mechanism of foo[]=1&foo[]=2. In MooTools, we have mechanisms to encode a URI with such values and they currently only output the PHP methodology, which is non-standard (i.e. not how browsers submit multiple values on their own). Conversely, if we implement the option to encode a URL in this standard way we need to also provide the same option to decode it. I'm not arguing that we shouldn't support all three methods (explode, concat, ignore), but that currently we ONLY support ignore unless the URI uses PHP's brackets.
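
To make the two encodings concrete, here is a small sketch of an encoder with a switch between them. `toQueryString` and its `style` values `'repeat'` and `'brackets'` are hypothetical names for illustration; this is not the MooTools implementation and makes no claim about its exact output.

```javascript
// Hypothetical encoder: arrays are written either as repeated keys
// (foo=1&foo=2, what browsers send) or with PHP-style brackets (foo[]=1&foo[]=2).
function toQueryString(data, style) {
  var parts = [];
  Object.keys(data).forEach(function (key) {
    var isList = Array.isArray(data[key]);
    var values = isList ? data[key] : [data[key]];
    // Only array values get the bracket suffix, and only in 'brackets' style.
    var name = encodeURIComponent(key) + (isList && style === 'brackets' ? '[]' : '');
    values.forEach(function (value) {
      parts.push(name + '=' + encodeURIComponent(value));
    });
  });
  return parts.join('&');
}

toQueryString({ foo: [1, 2], bar: 'x' }, 'repeat');   // 'foo=1&foo=2&bar=x'
toQueryString({ foo: [1, 2], bar: 'x' }, 'brackets'); // 'foo[]=1&foo[]=2&bar=x'
```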

Sanford Whiteman

Feb 7, 2012, 3:14:09 PM
to Aaron Newton
> I'm not arguing that we shouldn't support all three methods
> (explode, concat, ignore), but that currently we ONLY support ignore
> unless the URI uses PHP's brackets.

I agree, there shouldn't be any favoritism toward PHP-explode (though
editorially, I do find it the most self-documenting). There has to be
an object representation that preserves as much info as possible,
including the order of any duplicates, so that it can be transformed
into any other format.
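
One possible shape for such a representation is an ordered list of [name, value] pairs, which can then be collapsed with whichever strategy a given back end uses. The sketch below is an assumption-laden illustration: `parsePairs`, `collapse`, and the strategy names `'explode'`, `'concat'`, `'first'`, and `'last'` are all hypothetical, and splitting is on `&` only.

```javascript
// Hypothetical: parse into ordered [name, value] pairs, losing nothing.
function parsePairs(queryString) {
  return queryString.split('&').filter(Boolean).map(function (item) {
    var eq = item.indexOf('=');
    return eq === -1
      ? [decodeURIComponent(item), undefined]
      : [decodeURIComponent(item.slice(0, eq)), decodeURIComponent(item.slice(eq + 1))];
  });
}

// Hypothetical: collapse the pairs the way a particular back end would.
function collapse(pairs, strategy) {
  var out = Object.create(null);
  pairs.forEach(function (pair) {
    var name = pair[0], value = pair[1];
    if (!(name in out)) {
      out[name] = strategy === 'explode' ? [value] : value;
    } else if (strategy === 'explode') {
      out[name].push(value);                     // keep all values as an array
    } else if (strategy === 'concat') {
      out[name] = [out[name], value].join(',');  // comma-delimited string
    } else if (strategy === 'last') {
      out[name] = value;                         // later duplicate wins
    }
    // 'first' (and any unknown strategy): keep the first value, drop the rest
  });
  return out;
}

var pairs = parsePairs('foo=1&foo=2&bar=1&bar=2');
collapse(pairs, 'explode'); // { foo: ['1', '2'], bar: ['1', '2'] }
collapse(pairs, 'concat');  // { foo: '1,2', bar: '1,2' }
collapse(pairs, 'first');   // { foo: '1', bar: '1' }
collapse(pairs, 'last');    // { foo: '2', bar: '2' }
```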

-- S.
