The specific problem that I am encountering right now is that the Ruby, PHP, and Objective-C OAuth libraries handle percent encoding of incoming parameters differently.
In the Ruby library, all parameters are percent encoded, with '+' chars being converted to space encodings ( '%20' ); in the PHP lib, parameters containing '+'s are left as is.
So, the resulting signature base strings for a query parameter key/value pair of: c=hi+there would come out in the SBS as follows for each lib:
Ruby: c%3Dhi%2520there
PHP: c%3Dhi%252Bthere
The Objective-C lib behaves like the PHP lib except it doesn't seem to be encoding the % of the %2B a second time per section 9.1.3 so the resulting part of the SBS looks like: c%3Dhi%2Bthere
So, I would like to know which approach is correct? Should a '+' in a query string get decoded to a space first ( as is a common practice ) and then percent encoded to %20 per RFC 3986, or should our libs leave the '+' as is and encode it as %2B?
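For illustration, the three behaviours described above can be reproduced with a short Python sketch. This is not code from any of the libraries: the helper name pct and the three variable names are made up here, Python's urllib.parse.quote stands in for each library's own encoder, and the Objective-C behaviour is modelled as a single encoding pass.

```python
from urllib.parse import quote, unquote_plus

def pct(text):
    """Percent encode everything except the RFC 3986 unreserved characters."""
    # Letters, digits, '-', '.' and '_' are never escaped by quote();
    # '~' is passed explicitly so older Python versions keep it too.
    return quote(text, safe="~")

name, raw_value = "c", "hi+there"   # as received in the query string c=hi+there

# Ruby-style: treat '+' as a space, encode name and value, then encode the pair again.
ruby_style = pct(pct(name) + "=" + pct(unquote_plus(raw_value)))

# PHP-style: keep '+' as a literal plus, encode, then encode the pair again.
php_style = pct(pct(name) + "=" + pct(raw_value))

# Objective-C-style (assumed model): literal plus, only one encoding pass.
objc_style = pct(name + "=" + raw_value)

print(ruby_style)   # c%3Dhi%2520there
print(php_style)    # c%3Dhi%252Bthere
print(objc_style)   # c%3Dhi%2Bthere
```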
Also, I wonder if it might be slightly more clear if section 5.1 of the spec ( http://oauth.net/core/1.0/#encoding_parameters ) read: "All request parameter names and values..." instead of "All parameter names and values..." just to further clarify that all query part or form encoded names and values, including the OAuth Protocol Params, must be escaped.
I think the right answer is that OAuth does not define how to encode URL parameters nor how to decode them. It only defines what to do with values once the consumer or SP has normalized them from whatever format they were sent in.
I agree that section 5.1 is confusing and I will look into moving it, maybe into section 9 where it really belongs. But I need to read the spec again to see what this breaks (if anything).
Based on this view, + should be turned into a space before being encoded into the SBS. This makes the Ruby library the correct one.
EHL
On 7/14/08 4:45 PM, "Jesse Clark" <je...@jesseclark.com> wrote:
I have been running into issues with encoding spaces in parameters for Signature Base Strings and would like to reraise the issue with the group.
Percent encoding is used in OAuth response messages, too. This use
case wouldn't be covered by the phrase 'all request parameter names
and values'.
I agree the Ruby implementation is right. There's a relevant test
case in http://oauth.pbwiki.com/TestCases . Maybe it should have
another one, to really drive the point home.
OAuth should follow well established practice for encoding HTTP
messages, I think. On the other hand, OAuth must standardize the
algorithm for constructing a signature base string. The two can be
kept separate. One algorithm encodes or decodes HTTP messages, and a
separate algorithm computes OAuth signatures. The interface between
them passes an object in which each name and each value is a character
string, not encoded. (This happens fairly naturally in some web
frameworks, whose API passes such objects.) A draft extension
codifying this idea is http://oauth.pbwiki.com/FlexibleDecoding
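A minimal sketch of that separation, in Python, might look like the following. The function names decode_request and normalize_parameters are invented for this example, and it leaves out details of section 9.1.1 such as excluding oauth_signature; it is only meant to show the interface of plain, unencoded strings between the two steps.

```python
from urllib.parse import parse_qsl, quote

def decode_request(query_string):
    """HTTP layer: form-decode a query string into plain (name, value) strings."""
    return parse_qsl(query_string, keep_blank_values=True)

def normalize_parameters(pairs):
    """Signature layer: encode each name and value, sort, and join with '&'."""
    encoded = sorted((quote(n, safe="~"), quote(v, safe="~")) for n, v in pairs)
    return "&".join(n + "=" + v for n, v in encoded)

pairs = decode_request("c=hi+there&a=x%20y")
print(pairs)                        # [('c', 'hi there'), ('a', 'x y')]
print(normalize_parameters(pairs))  # a=x%20y&c=hi%20there
```

The signature layer never sees '+' or '%20' from the wire; it only sees the decoded strings, so it does not matter which of the two spellings the sender used for a space.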
I totally agree, OAuth shouldn't define the encoding of URL query
strings or HTTP message bodies. Sadly, the current document isn't
written that way. Section 5.1 defines an encoding to which
established practice doesn't conform, and section 5.2 doesn't
explicitly say that section 5.1 doesn't apply. Section 5.3 references
section 5.1 (which is going in the wrong direction).
OAuth should standardize character encoding, I think. It's important
that all participants use UTF-8, and this practice isn't well
established.
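To see why the character encoding matters, here is a small Python sketch (the sample value is made up) that percent encodes the same string under two different character encodings. The two sides would compute different signature base strings unless both use UTF-8.

```python
from urllib.parse import quote

value = "caf\u00e9"   # 'café', a made-up sample value

# Same characters, different bytes, therefore different percent encodings.
as_utf8 = quote(value, safe="~", encoding="utf-8")
as_latin1 = quote(value, safe="~", encoding="latin-1")

print(as_utf8)     # caf%C3%A9
print(as_latin1)   # caf%E9
```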
On Jul 14, 4:53 pm, Eran Hammer-Lahav <e...@hueniverse.com> wrote:
> I think the right answer is that OAuth does not define
> how to encode URL parameters nor how to decode them.
> It only defines what to do with values once the consumer
> or SP normalized them from whatever format they were
> sent in.
> Feel free to raise luke warm smoldering heck on the mailing list if
> you can think of a reason that we shouldn't be doing it this way.
> On Jul 15, 2008, at 11:10 AM, termie wrote:
> > Committed a patch from Jesse Clark to correct this, please rain
> > burning hell upon us if this breaks your stuff.
But section 9.1.3 references 5.1 so wouldn't the same percent encoding algorithm need to be used for both normalizing the request parameters and for concatenating and encoding the request elements to form the SBS?
On Jul 15, 2008, at 10:07 AM, John Kristian wrote:
> OAuth should follow well established practice for encoding HTTP
> messages, I think. On the other hand, OAuth must standardize the
> algorithm for constructing a signature base string. The two can be
> kept separate. One algorithm encodes or decodes HTTP messages, and a
> separate algorithm computes OAuth signatures. The interface between
> them passes an object in which each name and each value is a character
> string, not encoded. (This happens fairly naturally in some web
> frameworks, whose API passes such objects.) A draft extension
> codifying this idea is http://oauth.pbwiki.com/FlexibleDecoding
I'm not familiar with the specific use case in the PHP library so maybe it's something with the string being encoded - but doesn't running rawurlencode() on a space return %20 regardless? I thought it was only urlencode() that encodes a space as +.
----- Original Message ----
From: Jesse Clark <je...@jesseclark.com>
To: oauth@googlegroups.com
Sent: Tuesday, July 15, 2008 8:09:10 PM
Subject: [oauth] Re: percent encoding of parameters
I agree that is probably a cleaner implementation. Feel free to change it and send a patch to termie... :)
On Jul 15, 2008, at 11:41 AM, fangel wrote:
> Maybe that it should be done with array-substitution instead of
> multiple-substitution..
> Will make it easier when / if any further deviance between
> rawurlencode() and rfc 3986 is found.. Oh, and it looks better..
> (in a related note - I might get back in the patching-mood during the
> holiday.. I might get the unit tests up to spec again)
> -fangel
> On Jul 15, 8:24 pm, Jesse Clark <je...@jesseclark.com> wrote:
>> er... speak for yourself. ;)
>> Feel free to raise luke warm smoldering heck on the mailing list if
>> you can think of a reason that we shouldn't be doing it this way.
>> On Jul 15, 2008, at 11:10 AM, termie wrote:
>>> Committed a patch from Jesse Clark to correct this, please rain
>>> burning hell upon us if this breaks your stuff.
Except that rawurlencode() never returns a '+', it already does the
correct encoding of the ' ' and the '+'.
The only difference between the PHP rawurlencode() and RFC3986 is the
encoding of the '~'.
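As a rough sketch of the table-driven correction idea, here is how the '~' difference could be patched up in Python. The function legacy_encode is a hypothetical stand-in for an encoder that escapes '~' as described above; it is not a binding to PHP's rawurlencode(), and the CORRECTIONS table and function names are invented for this example.

```python
from urllib.parse import quote

def legacy_encode(text):
    """Hypothetical encoder that escapes '~' (stand-in for the behaviour described)."""
    # quote() may or may not escape '~' depending on the Python version,
    # so force the escaped form to model the legacy behaviour.
    return quote(text, safe="").replace("~", "%7E")

# Corrections from the legacy output to RFC 3986 output.
# Any further deviation that turned up would be added as another entry.
CORRECTIONS = {"%7E": "~"}

def rfc3986_encode(text):
    encoded = legacy_encode(text)
    for wrong, right in CORRECTIONS.items():
        encoded = encoded.replace(wrong, right)
    return encoded

print(legacy_encode("a~b c+d"))    # a%7Eb%20c%2Bd
print(rfc3986_encode("a~b c+d"))   # a~b%20c%2Bd
```

A literal "%7E" in the input is safe here, because its '%' is itself escaped to "%25" before the table is applied.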
Yes, OAuth specifies the same percent encoding for normalizing request
parameters and forming the signature base string. (OAuth must specify
signing algorithms, to achieve interoperability.)
Implementors might be surprised to learn that normalized request
parameters are sometimes different from the request URL query string
or POST body, and characters are sometimes percent encoded twice in
the signature base string.
On Jul 15, 12:14 pm, Jesse Clark <je...@jesseclark.com> wrote:
> But section 9.1.3 references 5.1 so wouldn't the same percent encoding
> algorithm need to be used for both normalizing the request parameters
> and for concatenating and encoding the request elements to form the SBS?
> So, the resulting signature base strings for a query parameter key/
> value pair of: c=hi+there would come out in the SBS as follows for
> each lib:
> Ruby: c%3Dhi%2520there
> PHP: c%3Dhi%252Bthere
Sorry if this seems like a daft question, but is that a single
encoding pass or a double one? My Erlang code encodes "c=hi+there" as
"c%3Dhi%2Bthere", and then "c%253Dhi%252Bthere" after a second
encoding, neither of which agrees with what the Ruby lib gives.
> I agree the Ruby implementation is right. There's a relevant test
> case in http://oauth.pbwiki.com/TestCases . Maybe it should have
> another one, to really drive the point home.
My implementation is based on those test cases. Are they correct/
sufficient?
I think you have the encoding slightly wrong - the first pass against the parameters shouldn't be encoding the "=" character - just the key and value on either side. The second pass then treats the whole constructed query string as a single string to escape.
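A small Python sketch of the difference, assuming the value has already been form-decoded (so '+' has become a space); the helper name pct and the variable names are made up for this example:

```python
from urllib.parse import quote

def pct(text):
    """Percent encode everything except the RFC 3986 unreserved characters."""
    return quote(text, safe="~")

name, value = "c", "hi there"   # already form-decoded

# First pass: encode name and value separately; the '=' joining them is left alone.
normalized = pct(name) + "=" + pct(value)

# Second pass: the whole normalized string is encoded as one element of the SBS.
sbs_fragment = pct(normalized)

# Encoding the raw pair "c=hi+there" as one string, twice, gives something else.
whole_pair_twice = pct(pct("c=hi+there"))

print(normalized)         # c=hi%20there
print(sbs_fragment)       # c%3Dhi%2520there
print(whole_pair_twice)   # c%253Dhi%252Bthere
```

In the first approach the '=' is only escaped once (in the second pass), which is why the result starts with c%3D rather than c%253D.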
----- Original Message ----
From: Tim Fletcher <t...@tfletcher.com>
To: OAuth <oauth@googlegroups.com>
Sent: Tuesday, July 22, 2008 10:20:27 AM
Subject: [oauth] Re: percent encoding of parameters
> So, the resulting signature base strings for a query parameter key/
> value pair of: c=hi+there would come out in the SBS as follows for
> each lib:
> Ruby: c%3Dhi%2520there
> PHP: c%3Dhi%252Bthere
Sorry if this seems like a daft question, but is that a single
encoding pass or a double one? My Erlang code encodes "c=hi+there" as
"c%3Dhi%2Bthere", and then "c%253Dhi%252Bthere" after a second
encoding, neither of which agrees with what the Ruby lib gives.
> I agree the Ruby implementation is right. There's a relevant test
> case in http://oauth.pbwiki.com/TestCases . Maybe it should have
> another one, to really drive the point home.
My implementation is based on those test cases. Are they correct/
sufficient?
The URL query string c=hi+there represents a parameter named "c" with
value "hi there". (+ represents a space, as specified in HTML.) In
OAuth, the normalized parameter is c=hi%20there, and the corresponding
fragment of the signature base string is c%3Dhi%2520there.
At least conceptually, one must first decode the URL query string or
POST body before computing the signature base string. One could
optimize this by converting directly from HTML encoding to OAuth
encoding; for example converting + to %20.
On Jul 22, 2:20 am, Tim Fletcher <t...@tfletcher.com> wrote:
> > So, the resulting signature base strings for a query parameter key/
> > value pair of: c=hi+there would come out in the SBS as follows for
> > each lib:
> Sorry if this seems like a daft question, but is that a single
> encoding pass or a double one? My Erlang code encodes "c=hi+there" as
> "c%3Dhi%2Bthere", and then "c%253Dhi%252Bthere" after a second
> encoding, neither of which agrees with what the Ruby lib gives.
On the test case page, Normalize Request Parameters (section 9.1.1), there is a note that reads: "Note that '+' represents a space, in the parameters column (as in a URL query string)." And you can see in the fourth example in the table that follows the note that 'a=x+y' gets normalized as 'a=x%20y'.
So, for normalization, +'s in query parameters should be converted to spaces before the parameter names and values are normalized by percent encoding. Therefore, 'c=hi+there' would get normalized to 'c=hi%20there'. Then in the Concatenate Request Elements (section 9.1.2) step the pieces of the SBS get percent encoded again, which would make the final result 'c%3Dhi%2520there'.
The note under the Normalize Request Parameters (section 9.1.1) is the only place that I have seen the decoding of +'s to spaces before normalization addressed. Since it has caused us to end up with so many divergent implementations, should we consider mentioning in the spec that URL query strings should conform to RFC3986 prior to normalization?
The TestCases page uses + to represent space, but OAuth Core doesn't
mandate this. The test cases aren't part of the OAuth Core
specification.
I think OAuth Core should embrace the de facto standard for URL query
strings and POST request bodies. That is the MIME type
application/x-www-form-urlencoded as specified by HTML. It's
counterproductive for OAuth to specify some other encoding for HTTP
messages. RFC 3986 isn't a good choice, for this reason.
On the other hand, OAuth can and should specify a different encoding
for computing a signature base string. It has to be rigid, to make
digital signatures work.
On Jul 22, 10:57 am, Jesse Clark <je...@jesseclark.com> wrote:
> The note under the Normalize Request Parameters (section 9.1.1) is the
> only place that I have seen the decoding of +'s to spaces before
> normalization addressed. Since it has caused us to end up with so many
> divergent implementations, should we consider mentioning in the spec
> that URL query strings should conform to RFC3986 prior to normalization?
This is the default content type. Forms submitted with this content type must be encoded as follows:
1. Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
2. The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
So, on the side of the processing agent, doesn't this mean that to unescape the form data '+'s should be converted back to spaces? Which corresponds with the encoding steps for creating an SBS which we have been discussing, no?
If I understand correctly then I agree that the escaping mechanism from the HTML Recommendation should be sufficient for unescaping incoming form data and should be referred to in the appropriate section of the OAuth spec.
For implementors this would mean that we would need an unescape method for normalizing form data, which converts +'s back to spaces per the HTML Rec and then percent decodes per RFC 1738, as well as an encode method for encoding the pieces of the SBS, which percent encodes per RFC 3986 ( unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" ).
Please correct me if I am wrong in my understanding,
Thanks,
-Jesse
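A minimal Python sketch of such a pair of methods, assuming UTF-8 for the byte encoding; the names form_unescape and oauth_encode are invented for this example, and the encoder is written out by hand only to make the unreserved set explicit:

```python
import string
from urllib.parse import unquote_plus

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def form_unescape(text):
    """Undo application/x-www-form-urlencoded: '+' becomes a space, %HH is decoded."""
    return unquote_plus(text)

def oauth_encode(text):
    """Percent encode every UTF-8 byte that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in UNRESERVED else "%{:02X}".format(byte))
    return "".join(out)

print(form_unescape("hi+there%21"))        # hi there!
print(oauth_encode("hi there!"))           # hi%20there%21
print(oauth_encode(form_unescape("x+y")))  # x%20y
```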
On Jul 23, 2008, at 12:09 PM, John Kristian wrote:
> The TestCases page uses + to represent space, but OAuth Core doesn't
> mandate this. The test cases aren't part of the OAuth Core
> specification.
> I think OAuth Core should embrace the de facto standard for URL query
> strings and POST request bodies. That is the MIME type
> application/x-www-form-urlencoded as specified by HTML. It's
> counterproductive for OAuth to specify some other encoding for HTTP
> messages. RFC 3986 isn't a good choice, for this reason.
> On the other hand, OAuth can and should specify a different encoding
> for computing a signature base string. It has to be rigid, to make
> digital signatures work.