Chicago meeting minutes

Bob Kuo

Sep 25, 2013, 5:00:29 PM

to netwo...@isocpp.org

The minutes for the SG4 meeting in Chicago have been posted at https://github.com/SG4/draft/wiki/Meeting-minutes-20130924

Please let me know if there are any corrections.

Thanks,

Bob

Thiago Macieira

Sep 25, 2013, 9:33:01 PM

to netwo...@isocpp.org

Not corrections, just follow-up comments:

* Niklas - doesn't do HTTP specific comparisons, e.g.
    query string key/value pair ordering is insignificant
    separator could be ampersand or a colon

Order is significant. The pair separator and the value separator can be any
character reserved as sub-delim. The most common are, of course, '&' and '='.
I recommend just hardcoding those two for now, with an option to expand the
API in the future.
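
To make the point about ordering concrete, here is a minimal sketch of a query splitter with '&' and '=' hardcoded. The names query_pairs and split_query are hypothetical and do not come from the proposal; the sketch does no percent-decoding and no validation.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical type for illustration: an ordered list of key/value pairs.
// A vector (not a map) is used so that order and duplicate keys survive.
using query_pairs = std::vector<std::pair<std::string, std::string>>;

// Splits a raw query string (without the leading '?') into pairs.
// '&' separates pairs and '=' separates key from value; both are hardcoded.
query_pairs split_query(std::string_view query) {
    query_pairs result;
    if (query.empty()) return result;  // an empty query yields no pairs
    std::size_t pos = 0;
    while (pos <= query.size()) {
        std::size_t end = query.find('&', pos);
        if (end == std::string_view::npos) end = query.size();
        std::string_view item = query.substr(pos, end - pos);
        std::size_t eq = item.find('=');
        if (eq == std::string_view::npos) {
            // A key without '=' gets an empty value in this sketch.
            result.emplace_back(std::string(item), std::string());
        } else {
            result.emplace_back(std::string(item.substr(0, eq)),
                                std::string(item.substr(eq + 1)));
        }
        pos = end + 1;
    }
    return result;
}
```

With this representation, split_query("x=1&x=2") and split_query("x=2&x=1") compare unequal, which is the behaviour that follows from treating order as significant.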

* Kyle - how would we implement a mechanism to do scheme specific comparison?

I strongly urge dropping the feature. Scheme-specific comparison sounds like a
death-trap to me, a potential mine-field of subtly incompatible
implementations. What's more, they're not very useful without actual network
access.

* Kyle - is std::optional necessary for all parts of a URI
    can string_view represent a blank string?
    Arthur - some of these fields are mandatory
    Niklas - query and fragment could be present but blank

Technically, every field in a URL other than the scheme and the path can be
present but blank. For this purpose, the authority and the host are treated as
one and the same field.

- authority (and host) absent vs authority (and host) blank
    scheme:          scheme://
    scheme:/         scheme:///
- userinfo absent vs userinfo blank
    scheme://host    scheme://@host
- port absent vs port blank
    scheme://host    scheme://host:

Strictly speaking, each pair has two different URLs, though most browsers will
collapse one to the other. For QUrl, I have consciously decided not to make
the distinction for the latter two, and we consciously always create file:///
URLs (present-and-blank authority). We preserve the distinction for the query
and fragment and we offer an API for it too. But we also reserve the right to
change behaviour if needed.

For an ISO recommendation that may last a couple of decades, with uses we
cannot predict, we may have to opt for the strict interpretation.
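
As an illustration of what the strict interpretation could look like in an API, here is a rough sketch that uses std::optional to separate "absent" from "present but blank". The names uri_parts and split_uri are hypothetical. The splitter is deliberately simplified: it assumes an absolute URI, does no validation, and only tracks the authority, query and fragment (userinfo and port are not split out).

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical structure for illustration only. std::nullopt means the
// component is absent; an engaged but empty string means present-but-blank.
struct uri_parts {
    std::string scheme;
    std::optional<std::string> authority;
    std::string path;  // always present, possibly empty
    std::optional<std::string> query;
    std::optional<std::string> fragment;
};

// Simplified split; assumes the input starts with "scheme:".
uri_parts split_uri(std::string_view s) {
    uri_parts p;
    std::size_t colon = s.find(':');
    if (colon == std::string_view::npos) {
        p.path = std::string(s);  // no scheme found: keep everything as path
        return p;
    }
    p.scheme = std::string(s.substr(0, colon));
    s.remove_prefix(colon + 1);

    // The fragment starts at the first '#', the query at the first '?'.
    if (auto hash = s.find('#'); hash != std::string_view::npos) {
        p.fragment = std::string(s.substr(hash + 1));
        s = s.substr(0, hash);
    }
    if (auto q = s.find('?'); q != std::string_view::npos) {
        p.query = std::string(s.substr(q + 1));
        s = s.substr(0, q);
    }

    // "//" introduces an authority, which may itself be empty.
    if (s.substr(0, 2) == "//") {
        s.remove_prefix(2);
        std::size_t slash = s.find('/');
        p.authority = std::string(s.substr(0, slash));
        s = (slash == std::string_view::npos) ? std::string_view()
                                              : s.substr(slash);
    }
    p.path = std::string(s);
    return p;
}
```

Under these assumptions, "scheme:" and "scheme://" produce different results (authority absent versus engaged and empty), as do "scheme://host" and "scheme://host?" for the query.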

* Niklas - URIs are defined as ASCII
    Arthur - isn't then string the same as u8 string
    Niklas - not for IRIs - they may have unicode characters
    Arthur - SG3 filesystems might be hitting the same problems

I'd like to hear more about what was discussed here. This is a major underpinning of
how URLs work and may have great influence on the API of the class. Specifically,
we may have to require that uri always carries 8-bit data and that any
decoding to char16_t, char32_t or wchar_t may throw due to loss of data.

I wouldn't say that filesystems might be hitting the same problems. They
certainly have problems related to encoding, but I wouldn't say they're the
same problems. In fact, I'd say that URL/IRI *solves* their problem, to an
extent.

IRI is defined as a Unicode string containing percent-encoded sequences. Those
sequences can be decoded using UTF-8, but any percent-encoded sequence that
can't be decoded as UTF-8 must be left unchanged.

That means URLs with IRI can carry any arbitrary 8-bit byte as well as any
Unicode sequence.
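
A rough sketch of that decoding rule follows. The function names are hypothetical, and two simplifications are assumed: percent-encoded ASCII bytes (such as %2F) are never decoded, since that could change the meaning of the URL, and the additional RFC 3987 restrictions on which decoded characters are acceptable are not applied.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Returns 0..15 for a hex digit, -1 otherwise.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Length (2..4) of a well-formed multi-byte UTF-8 sequence starting at b[i],
// or 0 if there is none. Rejects overlong forms, surrogates and values
// above U+10FFFF by constraining the second byte.
std::size_t utf8_multibyte_length(const std::vector<unsigned char>& b,
                                  std::size_t i) {
    unsigned char c = b[i];
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (c >= 0xC2 && c <= 0xDF) {
        len = 2;
    } else if (c >= 0xE0 && c <= 0xEF) {
        len = 3;
        if (c == 0xE0) lo = 0xA0;  // no overlong 3-byte forms
        if (c == 0xED) hi = 0x9F;  // no surrogates
    } else if (c >= 0xF0 && c <= 0xF4) {
        len = 4;
        if (c == 0xF0) lo = 0x90;  // no overlong 4-byte forms
        if (c == 0xF4) hi = 0x8F;  // nothing above U+10FFFF
    } else {
        return 0;
    }
    if (i + len > b.size()) return 0;  // truncated sequence
    if (b[i + 1] < lo || b[i + 1] > hi) return 0;
    for (std::size_t k = 2; k < len; ++k)
        if ((b[i + k] & 0xC0) != 0x80) return 0;
    return len;
}

// Decodes runs of %XX that form valid multi-byte UTF-8; everything else,
// including invalid or ASCII escapes, is copied through unchanged.
std::string decode_utf8_escapes(std::string_view in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Collect a maximal run of consecutive %XX triplets starting at i.
        std::vector<unsigned char> bytes;
        std::size_t j = i;
        while (j + 2 < in.size() && in[j] == '%' &&
               hex_value(in[j + 1]) >= 0 && hex_value(in[j + 2]) >= 0) {
            bytes.push_back(static_cast<unsigned char>(
                hex_value(in[j + 1]) * 16 + hex_value(in[j + 2])));
            j += 3;
        }
        if (bytes.empty()) {
            out += in[i++];
            continue;
        }
        for (std::size_t k = 0; k < bytes.size();) {
            std::size_t len = utf8_multibyte_length(bytes, k);
            if (len == 0) {
                out.append(in.data() + i + 3 * k, 3);  // keep original triplet
                ++k;
            } else {
                for (std::size_t n = 0; n < len; ++n)
                    out += static_cast<char>(bytes[k + n]);
                k += len;
            }
        }
        i = j;
    }
    return out;
}
```

In this sketch "%C3%A9" becomes the two UTF-8 bytes for U+00E9, while "%FF" and a lone "%C3" stay exactly as written, so arbitrary 8-bit data survives the round trip.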

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

Jeffrey Yasskin

Sep 26, 2013, 12:02:45 AM

to netwo...@isocpp.org

Yep, http://url.spec.whatwg.org/#writing also allows empty-but-present
usernames, passwords, ports, etc.

> * Niklas - URIs are defined as ASCII
> Arthur - isn't then string the same as u8 string
> Niklas - not for IRIs - they may have unicode characters
> Arthur - SG3 filesystems might be hitting the same problems
>
> I'd like to hear more about what was discussed here. This is a major underpinning of
> how URLs work and may have great influence on the API of the class. Specifically,
> we may have to require that uri always carries 8-bit data and that any
> decoding to char16_t, char32_t or wchar_t may throw due to loss of data.
>
> I wouldn't say that filesystems might be hitting the same problems. They
> certainly have problems related to encoding, but I wouldn't say they're the
> same problems. In fact, I'd say that URL/IRI *solves* their problem, to an
> extent.
>
> IRI is defined as a Unicode string containing percent-encoded sequences. Those
> sequences can be decoded using UTF-8, but any percent-encoded sequence that
> can't be decoded as UTF-8 must be left unchanged.
>
> That means URLs with IRI can carry any arbitrary 8-bit byte as well as any
> Unicode sequence.

What does http://url.spec.whatwg.org/ say about utf-8-ness of URLs?

Thiago Macieira

Sep 26, 2013, 12:36:51 AM

to netwo...@isocpp.org

On Wednesday, 25 September 2013 23:02:45, Jeffrey Yasskin wrote:
> > IRI is defined as a Unicode string containing percent-encoded sequences.
> > Those sequences can be decoded using UTF-8, but any percent-encoded
> > sequence that can't be decoded as UTF-8 must be left unchanged.
> >
> > That means URLs with IRI can carry any arbitrary 8-bit byte as well as any
> > Unicode sequence.
>
> What does http://url.spec.whatwg.org/ say about utf-8-ness of URLs?

It says:

'A percent-encoded byte is "%", followed by two ASCII hex digits. Sequences of
percent-encoded bytes, after conversion to bytes, should not cause utf-8
decode to run into any errors.'

and

'To utf-8 percent encode a code point, using an encode set, run these steps:
1. If code point is not in encode set, return code point.
2. Let bytes be the result of running utf-8 encode on code point.
3. Percent encode each byte in bytes, and then return them concatenated, in
the same order.'

It also explains how to decode code points to arbitrary bytes. If you add that
to the first paragraph I pasted, it indicates that you can decode into UTF-8,
provided you don't produce failures.

Now, from what I remember in WhatWG, it would allow different encodings. RFC
3987 (IRI) is more strict: only UTF-8 is allowed.
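
The three quoted "utf-8 percent encode" steps can be sketched roughly as below. The function names and the example encode set are assumptions made for illustration. The sketch returns a UTF-8 byte string, so in step 1 the code point is returned as its UTF-8 encoding, and it assumes the input is a valid Unicode scalar value.

```cpp
#include <functional>
#include <string>

// Encodes one code point as UTF-8. Assumes cp is a valid Unicode scalar
// value (not a surrogate, not above U+10FFFF); no checking is done here.
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical signature: the encode set is passed as a predicate.
std::string utf8_percent_encode(
    char32_t cp, const std::function<bool(char32_t)>& in_encode_set) {
    // Step 2 is done up front so that step 1 can return the code point
    // in the same byte-string form.
    std::string bytes = utf8_encode(cp);
    if (!in_encode_set(cp)) return bytes;  // step 1: not in the set

    // Step 3: percent encode each byte, in order.
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        out += '%';
        out += hex[b >> 4];
        out += hex[b & 0x0F];
    }
    return out;
}

// Illustrative encode set chosen for this sketch: everything outside
// printable ASCII (below U+0020 or above U+007E).
bool outside_printable_ascii(char32_t cp) {
    return cp < 0x20 || cp > 0x7E;
}
```

For example, utf8_percent_encode(0xE9, outside_printable_ascii) yields "%C3%A9", while an ASCII letter is returned unchanged.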

Glyn Matthews

Sep 26, 2013, 6:23:19 AM

to netwo...@isocpp.org

On 25 September 2013 23:00, Bob Kuo <Bob...@riverbed.com> wrote:

> The minutes for the SG4 meeting in Chicago have been posted at https://github.com/SG4/draft/wiki/Meeting-minutes-20130924

I have a few comments on this discussion.

1. I disagree that key/value pair ordering is insignificant for the same reasons that Thiago outlined in his response. Also, would any of the following be equivalent:

http://example.com/?x=1&x=2

http://example.com/?x=2&x=1

http://example.com/?x=1

http://example.com/?x=2

?

2. According to RFC 3986, section 6, normalization is performed prior to comparison. As I have always understood it, normalization is only done in order to compare URIs. The level of normalization performed is a trade-off between efficiency and false negatives. It therefore does not make sense to always normalize URIs when constructing the uri object, because the library does not know to what extent the user wants to make this trade-off. The distinction between string comparison and syntax-based normalization remains important, even if I'm not going to propose scheme-specific normalization.

3. Niklas (I think assuming that normalization should be performed during construction) suggested adding a flag to say not to perform this step; I would say we can add a constructor argument that accepts the uri_comparison_level to make sure that normalization is only done explicitly.

4. I also disagree with Kyle's suggestion to make it a class invariant that the URI is normalized. Since normalization involves a trade-off, to what level should the URI always be normalized? Therefore I'd like to keep the normalize and compare functions as I believe they're important to allow users to choose what step of the comparison ladder they want to use.

5. The naming suggestions (make_relative, uri_normalization_level, append_query_key_value_pair (!)): are these strong recommendations? I don't mind make_relative and uri_normalization_level, but I'm unsure about the last.

6. I didn't understand Kyle's point about the order of arguments to make_reference and resolve. I take your point about the lack of clarity of the resolve member function though, and I'll update the doc.
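
To illustrate points 2 to 4, here is a minimal sketch of a comparison that takes the level explicitly. Only the name uri_comparison_level comes from the discussion; the enumerators, normalize_case and uri_equal are hypothetical. The syntax-based step is deliberately partial: it lowercases the scheme and uppercases the hex digits of percent-encodings, but does not lowercase the host or remove dot segments, and it assumes absolute URIs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Enumerator names are assumptions for this sketch.
enum class uri_comparison_level { string_comparison, syntax_based };

// Partial case normalization: lowercases the scheme and uppercases the hex
// digits in percent-encoded triplets. Assumes the first ':' ends the scheme.
std::string normalize_case(std::string_view uri) {
    std::string out(uri);
    std::size_t colon = out.find(':');
    if (colon != std::string::npos) {
        for (std::size_t i = 0; i < colon; ++i)
            out[i] = static_cast<char>(
                std::tolower(static_cast<unsigned char>(out[i])));
    }
    for (std::size_t i = 0; i + 2 < out.size(); ++i) {
        if (out[i] == '%' &&
            std::isxdigit(static_cast<unsigned char>(out[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(out[i + 2]))) {
            out[i + 1] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(out[i + 1])));
            out[i + 2] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(out[i + 2])));
            i += 2;  // skip past the triplet
        }
    }
    return out;
}

// The caller picks the rung of the comparison ladder; nothing is
// normalized unless the caller asks for it.
bool uri_equal(std::string_view a, std::string_view b,
               uri_comparison_level level) {
    if (level == uri_comparison_level::string_comparison) return a == b;
    return normalize_case(a) == normalize_case(b);
}
```

With this shape, "HTTP://example.com/%7ea" and "http://example.com/%7Ea" differ under string_comparison but match under syntax_based, while the query-order examples in point 1 stay distinct at both levels.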

Niklas Gustafsson

Sep 28, 2013, 9:51:13 AM

to netwo...@isocpp.org

Here, I’m only responding to a couple of points below:

Without getting into order being significant or not for a particular scheme, what we asked for was an extension point that would allow scheme-specific comparisons.
It seemed from the text that URIs constructed from a uri_builder would always be normalized, while URIs constructed directly from a string would not. We felt that whether the string in the URI is normalized or not should be invariant across all methods of construction, and the text must be clear on this.

Niklas

Sent from Windows Mail
