Not corrections, just follow-up comments:
* Niklas - doesn't do HTTP specific comparisons, e.g.
query string key/value pair ordering is insignificant
separator could be ampersand or a colon
Order is significant. The pair separator and the value separator can be any
character reserved as sub-delim. The most common are, of course, '&' and '='.
I recommend just hardcoding those two for now, with an option to expand the
API in the future.
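
To make the suggestion concrete, here is a minimal sketch of splitting a query
into pairs with the two separators hardcoded. The name split_query and its
return type are hypothetical and not part of any proposal; the point is only
that the pairs come back in their original order, since order is significant.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (an assumption for illustration only): splits a query
// string into key/value pairs using the hardcoded separators '&' and '='.
// Pairs are returned in the order they appear; nothing is sorted or merged.
std::vector<std::pair<std::string_view, std::string_view>>
split_query(std::string_view query)
{
    std::vector<std::pair<std::string_view, std::string_view>> pairs;
    if (query.empty())
        return pairs;

    constexpr std::size_t npos = std::string_view::npos;
    std::size_t pos = 0;
    while (true) {
        // Find the end of the current pair.
        const std::size_t end = query.find('&', pos);
        const std::string_view item =
            query.substr(pos, end == npos ? npos : end - pos);

        // Split the pair at the first '='; a pair without '=' gets an
        // empty value.
        const std::size_t eq = item.find('=');
        if (eq == npos)
            pairs.emplace_back(item, std::string_view{});
        else
            pairs.emplace_back(item.substr(0, eq), item.substr(eq + 1));

        if (end == npos)
            break;
        pos = end + 1;
    }
    return pairs;
}
```

Other sub-delims could later be accepted through extra parameters without
changing this default behaviour.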
* Kyle - how would we implement a mechanism to do scheme specific comparison?
I strongly urge dropping the feature. Scheme-specific comparison sounds like a
death-trap to me, a potential mine-field of subtly incompatible
implementations. What's more, they're not very useful without actual network
access.
* Kyle - is std::optional necessary for all parts of a URI
can string_view represent a blank string?
Arthur - some of these fields are mandatory
Niklas - query and fragment could be present but blank
Technically, every field in a URL other than the scheme and the path can be
either absent or present but blank. For this purpose, the authority and the
host are indistinguishable.
- authority (and host) absent vs authority (and host) blank
    scheme:         vs  scheme://
    scheme:/        vs  scheme:///
- userinfo absent vs userinfo blank
    scheme://host   vs  scheme://@host
- port absent vs port blank
    scheme://host   vs  scheme://host:
Strictly speaking, the two URLs in each pair are different, though most browsers will
collapse one to the other. For QUrl, I have consciously decided not to make
the distinction for the latter two, and we consciously always create file:///
URLs (present-and-blank authority). We preserve the distinction for the query
and fragment and we offer an API for it too. But we also reserve the right to
change behaviour if needed.
For an ISO recommendation that may last a couple of decades, with uses we
cannot predict, we may have to opt for the strict interpretation.
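
As an illustration of the strict interpretation, here is a sketch of how
std::optional<std::string_view> keeps "absent" and "present but blank" apart
for the authority. The function authority_of is hypothetical, does no
validation, and assumes its input starts with a scheme followed by ':'.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (an assumption for illustration only): returns the
// authority of `uri`.
//   std::nullopt        -> authority absent  (e.g. "scheme:" or "scheme:/")
//   empty string_view   -> authority blank   (e.g. "scheme://" or "scheme:///")
std::optional<std::string_view> authority_of(std::string_view uri)
{
    const std::size_t colon = uri.find(':');
    if (colon == std::string_view::npos)
        return std::nullopt;            // no scheme separator at all

    std::string_view rest = uri.substr(colon + 1);
    if (rest.substr(0, 2) != "//")
        return std::nullopt;            // no "//", so no authority

    rest.remove_prefix(2);
    // The authority runs up to the start of the path, query or fragment.
    return rest.substr(0, rest.find_first_of("/?#"));
}
```

A bare string_view can represent a blank string, but an empty string_view
compares equal to a default-constructed one, so it cannot express the
absent case by itself through ordinary comparison.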
* Niklas - URIs are defined as ASCII
Arthur - isn't then string the same as u8 string
Niklas - not for IRIs - they may have unicode characters
Arthur - SG3 filesystems might be hitting the same problems
I'd like to hear more about what was discussed here. This is a major
underpinning of how URLs work and may have great influence on the API of the
class. Specifically, we may have to require that uri always carries 8-bit data
and that any decoding to char16_t, char32_t or wchar_t may throw due to loss
of data.
I wouldn't say that filesystems might be hitting the same problems. They
certainly have problems related to encoding, but I wouldn't say they're the
same problems. In fact, I'd say that URL/IRI *solves* their problem, to an
extent.
IRI is defined as a Unicode string containing percent-encoded sequences. Those
sequences can be decoded using UTF-8, but any percent-encoded sequence that
can't be decoded as UTF-8 must be left unchanged.
That means URLs with IRI can carry any arbitrary 8-bit byte as well as any
Unicode sequence.
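
The decoding rule can be sketched as follows. This is a simplified
illustration, not a conforming IRI implementation: the function names are
hypothetical, and as a simplifying assumption only multi-byte (non-ASCII)
UTF-8 sequences are decoded, so escapes such as %20 or %2F stay as they are.
Any escape that is not part of well-formed UTF-8 is copied through unchanged.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Value of a hexadecimal digit, or -1 if `c` is not one.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Length (2 to 4) of the well-formed multi-byte UTF-8 sequence starting at
// b[i], or 0 if there is none. Overlong forms, surrogates and values above
// U+10FFFF are rejected.
std::size_t utf8_length(const std::vector<unsigned char>& b, std::size_t i)
{
    const unsigned char lead = b[i];
    std::size_t n;
    unsigned long cp;
    if (lead >= 0xC2 && lead <= 0xDF)      { n = 2; cp = lead & 0x1F; }
    else if (lead >= 0xE0 && lead <= 0xEF) { n = 3; cp = lead & 0x0F; }
    else if (lead >= 0xF0 && lead <= 0xF4) { n = 4; cp = lead & 0x07; }
    else return 0;
    if (i + n > b.size()) return 0;
    for (std::size_t k = 1; k < n; ++k) {
        if ((b[i + k] & 0xC0) != 0x80) return 0;   // not a continuation byte
        cp = (cp << 6) | (b[i + k] & 0x3F);
    }
    if (n == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) return 0;
    if (n == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
    return n;
}

// Hypothetical helper: decodes percent-encoded runs that form well-formed
// multi-byte UTF-8 and leaves every other escape exactly as written.
std::string decode_utf8_escapes(std::string_view in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Collect the run of consecutive "%XX" escapes starting at i.
        std::vector<unsigned char> run;
        std::size_t j = i;
        while (j + 2 < in.size() && in[j] == '%'
               && hex_value(in[j + 1]) >= 0 && hex_value(in[j + 2]) >= 0) {
            run.push_back(static_cast<unsigned char>(
                hex_value(in[j + 1]) * 16 + hex_value(in[j + 2])));
            j += 3;
        }
        if (run.empty()) {              // ordinary character, copy it
            out += in[i++];
            continue;
        }
        for (std::size_t k = 0; k < run.size(); ) {
            if (const std::size_t n = utf8_length(run, k)) {
                for (std::size_t m = 0; m < n; ++m)
                    out += static_cast<char>(run[k + m]);
                k += n;
            } else {
                out.append(in.substr(i + 3 * k, 3));   // keep "%XX" as is
                ++k;
            }
        }
        i = j;
    }
    return out;
}
```

The result is still 8-bit data: the decoded parts are UTF-8, and the escapes
left behind can stand for arbitrary bytes.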
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358