W3C or other reference saying that canonical URIs should be lowercase-only


Mike Jones

Feb 17, 2009, 7:26:34 PM
to Drummond Reed, John Bradley, icf-wg-...@googlegroups.com

Do any of you have a reference to a doc that states the rule that URIs such as those we use for schema URIs should use lower-case only?

 

                                                                Thanks,

                                                                -- Mike

 

John Bradley

Feb 17, 2009, 7:55:00 PM
to icf-wg-...@googlegroups.com
Mike,

Relative to rel types in Link headers, the topic of browsers comparing short names in a case-insensitive manner has come up recently.

In the httpRange-14 finding, the examples they give show mixed case.

Ironically, the link to the current version of the doc seems to be broken :)

I refer to that because the claim URIs are conceptually "non-information resources" in TAG-speak.
That is why we want the URI to return 303.

I don't think there has ever been a strong recommendation against mixed case by the W3C.

I would say, however, that as URI paths are supposed to be case sensitive, it may well be advisable to stick with a convention like all lower case to reduce confusion.

=jbradley




Mike Jones

Feb 17, 2009, 8:00:46 PM
to icf-wg-...@googlegroups.com

OK, can you then point me to a reference that says that URI comparison for our kinds of purposes is supposed to be case sensitive?

 

                                                                Thanks,

                                                                -- Mike

John Bradley

Feb 17, 2009, 9:23:49 PM
to icf-wg-...@googlegroups.com
This can start wars, and has.

I will do my best to find references.

The HTTP spec (RFC 2616), section 3.2.3, states that when comparing two URIs to decide whether they match, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with the exception that scheme and host names are compared case-insensitively.

HTML4 section 6.4 states that URIs in general are case sensitive.
http://www.w3.org/TR/html4/types.html#h-6.4

The http scheme is clear that http://example.com/foo and http://example.com/FOO are two different URIs; an agent, say a web server, can choose to serve the same document for both, but they are different URIs.

In XML, namespaces are treated as case-sensitive strings. So http://www.example.org/wine and http://www.Example.org./wine are different namespaces even though they are the same http URI. Fun stuff :)
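
The difference between the two comparison styles above can be sketched roughly as follows. This is only an illustration, assuming Python; the function names are made up for this example, and the HTTP comparison covers just the scheme, host, default port 80, and empty-path cases, not percent-encoding equivalence.

```python
from urllib.parse import urlsplit


def namespace_equal(a: str, b: str) -> bool:
    """XML-namespace-style comparison: a plain, case-sensitive string match."""
    return a == b


def http_uri_equal(a: str, b: str) -> bool:
    """Rough sketch of an HTTP-style comparison: scheme and host are
    case-insensitive, a missing port means 80, an empty path means "/",
    and everything else is compared exactly (case-sensitively)."""
    pa, pb = urlsplit(a), urlsplit(b)

    def key(p):
        port = p.port if p.port is not None else 80  # assumes the http scheme
        # urlsplit already lowercases .scheme and .hostname
        return (p.scheme, p.hostname, port, p.path or "/", p.query)

    return key(pa) == key(pb)


a = "http://www.example.org/wine"
b = "http://www.Example.org/wine"
print(namespace_equal(a, b))  # different namespace names
print(http_uri_equal(a, b))   # same http URI
print(http_uri_equal("http://example.com/foo", "http://example.com/FOO"))
```

The path is never case-folded in either function, which is why /foo and /FOO stay distinct under both.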

SAML core 1.2.4 implies that assertions are case sensitive, though I want to ask a couple of people about that.

OpenID AX implies that it is case sensitive, but I don't know what OPs are actually implementing.

From a making-the-world-work point of view, I vote for having the selector and IdP compare them all case-insensitively. Or, as I suspect you were thinking, normalizing them to all lower case.

The recent messages on the TAG list relate to rel types in Link headers.

Mark's Link header spec is a dependency for the XRD spec; that is why I was following the recent discussion.

Given that we have not precluded IRIs, the notion of anything other than an exact string match is a challenge.

If we exclude IRIs and escaped characters, then case normalization is possible.
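
A minimal sketch of that restricted normalization, assuming Python. The function name and the example claim URI are hypothetical, and rejecting out-of-scope input (rather than passing it through unchanged) is just one possible policy.

```python
def normalize_claim_uri(uri: str) -> str:
    """Lowercase a claim URI, but only when it is plain ASCII with no
    percent-escapes; anything else is refused rather than guessed at."""
    if not uri.isascii():
        raise ValueError("non-ASCII (IRI) input is out of scope for this sketch")
    if "%" in uri:
        raise ValueError("percent-escaped input is out of scope for this sketch")
    return uri.lower()


# Hypothetical claim URI, for illustration only.
print(normalize_claim_uri("http://Example.org/Claims/GivenName"))
```

With both exclusions in place, lowercasing is a simple ASCII operation and two parties applying it independently will get the same string.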

We need selectors to behave predictably when someone in China defines a claim in UTF-8.

I am probably OK with the selector not processing it if that is what we want.

=jbradley


Mike Jones

Feb 17, 2009, 9:45:59 PM
to icf-wg-...@googlegroups.com

Thanks for the references.  I also have a lingering memory somewhere that Drummond knew of another best practice document that also recommended not using any characters but lowercase letters, digits, dash, and I think maybe @ and a few other characters in schema URIs.  Do any of you know the reference that I’m talking about?

 

                                                                Thanks again,

John Bradley

Feb 17, 2009, 9:49:33 PM
to icf-wg-...@googlegroups.com
The XML namespace document recommends against using anything that might get % encoded but it doesn't say anything about case.

Drummond Reed

Feb 23, 2009, 3:08:29 PM
to Mike Jones, John Bradley, icf-wg-...@googlegroups.com

Sorry, Mike, I was behind on email from this list. The “doc” that specifies the lowercase restriction is the ABNF published on the wiki:

 

            https://informationcard.net/wiki/index.php/Claim_URI_Syntax

 

If we need to publish this in a more developer-friendly format, e.g., as a “Claims Best Practices Guide”, we can figure out the best way to do that – the page above was meant to be primarily for our internal use on this WG.

 

Regarding other references on this topic of optimizing URIs for comparison, one I know of is the XDI.org Global Services Specification document, where section 4.2.1 explains the rationale XDI.org used to restrict the allowed syntax for reassignable XRI (i-name) registration. The document is available at http://gss.xdi.org/moin.cgi/FrontPage?action=AttachFile&do=get&target=gss-v1.0.pdf, but to save time, I’m copying section 4.2.1 below.

This isn’t exactly apples-to-apples because: a) XDI.org’s scope is XRIs (although XRIs transform to URIs, so the syntax restrictions end up being identical), and b) XDI.org had to take into account human transcription issues that may be less relevant to URIs-as-claim-identifiers (although some would argue that preventing human transcription errors is still important).

 

Hope this helps,

 

=Drummond

 

4.2.1 Syntax and Normalization

An I-Name object MUST contain a non-null XRI that conforms to the requirements of reassignable XRI syntax in XRI Normal Form as specified by XRI Syntax 2.0 [XRISyntax]. This includes the following requirements.

An I-Name MUST be NFKC normalized.

An I-Name MUST be UTF-8 encoded.

For NFKC normalization, I-Brokers SHOULD apply the following formula where the I-Name string = X:

X:R(X) = NFKC(toCasefold(NFD(X)))

Although the GRS will perform internal normalization according to this same formula, I-Brokers SHOULD apply these normalization and encoding rules prior to making any XRI EPP request to the GRS. Failure to do so could result in registration of a different i-name string in the GRS than the I-Broker or Registrant expects.
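
As a rough illustration of the formula NFKC(toCasefold(NFD(X))), here is a sketch in Python. The function name is hypothetical, and it assumes that Python's str.casefold() is an acceptable stand-in for Unicode toCasefold; the result depends on the Unicode version shipped with the interpreter.

```python
import unicodedata


def normalize_i_name(x: str) -> str:
    """Apply NFD, then case folding, then NFKC, in that order."""
    decomposed = unicodedata.normalize("NFD", x)
    folded = decomposed.casefold()
    return unicodedata.normalize("NFKC", folded)


print(normalize_i_name("Straße"))                  # -> strasse
print(normalize_i_name("Example.Name"))            # -> example.name
print(normalize_i_name("Straße").encode("utf-8"))  # UTF-8 bytes for storage
```

Case folding, unlike simple lowercasing, also maps characters such as "ß" to a caseless form, which is why the first example changes length.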

 

Besides these standard XRI normalization requirements, to establish a high degree of interoperability within the XRI resolution community rooted on the XDI.org GRS, the following additional syntax and normalization rules apply both to Global I-Names and Delegated I-Names (Community I-Names delegated by Registrants).

1. A Global I-Name MUST NOT contain a percent character (“%”), and therefore may not contain any XRI reserved or excluded characters that require percent encoding, as this prevents the use of characters whose native display forms would be confusing or misleading to human users (including whitespace).

2. A Global I-Name MUST NOT contain an underscore (“_”) or tilde (“~”). The reasons for this are:

a) The W3C recommends against using underscores in URIs because they are easily confused with spaces when underlined (such as in a hyperlink).

b) The visual representation of tilde is too easily confused with hyphen (“-”), which is currently allowed in DNS name syntax and is thus more desirable as a logical separation character than tilde.

3. A Global I-Name or a Delegated I-Name MUST NOT begin or end in an XRI reserved character, a dot (“.”), or a hyphen (“-”).

4. A Global I-Name or a Delegated I-Name MUST NOT contain two or more consecutive dots (“.”), or hyphens (“-”).

5. A Global I-Name MUST NOT contain a Cross-Reference. (This policy may be relaxed in future versions of the GSS.)

6. A Delegated I-Name MAY contain a Cross-Reference that conforms to these same syntax and normalization policies.

7. A Global I-Name, or a Delegated I-Name other than a Cross-Reference, MUST NOT exceed 254 bytes.

Note that whitespace of any kind is not allowed in an XRI in unescaped form. I-Brokers SHOULD recommend that Registrants or Delegates use a dot (“.”) as the primary replacement for whitespace in an I-Name. If a dot is not desired, I-Brokers SHOULD recommend the use of a hyphen (“-”) as an alternative logical separator.
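
Several of the rules above are mechanical enough to check in code. The following Python sketch covers only rules 1, 2, 3 (dot and hyphen only), 4, and 7, plus the whitespace note. It does not model XRI reserved characters or Cross-References, reads rule 4 as ".." or "--", and uses a hypothetical function name throughout.

```python
from typing import List


def global_i_name_problems(name: str) -> List[str]:
    """Return a list of rule violations for a candidate Global I-Name.
    An empty list means none of the checked rules were violated."""
    problems = []
    if "%" in name:
        problems.append("rule 1: contains a percent character")
    if "_" in name or "~" in name:
        problems.append("rule 2: contains an underscore or tilde")
    if name[:1] in (".", "-") or name[-1:] in (".", "-"):
        problems.append("rule 3: begins or ends with a dot or hyphen")
    if ".." in name or "--" in name:
        problems.append("rule 4: contains consecutive dots or hyphens")
    if len(name.encode("utf-8")) > 254:
        problems.append("rule 7: longer than 254 bytes in UTF-8")
    if any(ch.isspace() for ch in name):
        problems.append("note: contains unescaped whitespace")
    return problems


print(global_i_name_problems("example.name"))
print(global_i_name_problems("-bad_name..x"))
```

Returning every violation, rather than stopping at the first, makes it easier to tell a registrant what to change in one pass.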

 

 

 

 

