I've just filed a bug against the libxml2 Debian package [1],
though, having glanced at the code (uri.c as of 2ee91eb6),
the underlying issue doesn't seem to be specific to Debian.
The problem is that xmlSaveUri () for some reason insists on
percent-encoding every colon in the path (except the first one,
should it look like the DOS "drive" part of a filename), and on
adding a superfluous "//" after the scheme. For example (URN
taken from [2]):
URI: urn:example:animal:ferret:nose
becomes: urn://example%3Aanimal%3Aferret%3Anose
While I can see that adding "//" for the file: scheme makes
sense (given also that the path part of the URI begins with a
"/"), is there a good reason for the colons to be escaped?
[1] http://bugs.debian.org/652866
[2] http://tools.ietf.org/html/rfc3986#section-3
Contributing to the effect is that the xmlParseURI () function
decodes the percent-encoded octets by default, thus making the
colons used as native URI separators indistinguishable from the
ones used as part of the path components.
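
The loss of information described above can be illustrated without
libxml2 at all. What follows is only a minimal Python sketch using the
standard urllib.parse module; the helper name naive_roundtrip is
hypothetical, and the code is an assumption about the general
"decode on parse, escape on save" pattern, not a rendition of what
uri.c actually does. It shows that once the percent-encoded octets are
decoded, a literal ":" and a former "%3a" can no longer be told apart,
so a serializer that wants to be safe ends up escaping all of them.

```python
from urllib.parse import quote, unquote


def naive_roundtrip(path: str) -> str:
    """Decode percent-escapes, then re-encode treating ':' as unsafe.

    Hypothetical helper: it only models the decode-then-escape pattern,
    it is not libxml2's algorithm.
    """
    decoded = unquote(path)          # "%3a" and ":" both end up as ":"
    return quote(decoded, safe="/")  # so every ":" is escaped as "%3A"


for path in ("example:animal:ferret:nose", "example:%3a"):
    print(path, "->", unquote(path), "->", naive_roundtrip(path))
```

The first path comes back as example%3Aanimal%3Aferret%3Anose, which
matches the path part of the xmlSaveUri () result quoted earlier; the
second one decodes to example::, at which point the difference between
the separator and the escaped colon is already gone.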
Curiously enough, Perl's URI module handles these cases
correctly (see below), though it apparently doesn't try to
decode the percent-encoded octets.
I wonder whether other URI parsing and serialization libraries
are affected by similar flaws?
TIA.
$ perl -e 'use strict;
    use warnings;
    use URI;
    my @x = qw (urn:example:animal:ferret:nose
                urn:example:%3a
                file:/c:/d%3a:e);
    print map {
        my $u = URI->new ($_);
        ("<", $u->canonical (), ">;\tpath: ", $u->path (), "\n");
    } (@x);'
<urn:example:animal:ferret:nose>; path: example:animal:ferret:nose
<urn:example:%3A>; path: example:%3a
<file:///c:/d%3A:e>; path: /c:/d%3a:e
$
--
FSF associate member #7257