should ~ (tilde) be escaped as %7E?

Bill Bereza

Oct 1, 1997, 3:00:00 AM

I don't think I've seen this question in a faq, but I have seen some
people here mention that tildes (~) should be escaped as %7E in URLs.

Is this the kind of thing that should always be done, or is it just a
recommendation? What would be the reasons for or against this? Is this
covered in any RFCs or FAQs?

--
Bill Bereza ber...@pobox.com http://www.pobox.com/~bereza/

Beware of all enterprises that require new clothes.

Warren Steel

Oct 1, 1997, 3:00:00 AM

Bill Bereza wrote:
> I don't think I've seen this question in a faq, but I have seen some
> people here mention that tildes (~) should be escaped as %7E in URLs.


The tilde is an "unsafe" character according to the
URL specs: http://www.w3.org/Addressing/rfc1738.txt

" Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL. "


> Is this the kind of thing that should always be done, or is it just a
> recommendation? What would be the reasons for or against this? Is this
> covered in any RFCs or FAQs?


Although the "gateways and other transport agents" that
will alter the tilde character are now so few as to be
negligible, and an overwhelming number of those who have
such addresses (including myself) give them out in "raw"
unescaped form, I am forced to reconsider this peccadillo
for other reasons. In my site logs I have noticed an
increase in errors due to the mistyping of the tilde:
/-mudws /_mudws /=mudws etc. I am informed that
some national keyboards, especially in Scandinavia, do
not even contain the tilde character, and must produce
it by more complex means. More to the point, the tilde
continues to prove troublesome to the non-computer savvy--
it is absent from many typewriters in the USA, and from
many newspaper fonts. When my URL is quoted in a
newspaper article, it is frequently printed incorrectly,
or entered incorrectly by readers who attempt to follow
it. This gives me pause: should I
give out my URL as http://www.mcsr.olemiss.edu/%7Emudws/ ?

I cannot answer this question. The combination
/%7Emudws also proves troublesome to many--the % is
often misread as a & or other symbol, and the introduction
of mixed cases to the case-sensitive path segment adds
another danger, and /%7EMUDWS is clearly wrong ( /%7emudws
is theoretically correct). The one time I gave the "escaped"
URL to a newspaper, it was garbled as badly as the tilde
version.
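
The case point above can be checked with a minimal sketch, assuming Python's standard urllib.parse module (not something used in the thread). The hex digits of an escape are matched case-insensitively, while the rest of the path keeps whatever case it was given:

```python
from urllib.parse import unquote

def decoded_path(path):
    """Decode %XX escapes; the hex digits may be upper or lower case."""
    return unquote(path)

# %7E and %7e both decode to the tilde.
print(decoded_path("/%7Emudws/"))
print(decoded_path("/%7emudws/"))

# Upper-casing the user name changes the path itself, which matters
# on a server with case-sensitive paths.
print(decoded_path("/%7EMUDWS/"))
```

The first two calls give the same path; the third does not.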

--
Warren Steel mu...@olemiss.edu
Department of Music University of Mississippi
http://www.mcsr.olemiss.edu/~mudws/

Abigail

Oct 1, 1997, 3:00:00 AM

Warren Steel (mu...@olemiss.edu) wrote on 1492 September 1993 in
<URL: news:343292...@olemiss.edu>:
++ Bill Bereza wrote:
++ > I don't think I've seen this question in a faq, but I have seen some
++ > people here mention that tildes (~) should be escaped as %7E in URLs.
++
++
++ The tilde is an "unsafe" character according to the
++ URL specs: http://www.w3.org/Addressing/rfc1738.txt
++
++ " Characters can be unsafe for a number of reasons. The space
++ character is unsafe because significant spaces may disappear and
++ insignificant spaces may be introduced when URLs are transcribed or
++ typeset or subjected to the treatment of word-processing programs.
++ The characters "<" and ">" are unsafe because they are used as the
++ delimiters around URLs in free text; the quote mark (""") is used to
++ delimit URLs in some systems. The character "#" is unsafe and should
++ always be encoded because it is used in World Wide Web and in other
++ systems to delimit a URL from a fragment/anchor identifier that might
++ follow it. The character "%" is unsafe because it is used for
++ encodings of other characters. Other characters are unsafe because
++ gateways and other transport agents are known to sometimes modify
++ such characters. These characters are "{", "}", "|", "\", "^", "~",
++ "[", "]", and "`".
++
++ All unsafe characters must always be encoded within a URL. "
++
++
++ > Is this the kind of thing that should always be done, or is it just a
++ > recommendation? What would be the reasons for or against this? Is this
++ > covered in any RFCs or FAQs?
++
++
++ Although the "gateways and other transport agents" that
++ will alter the tilde character are now so few as to be
++ negligible, and an overwhelming number of those who have
++ such addresses (including myself) give them out in "raw"
++ unescaped form, I am forced to reconsider this pecadillo
++ for other reasons. In my site logs I have noticed an
++ increase in errors due to the mistyping of the tilde:
++ /-mudws /_mudws /=mudws etc. I am informed that
++ some national keyboards, especially in Scandinavia, do
++ not even contain the tilde character, and must produce
++ it by more complex means. More to the point, the tilde
++ continues to prove troublesome to the non-computer savvy--
++ it is absent from many typewriters in the USA, and from
++ many newspaper fonts. When my URL is quoted in a
++ newspaper article, it is frequently printed incorrectly,
++ or entered incorrectly by readers who attempt to follow
++ it. This gives me pause, and I must consider should I
++ give out my URL as http://www.mcsr.olemiss.edu/%7Emudws/ ?
++
++ I cannot answer this question. The combination
++ /%7Emudws also proves troublesome to many--the % is
++ often misread as a & or other symbol, and the introduction
++ of mixed cases to the case-sensitive path segment adds
++ another danger, and /%7EMUDWS is clearly wrong ( /%7emudws
++ is theoretically correct). The one time I gave the "escaped"
++ URL to a newspaper, it was garbled as badly as the tilde
++ version.


You might also want to consider software that parses text and extracts
anything that looks like a URL. I once wrote such a program,
and it implements RFC 1738 to the spec.

Hence, it would grab "http://www.mcsr.olemiss.edu/" as a URL when it
sees "http://www.mcsr.olemiss.edu/~mudws/"; after all, a ~ cannot be
part of a URL, and what's in front of it is a valid URL. Why rely on
the assumption no software is implemented according to the specifications?
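
A minimal sketch of such an extractor follows. It is not the program mentioned above; the names ALLOWED, URL_RE and extract_urls are made up for illustration, and the pattern is a simplification of the RFC 1738 grammar (alphanumerics, the specials $-_.+!*'(), the reserved ;/?:@=& and %XX escapes), not a full implementation of it.

```python
import re

# Simplified set of characters RFC 1738 permits unencoded in an http URL,
# plus %XX escapes. The tilde is deliberately absent.
ALLOWED = r"(?:[A-Za-z0-9$\-_.+!*'(),;/?:@=&]|%[0-9A-Fa-f]{2})"
URL_RE = re.compile(r"http://" + ALLOWED + r"+")

def extract_urls(text):
    """Return every substring that looks like an http URL under the pattern."""
    return URL_RE.findall(text)

raw = "See http://www.mcsr.olemiss.edu/~mudws/ for details"
escaped = "See http://www.mcsr.olemiss.edu/%7Emudws/ for details"

print(extract_urls(raw))      # the match stops in front of the tilde
print(extract_urls(escaped))  # the escaped form is matched whole
```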

Of course, it would be much, much simpler to instruct one's server
to map http://www.server.com/name/ to the WWW directory that user 'name'
has. No need for tildes or %7E's then.


Abigail
--
Anyone who slaps a "this page is best viewed with Browser X" label
on a Web page appears to be yearning for the bad old days, before the
Web, when you had very little chance of reading a document written on
another computer, another word processor, or another network.
[Tim Berners-Lee in Technology Review, July 1996]

Toby Speight

Oct 1, 1997, 3:00:00 AM

-----BEGIN PGP SIGNED MESSAGE-----

Bill> Bill Bereza <URL:mailto:ber...@pobox.com>

> In <URL:news:34327C4C...@pobox.com>, Bill wrote:

Bill> I don't think I've seen this question in a faq, but I have seen
Bill> some people here mention that tildes (~) should be escaped as
Bill> %7E in URLs.

Right. As specified in RFC 1738.


Bill> Is this the kind of thing that should always be done, or is it
Bill> just a recommendation?

You'll probably get by using the raw form most of the time (though I
don't recommend it), particularly as the value of an attribute in HTML.

"~" is one of the so-called "national characters" in ASCII, and AFAIK
it isn't present in all variants. This means that it may get mangled
in email, for example (though, thankfully, much of Europe is using
ISO-8859.x these days, so there is less opportunity for mangling than
there was a few years ago). People can find difficulties with it in
print, too - either by mistranscribing the character, or through it
being unavailable on the user's keyboard.

Some other characters cause problems; I was recently mailed a URL which
contained unescaped "(" and ")" instead of %28 and %29; my mailreader
did not correctly identify the URL in the message. I've had similar
problems with unescaped colons (%3A), too. RFC 1738 also recommends
writing URLs in "<URL:...>" brackets to aid machine recognition.
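
As a rough sketch of both habits (escaping the troublesome characters and wrapping the result in "<URL:...>"), assuming Python's urllib.parse.quote: the helper names escape_path and bracketed are hypothetical, and the input is assumed to be a raw path that contains no escapes yet.

```python
from urllib.parse import quote

def escape_path(path):
    """Escape a raw (not yet escaped) URL path, tilde included."""
    # quote() escapes "(" and ")", but recent Python versions treat "~"
    # as a character that never needs quoting, so it is replaced by hand.
    return quote(path, safe="/").replace("~", "%7E")

def bracketed(url):
    """Wrap a URL in the <URL:...> delimiters."""
    return "<URL:" + url + ">"

print(bracketed("http://www.pobox.com" + escape_path("/~bereza/")))
print(escape_path("/docs/a(b)/"))
```

Feeding an already escaped path through escape_path would escape the "%" a second time, so it should be applied exactly once.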


Bill> Bill Bereza ber...@pobox.com http://www.pobox.com/~bereza/

That would be <URL:http://www.pobox.com/%7Ebereza/>, then.

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv
Comment: Processed by Mailcrypt 3.4, an Emacs/PGP interface

iQB1AwUBNDKxH+dsuUurvcRtAQGcgQMAtjsdWMTM65DtlGuwQyTk3FC9rCHGuinq
OvXsepBYDMKCmse42Sb6r9EA4n1rasHH1+qfmA2diRIOaNxpQdSziKLtr/jVXkg/
GRoj7mdy+OHwxPN2bKCGtpf1tRSCLGRM
=AnIt
-----END PGP SIGNATURE-----

--
"You can ... sell your soul for complete control -
is that really what you need?" (Pink Floyd)

Jukka Korpela

Oct 2, 1997, 3:00:00 AM

Bill Bereza <ber...@pobox.com> writes:

> I don't think I've seen this question in a faq, but I have seen some
> people here mention that tildes (~) should be escaped as %7E in URLs.

Yep. It's actually not a very frequently asked question. But there is a
FIR (Frequently Ignored Requirement) involved.

> Is this the kind of thing that should always be done, or is it just a
> recommendation? What would be the reasons for or against this? Is this
> covered in any RFCs or FAQs?

It is an explicit requirement in RFC 1738, which is still the official
RFC for URLs, despite some _proposals_ to make a new one (including,
according to a draft, a relaxation of encoding requirements.) Even if
the URL syntax is changed, the encoded notation is certainly going
to remain as a legal _alternative_.

(See http://www.hut.fi/home/jkorpela/HTML3.2/3.5.html for a very short
summary of _some_ requirements of that RFC.)

Reasons? Well, the RFC mentions a few of them. Practically, on the Web
using plain ~ in URLs seldom causes problems. But on the other hand,
tilde often gets distorted by various programs like gateways or
text processing programs - or by people*)! One reason for that is that tilde
is among the dozen or so ASCII characters which are often replaced by national
letters in national variants of ASCII. I have seen e.g. a tilde changed to the
German letter ü (u umlaut) when passing thru two gateways (even without
visiting Germany:-).
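
The effect can be imitated with a toy translation table. This is only an illustration of the incident described above: the table NATIONAL_VARIANT and the function through_gateway are hypothetical, and which letter occupies the tilde's code position depends on the national variant in use.

```python
# Hypothetical gateway table: the code position of "~" (0x7E) is shown
# as "ü", as in the incident above. Real national variants differ.
NATIONAL_VARIANT = {0x7E: "ü"}

def through_gateway(text):
    """Pass text through a gateway that reinterprets position 0x7E."""
    return text.translate(NATIONAL_VARIANT)

print(through_gateway("http://www.hut.fi/~jkorpela/"))    # tilde is lost
print(through_gateway("http://www.hut.fi/%7Ejkorpela/"))  # survives intact
```

The escaped form survives because "%", "7" and "E" sit at code positions that such a table leaves alone.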

*) When did you last see a _correctly_ cited URL in your local
newspaper? It's almost hopeless when journalists write them by hand.
In my experience, they get ~ wrong more than half of the time.

One can also argue that by far the most common use of ~ in URLs like
http://www.hut.fi/~jkorpela/
is a strange Unixism. In such contexts, ~user does not even have the
same meaning as in some Unix shells (home directory of user), since
it's really a _subdirectory_ of user's home directory that the URL
refers to. People have really got confused with this. Naturally,
people with no Unix background may have difficulties in realizing what
the funny symbol ~ stands for.

To conclude, I _strongly recommend_
- escaping tildes in URLs
- asking your webmaster to support referring to personal Web pages
with notations which do not require the tilde character in any form
(naturally as an _alternative_ to the tilde form if it is already
in use at the host).

Yucca, http://www.hut.fi/u/jkorpela/


Christopher Davis

Oct 2, 1997, 3:00:00 AM

A> == Abigail <abi...@fnx.com>

A> Of course, it would be much, much simpler to instruct once server
A> to map http://www.server.com/name/ to the WWW directory user 'name'
A> has. No need for tildes or %7E's then.

For a little more hierarchy, one can use /users/<username>, or
/homepages/<username>, or whatever one prefers. Obviously how to do
this is server-specific and best discussed on c.i.w.servers.* though.

--
Christopher Davis <c...@kei.com> <URL: http://www.kei.com/homepages/ckd/ >
Geographic locations in DNS! <URL: http://www.kei.com/homepages/ckd/dns-loc/ >

Nick Kew

Oct 5, 1997, 3:00:00 AM

In article <6183pr$5...@mine.informatik.uni-kiel.de>,
ca+po...@informatik.uni-kiel.de (Claus Assmann) writes:
> Jukka Korpela writes:

>
>>Bill Bereza writes:
>
>>> I don't think I've seen this question in a faq, but I have seen some
>>> people here mention that tildes (~) should be escaped as %7E in URLs.
>
>>Yep. It's actually not a very frequently asked question. But there is a
>>FIR (Frequently Ignored Requirement) involved.
>
> There is a problem with encoding the tilde as %7E:
> some archiving systems encode the '%' :-(

So some archiving systems are broken. What's more important: them or the Web?
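
The breakage quoted above looks roughly like this, assuming the archiver escapes every character outside a small safe set without checking whether the input is already escaped. The function archive_encode and the path /%7Euser/ are hypothetical stand-ins.

```python
from urllib.parse import quote, unquote

def archive_encode(url_path):
    """Escape a path as a careless archiver might, '%' included."""
    return quote(url_path, safe="/")

stored = archive_encode("/%7Euser/")
print(stored)                    # the "%" itself has become %25
print(unquote(stored))           # one decoding step restores %7E only
print(unquote(unquote(stored)))  # a second step is needed to reach "~"
```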

> Whatever I do, someone has a problem with it:
> when I used '~', some guy from our University complained
> that this character wasn't on his keyboard,
> when I use '%7E', some search/archiving engines cause troubles...

:-)
If you escape it, you're doing the right thing and can take the high ground
in your arguments. Which is in any case no bad thing, as we (collectively)
_should_ be bringing pressure on those responsible for broken software to
fix it.

> Once I wrote an e-mail to the maintainer of such a site:
> he answered I should use '~' :-(

Don't stand for it. Make sure you've had a Sunday-lunch bottle of something
before replying :-) And remember, you're in the right!

--
Nick Kew
WebThing virtual office: personal and groupware desktop on the Web
Mail Client, Mail Server, Calendar Server, FileServer, Conferencing
- <URL:http://www.webthing.com/>

Stan Brown

Oct 6, 1997, 3:00:00 AM

In article <oiyb4c3...@torvi.hut.fi>, jkor...@torvi.hut.fi (Jukka
Korpela) wrote:
>
>One can also argue that the definitely most usual use of ~ in URLs like
>http://www.hut.fi/~jkorpela/
>is a strange Unixism. In such contexts, ~user does not even have the
>same meaning as in some Unix shells (home directory of user), since
>it's really a _subdirectory_ of user's home directory that the URL
>refers to.

It can be even stranger than that.

On Concentric Network (www.concentric.net), my home directories are on
one disk but the ~username in a URL points to /U1/letter/username where
"letter" is the first letter of the username. Example: /U1/B/Brownsta.
However, the server returns a "not found" if you attempt to access
/U1/B/Brownsta from the WWW.
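
The mapping just described can be written down as a small sketch. Only the /U1/letter/username layout comes from the description above; the function name server_directory and the handling of the %7E form are assumptions, not the provider's actual configuration.

```python
from urllib.parse import unquote

def server_directory(url_path):
    """Map a /~username/... URL path to /U1/<first letter>/<username>/..."""
    decoded = unquote(url_path)  # accept both "~" and "%7E"
    if not decoded.startswith("/~"):
        raise ValueError("not a ~username path: " + url_path)
    username, _, rest = decoded[2:].partition("/")
    if not username:
        raise ValueError("missing username: " + url_path)
    directory = "/U1/" + username[0] + "/" + username
    return directory + "/" + rest if rest else directory

print(server_directory("/~Brownsta/"))
print(server_directory("/%7EBrownsta/index.html"))
```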

--

Sorry for the inconvenience, but spam mails outnumbered real mails in
my box and I've been forced to edit my address. Please remove leading
capital letters if you wish to reply -- but don't send bulk email.
Stan Brown, Oak Road Systems, Cleveland, Ohio, USA
http://www.concentric.net/~Brownsta/
