email address obfuscation

dorayme

Oct 10, 2006, 6:18:39 PM
Is anyone here using methods to make it more difficult for spammers
to garner email addresses from web pages? Mostly interested to
hear from anyone using specific methods (rather than anything
else like further reviews, analyses of the ultimate effectiveness
etc, having things like "removeThis" inside the email address
that is in the "mailto:").

I had a client recently ask me to "do something" about the spam
coming from his website. I want to do better than tell him to get
the best spam filter he can, both on his local and on his server
end via his host. There is a javascript thing I used to use but
these days I am interested in being able to get by without it.
So, any suggestions will be welcome, especially if they are
actually being used by the suggester (not someone else they know
or have heard of... take this as a compliment)

--
dorayme

Joe

Oct 10, 2006, 7:55:14 PM
In article <doraymeRidThis-BBFC72.08183911102006@news-vip.optusnet.com.au>,
dorayme...@optusnet.com.au says...

> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages.
> ...
>
I've been using the 'hash entity' method for years. Seems to work, but
as it's used in harness with spam filter on my email proggy and isp, I
don't really know. I replace about half the addy with entities,
especially the . and @ , but how far you go is up to you. It can't hurt
to try.
Anyway, check it at http://graspages.cjb.cc/emailme.php
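
A minimal sketch of the 'hash entity' approach described above, in Python. The address foo@example.com is a placeholder and the helper name encode_entities is hypothetical; the sketch always encodes the . and @ and, with every=2, every second other character, which is roughly the "about half" mentioned. How much to encode is a judgment call, not something this sketch settles.

```python
import html


def encode_entities(text: str, every: int = 2) -> str:
    """Replace '@', '.', and every `every`-th character with a decimal
    numeric character reference such as &#64; (hypothetical helper)."""
    out = []
    for i, ch in enumerate(text):
        if ch in "@." or i % every == 0:
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


address = "foo@example.com"  # placeholder address, not from the thread
encoded = encode_entities(address, every=2)

# The encoded form can be used for both the href and the link text.
link = f'<a href="mailto:{encoded}">{encoded}</a>'
print(link)

# A browser decodes the references; html.unescape does the same here.
print(html.unescape(encoded) == address)
```

The page source no longer contains a literal @ or dot, while a browser still renders and follows the link as usual.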

dorayme

Oct 10, 2006, 8:27:30 PM
In article <MPG.1f96d6c2f...@news.aardvark.net.au>,
Joe <joedi...@yahoo.com.au> wrote:

Thanks Joe. I found something to make your technique easier at
http://www.wbwip.com/wbw/emailencoder.html and have already used
it in anger just now and it is working on a client's site. I am
imagining it already side-swiping all attempts by vicious bots to
"harvest" it, I have the image of a rugby player at full bore
with the ball, fending off all attempts to tackle him. You know,
his arm and hand stretched out to push all comers away as Rugby
players do... (I know how analogies tickle you pink...)

Just what I wanted, someone to say something to get me going!

--
dorayme

Nikita the Spider

Oct 10, 2006, 11:27:53 PM
In article
<doraymeRidThis-BBF...@news-vip.optusnet.com.au>,
dorayme <dorayme...@optusnet.com.au> wrote:

> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages. Mostly interested to
> hear from anyone using specific methods (rather than anything
> else like further reviews, analyses of the ultimate effectiveness
> etc, having things like "removeThis" inside the email address
> that is in the "mailto:").

I've set up several spamtrap addresses to study this. Eventually I'll
write a short article about my findings, but in the meantime I'll
summarize here. I have three email addresses all on the same page. One
is naked (i.e. just f...@example.com), one is entity encoded (i.e.
&#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
The number of spams each has gotten to date is as follows:

naked - 715
entities - 2
javascript - 1

In short, the entities look pretty effective to me. They're nice because
they don't disturb one's visitors at all and you don't have to mess
around with any Javascript.

But another way of looking at it is to say that Javascript protection is
twice as effective as entity protection. =) (Thanks to Huff's "How to
Lie with Statistics")
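
The arithmetic behind the joke can be worked through in a few lines of Python. This is only a sketch using the three counts quoted above; the function name reduction is hypothetical. It shows that both protected addresses received well over 99% less spam than the naked one, so "twice as effective" rests on a difference of a single message.

```python
counts = {"naked": 715, "entities": 2, "javascript": 1}  # figures from the post


def reduction(n: int, baseline: int) -> float:
    """Fraction of spam avoided relative to the unprotected address."""
    return 1 - n / baseline


baseline = counts["naked"]
for method, n in counts.items():
    print(f"{method:<10} {n:>4} spams  {reduction(n, baseline):.2%} fewer than naked")
```

With counts this small, the gap between 2 and 1 is far too thin to rank the two methods against each other.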

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

dorayme

Oct 11, 2006, 12:54:30 AM
In article
<NikitaTheSpider-D7...@news-rdr-02-ge0-1.southeast.rr.com>,

Nikita the Spider <NikitaT...@gmail.com> wrote:

> In article
> <doraymeRidThis-BBF...@news-vip.optusnet.com.au>,
> dorayme <dorayme...@optusnet.com.au> wrote:
>
> > Anyone here using methods to make it more difficult for spammers
> > to garner email addresses from web pages. Mostly interested to
> > hear from anyone using specific methods (rather than anything
> > else like further reviews, analyses of the ultimate effectiveness
> > etc, having things like "removeThis" inside the email address
> > that is in the "mailto:").
>
> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page. One
> is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
>
> naked - 715
> entities - 2
> javascript - 1
>
> In short, the entities look pretty effective to me. They're nice because
> they don't disturb one's visitors at all and you don't have to mess
> around with any Javascript.
>

Yes, excellent. My feelings too on this one.

> But another way of looking at it is to say that Javascript protection is
> twice as effective as entity protection. =) (Thanks to Huff's "How to
> Lie with Statistics")

People can and do look at things as they like! But the truth is
another matter.

It would be nice to actually know how the 2 and 1 got through...
This brings up another issue: just this morning, there was a post
here at alt.html about a facility to somehow capture material on a
screen (it is gone from my newsreader now). Though the address is
veiled in the source, it is not veiled as rendered in the browser;
it is commonly just printed as normal on the screen. Surely this
can be avoided by simple techniques, like making the visible link
text something like ...>email us</a>, to avoid any "on screen
harvesting"?

But, this is not always acceptable. I have no idea how the robots
work, how clever they are, whether they in fact look at source or
output or both. Your stats would be more meaningful if you could
say more about the implementation. Interesting experiment though,
Spider. Look forward to your article.

--
dorayme

Beauregard T. Shagnasty

Oct 11, 2006, 1:21:43 AM
Nikita the Spider wrote:

> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page.
> One is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
>
> naked - 715
> entities - 2
> javascript - 1

I'll agree that using entities works. I have one address on a web site
that began life in this form. Never got any spam in about six years.

Then one day, I started getting bounces from emails containing viruses.
I found out that someone who added my address to his address book got
infected. My address was used as a forged FROM: by this virus. Shortly
after that, I started to get spam and it's hovering around 200-250 per
day now. :-(

--
-bts
-Motorcycles defy gravity; cars just suck

jojo

Oct 11, 2006, 1:26:43 AM
Joe wrote:

> I've been using the 'hash entity' method for years. Seems to work, but
> as it's used in harness with spam filter on my email proggy and isp, I
> don't really know. I replace about half the addy with entities,
> especially the . and @ , but how far you go is up to you. It can't hurt
> to try.
> Anyway, check it at http://graspages.cjb.cc/emailme.php

You can improve that: use HTML entities for "mailto:" and percent-encoding
(%41 for A) for the email address itself.
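
A rough Python sketch of this combination, assuming a placeholder address and two hypothetical helpers. The percent-encoding is done by hand because urllib.parse.quote leaves letters and digits untouched. Whether every mail client accepts a fully percent-encoded mailto target is not something this sketch checks.

```python
import html
from urllib.parse import unquote


def entity_encode_all(text: str) -> str:
    """Encode every character as a decimal character reference (hypothetical helper)."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def percent_encode_all(text: str) -> str:
    """Percent-encode every byte, e.g. 'A' -> '%41' (hypothetical helper)."""
    return "".join(f"%{b:02X}" for b in text.encode("utf-8"))


address = "foo@example.com"  # placeholder address, not from the thread
href = entity_encode_all("mailto:") + percent_encode_all(address)
print(f'<a href="{href}">email us</a>')

# Round trip: undo the character references, then the percent-encoding.
print(unquote(html.unescape(href)))
```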

John Dunlop

Oct 11, 2006, 4:38:07 AM
[re e-mail address obfuscation]

jojo:

> You can improve that: use HTML-Entities for "mailto:" and hex-entities
> (%41 for A) for the email-adress itself.

...the one going against if not the word then the spirit of HTML4.01,
the other against the spirit of RFC3986. Character references were
made for when it is inconvenient or impossible to enter a character
directly, for example, when there is no key for it on the keyboard or
the character isn't displayable.

| A given character encoding may not be able to express
| all characters of the document character set. For such
| encodings, or when hardware or software configurations
| do not allow users to input some document characters
| directly, authors may use SGML character references.

(HTML4.01 sec.5.3)

Percent-encoding characters that are allowed as data in a URL part
hinders transcription because characters that could otherwise be
recognisable and rememberable have been, unless you're familiar with
US-ASCII and hexadecimal notation, turned into unrecognisable and
harder-to-remember three-character sequences. That your browser
silently decodes percent-encodings and presents you with a more
human-friendly URL suggests that e-mail address harvesters can do a
similar job.

Principles of URL design take into consideration human factors because
URLs are part of the user-interface. Obfuscating URLs with
percent-encodings makes things harder for humans while barely
increasing the hardship on e-mail address harvesters.

Obfuscation of e-mail addresses is just that: obfuscation. It does
nothing to help the genuine user find and use your e-mail address.
Attempts at obfuscating e-mail addresses - likewise attempts at
obfuscating markup - are trivial to bypass, even by e-mail address
harvesters. I should emphasize that I'm not saying that attempts at
obfuscation will universally fail, only that it takes little effort to
overcome them.
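
To illustrate how little effort is involved, here is a sketch in Python of a decoder that undoes both character references and percent-encoding before looking for addresses. It says nothing about what real harvesters actually do; the function name harvest, the sample markup, and the deliberately loose address pattern are all assumptions made for the example.

```python
import html
import re
from urllib.parse import unquote

# Loose pattern for illustration only; real address syntax is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(source: str) -> list:
    """Decode character references, then percent-encoding, then scan."""
    decoded = unquote(html.unescape(source))
    return EMAIL_RE.findall(decoded)


# Sample markup: "mailto:" as entities, the address percent-encoded.
page = (
    '<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;'
    '%66%6F%6F%40%65%78%61%6D%70%6C%65%2E%63%6F%6D">email us</a>'
)
print(harvest(page))
```

Two standard-library calls recover the plain address, which is the sense in which the obfuscation is trivial to bypass for any harvester that bothers.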

My advice, if you're not keen on actively fighting spam, would be to
either set up junk mail filters both at your server and at your MUA, or
remove the address from the public eye altogether.

--
Jock

Brian Cryer

Oct 11, 2006, 5:24:02 AM
"Nikita the Spider" <NikitaT...@gmail.com> wrote in message
news:NikitaTheSpider-D7...@news-rdr-02-ge0-1.southeast.rr.com...

> In article
> <doraymeRidThis-BBF...@news-vip.optusnet.com.au>,
> dorayme <dorayme...@optusnet.com.au> wrote:
>
>> Anyone here using methods to make it more difficult for spammers
>> to garner email addresses from web pages. Mostly interested to
>> hear from anyone using specific methods (rather than anything
>> else like further reviews, analyses of the ultimate effectiveness
>> etc, having things like "removeThis" inside the email address
>> that is in the "mailto:").
>
> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page. One
> is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
>
> naked - 715
> entities - 2
> javascript - 1

Given how easy it is to translate I'm amazed that the encoded version is so
effective. Just goes to show that spammers are stupid as well as sad.
--
Brian Cryer
www.cryer.co.uk/brian

Brian Cryer

Oct 11, 2006, 5:33:49 AM
"dorayme" <dorayme...@optusnet.com.au> wrote in message
news:doraymeRidThis-BBF...@news-vip.optusnet.com.au...

I'm sure you already know this, but: Whatever technique you decide to use
(unless you go the route of a better spam filter) be sure to ditch the
existing email address. Once you are on a spammer's mailing list it's unlikely
that you will ever get off it. So there is no point deploying a
"super-anti-spam" technique with an email address that already gets tons of
spam.
--
Brian Cryer
www.cryer.co.uk/brian


cwdjrxyz

Oct 11, 2006, 12:05:40 PM

Several methods that work at least somewhat have been mentioned. Most
of us likely need several email addresses. I have noticed that many
large companies use addresses that can not be answered for contacting
people. All questions have to go to the main address. Some like to use
CGI feedback forms without a mention of a specific address. However
this is not without risk, since a virus can be fed to a server in this
way unless the CGI feedback form is very carefully constructed. There
are people who will put a scripted virus in the feedback box. Limiting
the size of the feedback and not allowing it to contain script helps in
this respect. And of course, do not use a good address on Usenet posts.
I use one at my domain for posting that does not allow any response -
everything is dumped. Then I have addresses used only for friends,
finance, etc. These seldom get spam, so I usually do not have to
configure to allow only mail from those on a list.
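
The two precautions mentioned for feedback forms, limiting size and not allowing script, might look something like this in Python. It is a sketch only: the 2000-character limit and the name clean_feedback are arbitrary assumptions, and a real form handler would need more than this.

```python
import html

MAX_FEEDBACK_CHARS = 2000  # arbitrary limit chosen for the example


def clean_feedback(raw: str, limit: int = MAX_FEEDBACK_CHARS) -> str:
    """Truncate the submission, then escape markup so it is stored and
    displayed as inert text rather than as HTML or script."""
    trimmed = raw[:limit]
    return html.escape(trimmed)


print(clean_feedback("<script>alert(1)</script>Nice site!"))
```

Truncating before escaping keeps the limit tied to what the visitor typed; the stored text can grow slightly once characters such as < are expanded.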

Nikita the Spider

Oct 11, 2006, 1:52:51 PM
In article <xKudndrPlrGuJbHY...@pipex.net>,
"Brian Cryer" <brian...@127.0.0.1.ntlworld.com> wrote:

I was also surprised by this result, but I can think of two reasons why
harvesting bots might ignore any non-naked addresses, even if they're
easy to translate. First, the harvesters might feel that anyone who is
savvy enough to obfuscate his email address isn't likely to respond to
spam anyway. Second, the harvesters might see no shortage of
un-obfuscated addresses, so why go to the trouble of harvesting the
small number of obfuscated ones? It's this latter theory that I prefer
because laziness is a powerful (and common) motivator.

Nikita the Spider

Oct 11, 2006, 2:00:53 PM
In article
<doraymeRidThis-690...@news-vip.optusnet.com.au>,
dorayme <dorayme...@optusnet.com.au> wrote:

> In article
> <NikitaTheSpider-D7...@news-rdr-02-ge0-1.southeast.rr.com>,
> Nikita the Spider <NikitaT...@gmail.com> wrote:
> > I've set up several spamtrap addresses to study this. Eventually I'll
> > write a short article about my findings, but in the meantime I'll
> > summarize here. I have three email addresses all on the same page. One
> > is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> > &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> > The number of spams each has gotten to date is as follows:
> >
> > naked - 715
> > entities - 2
> > javascript - 1
> >
> > In short, the entities look pretty effective to me. They're nice because
> > they don't disturb one's visitors at all and you don't have to mess
> > around with any Javascript.
> >

> It would be nice to actually know how the 2 and 1 got through...

One of the two was a standard 419 scam (see http://www.419eater.com/ if
you're not familiar with these) so I could believe that an actual human
clicked on the link. But the one that got through to both the
Javascript- and entity-protected one was a garden variety spam. It
really surprises me that I got only one. I figured that once I was on
the list, the floodgates would open.


> But, this is not always acceptable. I have no idea how the robots
> work, how clever they are, whether they in fact look at source or
> output or both.

I'd be surprised if any do more than look through the source.

> Your stats would be more meaningful if you could
> say more about the implementation. Interesting experiment though,
> Spider. Look forward to your article.

Thanks, will explain methodology, implementation, etc. and post a link
to the article here eventually.

John Dunlop

Oct 11, 2006, 2:22:22 PM
cwdjrxyz:

> And of course, do not use a good address on Usenet posts.

Rubbish.

--
Jock

dorayme

Oct 11, 2006, 5:58:19 PM
In article <m4OdnWRVb-X...@pipex.net>,
"Brian Cryer" <brian...@127.0.0.1.ntlworld.com> wrote:

> "dorayme" <dorayme...@optusnet.com.au> wrote in message
> news:doraymeRidThis-BBF...@news-vip.optusnet.com.au...
> > Anyone here using methods to make it more difficult for spammers
> > to garner email addresses from web pages. Mostly interested to
> > hear from anyone using specific methods (rather than anything
> > else like further reviews, analyses of the ultimate effectiveness
> > etc, having things like "removeThis" inside the email address
> > that is in the "mailto:").

> I'm sure you already know this, but: Whatever technique you decide to use

> (unless you go the route of a better spam filter) be sure to ditch the
> existing email address. Once you are on spammer's mailing list its unlikely
> that you will ever get off it. So there is no point deploying a
> "super-anti-spam" technique with an email address that already gets tons of
> spam.

I know what you mean. Looking on the bright side though, after a
while, without any response, without fresh harvesting, there
would start to be a reduction perhaps... after the point of
encoding provisions being made.

--
dorayme

dorayme

Oct 11, 2006, 6:20:22 PM
In article
<1160555887....@k70g2000cwa.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> [re e-mail address obfuscation]
>
> jojo:
>
> > You can improve that: use HTML-Entities for "mailto:" and hex-entities
> > (%41 for A) for the email-adress itself.
>
> ...the one going against if not the word then the spirit of HTML4.01,
> the other against the spirit of RFC3986. Character references were
> made for when it is inconvenient or impossible to enter a character
> directly, for example, when there is no key for it on the keyboard or
> the character isn't displayable.
>

Ah but you see, it is like this Jock, recall, for example,
Mississippi Burning. Gene Hackman, second in command of an FBI
hunt is raring to bring in his team of ex-crim
mission-impossible not-totally-law-abiding but
now-on-the-side-of-the-good-guys to break the back of the
low-down no-good scumbag-leadership of the KKK responsible for a
triple murder. The FBI leader, Agent Alan Ward, makes your sort
of speech, and holds out for high principles and gets bloody
nowhere! Things start to happen soon as the fabulously
charismatic Hackman is allowed to follow his instincts.

> likewise attempts at
> obfuscating markup - are trivial to bypass, even by e-mail address
> harvesters. I should emphasize that I'm not saying that attempts at
> obfuscation will universally fail, only that it takes little effort to
> overcome them.
>

If it is so little effort, what is your theory about why it is so
effective (if it is as recent indications suggest)? Perhaps I can
help you:

Similar speeches are made like yours about the value of security
bars on windows and doors. "Ha", says my neighbour opposite, "I
could get through with a good crowbar in 15 secs!".

Sure he could - if he wants to die by the claws of my specially
and lovingly trained 16 year old cat.

The point is this though: robbers tend to go for the low lying
fruit first and there is plenty enough of that to go around. Do
you understand what I am saying? No need to crash through even
slightly heavier security.

--
dorayme

Jukka K. Korpela

Oct 11, 2006, 7:13:30 PM
Scripsit dorayme:

> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages.

Removing all of one's web pages is sometimes suggested as the only sure
method, but even it isn't sure at all, of course. Think about
www.archive.org.

> I had a client recently ask me to "do something" about the spam
> coming from his website.

Tell them to contact a specialist on such matters if they can't handle it.
Spam isn't an HTML problem any more than terrorism, lack of good sex, or poverty
is.

> I want to do better than tell him to get
> the best spam filter he can,

Why would you want to do better than the real thing? I guess you are
thinking of suggesting something _else_, like "email address protection"
snake oil. I hope you now realize how ridiculous the idea is.

Either they do some spam filtering, or they don't. Either way, email address
obfuscation does not protect them from spam but _will_ damage their
business by damaging communication, style, and impression.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

dorayme

Oct 11, 2006, 9:06:23 PM
In article <fUeXg.4244$U9....@reader1.news.jippii.net>,

"Jukka K. Korpela" <jkor...@cs.tut.fi> wrote:

> Scripsit dorayme:
>
> > Anyone here using methods to make it more difficult for spammers
> > to garner email addresses from web pages.

>

> > I had a client recently ask me to "do something" about the spam
> > coming from his website.
>
> Tell them to contact a specialist on such matters if they can't handle it.
> Spam isn't an HTML problem any more terrorism, lack of good sex, or poverty
> is.
>

I have already said to do the spam filtering. It is the other bit
of what you say that I don't want to communicate. I don't
honestly. I know, you are right about an ideal world. If there is
something a little impure that helps, I will use it if all I see
are mainly theoretical objections.



> > I want to do better than tell him to get
> > the best spam filter he can,
>
> Why would you you want to do better than the real thing? I guess you are
> thinking of suggesting something _else_, like "email address protection"
> snake oil. I hope you now realize how ridiculous the idea is.

Well, yes actually. But it really does not seem to me ridiculous,
even though it is not really kosher. What I do find ridiculous is
the idea of being purer than the practicalities dictate. When a
pedestrian stop light is on, Australians will tend to wait till
it goes green, even if there is not a car in sight. French people
are not so ridiculous and express surprise at this behaviour when
visiting here.

>
> Either they do some spam filtering, or they don't. Either way, email address
> obsfuscation does not protect them from spam but _will_ damage their
> business by damaging communication, style, and impression.

Well, I would like to see the evidence for this as it might
relate to various cases in my patch. If you were right, it would
indeed be a reason not to.

I was aware of this response when I posted. And was not looking
forward to it. But I think you are right to have expressed it so
as to dampen any ideas that it is a wholesome thing to do. I have
no illusions: I am a fallen being.

As often though, I do think about what you say and will probably
end up further emphasising the proper way to go, ie. to put in
the best spam filters/blockers they can and to point them to
resources to do this... So, thank you.

--
dorayme

Joe

Oct 11, 2006, 9:16:38 PM
In article <doraymeRidThis-8ECD87.10273011102006@news-vip.optusnet.com.au>,
dorayme...@optusnet.com.au says...

> In article <MPG.1f96d6c2f...@news.aardvark.net.au>,
> Joe <joedi...@yahoo.com.au> wrote:
>
> > In article <doraymeRidThis-BBFC72.08183911102006@news-vip.optusnet.com.au>,
> > dorayme...@optusnet.com.au says...
> > > Anyone here using methods to make it more difficult for spammers
> > > to garner email addresses from web pages.
> > > ...
> > >
> > I've been using the 'hash entity' method for years.
> > Anyway, check it at http://graspages.cjb.cc/emailme.php
>
> Thanks Joe. I found something to make your technique easier at
> http://www.wbwip.com/wbw/emailencoder.html and have already used

shiny! and arguably better than my usual 'back of an envelope'
technique, which involves memorising "At 64 dot 46". Then I normally
have to look up 'a'.
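
The mnemonic is just the decimal character codes, which a couple of lines of Python can look up instead. The helper name entity is hypothetical.

```python
def entity(ch: str) -> str:
    """Return the decimal character reference for a single character."""
    return f"&#{ord(ch)};"


# '@' is 64 and '.' is 46, as in "At 64 dot 46"; 'a' is 97.
for ch in "@.a":
    print(repr(ch), "->", entity(ch))
```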


> his arm and hand stretched out to push all comers away as Rugby
> players do... (I know how analogies tickle you pink...)

pinking up nicely, ta.

>
> Just what I wanted, someone to say something to get me going!

my pleasure.

John Dunlop

Oct 12, 2006, 4:28:38 AM
dorayme:

[re overcoming e-mail address obfuscation]

> If it is so little effort, what is your theory about why it is so
> effective (if it is as recent indications suggest)? Perhaps I can
> help you:

No help needed, dorayme, thank you. Someone in this thread has already
advanced a plausible theory: laziness. Even the slightest extra
effort is too much because unobfuscated e-mail addresses are plentiful,
easy pickings even. No need to stretch.

> The point is this though: robbers tend to go for the low lying
> fruit first and there is plenty enough of that to go around. Do
> you understand what I am saying? No need to crash through even
> slightly heavier security.

Yes, but I am merely pointing out that obfuscating e-mail addresses is
inferior to real security; I am not claiming to know what harvesters
actually do!

Mind that old axiom 'security by obscurity gives a false sense of
security'?

And, as I've explained, the techniques to obfuscate e-mail addresses
proposed in this thread run contrary to the spirit of Internet
specifications. That a construct is included in a specification is
hardly license to exploit it.

Deal with spam at your end; don't pass the buck.

--
Jock

jojo

Oct 12, 2006, 6:18:46 AM
John Dunlop wrote:

> And, as I've explained, the techniques to obfuscate e-mail addresses
> proposed in this thread run contrary to the spirit of Internet
> specifications.

That's because spambots are against the spirit of the internet, too. If
the "dark side" does not follow the rules we don't have to follow them
either.

John Dunlop

Oct 12, 2006, 7:09:55 AM
jojo:

> That's because spambots are against the spirit of the internet, too.

Under discussion was not the spirit of the Internet but the spirit of
Internet specifications. What harvesters do does not run contrary to
the word or the spirit of the two specifications I mentioned. I would
maintain that what you proposed - replacing US-ASCII characters with
character references in HTML, and percent-encoding octets in URLs that
would otherwise be treated as data - does.

> If the "dark side" does not follow the rules we don't have to follow them
> either.

Come on. Internet specifications are a boon! If you fail to grasp the
advantages they bring - if you fail to imagine a WWW without them - why
wait until the "dark side" supposedly deviates from them before you
ignore them yourself?

Besides, in this war, there are more effective and less harmful
strategies than obfuscation.

--
Jock

Nikita the Spider

Oct 12, 2006, 12:27:14 PM
In article <1160641713....@m73g2000cwd.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> dorayme:
>
> [re overcoming e-mail address obfuscation]
>

> > The point is this though: robbers tend to go for the low lying
> > fruit first and there is plenty enough of that to go around. Do
> > you understand what I am saying? No need to crash through even
> > slightly heavier security.
>
> Yes, but I am merely pointing out that obfuscating e-mail addresses is
> inferior to real security; I am not claiming to know what harvesters
> actually do!

Myself, I'm pretty impressed by the fact that the entity-encoded address
received only two spams while its unprotected counterpart has received
over 700. If this method is inferior, I'd like to know to what! If there
are other methods that are equally easy to implement and don't
inconvenience users, I can't say I've heard of them.

> Mind that old axiom 'security by obscurity gives a false sense of
> security'?

I'd argue that we're not talking about security here so much as
annoyance reduction. I don't mean to nitpick about your words; I
honestly think the difference is important. Security prohibits access to
a resource and there are clear negative consequences when it fails (my
account is cracked, for example). By contrast, my inbox lost its spam
virginity a long time ago. All I can do now with the resources I have
available is to limit further, ahem, penetrations.

> And, as I've explained, the techniques to obfuscate e-mail addresses
> proposed in this thread run contrary to the spirit of Internet
> specifications. That a construct is included in a specification is
> hardly license to exploit it.

I see your point, but the spec isn't strongly worded. As you pointed
out, the relevant section is here:
http://www.w3.org/TR/html401/charset.html#h-5.3

"A given character encoding may not be able to express all characters of
the document character set. For such encodings, or when hardware or
software configurations do not allow users to input some document
characters directly, authors may use SGML character references."

But it also says this:
"Character references are a character encoding-independent mechanism for
entering any character from the document character set."

Using entities to encode email addresses fits perfectly well within this
provision, IMO.

Cheers

Chris F.A. Johnson

Oct 12, 2006, 1:09:52 PM
On 2006-10-12, John Dunlop wrote:
> dorayme:
>
> [re overcoming e-mail address obfuscation]
>
> And, as I've explained, the techniques to obfuscate e-mail addresses
> proposed in this thread run contrary to the spirit of Internet
> specifications. That a construct is included in a specification is
> hardly license to exploit it.

As NtS pointed out, that's not true (or, at least, debatable).

> Deal with spam at your end; don't pass the buck.

That's what obfuscated e-mail addresses do. Letting spam be
generated any more than necessary is passing the buck. The
important thing is to prevent it (as much as possible) in the first
place.


--
Chris F.A. Johnson <http://cfaj.freeshell.org>
===================================================================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)

John Dunlop

Oct 12, 2006, 3:57:08 PM
Chris F.A. Johnson:

> [John Dunlop:]


>
> > Deal with spam at your end; don't pass the buck.
>
> That's what obfuscate e-mail addresses do.

?? No. E-mail address obfuscation tries to deal with the problem at
the user's end. Its aim is to remove all trouble for the e-mail
address owner, no matter the cost to anyone else. If obfuscation dealt
with the problem at your end, it wouldn't be obfuscation since there
would be nothing to obfuscate.

> Letting spam be generated any more than necessary is passing the buck.

It is not your job to prevent spam being generated, unless you are
actively fighting against it, in which case there are better, more
effective approaches than e-mail address obfuscation.

--
Jock

John Dunlop

Oct 12, 2006, 3:59:08 PM
Nikita the Spider:

> Myself, I'm pretty impressed by the fact that the entity-encoded address
> received only two spams while its unprotected counterpart has received
> over 700. If this method is inferior, I'd like to know to what!

mentioned now more than once in this thread: normal counter-spam
measures. That means junk mail filters both at the server and at the
MUA.

[re e-mail address obfuscation running contrary to the spirit of
Internet specs]

> I see your point, but the spec isn't strongly worded.

Well, every clause in the spec is vague enough to be open to, however
absurd, interpretation.

I specifically talked not about the spec's wording but about its
spirit. To learn about the spirit of HTML you have to trace its
history: follow the past discussions, study the earlier drafts and
specifications, find out why the constructs were introduced in the
first place.

> As you pointed out, the relevant section is here:
>
> http://www.w3.org/TR/html401/charset.html#h-5.3

I quoted from there but did not mean that as the 'relevant section' to
learn why character references came about. You will find that not in
the HTML4.0 spec but in ISO8879 (my copy's at work and I haven't yet
memorised it all, so much to my consternation I can't give you chapter
and verse.)

> But it also says this:
> "Character references are a character encoding-independent mechanism for
> entering any character from the document character set."
>
> Using entities to encode email addresses fits perfectly well within this
> provision, IMO.

That's not even half the story.

--
Jock

dorayme

Oct 12, 2006, 5:32:41 PM
In article
<1160641713....@m73g2000cwd.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> dorayme:
>
> [re overcoming e-mail address obfuscation]
>
> > If it is so little effort, what is your theory about why it is so
> > effective (if it is as recent indications suggest)? Perhaps I can
> > help you:
>
> No help needed, dorayme, thank you. Someone in this thread has already
> advanced a plausible theory: laziness. Even the slightest extra
> effort is too much because unobfuscated e-mail addresses are plentiful,
> easy pickings even. No need to stretch.

I am in a picky mood, just excuse and ignore it: the lazy theory
is inadequate, not so plausible. You do need help. Go and study
the robber analogy of mine, the robber is not lazy. He can get
what he wants from unsecured houses. He is rationalising his
resources.

>
> > The point is this though: robbers tend to go for the low lying
> > fruit first and there is plenty enough of that to go around. Do
> > you understand what I am saying? No need to crash through even
> > slightly heavier security.
>
> Yes, but I am merely pointing out that obfuscating e-mail addresses is
> inferior to real security; I am not claiming to know what harvesters
> actually do!
>

You were giving a different impression to me at least. I was
getting a message from your words that it was ineffective, that
it would not deter. You did not make things so utterly clear. You
did not say out loud, yes, it will reduce spam but these are the
downsides... You gave the impression of conflating these issues.


> Mind that old axiom 'security by obscurity gives a false sense of
> security'?

<g> I have a car protection system I made myself that is a sort
of inverse of this! It consists of a "key" and "switch" that is
not hidden from view, it is just not obvious to anyone's mind. It
gives me a great sense of security and has worked on a number of
occasions, both on my car and my daughter's and a neighbour's...

> Deal with spam at your end; don't pass the buck.

It is not my spam. Tell that to my client. But, Jock, be careful,
he is 6 foot 8 inches and built like a brick shit-house, has red
hair and is not delicate, if you know what I mean. I think I will
use an encoding just on this occasion...

--
dorayme

Nikita the Spider

Oct 12, 2006, 6:26:44 PM
In article <1160683148.6...@c28g2000cwb.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> Nikita the Spider:
>
> > Myself, I'm pretty impressed by the fact that the entity-encoded address
> > received only two spams while its unprotected counterpart has received
> > over 700. If this method is inferior, I'd like to know to what!
>
> mentioned now more than once in this thread: normal counter-spam
> measures. That means junk mail filters both at the server and at the
> MUA.

Hmmm, I guess we'll have to disagree on the criteria we use to measure
"inferior". Even the best mail filters can generate false positives,
which is something that an entity-encoded address won't do. And it'd
have to be a pretty darn effective filter (or set of filters) to achieve
what the entity encoding has done in this test. Furthermore, entity
encoding is something that any Web page author can do; the same can't be
said for setting up and tuning server-side filters. Last but not least,
entity encoding *prevents spam from being generated*. Mail filtering
doesn't do this. And if I just rely on my ISP's filters to handle my
spam for me, isn't that "passing the buck"?


> [re e-mail address obfuscation running contrary to the spirit of
> Internet specs]
>
> > I see your point, but the spec isn't strongly worded.
>
> Well, every clause in the spec is vague enough to be open to, however
> absurd, interpretation.

If you say so.

> I specifically talked not about the spec's wording but about its
> spirit. To learn about the spirit of HTML you have to trace its
> history: follow the past discussions, study the earlier drafts and
> specifications, find out why the constructs were introduced in the
> first place.
>
> > As you pointed out, the relevant section is here:
> >
> > http://www.w3.org/TR/html401/charset.html#h-5.3
>
> I quoted from there but did not mean that as the 'relevant section' to
> learn why character references came about. You will find that not in
> the HTML4.0 spec but in ISO8879 (my copy's at work and I haven't yet
> memorised it all, so much to my consternation I can't give you chapter
> and verse.)

I haven't read ISO8879. I'll grant that my opinion might change after
doing so. But of all of the abuses to which HTML has been and is
subjected (sending XHTML as text/html comes to mind), I find it hard to
believe that entity encoding email addresses would be in the top one
hundred on many people's lists, if it appeared at all.

Dan

Oct 12, 2006, 9:18:30 PM

dorayme wrote:
> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages. Mostly interested to
> hear from anyone using specific methods (rather than anything
> else like further reviews, analyses of the ultimate effectiveness
> etc, having things like "removeThis" inside the email address
> that is in the "mailto:").

I personally find it aesthetically distasteful to do any sort of
obfuscation of addresses; it just seems to go against the grain of
Internet standards that have always been designed to keep things as
open as possible, not intentionally obscure. Some of the
character-encoding stuff I can more-or-less tolerate because you have
to view the source code to see that it's whacked out, but other things
like spelling out "address at something dot net", or putting in
signature notes like "remove 'x' from my address", or embedding an
address as a graphic, just rub my nose in the fact that it's being
intentionally made more difficult to use. That's the sort of thing up
with which I won't put.

--
Dan

dorayme

Oct 12, 2006, 10:17:24 PM
In article
<1160702305....@k70g2000cwa.googlegroups.com>,
"Dan" <d...@tobias.name> wrote:

That is a fine speech. See my reference to Burning Mississippi. :)


But I agree that the visible email address should look normal.
There is a way around this: show no address at all, just a link
with the words "email us" or whatever.
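
A rough sketch of that idea in Python, combined with the entity encoding
discussed earlier in the thread. The function names and the address
foo@example.com are made up for illustration, and nothing here says how
well it holds up against any particular harvester:

```python
def to_char_refs(text: str) -> str:
    """Encode every character as a hexadecimal character reference."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)


def email_link(address: str, label: str = "email us") -> str:
    """Build an anchor whose href is entity-encoded and whose
    visible text does not contain the address at all."""
    href = to_char_refs("mailto:" + address)
    return f'<a href="{href}">{label}</a>'


# Hypothetical address, for illustration only.
print(email_link("foo@example.com"))
```

The browser resolves the references before following the link, so the
visitor just sees "email us" and gets a normal mailto: link.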

I would be interested to hear from anyone who has an idea of the
chances of email harvesting happening from the expressed text on
the page as distinct from the source. Without some idea of this
knowledge, one is less equipped to inform the good-guy dirty
tricks department. (If Spider's impressive figures are anything
to go on, it looks like these evil bots garner from the source
mainly)

--
dorayme

Nico Schuyt

Oct 12, 2006, 11:19:39 PM
Nikita the Spider wrote:
> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page. One
> is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
> naked - 715
> entities - 2
> javascript - 1
> In short, the entities look pretty effective to me. They're nice
> because they don't disturb one's visitors at all and you don't have
> to mess around with any Javascript.
> But another way of looking at it is to say that Javascript protection
> is twice as effective as entity protection. =) (Thanks to Huff's "How
> to Lie with Statistics")

Both are unreliable. Even *I* can make a script that extracts email addresses
from JS or entity coded text :-)
Use a mail form.
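
For the entity-coded case, something like this minimal Python sketch
would do. It assumes a deliberately loose address pattern; the name
`harvest` and the sample markup are made up for illustration, and the
JavaScript case would need more work than this:

```python
import html
import re
from typing import List

# Deliberately loose pattern; real addresses are more varied than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(page_source: str) -> List[str]:
    """Decode character references, then pick out address-like strings."""
    decoded = html.unescape(page_source)
    return EMAIL_RE.findall(decoded)


# Hypothetical page fragment with a partly entity-encoded mailto link.
sample = (
    '<a href="&#x6d;&#x61;&#x69;&#x6c;&#x74;&#x6f;&#x3a;'
    '&#x66;&#x6f;&#x6f;&#x40;example&#x2e;com">email us</a>'
)
print(harvest(sample))
```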

--
Nico Schuyt
http://www.nicoschuyt.nl/


dorayme

Oct 13, 2006, 12:00:05 AM
In article <452f0584$0$50455$dbd4...@news.euronet.nl>,
"Nico Schuyt" <nsc...@hotmail.com> wrote:

Would you, Mr Korpela and Jock - you see, Nico, what good company
you are in... :) - please not ignore the fact that it works to
actually stop spam. If you don't think it actually does, say so
loud and clear. The issue of it "being easy" to overcome is quite
irrelevant in a world where almost no bots do this. This is the
world you earthlings and I live in for the moment. What world are
you talking about? One in which Spider's stats are not true? In
this world it looks to me to be very reliable for now.

--
dorayme

Nico Schuyt

Oct 13, 2006, 3:16:18 AM

Working *now* is no guarantee whatsoever of being effective in the near
future.

> The issue of it "being easy" to overcome is quite
> irrelevant in a world where almost no bots do this. This is the
> world you earthlings and I live in for the moment. What world are
> you talking about? One in which Spider's stats are not true?

Stats are never true :-)

> In this world it looks to me to be very reliable for now.

The place is right; it's the time that might be a problem.
Tomorrow I'll launch my new evil bot.

John Dunlop

Oct 13, 2006, 4:41:38 AM
dorayme:

> [John Dunlop:]

> I am in a picky mood, just excuse and ignore it: the lazy theory
> is inadequate, not so plausible. You do need help. Go and study
> the robber analogy of mine, the robber is not lazy. He can get
> what he wants from unsecured houses. He is rationalising his
> resources.

Oops! Ok, so 'lazy' might not be the /mot juste/, as they say in the
Gorbals, but 'rationalising one's resources' seems to be more or less a
rehashing of the same theory, no? Anyway, it's one I'll have to
remember next time I'm asked to go to the gym.

> > Yes, but I am merely pointing out that obfuscating e-mail addresses is
> > inferior to real security; I am not claiming to know what harvesters
> > actually do!
>
> You were giving a different impression to me at least. I was
> getting a message from your words that it was ineffective, that
> it would not deter. You did not make things so utterly clear. You
> did not say out loud, yes, it will reduce spam but these are the
> downsides...

'I should emphasize that I'm not saying that attempts at obfuscation
will universally fail, only that it takes little effort to overcome
them.'

Does it reduce spam? It would seem to reduce the amount of spam that
that e-mail address owner receives, yes, but whether it makes an impact
on spam in the grand scheme of things, I don't know. Wouldn't a
harvester simply pick other addresses?

> You gave the impression of conflating these issues.

Ok. Let me list some options.

1. Obfuscate the address on the page:
a. munging
b. character references
c. percent-encodings
d. human-only addresses (e.g., 'user (at) host')
e. address written in javascript
2. Implement junk mail filters:
a. server filters
b. MUA filters
3. Remove all trace of the address.

Now my position regarding 1(b,c). Character references are the lesser
of the two evils, because while percent-encodings actually change the
URL for some degrees of equivalency, upsetting the user-interface,
character references don't.
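
To make the difference between 1(b) and 1(c) concrete, here is a small
Python sketch. The address foo@example.com is hypothetical and the
variable names are mine; it only illustrates the string-level
difference, not how any given browser or mail client behaves:

```python
import html
from urllib.parse import unquote

# Hypothetical address, for illustration only.
address = "foo@example.com"

# 1(b): character references, resolved by the HTML parser
# before the URL is ever used.
char_refs = "".join(f"&#x{ord(ch):x};" for ch in address)

# 1(c): percent-encoding every octet, which alters the URL string itself.
percent = "".join(f"%{b:02X}" for b in address.encode("ascii"))

print(char_refs)
print(percent)

# After HTML parsing, the character-reference form is the original URL.
assert html.unescape("mailto:" + char_refs) == "mailto:" + address

# The percent-encoded form is a different URL string; it only matches
# the original once something percent-decodes it.
assert "mailto:" + percent != "mailto:" + address
assert unquote(percent) == address
```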

But character references were 'intended to be used when you could not
otherwise enter a character conveniently in the text' (/The SGML
Handbook/ p. 356). I would be surprised if it inconvenienced you to
enter most US-ASCII characters directly.