email address obfuscation

dorayme

Oct 10, 2006, 6:18:39 PM
Is anyone here using methods to make it more difficult for spammers
to garner email addresses from web pages? Mostly interested to
hear from anyone using specific methods (rather than anything
else like further reviews, analyses of the ultimate effectiveness
etc, having things like "removeThis" inside the email address
that is in the "mailto:").

I had a client recently ask me to "do something" about the spam
coming from his website. I want to do better than tell him to get
the best spam filter he can, both on his local and on his server
end via his host. There is a javascript thing I used to use but
these days I am interested in being able to get by without it.
So, any suggestions will be welcome, especially if they are
actually being used by the suggester (not someone else they know
or have heard of... take this as a compliment)

--
dorayme

Joe

Oct 10, 2006, 7:55:14 PM
In article <doraymeRidThis-BBFC72.08183911102006@news-vip.optusnet.com.au>,
dorayme...@optusnet.com.au says...

> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages.
> ...
>
I've been using the 'hash entity' method for years. Seems to work, but
as it's used in harness with spam filter on my email proggy and isp, I
don't really know. I replace about half the addy with entities,
especially the . and @ , but how far you go is up to you. It can't hurt
to try.
Anyway, check it at http://graspages.cjb.cc/emailme.php
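
A minimal sketch of the entity approach described above, assuming Python is available to generate the markup. The function name entity_encode, the fraction parameter and the sample address are made up for illustration; the only idea taken from the post is that '@' and '.' are always replaced and roughly half of the remaining characters are replaced with decimal character references.

```python
import html
import random


def entity_encode(address, fraction=0.5, seed=None):
    """Replace some characters of an address with decimal character references."""
    rng = random.Random(seed)
    out = []
    for ch in address:
        # Always encode '@' and '.', plus a random share of the other characters.
        if ch in "@." or rng.random() < fraction:
            out.append("&#%d;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


encoded = entity_encode("someone@example.com", seed=1)
link = '<a href="mailto:%s">%s</a>' % (encoded, encoded)
print(link)

# A browser resolves the references, so the visitor sees the plain address.
print(html.unescape(encoded))
```

How much of the address gets encoded is a matter of taste, as the post says; raising fraction to 1.0 encodes every character.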

dorayme

Oct 10, 2006, 8:27:30 PM
In article <MPG.1f96d6c2f...@news.aardvark.net.au>,
Joe <joedi...@yahoo.com.au> wrote:

Thanks Joe. I found something to make your technique easier at
http://www.wbwip.com/wbw/emailencoder.html and have already used
it in anger just now and it is working on a client's site. I am
imagining it already side-swiping all attempts by vicious bots to
"harvest" it; I have the image of a rugby player at full bore
with the ball, fending off all attempts to tackle him. You know,
his arm and hand stretched out to push all comers away as Rugby
players do... (I know how analogies tickle you pink...)

Just what I wanted, someone to say something to get me going!

--
dorayme

Nikita the Spider

Oct 10, 2006, 11:27:53 PM
In article
<doraymeRidThis-BBF...@news-vip.optusnet.com.au>,
dorayme <dorayme...@optusnet.com.au> wrote:

> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages. Mostly interested to
> hear from anyone using specific methods (rather than anything
> else like further reviews, analyses of the ultimate effectiveness
> etc, having things like "removeThis" inside the email address
> that is in the "mailto:").

I've set up several spamtrap addresses to study this. Eventually I'll
write a short article about my findings, but in the meantime I'll
summarize here. I have three email addresses all on the same page. One
is naked (i.e. just f...@example.com), one is entity encoded (i.e.
&#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
The number of spams each has gotten to date is as follows:

naked - 715
entities - 2
javascript - 1

In short, the entities look pretty effective to me. They're nice because
they don't disturb one's visitors at all and you don't have to mess
around with any Javascript.

But another way of looking at it is to say that Javascript protection is
twice as effective as entity protection. =) (Thanks to Huff's "How to
Lie with Statistics")
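
A quick check of the arithmetic behind that joke, using only the three counts given above (the function name reduction_pct is made up for illustration). Measured against the naked address, both protected addresses received well over 99% fewer spams, and the gap between them is a single message.

```python
def reduction_pct(spams, baseline):
    """Percentage fewer spams than the baseline address received."""
    return 100.0 * (1 - spams / baseline)


counts = {"naked": 715, "entities": 2, "javascript": 1}
baseline = counts["naked"]

for method, n in counts.items():
    print(f"{method:<10} {n:>4} spams  {reduction_pct(n, baseline):6.2f}% fewer than naked")
```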

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

dorayme

Oct 11, 2006, 12:54:30 AM
In article
<NikitaTheSpider-D7...@news-rdr-02-ge0-1.southeast.rr.com>,
Nikita the Spider <NikitaT...@gmail.com> wrote:

> In article
> <doraymeRidThis-BBF...@news-vip.optusnet.com.au>,
> dorayme <dorayme...@optusnet.com.au> wrote:
>
> > Anyone here using methods to make it more difficult for spammers
> > to garner email addresses from web pages. Mostly interested to
> > hear from anyone using specific methods (rather than anything
> > else like further reviews, analyses of the ultimate effectiveness
> > etc, having things like "removeThis" inside the email address
> > that is in the "mailto:").
>
> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page. One
> is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
>
> naked - 715
> entities - 2
> javascript - 1
>
> In short, the entities look pretty effective to me. They're nice because
> they don't disturb one's visitors at all and you don't have to mess
> around with any Javascript.
>

Yes, excellent. My feelings too on this one.

> But another way of looking at it is to say that Javascript protection is
> twice as effective as entity protection. =) (Thanks to Huff's "How to
> Lie with Statistics")

People can and do look at things as they like! But the truth is
another matter.

It would be nice to actually know how the 2 and 1 got through...
This brings up this issue: just this morning, there was some post
here at alt.html re a facility to somehow capture material on a
screen (it is gone from my newsreader now). Though the email is
veiled in the source, it is not in the browser as expressed. It
is commonly just printed as normal on the screen. Surely this bit
can be avoided by simple techniques, like making the visible link
something like ...>email us</a>, to avoid any "on screen
harvesting"?

But, this is not always acceptable. I have no idea how the robots
work, how clever they are, whether they in fact look at source or
output or both. Your stats would be more meaningful if you could
say more about the implementation. Interesting experiment though,
Spider. Look forward to your article.

--
dorayme

Beauregard T. Shagnasty

Oct 11, 2006, 1:21:43 AM
Nikita the Spider wrote:

> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page.
> One is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
>
> naked - 715
> entities - 2
> javascript - 1

I'll agree that using entities works. I have one address on a web site
that began life in this form. Never got any spam in about six years.

Then one day, I started getting bounces from emails containing viruses.
I found out that someone who added my address to his address book got
infected. My address was used as a forged FROM: by this virus. Shortly
after that, I started to get spam and it's hovering around 200-250 per
day now. :-(

--
-bts
-Motorcycles defy gravity; cars just suck

jojo

Oct 11, 2006, 1:26:43 AM
Joe wrote:

> I've been using the 'hash entity' method for years. Seems to work, but
> as it's used in harness with spam filter on my email proggy and isp, I
> don't really know. I replace about half the addy with entities,
> especially the . and @ , but how far you go is up to you. It can't hurt
> to try.
> Anyway, check it at http://graspages.cjb.cc/emailme.php

You can improve that: use HTML-Entities for "mailto:" and hex-entities
(%41 for A) for the email-address itself.
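
A rough sketch of that combination, assuming Python and an ASCII-only address. The function name obfuscated_href and the sample address are hypothetical: "mailto:" is written as decimal character references and every byte of the address is percent-encoded (so A becomes %41).

```python
import html
from urllib.parse import unquote


def obfuscated_href(address):
    """Build an href value: entities for 'mailto:', percent-encoding for the address."""
    scheme = "".join("&#%d;" % ord(c) for c in "mailto:")
    # Assumes an ASCII address; each byte becomes a %XX triplet.
    target = "".join("%%%02X" % b for b in address.encode("ascii"))
    return scheme + target


href = obfuscated_href("foo@example.com")
print('<a href="%s">email us</a>' % href)

# Round trip: resolving the entities and then the percent-encoding
# gives back the ordinary mailto: URL.
print(unquote(html.unescape(href)))
```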

John Dunlop

Oct 11, 2006, 4:38:07 AM
[re e-mail address obfuscation]

jojo:

> You can improve that: use HTML-Entities for "mailto:" and hex-entities
> (%41 for A) for the email-address itself.

...the one going against if not the word then the spirit of HTML4.01,
the other against the spirit of RFC3986. Character references were
made for when it is inconvenient or impossible to enter a character
directly, for example, when there is no key for it on the keyboard or
the character isn't displayable.

| A given character encoding may not be able to express
| all characters of the document character set. For such
| encodings, or when hardware or software configurations
| do not allow users to input some document characters
| directly, authors may use SGML character references.

(HTML4.01 sec.5.3)

Percent-encoding characters that are allowed as data in a URL part
hinders transcription because characters that could otherwise be
recognisable and rememberable have been, unless you're familiar with
US-ASCII and hexadecimal notation, turned into unrecognisable and
harder-to-remember three-character sequences. That your browser
silently decodes percent-encodings and presents you with a more
human-friendly URL suggests that e-mail address harvesters can do a
similar job.
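
To illustrate how little work that decoding is, here is a minimal sketch using only the Python standard library. It is an assumption about what a decoder could look like, not a description of what any real harvester does; the function name find_addresses, the deliberately simple address pattern and the sample markup are made up for the example.

```python
import html
import re
from urllib.parse import unquote

# Deliberately simple pattern; real address syntax is broader than this.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_addresses(source):
    """Resolve character references and percent-encodings, then match addresses."""
    decoded = unquote(html.unescape(source))
    return ADDRESS.findall(decoded)


sample = (
    '<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;'
    '%66%6F%6F%40example.com">mail</a>'
)
print(find_addresses(sample))
```

Two standard-library calls undo both layers of encoding before the pattern is applied.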

Principles of URL design take into consideration human factors because
URLs are part of the user-interface. Obfuscating URLs with
percent-encodings makes things harder for humans while barely
increasing the hardship on e-mail address harvesters.

Obfuscation of e-mail addresses is just that: obfuscation. It does
nothing to help the genuine user find and use your e-mail address.
Attempts at obfuscating e-mail addresses - likewise attempts at
obfuscating markup - are trivial to bypass, even by e-mail address
harvesters. I should emphasize that I'm not saying that attempts at
obfuscation will universally fail, only that it takes little effort to
overcome them.

My advice, if you're not keen on actively fighting spam, would be to
either set up junk mail filters both at your server and at your MUA, or
remove the address from the public eye altogether.

--
Jock

Brian Cryer

Oct 11, 2006, 5:24:02 AM
"Nikita the Spider" <NikitaT...@gmail.com> wrote in message
news:NikitaTheSpider-D7...@news-rdr-02-ge0-1.southeast.rr.com...

> In article
> <doraymeRidThis-BBF...@news-vip.optusnet.com.au>,
> dorayme <dorayme...@optusnet.com.au> wrote:
>
>> Anyone here using methods to make it more difficult for spammers
>> to garner email addresses from web pages. Mostly interested to
>> hear from anyone using specific methods (rather than anything
>> else like further reviews, analyses of the ultimate effectiveness
>> etc, having things like "removeThis" inside the email address
>> that is in the "mailto:").
>
> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page. One
> is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
>
> naked - 715
> entities - 2
> javascript - 1

Given how easy it is to translate I'm amazed that the encoded version is so
effective. Just goes to show that spammers are stupid as well as sad.
--
Brian Cryer
www.cryer.co.uk/brian

Brian Cryer

Oct 11, 2006, 5:33:49 AM
"dorayme" <dorayme...@optusnet.com.au> wrote in message
news:doraymeRidThis-BBF...@news-vip.optusnet.com.au...

I'm sure you already know this, but: Whatever technique you decide to use
(unless you go the route of a better spam filter) be sure to ditch the
existing email address. Once you are on a spammer's mailing list it's unlikely
that you will ever get off it. So there is no point deploying a
"super-anti-spam" technique with an email address that already gets tons of
spam.
--
Brian Cryer
www.cryer.co.uk/brian


cwdjrxyz

Oct 11, 2006, 12:05:40 PM

Several methods that work at least somewhat have been mentioned. Most
of us likely need several email addresses. I have noticed that many
large companies use addresses that can not be answered for contacting
people. All questions have to go to the main address. Some like to use
CGI feedback forms without a mention of a specific address. However
this is not without risk, since a virus can be fed to a server in this
way unless the CGI feedback is very carefully constructed. There
are people who will put a scripted virus in the feedback box. Limiting
the size of the feedback and not allowing it to contain script helps in
this respect. And of course, do not use a good address on Usenet posts.
I use one at my domain for posting that does not allow any response -
everything is dumped. Then I have addresses used only for friends,
finance, etc. These seldom get spam, so I usually do not have to
configure to allow only mail from those on a list.
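
The two precautions mentioned for feedback forms (limiting the size and not allowing script) might look something like this in a Python handler. This is only a sketch under stated assumptions: the names clean_feedback and MAX_FEEDBACK_CHARS and the 2000-character limit are hypothetical, and escaping markup is one simple way to neutralise script, not a complete defence for a form handler.

```python
import html

MAX_FEEDBACK_CHARS = 2000  # hypothetical limit; pick whatever suits the form


def clean_feedback(text, limit=MAX_FEEDBACK_CHARS):
    """Truncate the submitted text, then escape markup so no script survives."""
    trimmed = text[:limit]
    # html.escape turns <, >, & and quotes into harmless entities.
    return html.escape(trimmed)


print(clean_feedback("<script>alert('hi')</script> Nice site!"))
```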

Nikita the Spider

Oct 11, 2006, 1:52:51 PM
In article <xKudndrPlrGuJbHY...@pipex.net>,
"Brian Cryer" <brian...@127.0.0.1.ntlworld.com> wrote:

I was also surprised by this result, but I can think of two reasons why
harvesting bots might ignore any non-naked addresses, even if they're
easy to translate. First, the harvesters might feel that anyone who is
savvy enough to obfuscate his email address isn't likely to respond to
spam anyway. Second, the harvesters might see no shortage of
un-obfuscated addresses, so why go to the trouble of harvesting the
small number of obfuscated ones? It's this latter theory that I prefer
because laziness is a powerful (and common) motivator.

Nikita the Spider

Oct 11, 2006, 2:00:53 PM
In article
<doraymeRidThis-690...@news-vip.optusnet.com.au>,
dorayme <dorayme...@optusnet.com.au> wrote:

> In article
> <NikitaTheSpider-D7...@news-rdr-02-ge0-1.southeast.rr.com>,
> Nikita the Spider <NikitaT...@gmail.com> wrote:
> > I've set up several spamtrap addresses to study this. Eventually I'll
> > write a short article about my findings, but in the meantime I'll
> > summarize here. I have three email addresses all on the same page. One
> > is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> > &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> > The number of spams each has gotten to date is as follows:
> >
> > naked - 715
> > entities - 2
> > javascript - 1
> >
> > In short, the entities look pretty effective to me. They're nice because
> > they don't disturb one's visitors at all and you don't have to mess
> > around with any Javascript.
> >

> It would be nice to actually know how the 2 and 1 got through...

One of the two was a standard 419 scam (see http://www.419eater.com/ if
you're not familiar with these) so I could believe that an actual human
clicked on the link. But the one that got through to both the
Javascript- and entity-protected addresses was a garden variety spam. It
really surprises me that I got only one. I figured that once I was on
the list, the floodgates would open.


> But, this is not always acceptable. I have no idea how the robots
> work, how clever they are, whether they in fact look at source or
> output or both.

I'd be surprised if any do more than look through the source.

> Your stats would be more meaningful if you could
> say more about the implementation. Interesting experiment though,
> Spider. Look forward to your article.

Thanks, will explain methodology, implementation, etc. and post a link
to the article here eventually.

John Dunlop

Oct 11, 2006, 2:22:22 PM
cwdjrxyz:

> And of course, do not use a good address on Usenet posts.

Rubbish.

--
Jock

dorayme

Oct 11, 2006, 5:58:19 PM
In article <m4OdnWRVb-X...@pipex.net>,
"Brian Cryer" <brian...@127.0.0.1.ntlworld.com> wrote:

> "dorayme" <dorayme...@optusnet.com.au> wrote in message
> news:doraymeRidThis-BBF...@news-vip.optusnet.com.au...
> > Anyone here using methods to make it more difficult for spammers
> > to garner email addresses from web pages. Mostly interested to
> > hear from anyone using specific methods (rather than anything
> > else like further reviews, analyses of the ultimate effectiveness
> > etc, having things like "removeThis" inside the email address
> > that is in the "mailto:").

> I'm sure you already know this, but: Whatever technique you decide to use
> (unless you go the route of a better spam filter) be sure to ditch the
> existing email address. Once you are on a spammer's mailing list it's unlikely
> that you will ever get off it. So there is no point deploying a
> "super-anti-spam" technique with an email address that already gets tons of
> spam.

I know what you mean. Looking on the bright side though, after a
while, without any response, without fresh harvesting, there
would start to be a reduction perhaps... after the point of
encoding provisions being made.

--
dorayme

dorayme

Oct 11, 2006, 6:20:22 PM
In article
<1160555887....@k70g2000cwa.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> [re e-mail address obfuscation]
>
> jojo:
>
> > You can improve that: use HTML-Entities for "mailto:" and hex-entities
> > (%41 for A) for the email-address itself.
>
> ...the one going against if not the word then the spirit of HTML4.01,
> the other against the spirit of RFC3986. Character references were
> made for when it is inconvenient or impossible to enter a character
> directly, for example, when there is no key for it on the keyboard or
> the character isn't displayable.
>

Ah but you see, it is like this Jock, recall, for example,
Mississippi Burning. Gene Hackman, second in command of an FBI
hunt is raring to bring in his team of ex-crim
mission-impossible not-totally-law-abiding but
now-on-the-side-of-the-good-guys to break the back of the
low-down no-good scumbag-leadership of the KKK responsible for a
triple murder. The FBI leader, Agent Alan Ward, makes your sort
of speech, and holds out for high principles and gets bloody
nowhere! Things start to happen soon as the fabulously
charismatic Hackman is allowed to follow his instincts.

> likewise attempts at
> obfuscating markup - are trivial to bypass, even by e-mail address
> harvesters. I should emphasize that I'm not saying that attempts at
> obfuscation will universally fail, only that it takes little effort to
> overcome them.
>

If it is so little effort, what is your theory about why it is so
effective (if it is as recent indications suggest)? Perhaps I can
help you:

Similar speeches are made like yours about the value of security
bars on windows and doors. "Ha", says my neighbour opposite, "I
could get through with a good crowbar in 15 secs!".

Sure he could - if he wants to die by the claws of my specially
and lovingly trained 16 year old cat.

The point is this though: robbers tend to go for the low lying
fruit first and there is plenty enough of that to go around. Do
you understand what I am saying? No need to crash through even
slightly heavier security.

--
dorayme

Jukka K. Korpela

Oct 11, 2006, 7:13:30 PM
Scripsit dorayme:

> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages.

Removing all of one's web pages is sometimes suggested as the only sure
method, but even it isn't sure at all, of course. Think about
www.archive.org.

> I had a client recently ask me to "do something" about the spam
> coming from his website.

Tell them to contact a specialist on such matters if they can't handle it.
Spam isn't an HTML problem any more than terrorism, lack of good sex, or poverty
is.

> I want to do better than tell him to get
> the best spam filter he can,

Why would you want to do better than the real thing? I guess you are
thinking of suggesting something _else_, like "email address protection"
snake oil. I hope you now realize how ridiculous the idea is.

Either they do some spam filtering, or they don't. Either way, email address
obfuscation does not protect them from spam but _will_ damage their
business by damaging communication, style, and impression.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

dorayme

Oct 11, 2006, 9:06:23 PM
In article <fUeXg.4244$U9....@reader1.news.jippii.net>,

"Jukka K. Korpela" <jkor...@cs.tut.fi> wrote:

> Scripsit dorayme:
>
> > Anyone here using methods to make it more difficult for spammers
> > to garner email addresses from web pages.

>

> > I had a client recently ask me to "do something" about the spam
> > coming from his website.
>
> Tell them to contact a specialist on such matters if they can't handle it.
> Spam isn't an HTML problem any more than terrorism, lack of good sex, or poverty
> is.
>

I have already said to do the spam filtering. It is the other bit
of what you say that I don't want to communicate. I don't
honestly. I know, you are right about an ideal world. If there is
something a little impure that helps, I will use it if all I see
are mainly theoretical objections.



> > I want to do better than tell him to get
> > the best spam filter he can,
>
> Why would you want to do better than the real thing? I guess you are
> thinking of suggesting something _else_, like "email address protection"
> snake oil. I hope you now realize how ridiculous the idea is.

Well, yes actually. But it really does not seem to me ridiculous,
even though it is not really kosher. What I do find ridiculous is
the idea of being purer than the practicalities dictate. When a
pedestrian stop light is on, Australians will tend to wait till
it goes green, even if there is not a car in sight. French people
are not so ridiculous and express surprise at this behaviour when
visiting here.

>
> Either they do some spam filtering, or they don't. Either way, email address
> obfuscation does not protect them from spam but _will_ damage their
> business by damaging communication, style, and impression.

Well, I would like to see the evidence for this as it might
relate to various cases in my patch. If you were right, it would
indeed be a reason not to.

I was aware of this response when I posted. And was not looking
forward to it. But I think you are right to have expressed it so
as to dampen any ideas that it is a wholesome thing to do. I have
no illusions: I am a fallen being.

As often though, I do think about what you say and will probably
end up further emphasising the proper way to go, ie. to put in
the best spam filters/blockers they can and to point them to
resources to do this... So, thank you.

--
dorayme

Joe

Oct 11, 2006, 9:16:38 PM
In article <doraymeRidThis-8ECD87.10273011102006@news-vip.optusnet.com.au>,
dorayme...@optusnet.com.au says...

> In article <MPG.1f96d6c2f...@news.aardvark.net.au>,
> Joe <joedi...@yahoo.com.au> wrote:
>
> > In article <doraymeRidThis-BBFC72.08183911102006@news-vip.optusnet.com.au>,
> > dorayme...@optusnet.com.au says...
> > > Anyone here using methods to make it more difficult for spammers
> > > to garner email addresses from web pages.
> > > ...
> > >
> > I've been using the 'hash entity' method for years.
> > Anyway, check it at http://graspages.cjb.cc/emailme.php
>
> Thanks Joe. I found something to make your technique easier at
> http://www.wbwip.com/wbw/emailencoder.html and have already used

shiny! and arguably better than my usual 'back of an envelope'
technique, which involves memorising "At 64 dot 46". Then I normally
have to look up 'a'.
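
For anyone who would rather not memorise the numbers, a couple of lines of Python will look them up. The helper name char_ref is made up for illustration; it just formats the character's code point as a decimal character reference.

```python
def char_ref(ch):
    """Return the decimal character reference for a single character."""
    return "&#%d;" % ord(ch)


# The three characters mentioned above: at, dot, and 'a'.
for ch in "@.a":
    print(ch, ord(ch), char_ref(ch))
```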


> his arm and hand stretched out to push all comers away as Rugby
> players do... (I know how analogies tickle you pink...)

pinking up nicely, ta.

>
> Just what I wanted, someone to say something to get me going!

my pleasure.

John Dunlop

Oct 12, 2006, 4:28:38 AM
dorayme:

[re overcoming e-mail address obfuscation]

> If it is so little effort, what is your theory about why it is so
> effective (if it is as recent indications suggest)? Perhaps I can
> help you:

No help needed, dorayme, thank you. Someone in this thread has already
advanced a plausible theory: laziness. Even the slightest extra
effort is too much because unobfuscated e-mail addresses are plentiful,
easy pickings even. No need to stretch.

> The point is this though: robbers tend to go for the low lying
> fruit first and there is plenty enough of that to go around. Do
> you understand what I am saying? No need to crash through even
> slightly heavier security.

Yes, but I am merely pointing out that obfuscating e-mail addresses is
inferior to real security; I am not claiming to know what harvesters
actually do!

Mind that old axiom 'security by obscurity gives a false sense of
security'?

And, as I've explained, the techniques to obfuscate e-mail addresses
proposed in this thread run contrary to the spirit of Internet
specifications. That a construct is included in a specification is
hardly license to exploit it.

Deal with spam at your end; don't pass the buck.

--
Jock

jojo

Oct 12, 2006, 6:18:46 AM
John Dunlop wrote:

> And, as I've explained, the techniques to obfuscate e-mail addresses
> proposed in this thread run contrary to the spirit of Internet
> specifications.

That's because spambots are against the spirit of the internet, too. If
the "dark side" does not follow the rules we don't have to follow them
either.

John Dunlop

Oct 12, 2006, 7:09:55 AM
jojo:

> That's because spambots are against the spirit of the internet, too.

Under discussion was not the spirit of the Internet but the spirit of
Internet specifications. What harvesters do does not run contrary to
the word or the spirit of the two specifications I mentioned. I would
maintain that what you proposed - replacing US-ASCII characters with
character references in HTML, and percent-encoding octets in URLs that
would otherwise be treated as data - does.

> If the "dark side" does not follow the rules we don't have to follow them
> either.

Come on. Internet specifications are a boon! If you fail to grasp the
advantages they bring - if you fail to imagine a WWW without them - why
wait until the "dark side" supposedly deviates from them before you
ignore them yourself?

Besides, in this war, there are more effective and less harmful
strategies than obfuscation.

--
Jock

Nikita the Spider

Oct 12, 2006, 12:27:14 PM
In article <1160641713....@m73g2000cwd.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> dorayme:
>
> [re overcoming e-mail address obfuscation]
>

> > The point is this though: robbers tend to go for the low lying
> > fruit first and there is plenty enough of that to go around. Do
> > you understand what I am saying? No need to crash through even
> > slightly heavier security.
>
> Yes, but I am merely pointing out that obfuscating e-mail addresses is
> inferior to real security; I am not claiming to know what harvesters
> actually do!

Myself, I'm pretty impressed by the fact that the entity-encoded address
received only two spams while its unprotected counterpart has received
over 700. If this method is inferior, I'd like to know to what! If there
are other methods that are equally easy to implement and don't
inconvenience users, I can't say I've heard of them.

> Mind that old axiom 'security by obscurity gives a false sense of
> security'?

I'd argue that we're not talking about security here so much as
annoyance reduction. I don't mean to nitpick about your words; I
honestly think the difference is important. Security prohibits access to
a resource and there are clear negative consequences when it fails (my
account is cracked, for example). By contrast, my inbox lost its spam
virginity a long time ago. All I can do now with the resources I have
available is to limit further, ahem, penetrations.

> And, as I've explained, the techniques to obfuscate e-mail addresses
> proposed in this thread run contrary to the spirit of Internet
> specifications. That a construct is included in a specification is
> hardly license to exploit it.

I see your point, but the spec isn't strongly worded. As you pointed
out, the relevant section is here:
http://www.w3.org/TR/html401/charset.html#h-5.3

"A given character encoding may not be able to express all characters of
the document character set. For such encodings, or when hardware or
software configurations do not allow users to input some document
characters directly, authors may use SGML character references."

But it also says this:
"Character references are a character encoding-independent mechanism for
entering any character from the document character set."

Using entities to encode email addresses fits perfectly well within this
provision, IMO.

Cheers

Chris F.A. Johnson

Oct 12, 2006, 1:09:52 PM
On 2006-10-12, John Dunlop wrote:
> dorayme:
>
> [re overcoming e-mail address obfuscation]
>
> And, as I've explained, the techniques to obfuscate e-mail addresses
> proposed in this thread run contrary to the spirit of Internet
> specifications. That a construct is included in a specification is
> hardly license to exploit it.

As NtS pointed out, that's not true (or, at least, debatable).

> Deal with spam at your end; don't pass the buck.

That's what obfuscated e-mail addresses do. Letting spam be
generated any more than necessary is passing the buck. The
important thing is to prevent it (as much as possible) in the first
place.


--
Chris F.A. Johnson <http://cfaj.freeshell.org>
===================================================================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)

John Dunlop

Oct 12, 2006, 3:57:08 PM
Chris F.A. Johnson:

> [John Dunlop:]
>
> > Deal with spam at your end; don't pass the buck.
>
> That's what obfuscated e-mail addresses do.

?? No. E-mail address obfuscation tries to deal with the problem at
the user's end. Its aim is to remove all trouble for the e-mail
address owner, no matter the cost to anyone else. If obfuscation dealt
with the problem at your end, it wouldn't be obfuscation since there
would be nothing to obfuscate.

> Letting spam be generated any more than necessary is passing the buck.

It is not your job to prevent spam being generated, unless you are
actively fighting against it, in which case there are better, more
effective approaches than e-mail address obfuscation.

--
Jock

John Dunlop

Oct 12, 2006, 3:59:08 PM
Nikita the Spider:

> Myself, I'm pretty impressed by the fact that the entity-encoded address
> received only two spams while its unprotected counterpart has received
> over 700. If this method is inferior, I'd like to know to what!

As mentioned now more than once in this thread: normal counter-spam
measures. That means junk mail filters both at the server and at the
MUA.

[re e-mail address obfuscation running contrary to the spirit of
Internet specs]

> I see your point, but the spec isn't strongly worded.

Well, every clause in the spec is vague enough to be open to
interpretation, however absurd.

I specifically talked not about the spec's wording but about its
spirit. To learn about the spirit of HTML you have to trace its
history: follow the past discussions, study the earlier drafts and
specifications, find out why the constructs were introduced in the
first place.

> As you pointed out, the relevant section is here:
>
> http://www.w3.org/TR/html401/charset.html#h-5.3

I quoted from there but did not mean that as the 'relevant section' to
learn why character references came about. You will find that not in
the HTML4.0 spec but in ISO8879 (my copy's at work and I haven't yet
memorised it all, so much to my consternation I can't give you chapter
and verse.)

> But it also says this:
> "Character references are a character encoding-independent mechanism for
> entering any character from the document character set."
>
> Using entities to encode email addresses fits perfectly well within this
> provision, IMO.

That's not even half the story.

--
Jock

dorayme

Oct 12, 2006, 5:32:41 PM
In article
<1160641713....@m73g2000cwd.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> dorayme:
>
> [re overcoming e-mail address obfuscation]
>
> > If it is so little effort, what is your theory about why it is so
> > effective (if it is as recent indications suggest)? Perhaps I can
> > help you:
>
> No help needed, dorayme, thank you. Someone in this thread has already
> advanced a plausible theory: laziness. Even the slightest extra
> effort is too much because unobfuscated e-mail addresses are plentiful,
> easy pickings even. No need to stretch.

I am in a picky mood, just excuse and ignore it: the lazy theory
is inadequate, not so plausible. You do need help. Go and study
the robber analogy of mine, the robber is not lazy. He can get
what he wants from unsecured houses. He is rationalising his
resources.

>
> > The point is this though: robbers tend to go for the low lying
> > fruit first and there is plenty enough of that to go around. Do
> > you understand what I am saying? No need to crash through even
> > slightly heavier security.
>
> Yes, but I am merely pointing out that obfuscating e-mail addresses is
> inferior to real security; I am not claiming to know what harvesters
> actually do!
>

You were giving a different impression to me at least. I was
getting a message from your words that it was ineffective, that
it would not deter. You did not make things so utterly clear. You
did not say out loud, yes, it will reduce spam but these are the
downsides... You gave the impression of conflating these issues.


> Mind that old axiom 'security by obscurity gives a false sense of
> security'?

<g> I have a car protection system I made myself that is a sort
of inverse of this! It consists of a "key" and "switch" that is
not hidden from view, it is just not obvious to anyone's mind. It
gives me a great sense of security and has worked on a number of
occasions, both on my car and my daughter's and a neighbour's...

> Deal with spam at your end; don't pass the buck.

It is not my spam. Tell that to my client. But, Jock, be careful,
he is 6 foot 8 inches and built like a brick shit-house, has red
hair and is not delicate, if you know what I mean. I think I will
use en encoding just on this occasion...

--
dorayme

Nikita the Spider

unread,
Oct 12, 2006, 6:26:44 PM10/12/06
to
In article <1160683148.6...@c28g2000cwb.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> Nikita the Spider:
>
> > Myself, I'm pretty impressed by the fact that the entity-encoded address
> > received only two spams while its unprotected counterpart has received
> > over 700. If this method is inferior, I'd like to know to what!
>
> mentioned now more than once in this thread: normal counter-spam
> measures. That means junk mail filters both at the server and at the
> MUA.

Hmmm, I guess we'll have to disagree on the criteria we use to measure
"inferior". Even the best mail filters can generate false positives,
which is something that an entity-encoded address won't do. And it'd
have to be a pretty darn effective filter (or set of filters) to achieve
what the entity encoding has done in this test. Furthermore, entity
encoding is something that any Web page author can do; the same can't be
said for setting up and tuning server-side filters. Last but not least,
entity encoding *prevents spam from being generated*. Mail filtering
doesn't do this. And if I just rely on my ISP's filters to handle my
spam for me, isn't that "passing the buck"?


> [re e-mail address obfuscation running contrary to the spirit of
> Internet specs]
>
> > I see your point, but the spec isn't strongly worded.
>
> Well, every clause in the spec is vague enough to be open to, however
> absurd, interpretation.

If you say so.

> I specifically talked not about the spec's wording but about its
> spirit. To learn about the spirit of HTML you have to trace its
> history: follow the past discussions, study the earlier drafts and
> specifications, find out why the constructs were introduced in the
> first place.
>
> > As you pointed out, the relevant section is here:
> >
> > http://www.w3.org/TR/html401/charset.html#h-5.3
>
> I quoted from there but did not mean that as the 'relevant section' to
> learn why character references came about. You will find that not in
> the HTML4.0 spec but in ISO8879 (my copy's at work and I haven't yet
> memorised it all, so much to my consternation I can't give you chapter
> and verse.)

I haven't read ISO8879. I'll grant that my opinion might change after
doing so. But of all of the abuses to which HTML has been and is
subjected (sending XHTML as text/html comes to mind), I find it hard to
believe that entity encoding email addresses would be in the top one
hundred of many people's lists, if at all.

Dan

unread,
Oct 12, 2006, 9:18:30 PM10/12/06
to

dorayme wrote:
> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages. Mostly interested to
> hear from anyone using specific methods (rather than anything
> else like further reviews, analyses of the ultimate effectiveness
> etc, having things like "removeThis" inside the email address
> that is in the "mailto:").

I personally find it aesthetically distasteful to do any sort of
obfuscation of addresses; it just seems to go against the grain of
Internet standards that have always been designed to keep things as
open as possible, not intentionally obscure. Some of the
character-encoding stuff I can more-or-less tolerate because you have
to view the source code to see that it's whacked out, but other things
like spelling out "address at something dot net", or putting in
signature notes like "remove 'x' from my address", or embedding an
address as a graphic, just rub my nose in the fact that it's being
intentionally made more difficult to use. That's the sort of thing up
with which I won't put.

--
Dan

dorayme

unread,
Oct 12, 2006, 10:17:24 PM10/12/06
to
In article
<1160702305....@k70g2000cwa.googlegroups.com>,
"Dan" <d...@tobias.name> wrote:

That is a fine speech. See my reference to Burning Mississippi. :)


But I agree that the seen email address should be normal
looking. There is a way around this, to not put any at all, just
a link, the words being, "email us" or whatever.

I would be interested to hear from anyone who has an idea of the
chances of email harvesting happening from the expressed text on
the page as distinct from the source. Without some idea of this
knowledge, one is less equipped to inform the good-guy dirty
tricks department. (If Spider's impressive figures are anything
to go on, it looks like these evil bots garner from the source
mainly)

--
dorayme

Nico Schuyt

unread,
Oct 12, 2006, 11:19:39 PM10/12/06
to
Nikita the Spider wrote:
> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page. One
> is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> &#x66;&#x6f;&#x6f; etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
> naked - 715
> entities - 2
> javascript - 1
> In short, the entities look pretty effective to me. They're nice
> because they don't disturb one's visitors at all and you don't have
> to mess around with any Javascript.
> But another way of looking at it is to say that Javascript protection
> is twice as effective as entity protection. =) (Thanks to Huff's "How
> to Lie with Statistics")

Both are unreliable. Even *I* can make a script that extracts email addresses
from JS or entity coded text :-)
Use a mail form.
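
A minimal sketch of the kind of script alluded to here, in Python rather than anything posted in the thread. It assumes the harvester simply decodes character references before pattern matching; the function name, the sample page and the deliberately simplified address pattern are all made up for illustration.

```python
import html
import re

# Deliberately simplified address pattern; real addresses allow more forms.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest_decoded(source):
    """Decode character references, then collect anything address-shaped."""
    decoded = html.unescape(source)
    return EMAIL_RE.findall(decoded)


# Hypothetical page fragment: a placeholder address written entirely as
# hexadecimal character references inside a mailto: link.
encoded = "".join(f"&#x{ord(ch):x};" for ch in "foo@example.com")
page = f'<a href="mailto:{encoded}">mail us</a>'

print(EMAIL_RE.findall(page))   # scanning the raw source finds nothing
print(harvest_decoded(page))    # decoding first recovers the address
```

The one extra call to html.unescape is all the "effort" involved, which is the point being argued: the encoding only helps for as long as harvesters skip that step.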

--
Nico Schuyt
http://www.nicoschuyt.nl/


dorayme

unread,
Oct 13, 2006, 12:00:05 AM10/13/06
to
In article <452f0584$0$50455$dbd4...@news.euronet.nl>,
"Nico Schuyt" <nsc...@hotmail.com> wrote:

Would you, Mr Korpela and Jock - you see, Nico what good company
you are in... :) - please not ignore the fact that it works to
actually stop spam. If you don't think it actually does, say so
loud and clear. The issue of it "being easy" to overcome is quite
irrelevant in a world where almost no bots do this. This is the
world you earthlings and I live in for the moment. What world are
you talking about? One in which Spider's stats are not true? In
this world it looks to me to be very reliable for now.

--
dorayme

Nico Schuyt

unread,
Oct 13, 2006, 3:16:18 AM10/13/06
to

Working *now* is no guarantee whatsoever for being effective in the near
future.

> The issue of it "being easy" to overcome is quite
> irrelevant in a world where almost no bots do this. This is the
> world you earthlings and I live in for the moment. What world are
> you talking about? One in which Spider's stats are not true?

Stats are never true :-)

> In this world it looks to me to be very reliable for now.

The place is right; it's the time that might be a problem.
Tomorrow I'll launch my new evil bot.

John Dunlop

unread,
Oct 13, 2006, 4:41:38 AM10/13/06
to
dorayme:

> [John Dunlop:]

> I am in a picky mood, just excuse and ignore it: the lazy theory
> is inadequate, not so plausible. You do need help. Go and study
> the robber analogy of mine, the robber is not lazy. He can get
> what he wants from unsecured houses. He is rationalising his
> resources.

Oops! Ok, so 'lazy' might not be the /mot juste/, as they say in the
Gorbals, but 'rationalising one's resources' seems to be more or less a
rehashing of the same theory, no? Anyway, it's one I'll have to
remember next time I'm asked to go to the gym.

> > Yes, but I am merely pointing out that obfuscating e-mail addresses is
> > inferior to real security; I am not claiming to know what harvesters
> > actually do!
>
> You were giving a different impression to me at least. I was
> getting a message from your words that it was ineffective, that
> it would not deter. You did not make things so utterly clear. You
> did not say out loud, yes, it will reduce spam but these are the
> downsides...

'I should emphasize that I'm not saying that attempts at obfuscation
will universally fail, only that it takes little effort to overcome
them.'

Does it reduce spam? It would seem to reduce the amount of spam that
that e-mail address owner receives, yes, but whether it makes an impact
on spam in the grand scheme of things, I don't know. Wouldn't a
harvester simply pick other addresses?

> You gave the impression of conflating these issues.

Ok. Let me list some options.

1. Obfuscate the address on the page:
a. munging
b. character references
c. percent-encodings
d. human-only addresses (e.g., 'user (at) host')
e. address written in javascript
2. Implement junk mail filters:
a. server filters
b. MUA filters
3. Remove all trace of the address.

Now my position regarding 1(b,c). Character references are the lesser
of the two evils, because while percent-encodings actually change the
URL for some degrees of equivalency, upsetting the user-interface,
character references don't.

But character references were 'intended to be used when you could not
otherwise enter a character conveniently in the text' (/The SGML
Handbook/ p. 356). I would be surprised if it inconvenienced you to
enter most US-ASCII characters directly.
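
Options 1(b) and 1(c) can be put side by side in a short sketch. This is an illustration in Python, not code from the thread; the helper names and the placeholder address are assumptions.

```python
import html
from urllib.parse import unquote


def as_char_refs(text):
    """Option 1(b): write every character as a decimal character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def as_percent_encoded(text):
    """Option 1(c): percent-encode every octet, unreserved ones included."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


address = "foo@example.com"  # placeholder address
refs = as_char_refs(address)
pct = as_percent_encoded(address)

# Character references are resolved when the HTML is parsed, so the URL
# the browser ends up with is the plain one.
print(refs)
print(html.unescape(refs))

# A percent-encoded string is a different sequence of characters in the
# URL itself and stays that way until something decodes it.
print(pct)
print(unquote(pct))
```

The difference in where the decoding happens (in the HTML parser versus in whatever consumes the URL) is what the "lesser of the two evils" remark rests on.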

> > Mind that old axiom 'security by obscurity gives a false sense of
> > security'?
>
> <g> I have a car protection system I made myself that is a sort
> of inverse of this! It consists of a "key" and "switch" that is
> not hidden from view, it is just not obvious to anyone's mind. It
> gives me a great sense of security and has worked on a number of
> occasions, both on my car and my daughter's and a neighbour's...

I could find other analogies such as hiding the backdoor key to your
house under a stone, or hiding the key to your car under a wheel arch,
but I'm not sure what you're getting at here. The sense of security
can be real but false.

> I think I will use en encoding just on this occasion...

If you feel the practical benefits of e-mail address obfuscation
outweigh the practical downsides - e.g., the impression of
unprofessionalism, the mangling of the user-interface by
percent-encoding - and the theoretical downsides, who am I to stand in
your way.

I suppose any persuasiveness I enjoyed must yield to Friday the 13th.

--
Jock

Nikita the Spider

unread,
Oct 13, 2006, 10:22:27 AM10/13/06
to
In article <452f3d17$0$53312$dbd4...@news.euronet.nl>,
"Nico Schuyt" <nsc...@hotmail.com> wrote:

> dorayme wrote:
> > "Nico Schuyt" <nsc...@hotmail.com> wrote:
>
> >> Nikita the Spider wrote:
> >>> I've set up several spamtrap addresses to study this. Eventually
> >>> I'll write a short article about my findings, but in the meantime
> >>> I'll summarize here. I have three email addresses all on the same
> >>> page. One is naked (i.e. just f...@example.com), one is entity
> >>> encoded (i.e. &#x66;&#x6f;&#x6f; etc.) and one is added to the
> >>> page by Javascript. The number of spams each has gotten to date is
> >>> as follows: naked - 715
> >>> entities - 2
> >>> javascript - 1
> >>> In short, the entities look pretty effective to me. They're nice
> >>> because they don't disturb one's visitors at all and you don't have
> >>> to mess around with any Javascript.
> >>> But another way of looking at it is to say that Javascript
> >>> protection is twice as effective as entity protection. =) (Thanks
> >>> to Huff's "How to Lie with Statistics")
>
> >> Both are unreliable. Even *I* can make a script that extracts email
> >> addresses from JS or entity coded text :-)
> >> Use a mail form.

A mail form != an email address hyperlink. The former is less convenient
for the user. Yes, email forms limit spam but so does putting one's
email address in an image instead of text, or writing "foo (at) example
dot com". As John Dunlop rightly said, that's "passing the buck" -- you
inconvenience the user. I want to avoid that if possible.

> > Would you, Mr Korpela and Jock - you see, Nico what good company
> > you are in... :) - please not ignore the fact that it works to
> > actually stop spam.
>

> Working *now* is no guarantee whatsoever for being effective in the near
> future.

The same could be said for all spam blocking methods (my Bayesian
filters used to work a lot better, for example). So should we
abandon all attempts to block spam because none of them are guaranteed?
Hmmmm, OK. But you go first. ;)

Nico Schuyt

unread,
Oct 13, 2006, 11:03:43 AM10/13/06
to
Nikita the Spider wrote:

> "Nico Schuyt" wrote:
>> dorayme wrote:
>>> "Nico Schuyt" wrote:
>
>>>> Nikita the Spider wrote:
>>>>> I've set up several spamtrap addresses to study this.
>>>>> [JS versus entity encoding]

>>>> Both are unreliable. Even *I* can make a script that extracts email
>>>> addresses from JS or entity coded text :-)
>>>> Use a mail form.

> A mail form != an email address hyperlink. The former is less
> convenient for the user.

Maybe it's more inconvenient for the user if he tries to contact you in an
internet cafe (no mail client)

> Yes, email forms limit spam but so does
> putting one's email address in an image instead of text, or writing
> "foo (at) example dot com".

Not so friendly for the visitor either

But I just applied your entity-encoding trick in a site where I needed an
e-mail address and didn't have time to install a form :-)
Thanks for the tip!

BTW for encoding of a string ($str) with the e-mail address into html
entities I used:
<?php
// Print every character of the address as a decimal character reference.
$str = "<e-mail address>";
for ($i = 0; $i < strlen($str); $i++) {
    printf('&#%03d;', ord($str[$i]));
}
?>

Harlan Messinger

unread,
Oct 13, 2006, 11:03:10 AM10/13/06
to

Not necessarily, and altogether false for users not using an e-mail
client on their local machine, e.g., all users of web-based mail
services, many users of computers at their work place, and all users of
computers at libraries, Internet cafes, etc.

Further, if you are interested in, or think you may ever be interested
in, capturing information from the user besides the message itself (how
did you hear about us? is this a bug report, a help request, or a new
feature suggestion?), then the form is the way to go.

Nikita the Spider

unread,
Oct 13, 2006, 2:33:40 PM10/13/06
to
In article <4p9o5iF...@individual.net>,
Harlan Messinger <hmessinger...@comcast.net> wrote:

> Nikita the Spider wrote:
> > A mail form != an email address hyperlink. The former is less convenient
> > for the user.
>
> Not necessarily, and altogether false for users not using an e-mail
> client on their local machine, e.g., all users of web-based mail
> services, many users of computers at their work place, and all users of
> computers at libraries, Internet cafes, etc.

Fair enough, I hadn't thought of those scenarios. But Web mail users
*do* have an email client on their local machine -- the browser.

> Further, if you are interested in, or think you may ever be interested
> in, capturing information from the user besides the message itself (how
> did you hear about us? is this a bug report, a help request, or a new
> feature suggestion?), then the form is the way to go.

Yes, some of these are good candidates for forms.

dorayme

unread,
Oct 13, 2006, 4:17:12 PM10/13/06
to
In article <452f3d17$0$53312$dbd4...@news.euronet.nl>,
"Nico Schuyt" <nsc...@hotmail.com> wrote:

> > Would you... please not ignore the fact that it works to
> > actually stop spam. If you don't think it actually does, say so
> > loud and clear.
>
> Working *now* is no guarantee whatsoever for being effective in the near
> future.
>

This is simply not true. If you had left it at the "no guarantee"
and not added the "whatsoever" you would have had a fighting
chance old chap.



> > The issue of it "being easy" to overcome is quite
> > irrelevant in a world where almost no bots do this. This is the
> > world you earthlings and I live in for the moment. What world are
> > you talking about? One in which Spider's stats are not true?
>
> Stats are never true :-)
>

Come now Nico, you can't believe this.

> > In this world it looks to me to be very reliable for now.
>
> The place is right; it's the time that might be a problem.
> Tomorrow I'll launch my new evil bot.

Ah... this is the kind of talk I like, evil talk. If you have
plans... all this is different...

--
dorayme

dorayme

unread,
Oct 13, 2006, 4:30:13 PM10/13/06
to
In article
<1160728898.2...@k70g2000cwa.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> If you feel the practical benefits of e-mail address obfuscation
> outweigh the practical downsides - e.g., the impression of
> unprofessionalism, the mangling of the user-interface by
> percent-encoding - and the theoretical downsides, who am I to stand in
> your way.

Well, this is what I would like to know more about. When I do
bad, I prefer not to be low class about it. I want to know what
evil I commit. What mangling are you talking about? I have
attempted a few times to raise the question about how bots work,
on the source code or the expressed page text (visible and
audible etc as normal text to humans). It seems it is mainly the
former. If so, who besides alt.html types will it seem so
unprofessional to?

--
dorayme

dorayme

unread,
Oct 13, 2006, 4:36:40 PM10/13/06
to
In article
<NikitaTheSpider-44...@news-rdr-01-ce0-1.southeast.rr.com>,
Nikita the Spider <NikitaT...@gmail.com> wrote:

> > Working *now* is no guarantee whatsoever for being effective in the near
> > future.
>
> The same could be said for all spam blocking methods (my Bayesian
> filters used to work a lot better, for example). So should we
> abandon all attempts to block spam because none of them are guaranteed?
> Hmmmm, OK. But you go first. ;)

Actually, Spider, I was just saying to a friend this morning, my
Mac Mail.app filters based on this type of mathematics are failing
me lately... bit alarming actually, I am thinking is it the junk
algorithms not learning any more (they used to be good) or are
the spammers just on to these algorithms bigtime now. Never mind,
clients, websites... I may need to actually buy a better spam set
up for me... I suppose this is OT! But I was interested to hear
your remark about Bayesian filters. Doubtless, there are all
kinds of these...

--
dorayme

Nico Schuyt

unread,
Oct 13, 2006, 5:09:11 PM10/13/06
to
dorayme wrote:
> "Nico Schuyt" <nsc...@hotmail.com> wrote:

>>> Would you... please not ignore the fact that it works to
>>> actually stop spam. If you don't think it actually does, say so
>>> loud and clear.

>> Working *now* is no guarantee whatsoever for being effective in
>> the near future.

> This is simply not true. If you had left it at the "no guarantee"
> and not added the "whatsoever" you would have had a fighting
> chance old chap.


Ah well, let's not argue. Like I mentioned in a later posting, I fully admit
the technique of Nikita is a good alternative for the JS-solution.

>>> The issue of it "being easy" to overcome is quite
>>> irrelevant in a world where almost no bots do this. This is the
>>> world you earthlings and I live in for the moment. What world are
>>> you talking about? One in which Spider's stats are not true?

>> Stats are never true :-)

> Come now Nico, you can't believe this.

But I really do :-) It's not the statistics, that's pure mathematics; it's
the uncertainty in the methods the data are collected.

>>> In this world it looks to me to be very reliable for now.

>> The place is right; it's the time that might be a problem.
>> Tomorrow I'll launch my new evil bot.

> Ah... this is the kind of talk I like, evil talk. If you have
> plans... all this is different...

Don't worry, no plans :-)

Harlan Messinger

unread,
Oct 13, 2006, 5:10:01 PM10/13/06
to
Nikita the Spider wrote:
> In article <4p9o5iF...@individual.net>,
> Harlan Messinger <hmessinger...@comcast.net> wrote:
>
>> Nikita the Spider wrote:
>>> A mail form != an email address hyperlink. The former is less convenient
>>> for the user.
>> Not necessarily, and altogether false for users not using an e-mail
>> client on their local machine, e.g., all users of web-based mail
>> services, many users of computers at their work place, and all users of
>> computers at libraries, Internet cafes, etc.
>
> Fair enough, I hadn't thought of those scenarios. But Web mail users
> *do* have an email client on their local machine -- the browser.

Well, of course I said it was false that an e-mail link is more
convenient, not false that it can be used! But it can't be used
directly. Clicking a link won't open the browser to a Compose Mail page
on the user's e-mail service. Instead, it may cause an error. Or, if an
e-mail client *is* installed, but configured for someone else, it could
open a new message window, letting the user cluelessly send an e-mail
under someone else's account.

dorayme

unread,
Oct 13, 2006, 5:52:02 PM10/13/06
to
In article <45300049$0$34459$dbd4...@news.euronet.nl>,
"Nico Schuyt" <nsc...@hotmail.com> wrote:

> >>> The issue of it "being easy" to overcome is quite
> >>> irrelevant in a world where almost no bots do this. This is the
> >>> world you earthlings and I live in for the moment. What world are
> >>> you talking about? One in which Spider's stats are not true?
>
> >> Stats are never true :-)
>
> > Come now Nico, you can't believe this.
>
> But I really do :-) It's not the statistics, that's pure mathematics; it's
> the uncertainty in the methods the data are collected.

Ah I see what you are saying I think. Yes, I would like to see
more data on these experiments. Spider has mentioned he will one
day do this. Perhaps time for a little experiment or two
ourselves to confirm... :)

--
dorayme

Joe

unread,
Oct 13, 2006, 7:17:18 PM10/13/06
to
In article <1160683028.6...@m73g2000cwd.googlegroups.com>,
usene...@john.dunlop.name says...

>
> ?? No. E-mail address obfuscation tries to deal with the problem at
> the user's end. Its aim is to remove all trouble for the e-mail
> address owner, no matter the cost to anyone else.
>

What exactly is the "cost to anyone else" of my choosing to use hash
entities to hide my email addy from bots while leaving it perfectly
clear to humans? Just curious.

...


>
> It is not your job to prevent spam being generated, unless you are
> actively fighting against it, in which case there are better, more
> effective approaches than e-mail address obfuscation.
>

Whose job do you suppose it is then?

Joe

unread,
Oct 13, 2006, 7:17:26 PM10/13/06
to
In article <doraymeRidThis-D1D293.12172413102006@news-
vip.optusnet.com.au>, dorayme...@optusnet.com.au says...

...


>
> But I agree that the seen email address should be normal
> looking. There is a way around this, to not put any at all, just
> a link, the words being, "email us" or whatever.
>

The problem with that, as I see it, is that people who might want to
email you but are not able to at that time (because they are in an
Internet Cafe, Library, or someplace else they can't send email from)
can't just write the addy on the back of an envelope and take it with
them. Anyway, there's something that inspires trust about an address
you can actually see - and if the 'bots have trouble, so much the
better.

> I would be interested to hear from anyone who has an idea of the
> chances of email harvesting happening from the expressed text on
> the page as distinct from the source.

but ... nah, you probably feel like a dill already.


John Dunlop

unread,
Oct 13, 2006, 8:19:23 PM10/13/06
to
Joe (GKF):

> What exactly is the "cost to anyone else" of my choosing to use hash
> entities to hide my email addy from bots while leaving it perfectly
> clear to humans? Just curious.

Probably nothing. What has that to do with the price of fish?

> > It is not your job to prevent spam being generated, unless you are
> > actively fighting against it, in which case there are better, more
> > effective approaches than e-mail address obfuscation.
>
> Whose job do you suppose it is then?

What do I care?

I don't get spam. I don't obfuscate my address.

--
Jock

dorayme

unread,
Oct 13, 2006, 8:23:54 PM10/13/06
to
In article <MPG.1f99af9a1...@news.aardvark.net.au>,
Joe (GKF) <joedi...@yahoo.com.au> wrote:

> In article <doraymeRidThis-D1D293.12172413102006@news-
> vip.optusnet.com.au>, dorayme...@optusnet.com.au says...
>
> ...
> >
> > But I agree that the seen email address should be normal
> > looking. There is a way around this, to not put any at all, just
> > a link, the words being, "email us" or whatever.
> >
>
> The problem with that, as I see it, is that people who might want to
> email you but are not able to at that time (because they are in an
> Internet Cafe, Library, or someplace else they can't send email from)
> can't just write the addy on the back of an envelope and take it with
> them. Anyway, there's something that inspires trust about an address
> you can actually see - and if the 'bots have trouble, so much the
> better.
>

Yes. You are right. And if the bots have trouble, so much the
better.

> > I would be interested to hear from anyone who has an idea of the
> > chances of email harvesting happening from the expressed text on
> > the page as distinct from the source.
>
> but ... nah, you probably feel like a dill already.

Not really (but that's the mark of a dill, you see).

Have this idea that the source is searched for addresses but that
the expressed text could be too ...

The simple fact is that I do not know how these bots work, do
they look in strings starting with "mailto:" or even simpler, any
"well-formed" ascii email string.

--
dorayme

dorayme

unread,
Oct 13, 2006, 8:27:47 PM10/13/06
to
In article
<1160785163....@h48g2000cwc.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

> I don't get spam.
>

OK Jock, time to spill the beans. No one simply just does not get
spam. There is a story behind how you do not get spam. What is
the story? Every single little secret please. Don't be shy now.

--
dorayme

John Dunlop

unread,
Oct 14, 2006, 3:15:20 AM10/14/06
to
dorayme:

> OK Jock, time to spill the beans. No one simply just does not get
> spam. There is a story behind how you do not get spam. What is
> the story? Every single little secret please. Don't be shy now.

If I told you, I would have to kill you.

--
Jock

John Dunlop

unread,
Oct 14, 2006, 4:05:30 AM10/14/06
to
dorayme:

> What mangling are you talking about?

Mangling URLs by percent-encoding octets that could have remained as
raw data.

| For consistency, percent-encoded octets in the ranges
| of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39),
| hyphen (%2D), period (%2E), underscore (%5F), or tilde
| (%7E) should not be created by URI producers and, when
| found in a URI, should be decoded to their corresponding
| unreserved characters by URI normalizers.

(RFC3986 : 2.3)

Two conditions:

1. Unreserved characters should not be percent-encoded.
2. If found, they should be decoded.

E-mail address obfuscation that percent-encodes the octets of
unreserved characters runs afoul of (1). And if 'global transcription'
is a consideration, anyone who obfuscates their address in this way
relies on (2).
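
Condition (2) can be sketched roughly as follows. This is an illustrative Python normalizer written for this note, not a reference implementation of RFC 3986; the function name is hypothetical, and only the unreserved set quoted above is taken from the RFC.

```python
import re

# The unreserved set from RFC 3986 section 2.3: ALPHA, DIGIT, "-", ".", "_", "~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

PCT_RE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri):
    """Decode only the percent-encoded octets that stand for unreserved
    characters; leave every other escape in place, with uppercase hex."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0).upper()
    return PCT_RE.sub(repl, uri)


# Placeholder address with letters, "@" and "." all percent-encoded.
print(decode_unreserved("mailto:%66%6F%6F%40example%2Ecom"))
```

The letters and the period come back as raw characters, while "%40" is left alone because "@" is a reserved character in RFC 3986. A harvester that applies this kind of normalization sees through most of the obfuscation; one that does not is left with the %xx form.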

> If so, who besides alt.html types will it seem so unprofessional to?

Anyone, I'd imagine, faced with a URL chock full of %xx. If it hasn't
yet been decoded.

--
Jock

jojo

unread,
Oct 14, 2006, 8:16:04 AM10/14/06
to
John Dunlop wrote:

>
> Besides, in this war, there are more effective and less harmful
> strategies than obfuscation.
>

Yes, I know them. They are called "spam-filters"... It wasn't my idea to
obfuscate the email address, I just pointed out a way how to do it if
you want to. And AFAIK there is no way of obfuscation that doesn't run
against the spirit of the internet specifications.

jojo

Nikita the Spider

unread,
Oct 14, 2006, 1:08:11 PM10/14/06
to
In article
<doraymeRidThis-040...@news-vip.optusnet.com.au>,
dorayme <dorayme...@optusnet.com.au> wrote:

I'm also using Mail.app. Many spams include random bits of prose
(non-spammy words) to offset the weight of the spammy content of the
email. This is a pretty effective technique against a lot of statistical
weighting filters, which is what I think Mail.app and lots of other
programs use.
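
A toy sketch of why padding with innocent words works against this kind of filter. It is written in Python, assumes a simple naive-Bayes style combination of per-word probabilities, and uses invented numbers throughout; it is not a description of what Mail.app actually does.

```python
import math

# Hypothetical per-word spam probabilities a trained filter might hold.
# The words and numbers are invented for illustration only.
WORD_SPAM_PROB = {
    "viagra": 0.99, "cheap": 0.90, "offer": 0.85,
    "meeting": 0.10, "garden": 0.15, "tuesday": 0.10, "recipe": 0.20,
}


def spam_score(words):
    """Sum per-word log-odds and map the total back to a probability.
    Words the filter has never seen count as neutral (0.5)."""
    log_odds = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word, 0.5)
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))


spammy = ["cheap", "viagra", "offer"]
padded = spammy + ["meeting", "garden", "tuesday", "recipe"]

print(spam_score(spammy))   # close to 1
print(spam_score(padded))   # pulled down by the innocent words
```

Each innocent word contributes negative log-odds, so enough of them drag the combined score back toward whatever threshold the filter uses.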

Nikita the Spider

unread,
Oct 14, 2006, 1:16:38 PM10/14/06
to
In article
<doraymeRidThis-B1C...@news-vip.optusnet.com.au>,

dorayme <dorayme...@optusnet.com.au> wrote:
> The simple fact is that I do not know how these bots work, do
> they look in strings starting with "mailto:" or even simpler, any
> "well-formed" ascii email string.

I'm sure there's a variety of them out there. I've gotten hits on URLs
before that are only expressed in HTML comments, which tells me that
some bots are not properly parsing the HTML but probably just scanning
the source for "<a" or "http://" and using that as their flag for a
link. I would think that some do the same for "mailto:" as you
suggested, or maybe just "@".
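
A sketch of the sort of crude scanner guessed at above, in Python. It is an assumption about how a lazy bot might behave, not an observed harvester: it pattern-matches the raw source without parsing HTML, so comments are fair game and character references are not decoded. The sample page and addresses are made up.

```python
import re

# Simplified address pattern applied to the raw, unparsed source text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def naive_scan(source):
    """Scan raw source text for address-shaped strings, comments included."""
    return EMAIL_RE.findall(source)


source = (
    '<!-- old contact: hidden@example.org -->\n'
    '<a href="mailto:plain@example.com">mail us</a>\n'
    # "mailto:foo@" written as decimal character references:
    '<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;'
    '&#102;&#111;&#111;&#64;example.net">mail us</a>\n'
)

print(naive_scan(source))
```

The address in the comment and the plain mailto: address are both picked up; the third link is missed because the raw source contains neither a literal "mailto:" nor a literal "@" for it.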

Nikita the Spider

unread,
Oct 14, 2006, 1:22:47 PM10/14/06
to
In article <45300049$0$34459$dbd4...@news.euronet.nl>,
"Nico Schuyt" <nsc...@hotmail.com> wrote:

> dorayme wrote:
> > "Nico Schuyt" <nsc...@hotmail.com> wrote:
> >> Stats are never true :-)
>
> > Come now Nico, you can't believe this.
>
> But I really do :-) It's not the statistics, that's pure mathematics; it's
> the uncertainty in the methods the data are collected.

Nico, my methods are perfect! Trust me! =)

Seriously, you're right, I already referred once in this thread to one
of my favorite books, How To Lie with Statistics. There are lots of ways
to mismeasure things and to misrepresent the measurements. I guess I
will write up my methods sooner rather than later since the topic is
fresh on my mind now. That way you can judge whether my findings have
any merit.

As dorayme suggested, I'd love to see others try the same test to see if
the results hold up. It could be that my corner of the Internet is just
populated by stupid bots.

dorayme

Oct 14, 2006, 4:38:59 PM
In article
<1160810120.0...@f16g2000cwb.googlegroups.com>,
"John Dunlop" <usene...@john.dunlop.name> wrote:

It may be worth it... <g>

--
dorayme

Joe

Oct 14, 2006, 8:55:21 PM
In article
<doraymeRidThis-B1C306.10235414102006@news-vip.optusnet.com.au>,
dorayme...@optusnet.com.au says...

> Have this idea that the source is searched for addresses but that
> the expressed text could be too ...
>
> The simple fact is that I do not know how these bots work, do
> they look in strings starting with "mailto:" or even simpler, any
> "well-formed" ascii email string.
>
>

I don't know the answer to that either, which is why I entity-encode
(&#nn;) both the "mailto:" stuff and the text that is to appear on the
page.
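
For anyone who wants to generate the encoded markup rather than type it by hand, a minimal sketch follows. It assumes every character gets replaced (encoding only some of them works the same way), and the function names and the address are just placeholders.

```python
def to_numeric_entities(text):
    """Replace every character with its decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)

def obfuscated_mailto_link(address):
    """Build a link whose href and visible text are both entity-encoded."""
    href = to_numeric_entities("mailto:" + address)
    label = to_numeric_entities(address)
    return f'<a href="{href}">{label}</a>'

print(obfuscated_mailto_link("someone@example.com"))
```

A browser decodes the references, so the visitor sees and clicks a normal address, while the raw source contains neither "mailto:" nor "@". A bot that decodes entities before scanning will still find it.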

dorayme

Oct 14, 2006, 10:27:56 PM
In article <MPG.1f9af5c7c...@news.aardvark.net.au>,
Joe (GKF) <joedi...@yahoo.com.au> wrote:

Me too.

OK Joe, I will confess something to you... you know how you said
I might be feeling like a dill... well a couple of things about
this:

(1) That's the sweetest thing you have ever said to me... you
little cucumber yourself...

but

(2) I was imagining different types of spam robots:

a. The sort that live in a little robot-house somewhere. They
have a little sleep and get up and have a little oil and turn on
their HTML-source-only browsers. Their job is to get the email
addresses by looking at source only. Some use javascript and
entity type decoders, some don't.

b. The sort that don't have some of their members using little
monitors. Their job is to get the email addresses by looking at
what is on their little screens. They are made a bit different to
the other robots.
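
For what it is worth, the two sorts can be sketched in a few lines. This is only a guess at how such robots might be built: one scans the raw source (with or without an entity decoder), the other looks at the text that would end up on the screen. All names and the example.com address are made up.

```python
import html
import re
from html.parser import HTMLParser

ADDRESS_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

class VisibleText(HTMLParser):
    """Collect text nodes, roughly what would end up on a screen."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def source_only(source, decode_entities=False):
    """Sort (a): scan the source, optionally running an entity decoder first."""
    if decode_entities:
        source = html.unescape(source)
    return set(ADDRESS_RE.findall(source))

def rendered_text(source):
    """Sort (b): scan only the text a visitor would actually see."""
    parser = VisibleText()
    parser.feed(source)
    parser.close()
    return set(ADDRESS_RE.findall("".join(parser.chunks)))

page = '<a href="&#109;ailto:joe&#64;example&#46;com">joe&#64;example&#46;com</a>'

print(source_only(page))                        # source robot, no decoder
print(source_only(page, decode_entities=True))  # source robot with decoder
print(rendered_text(page))                      # screen robot
```

In this sketch only the source robot without a decoder is fooled by the entities. The other two still get the address.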

Now, I reckon this is more than dillish, I boast that it is
outright idiocy. Never overestimate me. In my case, it is a
special martian trait, less is more.

--
dorayme

dorayme

Oct 14, 2006, 10:30:30 PM
In article
<doraymeRidThis-DD3...@news-vip.optusnet.com.au>,
dorayme <dorayme...@optusnet.com.au> wrote:

> b. The sort that don't have

b. the sort that do have

--
dorayme

MG

Oct 15, 2006, 4:49:22 AM