I had a client recently ask me to "do something" about the spam
coming from his website. I want to do better than tell him to get
the best spam filter he can, both on his local machine and on his server
end via his host. There is a JavaScript thing I used to use, but
these days I am interested in being able to get by without it.
So, any suggestions will be welcome, especially if they are
actually being used by the suggester (not someone else they know
or have heard of... take this as a compliment)
--
dorayme
Thanks Joe. I found something to make your technique easier at
http://www.wbwip.com/wbw/emailencoder.html and have already used
it in anger just now and it is working on a client's site. I am
imagining it already side-swiping all attempts by vicious bots to
"harvest" it, I have the image of a rugby player at full bore
with the ball, fending off all attempts to tackle him. You know,
his arm and hand stretched out to push all comers away as Rugby
players do... (I know how analogies tickle you pink...)
Just what I wanted, someone to say something to get me going!
--
dorayme
> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages. Mostly interested to
> hear from anyone using specific methods (rather than anything
> else like further reviews, analyses of the ultimate effectiveness
> etc, having things like "removeThis" inside the email address
> that is in the "mailto:").
I've set up several spamtrap addresses to study this. Eventually I'll
write a short article about my findings, but in the meantime I'll
summarize here. I have three email addresses all on the same page. One
is naked (i.e. just f...@example.com), one is entity encoded (i.e.
foo etc.) and one is added to the page by Javascript.
The number of spams each has gotten to date is as follows:
naked - 715
entities - 2
javascript - 1
In short, the entities look pretty effective to me. They're nice because
they don't disturb one's visitors at all and you don't have to mess
around with any Javascript.
But another way of looking at it is to say that Javascript protection is
twice as effective as entity protection. =) (Thanks to Huff's "How to
Lie with Statistics")
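The following minimal Python sketch shows how the naked and entity-encoded variants might be generated. It assumes decimal character references for every character. Encoders differ: some use hex, and some encode only a few characters. `user@example.com` is a hypothetical stand-in for the elided spamtrap addresses, and the JavaScript variant is not shown.

```python
import html

def entity_encode(text: str) -> str:
    """Replace every character with a decimal HTML character reference (&#NNN;)."""
    return "".join(f"&#{ord(ch)};" for ch in text)

# Hypothetical address; the real spamtrap addresses are not given in the thread.
address = "user@example.com"

# Naked version: the address appears verbatim in the source.
naked_link = f'<a href="mailto:{address}">{address}</a>'

# Entity-encoded version: no literal '@' or address text anywhere in the source.
encoded_link = (
    f'<a href="{entity_encode("mailto:" + address)}">{entity_encode(address)}</a>'
)

print(naked_link)
print(encoded_link)
# A browser decodes the references before display, as html.unescape does,
# so visitors see the same link either way.
print(html.unescape(encoded_link) == naked_link)  # True
```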
--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
> In article
> <doraymeRidThis-BBF...@news-vip.optusnet.com.au>,
> dorayme <dorayme...@optusnet.com.au> wrote:
>
> > Anyone here using methods to make it more difficult for spammers
> > to garner email addresses from web pages. Mostly interested to
> > hear from anyone using specific methods (rather than anything
> > else like further reviews, analyses of the ultimate effectiveness
> > etc, having things like "removeThis" inside the email address
> > that is in the "mailto:").
>
> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page. One
> is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> foo etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
>
> naked - 715
> entities - 2
> javascript - 1
>
> In short, the entities look pretty effective to me. They're nice because
> they don't disturb one's visitors at all and you don't have to mess
> around with any Javascript.
>
Yes, excellent. My feelings too on this one.
> But another way of looking at it is to say that Javascript protection is
> twice as effective as entity protection. =) (Thanks to Huff's "How to
> Lie with Statistics")
People can and do look at things as they like! But the truth is
another matter.
It would be nice to actually know how the 2 and 1 got through...
This brings up an issue: just this morning, there was a post
here at alt.html about a facility to capture material from the
screen (it is gone from my newsreader now). Though the email
address is veiled in the source, it is not veiled in the page as
the browser displays it; it is commonly just printed as normal on
the screen. Perhaps this bit can be avoided by simple techniques,
like making the visible link text something like ...>email us</a>,
to avoid any "on-screen harvesting"?
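The "email us" idea can be sketched in Python as follows. This is an illustration only. `contact_link` is a hypothetical helper, the address is a placeholder, and the sketch assumes the label is fixed, trusted text.

```python
def encode_refs(text: str) -> str:
    """Decimal HTML character references for every character in text."""
    return "".join(f"&#{ord(ch)};" for ch in text)

def contact_link(address: str, label: str = "email us") -> str:
    """Build a mailto link whose visible text never shows the address.

    The address appears only in the href, written as character references,
    so it is neither printed on screen nor present as plain text in the source.
    """
    return f'<a href="{encode_refs("mailto:" + address)}">{label}</a>'

print(contact_link("user@example.com"))  # hypothetical address
```

The trade-off is that visitors who want to note the address down, rather than click it, can no longer see it.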
But, this is not always acceptable. I have no idea how the robots
work, how clever they are, whether they in fact look at source or
output or both. Your stats would be more meaningful if you could
say more about the implementation. Interesting experiment though,
Spider. Look forward to your article.
--
dorayme
> I've set up several spamtrap addresses to study this. Eventually I'll
> write a short article about my findings, but in the meantime I'll
> summarize here. I have three email addresses all on the same page.
> One is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> foo etc.) and one is added to the page by Javascript.
> The number of spams each has gotten to date is as follows:
>
> naked - 715
> entities - 2
> javascript - 1
I'll agree that using entities works. I have one address on a web site
that began life in this form. Never got any spam in about six years.
Then one day, I started getting bounces from emails containing viruses.
I found out that someone who added my address to his address book got
infected. My address was used as a forged FROM: by this virus. Shortly
after that, I started to get spam and it's hovering around 200-250 per
day now. :-(
--
-bts
-Motorcycles defy gravity; cars just suck
> I've been using the 'hash entity' method for years. Seems to work, but
> as it's used in harness with spam filter on my email proggy and isp, I
> don't really know. I replace about half the addy with entities,
> especially the . and @ , but how far you go is up to you. It can't hurt
> to try.
> Anyway, check it at http://graspages.cjb.cc/emailme.php
You can improve that: use HTML-Entities for "mailto:" and hex-entities
(%41 for A) for the email address itself.
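Both variants described here can be sketched in Python. The first encodes only some characters: the quoted post mentions the `.` and `@`, i.e. `&#46;` and `&#64;`. The second is jojo's combination of character references for `mailto:` with percent-encoding for the address. The function names and address are hypothetical, and whether either variant is advisable is exactly what the thread goes on to dispute.

```python
def refs(text: str) -> str:
    """Decimal HTML character references for every character in text."""
    return "".join(f"&#{ord(ch)};" for ch in text)

def partial_refs(address: str, targets: str = "@.") -> str:
    """Encode only the chosen characters (by default '@' and '.')."""
    return "".join(f"&#{ord(ch)};" if ch in targets else ch for ch in address)

def percent_encode_all(address: str) -> str:
    """Percent-encode every octet of the UTF-8 address, e.g. 'A' -> '%41'."""
    return "".join(f"%{byte:02X}" for byte in address.encode("utf-8"))

def combined_href(address: str) -> str:
    """'mailto:' as character references, the address itself percent-encoded."""
    return refs("mailto:") + percent_encode_all(address)

addr = "user@example.com"  # hypothetical address
print(partial_refs(addr))  # user&#64;example&#46;com
print(combined_href(addr))
```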
jojo:
> You can improve that: use HTML-Entities for "mailto:" and hex-entities
> (%41 for A) for the email address itself.
...the one going against if not the word then the spirit of HTML4.01,
the other against the spirit of RFC3986. Character references were
made for when it is inconvenient or impossible to enter a character
directly, for example, when there is no key for it on the keyboard or
the character isn't displayable.
| A given character encoding may not be able to express
| all characters of the document character set. For such
| encodings, or when hardware or software configurations
| do not allow users to input some document characters
| directly, authors may use SGML character references.
(HTML4.01 sec.5.3)
Percent-encoding characters that are allowed as data in a URL part
hinders transcription, because characters that could otherwise be
recognisable and memorable have been turned, unless you're familiar
with US-ASCII and hexadecimal notation, into unrecognisable and
harder-to-remember three-character sequences. That your browser
silently decodes percent-encodings and presents you with a more
human-friendly URL suggests that e-mail address harvesters can do a
similar job.
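Jock's point, that a harvester can undo whatever the browser undoes, is easy to check. In Python, both layers of obfuscation come off with two standard-library calls, `html.unescape` and `urllib.parse.unquote`. This is a sketch that assumes Python 3.9 or later and uses a hypothetical address.

```python
import html
from urllib.parse import unquote

# Hypothetical href obfuscated as proposed: "mailto:" written as
# decimal character references, the address percent-encoded.
raw_href = (
    "&#109;&#97;&#105;&#108;&#116;&#111;&#58;"
    "%75%73%65%72%40%65%78%61%6D%70%6C%65%2E%63%6F%6D"
)

step1 = html.unescape(raw_href)          # undo the character references
step2 = unquote(step1)                   # undo the percent-encoding, as a browser does
address = step2.removeprefix("mailto:")  # str.removeprefix needs Python 3.9+
print(address)  # user@example.com
```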
Principles of URL design take into consideration human factors because
URLs are part of the user-interface. Obfuscating URLs with
percent-encodings makes things harder for humans while barely
increasing the hardship on e-mail address harvesters.
Obfuscation of e-mail addresses is just that: obfuscation. It does
nothing to help the genuine user find and use your e-mail address.
Attempts at obfuscating e-mail addresses - likewise attempts at
obfuscating markup - are trivial to bypass, even by e-mail address
harvesters. I should emphasize that I'm not saying that attempts at
obfuscation will universally fail, only that it takes little effort to
overcome them.
My advice, if you're not keen on actively fighting spam, would be to
either set up junk mail filters both at your server and at your MUA, or
remove the address from the public eye altogether.
--
Jock
Given how easy it is to translate, I'm amazed that the encoded version is so
effective. Just goes to show that spammers are stupid as well as sad.
--
Brian Cryer
www.cryer.co.uk/brian
I'm sure you already know this, but: Whatever technique you decide to use
(unless you go the route of a better spam filter) be sure to ditch the
existing email address. Once you are on a spammer's mailing list it's unlikely
that you will ever get off it. So there is no point deploying a
"super-anti-spam" technique with an email address that already gets tons of
spam.
--
Brian Cryer
www.cryer.co.uk/brian
Several methods that work at least somewhat have been mentioned. Most
of us likely need several email addresses. I have noticed that many
large companies contact people from addresses that cannot be replied
to; all questions have to go to the main address. Some like to use
CGI feedback forms without a mention of a specific address. However,
this is not without risk, since a virus can be fed to a server this
way unless the CGI feedback script is very carefully constructed. There
are people who will put a scripted virus in the feedback box. Limiting
the size of the feedback and not allowing it to contain script helps in
this respect. And of course, do not use a good address on Usenet posts.
I use one at my domain for posting that does not allow any response;
everything is dumped. Then I have addresses used only for friends,
finance, etc. These seldom get spam, so I usually do not have to
configure my filters to allow only mail from those on a list.
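The two precautions mentioned, limiting size and refusing script, might look like this in a Python form handler. This is a hedged sketch. `accept_feedback`, `safe_for_display` and the 2000-character limit are hypothetical. A substring check for `<script` is easy to evade, which is why the sketch also escapes markup before any submission is displayed.

```python
import html

MAX_FEEDBACK_CHARS = 2000  # hypothetical limit; choose one that suits the site

def accept_feedback(text: str, max_chars: int = MAX_FEEDBACK_CHARS) -> bool:
    """Apply two simple checks: a size limit and no script tags."""
    if len(text) > max_chars:
        return False
    # Crude, case-insensitive check; it will not catch every way of sneaking in script.
    if "<script" in text.lower():
        return False
    return True

def safe_for_display(text: str) -> str:
    """Escape markup so accepted feedback is shown as text, never run as script."""
    return html.escape(text)

print(accept_feedback("Please send a quote."))       # True
print(accept_feedback("<SCRIPT>alert(1)</SCRIPT>"))  # False
print(safe_for_display("<b>hi</b>"))
```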
I was also surprised by this result, but I can think of two reasons why
harvesting bots might ignore any non-naked addresses, even if they're
easy to translate. First, the harvesters might feel that anyone who is
savvy enough to obfuscate his email address isn't likely to respond to
spam anyway. Second, the harvesters might see no shortage of
un-obfuscated addresses, so why go to the trouble of harvesting the
small number of obfuscated ones? It's this latter theory that I prefer
because laziness is a powerful (and common) motivator.
> In article
> <NikitaTheSpider-D7...@news-rdr-02-ge0-1.southeast.rr.com>,
> Nikita the Spider <NikitaT...@gmail.com> wrote:
> > I've set up several spamtrap addresses to study this. Eventually I'll
> > write a short article about my findings, but in the meantime I'll
> > summarize here. I have three email addresses all on the same page. One
> > is naked (i.e. just f...@example.com), one is entity encoded (i.e.
> > foo etc.) and one is added to the page by Javascript.
> > The number of spams each has gotten to date is as follows:
> >
> > naked - 715
> > entities - 2
> > javascript - 1
> >
> > In short, the entities look pretty effective to me. They're nice because
> > they don't disturb one's visitors at all and you don't have to mess
> > around with any Javascript.
> >
> It would be nice to actually know how the 2 and 1 got through...
One of the two was a standard 419 scam (see http://www.419eater.com/ if
you're not familiar with these) so I could believe that an actual human
clicked on the link. But the one that got through to both the
Javascript- and entity-protected one was a garden variety spam. It
really surprises me that I got only one. I figured that once I was on
the list, the floodgates would open.
> But, this is not always acceptable. I have no idea how the robots
> work, how clever they are, whether they in fact look at source or
> output or both.
I'd be surprised if any do more than look through the source.
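Suppose harvesters only read the source. Then a bot as simple as the following Python sketch would reproduce the pattern in the experiment: it finds the naked address and walks straight past the entity-encoded one. The regular expression and page are hypothetical. The sketch illustrates the guess; it does not describe any real harvester.

```python
import re

# Deliberately simple pattern; real address syntax is much looser than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def naive_harvest(page_source: str) -> list[str]:
    """Scan raw HTML source for plain-text addresses, decoding nothing."""
    return EMAIL_RE.findall(page_source)

def refs(text: str) -> str:
    """Decimal HTML character references for every character in text."""
    return "".join(f"&#{ord(ch)};" for ch in text)

# Hypothetical page: one naked address and one entity-encoded address.
page = (
    '<a href="mailto:naked@example.com">naked@example.com</a>\n'
    f'<a href="{refs("mailto:hidden@example.com")}">{refs("hidden@example.com")}</a>\n'
)

print(naive_harvest(page))  # ['naked@example.com', 'naked@example.com']
```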
> Your stats would be more meaningful if you could
> say more about the implementation. Interesting experiment though,
> Spider. Look forward to your article.
Thanks, will explain methodology, implementation, etc. and post a link
to the article here eventually.
> And of course, do not use a good address on Usenet posts.
Rubbish.
--
Jock
> "dorayme" <dorayme...@optusnet.com.au> wrote in message
> news:doraymeRidThis-BBF...@news-vip.optusnet.com.au...
> > Anyone here using methods to make it more difficult for spammers
> > to garner email addresses from web pages. Mostly interested to
> > hear from anyone using specific methods (rather than anything
> > else like further reviews, analyses of the ultimate effectiveness
> > etc, having things like "removeThis" inside the email address
> > that is in the "mailto:").
> I'm sure you already know this, but: Whatever technique you decide to use
> (unless you go the route of a better spam filter) be sure to ditch the
> existing email address. Once you are on a spammer's mailing list it's unlikely
> that you will ever get off it. So there is no point deploying a
> "super-anti-spam" technique with an email address that already gets tons of
> spam.
I know what you mean. Looking on the bright side, though, once
the encoding is in place, with no responses and no fresh
harvesting, there would perhaps start to be a reduction after a
while...
--
dorayme
> [re e-mail address obfuscation]
>
> jojo:
>
> > You can improve that: use HTML-Entities for "mailto:" and hex-entities
> > (%41 for A) for the email address itself.
>
> ...the one going against if not the word then the spirit of HTML4.01,
> the other against the spirit of RFC3986. Character references were
> made for when it is inconvenient or impossible to enter a character
> directly, for example, when there is no key for it on the keyboard or
> the character isn't displayable.
>
Ah, but you see, it is like this, Jock: recall, for example,
Mississippi Burning. Gene Hackman, second in command of an FBI
hunt, is raring to bring in his team of ex-crim
mission-impossible not-totally-law-abiding but
now-on-the-side-of-the-good-guys to break the back of the
low-down no-good scumbag-leadership of the KKK responsible for a
triple murder. The FBI leader, Agent Alan Ward, makes your sort
of speech, and holds out for high principles and gets bloody
nowhere! Things start to happen soon as the fabulously
charismatic Hackman is allowed to follow his instincts.
> likewise attempts at
> obfuscating markup - are trivial to bypass, even by e-mail address
> harvesters. I should emphasize that I'm not saying that attempts at
> obfuscation will universally fail, only that it takes little effort to
> overcome them.
>
If it is so little effort, what is your theory about why it is so
effective (if it is as recent indications suggest)? Perhaps I can
help you:
Similar speeches are made like yours about the value of security
bars on windows and doors. "Ha", says my neighbour opposite, "I
could get through with a good crowbar in 15 secs!".
Sure he could - if he wants to die by the claws of my specially
and lovingly trained 16 year old cat.
The point is this though: robbers tend to go for the low lying
fruit first and there is plenty enough of that to go around. Do
you understand what I am saying? No need to crash through even
slightly heavier security.
--
dorayme
> Anyone here using methods to make it more difficult for spammers
> to garner email addresses from web pages.
Removing all of one's web pages is sometimes suggested as the only sure
method, but even it isn't sure at all, of course. Think about
www.archive.org.
> I had a client recently ask me to "do something" about the spam
> coming from his website.
Tell them to contact a specialist on such matters if they can't handle it.
Spam isn't an HTML problem any more than terrorism, lack of good sex, or poverty
is.
> I want to do better than tell him to get
> the best spam filter he can,
Why would you want to do better than the real thing? I guess you are
thinking of suggesting something _else_, like "email address protection"
snake oil. I hope you now realize how ridiculous the idea is.
Either they do some spam filtering, or they don't. Either way, email address
obfuscation does not protect them from spam but _will_ damage their
business by damaging communication, style, and impression.
--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
> Scripsit dorayme:
>
> > Anyone here using methods to make it more difficult for spammers
> > to garner email addresses from web pages.
>
> > I had a client recently ask me to "do something" about the spam
> > coming from his website.
>
> Tell them to contact a specialist on such matters if they can't handle it.
> Spam isn't an HTML problem any more than terrorism, lack of good sex, or poverty
> is.
>
I have already said to do the spam filtering. It is the other bit
of what you say that I don't want to communicate. I don't,
honestly. I know you are right about an ideal world. If there is
something a little impure that helps, I will use it if all I see
are mainly theoretical objections.
> > I want to do better than tell him to get
> > the best spam filter he can,
>
> Why would you want to do better than the real thing? I guess you are
> thinking of suggesting something _else_, like "email address protection"
> snake oil. I hope you now realize how ridiculous the idea is.
Well, yes actually. But it really does not seem to me ridiculous,
even though it is not really kosher. What I do find ridiculous is
the idea of being purer than the practicalities dictate. When a
pedestrian stop light is on, Australians will tend to wait till
it goes green, even if there is not a car in sight. French people
are not so ridiculous and express surprise at this behaviour when
visiting here.
>
> Either they do some spam filtering, or they don't. Either way, email address
> obfuscation does not protect them from spam but _will_ damage their
> business by damaging communication, style, and impression.
Well, I would like to see the evidence for this as it might
relate to various cases in my patch. If you were right, it would
indeed be a reason not to.
I was aware of this response when I posted. And was not looking
forward to it. But I think you are right to have expressed it so
as to dampen any ideas that it is a wholesome thing to do. I have
no illusions: I am a fallen being.
As often though, I do think about what you say and will probably
end up further emphasising the proper way to go, i.e. to put in
the best spam filters/blockers they can and to point them to
resources to do this... So, thank you.
--
dorayme
Shiny! And arguably better than my usual 'back of an envelope'
technique, which involves memorising "At 64 dot 46". Then I normally
have to look up 'a'.
> his arm and hand stretched out to push all comers away as Rugby
> players do... (I know how analogies tickle you pink...)
pinking up nicely, ta.
>
> Just what I wanted, someone to say something to get me going!
my pleasure.
[re overcoming e-mail address obfuscation]
> If it is so little effort, what is your theory about why it is so
> effective (if it is as recent indications suggest)? Perhaps I can
> help you:
No help needed, dorayme, thank you. Someone in this thread has already
advanced a plausible theory: laziness. Even the slightest extra
effort is too much because unobfuscated e-mail addresses are plentiful,
easy pickings even. No need to stretch.
> The point is this though: robbers tend to go for the low lying
> fruit first and there is plenty enough of that to go around. Do
> you understand what I am saying? No need to crash through even
> slightly heavier security.
Yes, but I am merely pointing out that obfuscating e-mail addresses is
inferior to real security; I am not claiming to know what harvesters
actually do!
Mind that old axiom 'security by obscurity gives a false sense of
security'?
And, as I've explained, the techniques to obfuscate e-mail addresses
proposed in this thread run contrary to the spirit of Internet
specifications. That a construct is included in a specification is
hardly license to exploit it.
Deal with spam at your end; don't pass the buck.
--
Jock
> And, as I've explained, the techniques to obfuscate e-mail addresses
> proposed in this thread run contrary to the spirit of Internet
> specifications.
That's because spambots are against the spirit of the internet, too. If
the "dark side" does not follow the rules we don't have to follow them
either.
> That's because spambots are against the spirit of the internet, too.
Under discussion was not the spirit of the Internet but the spirit of
Internet specifications. What harvesters do does not run contrary to
the word or the spirit of the two specifications I mentioned. I would
maintain that what you proposed - replacing US-ASCII characters with
character references in HTML, and percent-encoding octets in URLs that
would otherwise be treated as data - does.
> If the "dark side" does not follow the rules we don't have to follow them
> either.
Come on. Internet specifications are a boon! If you fail to grasp the
advantages they bring - if you fail to imagine a WWW without them - why
wait until the "dark side" supposedly deviates from them before you
ignore them yourself?
Besides, in this war, there are more effective and less harmful
strategies than obfuscation.
--
Jock
> dorayme:
>
> [re overcoming e-mail address obfuscation]
>
> > The point is this though: robbers tend to go for the low lying
> > fruit first and there is plenty enough of that to go around. Do
> > you understand what I am saying? No need to crash through even
> > slightly heavier security.
>
> Yes, but I am merely pointing out that obfuscating e-mail addresses is
> inferior to real security; I am not claiming to know what harvesters
> actually do!
Myself, I'm pretty impressed by the fact that the entity-encoded address
received only two spams while its unprotected counterpart has received
over 700. If this method is inferior, I'd like to know to what! If there
are other methods that are equally easy to implement and don't
inconvenience users, I can't say I've heard of them.
> Mind that old axiom 'security by obscurity gives a false sense of
> security'?
I'd argue that we're not talking about security here so much as
annoyance reduction. I don't mean to nitpick about your words; I
honestly think the difference is important. Security prohibits access to
a resource and there are clear negative consequences when it fails (my
account is cracked, for example). By contrast, my inbox lost its spam
virginity a long time ago. All I can do now with the resources I have
available is to limit further, ahem, penetrations.
> And, as I've explained, the techniques to obfuscate e-mail addresses
> proposed in this thread run contrary to the spirit of Internet
> specifications. That a construct is included in a specification is
> hardly license to exploit it.
I see your point, but the spec isn't strongly worded. As you pointed
out, the relevant section is here:
http://www.w3.org/TR/html401/charset.html#h-5.3
"A given character encoding may not be able to express all characters of
the document character set. For such encodings, or when hardware or
software configurations do not allow users to input some document
characters directly, authors may use SGML character references."
But it also says this:
"Character references are a character encoding-independent mechanism for
entering any character from the document character set."
Using entities to encode email addresses fits perfectly well within this
provision, IMO.
Cheers
As NtS pointed out, that's not true (or, at least, debatable).
> Deal with spam at your end; don't pass the buck.
That's what obfuscated e-mail addresses do. Letting spam be
generated any more than necessary is passing the buck. The
important thing is to prevent it (as much as possible) in the first
place.
--
Chris F.A. Johnson <http://cfaj.freeshell.org>
===================================================================
Author:
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
> [John Dunlop:]
>
> > Deal with spam at your end; don't pass the buck.
>
> That's what obfuscated e-mail addresses do.
?? No. E-mail address obfuscation tries to deal with the problem at
the user's end. Its aim is to remove all trouble for the e-mail
address owner, no matter the cost to anyone else. If obfuscation dealt
with the problem at your end, it wouldn't be obfuscation since there
would be nothing to obfuscate.
> Letting spam be generated any more than necessary is passing the buck.
It is not your job to prevent spam being generated, unless you are
actively fighting against it, in which case there are better, more
effective approaches than e-mail address obfuscation.
--
Jock
> Myself, I'm pretty impressed by the fact that the entity-encoded address
> received only two spams while its unprotected counterpart has received
> over 700. If this method is inferior, I'd like to know to what!
mentioned now more than once in this thread: normal counter-spam
measures. That means junk mail filters both at the server and at the
MUA.
[re e-mail address obfuscation running contrary to the spirit of
Internet specs]
> I see your point, but the spec isn't strongly worded.
Well, every clause in the spec is vague enough to be open to, however
absurd, interpretation.
I specifically talked not about the spec's wording but about its
spirit. To learn about the spirit of HTML you have to trace its
history: follow the past discussions, study the earlier drafts and
specifications, find out why the constructs were introduced in the
first place.
> As you pointed out, the relevant section is here:
>
> http://www.w3.org/TR/html401/charset.html#h-5.3
I quoted from there but did not mean that as the 'relevant section' to
learn why character references came about. You will find that not in
the HTML4.0 spec but in ISO8879 (my copy's at work and I haven't yet
memorised it all, so much to my consternation I can't give you chapter
and verse.)
> But it also says this:
> "Character references are a character encoding-independent mechanism for
> entering any character from the document character set."
>
> Using entities to encode email addresses fits perfectly well within this
> provision, IMO.
That's not even half the story.
--
Jock
> dorayme:
>
> [re overcoming e-mail address obfuscation]
>
> > If it is so little effort, what is your theory about why it is so
> > effective (if it is as recent indications suggest)? Perhaps I can
> > help you:
>
> No help needed, dorayme, thank you. Someone in this thread has already
> advanced a plausible theory: laziness. Even the slightest extra
> effort is too much because unobfuscated e-mail addresses are plentiful,
> easy pickings even. No need to stretch.
I am in a picky mood, just excuse and ignore it: the lazy theory
is inadequate, not so plausible. You do need help. Go and study
the robber analogy of mine, the robber is not lazy. He can get
what he wants from unsecured houses. He is rationalising his
resources.
>
> > The point is this though: robbers tend to go for the low lying
> > fruit first and there is plenty enough of that to go around. Do
> > you understand what I am saying? No need to crash through even
> > slightly heavier security.
>
> Yes, but I am merely pointing out that obfuscating e-mail addresses is
> inferior to real security; I am not claiming to know what harvesters
> actually do!
>
You were giving a different impression to me at least. I was
getting a message from your words that it was ineffective, that
it would not deter. You did not make things so utterly clear. You
did not say out loud, yes, it will reduce spam but these are the
downsides... You gave the impression of conflating these issues.
> Mind that old axiom 'security by obscurity gives a false sense of
> security'?
<g> I have a car protection system I made myself that is a sort
of inverse of this! It consists of a "key" and "switch" that is
not hidden from view, it is just not obvious to anyone's mind. It
gives me a great sense of security and has worked on a number of
occasions, both on my car and my daughter's and a neighbour's...
> Deal with spam at your end; don't pass the buck.
It is not my spam. Tell that to my client. But, Jock, be careful,
he is 6 foot 8 inches and built like a brick shit-house, has red
hair and is not delicate, if you know what I mean. I think I will
use an encoding just on this occasion...
--
dorayme
> Nikita the Spider:
>
> > Myself, I'm pretty impressed by the fact that the entity-encoded address
> > received only two spams while its unprotected counterpart has received
> > over 700. If this method is inferior, I'd like to know to what!
>
> mentioned now more than once in this thread: normal counter-spam
> measures. That means junk mail filters both at the server and at the
> MUA.
Hmmm, I guess we'll have to disagree on the criteria we use to measure
"inferior". Even the best mail filters can generate false positives,
which is something that an entity-encoded address won't do. And it'd
have to be a pretty darn effective filter (or set of filters) to achieve
what the entity encoding has done in this test. Furthermore, entity
encoding is something that any Web page author can do; the same can't be
said for setting up and tuning server-side filters. Last but not least,
entity encoding *prevents spam from being generated*. Mail filtering
doesn't do this. And if I just rely on my ISP's filters to handle my
spam for me, isn't that "passing the buck"?
> [re e-mail address obfuscation running contrary to the spirit of
> Internet specs]
>
> > I see your point, but the spec isn't strongly worded.
>
> Well, every clause in the spec is vague enough to be open to, however
> absurd, interpretation.
If you say so.
> I specifically talked not about the spec's wording but about its
> spirit. To learn about the spirit of HTML you have to trace its
> history: follow the past discussions, study the earlier drafts and
> specifications, find out why the constructs were introduced in the
> first place.
>
> > As you pointed out, the relevant section is here:
> >
> > http://www.w3.org/TR/html401/charset.html#h-5.3
>
> I quoted from there but did not mean that as the 'relevant section' to
> learn why character references came about. You will find that not in
> the HTML4.0 spec but in ISO8879 (my copy's at work and I haven't yet
> memorised it all, so much to my consternation I can't give you chapter
> and verse.)
I haven't read ISO8879. I'll grant that my opinion might change after
doing so. But of all of the abuses to which HTML has been and is
subjected (sending XHTML as text/html comes to mind), I find it hard to
believe that entity encoding email addresses would be in the top one
hundred of many people's lists, if at all.
I personally find it aesthetically distasteful to do any sort of
obfuscation of addresses; it just seems to go against the grain of
Internet standards that have always been designed to keep things as
open as possible, not intentionally obscure. Some of the
character-encoding stuff I can more-or-less tolerate because you have
to view the source code to see that it's whacked out, but other things
like spelling out "address at something dot net", or putting in
signature notes like "remove 'x' from my address", or embedding an
address as a graphic, just rub my nose in the fact that it's being
intentionally made more difficult to use. That's the sort of thing up
with which I won't put.
--
Dan
That is a fine speech. See my reference to Mississippi Burning. :)
But I agree that the seen email address should be normal
looking. There is a way around this, to not put any at all, just
a link, the words being, "email us" or whatever.
I would be interested to hear from anyone who has an idea of the
chances of email harvesting happening from the expressed text on
the page as distinct from the source. Without some idea of this
knowledge, one is less equipped to inform the good-guy dirty
tricks department. (If Spider's impressive figures are anything
to go on, it looks like these evil bots garner from the source
mainly)
--
dorayme
Both are unreliable. Even *I* can make script that extracts email addresses
from JS or entity coded text :-)
Use a mail form.
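Nico's claim is easy to demonstrate for the entity and percent-encoded variants: a naive source scanner needs only one extra decoding step. Below is a Python sketch with hypothetical names and pages. Addresses written out by JavaScript would need more work, such as running or pattern-matching the script, which is not shown.

```python
import html
import re
from urllib.parse import unquote

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def decoding_harvest(page_source: str) -> list[str]:
    """Decode character references and percent-escapes first, then scan."""
    decoded = unquote(html.unescape(page_source))
    return EMAIL_RE.findall(decoded)

# Hypothetical pages using the two obfuscations discussed in the thread.
entity_page = '<a href="{}">email us</a>'.format(
    "".join(f"&#{ord(ch)};" for ch in "mailto:hidden@example.com")
)
percent_page = (
    '<a href="mailto:%75%73%65%72%40%65%78%61%6D%70%6C%65%2E%63%6F%6D">mail</a>'
)

print(decoding_harvest(entity_page))   # ['hidden@example.com']
print(decoding_harvest(percent_page))  # ['user@example.com']
```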
--
Nico Schuyt
http://www.nicoschuyt.nl/
Would you, Mr Korpela and Jock - you see, Nico, what good company
you are in... :) - please not ignore the fact that it works to
actually stop spam. If you don't think it actually does, say so
loud and clear. The issue of it "being easy" to overcome is quite
irrelevant in a world where almost no bots do this. This is the
world you earthlings and I live in for the moment. What world are
you talking about? One in which Spider's stats are not true? In
this world it looks to me to be very reliable for now.
--
dorayme
Working *now* is no guarantee whatsoever of being effective in the near
future.
> The issue of it "being easy" to overcome is quite
> irrelevant in a world where almost no bots do this. This is the
> world you earthlings and I live in for the moment. What world are
> you talking about? One in which Spider's stats are not true?
Stats are never true :-)
> In this world it looks to me to be very reliable for now.
The place is right; it's the time that might be a problem.
Tomorrow I'll launch my new evil bot.
> [John Dunlop:]
> I am in a picky mood, just excuse and ignore it: the lazy theory
> is inadequate, not so plausible. You do need help. Go and study
> the robber analogy of mine, the robber is not lazy. He can get
> what he wants from unsecured houses. He is rationalising his
> resources.
Oops! Ok, so 'lazy' might not be the /mot juste/, as they say in the
Gorbals, but 'rationalising one's resources' seems to be more or less a
rehashing of the same theory, no? Anyway, it's one I'll have to
remember next time I'm asked to go to the gym.
> > Yes, but I am merely pointing out that obfuscating e-mail addresses is
> > inferior to real security; I am not claiming to know what harvesters
> > actually do!
>
> You were giving a different impression to me at least. I was
> getting a message from your words that it was ineffective, that
> it would not deter. You did not make things so utterly clear. You
> did not say out loud, yes, it will reduce spam but these are the
> downsides...
'I should emphasize that I'm not saying that attempts at obfuscation
will universally fail, only that it takes little effort to overcome
them.'
Does it reduce spam? It would seem to reduce the amount of spam that
that e-mail address owner receives, yes, but whether it makes an impact
on spam in the grand scheme of things, I don't know. Wouldn't a
harvester simply pick other addresses?
> You gave the impression of conflating these issues.
Ok. Let me list some options.
1. Obfuscate the address on the page:
a. munging
b. character references
c. percent-encodings
d. human-only addresses (e.g., 'user (at) host')
e. address written in javascript
2. Implement junk mail filters:
a. server filters
b. MUA filters
3. Remove all trace of the address.
Now my position regarding 1(b,c). Character references are the lesser
of the two evils, because percent-encodings actually change the URL (it
remains equivalent only under some degrees of equivalency), upsetting
the user interface, whereas character references don't.
But character references were 'intended to be used when you could not
otherwise enter a character conveniently in the text' (/The SGML
Handbook/ p. 356). I would be surprised if it inconvenienced you to
enter most US-ASCII characters directly.
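To make the difference between 1(b) and 1(c) concrete, here is a minimal PHP sketch for a plain ASCII address; the helper names to_char_refs() and to_percent() are hypothetical. Character references are resolved by the HTML parser, so the href the browser ends up with is the ordinary mailto: URL. Percent-encodings survive HTML parsing and become part of the URL text itself, which is where the user-interface effects Jock mentions may come from.

```php
<?php
// Hypothetical helpers contrasting options 1(b) and 1(c) for an ASCII address.

// 1(b): each byte as a decimal numeric character reference.
// The HTML parser turns these back into characters, so the URL is unchanged.
function to_char_refs(string $s): string
{
    $out = '';
    for ($i = 0; $i < strlen($s); $i++) {
        $out .= '&#' . ord($s[$i]) . ';';
    }
    return $out;
}

// 1(c): each byte as a URL percent-encoding (%XX, uppercase hex).
// The HTML parser leaves these alone; they become part of the URL text.
function to_percent(string $s): string
{
    $out = '';
    for ($i = 0; $i < strlen($s); $i++) {
        $out .= sprintf('%%%02X', ord($s[$i]));
    }
    return $out;
}

$addr = 'foo@example.com';
// Link text is entity-encoded in both cases; only the href differs.
echo '<a href="mailto:' . to_char_refs($addr) . '">' . to_char_refs($addr) . "</a>\n";
echo '<a href="mailto:' . to_percent($addr) . '">' . to_char_refs($addr) . "</a>\n";
```

Either form is reversed by one standard call (html_entity_decode() or rawurldecode()), so neither adds real protection; they differ only in how much of the encoding leaks through to the user.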
> > Mind that old axiom 'security by obscurity gives a false sense of
> > security'?
>
> <g> I have a car protection system I made myself that is a sort
> of inverse of this! It consists of a "key" and "switch" that is
> not hidden from view, it is just not obvious to anyone's mind. It
> gives me a great sense of security and has worked on a number of
> occasions, both on my car and my daughter's and a neighbour's...
I could find other analogies such as hiding the backdoor key to your
house under a stone, or hiding the key to your car under a wheel arch,
but I'm not sure what you're getting at here. The sense of security
can be real but false.
> I think I will use en encoding just on this occasion...
If you feel the practical benefits of e-mail address obfuscation
outweigh the practical downsides - e.g., the impression of
unprofessionalism, the mangling of the user-interface by
percent-encoding - and the theoretical downsides, who am I to stand in
your way.
I suppose any persuasiveness I enjoyed must yield to Friday the 13th.
--
Jock
> dorayme wrote:
> > "Nico Schuyt" <nsc...@hotmail.com> wrote:
>
> >> Nikita the Spider wrote:
> >>> I've set up several spamtrap addresses to study this. Eventually
> >>> I'll write a short article about my findings, but in the meantime
> >>> I'll summarize here. I have three email addresses all on the same
> >>> page. One is naked (i.e. just f...@example.com), one is entity
> >>> encoded (i.e. foo etc.) and one is added to the
> >>> page by Javascript. The number of spams each has gotten to date is
> >>> as follows: naked - 715
> >>> entities - 2
> >>> javascript - 1
> >>> In short, the entities look pretty effective to me. They're nice
> >>> because they don't disturb one's visitors at all and you don't have
> >>> to mess around with any Javascript.
> >>> But another way of looking at it is to say that Javascript
> >>> protection is twice as effective as entity protection. =) (Thanks
> >>> to Huff's "How to Lie with Statistics")
>
> >> Both are unreliable. Even *I* can make script that extracts email
> >> addresses from JS or entity coded text :-)
> >> Use a mail form.
A mail form != an email address hyperlink. The former is less convenient
for the user. Yes, email forms limit spam but so does putting one's
email address in an image instead of text, or writing "foo (at) example
dot com". As John Dunlop rightly said, that's "passing the buck" -- you
inconvenience the user. I want to avoid that if possible.
> > Would you, Mr Korpela and Jock - you see, Nico, what good company
> > you are in... :) - please don't ignore the fact that it works to
> > actually stop spam.
>
> Working *now* is no guarantee whatsoever of being effective in the near
> future.
The same could be said for all spam blocking methods (my Bayesian
filters used to work a lot better, for example). So should we
abandon all attempts to block spam because none of them are guaranteed?
Hmmmm, OK. But you go first. ;)
>>>> Both are unreliable. Even *I* can make script that extracts email
>>>> addresses from JS or entity coded text :-)
>>>> Use a mail form.
> A mail form != an email address hyperlink. The former is less
> convenient for the user.
Maybe it's more inconvenient for the user if he tries to contact you from an
Internet cafe (no mail client).
> Yes, email forms limit spam but so does
> putting one's email address in an image instead of text, or writing
> "foo (at) example dot com".
Not so friendly for the visitor either.
But I just applied your entity-encoding trick on a site where I needed an
e-mail address and didn't have time to install a form :-)
Thanks for the tip!
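BTW, to encode a string ($str) containing the e-mail address as HTML
character references, I used:
<?php
// Print each byte of $str as a decimal numeric character reference,
// e.g. "f" becomes &#102; (%03d pads to three digits: "@" -> &#064;).
$str = "<e-mail address>";
for ($i = 0; $i < strlen($str); $i++) {
    printf('&#%03d;', ord($str[$i]));
}
?>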
Not necessarily, and altogether false for users not using an e-mail
client on their local machine, e.g., all users of web-based mail
services, many users of computers at their work place, and all users of
computers at libraries, Internet cafes, etc.
Further, if you are interested in, or think you may ever be interested
in, capturing information from the user besides the message itself (how
did you hear about us? is this a bug report, a help request, or a new
feature suggestion?), then the form is the way to go.
> Nikita the Spider wrote:
> > A mail form != an email address hyperlink. The former is less convenient
> > for the user.
>
> Not necessarily, and altogether false for users not using an e-mail
> client on their local machine, e.g., all users of web-based mail
> services, many users of computers at their work place, and all users of
> computers at libraries, Internet cafes, etc.
Fair enough, I hadn't thought of those scenarios. But Web mail users
*do* have an email client on their local machine -- the browser.
> Further, if you are interested in, or think you may ever be interested
> in, capturing information from the user besides the message itself (how
> did you hear about us? is this a bug report, a help request, or a new
> feature suggestion?), then the form is the way to go.
Yes, some of these are good candidates for forms.
> > Would you... please don't ignore the fact that it works to
> > actually stop spam. If you don't think it actually does, say so
> > loud and clear.
>
> Working *now* is no guarantee whatsoever of being effective in the near
> future.
>
This is simply not true. If you had left it at "no guarantee"
and not added the "whatsoever", you would have had a fighting
chance, old chap.
> > The issue of it "being easy" to overcome is quite
> > irrelevant in a world where almost no bots do this. This is the
> > world you earthlings and I live in for the moment. What world are
> > you talking about? One in which Spider's stats are not true?
>
> Stats are never true :-)
>
Come now Nico, you can't believe this.
> > In this world it looks to me to be very reliable for now.
>
> The place is right; it's the time that might be a problem.
> Tomorrow I'll launch my new evil bot.
Ah... this is the kind of talk I like, evil talk. If you have
plans... all this is different...
--
dorayme
> If you feel the practical benefits of e-mail address obfuscation
> outweigh the practical downsides - e.g., the impression of
> unprofessionalism, the mangling of the user-interface by
> percent-encoding - and the theoretical downsides, who am I to stand in
> your way.
Well, this is what I would like to know more about. When I do
bad, I prefer not to be low class about it. I want to know what
evil I commit. What mangling are you talking about? I have
attempted a few times to raise the question about how bots work,
on the source code or the expressed page text (visible and
audible etc as normal text to humans). It seems it is mainly the
former. If so, who besides alt.html types will it seem so
unprofessional to?
--
dorayme
> > Working *now* is no guarantee whatsoever of being effective in the near
> > future.
>
> The same could be said for all spam blocking methods (my Bayesian
> filters used to work a lot better, for example). So should we
> abandon all attempts to block spam because none of them are guaranteed?
> Hmmmm, OK. But you go first. ;)
Actually, Spider, I was just saying to a friend this morning that my
Mac Mail.app filters, which are based on this type of mathematics, are
failing me lately... a bit alarming actually. I am wondering: is it the
junk algorithms not learning any more (they used to be good), or are
the spammers just on to these algorithms bigtime now? Never mind
clients and websites... I may need to actually buy a better spam setup
for myself... I suppose this is OT! But I was interested to hear
your remark about Bayesian filters. Doubtless, there are all
kinds of these...
--
dorayme
>>> Would you... please don't ignore the fact that it works to
>>> actually stop spam. If you don't think it actually does, say so
>>> loud and clear.
>> Working *now* is no guarantee whatsoever of being effective in
>> the near future.
> This is simply not true. If you had left it at "no guarantee"
> and not added the "whatsoever", you would have had a fighting
> chance, old chap.
Ah well, let's not argue. As I mentioned in a later posting, I fully admit
Nikita's technique is a good alternative to the JS solution.
>>> The issue of it "being easy" to overcome is quite
>>> irrelevant in a world where almost no bots do this. This is the
>>> world you earthlings and I live in for the moment. What world are
>>> you talking about? One in which Spider's stats are not true?
>> Stats are never true :-)
> Come now Nico, you can't believe this.
But I really do :-) It's not the statistics (that's pure mathematics); it's
the uncertainty in the methods by which the data are collected.
>>> In this world it looks to me to be very reliable for now.
>> The place is right; it's the time that might be a problem.
>> Tomorrow I'll launch my new evil bot.
> Ah... this is the kind of talk I like, evil talk. If you have
> plans... all this is different...
Don't worry, no plans :-)
Well, of course I said it was false that an e-mail link is more
convenient, not false that it can be used! But it can't be used
directly. Clicking a link won't open the browser to a Compose Mail page
on the user's e-mail service. Instead, it may cause an error. Or, if an
e-mail client *is* installed, but configured for someone else, it could
open a new message window, letting the user cluelessly send an e-mail
under someone else's account.
> >>> The issue of it "being easy" to overcome is quite
> >>> irrelevant in a world where almost no bots do this. This is the
> >>> world you earthlings and I live in for the moment. What world are
> >>> you talking about? One in which Spider's stats are not true?
>
> >> Stats are never true :-)
>
> > Come now Nico, you can't believe this.
>
> But I really do :-) It's not the statistics (that's pure mathematics); it's
> the uncertainty in the methods by which the data are collected.
Ah I see what you are saying I think. Yes, I would like to see
more data on these experiments. Spider has mentioned he will one
day do this. Perhaps time for a little experiment or two
ourselves to confirm... :)
--
dorayme
>
> ?? No. E-mail address obfuscation tries to deal with the problem at
> the user's end. Its aim is to remove all trouble for the e-mail
> address owner, no matter the cost to anyone else.
>
What exactly is the "cost to anyone else" of my choosing to use hash
entities to hide my email addy from bots while leaving it perfectly
clear to humans? Just curious.
...
>
> It is not your job to prevent spam being generated, unless you are
> actively fighting against it, in which case there are better, more
> effective approaches than e-mail address obfuscation.
>
Whose job do you suppose it is then?
...
>
> But I agree that the seen email address should be normal
> looking. There is a way around this, to not put any at all, just
> a link, the words being, "email us" or whatever.
>
The problem with that, as I see it, is that people who might want to
email you but are not able to at that time (because they are in an
Internet Cafe, Library, or someplace else they can't send email from)
can't just write the addy on the back of an envelope and take it with
them. Anyway, there's something that inspires trust about an address
you can actually see - and if the 'bots have trouble, so much the
better.
> I would be interested to hear from anyone who has an idea of the
> chances of email harvesting happening from the expressed text on
> the page as disti