Effective methods to protect email addresses from spammers
http://nadeausoftware.com/articles/2007/05/effective_methods_protect_email_addresses_spammers
Out of all of these methods, one method stands out as being effective,
usable, accessible, functional in all web browsers, JavaScript-free,
plugin-free, and easy to author and maintain:
Split the email address onto two lines
like this:
User: person
Domain:
example.com
However, keep in mind that while an automated harvester may be stopped
by this and some of the other methods, a human harvester will get them
all. No matter how well you protect your address, you’ll probably
still get some spam. Use a spam filter for your email program. A 2005
U.S. Federal Trade Commission study, Email Address Harvesting and the
Effectiveness of Anti-Spam Filters (PDF), found that 95% of spam could
be stopped by a spam filter.
- - - - - - - - -- - - - - - -
User: dean
Domain:
example.com
or?
dean
example.com
or?
dean @
example.com
- - - - - - - - - - - - - -
I use the first format above. Your second format should work fine, but
I'd avoid the third one. The "@" notifies harvesters that an email
address is present, and it isn't hard then to extract the previous and
following words, ignoring white-space and HTML tags.
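To make the "@" problem concrete, here is a minimal Python sketch of
how a harvester might use the "@" as an anchor. The regular
expressions and the harvest() function are hypothetical illustrations,
not code taken from any real harvester, and real tools likely differ.

```python
import re

# Hypothetical harvester logic, for illustration only.
TAG = re.compile(r"<[^>]*>")
# A word, an "@" with optional white-space around it, then a dotted domain.
AT_ANCHORED = re.compile(
    r"([A-Za-z0-9._%+-]+)\s*@\s*([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def harvest(html: str) -> list[str]:
    """Replace tags with spaces, then join the words around each "@"."""
    text = TAG.sub(" ", html)
    return [f"{user}@{domain}" for user, domain in AT_ANCHORED.findall(text)]


first = "User: dean<br>Domain:<br>example.com"
second = "dean<br>example.com"
third = "dean @<br>example.com"

print(harvest(first))   # nothing to anchor on
print(harvest(second))  # nothing to anchor on
print(harvest(third))   # the "@" gives the address away
```

Under these assumptions only the third format is harvested, because it
is the only one that contains an "@" to anchor on.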
The general idea is to create an address presentation that doesn't
match the expectations of a harvester's parser, and yet a human can
understand it clearly. Here are a few more ways you might do this:
Example.com from Person
Person at the site
example.com
Person using an account at
example.com
Person receives email at
example.com
Email me at Person at the site
example.com
"Person" emailed at "
example.com"
Account: Person and Site:
example.com
Account: Person
Site:
example.com
- - - - - - - - - - - - - - - - - - - --
I have just released Liame, an email obfuscator for ASP.NET and other
technologies which uses some of the techniques described in this post.
Liame
http://liameobfuscator.blogspot.com/
- - - - - - - - - - - - - - - - - - - --
Your scheme to generate a "mailto" link via JavaScript and an
unpredictable encoding scheme should work well.
- - - - - - - - - - -- - - - - - - - - --
The only workable solution I know of is to use a spam filter to
process all content submitted by forms. I use Akismet, and it works
very well.
Akismet
http://akismet.com/
- - - - -- - - - - - - - -- - - - - - -- -
Additionally, in this world with massive email spam, users always have
an email spam filter. One of the key ways a spam filter detects spam
is to see if the sender is in the user's address book or is someone
the user has emailed recently. So, if a user contacts you by email,
your email address will be in their recent-addresses list. When you
respond by email, their filter will let your message through. However,
if a user is forced to contact you by a contact form, your email
address won't be in their recent-addresses list. When you respond to
them by email, your message may get flagged as spam and deleted.
Ultimately, spam is the price we pay for trying to maintain open
communications lines with our users.
- - - - - - - -- - - - - - -
Dave_Nadeau on February 4, 2009
So, right now, JavaScript email address insertion is a fairly safe
approach. But be sure your JavaScript doesn't have the address in a
simple text string or the harvester will see it as it scans the
JavaScript code.
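As a rough sketch of that advice, the Python helper below generates a
script element in which the address never appears as one string. The
function name js_insert_snippet() and the use of document.write are
assumptions made for brevity; this is one possible encoding, not the
scheme used by any particular tool.

```python
import json


def js_insert_snippet(user: str, domain: str) -> str:
    """Return a <script> element that writes a mailto link at page load.

    The address is emitted as single characters, and the "@" as a
    character code, so the full address is not a simple text string
    anywhere in the page source.
    """
    user_chars = json.dumps(list(user))      # e.g. ["p", "e", "r", ...]
    domain_chars = json.dumps(list(domain))
    return (
        "<script>\n"
        f"var u = {user_chars}.join('');\n"
        f"var d = {domain_chars}.join('');\n"
        "var a = u + String.fromCharCode(64) + d;\n"
        "document.write('<a href=\"mai' + 'lto:' + a + '\">' + a + '</a>');\n"
        "</script>"
    )


snippet = js_insert_snippet("person", "example.com")
print(snippet)
```

The generated source contains neither the assembled address nor an "@"
character, so a harvester that only scans text has nothing to match. A
harvester that actually runs JavaScript would still see the address.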
= = = = = == = == = = = = = = = = = = = = = = = =
Stop spammer email harvesters by obfuscating email addresses
http://nadeausoftware.com/articles/2007/05/stop_spammer_email_harvesters_obfuscating_email_addresses
Conclusions
Obfuscating an email address by encoding it with ASCII character codes
is a popular idea, but it doesn’t work well. A quarter of the tested
harvesters found the obfuscated addresses. The most successful spam
robot in this test (number 13) was released back in 2003, so this
protection method has been bypassed for quite a while. One harvester
download site even offered a free email address obfuscator — which, of
course, their harvester could bypass. Obfuscating an email address
with ASCII character codes is not an effective way to stop email
harvesters.
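To show why character codes offer so little protection, here is a
minimal Python sketch. The ADDRESS pattern and both function names are
illustrative assumptions; the decoding itself is done by the standard
library's html.unescape().

```python
import html
import re

# A simplified address pattern, assumed for illustration.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def encode_entities(address: str) -> str:
    """Obfuscate an address as numeric character references."""
    return "".join(f"&#{ord(ch)};" for ch in address)


def harvest_with_decoding(page: str) -> list[str]:
    """Decode character references first, then search for addresses."""
    return ADDRESS.findall(html.unescape(page))


page = "Contact: " + encode_entities("person@example.com")

print(ADDRESS.findall(page))         # a plain text scan finds nothing
print(harvest_with_decoding(page))   # one decoding step undoes the trick
```

A single library call undoes the obfuscation, which fits the test
result that a quarter of the harvesters already handled it.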
The backwards email addresses were usually ignored by harvesters
because they don’t look valid. But five of the harvesters accepted
them anyway. Once addresses are harvested, spammers run them past email
address “verifiers” that probe email servers to confirm that the
addresses are good. If backwards addresses become popular (they aren’t
yet), it is easy for a programmer to enhance a verifier to check each
address both forwards and backwards. Backwards email addresses are
only effective until the method becomes popular.
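The forwards-and-backwards check could be as small as the Python
sketch below. It only tests syntax; the step where a real verifier
probes the mail server is left out, and the plausible_forms() name and
the ADDRESS pattern are assumptions for illustration.

```python
import re

# A simplified address pattern, assumed for illustration.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def plausible_forms(harvested: str) -> list[str]:
    """Return the forwards and backwards readings that look like addresses.

    A real verifier would then probe the mail server for each candidate.
    """
    forms = (harvested, harvested[::-1])
    return [form for form in forms if ADDRESS.fullmatch(form)]


print(plausible_forms("moc.elpmaxe@nosrep"))
print(plausible_forms("person@example.com"))
```

In both cases the only candidate that survives is person@example.com,
so reversing the text costs the spammer almost nothing.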
Backwards email addresses have poor usability and accessibility. While
the protected address looks normal in a web browser, if you copy and
paste it into an email program it comes out backwards. Also, screen
readers used by the visually impaired are confused by the backwards
address and speak it backwards.
Recommendation: don’t depend upon obfuscation to protect your email
address. Spammers have figured out this trick. Use one of the better
protection methods discussed in the other articles in this series.
- - - - - -- - -
To effectively protect an email address we want to deliver a readable
address to a human, and an unreadable one to a spambot. Your first
point is right: Spambots don't read CSS. This enables us to use CSS
tricks to unobfuscate an address in a browser while it remains
obfuscated to the spambot. This works.
Your second point is right: Spambots can read numeric character codes,
so this mechanism doesn't work.
Your third point is right: Spambots don't reverse text in a bdo tag.
This enables us to use it to un-reverse a reversed address in a
browser while it remains reversed to the spambot. This works.
So, your conclusion goes too far: two of the methods you highlight do
work, and one does not. Nevertheless, I wouldn't recommend using the
CSS or bdo tricks since both have usability and accessibility problems
and they may not work in all browsers.
= = = = = == = == = = = = = = = = = = = = = = = =
Stop spammer email harvesters by fragmenting email addresses
http://nadeausoftware.com/articles/2007/05/stop_spammer_email_harvesters_fragmenting_email_addresses
Conclusions
While most of the harvesters did not find the fragmented addresses, a
few harvesters did. Recently released harvesters did better. The more
common these protection methods become, the more likely it is that
newer harvesters will recognize them.
Spelled-out punctuation (e.g. “at”, “(at)”, or “[at]” for “@”) is very
widely used to try to protect addresses in news groups and mailings,
but two of the tested harvesters recognized these addresses. One of
these was released back in 2004, so this protection method has been
bypassed for several years. Despite the FTC’s recommendations a year
later in 2005, protecting an email address by replacing “@” with “at”,
“(at)” or “[at]” is not an effective way to stop email harvesters.
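A harvester can recognize spelled-out punctuation with a slightly
wider pattern. The Python sketch below is a hypothetical example of
such a pattern, not the logic of either tested harvester.

```python
import re

# User, then "@" or a spelled-out form of it, then a dotted domain.
# Hypothetical pattern, for illustration only.
SPELLED = re.compile(
    r"([A-Za-z0-9._%+-]+)"
    r"\s*(?:@|\(at\)|\[at\]|\sat\s)\s*"
    r"([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)",
    re.IGNORECASE,
)


def harvest_spelled(text: str) -> list[str]:
    """Rebuild addresses written with "at", "(at)", or "[at]"."""
    return [f"{user}@{domain}" for user, domain in SPELLED.findall(text)]


print(harvest_spelled("person at example.com"))
print(harvest_spelled("person (at) example.com"))
print(harvest_spelled("person[at]example.com"))
```

A pattern this loose also produces false matches (for example, "look
at example.com"), but a spammer can leave those for a verifier to
discard.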
One of the tested harvesters almost found the address with spaces
added between the characters. It may get it right in the next
version of the program, particularly if this becomes a common
method. Adding spaces between email address characters is probably not
effective. Addresses with embedded spaces also copy and paste badly
into email programs and cannot be read normally by screen readers for
the visually impaired. Addresses with embedded spaces have poor
usability and accessibility.
Most of the tested harvesters found the invalid addresses with
“nospam” added. Once addresses are harvested, spammers pass them
through a separate email “verifier” that probes email servers to
confirm that addresses are good. While I did not test email verifiers,
it is simple to write a program to automatically strip off commonly
inserted words,
such as “nospam” or the FTC’s recommended “spamaway”. Inserting
“nospam”, “spamaway”, or any other common phrase into an email address
is probably not effective at stopping spammers.
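A stripping program of that kind might look like the Python sketch
below. The word list and the strip_filler() function are assumptions
for illustration; no email verifier was tested, so this is not a
description of any real one.

```python
import re

# Filler words with an optional ".", "_" or "-" on either side.
FILLER = re.compile(r"([._-]?)(?:nospam|spamaway)([._-]?)", re.IGNORECASE)


def strip_filler(address: str) -> str:
    """Remove filler words, keeping one separator if it had two."""

    def replacement(match: re.Match) -> str:
        before, after = match.group(1), match.group(2)
        # "example.spamaway.com" must become "example.com", not "examplecom".
        return before if before and after else ""

    return FILLER.sub(replacement, address)


for sample in (
    "personnospam@example.com",
    "person.nospam@example.com",
    "person@nospam.example.com",
    "person@example.spamaway.com",
):
    print(sample, "->", strip_filler(sample))
```

Every sample comes out as person@example.com.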
Email address protection methods that add an HTML tag within an
address presume that harvesters can’t remove them. So far, only one
harvester can. That harvester should also have recognized the address
with HTML comment tags embedded, but it did not. This could be an
artifact of the particular test address. However, it is surprising
that there
weren’t more harvesters that could strip out HTML tags and comments.
This is a simple thing to program and a feature that I expect more
harvesters will have soon. Every search engine web spider already has
this feature. Embedding an HTML comment or tag into an email address
is not effective.
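Stripping tags and comments really is simple to program. The Python
sketch below does it with two regular expressions; the patterns and
the harvest() function are simplified assumptions, and a real tool
would more likely use a proper HTML parser.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG = re.compile(r"<[^>]+>")
# A simplified address pattern, assumed for illustration.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def harvest(page: str) -> list[str]:
    """Delete comments and tags, then search what is left."""
    text = TAG.sub("", COMMENT.sub("", page))
    return ADDRESS.findall(text)


print(harvest("person<!-- hidden -->@example.com"))
print(harvest("person<span></span>@example<b></b>.com"))
```

Both pages yield person@example.com once the markup is removed.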
Embedding an HTML tag with text hidden by CSS will probably defeat
harvesters for a while, as will distributing characters into HTML table
cells. Even with HTML tag removal, the resulting text is hard to
interpret. Harvesting these addresses will require more sophisticated
HTML and CSS handling than spammers are likely to do. However, both
methods copy and paste badly and cannot be read properly by screen
readers for the visually impaired. Embedding hidden text or splitting
an address into table cells is effective, but the results have poor
usability and accessibility.
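The Python sketch below illustrates why tag removal alone does not
recover an address that contains CSS-hidden text. The sample markup,
the inline display:none style, and both helper functions are
hypothetical; displayed_text() is only a crude stand-in for what a
browser renders.

```python
import re

TAG = re.compile(r"<[^>]+>")
HIDDEN_SPAN = re.compile(
    r'<span style="display:\s*none">.*?</span>', re.DOTALL
)

page = 'person<span style="display:none">-not-this-part</span>@example.com'


def harvested_text(html: str) -> str:
    """What a tag-stripping harvester sees: hidden text is kept."""
    return TAG.sub("", html)


def displayed_text(html: str) -> str:
    """Roughly what a visitor sees: hidden spans are dropped first."""
    return TAG.sub("", HIDDEN_SPAN.sub("", html))


print(harvested_text(page))
print(displayed_text(page))
```

The harvester ends up with person-not-this-part@example.com, which
looks valid but is wrong, while the visitor sees person@example.com.
To get the real address a harvester would have to evaluate the CSS.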
While effective, the CSS “font” and ASCII art methods have poor
usability and accessibility. These methods are pretty clever, but
they’re cumbersome and the protected email address they draw can’t be
copied and pasted or read by a screen reader.
Splitting an email address onto multiple lines is a good solution: it
is effective, usable, and accessible. Because it looks like any other
multi-line web page text, addresses shown this way are unlikely to be
recognized by spam robots. The protected addresses can be read
naturally by screen readers and it isn’t difficult for visitors to
copy and paste the address (in two steps) into an email program.
Recommendation: splitting an email address onto multiple lines is easy
and it works well. There are other methods that also work. They are
discussed in the other articles in this series.
= = = = = == = == = = = = = = = = = = = = = = = =
Stop spammer email harvesters by inserting addresses with JavaScript
or CSS
http://nadeausoftware.com/articles/2007/05/stop_spammer_email_harvesters_inserting_addresses_javascript_or_css
Conclusions
Current spambots do not execute JavaScript scripts. Doing so is
possible, but it would slow down their harvesting. With so many
unprotected email addresses easily available with simple harvesting,
there isn’t much need yet for spammers to bother with JavaScript
support. Until there is, protecting an email address by using
JavaScript text insertion is an effective way to stop email harvesters
and reduce spam.
There is one possible exception. The method that used Unicode to
obfuscate an email address in a JavaScript variable assumes that
harvesters cannot decode Unicode. While none do yet, several do decode
similar character codes used in HTML and URLs. Adding Unicode decoding
is an easy and likely upgrade. JavaScript obfuscation using Unicode is
probably not effective for long.
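To illustrate how small that upgrade would be, the Python sketch below
decodes JavaScript \uXXXX escapes with one regular expression. The
function names and the sample script line are assumptions made for
illustration.

```python
import re

UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")
# A simplified address pattern, assumed for illustration.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def encode_js_escapes(text: str) -> str:
    """Write each character as a JavaScript \\uXXXX escape."""
    return "".join(f"\\u{ord(ch):04x}" for ch in text)


def decode_js_escapes(source: str) -> str:
    """Replace every \\uXXXX escape with the character it names."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), source)


script = 'var a = "' + encode_js_escapes("person@example.com") + '";'

print(ADDRESS.findall(script))                     # nothing found
print(ADDRESS.findall(decode_js_escapes(script)))  # address recovered
```

The harvester does not need to run the script at all. Decoding the
escapes in the source text is enough to expose the address.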
During testing I was very surprised when one harvester found the AJAX
protected email address. This didn’t seem possible. On further
investigation I found that I had unintentionally turned on my test
Apache web server’s directory listing feature (on real web sites I
always disable this big security hole). The harvester used the
directory listing to find the site’s list of files and then scanned
each one, including the AJAX PHP script which contained the protected
address. With directory listings disabled, the harvester no longer
found the AJAX address. Tricky JavaScript won’t protect you if the web
server is misconfigured.
The JavaScript methods all require that the visitor’s web browser have
JavaScript enabled. W3schools.com maintains monthly statistics on
browser usage and reports that in 2007 about 94% of visitors had
JavaScript enabled. For the remaining 6%, these methods will show a
blank email address. However, your visitor statistics may be
different. At one time, large corporate or government sites mandated
disabling JavaScript to reduce the number of viruses sneaking past
Internet Explorer’s many bugs. If your site is aimed at these
visitors, you may have fewer visitors with JavaScript enabled. If your
visitors include those that do not enable JavaScript, these methods
have poor usability.
JavaScript methods have good accessibility. Screen readers for the
visually impaired work alongside web browsers, reading web page text
even if it has been added by JavaScript scripts.
CSS text insertion is an effective method, when browsers support it.
While most do, Microsoft’s Internet Explorer 7 still does not. Until
Microsoft upgrades their browser, CSS text insertion is not widely
supported. Even then, the text inserted by CSS cannot be selected for
a copy and paste into the visitor’s email program, and it cannot be
read by screen readers for the visually impaired. CSS text insertion
has poor usability and accessibility.
Recommendation: JavaScript text insertion works pretty well to protect
an email address from spambots, but it is awkward to author and
maintain. The other articles in this series discuss more methods to
protect email addresses, including several effective methods that
don’t require JavaScript.
= = = = = == = == = = = = = = = = = = = = = = = =
Stop spammer email harvesters by drawing addresses with images or
Flash
http://nadeausoftware.com/articles/2007/05/stop_spammer_email_harvesters_drawing_addresses_images_or_flash
Conclusions
All of these methods successfully protected email addresses from all
of the tested harvesters. Protecting an email address with an image
or Flash animation is an effective way to stop spammer harvesters and
reduce spam.
However, all of these methods have poor usability and accessibility.
Email addresses drawn with images and Flash cannot be copied and pasted
directly into a visitor’s email program. Flash addresses also require
the Flash plugin, which some people block in an effort to stop banner
ads. And image and Flash addresses cannot be read by the screen
readers used by the visually impaired.
All of these methods have visual artifacts. The protected email
address must be drawn using a font chosen when the image or Flash file
is created. The address will look out of place if that font doesn’t
match the font of the web page. But making a perfect match may not be
possible. Font shapes and sizes differ somewhat between browsers and
between Windows, Mac OS X, and Linux. Also, visitors with aging
eyesight may increase their browser’s font size so that it no longer
matches the font used in your image. These differences will leave your
email address looking wrong.
All of these methods slow down web browsing. They replace a short
piece of text with a much larger image or Flash file that takes longer
to download to the visitor’s browser.
Recommendation: while these methods work to protect an email address
from spambots, the impact on page load time and the reduced usability
and accessibility are undesirable. There are simpler and equally
effective ways to protect an address. Several good methods are
discussed in the other articles in this series.
= = = = = == = == = = = = = = = = = = = = = = = =
Stop spammer email harvesters by hiding web pages from the harvesters
http://nadeausoftware.com/articles/2007/05/stop_spammer_email_harvesters_hiding_web_pages_harvesters
Conclusions
Web spider directions in “robots.txt”, <meta> tags, and links are not
an effective way to stop email harvesters and reduce spam. Spammers
just ignore these conventions and harvest the pages anyway.
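The reason is that “robots.txt” is advice the crawler itself chooses
to follow. The Python sketch below uses the standard library's
urllib.robotparser to show the check a well-behaved spider makes; the
sample rules, the agent name, and both function names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules asking all spiders to stay out of /contacts/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contacts/
"""


def polite_may_fetch(url: str, agent: str = "ExampleSpider") -> bool:
    """A well-behaved spider asks robots.txt before each request."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


def harvester_may_fetch(url: str) -> bool:
    """A harvester skips the check entirely, so nothing stops it."""
    return True


url = "http://example.com/contacts/staff.html"
print(polite_may_fetch(url))     # the polite spider stays away
print(harvester_may_fetch(url))  # the harvester fetches it anyway
```

Nothing on the server enforces the rule. The protection exists only
in the code of spiders that opt in.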
Redirect methods to stop spammers are not effective. Redirects are
such a common feature of the web that most email harvesters and web
spiders handle them. Pages hidden behind a redirect are not safe.
Hiding pages behind links in forms and frames is not effective. These
are standard ways to link to additional web pages and spammer
harvesters and spiders follow them.
JavaScript links are effective at hiding pages from web spiders, but
visitors must have JavaScript enabled in their browser. Today, most
people do. Until harvesters support JavaScript too, email harvesters
will not be able to follow JavaScript links.
Flash links are effective at hiding pages from spammer email
harvesters, but visitors must have the Flash plugin installed. Most
visitors do, but some visitors block Flash animations as a way of
reducing the number of web page ads that blink at them. These visitors
won’t be able to follow the link to your hidden web page. Also, the
screen readers used by the visually impaired cannot read the text in a
Flash link. Flash links have poor usability and accessibility.
Recommendation: Of these methods, only JavaScript links were effective
and had good usability and accessibility. This trick is widely used to
hide web pages from web spiders, including those for search engines.
If you need to protect an entire page of email addresses, this is an
effective way to do so. But if you only need to protect a single email
address, this is pretty cumbersome. The other articles in this series
look at effective ways to protect individual addresses.
A weak spot for all of these methods is that you must protect all
links to a hidden page. This may be hard to do when links are
automatically created for site maps, RSS feeds, and article lists. If
there is an unprotected link to a hidden page posted anywhere on your
site, or at any other web site, an email harvester can get through.
= = = = = == = == = = = = = = = = = = = = = = = =
Stop spammer email harvesters by blocking spammer access to the site
http://nadeausoftware.com/articles/2007/05/stop_spammer_email_harvesters_blocking_spammer_access_site
Conclusions
Blocking access based upon an IP address blacklist always works, but
it may not be accurate. Most home users have dynamic IP addresses that
can change from day to day or month to month. A spammer can force
their own IP address to change by resetting their connection. The IP
address the spammer was using will get reassigned to some other
innocent user. If you block the IP address because a spammer was using
it, now you’re blocking a new innocent user. The spammer has moved on
to a new IP address that you probably aren’t blocking yet. Chasing
spammer IP addresses is a never-ending game where the spammer is
always in front. IP address blocking is not an effective way to stop
the email harvesters used by spammers.
User-agent blocking isn’t effective either. It presumes that email
harvesters honestly report that they are harvesters. Well, if spammers
were honest, they wouldn’t be breaking the law by running email
harvesters in the first place. So, of course they lie. Some email
harvesters cover their tracks by randomly switching among user agent
texts from page to page. Some will randomize the time between
successive page requests as they crawl your site. Most will also limit
the number of page requests they’ll make to the same site in one
session. All of this makes an email harvester look like a legitimate
site visitor, making it hard to block them without also blocking real
visitors.
Site logins may or may not be effective at blocking spammers depending
upon the site’s policies for granting accounts, monitoring their use,
and keeping passwords secure. If a spammer can get an account, they
can easily give the name and password to their spambot and it’ll
automatically log in and harvest your site. For sites with a small
number of accounts, logins can be effective to block unwanted access.
But for larger sites, and particularly for community sites with an
open policy on getting accounts, it’s hard to keep track of all the
logins and react quickly if one has been stolen by a spammer. Logins
are not a practical way to stop spammers.
Logins also require effort on your part to create and manage accounts,
and they require effort for the user to remember their account name
and password. Unless your site has valuable content, users will be
annoyed and the account management hassle is probably not worth it.
Recommendation: blocking web site access based upon IP addresses, user
agent names, or login accounts is not a very effective way to stop
spammers and their email harvesters. The other articles in this series
discuss better methods to protect individual email addresses on a
page, or hide specific web pages, without trying to block access to
the entire site.
= = = = = == = == = = = = = = = = = = = = = = = =
Stop spammer email harvesters by using a contact form
http://nadeausoftware.com/articles/2007/05/stop_spammer_email_harvesters_using_contact_form
Conclusions
Contact forms are widely praised as the solution for protecting email
addresses from email harvesters. And they do stop harvesters, but they
don’t stop spammers. Spammers can use automated software to enter a
contact form message and press the “Send” button. When they do this
for a blog or forum comment form, the spammer’s message shows up in
the blog or forum as blog spam. But with a contact form, the
spammer’s message gets emailed. Worse, it gets emailed by your own web
server. Contact forms don’t stop spam, they just shift the way in
which it is delivered.
There is software that you can use to scan a contact form’s message
and block it from being sent if it looks like spam. See the Akismet
service or Bad Behavior, for instance. Many content management systems
have modules that use services like these to detect and stop spam
messages. I use the Akismet module for Drupal. But no spam filter is
perfect. Some spam will still get through.
Adding a CAPTCHA challenge is widely used to stop spammers from
posting to web forms. There are many variations on the CAPTCHA method
and some may be more effective than others. An image CAPTCHA presumes
that an email harvester cannot use optical character recognition (OCR)
software to read the image’s characters. The more jumbled the
characters, the more likely that this is true. But even jumbled
character images can be read with sufficiently clever software and a
fast computer (see Breaking a Visual CAPTCHA and PWNtcha - CAPTCHA
decoder). A CAPTCHA is probably not effective for protecting a
valuable resource, such as a bank account. But for a contact form, it
is unlikely that spammers will bother doing OCR just to spam you with
one message in a contact form. CAPTCHA methods are a good way to
protect contact forms from spammers.
However, contact forms have poor usability. They force the visitor to
use a web form to send you email instead of using their familiar email
program. Form email doesn’t use their standard email editor and it
doesn’t get added to their sent-mail folder. Some visitors will be
offended that the contact form asks for the visitor’s email address
but won’t show your address. Visitors also may be wary of providing
their address to a web site, worrying that they’ll get spam.
CAPTCHA challenges have poor usability as well. The visitor’s intent
is to send email to you, not get quizzed. Some visitors will be
offended or annoyed. CAPTCHA math challenges also may block kids and
adults who don’t have the needed math skills.
CAPTCHA image challenges have poor accessibility too. For the visually
impaired, a CAPTCHA image is unreadable. If they can’t enter its
jumbled letters into the contact form, they can’t contact you.
Contact forms are complex to set up. Even when a content management
system provides built-in modules to generate and validate the form,
the web server still has to be configured to send email. For companies
and hosting services with IT departments, this is no problem. But for
home web sites, this isn’t easy and it may not be possible. The ISPs
that serve home users over cable modems or DSL may have usage policies
that prevent home email servers, or they may block the network ports
used to send email to stop spammers from using home computers as email
zombies. Also, many email servers use spammer blacklists to block
email sent to them from suspicious computers, and many of those
blacklists automatically include all home computers. So, even if you
set up a home email server and can send email past your ISP’s port
blocking, the destination email server may reject the message because
it thinks that you are a spammer.
Recommendation: a contact form is an awkward solution that is a bit
too paranoid. While it does stop email harvesters, it can also block
or offend legitimate visitors. A better balance is needed that
maintains reasonable, if not perfect, security for you while also
making it easy for your visitors to contact you. The other articles in
this series present several effective ways of protecting a published
email address from most email harvesters.
= = = = = == = == = = = = = = = = = = = = = = = =