> Discussion of this article is currently happening on Slashdot, for the interested.
And if you want to work with an existing package that has been mature for several years now, you might look at the URL below for "Ifile." I helped tune it to become pretty fast.
And I have to disagree somewhat with Graham's article; Naive Bayesian filtering _doesn't_ provide _quite_ as good results as he implies. Having both "sex" and "sexy" in a message does _not_ guarantee at P > 0.99 that messages will get tossed into the "spam" category.
My statistics for those words in my corpus are thus:
The "424:28" indicates that the word "sexy" occurred 28 times in folder #424, which happens to be the "Spam/Phonesex" folder. #426 is Spam/Snakeoil, #449 is X/Advocacy, with an instance of a quote about people being "mesmerized by sexy glitz which distracts them from the work at hand." #456 pointed to a .signature with the word "sexy."
Frankly, the word "sexy" is a very _useful_ one. (And looking at the stats here has caused me to modify a couple email messages in my archives, which will strengthen the result :-).)
Unfortunately, it's not only found in the "Phonesex" folder. Instances are found here and there everywhere. And there are other words that are very common both in "evil spam" and in everyday conversation. Integrating the whole set of statistics together requires adding up statistics for _all_ the words found in a message, not just the words "sex" and "sexy."
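To make "adding up statistics for all the words" concrete, here is a minimal sketch of a multi-folder naive Bayes classifier. It is not Ifile's actual code: the class name `FolderClassifier`, the add-one smoothing, and the toy training messages are all assumptions made for illustration. Each folder's score is its log prior plus the log likelihood of every word in the message, so one suspicious word cannot decide the outcome on its own.

```python
import math
from collections import Counter, defaultdict


class FolderClassifier:
    """Toy multinomial naive Bayes over mail folders (hypothetical, not Ifile)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> occurrences
        self.message_counts = Counter()          # folder -> messages trained
        self.vocabulary = set()

    def train(self, folder, words):
        self.word_counts[folder].update(words)
        self.message_counts[folder] += 1
        self.vocabulary.update(words)

    def scores(self, words):
        total_messages = sum(self.message_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for folder, counts in self.word_counts.items():
            folder_total = sum(counts.values())
            # Start from the log prior for this folder.
            score = math.log(self.message_counts[folder] / total_messages)
            # Add the log likelihood of *every* word (add-one smoothing, an assumption).
            for word in words:
                score += math.log((counts[word] + 1) / (folder_total + vocab_size))
            result[folder] = score
        return result

    def classify(self, words):
        scores = self.scores(words)
        return max(scores, key=scores.get)


clf = FolderClassifier()
clf.train("Spam/Phonesex", "hot sexy girls live call now sexy".split())
clf.train("Spam/Snakeoil", "miracle pills enlarge now".split())
clf.train("Apps/Ifile", "ifile filter folder naive bayes sexy statistics ifile".split())
message = "sex and sexy appear in this ifile statistics discussion about ifile".split()
print(clf.classify(message))
```

In this toy corpus the message mentions "sexy" but still lands in "Apps/Ifile", because the repeated "ifile" and "statistics" outweigh it once all words are summed.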
My finding is that it is _nowhere_ near sufficient to have two populations, "spam" versus "not spam."
If you muddle together the Nigerian Pyramid schemes with the "Penis enhancement" ads along with the offers of new credit cards as well as the latest sites where you can talk to "hot, horny girls LIVE!", the statistics don't work out nearly so well.
It's hard to tell, on the face of it, why Nigerian scams _should_ be considered textually similar to phone sex ads, and in practice, the result of throwing them all together
I have my spam split into categories so that filtering is _even more discriminatory_:
There are a few things left to improve about Ifile, and I'd like to redo it in some language fundamentally less painful to work with than C. The project I periodically consider is to redo the filtering software in Lisp. Unfortunately, I wind up running into _tremendous_ bottlenecks each time I do so. Some combination of my skills and the tools at hand proves not quite adequate. Maybe next time...
--
(concatenate 'string "chris" "@cbbrowne.com")
http://cbbrowne.com/info/mail.html#ifile
Out of my mind. Back in five minutes.
* n...@vanderbilt.edu (sv0f) | On a statistical approach to filtering spam, with Lisp code, here:
Spam has to be dealt with at the transport level. The ability of strangers to send you mail must be curtailed. Several large sites offer a system to reject all mail from unknown correspondents, temporarily or permanently, and wait for the reader of the log to accept incoming mail from addresses that look familiar. Another option is to accept delivery but return transport-like error messages if the user does not want the message. Yet another option is to see if the smtp client is set up to accept mail for the domain that it tries to deliver mail from. Yet another option is to temporarily reject all mail from unknown sources and utilize the fact that spammers have no resources to queue messages for later delivery. And then you can always implement a scheme that returns a temporary rejection, but sends a mail to the originator independently asking for confirmation that he is human and by accepting the conditions that unsolicited commercial e-mail carries a fee that /will/ be collected. Failure to accept the conditions will cause the temporary rejection never to be lifted, thus using up queue space in the offending server, which any sysadmin will notice and take care of even if they do not bother to fix their system configuration to avoid relaying spam. Should the conditions be accepted, the message is allowed through.
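The "temporarily reject all mail from unknown sources" option can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Greylist` class, the 300-second delay, and the choice of keying on (client address, sender, recipient) are hypothetical and not taken from the post. The reply codes follow the usual SMTP convention that 4xx means "try again later" and 250 means accepted.

```python
import time

GREYLIST_DELAY = 300  # seconds an unknown sender must wait; arbitrary illustrative value


class Greylist:
    """Hypothetical greylisting policy: defer unknown senders, accept on retry."""

    def __init__(self, delay=GREYLIST_DELAY):
        self.delay = delay
        self.first_seen = {}  # (client_ip, sender, recipient) -> first attempt time
        self.known = set()    # triplets that have already passed the delay

    def check(self, client_ip, sender, recipient, now=None):
        """Return an (SMTP reply code, text) pair for one delivery attempt."""
        now = time.time() if now is None else now
        key = (client_ip, sender, recipient)
        if key in self.known:
            return 250, "OK"
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            # A real mail server queued the message and retried; let it through.
            self.known.add(key)
            return 250, "OK"
        # Senders that never retry simply never get past this point.
        return 451, "Temporarily rejected, please retry later"


g = Greylist()
print(g.check("192.0.2.1", "a@example.invalid", "me@example.invalid", now=0))
print(g.check("192.0.2.1", "a@example.invalid", "me@example.invalid", now=400))
```

The first attempt is deferred and the retry after the delay is accepted; a sender with no queue never makes the second attempt.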
If you allow the message to be delivered and waste CPU or brain time, the spammers have won a small victory. That is just wrong. Spammers must die.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
Erik Naggum wrote:
> If you allow the message to be delivered and waste CPU or brain time, the spammers have won a small victory. That is just wrong. Spammers must die.
The countermeasures you mention in your message should be taken by the mail service provider. Otherwise I would have to implement a mail client.
In my case the following happened: Immediately after I started posting to newsgroups, I started getting mails in which I was offered help with my debts or I was given advice as to how to make certain parts of my body larger.
I did the following:
(1) I stopped appending a valid email address to my mails.
(2) I set up several mail accounts. All but one contain my initials in some way and there I sometimes still get spam. But one account is well hidden and only my friends know it. I never got spam there.
I think that first the users should agree upon spam being evil. (There is no such agreement yet.) Then there should be a law against spam. And then police action could be taken.
-- Janos Blazi
Most of the spam I receive seems to be images, presumably to bypass text-based filters. I suppose you would have to run character recognition first on an image before any text filter, Bayesian or otherwise, could be applied?
> How effective is it in filtering out requests from African governments to launder money?
Apparently very effective; Graham discusses that specifically.
But the key is that it is TUNED to the particular user by running a pre-processor through both "good mail" and "spam mail" databases.
From the FAQ (someone in this thread asked about graphics):
<quote from faq> What if spammers sent their messages as images?
Such an email would include a lot of damning content, actually. The headers, to start with, would be as bad as ever. And remember that we scan all the html as well as the text. Within the message body there would probably be a link as well as the image, both containing urls, which would probably score high. "Href" and "img" themselves both have spam probabilities approaching pornographic words.
> And if you want to work with an existing package that has been mature > for several years now, you might look at the URL below for "Ifile." I > helped tune it to become pretty fast.
IFile's documentation and download page is included at the end of Graham's article.
> And I have to disagree somewhat with Graham's article; Naive Bayesian filtering _doesn't_ provide _quite_ as good results as he implies. Having both "sex" and "sexy" in a message does _not_ guarantee at P > 0.99 that messages will get tossed into the "spam" category.
I am not certain of your 'naive' filtering usage with the example of only "included" words. IFile's doc page describes its algorithm as "naive bayesian filtering" as well.
Graham is using the words included in "good mail" to counter this, as IFile seems to do.
There are two lispy big wigs, namely Paul Graham and Erik Naggum, who thinks their hotshot mouthing on spamming is something of value.
Their outpouring, is not unlike that of damming of drivel flood.
In the treatment of flood, there is the brute force of building of a dam. This is the no-nonsense solution of brutes the likes of technology geeks. Among them are the brute elite the likes of Graham'n'Naggum, who speak on fine engineering far ahead of fellow brutes.
The phenomenon of spam is a human-social phenomenon. Spammers spam because it is effective. Consumers'S mouths says no but their actions says yes, because for the vast majority they are unthinking and happy-go-lucky brutes. These brutes decline spam, but when they happen upon a spam that pleases them, such as a great porno site advertisement or something else that caught their personal interest, they will follow the advertisement. For spammers, spam is effective if there is one response out of one thousand. Similarly, brutes will respond to one out of one thousand hateful spams. The prospect of spamming is thus kept alive by the populous brutes, everywhere, despite tumultuous noises they makes.
As a human community or society, the treatment of spam is up to us, not fucking technology. (fuck Erik Naggum. Fuck you. You got it?) The gist is "what do we want?". (Note: not "what technology geeks want" or "what fucking unix morons want".) If we do not want spam, there are two ways to get rid of it: Thru law, and thru inaction. The former is a well-known proposal in the process. The latter, is possible only if human are not lazy unthinking beer-drinking brutes, which is never going to happen.
-- (gratuitous poem)
i will be building a dam, when there is a pussy flood. for i am a collector of nature, and a lover of its treasure. i will use my savings, to quench those thirsty, and lubricate those angry, for i have suffered and suffered; the pain and anguish. -- Xah Lee
I dedicate this poem to Erik Naggum.
PS i request that anyone who read so far and find it worthwhile to send me an email with the line "Xah, you are beautiful.". Thanks in advance.
> Most of the spam I receive seems to be images, presumably to bypass > text-based filters. I suppose you would have to run character > recognition first on an image before any text filter, Bayesian or > otherwise, could be applied?
No, it wouldn't be necessary.
If you have a population of messages that consist just of images, that's going to bias the vocabulary statistics since there will be lots of words like "multipart" and "alternative" and "jpeg", and very few of the "legitimate" words that people use when they send you real mail.
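A rough sketch of why no character recognition is needed: if the tokenizer also looks at the MIME structure, an image-only message still yields words such as "multipart", "alternative" and "jpeg". The function name `structural_tokens` and the sample message are assumptions for illustration; the parsing itself uses Python's standard `email` package.

```python
import email
import re

# A made-up image-only message; the address and boundary are placeholders.
SAMPLE = (
    "From: someone@example.invalid\n"
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: image/jpeg\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "/9j/4AAQSkZJRg==\n"
    "--XYZ--\n"
)


def structural_tokens(raw_message):
    """Collect content-type words from every MIME part, plus words of text parts."""
    msg = email.message_from_string(raw_message)
    tokens = []
    for part in msg.walk():
        # "multipart/alternative" -> ["multipart", "alternative"], and so on.
        tokens.extend(part.get_content_type().lower().split("/"))
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            text = payload.decode("latin-1")
            tokens.extend(re.findall(r"[a-z']+", text.lower()))
    return tokens


print(structural_tokens(SAMPLE))
```

A message like this contributes only structural vocabulary, which is exactly the bias described above.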
Remember, if this is being used well, you're not merely classifying between "spam" and "not spam;" you're classifying into a multiplicity of _legitimate_ categories, such as:
-> Mail from family members
-> Mail from this friend
-> Mail from that friend
-> Mail from the other friend
-> Email from "technical associates," by person
-> Email from mailing lists, arranged _by mailing list_
-> And so forth, for legitimate categories...
combined, preferably, with "spam" that gets classified so that you can get finer discrimination
The spam _isn't_ likely to have similar vocabulary to the email you get from legitimate sources.
If something with totally new characteristics comes along, it may get misfiled, at which point you move it to a more appropriate folder (perhaps even a new folder), and it becomes part of the new corpus, directing future similar spam to the right place. -- (reverse (concatenate 'string "ac.notelrac.teneerf@" "454aa")) http://www.ntlug.org/~cbbrowne/ifilter.html Rules of the Evil Overlord #60. "My five-year-old child advisor will also be asked to decipher any code I am thinking of using. If he breaks the code in under 30 seconds, it will not be used. Note: this also applies to passwords." <http://www.eviloverlord.com/>
In the last exciting episode, afs97...@yahoo.com (AFS97209) wrote:
> How effective is it in filtering out requests from African governments to launder money?
Very much so. Those messages head to Spam/Pyramid and nowhere else.
The contents of the messages involve a vocabulary that is quite repetitive between messages, so it's an _ideal_ candidate for Naive Bayesian networks.
--
(reverse (concatenate 'string "moc.enworbbc@" "sirhc"))
http://cbbrowne.com/info/spiritual.html
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult." -- C.A.R. Hoare
>> And if you want to work with an existing package that has been mature >> for several years now, you might look at the URL below for "Ifile." I >> helped tune it to become pretty fast.
> IFile's documentation and download page is > included at the end of Graham's article.
>> And I have to disagree somewhat with Graham's article; Naive Bayesian filtering _doesn't_ provide _quite_ as good results as he implies. Having both "sex" and "sexy" in a message does _not_ guarantee at P > 0.99 that messages will get tossed into the "spam" category.
> I am not certain of your 'naive' filtering usage with the example of only "included" words. IFile's doc page describes its algorithm as "naive bayesian filtering" as well.
> Graham is using the words included in "good mail" to counter this, > as IFile seems to do.
The point is that _all_ the words in the message are considered.
For instance, if I throw my message, which conspicuously contains both the word "sex" and the word "sexy," purportedly surefire indications of spam, at ifile, the fact that it mentions Ifile several times means that it heads to the "Apps/Ifile" folder where resides my archives of the last five years of Ifile discussions.
To consider _only_ the words "sex" and "sexy" is a severe oversimplification. -- (reverse (concatenate 'string "gro.gultn@" "enworbbc")) http://www.ntlug.org/~cbbrowne/lisp.html Objects & Markets "Object-oriented programming is about the modular separation of what from how. Market-oriented, or agoric, programming additionally allows the modular separation of why." -- Mark Miller
> The point is that _all_ the words in the message are considered.
> For instance, if I throw my message, which conspicuously contains both > the word "sex" and the word "sexy," purportedly surefire indications > of spam, at ifile, the fact that it mentions Ifile several times means > that it heads to the "Apps/Ifile" folder where resides my archives of > the last five years of Ifile discussions.
> To consider _only_ the words "sex" and "sexy" is a severe > oversimplification.
Well that makes more sense.
What about Graham's method leads one to believe that IFile would not be considered? Several of the examples he gives (using 'Lisp' for himself instead of 'Ifile' as you would) are isomorphic to this issue -- he is including words from the "good mail" as well.
In article <B9838E4A.2CAF%...@xahlee.org>, xah wrote: > The phenomenon of spam is a human-social phenomenon. Spammers spam because > it is effective.
That's only because you can't see spammers for the anti-social twits that they are, who will keep spamming even when it's not effective. Or they will define their acceptable effectiveness to be something ridiculously low, like one positive response from ten million spams. Or even define negative responses as good responses, so that ``don't send me this crap'' earns one a permanent spot in their list.
Spamming is not effective in any sense of the word that an actual marketer would comprehend.
Now, why *don't* you see spammers for the anti-social twits that they are? I have my own idea about that.
Kaz Kylheku <k...@ashi.footprints.net> writes: > Spamming is not effective in any sense of the word that an actual marketer > would comprehend.
well clearly you have been lucky enough not to spend too much time around (professional) marketers, who take great pains in safe-guarding their power to comprehend everything positively. their job is to foist this inability to discern the feedback loop onto others (primarily professional sales people). this is because when business is good, nobody cares, it's only when business is bad that self-examination is painful. it's no surprise that professional sales people also take it to be their job to point the finger back at the marketers.
whoever thought up the sales / marketing (organizational) partitioning was probably a consultant weeping in anticipation of the spoils to be reaped from the turf wars imminent. split the mind and sell aspirin...
(actually, i have no clue what your background w/ these professions are; these ramblings are from my own limited experience as a naive geek co-founding a chip company where the only lisp involved was emacs lisp... for round two, i'd like to work my way up through the tool chain w/ lisp but somehow i got distracted lo these last four years.)
xah wrote: > There are two lispy big wigs, namely Paul Graham and Erik Naggum, who thinks > their hotshot mouthing on spamming is something of value.
> Their outpouring, is not unlike that of damming of drivel flood.
> In the treatment of flood, there is the brute force of building of a dam. > This is the no-nonsense solution of brutes the likes of technology geeks. > Among them are the brute elite the likes of Graham'n'Naggum, who speak on > fine engineering far ahead of fellow brutes.
...
> These brutes decline spam, but when they happen upon a spam that pleases > them, such as a great porno site advertisement or something else that caught > their personal interest, they will follow the advertisement.
i don't like spam. when something is interesting (it happens, technically, tittically) i try to push the delete-button before i read more, sometimes i'm not able.
> For spammers, > spam is effective if there is one response out of one thousand. Similarly, > brutes will respond to one out of one thousand hateful spams. The prospect > of spamming is thus kept alive by the populous brutes, everywhere, despite > tumultuous noises they makes.
> As a human community or society, the treatment of spam is up to us, not > fucking technology.
that is partly correct.
is up to us, assisted by (fucking or not) technology
> (fuck Erik Naggum. Fuck you. You got it?)
"fuck Erik Naggum".
"Fuck *you*" [relates to 'Erik Naggum', relates to 'the reader', relates to 'Paul Graham', relates to technology-lovers?]
"you got it?"
no, please clarify.
> The gist is > "what do we want?".
"we". who belongs to "we".
> (Note: not "what technology geeks want" or "what fucking > unix morons want".)
are they not included in "we"?
> If we do not want spam, there are two ways to get rid of > it: Thru law, and thru inaction.
> The former is a well-known proposal in the > process. The latter, is possible only if human are not lazy unthinking > beer-drinking brutes, which is never going to happen.
Take a human of ordinary intelligence and place him in a group of gorillas: he'll be a brilliant individual (relatively).
if he insists on his brilliance in that group, the gorillas will give him a brilliant fuck.
The "lazy unthinking beer-drinking brutes" group belongs to the problem domain; basically it is the most important and unchangeable part.
If you ignore this, you declare yourself a complete idiot.
but maybe you're simply jealous [someone wants something that another one has], because of your inability to "don't think - drink beer - and fuck - be happy".
> PS i request that anyone who read so far and find it worthwhile to send me > an email with the line "Xah, you are beautiful.". Thanks in advance.
xah <x...@xahlee.org> wrote: > The phenomenon of spam is a human-social phenomenon. Spammers spam because > it is effective. Consumers'S mouths says no but their actions says yes, > because for the vast majority they are unthinking and happy-go-lucky brutes.
In fact, the problem with spam is not that large numbers of people respond, but that it is so cheap to send (for the spammer anyway, since the cost is distributed amongst the recipients and those who share their systems and networks) that nearly *any* response is effective for them.
The problem with spam is that it is theft. If spammers actually had to bear the costs of their spam, they would never send it, because the response rates are ridiculously low. Since they do not, and it is cheap and easy to send out hundreds of millions of messages, a response rate of ten in a million is perfectly acceptable to them.
I think that Graham may be right, that if good spam filtering became normal and automatic in nearly every email client (or server), that response rates might eventually drop so low that it would become worthless to spam.
Michael
-- Michael Sullivan Business Card Express of CT Thermographers to the Trade Cheshire, CT mich...@bcect.com
Christopher Browne <cbbro...@acm.org> wrote: > A long time ago, in a galaxy far, far away, "Herb Martin" > <He...@LearnQuick.Com> wrote: > >> And if you want to work with an existing package that has been mature > >> for several years now, you might look at the URL below for "Ifile." I > >> helped tune it to become pretty fast.
> > IFile's documentation and download page is > > included at the end of Graham's article.
> >> And I have to disagree somewhat with Graham's article; Naive Bayesian filtering _doesn't_ provide _quite_ as good results as he implies. Having both "sex" and "sexy" in a message does _not_ guarantee at P > 0.99 that messages will get tossed into the "spam" category.
> > I am not certain of your 'naive' filtering usage with the example of only "included" words. IFile's doc page describes its algorithm as "naive bayesian filtering" as well.
> > Graham is using the words included in "good mail" to counter this, as IFile seems to do.
> The point is that _all_ the words in the message are considered.
Graham's algorithm *does* consider all the words, sort of. It does a hash lookup on every word, and then considers the fifteen words in that mail that are the strongest signals (whether that be a signal of "good" mail, or "bad") and does the bayes calculation on those. It seems to me that it wouldn't be all that computationally intensive to extend the bayes calculation to more words.
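The "fifteen strongest signals" step can be sketched as follows. This is a reading of the description above, not Graham's own code: the function name `graham_score` is made up, and the input is assumed to be a list of per-word spam probabilities that are already strictly between 0 and 1.

```python
def graham_score(word_probs, n_interesting=15):
    """Combine the n most extreme per-word spam probabilities into one score."""
    # "Strongest signal" means farthest from the neutral value 0.5, in either direction.
    interesting = sorted(word_probs, key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:n_interesting]
    prod_spam = 1.0
    prod_ham = 1.0
    for p in interesting:
        prod_spam *= p
        prod_ham *= 1.0 - p
    # Naive Bayes combination, assuming independent words and equal priors.
    return prod_spam / (prod_spam + prod_ham)


# Two strong spam words, one neutral word, one strong "good" word.
print(graham_score([0.99, 0.99, 0.5, 0.01]))
```

One strong "good" word cancels one strong "bad" word, so the example behaves like a single 0.99 signal. Raising `n_interesting` is all it takes to extend the calculation to more words.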
I just did a very quick implementation of just the math and it looks like speed is not the problem, but arbitrary precision. With thousands of words, you easily reach past the edge of the IEEE floating point spec for some of your intermediary values, leading to a (/ x 0) situation. With a good arbitrary precision math library, this is not an issue, but it also appears that using the most significant 100-500 words is likely to produce a certain result so often that it ought to be plenty.
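The precision problem described here can be reproduced in a few lines, together with one common workaround. Note that the workaround below is summing log-odds, not the arbitrary-precision library mentioned above; that substitution, and the function names `combine_naive` and `combine_log`, are assumptions for illustration.

```python
import math


def combine_naive(probs):
    """Direct product form; both products underflow to 0.0 for long inputs."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)  # raises ZeroDivisionError when both are 0.0


def combine_log(probs):
    """Same quantity computed from summed log-odds, so nothing underflows."""
    log_odds = sum(math.log(p) - math.log1p(-p) for p in probs)
    # Logistic function, split by sign so math.exp never overflows.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


probs = [0.4] * 2000 + [0.6] * 2000  # perfectly balanced evidence
try:
    combine_naive(probs)
except ZeroDivisionError:
    print("naive product underflowed")
print(combine_log(probs))
```

With four thousand words the naive version divides by zero, while the log-odds version reports a value very close to 0.5 for this balanced input.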
I fed my bayes calculation pseudo random numbers and found that it was generating probabilities over 4 sigma one way or another more than 1/2 the time using 100 numbers. At 200 numbers, something like 80% were 5+ sigma, and a 100 run test did not produce a single probability between 5 and 95%.
So I'm guessing that using the most significant 200 numbers is unlikely to produce results any different from doing the bayes calculation on every last word.
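An experiment of this kind is easy to repeat. The sketch below is not the original test: the function name `extreme_fraction`, the uniform draw from 0.01 to 0.99, the seed, and the trial count are all assumptions, so the fractions it prints need not match the figures quoted above. It only shows that random inputs rarely combine to a middling probability once there are many of them.

```python
import math
import random


def extreme_fraction(n_words, trials=500, seed=1):
    """Fraction of random trials whose combined probability is outside 5%..95%."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        log_odds = 0.0
        for _ in range(n_words):
            p = rng.uniform(0.01, 0.99)  # pretend per-word spam probability
            log_odds += math.log(p) - math.log1p(-p)
        # Odds of 19:1 correspond to a probability of 95% (or 5% when negative).
        if abs(log_odds) > math.log(19):
            extreme += 1
    return extreme / trials


for n in (15, 100, 200):
    print(n, extreme_fraction(n))
```

The summed log-odds behave like a random walk, so the typical distance from zero grows with the number of words and near-certain verdicts become the norm.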
The one scenario where I see trouble is a real message which for some legitimate reason includes a forward of a spam example. If there's enough stuff added to the real message, his over-weighting of "good" indicators will probably tip the scale.
But if it's a fairly short forward message, followed by an actual spam (especially with full headers), it would almost certainly be tagged as "spam", even though, this might be somebody trading information trying to track down a spammer. Or perhaps someone with too much time on their hands read a spam and found it funny or otherwise interesting and decided to pass it on to somebody.
I'm not sure how you can filter spam well without risking a false positive in at least this case, but I suspect that this naive Bayesian algorithm won't do the trick, unless there's a fair bit of "good" content.
> For instance, if I throw my message, which conspicuously contains both > the word "sex" and the word "sexy," purportedly surefire indications > of spam, at ifile, the fact that it mentions Ifile several times means > that it heads to the "Apps/Ifile" folder where resides my archives of > the last five years of Ifile discussions. > To consider _only_ the words "sex" and "sexy" is a severe > oversimplification.
Except that he doesn't actually do this.
Michael
-- Michael Sullivan Business Card Express of CT Thermographers to the Trade Cheshire, CT mich...@bcect.com
mich...@bcect.com (Michael Sullivan) writes: > But if it's a fairly short forward message, followed by an actual spam > (especially with full headers), it would almost certainly be tagged as > "spam", even though, this might be somebody trading information trying > to track down a spammer. Or perhaps someone with too much time on their > hands read a spam and found it funny or otherwise interesting and > decided to pass it on to somebody.
> I'm not sure how you can filter spam well without risking a false > positive in at least this case, but I suspect that this naive Bayesian > algorithm won't do the trick, unless there's a fair bit of "good" > content.
You can have the filter disabled for people you know won't send you worthless messages.
--
Rahul Jain
http://linux.rice.edu/~rahul -=- mailto:rj...@techie.com
"Structure is nothing if it is all you got. Skeletons spook people if [they] try to walk around on their own. I really wonder why XML does not." -- Erik Naggum, comp.lang.lisp
(c)1996-2002, All rights reserved. Disclaimer available upon request.
Kaz Kylheku <k...@ashi.footprints.net> writes: > In article <B9838E4A.2CAF%...@xahlee.org>, xah wrote: > > The phenomenon of spam is a human-social phenomenon. Spammers spam because > > it is effective.
> That's only because you can't see spammers for the anti-social twits that they > are, who will keep spamming even when it's not effective. Or they will define > their acceptable effectiveness to be something ridiculously low, like one > positive response from ten million spams. Or even define negative responses as > good responses, so that ``don't send me this crap'' earns one a permanent spot > in their list.
> Spamming is not effective in any sense of the word that an actual marketer > would comprehend.
One bulk e-mailer says that when she started spamming in 1999, she could send out 100,000 e-mails and get 25 responses. Today, she has to send out a million messages to get the same response (a 0.0025 percent hit rate).
It's interesting reading. I don't think spammers will ever stop (like telemarketers), as long as they're getting *any* responses. Short of lawsuits, that is.
* Robert St. Amant | It's interesting reading. I don't think spammers will ever stop (like | telemarketers), as long as they're getting *any* responses. Short of | lawsuits, that is.
I am actually amazed that out of the million people needed to get 25 responses, there has not yet been a single potential psychopathic axe murderer living in the spammer's city. Imagine just /one/ such case.
-- Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder. Act from faith, and failure makes you blame someone and push harder.
> In fact, the problem with spam is not that large numbers of people > respond, but that it is so cheap to send (for the spammer anyway, since > the cost is distributed amongst the recipients and those who share their > systems and networks) that nearly *any* response is effective for them.
Nearly any *valid* response is effective. One part of the reason that spam works is that it is possible to `identify' the 25 people out of the million that act upon the message. When you spam a million email addresses most of the recipients discard or ignore the message. The set of people that respond to the spam is *much* richer in suckers than the original set of people identified by their addresses.
If *every* spam yielded a (possibly bogus) response, then the value of spamming would be severely decreased. Spamming a set of email addresses would yield no information about which recipients are suckers because they *all* seem to be. Putting a URL in the spam would be useless because it would simply cause a million automatic `hits' on the page.
> The problem with spam is that it is theft. If spammers actually had to > bear the costs of their spam, they would never send it, because the > response rates are ridiculously low. Since they do not, and it is cheap > and easy to send out hundreds of millions of messages, a response rate > of ten in a million is perfectly acceptable to them.
But a response rate of a million in a million would *not* be acceptable.