New Paul Graham Article

sv0f
Newsgroups: comp.lang.lisp
From: n...@vanderbilt.edu (sv0f)
Date: Fri, 16 Aug 2002 13:30:01 -0500
Local: Fri, Aug 16 2002 2:30 pm
Subject: New Paul Graham Article
On a statistical approach to filtering spam, with Lisp code,
here:

http://www.paulgraham.com/spam.html

Discussion of this article is currently happening on Slashdot,
for the interested.


 
Christopher Browne
Newsgroups: comp.lang.lisp
From: Christopher Browne <cbbro...@acm.org>
Date: 17 Aug 2002 01:25:40 GMT
Local: Fri, Aug 16 2002 9:25 pm
Subject: Re: New Paul Graham Article
In an attempt to throw the authorities off his trail, n...@vanderbilt.edu (sv0f) transmitted:

> On a statistical approach to filtering spam, with Lisp code,
> here:

> http://www.paulgraham.com/spam.html

> Discussion of this article is currently happening on Slashdot,
> for the interested.

And if you want to work with an existing package that has been mature
for several years now, you might look at the URL below for "Ifile."  I
helped tune it to become pretty fast.

And I have to disagree somewhat with Graham's article; Naive Bayesian
filtering _doesn't_ provide _quite_ as good results as he implies.
Having both "sex" and "sexy" in a message does _not_ guarantee at P >
0.99 that messages will get tossed into the "spam" category.

My statistics for those words in my corpus are thus:

sexy 4525 424:28 426:2 449:1 456:1

sex 62535 160:16 169:6 171:5 173:2 184:1 190:1 194:2 211:1 215:4 218:1
    221:15 224:3 226:1 234:2 237:11 238:1 239:2 241:1 244:1 247:1 249:11
    251:1 264:2 273:2 278:2 285:7 289:2 295:1 306:2 321:5 322:2 323:4
    324:9 327:14 332:2 334:2 343:15 346:2 347:1 350:5 352:1 354:2 362:4
    366:6 368:10 369:3 370:1 397:20 411:2 413:3 414:6 415:15 416:16 418:3
    421:17 423:1 424:338 425:11 426:23 432:2 433:2 439:3 442:1 459:2 465:3

The "424:28" indicates that the word "sexy" occurred 28 times in
folder #424, which happens to be the "Spam/Phonesex" folder.  #426 is
Spam/Snakeoil, #449 is X/Advocacy, with an instance of a quote about
people being "mesmerized by sexy glitz which distracts them from the
work at hand."  #456 pointed to a .signature with the word "sexy."
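A minimal Common Lisp sketch of reading one of these statistics lines. It assumes the layout shown above: the word, then a second number that the post does not explain (kept here as an opaque integer), then FOLDER:COUNT pairs. The function names are hypothetical and are not taken from Ifile.

```lisp
;;; Parse a word-statistics line:  WORD FIELD2 FOLDER:COUNT FOLDER:COUNT ...
;;; FIELD2 is unexplained in the post, so it is returned as-is.

(defun split-on-whitespace (string)
  "Return the non-empty substrings of STRING separated by whitespace."
  (let ((tokens '())
        (start nil))
    (loop for i from 0 to (length string)
          ;; A virtual trailing space flushes the last token.
          for ch = (if (< i (length string)) (char string i) #\Space)
          do (if (member ch '(#\Space #\Tab #\Newline))
                 (when start
                   (push (subseq string start i) tokens)
                   (setf start nil))
                 (unless start
                   (setf start i))))
    (nreverse tokens)))

(defun parse-word-stats (line)
  "Return (values word field2 alist), ALIST mapping folder number to count."
  (destructuring-bind (word field2 &rest pairs) (split-on-whitespace line)
    (values word
            (parse-integer field2)
            (loop for pair in pairs
                  for colon = (position #\: pair)
                  collect (cons (parse-integer pair :end colon)
                                (parse-integer pair :start (1+ colon)))))))
```

For the "sexy" line, 28 of the 32 counted occurrences fall in folder #424. That is a strong signal, but not a certainty. Newlines count as whitespace, so the wrapped "sex" entry can be parsed as a single string.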

Frankly, the word "sexy" is a very _useful_ one.  (And looking at the
stats here has caused me to modify a couple email messages in my
archives, which will strengthen the result :-).)

Unfortunately, it's not only found in the "Phonesex" folder.
Instances are found here and there everywhere.  And there are other
words that are very common both in "evil spam" and in everyday
conversation.  Integrating the whole set of statistics together
requires adding up statistics for _all_ the words found in a message,
not just the words "sex" and "sexy."

My finding is that it is _nowhere_ near sufficient to have two
populations, "spam" versus "not spam."  

If you muddle together the Nigerian Pyramid schemes with the "Penis
enhancement" ads along with the offers of new credit cards as well as
the latest sites where you can talk to "hot, horny girls LIVE!", the
statistics don't work out nearly so well.

It's hard to tell, on the face of it, why Nigerian scams _should_ be
considered textually similar to phone sex ads, and in practice, the
result of throwing them all together is noticeably poorer discrimination.

I have my spam split into categories so that filtering is _even more
discriminatory_:

  Credit
  Foreign
  Gambling
  Investigators
  Newsletters
  Phonesex
  Pyramid
  Snakeoil
  Viruses
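The sketch below shows what "adding up statistics for all the words" means once there are many folders rather than two. It is a minimal multinomial naive Bayes classifier in Common Lisp, not Ifile's code. Add-one smoothing, an equal prior for every folder, and the folder names in the example are all assumptions made for illustration.

```lisp
;;; Multinomial naive Bayes over named folders (a sketch, not Ifile's code).
;;; Assumptions: the caller supplies already-tokenized words, counts use
;;; add-one (Laplace) smoothing, and every folder gets the same prior.

(defstruct folder
  (name "" :type string)
  (counts (make-hash-table :test #'equal)) ; word -> occurrences in folder
  (total 0 :type integer))                  ; sum of all counts in folder

(defun train-folder (folder words)
  "Add every occurrence in WORDS (a list of strings) to FOLDER."
  (dolist (w words folder)
    (incf (gethash w (folder-counts folder) 0))
    (incf (folder-total folder))))

(defun vocabulary-size (folders)
  "Number of distinct words seen in any of FOLDERS."
  (let ((seen (make-hash-table :test #'equal)))
    (dolist (f folders)
      (maphash (lambda (w c)
                 (declare (ignore c))
                 (setf (gethash w seen) t))
               (folder-counts f)))
    (hash-table-count seen)))

(defun log-likelihood (folder words vocab)
  "Sum over ALL of WORDS of log P(word | folder), add-one smoothed."
  (loop for w in words
        sum (log (float (/ (+ 1 (gethash w (folder-counts folder) 0))
                           (+ vocab (folder-total folder)))
                        1d0))))

(defun classify (folders words)
  "Name of the folder whose model gives WORDS the highest likelihood."
  (let ((vocab (vocabulary-size folders)))
    (folder-name
     (reduce (lambda (a b)
               (if (>= (log-likelihood a words vocab)
                       (log-likelihood b words vocab))
                   a
                   b))
             folders))))
```

Every word in the message contributes a factor. As a result, a message containing "sexy" can still land in a non-spam folder when the rest of its vocabulary fits that folder better.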

There are a few things left to improve about Ifile, and I'd like to
redo it in some language fundamentally less painful to work with than
C.  The project I periodically consider is to redo the filtering
software in Lisp.  Unfortunately, I wind up running into _tremendous_
bottlenecks each time I do so.  Some combination of my skills and the
tools at hand prove not quite adequate.  Maybe next time...
--
(concatenate 'string "chris" "@cbbrowne.com")
http://cbbrowne.com/info/mail.html#ifile
Out of my mind. Back in five minutes.


 
Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 17 Aug 2002 03:51:05 +0000
Local: Fri, Aug 16 2002 11:51 pm
Subject: Re: New Paul Graham Article
* n...@vanderbilt.edu (sv0f)
| On a statistical approach to filtering spam, with Lisp code, here:

  Spam has to be dealt with at the transport level.  The ability of strangers
  to send you mail must be curtailed.  Several large sites offer a system to
  reject all mail from unknown correspondents, temporarily or permanently, and
  wait for the reader of the log to accept incoming mail from addresses that
  look familiar.  Another option is to accept delivery but return transport-
  like error messages if the user does not want the message.  Yet another
  option is to see if the smtp client is set up to accept mail for the domain
  that it tries to deliver mail from.  Yet another option is to temporarily
  reject all mail from unknown sources and utilize the fact that spammers have
  no resources to queue messages for later delivery.  And then you can always
  implement a scheme that returns a temporary rejection, but sends a mail to
  the originator independently asking for confirmation that he is human and by
  accepting the conditions that unsolicited commercial e-mail carries a fee
  that /will/ be collected.  Failure to accept the conditions will cause the
  temporary rejection never to be lifted, thus using up queue space in the
  offending server, which any sysadmin will notice and take care of even if
  they do not bother to fix their system configuration to avoid relaying spam.
  Should the conditions be accepted, the message is allowed through.
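The "temporarily reject all mail from unknown sources" option is what is now usually called greylisting. Below is a minimal Common Lisp sketch of the decision it makes. Two details are illustrative assumptions: a source is identified by the (client IP, envelope sender, recipient) triplet, and a retry must wait at least 300 seconds. Turning :TEMPFAIL into an actual SMTP 4xx reply is left to the mail server.

```lisp
;;; Greylisting sketch: defer the first attempt from an unseen triplet and
;;; accept a retry that arrives after *MIN-RETRY-DELAY* seconds.

(defparameter *min-retry-delay* 300
  "Seconds an unknown triplet must wait before a retry is accepted (assumed).")

(defvar *first-seen* (make-hash-table :test #'equal)
  "Triplet (client-ip sender recipient) -> time of its first delivery attempt.")

(defun greylist-decision (client-ip sender recipient now)
  "Return :ACCEPT, or :TEMPFAIL for the server to turn into an SMTP 4xx reply.
NOW is the current time in seconds, e.g. (GET-UNIVERSAL-TIME)."
  (let* ((key (list client-ip sender recipient))
         (seen-at (gethash key *first-seen*)))
    (cond ((null seen-at)
           ;; Unseen triplet: remember it and ask the client to retry later.
           (setf (gethash key *first-seen*) now)
           :tempfail)
          ;; It came back after the delay, as a queueing MTA would.
          ((>= (- now seen-at) *min-retry-delay*) :accept)
          ;; Retried too soon: keep deferring.
          (t :tempfail))))
```

A legitimate mail server queues the message and tries again, so its later attempt gets through. A sender that never retries never delivers, and that is the resource gap the paragraph above relies on.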

  If you allow the message to be delivered and waste CPU or brain time, the
  spammers have won a small victory.  That is just wrong.  Spammers must die.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


 
JB
Newsgroups: comp.lang.lisp
From: JB <j...@hotmail.com>
Date: Sat, 17 Aug 2002 10:03:03 +0200
Local: Sat, Aug 17 2002 4:03 am
Subject: Re: New Paul Graham Article

Erik Naggum wrote:
>   If you allow the message to be delivered and waste CPU
>   or brain time, the
>   spammers have won a small victory.  That is just wrong.
>   Spammers must die.

The countermeasures you mention in your message should be
taken by the mail service provider. Otherwise I would have
to implement a mail client myself.

In my case the following happened: Immediately after I
started posting to newsgroups, I started getting mails in
which I was offered help with my debts or I was given
advice as to how to make certain parts of my body larger.

I did the following:

(1) I stopped appending a valid email address to my mails
(2) I set up several mail accounts. All but one contain my
initials in some way, and on those I still sometimes get spam.
But one account is well hidden and only my friends know it.
I never got spam there.

I think that users should first agree that spam is
evil. (There is no such agreement yet.) Then there should
be a law against spam. And then police action could be
taken.

--
Janos Blazi



 
c hore
Newsgroups: comp.lang.lisp
From: carh...@yahoo.com (c hore)
Date: 17 Aug 2002 01:58:38 -0700
Local: Sat, Aug 17 2002 4:58 am
Subject: Re: New Paul Graham Article

> On a statistical approach to filtering spam, with Lisp code,
> here:
> http://www.paulgraham.com/spam.html

Most of the spam I receive seems to be images, presumably
to bypass text-based filters.  I suppose you would have to
run character recognition first on an image before any
text filter, Bayesian or otherwise, could be applied?

 
AFS97209
Newsgroups: comp.lang.lisp
From: afs97...@yahoo.com (AFS97209)
Date: 17 Aug 2002 02:20:57 -0700
Local: Sat, Aug 17 2002 5:20 am
Subject: Re: New Paul Graham Article
How effective is it in filtering out requests from African governments
to launder money?

 
Herb Martin
Newsgroups: comp.lang.lisp
From: "Herb Martin" <He...@LearnQuick.Com>
Date: Sat, 17 Aug 2002 11:01:45 GMT
Local: Sat, Aug 17 2002 7:01 am
Subject: Re: New Paul Graham Article

> How effective is it in filtering out requests from African governments
> to launder money?

Apparently very effective -- Graham discusses that
specifically.

But the key is that it is TUNED to the particular user
by running a pre-processor through both "good mail"
and "spam mail" databases.
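Here is a sketch of what that tuning step produces. A spam probability is computed for each word from its counts in a good corpus and a spam corpus. The most decisive of those probabilities are then combined into a score for the message. Several constants are used: good counts are doubled, words seen fewer than five times are ignored, results are clamped to 0.01 and 0.99, unseen words get 0.4, and only the 15 most interesting words are used. These follow the scheme commonly quoted from Graham's article, but treat them as assumptions and check them against the article itself.

```lisp
;;; GOOD and BAD are hash tables (word -> occurrences); NGOOD and NBAD are
;;; message counts, assumed > 0.  All constants are assumptions modeled on
;;; the scheme usually quoted from Graham's article.

(defun word-spam-probability (word good bad ngood nbad)
  "Spam probability of WORD clamped to [0.01, 0.99], or NIL if too rare."
  (let ((g (* 2 (gethash word good 0)))   ; good counts weighted double
        (b (gethash word bad 0)))
    (unless (< (+ g b) 5)                 ; ignore words seen too rarely
      (let ((gr (min 1 (/ g ngood)))
            (br (min 1 (/ b nbad))))
        (max 0.01 (min 0.99 (float (/ br (+ gr br)))))))))

(defun combine-probabilities (probs)
  "Naive-Bayes combination: prod(p) / (prod(p) + prod(1 - p))."
  (let ((prod (reduce #'* probs))
        (inv (reduce #'* probs :key (lambda (p) (- 1 p)))))
    (/ prod (+ prod inv))))

(defun message-spam-probability (words good bad ngood nbad &key (top 15))
  "Score the distinct WORDS using the TOP probabilities furthest from 0.5."
  (let* ((probs (mapcar (lambda (w)
                          (or (word-spam-probability w good bad ngood nbad)
                              0.4))       ; assumed value for rare/unknown words
                        (remove-duplicates words :test #'string=)))
         (sorted (sort probs #'> :key (lambda (p) (abs (- p 0.5))))))
    (combine-probabilities (subseq sorted 0 (min top (length sorted))))))
```

A word that appears only in the good corpus, such as "lisp" in a Lisp programmer's mail, gets a probability near 0.01 and pulls the message score down. This is how the good-mail database counters spammy words.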

The article is worth a quick read.

--
Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds

"AFS97209" <afs97...@yahoo.com> wrote in message

news:6dfa3582.0208170120.330064a8@posting.google.com...


 
Herb Martin
Newsgroups: comp.lang.lisp
From: "Herb Martin" <He...@LearnQuick.Com>
Date: Sat, 17 Aug 2002 11:33:19 GMT
Local: Sat, Aug 17 2002 7:33 am
Subject: Re: New Paul Graham Article
The article is worth a quick read.

There is also a FAQ listed at the bottom.

> How effective is it in filtering out requests from African governments
> to launder money?

Apparently very effective -- Graham discusses that
specifically.

But the key is that it is TUNED to the particular user
by running a pre-processor through both "good mail"
and "spam mail" databases.

From the FAQ (someone in this thread asked about
graphics):

<quote from faq>
What if spammers sent their messages as images?

Such an email would include a lot of damning content,
actually. The headers, to start with, would be as bad
as ever. And remember that we scan all the html as
well as the text. Within the message body there would
probably be a link as well as the image, both containing
urls, which would probably score high. "Href" and "img"
themselves both have spam probabilities approaching
pornographic words.

<end quote from faq>
Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds


--
Herb Martin, PP-SEL
(...and aerobatic student)
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds

"AFS97209" <afs97...@yahoo.com> wrote in message

news:6dfa3582.0208170120.330064a8@posting.google.com...


 
Herb Martin
Newsgroups: comp.lang.lisp
From: "Herb Martin" <He...@LearnQuick.Com>
Date: Sat, 17 Aug 2002 11:40:49 GMT
Local: Sat, Aug 17 2002 7:40 am
Subject: Re: New Paul Graham Article

> And if you want to work with an existing package that has been mature
> for several years now, you might look at the URL below for "Ifile."  I
> helped tune it to become pretty fast.

IFile's documentation and download page is
included at the end of Graham's article.

    http://www.ai.mit.edu/~jrennie/ifile/

> And I have to disagree somewhat with Graham's article; Naive Bayesian
> filtering _doesn't_ provide _quite_ as good results as he implies.
> Having both "sex" and "sexy" in a message does _not_ guarantee at P >
> 0.99 that messages will get tossed into the "spam" category.

I am not sure what you mean by 'naive' filtering in an
example that looks only at the "included" words.  IFile's doc
page describes its algorithm as "naive Bayesian
filtering" as well.

Graham is using the words included in "good mail"
to counter this, as IFile seems to do.

Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds


 
xah
Newsgroups: comp.lang.lisp
From: xah <x...@xahlee.org>
Date: Sat, 17 Aug 2002 12:23:23 GMT
Local: Sat, Aug 17 2002 8:23 am
Subject: Re: New Paul Graham Article
There are two lispy bigwigs, namely Paul Graham and Erik Naggum, who think
their hotshot mouthing on spamming is something of value.

Their outpouring is not unlike the damming of a drivel flood.

In the treatment of floods, there is the brute force of building a dam.
This is the no-nonsense solution of brutes the likes of technology geeks.
Among them are the brute elite the likes of Graham'n'Naggum, who speak on
fine engineering far ahead of fellow brutes.

The phenomenon of spam is a human-social phenomenon. Spammers spam because
it is effective. Consumers' mouths say no but their actions say yes,
because for the vast majority they are unthinking and happy-go-lucky brutes.
These brutes decline spam, but when they happen upon a spam that pleases
them, such as a great porno site advertisement or something else that caught
their personal interest, they will follow the advertisement. For spammers,
spam is effective if there is one response out of one thousand. Similarly,
brutes will respond to one out of one thousand hateful spams. The prospect
of spamming is thus kept alive by the populous brutes, everywhere, despite
the tumultuous noises they make.

As a human community or society, the treatment of spam is up to us, not
fucking technology. (fuck Erik Naggum. Fuck you. You got it?) The gist is
"what do we want?". (Note: not "what technology geeks want" or "what fucking
unix morons want".) If we do not want spam, there are two ways to get rid of
it: Thru law, and thru inaction. The former is a well-known proposal in the
process. The latter is possible only if humans are not lazy unthinking
beer-drinking brutes, which is never going to happen.

--
(gratuitous poem)

  i will be building a dam, when there is a pussy flood.
  for i am a collector of nature, and a lover of its treasure.
  i will use my savings, to quench those thirsty, and lubricate those angry,
  for i have suffered and suffered; the pain and anguish.
               -- Xah Lee

I dedicate this poem to Erik Naggum.

PS i request that anyone who has read this far and finds it worthwhile send me
an email with the line "Xah, you are beautiful.". Thanks in advance.

 Xah
 x...@xahlee.org
 http://xahlee.org/PageTwo_dir/more.html

in article 3238545065548...@naggum.no, Erik Naggum at e...@naggum.no wrote
on 8/16/02 8:51 PM:


 
Christopher Browne
Newsgroups: comp.lang.lisp
From: Christopher Browne <cbbro...@acm.org>
Date: 17 Aug 2002 13:32:12 GMT
Local: Sat, Aug 17 2002 9:32 am
Subject: Re: New Paul Graham Article
The world rejoiced as carh...@yahoo.com (c hore) wrote:

>> On a statistical approach to filtering spam, with Lisp code,
>> here:
>> http://www.paulgraham.com/spam.html

> Most of the spam I receive seems to be images, presumably to bypass
> text-based filters.  I suppose you would have to run character
> recognition first on an image before any text filter, Bayesian or
> otherwise, could be applied?

No, it wouldn't be necessary.

If you have a population of messages that consist just of images,
that's going to bias the vocabulary statistics since there will be
lots of words like "multipart" and "alternative" and "jpeg", and very
few of the "legitimate" words that people use when they send you real
mail.
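A minimal Common Lisp tokenizer that illustrates the point. If the raw message is tokenized with its headers and MIME structure included, an image-only spam still yields words like "multipart", "alternative" and "jpeg". Two choices here are assumptions: letters, digits, dashes, apostrophes and dollar signs count as token characters, and every token is lowercased.

```lisp
;;; Tokenize raw message text, headers and MIME parts included.

(defun token-char-p (ch)
  "True for characters assumed to belong inside a token."
  (or (alphanumericp ch) (member ch '(#\- #\' #\$))))

(defun tokenize (text)
  "Lowercased tokens of TEXT in order, duplicates kept."
  (let ((tokens '())
        (start nil))
    ;; Run one index past the end so the final token is flushed.
    (dotimes (i (1+ (length text)) (nreverse tokens))
      (if (and (< i (length text)) (token-char-p (char text i)))
          (unless start (setf start i))
          (when start
            (push (string-downcase (subseq text start i)) tokens)
            (setf start nil))))))
```

The tokens then feed the same per-folder word statistics as any other text. No character recognition step is needed to pick up the bias that MIME structure introduces.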

Remember, if this is being used well, you're not merely classifying
between "spam" and "not spam;" you're classifying into a multiplicity
of _legitimate_ categories, such as:

-> Mail from family members
-> Mail from this friend
-> Mail from that friend
-> Mail from the other friend
-> Email from "technical associates," by person
-> Email from mailing lists, arranged _by mailing list_
-> And so forth, for legitimate categories...

combined, preferably, with "spam" that gets classified so that you can
get finer discrimination:

-> Pyramid scams
-> Credit card offers
-> Breast/Penis enhancements, Viagra ads, weight loss, stop smoking
   plans, ...
-> Computer Viruses
and such.

The spam _isn't_ likely to have similar vocabulary to the email you
get from legitimate sources.

If something with totally new characteristics comes along, it may get
misfiled, at which point you move it to a more appropriate folder
(perhaps even a new folder), and it becomes part of the new corpus,
directing future similar spam to the right place.
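The refiling step can be sketched as moving a message's word counts out of the folder it was misfiled in and into the folder the user chooses, creating that folder if it is new. The corpus layout and the function names are hypothetical. They are not meant to describe how Ifile actually stores its database.

```lisp
;;; Hypothetical corpus: folder name -> (word -> count) hash table.

(defvar *corpus* (make-hash-table :test #'equal))

(defun folder-table (name)
  "Word-count table for folder NAME, created on first use."
  (or (gethash name *corpus*)
      (setf (gethash name *corpus*) (make-hash-table :test #'equal))))

(defun add-words (name words &optional (delta 1))
  "Adjust the count of each of WORDS in folder NAME by DELTA."
  (let ((table (folder-table name)))
    (dolist (w words table)
      (let ((n (+ (gethash w table 0) delta)))
        (if (plusp n)
            (setf (gethash w table) n)
            (remhash w table))))))      ; drop words whose count reaches 0

(defun refile (words from to)
  "Move a message, given as its list of WORDS, from folder FROM to folder TO."
  (add-words from words -1)
  (add-words to words 1))

(defun word-count (name word)
  "Count of WORD in folder NAME, or 0 if the folder does not exist."
  (let ((table (gethash name *corpus*)))
    (if table (gethash word table 0) 0)))
```

After a refile, the next message with similar vocabulary scores higher against the corrected folder. This is the feedback loop described above.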
--
(reverse (concatenate 'string "ac.notelrac.teneerf@" "454aa"))
http://www.ntlug.org/~cbbrowne/ifilter.html
Rules of the  Evil Overlord #60. "My five-year-old  child advisor will
also  be asked to  decipher any  code I  am thinking  of using.  If he
breaks the code  in under 30 seconds, it will not  be used. Note: this
also applies to passwords." <http://www.eviloverlord.com/>


 
Christopher Browne
Newsgroups: comp.lang.lisp
From: Christopher Browne <cbbro...@acm.org>
Date: 17 Aug 2002 13:32:11 GMT
Local: Sat, Aug 17 2002 9:32 am
Subject: Re: New Paul Graham Article
In the last exciting episode, afs97...@yahoo.com (AFS97209) wrote:

> How effective is it in filtering out requsts from African govenments
> to launder money?

Very much so.  Those messages head to Spam/Pyramid and nowhere else.

The vocabulary in those messages is quite repetitive from one
message to the next, so it's an _ideal_ candidate for
Naive Bayesian networks.
--
(reverse (concatenate 'string "moc.enworbbc@" "sirhc"))
http://cbbrowne.com/info/spiritual.html
"There are two ways of  constructing a software design:  One way is to
make it so  simple that there are  obviously no deficiencies,  and the
other   way is to make it   so complicated that   there are no obvious
deficiencies.  The first method is far more difficult."
-- C.A.R. Hoare


 
Christopher Browne
Newsgroups: comp.lang.lisp
From: Christopher Browne <cbbro...@acm.org>
Date: 17 Aug 2002 14:31:42 GMT
Local: Sat, Aug 17 2002 10:31 am
Subject: Re: New Paul Graham Article
A long time ago, in a galaxy far, far away, "Herb Martin" <He...@LearnQuick.Com> wrote:

The point is that _all_ the words in the message are considered.

For instance, if I throw my message, which conspicuously contains both
the word "sex" and the word "sexy," purportedly surefire indications
of spam, at ifile, the fact that it mentions Ifile several times means
that it heads to the "Apps/Ifile" folder, where my archives of
the last five years of Ifile discussions reside.

To consider _only_ the words "sex" and "sexy" is a severe
oversimplification.
--
(reverse (concatenate 'string "gro.gultn@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/lisp.html
Objects & Markets
"Object-oriented programming is about the modular separation of what
from how. Market-oriented, or agoric, programming additionally allows
the modular separation of why."
-- Mark Miller


 
Herb Martin
Newsgroups: comp.lang.lisp
From: "Herb Martin" <He...@LearnQuick.Com>
Date: Sat, 17 Aug 2002 15:03:48 GMT
Local: Sat, Aug 17 2002 11:03 am
Subject: Re: New Paul Graham Article

> The point is that _all_ the words in the message are considered.

> For instance, if I throw my message, which conspicuously contains both
> the word "sex" and the word "sexy," purportedly surefire indications
> of spam, at ifile, the fact that it mentions Ifile several times means
> that it heads to the "Apps/Ifile" folder, where my archives of
> the last five years of Ifile discussions reside.

> To consider _only_ the words "sex" and "sexy" is a severe
> oversimplification.

Well that makes more sense.

What about Graham's method leads one to believe that IFile
would not be considered?  Several of the examples he gives
(using 'Lisp' for himself instead of 'Ifile' as you would) are
isomorphic to this issue -- he is including words from the "good
mail" as well.

--
Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds

"Christopher Browne" <cbbro...@acm.org> wrote in message

news:ajlmod$1bspp4$1@ID-125932.news.dfncis.de...


 
Joe Marshall
Newsgroups: comp.lang.lisp
From: "Joe Marshall" <prunesqual...@attbi.com>
Date: Sat, 17 Aug 2002 20:34:29 GMT
Local: Sat, Aug 17 2002 4:34 pm
Subject: Re: New Paul Graham Article

"sv0f" <n...@vanderbilt.edu> wrote in message news:none-1608021330010001@129.59.212.53...
> On a statistical approach to filtering spam, with Lisp code,
> here:

> http://www.paulgraham.com/spam.html

> Discussion of this article is currently happening on Slashdot,
> for the interested.

Perhaps this technique could be used to filter out the large
amount of crap postings on this newsgroup.

 
Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 17 Aug 2002 20:42:29 +0000
Local: Sat, Aug 17 2002 4:42 pm
Subject: Re: New Paul Graham Article
* xah <x...@xahlee.org>
| (fuck Erik Naggum. Fuck you. You got it?)

  Got it.  Now get on with your life.  Thank you.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


 
Kaz Kylheku
Newsgroups: comp.lang.lisp
From: Kaz Kylheku <k...@ashi.footprints.net>
Date: Sat, 17 Aug 2002 22:20:54 +0000 (UTC)
Local: Sat, Aug 17 2002 6:20 pm
Subject: Re: New Paul Graham Article

In article <B9838E4A.2CAF%...@xahlee.org>, xah wrote:
> The phenomenon of spam is a human-social phenomenon. Spammers spam because
> it is effective.

That's only because you can't see spammers for the anti-social twits that they
are, who will keep spamming even when it's not effective.  Or they will define
their acceptable effectiveness to be something ridiculously low, like one
positive response from ten million spams. Or even define negative responses as
good responses, so that ``don't send me this crap'' earns one a permanent spot
in their list.

Spamming is not effective in any sense of the word that an actual marketer
would comprehend.

Now, why *don't* you see spammers for the anti-social twits that they are?  I
have my own idea about that.


 
Thien-Thi Nguyen
Newsgroups: comp.lang.lisp
From: Thien-Thi Nguyen <t...@glug.org>
Date: 17 Aug 2002 22:51:35 +0000
Local: Sat, Aug 17 2002 6:51 pm
Subject: Re: New Paul Graham Article

Kaz Kylheku <k...@ashi.footprints.net> writes:
> Spamming is not effective in any sense of the word that an actual marketer
> would comprehend.

well clearly you have been lucky enough not to spend too much time around
(professional) marketers, who take great pains in safeguarding their power to
comprehend everything positively.  their job is to foist this inability to
discern the feedback loop onto others (primarily professional sales people).
this is because when business is good, nobody cares, it's only when business
is bad that self-examination is painful.  it's no surprise that professional
sales people also take it to be their job to point the finger back at the
marketers.

whoever thought up the sales / marketing (organizational) partitioning was
probably a consultant weeping in anticipation of the spoils to be reaped from
the turf wars imminent.  split the mind and sell aspirin...

(actually, i have no clue what your background w/ these professions is; these
ramblings are from my own limited experience as a naive geek co-founding a
chip company where the only lisp involved was emacs lisp...  for round two,
i'd like to work my way up through the tool chain w/ lisp but somehow i got
distracted lo these last four years.)

thi


 
Discussion subject changed to "Xah, you are beautiful" by ilias
ilias
Newsgroups: comp.lang.lisp
From: ilias <at_n...@pontos.net>
Date: Sun, 18 Aug 2002 17:47:58 +0300
Local: Sun, Aug 18 2002 10:47 am
Subject: Xah, you are beautiful

xah wrote:
> There are two lispy bigwigs, namely Paul Graham and Erik Naggum, who think
> their hotshot mouthing on spamming is something of value.

> Their outpouring is not unlike the damming of a drivel flood.

> In the treatment of floods, there is the brute force of building a dam.
> This is the no-nonsense solution of brutes the likes of technology geeks.
> Among them are the brute elite the likes of Graham'n'Naggum, who speak on
> fine engineering far ahead of fellow brutes.

...

> These brutes decline spam, but when they happen upon a spam that pleases
> them, such as a great porno site advertisement or something else that caught
> their personal interest, they will follow the advertisement.

i don't like spam.
when something is interesting (it happens, technically, tittically) i
try to push the delete-button before i read more,
sometimes i'm not able.

> For spammers,
> spam is effective if there is one response out of one thousand. Similarly,
> brutes will respond to one out of one thousand hateful spams. The prospect
> of spamming is thus kept alive by the populous brutes, everywhere, despite
> the tumultuous noises they make.

> As a human community or society, the treatment of spam is up to us, not
> fucking technology.

that is partly correct.

it is up to us, assisted by (fucking or not) technology.

> (fuck Erik Naggum. Fuck you. You got it?)

"fuck Erik Naggum".

"Fuck *you*" [relates to 'Erik Naggum', relates to 'the reader', relates
to 'Paul Graham', relates to technology-lovers?]

"you got it?"

no, please clarify.

> The gist is
> "what do we want?".

"we". who belongs to "we".

> (Note: not "what technology geeks want" or "what fucking
> unix morons want".)

are they not included in "we"?

> If we do not want spam, there are two ways to get rid of
> it: Thru law, and thru inaction.

- law
- inaction
- wisdom
- technology
- creativity
- cooperation
- understanding
- ...

solving the problem by working together.

> The former is a well-known proposal in the
> process. The latter is possible only if humans are not lazy unthinking
> beer-drinking brutes, which is never going to happen.

Take a human with a common intelligence, and place him in a group of
gorillas, he'll be a brilliant individuum (relative).

if he insists in that group on he's brilliance, the gorillas will give
him a brilliant fuck.

The "lazy unthinking beer-drinking brutes" group belongs to the
problem domain; basically, it is the most important and unchangeable part.

By ignoring this, you declare yourself a complete idiot.

but maybe you're simply jealous [someone wants something that another one
has], because of your inability to "don't think - drink beer - and fuck -
be happy".

> PS i request that anyone who read so far and find it worthwhile to send me
> an email with the line "Xah, you are beautiful.". Thanks in advance.

sorry, no email.

i've placed it in the subject.


 
Discussion subject changed to "New Paul Graham Article" by Michael Sullivan
Michael Sullivan  
Newsgroups: comp.lang.lisp
From: mich...@bcect.com (Michael Sullivan)
Date: Mon, 19 Aug 2002 14:49:12 -0400
Local: Mon, Aug 19 2002 2:49 pm
Subject: Re: New Paul Graham Article

xah <x...@xahlee.org> wrote:
> The phenomenon of spam is a human-social phenomenon. Spammers spam because
> it is effective. Consumers'S mouths says no but their actions says yes,
> because for the vast majority they are unthinking and happy-go-lucky brutes.

In fact, the problem with spam is not that large numbers of people
respond, but that it is so cheap to send (for the spammer anyway, since
the cost is distributed amongst the recipients and those who share their
systems and networks) that nearly *any* response is effective for them.

The problem with spam is that it is theft.  If spammers actually had to
bear the costs of their spam, they would never send it, because the
response rates are ridiculously low.  Since they do not, and it is cheap
and easy to send out hundreds of millions of messages, a response rate
of ten in a million is perfectly acceptable to them.
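A rough break-even sketch of the cost argument above, in Python. Every dollar figure is a hypothetical placeholder: the revenue per response, the per-message cost when the cost is pushed onto recipients, and a postage-like cost for the case where the sender pays. Only the "ten in a million" rate comes from the post.

```python
# Break-even sketch of spam economics.
# All monetary values are hypothetical placeholders, not measured figures.

def spam_profit(messages: int, response_rate: float,
                revenue_per_response: float, cost_per_message: float) -> float:
    """Expected profit of one run: revenue from responders minus total sending cost."""
    responses = messages * response_rate
    return responses * revenue_per_response - messages * cost_per_message


def break_even_rate(revenue_per_response: float, cost_per_message: float) -> float:
    """Smallest response rate at which a run pays for itself."""
    return cost_per_message / revenue_per_response


REVENUE = 50.0        # hypothetical dollars earned per response
SHIFTED = 0.00001     # hypothetical cost per message when recipients bear the cost
POSTAGE = 0.30        # hypothetical cost per message if the sender paid like postage
RATE = 10 / 1_000_000  # "ten in a million", from the post

print(spam_profit(100_000_000, RATE, REVENUE, SHIFTED))
print(spam_profit(100_000_000, RATE, REVENUE, POSTAGE))
print(break_even_rate(REVENUE, SHIFTED), break_even_rate(REVENUE, POSTAGE))
```

With these made-up numbers, the same run goes from profitable to deeply unprofitable once the sender bears a per-message cost. That is the cost-shifting point being made.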

I think Graham may be right that if good spam filtering became
normal and automatic in nearly every email client (or server),
response rates might eventually drop so low that spamming would
become worthless.

Michael

--
Michael Sullivan
Business Card Express of CT             Thermographers to the Trade
Cheshire, CT                                      mich...@bcect.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Michael Sullivan  
Newsgroups: comp.lang.lisp
From: mich...@bcect.com (Michael Sullivan)
Date: Mon, 19 Aug 2002 14:49:10 -0400
Local: Mon, Aug 19 2002 2:49 pm
Subject: Re: New Paul Graham Article

Graham's algorithm *does* consider all the words, sort of.  It does a
hash lookup on every word, and then considers the fifteen words in that
mail that are the strongest signals (whether that be a signal of "good"
mail, or "bad") and does the Bayes calculation on those.  It seems to me
that it wouldn't be all that computationally intensive to extend the
Bayes calculation to more words.
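The following is a minimal Python sketch of the scoring step as described above. It is not Graham's own Lisp code. It makes several assumptions. `word_probs` is a hypothetical table mapping a word to P(spam | word). Each distinct word counts once. Words missing from the table get 0.4, which is the default Graham's article uses as far as I recall. "Strongest signal" is read as "furthest from 0.5".

```python
import math

def most_interesting(words, word_probs, n=15, unknown=0.4):
    """Probabilities of the n distinct words whose values lie furthest from 0.5."""
    probs = [word_probs.get(w, unknown) for w in set(words)]  # unknown words -> 0.4 (assumed)
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return probs[:n]

def combine(probs):
    """Naive Bayes combination: prod(p) / (prod(p) + prod(1 - p))."""
    spam = math.prod(probs)
    ham = math.prod(1 - p for p in probs)
    return spam / (spam + ham)

def spam_probability(words, word_probs, n=15):
    return combine(most_interesting(words, word_probs, n))

table = {"free": 0.95, "sex": 0.99, "lisp": 0.01, "macro": 0.05}  # hypothetical values
print(spam_probability("free sex tonight".split(), table))
print(spam_probability("a lisp macro question".split(), table))
```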

I just did a very quick implementation of just the math and it looks
like speed is not the problem, but arbitrary precision.  With thousands
of words, you easily reach past the edge of the IEEE floating point spec
for some of your intermediary values, leading to a (/ x 0) situation.
With a good arbitrary precision math library, this is not an issue, but
it also appears that using the most significant 100-500 words is likely
to produce a certain result so often that it ought to be plenty.
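This sketch illustrates the precision problem above. With thousands of words, both products in prod(p) / (prod(p) + prod(1 - p)) underflow IEEE doubles to 0.0, and the division becomes 0/0. The post suggests an arbitrary-precision library. `combine_log` below swaps in a different, commonly used technique: summing log-odds, which gives the same answer without underflow. The 3001 probabilities are made up for the demonstration.

```python
import math

def combine_naive(probs):
    spam = math.prod(probs)
    ham = math.prod(1 - p for p in probs)
    return spam / (spam + ham)  # raises ZeroDivisionError once both products underflow

def combine_log(probs):
    # log-odds = sum(log p) - sum(log(1 - p)); then P = 1 / (1 + exp(-log_odds))
    log_odds = sum(math.log(p) - math.log1p(-p) for p in probs)
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)  # rearranged so exp() never overflows for large negative odds
    return z / (1.0 + z)

probs = [0.4, 0.6] * 1500 + [0.9]  # 3001 hypothetical word probabilities
try:
    combine_naive(probs)
except ZeroDivisionError:
    print("naive product underflowed")
print(combine_log(probs))
```

The 0.4/0.6 pairs cancel in log space, so the result is driven by the single 0.9 word.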

I fed my Bayes calculation pseudo-random numbers and found that it was
generating probabilities over 4 sigma one way or another more than 1/2
the time using 100 numbers.  At 200 numbers, something like 80% were 5+
sigma, and a 100-run test did not produce a single probability between 5
and 95%.

So I'm guessing that using the most significant 200 numbers is unlikely
to produce results any different from doing the Bayes calculation on
every last word.
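Here is a rough re-creation of the random-number experiment, so readers can try it themselves. The post doesn't say how the numbers were drawn. This sketch assumes uniform draws from [0.01, 0.99] and a fixed seed, and it combines them in log space to dodge the underflow issue. It reports the fraction of trials that land strictly between 5% and 95%. No particular figures are claimed.

```python
import math
import random

def combine_log(probs):
    log_odds = sum(math.log(p) - math.log1p(-p) for p in probs)
    log_odds = max(-700.0, min(700.0, log_odds))  # keep exp() within double range
    return 1.0 / (1.0 + math.exp(-log_odds))

def fraction_uncertain(n_words, trials=100, seed=1):
    """Share of trials whose combined probability falls strictly between 0.05 and 0.95."""
    rng = random.Random(seed)
    uncertain = 0
    for _ in range(trials):
        probs = [rng.uniform(0.01, 0.99) for _ in range(n_words)]  # assumed distribution
        if 0.05 < combine_log(probs) < 0.95:
            uncertain += 1
    return uncertain / trials

for n in (15, 100, 200):
    print(n, fraction_uncertain(n))
```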

The one scenario where I see trouble is a real message which for some
legitimate reason includes a forward of a spam example.  If there's
enough stuff added to the real message, his over-weighting of "good"
indicators will probably tip the scale.
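The "over-weighting of good indicators" comes from how per-word probabilities are computed. Below is a Python sketch modeled on the per-word formula in Graham's article, as I remember it. Occurrences in good mail are doubled, words seen fewer than five times are skipped, and results are clamped to [0.01, 0.99]. The count tables and corpus sizes are hypothetical inputs.

```python
def word_spam_probability(word, good_counts, bad_counts, n_good, n_bad, min_count=5):
    """P(spam | word) with good-mail occurrences counted double (hypothetical inputs)."""
    g = 2 * good_counts.get(word, 0)  # doubling biases scores toward "good"
    b = bad_counts.get(word, 0)
    if g + b < min_count:
        return None  # too rare to trust; the caller falls back to a default
    bad_freq = min(1.0, b / n_bad)
    good_freq = min(1.0, g / n_good)
    p = bad_freq / (good_freq + bad_freq)
    return max(0.01, min(0.99, p))

print(word_spam_probability("meeting", {"meeting": 10}, {"meeting": 10}, 100, 100))
```

A word seen equally often in a hundred good and a hundred spam messages scores about 0.33 rather than 0.5. That bias is what lets enough surrounding "good" text outweigh a forwarded spam.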

But if it's a fairly short forward message, followed by an actual spam
(especially with full headers), it would almost certainly be tagged as
"spam", even though this might be somebody trading information trying
to track down a spammer.  Or perhaps someone with too much time on their
hands read a spam and found it funny or otherwise interesting and
decided to pass it on to somebody.

I'm not sure how you can filter spam well without risking a false
positive in at least this case, but I suspect that this naive Bayesian
algorithm won't do the trick, unless there's a fair bit of "good"
content.  

> For instance, if I throw my message, which conspicuously contains both
> the word "sex" and the word "sexy," purportedly surefire indications
> of spam, at ifile, the fact that it mentions Ifile several times means
> that it heads to the "Apps/Ifile" folder where resides my archives of
> the last five years of Ifile discussions.
> To consider _only_ the words "sex" and "sexy" is a severe
> oversimplification.

Except that he doesn't actually do this.

Michael

--
Michael Sullivan
Business Card Express of CT             Thermographers to the Trade
Cheshire, CT                                      mich...@bcect.com


 
Rahul Jain  
Newsgroups: comp.lang.lisp
From: Rahul Jain <ra...@rice.edu>
Date: 19 Aug 2002 15:28:10 -0400
Local: Mon, Aug 19 2002 3:28 pm
Subject: Re: New Paul Graham Article

mich...@bcect.com (Michael Sullivan) writes:
> But if it's a fairly short forward message, followed by an actual spam
> (especially with full headers), it would almost certainly be tagged as
> "spam", even though, this might be somebody trading information trying
> to track down a spammer.  Or perhaps someone with too much time on their
> hands read a spam and found it funny or otherwise interesting and
> decided to pass it on to somebody.

> I'm not sure how you can filter spam well without risking a false
> positive in at least this case, but I suspect that this naive Bayesian
> algorithm won't do the trick, unless there's a fair bit of "good"
> content.  

You can have the filter disabled for people you know won't send you
worthless messages.
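Rahul's suggestion amounts to a whitelist that is checked before the filter runs. The following is a minimal sketch. The sender addresses, the lowercase normalization, the `classify` callback, and the 0.9 threshold are all assumptions for illustration.

```python
from typing import Callable, Iterable, Set

def is_spam(sender: str, words: Iterable[str], whitelist: Set[str],
            classify: Callable[[Iterable[str]], float], threshold: float = 0.9) -> bool:
    """Treat a message as spam only if the sender is untrusted and the score is high."""
    if sender.lower() in whitelist:  # whitelist entries assumed stored in lowercase
        return False  # trusted sender: never filtered, even when forwarding a spam
    return classify(words) > threshold

def always_spammy(words):
    return 0.99  # hypothetical stand-in for a real Bayesian scorer

trusted = {"friend@example.com"}
print(is_spam("Friend@example.com", ["forwarded", "spam"], trusted, always_spammy))
```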

--
-> -/                        - Rahul Jain -                        \- <-
-> -\  http://linux.rice.edu/~rahul -=-  mailto:rj...@techie.com   /- <-
-> -X "Structure is nothing if it is all you got. Skeletons spook  X- <-
-> -/  people if [they] try to walk around on their own. I really  \- <-
-> -\  wonder why XML does not." -- Erik Naggum, comp.lang.lisp    /- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   (c)1996-2002, All rights reserved. Disclaimer available upon request.


 
Robert St. Amant  
Newsgroups: comp.lang.lisp
From: stam...@haeckel.csc.ncsu.edu (Robert St. Amant)
Date: 19 Aug 2002 15:36:51 -0400
Local: Mon, Aug 19 2002 3:36 pm
Subject: Re: New Paul Graham Article

Kaz Kylheku <k...@ashi.footprints.net> writes:
> In article <B9838E4A.2CAF%...@xahlee.org>, xah wrote:
> > The phenomenon of spam is a human-social phenomenon. Spammers spam because
> > it is effective.

> That's only because you can't see spammers for the anti-social twits that they
> are, who will keep spamming even when it's not effective.  Or they will define
> their acceptable effectiveness to be something ridiculously low, like one
> positive response from ten million spams. Or even define negative responses as
> good responses, so that ``don't send me this crap'' earns one a permanent spot
> in their list.

> Spamming is not effective in any sense of the word that an actual marketer
> would comprehend.

From an article in this week's Newsweek, titled "Spamming the World"
(http://www.msnbc.com/news/792491.asp):

     One bulk e-mailer says that when she started spamming in 1999,
     she could send out 100,000 e-mails and get 25 responses. Today,
     she has to send out a million messages to get the same response
     (a 0.0025 percent hit rate).
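A quick check of the quoted percentages: 25 responses per 100,000 messages is 0.025 percent. 25 per 1,000,000 is the 0.0025 percent the article gives, a tenfold drop.

```python
def hit_rate_percent(responses: int, sent: int) -> float:
    """Responses as a percentage of messages sent."""
    return 100.0 * responses / sent

then = hit_rate_percent(25, 100_000)   # 1999 figure from the article
now = hit_rate_percent(25, 1_000_000)  # 2002 figure from the article
print(f"{then:.4f}% -> {now:.4f}% ({then / now:.0f}x drop)")
```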

It's interesting reading.  I don't think spammers will ever stop (like
telemarketers), as long as they're getting *any* responses.  Short of
lawsuits, that is.

--
Rob St. Amant
http://www4.ncsu.edu/~stamant


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 19 Aug 2002 20:59:30 +0000
Local: Mon, Aug 19 2002 4:59 pm
Subject: Re: New Paul Graham Article
* Robert St. Amant
| It's interesting reading.  I don't think spammers will ever stop (like
| telemarketers), as long as they're getting *any* responses.  Short of
| lawsuits, that is.

  I am actually amazed that out of the million people needed to get 25
  responses, there has not yet been a single potential psychopathic axe
  murderer living in the spammer's city.  Imagine just /one/ such case.

--
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


 
Joe Marshall  
Newsgroups: comp.lang.lisp
From: "Joe Marshall" <prunesqual...@attbi.com>
Date: Mon, 19 Aug 2002 23:26:12 GMT
Local: Mon, Aug 19 2002 7:26 pm
Subject: Re: New Paul Graham Article

"Michael Sullivan" <mich...@bcect.com> wrote in message news:1fh63id.15bxu5pizyf4bN%michael@bcect.com...

> In fact, the problem with spam is not that large numbers of people
> respond, but that it is so cheap to send (for the spammer anyway, since
> the cost is distributed amongst the recipients and those who share their
> systems and networks) that nearly *any* response is effective for them.

Nearly any *valid* response is effective.  One part of the reason that
spam works is that it is possible to `identify' the 25 people out of
the million that act upon the message.  When you spam a million email
addresses most of the recipients discard or ignore the message.
The set of people that respond to the spam is *much* richer
in suckers than the original set of people identified by their
addresses.

If *every* spam yielded a (possibly bogus) response, then the
value of spamming would be severely decreased.  Spamming a set
of email addresses would yield no information about which recipients
are suckers because they *all* seem to be.  Putting a URL in the
spam would be useless because it would simply cause a million
automatic `hits' on the page.
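This is a toy illustration of the argument. The recipient and sucker counts are hypothetical, echoing the "25 out of a million" figure used in the thread. When only suckers reply, every responder is a mark. When every address auto-replies, the responders are no richer in suckers than the original address list.

```python
def sucker_density(suckers_responding: int, other_responders: int) -> float:
    """Fraction of responders who are genuine suckers."""
    total = suckers_responding + other_responders
    return suckers_responding / total if total else 0.0

RECIPIENTS = 1_000_000  # hypothetical list size
SUCKERS = 25            # hypothetical genuine responders

base_rate = SUCKERS / RECIPIENTS                 # density in the raw address list
only_suckers_reply = sucker_density(SUCKERS, 0)  # every responder is a mark
everyone_auto_replies = sucker_density(SUCKERS, RECIPIENTS - SUCKERS)  # no better than the list
print(base_rate, only_suckers_reply, everyone_auto_replies)
```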

> The problem with spam is that it is theft.  If spammers actually had to
> bear the costs of their spam, they would never send it, because the
> response rates are ridiculously low.  Since they do not, and it is cheap
> and easy to send out hundreds of millions of messages, a response rate
> of ten in a million is perfectly acceptable to them.

But a response rate of a million in a million would *not* be
acceptable.

 