
Identifying spam based on X-Mailer:

Spam Guy

May 15, 2008, 9:54:42 AM

Identifying spam based on X-Mailer:

X-Mailer: Microsoft Outlook Express 6.00.3790.1106
X-Mailer: Microsoft Outlook Express 6.00.2900.2963
X-Mailer: Microsoft Outlook Express 6.00.2900.2969
X-Mailer: Microsoft Outlook Express 6.00.3790.2962

X-Mailer: Microsoft Outlook Express 6.00.2600.1409
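A minimal sketch of the exact-match check this post implies, assuming the X-Mailer values above are kept as a local blocklist. It uses Python's standard `email` package; the names `SUSPECT_X_MAILERS` and `x_mailer_is_suspect` are hypothetical, and whether any of these strings is really bogus is exactly what the rest of the thread argues about.

```python
from email import message_from_string

# Hypothetical local blocklist: the X-Mailer values quoted in the post above.
SUSPECT_X_MAILERS = {
    "Microsoft Outlook Express 6.00.3790.1106",
    "Microsoft Outlook Express 6.00.2900.2963",
    "Microsoft Outlook Express 6.00.2900.2969",
    "Microsoft Outlook Express 6.00.3790.2962",
    "Microsoft Outlook Express 6.00.2600.1409",
}


def x_mailer_is_suspect(raw_message: str) -> bool:
    """Return True if the X-Mailer header exactly matches a blocklisted value."""
    msg = message_from_string(raw_message)
    x_mailer = msg.get("X-Mailer")
    if x_mailer is None:
        return False  # no X-Mailer header: this rule says nothing
    # Collapse runs of whitespace so a folded header still compares equal.
    normalized = " ".join(str(x_mailer).split())
    return normalized in SUSPECT_X_MAILERS
```

Exact matching is deliberately narrow: a neighbouring build such as 6.00.2900.3138 is not matched.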

Spam Guy

May 15, 2008, 9:57:44 AM

X-Mailer: Microsoft Outlook Express 6.00.2800.2962

Spam Guy

May 15, 2008, 10:03:31 AM

X-Mailer: Microsoft Outlook Express 6.00.3790.4682

(lots of new ones being received today)

Spam Guy

May 18, 2008, 9:43:37 AM

Landmark wrote:

> but if all you get is spam with a genuine looking header but a
> version number of, say, 6.00.3790.4133, then you might be able
> to determine that this is a genuine version number,

And if so, I would never choose to flag it as spam - unless it was a
really old version number.

> but you can never determine that it is a globally invalid version
> number.

Really?

You're saying that you can't say that a given OE version number is
bogus?

Is the total inventory or list of legit Microsoft OE version numbers
not known or knowable?

> Posting headers which you have found in your own unique spam sample
> is, at best, just newsgroup noise.

The OE version numbers I've posted are not unique to the spam _I_ get.

Take some of those numbers and do a web or usenet (google groups)
search. You'll see that (a) there are very few hits for them, and (b)
the hits you get are associated with spam sightings.

6.00.3790.4682
6.00.3790.2962
6.00.3790.1106
6.00.3790.181
6.00.2900.2963
6.00.2900.2969
6.00.2800.2962
6.00.2720.4682
6.00.2600.4682
6.00.2600.1409

For every version number I've posted, I've searched my own inventory
of e-mail (good and spam) and have only seen those numbers come up in
spam.
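The inventory check described here (a version seen in spam but never in good mail) can be sketched as below. This is an illustration under assumptions: the two inputs are lists of X-Mailer strings pulled from a good-mail folder and a spam folder, and the regular expression assumes the "Microsoft Outlook Express a.b.c.d" wording shown above. Function names are hypothetical.

```python
import re
from collections import Counter

# Assumes the "Microsoft Outlook Express a.b.c.d" wording seen in the headers above.
OE_VERSION = re.compile(r"Outlook Express (\d+\.\d+\.\d+\.\d+)")


def oe_version(x_mailer):
    """Extract the four-part OE version from an X-Mailer value, or None."""
    match = OE_VERSION.search(x_mailer or "")
    return match.group(1) if match else None


def version_counts(x_mailers):
    """Count how often each OE version appears in a list of X-Mailer values."""
    return Counter(v for v in map(oe_version, x_mailers) if v is not None)


def spam_only_versions(spam_x_mailers, ham_x_mailers):
    """Versions seen in spam that never appear in the good-mail inventory."""
    spam = version_counts(spam_x_mailers)
    ham = version_counts(ham_x_mailers)
    return {version: n for version, n in spam.items() if ham[version] == 0}
```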

> If other people think you know what you are talking about and
> add your suggestions to their own spam filtering systems then
> it could be false-positively harmful.

Please be my guest and investigate if the numbers I've posted ever
were associated with legit releases of OE.

And yes, I continue to associate "The Bat" with spam. But on that
topic, I'm not getting any more spam containing "the bat" in the
X-mailer line. This was the very last such e-mail:

Return-Path: <dnik...@gopromortgage.com>
Received: from 158575200 ([124.63.150.57]) by (...)
Wed, 3 Oct 2007 11:23:45 -0400
Received: from gopromortgage.com (159492048 [154634152]) by
gordoncreativeassociates.com (Qmailv1) with
ESMTP id BD932A766F for (...)
Wed, 03 Oct 2007 15:23:29 +0000
Date: Wed, 03 Oct 2007 15:23:29 +0000
From: Crystal Berger <dnik...@gopromortgage.com>
X-Mailer: The Bat! (v2.00.6) Personal

It was the 4th bat-spam received in October 2007 (62 in September, 44
in August, 42 in July, etc.). Then they just stopped.
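Monthly counts like these can be produced by scanning a spam folder for "The Bat" in the X-Mailer line and grouping on the Date header. A minimal sketch with the standard `email` package, assuming raw message text as input; the function name is hypothetical, and messages with unparseable dates are simply skipped.

```python
from collections import Counter
from email import message_from_string
from email.utils import parsedate_to_datetime


def bat_spam_by_month(raw_messages):
    """Count messages whose X-Mailer mentions "The Bat", keyed by (year, month)."""
    tally = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        x_mailer = str(msg.get("X-Mailer", ""))
        date_header = msg.get("Date")
        if "the bat" not in x_mailer.lower() or date_header is None:
            continue
        try:
            sent = parsedate_to_datetime(str(date_header))
        except (TypeError, ValueError):
            continue  # unparseable Date header: skip rather than guess
        tally[(sent.year, sent.month)] += 1
    return tally
```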

Coincidentally, the body contains this:

----------
Fungo bat in your pants
Our Site
----------

("Our Site" is hyperlinked to http://argaaa.cn/? followed by a long
alphanumeric string.)

There are (according to google) 13 such spams posted to NANAS, all of
which were posted Sept 29/30 2007.

Searching for the phrase "the bat" in NANAS turns up 212,000 hits,
33,000 posted this year, 1,440 this month.

So "The Bat" still seems to be a useful indicator for spam, but for
some reason I've stopped getting them.

And yes, I know that "The Bat" is a valid mail client, but I dare you
(or anyone else reading this) to search through your mail inventories
and post the stats of how many valid or legit e-mails vs spam have
"the bat" in the X-mailer line.

VanguardLH

May 18, 2008, 10:03:27 AM

"Spam Guy" wrote in <news:48303289...@Guy.com>:

> Landmark wrote:
>
>> but if all you get is spam with a genuine looking header but a
>> version number of, say, 6.00.3790.4133, then you might be able
>> to determine that this is a genuine version number,
>
> And if so, I would never choose to flag it as spam - unless it was a
> really old version number.

And, of course, no one still runs Windows 98 or ME and uses OE 5 over
there, uh huh. I was thinking you were listing non-legit version
numbers but, according to you, they are just old versions (which means
they are legit because they were old which means they existed). Okay,
in your world, anyone using an old version of software must be spewing
spam. The rest of us know that not everyone updates their computer or is
even the owner of the computer that they get to use. Yeah, sure, as a
user I'm going to go updating the library's or school's computers when I
have no permission to do so and probably don't have the admin rights to
do so.

> Take some of those numbers and do a web or usenet (google groups)
> search. You'll see that (a) there are very few hits for them, and (b)
> the hits you get are associated with spam sightings.

And where are all those NON-spam sightings? Yeah, there aren't any.

> Please be my guest and investigate if the numbers I've posted ever
> were associated with legit releases of OE.

Oh, now you're claiming they are non-legit versions of OE. You aren't
consistent. Above you said they were old version numbers. If they are
non-legit versions then they cannot be old versions. Old versions would
have actually existed.

So which is it for your list: they are non-legit versions or they are
old versions. They can't be both.

Spam Guy

May 18, 2008, 5:21:02 PM

VanguardLH wrote:

> >> (...) then you might be able to determine that this is a
> >> genuine version number,
> >
> > And if so, I would never choose to flag it as spam - unless it
> > was a really old version number.

> And, of course, no one still runs Windows 98 or ME and uses
> OE 5 over there, uh huh.

The last I checked (about a year ago) win-98 usage was about 2%,
win-me was usually about 1/4 of win-98 usage. In any case, win-98/me
usage is very low. Now combine that with the fraction of 98/me users
that use OE vs some form of web-mail and you're down to even smaller
numbers.

I currently have 8 filters in place for OE versions 5.00.x and
5.50.x. Number of spams received that had those versions of OE:

281 spams in 2006
162 spams in 2007
51 spams in 2008

> I was thinking you were listing non-legit version numbers but,
> according to you, they are just old versions

The list of OE numbers in my previous post are the versions of OE that
I've added to my filter this year. I don't know if they are legit
versions or not. I do know that (a) I have no occurrences of those
version numbers in my inventory of good mail, and (b) usenet and web
searches for those number strings turn up relatively few hits, and
said hits are mostly linked to spam in some way.

I would think that a web or usenet search for a valid OE version
string would return more than a dozen hits across a variety of
contexts.

> (which means they are legit because they were old which means
> they existed).

Is it clear now that the above comment is a false extrapolation?

> Okay, in your world, anyone using an old version of software
> must be spewing spam.

I am flagging 8 specific 5.00.x and 5.50.x versions of OE as spam.
Some versions might have been valid, or all might be fake. I don't
know, I haven't checked, and it doesn't matter. For those that are
valid, then yes, I am dumping those to my spam folder because in the
year 2008 I no longer anticipate getting e-mail from someone using a 6
to 8-year-old e-mail client.

I am flagging many more 6.00.x versions as spam. I would not qualify
them as being old versions. I am filtering them not based on their
apparent age, but upon their apparent legitimacy.

> Yeah, sure, as a user I'm going to go updating the library's
> or school's computers when I have no permission to do so

I would not expect, and to my knowledge have never received, valid
e-mail from anyone using OE on a public computer. In most (or all)
cases, anyone using a public computer would be logged into a web-based
mail system if they were to send me mail that I would find worthy of
receiving.

> > Take some of those numbers and do a web or usenet (google groups)
> > search. You'll see that (a) there are very few hits for them,
> > and (b) the hits you get are associated with spam sightings.
>
> And where are all those NON-spam sightings? Yeah, there aren't
> any.

Which means - what?

This is the OE version of a recent, valid e-mail:

6.00.2900.3138

Turns out I have about 80 e-mails with that version - all of them
being "good" (ie - none are spam).

Google-groups reports 2100 hits (yes, many are in NANAS, some
aren't). A general web search turns up more than 4000 hits.

With numbers like that, I wouldn't be able to support the idea of
flagging that version as a precise indicator of spam, because I
couldn't say with confidence that it wasn't a valid version number.

> > Please be my guest and investigate if the numbers I've posted
> > ever were associated with legit releases of OE.
>
> Oh, now you're claiming they are non-legit versions of OE.

I'm saying go ahead and show me that they are valid or real versions.
They were all sourced from spam. We know that spam headers can
contain a wealth of artificially-crafted text.

Did Microsoft ever code these version numbers into OE:

6.00.3790.4682
6.00.3790.2962
6.00.3790.1106
6.00.3790.181
6.00.2900.2963
6.00.2900.2969
6.00.2800.2962
6.00.2720.4682
6.00.2600.4682
6.00.2600.1409

> You aren't consistent. Above you said they were old
> version numbers.

No.

I said that I believed that they were bogus, and that it is useful to
filter based on bogus OE versions.

I also said that I wouldn't flag real version numbers as spam unless
they were old.

I didn't say that the numbers above were being flagged because they
were old. That was your interpretation.

> If they are non-legit versions then they cannot be old
> versions.

I agree with that statement, but it pertains to your false
interpretation of my previous post, so it's not material.

> So which is it for your list: they are non-legit versions or
> they are old versions. They can't be both.

Like I said, be my guest and try to dig up a list of known OE version
numbers and compare them against the above list.
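The comparison being proposed is a set difference. A sketch, assuming someone produces a reference list of genuine OE releases; no such list appears in this thread, so `known_releases` is a hypothetical input. Versions are sorted numerically so that, for example, 6.00.3790.181 sorts before 6.00.3790.1106.

```python
# OE versions quoted in this thread as having been seen only in spam.
SPAM_SOURCED = {
    "6.00.3790.4682", "6.00.3790.2962", "6.00.3790.1106", "6.00.3790.181",
    "6.00.2900.2963", "6.00.2900.2969", "6.00.2800.2962", "6.00.2720.4682",
    "6.00.2600.4682", "6.00.2600.1409",
}


def version_key(version: str):
    """Sort "6.00.3790.181" numerically rather than as text."""
    return tuple(int(part) for part in version.split("."))


def unconfirmed_versions(observed, known_releases):
    """Observed versions that are absent from a reference list of genuine releases."""
    return sorted(set(observed) - set(known_releases), key=version_key)
```

Anything returned is only unconfirmed, not proven bogus; the answer can be no better than the reference list.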

Based on where I got them from, and what a few searches tell me, I'm
speculating that many, most, or all of them are bogus.

And if they're not bogus, then they were never in wide circulation, or
they existed for only a short period of time.

VanguardLH

May 18, 2008, 8:19:29 PM

What is the volume of e-mails (per day, per week, per month, or over
whatever interval) in which you have culled the OE versions that you
think identify a spam source?

How many worldwide collection points were involved in this spam
collection? What was the total volume across them all and at each one?

Or am I thinking too big and this is just for your personal e-mails at
home (i.e., one collection point for a single account and maybe a
hundred e-mails per week)?

Spam Guy

May 18, 2008, 8:28:14 PM

Landmark wrote:

> > Based on where I got them from, and what a few searches tell me,
> > I'm speculating that many, most, or all of them are bogus.
>

> Yes, that's the point, you are speculating.

I've posted my supporting evidence (which you don't directly refute)
and I've stated countless times that all you have to do is do some
digging (perhaps in your own e-mail inventory) to prove that some,
many, most, or all of those versions were indeed legit and NOT bogus.

Only THEN would you have an argument that my speculating was
deficient.

> I really don't care whether they are legit, obsolete or complete
> fabrications. That's not the point.

Yes, it is the point.

If one wants to implement a local spam-filtering strategy, and if
their mail client has the capability, then filtering based on bogus
header strings is incredibly useful.

> The point is that you are basing this on your own spam exposure.

And why should my spam exposure be any different than anyone else's?

All of the OE versions I've posted have turned up in NANAS, so clearly
I'm not the only one receiving from those same spammers.

> If you were using a bayesian filtering system then it would
> certainly take this into account as a high indicator of
> spamminess.

Do you have a bayesian filter in place?

Have you checked your spam for the OE strings I've posted?

Did your bayesian filters catch (or will it catch) subsequent
occurrences of spam with those strings - or any suspect OE string?

Or is it too hard or does it take too much work to scan your e-mail
inventories for specific content?
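For comparison with the explicit-rule approach, here is roughly how a Bayesian-style filter might score an X-Mailer value if it treated the whole header value as a single token. This is a simplified naive-Bayes estimate with equal prior odds and add-one (Laplace) smoothing, not the formula of any particular filter; real implementations differ in detail, and the function name is hypothetical.

```python
def token_spam_probability(spam_with: int, n_spam: int,
                           ham_with: int, n_ham: int) -> float:
    """Estimate P(spam | token) from message counts, assuming equal priors.

    spam_with / ham_with: messages in each corpus containing the token.
    n_spam / n_ham: total messages in each corpus.
    """
    # Add-one smoothing keeps a token seen only in spam from scoring exactly 1.0.
    p_token_given_spam = (spam_with + 1) / (n_spam + 2)
    p_token_given_ham = (ham_with + 1) / (n_ham + 2)
    return p_token_given_spam / (p_token_given_spam + p_token_given_ham)
```

With 10 of 100 spams and 0 of 100 hams carrying the token, this gives 11/12, about 0.92; with no observations at all it stays at 0.5.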

> Whilst I have a lot of reservations about bayesian filters,

If you can't tell me how your filter has performed in ID'ing spam
based (partially or totally) on seeing any of the OE version strings
I've mentioned, then you're just talking out of your hat.

> If people followed your example then for sure we'd have people
> posting messages saying "Add VIAGRA to your filtering list
> because I've never seen a legit mail which uses the word
> viagra".

Apparently you can't differentiate filtering based on the header
contents (which most people never see) vs filtering the message body.
Are you that simple that you have to divert your argument to such a
degree?

The vast majority of my filters are header-based.

When I test the message body, I look for:

- Cote and Ivoire
- ghana
- DiscountPharmacy or Discount-Pharmacy
- various conditions that include "lottery"

As well as testing the subject line for various characters like ¥, £,
Å, é (indicative of oriental spam)
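A sketch of these body and Subject tests, assuming case-insensitive matching, reading "Cote and Ivoire" as both words appearing somewhere in the body, and reducing the unspecified "lottery" conditions to a plain substring test. Only text/plain parts are examined, and encoded Subject lines are decoded first with the standard `email.header` helpers. Rule names and function names are hypothetical.

```python
from email import message_from_string
from email.header import decode_header, make_header

SUBJECT_CHARS = set("¥£Åé")  # the Subject-line characters listed above


def body_text(msg) -> str:
    """Concatenate the decoded text/plain parts (HTML parts are ignored here)."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name: fall back to UTF-8
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def matched_rules(raw: str) -> list:
    """Return the names of the rules this message trips."""
    msg = message_from_string(raw)
    subject = str(make_header(decode_header(msg.get("Subject", ""))))
    body = body_text(msg).lower()
    hits = []
    if "cote" in body and "ivoire" in body:
        hits.append("cote-ivoire")
    if "ghana" in body:
        hits.append("ghana")
    if "discountpharmacy" in body or "discount-pharmacy" in body:
        hits.append("pharmacy")
    if "lottery" in body:  # stand-in for the unspecified "lottery" conditions
        hits.append("lottery")
    if any(ch in SUBJECT_CHARS for ch in subject):
        hits.append("subject-chars")
    return hits
```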

> Like I said, if you want to implement a rule like that on your
> system, I've no problem with it. I have many such rules myself,
> like right now I'm blocking pretty well all of Viet Nam.

Blocking based on IP (which I do a lot, at the server level) is not
something I'd lump into the same conversation as filtering based on
message characteristics or content. Those are two completely
different topics.

> But I don't go posting in here with a suggestion that
> VietNam is a good filter to add to your rule book.

Again, what is your argument against filtering based on fictional
version strings?

Assuming that we had in front of us a Microsoft authorized or sourced
list of all OE version numbers, and if some, most, or all of the OE
versions I've posted were NOT on that list, would you not say that
filtering based on those numbers would apply to anyone, anywhere?
Would you not say that an e-mail that was received by anyone, anywhere
on the planet that has one of those OE version numbers is without
question a spam?

> If you post rules into a public forum as if it is a FACT,

And if you do nothing of a material nature to post information that
directly counters that information,

> then I do have a problem with it,

Your arm-waving does not constitute a coherent or credible
counter-argument.

You could do a lot more by simply saying that you've received n
e-mails in the past year that you don't consider spam but that DID
contain one of the OE version strings I mentioned.

You could say that you've discovered a list of MS OE version numbers,
the url of which is http:/what-ever, and it clearly shows that n of my
listed numbers show up on that list.

You could say that your Bayesian filter has identified one or more OE
version strings as a likely indicator or predictor of spam, but that
your filter's list and my list don't match.

> and the fact that no-one expressed any positive interest in
your postings supports my complaint that it's just a load of
> internet noise.

It's very naive for you to suggest that the lack of postings means
that there aren't a lot of readers. As well, future searches for
these aspects of spam filtering, or the specific OE version numbers
I've posted, will turn up this thread.

> so I'm guessing the reason for posting wasn't for discussion,

So you don't think that's what we're currently doing?

Spam Guy

May 18, 2008, 9:13:44 PM

VanguardLH wrote:

> What is the volume of e-mails (per day, per week, per month,

If a spam makes it through my filters and into my in-box, I look at
the header.

If there is something interesting, or unique, or odd in the various
strings (X-mailer, User-Agent, X-anti-this, X-tested-that, etc) then I
construct a filter and test it against my current e-mail inventory.

I'll be looking at how many "good" e-mails would be caught by the
filter, vs how many spams I have historically received that would have
been caught.

In some cases, I will search the internet / usenet for the strings in
question and form an opinion as to its validity or usefulness as a
component of a filter.

But if I can't construct a filter that would have caught the new spam
without generating a false-positive hit on any of my good e-mails,
then I won't implement the filter. There are exceptions, like
filtering based on the very old versions of OE, but only if there are
sufficient spams coming in to justify it.
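The test-before-adopting procedure described above can be sketched as follows. A rule here is any function from a message (in the example, just its X-Mailer string) to True or False; the `min_spam_hits` threshold is a hypothetical stand-in for "sufficient spams coming in", and the inputs are assumed to be in-memory lists of good mail and historical spam.

```python
from typing import Callable, Sequence, Tuple


def evaluate_filter(rule: Callable[[str], bool],
                    ham: Sequence[str], spam: Sequence[str]) -> Tuple[int, int]:
    """Return (false positives in good mail, hits in historical spam)."""
    false_positives = sum(1 for message in ham if rule(message))
    spam_hits = sum(1 for message in spam if rule(message))
    return false_positives, spam_hits


def should_adopt(rule: Callable[[str], bool], ham: Sequence[str],
                 spam: Sequence[str], min_spam_hits: int = 1) -> bool:
    """Adopt only a rule with zero false positives that catches enough spam."""
    false_positives, spam_hits = evaluate_filter(rule, ham, spam)
    return false_positives == 0 and spam_hits >= min_spam_hits
```

For instance, a rule matching 6.00.3790.4682 would be adopted against good mail that only ever shows 6.00.2900.3138, while a rule matching every "Outlook Express" header would be rejected.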

I don't need to receive thousands of spams from dozens of different
accounts or trap-addresses in order to encounter an example of a
possibly bogus OE version string. I encounter them as I receive them,
and I test them as they skate through my existing set of filters.

> whatever interval) in which you have culled the OE versions
> that you think identify a spam source?

All it takes is one spam - the next one that makes it through your
filters. You don't need more than the next one.

And if you don't get any new spams (today, this week, this month) that
make it through your filters, then isn't that a good thing?

> Or am I thinking too big

You're thinking along the lines of what a Bayesian filter takes to
function well, which is quantity.

Instead I look at each one individually to see where its Achilles
heel is. Because there will be others like it coming in the future -
all with the same vulnerability.

Spam Guy

May 19, 2008, 12:38:20 PM

MrD wrote:

> > And why should my spam exposure be any different than anyone
> > else's?
>

> Do you mean "why is it so"? I guess it's because different people
> expose their addresses in different ways, and so get into
> different lists.

There are (currently) millions of working e-mail addresses, and there
have been (over the past 20 years) billions of addresses.

Are you saying that there are also millions (or billions) of different
ways that an address becomes incorporated into spam lists?

The fact is that there is a limited number of ways to enroll an
address into a list used by spammers. Almost certainly there are
fewer than 50 unique ways, and most commonly only about a dozen ways.
Direct web exposure, information theft via viral infection, insider
list theft, responding to "don't send me any more e-mail" links,
dictionary attack, bartering between spammers, etc.

So if you say that my spam *exposure* is different than anyone else's,
you're saying that the spammers that send me spam will ONLY send spam
to me, and to no one else, because they are the only spammers that
have my address(es), and they have no one else's.

You tell me how realistic that is.

> The fact is that your exposure *is* different from other peoples'.

That is bullshit.

My *exposure* is no different. My particular spam-load (the raw
number of spams, the set or grouping of the particular messages
themselves) will almost certainly be unique - but possibly or probably
will be identical to a sub-set of someone else's spam load.

Address acquisition by spammers is essentially a random or
probabilistic event. My addresses are just as prone to fall into a
spammer's list as anyone else's, and they will do so mainly through 10
or 20 different methods or routes.

Yes, domains with many users or that have a high profile will attract
dictionary attacks, and the operators of those domains will have a
different set of anti-spam techniques to counter that type of spam. I
think it's safe to say that not every spammer performs dictionary
attacks, and not every domain experiences them either.

> > So "The Bat" still seems to be a useful indicator for spam,
> > but for some reason I've stopped getting them.
>

> How could that be, if your spam-load is in any way
> representative?

But I once did get "bat-spam" - lots of it. I suggested that people
look for "The Bat" in the X-mailer line as an indicator of spam.
Clearly, as you agree, there is still lots of bat-spam being received,
which means that my advice to look for it is still useful or
diagnostic. The fact that I no longer receive bat-spam is irrelevant
to that advice.

Are you saying that every spammer sends spam to every valid, working
e-mail address? If that was the case, then:

a) we would all get the same spam
b) I would still be getting Bat-spams
c) everyone's spam would be representative of everyone else's

> Unavoidable conclusion: SG's spam-load is unrepresentative.

There is no such thing as a representative spam load, because it would
require that every spammer have a list of all currently working e-mail
addresses, and send spam to all of them during every spam campaign or
run. That doesn't seem to be the situation.

It is more likely that there are N spammers, and each has X different
e-mail addresses to send spam to at any given time, and there is Y
overlap in addresses between spammers. Because N, X and Y are
unknown (unknown to those who are not part of the spam underground)
there is no way to estimate what the spam distribution would or could
or should be. Knowing the distribution would enable someone to locate
their particular spam-load on the distribution curve and make a claim
of one sort or another.

But then you have to factor in such things as whether your server
performs IP blocking, whether your domain has an MX record or relies
on A-record fallback, and whether your server performs greylisting,
RBL lookups, challenge-response, etc.
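Of the server-side measures mentioned, an RBL (DNS blocklist) lookup is simple enough to sketch. The usual convention is to reverse the IPv4 octets, append the list's DNS zone, and treat a successful A-record lookup as "listed". The zone name used below is a placeholder, not a real list, and this sketch treats any resolution failure as "not listed", which a production setup would want to distinguish from a DNS outage.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: IPv4 octets in reverse order, then the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """True if the query name resolves; any lookup failure counts as not listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

For the relay address in the header quoted earlier, `dnsbl_query_name("124.63.150.57", "dnsbl.example.org")` gives `57.150.63.124.dnsbl.example.org`.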

If you are trying to make the case that my posting of specific spam
traits (such as suspect OE version numbers) would not be useful to
someone else, then you need to do better than to simply say that my
spam is not representative (which as I've pointed out is a useless
statement because no spam load is or can be "representative").

You need to explain why a spammer would choose to use those traits (ie
specific OE version numbers) for spam only being sent to me, and would
not use those same exact traits for spam being sent to someone else.

Spam Guy

May 19, 2008, 12:44:58 PM

MrD wrote:

> >> What is the volume of e-mails (per day, per week, per month,
> >
> > If a spam makes it through my filters and into my in-box, I look at
> > the header.
>

> For the tape: the witness declines to answer.

My answer was that the number of spams I get has no relation to
whether or not I can identify a header trait that I can filter. Read
the Subject: of this thread.

Why don't you explain why the number of spams I get (per day, per
year, etc) has an impact on whether or not I can say that OE version
6.00.3790.4682 can be used as a spam indicator.

(and I've posted my spam-stats many times in this newsgroup, so go
look them up if you want)

MrD

May 19, 2008, 2:27:59 PM

Spam Guy wrote:
> MrD wrote:
>
>>> And why should my spam exposure be any different than anyone
>>> elses?
>> Do you mean "why is it so"? I guess it's because different people
>> expose their addresses in different ways, and so get into different
>> lists.
>
> There are (currently) millions of working e-mail addresses, and there
> have been (over the past 20 years) billions of addresses.
>
> Are you saying that there are also millions (or billions) of
> different ways that an address becomes incorporated into spam lists?

(Checking) Nope. I don't seem to be saying that.

>
> The fact is that there is a limited number of ways to enroll an
> address into a list used by spammers. Almost certainly there are
> fewer than 50 unique ways, and most commonly only about a dozen ways.
> Direct web exposure, information theft via viral infection, insider
> list theft, responding to "don't send me any more e-mail" links,
> dictionary attack, bartering between spammers, etc.

Direct web exposure:- on one's homepage, on a Facebook page, in a
guestbook entry, in a blog, in a blog-comment (etc.). As a mailto: link
URL, as a plain (browser-legible) address, hidden in a comment, munged
(e.g. with "at"), more munged (e.g. with "nospam").

My point is that while there is a finite number of ways of exposing your
address, that number is not small. Each address-harvester will have
differing policies on where he looks for addresses, resulting in each
list having different contents.

This much seems obvious - to me, anyway.

>
> So if you say that my spam *exposure* is different than anyone elses,
> you're saying that the spammers that send me spam will ONLY send
> spam to me, and to no one else, because they are the only spammers
> that have my address(es), and they have no one else's.

(Checking) nope, don't think I'm saying that.

>
> You tell me how realistic that is.

"That", i.e. what I'm not saying?

>
>> The fact is that your exposure *is* different from other peoples'.
>
> That is bullshit.

It obviously and demonstrably is different, as is evidenced by your own
recent remarks regarding The Bat and NANAS.

>
> My *exposure* is no different. My particular spam-load (the raw
> number of spams, the set or grouping of the particular messages
> themselves) will almost certainly be unique - but possibly or
> probably will be identical to a sub-set of someone else's spam load.

No doubt some subset of your spam matches exactly some subset of some
other recipients'. So what?

>
> Address acquisition by spammers is essentially a random or
> probabilistic event. My addresses are just as prone to fall into a
> spammer's list as anyone elses, and they will do so mainly through 10
> or 20 different methods or routes.

I disagree. Some people's addresses never receive any spam. Some people
receive spam only from certain spammers, and no others. Others receive
varying quantities of spam, apparently of a heterogeneous nature.
Evidently these different groups are *not* equally likely to fall into
spammers' lists, and evidently they fall into different sets of lists.

>
> Yes, domains with many users or that have a high profile will attract
> dictionary attacks, and the operators of those domains will have a
> different set of anti-spam techniques to counter that type of spam.
> I think it's safe to say that not every spammer performs dictionary
> attacks, and not every domain experiences them either.
>
>>> So "The Bat" still seems to be a useful indicator for spam, but
>>> for some reason I've stopped getting them.
>> How could that be, if your spam-load is in any way representative?
>
> But I once did get "bat-spam" - lots of it. I suggested that people
> look for "The Bat" in the X-mailer line as an indicator of spam.
> Clearly, as you agree, there is still lots of bat-spam being
> received,

I haven't agreed with that claim; I'm not in a position to comment.

> which means that my advice to look for it is still useful or
> diagnostic. The fact that I no longer receive bat-spam is
> irrelevant to that advice.

You haven't addressed my question: IF "The Bat" is a good spam-sign, and
IF your spam is representative, then why are you not getting "bat-spam"
while others are? (You may need to find someone with an elementary
grasp of logic to help you answer that).

>
> Are you saying that every spammer sends spam to every valid, working
> e-mail address?

(Checking) nope, that's not one of the things I've said, I'm glad to say.

> If that was the case, then:
>
> a) we would all get the same spam
> b) I would still be getting Bat-spams
> c) everyone's spam would be representative of everyone else's
>
>> Unavoidable conclusion: SG's spam-load is unrepresentative.
>
> There is no such thing as a representative spam load, because it
> would require that every spammer have a list of all currently working
> e-mail addresses, and send spam to all of them during every spam
> campaign or run. That doesn't seem to be the situation.

Good! We are making progress, perhaps.

>
> It is more likely that there are N spammers, and each has X different
> e-mail addresses to send spam to at any given time, and there is Y
> overlap in addresses between spammers. Because N, X and Y are
> unknown (unknown to those who are not part of the spam underground)
> there is no way to estimate what the spam distribution would or could
> or should be. Knowing the distribution would enable someone to
> locate their particular spam-load on the distribution curve and make
> a claim of one sort or another.

A claim of *what* sort, exactly? What are you talking about, actually?

>
> But then you have to factor in such things as if your server is
> performing IP blocking, whether your domain has an MX-record vs
> relying on A-record fall-back, if your server performs grey listing
> or RBL-lookups or challenge-response, etc.
>
> If you are trying to make the case that my posting of specific spam
> traits (such as suspect OE version numbers) would not be useful to
> someone else, then you need to do better than to simply say that my
> spam is not representative (which as I've pointed out is a useless
> statement because no spam load is or can be "representative").

I am saying that statistical properties of your spam collection should
be of little interest to anyone else, because:
a) the size of your samples of spams and goodmail is minuscule
b) the size of your sample of domains can be counted on the fingers of
one finger
c) your domain is anonymous; and so, beyond the small size of the
samples, we have nothing that enables us to determine whether it
resembles our domain(s) or not.

>
> You need to explain why a spammer would choose to use those traits
> (ie specific OE version numbers) for spam only being sent to me, and
> would not use those same exact traits for spam being sent to someone
> else.

I *need* to? Why? For example: what would it gain me? And what makes you
think that I think you receive "customised" spam?

--
Jack.

MrD

May 19, 2008, 2:59:27 PM

Spam Guy wrote:
> MrD wrote:
>
>>>> What is the volume of e-mails (per day, per week, per month,
>>> If a spam makes it through my filters and into my in-box, I look
>>> at the header.
>> For the tape: the witness declines to answer.
>
> My answer was that the number of spams I get has no relation to
> whether or not I can identify a header trait that I can filter.

You are lying; you gave no such answer.

> Read the Subject: of this thread.

The answer isn't there either!

>
> Why don't you explain why the number of spams I get (per day, per
> year, etc) has an impact on whether or not I can say that OE version
> 6.00.3790.4682 can be used as a spam indicator.

If you want to determine the plausibility of a claim that there is a
statistical correlation between two attributes of a
population of samples, then the number of specimens with each attribute,
and the total size of the population, are needed. You can then
determine the likelihood that any given specimen might have either or
both attributes by chance; and then you can determine the likelihood
that the specific numbers of occurrences of each attribute in *your*
population, both together (in the same specimens) and not together,
might have occurred by chance.

Without these numbers, the significance of your correlations cannot be
determined.

In other words, if the total numbers are very small, then the
probability that any observed correlation is the result of random
variation is correspondingly very high; and the correlation can duly be
disregarded as insignificant.
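One way to see the small-numbers point concretely is a confidence interval on the fraction of messages carrying an attribute that turn out to be spam. MrD doesn't name a method; the sketch below uses the Wilson score interval (z = 1.96 for roughly 95%), chosen here purely for illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the proportion is completely unconstrained
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half_width, centre + half_width)
```

If all 3 of 3 messages with some X-Mailer value were spam, the interval's lower bound is only about 0.44; with 300 of 300 it rises to about 0.99.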

I have asked you to give these numbers because I know that they are in
fact small; and that you are therefore (probably) not in a position to
draw statistically-significant conclusions from your "research". And I
think that when you make your sweeping claims, you should provide naive
readers with the information that would make it possible to evaluate
them properly, rather than just hoping that readers will assume you are
some kind of expert.

>
> (and I've posted my spam-stats many times in this newsgroup, so go
> look them up if you want)

I have seen numbers that you have posted on prior occasions; those
numbers were equivalent to "not many".

However, for each claim that you make, I think you should post:

- the size of the population the observations are drawn from
- the number of spam messages having the attributes in question
- the number of hams [ditto]
- the number of spams NOT having the attributes in question
- the number of hams [ditto]
- some facts about how the original population was obtained (what kinds
of pre-filtering it had been subjected to)
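The figures requested above amount to a 2x2 contingency table plus the population size. As a minimal sketch, assuming each message has already been labelled spam or ham by hand and reduced to its X-Mailer value (or None when the header is missing), the tally could look like this; `tally` and the count keys are hypothetical names. The last item, how the population was pre-filtered, has to be reported separately, since no header count captures it.

```python
import re
from typing import Iterable, Optional, Tuple

# Capture the full four-part Outlook Express version from an X-Mailer value.
OE_VERSION = re.compile(r"Outlook Express (\d+(?:\.\d+){3})")


def tally(messages: Iterable[Tuple[Optional[str], bool]],
          suspect: set[str]) -> dict[str, int]:
    """Count labelled (x_mailer, is_spam) pairs into a 2x2 table."""
    counts = {"population": 0, "spam_with": 0, "ham_with": 0,
              "spam_without": 0, "ham_without": 0}
    for x_mailer, is_spam in messages:
        m = OE_VERSION.search(x_mailer or "")
        # Exact version match, so "6.00.3790.181" does not match "...1810".
        has_trait = bool(m) and m.group(1) in suspect
        key = ("spam" if is_spam else "ham") + ("_with" if has_trait else "_without")
        counts[key] += 1
        counts["population"] += 1
    return counts
```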

I say this because you have a history of posting wild claims without
evidence; obfuscation and evasion when challenged; a weak grasp of basic
reasoning; and extreme reluctance to provide solid facts.

For what it's worth, I don't do this kind of analysis on my mailboxes. I
know that they are representative of nothing but themselves.

- The ways in which they are or are not exposed are idiosyncratic
- They are all pre-filtered, and the filtering has changed over time

If I were to take the trouble to do that sort of work, I would first want
a corpus of unmunged spam/ham headers that was sufficiently large and
diverse to justify the effort. A few hundred mailboxes, spread over a
few dozen independent domains, would be likely to yield interesting
results. It would be even better if the samples from those various
domains could be adjusted for the diversity of the pre-filtering to
which they had been subjected. I don't know how to get hold of such a
corpus, though.
--
Jack.

Spam Guy

May 19, 2008, 10:05:00 PM

MrD wrote:

> My point is that while there is a finite number of ways of
> exposing your address, that number is not small. Each address-
> harvester will have differing policies on where he looks for
> addresses,

Wasn't it established some time ago that the most likely or widely
deployed method of extracting addresses from web sites was by scanning
the web-page caches on trojanized PCs?

Meaning that organized web-searching is something that spammers (or
"harvesters") no longer do, and probably haven't done for the past 5
years.

If you have more info (or different info) on that topic, feel free to
post it.

My point is that you can lump all the various web-related address
exposures into a single harvesting category - harvesting via searches
performed on infected PCs.

>>> The fact is that your exposure *is* different from other peoples'.
>

> It obviously and demonstrably is different, as is evidenced by your
> own recent remarks regarding The Bat and NANAS.

Everyone's e-mail address has an equal chance of being found on
someone else's virus-infected or trojanized PC.

That's why I say that all e-mail addresses have equal exposure to
being discovered.

Your continued references to my bat-spams and NANAS are flawed.

If instead I said that I continued to receive bat-spams, then how
would your argument have changed?

What then would support your argument that my exposure is different?

> > Address acquisition by spammers is essentially a random or
> > probabilistic event. My addresses are just as prone to fall
> > into a spammer's list as anyone else's, and they will do so
> > mainly through 10 or 20 different methods or routes.
>
> I disagree. Some people's addresses never receive any spam.

And what if the reason is the anti-spam measures in place on their
mail server?

Spam attempts that are never delivered are still spam being sent to a
given address.

> Some people receive spam only from ...
> Others receive varying quantities of spam ...

> Evidently these different groups are *not* equally likely to
> fall into spammers' lists,

And what is your explanation for that?

Is it that spammers don't share their lists?

Are they not motivated to sell, borrow, barter or trade their lists
with each other?

Are there no central list maintainers, or list databases, where
everyone's address can be aggregated and re-distributed?

> > Clearly, as you agree, there is still lots of bat-spam being
> > received,
>
> I haven't agreed with that claim; I'm not in a position to
> comment.

Your argument (in this case) is based on evidence in NANAS that there
still are bat-spams occurring, and you point out that I'm not getting
any of them. You can't say that you agree with or believe NANAS on
one hand, and then say that you don't agree with it on the other hand.

> You haven't addressed my question: IF "The Bat" is a good spam-sign,

Yes, I do.

> and IF your spam is representative,

That is NOT a necessary condition for the premise of the rest of your
argument.

If you want to argue more about the "representative-ness" of my spam
load, then you need to define what you consider to be a representative
spam load, and why you think that's a valid concept or a knowable
thing in the first place.

> then why are you not getting "bat-spam" while others are?

We are back to -> what if I said I was still getting them? How would
that change your argument?

> > There is no such thing as a representative spam load, ...

>
> Good! We are making progress, perhaps.

No, we're not making progress.

If you don't believe there is such a thing as a representative spam
load, then why have you kept insisting that such a thing exists and
that I don't have it???

> I am saying that statistical properties of your spam collection

I am not making a statistical claim about my spam collection, nor is
one needed.

I have pointed out an e-mail characteristic that I believe is
diagnostic for spam.

I have provided several examples of this characteristic.

I have shown that others have received spam with these
characteristics.

I make no claims as to how prevalent these spams are or will be in the
future.

I make no claims as to the likelihood of any given account-owner or
server-operator receiving such a spam in the future.

Spam Guy

May 19, 2008, 10:35:15 PM

MrD wrote:

> > My answer was that the number of spams I get has no relation to
> > whether or not I can identify a header trait that I can filter.
>
> You are lying; you gave no such answer.

I said this:

> > I don't need to receive thousands of spams from dozens of
> > different accounts or trap-addresses in order to encounter
> > an example of a possibly bogus OE version string.

If you don't think the two phrases above are equivalent, then please
explain why.

And now that you know my answer, what is your response?

> The answer isn't there either!

I've already given it.

> > Why don't you explain why the number of spams I get (per day,
> > per year, etc) has an impact on whether or not I can say that
> > OE version 6.00.3790.4682 can be used as a spam indicator.
>
> If you want to determine the plausibility of a claim that there
> is a statistical correlation between some two independent
> attributes of a population of samples,

And what two independent attributes are we discussing?

The OE version number is an attribute, but it doesn't require
statistical analysis to determine if it is a fictitious attribute.

If 6.00.3790.4682 is a fictional OE version, then I don't have to
perform a statistical analysis to conclude that any e-mail containing
that string in the X-Mailer header line is spam.
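The rule stated here is deterministic rather than statistical: extract the Outlook Express version from X-Mailer and flag the message if that version is on the list. Below is a minimal Python sketch using the standard-library `email` parser, assuming the versions posted in this thread really are fictitious (the thread does not settle that); `flag_by_x_mailer` and `SUSPECT_OE_VERSIONS` are hypothetical names. Matching the full four-part version, rather than a substring, keeps 6.00.3790.181 from also matching a longer string such as 6.00.3790.1810.

```python
import re
from email import message_from_string

# Versions as posted in this thread; their bogus status is an assumption.
SUSPECT_OE_VERSIONS = {
    "6.00.3790.4682", "6.00.3790.2962", "6.00.3790.1106", "6.00.3790.181",
    "6.00.2900.2963", "6.00.2900.2969", "6.00.2800.2962", "6.00.2720.4682",
    "6.00.2600.4682", "6.00.2600.1409",
}

# Capture the full four-part version after "Microsoft Outlook Express".
OE_RE = re.compile(r"Microsoft Outlook Express (\d+(?:\.\d+){3})\b")


def flag_by_x_mailer(raw: str) -> bool:
    """Return True if the X-Mailer header names a listed OE version."""
    msg = message_from_string(raw)
    x_mailer = str(msg.get("X-Mailer", ""))  # "" when the header is absent
    m = OE_RE.search(x_mailer)
    return bool(m) and m.group(1) in SUSPECT_OE_VERSIONS
```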

> I have asked you to give these numbers because I know that they
> are in fact small;

Why don't you perform your statistical analysis on your own e-mail
inventory?

You have the variables being discussed, 6.00.3790.4682 being one of
them.

You've said before that you don't trust my numbers or my methods, so
there's no point in me performing the analysis on my e-mail inventory.

> I say this because you have a history of posting wild claims
> without evidence;

Currently, I am posting the claim that the presence of OE version
6.00.3790.4682 in the x-mailer line is diagnostic for spam.

If you want to categorize that as a wild claim, go ahead.

There is evidence in NANAS to support that claim.

If you want to refute or counter that claim, then identify a document,
a comment, something, that indicates that 6.00.3790.4682 is a valid
Microsoft OE version number. Or scan your own e-mail inventory for
that OE version and report your findings here.

> obfuscation and evasion when challenged;

There is no obfuscation or evasion on my part.

I have stated a very simple claim, and have stated it clearly.

I have stated how you can prove my claim is false, but you don't even
try, nor do you explain why.

> For what it's worth, I don't do this kind of analysis on my
> mailboxes. I know that they are representative of nothing but
> themselves.

Then you should have no trouble telling me if you have seen OE version
6.00.3790.4682 in spam, or in good mail, or in both, or in neither.

> - The ways in which they are or are not exposed are idiosyncratic

This is not fundamentally an issue of address exposure.

> - They are all pre-filtered, and the filtering has changed over
> time
>

> If I were to take the trouble to do that sort of work, I would
> first want a corpus of unmunged spam/ham headers

Not necessary.

Presumably you have a "corpus" of mail that has passed through your
filters, with the headers still intact. That set of mail would
represent "good" or legit mail, not spam.

Search the headers for the various OE version numbers I've posted.
I'll post them again here:

6.00.3790.4682
6.00.3790.2962
6.00.3790.1106
6.00.3790.181
6.00.2900.2963
6.00.2900.2969
6.00.2800.2962
6.00.2720.4682
6.00.2600.4682
6.00.2600.1409

If you see those in your e-mail inventory (and not in e-mails that you
consider as spam), then my claim that they are diagnostic for spam is
false.
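The check proposed here, searching mail that passed the filters for the posted versions, can be sketched with Python's standard `mailbox` module. This assumes the accepted mail is stored in an mbox file (a Maildir store would use `mailbox.Maildir` instead); the path and the function names `count_posted_versions` and `scan_mbox` are hypothetical. An empty result would be consistent with the claim; any hit in mail judged legitimate would count against it.

```python
import mailbox
import re
from collections import Counter
from email.message import Message
from typing import Iterable

# Versions as listed in the post; whether each is fictitious is unverified.
POSTED_VERSIONS = {
    "6.00.3790.4682", "6.00.3790.2962", "6.00.3790.1106", "6.00.3790.181",
    "6.00.2900.2963", "6.00.2900.2969", "6.00.2800.2962", "6.00.2720.4682",
    "6.00.2600.4682", "6.00.2600.1409",
}

OE_VERSION = re.compile(r"Outlook Express (\d+(?:\.\d+){3})")


def count_posted_versions(messages: Iterable[Message]) -> Counter:
    """Count how many messages carry each posted OE version in X-Mailer."""
    hits: Counter = Counter()
    for msg in messages:
        m = OE_VERSION.search(str(msg.get("X-Mailer", "")))
        if m and m.group(1) in POSTED_VERSIONS:
            hits[m.group(1)] += 1
    return hits


def scan_mbox(path: str) -> Counter:
    """Scan an existing mbox file of accepted (non-spam) mail."""
    box = mailbox.mbox(path, create=False)  # raises if the file is missing
    try:
        return count_posted_versions(box)   # iterating yields the messages
    finally:
        box.close()
```

For example, `scan_mbox("/path/to/accepted-mail.mbox")` returns a Counter mapping each posted version found to the number of messages carrying it.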

Spam Guy

May 19, 2008, 10:47:54 PM

MrD wrote:

> [Newsgroups trimmed]

[Original distribution restored]

> When *I* alter the newsgroups header, I say so. Would you please
> have the courtesy to refrain from silently restoring them,
> asshole?

This thread was being cross-posted to comp.mail.headers by me before
you joined it.

I didn't alter the distribution - you did.

I kept restoring it, and no such say-so is required.

> Explanation: I prefer to see all the replies to my posts,
> and I don't care to subscribe to whatever arbitrary groups
> you happen to have included to do that.

Any replies from someone reading comp.mail.headers would (or should)
be cross-posted back to alt.spam, so I don't know what you're crying
about.

If you care enough about how someone in comp.mail.headers might
respond (and not cross-post back to alt.spam) then you should be
reading that group too.

BTW, that's how usenet is supposed to work. That's what cross-posting
is for. A thread can occur across groups without everyone needing to
subscribe to all of them, yet everyone can participate in the thread.

> Furthermore, not being a reader of comp.mail.headers,

Don't worry. From what I can tell, it's a dead group anyways.

> I am unaware of the context into which you are injecting my
> replies made in alt.spam.

In comp.mail.headers, there is no external context surrounding (or
forming part of) this thread.

BTW - if I recall correctly, you are located in the UK.

If so, what are your thoughts about this:

http://business.timesonline.co.uk/tol/business/industry_sectors/telecoms/article3965033.ece

Jorgen Grahn

May 20, 2008, 7:43:58 AM

["Followup-To:" header set to alt.spam.]

On Mon, 19 May 2008 22:47:54 -0400, Spam Guy <Sp...@Guy.com> wrote:
> MrD wrote:
>
>> [Newsgroups trimmed]
>
> [Original distribution restored]
>
>> When *I* alter the newsgroups header, I say so. Would you please
>> have the courtesy to refrain from silently restoring them,
>> asshole?
>
> This thread was being cross-posted to comp.mail.headers by me before
> you joined it.
>
> I didn't alter the distribution - you did.
>
> I kept restoring it, and no such say-so is required.

You are both wrong.

IMHO, this thread was offtopic for comp.mail.headers already when it
was just a list of MS Outlook version numbers, weeks ago. It adds no
new knowledge about this particular header, from an RFC perspective.

Followup-To: set to alt.spam.

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!

Spam Guy

May 20, 2008, 10:18:44 AM

Jorgen Grahn wrote:

> IMHO, this thread was offtopic for comp.mail.headers already when
> it was just a list of MS Outlook version numbers, weeks ago.

Is the X-Mailer term not found in e-mail headers?

Does this newsgroup not focus on, or pertain to, discussions relating
to mail headers?

> It adds no new knowledge about this particular header,
> from an RFC perspective.

Must all discussions in this group pertain to the RFC aspects of
headers?

Can they not extend to spam detection methods based on header
analysis?
