
how well do various blacklists work?


Al

Mar 18, 2007, 12:29:17 PM
Your mileage may vary, but here's mine: http://stats.dnsbl.com

Data's compared against my large spamtrap and hamtrap feeds.

For my take on what the data shows, see:
http://www.dnsbl.com/2007/03/how-well-do-various-blacklists-work.html

I'm tracking more lists than are represented on the page; it's just
that I picked a few to highlight at first based on how interesting I
found the data.

What's different about my testing is that I'm not blocking anything;
just comparing. So I can spot check the data at any time to ensure
that a matched piece of mail really is spam or not.
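For the curious, the compare-don't-block step is just a DNS lookup. A minimal sketch in Python (the zone and IP below are illustrative, not from my actual setup):

```python
import socket

def dnsbl_query_name(ip, zone):
    # "192.0.2.1" checked against zone "zen.spamhaus.org" becomes
    # the lookup name "1.2.0.192.zen.spamhaus.org" (reversed octets).
    return ".".join(reversed(ip.split("."))) + "." + zone

def dnsbl_listed(ip, zone):
    # An A-record answer means the IP is listed; NXDOMAIN means not.
    # Logging this result instead of rejecting the mail is what makes
    # the test non-destructive.
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```
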

Feedback welcome.

--
Comments posted to news.admin.net-abuse.blocklisting
are solely the responsibility of their author. Please
read the news.admin.net-abuse.blocklisting FAQ at
http://www.blocklisting.com/faq.html before posting.

Kevin Wayne Williams

Mar 18, 2007, 6:58:25 PM
Al wrote:
> Your mileage may vary, but here's mine: http://stats.dnsbl.com
>
> Data's compared against my large spamtrap and hamtrap feeds.
>
> For my take on what the data shows, see:
> http://www.dnsbl.com/2007/03/how-well-do-various-blacklists-work.html
>
> I'm tracking more lists than are represented on the page; it's just
> that I picked a few to highlight at first based on how interesting I
> found the data.
>
> What's different about my testing is that I'm not blocking anything;
> just comparing. So I can spot check the data at any time to ensure
> that a matched piece of mail really is spam or not.
>
> Feedback welcome.

Your "wrong" data deserves a bit more analysis. If I understand it
correctly, you solicit some mail by signing up on legitimate commercial
mailings. A large percentage of those are outsourced to a few volume
mailers. If one server is blacklisted, but you have indirectly requested
that 20% of your mail come from that server, you are going to calculate
an artificially high number.

You should analyze your data both on the basis of numbers of e-mail
received, and number of IP addresses it is received from. That will help
reveal high-volume senders being blocked inappropriately. Anyone using
blocklists also uses whitelists, and high-volume senders would be on the
whitelist in real life.
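The per-message vs. per-IP distinction can be sketched with made-up numbers for the 20% scenario above (not Al's actual data):

```python
from collections import Counter

# Hypothetical ham log: (sender_ip, dnsbl_hit) per received message.
# One blacklisted volume mailer supplies 20 of 100 solicited messages;
# the other 80 come from 80 distinct unlisted IPs.
ham = [("203.0.113.5", True)] * 20 + \
      [("198.51.100.%d" % i, False) for i in range(80)]

msgs_hit = sum(1 for _, hit in ham if hit)
per_message_fp = msgs_hit / len(ham)       # 20/100 = 0.20

ip_counts = Counter(ip for ip, _ in ham)
ips_hit = {ip for ip, hit in ham if hit}
per_ip_fp = len(ips_hit) / len(ip_counts)  # 1 of 81 IPs, about 0.012
```

The same blacklist looks twenty times worse measured per message than per IP, which is the skew being described.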

KWW

Al

Mar 19, 2007, 8:47:18 AM
On Mar 18, 6:58 pm, Kevin Wayne Williams <kww.niho...@verizon.nut>
wrote:

> Your "wrong" data deserves a bit more analysis. If I understand it
> correctly, you solicit some mail by signing up on legitimate commercial
> mailings. A large percentage of those are outsourced to a few volume
> mailers. If one server is blacklisted, but you have indirectly requested
> that 20% of your mail come from that server, you are going to calculate
> an artificially high number.

I'm not tracking servers; I'm tracking requested pieces of mail. I've
signed up for over 400 different mailing lists and add 1-3 more every
day. So far, the resulting ham (540 pieces as of just now) has come
from 168 unique IP addresses, and I expect this to grow as well, as I
am not targeting any specific ESP or sender; simply signing up for as
many unique lists as possible. For a few senders I've signed up for
multiple lists; if they all send at the same time I have been going
back and unsubscribing from all lists but one. Beyond that, there will
still be multiple IP addresses showing up multiple times as a list is
sent to daily, weekly, monthly, etc.
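The unique tagged addresses can be generated any number of ways; a minimal sketch (the scheme, names, and domain here are hypothetical, not my actual setup):

```python
import secrets

def tagged_address(list_name, domain="example.net"):
    # Hypothetical scheme: a unique random tag per signup, e.g.
    # "washpost-4f3a2b@example.net". Any later mail to that address,
    # including leakage to third parties, traces back to the one signup.
    tag = secrets.token_hex(3)
    return "%s-%s@%s" % (list_name, tag, domain)
```
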

> You should analyze your data both on the basis of numbers of e-mail
> received, and number of IP addresses it is received from. That will help
> reveal high-volume senders being blocked inappropriately. Anyone using
> blocklists also uses whitelists, and high-volume senders would be on the
> whitelist in real-life.

I don't think it's accurate to assume that everybody who uses a
blacklist (or even the majority) know enough to whitelist desired
senders in a way that will override the blacklist setting.

Regards,
Al Iverson
http://www.aliverson.com

Hal Murray

Mar 19, 2007, 12:39:01 PM

>I don't think it's accurate to assume that everybody who uses a
>blacklist (or even the majority) know enough to whitelist desired
>senders in a way that will override the blacklist setting.

It's much worse than that. Even if you are smart enough to set up
a whitelist, you can't find the data you need to do it.

List operators don't include it on their web sites.

Including it in the confirmation request doesn't help if that
message gets blocked because it's not whitelisted.

--
These are my opinions, not necessarily my employer's. I hate spam.

Kevin Wayne Williams

Mar 19, 2007, 1:05:17 PM
Al wrote:
> On Mar 18, 6:58 pm, Kevin Wayne Williams <kww.niho...@verizon.nut>
> wrote:
>
>> Your "wrong" data deserves a bit more analysis. If I understand it
>> correctly, you solicit some mail by signing up on legitimate commercial
>> mailings. A large percentage of those are outsourced to a few volume
>> mailers. If one server is blacklisted, but you have indirectly requested
>> that 20% of your mail come from that server, you are going to calculate
>> an artificially high number.
>
> I'm not tracking servers; I'm tracking requested pieces of mail. ...

I'm not saying you shouldn't publish the numbers you are publishing now.
I'm just saying that without some level of server analysis, it is hard
to interpret your false positive rates.

>
>> You should analyze your data both on the basis of numbers of e-mail
>> received, and number of IP addresses it is received from. That will help
>> reveal high-volume senders being blocked inappropriately. Anyone using
>> blocklists also uses whitelists, and high-volume senders would be on the
>> whitelist in real-life.
>
> I don't think it's accurate to assume that everybody who uses a
> blacklist (or even the majority) know enough to whitelist desired
> senders in a way that will override the blacklist setting.

Even if that's true, it would be helpful to distinguish the effects of
an intelligent user using a blocklist vs. someone that doesn't know what
he's doing.


KWW

Herb Oxley

Mar 20, 2007, 12:55:45 PM
Hal Murray <hal-u...@ip-64-139-1-69.sjc.megapath.net> wrote:

> >I don't think it's accurate to assume that everybody who uses a
> >blacklist (or even the majority) know enough to whitelist desired
> >senders in a way that will override the blacklist setting.

> It's much worse than that. Even if you are smart enough to set up
> a whitelist, you can't find the data you need to do it.

> List operators don't include it on their web sites.

> Including it in the confirmation request doesn't help if that
> message gets blocked because it's not whitelisted.


White and black lists are data - which are processed by the mail system.

I have seen many business sites which send automated solicited bulk email
give the address(es) to add to your "address book".

For those managing more than their own email, the email system needs a
facility for entering "whitelist" requests: either a line of communication
between the mail users and those in charge of the mail system, or a
self-service way for the mail users to add the addresses of those who send
them solicited bulk email, or email containing features which trip
heuristic spam filters.

Of course the devil's in the details; every mail setup is different.

Are there any "canned" (either payware or open source) mail systems, or
add-ons to such, which can utilize mail users' address books as
whitelisting data?


--
The published From: address is a trap.
Take my first initial and last name
and look at the origin of this post.
If you really want to send me email.
Or request a private reply in the group.

Erik Warmelink

Mar 20, 2007, 9:48:24 PM
In article <1174235713.5...@n76g2000hsh.googlegroups.com>,
"Al" <aliversonch...@gmail.com> writes:

> Your mileage may vary, but here's mine: http://stats.dnsbl.com
>
> Data's compared against my large spamtrap and hamtrap feeds.
>
> For my take on what the data shows, see:
> http://www.dnsbl.com/2007/03/how-well-do-various-blacklists-work.html
>
> I'm tracking more lists than are represented on the page; it's just
> that I picked a few to highlight at first based on how interesting I
> found the data.
>
> What's different about my testing is that I'm not blocking anything;
> just comparing. So I can spot check the data at any time to ensure
> that a matched piece of mail really is spam or not.
>
> Feedback welcome.

Your results are somewhat disturbing, I wouldn't have guessed that both
sbl and spamcop were completely inefficient in stopping subscription
bombs. Everyone can read
<http://stats.dnsbl.com/2007/03/about-this-data-this-data-is-compiled.html>:

| Some mail is double opt-in (confirmed opt-in), and some is not.

Bulk mail sent to you because you asked for it isn't spam. If it is
sent to you because someone (possibly you) asked for it, it may be
ham, but it surely isn't halal meat.

You might try to "subscribe" to those kafir lists with two addresses
and unsubscribe one of the addresses. A DNSbl which doesn't stop the
rotten pork sent to the "unsubscribed" address isn't very useful.

--
er...@selwerd.nl

Al

Mar 27, 2007, 10:31:18 AM
On Mar 20, 8:48 pm, e...@flits102-126.flits.rug.nl (Erik Warmelink)
wrote:

> Your results are somewhat disturbing, I wouldn't have guessed that both
> sbl and spamcop were completely inefficient in stopping subscription
> bombs. Everyone can read

> <http://stats.dnsbl.com/2007/03/about-this-data-this-data-is-compiled....>:

I don't see any evidence that any blacklist stops subscription bombs.
Unless there are a few listing 0/0.

> | Some mail is double opt-in (confirmed opt-in), and some is not.
>
> Bulk mail send to you because you asked for it, isnt't spam. If it is
> sent to you because someone (possibly you) asked for it, it may be
> ham, but it surely isn't halal meat.
>
> You might try to "subscribe" to those kafir lists with two addresses
> and unsubscribe one of the addresses. A DNSbl which doesn't stop the
> rotten porc to the "unsubscribed" address isn't very useful.

A DNSBL already exists to track unsubscription abuse - the UBL:
http://www.lashback.com/support/UnsubscribeBlacklistSupport.aspx

That's not what I'm choosing to track with this data. However, all
lists were subscribed to using unique, tagged addresses, and I have
already removed some inbound addresses from the feed due to address
misuse.

(Note that the website data is a bit out of date at the moment. I am
still collecting data just fine, but I am working on a new method of
publishing it to the web more often.)

Regards,
Al Iverson

Erik Warmelink

Mar 28, 2007, 8:22:16 AM
In article <1175009117.3...@e65g2000hsc.googlegroups.com>,

"Al" <aliversonch...@gmail.com> writes:
> On Mar 20, 8:48 pm, e...@flits102-126.flits.rug.nl (Erik Warmelink)
> wrote:
>
>> Your results are somewhat disturbing, I wouldn't have guessed that both
>> sbl and spamcop were completely inefficient in stopping subscription
>> bombs. Everyone can read
>> <http://stats.dnsbl.com/2007/03/about-this-data-this-data-is-compiled....>:
>
> I don't see any evidence that any blacklist stops subscription bombs.
> Unless there are a few listing 0/0.

There is quite a range of possibilities between not stopping a
subscription bomb and being completely inefficient against them.

>> | Some mail is double opt-in (confirmed opt-in), and some is not.
>>
>> Bulk mail send to you because you asked for it, isnt't spam. If it is
>> sent to you because someone (possibly you) asked for it, it may be
>> ham, but it surely isn't halal meat.
>>
>> You might try to "subscribe" to those kafir lists with two addresses
>> and unsubscribe one of the addresses. A DNSbl which doesn't stop the
>> rotten porc to the "unsubscribed" address isn't very useful.
>
> A DNSBL already exists to track unsusbcription abuse - the UBL:
> http://www.lashback.com/support/UnsubscribeBlacklistSupport.aspx

I wasn't only talking about unsubscription abuse; some lists don't
handle unsubscriptions at all. Admittedly, most often they only make
it very, very hard to "unsubscribe".

> That's not what I'm choosing to track witht his data. However, all
> lists were subscribed to using unique, tagged addresses, and I have
> already removed some inbound addresses from the feed due to address
> misuse.

Once again, if you asked to be subscribed to an opt-out list, the
email isn't spam for you. If someone else is subscribed without
asking to be subscribed, the list will send him/her unsolicited bulk
email.
If a DNSbl lists the source of that spam, it is working as intended.

--
er...@selwerd.nl

Elvey

Mar 28, 2007, 8:47:16 AM
[note to mods: changed my mind; trying again.]

On Mar 18, 9:29 am, "Al" <aliversonchicagouse...@gmail.com> wrote:

> Your mileage may vary, but here's mine: http://stats.dnsbl.com
>

> Feedback welcome.

If one wanted to come up with statistics to make blocklists look bad
if they list ESPs that practice poor list hygiene, one could hardly do
better than to follow the methodology that produced these stats. These
stats help perpetuate the spam problem by leading folks to make
inappropriate decisions.

The methodology: explicitly define the set of non-spam (i.e. ham) as
email sent by bulk mailers. Sign up for mailing lists that are opt-out,
AKA unverified opt-in, AKA double opt-in, and then claim that when the
mail signed up for arrives from an IP that's blacklisted, it's a false
positive. These hits are not false positives. When email from a spam
source is marked as email from a spam source, it's correctly marked.
The spam problem will never get much better if there are no
consequences for spammers.

The methodology uses a definition of spam that is sharply at
odds with normal definitions such as http://www.spamhaus.org/definition.html
and http://www.spamcop.net/fom-serve/cache/125.html - definitions from
the very blocklists the stats indicate are most accurate.

Al says: 'Senders that misuse addresses are removed from this feed and
lose their "ham" status. However, since I did legitimately give them an
address, I don't usually redirect them into the "spam" feed.' This
further skews the statistics. In other words, the methodology is that
if an ESP starts to spam an address given it, don't count that spam as
spam!

I can't speak about Al's motivations or characterizations of the stats
or the insights they provide into the stats without running afoul of
NANABL's charter and therefore getting (justifiably) censored, so I
won't.

All I CAN say is that any ESPs that EFFECTIVELY require ALL their
customers follow "closed loop confirmed opt-in" list hygiene (e.g. see
the carefully worded description at http://www.five-ten-sg.com/blackhole.php)
virtually never get listed on any of the blocklists these stats refer
to. ESPs that inEFFECTIVELY require SOME OF their customers follow
some sort of vaguely defined 'double' or 'verified' opt-in? Sure they
get listed.


On Mar 27, 7:31 am, "Al" <aliversonchicagouse...@gmail.com> wrote:
>
> A DNSBL already exists to track unsubscription abuse - the UBL: http://www.lashback.com/support/UnsubscribeBlacklistSupport.aspx

Weird. I went to that page, and it said:
Total IPs Listed 0. (all stats were 0)

Then I closed it and re-opened it and it said:
Total IPs Listed 120,582.

Al

Mar 28, 2007, 3:22:52 PM
On Mar 28, 7:22 am, e...@flits102-126.flits.rug.nl (Erik Warmelink)
wrote:

> Once again, if you asked to be subscribed to an opt-out list, the
> email isn't spam for you. If someone else is subscribed without
> asking to be subscribed, the list will send him/her unsolicited bulk
> email.
> If a DNSbl lists the source of that spam, it is working as intended.

That's probably true, depending on the DNSBL. And any DNSBL is well
within their rights to list whatever is appropriate as defined by
their listing policies.

Further, it seems like any competently-run DNSBL should be able to
survive a bit of light shined on what "working as intended" actually
means, especially as it intersects with various mail streams.

What you seem to be saying is, "See! This DNSBL blocked all these
*potential* spam sources!" And I'm fine with that. As you say, it can
be well within the charter of a DNSBL.

I'm also fine with it standing side by side with what my take on it
is, and letting users and potential users of any DNSBL decide.

Regards,
Al

Al

Mar 28, 2007, 4:36:03 PM
On Mar 28, 7:47 am, "Elvey" <gg-pub...@matthew.elvey.com> wrote:
> [note to mods: changed my mind; trying again.]
>
> On Mar 18, 9:29 am, "Al" <aliversonchicagouse...@gmail.com> wrote:
>
> > Your mileage may vary, but here's mine: http://stats.dnsbl.com
>
> > Feedback welcome.
>
> If one wanted to come up with statistics to make blocklists look bad
> if they list ESPs that practice poor list hygiene, one could hardly do
> better than to follow the methodology that produced these stats. These
> stats help perpetuate the spam problem by leading folks to make
> inappropriate decisions.

I am not hunting down any sender based on which ESP is used. I am
searching for companies I've heard of, terms like "newsletter signup"
and so forth. As an example, today it has received mailings from the
Washington Post, Dell, National Geographic, US Department of Labor,
TechRepublic, ScienceDaily, Coach, and OfficeMax. It's pretty random.

> The methodology :
> Explicitly define the set of non-spam (i.e. ham) as email sent by
> bulk mailers. Sign up for mailing lists that are opt-out, AKA
> unverified opt-in, AKA double opt-in,

Double opt-in as I use the term explicitly refers to confirmed opt-in.
Some lists appear to be confirmed opt-in, some do not.

> and then claim that when the mail signed up for arrives from an IP
> that's blacklisted, it's a false positive. These hits are not false
> positives.

If the mail is wanted by somebody and blocked from reaching that
somebody, the potential exists for a false positive, and it's been
interesting to highlight where this potential exists.

Why? Having personally counseled many receiving sites on which
blacklists to use, or how to reduce the spam they deal with, I find
that there are a lot of folks who are not as savvy as the rest of us.
They don't understand that using a "hard core" blacklist is going to
block lots of mail that their users actually sign up for and want.

It allows potential users of a DNSBL to decide for themselves whether
or not they are in agreement with the DNSBL's choice of listing
criteria, and what that listing criteria actually means with regard to
mail desired by users at that domain. I think a well-run DNSBL should
easily be able to handle this type of spotlight on their data.

> The methodology uses a definition of spam that is sharply at
> odds with normal definitions such as http://www.spamhaus.org/definition.html
> and http://www.spamcop.net/fom-serve/cache/125.html - definitions from
> the very blocklists the stats indicate are most accurate.

And yet they're not listing the sources of mail you're apparently
taking issue with. I think that speaks to Spamcop and Spamhaus taking
a reasonable and sensible approach to how they handle listings and
issues. In particular, I was very surprised to find that Spamcop does
so well. I've been talking to a Spamcop developer on and off for a few
months about this, and he indicated to me that they have made
significant improvements in better identifying spam versus non-spam.
It's certainly possible that they are wrong and out of step, but my
data seems to show otherwise. Their data, they tell me, with a much
larger mixed corpus, also seems to be in line with mine.

> All I CAN say is that any ESPs that EFFECTIVELY require ALL their
> customers follow "closed loop confirmed opt-in" list hygiene (e.g. see
> the carefully worded description at http://www.five-ten-sg.com/blackhole.php)
> virtually never get listed on any of the blocklists these stats refer
> to. ESPs that inEFFECTIVELY require SOME OF their customers follow
> some sort of vaguely defined 'double' or 'verified' opt-in? Sure they
> get listed.

All I can say back is that there are some blacklists who purposely
list IP addresses that do not or have not sent spam. It's their right
to do so, and I respect that.

I'd love to see more data on how blacklists are working. Depth of
usage, i.e. how widely are they used? Information on how they
intersect with other peoples' mail streams beyond just mine. What type
of "regular desired mail" do they have the potential to impact?

Not enough data exists currently, which is what interested me in this
project.

Regards,
Al Iverson

Chris Lewis

Mar 29, 2007, 9:10:13 AM
According to Al <aliversonch...@gmail.com>:

> On Mar 20, 8:48 pm, e...@flits102-126.flits.rug.nl (Erik Warmelink)
> wrote:

> > Your results are somewhat disturbing, I wouldn't have guessed that both
> > sbl and spamcop were completely inefficient in stopping subscription
> > bombs. Everyone can read
> > <http://stats.dnsbl.com/2007/03/about-this-data-this-data-is-compiled....>:

> I don't see any evidence that any blacklist stops subscription bombs.
> Unless there are a few listing 0/0.

Presupposing a subscription bomb requires that the mailing lists
don't do confirmations, the Trend/Maps NML will stop the ones that
it knows about.

See http://www.mail-abuse.com/removereq_nml.html

Other DNSBLs that use traps may well catch mailing lists that
have been similarly abused in the past.



> That's not what I'm choosing to track witht his data. However, all
> lists were subscribed to using unique, tagged addresses, and I have
> already removed some inbound addresses from the feed due to address
> misuse.

The problem with that portion of the measurement is that if you
do determine that the list was misused, by subsequently ignoring
the hits altogether you arrive at a skewed picture. It fails to
take into account mail sources where some of the mail is solicited and
some of it isn't (the DNSBL listing being legit). Or, if the mailing
list is perfectly non-spamming, but had some sort of inadvertent
information leak to a bot runner (you can no longer tell which
emissions, and thus which DNSBL hits, should/should not apply).

There is no "correct" way of counting the hits for a list you've
determined to be misused. Even simply ignoring them is wrong.

That portion of your stats is of some minimal use to marketers
who don't spam, so that they get some sort of idea of what their
likelihood of getting hit by a DNSBL is. It also shows to
the end user some measure of the likelihood of something they've
asked for (ignoring the spamminess of the source) getting blocked.

But, obviously, if you're indiscriminately subscribing to
gazillions of lists, that is by no means representative of what
a real user or an individual marketer would see. The sample set
is artificial and doesn't represent real users.

Here's one you can try:

http://shopperssavingcenter.net

Go through the enrollment process and confirm the opt-in. Check
out the Terms and Conditions (including the bit about spamming
others with it), but don't complete the qualifying sponsor offers.

Let us know if you think this one is being misused ;-)
--
Chris Lewis,

Age and Treachery will Triumph over Youth and Skill
It's not just anyone who gets a Starship Cruiser class named after them.

Al

Mar 29, 2007, 10:01:16 AM
On Mar 29, 8:10 am, "Chris Lewis" <cle...@nortel.com> wrote:
> There is no "correct" way of counting the hits for a list you've
> determined to be misused. Even simply ignoring them is wrong.

I disagree. I think removing them from the data set is the correct
thing to do. I would not consider a DNSBL hit against such a sender to
be a false positive, so I would not like it represented within the
false positive results.

> That portion of your stats is of some minimal use to marketers
> who don't spam, so that they get some sort of idea of what their
> likelihood of getting hit by a DNSBL is. It also shows to
> the end user some measure of the likelihood of something they've
> asked for (ignoring the spamminess of the source) getting blocked.

I believe the data being published does indeed show to the end user
some measure of the likelihood of something they've asked for being
blocked.

I believe "spammyiness of the source" is a difficult and subjective
measure. I believe "list misuse and/or other unrelated content from
that IP to me" is a place to start, though not a destination. I
further believe that a lot of DNSBL operators list a /8 when they get
a single trap hit, which I consider excessive. Feel free to consider
this conservative in the other direction, to offset it. And as I said,
a starting point.

> But, obviously, if you're indiscriminately subscribing to
> gazillions of lists, that is in no means representative of what
> a real user or an individual marketer would see. The sample set
> is artificial and doesn't represent real users.

You're right. It's only a subset of what a real mail stream into an
ISP looks like. That subset focuses on lists people would sign up for.
Feel free to call it an artificial data set, constructed by me, that
specifically highlights DNSBL hits against lists that I signed up for.

> Here's one you can try:
>
> http://shopperssavingcenter.net
>
> Go through the enrollment process and confirm the opt-in. Check
> out the Terms and Conditions (including the bit about spamming
> others with it), but don't complete the qualifying sponsor offers.
>
> Let us know if you think this one is being misused ;-)

I can't sign up, as I am not a Canadian resident. I could try to fool
them, but I have no clue what to put for an address and postal code.

That kind of site is not the kind whose DNSBL hits I'm looking to
highlight. I'm not looking for co-registration, sold or bought lists,
random redistribution of lists to third parties, nonstop unrelated
mail to you until you cry uncle by unsubscribing, etc. I assume the
site you reference falls into one of those categories. Please feel
free to clarify. Legal or not, it's the type of mail that generates
lots of spam complaints, and is more likely to be blocked by an ISP.
Add all of this together, and it makes it the type of site I would be
unlikely to call DNSBL hits against "false positives." I'm signing up
for lists of stuff people would want to be on, signing up directly
with the list owner, and I would remove (and have removed) entities
from the feed who do things that may be perfectly legal, but are not
things that I want to be involved with.

Based on your recommendation of that site, I think you misunderstand
what I'm signing up for, and why. I hope that helps to clear things up
a bit.

huey.c...@gmail.com

Mar 29, 2007, 11:05:40 AM
Chris Lewis <cle...@nortel.com> wrote:
> According to Al <aliversonch...@gmail.com>:

> > e...@flits102-126.flits.rug.nl (Erik Warmelink) wrote:
> > > Your results are somewhat disturbing, I wouldn't have guessed
> > > that both sbl and spamcop were completely inefficient in
> > > stopping subscription bombs.
> > That's not what I'm choosing to track witht his data. However, all
> > lists were subscribed to using unique, tagged addresses, and I have
> > already removed some inbound addresses from the feed due to address
> > misuse.
> The problem with that portion of the measurement is that if you
> do determine that the list was misused, by subsequently ignoring
> the hits altogether you arrive at a skewed picture. It fails to
> take into account mail sources where some of it is solicited and
> some of it isn't (the DNSBL listing being legit). Or, if the mailing
> list is perfectly non-spamming, but had some sort of inadvertent
> information leak to a bot runner (you can no longer tell which
> emissions and thus which DNSBL hits should/should not apply).

That's an inherent flaw in DNSBLs, if you look at a hit as 'mail from
this IP address is spam' and a miss as 'mail from this IP address is
not', given that the same piece of mail can be solicited by one person
and not solicited by the other. In that case, either listing is wrong,
since a hit is a false positive for the subscribed user, and a miss
is a false negative for the spammed user. And I'm not aware of any
DNSBLs that have a 'sometimes' flag.

> That portion of your stats is of some minimal use to marketers
> who don't spam, so that they get some sort of idea of what their
> likelihood of getting hit by a DNSBL is. It also shows to
> the end user some measure of the likelihood of something they've
> asked for (ignoring the spamminess of the source) getting blocked.

I think that was the point of the exercise.

--
Huey

Erik Warmelink

Mar 30, 2007, 9:34:10 AM
In article <1175112987.1...@r56g2000hsd.googlegroups.com>,
"Al" <aliversonch...@gmail.com> writes:

> What you seem to be saying is, "See! This DNSBL blocked all these
> *potential* spam sources!" And I'm fine with that. As you say, it can
> be well within the charter of a DNSBL.

No, what I am saying is that if you can easily find those unconfirmed
mailing lists, others can find them too. To claim that listing the
emitting IP address is a false positive, you would have to prove that
neither spamtraps nor other unwilling recipients were "subscribed".

Otherwise that IP address wouldn't be a potential spam source, but a
real spam source and your claim of an "inaccurate listing" would
itself be inaccurate.

--
er...@selwerd.nl

Chris Lewis

Mar 30, 2007, 12:57:03 PM
According to <huey.c...@gmail.com>:

> Chris Lewis <cle...@nortel.com> wrote:
> > According to Al <aliversonch...@gmail.com>:
> > > e...@flits102-126.flits.rug.nl (Erik Warmelink) wrote:
> > > > Your results are somewhat disturbing, I wouldn't have guessed
> > > > that both sbl and spamcop were completely inefficient in
> > > > stopping subscription bombs.
> > > That's not what I'm choosing to track witht his data. However, all
> > > lists were subscribed to using unique, tagged addresses, and I have
> > > already removed some inbound addresses from the feed due to address
> > > misuse.
> > The problem with that portion of the measurement is that if you
> > do determine that the list was misused, by subsequently ignoring
> the hits altogether you arrive at a skewed picture. It fails to
> take into account mail sources where some of it is solicited and
> > some of it isn't (the DNSBL listing being legit). Or, if the mailing
> > list is perfectly non-spamming, but had some sort of inadvertent
> > information leak to a bot runner (you can no longer tell which
> > emissions and thus which DNSBL hits should/should not apply).
>
> That's an inherent flaw in DNSBLs, if you look at a hit as 'mail from
> this IP address is spam' and a miss as 'mail from this IP address is
> not', given that the same piece of mail can be solicited by one person
> and not solicited by the other. In that case, either listing is wrong,
> since a hit is a false positive for the subscribed user, and a miss
> is a false negative for the spammed user.

That's one of the arguments that I'm making. By constructing
the test in the way he has, he has introduced inherently
indeterminate states, where none of the options (count as
false positive, false negative, true positive, true negative, or
simply ignore altogether) are clearly the right thing to do. At
best he simply loses useful metrics. At worst, it's quite
misleading.

> And I'm not aware of any DNSBLs that have a 'sometimes' flag.

Anything that can be used in scoring is conceptually equivalent
to a "sometimes" flag. Which means that all DNSBLs can implicitly
have "sometimes" flags, depending on how the site admin utilizes it.

>
> > That portion of your stats is of some minimal use to marketers
> > who don't spam, so that they get some sort of idea of what their
> > likelihood of getting hit by a DNSBL is. It also shows to
> > the end user some measure of the likelihood of something they've
> > asked for (ignoring the spamminess of the source) getting blocked.

> I think that was the point of the exercise.

With the indeterminacies (or at least the lack of documentation
of what the criteria actually are), I don't think it's very
compelling at all in what it's trying to say.
--
Chris Lewis,

Age and Treachery will Triumph over Youth and Skill
It's not just anyone who gets a Starship Cruiser class named after them.

--

Chris Lewis

Mar 30, 2007, 1:56:23 PM
According to Al <aliversonch...@gmail.com>:

> On Mar 29, 8:10 am, "Chris Lewis" <cle...@nortel.com> wrote:
> > There is no "correct" way of counting the hits for a list you've
> > determined to be misused. Even simply ignoring them is wrong.

> I disagree. I think removing them from the data set is the correct
> thing to do. I would not consider a DNSBL hit against such a sender to
> be a false positive, so I would not like it represented within the
> false positive results.

But if it's a list that you've determined seemed to fall within
legitimacy, and it is repurposed, then, shouldn't you be counting
the DNSBL hits to be true positives and indicative of misleading
practices? That's just as important as false positives in the very
context you're looking at.

Or, since you're subscribing to gazillions of lists, you're obviously
not spending any time to understand what the topic flow _is_, you could
misconstrue something legit as misuse, drop the list, yet, a DNSBL hit
is a real false positive?

For example, I used to subscribe to a mailing list on ferrets (in fact,
I founded it and ran it for almost 10 years before handing it off).
Normal flow is chatter amongst ferret owners. If you happened to hit
the quarterly advertising issue, would you know it was what was supposed
to be there, and is normal for the group? Or would you simply ignore
that list from then on? If you did the latter, you're missing
DNSBL false positives.

I suppose my real issues with this is that the "testing environment"
is very inadequately described. You're selecting lists on criteria you
haven't described anywhere in anything approaching detail, and I can't see
that you learn what's supposed to be on the lists well enough to
necessarily know what constitutes "misuse". Further, you're completely
ignoring a set of data that in itself is _very_ important.

It'd help if you describe _what_ sort of lists you're talking about.
I would have thought you'd have included shopperssavingscenter, but
without even seeing it, you say you wouldn't. Is that representative
of how people would react to that site? I dunno. Your criteria
wouldn't be the same as at least that portion of the user segment
that gets trapped by that crap. Obviously, some are.

Are these "craft/hobby/chatter" lists? Product line lists? Support
lists? Advertising lists? What?



> I believe "spamminess of the source" is a difficult and subjective
> measure.

True.

> I believe "list misuse and/or other unrelated content from
> that IP to me" is a place to start, though not a destination. I
> further believe that a lot of DNSBL operators list a /8 when they get
> a single trap hit, which I consider excessive. Feel free to consider
> this conservative in the other direction, to offset it. And as I said,
> a starting point.

Er, I don't think deliberate bias in the opposite direction is
helpful.

> > Let us know if you think this one is being misused ;-)

> I can't sign up, as I am not a Canadian resident. I could try to fool
> them, but I have no clue what to put for an address and postal code.

If you want to try, email me, and I'll give you a usable one.



> Based on your recommendation of that site, I think you misunderstand
> what I'm signing up for, and why. I hope that helps to clear things up
> a bit.

I may well misunderstand, but in my defence, I'll point out that
you're not being very clear on what you _are_ signing up for. Making
_that_ part of the "test environment description" would help considerably.
--
Chris Lewis,

Age and Treachery will Triumph over Youth and Skill
It's not just anyone who gets a Starship Cruiser class named after them.

--

Al

Mar 30, 2007, 7:51:33 PM
On Mar 30, 12:56 pm, "Chris Lewis" <cle...@nortel.com> wrote:

> > > Let us know if you think this one is being misused ;-)
> > I can't sign up, as I am not a Canadian resident. I could try to fool
> > them, but I have no clue what to put for an address and postal code.
>
> If you want to try, email me, and I'll give you a useable one.

It wouldn't matter; it's not within the realm of what I'm attempting
to find information about. My goal isn't to track sites that
distribute email addresses far and wide or bombard people with ads
based on an opt-in premise hidden in tiny letters in a privacy policy.
My goal is to sign up for lists that in theory people actually want,
and I think this is too edge case.

I certainly wouldn't want to stop you from signing up yourself. It is
always good to have more folks tracking how senders use email
addresses, for better or worse.

> > Based on your recommendation of that site, I think you misunderstand
> > what I'm signing up for, and why. I hope that helps to clear things up
> > a bit.
>
> I may well misunderstand, but in my defence, I'll point out that
> you're not being very clear on what you _are_ signing up for. Making
> _that_ part of the "test environment description" would help considerably.

Well, I certainly strive to provide as much transparency as possible
regarding what data I'm collecting and summarizing.

I went to sites run by various companies and newsletters. I signed up
directly for their own lists, and I'm tracking which of the solicited
mail I'm receiving in response would be impacted by DNSBL listings for
many DNSBLs. I started by working myself through a list of the top
retailers I found on some business website (Fortune?), searching for
their websites, signing up for their mailings. I continued by
searching for phrases like "newsletter signup" in Google, and if it
was something I had heard of or seemed legitimate or harmless, I've
signed up for it. I've further given out addresses to sites I've
stumbled across in my day-to-day surfing.

This is primarily senders/newsletter owners/companies like Amazon,
AARP, A Prairie Home Companion, Barnes and Noble, ABC TV, Tech
Republic, Bose, Dollar Tree, Al Franken, Lane Bryant, National Public
Radio, Overstock.com, Petco, Symantec, TJ Maxx, Victoria's Secret, The
Straight Dope, Adidas, Geeks.com, Dell, DIVX, US Dept. of Labor, WBEZ
Radio, Expedia, Hotwire, Sharper Image, Old Navy, Coca Cola,
Washington Post, NY Post, Land's End, Joseph A Bank, and Planet Out.
To name a few. No focus has been placed on a specific ESP, retail
segment, type of sender, etc. Just all the random stuff I can think of
and find. I am open to suggestions of additional things to consider. I
would prefer that they be things I've heard of, and I would prefer to
focus mostly on B2C (consumer oriented) lists.

I know some people here, on NANAE, on SPAM-L, maintaining DNSBLs,
etc., will have problems with some of these senders. That goes without
saying, as there are many in this realm who are more quick to block
than an ISP is, and will never, ever forget. I did not purposely pick
out senders that I knew to be blacklisted or hated by the "heads on
pikes" set. (In fact, I really had little visibility into this data
before I started this project. I suspected that a lot of DNSBLs
randomly blacklist a lot of senders like this. Turned out I was both
right and wrong. Some do, but some have gotten a lot better at how
they work, such as Spamcop.)

I further tried to make this data unique by signing up for things that
only send one email at a time or one at a day at the most. That means
I am not signing up for discussion lists. 400 pieces of mail against
the same IP a day skews the data either for or against a DNSBL, and I
don't find that as useful. For some companies, I realized later that I
signed up for enough lists that I was getting multiple emails from
them on the same day (like Tech Republic). I have been unsubscribing
from all but one list from a sender when I run across that. I did not
intentionally sign up for any list with the purpose of getting one
mail every day or every week or every month or etc. I just signed up
for whatever they seemed to have available for a signup, and if there
were multiple choices, I picked one (or a few) and went from there.

Some folks have suggested I sign up for various FreeBSD mailing lists
and the like. I have declined to do so both because of what I write above,
and also because I think that high-volume geek mailing lists are not a
typical user's traffic. Nor is the traffic sample I'm assembling, but
that is why I am careful to explain what the stream does indeed
contain.

And finally, I think it would be great to see additional folks publish
public data of their own. You obviously take issue with the data I'm
collecting and sharing. I would love to see you collect and share
similar data publicly. You've obviously got access to more spamtraps
and a larger mailstream.

However: Unlike a DNSBL operator, or you as the gatekeeper at a large
corporate site, I have what is potentially a different insight into
false positives. You might get a false positive report once a month,
and perhaps don't know what previous attempted mail from that IP was
blocked. I am not blocking anything, just downloading it, logging IPs
and DNSBL hits (and meta data on the email), so I can always look and
see what other mail came from that IP, would it have been blocked or
not, and I can make an instant visual call on whether or not it's
something I personally would consider spam or not.
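A compare-only setup like the one described can be sketched roughly as below. The zone name is an invented example and the DNS lookup is injected as a plain function, so the logging logic stands alone; by convention a DNSBL "hit" means the reversed-octet query name under the zone resolves:

```python
import csv
import io

def dnsbl_query_name(ip, zone):
    """Build the conventional DNSBL query: reversed octets + zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def log_hits(ip, meta, zones, lookup, out):
    """Record which zones list `ip`, without blocking anything.

    `lookup` returns True if the query name resolves (the IP is listed);
    `meta` carries From/Subject so each logged hit can be spot-checked
    later against the actual message."""
    w = csv.writer(out)
    for zone in zones:
        listed = lookup(dnsbl_query_name(ip, zone))
        w.writerow([ip, zone, int(listed), meta["from"], meta["subject"]])

# Example run with a stubbed resolver (addresses and zone are made up):
fake_listed = {"4.3.2.1.dnsbl.example.org"}
buf = io.StringIO()
log_hits("1.2.3.4",
         {"from": "news@example.com", "subject": "Weekly deals"},
         ["dnsbl.example.org"], lambda q: q in fake_listed, buf)
print(buf.getvalue().strip())
# -> 1.2.3.4,dnsbl.example.org,1,news@example.com,Weekly deals
```

Because nothing is rejected, every logged row can be revisited later and judged as ham or spam by eye, which is the point of the exercise.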

It's allowed me to question both certain blacklist opponents (like the
APEWS-hating dude here who claimed it blocks lots of legitimate mail,
even though I've only seen one FP so far), and blacklist proponents
(like those who loudly proclaim that a list like Fiveten works as
designed--which is a perfectly fine, though incomplete answer). The
data is what it is. I've shared both data that affirms my suspicions
and is at odds with my suspicions, and hope to continue doing so.

Regards,
Al Iverson
My site: http://www.aliverson.com
My site on DNSBLs: http://www.dnsbl.com
More information on spam: http://www.spamresource.com

Chris Lewis

Apr 2, 2007, 3:13:17 PM
According to Al <aliversonch...@gmail.com>:

> On Mar 30, 12:56 pm, "Chris Lewis" <cle...@nortel.com> wrote:

> I certainly wouldn't want to stop you from signing up yourself. It is
> always good to have more folks tracking how senders use email
> addresses, for better or worse.

Actually, we did ;-) But I have other fish to fry than something
like this.



> > I may well misunderstand, but in my defence, I'll point out that
> > you're not being very clear on what you _are_ signing up for. Making
> > _that_ part of the "test environment description" would help considerably.
>
> Well, I certainly strive to provide as much transparency as possible
> regarding what data I'm collecting and summarizing.
>
> I went to sites run by various companies and newsletters. I signed up
> directly for their own lists, and I'm tracking which of the solicited
> mail I'm receiving in response would be impacted by DNSBL listings for
> many DNSBLs. I started by working myself through a list of the top
> retailers I found on some business website (Fortune?), searching for
> their websites, signing up for their mailings. I continued by
> searching for phrases like "newsletter signup" in Google, and if it
> was something I had heard of or seemed legitimate or harmless, I've
> signed up for it. I've further given out addresses to sites I've
> stumbled across in my day-to-day surfing.

Your example listings are good in one area, but I don't think you're
documenting it well enough what your criteria are.

I can understand the reluctance to sign up for geeky and/or extremely
high volume things. On the other hand, people do sign up
for hobby/craft/special interest _non_ commercial lists in large
numbers. By avoiding those, you're leaving out a large part
of the picture of what your site seems to imply you're doing,
so some additional detail on the web site is in order.

> Tech Republic,

[I won't go into the trouble I had with unsubscribing to that... ;-)]

> However: Unlike a DNSBL operator, or you as the gatekeeper at a large
> corporate site, I have what is potentially a different insight into
> false positives. You might get a false positive report once a month,
> and perhaps don't know what previous attempted mail from that IP was
> blocked.

We're quite well aware of all email from any given IP that hits
our system. Our tools are oriented towards checking out each
FP report compared to what we've caught/rejected earlier. If only
to forward multiple mistakenly blocked emails from one FP report.
Or, to say "no, we're not unblocking this".



> It's allowed me to question both certain blacklist opponents (like the
> APEWS-hating dude here who claimed it blocks lots of legitimate mail,
> even though I've only seen one FP so far), and blacklist proponents
> (like those who loudly proclaim that a list like Fiveten works as
> designed--which is a perfectly fine, though incomplete answer). The
> data is what it is. I've shared both data that affirms my suspicions
> and is at odds with my suspicions, and hope to continue doing so.

Your sample set is pretty small, and some derived from an unavoidably
skewed selection. Skewed isn't "bad", as long as it's clearly documented
and understood to not necessarily be reasonable to derive broader
conclusions.

I don't believe APEWS would turn out nearly as well on a larger
and more-real-life sample.

As a more concrete example - remember the study the FTC funded
and publicized at the FTC Spam Panel? It said, amongst other things,
that if you stopped using your email address in "unsafe ways",
the spam volume drops off.

Their sample size was < 100 userids and the duration of the test
was on the order of 60 days.

In contrast, my sample of 140,000 addresses that had been completely
decommissioned went from 110,000 spams/day the day before the addresses
were rendered 100% undeliverable, and had risen to 600K/day 18 months
later when I turned them on again.
--
Chris Lewis,

Age and Treachery will Triumph over Youth and Skill
It's not just anyone who gets a Starship Cruiser class named after them.

--

Matthew Sullivan

Apr 2, 2007, 4:40:02 PM
Chris Lewis wrote:
> According to Al <aliversonch...@gmail.com>:
>> On Mar 30, 12:56 pm, "Chris Lewis" <cle...@nortel.com> wrote:
>
>> It's allowed me to question both certain blacklist opponents (like the
>> APEWS-hating dude here who claimed it blocks lots of legitimate mail,
>> even though I've only seen one FP so far), and blacklist proponents
>> (like those who loudly proclaim that a list like Fiveten works as
>> designed--which is a perfectly fine, though incomplete answer). The
>> data is what it is. I've shared both data that affirms my suspicions
>> and is at odds with my suspicions, and hope to continue doing so.
>
> Your sample set is pretty small, and some derived from an unavoidably
> skewed selection. Skewed isn't "bad", as long as it's clearly documented
> and understood to not necessarily be reasonable to derive broader
> conclusions.

Chris,

Well said, this was my point, which I could not have put so eloquently.

Regards,

Matthew

Larry M. Smith

Apr 2, 2007, 7:44:44 PM
Chris Lewis wrote:
(snip)

> I can understand the reluctance to sign up for geeky and/or extremely
> high volume things. On the other hand, people do sign up
> for hobby/craft/special interest _non_ commercial lists in large
> numbers. By avoiding those, you're leaving out a large part
> of the picture of what your site seems to imply you're doing,
> so some additional detail on the web site is in order.

Political: the WI-Dems can't spell, and I used to have the trap that
proved it, a misspelled domain. Well, my dataset at least pointed out
that there are a *lot* of Democrats in Southern Wisconsin.

I couldn't use it as an automated trap, nor use any aggregate data from
it. But darn, do they pass those lists around. Is it just the
Democrats? Most likely not, but because my trap represented a
misspelled ISP in a "blue" state, this is what I ended up with.

I think Al, better than most, knows what he's seeing in the dataset, and
he knows that *normally* list-serv operators don't force sub people onto
other lists. As I see it, Al is on a slow start, and knowing that he
could never subscribe to all the lists out there; he is looking for
areas that he wants to track. But then again... I'm making assumptions
and could be completely wrong.

It does however appear to me that he is documenting this experiment well
enough to show what types of lists he is subscribing to. Should he wish
to expand this, I'm pretty confident that his documentation will reflect it.


SgtChains

Al

Apr 3, 2007, 9:24:22 AM
On Apr 2, 3:40 pm, Matthew Sullivan <usenet-n...@sorbs.net> wrote:

> Well said, this was my point, which I could not have put so eloquently.

I'm open to suggestions on further information to include or more
disclaimers or clarifiers to add. Don't know that I will agree with
all, but I'll listen.

To wit, you basically indicated to me in email that my data is very US-
centric. Which is true...there are a few Canadian and UK senders, but
not very many, and no APAC. I hope to increase this over time, and am
happy to highlight that the data is very US-centric. (I don't think
that's necessarily bad, either. If an ISP in Minnesota is going to use
SORBS or another DNSBL, their mailstream is not going to be very "Asia-
specific" as yours may be.)

Regards,
Al Iverson
My site: http://www.aliverson.com
My site on DNSBLs: http://www.dnsbl.com
More information on spam: http://www.spamresource.com

--

Al

Apr 3, 2007, 9:23:18 AM
On Apr 2, 2:13 pm, "Chris Lewis" <cle...@nortel.com> wrote:
> > However: Unlike a DNSBL operator, or you as the gatekeeper at a large
> > corporate site, I have what is potentially a different insight into
> > false positives. You might get a false positive report once a month,
> > and perhaps don't know what previous attempted mail from that IP was
> > blocked.
>
> We're quite well aware of all email from any given IP that hits
> our system. Our tools are oriented towards checking out each
> FP report compared to what we've caught/rejected earlier. If only
> to forward multiple mistakenly blocked emails from one FP report.
> Or, to say "no, we're not unblocking this".

Indeed. I didn't say "unaware." I said "different insight."

Checking logs to see what else has been blocked gives you insight into
things like:
- # of previous attempts from IP
- volume of previous attempts from IP
- attempted recipients from previous IP
- volume from envelope sender domain

These are good stats. I add to it:
- From address
- Subject line
- Ability to review content

> > It's allowed me to question both certain blacklist opponents (like the
> > APEWS-hating dude here who claimed it blocks lots of legitimate mail,
> > even though I've only seen one FP so far), and blacklist proponents
> > (like those who loudly proclaim that a list like Fiveten works as
> > designed--which is a perfectly fine, though incomplete answer). The
> > data is what it is. I've shared both data that affirms my suspicions
> > and is at odds with my suspicions, and hope to continue doing so.
>
> Your sample set is pretty small, and some derived from an unavoidably
> skewed selection. Skewed isn't "bad", as long as it's clearly documented
> and understood to not necessarily be reasonable to derive broader
> conclusions.

Just out of curiosity, how much of your data set do you publicly
publish? I know you do lots of far reaching stuff with it; yet I'm not
seeing the data made available for peer review.

One thing that's dawning on me is this. I need to stop being explicit
about what's going into the hamtrap. In a general sense, I've been as
descriptive as I can be. If I am explicit about which IPs and which
senders I am highlighting, I am giving blacklist operators a chance to
skew the results by delisting various senders to improve their stats
as I measure them. If I allow or enable this, it pulls my goals into
question. My goal isn't to get specific senders delisted, nor am I an
anti-blacklist tool. Instead, I want to highlight what types of things
that people might want and see if they are blacklisted. This is
turning out to be a lot like a spamtrap; I need to protect it to
prevent listwashing (heh), to preserve the integrity of what I'm
trying to do here.

> Their sample size was < 100 userids and the duration of the test
> was on the order of 60 days.
>
> In contrast, my sample of 140,000 addresses that had been completely
> decommissioned went from 110,000 spams/day the day before the addresses
> were rendered 100% undeliverable, and had risen to 600K/day 18 months
> later when I turned them on again.

URL to your public data? If you're all about showing me that you do it
better, put it on a website so others can see it. The more data out in
the world, the better. You're also quick to dismiss my data and
reports as anecdotal and incomplete. What I see here is a pull quote
that looks interesting but hardly is a report with a conclusive
summary and supporting data.

Interestingly, my only ham hits from Spamhaus so far are against a
co-reg vendor that works with a popular big city newspaper. I've decided
to test that one a bit deeper with another address and track in a more
detailed fashion where it goes and potentially share that info with
the world. I'm not convinced that the SBL listing is inappropriate,
from looking at it so far.

Similarly, mail that is intersecting with some other DNSBL may also be
suspect to the maintainers and core users of that DNSBL. Which is
perfectly fine, and it should be similarly fine to highlight that
others may desire that mail in spite of the DNSBL listing.

Regards,
Al Iverson
My site: http://www.aliverson.com
My site on DNSBLs: http://www.dnsbl.com
More information on spam: http://www.spamresource.com

--

Chris Lewis

Apr 3, 2007, 9:58:39 AM
According to Larry M. Smith <SgtChai...@FahQ2.com>:

> It does however appear to me that he is documenting this experiment well
> enough to show what types of lists he is subscribing to. Should he wish
> to expand this, I'm pretty confident that his documentation will reflect it.

He's documenting his experiment here after being questioned about
it. The website documentation of the experiment in question appears
to consist only of the sentence:

I am directed solicited mail that I signed up for from over 400
senders, big and small.

On this thread, we find out that the selection criteria are much
narrower than that. It appears to exclude discussion lists, marketing
lists, and very narrow-focus lists, to name a few.

He's explained why he's not including those. But by omitting those
on the basis of difficulty (or skew or other technical reasons) he's
not operating with a general distribution of the sorts of things that
real people do, and of necessity the result isn't generally applicable.

But it certainly isn't pitched that way.

For one to be able to draw conclusions from such a study (eg: "spamhaus
is a very accurate blacklist" or "spamcop .. is nowhere near as
bad..."), one either has to have a general sample, or be very clear up
front on what the selection criteria are, so that readers of his
experiment know what population (if any) it applies to.

Otherwise, it's an experiment with completely unknown applicability
to the real world. I don't know about you, but studies with no
real world implications aren't very interesting.
--
Chris Lewis,

Age and Treachery will Triumph over Youth and Skill
It's not just anyone who gets a Starship Cruiser class named after them.

--

Chris Lewis

Apr 3, 2007, 1:40:16 PM
According to Al <aliversonch...@gmail.com>:

> On Apr 2, 3:40 pm, Matthew Sullivan <usenet-n...@sorbs.net> wrote:

> > Well said, this was my point, which I could not have put so eloquently.

> I'm open to suggestions on further information to include or more
> disclaimers or clarifiers to add. Don't know that I will agree with
> all, but I'll listen.

> To wit, you basically indicated to me in email that my data is very US-
> centric. Which is true...there are a few Candian and UK senders, but
> not very many, and no APAC. I hope to increase this over time, and am
> happy to highlight that the data is very US-centric. (I don't think
> that's necessarily bad, either. If an ISP in Minnesota is going to use
> SORBS or another DNSBL, their mailstream is not going to be very "Asia-
> specific" as yours may be.)

Despite not being American, I'd settle for US-centric. ;-)

The main thing I would suggest is a very clear description of
what qualifies/disqualifies a list for inclusion, and a discussion
of where that would/would not be representative of "real life".

Not so much regional perhaps, but rather of the sorts of lists
subscribed to.

A representative sample of the sorts of lists you subscribe
to would be nice, but that might not be a good idea.
--
Chris Lewis,

Age and Treachery will Triumph over Youth and Skill
It's not just anyone who gets a Starship Cruiser class named after them.

--

Chris Lewis

Apr 3, 2007, 2:09:22 PM
According to Al <aliversonch...@gmail.com>:

> On Apr 2, 2:13 pm, "Chris Lewis" <cle...@nortel.com> wrote:

> > We're quite well aware of all email from any given IP that hits
> > our system. Our tools are oriented towards checking out each
> > FP report compared to what we've caught/rejected earlier. If only
> > to forward multiple mistakenly blocked emails from one FP report.
> > Or, to say "no, we're not unblocking this".

> Indeed. I didn't say "unaware." I said "different insight."

> Checking logs to see what else has been blocked gives you insight into
> things like:
> - # of previous attempts from IP
> - volume of previous attempts from IP
> - attempted recipients from previous IP
> - volume from envelope sender domain
>
> These are good stats. I add to it:
> - From address
> - Subject line
> - Ability to review content

I've got those too. We quarantine everything, and the interface
(provided you're authorized) gives access to anything you want,
including full message bodies.

The bare logs have from, subject and other things.



> > Your sample set is pretty small, and some derived from an unavoidably
> > skewed selection. Skewed isn't "bad", as long as it's clearly documented
> > and understood to not necessarily be reasonable to derive broader
> > conclusions.

> Just out of curiosity, how much of your data set do you publicly
> publish? I know you do lots of far reaching stuff with it; yet I'm not
> seeing the data made available for peer review.

For me to put up a web site such as yours (publically facing
continuous updates) would require almost a lifetime's worth of
poking at media relations and upper management. It's probably
isn't a worthwhile effort ;-) Worse, if I named names, I'd go poof!

It's not our business line and there is no business incentive
to approving it.

Most other Internet organizations with anything approaching sufficient
size don't either. Unless it's their business line, and if it is, their
numbers have a certain amount of suspicion attached...

That said, I have published data in a number of places, leading
to at least one person saying that I was the _only_ non-spam-vendor
willing to discuss real numbers more than single isolated datapoints.

[It's not quite true. There are some ISPs that do. Now.]

I'm not "into" publishing tech papers, I'm into getting things done.

The VB2004 conference proceedings has more formal numbers. The FTC Spam
Panel proceedings did too.



> One thing that's dawning on me is this. I need to stop being explicit
> about what's going into the hamtrap. In a general sense, I've been as
> descriptive as I can be. If I am explicit about which IPs and which
> senders I am highlighting, I am giving blacklist operators a chance to
> skew the results by delisting various senders to improve their stats
> as I measure them. If I allow or enable this, it pulls my goals into
> question. My goal isn't to get specific senders delisted, nor am I an
> anti-blacklist tool. Instead, I want to highlight what types of things
> that people might want and see if they are blacklisted. This is
> turning out to be a lot like a spamtrap; I need to protect it to
> prevent listwashing (heh), to preserve the integrity of what I'm
> trying to do here.

I guess. However, I suspect that the DNSBLs that people usually
use wouldn't pay attention, and those that people usually don't use
could just as easily list new IPs to skew their stats the way opposite
to what you think ;-)

[I'd not put it past FiveTen to list all of the IPs in your sample ;-)
Well, maybe not FiveTen. Spambag maybe.]

> > Their sample size was < 100 userids and the duration of the test
> > was on the order of 60 days.
> >
> > In contrast, my sample of 140,000 addresses that had been completely
> > decommissioned went from 110,000 spams/day the day before the addresses
> > were rendered 100% undeliverable, and had risen to 600K/day 18 months
> > later when I turned them on again.
>
> URL to your public data?

It's on the FTC proceedings video tape and google will probably
find many "publications" thereof ;-)

> If you're all about showing me that you do it
> better, put it on a website so others can see it. The more data out in
> the world, the better. You're also quick to dismiss my data and
> reports as anecdotal and incomplete. What I see here is a pull quote
> that looks interesting but hardly is a report with a conclusive
> summary and supporting data.

Presentations have been made. Yes, my stuff isn't "formal studies"
with double-blind, peer review etc. etc. etc., but I'm not presenting
it as fully public either. You're trying to do the latter; it behooves
you to caveat it as necessary, and I'm one of your peer reviewers ;-)

[I think you've been present when I've presented some of the stuff
I have _with_ all of the supporting data/caveats]

> Interestingly, my only ham hits from Spamhaus so far are against a
> co-reg vendor that works with a popular big city newspaper. I've decided
> to test that one a bit deeper with another address and track in a more
> detailed fashion where it goes and potentially share that info with
> the world. I'm not convinced that the SBL listing is inappropriate,
> from looking at it so far.

If you let me know the IPs you're talking about by email, I could
probably give you a wider perspective on them without revealing
who/what they are to anyone. Which you could take/leave/comment
on as you wish, I won't say anything.



> Similarly, mail that is intersecting with some other DNSBL may also be
> suspect to the maintainers and core users of that DNSBL. Which is
> perfectly fine, and it should be similarly fine to highlight that
> others may desire that mail in spite of the DNSBL listing.

In a public study it makes sense to actually figure out why such
things occur, and at least outline the decision points.
--
Chris Lewis,

Age and Treachery will Triumph over Youth and Skill
It's not just anyone who gets a Starship Cruiser class named after them.

--

Al

Apr 3, 2007, 3:48:30 PM
On Apr 3, 1:40 pm, cle...@nortelnetworks.com (Chris Lewis) wrote:

> A representative sample of the sorts of lists you subscribe
> to would be nice, but that might not be a good idea.

That information is already there. I guess I can't stop you from
continuing to claim otherwise, but I don't know what else to tell you,
other than the truth. The vast majority of info you're taking me to
task over, claiming that it's only being documented here, has
*already* been documented by me on the website.

You quoted and take issue with: "In this "hamtrap" I am directed
solicited mail that I signed up for from over 400 senders, big and
small"

This is from this page:
http://www.dnsbl.com/2007/03/how-well-do-various-blacklists-work.html
This page links to the actual data. It says go check it out for
yourself. That link goes to http://stats.dnsbl.com/

On http://stats.dnsbl.com/
There is a link that says: For information on how this data is
compiled, click here.
The click here links to:
http://stats.dnsbl.com/2007/03/about-this-data-this-data-is-compiled.html
Which is information substantially similar to what I've shared here.
I created that page on March 17th.
Since March 17th I've added only one sentence, a link to a NANAB post
of mine.

Regards,
Al Iverson
My site: http://www.aliverson.com
My site on DNSBLs: http://www.dnsbl.com
More information on spam: http://www.spamresource.com

--

Erik Warmelink
Apr 4, 2007, 9:31:49 AM
In article <1175607982.9...@q75g2000hsh.googlegroups.com>,
"Al" <aliversonch...@gmail.com> writes:

> [...] My goal isn't to get specific senders delisted, nor am I an
> anti-blacklist tool.

If that isn't your goal, why do you measure the listing of opt-out
spammers as a false positive? Opt-out bulk email is unsolicited
bulk email (i.e. spam) for every recipient who did not solicit it.

That kind of spam may not be illegal in the US, but spam it is.
I don't care about "easy" instructions in a foreign language to
unsubscribe, I want to stop the spam.

--
er...@selwerd.nl

E-Mail Sent to this address will be added to the BlackLists
Apr 4, 2007, 2:38:22 PM
Al wrote:
...

> <http://www.dnsbl.com/2007/03/how-well-do-various-blacklists-work.html>
> This page links to the actual data.
> It says go check it out for yourself.
> That link goes to <http://stats.dnsbl.com/>
> On <http://stats.dnsbl.com/>
> There is a link that says:
> For information on how this data is compiled, click here.
> The click here links to:
> <http://stats.dnsbl.com/2007/03/about-this-data-this-data-is-compiled.html>
> Which is information substantially similar to what I've
> shared here.

I looked briefly (perhaps I missed it);
I didn't see anything that said which specific zones were
queried or which return codes were used.

{Some return codes are not about spam, such as FiveTen's
return code 11. (I suppose it is considered potential
spam from IPs of companies that make marketing phone calls.)}
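
[For readers unfamiliar with the mechanics being discussed: a DNSBL
query is just a DNS A lookup on the reversed IP octets under the
list's zone, and the 127.0.0.x answer is the "return code". A minimal
sketch, with the zone name used purely as an illustration:]

```python
import socket

def dnsbl_query_name(ip, zone):
    # A DNSBL is queried by reversing the IPv4 octets and
    # appending the list's zone name,
    # e.g. 127.0.0.2 -> 2.0.0.127.zen.spamhaus.org
    return ".".join(reversed(ip.split("."))) + "." + zone

def dnsbl_lookup(ip, zone):
    """Return the A-record answer (e.g. '127.0.0.2') if the IP is
    listed, or None on NXDOMAIN (not listed)."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None
```

[The answer itself carries meaning: different 127.0.0.x values from
the same zone can indicate different listing reasons, which is why
the specific codes matter and not just hit/no-hit.]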

--
E-Mail Sent to this address <Blac...@Anitech-Systems.com>
will be added to the BlackLists.

Chris Lewis
Apr 11, 2007, 12:41:11 PM
According to Al <aliversonch...@gmail.com>:

> On Apr 3, 1:40 pm, cle...@nortelnetworks.com (Chris Lewis) wrote:
>
> > A representative sample of the sorts of lists you subscribe
> > to would be nice, but that might not be a good idea.

> That information is already there. I guess I can't stop you from
> continuing to claim otherwise, but I don't know what else to tell you,
> other than the truth. The vast majority of info you're taking me to
> task over, claiming that it's only being documented here, has
> *already* been documented by me on the website.

<blush>how did I miss all that...?</blush>

The material you've got there is a coarse example of what lists you'd
use, but it doesn't give counterexamples of what you wouldn't. You
really do need to talk more about what you don't subscribe to,
and have a discussion on what that all means.

Secondly, as mentioned elsewhere, for completeness you should list
exactly which DNSBL queries you're doing and which return codes you
accept. Yes, the vast unwashed won't know what that means. But for
the data to be useful for deriving further conclusions, you have to
be quite specific.
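
[Being that specific could be as simple as publishing the exact
zone/return-code table the comparison accepts. A hypothetical sketch;
the accepted-code sets here are illustrative policy choices, not a
real study's configuration (the FiveTen code-11 exclusion echoes the
non-spam return code mentioned earlier in the thread):]

```python
# Hypothetical documentation of which DNSBL zones are queried and
# which return codes count as a "spam" hit in the comparison.
# The code sets are illustrative, not any list's published policy.
ACCEPTED = {
    "zen.spamhaus.org": {"127.0.0.2", "127.0.0.3", "127.0.0.4"},
    # Exclude 127.0.0.11, which flags something other than spam:
    "blackholes.five-ten-sg.com": {"127.0.0.2"},
}

def counts_as_hit(zone, answer):
    """True only if the zone returned a code this study accepts."""
    return answer in ACCEPTED.get(zone, set())
```

[Publishing a table like this lets others re-derive the stats and see
exactly which listing categories were, and were not, counted.]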
--
Chris Lewis,

Age and Treachery will Triumph over Youth and Skill
It's not just anyone who gets a Starship Cruiser class named after them.

--
