Default/sample "bad_from" file contents (targets RFC 2606 entries, common
syntax errors, and Microsoft TLD abuse):
^@
@\.
@[^(, ]*\.\.
@[[:alnum:]]*@
@(.+\.)?test>?$
@.+\.phx\.gbl>?$
@(.+\.)?[0-9]*>?$
@(.+\.)?localhost>?$
@(.+\.)?localdomain>?$
@(.+\.)?example(\.[a-z]{3,})?>?$
Note: The entry containing [0-9] also matches where "@" is the last
character, i.e. there is no domain part. This might not be obvious at first
glance. The entry containing "example" matches only gTLDs (three or more
letters), because a match with a ccTLD, such as "example.de", is a valid,
registered domain.
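To check what the sample entries actually catch, here is a minimal Python sketch that tests them against a few addresses. It assumes the entries are OR-ed together and matched case-insensitively; that is an assumption for illustration, not a description of how Cleanfeed loads the file. Python's re module has no POSIX classes, so [[:alnum:]] is written as [A-Za-z0-9], and the helper name is_bad_from is hypothetical.

```python
import re

# The sample "bad_from" entries, translated to Python regex syntax.
# [[:alnum:]] (POSIX class, Perl) becomes the ASCII-only [A-Za-z0-9].
BAD_FROM_ENTRIES = [
    r"^@",                                  # 1: empty local part
    r"@\.",                                 # 2: domain starts with a dot
    r"@[^(, ]*\.\.",                        # 3: double dot in domain
    r"@[A-Za-z0-9]*@",                      # 4: two "@" signs
    r"@(.+\.)?test>?$",                     # 5: RFC 2606 .test
    r"@.+\.phx\.gbl>?$",                    # 6: Microsoft's phx.gbl
    r"@(.+\.)?[0-9]*>?$",                   # 7: numeric last label or no domain
    r"@(.+\.)?localhost>?$",                # 8: RFC 2606 .localhost
    r"@(.+\.)?localdomain>?$",              # 9: localdomain
    r"@(.+\.)?example(\.[a-z]{3,})?>?$",    # 10: example, example.<gTLD>
]

# Assumption: all entries combined into one case-insensitive alternation.
BAD_FROM_RE = re.compile(
    "|".join(f"(?:{entry})" for entry in BAD_FROM_ENTRIES), re.IGNORECASE
)

def is_bad_from(header):
    """Return True if any sample entry matches the header value."""
    return BAD_FROM_RE.search(header) is not None

for addr in ("user@example.com", "user@example.de", "user@",
             "user@192.0.2.1", "user@[192.0.2.1]", "a@b@c.com"):
    print(addr, is_bad_from(addr))
```

In this sketch, an address at example.de is accepted, while a bare trailing "@" and an unbracketed IP literal are both rejected by the [0-9] entry; a bracketed literal passes because its closing "]" prevents the match.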
Cleanfeed patch:
--- ~news/bin/filter/cleanfeed.txt 2009-10-13 12:23:08.000000000 +0000
+++ ~news/bin/filter/filter_innd.pl 2009-11-29 23:22:37.000000000 +0000
@@ -5,7 +5,8 @@
#
# Modified by Steve Crook <st...@mixmin.net>
# Redistributed in accordance with the terms of the Artistic license.
-# $Id: cleanfeed 271 2009-10-13 12:23:08Z cleanfeed $
+# $Id: cleanfeed 271 2009-10-13 12:23:08z cleanfeed $
+$cleanfeed_version= '271 (2009-10-13 12:23:08z)';
#
# This software is distributed under the terms of the Artistic License.
# Please see the LICENSE file in the distribution.
@@ -88,9 +89,10 @@
low_xpost_maxgroups => 6, # max xposts in low_xpost_groups
meow_ext_maxgroups => 2, # max xposts from meow_groups to other groups
+ politics_ext_maxgroups => 2, # max xposts from politics_groups to other groups
off_topic_maxgroups => 2, # How many off topic groups allowed in a distro
binaries_in_mod_groups => 0, # allow binaries in moderated groups?
max_encoded_lines => 15, # number of encoded binary lines to allow
max_base64_lines => 200, # Allow x lines of content containing Base64
# in non-binary groups
@@ -173,6 +175,7 @@
meow_groups => '^alt\.fan\.karl-malden\.nose|^alt\.flame|^alt\.troll'.
'|^alt\.alien\.vampire\.flonk\.flonk\.flonk|^alt\.romath'.
'|^alt\.snuh|^alt\.fan\.natasha',
+ politics_groups => '\.politi[ck]',
### Topic groups allow administrators to limit crossposting from defined
### groups to undefined groups. The allowed number of groups is defined
@@ -298,8 +301,8 @@
phl_exempt phl_exclude supersedes_exempt bad_nph_hosts
phn_exempt phr_exempt phn_exclude flood_groups
refuse_messageids net_abuse_groups spam_report_groups
- adult_groups not_adult_groups faq_groups ratio_exclude
- image_allowed image_extensions meow_groups
+ adult_groups not_adult_groups faq_groups politics_groups
+ image_allowed image_extensions meow_groups ratio_exclude
topic1_groups topic2_groups off_topic_maxgroups)) {
if (defined $config_append{$_}) {
$config{$_} .= "|$config_append{$_}";
@@ -625,6 +628,8 @@
and /$config{low_xpost_groups}/o;
$gr{meow}++ if $config{meow_groups}
and /$config{meow_groups}/o;
+ $gr{politics}++ if $config{politics_groups}
+ and /$config{politics_groups}/o;
$gr{topic1}++ if $config{topic1_groups}
and /$config{topic1_groups}/o;
$gr{topic2}++ if $config{topic2_groups}
@@ -783,10 +788,14 @@
return reject('Too many newsgroups (meow)', 'Too many newsgroups')
if $gr{meow}
- #and $gr{meow} != scalar @groups
and $config{meow_ext_maxgroups}
and ($fupcnt - $gr{meow}) > $config{meow_ext_maxgroups};
+ return reject('Too many newsgroups (politics)', 'Too many newsgroups')
+ if $gr{politics}
+ and $config{politics_ext_maxgroups}
+ and ($fupcnt - $gr{politics}) > $config{politics_ext_maxgroups};
+
return reject('Off-topic newsgroups (topic1)', 'Off-topic newsgroups')
if $gr{topic1}
and $gr{topic1} > 1
@@ -889,7 +898,6 @@
return reject('HTML post')
if $lch{'content-type'} =~ m#text/html#;
-
# MIME HTML without Text
return reject('HTML Multipart without Text/Plain.')
if $lch{'content-type'} =~ /multipart/
@@ -1113,16 +1121,15 @@
if $hdr{Subject} =~ m#\[(\d+)/(\d+)\]$# and $1 > $2;
#
- # Reject bad From headers. We also check Sender and Reply-To headers
- # against the same Regular Expression.
+ # Reject bad From/Sender/Reply-To headers.
if ($Bad_From) {
- return reject("Banned Reply-To ($1)", 'Bad Reply-To')
- if $hdr{'Reply-To'} =~ $Bad_From_RE;
- return reject("Banned Sender ($1)", 'Bad Sender')
+ return reject("Bad Reply-To ($hdr{'Reply-To'})",'Bad Reply-To')
+ if $hdr{'Reply-To'} =~ qr/($Bad_From|@(.+\.)?invalid>?$)/i;
+ return reject("Bad Sender ($hdr{Sender})",'Bad Sender')
if $hdr{Sender} =~ $Bad_From_RE;
- return reject("Banned From ($1)", 'Bad From')
+ return reject("Bad From ($hdr{From})", 'Bad From')
if $hdr{From} =~ $Bad_From_RE;
- };
+ }
# Reject bad Subject headers.
return reject("Subject ($1)", 'Bad Subject')
@@ -1586,7 +1593,7 @@
return '' if not $config{do_mid_filter};
my ($id) = @_;
- if ($config{refuse_messageids} and $id =~ /$config{refuse_messageids}/o) {
+ if ($config{refuse_messageids} and $id =~ /$config{refuse_messageids}/io) {
$status{refused}++;
return 'No';
}
@@ -1654,6 +1661,7 @@
. "<title>Cleanfeed Status</title>\n"
. "</head>\n<body>\n\n"
. "<p>\n"
+ . "<b>Cleanfeed Version:</b> " . $cleanfeed_version . "<br>\n"
. "<b>Filter started:</b> " . scalar(localtime $Start_Time) . "<br>\n"
. "<b>Report generated:</b> " . scalar(localtime) . "<br>\n"
. 'Uptime: ' . ($now - $Start_Time) . " seconds\n"
@@ -1750,7 +1758,8 @@
slog('E', "Cannot open $config{statfile}: $!");
return;
}
- print FILE 'Filter started: ' . scalar(localtime $Start_Time) . "\n"
+ print FILE 'Cleanfeed Version: ' . $cleanfeed_version . "\n"
+ . 'Filter started: ' . scalar(localtime $Start_Time) . "\n"
. 'Report generated: ' . scalar(localtime) . "\n"
. 'Uptime: ' . ($now - $Start_Time) . " seconds\n\n"
. "Accepted: $status{accepted}\nRejected: $status{rejected}\n";
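The politics rule added by the patch counts how many of the article's groups match politics_groups and rejects when the remaining groups exceed politics_ext_maxgroups. A minimal Python sketch of that logic follows, using the patch defaults. It assumes Cleanfeed's $fupcnt equals the number of groups in the list, which is a simplification, and the function name is hypothetical.

```python
import re

# Defaults taken from the patch:
#   politics_groups        => '\.politi[ck]'
#   politics_ext_maxgroups => 2
POLITICS_GROUPS = re.compile(r"\.politi[ck]")
POLITICS_EXT_MAXGROUPS = 2

def too_many_politics_xposts(groups):
    """Return True if the article would be rejected as
    'Too many newsgroups (politics)'."""
    # $gr{politics}++ for every group matching politics_groups.
    politics = sum(1 for g in groups if POLITICS_GROUPS.search(g))
    # Assumption: stands in for Cleanfeed's $fupcnt.
    fupcnt = len(groups)
    return (politics > 0
            and POLITICS_EXT_MAXGROUPS > 0
            and (fupcnt - politics) > POLITICS_EXT_MAXGROUPS)

print(too_many_politics_xposts(
    ["alt.politics", "rec.arts.tv", "sci.physics", "comp.lang.perl.misc"]))
```

Applied to the 5-group crossposts quoted later in this thread, the rule lets them through, since no more than two of the groups in each fall outside the politics hierarchy.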
It makes sense to require a working address in the "Reply-To:"
header. With that list in the "bad_from" file, non-working addresses
like any...@example.invalid in the "From:" header will be rejected;
would you explain why this should be done?
Or: any...@127.0.0.1 or localhost; do those addresses create
problems for email servers?
>> Changes:
>> 1) Print cleanfeed release number/date into statistics files.
Done. The version and date are now extracted from svn:keywords and
auto-inserted into $version and $version_date.
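Cleanfeed's own extraction code is not shown here. As a sketch, assuming Subversion's standard expansion of the $Id$ keyword (file, revision, UTC date and time, author), the revision and date that would populate $version and $version_date can be pulled out as below. parse_svn_id is a hypothetical name, and the code is Python rather than Cleanfeed's Perl.

```python
import re

# Subversion expands $Id$ to:
#   $Id: <file> <revision> <YYYY-MM-DD> <HH:MM:SS>Z <author> $
ID_RE = re.compile(
    r"\$Id: (\S+) (\d+) (\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})Z (\S+) \$"
)

def parse_svn_id(keyword):
    """Return (revision, date) from an expanded $Id$ keyword, or None."""
    m = ID_RE.search(keyword)
    if m is None:
        return None  # keyword not expanded, e.g. a bare "$Id$"
    _file, revision, day, clock, _author = m.groups()
    return revision, f"{day} {clock}Z"

print(parse_svn_id("# $Id: cleanfeed 271 2009-10-13 12:23:08Z cleanfeed $"))
```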
>> 2) Filter for politics groups need not consume one of the two local
>> topics.
I'm not sufficiently convinced that this is a good idea. I added the
topic filters to provide operators with a means to moderate their
services if they so choose, not as a global policy.
In its default form, Cleanfeed is a spam filter and whilst it has
functionality to do more, I think that should be left to the operators'
discretion. Much of the functionality I've added recently has been
aimed at improving its configurability at a local level, which is why
many things (like topic filters) are turned off by default.
>> 3) Fix "Reply-To:" header check. ".invalid" TLD not valid for it.
This is reasonable but at the moment Cleanfeed uses the same filter for
From, Reply-To and Sender headers. Whilst f...@bar.invalid might be
abuse of a Reply-To header, it's quite correct in a From header. Is
there justification for implementing this by default when you could just
use a local entry like:
return reject("Invalid Reply-To")
if $hdr{'Reply-To'} =~ /\.invalid$/;
>> 4) Make message-ID checking case insensitive.
Done.
>> Default/sample "bad_from" file contents (targets RFC 2606 entries, common
>> syntax errors, and Microsoft TLD abuse):
>> ^@
>> @\.
>> @[^(, ]*\.\.
>> @[[:alnum:]]*@
>> @(.+\.)?test>?$
>> @.+\.phx\.gbl>?$
>> @(.+\.)?[0-9]*>?$
>> @(.+\.)?localhost>?$
>> @(.+\.)?localdomain>?$
>> @(.+\.)?example(\.[a-z]{3,})?>?$
This is a good example and I'll include it in the examples section. I
don't think it's a good default as RFC abuse of the From header doesn't
always imply spam. I suspect this would reject thousands of posts every
day that are legitimate in all other respects.
I figured you'd convert that to whatever generated your versioning.
Thanks.
> >> 2) Filter for politics groups need not consume one of the two local
> >> topics.
> I'm not sufficiently convinced that this is a good idea. I added the
> topic filters to provide operators with a means to moderate their
> services if they so choose, not as a global policy.
>
> In its default form, Cleanfeed is a spam filter and whilst it has
> functionality to do more, I think that should be left to the operators'
> discretion. Much of the functionality I've added recently has been
> aimed at improving its configurability at a local level which is why
> many things (like topic filters) are turned off by default.
I thought about using one of the two offered "topic[12]" fields for the
politics filter. However, here in the U.S., there's a lot of traffic (and
a lot of misplaced traffic). The politics filter repeatedly appears among
the top 5 reasons for article blocking in my logs. The EMP filter usually
occupies the top spot.
My top-10 filter rejection reasons for November 29, 2009 (midnight to
midnight, PST):
94338 articles accepted. 26912 Rejected (or 22.195%).
     Reason                         Count  (percentage of rejections)
#1   EMP (md5)                      10464  (38.882%)
#2   EMP (phn path)                  7947  (29.530%)
#3   Too many newsgroups (politics)  1635  (6.075%)
#4   Binary Image: misplaced jpg     1165  (4.329%)
#5   Subject (~~~)                    914
#6   EMP (phn nph)                    783
#7   EMP (fsl)                        544
#8   HTML embedded image              354
#9   Cancel for rejected article      344
#10  Body (http://natsjobs.com)       332
...
#15  Too many newsgroups (meow)       142  (0.528%)
Being the #3 rejection reason is a good reason for it to have its own
sub-filter: about 2 out of every 33 rejections occur for this reason, and
about 1 in 74 (or 6 in 445) articles are matched by this cross-posting filter.
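The rates quoted above can be reproduced from the day's totals. A short Python check, taking total traffic as accepted plus rejected articles (an assumption, since the log summary does not state it explicitly):

```python
# Daily totals quoted above (November 29, 2009).
accepted = 94338
rejected = 26912
politics = 1635   # "Too many newsgroups (politics)"
emp_md5 = 10464   # "EMP (md5)"

# Assumption: every article seen was either accepted or rejected.
total = accepted + rejected

def pct(part, whole):
    """Percentage rounded to three decimals, as in the log summary."""
    return round(100 * part / whole, 3)

rejection_rate = pct(rejected, total)      # share of all articles rejected
politics_share = pct(politics, rejected)   # share of rejections
emp_md5_share = pct(emp_md5, rejected)
one_in = total / politics                  # "1 in N" articles hit the politics rule

print(total, rejection_rate, politics_share, emp_md5_share, round(one_in, 1))
```

121250 / 1635 is roughly 74.2, which is where "about 1 in 74" comes from; 2/33 (about 6.06%) approximates the 6.075% share of rejections.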
If someone doesn't want this locally, they may turn up the
non-political-group crosspost count (defaults to 2) to a higher value.
There also seems to be a spammer who is posting articles of 185,000 to
200,000 bytes to random political groups about every half hour (crossposted
to 5 groups at a time), which isn't being caught by this filter. However,
he also posts to other groups. Examples:
<pnr7h59b597vdt57k...@4ax.com> 195950
soc.culture.israel,alt.politics,us.politics,alt.politics.liberal,alt.politics.liberalism
<4gs7h5p7eish91ucg...@4ax.com> 196309
alt.sci.physics,sci.logic,rec.org.mensa,alt.fan.tolkien,alt.fan.letterman
<rms7h5h6aivst3igo...@4ax.com> 196339
alt.arabic.politics,talk.politics.mideast,soc.culture.arabic,alt.politics.british,nz.general
Maybe in Europe, you're not seeing these abusive articles. However, they
are happening in North America.
> >> 3) Fix "Reply-To:" header check. ".invalid" TLD not valid for it.
> This is reasonable but at the moment Cleanfeed uses the same filter for
> From, Reply-To and Sender headers. Whilst f...@bar.invalid might be
> abuse of a Reply-To header, it's quite correct in a From header. Is
> there justification for implementing this by default when you could just
> use a local entry like:
> return reject("Invalid Reply-To")
> if $hdr{'Reply-To'} =~ /\.invalid$/;
Did you look at how I implemented it? It starts with the same string as
for From/Sender, appending ".invalid" as a TLD.
- if $hdr{'Reply-To'} =~ $Bad_From_RE;
+ if $hdr{'Reply-To'} =~ qr/($Bad_From|@(.+\.)?invalid>?$)/i;
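The patch reuses the From/Sender pattern for Reply-To and appends a .invalid alternative. Below is a Python sketch of the resulting check order (Reply-To, then Sender, then From), assuming a cut-down stand-in for $Bad_From with only two of the sample entries; check_headers is a hypothetical name.

```python
import re

# Hypothetical stand-in for $Bad_From: two of the sample entries only.
BAD_FROM = r"@(.+\.)?localhost>?$|@(.+\.)?example(\.[a-z]{3,})?>?$"

# From and Sender use the base pattern; Reply-To adds the .invalid TLD,
# mirroring qr/($Bad_From|@(.+\.)?invalid>?$)/i from the patch.
BAD_FROM_RE = re.compile(BAD_FROM, re.IGNORECASE)
BAD_REPLY_TO_RE = re.compile(rf"({BAD_FROM}|@(.+\.)?invalid>?$)", re.IGNORECASE)

def check_headers(hdr):
    """Return a rejection reason, or None to accept (sketch only)."""
    reply_to = hdr.get("Reply-To", "")
    if reply_to and BAD_REPLY_TO_RE.search(reply_to):
        return f"Bad Reply-To ({reply_to})"
    for name in ("Sender", "From"):
        value = hdr.get(name, "")
        if value and BAD_FROM_RE.search(value):
            return f"Bad {name} ({value})"
    return None

print(check_headers({"From": "Joe <joe@news.invalid>"}))
print(check_headers({"From": "joe@mixmin.net", "Reply-To": "joe@nospam.invalid"}))
```

A From of joe@news.invalid is accepted, while the same TLD in Reply-To is rejected, which is exactly the asymmetry being debated in this thread.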
The ONLY reason to have a Reply-To header is to provide a VALID mailbox
when the From header is invalid. Providing a bad mailbox that deliberately
differs from the From header is what some newsgroup spammers are doing, and
makes no sense. If one doesn't want replies at all and the From header is
already pointing at a bogus mailbox, the Reply-To header should be OMITTED
(as it's optional to begin with). In other words, when "Reply-To:" is
used, it should be with a valid, reachable mailbox.
> >> 4) Make message-ID checking case insensitive.
> Done.
>
> >> Default/sample "bad_from" file contents (targets RFC 2606 entries, common
> >> syntax errors, and Microsoft TLD abuse):
> >> ^@
> >> @\.
> >> @[^(, ]*\.\.
> >> @[[:alnum:]]*@
> >> @(.+\.)?test>?$
> >> @.+\.phx\.gbl>?$
> >> @(.+\.)?[0-9]*>?$
> >> @(.+\.)?localhost>?$
> >> @(.+\.)?localdomain>?$
> >> @(.+\.)?example(\.[a-z]{3,})?>?$
> This is a good example and I'll include it in the examples section. I
> don't think it's a good default as RFC abuse of the From header doesn't
> always imply spam. I suspect this would reject thousands of posts every
> day that are legitimate in all other respects.
Including it as an example is acceptable. My personal version of the file
has about 15 more entries targeting specific spammers and trolls.
I find that these rules reject between 800 and 1200 articles per
day. I've examined the articles (when they show up at google-groups) and
most were either spam or trolling. Rules 1-4 and 7 target syntax errors,
rules 5 and 8-10 target RFC 2606 abuses, and rule 6 targets Microsoft's
abuse of a nonexistent TLD. Rule 7 also catches NON-BRACKETED IP address
literals, which are not valid as domain names. Only bracketed ("[...]")
address literals are valid. Rule 7 will also catch the "long integer"
format of an IP address ("@#0123..."), which was deprecated a while ago but
is still supported by some mailers.
Rule 4, "the @@" rule, does occasionally have a false positive from some
people who are trying to be "cute elites" that substitute @ for A in their
handles. However, I find that most hits come from including two "@"s
in a mailbox, thus breaking the syntax rules.
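Rule 7 relies on the last label of the domain being numeric. For comparison, here is a Python sketch that classifies the domain part explicitly with the standard ipaddress module; this is an alternative technique, not what the bad_from entry does, and classify_domain is a hypothetical name. It accepts both bracketed IPv4 and "[IPv6:...]" literals, and reports a bare all-digit domain separately because ipaddress does not parse the integer form from a string.

```python
import ipaddress

def classify_domain(domain):
    """Classify the domain part of an address (the text after the last '@')."""
    if domain.startswith("[") and domain.endswith("]"):
        inner = domain[1:-1]
        if inner.lower().startswith("ipv6:"):
            inner = inner[5:]  # RFC 5321 style "[IPv6:...]" literal
        try:
            ipaddress.ip_address(inner)
            return "domain literal"       # valid, bracketed form
        except ValueError:
            return "malformed literal"
    try:
        ipaddress.ip_address(domain)
        return "unbracketed IP"           # syntax error in a mailbox
    except ValueError:
        pass
    if domain.isdigit():
        return "integer address"          # e.g. 3221225985 for 192.0.2.1
    return "hostname"

for d in ("[192.0.2.1]", "192.0.2.1", "3221225985", "mixmin.net"):
    print(d, classify_domain(d))
```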
That may be the ONLY reason you can remember, but I did inform you a
couple of years ago why I use the Reply-To data shown in my header.
As a reminder, it is an "alarm" that a newsgroup user will get should
they try to e-mail me. I don't want e-mail replies to newsgroup
articles. Any reply should be posted into the public newsgroup.
If I did not use munged data in the field, the message would be
accepted by the sender's MTA and would use unnecessary resources across the
Internet, including the returned bounce.
If the sender doesn't receive a bounce for some reason, then they do not
know that their reply was never received, and may assume I don't care to
respond.
Additionally, the munged Reply-To field fails at the MUA and uses no
resources beyond their computer.
--
John
> The ONLY reason to have a reply-to header is when the from header is
> invalid to provide a VALID mailbox. Providing a specifically different bad
> mailbox than the From header is what some newsgroup spammers are doing, and
> makes no sense. If one doesn't want replies at all and the From header is
> already pointing at a bogus mailbox, the Reply-to header should be OMITTED
> (as it's optional to begin with). In other words, when "Reply-To:" is
> used, it should be with a valid, reachable mailbox.
Please don't make up 'rules'! There is no official - i.e. RFC et al -
support for your *opinion*. Using your opinion in your local policy is
of course fine, but it has no place as a general cleanfeed rule.
[...]
I'm not making up a rule; just simply applying COMMON SENSE.
The defined purpose of the "Reply-To:" header is to redirect replies to a
mailbox other than the one specified in the "From:" header. It is clear
from the RFCs and standards documents that:
1) The "Reply-To:" header should specify a valid mailbox, and
2) The "Reply-To:" header should NOT specify the same mailbox as the
"From:" header.
Therefore, there is NO POINT to having a "Reply-To:" header that specifies
an invalid mailbox, especially (but not limited to the case) where the
"From:" mailbox itself specifies an invalid mailbox (whether different or
not). Furthermore, I have found that those Usenet articles which do have
an invalid mailbox address in the "Reply-To:" header data are often from
trolls and spammers.
The conclusion I have on this is a valid interpretation of the rules which
are actually stated or proposed. Tell me why you feel the conclusion is
wrong.
It's indeed common sense and I happen to agree with that common sense,
but a program like Cleanfeed should not have *default* rules which
happen to be common sense to you and me, but which rejects articles
which are not *invalid*, i.e. they don't violate any de jure standards.
That's all I'm saying. Make it a local policy - your server, your
rules - but keep it out of Cleanfeed's defaults. And everybody lived
happily ever after.
[...]
OK. However, can you give an example where using the ".invalid" TLD in a
mailbox specified on a "Reply-To:" header is proper, beyond syntactic
correctness?
I say that in addition to common sense, we do have RFC authority for
disallowing this. It's not stated in a single RFC by itself but by a
combination of them, including but not limited to 2606, 3977, and 5322.
> OK. However, can you give an example where using the ".invalid" TLD in a
> mailbox specified on a "Reply-To:" header is proper, beyond syntactic
> correctness.
I agree, it's never correct to do it. You could also say it's not
correct to have the same address in the From and Reply-To. There are
plenty of similar annoyances that have become tolerated (if not
accepted) on Usenet since the Eternal September. Whilst they might make
me grind my teeth, I cannot in good conscience filter them out.
If individual service providers choose to enforce rules of netiquette
then they have my blessing but I'd rather see them applied to local
posters rather than to propagated messages. I added an additional
function to cleanfeed.local to identify local messages, see
http://www.mixmin.net/cleanfeed/files.html#local_flag_localfeed or of
course filter_nnrpd.pl could be used instead.
For your final decision - Today's ".invalid" TLD Reply-To's:
(Duplicates removed - from my news logs)
Reply-To ("Dirk Goldgar" <d...@NOdataSPAMgnostics.com.invalid>)
Reply-To ("Escape_the_Cult_Now" <kill...@invalid.invalid>)
Reply-To ("Jefferson Holston" <jefferso...@invalid.gmail.invalid>)
Reply-To ("Max max" <1o1kve402t...@sneakemail.com.invalid>)
Reply-To ("Newsgroup only please, address is no longer replyable."
<b...@example.invalid>)
Reply-To ("Nunya Bidnits" <inv...@invalid.invalid>)
Reply-To ("jopa" <ad...@jp-web.invalid>)
Reply-To (Roedy Green <see_w...@mindprod.com.invalid>)
Reply-To (a.non...@example.invalid)
Reply-To (davidke...@gmail.invalid)
Reply-To (djee...@hotmail.invalid)
Reply-To (eucl...@Mlive.invalid)
Reply-To (ev...@theobvious.espphotography.com.invalid)
Reply-To (fle...@domaine.tld.invalid)
Reply-To (james_t.kirk@invalid)
Reply-To (look...@nospam.invalid)
Reply-To (n.darold....@virgilio.it.invalid)
Reply-To (no....@spam.invalid)
Reply-To (patrickr.dubois.don't.s...@free.fr.invalid)
Reply-To (pelti...@Mgmail.com.invalid)
Reply-To (reply.t...@your.provider.invalid)
Reply-To (to.reply.p...@end.of.message.com.invalid)
Reply-To (trebli...@icioula.com.invalid)
Reply-To (via-amz...@invalid.invalid)
Reply-To (wa8...@arrl.invalid)
11 out of 23 of them cannot have the "real" domain-part determined. The
others can, sometimes with a guess for completion.
> I'm not making up a rule; just simply applying COMMON SENSE.
>
> The defined purpose of the "Reply-To:" header is to redirect replies to a
> mailbox other than the one specified in the "From:" header. It is clear
> from the RFCs and standards document that
Incidentally, isn't it also clear from the RFCs that:
RFC 5536:
User agents MUST meet the definition of MIME conformance in [RFC2049]
and MUST also support [RFC2231].
RFC 2049:
A mail user agent that is MIME-conformant MUST:
(1) Always generate a "MIME-Version: 1.0" header field in
any message it creates.
I do not see a Mime-Version: header in your articles.
Nor do I see a correct Content-Type: header with a charset. For instance,
when you answer me, you use « É » in your articles, without specifying an
encoding (and it is neither US-ASCII nor UTF-8).
But for all that, does it mean that servers should filter all your articles?
:-)
--
Julien ÉLIE
« To defend a cause, a lawyer puts on his robe. A woman... takes hers off. »
The only safe conclusion you may make is that my client agent is NOT
MIME-conformant, and therefore, the requirements do not apply! ;-)
> "Julien ÉLIE" <iul...@nom-de-mon-site.com.invalid> wrote in message
> news:hfdge3$t3h$1...@news.trigofacile.com...
[----]
>> Incidentally, isn't it also clear from the RFCs that:
>>
>> RFC 5536:
>> User agents MUST meet the definition of MIME conformance in [RFC2049]
>> and MUST also support [RFC2231].
>>
>> RFC 2049:
>> A mail user agent that is MIME-conformant MUST:
>>
>> (1) Always generate a "MIME-Version: 1.0" header field in
>> any message it creates.
>>
>>
>> I do not see a Mime-Version: header in your articles.
>> Nor do I see a correct Content-Type: header with a charset. For instance,
>> when you answer me, you use « É » in your articles, without specifying an
>> encoding (and it is neither US-ASCII nor UTF-8).
>>
>> But for all that, does it mean that servers should filter all your articles?
>
> The only safe conclusion you may make is that my client agent is NOT
> MIME-conformant, and therefore, the requirements do not apply! ;-)
Have you missed the earlier RFC 5536 reference? If your client is not
MIME-conformant, it is breaking RFC 5536 too.
--
Szymon Sokół (SS316-RIPE) -- Network Manager B
Computer Center, AGH - University of Science and Technology, Cracow, Poland O
http://home.agh.edu.pl/szymon/ PGP key id: RSA: 0x2ABE016B, DSS: 0xF9289982 F
Free speech includes the right not to listen, if not interested -- Heinlein H
Aside from your side-stepping the *real* issue (yes, I saw the smiley,
and no, it doesn't change your too outspoken opinions), your client *is*
MIME-conformant; it's just *configured* incorrectly [1].
Anyway, did you get Julien's (justified) point/criticism?
[1] -> Tools -> Options... -> Send tab -> News Sending Format -> Plain
Text Settings... ->
You have probably "Message format" set to "Uuencode" (which is some
stupid OE default). If so, set it to "MIME", set "Encode text using:" to
"None" and do *not* set (i.e. no tic-mark) "Allow 8-bit characters in
headers".
> Note: The entry containing [0-9] also matches where "@" is the last
> character - i.e. no domain-part. Such might not be obvious at first
> glance. The entry containing "example" matches gTLDs only as "example.de",
> a match with a ccTLD, is a valid, registered domain.
In case it interests you (for your filters), two RFCs have just been
published about reserved IPs (not domain names):
http://www.rfc-editor.org/rfc/rfc5735.txt
http://www.rfc-editor.org/rfc/rfc5737.txt
Address Block Present Use Reference
------------------------------------------------------------------
0.0.0.0/8 "This" Network RFC 1122, Section 3.2.1.3
10.0.0.0/8 Private-Use Networks RFC 1918
127.0.0.0/8 Loopback RFC 1122, Section 3.2.1.3
169.254.0.0/16 Link Local RFC 3927
172.16.0.0/12 Private-Use Networks RFC 1918
192.0.0.0/24 IETF Protocol Assignments RFC 5736
192.0.2.0/24 TEST-NET-1 RFC 5737
192.88.99.0/24 6to4 Relay Anycast RFC 3068
192.168.0.0/16 Private-Use Networks RFC 1918
198.18.0.0/15 Network Interconnect
Device Benchmark Testing RFC 2544
198.51.100.0/24 TEST-NET-2 RFC 5737
203.0.113.0/24 TEST-NET-3 RFC 5737
224.0.0.0/4 Multicast RFC 3171
240.0.0.0/4 Reserved for Future Use RFC 1112, Section 4
255.255.255.255/32 Limited Broadcast RFC 919, Section 7
RFC 922, Section 7
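To test an address (for example, the contents of a bracketed domain literal) against the blocks in this table, here is a minimal Python sketch using the standard ipaddress module. The block list is copied from the table above, and is_special_use is a hypothetical helper.

```python
import ipaddress

# Special-use IPv4 blocks from the RFC 5735 / RFC 5737 table above.
SPECIAL_USE = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24", "192.88.99.0/24",
    "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24",
    "224.0.0.0/4", "240.0.0.0/4", "255.255.255.255/32",
)]

def is_special_use(addr):
    """Return True if the IPv4 address falls in any listed block."""
    ip = ipaddress.IPv4Address(addr)
    return any(ip in net for net in SPECIAL_USE)

for a in ("198.51.100.7", "127.0.0.1", "8.8.8.8"):
    print(a, is_special_use(a))
```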
--
Julien ÉLIE
« Love conquers all; let us too yield to Love. » (Virgil)
I caught those, and sent a mail yesterday to the guys at cymru.com about it
(so they could add the two new reserved sections to their "bogons" list).
I do use that for my MAIL server.
The change would not affect my NNTP filters, as I look for UNBRACKETED IP
addresses only (which are all syntax errors: domain literals require
bracketed IP addresses). I'm not currently looking at bracketed ones because
I noted that when they do occur, they always occur correctly, i.e. with a
routable address.