
Re: CLP stats: last 500 posts

Message has been deleted

Jon Ribbens

Dec 9, 2016, 4:36:09 PM
On 2016-12-09, DFS <nos...@dfs.com> wrote:
> import sys as y,nntplib as t,datetime as d
> s='<news server>'
> g=y.argv[1]
> n=t.NNTP(s,119,'<usr>','<pw>')
> r,a,b,e,gn=n.group(g)
> def printStat(st,hd,rg):
>     r,d=n.xhdr(st,'%s-%s'%rg)
>     p=[]
>     for i in range(len(d)):
>         v=d[i][1]
>         if st=='Subject':v=v[4:] if v[:3]=='Re:' else v
>         p.append(v)
>     x=[(i,p.count(i)) for i in set(p)]
>     x.sort(key=lambda s:(-s[1],s[0].lower()))
>     print('Posts %s %s'%(len(set(p)),hd))
>     for v in x: print(' %s %s'%(v[1],v[0]))
>     print
> print 'As of '+d.datetime.now().strftime("%I:%M%p %B %d, %Y") + '\n'
> m=(int(e)-int(y.argv[3])+1,int(e))
> printStat("From","Posters",m)
> printStat("Subject","Subjects",m)
> printStat("User-Agent","User-Agents",m)
> n.quit()

Was there ever an "International Obfuscated Python Code Contest"? ;-)

Chris Angelico

Dec 9, 2016, 4:49:27 PM
On Sat, Dec 10, 2016 at 8:34 AM, Jon Ribbens <jon+u...@unequivocal.eu> wrote:
>
> Was there ever an "International Obfuscated Python Code Contest"? ;-)

I don't know, but if so, here's my entry:

print(*([0,"Fizz","Buzz","Fizzbuzz"][[3,0,0,1,0,2,1,0,0,1,2,0,1,0,0][i%15]]or
i for i in range(1,51)))
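The one-liner indexes a 15-entry table with `i % 15` to pick 0, "Fizz", "Buzz" or "Fizzbuzz"; when the pick is the falsy 0, `or i` substitutes the number itself. A de-obfuscated Python 3 sketch of the same output, checked against the original trick (keeping Chris's "Fizzbuzz" spelling):

```python
def fizzbuzz(limit=50):
    """Return the FizzBuzz sequence for 1..limit, computed the long way."""
    out = []
    for i in range(1, limit + 1):
        if i % 15 == 0:
            out.append("Fizzbuzz")  # same spelling as the one-liner
        elif i % 3 == 0:
            out.append("Fizz")
        elif i % 5 == 0:
            out.append("Buzz")
        else:
            out.append(i)
    return out


# The one-liner's lookup-table trick, collected into a list instead of printed:
# TABLE maps i % 15 to an index into [0, "Fizz", "Buzz", "Fizzbuzz"], and
# index 0 yields the falsy 0, so `or i` falls back to the number.
TABLE = [3, 0, 0, 1, 0, 2, 1, 0, 0, 1, 2, 0, 1, 0, 0]
one_liner = [[0, "Fizz", "Buzz", "Fizzbuzz"][TABLE[i % 15]] or i for i in range(1, 51)]

assert fizzbuzz() == one_liner
print(*fizzbuzz(15))  # 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 Fizzbuzz
```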

ChrisA

bream...@gmail.com

Dec 9, 2016, 6:53:51 PM
On Friday, December 9, 2016 at 9:07:32 PM UTC, DFS wrote:

Usual drivel from someone who went into my email filters years ago snipped.

Quite how BartC and his "I don't do error handling" or "I don't know what a shell is" has survived on this list I'll never know, but what the heck.

Teachers need not reply.

Kindest regards.

Mark Lawrence.

Steve D'Aprano

Dec 9, 2016, 8:41:13 PM
On Sat, 10 Dec 2016 08:07 am, DFS wrote:

>
> As of 04:04PM December 09, 2016
>
> Posts 85 Posters
[...]


Interesting stats, but couldn't you have post-processed the results to avoid
including the defamatory spam posts?

Your post is likely to be removed from the official web archive as it
contains defamatory material.



--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

DFS

Dec 9, 2016, 11:16:18 PM
On 12/09/2016 08:39 PM, Steve D'Aprano wrote:
> On Sat, 10 Dec 2016 08:07 am, DFS wrote:
>
>>
>> As of 04:04PM December 09, 2016
>>
>> Posts 85 Posters
> [...]
>
>
> Interesting stats, but couldn't you have post-processed the results
> to avoid including the defamatory spam posts?


Normally I don't censor, at all. But the spams are apparently way
off-topic, so I'll filter out Subjects containing certain keywords.

The spammer will still be counted, but the stats won't show all those
stupid Subjects.
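DFS doesn't say which keywords he filters on. As a sketch only, with a hypothetical keyword list, the Subject tally could skip matching subjects before counting, while the From: tally (a separate pass) still counts the spammer:

```python
from collections import Counter

# Hypothetical keyword list, matched case-insensitively; the thread does not
# say which keywords DFS actually uses.
SPAM_KEYWORDS = ("meglio",)


def is_spam_subject(subject, keywords=SPAM_KEYWORDS):
    """True if any keyword appears anywhere in the subject, ignoring case."""
    lowered = subject.lower()
    return any(word in lowered for word in keywords)


def subject_counts(subjects, keywords=SPAM_KEYWORDS):
    """Count subjects, leaving out those that match a spam keyword."""
    return Counter(s for s in subjects if not is_spam_subject(s, keywords))


subjects = ["CLP stats: last 500 posts", "... MEGLIO ...", "CLP stats: last 500 posts"]
print(subject_counts(subjects))  # Counter({'CLP stats: last 500 posts': 2})
```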



> Your post is likely to be removed from the official web archive as
> it contains defamatory material.

Google seems to archive most Usenet posts, but there is no 'official web
archive'. Nor will those posts be auto-removed from GoogleGroups for
their content.

Why does this wackjob post all that Italian-language spam to clp anyway?

Terry Reedy

Dec 10, 2016, 12:37:09 AM
On 12/9/2016 8:39 PM, Steve D'Aprano wrote:
> On Sat, 10 Dec 2016 08:07 am, DFS wrote:
>
>>
>> As of 04:04PM December 09, 2016
>>
>> Posts 85 Posters
> [...]
>
>
> Interesting stats, but couldn't you have post-processed the results to avoid
> including the defamatory spam posts?
>
> Your post is likely to be removed from the official web archive as it
> contains defamatory material.

Reading the news.gmane.org mirror, I never received it.

--
Terry Jan Reedy

Steve D'Aprano

Dec 10, 2016, 3:13:36 AM
On Sat, 10 Dec 2016 03:15 pm, DFS wrote:

> On 12/09/2016 08:39 PM, Steve D'Aprano wrote:
>> On Sat, 10 Dec 2016 08:07 am, DFS wrote:
>>
>>>
>>> As of 04:04PM December 09, 2016
>>>
>>> Posts 85 Posters
>> [...]
>>
>>
>> Interesting stats, but couldn't you have post-processed the results
>> to avoid including the defamatory spam posts?
>
>
> Normally I don't censor, at all. But the spams are apparently way
> off-topic, so I'll filter out Subjects containing certain keywords.

It's not just the Subject, but also the fake Sender. There are at least five
distinct senders of (apparently) defamatory messages in Italian.
They're all pretty obvious spam, in all caps, with various email addresses.

"... MEGLIO ..." <uccipaducci at gmx dot com>



> The spammer will still be counted, but the stats won't show all those
> stupid Subjects.

I don't mind if the spammer is counted. They probably should be collated
together, and count as a single sender using multiple addresses. But the
false name should be expunged or elided.
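One way to do what Steve describes, assuming you keep a hand-made alias table (the addresses and label below are hypothetical placeholders, not the spammer's real ones): map every known address to one neutral label before counting, so the spammer is tallied once and the false display name never appears.

```python
from collections import Counter
from email.utils import parseaddr

# Hypothetical alias table: every address believed to belong to the spammer
# maps to one neutral label, which also elides the false display name.
SPAMMER_ADDRESSES = {"spam1@example.com", "spam2@example.net"}
SPAMMER_LABEL = "[Italian-language spammer]"


def canonical_sender(from_header):
    """Return the neutral label for known spam addresses, else the header as-is."""
    _name, address = parseaddr(from_header)
    if address.lower() in SPAMMER_ADDRESSES:
        return SPAMMER_LABEL
    return from_header


def poster_counts(from_headers):
    """Tally posts per sender after collating the spammer's addresses."""
    return Counter(canonical_sender(h) for h in from_headers)


headers = [
    '"FAKE NAME ONE" <spam1@example.com>',
    '"FAKE NAME TWO" <SPAM2@example.net>',
    "Jon Ribbens <jon@example.org>",
]
print(poster_counts(headers))
```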


>> Your post is likely to be removed from the official web archive as
>> it contains defamatory material.
>
> Google seems to archive most Usenet posts, but there is no 'official web
> archive'. Nor will those posts be auto-removed from GoogleGroups for
> their content.

comp.lang.python is a mirror of the python-list at python dot org mailing
list, which has an official web archive:

https://mail.python.org/pipermail/python-list/

There are many unofficial ones as well.

There are a few other people who are banned from the mailing list but still
post to the newsgroup.


> Why does this wackjob post all that Italian-language spam to clp anyway?

Why do wackjobs do anything? He has a bee in his bonnet about some other
fellow, I don't even know if it's a politician or just some guy he knows,
and (apparently) spams dozens of newsgroups with defamatory posts accusing
him of being a paedophile, a criminal, and more.

Terry Reedy

Dec 10, 2016, 5:29:14 AM
On 12/10/2016 3:13 AM, Steve D'Aprano wrote:
> On Sat, 10 Dec 2016 03:15 pm, DFS wrote:

>> Normally I don't censor, at all. But the spams are apparently way
>> off-topic, so I'll filter out Subjects containing certain keywords.

python-list is a spam-moderated list. 95+% of spam is filtered out.

> It's not just the Subject, but also the fake Sender. There are at least five
> distinct senders of (apparently) defamatory messages in Italian.

This person actively evades our filters.

> They're all pretty obvious spam, in all caps, with various email addresses.
>
> "... MEGLIO ..." <uccipaducci at gmx dot com>
>
>
>
>> The spammer will still be counted,

Why reward someone who actively evades defenses? If you want to count
spam, it is mostly missing, at least as far as python-list is concerned.

> comp.lang.python is a mirror of the python-list at python dot org mailing
> list, which has an official web archive:
>
> https://mail.python.org/pipermail/python-list/

These slanderous posts, in particular, are hand-removed from the archive
when they get past the automatic filters. They are no more part of
python-list than other spam.

--
Terry Jan Reedy


Steve D'Aprano

Dec 10, 2016, 9:43:35 AM
On Sat, 10 Dec 2016 09:28 pm, Terry Reedy wrote:

>>> The spammer will still be counted,
>
> Why reward someone who actively evades defenses? If you want to count
> spam, it is mostly missing, at least as far as python-list is concerned.

It's not a reward. Spammers are not like trolls, they don't hang around to
see the result of their posts. There's no evidence at all that this Italian
spammer is looking for replies or responses to his(?) posts. He apparently
just fires them out.

I think that it is relevant that comp.lang.python receives X spam messages
from a certain person. It gives a picture of the health of the newsgroup:
how much of it is spam? Hopefully only a small amount.
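The "health" figure Steve has in mind is simply the spam share of the sampled posts. A minimal sketch, assuming per-sender counts have already been collated and that the set of spam senders is decided by hand (the numbers are made up for illustration):

```python
def spam_share(counts, spam_senders):
    """Fraction of posts in `counts` (sender -> posts) sent by spam senders."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    spam = sum(n for sender, n in counts.items() if sender in spam_senders)
    return spam / total


counts = {"spammer": 25, "regular A": 300, "regular B": 175}  # hypothetical numbers
print(f"{spam_share(counts, {'spammer'}):.1%}")  # 5.0%
```

Terry's objection still applies: this only measures what got past the list's filters, not the raw stream of submissions.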


> These slanderous posts, in particular, are hand-removed from the archive
> when they get past the automatic filters. They are no more part of
> python-list than other spam.

Indeed. But although c.l.p is a mirror of the mailing list, it is not a
*perfect* mirror. The two do diverge: some things go to the mailing list
but apparently never make it to the newsgroup, and some things get to the
newsgroup but don't make it to the mailing list.

The stats generated by DFS are relevant to that.

Skip Montanaro

Dec 10, 2016, 11:15:37 AM
On Sat, Dec 10, 2016 at 4:28 AM, Terry Reedy <tjr...@udel.edu> wrote:
>> comp.lang.python is a mirror of the python-list at python dot org mailing
>> list, which has an official web archive:
>>
>> https://mail.python.org/pipermail/python-list/
>
>
> These slanderous posts, in particular, are hand-removed from the archive
> when they get past the automatic filters. They are no more part of
> python-list than other spam.

Are they still getting past SpamBayes? A couple months ago, I trained
the instance on m.p.o on a few of those spams. I haven't seen any
others since then, and no messages on the postmaster list discussing
them. I just skimmed the archives for November and December but saw no
examples. Generally, when Ralf or Mark delete spams, they manually
rewrite the Subject: and From: headers and zero out the message body.
I saw nothing like those placeholders either.
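For illustration only, the manual procedure Skip describes (rewrite Subject: and From:, empty the body) looks like this with the standard `email` package. The placeholder strings are hypothetical, not the ones actually used on mail.python.org, and the sketch assumes a simple non-multipart message:

```python
from email import message_from_string

# Hypothetical placeholder values; the real ones may differ.
PLACEHOLDER_SUBJECT = "[deleted spam]"
PLACEHOLDER_FROM = "deleted@example.invalid"


def scrub(raw_message):
    """Return the message with Subject/From replaced and the body emptied.

    Other headers (e.g. Message-ID) are kept, so threading still works.
    """
    msg = message_from_string(raw_message)
    for header, value in (("Subject", PLACEHOLDER_SUBJECT), ("From", PLACEHOLDER_FROM)):
        if header in msg:
            msg.replace_header(header, value)
        else:
            msg[header] = value
    msg.set_payload("")  # zero out the body
    return msg


raw = "From: FAKE <x@example.com>\nSubject: SPAM\nMessage-ID: <1@example.com>\n\nbody text\n"
clean = scrub(raw)
print(clean["Subject"], clean["From"], repr(clean.get_payload()))
```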

Skip

Wildman

Dec 10, 2016, 12:07:04 PM
On Fri, 09 Dec 2016 16:07:16 -0500, DFS wrote:

> code (py2.7)
> --------------------------------------------------------------
> import sys as y,nntplib as t,datetime as d
> s='<news server>'
> g=y.argv[1]
> n=t.NNTP(s,119,'<usr>','<pw>')
> r,a,b,e,gn=n.group(g)
> def printStat(st,hd,rg):
>     r,d=n.xhdr(st,'%s-%s'%rg)
>     p=[]
>     for i in range(len(d)):
>         v=d[i][1]
>         if st=='Subject':v=v[4:] if v[:3]=='Re:' else v
>         p.append(v)
>     x=[(i,p.count(i)) for i in set(p)]
>     x.sort(key=lambda s:(-s[1],s[0].lower()))
>     print('Posts %s %s'%(len(set(p)),hd))
>     for v in x: print(' %s %s'%(v[1],v[0]))
>     print
> print 'As of '+d.datetime.now().strftime("%I:%M%p %B %d, %Y") + '\n'
> m=(int(e)-int(y.argv[3])+1,int(e))
> printStat("From","Posters",m)
> printStat("Subject","Subjects",m)
> printStat("User-Agent","User-Agents",m)
> n.quit()
> --------------------------------------------------------------
>
> usage on Windows:
> $ python stats.py group last N
> $ python stats.py comp.lang.python last 500

Do you happen to have a translation of the code that will
run on Linux?

$ ./nntp.py comp.lang.python last 500
Traceback (most recent call last):
  File "./nntp.py", line 7, in <module>
    n=t.NNTP(s,119,'<usr>','<pw>')
  File "/usr/lib/python2.7/nntplib.py", line 119, in __init__
    self.sock = socket.create_connection((host, port))
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known

--
<Wildman> GNU/Linux user #557453
The cow died so I don't need your bull!

DFS

Dec 10, 2016, 12:31:44 PM
That code runs unchanged on py2.7 on Linux (I just now tested it).

You just need to put in your own news server, user name and password
(lines 2 and 4).

Change this line:
print 'As of '+d.datetime.now().strftime("%I:%M%p %B %d, %Y") + '\n'

to:
print('As of '+d.datetime.now().strftime("%I:%M%p %B %d, %Y") + '\n')

And it will also run on py3.5


Wildman

Dec 10, 2016, 1:00:12 PM
OK, thanks. That didn't occur to me although it should have.

Peter Otten

Dec 10, 2016, 1:07:13 PM
That's not a Linux problem. For the code to run in

>> s='<news server>'

>> n=t.NNTP(s,119,'<usr>','<pw>')

you need to replace the '<...>' strings with a real news server, user, and
password. If you use Gmane, no password is required:

n = t.NNTP("news.gmane.org")

However, they use a different name for the "comp.lang.python" group, so you
have to modify the command line accordingly:

$ python stats.py gmane.comp.python.general last 500
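Rather than editing the '<...>' strings in the source, a hypothetical variant could take the server and optional credentials on the command line, defaulting to the Gmane server Peter mentions. The `--server`/`--user`/`--password` options are illustrative, not part of DFS's script; the keyword names passed on are those of `nntplib.NNTP(host, port, user, password)`:

```python
import argparse


def parse_args(argv=None):
    """Parse `group last N` plus optional (hypothetical) server options."""
    parser = argparse.ArgumentParser(description="Usenet posting stats")
    parser.add_argument("group", help="e.g. comp.lang.python or gmane.comp.python.general")
    parser.add_argument("last", choices=["last"], help='the literal word "last"')
    parser.add_argument("count", type=int, help="number of most recent articles")
    parser.add_argument("--server", default="news.gmane.org")
    parser.add_argument("--user")
    parser.add_argument("--password")
    return parser.parse_args(argv)


def nntp_kwargs(args):
    """Keyword arguments for nntplib.NNTP; credentials only if both are given."""
    kwargs = {"host": args.server, "port": 119}
    if args.user and args.password:
        kwargs.update(user=args.user, password=args.password)
    return kwargs


args = parse_args(["gmane.comp.python.general", "last", "500"])
print(nntp_kwargs(args))  # {'host': 'news.gmane.org', 'port': 119}
```

The connection would then be `nntplib.NNTP(**nntp_kwargs(args))`; with Gmane no login is sent.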


Terry Reedy

Dec 10, 2016, 3:28:01 PM
On 12/10/2016 9:43 AM, Steve D'Aprano wrote:
> On Sat, 10 Dec 2016 09:28 pm, Terry Reedy wrote:
>
>>>> The spammer will still be counted,
>>
>> Why reward someone who actively evades defenses? If you want to count
>> spam, it is mostly missing, at least as far as python-list is concerned.
>
> It's not a reward. Spammers are not like trolls, they don't hang around to
> see the result of their posts.

To me, the relevant difference is between posts related to python and
those not. It is usually clear which is which.

> There's no evidence at all that this Italian
> spammer is looking for replies or responses to his(?) posts. He apparently
> just fires them out.
>
> I think that it is relevant that comp.lang.python receives X spam messages
> from a certain person. It gives a picture of the health of the newsgroup:
> how much of it is spam? Hopefully only a small amount.

Python-list gets unrelated-to-python spam from lots of people. They are
not outliers (unlike jmf's now-blocked trolls), but contaminants from a
different universe.

I agree that the fraction of messages that are clearly spam has some
interest in itself, and definitely should be as small as possible. But I
contend that they should be excluded from a study of the universe of
python-related messages.

My other point is that this small sliver that used to get passed through
is extremely biased and statistically worthless as a study of
python-list spamming. If one wanted to study the rate and nature of
contamination, or the effectiveness of filtering, one would need access
to the raw stream of submissions.

--
Terry Jan Reedy

Wildman

Dec 11, 2016, 11:03:11 AM
On Sat, 10 Dec 2016 12:31:33 -0500, DFS wrote:

>

After correcting my stupid oversights, the code runs fine
up to the point where the user agents are printed. I get
an error saying that 'User-Agent' is an unsupported header
field. It must have something to do with giganews. If I
use aioe.org I don't get the error and the user agents are
printed.

I don't think it is a problem with the code, but any thoughts on
why giganews is not playing nice? And it is not related to
the python group. I have tried other groups and I get
the same error. Here is the complete error message.

Traceback (most recent call last):
  File "./nntp.py", line 27, in <module>
    printStat("User-Agent","User-Agents",m)
  File "./nntp.py", line 12, in printStat
    r,d=n.xhdr(st,'%s-%s'%rg)
  File "/usr/lib/python2.7/nntplib.py", line 470, in xhdr
    resp, lines = self.longcmd('XHDR ' + hdr + ' ' + str, file)
  File "/usr/lib/python2.7/nntplib.py", line 273, in longcmd
    return self.getlongresp(file)
  File "/usr/lib/python2.7/nntplib.py", line 244, in getlongresp
    resp = self.getresp()
  File "/usr/lib/python2.7/nntplib.py", line 229, in getresp
    raise NNTPPermanentError(resp)
nntplib.NNTPPermanentError: 501 unsupported header field

DFS

Dec 11, 2016, 12:03:21 PM
On 12/11/2016 11:02 AM, Wildman wrote:
> On Sat, 10 Dec 2016 12:31:33 -0500, DFS wrote:
>
>>
>
> After correcting my stupid oversights, the code runs fine
> up to the point where the user agents are printed. I get
> an error saying that 'User-Agent' is an unsupported header
> field. It must have something to do with giganews. If I
> use aioe.org I don't get the error and the user agents are
> printed.


For this short stat version I only used the 'User-Agent' header. I have
a longer version that uses both 'User-Agent' and 'X-Newsreader'.


You can put a conditional in place for now:

if 'giganews' in s:
    printStat("X-Newsreader","News Readers",m)
else:
    printStat("User-Agent","User-Agents",m)

DFS

Dec 11, 2016, 12:06:57 PM
ha!

Look closer, and you'll see no obfuscation. Just short variable names.
And removing as much extraneous white space as possible, so as not to
waste time reading it.

s=server (then lambda sort)
d=data
v=value
p=posts
x=summarized list (z from now on)
m=I can't remember why I named it m, but it's the range of
articles being summarized

And it's even more clear if you know the NNTP library.

n=t.NNTP(s,119,'<usr>','<pw>')
news = nntplib.NNTP(server,119,'<usr>','<pw>')

r,a,b,e,gn=n.group(g)
response,article_cnt,beginID,endID,groupname = news.group(group)
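Taking the name decoding a step further, here is a sketch of the same stats for Python 3 with descriptive names. It is not DFS's code: it swaps `p.count` in a loop for `collections.Counter`, strips any run of "Re:" prefixes case-insensitively (the original strips a single "Re: "), and never asks for articles below the group's first article number. The server and credentials are the same placeholders you must fill in. A bare `print` in Python 3 is a silent no-op, so `print()` is used for the blank line. Note that `nntplib` was deprecated in Python 3.11 and removed in 3.13, so the network part needs 3.12 or earlier.

```python
import datetime
import re
import sys
from collections import Counter

# One or more leading "Re:" prefixes, in any case.
REPLY_PREFIX = re.compile(r"^(?:re:\s*)+", re.IGNORECASE)


def normalize(header, value):
    """Strip reply prefixes from subjects; leave other headers unchanged."""
    if header == "Subject":
        return REPLY_PREFIX.sub("", value)
    return value


def tally(header, values):
    """(value, count) pairs, most posts first, ties sorted case-insensitively."""
    counts = Counter(normalize(header, v) for v in values)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0].lower()))


def print_stat(header, heading, values):
    rows = tally(header, values)
    print("Posts %d %s" % (len(rows), heading))  # number of distinct values
    for value, count in rows:
        print(" %d %s" % (count, value))
    print()


def main(argv):
    import nntplib  # not available in Python 3.13+

    group, count = argv[1], int(argv[3])  # usage: stats.py group last N
    news = nntplib.NNTP("<news server>", 119, "<usr>", "<pw>")  # fill these in
    _resp, _article_cnt, first_id, last_id, _name = news.group(group)
    first = max(first_id, last_id - count + 1)
    print("As of " + datetime.datetime.now().strftime("%I:%M%p %B %d, %Y") + "\n")
    for header, heading in (("From", "Posters"), ("Subject", "Subjects"),
                            ("User-Agent", "User-Agents")):
        _resp, pairs = news.xhdr(header, "%d-%d" % (first, last_id))
        print_stat(header, heading, [value for _num, value in pairs])
    news.quit()


# Only connect when invoked as `stats.py group last N`.
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```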


Jon Ribbens

Dec 11, 2016, 3:28:30 PM
On 2016-12-11, Wildman <best...@yahoo.com> wrote:
> I don't think it is a problem with the code but any thoughts
> why giganews is not playing nice?

Most likely because you're calling XHDR on a header which is not in
the server's overview file.
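If that diagnosis is right, one workaround is to fall back from XHDR to fetching each article's headers with HEAD, which is much slower but doesn't depend on the overview data. A sketch for Python 3's `nntplib` (3.12 or earlier), assuming `conn` is a connected `nntplib.NNTP` with the group already selected; it recognizes the server's reply by the numeric code that `nntplib` errors carry in their message:

```python
from email.parser import BytesHeaderParser


def header_values(conn, header, first, last):
    """Values of `header` for articles first..last.

    Tries XHDR; on a 501 reply, fetches each article's headers with HEAD.
    """
    try:
        _resp, pairs = conn.xhdr(header, "%d-%d" % (first, last))
        return [value for _num, value in pairs]
    except Exception as exc:
        # nntplib errors stringify to the server reply, e.g.
        # "501 unsupported header field"; re-raise anything else.
        if not str(exc).startswith("501"):
            raise
    parser = BytesHeaderParser()
    values = []
    for number in range(first, last + 1):
        try:
            _resp, info = conn.head(number)
        except Exception as exc:
            if str(exc).startswith("423"):  # no article with that number
                continue
            raise
        # info.lines holds the raw header lines as bytes, without line endings.
        msg = parser.parsebytes(b"\r\n".join(info.lines) + b"\r\n\r\n")
        value = msg.get(header)
        if value is not None:
            values.append(str(value))
    return values
```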

Wildman

Dec 11, 2016, 11:57:48 PM
On Sun, 11 Dec 2016 12:03:07 -0500, DFS wrote:

> For this short stat version I only used the 'User-Agent' header. I have
> a longer version that uses both 'User-Agent' and 'X-Newsreader'
>
>
> You can put a conditional in place for now:
>
> if 'giganews' in s:
>     printStat("X-Newsreader","News Readers",m)
> else:
>     printStat("User-Agent","User-Agents",m)

Thanks but I had already tried X-Newsreader and I got the
same result. It is odd because if you look at my headers
there is an entry for User-Agent....

User-Agent: Pan/0.139 (Sexual Chocolate; GIT bf56508
git://git.gnome.org/pan2; x86_64-pc-linux-gnu)

<scratching head>

DFS

Dec 12, 2016, 6:51:51 PM
On 12/09/2016 06:53 PM, bream...@gmail.com wrote:
> On Friday, December 9, 2016 at 9:07:32 PM UTC, DFS wrote:
>
> Usual drivel from someone who went into my email filters years ago
> snipped.


What 'usual drivel' are you referring to, wanker?

And what do you mean, 'years ago'? My first post to clp was this year.




> Quite how BartC and his "I don't do error handling" or "I don't know
> what a shell is" has survived on this list I'll never know, but what
> the heck.

Because he's not a boring twat like you?



> Teachers need not reply.
>
> Kindest regards.


Really? What part of your post was kind?



> Mark Lawrence.


In another post you said to Steve D'Aprano:

"Steven, there is no need to be rude or condescending."

That's rich, you miserable hypocrite.
