Notes
=====
- A line in the body of a post is considered to be original if it
does *not* match the regular expression /^\s{0,3}(?:>|:|\S+>|\+\+|\|\s+|\*\s)/
(see the sketch following these notes).
- All text after the last cut line (/^-- $/) in the body is
considered to be the author's signature.
- The scanner prefers the Reply-To: header over the From: header
in determining the "real" e-mail address and name.
- Original Content Rating is the ratio of the original content volume
to the total body volume.
- Please send all comments to Christopher Browne <cbbr...@acm.org>
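For illustration only, a minimal Common Lisp sketch of these rules might
look like the following (it uses the CL-PPCRE regex library; the names
QUOTED-LINE-P and BODY-STATS are invented here and are not the actual
scanner):

  (defparameter *quoted-line-scanner*
    (cl-ppcre:create-scanner
     "^\\s{0,3}(?:>|:|\\S+>|\\+\\+|\\|\\s+|\\*\\s)")
    "Matches body lines that are quoted rather than original.")

  (defun quoted-line-p (line)
    (cl-ppcre:scan *quoted-line-scanner* line))

  (defun body-stats (body-lines)
    "Split BODY-LINES at the last cut line (/^-- $/), then count original
and total bytes in the body proper.  Returns original bytes, total body
bytes, and the Original Content Rating."
    (let* ((cut  (position "-- " body-lines :test #'string= :from-end t))
           (body (if cut (subseq body-lines 0 cut) body-lines))
           (original 0)
           (total 0))
      (dolist (line body)
        (incf total (1+ (length line)))          ; +1 for the newline
        (unless (quoted-line-p line)
          (incf original (1+ (length line)))))
      (values original total
              (if (zerop total) 0 (float (/ original total))))))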
Excluded Posters
================
perlfaq-suggestions\@mox\.perl\.com
Totals
======
Posters: 142
Articles: 543 (270 with cutlined signatures)
Threads: 44
Volume generated: 1321.1 kb
- headers: 585.7 kb (9,248 lines)
- bodies: 694.7 kb (17,268 lines)
- original: 435.6 kb (11,560 lines)
- signatures: 40.1 kb (999 lines)
Original Content Rating: 0.627
Averages
========
Posts per poster: 3.8
median: 2.0 posts
mode: 1 post - 67 posters
s: 6.9 posts
Posts per thread: 12.3
median: 3.0 posts
mode: 1 post - 12 threads
s: 26.8 posts
Message size: 2491.3 bytes
- header: 1104.6 bytes (17.0 lines)
- body: 1310.2 bytes (31.8 lines)
- original: 821.5 bytes (21.3 lines)
- signature: 75.6 bytes (1.8 lines)
Top 10 Posters by Number of Posts
=================================
Posts   Volume (  hdr/  body/  orig) in kb   Address
----- -------------------------- -------
57 153.9 ( 64.6/ 80.3/ 63.5) Erik Naggum <er...@naggum.no>
27 66.6 ( 28.9/ 37.6/ 28.8) Tim Bradshaw <t...@cley.com>
23 82.0 ( 29.9/ 50.5/ 14.4) arien <spamme...@getlost.invalid>
21 46.8 ( 22.1/ 23.8/ 23.8) Vassil Nikolov <vnik...@poboxes.com>
19 47.2 ( 21.8/ 21.8/ 13.0) Pascal Costanza <cost...@web.de>
19 36.9 ( 23.6/ 13.2/ 8.0) Will Deakin <aniso...@hotmail.com>
16 35.7 ( 17.9/ 14.5/ 7.0) Barry Margolin <bar...@genuity.net>
15 35.5 ( 17.7/ 17.8/ 10.1) Joe Marshall <j...@ccs.neu.edu>
11 26.8 ( 10.5/ 16.3/ 3.8) "Vlastimil Adamovsky" <am...@ambrasoft.com>
11 27.2 ( 8.0/ 18.3/ 10.4) Nils Goesche <car...@cartan.de>
These posters accounted for 40.3% of all articles.
Top 10 Posters by Volume
========================
Volume (  hdr/  body/  orig) in kb   Posts   Address
-------------------------- ----- -------
153.9 ( 64.6/ 80.3/ 63.5) 57 Erik Naggum <er...@naggum.no>
82.0 ( 29.9/ 50.5/ 14.4) 23 arien <spamme...@getlost.invalid>
66.6 ( 28.9/ 37.6/ 28.8) 27 Tim Bradshaw <t...@cley.com>
47.2 ( 21.8/ 21.8/ 13.0) 19 Pascal Costanza <cost...@web.de>
46.8 ( 22.1/ 23.8/ 23.8) 21 Vassil Nikolov <vnik...@poboxes.com>
36.9 ( 23.6/ 13.2/ 8.0) 19 Will Deakin <aniso...@hotmail.com>
35.7 ( 17.9/ 14.5/ 7.0) 16 Barry Margolin <bar...@genuity.net>
35.5 ( 17.7/ 17.8/ 10.1) 15 Joe Marshall <j...@ccs.neu.edu>
34.5 ( 13.3/ 19.3/ 8.5) 9 Duane Rettig <du...@franz.com>
27.2 ( 8.0/ 18.3/ 10.4) 11 Nils Goesche <car...@cartan.de>
These posters accounted for 42.9% of the total volume.
Top 10 Posters by OCR (minimum of five posts)
==============================================
  OCR  ( orig / body) in kb   Posts   Address
----- -------------- ----- -------
1.000 ( 23.8 / 23.8) 21 Vassil Nikolov <vnik...@poboxes.com>
0.837 ( 6.8 / 8.2) 5 ces...@qnci.net (William D Clinger)
0.819 ( 13.0 / 15.9) 7 Christopher Browne <cbbr...@acm.org>
0.815 ( 5.9 / 7.3) 8 Advance Australia Dear <fundamenta...@yahoo.com>
0.791 ( 63.5 / 80.3) 57 Erik Naggum <er...@naggum.no>
0.765 ( 28.8 / 37.6) 27 Tim Bradshaw <t...@cley.com>
0.758 ( 2.3 / 3.0) 5 jlsg...@netscape.net (Jules F. Grosse)
0.716 ( 7.0 / 9.8) 5 k...@ashi.footprints.net (Kaz Kylheku)
0.705 ( 6.5 / 9.3) 7 "Wade Humeniuk" <wa...@nospam.nowhere>
0.705 ( 2.1 / 3.0) 5 ozan s yigit <o...@blue.cs.yorku.ca>
Bottom 10 Posters by OCR (minimum of five posts)
=================================================
  OCR  ( orig / body) in kb   Posts   Address
----- -------------- ----- -------
0.558 ( 1.1 / 2.0) 6 Kalle Olavi Niemitalo <k...@iki.fi>
0.514 ( 1.9 / 3.7) 6 Paolo Amoroso <amo...@mclink.it>
0.487 ( 2.0 / 4.0) 8 Chris Beggy <chr...@kippona.com>
0.485 ( 7.0 / 14.5) 16 Barry Margolin <bar...@genuity.net>
0.454 ( 5.9 / 12.9) 7 mic...@bcect.com (Michael Sullivan)
0.441 ( 5.8 / 13.2) 8 Nils Goesche <n...@cartan.de>
0.439 ( 8.5 / 19.3) 9 Duane Rettig <du...@franz.com>
0.391 ( 3.5 / 8.9) 10 Marc Spitzer <mspi...@optonline.net>
0.285 ( 14.4 / 50.5) 23 arien <spamme...@getlost.invalid>
0.235 ( 3.8 / 16.3) 11 "Vlastimil Adamovsky" <am...@ambrasoft.com>
Top 10 Threads by Number of Posts
=================================
Posts Subject
----- -------
152 Re: Difference between LISP and C++
67 Midfunction Recursion
56 A strange question...
47 Getting the PID in CLISP
41 Best combination of {hardware / lisp implementation / operating system}
23 Lisp options on Mac OS X (Was: Best combination of {hardware / lisp implementation / operating system})
19 Re: "Well, I want to switch over to replace EMACS LISP with Guile."
16 Lisp advocacy misadventures
12 iteration vs recursion Performance viewpoint
11 Re: Franz Liszt & Farewell my Dijkstra
Top 10 Threads by Volume
========================
Volume (  hdr/  body/  orig) in kb   Posts   Subject
-------------------------- ----- -------
442.1 (209.4/220.3/121.2) 152 Re: Difference between LISP and C++
142.0 ( 58.9/ 77.3/ 47.6) 67 Midfunction Recursion
135.4 ( 54.4/ 77.2/ 44.6) 56 A strange question...
97.1 ( 40.3/ 54.4/ 32.6) 41 Best combination of {hardware / lisp implementation / operating system}
78.0 ( 44.0/ 30.0/ 19.4) 47 Getting the PID in CLISP
55.3 ( 23.4/ 30.2/ 18.4) 23 Lisp options on Mac OS X (Was: Best combination of {hardware / lisp implementation / operating system})
50.6 ( 26.2/ 22.3/ 16.6) 19 Re: "Well, I want to switch over to replace EMACS LISP with Guile."
45.1 ( 13.8/ 29.8/ 23.1) 16 Lisp advocacy misadventures
28.3 ( 10.6/ 17.0/ 12.5) 12 iteration vs recursion Performance viewpoint
23.6 ( 11.7/ 11.7/ 6.7) 10 Re: Best combination of {hardware / lisp implementation / operating system}
Top 10 Threads by OCR (minimum of three posts)
==============================================
  OCR  ( orig / body) in kb   Posts   Subject
----- -------------- ----- -------
0.945 ( 6.4/ 6.8) 3 Stalin's optimisations: Can they be used outside Scheme ?
0.898 ( 10.7/ 11.9) 9 CMUCL's PCL Code Walker
0.859 ( 10.3/ 12.0) 5 Naggum's got some good points!
0.794 ( 3.0/ 3.8) 5 Lisp compiler
0.773 ( 23.1/ 29.8) 16 Lisp advocacy misadventures
0.744 ( 16.6/ 22.3) 19 Re: "Well, I want to switch over to replace EMACS LISP with Guile."
0.735 ( 12.5/ 17.0) 12 iteration vs recursion Performance viewpoint
0.718 ( 2.0/ 2.8) 3 setf-like forms on VALUE places
0.666 ( 3.4/ 5.1) 3 Re: M-Expressions and early Lisp (was Re: Lisp's unique feature: compiler available at run-time)
0.646 ( 19.4/ 30.0) 47 Getting the PID in CLISP
Bottom 10 Threads by OCR (minimum of three posts)
=================================================
  OCR  ( orig / body) in kb   Posts   Subject
----- -------------- ----- -------
0.616 ( 47.6 / 77.3) 67 Midfunction Recursion
0.608 ( 18.4 / 30.2) 23 Lisp options on Mac OS X (Was: Best combination of {hardware / lisp implementation / operating system})
0.599 ( 32.6 / 54.4) 41 Best combination of {hardware / lisp implementation / operating system}
0.578 ( 44.6 / 77.2) 56 A strange question...
0.572 ( 6.7 / 11.7) 10 Re: Best combination of {hardware / lisp implementation / operating system}
0.566 ( 3.7 / 6.6) 7 Bounding Indices in Sequence Functions
0.552 ( 2.4 / 4.4) 5 FFI Concept, way over my head.
0.552 ( 1.8 / 3.3) 6 Re: How much use of CLOS?
0.550 (121.2 /220.3) 152 Re: Difference between LISP and C++
0.495 ( 2.2 / 4.5) 3 Re: Lisp options on Mac OS X (Was: Best combination of {hardware / lisp implementation / operating system})
Top 10 Targets for Crossposts
=============================
Articles Newsgroup
-------- ---------
49 comp.lang.scheme
46 comp.lang.smalltalk
5 comp.lang.smalltalk.advocacy
2 comp.sys.xerox
1 comp.text.tex
Top 10 Crossposters
===================
Articles Address
-------- -------
20 "Vlastimil Adamovsky" <am...@ambrasoft.com>
8 Marc Spitzer <mspi...@optonline.net>
6 Vassil Nikolov <vnik...@poboxes.com>
6 "Boris Popov" <nospa...@shaw.ca>
4 richd...@msn.com (Rich Demers)
4 "Adam Warner" <use...@consulting.net.nz>
4 David Rush <ku...@bellsouth.net>
4 Pascal Costanza <cost...@web.de>
3 lo...@emf.emf.net (Tom Lord)
3 panu <pa...@fcc.net>
>Following is a summary of articles spanning a 7 day period,
>beginning at 20 Oct 2002 11:56:53 GMT and ending at
>27 Oct 2002 02:30:55 GMT.
[snip]
The results appear to be an almost perfect object lesson in why
statistical code metrics are useless without a deeper understanding of
what's going on. Both the tops and bottoms of the volume, "original
content" and thread rankings contain solid representations from the most
and least informative posters and threads of the past week.
At the level these statistics capture, there appears to be little
difference between succinct explanations and one-liners, or between
detailed technical responses and rants, and a complex discussion that
requires keeping a lot of context around looks very much like a
pedantic-mode exchange of flames. Beats KLOC and function points all
hollow though.
I'm not immediately sure how deep a parse you would have to do to make
completely reliable distinctions for this kind of thing.
paul
> At the level these statistics capture, there appears to be little
> difference between succinct explanations and one-liners, or between
> detailed technical responses and rants, and a complex discussion that
> requires keeping a lot of context around looks very much like a
> pedantic-mode exchange of flames. Beats KLOC and function points all
> hollow though.
> I'm not immediately sure how deep a parse you would have to do to make
> completely reliable distinctions for this kind of thing.
I'd be really interested to see what a naive Bayesian approach would do
after I'd developed a database of a few thousand articles or so.
I'm thinking 3 categories: good, off-topic but interesting, crap.
The really interesting question is whether it could distinguish good
from bad within a similar content style, e.g. distinguishing witty
flamers from whining teenagers.
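A toy multinomial naive Bayes over those three categories might look
something like this in Common Lisp (the names and the crude tokenizer
are invented for illustration, not taken from any real newsreader; it
again leans on CL-PPCRE):

  (defparameter *categories* '(:good :off-topic :crap))

  (defvar *word-counts*     (make-hash-table :test #'equal)) ; (category . word) -> n
  (defvar *category-totals* (make-hash-table :test #'eq))    ; category -> word total
  (defvar *doc-counts*      (make-hash-table :test #'eq))    ; category -> articles seen
  (defvar *vocabulary*      (make-hash-table :test #'equal)) ; word -> t

  (defun tokenize (text)
    (cl-ppcre:all-matches-as-strings "[a-z']+" (string-downcase text)))

  (defun train (category text)
    "Add one hand-rated article to the database."
    (incf (gethash category *doc-counts* 0))
    (dolist (word (tokenize text))
      (setf (gethash word *vocabulary*) t)
      (incf (gethash (cons category word) *word-counts* 0))
      (incf (gethash category *category-totals* 0))))

  (defun classify (text)
    "Return the category with the highest log-posterior, with add-one
(Laplace) smoothing."
    (let ((words      (tokenize text))
          (vocab      (max 1 (hash-table-count *vocabulary*)))
          (total-docs (loop for c in *categories*
                            sum (gethash c *doc-counts* 0)))
          (best nil)
          (best-score nil))
      (dolist (category *categories* best)
        (let ((score (log (/ (1+ (gethash category *doc-counts* 0))
                             (+ total-docs (length *categories*))))))
          (dolist (word words)
            (incf score
                  (log (/ (1+ (gethash (cons category word) *word-counts* 0))
                          (+ (gethash category *category-totals* 0) vocab)))))
          (when (or (null best-score) (> score best-score))
            (setf best category
                  best-score score))))))

After a few thousand calls to TRAIN, CLASSIFY would return the most
probable of the three bins for an unseen article.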
On the backburner of "things I will do in my copious free time" is a
newsreader that uses this approach to write a 'probfile', as opposed to
a scorefile or kill/tagfile. The basic idea is to have a way to tell the
program "This article was sorted incorrectly -- it should have been
here:" to update the database, then let it just work on the fly.
Michael
Well, what are the metrics you have used to determine your conclusions?
| I'm not immediately sure how deep a parse you would have to do to make
| completely reliable distinctions for this kind of thing.
Readers would have to rate news articles. For the past few months, I
have been working on a system to do this with the Norwegian newsgroup
hierarchy. I may decide to repeat the experiment with other newsgroups.
--
Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
[...]
EN> Readers would have to rate news articles.
Human readers, rather than news readers, I suppose?
---Vassil.
--
For an M-person job assigned to an N-person team, only rarely M=N.
On 27 Oct 2002 23:37:12 +0000, Erik Naggum <er...@naggum.no> said:
[...]
EN> Readers would have to rate news articles.
VN> Human readers, rather than news readers, I suppose?
Actually, I didn't mean that to sound sarcastic. I was just
thinking that with all those developments in AI I haven't followed,
I couldn't be sure what a news reader might be able to do...
>* Paul Wallich
>| The results appear to be an almost perfect object lesson in why
>| statistical code metrics are useless without a deeper understanding of
>| what's going on. Both the tops and bottoms of the volume, "original
>| content" and thread rankings contain solid representations from the most
>| and least informative posters and threads of the past week.
>
> Well, what are the metrics you have used to determine your conclusions?
Purely subjective, based on five years or so of regular reading and name
recognition, with a sense of what posts are interesting or informative
to me and what posts appear ditto to others. I expect that someone else
would have different personal metrics, but think that most of them would
show a similar spread with respect to the stats given.
>| I'm not immediately sure how deep a parse you would have to do to make
>| completely reliable distinctions for this kind of thing.
>
> Readers would have to rate news articles. For the past few months, I
> have been working on a system to do this with the Norwegian newsgroup
> hierarchy. I may decide to repeat the experiment with other newsgroups.
Does such a system integrate reasonably with common newsreaders?
(On reflection, I think that one could probably distinguish between good
and bad in threads consisting mostly of long posts with low "original
content" by looking at posting interval and total thread length. Longer
intervals and shorter ultimate length for "useful" threads because of
the time and effort required for cogent replies, but unfortunately the
measure would be mostly retrospective.)
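A back-of-the-envelope version of that interval measure might look like
this (invented names; the timestamps are assumed to be Common Lisp
universal times decoded from the articles' Date: headers):

  (defun mean-posting-interval (timestamps)
    "Mean gap in hours between consecutive posts of a thread, or NIL for
threads with fewer than two posts."
    (let ((sorted (sort (copy-list timestamps) #'<)))
      (when (rest sorted)
        (/ (- (car (last sorted)) (first sorted))
           (* 3600.0 (1- (length sorted)))))))

  (defun leisurely-thread-p (timestamps &key (threshold-hours 6))
    "Heuristic: long average gaps between replies suggest time spent on
cogent answers rather than rapid-fire flaming."
    (let ((gap (mean-posting-interval timestamps)))
      (and gap (> gap threshold-hours))))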
paul
<http://quimby.gnus.org/gnus/manual/gnus_237.html>
"GroupLens (http://www.cs.umn.edu/Research/GroupLens/) is a
collaborative filtering system that helps you work together with other
people to find the quality news articles out of the huge volume of
news articles generated every day.
To accomplish this the GroupLens system combines your opinions about
articles you have already read with the opinions of others who have
done likewise and gives you a personalized prediction for each unread
news article. Think of GroupLens as a matchmaker. GroupLens watches
how you rate articles, and finds other people that rate articles the
same way. Once it has found some people you agree with it tells you,
in the form of a prediction, what they thought of the article. You can
use this prediction to help you decide whether or not you want to read
the article.
NOTE: Unfortunately the GroupLens system seems to have shut down, so
this section is mostly of historical interest."
You could presumably build a protocol to share Gnus "score" files with
others who have similar interests, which could also help.
A third approach would be to use something like Paul Graham's
statistical filtering scheme or something more sophisticated such as
IFile to filter between "good" and "bad", perhaps sharing a corpus of
"good" and "bad" material with others.
I think the ideal way of handling this would probably involve using a
GroupLens-like approach to allow people to share "scoring" information
on articles, which would then be used to allocate messages to corpuses.
Those allocations would then be used to do IFile-like evaluations of
messages, which would mean that /everyone/ would get improved scoring.
When articles were scored wrongly, the feedback would be used to
improve the corpus...
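In the simplest case, the sharing might amount to merging other readers'
word-count tables into your own, building on the toy classifier sketched
earlier in the thread (MERGE-CORPUS is an invented name, and a real
system would also need to merge per-category article counts and handle
disagreements between raters):

  (defun merge-corpus (peer-word-counts)
    "Fold a peer's (category . word) -> count table into the local one,
so everyone's ratings improve everyone's scoring."
    (maphash (lambda (key count)
               (incf (gethash key *word-counts* 0) count)
               (incf (gethash (car key) *category-totals* 0) count)
               (setf (gethash (cdr key) *vocabulary*) t))
             peer-word-counts))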
Note that the statistics are intended as much for amusement purposes
as for any serious analysis. I certainly agree that the value is
pretty dubious; you take them seriously at your own risk...
--
(concatenate 'string "cbbrowne" "@cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/internet.html
"... While programs written for Sun machines won't run unmodified on
Intel-based computers, Sun said the two packages will be completely
compatible and that software companies can convert a program from one
system to the other through a fairly straightforward and automated
process known as ``recompiling.''" -- San Jose Mercury News
Well, I meant human, but the support for rating has to exist in both the
client and the server software.
oz
--
you take a banana, you get a lunar landscape. -- j. van wijk
Why only one way? If you really want this kind of flame bait, ask how
much "arien" has commented on others. I should like to see a count of
the number of times she has insulted me after she said she put me in her
kill-file because I had "insulted" her.
Taking sides is stupid. If you wish to understand, sides are irrelevant.