December 2002 comp.lang.* stats

Aaron K. Johnson

25 Jan 2003, 2:04:27

Ok.

I've finally taken into account the objections raised herein, and modified my
usenet query algorithms.

What you see below are pure and simple the number of unique posters to each
comp.lang.whatever hierarchy in December 2002.

Thanks,
Aaron.

comp.lang.java 5347
comp.lang.c++ 3075
comp.lang.perl 2136
comp.lang.javascript 2130
comp.lang.python 1996
comp.lang.basic 1758
comp.lang.c 1670
comp.lang.labview 958
comp.lang.clipper 922
comp.lang.tcl 853
comp.lang.pascal 805
comp.lang.ruby 767
comp.lang.clarion 707
comp.lang.lisp 587
comp.lang.fortran 517
comp.lang.smalltalk 391
comp.lang.asm 386
comp.lang.cobol 382
comp.lang.ada 376
comp.lang.vhdl 370
comp.lang.scheme 349
comp.lang.postscript 319
comp.lang.prolog 243
comp.lang.functional 237
comp.lang.idl-pvwave 230
comp.lang.forth 226
comp.lang.verilog 223
comp.lang.awk 211
comp.lang.vrml 153
comp.lang.rexx 147
comp.lang.apl 142
comp.lang.mumps 126
comp.lang.pl1 121
comp.lang.misc 115
comp.lang.objective-c 108
comp.lang.eiffel 101
comp.lang.logo 80
comp.lang.ml 73
comp.lang.asm370 65
comp.lang.dylan 38
comp.lang.modula3 30
comp.lang.oberon 26
comp.lang.modula2 23
comp.lang.pop 22
comp.lang.icon 17
comp.lang.idl 15
comp.lang.limbo 5
comp.lang.clos 4
comp.lang.prograph 2
comp.lang.clu 2

Erik Max Francis

25 Jan 2003, 2:23:49

"Aaron K. Johnson" wrote:

> What you see below are pure and simple the number of unique posters to
> each
> comp.lang.whatever hierarchy in December 2002.

So they're to the whole hierarchy? Several of these groups have large
hierarchies, some don't; some of those subgroups are moderated. Does
this take into account crossposts within hierarchies, or across them?
How about spam? Right there those factors are going to tweak the
numbers in favor of hierarchies with more groups rather than less, even
setting aside whether or not more groups would encourage more posts, or
whether or not those languages are more popular and thus have more
subgroups in the first place.

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/ \ When you love somebody, you're always in trouble.
\__/ Col. Sherman Potter
Sade Deluxe / http://www.sadedeluxe.com/
The ultimate Sade encyclopedia.

William

25 Jan 2003, 4:15:07

Aaron K. Johnson <akjm...@yahoo.com> writes:

> Ok.
>
> I've finally taken into account the objections raised herein, and modified my
> usenet query algorithms.
>
> What you see below are pure and simple the number of unique posters to each
> comp.lang.whatever hierarchy in December 2002.

Can you give us your script? I would like to compare with fr.comp.lang.*

--
William Dode - http://flibuste.net

Skip Montanaro

25 Jan 2003, 8:43:22

Aaron> What you see below are pure and simple the number of unique
Aaron> posters to each comp.lang.whatever hierarchy in December 2002.

Is a "unique poster" a single person (e.g. sk...@pobox.com) or a "unique
post" (e.g. message-id <b0tctd$9cn$1...@bob.news.rcn.net>)?

Skip

Skip Montanaro

25 Jan 2003, 8:41:47

>> What you see below are pure and simple the number of unique posters
>> to each comp.lang.whatever hierarchy in December 2002.

Erik> So they're to the whole hierarchy? Several of these groups have
Erik> large hierarchies, some don't; some of those subgroups are
Erik> moderated.

One might argue that more active discussion groups need more complex
hierarchies to properly group the messages.

Can you avoid the cross-post problem by simply counting unique message-ids?

Skip

Cameron Laird

25 Jan 2003, 9:41:47

In article <b0tctd$9cn$1...@bob.news.rcn.net>,
Aaron K. Johnson <akjm...@yahoo.com> wrote:
.
.

.
>What you see below are pure and simple the number of unique posters to each
>comp.lang.whatever hierarchy in December 2002.

.
.

comp.lang.php?
--

Cameron Laird <Cam...@Lairds.com>
Business: http://www.Phaseit.net
Personal: http://phaseit.net/claird/home.html

Peter Hansen

25 Jan 2003, 10:06:01

"Aaron K. Johnson" wrote:
>
> What you see below are pure and simple the number of unique posters to each
> comp.lang.whatever hierarchy in December 2002.
>

> comp.lang.java 5347
> comp.lang.c++ 3075
> comp.lang.perl 2136
> comp.lang.javascript 2130
> comp.lang.python 1996
> comp.lang.basic 1758
> comp.lang.c 1670
> comp.lang.labview 958
> comp.lang.clipper 922

...

> comp.lang.modula3 30
> comp.lang.oberon 26
> comp.lang.modula2 23

...

Thanks Aaron. I'm forced to admit that these numbers *appear* to
correspond to my purely subjective feeling as to the relative
popularity, in a very vague way, of these languages. It will
be interesting - if you can finish refining the script and then
"lock it down" - to compare the results over time.

I'm also intrigued by the labview and clipper numbers, which are
as I understand the only two purely proprietary languages listed.
National Instruments deserves some credit here, even if it's just
because LabVIEW is such an abomination to use for complex software
that it requires much more online help than the others, relative
to its actual usage. ;-)

-Peter

Aaron K. Johnson

25 Jan 2003, 10:46:28

In message <3E323B85...@alcyone.com>, Erik Max Francis wrote:
> "Aaron K. Johnson" wrote:
>
> > What you see below are pure and simple the number of unique posters to
> > each
> > comp.lang.whatever hierarchy in December 2002.
>
> So they're to the whole hierarchy? Several of these groups have large
> hierarchies, some don't; some of those subgroups are moderated. Does
> this take into account crossposts within hierarchies, or across them?
> How about spam? Right there those factors are going to tweak the
> numbers in favor of hierarchies with more groups rather than less, even
> setting aside whether or not more groups would encourage more posts, or
> whether or not those languages are more popular and thus have more
> subgroups in the first place.
>

Yes, all of the hierarchy is included. Thus an individual named 'Joe Blow' will
NOT be counted twice for posting to 'comp.lang.java.programmer' and
'comp.lang.java.cocaine-addict'. So I think the 'popularity' aspect is not a
problem there....
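
The actual script is not shown in this thread, so the following is only a minimal sketch of the counting rule described above, under stated assumptions: posts arrive as hypothetical (newsgroup, From header) pairs, a lower-cased e-mail address stands in for one person, and every subgroup is collapsed to its first three name components. The names `hierarchy_of` and `unique_posters` are illustrative, not taken from the original code.

```python
from collections import defaultdict
from email.utils import parseaddr


def hierarchy_of(group, depth=3):
    """Collapse 'comp.lang.java.programmer' to 'comp.lang.java'."""
    return ".".join(group.split(".")[:depth])


def unique_posters(posts):
    """Count distinct posters per hierarchy.

    posts: iterable of (newsgroup, from_header) pairs (hypothetical input).
    """
    posters = defaultdict(set)
    for group, from_header in posts:
        _name, address = parseaddr(from_header)
        # Assumption: a lower-cased e-mail address identifies one person.
        posters[hierarchy_of(group)].add(address.lower())
    return {hierarchy: len(addresses) for hierarchy, addresses in posters.items()}


sample = [
    ("comp.lang.java.programmer", "Joe Blow <joe@example.com>"),
    ("comp.lang.java.help", "Joe Blow <JOE@example.com>"),
    ("comp.lang.python", "Jane Doe <jane@example.com>"),
]
counts = unique_posters(sample)
print(counts)
```

Because each hierarchy keeps a set of addresses, Joe Blow is counted once for comp.lang.java no matter how many of its subgroups he posts to.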

Spam....hmmm.....don't know what kind of mechanism I could set up that would
filter that easily, or without me blowing the script size to larger than I
would want it to be....if you want to take over, I'd pass you the code!

Best,
Aaron.

Aaron K. Johnson

25 Jan 2003, 10:51:59

In message <3E32A7D9...@engcorp.com>, Peter Hansen wrote:
> "Aaron K. Johnson" wrote:
> >
> > What you see below are pure and simple the number of unique posters to
> each
> > comp.lang.whatever hierarchy in December 2002.
> >
> > comp.lang.java 5347
> > comp.lang.c++ 3075
> > comp.lang.perl 2136
> > comp.lang.javascript 2130
> > comp.lang.python 1996
> > comp.lang.basic 1758
> > comp.lang.c 1670
> > comp.lang.labview 958
> > comp.lang.clipper 922

> ....

> > comp.lang.modula3 30
> > comp.lang.oberon 26
> > comp.lang.modula2 23

> ....

>
> Thanks Aaron. I'm forced to admit that these numbers *appear* to
> correspond to my purely subjective feeling as to the relative
> popularity, in a very vague way, of these languages. It will
> be interesting - if you can finish refining the script and then
> "lock it down" - to compare the results over time.
>
> I'm also intrigued by the labview and clipper numbers, which are
> as I understand the only two purely proprietary languages listed.
> National Instruments deserves some credit here, even if it's just
> because LabVIEW is such an abomination to use for complex software
> that it requires much more online help than the others, relative
> to its actual usage. ;-)
>
> -Peter

Thanks for your comments, Peter. Your original comments were excellent
criticisms that I feel brought the script out of its primitive, unscientific
state of statistical cloudiness! At least now we have a quasi-scientific
approach. Now to the spam problem (I doubt I want to work much more on this
though ;) )

Best,
Aaron.

Aaron K. Johnson

25 Jan 2003, 10:49:47

Hmm.. I'll have to add that....

Aaron K. Johnson

25 Jan 2003, 10:49:23

In message <mailman.1043502285...@python.org>, Skip Montanaro
wrote:

A person is what counts as a 'unique poster'. A message ID alone would not
measure the 'user base' factor I'm interested in.

Spam is still a problem to consider. It's not perfect yet.......

Best,
Aaron.

Skip Montanaro

25 Jan 2003, 11:53:42

Aaron> Spam....hmmm.....don't know what kind of mechanism I could set up
Aaron> that would filter that easily, or without me blowing the script
Aaron> size to larger than I would want it to be....if you want to take
Aaron> over, I'd pass you the code!

Download spambayes (http://spambayes.sf.net/), install it and train it on a
representative sample of ham and spam you find in the candidate groups
(100-200 messages should be sufficient), then run your counter script, and
ask spambayes to score each message with a tight ham_cutoff (0.1) and a
reasonable spam_cutoff (0.8). Ignore any message classified as spam and
save any message classified as unsure. Check the unsures. If there are too
many mistakes there, train on them and/or adjust your spam/ham cutoff values
to better reflect the nature of the messages you get.
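
The two-cutoff triage described above can be sketched without touching the spambayes API at all. This is only an illustration under assumptions: `score_message` is a hypothetical stand-in for whatever classifier produces a spam probability between 0 and 1, and the exact handling of scores that land on a cutoff is a guess rather than spambayes' documented behaviour.

```python
def classify(score, ham_cutoff=0.1, spam_cutoff=0.8):
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


def triage(messages, score_message, ham_cutoff=0.1, spam_cutoff=0.8):
    """Return (counted, unsure): ham to count, unsure to review by hand."""
    counted, unsure = [], []
    for message in messages:
        label = classify(score_message(message), ham_cutoff, spam_cutoff)
        if label == "ham":
            counted.append(message)
        elif label == "unsure":
            unsure.append(message)
        # Messages labelled "spam" are simply dropped from the count.
    return counted, unsure


# Made-up scores standing in for a trained classifier.
fake_scores = {"question about lists": 0.02, "odd one-liner": 0.50, "BUY NOW": 0.95}
counted, unsure = triage(fake_scores, fake_scores.get)
print(counted, unsure)
```

Only the messages in `counted` would feed the poster count; the `unsure` pile is what gets checked by hand and, if needed, used for further training.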

Skip

Aaron K. Johnson

25 Jan 2003, 14:16:30

In message <877kct1...@flibuste.net>, William wrote:
> Aaron K. Johnson <akjm...@yahoo.com> writes:
>
> > Ok.
> >
> > I've finally taken into account the objections raised herein, and
> modified my
> > usenet query algorithms.
> >
> > What you see below are pure and simple the number of unique posters to
> each
> > comp.lang.whatever hierarchy in December 2002.
>
> Can you give us your script? I would like to compare with fr.comp.lang.*

William, I'll send you a private email with the script.

Anyone else who wants it, I'll send it to you, with the caveat that you send
any improvements you make back to me, and that you share the output of
whatever you use it for......

Peter Hansen

25 Jan 2003, 14:30:30

"Aaron K. Johnson" wrote:
>
> A person is what counts as a 'unique poster'. A message ID alone would not
> measure the 'user base' factor I'm interested in.
>
> Spam is still a problem to consider. It's not perfect yet.......

Spam is probably a problem best ignored. It would probably
affect all those groups equally anyway.

-Peter

Aaron K. Johnson

25 Jan 2003, 15:38:12

I agree. Plus, I'm not interested in working THAT hard to be that
anal-retentive about data which some would argue is still vague enough to be
discounted.

-Aaron.

William

25 Jan 2003, 15:42:32

thanks to
Aaron K. Johnson <akjm...@yahoo.com>:

The French stats (December 2002)

I've added the total number of messages and the number of messages per author:

group                     authors  messages  msgs/author
fr.comp.lang.php             1231      4661         3.79
fr.comp.lang.java             974      3541         3.64
fr.comp.lang.c++              644      6074         9.43
fr.comp.lang.c                577      4240         7.35
fr.comp.lang.javascript       536      2221         4.14
fr.comp.lang.perl             359      1234         3.44
fr.comp.lang.basic            351      1001         2.85
fr.comp.lang.pascal           167       495         2.96
fr.comp.lang.general          156       357         2.29
fr.comp.lang.python           154       753         4.89
fr.comp.lang.ada              113       408         3.61
fr.comp.lang.tcl               76       304         4.00
fr.comp.lang.caml              68       355         5.22
fr.comp.lang.lisp              46        77         1.67
fr.comp.lang.postscript        26        39         1.50
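
The last figure on each row above matches total messages divided by unique authors, so it is a ratio rather than a percentage. A quick check on three of the rows, with the numbers copied from the table (the variable names are only illustrative):

```python
# (group, unique authors, total messages), copied from the table above.
rows = [
    ("fr.comp.lang.php", 1231, 4661),
    ("fr.comp.lang.c++", 644, 6074),
    ("fr.comp.lang.python", 154, 753),
]

# Messages per author, rounded to two decimals as in the table.
ratios = {group: round(messages / authors, 2) for group, authors, messages in rows}
print(ratios)
```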

:((( I'm going to post these stats on the French newsgroup; that will be
one more message for next month!

I choose to interpret it this way: Python is so easy that we don't need
to post any questions on the newsgroup.

--

Carl Banks

25 Jan 2003, 15:55:52

I don't think it would, but spam is probably a small enough percentage
of posts that it wouldn't make much difference.

--
CARL BANKS

Laura Creighton

25 Jan 2003, 15:16:18

> --
> http://mail.python.org/mailman/listinfo/python-list

I don't think so. I predict that spam will affect those newsgroups
which have a mail gateway more than those that do not, unless the
gateway has a sophisticated spam filter. This one, however, we
can test.

Laura Creighton

25 Jan 2003, 16:17:17

Just FYI -- I am aware this is something that you are mostly doing for
fun, and don't want to spoil it for you -- but you mentioned that you
wanted to be 'more scientific' about this. This question -- do
I bother with this data? is it significant? -- is the meat of the
'is what I am doing science' question. I also want to check to make
sure that you are aware that 'I don't know the factors which contribute
to having X in set Y' or even 'Nobody knows' or 'It is impossible to
know' does not imply 'it will affect all sets equally'. Believing that
it does is bad science.

Have fun, thanks for the posts,
Laura Creighton

Aaron K. Johnson

25 Jan 2003, 17:22:03

In message <87znppx...@flibuste.net>, William wrote:

I gave William a new script. I found a bug in the old one.......

Aaron K. Johnson

25 Jan 2003, 17:35:04

Hello All,

I found a small bug which produced big errors in my last script......

I'm going to post the script in a separate thread. Anyone who wants can hack
it, perfect it, whatever, as long as you return it to the community.

Basically, my date filter wasn't working well, so it grabbed the whole server.
Now, the filtering of the date works, I believe. Here are the results, again
for December 2002:

Yeah! we beat Perl! (and are in third place)

Best,
Aaron.

comp.lang.java 2188
comp.lang.c++ 1349
comp.lang.python 919
comp.lang.perl 894
comp.lang.javascript 843
comp.lang.c 768
comp.lang.basic 726
comp.lang.php 548
comp.lang.clipper 451
comp.lang.tcl 405
comp.lang.ruby 398
comp.lang.labview 390
comp.lang.clarion 373
comp.lang.pascal 336
comp.lang.lisp 296
comp.lang.fortran 261
comp.lang.ada 210
comp.lang.smalltalk 177
comp.lang.cobol 176
comp.lang.asm 169
comp.lang.scheme 156
comp.lang.vhdl 153
comp.lang.postscript 139
comp.lang.prolog 115
comp.lang.idl-pvwave 111
comp.lang.forth 111
comp.lang.verilog 95
comp.lang.functional 93
comp.lang.awk 89
comp.lang.vrml 84
comp.lang.apl 81
comp.lang.mumps 63
comp.lang.rexx 60
comp.lang.eiffel 59
comp.lang.objective-c 48
comp.lang.misc 48
comp.lang.asm370 40
comp.lang.pl1 31
comp.lang.ml 29
comp.lang.logo 28
comp.lang.modula3 20
comp.lang.oberon 16
comp.lang.dylan 11
comp.lang.modula2 9
comp.lang.icon 8
comp.lang.pop 7
comp.lang.idl 4
comp.lang.prograph 2
comp.lang.limbo 2
comp.lang.clu 2
comp.lang.clos 2
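
The date-filter bug mentioned at the top of this post is not shown anywhere in the thread, so the following is just one way such a filter might be written, offered as a sketch. It assumes each article's `Date:` header is available as a string and uses the standard library's `email.utils.parsedate_to_datetime`; the helper name `in_month` is made up for illustration. The month is judged in the poster's own time zone, which is a simplifying assumption.

```python
from email.utils import parsedate_to_datetime


def in_month(date_header, year, month):
    """True if an article's Date header falls in the given year and month."""
    try:
        posted = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable dates are skipped rather than silently counted.
        return False
    return posted.year == year and posted.month == month


headers = [
    "Tue, 31 Dec 2002 23:59:00 -0600",
    "Sat, 25 Jan 2003 16:35:04 -0600",
    "not a date at all",
]
december = [h for h in headers if in_month(h, 2002, 12)]
print(december)
```

A filter that returns True for everything (or is never applied) would "grab the whole server", which is consistent with the first set of numbers being roughly double the corrected ones.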

Lulu of the Lotus-Eaters

25 Jan 2003, 14:32:25

Peter Hansen <pe...@engcorp.com> wrote previously:

|I'm also intrigued by the labview and clipper numbers, which are
|as I understand the only two purely proprietary languages listed.

Actually, no. Clipper (a very fine language I used for many years) has
a Free Software implementation called Harbour. It's funny that I just
mentioned this on another thread about FoxPro, even though I haven't
thought about it for a year or more. See:

http://www.harbour-project.org/

I have not read c.l.clipper recently. But when I last left it a fair
percentage of discussion was about Harbour, not only about Computer
Associates' proprietary product. Of course, there are also many free
and proprietary libraries, applications, and so on that get discussed
over there.

Yours, Lulu...

--
---[ to our friends at TLAs (spread the word) ]--------------------------
Echelon North Korea Nazi cracking spy smuggle Columbia fissionable Stego
White Water strategic Clinton Delta Force militia TEMPEST Libya Mossad
---[ Postmodern Enterprises <me...@gnosis.cx> ]--------------------------

Aaron K. Johnson

25 Jan 2003, 18:04:13

In message <mailman.104352994...@python.org>, Laura Creighton
wrote:

uh-oh. the science police! ;)

yes, Laura, I'm aware that there are too many unknowns to make this hardcore.
I'm just having fun, solving problems w/Python, and enjoying the fame that it's
generating among all five of you who wrote back.

Plus, I'm enjoying seeing python come out above perl and javascript (after I'd
eliminated a bug!).

Cheers,
Aaron.

Erik Max Francis

25 Jan 2003, 20:40:33

Skip Montanaro wrote:

> One might argue that more active discussion groups need more complex
> hierarchies to properly group the messages.

Sure, but quantifying that need with any artificial boosting that occurs
is not easy.

> Can you avoid the cross-post problem by simply counting unique
> message-ids?

Sure, although that wouldn't detect multiposts. Multiposts _between_
language groups are probably unlikely, though multiposts between
different subgroups of a hierarchy are probably more so.
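
To make the crosspost/multipost distinction concrete, here is a small sketch that counts articles by unique Message-ID. The input format, a list of (newsgroup, Message-ID) pairs as a reader might see them group by group, and the IDs themselves are made up for illustration.

```python
def count_unique_articles(articles):
    """Count distinct Message-IDs across all groups.

    articles: iterable of (newsgroup, message_id) pairs (hypothetical input).
    """
    return len({message_id for _group, message_id in articles})


seen = [
    ("comp.lang.python", "<1@example.net>"),
    ("comp.lang.perl", "<1@example.net>"),             # crosspost: one ID, two groups
    ("comp.lang.java.help", "<2@example.net>"),
    ("comp.lang.java.programmer", "<3@example.net>"),  # multipost of <2>: new ID
]
total = count_unique_articles(seen)
print(total)
```

The crossposted article collapses to a single entry, while the multipost still counts twice because each copy was posted separately and carries its own Message-ID.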

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE

/ \ Walk a mile in my shoes / And you'd be crazy too
\__/ Tupac Shakur
Crank Dot Net / http://www.crank.net/
Cranks, crackpots, kooks, & loons on the Net.

Erik Max Francis

25 Jan 2003, 20:42:02

"Aaron K. Johnson" wrote:

> Spam....hmmm.....don't know what kind of mechanism I could set up that
> would
> filter that easily, or without me blowing the script size to larger
> than I
> would want it to be....if you want to take over, I'd pass you the
> code!

Well, naturally. I wasn't saying you should have taken spam into
account, but it is one of the many complicating factors that makes
turning the numbers you've collected into useful data that _measure_
something substantial about the language difficult.

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE

Erik Max Francis

25 Jan 2003, 20:47:14

Peter Hansen wrote:

> Spam is probably a problem best ignored. It would probably
> affect all those groups equally anyway.

Actually, that's one of the problems with his collapsing hierarchies
into a single number. To first order, spammers would probably post to
every comp.* group with the same frequency. So if a hierarchy contains
six groups, the raw numbers will likely be overcounting spam by
approximately a factor of six, as compared to a solitary newsgroup.

To second order, there's probably an additional effect of newsgroups
with names that sort lexicographically early getting more spam, since
more spammers do their spams sequentially, and those that get forcibly
stopped will be less likely to hit comp.lang.z than comp.lang.a.

Erik Max Francis

25 Jan 2003, 20:50:15

"Aaron K. Johnson" wrote:

> I agree. Plus, I'm not interested in working THAT hard to be that
> anal-retentive about data which some would argue is still vague enough
> to be
> discounted.

Well, it depends on what you think the data mean. What you're measuring
is the number of unique posters per hierarchy over some period of time.
To first order, your figures are probably good for that (provided you're
doing it right, etc.). Other complicating factors such as spam will
throw a wrench into the validity of the measurable.

But now taking that measurable (unique posters per hierarchy per unit
time) and trying to apply it to something more general and far more
indirect (like the popularity of a language) is a bi-ig step. There is
surely a _correlation_, but how strong that correlation is and what goes
into it is extremely hard to judge.

Bengt Richter

25 Jan 2003, 21:29:18

On Sat, 25 Jan 2003 16:35:04 -0600, Aaron K. Johnson <akjm...@yahoo.com> wrote:

>Hello All,
>
>I found a small bug which produced big errors in my last script......

Are you _sure_ you are not a participant factor in the evolution of
bugs (genus product testing), where those bugs producing the results
you want to see are favored in the "natural selection" process? ;-)

OTOH, I'm sure that in any case your numbers can be part of a
scientifically formulated nutritious breakfast ;-)

>
>I'm going to post the script in a separate thread. Anyone who wants can hack
>it, perfect it, whatever, as long as you return it to the community.
>
>Basically, my date filter wasn't working well, so it grabbed the whole server.
>Now, the filtering of the date works, I believe. Here are the results, again
>for December 2002:
>
>Yeah! we beat Perl! (and are in third place)
>

;-)

Regards,
Bengt Richter

Peter Hansen

25 Jan 2003, 22:14:00

Erik Max Francis wrote:
>
> Peter Hansen wrote:
>
> > Spam is probably a problem best ignored. It would probably
> > affect all those groups equally anyway.
>
> Actually, that's one of the problems with his collapsing hierarchies
> into a single number. To first order, spammers would probably post to
> every comp.* group with the same frequency. So if a hierarchy contains
> six groups, the raw numbers will likely be overcounting spam by
> approximately a factor of six, as compared to a solitary newsgroup.

I would think that counting only unique posters would eliminate a lot
of this effect, as the same poster would be sending to each newsgroup.
Yes, many use random addresses... but don't they still send in bulk?

> To second order, there's probably an additional effect of newsgroups
> with names that sort lexicographically early getting more spam, since
> more spammers do their spams sequentially, and those that get forcibly
> stopped will be less likely to hit comp.lang.z than comp.lang.a.

I strongly doubt anyone gets stopped fast enough to prevent their
spamming one comp.lang group shortly after they've done another one.

In the end, my comment should really be taken as "spam is a small
enough issue, in my experience, to be ignored in the results as
mere noise". I readily admit my experience is limited to c.l.p
and several other groups *not* in the c.l. hierarchy, so maybe
some of those other groups get *much* more spam than c.l.p, but
I sort of doubt it. Maybe someone will take the time to calculate
actual numbers to prove or disprove this point. I wouldn't bother
though.

-Peter

Aaron K. Johnson

26 Jan 2003, 1:14:11

In message <3E333ED7...@alcyone.com>, Erik Max Francis wrote:
> "Aaron K. Johnson" wrote:
>
> > I agree. Plus, I'm not interested in working THAT hard to be that
> > anal-retentive about data which some would argue is still vague enough
> > to be
> > discounted.
>
> Well, it depends on what you think the data mean. What you're measuring
> is the number of unique posters per hierarchy over some period of time.
> To first order, your figures are probably good for that (provided you're
> doing it right, etc.). Other complicating factors such as spam will
> throw a wrench into the validity of the measurable.
>
> But now taking that measurable (unique posters per hierarchy per unit
> time) and trying to apply it to something more general and far more
> indirect (like the popularity of a language) is a bi-ig step. There is
> surely a _correlation_, but how strong that correlation is and what goes
> into it is extremely hard to judge.

My research indicates that the number of python users is precisely, to the last
decimal point, proportional to the usenet volume found by my script.

I'm right, and you're wrong for questioning me. So there (sticking tongue out).

-Aaron.

Erik Max Francis

26 Jan 2003, 3:10:17

"Aaron K. Johnson" wrote:

> My research indicates that the number of python users is precisely, to
> the last
> decimal point, proportional to the usenet volume found by my script.

Oh, there's no doubt that for this moment in time, with that exact
figure you just posted, there is some constant of proportionality such
that that's true. The question is whether or not that constant of
proportionality is really constant, and, more importantly, whether it's
the same constant of proportionality for other newsgroups and their
languages :-).

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE

/ \ Forever we / Infinitely
\__/ Sandra St. Victor
Maths reference / http://www.alcyone.com/max/reference/maths/
A mathematics reference.

John Roth

26 Jan 2003, 8:15:22

"Peter Hansen" <pe...@engcorp.com> wrote in message
news:3E335278...@engcorp.com...

I've been amused by this subthread, since I've almost never seen
spam in any of the comp.* groups I frequent. Maybe this has to
do with my using a paid service that does an excellent job of
de-spamming their newsfeed.

If someone wants to run the script against, say, Supernews,
I doubt if the numbers would be significantly different. But maybe
they would be.

John Roth

Nick Arnett

26 Jan 2003, 11:46:26

> -----Original Message-----
> From: python-l...@python.org
> [mailto:python-l...@python.org]On Behalf Of Peter Hansen

...

> Thanks Aaron. I'm forced to admit that these numbers *appear* to
> correspond to my purely subjective feeling as to the relative
> popularity, in a very vague way, of these languages. It will
> be interesting - if you can finish refining the script and then
> "lock it down" - to compare the results over time.

I've been working on a much more comprehensive analysis of Java and Dotnet
developer activity, covering close to 5,000 sources (Usenet groups, mailing
lists, web forums -- a source is one group, list, forum, etc.). I just
recently threw Python into the mix, mainly because it's what I'm using to
gather the data and do much of the analysis. This amounts to more than
5,000 messages a day. It isn't just venues for supporting the language; it
includes open-source projects being created with Java and Python (by
definition, there aren't many true open source projects done with Dotnet,
except for Mono and things like that).

Another quick way to get a sense of relative momentum is to look at
Sourceforge's "software map:"
http://sourceforge.net/softwaremap/trove_list.php?form_cat=160 and then
drill down to see the activity levels for the top projects for each
platform. For example, the VB projects' activity levels drop off much
faster than the Python projects. And you could keep digging deeper just at
Sourceforge, measuring what's really going on in each area.

I'm developing a number of metrics out of this data, some of which I'll be
making public. But this toolkit is mostly for me to use in providing
intelligence (but not, not, NOT e-mail addresses for spamming!) to my
company's clients.

O'Reilly & Associates has been doing this sort of thing for quite a while,
to forecast demand for books about open source software, in particular. I
did some brainstorming with them a few years ago and later started Opion,
which applied this kind of analysis to stock market discussions, feature
films and other topics. That's now owned by Intelliseek, which mostly does
consumer market research.

One thing that became clear early when I built the Opion prototype was that
unique participants is far and away the most meaningful basic statistic --
far more than number of posts. That was in stock market discussions, but I
haven't seen any reason to believe there's a difference elsewhere.

At Opion and in my current work, I put a big emphasis on identifying the
most influential participants in the discussions through traffic analysis,
link analysis, etc., and giving higher weight to their activities. Spammers
end up with very low weights because they almost never trigger a meaningful
response from the community... or they cross-post so widely, with no depth
to the resulting discussions, that they're easily identified. I also use
various anti-spam mechanisms on my server. Happily, it's not much of an
issue on the mailing lists, where a lot of the action is.

If any folks here are seriously interested in this area, we may need some
technical help soon, preferably in the South Bay area, where I am.

Nick

Nick Arnett

26 Jan 2003, 12:09:42

Nothing from my current work is up yet... and probably won't be for a week
or two, at least.

--
Nick Arnett
Phone/fax: (408) 904-7198
nar...@mccmedia.com

> -----Original Message-----
> From: holger krekel [mailto:py...@devel.trillke.net]
> Sent: Sunday, January 26, 2003 9:01 AM

...

> Wow, that sounds interesting. Do you have any URL to look at, or
> is this all closed?
>
> I would really enjoy statistics about various aspects of
> free software projects without any "noise" from certain
> big companies.
>
> holger

holger krekel

26 Jan 2003, 12:01:25

Hi Nick,

> I've been working on a much more comprehensive analysis of Java and Dotnet
> developer activity, covering close to 5,000 sources (Usenet groups, mailing
> lists, web forums -- a source is one group, list, forum, etc.). I just
> recently threw Python into the mix, mainly because it's what I'm using to
> gather the data and do much of the analysis. This amounts to more than
> 5,000 messages a day. It isn't just venues for supporting the language; it
> includes open-source projects being created with Java and Python (by
> definition, there aren't many true open source projects done with Dotnet,
> except for Mono and things like that).

> [... interesting stuff ...]