Sent by Israel Herraiz to the OSS Watch mailing list (07/28/2008 10:24
PM):
Excerpts from Ross's message on Jul 28, 2008 about 9 PM:
> > The message I linked to was the culmination of all threads and resulted
> > in a commit to SVN of a summary of the discussion. An extremely useful
> > exercise as it marked out the boundaries of the project in the
> > early days.
Ok. I was trying (without much success :-) to show that if there
are patterns in the data, like everyone else shutting up when one guy
shouts, we can find them. But if the pattern is not clear, or it is
messed up (for instance, by the same guy writing many times in the same
thread), maybe the method that I propose is not valid.
> > Your methodology as I understand it above would miss this vital
> > documentation activity.
Well, yes. Actually, I do not know if I mentioned it in my
presentation, but all the data that I have shown were obtained
considering only code commits. Therefore, documentation activity
(unless it is done in the source code, as comments and the like) is
not included in the data.
> > So, how do you tell the difference between a firefighting post and a
> > conclusion post?
Well, maybe I have not chosen the right terms :-) . By firefighting I
meant someone who always stops (concludes, finishes, etc.) a
thread (understand *always* in a statistical sense).
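
As a rough illustration of "always stops a thread, in a statistical sense", the sketch below computes, for each author, the fraction of the threads they took part in where they wrote the last message. It is not the method from the thesis or the paper; the function name `closing_rates`, the `min_threads` cutoff and the sample data are all assumptions made up for the example.

```python
from collections import Counter


def closing_rates(threads, min_threads=3):
    """Fraction of threads each author closed, among threads they joined.

    threads: list of threads, each a list of author names in posting order.
    min_threads: hypothetical cutoff; authors seen in fewer threads are
    skipped because their rate would not be statistically meaningful.
    """
    participated = Counter()
    closed = Counter()
    for authors in threads:
        if not authors:
            continue
        # Count each author once per thread, even if they post many times.
        for author in set(authors):
            participated[author] += 1
        # The author of the last message "stopped" the thread.
        closed[authors[-1]] += 1
    return {
        author: closed[author] / count
        for author, count in participated.items()
        if count >= min_threads
    }


# Made-up sample data: four threads, three authors.
threads = [
    ["ana", "bob", "carl"],
    ["bob", "ana", "carl"],
    ["ana", "carl"],
    ["bob", "ana"],
]
rates = closing_rates(threads, min_threads=3)
```

Here `carl` closes every thread he joins, so his rate is 1.0. A rate like this cannot tell a conclusion post from a post that simply kills the discussion, which is exactly the false-positive problem raised in this thread.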
> > How do you tell the difference between a troll and a motivator?
Good point. I am now "preparing the case" for the paper that I
told you about, and trolls are one of the false positives of the
methodology. I have not yet figured out how to deal with them.
> > In a mature community this should be fairly easy as they won't feed the
> > trolls. But how do you know it is a mature community and how do you
> > adapt your model to accommodate different types of troll handling within
> > the community?
Well, I think that those methods are not valid for communities that
are still in early stages of the project. After all, you need enough
historical information to find out whether or not there has been a
generational relay.
> > I'll have to read your thesis but I really can't support that in my
> > (anecdotal) experience of FOSS software development within the ASF
> > (which does not mean *all* FOSS development).
I don't know if I have mentioned it, but my thesis [1] is available
under a CC Attribution ShareAlike license. Please bear in mind that it
is still under review, and some things still have a lot of room for
improvement (in particular, I am desperately looking for an English
editor; spread the word if you know someone willing to go through
the painful experience of reading my thesis).
> > I wonder if this
> > finding is another manifestation of the "volunteers" misunderstanding.
> > It is my experience that everything done is related to past decisions.
> > It is for this reason that I maintain that a project memory is critical.
> > This allows the project to remember and learn from its past
> > mistakes.
I will explain it further. First of all, of course you might always
find particular cases that do not fit under that finding. The
conclusion is that "for software evolution analysis, you can handle
the history of the project as weather forecasting handles
weather. Recent events have the most influence in the current history
of the project. You have to take into account the stage your
project is in. It is like weather forecasting. If it is summer and today
is sunny, it is likely that tomorrow will be sunny. If it is summer,
it is quite unlikely that tomorrow will snow. And so on...."
But that is a statistical result. From a sample of 3821 projects,
about 80% were driven by short-memory dynamics (with a
memory of < 1 week). But some projects had very long memories.
And that is all. It is just a statistical property. Maybe it is just
nonsense bullshit. I don't know. At least, so far, it is useful for
building predictive models, because it tells you not to bother
considering very old events if your project is driven by short-memory
dynamics. In particular, it was useful for me to win the
MSR challenge 2007, about predicting the evolution of Eclipse: the
model had a memory of only 3 days (!). And it was good enough to win.
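
To make the "short memory" idea concrete, here is a minimal sketch of a forecast that looks only at the last few days of activity and ignores everything older. It is a plain moving average, not the model used in the MSR challenge; the names `forecast_next` and `walk_forward_error`, the default `memory=3` and the sample numbers are assumptions chosen for illustration.

```python
def forecast_next(daily_commits, memory=3):
    """Predict the next day's commit count from the last `memory` days only."""
    if memory < 1 or not daily_commits:
        raise ValueError("need memory >= 1 and at least one observation")
    recent = daily_commits[-memory:]
    return sum(recent) / len(recent)


def walk_forward_error(daily_commits, memory=3):
    """Mean absolute error when each day is predicted from the days before it."""
    errors = []
    for day in range(memory, len(daily_commits)):
        predicted = forecast_next(daily_commits[:day], memory)
        errors.append(abs(predicted - daily_commits[day]))
    return sum(errors) / len(errors) if errors else 0.0


# Made-up sample data: commits per day.
history = [1, 2, 3, 4, 5, 6]
prediction = forecast_next(history, memory=3)   # mean of 4, 5, 6
error = walk_forward_error(history, memory=3)
```

Comparing `walk_forward_error` for several values of `memory` on a real series would be one simple way to check whether older events add anything for a given project.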
> > For evidence of this you only need to look at the recent thread on
> > Forrest in which a decision made around three years ago raised its head
> > again. This is something that surfaces on a fairly regular pattern: see
> > [1]
Good to know. I am probably going to use Forrest as a case for the
"undercover developer" paper, and all those cases that you are
highlighting are very much appreciated :-) .
The other project I am going to use is Libresoft :-) . In our group,
Jesus will not appear as very important in the repositories. He does
not write that much on the lists, and makes commits only from time to
time. But he has by far the most important role in the group, and
fosters the participation and work of the rest of the group.
> > See above, this was not the killing of an endless thread - this was the
> > documentation of a useful thread. We need to be very careful about
> > making assumptions about the kinds of patterns we will find.
Yes. I did not choose the proper terms. By "endless threads" I
meant very long threads, or peaks of activity (actually, not all the
activity has to be in the same thread).
> > I don't know what it means so can make no comment.
Sorry. It is difficult for me to explain that in English (it is even
difficult in Spanish). I meant that instead of using the whole history
of the motivator, we could just use portions, and analyze if there is
any pattern in those portions.
> > However, since I
> > disagree with the basic premise of your work I suspect it's better
> > for you to just proceed and see if it shows the results you expect.
I completely agree. Have a look at the quote by Richard Feynman that I
have included in one of the first pages of my thesis. We can have
endless threads ;-) about any theory of how software is developed,
but the numbers will tell us whether we are right or not (and without
disturbing the rest of the list with long discussions ;-) .
> > Good - you can trust me to be a pain in the ass ;-)
I hope that does not involve any kind of sexual harassment (sorry for
my bad jokes by the way ;-) .
Cheers,
Israel
PS: I guess we are breaking the record for the longest messages on
this list.