How NOT to research Free Software developers

Enrico Zini

Aug 21, 2008, 11:39:45 AM

to massie...@googlegroups.com

Hello,

this thread is quite emblematic: http://lists.debian.org/debian-newmaint/2008/08/msg00010.html

It would probably make sense to have some sort of rough guideline for
people who try to do such research, including links to datasets that
have already been collected (I'm thinking of FLOSSmole). That way one can
refer such impromptu researchers to the guide and to the existing
literature and data in a single shot.

Does something like that already exist?

Ciao,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enr...@debian.org>

Israel Herraiz

Aug 21, 2008, 12:11:42 PM

to massie...@googlegroups.com

Excerpts from Enrico's message on Aug 21, 2008 about 5 PM:

> Does something like that already exist?

As far as I know, it does not.

I have been thinking about where Massiel should lead in the future,
and I think that it should be something like what you suggest.

For instance, if I write a paper where I use some public data sources
regarding free / open source software, I could write a little howto as
well, to explain how to reproduce the results shown in the paper. That
could be done in our wiki. Of course, all the tools and data sources
must be public (or even free software in the case of the tools). We
could even encourage the original author to share a copy of the paper,
since papers are usually available only by subscription (although the
copyright agreements usually allow the original authors to make a copy
available on their websites).

That wiki of "howtos" could also contain a newbies howto, about where
to look and, more importantly, how to do research (to avoid situations
like the one that you point out).

Cheers,
Israel

Ross Gardler

Aug 22, 2008, 5:39:22 PM

to massie...@googlegroups.com

Israel Herraiz wrote:
> Excerpts from Enrico's message on Aug 21, 2008 about 5 PM:
>> Does something like that already exist?
>
> As far as I know, it does not.
>
> I have been thinking about where Massiel should lead in the future,
> and I think that it should be something like what you suggest.

I support this and can report that there has been much discussion, but
little action, around this idea on the foundations mailing list (a list
for collaboration across the various open source foundations).

I'd be happy to champion this over there when Massiel has something
useful to point at.

> For instance, if I write a paper where I use some public data sources
> regarding free / open source software, I could write a little howto as
> well, to explain how to reproduce the results shown in the paper. That
> could be done in our wiki. Of course, all the tools and data sources
> must be public (or even free software in the case of the tools). We
> could even encourage the original author to share a copy of the paper,
> since papers are usually available only by subscription (although the
> copyright agreements usually allow the original authors to make a copy
> available on their websites).

I really like this and again, I would be happy to champion it. If we can
create the kind of community peer recognition here that academics crave
we might even encourage people to do this.

Rather than wiki pages, though, why not RDF descriptions? This way we can
use Simal [1] to provide a browsable project/people directory.

> That wiki of "howtos" could also contain a newbies howto, about where
> to look and, more importantly, how to do research (to avoid situations
> like the one that you point out).

I agree this kind of "newbies howto" does fall into the documentation
category and could sit here in the wiki.

Ross

[1] http://simal.googlecode.com

Andrea Wiggins

Aug 22, 2008, 7:54:27 PM

to massie...@googlegroups.com, Kevin Crowston

We have been working on building an infrastructure that could serve as
a FLOSS research portal, using an ePrints repository that can take
deposits of files or just metadata records that serve as pointers to
other locations; it also supports versioning, commenting, and provides
citation information. Each depositor can set license details as
desired, and for items that are under copyright, a repository record
without a deposit of the actual copyrighted content can still provide
a useful service. A working paper version of an academic paper can
often be deposited without violating publisher copyright.

It's still in development, but you can see what we've put together so
far in the FLOSS working papers repository [1]. Despite the (ungainly
and unrepresentative) title of the moment, there is actually more than
just working papers here - there are also metadata records for all the
FLOSSmole flat files that are hosted on SourceForge (try the Browse by
Subjects view, where there's one category with 1175 records). The
repository allows us to make records for, or take deposits of, pretty
much any digital file or linked resource, such as scripts, analysis
results, a "newbies howto" wiki, etc.

I expect we'll be announcing this more broadly at the upcoming OSS
conference in Milan, but any early feedback on what would make the
repository more useful is welcome. I've been attending to some of the
development tasks; at the moment we have plans to create more
repository documentation and some improved information retrieval views.

Cheers,

Andrea

[1] http://wp.floss.syr.edu

Ross Gardler

Aug 22, 2008, 8:38:54 PM

to massie...@googlegroups.com

Andrea Wiggins wrote:
> We have been working on building an infrastructure that could serve as
> a FLOSS research portal, using an ePrints repository that can take
> deposits of files or just metadata records that serve as pointers to
> other locations;

...

> It's still in development, but you can see what we've put together so
> far in the FLOSS working papers repository [1].

...

This looks great and it would be silly of this project to duplicate your
work.

However, I'm not a big fan of repository approaches - everyone wants to
put their data in their own preferred repository and there is no
incentive to keep the myriad of other repositories up to date.

It is for this reason that we (OSS Watch) started to develop an RDF
registry of project outputs. The idea is that we get projects to manage
a single meta-data file and we point Simal at this file. By using
standard RDF schemas for capturing the data we maximise the sources of
data available to us and the projects can update multiple
registries/repositories/search engines etc. simply by updating their one
local file.
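
As a rough illustration of the single-file idea, here is a minimal Python
sketch of how a registry might read one project description file. It is
not Simal's actual code. It assumes the file uses the DOAP vocabulary as
one example of a standard RDF schema, and the sample project, the
example.org URLs, and the read_projects helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of the RDF and DOAP vocabularies.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"

# Hypothetical project description, maintained by the project itself.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project rdf:about="http://example.org/projects/demo">
    <doap:name>Demo Project</doap:name>
    <doap:homepage rdf:resource="http://example.org/demo/"/>
    <doap:shortdesc>A hypothetical project used for illustration.</doap:shortdesc>
  </doap:Project>
</rdf:RDF>
"""


def read_projects(rdf_xml):
    """Return one dict per doap:Project element found in an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    projects = []
    for node in root.iter("{%s}Project" % DOAP_NS):
        homepage = node.find("{%s}homepage" % DOAP_NS)
        projects.append({
            # rdf:about carries the URI that identifies the project.
            "uri": node.get("{%s}about" % RDF_NS),
            "name": node.findtext("{%s}name" % DOAP_NS),
            # The homepage is a resource reference, not element text.
            "homepage": (homepage.get("{%s}resource" % RDF_NS)
                         if homepage is not None else None),
        })
    return projects


if __name__ == "__main__":
    for project in read_projects(SAMPLE):
        print(project["name"], project["homepage"])
```

Any registry that reads the same file would pick up a change the next time
it fetches it, which is the point of keeping one local file. A real
registry would more likely use a full RDF parser than plain XML handling,
since RDF/XML allows several equivalent serialisations of the same data.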

An added benefit is that we get some pretty powerful search facilities
since the data is RDF.

I'm not familiar with ePrints under the hood. What can it
import/export in terms of RDF data about its contents? I'm thinking
that linking your future repository to the data we are collecting in
Simal will provide a truly useful resource.

Ross

Andrea Wiggins

Aug 25, 2008, 9:57:12 PM

to massie...@googlegroups.com

On Aug 22, 2008, at 8:38 PM, Ross Gardler wrote:

> This looks great and it would be silly of this project to duplicate
> your
> work.
>
> However, I'm not a big fan of repository approaches - everyone wants
> to
> put their data in their own preferred repository and there is no
> incentive to keep the myriad of other repositories up to date.

Yes, this is a fundamental problem; depositors have to be incentivized
to deposit, which means they have to see some value in the effort...

> It is for this reason that we (OSS Watch) started to develop an RDF
> registry of project outputs. The idea is that we get projects to
> manage
> a single meta-data file and we point Simal at this file. By using
> standard RDF schemas for capturing the data we maximise the sources of
> data available to us and the projects can update multiple
> registries/repositories/search engines etc. simply by updating their
> one
> local file.
>
> An added benefit is that we get some pretty powerful search facilities
> since the data is RDF.
>
> I'm not familiar with ePrints under the hood. What can it
> import/export in terms of RDF data about its contents? I'm thinking
> that linking your future repository to the data we are collecting in
> Simal will provide a truly useful resource.

Yes, we're increasingly interested in using RDF as well, though at the
moment more in connection with FLOSSmole. As far as I have found,
ePrints doesn't currently have a plugin for RDF import/export, though
one seems to have been requested before [1]. There is a FOAF export
format, but apparently only for user records. XML seems to be the
dominant export format, and plugins are available for a number of
other formats [2]. It might be possible to use D2RQ mapping files [3]
to expose the data, but I don't know enough about the technologies to
do more than speculate.

Cheers,

Andrea

[1] http://trac.eprints.org/trac/ticket/879
[2] http://wiki.eprints.org/w/Perl_lib/EPrints/Plugin/Export/
[3] http://www4.wiwiss.fu-berlin.de/bizer/d2rq/index.htm
