License database searching/editing


Matthieu Riou

Aug 17, 2007, 7:21:26 PM
to discuss-a-rele...@googlegroups.com
Hi,

So I've started a bit on this license database thing by scanning Maven repositories to extract a good set of data to start from. Compared to all the junk published on maven.org it's a small subset (most projects don't give their license info) but it's a decent list to start from, hopefully a few hundred. When I'm done with that the next part is going to be giving the database an interface that people can use.

First, I think the easiest way is for this "database" to just be a big list of RDF files, each describing a project with its license(s). Those would just sit on a public server somewhere and that's it. On the client side, some Javascript would just get these and render the information. Now 2 problems arise and I've given them a bit more thought.

1. Searching

The idea here is just to allow simple search to find a given artifact (and therefore its license). Normally any search capability would require a small server running somewhere that would get the search request, go through some index and send back the content of the response or at least a link to it. That's a problem in our case because running that sort of thing on the ASF infrastructure is not simple.

Here I'm proposing to use the file index that most web servers return (when configured for it) when you point your browser to the root of a directory. The Javascript client would just do a GET, scrape the returned HTML to get the list of files and do the search in the browser. Then it's just about giving files a sensible name (like org.apache.axis2.axis2-kernel.1.2.rdf). That means a bit more bandwidth usage but the listing of all files is not that big: for about 600 files, the download is 80k (checked from http://repo1.maven.org/maven2 ).
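
As a rough illustration of the scrape-and-search idea, here is a minimal sketch. It is written in Python so it can run standalone, but the same logic would live in the browser-side Javascript. The sample listing, the second file name and the function names are all made up for the example, and real index pages differ from one web server to another, so the parsing would need adjusting.

```python
from html.parser import HTMLParser


class IndexLinkParser(HTMLParser):
    """Collects the href of every <a> tag that points at a .rdf file."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.endswith(".rdf"):
                self.files.append(value)


def list_descriptors(index_html):
    """Return the .rdf file names found in a directory index page."""
    parser = IndexLinkParser()
    parser.feed(index_html)
    parser.close()
    return parser.files


def search(files, term):
    """Case-insensitive substring search over the descriptor file names."""
    term = term.lower()
    return [name for name in files if term in name.lower()]


# Hypothetical index page; a real client would fetch this with a GET.
SAMPLE_INDEX = """
<html><body><h1>Index of /licenses</h1>
<a href="../">Parent Directory</a>
<a href="org.apache.axis2.axis2-kernel.1.2.rdf">org.apache.axis2.axis2-kernel.1.2.rdf</a>
<a href="com.example.demo-lib.0.1.rdf">com.example.demo-lib.0.1.rdf</a>
</body></html>
"""

if __name__ == "__main__":
    files = list_descriptors(SAMPLE_INDEX)
    print(search(files, "axis2"))
```

Since the whole listing is downloaded once, every later search is just a filter over the in-memory list of names, with no further requests until a matching descriptor is actually opened.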

2. Submitting the addition of a license

Here it's a bit more tricky because you need something that sits somewhere, can accept a POST and write to the filesystem. Could be python, ruby, java, whatever, but it's a piece of software sitting at the front, which for organizations like the ASF is a potential problem. There isn't much of a way around that, so the only solution I can find here is e-mail. People send addition requests to some e-mail address, following a provided template, and a small script reads those every night and generates the RDF descriptors from there. It can even be run manually once in a while for that matter.
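
To make the e-mail route a bit more concrete, here is a minimal sketch of what such a script could do with one message. Everything specific in it is an assumption for illustration: the template fields (Group, Artifact, Version, License-URL), the choice of the DOAP vocabulary for the descriptor, and the sample message. Nothing has been decided about the actual template or RDF schema.

```python
import email
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"  # assumed vocabulary, not settled
REQUIRED = ("group", "artifact", "version", "license-url")  # hypothetical template


def parse_request(raw_message):
    """Extract 'Key: value' template fields from a plain-text e-mail body."""
    msg = email.message_from_string(raw_message)
    if msg.is_multipart():
        raise ValueError("only plain-text requests are handled in this sketch")
    fields = {}
    for line in msg.get_payload().splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip().lower()] = value.strip()
    missing = [key for key in REQUIRED if key not in fields]
    if missing:
        raise ValueError("missing template fields: " + ", ".join(missing))
    return fields


def descriptor_filename(fields):
    """Build the file name following the group.artifact.version.rdf convention."""
    return f"{fields['group']}.{fields['artifact']}.{fields['version']}.rdf"


def to_rdf(fields):
    """Serialize the fields as a small RDF/XML document."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("doap", DOAP_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    project = ET.SubElement(root, f"{{{DOAP_NS}}}Project")
    name = ET.SubElement(project, f"{{{DOAP_NS}}}name")
    name.text = f"{fields['group']}:{fields['artifact']}:{fields['version']}"
    ET.SubElement(
        project,
        f"{{{DOAP_NS}}}license",
        {f"{{{RDF_NS}}}resource": fields["license-url"]},
    )
    return ET.tostring(root, encoding="unicode")


# Illustrative request only; sender and values are made up for the example.
SAMPLE_MESSAGE = (
    "From: someone@example.org\n"
    "Subject: license addition\n"
    "\n"
    "Group: org.apache.axis2\n"
    "Artifact: axis2-kernel\n"
    "Version: 1.2\n"
    "License-URL: http://www.apache.org/licenses/LICENSE-2.0\n"
)

if __name__ == "__main__":
    request = parse_request(SAMPLE_MESSAGE)
    print(descriptor_filename(request))
    print(to_rdf(request))
```

A nightly run would loop this over the mailbox and write each result to the public directory under the generated file name. Rejecting messages that miss a field keeps malformed requests out of the file listing.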

Any thoughts?

Matthieu

Robert Burrell Donkin

Sep 18, 2007, 4:32:13 PM
to discuss-a-rele...@googlegroups.com
On 8/18/07, Matthieu Riou <matthi...@gmail.com> wrote:
> Hi,
>
> So I've started a bit on this license database thing by scanning Maven
> repositories to extract a good set of data to start from. Compared to all
> the junk published on maven.org it's a small subset (most projects don't
> give their license info) but it's a decent list to start from, hopefully a
> few hundred. When I'm done with that the next part is going to be giving the
> database an interface that people can use.

cool

> First, I think the easiest way is for this "database" to just be a big list
> of RDF files describing the project with its license(s). Those would just
> sit on a public server somewhere and that's it.

+1

> On the client side, some
> Javascript would just get these and render the information. Now 2 problems
> arise and I've given them a bit more thought.
>
> 1. Searching
>
> The idea here is just to allow simple search to find a given artifact (and
> therefore its license). Normally any search capability would require
> a small server running somewhere that would get the search request, go
> through some index and get back the content of the response or at least a
> link to it. That's a problem in our case because running that sort of thing
> on the ASF infrastructure is not simple.

conventionally, yes. these days, a good javascript library would allow
all this to be done clientside

> Here I'm proposing to use the file index that most web servers return (when
> configured for it) when you point your browser to the root of a directory.
> The Javascript client would just do a GET, scrape the returned HTML to get
> the list of files and do the search in the browser. Then it's just about
> giving files a sensible name (like
> org.apache.axis2.axis2-kernel.1.2.rdf). That means a bit
> more bandwidth usage but the listing of all files is not that big, for about
> 600 files, the download is 80k (checked from http://repo1.maven.org/maven2
> ).

IIRC the HTTP collection extensions provide a standard framework but
the approach you propose is exactly the one that i would have
suggested for a start

> 2. Submitting the addition of a license
>
> Here it's a bit more tricky because you need something that sits somewhere,
> can accept a post and write to the filesystem. Could be python, ruby, java,
> whatever but it's a piece of software sitting at the front, which for
> organizations like the ASF, is a potential problem. There's nothing much to
> work around that so the only solution I can find here is e-mail. People send
> addition requests to some e-mail address, following a provided template, and
> a small script reads those every night and generates the RDF descriptors from
> there. It can even be run manually once in a while for that matter.

IMHO license submission is a little more tricky. i hope to post
something on this to the labs.

> Any thoughts?

very much in the same direction as i would have taken. let's continue
this in the lab.

- robert
