Extracting Data from JIRA to produce Baetle

SinDoc

Jul 16, 2007, 9:20:22 AM

to bae...@googlegroups.com

Henry,

Could you briefly outline the procedure for extracting data from a
JIRA database and converting it to XML?

The process can't stop there, though: we still need something like
[1] to convert that XML output into RDF, that is, into Baetle.

Thank you in advance,
SinDoc

[1] http://simile.mit.edu/repository/RDFizers/jira2rdf/

Henry Story

Jul 16, 2007, 10:56:43 AM

to bae...@googlegroups.com

Hi,

Do you have access to the database itself? If so, simply download
D2RQ and point it at the database. It will create a default mapping
file in which all the relations get localhost URLs or something
similar. You can then export the data as RDF with D2RQ; it's a simple
command-line call. The problem is that your data will not be in a
well-known format, since the default mapping invents its vocabulary
from the table and column names.

With that file you can already use D2R Server to browse the data on
localhost. You should be able to get to this point in ten minutes or
so; it's that easy.
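The steps described above could be scripted roughly as follows. This is a minimal sketch. It assumes the D2RQ tools (`generate-mapping`, `dump-rdf`, `d2r-server`) are on your PATH and that their option names match the D2RQ 0.8.x command-line tools. Other releases may differ, so check each tool's usage message first. The JDBC URL, database user and file names are hypothetical placeholders.

```python
import subprocess

# Hypothetical connection details: replace them with your own JIRA database.
JDBC_URL = "jdbc:mysql://localhost/jiradb"
DB_USER = "jira"


def d2rq_commands(jdbc_url, user, mapping="mapping.ttl", dump="dump.nt"):
    """Build the D2RQ command lines for generating, serving and dumping.

    Option names are assumed from the D2RQ 0.8 tools; add "-p <password>"
    to the generate step if your database requires one.
    """
    return {
        # Inspect the schema and write a default mapping file.
        "generate": ["generate-mapping", "-u", user, "-o", mapping, jdbc_url],
        # Serve the mapped data over HTTP so you can browse it locally.
        "serve": ["d2r-server", mapping],
        # Dump the whole database as RDF; the connection details are read
        # from the mapping file that generate-mapping wrote.
        "dump": ["dump-rdf", "-o", dump, mapping],
    }


def run(step, commands):
    """Run one step; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(commands[step], check=True)
```

For example, `run("generate", cmds)` followed by `run("dump", cmds)` produces a first RDF dump. `d2r-server` keeps running until you stop it, so it is better started by hand.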

You will then need to adapt the generated mapping file, reading the
D2RQ documentation as you go.

You can use this file as inspiration:

http://baetle.googlecode.com/svn/trunk/mappings/sesame/sesame-d2rqmap.n3

Now, if your database happens to have exactly the same schema as the
Sesame JIRA database
(see: http://baetle.googlecode.com/svn/trunk/mappings/sesame/sesame-jira-dump-censored.sql.bz2 ),
you will have a lot less work to do.

You will still need to adapt some URLs to your context. Here are some
things that occur to me:
- the URIs for the users of your JIRA instance will need to be
different; giving them openrdf.org URLs would be misleading
- the URIs for attachments will need to be different (the
attachments for your bugs won't be on openrdf.org)
- the URIs for bugs will be different (again, they won't be on
openrdf.org)
- the source code referred to by those bugs will also be located
somewhere else. This may be quite different if you have a CVS
repository.
- the group you are working with may have different ways of
classifying bugs. In any case, it is worth giving those
classifications local dereferenceable URLs.
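The cleaner way to change these URIs is to edit the URI patterns in the mapping file itself. As a swapped-in alternative, you could also rebase the URIs in an exported dump. This is a minimal sketch of that post-processing step. It assumes an N-Triples dump and uses hypothetical old and new base URIs, so substitute the prefix your copied mapping actually uses and your own host.

```python
import re

# Hypothetical bases: the prefix used by the copied mapping, and your own.
OLD_BASE = "http://www.openrdf.org/"
NEW_BASE = "http://bugs.example.org/"

# Subject, predicate, object, then the terminating "." of an N-Triples line.
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.$")


def rewrite_term(term, old, new):
    """Rebase an IRI term such as <http://old/x>; leave literals and blank nodes alone."""
    if term.startswith("<" + old):
        return "<" + new + term[len(old) + 1:]
    return term


def rewrite_line(line, old=OLD_BASE, new=NEW_BASE):
    """Rewrite one triple, re-emitting its terms separated by single spaces."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # blank lines and comments pass through unchanged
    match = TRIPLE.match(stripped)
    if not match:
        return line  # anything the simple pattern cannot parse is kept as-is
    s, p, o = (rewrite_term(t, old, new) for t in match.groups())
    return f"{s} {p} {o} ."


def rewrite_ntriples(text, old=OLD_BASE, new=NEW_BASE):
    """Apply rewrite_line to every line of an N-Triples document."""
    return "\n".join(rewrite_line(l, old, new) for l in text.splitlines())
```

Only whole IRI terms are rebased. An openrdf.org URL that appears inside a literal, such as a bug description, is left untouched.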

I always try to make my URIs dereferenceable, so that when you click
on them you get some real data, even if it is not in RDF.
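A quick way to check that your URIs are dereferenceable is to request each one and see whether the server answers. This is a minimal sketch using only Python's standard library. The Accept header values and the use of HEAD are my own choices, not something Baetle or D2R Server requires. Some servers also answer HEAD differently from GET.

```python
import http.client
import urllib.request

# Prefer RDF but accept HTML too ("some real data, even if not in RDF").
ACCEPT = "application/rdf+xml, text/turtle;q=0.9, text/html;q=0.5"


def make_request(uri):
    """Build a HEAD request that advertises the formats we prefer."""
    return urllib.request.Request(uri, headers={"Accept": ACCEPT}, method="HEAD")


def is_dereferenceable(uri, timeout=10):
    """Return True if the URI answers with a non-error HTTP status.

    urlopen follows redirects such as 303 See Other. It raises URLError
    (an OSError subclass) for 4xx/5xx responses, unreachable hosts and
    unknown URL schemes, and ValueError for malformed URLs.
    """
    try:
        with urllib.request.urlopen(make_request(uri), timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        return False
```

Running `is_dereferenceable` over every subject URI in a dump gives a rough list of links that still lead nowhere.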

TopBraid Composer has D2RQ built in, so you could get a lot done with
that tool if you know how to use it.

Let me know if this is helpful, and send me feedback on any issues
you come up against. A summary of your experience would be very
valuable.