Dataverse and semantic data (RDF)


Natalia Queiroz de Oliveira

Apr 17, 2020, 7:24:10 PM
to Dataverse Users Community

Hi all, does anybody know of a way to add semantic data (RDF) to Dataverse?

Thanks,
Natalia 

Janet McDougall - Australian Data Archive

Apr 18, 2020, 6:49:41 AM
to Dataverse Users Community
Hi Natalia
How do you mean ‘add semantic data to Dataverse’?
Thanks
Janet

Natalia Queiroz de Oliveira

Apr 21, 2020, 11:23:23 AM
to dataverse...@googlegroups.com
Hi Janet. 
I mean: is there any module in Dataverse capable of converting the metadata stored in Dataverse into an RDF triple database (triple store / endpoint), allowing search and semantic interoperability via SPARQL,

or

does Dataverse have the ability to expose its metadata as an RDF/XML document, to allow the storage and manipulation of metadata in the form of RDF triples?

Thanks,
Natália






--

Best,
Natália Oliveira

James Myers

Apr 22, 2020, 2:52:32 PM
to dataverse...@googlegroups.com

Natália,

The current capabilities Dataverse has in this area are the association of metadata terms with external RDF URIs, and the OAI_ORE metadata export format, which provides a JSON-LD serialized description of the dataset with its user-editable metadata, along with its Datafiles and their metadata. That should be something you could import into an RDF store to support SPARQL queries, although I’m not aware of anyone doing that.
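
A minimal sketch of that import path (not from this thread), assuming Apache Jena on the classpath and Dataverse's standard metadata export API; the server URL and DOI are placeholders:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

public class OreToSparql {
    public static void main(String[] args) {
        // Dataverse's metadata export API; "OAI_ORE" is the exporter name.
        // The server and DOI below are placeholders for illustration only.
        String url = "https://demo.dataverse.org/api/datasets/export"
                + "?exporter=OAI_ORE&persistentId=doi:10.5072/FK2/EXAMPLE";

        // Parse the JSON-LD export into an in-memory RDF model.
        Model model = ModelFactory.createDefaultModel();
        RDFDataMgr.read(model, url, Lang.JSONLD);

        // Any SPARQL query now works against the dataset's metadata triples.
        String sparql = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 20";
        try (QueryExecution qe = QueryExecutionFactory.create(sparql, model)) {
            ResultSetFormatter.out(System.out, qe.execSelect());
        }
    }
}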

Dataverse has an internal API for adding new export formats, so other RDF serializations could be added as metadata export options if someone is willing/available to do the work.

Through support from the Research Data Alliance, we’re currently working to create an import capability that matches that export (along with the associated capability to export all of the files themselves in a BagIt bag), which will provide an API to add JSON-LD metadata to a Dataset. I expect that the initial capability will be limited to ingesting terms that are defined in Dataverse itself or in a metadata block (which you can define and add to your instance to add new terms). How to handle terms in a JSON-LD import that aren’t known to Dataverse, and how to handle semantic information about other, non-Dataset objects, are still TBD.

W.r.t. handling RDF files: Dataverse can accept any type of file, so one could upload RDF metadata as a Datafile. Dataverse wouldn’t parse or interpret the RDF in the file, but the external tools/previewers mechanism would allow you to associate RDF mimetypes with external tools that could interpret it, e.g. by showing the user a graphical representation (as a preview).

Dataverse also has a workflow mechanism that can call external applications when triggered by events such as dataset publication. For example, QDR is using that now to trigger the creation of a zipped BagIt archive file, with the OAI_ORE map file included, which is sent to cold storage in the Google cloud. This mechanism could also be used to trigger processing of uploaded RDF files and/or the OAI_ORE-formatted metadata export to, for example, populate a triple store upon dataset publication.
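
As a sketch of that triple-store idea (not an existing integration; the class and method names here are hypothetical, and Apache Jena is assumed): an external service invoked by such a workflow could fetch the OAI_ORE export and load it into a Fuseki-style SPARQL endpoint:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

public class PublishToTripleStore {
    // Hypothetical hook: called by an external workflow service when
    // Dataverse reports a dataset-publication event.
    public void onDatasetPublished(String siteUrl, String persistentId) {
        String exportUrl = siteUrl
                + "/api/datasets/export?exporter=OAI_ORE&persistentId=" + persistentId;

        // Read the dataset's JSON-LD export as RDF.
        Model model = ModelFactory.createDefaultModel();
        RDFDataMgr.read(model, exportUrl, Lang.JSONLD);

        // Placeholder Fuseki dataset URL; point this at your triple store.
        try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/dataverse")) {
            conn.load(model); // adds the triples to the store's default graph
        }
    }
}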

Hopefully these capabilities can help with whatever RDF integration you’re contemplating. If not, or if there are ways to provide more support within Dataverse to simplify such integrations, I’m sure many of us would be interested in learning what you think would be helpful. There is a broad range of discussions going on now around various potential extensions to Dataverse’s metadata capabilities, and we’re gathering use cases and design ideas for how to address these in a coordinated fashion. (Expect a metadata document that will summarize things, and a breakout session at the virtual Dataverse meeting in June involving presentations and community discussion.)

Cheers,

      -- Jim

Janet McDougall - Australian Data Archive

Apr 23, 2020, 1:03:10 AM
to Dataverse Users Community
Hi Jim and Natália

Thanks Jim, that's a really good overview. Natália, I'm interested in what structures you already have in place and how you would use RDF and semantic capabilities if they were available as you described. ADA has DDI study- and variable-level XML generated from Nesstar (pre-Dataverse data only). I have been wondering how this machine-readable variable metadata could be used, such as for comparing data, geographies, etc., in anticipation of variable-level metadata becoming available in Dataverse (via the Scholars Portal Data Curation Tool). I don't have any hands-on experience, but from what I do understand I'm very interested in this space.

Janet


Paul Boon

Apr 23, 2020, 10:46:45 AM
to dataverse...@googlegroups.com

Hi Jim,

When you say "Dataverse has an internal API for adding new export formats", I guess you mean that you would have to change the source code in order to add a new export format, and it’s not configurable?

 

Cheers,

Paul


James Myers

Apr 23, 2020, 10:53:07 AM
to dataverse...@googlegroups.com

Paul – yes – there’s an Exporter interface that defines the methods you need to implement to show up in the metadata exports list. The main one is exportDataset(DatasetVersion version, JsonObject json, OutputStream outputStream), so your class gets access to the DatasetVersion and the raw JSON metadata dump, and then has to write the output format you want to the stream. The code to cache the output, regenerate it for new versions, etc., is all generic and handled by Dataverse code.
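
To make that concrete, a skeletal exporter built around the method named above might look like the following. Only the exportDataset signature comes from this thread; the exception type, the two name methods, and the omitted remaining interface methods are recalled from the Dataverse source, so verify against the actual Exporter interface before using this:

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.json.JsonObject;

// Imports for Dataverse's own classes (Exporter, DatasetVersion,
// ExportException) are omitted; they live in the Dataverse code base.
public class TurtleExporter implements Exporter {

    @Override
    public void exportDataset(DatasetVersion version, JsonObject json, OutputStream outputStream)
            throws ExportException {
        try {
            // Trivial placeholder output: a real exporter would map the JSON
            // metadata dump (and/or the DatasetVersion) to triples.
            String pid = version.getDataset().getGlobalId().asString();
            String turtle = "<" + pid + "> <http://purl.org/dc/terms/title> \"placeholder\" .\n";
            outputStream.write(turtle.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new ExportException("Turtle export failed: " + e.getMessage());
        }
    }

    @Override
    public String getProviderName() { return "turtle"; } // key used in ?exporter=turtle

    @Override
    public String getDisplayName() { return "Turtle (RDF)"; } // label in the export menu

    // The remaining Exporter methods (isXMLFormat(), isHarvestable(), etc.)
    // are omitted here; the real interface requires them.
}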

 

-- Jim

Durand, Gustavo

Apr 23, 2020, 3:30:44 PM
to dataverse...@googlegroups.com
Export is one of the areas where we have implemented the SPI model. So, in theory (though it may still need some development on the core code), one should be able to develop an exporter that implements the export interface and deploy it as a separate jar.

