Ontology annotation, SPARQL, and data subset retrieval

Matthew Lange

Aug 7, 2020, 4:13:30 PM
to Dataverse Users Community
Hi, I'm part of a research group interested in building knowledge graphs about food, though I think this pertains to any domain. We are interested in querying Dataverse datasets and semi-automatically (with human curation) matching column headers, for example, with domain ontology terms in order to enable SPARQL queries and return subsets of data on the fly.
I am interested to hear from this group whether this is on the roadmap for Dataverse product dev, and if not what the procedures (and hooks) would be for making this happen in an external plugin-type product.
Also interested to know if this resonates with others in the community.
Kind regards,
~Matthew
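
A minimal sketch of the header-matching step described above, assuming a small in-memory dictionary of ontology labels. The `ONTOLOGY_TERMS` labels and their example.org IRIs are hypothetical placeholders, not terms from any real food ontology, and fuzzy string matching is only one possible way to propose candidates. The function only suggests terms; a curator would still accept or reject each one.

```python
import difflib

# Hypothetical ontology terms: label -> IRI (example.org placeholders, not real IRIs).
ONTOLOGY_TERMS = {
    "protein content": "http://example.org/food-onto/ProteinContent",
    "sample identifier": "http://example.org/food-onto/SampleIdentifier",
    "harvest date": "http://example.org/food-onto/HarvestDate",
}


def normalize(header: str) -> str:
    """Lower-case a column header and turn '_' and '-' separators into spaces."""
    return " ".join(header.replace("_", " ").replace("-", " ").lower().split())


def suggest_terms(headers, terms=ONTOLOGY_TERMS, cutoff=0.6, max_candidates=3):
    """Return {header: [(label, iri), ...]} of candidate terms for human review."""
    labels = list(terms)
    suggestions = {}
    for header in headers:
        matches = difflib.get_close_matches(
            normalize(header), labels, n=max_candidates, cutoff=cutoff
        )
        suggestions[header] = [(label, terms[label]) for label in matches]
    return suggestions


if __name__ == "__main__":
    proposed = suggest_terms(["Protein_Content", "sample-id", "notes"])
    for header, candidates in proposed.items():
        print(header, "->", candidates or "no suggestion; needs manual mapping")
```

Headers with no candidate above the cutoff come back with an empty list, which keeps the "needs manual mapping" cases visible to the curator instead of guessing.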

Brooke, Danny

Aug 7, 2020, 4:30:37 PM
to dataverse...@googlegroups.com
Hi Matthew, thanks for bringing this up. We do have a few Dataverse installations focused on food/agriculture and I'm hopeful we'll hear from them about similar use cases. 

As far as integration between Dataverse and external tools, we're always interested in extending APIs or adding new APIs to support interesting use cases. If there's some specific information that's not easily available from Dataverse currently (or not available in a format that's useful to you) feel free to open up an issue at https://github.com/IQSS/dataverse/issues and we can discuss. We can also discuss here in more detail before creating an issue. 

Are you aware of the DDI Variable Metadata XML file that's created as part of the ingest process? This information may be useful for you in addition to the data in the files themselves. More information is here: http://guides.dataverse.org/en/latest/api/dataaccess.html#data-variable-metadata-access. I'll also briefly mention our external tools framework, which allows external tools to be launched from the dataset and file levels: http://guides.dataverse.org/en/latest/admin/external-tools.html.

Hope this helps!

- Danny
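
As a rough illustration of how the DDI variable metadata mentioned above might be consumed, the sketch below builds the metadata URL for a file and pulls variable names and labels out of a DDI document. The URL pattern reflects my reading of the Data Access API guide linked above and should be checked against the installation's version; the `SAMPLE` document is a hypothetical, heavily simplified fragment, and its namespace is an assumption. The parser ignores namespaces so it does not depend on that detail.

```python
import xml.etree.ElementTree as ET


def ddi_metadata_url(base_url: str, file_id: int) -> str:
    """Build the variable-metadata URL for an ingested tabular file (assumed pattern)."""
    return f"{base_url.rstrip('/')}/api/access/datafile/{file_id}/metadata/ddi"


def local(tag: str) -> str:
    """Strip an XML namespace prefix, e.g. '{ns}var' -> 'var'."""
    return tag.rsplit("}", 1)[-1]


def variables_from_ddi(xml_text: str):
    """Return [(name, label)] for every <var> element in a DDI codebook."""
    root = ET.fromstring(xml_text)
    result = []
    for elem in root.iter():
        if local(elem.tag) != "var":
            continue
        # Use the first <labl> child as the human-readable label, if any.
        label = next((c.text or "" for c in elem if local(c.tag) == "labl"), "")
        result.append((elem.get("name"), label.strip()))
    return result


# Hypothetical, simplified DDI fragment for illustration only.
SAMPLE = """<codeBook xmlns="http://www.icpsr.umich.edu/DDI" version="2.0">
  <dataDscr>
    <var ID="v1" name="prot_g"><labl level="variable">Protein (g)</labl></var>
    <var ID="v2" name="site"><labl level="variable">Collection site</labl></var>
  </dataDscr>
</codeBook>"""

if __name__ == "__main__":
    print(ddi_metadata_url("https://demo.dataverse.org", 6))
    for name, label in variables_from_ddi(SAMPLE):
        print(name, "|", label)
```

The `(name, label)` pairs are the natural input for any step that proposes ontology terms for columns.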

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/9dbcd175-9466-4197-bcf7-e357ad5103ean%40googlegroups.com.

Chris Baker

Aug 7, 2020, 8:15:28 PM
to Dataverse Users Community
Hello Matthew, I have used SPARQL to query Ag / food data (about eggplants) provided as interoperable data services in real time with a query engine called HYDRA: http://ipsnp.com/hydra/

If there were a way to semantically annotate Dataverse datasets about food with ontologies, we could build a registry of services and have them queried with SPARQL.
Specifically, we deploy services in a registry that expose very specific subsets, such as GetCropDiseasebyGeoLocation. A registry can contain thousands of subsets as services, and HYDRA is able to discover and combine services into workflows defined by the user's SPARQL query.

Matthew, I can point you to our previous work on this from 2018: "Decision Support for Agricultural Consultants With Semantic Data Federation".
Has anyone tried to expose Dataverse subsets as discoverable services? This approach fits well with FAIR principles and would extend from dataset discovery to querying inside datasets.
I'd be happy to talk with anyone about this, whether it's food-specific or not.

Cheers
Chris
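
To make the idea of semantically annotated subsets a little more concrete, here is a small sketch that serialises annotated table rows as N-Triples, a format a SPARQL engine could load. It is not a description of how HYDRA works. The column-to-predicate mapping, the example.org IRIs, and the sample row are all hypothetical, and the query at the end is only shown as text; running it would need a triple store or an RDF library.

```python
# Hypothetical column -> predicate IRI annotations (example.org placeholders).
COLUMN_TO_PREDICATE = {
    "crop": "http://example.org/onto/crop",
    "disease": "http://example.org/onto/disease",
}
ROW_BASE = "http://example.org/row/"


def escape_literal(value: str) -> str:
    """Escape backslash, quote, and line-break characters for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(rows, mapping=COLUMN_TO_PREDICATE, base=ROW_BASE):
    """Turn a list of row dicts into N-Triples lines, one subject per row."""
    lines = []
    for index, row in enumerate(rows):
        subject = f"<{base}{index}>"
        for column, value in row.items():
            predicate = mapping.get(column)
            if predicate is None:
                continue  # columns without an annotation are left out
            lines.append(f'{subject} <{predicate}> "{escape_literal(str(value))}" .')
    return lines


# A SPARQL query that would select a subset from the triples above (text only).
QUERY = """
SELECT ?row ?disease WHERE {
  ?row <http://example.org/onto/crop> "eggplant" ;
       <http://example.org/onto/disease> ?disease .
}
"""

if __name__ == "__main__":
    rows = [{"crop": "eggplant", "disease": "wilt", "note": "field 3"}]
    print("\n".join(rows_to_ntriples(rows)))
```

All values are emitted as plain string literals here; a fuller mapping would also record datatypes and link values to ontology terms where appropriate.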

Matthew Lange

Aug 7, 2020, 8:19:55 PM
to dataverse...@googlegroups.com
Yes Chris, you get my meaning: this is precisely what I want to do with Dataverse. It would seem that combining the power of SPARQL with an established repository of datasets would be dreamy for any discipline trying to aggregate data, especially if they want to do that on the fly.


Chris Baker

Aug 7, 2020, 8:39:27 PM
to dataverse...@googlegroups.com
Thanks Matthew, please take a look at the resources I shared and let me know if there is something we can follow up on wrt any Dataverse sets you find on food. 

btw, HYDRA is a product from my company, but we contribute to academic / community projects too.
Here we supported a Bill and Melinda Gates-funded project on malaria surveillance. No Ag or food, but still topical today.
"A Surveillance Infrastructure for Malaria Analytics: Provisioning Data Access and Preservation of Interoperability"

Cheers
Chris



--
With best regards,
Chris

Professor, Dr. Chris Baker CEO.
IPSNP Computing Inc.

James Myers

Aug 8, 2020, 8:57:55 AM
to dataverse...@googlegroups.com

One thing that Dataverse already does that may be of interest is that it exposes Dataset-level metadata as JSON-LD in the OAI_ORE metadata export file. This doesn’t currently capture the variable-level metadata for individual files in the Dataset (see below for plans), but it does have all of the manually entered metadata for the Dataset along with the list of files with their names, paths, and descriptions. Many of the terms in Dataverse’s core metadata block have already been mapped to external vocabularies (e.g. Dublin Core, Schema.org), and it is possible to create new metadata blocks that cover domain-specific vocabularies (e.g. an initial DarwinCore metadata block was presented at the recent Dataverse 2020 meeting).

You might also be interested in the Data Curation Tool recently created by Scholars Portal; it’s integrated using the external tools framework Danny mentioned. It allows editing of the DDI metadata and might serve as a model, or as something that could be extended, for matching column names to external vocabularies.

Related to that: one of the reasons the variable-level metadata wasn’t originally included in the OAI_ORE export I mentioned above is that it was always machine-generated by Dataverse and would be re-created if the file was re-imported. Now that it can be edited, I’m looking to add it to the OAI_ORE export, where it would be easy to also expose mappings of that info.

-- Jim
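
For readers who want to look at the OAI_ORE export mentioned above, the following sketch builds an export URL and lists the files found in an exported document. The `exporter=OAI_ORE` query form follows my understanding of the native API's export endpoint, and the key names used here (`ore:describes`, `ore:aggregates`, `schema:name`, `schema:description`) are assumptions about the export's layout that should be verified against a real export. The DOI and the `SAMPLE_ORE` document are hypothetical.

```python
import json
from urllib.parse import urlencode


def ore_export_url(base_url: str, persistent_id: str) -> str:
    """Build the dataset export URL for the OAI_ORE (JSON-LD) format (assumed pattern)."""
    query = urlencode({"exporter": "OAI_ORE", "persistentId": persistent_id})
    return f"{base_url.rstrip('/')}/api/datasets/export?{query}"


def list_files(ore_doc: dict):
    """Return [(name, description)] for each aggregated file in an ORE map."""
    described = ore_doc.get("ore:describes", {})
    return [
        (entry.get("schema:name"), entry.get("schema:description", ""))
        for entry in described.get("ore:aggregates", [])
    ]


# Hypothetical, heavily simplified export for illustration only.
SAMPLE_ORE = json.loads("""
{
  "@type": "ore:ResourceMap",
  "ore:describes": {
    "title": "Example food composition dataset",
    "ore:aggregates": [
      {"schema:name": "composition.tab", "schema:description": "Nutrient values"},
      {"schema:name": "README.txt"}
    ]
  }
}
""")

if __name__ == "__main__":
    print(ore_export_url("https://demo.dataverse.org", "doi:10.5072/FK2/EXAMPLE"))
    for name, description in list_files(SAMPLE_ORE):
        print(name, "|", description)
```

Because the export is JSON-LD, the same document could also be loaded into an RDF toolkit and queried with SPARQL rather than walked as plain JSON.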
