DVN 3.0 extensible metadata requirements


Gustavo Durand

Sep 14, 2011, 6:16:56 PM
to dataverse...@googlegroups.com
Hi all,

Last week we released DVN version 2.2.5 and are now ready to work on
the enhanced metadata features for the next version, DVN 3.0.

Over the past few days, we've had some meetings here and have come up
with a high-level design of how we think it should work. We wanted to
run it by the community to see what people think and to make sure that
it will handle your needs. We also have some specific questions to
make sure we're not missing anything.

___

After going back and forth, we think the most straightforward way to
make changes is to allow the network admin to define different "sets"
of Data Collection and Methodology fields (e.g., one for Astronomy,
one for BioMed, etc.).

Then a dv admin / curator can choose whichever one of those is
appropriate for his/her field when defining a template.

This is different from what we were originally thinking, which was to
allow new single fields in any of the sections, but we are leaning
towards thinking this will be simpler for the curators. We think that
the other sections have pretty standard fields which should work well
across disciplines, so we would like to keep those closely tied to the
DDI metadata fields, whereas with Data Collection, you won't just want
one field here or there, but a whole set of them tied more closely to
your discipline.

What does the group think about this? Would this work for your
metadata needs*?

* There are other changes as well, such as allowing multiple data
collection sections per study, or hiding individual fields at the
template level, but this is just to check the high level approach of
defining sets of Data Collection fields.


A few specific questions:
- When we last met, there were some fields that we currently have as
single value for which some in the group wanted multiple values. Can
anyone remind us of which fields these are?
- Could you send a list of what metadata fields you would add if you
had full flexibility? That way we can see if this maps to our
solution. Please specify which, if any, should accept multiple values.


Thanks,
Gustavo

Tim Clark

Sep 15, 2011, 6:02:12 AM
to dataverse...@googlegroups.com
Hi -

If I understand the proposal, it seems FAR better to have "modules" or sets of metadata than individual fields.

At the same time I think we need to be doing two other very important things to make this approach work.

1 - Prototyping these per-domain modules in real situations; and
2 - Promoting the modules by making them available to the community with the DVN software.

I have a specific situation I would like to propose to prototype a BioMed metadata module - in collaboration with the editors in chief of PLoS Computational Biology (http://www.ploscompbiol.org/home.action) and Omics (http://www.liebertpub.com/products/product.aspx?pid=43) journals.

If the DVN team is interested in discussing this and learning more, please suggest a couple of slots for a brief conference call to hear the details. I have both editors in chief on board if you want to do this.

Best

Tim

Merce Crosas

Sep 15, 2011, 4:41:43 PM
to dataverse...@googlegroups.com
Just clarifying (hopefully!) the proposal for everybody (Tim and I talked and we are on the same page):

- In a given Dataverse Network (DVN) installation, the Network admin will be able to define the Data Collection/Methodology section per domain (or more than one option per domain, if needed).
- The new Data Collection/Methodology sections will be defined as a set of fields, and each field can be mapped to one value, multiple values or a text area.
- Network and Dataverse admins will be able to define metadata "templates" for just one dataverse or to share across all dataverses. When defining a template, the admin will be able to choose the Data Collection/Methodology section for their domain, mark individual fields as required, recommended or hidden, and define controlled vocabularies, that is, the pre-defined set of possible values for a field.
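
To make the three points above concrete, here is a minimal Python sketch of how field sets and templates could relate. It is only an illustration, not the DVN data model: every class name, the "optional" default, and the Astronomy field names are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class FieldKind(Enum):
    SINGLE = "single value"
    MULTIPLE = "multiple values"
    TEXT_AREA = "text area"


class Visibility(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"  # assumed default; not named in the proposal
    HIDDEN = "hidden"


@dataclass
class FieldDef:
    name: str
    kind: FieldKind = FieldKind.SINGLE


@dataclass
class FieldSet:
    """A Data Collection/Methodology section defined by the Network admin."""
    domain: str
    fields: List[FieldDef]


@dataclass
class Template:
    """A template picks one field set and adds per-field rules."""
    field_set: FieldSet
    visibility: Dict[str, Visibility] = field(default_factory=dict)
    vocabularies: Dict[str, Set[str]] = field(default_factory=dict)

    def validate(self, values: Dict[str, List[str]]) -> List[str]:
        """Return a list of problems found in the entered values."""
        problems = []
        for f in self.field_set.fields:
            vis = self.visibility.get(f.name, Visibility.OPTIONAL)
            if vis is Visibility.HIDDEN:
                continue  # hidden fields are not shown, so not checked
            entered = values.get(f.name, [])
            if vis is Visibility.REQUIRED and not entered:
                problems.append(f"{f.name}: required")
            if f.kind is not FieldKind.MULTIPLE and len(entered) > 1:
                problems.append(f"{f.name}: accepts one value")
            allowed = self.vocabularies.get(f.name)
            if allowed is not None:
                for v in entered:
                    if v not in allowed:
                        problems.append(f"{f.name}: '{v}' not in vocabulary")
        return problems


# Hypothetical Astronomy section; the field names are invented.
astronomy = FieldSet("Astronomy", [
    FieldDef("Instrument", FieldKind.MULTIPLE),
    FieldDef("Wavelength Band"),
    FieldDef("Observation Notes", FieldKind.TEXT_AREA),
])
template = Template(
    astronomy,
    visibility={"Instrument": Visibility.REQUIRED,
                "Observation Notes": Visibility.HIDDEN},
    vocabularies={"Wavelength Band": {"radio", "optical", "x-ray"}},
)
problems = template.validate({"Wavelength Band": ["infrared"]})
```

Here `problems` reports the missing required field and the value outside the controlled vocabulary. Keeping the required/hidden rules on the template rather than on the field set is what would let two dataverses share one Astronomy section while curating it differently.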

Assumptions and other comments:
- The metadata fields in other sections (Citation information, Abstract/Scope, Data Set Availability, Terms of Use, Other Info) are general enough and will not be changed per domain. There are other sections in the exported DDI that are generated automatically when files are uploaded to the study (list of data and other files, information about the files, and variable information, if applicable).
- The Data Collection/Methodology fields have no structure; that is, they are a simple flat list. Is this a problem?
- The metadata will be internally defined in our data model (with extensible key-value pairs for the new sections) and stored in the DVN database, as it is now. The metadata will be exported to XML (DDI, Dublin Core) for harvesting and preservation, and the new fields in the Data Collection/Methodology section will be mapped to extensible Notes in the DDI.
- At installation time, we could run a script to pre-populate the Data Collection/Methodology sections for some domains. For example, right now, when we install a new DVN we run a DB script to populate the metadata for Social Science, but in addition we could ask: do you want the Astronomy script? Or the Biology script? This is not relevant to dataverse owners, but it's useful if an institution chooses to install DVN and wants to use our pre-defined sections.
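
As a rough sketch of the key-value-to-Notes mapping mentioned above, the following Python writes each extra value as a `notes` element under a DDI `method` element. This is an assumption-laden illustration, not the planned DVN export: the function name, the `DVN:extended:` type prefix, the use of the `subject` attribute for the field name, and the sample values are all hypothetical.

```python
import xml.etree.ElementTree as ET


def notes_for_extra_fields(domain, values):
    """Build a <method> element holding one <notes> per field value.

    `values` maps a field name to a list of strings, mirroring the flat
    key-value pairs described for the new sections.
    """
    parent = ET.Element("method")
    for name, entries in values.items():
        for entry in entries:
            note = ET.SubElement(parent, "notes")
            # Assumed convention: tag the note with its domain and field.
            note.set("type", f"DVN:extended:{domain}")
            note.set("subject", name)
            note.text = entry
    return parent


# Hypothetical Astronomy field with two values.
method = notes_for_extra_fields("Astronomy", {"Instrument": ["Chandra", "Hubble"]})
xml_text = ET.tostring(method, encoding="unicode")
```

A multi-valued field simply becomes several notes with the same `subject`, which fits the flat-list assumption; any nesting between fields would need a different encoding.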

What we would like from you as soon as possible:
- Tim/Sudeshna, could you provide an example of what a Data Collection/Methodology section could be for your domain?
- Alberto A., others in the Astronomy group, could you provide an example for Astronomy?

As a reminder, here are the fields in this section for our current "social science" domain. You can have a whole new set of fields (they don't need to map to these ones...I don't even understand half of them!):
Time Method
Data Collector 
Frequency
Sampling Procedure
Major Deviations for Sampling Design
Collection Mode
Type of Research Instrument
Data Sources
Origin of Sources
Characteristic of Sources Noted
Documentation of Access to sources
Characteristics of Data collection situation
Actions to minimize losses
Control operations
Weighting
Cleaning Operations
Study Level error notes
Response Rate
Estimates of sampling error 
Other forms of data appraisal

Thanks!
Merce
--
Mercè Crosas, PhD
Director of Product Development
IQSS, Harvard University
1737 Cambridge St., Cambridge, MA, 02138
Room K329
617-496-1274
mcr...@hmdc.harvard.edu
www.iq.harvard.edu
