Questions for the community (re: templates)


Gustavo Durand

Sep 20, 2011, 5:40:16 PM
to dataverse...@googlegroups.com
Hi all,

So we're continuing the design of the new extensible metadata
functionality and have come up with some use cases that we're trying
to figure out how to handle.

(while we are still discussing requirements with the group, we are
making some assumptions, so we can proceed with how to model and will
tweak as we need to)

Basically, when you define a new Data Collection / Methodology (DCM)
module, what should happen when a study uses it, but then the DCM
changes? Say for example you have a field called "Observatory" in the
DCM, Study A sets the observatory to Hawaii, and then it is decided to
remove "Observatory" from that DCM. What should happen to the study?

Similarly, and more importantly (i.e. more likely to happen), what
about controlled vocabulary? Here again, is the specific use case:

Curator defines a template using a DCM module that has "Observatory"
and decides that the studies using this template should only ever
use "Observatory A" and "Observatory B", so sets up this controlled
vocabulary for it. Then a contributor creates a study with this
template, sets the Observatory to "Observatory A", and the study gets
officially released and harvested by other dataverse networks.

Some time later, the curator realizes that the official name for the
observatory is actually "Official Name A", so decides to change that
name in the template's controlled vocabulary. What should happen to
the study?

Some possibilities:
1) nothing, the study retains "Observatory A"
2) a new draft version is created with the name being changed to
"Official Name A" and then this needs to be manually released (as
version 1 had)
3) a new draft version is created with the name being changed to
"Official Name A" and this is automatically released
4) other suggestions?

Also, in case 3 above, what if there is already a draft existing with
some other change (e.g. a new author was added)? In other words, a
draft may exist with other changes that have not yet been reviewed and
so is not ready for release.
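
To make the possibilities above easier to compare, here is a minimal
Python sketch. Everything in it is hypothetical (the names RenamePolicy,
Study and apply_rename are not part of DVN), and the handling of an
existing draft is only one assumption among several possible ones: the
rename is folded into the pending draft and nothing is auto-released
until that draft has been reviewed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RenamePolicy(Enum):
    KEEP_OLD = 1            # possibility 1: the study keeps the old value
    DRAFT_MANUAL = 2        # possibility 2: new draft, released by hand
    DRAFT_AUTO_RELEASE = 3  # possibility 3: new draft, released automatically


@dataclass
class Study:
    released: dict                # metadata of the latest released version
    version: int = 1
    draft: Optional[dict] = None  # pending, unreleased changes (if any)


def apply_rename(study, field_name, old, new, policy):
    """Propagate a controlled-vocabulary rename to one study."""
    if policy is RenamePolicy.KEEP_OLD or study.released.get(field_name) != old:
        return study

    had_pending_draft = study.draft is not None
    if study.draft is None:
        study.draft = dict(study.released)
    # Only touch the draft value if the contributor has not changed it already.
    if study.draft.get(field_name) == old:
        study.draft[field_name] = new

    # Assumption: never auto-release a draft that carries unreviewed changes.
    if policy is RenamePolicy.DRAFT_AUTO_RELEASE and not had_pending_draft:
        study.released, study.draft = study.draft, None
        study.version += 1
    return study
```

With this sketch, a study that already has a draft adding a new author
would end up with both changes in the same draft, still at version 1,
waiting for a manual release.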

Lastly, a similar issue is when a template exists and is used by a
study, and then the template owner decides to "hide" a field (i.e. it
should never show). What happens to the study? Does it lose that field
and value (and if so, is this in a draft that needs to be released, or
would it automatically get released)? Or does it still maintain the old
value and show it, since it was there at creation time? Or, another
option, should you not be allowed to modify the template in this way
if it is being used (i.e. not allow hiding of fields at this point)?

Thanks for your time in considering these issues. We're trying to work
them through and if/when we come up with the best solution will let
you know, but we thought it would be useful to get input from the
community. Also, if you find it easier to discuss in person, please
feel free to stop by and/or set up a time and we can meet, discuss,
and brainstorm solutions.

Thanks,
Gustavo

Tim Clark

Sep 23, 2011, 2:37:07 PM
to dataverse...@googlegroups.com
Hi Gustavo,

I have some proposed answers for you, and a couple questions in return.

(1) Referential integrity: What happens when studies are defined using a metadata module which is later deleted?

RDBMS-style referential integrity rules ought to apply here, but with the consciousness that metadata modules might well be versioned - in fact I would expect them to be.

In the case you describe, your module "DCM Astronomy Vsn 1.0" is used to define metadata for some studies, and among the terms it contains are ObservatoryA and ObservatoryB.

Now you want to replace this module as the default with "DCM Astronomy Vsn 1.1" containing ObservatoryA, ObservatoryC and ObservatoryD - ObservatoryA has been dropped.

In normal referential integrity constraints you would impose a "delete restrict" rule on all terms in "DCM Astronomy Vsn 1.0" so that if a term has been used in a study, it cannot be deleted.

However I suggest that this be done at one level up, at the level of the module.

RULE #1: If <module_name, version> has been used to define any existing studies, it cannot be deleted, unless the studies themselves are deleted first (not likely).
RULE #2: In fact, normally, no <module_name, version> should ever be deleted, but simply deactivated as the default metadata for new studies. The prior versions should always be kept around.
RULE #3: Of course, in case an error is made in setting things up, you should still retain the ultimate capability to delete a <module_name, version>, under the constraints provided by RULE #1.

Hope this is clear and makes sense.
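
A minimal Python sketch of the three rules, assuming an in-memory registry keyed by <module_name, version>. The class and method names (ModuleRegistry, add_version, create_study, delete_version) are hypothetical and only illustrate the "delete restrict at the module level" idea; this is not DVN code.

```python
from dataclasses import dataclass, field


class ModuleInUseError(Exception):
    """Raised when deleting a module version that studies still reference."""


@dataclass
class ModuleRegistry:
    # (module_name, version) -> set of terms defined by that module version
    modules: dict = field(default_factory=dict)
    # module_name -> version used as the default for new studies
    defaults: dict = field(default_factory=dict)
    # study_id -> (module_name, version) the study was defined with
    studies: dict = field(default_factory=dict)

    def add_version(self, name, version, terms, make_default=True):
        self.modules[(name, version)] = set(terms)
        if make_default:
            # RULE #2: prior versions are kept; only the default moves.
            self.defaults[name] = version

    def create_study(self, study_id, name):
        # New studies are bound to whichever version is currently the default.
        key = (name, self.defaults[name])
        self.studies[study_id] = key
        return key

    def delete_version(self, name, version):
        key = (name, version)
        # RULE #1: refuse deletion while any study uses this module version.
        if key in self.studies.values():
            raise ModuleInUseError(f"{key} is still used by existing studies")
        # RULE #3: an unused (e.g. mistakenly created) version may be deleted.
        del self.modules[key]
        if self.defaults.get(name) == version:
            del self.defaults[name]
```

In this sketch, a study created under "DCM Astronomy" 1.0 keeps pointing at 1.0 after 1.1 becomes the default, and an attempt to delete 1.0 raises ModuleInUseError.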

(2) Realism: What if a thing referred to by metadata remains the same but its name changes?

In a normal RDBMS you would do this by only using abstract identifiers for the metadata elements, integers for example, and then the relations <integer_key, attribute_name> are stored and updated in a separate table.

Ontology-style, all the terms for things would be URIs signifying the actual observatory (or whatever) and their names would be string attributes of the predicate "has_name".

Perhaps if we provided a concrete example, it would help.
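
One possible concrete example of the abstract-identifier approach, sketched with Python's built-in sqlite3 module. The table and column names (vocabulary_term, study, observatory_id) are assumptions made up for illustration, not the DVN schema. The study stores only the integer key, so renaming the term is a single update in the lookup table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vocabulary_term (
        term_id INTEGER PRIMARY KEY,   -- abstract identifier
        name    TEXT NOT NULL          -- display name, free to change
    );
    CREATE TABLE study (
        study_id       INTEGER PRIMARY KEY,
        title          TEXT NOT NULL,
        observatory_id INTEGER NOT NULL REFERENCES vocabulary_term(term_id)
    );
""")
conn.execute("INSERT INTO vocabulary_term (term_id, name) VALUES (1, 'Observatory A')")
conn.execute("INSERT INTO study (study_id, title, observatory_id) VALUES (1, 'Study A', 1)")


def observatory_name(conn, study_id):
    """Resolve the study's observatory key to its current display name."""
    row = conn.execute(
        "SELECT t.name FROM study s "
        "JOIN vocabulary_term t ON t.term_id = s.observatory_id "
        "WHERE s.study_id = ?",
        (study_id,),
    ).fetchone()
    return row[0]


name_before = observatory_name(conn, 1)
# The curator renames the term; no study row is modified.
conn.execute("UPDATE vocabulary_term SET name = 'Official Name A' WHERE term_id = 1")
name_after = observatory_name(conn, 1)
```

Note that this only covers how the name is stored. Whether an already released and harvested study should get a new version when the name changes is a separate question that the sketch does not answer.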

Now for my question: can you provide a model (a sketch is fine) of the overall data architecture you are proposing? I get confused by some of this. In particular, I'd like to see clearly which are the "stable", essentially default and permanent, core metadata elements for DVN, which are contained in "modules", and how those module-contained elements relate (a) to the module itself and (b) to the core elements.

Best

Tim


Tim Clark

Sep 23, 2011, 2:43:02 PM
to dataverse...@googlegroups.com
PS in example (1), text ought to read "...ObservatoryB has been deleted."

On Sep 23, 2011, at 2:37 PM, Tim Clark wrote:

ObservatoryA

Merce Crosas

Sep 24, 2011, 9:40:42 AM
to dataverse...@googlegroups.com
Tim, thank you for your thorough and useful reply. Gustavo is away for several days. We've had a number of discussions this week and we were converging towards your suggestions, following standard RDBMS rules. We are also planning to separate more clearly the behavior of the "modules" or "study types", which are based on discipline/ontology, from the study templates, which are lighter weight and just help guide the data depositor when he/she adds metadata for a study. In this case, I agree that versioning should apply to "modules" but probably not to templates (we need to go through all the use cases to see if this stays true).
We'll finalize the data model and examples and send them to the group.

Best, Merce
--
Mercè Crosas, PhD
Director of Product Development
IQSS, Harvard University
1737 Cambridge St., Cambridge, MA, 02138
Room K329
617-496-1274
mcr...@hmdc.harvard.edu
www.iq.harvard.edu

Merce Crosas

Sep 24, 2011, 9:41:26 AM
to dataverse...@googlegroups.com
I thought that was the case. :-) Thanks!

Tim Clark

Sep 24, 2011, 9:46:47 AM
to dataverse...@googlegroups.com
Hi Merce,

Would it be useful, after you have your data model and examples ready, to get together for a group discussion in real time for an hour or so?

Best

Tim

Merce Crosas

Sep 24, 2011, 10:10:31 AM
to dataverse...@googlegroups.com
Sure, great idea. Thanks for all your help. I'll talk to the team this week and we'll schedule it soon.
Merce