Call for PIDs: A critique of current Author/PoContact/Depositor data entry fields

Richard Zijdeman

Jul 11, 2024, 2:59:50 AM
to Dataverse Users Community
Hi, 

Dataverse wonderfully provides the option to create Persistent IDentifiers (PIDs) for datasets and has done so for a very long time. Over time, awareness of the need to create PIDs for things other than datasets / archival resources has grown, and some even believe that all things must be identified (Tim Berners-Lee "The Next Web", Wikidata). Dataverse supports this reasoning via the optional ID field for 'authors' of datasets, allowing authors to be identified via person identifiers such as ORCIDs.

I'd like to raise four issues that hamper the quality of data currently entered in Dataverse instances worldwide.

1. Name of Dataset Author should not be obligatory
Currently, it is obligatory to enter the name of the author of a dataset. However, if I were to provide an ORCID for said author, the name would be a redundant characteristic, as identification via PID supersedes identification via a string. Moreover, data entry of the name string is prone to inconsistencies, and is not friendly to authors changing their names over time (marriage, gender change). I don't want to make the case here that no one should enter names anymore, but I don't want to be forced to enter an author's name when I have provided that person's PID (e.g. ORCID).
So: please remove the forced entry of names (perhaps conditional on an identifier being given).
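
To make the conditional rule concrete, here is a minimal sketch in Python. The class and field names (`AuthorEntry`, `identifier_scheme`, and so on) are hypothetical and chosen for illustration only; they are not the actual Dataverse metadata field names, and the ORCID in the usage note is a placeholder value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AuthorEntry:
    # Hypothetical field names, not the real Dataverse metadata keys.
    name: Optional[str] = None
    identifier_scheme: Optional[str] = None  # e.g. "ORCID"
    identifier: Optional[str] = None


def validate_author(entry: AuthorEntry) -> List[str]:
    """Return validation errors; an empty list means the entry is acceptable."""
    errors = []
    has_name = bool(entry.name and entry.name.strip())
    has_pid = bool(entry.identifier and entry.identifier.strip())
    # An identifier is only meaningful together with its scheme.
    if has_pid and not entry.identifier_scheme:
        errors.append("identifier given without an identifier scheme")
    # The name is required only when no identifier is supplied.
    if not has_name and not has_pid:
        errors.append("either a name or an identifier is required")
    return errors
```

With this rule, `AuthorEntry(identifier_scheme="ORCID", identifier="0000-0002-1825-0097")` passes without a name, while a completely empty entry is still rejected.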

2. PIDs should be available for Point of Contact / Depositor
I don't need a lot of words to drive this point home, as it follows from the previous one. If we care about PIDs for entities other than datasets, it should also be possible to add PIDs for the Point of Contact / Depositor.

3. Point of Contact fields should not imply entry of person data
The point of contact data fields suggest that data on a person needs to be entered: name ('Familyname,Givenname'), affiliation, email. However, many institutes use a 'helpdesk@institute'-like email to properly support Dataverse users (e.g. to safeguard against non-response as a result of personnel having changed institutes). Thus, the point of contact could also be an organisation-like entity. This is not just my opinion, it is also what the onMouseOver alt-text provides: "the name of the entity.. person ... or organisation". This also concurs with, say, https://schema.org/ContactPoint

4. URI should be added as open identifier type
The implication of points 2 and 3 is that one would need to be able to add an identifier to an organisation or department. Currently, most identifier types relate to persons or controlled 'vocabulary' entities (e.g. VIAF). However, if an institute would like to work with the identifier the community assigned to said institute, in our case https://www.wikidata.org/wiki/Q1667757, I feel institutes should be able to do so.
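
As a rough illustration of what an open 'URI' identifier type could accept, the sketch below checks that a value is an absolute http(s) URI using Python's standard `urllib.parse`. The function name and the restriction to http/https are assumptions made for this example; nothing here reflects how Dataverse validates identifiers.

```python
from urllib.parse import urlparse


def is_absolute_http_uri(value: str) -> bool:
    """Accept only absolute http(s) URIs that include a host part."""
    parsed = urlparse(value.strip())
    # Assumption: only web-resolvable URIs are wanted, so other schemes are rejected.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Under this check the Wikidata URI above is accepted, while a bare local identifier such as `Q1667757` is not, since it cannot be resolved without knowing which authority issued it.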

These points may not work for all. I would like to emphasise that I am not requesting that my points be made 'obligatory'. However, points 1-4 underline that the current data entry fields are restrictive in terms of adding homogeneous yet flexible metadata on entities responsible for datasets, and prevent the adoption of community-driven identifiers for said entities.

I would like to ask for your views regarding points 1-4.

All the best,

Richard

Julian Gautier

Jul 12, 2024, 2:15:48 PM
to Dataverse Users Community
Hi Richard,

This is great! Thanks for sharing!

I agree about the first and second points, and I think the Dataverse community agrees, too. Members are working toward those goals. The external controlled vocabulary functionality is one example of this.

And the Dataverse UX Working Group is working on a redesign of metadata fields and the use of that external controlled vocabulary functionality in large part to improve the metadata of people and organizations associated with data. Eventually the group will be testing design ideas and it sounds like getting your input at that point could be really helpful. If you're interested in participating in usability testing, let me know and I can email you directly when we're ready.

For your third point, you wrote that the "point of contact data fields suggest that data on a person need to be entered: name ('Familyname,Givenname')". Just so we're sure, are you referring to the text that appears in the Point of Contact Name field when you haven't entered anything in that field (sometimes called a watermark)?

In the latest versions of the Dataverse software, and many, if not most repositories using those later versions, the field's watermark reads "1) FamilyName, GivenName or 2) Organization". Depending on the size of your browser, the fields can become too small to show the full watermark, so often the most I see is "1) FamilyName, GivenName or 2) Organi".

Is it possible you're only seeing part of the watermark, too?

There are old, resolved GitHub issues about how watermarks in other fields that ship with the Dataverse software could be cut off, like the GitHub issue at https://github.com/IQSS/dataverse/issues/2759. I couldn't find any open GitHub issues. It's possible that the redesign I mentioned earlier might address this problem.

And could you let me know how you're looking at these fields? Is it in a particular repository that uses the Dataverse software? Knowing this could generally be helpful because folks who manage Dataverse installations are able to change the design of metadata fields that ship with the Dataverse software, so you and I might be referring to two different designs.

I think your fourth point is interesting and worth more discussion, too, although it's outside the scope of the redesign I mentioned earlier.

Thanks again!
Julian

Julian Gautier (he/him)
Product Research Specialist, IQSS

Richard Zijdeman

Jul 12, 2024, 5:14:01 PM
to Dataverse Users Community
Dear Julian,

Thank you for your very supportive and exciting answer!
Re 1 and 2: It's amazing all this work is already anticipated and under way and yes I'd be very happy to help out with usability testing.
Re 3: you are completely right! When zooming out my browser's view I also see the 'or organization' part. Sorry, I had completely missed this.
Re 4: I can see this requires more discussion from a substantive point of view. Do we really want to allow people to add any URI? For the front end the implementation seems straightforward though, because it would only require a 'full URI' label as identifier type.

Thanks! Let me know when I can help testing things.

Best,

Richard

Gautier, Julian

Jul 15, 2024, 8:53:25 AM
to dataverse...@googlegroups.com
I'm just adding some more information that I think is relevant and hopefully helpful about letting depositors include a URI as an identifier type.

One goal of the deposit form redesign I mentioned last week is to encourage the use of ROR IDs, so that use will be one measure of the success of that redesign. And I've recommended that we review the metadata published by repositories that implement the changes to the deposit form, to look for cases where the organizations that depositors add to the metadata don't include ROR IDs and to learn why. One reason will probably be because the ROR API that Dataverse is pulling suggestions from doesn't include information about those organizations, and in those cases I've recommended that the Dataverse community take up ROR's offer to help add to their API information about those missing organizations.

I suppose that while an organization isn't in the ROR API, or may never be added to it, letting a depositor add a URI to something like a Wikidata page would be better than nothing.

Another related thing, I think, is the "Identifier Type" field that's part of the "Related Publication" field. In that case, one of the identifier types is "url", which seems similar to what you've suggested.