Source of the subject controlled vocabulary in the citation metadata block


fooba...@gmail.com

Jul 8, 2021, 7:34:49 PM
to Dataverse Users Community
Does anyone know how the subjects in the citation metadata block were chosen? This is a required field, so I'm wondering if it's part of a spec or standard outside of Dataverse. The choices are:

Agricultural Sciences
Arts and Humanities
Astronomy and Astrophysics
Business and Management
Chemistry
Computer and Information Science
Earth and Environmental Sciences
Engineering
Law
Mathematical Sciences
Medicine, Health and Life Sciences
Physics
Social Sciences
Other

Thanks,
Aaron Curtis

Matthew Lange

Jul 8, 2021, 10:13:16 PM
to dataverse...@googlegroups.com
I'm also wondering the same thing.

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/6297db52-0f5c-44c3-aa9c-2b164814ac19n%40googlegroups.com.

Philip Durbin

Jul 9, 2021, 12:47:45 PM
to dataverse...@googlegroups.com
I'll take a stab at this but would be happy to learn more.

Based on some poking around in design documents from 2014, when DVN 3 was being rewritten as Dataverse 4, it seems the list of subjects may have been influenced by "the revised field of science and technology classification" in a PDF labeled "OECD Controlled Vocabulary Subjects" at https://www.oecd.org/science/inno/38235147.pdf

I'm attaching a screenshot from a spreadsheet called "Applying a Controlled Vocabulary to Dataverse Subjects" that references this OECD list.

I hope this helps,

Phil



--
Screen Shot 2021-07-09 at 12.44.32 PM.png

Aaron Curtis

Jul 9, 2021, 4:17:23 PM
to Dataverse Users Community
Thanks, Philip, that's very illuminating. FWIW, this subject list is a long way from what we need (we want things like Planetary Geology, Mission Operations, Electrical Engineering, and Mechanical Engineering), so we'll probably just end up setting it to Other for all records and using a custom metadata block. It's a bit annoying that Subject is a required field in the citation block, but not a huge deal.

Aaron

Sherry Lake

Jul 9, 2021, 4:42:15 PM
to dataverse...@googlegroups.com
Subject is a required field, but are those exact values required?

We needed different subjects as well and I modified that controlled vocab list (in the citation.tsv file) to fit our needs. 
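For anyone who wants to check which subject terms their own copy of citation.tsv currently offers, before (or instead of) editing it, here is a minimal Python sketch that reads them out of a metadata block TSV. It assumes the tab-separated layout described in the Metadata Customization guide, where a `#controlledVocabulary` header row names the columns `DatasetField`, `Value`, `identifier` and `displayOrder`, and each data row starts with an empty first cell. The function name `subject_terms` is made up for illustration.

```python
def subject_terms(tsv_text: str, field: str = "subject") -> list[str]:
    """Return the controlled-vocabulary values for `field`, sorted by displayOrder.

    Assumes the metadata block TSV layout: section header rows start with '#',
    and the '#controlledVocabulary' header names the columns of the rows below it.
    """
    header = None  # column names of the current section, if it is the CV section
    rows = []
    for line in tsv_text.splitlines():
        cells = line.split("\t")
        if cells[0].startswith("#"):
            # A '#...' row starts a new section; only keep CV section headers.
            header = cells if cells[0] == "#controlledVocabulary" else None
            continue
        if header is None:
            continue  # inside #metadataBlock or #datasetField, not vocabulary rows
        record = dict(zip(header, cells))
        if record.get("DatasetField") == field:
            order = int(record.get("displayOrder") or 0)
            rows.append((order, record["Value"]))
    return [value for _, value in sorted(rows)]
```

For example, `subject_terms(open("citation.tsv", encoding="utf-8").read())` would list the subject values in display order, which makes it easy to compare a locally edited file against the stock one.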

Sherry

fooba...@gmail.com

Jul 9, 2021, 4:53:15 PM
to Dataverse Users Community
Ah, good point. I'll just modify citation.tsv.

Aaron

Philip Durbin

Jul 9, 2021, 5:07:45 PM
to dataverse...@googlegroups.com
We advise modifying citation.tsv and the other metadata blocks that ship with Dataverse only via pull request, rather than trying to maintain local copies. We express this in the guides as "Generally speaking it is safer to create your own custom metadata block rather than editing metadata blocks that ship with the Dataverse Software, because changes to these blocks may be made in future releases." https://guides.dataverse.org/en/5.5/admin/metadatacustomization.html
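Following that advice means keeping your own block in a separate TSV and loading it alongside citation.tsv. The guides load metadata blocks by POSTing the TSV to the admin endpoint `/api/admin/datasetfield/load` with the content type `text/tab-separated-values`. The sketch below only builds that request with Python's standard library, so it can be inspected before anything is sent. The base URL `http://localhost:8080` and the function name `build_block_load_request` are assumptions for a default local install, and the guides describe further steps after loading a new block (such as updating the Solr schema) that this does not cover.

```python
import urllib.request


def build_block_load_request(
    tsv_bytes: bytes, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build (but do not send) a request that loads a metadata block TSV.

    base_url is an assumption for a default local install; the admin API is
    usually only reachable from localhost.
    """
    return urllib.request.Request(
        url=f"{base_url}/api/admin/datasetfield/load",
        data=tsv_bytes,
        method="POST",
        headers={"Content-type": "text/tab-separated-values"},
    )
```

Passing the result to `urllib.request.urlopen(req)` would actually send it, for example with the bytes of a hypothetical `my-block.tsv` read in binary mode.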

That said, many installations out there must be modifying citation.tsv, because "Genetic Resource" is the number one subject in the "Datasets by Most Common Subject" graph (screenshot attached) on our metrics site, which aggregates across known Dataverse installations. Please see https://dataverse.org/metrics

There's quite a long tail of subjects installations have added. Here's the data behind that graph:

Subject (dataset count)
Agricultural Sciences 2331
Animal Breeding and Animal Products 52
Animal Health and Pathology 17
Architecture 27
Arts and Humanities 2707
Arts and Humanities (Ex: English, History, Foreign, Language) 47
Astronomy 2
Astronomy and Astrophysics 884
Biodiversity and Ecology 179
Business and Management 876
Business, Management, Leadership 1
Chemistry 1025
Chemistry and chemical engineering 30
Climate 63
Climate Change, Energy and low carbon development (CCE) 197
Computer and Information Science 2066
Computer science 28
Earth and Environmental Sciences 5385
Economics 16
Engineering 1403
Environmental Sciences 16
Equal opportunities, Gender, Justice and Tenure (EGT) 8
Farming Systems and Practices 75
Fine and Performing Arts 10
Fishes and Aquaculture 19
Food Safety and Toxicology 14
Food and food processing 62
Forest Management & Restoration (FMR) 173
Forests and Forest Products 52
Forests and Human Well-being (HWB) 51
Genetic Resource 81082
Human Health and Pathology 19
Human Nutrition and food security 24
Information management 21
Insects and Entomology 24
Land Use 4
Law 4686
Material Science and Engineering 23
Mathematical Sciences 352
Medicine, Health and Life Sciences 7465
Microorganisms 73
Omics 125
Other 2535
Physics 2034
Plant Breeding and Plant Products 107
Plant Health and Pathology 80
Rural and Agricultural Sociology 8
Social Sciences 30428
Social Sciences (Ex: Education, Politics, Sociology, Economics, Psychology) 24
Soils and soil sciences 85
Sustainable Landscapes & Food (SLF) 41
Sustainable Landscapes & Livelihoods (SLL) 3
Value Chain, Finance & Investments (VFI) 23
Water resources 48

Thanks,

Phil

Screen Shot 2021-07-09 at 4.57.59 PM.png

Philipp at UiT

Aug 14, 2021, 8:56:05 AM
to Dataverse Users Community

To me this sounds like a good candidate for moving to a setup that is integrated with external controlled vocabularies (CVs). This would allow users to choose the scientific field from the OECD list or from whatever other CV offers such terms. If there is no suitable CV, the user would choose "Other" and add a custom term.

Best, Philipp