Finding and Reusing Data on a Generalist Repository


Open Science Framework

Apr 13, 2023, 2:35:49 PM
to Varieties of Elitism
Tips and tricks for finding relevant datasets
Monthly Tips and Tricks: Finding and Reusing Data on a Generalist Repository

Finding Relevant Data

Over the next two months, we will share a series of tips and tricks for each stage of searching a repository like the OSF for datasets that can be reused. The process of collecting research datasets can be broken into two stages: finding relevant datasets and evaluating datasets for selection. This month we are focusing on the first stage: finding relevant datasets.

The OSF is a generalist repository, which means it is not limited to a single discipline. Other repositories may be more focused on disciplines, types of resources, or data created by specific institutions (for an exploration of different types of repositories, see this article from PLOS ONE). Generalist repositories can also accommodate data that may not have an existing disciplinary repository. Because the data can come from many disciplines, the common metadata fields used to describe and categorize data can be fairly broad. Once you are familiar with the repository, though, there are ways to home in on metadata related to specific topics.

Citation Chaining 

The first strategy is to use citation chaining, which is the process of mining the citations in relevant literature to find more sources. While many of these citations will point to other published articles, you can also find citations to datasets. When checking citations, look specifically for persistent identifier (PID) references like a DOI. A PID is a good sign that the data will still be available somewhere, since it indicates a commitment to persistent access. In OSF, anyone publishing a preprint can also post and link to their data in an OSF project. These links appear as Supplemental Resources.

Screenshot of preprint on OSF with arrow pointing to Supplemental Materials
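
As a rough illustration of checking citations for DOIs, here is a minimal Python sketch that pulls DOI-like strings out of a list of reference entries. The regular expression is a commonly used approximation rather than an official DOI grammar, and the function name and sample references (including the DOI itself) are made up for this example.

```python
import re

# Approximate DOI shape: "10.", a 4-9 digit prefix, "/", then a suffix.
# This is a heuristic, not an official DOI grammar.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(references):
    """Return the unique DOIs found in a list of citation strings, in order."""
    found = []
    seen = set()
    for reference in references:
        for match in DOI_PATTERN.findall(reference):
            doi = match.rstrip(".,;)")  # drop trailing sentence punctuation
            key = doi.lower()  # DOIs are case-insensitive
            if key not in seen:
                seen.add(key)
                found.append(doi)
    return found


# Hypothetical reference entries, invented for illustration only.
sample_references = [
    "Doe, J. (2021). Example survey data. https://doi.org/10.1234/example.5678.",
    "Roe, R. (2019). An article with no identifier. Journal of Examples, 4(2).",
    "Doe, J. (2021). Example survey data [Data set]. doi:10.1234/EXAMPLE.5678",
]

print(extract_dois(sample_references))
```

Entries that yield no DOI are not necessarily dead ends, but the ones that do are the quickest to follow up on.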

Previous Reuse 

Another method that could be helpful in finding reusable datasets is to look specifically at studies that successfully reproduce results, since they are by default using reusable data.

In OSF Registries you can search specifically for replication studies by limiting results to the “Replication Recipe” types.

Screenshot of OSF Registries with arrows pointing to Replication Recipe

The “Replication Recipe” registration type is a standardized template for describing studies that are intended to reproduce original results. There are two types in OSF: one for preregistration (before the replication study is done) and one for post-completion.

In each of these cases, the fact that the study is being reproduced provides a good clue that this might be usable data (since it has already been reused once).

Targeted Searching 

When moving on to search directly for datasets, it’s good to understand the metadata structure of the repository. In OSF, metadata is a mix of “free-text” descriptive fields in which anything can be put (title, description, tags), and more controlled fields that use a list of specific terms like resource types and disciplines. Using the controlled terms from those lists in your searches may help you find more relevant data. 
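
To make the difference between free-text and controlled fields concrete, here is a small Python sketch that filters a local list of metadata records. It does not call the OSF or reflect its actual data model. The `Record` class, the `search` function, the field names, and the sample values are all hypothetical, chosen only to mirror the fields named above.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Free-text fields: anything can be put here.
    title: str
    description: str = ""
    tags: list = field(default_factory=list)
    # Controlled fields: values come from a fixed list of terms.
    resource_type: str = ""
    disciplines: list = field(default_factory=list)


def search(records, keyword=None, resource_type=None, discipline=None):
    """Keep records matching a free-text keyword and any controlled terms given."""
    results = []
    for record in records:
        if keyword:
            # Free text needs a loose, case-insensitive substring match.
            free_text = " ".join([record.title, record.description, *record.tags])
            if keyword.lower() not in free_text.lower():
                continue
        # Controlled terms can be matched exactly.
        if resource_type and record.resource_type != resource_type:
            continue
        if discipline and discipline not in record.disciplines:
            continue
        results.append(record)
    return results


# Hypothetical records, invented for illustration only.
records = [
    Record("Sleep diary responses", tags=["sleep"], resource_type="Dataset",
           disciplines=["Psychology"]),
    Record("Sleep study analysis plan", resource_type="Study Design",
           disciplines=["Psychology"]),
    Record("River temperature readings", resource_type="Dataset",
           disciplines=["Environmental Sciences"]),
]

for hit in search(records, keyword="sleep", resource_type="Dataset"):
    print(hit.title)
```

The keyword alone matches two records here. Adding the controlled resource type narrows the result to the one that is actually a dataset.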

Another way to pinpoint materials in OSF is to use filters when searching (we currently have them for OSF content type, tags, and licenses, and will be adding more in the future). You can also look for key indicators like the badges on registries.

Screenshot of OSF Registries with arrow pointing to open resource badges

Finally, as a general tip or best practice, try to document your search strategy. Keep a record of the terms used, the filters, and other refinements, as well as the dates and repositories searched. This will help you to avoid repetition in one repository while helping you replicate the same strategies in others.
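
One lightweight way to keep such a record is a CSV search log. The Python sketch below is only one possible layout. The column names, the `log_search` function, and the example searches are assumptions made for illustration, and a spreadsheet or notebook would serve just as well.

```python
import csv
import io
from datetime import date

# Hypothetical column layout for a search log.
FIELDS = ["date", "repository", "terms", "filters", "notes"]


def log_search(handle, repository, terms, filters=None, notes="", when=None):
    """Append one search to an open CSV handle, writing the header if it is empty."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    if handle.tell() == 0:
        writer.writeheader()
    row = {
        "date": (when or date.today()).isoformat(),
        "repository": repository,
        "terms": terms,
        # Flatten the filters into one readable cell, e.g. "license=CC0; type=Dataset".
        "filters": "; ".join(f"{k}={v}" for k, v in sorted((filters or {}).items())),
        "notes": notes,
    }
    writer.writerow(row)
    return row


# An in-memory buffer keeps the example self-contained; a file opened with
# open("search_log.csv", "a", newline="") could be used the same way.
buffer = io.StringIO()
log_search(buffer, "OSF", "sleep diary", {"type": "Dataset", "license": "CC0"},
           notes="first pass", when=date(2023, 4, 13))
log_search(buffer, "OSF", "sleep diary replication", when=date(2023, 4, 14))
print(buffer.getvalue())
```

Reusing the same columns for every repository makes it easier to see which searches have already been run and to repeat them elsewhere.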

Want to Know More?

For more information and other OSF tips and tricks, please see our support guides or contact OSF Support.

Interested in learning more about improving research? Join us for the Metascience 2023 Conference, taking place May 9-10 in Washington, DC. Free virtual pre-conference sessions will occur in late April and early May for a global audience.

Copyright © 2023 Center for Open Science, All rights reserved.