Evaluating Data Viability

Open Science Framework, May 11, 2023
Tips and tricks for evaluating data for reuse


This month we continue our series of tips on the stages of dataset discovery on a generalist repository like the OSF. The process of collecting research datasets can be broken into two stages: finding relevant datasets and evaluating them for selection. Here we focus on step two, evaluating data for reuse; see Finding Relevant Data for step one.

Recap: Once we’ve found some data that we think is relevant, the next step is to evaluate the datasets: are they of sufficient quality to be reused, and are they relevant to the research question? Some useful evaluation guidelines cover metadata quality, documentation completeness, indicators of use, and licensing.

Metadata Quality

We can first look at metadata quality using the FAIR principles: findable, accessible, interoperable, and reusable. Applying those concepts to this record, some things we can see are:

  • This data is easily findable: it has a DOI, which makes it easy to locate again in the future.
  • Accessibility refers to ease of access; here, the full file is readily available for download.
  • Interoperability refers to the use of standards, and at a glance we can see that this dataset uses standard formats like CSV.
  • Finally, a clear license is a good indicator of reusability.
[Screenshot: an OSF project page highlighting these metadata quality factors]

These are not the only aspects that can satisfy the FAIR criteria, just a few examples.
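
If you want to automate this kind of check, the short Python sketch below pulls a project's metadata from the public OSF API (api.osf.io/v2) and flags a few of these FAIR signals. Treat it as a minimal sketch: the field names ("public", "node_license") and the identifiers endpoint are assumptions based on the v2 API documentation, and the node ID is a placeholder.

    # Minimal sketch: flag a few FAIR signals on an OSF project via the
    # public v2 API. Field names and endpoints are assumptions drawn from
    # https://developer.osf.io/ -- verify against a live response.
    import requests

    API = "https://api.osf.io/v2"

    def fair_signals(node_id: str) -> dict:
        node = requests.get(f"{API}/nodes/{node_id}/", timeout=30).json()["data"]
        attrs = node["attributes"]

        # Findable: is a DOI registered for this project?
        idents = requests.get(f"{API}/nodes/{node_id}/identifiers/",
                              timeout=30).json()["data"]
        has_doi = any(i["attributes"].get("category") == "doi" for i in idents)

        return {
            "findable_doi": has_doi,
            # Accessible: public projects can be downloaded without a login.
            "accessible_public": attrs.get("public", False),
            # Reusable: is a license attached to the project?
            "reusable_license": attrs.get("node_license") is not None,
        }

    print(fair_signals("abc12"))  # "abc12" is a placeholder node ID

An interoperability check (for example, counting files in standard formats like CSV) would work the same way against the project's file listing.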

Documentation Quality

When judging whether data and documentation are complete, we can look for logically structured directories, README files with good descriptions, and meaningful file names. These are all good signs that we will have enough information to reuse the data, and they are easy to check with a short script (see the sketch after the list below). Let's look at an example dataset.

[Screenshot: a file page on OSF annotated with the where, what, when, and who]

Good data documentation also tells the prospective user about the following: 
  • Who collected the data, or who is the subject of it?
  • What type of data is it? In this case, software.
  • When was it created? How recently was it revised? What time period does it cover if it is time-based?
  • Where is the data from? Knowing the institution or repository of origin can help instill trust; "where" can also refer to the geographies covered or the sources of the data.
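
Once a candidate dataset is downloaded, some of these documentation signals can be surfaced automatically. The sketch below is plain Python with no OSF-specific assumptions; the format list and name heuristic are illustrative choices, not a definitive quality test.

    # Illustrative documentation checks on a downloaded dataset directory.
    # The format list and the naming heuristic are arbitrary example choices.
    from pathlib import Path

    STANDARD_FORMATS = {".csv", ".tsv", ".json", ".txt", ".md"}

    def documentation_report(root: str) -> None:
        files = [p for p in Path(root).rglob("*") if p.is_file()]

        readmes = [p.name for p in files if p.stem.lower().startswith("readme")]
        print(f"README present: {bool(readmes)} {readmes}")

        standard = [p for p in files if p.suffix.lower() in STANDARD_FORMATS]
        print(f"{len(standard)}/{len(files)} files use standard formats")

        # Very short names ("data.csv", "s1.txt") are often uninformative.
        vague = [p.name for p in files if len(p.stem) <= 4]
        if vague:
            print("Possibly uninformative file names:", vague)

    documentation_report("my_downloaded_dataset")  # placeholder path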

Best Practice: Data Citations

Finally, a best practice is to record the citation of the data when you gather it, so you have all the relevant details and don’t lose track of the source. A good data citation includes:

  • Author
  • Title
  • Version
  • Publisher
  • Date
  • Persistent identifier (PID)
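
One lightweight way to keep those details together is to record them in a small structure at download time. The helper below is a hypothetical sketch: the field layout and output format are our own choices, loosely modeled on common dataset citation styles, not an official standard.

    # Hypothetical helper for recording a data citation at download time.
    # The output format loosely follows common dataset citation styles;
    # adapt it to whatever style your field requires.
    from dataclasses import dataclass

    @dataclass
    class DataCitation:
        author: str
        title: str
        version: str
        publisher: str
        date: str
        pid: str  # persistent identifier, e.g. a DOI URL

        def format(self) -> str:
            return (f"{self.author} ({self.date}). {self.title} "
                    f"(Version {self.version}) [Data set]. "
                    f"{self.publisher}. {self.pid}")

    citation = DataCitation(
        author="Example, A.",  # all values below are placeholders
        title="Example Survey Responses",
        version="1.0",
        publisher="OSF",
        date="2023",
        pid="https://doi.org/10.xxxx/osf.io/abc12",
    )
    print(citation.format())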

Want to Know More?

For other OSF tips and tricks, please see our support guides, or contact OSF Support with any questions.
