Hi all --
Some really good discussion here about the constraints on data sharing. There is a useful distinction to make between: (a) what are good and defensible practices for data sharing in disciplines with unusual constraints (e.g., video, identifiable human-subjects data, massive or difficult-to-interpret raw data), and (b) what are the criteria for earning an "open data" badge. The first is a very worthwhile discussion, and many relevant points have been raised about it. But some of that discussion may be drifting away from the purpose of badges.
Badges do not define the boundaries of good practice; they certify that a particular practice was followed. Consider, for example, organic foods. To earn a "badge" as organic, a food must be grown and prepared following particular guidelines. If those guidelines are not met, it doesn't get the badge. Further, it is likely harder to meet those criteria for some types of food than for others. But not getting the badge does not mean that the food is bad, or that the non-organic practices in that circumstance were ill-advised. The organic label simply certifies that the food meets the guidelines.
The same logic applies to the open data badge. There are many good reasons that data cannot be open in various circumstances, and that will not go away. Researchers working with human-participant data will often have more difficulty meeting an open data standard than other researchers. That is simply a fact of competing ethics: the ethic of openness for data and the ethic of privacy for research participants.
It is important, however, that the badge have as clear a meaning as possible. If the meaning of "organic" were highly labile across foods, the label would fail to mean anything. So, our task for badges is to define what "open data" means. The challenge is that the definition needs to be specific enough to carry a shared meaning, but just flexible enough to be broadly applicable.
Focusing just on open data, here are the criteria that we have at present:
The Open Data badge is earned for making available the digitally shareable data necessary to reproduce the reported results. Criteria: (1) the data are publicly available in an institutional repository (e.g., a university repository; repositories can be found at http://www.re3data.org/), and (2) a codebook is included with sufficient description for an independent researcher to reproduce the reported analyses and results. If the data come from a larger dataset than reported, then only the data needed to reproduce the reported results are necessary for badge eligibility (e.g., other data might be embargoed for future use).
Leon brought up an excellent use case to consider against this definition. What if an organization is willing to share, but has a review policy under which access is granted for "all reasonable requests from qualified researchers"? Should this count as achieving "Open Data"?
Pro: It widens the definition to include groups that are willing to share but face some minor restrictions (legal, ethical, or functional), or that want to retain *some* degree of control.
Con: The researcher retains some control, leaving unknown the extent to which the data are truly open (e.g., who decides who is a "qualified researcher"? I have witnessed people use this criterion to decline to share with critics because the critics were judged, by the data holder, to be unqualified). It is also difficult to identify the boundary between open and not open in this scenario (e.g., could anyone say "I am willing to share" and earn the badge without actually making the data open?).
Perhaps we can identify other circumstances to test against the criteria to decide whether they are robust, keeping in mind that the goal of the badge is to certify "Open Data", not "defensible, reasonable, ethical practice". In fact, making data open that should not be open might itself violate ethics.
A few other comments have come up in discussion that may require some revision or clarification of the proposals. Separately, I'll try to propose some resolutions for the group to consider.