Hi all --
Some really good discussion here about the constraints on data sharing. There is a useful distinction to make between: (a) what are good and defensible practices for data sharing in disciplines with unusual constraints (e.g., video, identifiable human-subjects data, massive or difficult-to-interpret raw data), and (b) what are the criteria for earning an "open data" badge. The first is a very worthwhile discussion, and many relevant points have been raised about it. But some of that discussion may be drifting away from the purpose of badges.
Badges do not define the boundaries of good practice; they certify that a particular practice was followed. Consider, for example, organic foods. To earn a "badge" as organic, a food must be grown and prepared following particular guidelines. If those guidelines are not met, it doesn't get the badge. Further, it is likely harder to meet those criteria for some types of food than for others. But not getting the badge does not mean that the food is bad, or that the non-organic practices in that circumstance were ill-advised. The organic label simply certifies that the food meets the guidelines.
The same logic applies to the open data badge. There are many good reasons that data cannot be open in various circumstances, and that will not go away. Researchers working with human-participant data will often have more difficulty meeting an open data standard than other researchers. That is simply a fact of competing ethics: the ethic of openness for data and the ethic of privacy for research participants.
It is important, however, that the badge have as clear a meaning as possible. If the meaning of "organic" were highly labile across foods, the label would fail to mean anything. So, our task for badges is to define what "open data" means. The challenge is that the definition needs to be specific enough to carry a shared meaning, but just flexible enough to be broadly applicable.
Focusing just on open data, here are the criteria that we have at present:
The Open Data badge is earned for making available the digitally shareable data necessary to reproduce the reported results. Criteria: (1) the data are publicly available in an institutional repository (e.g., a university repository; repositories can be found at http://www.re3data.org/), and (2) a codebook is included with sufficient description for an independent researcher to reproduce the reported analyses and results. If the data come from a larger dataset than reported, then only the data needed to reproduce the reported results are necessary for badge eligibility (e.g., other data might be embargoed for future use).
Leon brought up an excellent use case to consider against this definition. What if an organization is willing to share, but has a review policy under which access is granted for "all reasonable requests from qualified researchers"? Should this count as achieving "Open Data"?
Pro: It widens the definition to include groups that are willing to share but face some minor restrictions (legal, ethical, or functional), or that want to retain *some* degree of control.
Con: The researcher retains some control, leaving unknown the extent to which the data are truly open (e.g., who decides who is a "qualified researcher"? I have witnessed people use this criterion to decline to share with critics because the critics were judged, by the data holder, to be unqualified). It is also difficult to identify the boundary between open and not open in this scenario (e.g., could anyone say "I am willing to share" and earn the badge without actually making the data open?).
Perhaps we can identify other circumstances to test against the criteria to decide whether they are robust, keeping in mind that the goal of the badge is to certify "Open Data", not "defensible, reasonable, ethical practice". In fact, making data open that should not be open might itself violate ethics.
A few other comments have come up in discussion that may require some revision or clarification of the proposals. Separately, I'll try to propose some resolutions for the group to consider.