Data accreditation - please weigh in your opinions


Grant Humphries

Mar 30, 2011, 6:44:34 PM
to seabird-data-dev
Hello everyone,

As part of seabirds.net, I believe it would be of great benefit to
create a data accreditation system that would acknowledge people's
contributions to databases, as well as the amount of usage that their
data gets.

What I am suggesting is that, through relational databases and via PETREL
(the world seabird personnel database, or PErsonnel TRacker and E
List), we develop and accept a scheme in which the amount of data that
one uploads and the number of times that data is downloaded are taken
into account, and corrected for the number of years that a
researcher has been collecting data (which would standardize the
scale). This would create a P-index (PETREL index), which I believe
would create prestige for those members of the community who have
dedicated their lives to collecting valuable data.
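
As a minimal sketch of how such an index might be computed, assuming the simplest reading of the proposal (uploads plus downloads, divided by years of data collection): the function name `p_index`, the two weights, and the formula itself are illustrative assumptions, not a definition agreed on the list.

```python
def p_index(records_uploaded: int,
            downloads: int,
            years_collecting: float,
            upload_weight: float = 1.0,
            download_weight: float = 1.0) -> float:
    """Hypothetical P-index: weighted uploads and downloads per year of data collection."""
    if years_collecting <= 0:
        raise ValueError("years_collecting must be positive")
    # Assumption: uploads and downloads are simply added after weighting.
    activity = upload_weight * records_uploaded + download_weight * downloads
    # Dividing by years is the "correction" that puts careers of different length on one scale.
    return activity / years_collecting


# Example: 1000 records uploaded, 200 downloads, 10 years in the field.
score = p_index(1000, 200, 10)
```

The weights are there only because nothing in the proposal says whether an upload and a download should count equally; both default to 1.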

What are your opinions of such a system?
If a system like this were put in place, do you believe it would be
accepted by the general seabird community?
Do you think that this would act as an index that one could put on
their CV which would generate prestige?

If we could discuss this, we could move forward and make a data
accreditation scheme that would benefit everyone.

Cheers
Grant Humphries

Diamond, Tony

Mar 30, 2011, 6:51:44 PM
to seabird-...@googlegroups.com, Grant Humphries, seabird-data-dev
Hmm. I'm not at all sure about this. Yet another metric to attempt to
measure the essentially qualitative. I have grave reservations about
it, and think our time would be much better spent in other ways
(actual seabird conservation, for example).

Tony Diamond

Quoting Grant Humphries <humphri...@gmail.com>:

> --
> You received this message because you are subscribed to the Google
> Groups "seabird-data-dev" group.
> To post to this group, send email to seabird-...@googlegroups.com.
> To unsubscribe from this group, send email to
> seabird-data-d...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/seabird-data-dev?hl=en.
>
>

A.W. Diamond, Ph.D.
Research Professor, Wildlife Ecology
University of New Brunswick
P.O. Box 4400
Fredericton, NB
Canada E3B 5A3
Phone: (506)453-5006 (a.m.), -4926 (p.m.)
Fax: (506) 453-3583
http://www.unb.ca/web/acwern/index.html


Adrian Gall

Mar 30, 2011, 7:02:22 PM
to seabird-...@googlegroups.com
How often one's data is downloaded doesn't seem to me as important a question as what is done with those data, how they are analyzed and presented, and what scientific, conservation, and management goals are accomplished with their use.



--
~:~:~:~:~:~:~:~:~:~:~:~:~
Adrian Gall
Senior Scientist
ABR, Inc. - Environmental Research & Services
P.O. Box 80410
Fairbanks, AK 99708-0410
(PH) 907-455-6777 xt 125
(FAX) 907-455-6781
 
www.abrinc.com

Heather...@fws.gov

Mar 30, 2011, 8:21:58 PM
to seabird-...@googlegroups.com, Grant Humphries, seabird-...@googlegroups.com

I 100% agree with Tony

Heather M. Renner
Wildlife Biologist - Bering Sea Unit
Alaska Maritime NWR
95 Sterling Highway, Suite 1
Homer, AK 99603
phone - (907) 226-4623
fax - (907) 235-7783



"Diamond, Tony" <dia...@unb.ca>
Sent by: seabird-...@googlegroups.com

03/30/2011 02:51 PM

To: seabird-...@googlegroups.com, Grant Humphries <humphri...@gmail.com>
cc: seabird-data-dev <seabird-...@googlegroups.com>
Subject: Re: [seabird-data-dev] Data accreditation - please weigh in your opinions


Grant Humphries

Mar 30, 2011, 9:20:19 PM
to seabird-...@googlegroups.com
I see where you're coming from.  
I like to think of this as a way to encourage people to submit data to the larger databases. One of the biggest issues I seem to come across when talking about data sharing is the lack of credit to the people who collect all that data.  
I believe that if there is some form of accreditation, then people will be more willing to submit data - which in the larger picture, directly benefits conservation and study of seabirds by making available valuable data that others can look at with a critical and fresh eye.   

So yes, it's another metric, but I think it has benefits for the individuals involved, will encourage data sharing, and will go on to encourage creative uses of seabird data, directly benefiting conservation in the end.

Cheers
Grant 

Scott A Hatch

Apr 1, 2011, 1:00:47 PM
to seabird-...@googlegroups.com, seabi...@googlegroups.com, Grant Humphries, seabird-...@googlegroups.com

I think there is a certain inevitability to the idea of digital accounting for data usage in science.  As unpackaged data come to be freely shared via the Internet (while continuing to be packaged, and remaining largely inaccessible, in journal articles), some practitioners initially, and most or all eventually, will find value in receiving credit for this new mode of scientific contribution.  For data users, this need not be any more onerous than the currently accepted practice of citing our literary sources when we write a journal article.  Whether scientists have any responsibility to share their data in openly accessible databases is a separate issue, of course.  I believe they do (increasingly so, as the rapidly evolving online world takes shape), and moreover, that this has huge (positive) implications for both fundamental and applied sciences, conservation of the biosphere being a compelling case in point.

Best,

Scott



From: Heather...@fws.gov
To: seabird-...@googlegroups.com
Date: 03/30/2011 04:22 PM
Subject: Re: [seabird-data-dev] Data accreditation - please weigh in your opinions
Sent by: seabird-...@googlegroups.com


Scott A Hatch

Apr 1, 2011, 3:08:43 PM
to seabi...@googlegroups.com, seabird-...@googlegroups.com

Tony makes excellent points.  I would only clarify that I have used the terms "data packaging" and "unpackaged data" to distinguish between data that are published in traditional peer-reviewed papers (the "packaged" version) and data that are distributed via a mechanism such as the PSMD (i.e., a sort of telegraphic, "unpackaged" delivery of the data, sans all the verbal trimmings and interpretation that normally come with a journal article).  I fully agree though that in many (perhaps most) cases, the process of capturing data in shared databases requires the investigator to do a little work to create datasets readily usable by others (i.e., without having to deal with all the, to them, intractable details).  Essentially, users are looking for the bottom-line values, such as would be presented in a graph or table of a journal article, but they want/need direct access to the values, and to be able to collate and re-analyze quality data from many sources.  It's my belief that with a little effort (judicious use of comment fields for time series and yearly observations in the PSMD, for example, a modicum of information on sampling design, standard errors where available, etc) data contributors can make their life-work available and valuable to peers, both current and future, in a manner that may far surpass their impact in the shorter-term (through conventional publication alone).

Scott

 


From: "Tony [NCR] Gaston" <tonygast...@gmail.com>
To: seabi...@googlegroups.com
Date: 04/01/2011 10:31 AM
Subject: Re: [seabird-data-dev] Data accreditation - please weigh in your opinions
Sent by: seabi...@googlegroups.com





Mark Hipfner and I had a lengthy discussion the other day relating to the issue of data sharing. As a general principle, Mark felt that either you posted all the raw data that you have on a particular topic or you post nothing, because any intermediate stage would not provide the flexibility for future investigators to make full use of the data. I personally felt a little differently. Every data set collected in the field contains elements of different quality and significance and it is very hard for any researcher unfamiliar with the site and the species to fully comprehend the potential biases and pitfalls inherent in a particular dataset and unique to a particular locality. Consequently, I feel that some tidying up by the actual data collector ("packaging" as Scott puts it) is inevitable between the field notebook (or increasingly the field logger) and the data set consigned for general use on the web.
 
A few examples:
We measure Thick-billed Murre eggs annually at Coats Island and observe dates of hatching for the same sample. We measure maybe 100 eggs, but maybe only 70 hatch. To avoid the inclusion of replacement eggs we can select only those hatched within the first 14 days of the hatching period. This also ensures that most eggs are laid by experienced breeders, as first-time breeders lay towards the end of the laying period. Such a sample improves our power to detect year-to-year variation in environmental effects on laying and egg size, because the sample is more consistent (for age/experience and lack of replacements). However, it is not "all the raw data" because that would include the entire egg sample. To provide summary data for the PSMDB we generally make selections such as those described and calculate our annual datum (mean egg volume index) on the reduced sample. The procedure is described in the notes. The other option is to provide all the data but to add all the foregoing caveats as an annotation - the problem with that approach is that people may not read or understand the annotations. On the other hand, if they don't read our notes on sample selection it won't hurt inter-year analyses.
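
The selection step described for the egg sample could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Egg` record and the function name are hypothetical, the 14-day window is counted from the earliest hatch date in the sample, and the volume index is taken as length x breadth squared, which is a common convention but is not stated in the message.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Egg:
    length_mm: float
    breadth_mm: float
    hatch_date: Optional[date]  # None if the egg never hatched


def mean_egg_volume_index(eggs: List[Egg], window_days: int = 14) -> float:
    """Annual datum: mean volume index of eggs hatched early in the hatching period."""
    hatched = [e for e in eggs if e.hatch_date is not None]
    if not hatched:
        raise ValueError("no hatched eggs in sample")
    # Assumption: the hatching period starts at the earliest observed hatch date.
    first_hatch = min(e.hatch_date for e in hatched)
    cutoff = first_hatch + timedelta(days=window_days)
    # Keep only early hatchers, which excludes most replacement eggs and first-time breeders.
    selected = [e for e in hatched if e.hatch_date < cutoff]
    # Assumption: volume index = length * breadth^2 (no shape constant applied).
    return mean(e.length_mm * e.breadth_mm ** 2 for e in selected)


sample = [
    Egg(80.0, 50.0, date(2010, 7, 20)),
    Egg(82.0, 50.0, date(2010, 7, 25)),
    Egg(70.0, 45.0, date(2010, 8, 10)),  # late hatch, excluded from the datum
    Egg(78.0, 49.0, None),               # never hatched, excluded
]
annual_datum = mean_egg_volume_index(sample)
```

Keeping the full `sample` alongside the derived datum is what would let a later user redo the selection with a different window.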
 
At Reef Island we check burrow attendance by putting up knock-down sticks at burrow entrances. There are effects of date, date relative to median laying and time from clutch completion on the frequency of knock-downs. It looks as if knock-down frequency is a good indicator of inter-year variation in feeding conditions. However, to make these comparisons we need to control for the timing variables. Because we do not manage to be in the field for the whole laying and incubation period we need to use just that sample of burrows where date of hatching is known and of course we do not know that until the end because some clutches never hatch. Then we need to control for the other variables. When we have done that we can estimate an adjusted knock-down frequency that should be good for inter-year comparisons (eg via PSMDB).
 
In both these instances the researcher is not only contributing data, but is also providing species-specific and local expertise and a certain level of analysis (in the case of the knock-downs). 
 
On the other hand, we read the temperature daily at both camps. The max/min data need no selection or analysis: they come from the thermometer good to go.
 
What is clear is that the researcher may, and for many purposes should, contribute more than "just" raw data. I am not sure how or whether it is possible to capture this contribution in any appraisal of the value contributed.
 
On the more general point of whether an accounting system for data contributions is useful, I think the answer has to be yes, but only if it can be provided very easily and inexpensively. Given the amazing shower of statistics provided by every online entity, I believe that the tools are available to allow such an accounting with minimal expenditure of money or effort. All we need is someone savvy enough to figure out how to do it.
 
Question for Grant: why do we need to correct by number of years?
Tony Gaston

Scott A Hatch

Apr 1, 2011, 3:22:23 PM
to seabird-...@googlegroups.com, seabi...@googlegroups.com

More good points. The key, I think, is to develop and collectively adopt a data definition language suitable to our needs (i.e., as seabird researchers, in the present discussion).  I've referred to this as "Seabird Research Mark-up Language", but "Seabird Data Mark-up Language" (SDML) might be more intelligible.  Once there is a convention for sharing data documents that have a fully-defined, universally understood, underlying schema that is used by everyone for "packaging" their data (there's that term again, but in a somewhat different context here), then it is true that the distinction between centralized and decentralized data warehousing pretty much goes away.  There have been attempts to provide mark-up language to serve broad communities (too broad in my view), such as EML (Ecological Metadata Language).  These mark-up languages can be translated (by computers) one to another, but for usability's sake it makes sense for disciplines (such as seabird science) to do some work to create their own data-sharing/mark-up language.  Not difficult, and the approach is definitely becoming the mode of the day.  It's the only way to go in the Internet age.

Scott

P.S.  As an example of seabird data mark-up language, the schema represented by the Pacific (or World) Seabird Monitoring Database essentially means the job is already done (with minor evolutionary changes still occurring) for the case of seabird monitoring data.
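
To make the idea of a shared, fully defined schema a little more concrete, here is a minimal sketch of checking a data record against an agreed set of fields. The field names, types, and the `validate_record` helper are all hypothetical; they are not taken from the PSMD schema or from any existing mark-up language.

```python
from typing import Any, Dict, List

# Hypothetical field names and types; NOT taken from the PSMD schema.
SCHEMA: Dict[str, type] = {
    "species": str,
    "site": str,
    "year": int,
    "parameter": str,
    "value": float,
    "notes": str,
}
REQUIRED = {"species", "site", "year", "parameter", "value"}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    # Every required field must be present.
    for name in sorted(REQUIRED - set(record)):
        problems.append(f"missing required field: {name}")
    # Every supplied field must be known and of the agreed type.
    for name, value in record.items():
        expected = SCHEMA.get(name)
        if expected is None:
            problems.append(f"unknown field: {name}")
        elif not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


record = {
    "species": "Thick-billed Murre",
    "site": "Coats Island",
    "year": 2010,
    "parameter": "mean egg volume index",
    "value": 202500.0,
}
issues = validate_record(record)
```

Once every contributor's records pass the same check, it matters much less whether the files sit in one central database or on many separate sites.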


From: "Tony [NCR] Gaston" <tonygast...@gmail.com>
To: seabi...@googlegroups.com
Date: 04/01/2011 11:05 AM
Subject: Re: [seabird-data-dev] Data accreditation - please weigh in your opinions
Sent by: seabi...@googlegroups.com





More fuel for discussion, from http://www.nature.com/nclimate/journal/v1/n1/full/nclimate1057.html
 
In fact, in an age of distributed data it may not be necessary to maintain giant centralized repositories. At the National Snow and Ice Data Center (NSIDC) in Boulder, Colorado, researchers are working on a concept called 'data casting'. The technology would be similar to an RSS news feed and individual researchers could publish data anywhere they wanted to, explains Mark Parsons, lead program manager at NSIDC. Then they would publish a 'feed' on their website that advertised the availability of the data along with keywords describing it. All of the feeds would be tracked and indexed by feed aggregators, providing a central location where people could search for data they might be interested in. “We're saying that you should expose your data in a way that anybody can access and aggregate it,” Parsons says.

Neylon thinks that another good policy change would be to treat well-presented data as a publication in its own right. The policy would encourage people to spend the time it takes to present the data clearly and completely. Moreover, it would recognize that putting together a good data set can be a valuable and even creative scientific accomplishment in its own right. In 2008 a service called DataCite (http://datacite.org) began to provide Digital Object Identifiers (DOIs) for data, making it easy to cite and locate. The service is sponsored by an international collection of institutions including the British Library and the German National Library of Science and Technology. Also, in 2009 the open-access publisher Copernicus Publications started a peer-reviewed data-only journal called Earth System Science Data. Papers consist of a description of the data — what it is, where and how it was collected, and other information — along with a link to the publicly available data.
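
The "data casting" idea in the excerpt above (a feed that advertises datasets along with keywords) could look something like the following sketch, which builds a plain RSS 2.0 document with Python's standard library. The function name, the dataset fields, and the example.org URLs are hypothetical, and this is not NSIDC's actual format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_data_feed(title: str, link: str, datasets: List[Dict]) -> str:
    """Return an RSS 2.0 document advertising datasets, one <item> per dataset."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Datasets available from this site"
    for ds in datasets:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ds["title"]
        ET.SubElement(item, "link").text = ds["url"]
        # Keywords go in <category> elements so aggregators can index them.
        for keyword in ds["keywords"]:
            ET.SubElement(item, "category").text = keyword
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_data_feed(
    "Example seabird data feed",          # hypothetical feed title
    "http://example.org/seabird-data/",   # hypothetical site URL
    [{
        "title": "Egg volume index, annual means",
        "url": "http://example.org/seabird-data/egg-volume.csv",
        "keywords": ["seabird", "egg size", "monitoring"],
    }],
)
```

An aggregator would only need to fetch and parse such feeds from many sites to offer a single searchable index, which is the point made in the excerpt.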


On Fri, Apr 1, 2011 at 1:00 PM, Scott A Hatch <sha...@usgs.gov> wrote:

Falk Huettmann

Apr 3, 2011, 7:13:26 PM
to seabird-...@googlegroups.com, Grant Humphries
 
Hi,
 
1. I agree with Tony D.
 
2. As with many of these seabird data discussions I have seen so far, the issue of a DATA QUALITY label (=accreditation) has been considered for years and is already addressed in GBIF, with CODATA-NSF, Conservation Commons etc. (=No need to re-invent it for seabirds, their data and the seabird community as such).
These bodies suggest, for instance, using peer-review and a label FIT FOR USE, e.g. as an extra column.
Myself, in such cases I would always promote the use of Metadata to document the data and their quality
(this will allow users to decide for themselves whether or not to use the data, and to make the best decisions with them).
You might find the recent papers about online data quality of interest,
e.g. with Genbank (some state c. 20% of serious problems there). For GBIF data, I hear people speaking about 60% error (taxonomy, geo-referencing etc).
 
3. Re data credit and tracking things for citations, this is basically already resolved using a DOI (Digital Object Identifier). The Ecological Society journal is now allowing for such DATA PUBLICATIONs, similar to publishing a traditional paper. Also, in Germany with AWI, they run an international journal where peer-reviewed data and metadata get published (Earth System Science Data ESSD journal; IPY is basically supporting it).
I simply suggest the seabird journals do the same, or link up with them.
 
4. Re the obsession with GETTING CREDIT and COPYRIGHTs: Yes, this is important for general authorship, but it should always be less of a driver per se, including for global data sharing. It should not hold us up sharing data and should not reach vanity levels, specifically if governments and public money are involved (=mandate to service the public and in best public interest).
First of all, for global data sharing I care about PROGRESS in sustainability using the latest data, and less about whether my data can be tracked throughout all details.
As long as I have a paper in the international peer-reviewed literature about this data set, and things show online in GBIF or OBIS-Seamap, and with metadata, we should have covered the basics. 
 
Of course, we should always try to be better, and push GBIF etc forward.
I am a big fan of having seabird data take the global lead (but we are still far from it.
I still see many digital divides and digital culture problems to overcome first).
 
Very best
      F.
 
On Wed, Mar 30, 2011 at 2:44 PM, Grant Humphries <humphri...@gmail.com> wrote: