Re: [seabird-data-dev] Data accreditation - please weigh in your opinions


Scott A Hatch

Apr 1, 2011, 1:00:47 PM
to seabird-...@googlegroups.com, seabi...@googlegroups.com, Grant Humphries, seabird-...@googlegroups.com

I think there is a certain inevitability to the idea of digital accounting for data usage in science.  As unpackaged data come to be freely shared via the Internet (while continuing to be packaged, and remaining largely inaccessible, in journal articles), some practitioners initially, and most or all eventually, will find value in receiving credit for this new mode of scientific contribution.  For data users, this need not be any more onerous than the currently accepted practice of citing our literary sources when we write a journal article.  Whether scientists have any responsibility to share their data in openly accessible databases is a separate issue, of course.  I believe they do (increasingly so, as the rapidly evolving online world takes shape), and moreover, that this has huge (positive) implications for both fundamental and applied sciences, conservation of the biosphere being a compelling case in point.

Best,

Scott



From: Heather...@fws.gov
To: seabird-...@googlegroups.com
Cc: Grant Humphries <humphri...@gmail.com>, seabird-...@googlegroups.com
Date: 03/30/2011 04:22 PM
Subject: Re: [seabird-data-dev] Data accreditation - please weigh in your opinions
Sent by: seabird-...@googlegroups.com






I 100% agree with Tony


Heather M. Renner
Wildlife Biologist - Bering Sea Unit
Alaska Maritime NWR
95 Sterling Highway, Suite 1
Homer, AK 99603
phone - (907) 226-4623
fax - (907) 235-7783


"Diamond, Tony" <dia...@unb.ca>
Sent by: seabird-...@googlegroups.com

03/30/2011 02:51 PM

To
seabird-...@googlegroups.com, Grant Humphries <humphri...@gmail.com>
cc
seabird-data-dev <seabird-...@googlegroups.com>
Subject
Re: [seabird-data-dev] Data accreditation - please weigh in your opinions







Hmm. I'm not at all sure about this. Yet another metric to attempt to  
measure the essentially qualitative. I have grave reservations about  
it, and think our time would be much better spent in other ways  
(actual seabird conservation, for example).

Tony Diamond

Quoting Grant Humphries <humphri...@gmail.com>:

> Hello everyone,
>
> As part of seabirds.net, I believe it would be of great benefit to
> create a data accreditation system that would acknowledge people's
> contributions to databases, as well as the amount of usage that their
> data gets.
>
> What I am suggesting is through relational databases and via PETREL
> (the world seabird personnel database OR  PErsonnel TRacker and E
> List), we develop and accept a scheme where the amount of data that
> one uploads, and the amount of times their data is downloaded is taken
> into account, and corrected for by the number of years that a
> researcher has been collecting data (which would standardize the
> scale).  This would create a P-index (PETREL index), which I believe
> would create prestige for those members of the community who have
> dedicated their lives to collecting valuable data.
>
> What are your opinions of such a system?
> If a system like this were put in place, do you believe it would be
> accepted by the general seabird community?
> Do you think that this would act as an index that one could put on
> their CV which would generate prestige?
>
> If we could discuss this, we could move forward and make a data
> accreditation scheme that would benefit everyone
>
> Cheers
> Grant Humphries
>
> --
> You received this message because you are subscribed to the Google  
> Groups "seabird-data-dev" group.
> To post to this group, send email to seabird-...@googlegroups.com.
> To unsubscribe from this group, send email to  
> seabird-data-d...@googlegroups.com.
> For more options, visit this group at  
>
http://groups.google.com/group/seabird-data-dev?hl=en.
>
>



A.W. Diamond, Ph.D.
Research Professor, Wildlife Ecology
University of New Brunswick
P.O. Box 4400
Fredericton, NB
Canada E3B 5A3
Phone: (506)453-5006 (a.m.), -4926 (p.m.)
Fax: (506) 453-3583

http://www.unb.ca/web/acwern/index.html



Tony [NCR] Gaston

Apr 1, 2011, 2:31:55 PM
to seabi...@googlegroups.com
Mark Hipfner and I had a lengthy discussion the other day relating to the issue of data sharing. As a general principle, Mark felt that either you posted all the raw data that you have on a particular topic or you post nothing, because any intermediate stage would not provide the flexibility for future investigators to make full use of the data. I personally felt a little differently. Every data set collected in the field contains elements of different quality and significance, and it is very hard for any researcher unfamiliar with the site and the species to fully comprehend the potential biases and pitfalls inherent in a particular dataset and unique to a particular locality. Consequently, I feel that some tidying up by the actual data collector ("packaging" as Scott puts it) is inevitable between the field notebook (or increasingly the field logger) and the data set consigned for general use on the web.
 
A few examples:
We measure Thick-billed Murre eggs annually at Coats Island and observe dates of hatching for the same sample. We measure maybe 100 eggs, but maybe only 70 hatch. To avoid the inclusion of replacement eggs we can select only those hatched within the first 14 days of the hatching period. This also ensures that most eggs are laid by experienced breeders, as first-time breeders lay towards the end of the laying period. Such a sample improves our power to detect year-to-year variation in environmental effects on laying and egg size, because the sample is more consistent (for age/experience and lack of replacements). However, it is not "all the raw data" because that would include the entire egg sample. To provide summary data for the PSMDB we generally make selections such as those described and calculate our annual datum (mean egg volume index) on the reduced sample. The procedure is described in the notes. The other option is to provide all the data but to add all the foregoing caveats as an annotation - the problem with that approach is that people may not read or understand the annotations. On the other hand, if they don't read our notes on sample selection it won't hurt inter-year analyses.
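The selection step described above could be sketched roughly as follows. This is only an illustration, not the procedure actually used at Coats Island: the record layout (`EggRecord`), the function names, the choice of the first observed hatch as the start of the hatching period, and the use of length x breadth squared as the "egg volume index" are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class EggRecord:
    """One measured egg (hypothetical record layout)."""
    length_mm: float
    breadth_mm: float
    hatch_day: Optional[int]  # day of year the egg hatched; None if it never hatched


def egg_volume_index(egg):
    # Assumed index: length x breadth^2. The post does not define the index.
    return egg.length_mm * egg.breadth_mm ** 2


def annual_mean_index(eggs, window_days=14):
    """Mean index over eggs hatched in the first `window_days` of hatching."""
    hatched = [e for e in eggs if e.hatch_day is not None]
    if not hatched:
        return None
    # Assumption: the hatching period starts at the first observed hatch.
    first_hatch = min(e.hatch_day for e in hatched)
    # Keep days first_hatch .. first_hatch + window_days - 1 only, which
    # drops late (likely replacement or first-time-breeder) eggs.
    early = [e for e in hatched if e.hatch_day - first_hatch < window_days]
    return mean(egg_volume_index(e) for e in early)
```

Calling `annual_mean_index` once per year on that year's measured eggs would give the single annual datum, with unhatched and late-hatching eggs left out of the mean.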
 
At Reef Island we check burrow attendance by putting up knock-down sticks at burrow entrances. There are effects of date, date relative to median laying and time from clutch completion on the frequency of knock-downs. It looks as if knock-down frequency is a good indicator of inter-year variation in feeding conditions. However, to make these comparisons we need to control for the timing variables. Because we do not manage to be in the field for the whole laying and incubation period, we need to use just that sample of burrows where date of hatching is known, and of course we do not know that until the end because some clutches never hatch. Then we need to control for the other variables. When we have done that we can estimate an adjusted knock-down frequency that should be good for inter-year comparisons (e.g., via PSMDB).
 
In both these instances the researcher is not only contributing data, but is also providing species-specific and local expertise and a certain level of analysis (in the case of the knock-downs). 
 
On the other hand, we read the temperature daily at both camps. The max/min data need no selection or analysis: they come from the thermometer good to go.
 
What is clear is that the researcher may, and for many purposes should, contribute more than "just" raw data. I am not sure how or whether it is possible to capture this contribution in any appraisal of the value contributed.
 
On the more general point of whether an accounting system for data contributions is useful, I think the answer has to be yes, but only if it can be provided very easily and inexpensively. Given the amazing shower of statistics provided by every online entity, I believe that the tools are available to allow such an accounting with minimal expenditure of money or effort. All we need is someone savvy enough to figure out how to do it.
 
Question for Grant: why do we need to correct by number of years?
Tony Gaston

Tony [NCR] Gaston

Apr 1, 2011, 2:44:58 PM
to seabi...@googlegroups.com
 
In fact, in an age of distributed data it may not be necessary to maintain giant centralized repositories. At the National Snow and Ice Data Center (NSIDC) in Boulder, Colorado, researchers are working on a concept called 'data casting'. The technology would be similar to an RSS news feed and individual researchers could publish data anywhere they wanted to, explains Mark Parsons, lead program manager at NSIDC. Then they would publish a 'feed' on their website that advertised the availability of the data along with keywords describing it. All of the feeds would be tracked and indexed by feed aggregators, providing a central location where people could search for data they might be interested in. “We're saying that you should expose your data in a way that anybody can access and aggregate it,” Parsons says.
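As a rough illustration of the feed idea described above, a researcher's site could advertise its datasets with an ordinary RSS 2.0 document. This sketch is not the NSIDC format, which is not specified here; the function name, the dictionary keys, and the use of `<category>` elements to carry the keywords are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def build_data_feed(site_title, site_url, datasets):
    """Return an RSS 2.0 document (as a string) advertising datasets.

    `datasets` is a list of dicts with the hypothetical keys
    "title", "url", "summary" and "keywords".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Datasets published by " + site_title
    for ds in datasets:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ds["title"]
        ET.SubElement(item, "link").text = ds["url"]
        ET.SubElement(item, "description").text = ds["summary"]
        # One <category> per keyword, so an aggregator can index by keyword.
        for keyword in ds["keywords"]:
            ET.SubElement(item, "category").text = keyword
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values only; example.org is not a real data source.
    print(build_data_feed(
        "Example Seabird Lab",
        "https://example.org",
        [{
            "title": "Murre egg measurements (placeholder)",
            "url": "https://example.org/data/eggs.csv",
            "summary": "Annual egg measurements, one row per egg.",
            "keywords": ["seabird", "egg size", "monitoring"],
        }],
    ))
```

An aggregator would then only need to fetch such feeds and index the item titles, links and keywords; the data files themselves could stay wherever the researcher chose to host them.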

Neylon thinks that another good policy change would be to treat well-presented data as a publication in its own right. The policy would encourage people to spend the time it takes to present the data clearly and completely. Moreover, it would recognize that putting together a good data set can be a valuable and even creative scientific accomplishment in its own right. In 2008 a service called DataCite (http://datacite.org) began to provide Digital Object Identifiers (DOIs) for data, making it easy to cite and locate. The service is sponsored by an international collection of institutions including the British Library and the German National Library of Science and Technology. Also, in 2009 the open-access publisher Copernicus Publications started a peer-reviewed data-only journal called Earth System Science Data. Papers consist of a description of the data — what it is, where and how it was collected, and other information — along with a link to the publicly available data.


On Fri, Apr 1, 2011 at 1:00 PM, Scott A Hatch <sha...@usgs.gov> wrote:

Scott A Hatch

Apr 1, 2011, 3:08:43 PM
to seabi...@googlegroups.com, seabird-...@googlegroups.com

Tony makes excellent points.  I would only clarify that I have used the terms "data packaging" and "unpackaged data" to distinguish between data that are published in traditional peer-reviewed papers (the "packaged" version) and data that are distributed via a mechanism such as the PSMD (i.e., a sort of telegraphic, "unpackaged" delivery of the data, sans all the verbal trimmings and interpretation that normally come with a journal article).  I fully agree, though, that in many (perhaps most) cases, the process of capturing data in shared databases requires the investigator to do a little work to create datasets readily usable by others (i.e., without having to deal with all the, to them, intractable details).  Essentially, users are looking for the bottom-line values, such as would be presented in a graph or table of a journal article, but they want/need direct access to the values, and to be able to collate and re-analyze quality data from many sources.  It's my belief that with a little effort (judicious use of comment fields for time series and yearly observations in the PSMD, for example, a modicum of information on sampling design, standard errors where available, etc.) data contributors can make their life-work available and valuable to peers, both current and future, in a manner that may far surpass their impact in the shorter term (through conventional publication alone).

Scott

 


From: "Tony [NCR] Gaston" <tonygast...@gmail.com>
To: seabi...@googlegroups.com
Date: 04/01/2011 10:31 AM
Subject:
Re: [seabird-data-dev] Data accreditation - please weigh in your opinions
Sent by: seabi...@googlegroups.com


Scott A Hatch

Apr 1, 2011, 3:22:23 PM
to seabird-...@googlegroups.com, seabi...@googlegroups.com

More good points. The key, I think, is to develop and collectively adopt a data definition language suitable to our needs (i.e., as seabird researchers, in the present discussion).  I've referred to this as "Seabird Research Mark-up Language", but "Seabird Data Mark-up Language" (SDML) might be more intelligible.  Once there is a convention for sharing data documents that have a fully-defined, universally understood, underlying schema that is used by everyone for "packaging" their data (there's that term again, but in a somewhat different context here), then it is true that the distinction between centralized and decentralized data warehousing pretty much goes away.  There have been attempts to provide mark-up languages to serve broad communities (too broad in my view), such as EML (Ecological Metadata Language).  These mark-up languages can be translated (by computers) one to another, but for usability's sake it makes sense for disciplines (such as seabird science) to do some work to create their own data-sharing/mark-up language.  Not difficult, and the approach is definitely becoming the mode of the day.  It's the only way to go in the Internet age.

Scott

P.S.  As an example of seabird data mark-up language, the schema represented by the Pacific (or World) Seabird Monitoring Database essentially means the job is already done (with minor evolutionary changes still occurring) for the case of seabird monitoring data.
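To make the idea of a shared schema a little more concrete, here is a small sketch of what writing and reading one monitoring observation might look like. No SDML specification exists in this thread, and this is not the actual schema of the Pacific Seabird Monitoring Database: every element name, function name and value below is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def make_observation(species, site, parameter, year, value,
                     std_error=None, comment=""):
    """Build one <observation> element using hypothetical element names."""
    obs = ET.Element("observation")
    ET.SubElement(obs, "species").text = species
    ET.SubElement(obs, "site").text = site
    ET.SubElement(obs, "parameter").text = parameter
    ET.SubElement(obs, "year").text = str(year)
    ET.SubElement(obs, "value").text = repr(float(value))
    if std_error is not None:
        # Standard error is optional: written only "where available".
        ET.SubElement(obs, "std_error").text = repr(float(std_error))
    # Free-text comment field for sample selection notes and caveats.
    ET.SubElement(obs, "comment").text = comment
    return obs


def read_observation(obs):
    """Turn an <observation> element back into a plain dict."""
    se_text = obs.findtext("std_error")
    return {
        "species": obs.findtext("species"),
        "site": obs.findtext("site"),
        "parameter": obs.findtext("parameter"),
        "year": int(obs.findtext("year")),
        "value": float(obs.findtext("value")),
        "std_error": float(se_text) if se_text is not None else None,
        "comment": obs.findtext("comment") or "",
    }


if __name__ == "__main__":
    # Illustrative numbers only, not real monitoring data.
    record = make_observation(
        "Thick-billed Murre", "Coats Island", "mean egg volume index",
        2010, 202500.0, comment="Eggs hatched in first 14 days only.",
    )
    print(ET.tostring(record, encoding="unicode"))
```

The point of the sketch is only that once everyone agrees on the element names, a record written at one site can be read by anyone else's software, wherever the file happens to be stored.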


From: "Tony [NCR] Gaston" <tonygast...@gmail.com>
To: seabi...@googlegroups.com
Date: 04/01/2011 11:05 AM
Subject:
Re: [seabird-data-dev] Data accreditation - please weigh in your opinions
Sent by: seabi...@googlegroups.com


Grant Humphries

Apr 3, 2011, 10:37:13 PM
to Seabirds.net
I agree very much with what you're saying, Tony.

In answer to your question:

If an index like the one I am suggesting were created, it would need
to be an index that puts all researchers on the same "level". Let
me give you an example to try to illustrate (because I'm still trying
to work the kinks out of this myself).

User A collected 2 "datasets" (what defines a dataset, i.e. by year/
subject/species, has yet to be defined) and puts those into the
databases. This user's first instance of data was for 2008, the next
dataset was for 2009 - making this user someone who has contributed
data for 3 years (since 2008).

User B collected 15 "datasets", and the first "dataset" was for 1989
(making this user have 22 years of "data collection experience").


Let's say user A has 2 very good, important "datasets" that are
downloaded frequently, i.e. 2 datasets X 500 downloads / 3 years = P
index of ~333

User B has 15 moderate "datasets" that are downloaded somewhat
frequently, i.e. 15 datasets X 300 downloads / 22 years = P index of
~205

this way user A has a higher P index, despite being a less
"experienced" researcher, but by having "more important" or "more
utilized" data.

If you didn't correct by years and just did something like 15 X 300 or
2 X 500, then the more experienced researcher would have a higher
value just based solely on the fact that they have more "datasets",
which would give young researchers a disadvantage.
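The arithmetic in this example can be written out as a small function. This is just a sketch of the formula as described (datasets x downloads / years); the function and parameter names are made up, and the year counts used below (3 for user A, 22 for user B) assume that years are counted from the first data year to 2011, which is itself one of the open questions.

```python
def p_index(n_datasets, downloads_per_dataset, years_collecting,
            correct_for_years=True):
    """Sketch of the proposed P-index: datasets x downloads / years."""
    if years_collecting <= 0:
        raise ValueError("years_collecting must be positive")
    raw = n_datasets * downloads_per_dataset
    # Without the correction, the raw product simply rewards having
    # more datasets, which favours longer careers.
    return raw / years_collecting if correct_for_years else raw


if __name__ == "__main__":
    # User A: 2 datasets, 500 downloads each, 3 years (assumed count).
    # User B: 15 datasets, 300 downloads each, 22 years (assumed count).
    for name, args in (("A", (2, 500, 3)), ("B", (15, 300, 22))):
        print(name,
              round(p_index(*args), 1),
              p_index(*args, correct_for_years=False))
```

With those assumed year counts, user A comes out at about 333.3 and user B at about 204.5, while the uncorrected products are 1000 and 4500, so the ranking flips once the correction is dropped.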

Hope that clarifies. Obviously there is a lot of discussion to occur
around this, e.g. What defines a dataset? How does one determine the
number of years a researcher has been collecting data (e.g. if an
experienced researcher has data from 1989, but doesn't submit anything
prior to 1999)? etc.

Cheers
grant











