Paper: Global priorities for an effective information basis of biodiversity distributions


Carrie Seltzer

Mar 3, 2015, 10:27:55 AM
to inatu...@googlegroups.com
I have a Google Scholar alert for "GBIF" to see if iNaturalist data is being used in interesting ways via GBIF. I just found a cool one-- it's a preprint, but it examines gaps and biases in digitally accessible information of species distributions. They focused on relatively well-represented taxa-- mammals, birds, and amphibians. The paper itself isn't too long, but it's got a 55-page supplement, which includes the sources of data (open that up and search for iNaturalist to see how many records they used from each group). 


I've only skimmed this, but it looks like it deserves a close read by anyone interested in accessibility, use, and validation of biodiversity occurrence data.

Enjoy!

Carrie

Scott Loarie

Mar 3, 2015, 11:23:38 AM
to inatu...@googlegroups.com
Thanks Carrie, very interesting article.

I was surprised by how few iNat records they used (e.g., only 479 amphibians), and then I realized that they pulled the data in Oct 2012. I just checked and GBIF now has nearly 12,000 amphibian iNat records. So that's a ~25x increase in about 2.5 years.

While 'cit-sci' sources like iNat still make up a small percentage of these data (except in the case of eBird/birds), I think one of the most exciting things about tools like iNat for filling the data gaps described in this paper is their potential to scale faster than a lot of other sources.


--
You received this message because you are subscribed to the Google Groups "iNaturalist" group.
To unsubscribe from this group and stop receiving emails from it, send an email to inaturalist...@googlegroups.com.
To post to this group, send email to inatu...@googlegroups.com.
Visit this group at http://groups.google.com/group/inaturalist.
For more options, visit https://groups.google.com/d/optout.



--
--------------------------------------------------
Scott R. Loarie, Ph.D.
Co-director, iNaturalist.org
California Academy of Sciences
55 Music Concourse Dr
San Francisco, CA 94118
--------------------------------------------------

Carrie Seltzer

Mar 4, 2015, 9:30:51 PM
to inatu...@googlegroups.com
Yeah, I thought that number was surprisingly low, too. I just went through the supplement to read more about how they validated and excluded data, and they started with 565 amphibian records from iNat but excluded some. For birds and mammals from iNat, they started with 5325 birds and validated 4684 (88%) and 1018 mammals and validated 833 (81.8%). Not bad percentages!

Highlighting some methodological details from the supplement-- They say they only included "records with potentially sensible geographic coordinates (Longitude: -180° – +180°, Latitude: -90° – +90°) reported with a precision of at least one tenth of a degree. We excluded ... records that did not have either a binomial or trinomial scientific name... We then matched the taxonomies of records and range maps... By validating localities of records against expert-opinion range maps, we ensure that records are biologically plausible and do not refer to zoo or invasive animals outside of their native ranges." 
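
As a rough illustration of the first few criteria quoted above, here is a minimal Python sketch. It is not the authors' script. The record layout, the function names, and the name pattern are all hypothetical, and "a precision of at least one tenth of a degree" is read here as "at least one reported decimal place", which is an assumption.

```python
import re

# Hypothetical record layout: the scientific name plus longitude and latitude
# kept as the strings found in the raw download, so that the number of
# reported decimal places is still visible.
# Assumption: a binomial or trinomial is a capitalized genus followed by one
# or two lowercase epithets.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+( [a-z-]+){1,2}$")


def decimal_places(value):
    """Count the digits reported after the decimal point."""
    _, _, fraction = value.strip().partition(".")
    return len(fraction)


def passes_basic_filters(name, lon, lat):
    """Apply the coordinate-range, precision, and name criteria to one record."""
    try:
        lon_value, lat_value = float(lon), float(lat)
    except ValueError:
        return False
    in_range = -180 <= lon_value <= 180 and -90 <= lat_value <= 90
    precise = decimal_places(lon) >= 1 and decimal_places(lat) >= 1
    named = NAME_PATTERN.match(name.strip()) is not None
    return in_range and precise and named
```

With this sketch, a record such as `("Rana temporaria", "8.5", "47.4")` passes, while one with whole-degree coordinates or only a genus name is dropped. The taxonomy matching and range-map steps from the quote are not covered here.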

The data nerd in me really wants to see their validation scripts. Theoretically, someone could use those on the current set of iNat data from GBIF to see how it holds up to those same criteria! It looks like they intend to archive the "validated records and derived datasets," but that's less interesting to me than the validation scripts or even the analysis scripts used to run the regressions and make the visualizations. This is a big, dense project though, and I can only imagine how difficult it would be to organize and document all of those pieces thoroughly enough for someone else to use them.

I also find it interesting that they did NOT use data from eBird. "...while the big biodiversity data aggregators like GBIF, VertNet, SpeciesLink or eBird provide the infrastructure for linking biodiversity data, they are themselves not responsible for the amount or informational content of the data (this lies with distributed data providers). We therefore excluded data for which the indicated publisher itself is an international data aggregator from the calculation of our index." Unless eBird data can enter GBIF via their country-specific portals (which does not seem to be the case), I don't understand why they excluded eBird and included iNaturalist. In this situation, they seem analogous. This makes me wonder if the authors only used US records from iNaturalist, since they listed its location as USA. This could also explain the low numbers! 

Let's do some math to check this. Right now, research-grade observations (sent to GBIF) are 52% of the total observations. RG observations are about 4.8% mammals, 2.8% amphibians, and 29.4% birds. I happened to add my first iNat observation in October 2012 when they downloaded data for this paper, so let's take the observation ID for that (138626) as the number of iNat observations at the time (this is somewhat of an overestimate since observations get deleted). If we assume that the 52% research-grade share and those proportions of RG mammals, amphibians, and birds were also true in October 2012, then you'd expect them to start with 3,460 mammal, 2,018 amphibian, and 21,194 bird records for this paper. That's a lot of assumptions, but it seems to me that they might have only used US records. This is a great example of why I'd like to see their data cleaning scripts!
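
The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post. Every input is one of the rough figures quoted in this message, the starting record counts are the ones read from the supplement earlier in this message, and the variable names are made up for illustration.

```python
# All inputs are the rough figures quoted in this post.
TOTAL_OBSERVATIONS = 138626  # observation ID used as a proxy for the Oct 2012 total
RESEARCH_GRADE_SHARE = 0.52  # assumed to have been the same in Oct 2012
TAXON_SHARE_OF_RG = {"mammals": 0.048, "amphibians": 0.028, "birds": 0.294}

# iNat records the paper started with, before validation (from the supplement).
STARTING_RECORDS_IN_PAPER = {"mammals": 1018, "amphibians": 565, "birds": 5325}

research_grade = TOTAL_OBSERVATIONS * RESEARCH_GRADE_SHARE
expected = {
    taxon: research_grade * share for taxon, share in TAXON_SHARE_OF_RG.items()
}

for taxon, estimate in expected.items():
    used = STARTING_RECORDS_IN_PAPER[taxon]
    print(
        f"{taxon}: expected ~{estimate:,.0f}, paper started with {used:,} "
        f"({used / estimate:.0%} of the estimate)"
    )
```

The estimates land within a record or so of the numbers in the text (the input percentages are themselves rounded), and the paper's starting counts come out at roughly a quarter to 30 percent of each estimate.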

Theoretically, we should be able to see their download from GBIF, but the earliest download viewable is from September 2013. http://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7/activity?offset=30870

This ended up being much longer than I thought. There goes my evening!

Carrie

Charlie Hohn

Mar 4, 2015, 10:17:33 PM
to inatu...@googlegroups.com
Well, eBird has no photos or verification protocol. It surely generates a lot of amazing data, but I think they rely on quantity, not 'quality' or precision at least. I am guessing that is part of it.
============================
Charlie Hohn
Montpelier, Vermont

Carsten Meyer

Mar 5, 2015, 3:09:14 AM
to inatu...@googlegroups.com
Hi Carrie,

thanks for your interest!

1) The time lag between downloading data and getting this paper out obviously means that some data providers contribute substantially more records now than what we used in our study. The web service that we mention in the paper is meant as a tool for staying up-to-date on data gaps. But I am not sure whether we could easily extend this tool to track data from individual data providers. Seems useful, though.

2) We did NOT exclude eBird data, quite the contrary. eBird data make up most of the data we used (see 'Avian Knowledge Network' in the last Appendix table).

3) In the part where we say that we "excluded data for which the indicated publisher itself is an international data aggregator", we do not refer to the maps of inventory completeness. We created those maps based on all validated data points. In that part we refer to the calculation of an index of "proximity of grid cells to data-contributing research institutions", that we used in the regression analyses. We actually excluded iNaturalist data from that particular index as well. 

4) The validation was not too complicated. We overlaid range map polygons and occurrence records with a 110 km grid. We then simply reduced the species-grid cell combinations created from occurrence records to those that were also created from range maps. But you are right about the difficulty of organizing all those pieces.
If you have further questions, just send me an email at the address quoted in the paper ;)
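
The validation described in point 4 amounts to a set intersection over species-grid cell combinations. Below is a minimal Python sketch of that idea, not the code used in the paper. It assumes a simple 1-degree latitude/longitude grid as a stand-in for the 110 km grid (the two only roughly agree near the equator), and it assumes the range-map polygons have already been converted to (species, cell) pairs elsewhere. All function and variable names are hypothetical.

```python
import math

CELL_SIZE_DEGREES = 1.0  # assumption: stand-in for the 110 km grid


def grid_cell(lon, lat):
    """Return the (column, row) index of the grid cell containing a point."""
    return (
        math.floor(lon / CELL_SIZE_DEGREES),
        math.floor(lat / CELL_SIZE_DEGREES),
    )


def validate(records, range_cells):
    """Keep species-cell combinations that also occur in the range maps.

    records: iterable of (species, lon, lat) tuples from occurrence data.
    range_cells: set of (species, cell) pairs, assumed to be precomputed by
    overlaying the range-map polygons with the same grid.
    """
    record_cells = {
        (species, grid_cell(lon, lat)) for species, lon, lat in records
    }
    return record_cells & range_cells


# Example: one record inside the (made-up) range cell, one far outside it.
records = [
    ("Rana temporaria", 8.5, 47.4),
    ("Rana temporaria", -122.4, 37.7),
]
range_cells = {("Rana temporaria", (8, 47))}
validated = validate(records, range_cells)
```

In this toy example only the first combination survives, which mirrors how a record far outside a species' expert range map would be dropped.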
 
Cheers,
Carsten

Kent McFarland

Mar 5, 2015, 7:58:14 AM
to inatu...@googlegroups.com
There is a very detailed review process using regional experts and regional filters for eBird data, Charlie. It is not simply a question of overwhelming the noise. If anyone would like more details on this, I would be happy to provide them. I have been involved with this process for over a decade.
Kent


--
____________________________

Kent McFarland
Vermont Center for Ecostudies
PO Box 420 | Norwich, Vermont 05055
802.649.1431 x2


Charlie Hohn

Mar 5, 2015, 8:02:17 AM
to inatu...@googlegroups.com
I hope I didn't come off as disparaging eBird! It's great, if there were a plant version I'd probably use it, and I've dabbled in eBird but I don't know enough birds to make it worthwhile. Otherwise it's just different.

Seems like I was in error anyway as the eBird data WAS considered.

C

Kent McFarland

Mar 5, 2015, 8:16:06 AM
to inatu...@googlegroups.com
Not at all. I was just pointing out in case folks didn't know. 
K

Carrie Seltzer

Mar 10, 2015, 9:07:50 PM
to inatu...@googlegroups.com
Hi Carsten,

Thanks for the detailed reply! I'm sorry it's taken me a few days to respond. Too many daycare snow days in DC last week really got me behind! I'm counting on those being the last of the winter as it now feels like spring.

Thanks for pointing out the nuance in the exclusion of eBird (and iNat) data for that one particular index, not the maps of inventory completeness.

I'd be curious to think about this more if/when you get the data (and scripts?) archived. Thanks again for an interesting paper and your prompt reply!

Carrie

P.S. Hope you'll add your own observations to iNaturalist, too! :-)




--
Carrie E. Seltzer, Ph.D.
National Geographic's Great Nature Project