Yeah, I thought that number was surprisingly low, too. I just went through the supplement to read more about how they validated and excluded data, and they started with 565 amphibian records from iNat but excluded some. For birds and mammals from iNat, they started with 5325 birds and validated 4684 (88%) and 1018 mammals and validated 833 (81.8%). Not bad percentages!
Highlighting some methodological details from the supplement: they say they only included "records with potentially sensible geographic coordinates (Longitude: -180° – +180°, Latitude: -90° – +90°) reported with a precision of at least one tenth of a degree. We excluded ... records that did not have either a binomial or trinomial scientific name... We then matched the taxonomies of records and range maps... By validating localities of records against expert-opinion range maps, we ensure that records are biologically plausible and do not refer to zoo or invasive animals outside of their native ranges."
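Since their scripts aren't available, here is only my rough guess at what the first two filters in that quote (coordinate sanity and name format) could look like. Everything in it is an assumption on my part: the function names are made up, "precision of at least one tenth of a degree" is read as "at least one digit after the decimal point," and the name check is a simple regex rather than a real taxonomic match. The range-map step is left out because it needs the expert range polygons.

```python
import re
from decimal import Decimal, InvalidOperation

# Hypothetical name check: capitalized genus plus one or two lowercase epithets
# (binomial or trinomial). This is a guess, not the authors' actual rule.
NAME_RE = re.compile(r"^[A-Z][a-z]+(?: [a-z-]+){1,2}$")


def decimal_places(value: str) -> int:
    """Count digits after the decimal point in a coordinate given as text."""
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)


def passes_basic_filters(lon: str, lat: str, name: str) -> bool:
    """Return True if a record passes the coordinate and name filters."""
    try:
        lon_d, lat_d = Decimal(lon), Decimal(lat)
    except InvalidOperation:
        return False  # not a number at all
    if not (lon_d.is_finite() and lat_d.is_finite()):
        return False  # NaN or infinity
    # "Potentially sensible" coordinates: inside the valid lon/lat ranges
    if not (-180 <= lon_d <= 180 and -90 <= lat_d <= 90):
        return False
    # Assumed reading of "precision of at least one tenth of a degree"
    if decimal_places(lon) < 1 or decimal_places(lat) < 1:
        return False
    return bool(NAME_RE.match(name.strip()))


print(passes_basic_filters("-122.4", "37.8", "Rana draytonii"))  # precise enough, binomial
print(passes_basic_filters("-122", "37.8", "Rana draytonii"))    # longitude too coarse
print(passes_basic_filters("-122.4", "37.8", "Rana"))            # genus only
```

Coordinates are passed as strings on purpose, because once they are converted to floats you can no longer tell "37" from "37.0".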
The data nerd in me really wants to see their validation scripts. Theoretically, someone could use those on the current set of iNat data from GBIF to see how it holds up to those same criteria! It looks like they intend to archive the "validated records and derived datasets," but that's less interesting to me than the validation scripts or even the analysis scripts used to run the regressions and make the visualizations. This is a big, dense project though, and I can only imagine how difficult it would be to organize and document all of those pieces thoroughly enough for someone else to use them.
I also find it interesting that they did NOT use data from eBird. "...while the big biodiversity data aggregators like GBIF, VertNet, SpeciesLink or eBird provide the infrastructure for linking biodiversity data, they are themselves not responsible for the amount or informational content of the data (this lies with distributed data providers). We therefore excluded data for which the indicated publisher itself is an international data aggregator from the calculation of our index."
Unless eBird data can enter GBIF via their country-specific portals (which does not seem to be the case), I don't understand why they excluded eBird and included iNaturalist. In this situation, they seem analogous. This makes me wonder if the authors only used US records from iNaturalist, since they listed its location as USA. This could also explain the low numbers!
Let's do some math to check this. Right now, research-grade (RG) observations, the ones sent to GBIF, are 52% of the total observations. RG observations are about 4.8% mammals, 2.8% amphibians, and 29.4% birds. I happened to add my first iNat observation in October 2012, when they downloaded data for this paper, so let's take the observation ID for that (138626) as the number of iNat observations at the time (this is somewhat of an overestimate since observations get deleted). If we assume that the RG share and those proportions of RG mammals, amphibians, and birds were also true in October 2012, that works out to about 72,086 RG observations, so you'd expect them to start with roughly 3,460 mammal, 2,018 amphibian, and 21,194 bird records for this paper, versus the 1,018, 565, and 5,325 they actually started with. That's a lot of assumptions, but it seems to me that they might have only used US records. This is a great example of why I'd like to see their data cleaning scripts!
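For anyone who wants to redo the back-of-envelope estimate with different assumptions, here is a small sketch of the arithmetic. The inputs are just the rounded figures from this post (the variable names are mine), and the whole thing assumes today's proportions held in October 2012. With the rounded percentages the bird figure comes out one lower than the one I quoted, presumably because I didn't round the inputs the first time.

```python
# Figures quoted in the post; all of them are rough.
total_obs_oct_2012 = 138_626   # observation ID used as a proxy for total observations
rg_share = 0.52                # share of observations that are research grade
taxon_share_of_rg = {"mammals": 0.048, "amphibians": 0.028, "birds": 0.294}

# Starting record counts reported in the paper's supplement
paper_start = {"mammals": 1018, "amphibians": 565, "birds": 5325}

# Estimated research-grade observations in October 2012
rg_obs = total_obs_oct_2012 * rg_share

# Expected starting records per taxon, and how the paper's counts compare
expected = {taxon: round(rg_obs * share) for taxon, share in taxon_share_of_rg.items()}
fraction_of_expected = {taxon: paper_start[taxon] / expected[taxon] for taxon in expected}

for taxon in expected:
    print(
        f"{taxon}: expected ~{expected[taxon]:,}, "
        f"paper started with {paper_start[taxon]:,} "
        f"({fraction_of_expected[taxon]:.0%} of expected)"
    )
```

The paper's starting counts land at roughly a quarter to 30% of the estimate for all three groups. That consistency is what makes me suspect a geographic subset rather than something specific to one taxon.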
This ended up being much longer than I thought. There goes my evening!
Carrie