"...and though the holes were rather small, they had to count them all..."
A day in the life.
Sgt. Pepper's
I have been thinking recently about the status of bees in the world. In the media and within the introductions and discussion sections of reports and scientific papers, the status of bees usually gets summarized as a story of decline and disappearance. This is true, trivial really: when you replace plant communities largely maintained by Nature with those largely maintained by man, most species disappear, and that is just what we have done. At the extreme are lands completely deleted from Earth using concrete, asphalt, and buildings, followed by lands greatly degraded by agriculture, continuous mowing, and the many ways we try to tidy up Nature. There are bees and other creatures that can persist and even thrive in those landscapes, but most cannot.
Yet bees do persist. There are many parks, private wildlands, and just places we haven't had time to completely corrupt that clearly retain these less tolerant bee species and the plants they depend upon. So the question of bee decline is both apocalyptic (yes, there is massive species loss in human dominated landforms) and squishy (are all the bee species we had hundreds of years ago still present when we look within the residue of natural landscapes?).
Figuring out the squishy part is really important. We could intellectually foreclose on the human landscape if the bank of species (so to speak) is still ready for withdrawal in our wildish lands.
Is it?
Well, we have no national inventory or monitoring program to consult, but there are now several state efforts, and past information on bee status resides in museums (who needs those, right?).
Those old records can be (and many have been) compiled and compared to recent collections (many of which have also been compiled). Several papers and reports have done this.
There are intellectual issues to contend with, however. In a perfect world, any set of specimens (old or new) would contain all the bee species present in an area, and the numbers collected would reflect the real numbers present or at least the correct ratios of their real commonness. This is at least sort of true. A species that is common has a higher probability of being detected than a rare one, for sure. But there are problems in probability land.
"Can you do addition?" the White Queen asked.
"What's one and one and one and one and one and one and one and one and one and one?"
"I don't know," said Alice. "I lost count."
Lewis Carroll - Through the Looking Glass.
OK, well and good. Let's just ignore the inconvenience of probability land for the moment, gather all of the available bee data from the past and present, run a regression line, and see how our bees are doing. Done. But does that analysis reflect how bee populations have changed?
Nope.
Probability land is a land of corruption, and our regression lines usually do not reflect trends in the true populations because of that corruption. Without correcting, qualifying, and interpreting the results through both a human and a bee behavior lens, the answer is a hard no to a straight regression and to its subsequent reporting and conclusions. The problem comes down to probabilities. Those darn probabilities of detection for a species, and their shifts over time, are a corrupting force in our search for answers to "are bees in trouble?"
So, this deserves some explanation. Below are some of the cloaking factors that I can think of that impact the relationship between specimens in a database and the true number of bees out there in the wild. In other words, alterations to any of these factors will skew bee counts away from reflecting the real population of bees and therefore impact our conclusions about how bees are doing. We want our dataset on bees to reflect changes in bee populations over time, not some unknowable combination of changes over time in bee populations and changes in the probabilities of detecting those bees.
I believe these are some of the primary factors:
Time: If you spend more time (days/hours) trying to catch bees you will catch more, spend less time and you catch fewer (I almost said "less", a grievous grammatical error that sadly is no longer being enforced).
Date: If you change the dates you go out looking for bees you will get different species and counts of bees.
Technique: Catching bees with a net, malaise trap, bowl trap, vane trap, etc. favors the capture of some bees and disfavors the capture of others.
Experience: An experienced person will catch more bees and different species of bees than an inexperienced person; this is most obvious when comparing netting results. And. Even experienced people differ in how they approach the capture of bees, and each collector will favor the capture of different species of bees and numbers of those species depending on their proclivities.
Retention: This one is not often thought about, for sure. If you either avoid capturing certain species in the field (do people really capture every honey and bumble bee they see?), or pitch them after you have brought them back to your sorting table (do I really need another Augochlorella aurata in my collection?), or never identify them (e.g., I still don't identify most Lasioglossum males in the Dialictus group to species...), you impact the resulting "counts" that would be used to calculate change.
Taxonomy and Identification: Through time species are lumped together as well as split into new species. In some cases this means that identifications in the literature or a database can't be safely ascribed to species prior to analysis and must be dropped or lumped into "groups".
Location: Bee species are not distributed evenly across any state, county, city, or even within a single field. Some bee species are primarily found in fields, some in woods, some on beaches, some follow rivers, some occur only on mountain tops, and a few reach peak abundance in urban areas ... you get the point. You would want your long-term dataset to sample evenly across these habitats, or at least across some subset consistently. Usually, if you do it right, you get close to this if you have a true monitoring program. But again, we don't have any monitoring programs for bees out there. We have a collection of data points (occurrence data) collected for all sorts of reasons, using all sorts of techniques, on different dates, and in different places by different people. If, for example, long ago people sampled mostly near towns but recent people sampled throughout the state, comparisons may require restricting the comparative area to only those places that were sampled consistently. If in time one people sampled in woodlands and in time two in agricultural fields, what would snapping a trend line through those data points tell you?
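To make the corrupting effect of detection concrete, here is a minimal Python sketch using made-up numbers. The population size, the two detection probabilities, and the function name are all assumptions for illustration, not estimates from any dataset. The true population never changes between the two periods; only the chance that a bee ends up in the database does.

```python
def expected_specimens(true_population, detection_probability):
    """Expected specimens in a database: bees present times the chance each is recorded."""
    return true_population * detection_probability

# Hypothetical values: the same true population in both periods.
TRUE_POPULATION = 1000
P_DETECT_OLD = 0.10  # assumed: e.g. many hours in the field, everything kept
P_DETECT_NEW = 0.05  # assumed: e.g. fewer hours in the field, or common bees passed over

old_count = expected_specimens(TRUE_POPULATION, P_DETECT_OLD)
new_count = expected_specimens(TRUE_POPULATION, P_DETECT_NEW)

# Relative change a naive comparison of specimen counts would report.
apparent_change = (new_count - old_count) / old_count

print(f"old: {old_count:.0f}, new: {new_count:.0f}, apparent change: {apparent_change:.0%}")
```

In this toy case a halving of detection reads as a halving of the bees, even though not a single bee was lost. Any of the factors listed above can move the detection side of that multiplication.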
In the United States, where we have both old and new data inhabiting our databases, all these factors come into play and our interpretation of status and change becomes tricky. It is easy enough to simply pooh pooh (what is the proper spelling here of pooh pooh?) any such analysis and walk away, but I think there is an analysis path forward that provides insight into how our bees are doing. Or. At least provides grounds for hypotheses, targets for data collection efforts, and conservative lists of species of concern.
Much of my musing here comes from our regional work to document the bees of Maryland and the District of Columbia. We have plenty of old and new data for the roughly 450 species of bees found so far in Maryland and DC. We have enough data now to document what the common bees are, but a veil starts descending as the recent records for individual species become fewer and fewer and we see that some species were found in the old days but are not found now.
Our goal for what follows is to see if we can glean understanding (broad or narrow) regarding changes in bee populations in the Washington D.C. area using the data available.
The Three Subregions
Consistent historic data on bee species in Maryland and Washington D.C. (DC) only exist for DC and the Maryland counties of Montgomery (MOCO) and Prince George's (PG). The reason we have a lot of past data for these areas is the extensive presence of government and private collectors going back to the late 1800s. It is certainly one of the best collected regions in the country.
Three sources of data were used.
GBIF: This dataset represents specimens recorded in the Global Biodiversity Information Facility (GBIF) database from the year 2000 and earlier.
BIML: This dataset represents specimens found in the USGS/FWS Bee Lab (BIML) database from 2001 until present.
Other: Additional data are available from many other sources (local collectors, iNaturalist, literature, university collections) and, due to their problems (detection probabilities, you know), are used in a limited (but important!) way.
The most informative comparisons in these datasets appear to be those between GBIF and BIML, but only using the subset of netting data from BIML. Other data are also surprisingly useful and are brought into the discussion to illuminate species that are/were present but not detected within the GBIF/BIML nettiverse.
Descriptions of the Datasets
GBIF: Bee data were downloaded from the GBIF website. Data with collection dates after 2000 were discarded. The remaining data represent bee species found by netting (it is possible there were some malaise samples in there somewhere, but we have no evidence for that). It turns out that many bees were collected in the DC area but ended up being deposited in collections around the country. Many collectors were involved. There were 87 different years with data and every decade was represented.
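The date cutoff on the GBIF download can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows are made up, the column names (species, year, county) are assumed and should be checked against the header of your own download, and dropping records with no usable year is my choice for the sketch rather than a documented step.

```python
import csv
import io

# A tiny made-up stand-in for a tab-delimited download.
# Column names are assumptions; check them against your own file.
SAMPLE = """species\tyear\tcounty
Bombus impatiens\t1915\tMontgomery
Bombus impatiens\t2004\tMontgomery
Augochlorella aurata\t2000\tPrince George's
Andrena sp.\t\tMontgomery
"""

def keep_historic(rows, last_year=2000):
    """Keep records dated last_year or earlier; records with no usable year are dropped."""
    kept = []
    for row in rows:
        year_text = (row.get("year") or "").strip()
        if year_text.isdigit() and int(year_text) <= last_year:
            kept.append(row)
    return kept

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
historic = keep_historic(rows)
print([(r["species"], r["year"]) for r in historic])
```

Only the 1915 and 2000 records survive here. The 2004 record falls in the BIML period and the undated one cannot be assigned to either.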
Whereas, as you will see, a lot of DC region data are available in GBIF, the Smithsonian's Natural History Collection (NMNH) is under-represented. Only bumble bees at NMNH have been databased. Databasing of the rest of the collection is in the works, but nothing more is available at this point.
While we know that there are bee specimens at NMNH that are not in the GBIF database, we can also safely presume there to be AWOL specimens in collections around the country. In addition to the fact that some historic data will not be in the database, we have to keep in mind that rare species were largely kept by the old collectors and common species were often either passed over in the field or discarded prior to pinning.
BIML: Bee data found in the BIML database come from many sources. Much of the data can be tagged directly to Bee Lab activities but many other groups in collaboration with the Bee Lab have their data deposited in BIML after BIML staff have looked over the identifications. A diverse set of techniques are also represented (e.g., netting, bowl traps, glycol traps, vane traps, bucket traps, malaise traps). All specimens captured were identified to species and entered into the database, though some data still await identifications. This is quite the contrast with the GBIF data which, as noted, would be highly biased towards rare species. On the positive side the netting data would have been collected in approximately the same way as that in GBIF. When netting bees the netter is basically using their understanding of patterns of occurrence and floral association to target bee-important flowering vegetation. The Bee Lab and associates never had a specific project to survey the bees of Maryland and DC, but they would devote time to collecting in different regions of the state when time permitted during the workweek and on weekends and holidays.
One way to help diminish the problem of bias in counts between these two time periods is to use the number of days a species was collected rather than the total number of bees collected. Recall that most bees in the GBIF dataset would have been discarded but all bees in the BIML dataset were kept, though when netting only a sample of the bumble bees, honey bees, and carpenter bees was taken, for practical reasons. An assumption with the use of days rather than counts is that common bees will still show up on many days in both datasets, reflecting their commonness (though, as will be explained later, there are clear problems with this assumption for some species groups), with rarer bees having even less sampling bias. I will use only the number of days a species was collected in subsequent analyses.
The patterns of sampling dates are listed below (note "trapping dates" refers to all the data that were collected using something other than a net in the BIML dataset):
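The difference between counting specimens and counting days can be shown with a small Python sketch. The records, species labels, and function names below are invented for illustration; the only idea taken from the text is that each species is scored by the number of distinct dates on which it was collected.

```python
from collections import defaultdict

# Made-up records: (species, collection date, number of specimens kept that day).
records = [
    ("Species A", "2015-05-01", 40),
    ("Species A", "2015-05-02", 35),
    ("Species B", "2015-05-01", 1),
    ("Species B", "2015-06-10", 1),
    ("Species B", "2015-06-10", 2),  # same day, second collector
]

def days_collected(records):
    """Count the distinct dates on which each species was collected."""
    dates = defaultdict(set)
    for species, date, _count in records:
        dates[species].add(date)
    return {species: len(seen) for species, seen in dates.items()}

def specimens_collected(records):
    """Sum raw specimen counts per species, for comparison."""
    totals = defaultdict(int)
    for species, _date, count in records:
        totals[species] += count
    return dict(totals)

print(days_collected(records))
print(specimens_collected(records))
```

By raw counts Species A swamps Species B, but by days they tie. If a collector had pitched most of the Species A specimens before pinning, the count would collapse while the number of days would stay the same, which is the property that makes days the safer currency between the two datasets.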
DC GBIF 75 Netting Dates
DC BIML 85 Netting Dates
DC BIML 196 Trapping Dates
MOCO GBIF 336 Netting Dates
MOCO BIML 30 Netting Dates
MOCO BIML 75 Trapping Dates
PG GBIF 171 Netting Dates
PG BIML 277 Netting Dates
PG BIML 585 Trapping Dates
Recall that GBIF data represent over 100 years of sampling and BIML data only 25 years. The two different time lengths are not a problem per se, but interpretations of the results have to keep them in mind, and the greater year range in the pooled historic data muffles what are likely some interesting changes. Turns out that things change over 100 years and the present analysis hides those changes, but then again it also helps flatten some of the rise and fall of population numbers when using the entire time period. I felt the GBIF data were sparse enough that dividing the data into more time periods would be a problem and a complication, so I leave it to a more clever person to come up with a better approach (if you are that clever person, I would be happy to give you all our data, oh, and even if you are not a clever person I am still glad to give you our data).
Ok, a quick inspection of the number of netting dates for each subregion shows sometimes sharp differences in the number of sampling dates involved. No surprise. But. While these differences obviously (in addition to several other factors) preclude direct comparisons, if we involve our numerical friends, ratio and proportion, some of the problems are diminished. We will be using our friends in detail in a later section.
This is a good lead-in to talking about where sampling has taken place in the subregions over the years. Two major regional factors contribute to changes in the distribution of bees in the region during the last 125+ years. These are an increase in human density, with the concomitant increase in urban environments created by those humans, and a shift in the residual natural areas from open landscapes towards more wooded environments with more mature trees in those forests.
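As a rough preview of what ratio and proportion buy us, here is a minimal Python sketch. The netting date totals come from the list above; the species and its 34 historic and 3 recent days are hypothetical, and the function name is mine. The actual calculations used later may differ.

```python
# Netting dates per subregion and dataset, taken from the list above.
NETTING_DATES = {
    ("DC", "GBIF"): 75,
    ("DC", "BIML"): 85,
    ("MOCO", "GBIF"): 336,
    ("MOCO", "BIML"): 30,
    ("PG", "GBIF"): 171,
    ("PG", "BIML"): 277,
}

def proportion_of_dates(days_found, subregion, dataset):
    """Share of a dataset's netting dates on which a species turned up."""
    return days_found / NETTING_DATES[(subregion, dataset)]

# Hypothetical species: netted on 34 historic and 3 recent dates in Montgomery County.
old = proportion_of_dates(34, "MOCO", "GBIF")
new = proportion_of_dates(3, "MOCO", "BIML")

print(f"historic: {old:.3f}, recent: {new:.3f}, ratio recent/historic: {new / old:.2f}")
```

A drop from 34 days to 3 looks like a crash, but once each figure is divided by the netting dates available (336 versus 30) the species turns up on roughly one date in ten in both periods. Proportions do not cure the other detection problems, but they do take the raw difference in effort off the table.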
Historically, early collectors used the trolley networks in the region to go collecting. Their targets appeared to be the residual farms and open country of eastern Washington D.C. and the habitat along the Anacostia and Potomac Rivers. The Anacostia originally contained extensive freshwater tidal marshes (most of which became landfills at some point), and the Potomac has largely remained the same except that the area has become much more heavily wooded. One of these trolley systems ran along the Potomac to Glen Echo Park (an early amusement park) in Montgomery County. As can be seen from label information, this trolley was clearly used by naturalists on their collection trips to the Chain Bridge flats in Washington D.C., Plummer's Island (home to the Washington Biologists' Field Club) just outside of Washington D.C., and the Glen Echo area. The landscape of all three of these areas originally contained extensive areas of scrub and open landscapes, while most, but not all, are now heavily forested (Chain Bridge Flats is still scrubby due to flood scour by the Potomac River on its low rocky shores). In 1910 the USDA Beltsville Agricultural Research Center was established in Prince George's County, and in 1936 Patuxent Wildlife Research Center was established; both became centers for collecting.
Similar to early naturalists, recent collectors associated with the USGS Bee Lab sought out natural areas in the region. The Bee Lab was originally located at Patuxent Wildlife Research Center (now the USGS Eastern Ecological Research Center), was moved to the adjacent USDA Beltsville Agricultural Research Center, and then moved back to Patuxent. I collected extensively near my home along the Patuxent River near Upper Marlboro as well as at the Bee Lab locations. Collecting occurred regularly throughout Washington D.C. by myself and others, but less commonly in Montgomery County compared to the past. Traps were often used during the past 25 years (856 dates among the 3 subregions), much more so than netting (392 subregion/dates), surpassing even the historic netting efforts (582 subregion/dates). Evaluation of changes to populations will concentrate on comparisons of netting data, but trapping information as well as records from the literature will be used to gain perspective on the netting results.
(The next section will have results using a comparison of species lists)