Thank you! I applaud your efforts and those of the rest of the seabird community, who are thoughtfully crafting this new paradigm of the internet-enabled scientific data commons. I appreciate your patience while I absorbed your papers and the Seabirds.net materials.
I agree with your suggestion about mediating this discussion through a listserv, which can archive responses, allow people to opt in/out, store reference files, and support communal editing of documents. You all should've just received an invitation to a Google group where we can continue discussion by emailing seabird-...@googlegroups.com.
Seabird Data Development Group
Below are some initial comments on Scott's papers and the Seabirds.net initiative from my perspective working with the OBIS-SEAMAP (aka SEAMAP) marine mammal, bird and turtle geo-archive for the last 8 years at the Duke Marine Geospatial Ecology Lab. In summary, SEAMAP could be useful for serving telemetry and survey data to Seabirds.net as well as porting a subset upstream to biodiversity portals OBIS and GBIF.
For the intrepid of you willing to brave some geek speak, read on...
Metadata Formats and Federated Portals. As you point out, the ecological metadata language (EML) is extremely flexible but does put a large onus on the data provider. I fully agree with your objective of “shielding seabird researchers from much of the arcane and confusing detail that attends data-sharing protocols” (p. 13 Hatch in draft). At SEAMAP, we consider the creation of metadata documents, which feed into larger portals to aid in data discovery, as another value-added service. Spatial and temporal extents, along with full taxonomic hierarchies of observed species, are automatically generated for various metadata formats. Another value-added carrot is cascading the basic observational data up to OBIS, and in turn GBIF. We also agree that these biodiversity portals, historically based on museum specimens (and the limited Darwin Core schema), are insufficient for many of the conservation and management applications of the seabird community, which warrants the additional data storage and functionality.
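To make the "automatically generated" extents a little more concrete, here is a minimal Python sketch of summarizing spatial, temporal and species coverage from observation records. It is not SEAMAP's actual code: the record fields (`species`, `lat`, `lon`, `date`) and the sample values are made up for illustration, and it ignores complications such as datasets that cross the antimeridian or the lookup of a full taxonomic hierarchy.

```python
from datetime import date

def summarize_extent(observations):
    """Return the bounding box, date range and species list for a dataset.

    `observations` is a list of dicts with hypothetical keys
    'species', 'lat', 'lon' and 'date' (a datetime.date).
    """
    lons = [o["lon"] for o in observations]
    lats = [o["lat"] for o in observations]
    dates = [o["date"] for o in observations]
    return {
        "west": min(lons), "east": max(lons),
        "south": min(lats), "north": max(lats),
        "begin": min(dates).isoformat(), "end": max(dates).isoformat(),
        "species": sorted({o["species"] for o in observations}),
    }

# Made-up example records, for illustration only.
obs = [
    {"species": "Rissa tridactyla", "lat": 59.4, "lon": -146.3, "date": date(2009, 6, 14)},
    {"species": "Uria aalge", "lat": 57.1, "lon": -152.0, "date": date(2009, 7, 2)},
    {"species": "Rissa tridactyla", "lat": 60.2, "lon": -149.5, "date": date(2010, 6, 20)},
]
extent = summarize_extent(obs)
```

A summary like `extent` is the kind of thing that could then be written into whichever metadata format a given portal expects.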
XML Schema. A community schema such as the seabird research markup language (SRML) as outlined in Hatch (2010) provides an excellent starting point for new projects and a worthwhile migration challenge for existing ones. Getting existing legacy databases converted/imported into this schema could be a valuable exercise, which could result in some handy scripts to offer the technically inclined. Also providing template databases for a variety of common relational database platforms (MS Access, SQL Server, PostgreSQL, MySQL) should ease the process for those starting out.
Motivating Tools. The ability to readily use tools for common analysis can be a big motivator for conforming to a data schema. For instance, the ArcHydro data model has become popular amongst network hydrologists for its common stream network analytical functions. But the ArcMarine data model turned out to be a bit too unwieldy for the marine community, which was too broad. The seabird community seems the proper tractable size to build common analytical tools based on a community schema. A related push from the EcoInformatics community has been to use scientific workflows, i.e. a drag-and-drop canvas with which to string input data to analytical tools - a repeatable analysis with a visually intuitive wire diagram that doesn't require programming. The open-source eco community uses Kepler, and the ESRI community uses Model Builder. We've been working on many tools within ESRI for the creation of the Marine Geospatial Ecology Tools (MGET). The more tools readily available to produce commonly desired analyses that run off this schema, the more worthwhile the migration process becomes to the data providers.
Common Conservation Goal/Application: CBD High Seas MPAs 2012. The Convention on Biological Diversity presents a mandate and opportunity to protect areas of importance on the high seas (i.e. outside EEZs) by 2012. Even if data providers are reticent about sharing, to withhold data for this initiative could equate to allowing your study species to miss out on protection. David Hyrenbach shared a quote from marine spatial planning: "If you're not at the table, you're on the menu." The urgent need for contributions from the seabird community towards this effort, as Scott Shaffer pointed out, could be the greatest boon and common goal for the Seabirds.net initiative. For more on the Global Ocean Biodiversity Initiative, see gobi.org. (Another consortium model for gathering conservation-critical data is the North Atlantic Right Whale Consortium.)
Raw Data vs Derived Products. I am still a bit unclear as to the difference between the colony register (WSCR) and the monitoring database (WSMD). The colony register sounds like a derived product with annual estimates of abundance, while the monitoring database provides the underlying raw observational data. This may simply be a matter of semantics, but the distinction between raw observational data and derived products is important and seems commonly confused. We wouldn't want to lose the opportunity to archive raw data now that would enable alternative analyses in the future. For instance, pelagic survey data may or may not use birds in flight and different detection parameters to come up with estimates of density. While you might expect to filter out “bad” telemetry points based on speed-distance-angle filters, these can now be used with Bayesian state-space filters to further inform a more accurate track. Tracking the provenance of the data in subsequent analysis is also important for not duplicating observations (e.g. raw telemetry + filtered track) in later analysis. A global view of the status of seabird abundance and distribution (including colony counts, pelagic surveys and telemetry data) would make for an excellent derived end-product for Seabirds.net. Since pointing out SeaTurtleStatus.org, which does this annually for sea turtles in their SWOT reports, I noticed a similar BirdLife state of the world's birds effort.
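As a small illustration of keeping raw data intact while deriving a filtered product, here is a minimal Python sketch of a speed-only filter that flags implausible telemetry fixes instead of deleting them. It is a deliberately simplified stand-in, not a full speed-distance-angle filter and certainly not a Bayesian state-space model. The function names, the tuple layout (hours, latitude, longitude), the sample fixes and the 80 km/h threshold are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_by_speed(fixes, max_kmh):
    """Return one boolean per fix: True if plausible, False if too fast.

    `fixes` is a time-sorted list of (hours, lat, lon) tuples. Speed is
    measured from the last fix that was accepted, so a single outlier does
    not also condemn the good fix that follows it. The raw list is never
    modified; the flags form a separate, derived product.
    """
    flags = []
    last = None
    for t, lat, lon in fixes:
        if last is None:
            flags.append(True)  # first fix is accepted by convention
            last = (t, lat, lon)
            continue
        dt = t - last[0]
        dist = haversine_km(last[1], last[2], lat, lon)
        ok = dt > 0 and dist / dt <= max_kmh
        flags.append(ok)
        if ok:
            last = (t, lat, lon)
    return flags

# Made-up fixes: the third one jumps well over 1000 km in one hour.
raw_fixes = [(0, 0.0, 0.0), (1, 0.0, 0.5), (2, 10.0, 10.0), (3, 0.0, 1.0)]
flags = flag_by_speed(raw_fixes, max_kmh=80)
```

Storing `flags` alongside `raw_fixes`, rather than overwriting the raw track, keeps the provenance clear and leaves the door open to re-filtering with better methods later.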
Telemetry and Pelagics Database. These two databases were left out of the original Hatch (2010) article. Based on the OBIS-SEAMAP experience, a widely usable schema should not be too difficult to formulate. Andy Webb gave a broad outline of the techniques and sample database schema used by ESAS, which could provide an excellent starting point for the pelagics. ESAS is currently archived at SEAMAP here. Scott Shaffer and others at TOPP, along with BirdLife, presented on telemetry and have much experience in the field. A SEAMAP example of showing a telemetry dataset with filtering is here. Additional functionality at SEAMAP is the time-synchronous display of environmental data, and animation of telemetry tracks. Visualization of telemetry dive profile data and automatic conversion of "occurrence" survey data to density are still to be desired.
Scientific Attribution. I am impressed with the level of detailed thought given to the all-important topic of crediting contributors. Life Science Identifiers (LSID; Page, 2008), such as urn:lsid:seabirds.net:people:benjamindbest.1, offer a good method for assigning globally unique identifiers to people, projects, and datasets. If you're going to assign these "accountable data units", you might consider attributing all the way to the individual observation / collection rather than aggregating up to the annual summary or project study (Table 3, Hatch in draft). Then you could attribute people at higher levels, such as project and survey / trip. This may actually make certain datasets easier to manage. Consider the eBird database. Without record-level attribution, credit to the individual citizen scientists would be lost. Citizen science should be promoted wherever possible in conservation for increasing monitoring capacity and spreading awareness. Finally, the linkage from the individual person to the credited product is not quite clear enough to me. The database would have to include a reference to all products, whether peer-reviewed literature (with a DOI) or "grey" literature such as an EIS report. Maybe this was already intended in the Publications section of the Seabirds.net sitemap. You could additionally make these publications searchable by taxa and space using additional attribution, similar to our little marine wildlife behavior database. Providing a spatial-species bibliography could become a powerful tool for finding existing data, whether on trophics or telemetry, and identifying gaps to prioritize future proposals.
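For those curious how an identifier like the one above breaks down, here is a minimal Python sketch that splits an LSID into its parts, following the general layout urn:lsid:authority:namespace:object with an optional trailing revision. The `LSID` class and `parse_lsid` function are names invented for this example, the validation is far looser than a real resolver would need, and the seabirds.net identifier is only the illustrative one from the text, not a registered LSID.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str                  # who issues the identifier, e.g. a domain name
    namespace: str                  # kind of thing identified (people, datasets, ...)
    object_id: str                  # identifier unique within the namespace
    revision: Optional[str] = None  # optional version of the object

def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into an LSID."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return LSID(*parts[2:])

person = parse_lsid("urn:lsid:seabirds.net:people:benjamindbest.1")
```

Because the namespace is part of the identifier, the same scheme could label a person, a project, a survey or a single observation, which is what would make record-level attribution practical.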
Portal Interoperability. SEAMAP is committed to being interoperable with Seabirds.net. By way of analogy to the sea turtle community, SEAMAP provides the underlying nesting database for SWOT, and serves datasets using the SeaTurtle.org satellite telemetry and analysis tool (STAT). We should be able to similarly provide bidirectional functionality with Seabirds.net. We could, for instance as a start, provide to the Seabirds.net metadata portal a land-clipped minimum convex polygon and basic metadata with URL back to each SEAMAP dataset. For promoting awareness of datasets across other portals, MoveBank.org suggested that it could show a centroid dot on its mapper indicating a remote dataset with a popup and link. SeabirdTracking.org could provide images (spatially projected, low-resolution, semi-transparent for map overlay) on many of their datasets. SeabirdTracking.org highlighted access control to various levels of telemetry dataset abstractions: raw, filtered, 50% density kernels, tracklines and low-resolution jpeg.
Amongst portals, we do thankfully seem to be converging on the same backend open-source software stack (OpenGeo: Postgres, PostGIS, GeoServer...). This is the same software stack that our group, especially Ei Fujioka, used to redesign OBIS, which now has long-term commitments from IOC.
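For the minimum convex polygon mentioned above as a lightweight dataset footprint, here is a minimal Python sketch using the standard monotone chain convex hull algorithm on plain (lon, lat) pairs. It treats coordinates as planar, does no clipping to land (in practice that would be done in the GIS or database layer), and is not SEAMAP's implementation; the function names and sample points are made up.

```python
def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def minimum_convex_polygon(points):
    """Return the convex hull of (x, y) points, counterclockwise.

    Andrew's monotone chain algorithm: sort the points, then build the
    lower and upper hull chains, dropping any point that would make a
    clockwise (or straight) turn.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Made-up observation locations; the interior point is dropped.
locations = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
footprint = minimum_convex_polygon(locations)
```

A footprint like this reveals where a dataset sits without exposing the individual observations, which suits the most restrictive of the access levels listed above.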
Hi Ben,
FYI, I'm attaching a couple files of powerpoint slides, with accompanying presentation notes. These are two presentations I gave on the first day of the conference, and which I think you may have missed (along with most others receiving this message). They are both relevant to the ongoing discussion of seabird databases.
Some colleagues have wondered aloud whether I advocate a competing exercise or duplication of effort vis-a-vis broad-scale data-sharing initiatives like OBIS-Seamap, GBIF, or others. The answer is definitely not; rather, I would hope to hasten the day when large amounts of seabird data are available to such systems. For that to happen (in anything but a highly piecemeal and inconsistent manner), it is necessary for seabird data contributors to get themselves better organized (and to become more accepting of the new paradigm in general). If we can get a lightweight XML-based markup language adopted, and use it to establish internally-consistent seabird databases (5 of them, distributed or centralized, it doesn't matter, along the lines I have outlined--though not necessarily all at once, nor incapable of standing alone), then making all this information available through OBIS-Seamap, the Avian Knowledge Network, etc., will be a simple and natural step (a 'no-brainer' so-to-speak). Interim products are (or will be) a given, but a systematic approach and coordinated response by the seabird research profession is vital to the desired outcome, in my view.
Cheers,
Scott
From: Ben Best <bdb...@gmail.com>
To: Falk Huettmann <fhuet...@alaska.edu>, Scott A Hatch <sha...@usgs.gov>, Grant Humphries <grhum...@alaska.edu>, John Croxall <John.C...@birdlife.org>
Cc: Patrick Halpin <pha...@duke.edu>, Ei Fujioka <efuj...@duke.edu>
Date: 09/19/2010 08:56 PM
Subject: Re: Seabird databases
Hi Scott, Falk, John and Grant,
Thank you very kindly for all your thoughts and materials on this exciting frontier of rallying the seabird data community. I'm very impressed with the broad level of interest, forward thinking leadership and solid demonstrations. The seabird community could well become an excellent role model for other taxonomic groups.
There is great opportunity to realize tractable goals within a specified community using tailored data models - something of a Goldilocks approach balancing flexibility with usability.
I'm going to spend a bit more time reading the papers, giving this a deep think and conferring with others at OBIS-SEAMAP/MGEL before fully responding.
More soon, Ben
~~~~
Ben Best
PhD Student
Marine Geospatial Ecology Lab
http://mgel.nicholas.duke.edu
+1.805.323.6237
On Fri, Sep 17, 2010 at 6:37 PM, Falk Huettmann <fhuet...@alaska.edu> wrote:
(feel free to distribute to your colleagues as needed)
Hi,
this is a great discussion to have, and generally I like the way this is going, e.g. building something better than EML, if so.
Anyways, for now, it's easier to work from a global platform, and then switch, rather than to invent something entirely new that nobody has experience with, or has even adopted globally!!
My concern is that virtually all "XMLs" are very bad for huge data amounts (too much tagging stuff); how would SRML deal with it?
For now, we are just dealing with metadata, no?
If you bring in data + metadata, it's ABCD with TAPIR. Anyways, ABCD is NOT ISO-compliant, and not adopted globally and by GBIF.
So I think there are only three institutions that have real experience with EML:
NCEAS, MMI http://marinemetadata.org/ and GBIF. They must be asked for experience.
Please keep me posted, and informed re. progress and questions.
Yours
F.
On Fri, Sep 17, 2010 at 5:06 PM, Scott A Hatch <sha...@usgs.gov> wrote:
Hi Ben (et al.),
I attach a copy of the latest version of a manuscript I have submitted for publication in Marine Ornithology. This is a follow-up to an earlier one (which you already have, and which is available online at MO).
Since returning from the conference, I've been taking a closer look at EML (Ecological Metadata Language). I had taken at least a cursory look at it previously, and concluded it was a remarkably cumbersome way of packaging seabird data, something that seemed to me could be a considerably less onerous job than is suggested by the EML model. At some point during the conference, a colleague commented to the effect that, "It seems pretty much settled that we will all use EML for seabird metadata."--though I doubt whether the colleague or, I would guess, 99% of other seabird researchers have, as yet, actually checked into EML to gain a clear idea of what it is and how it's used.
I think it's convenient shorthand to refer to this whole topic as the problem of "data packaging." When I access seabird data on the Internet, I want to receive a package that bundles data and metadata together in a lightweight XML file, for which the elements are intuitive (i.e., pretty much instantly meaningful to a seabird practitioner, who will have gathered similar data him/herself), clearly defined, and limited to the information actually pertinent to the data at hand (i.e., unencumbered by a lot of extraneous terminology). So I promote the idea of creating a Seabird Research Markup Language that is designed to do an excellent job of just that. Such a language (and associated schemas) can be translated into anything designed to be more generic (e.g., EML) as needed (perhaps especially to support wider "discovery" of data online), but it seems sensible to get people into the mode of packaging and transmitting their data between computers on the Web by adopting a relatively simple language they can easily relate to (because it has been tailored to the kind of work they do). I would characterize this general strategy for data sharing as a "bottom-up" approach, as opposed to the relatively "top-down" approach embodied in EML (or CSDGM, Content Standard for Digital Geospatial Metadata) and similar initiatives. Once people's data are captured in at least some version of an XML-structured format, it is possible for others to write the translations from one format to another. As I say, people should be confronted initially with something they can easily comprehend and integrate directly into their personal data management systems if desired. They can then let the digital geeks worry about migrating the metadata upward through increasingly generic and arcane systems of metadata.
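To give a feel for what such a package might look like, here is a minimal Python sketch that bundles a few metadata fields and count records into one small XML document using the standard library. The element and attribute names (`seabirdDataset`, `metadata`, `data`, `count`, and so on) and the sample values are invented for illustration and are not taken from any actual SRML schema.

```python
import xml.etree.ElementTree as ET

def build_package(metadata, records):
    """Bundle metadata and data records into one XML string.

    `metadata` maps hypothetical element names to values; each dict in
    `records` becomes one <count> element with its keys as attributes.
    """
    root = ET.Element("seabirdDataset")
    meta = ET.SubElement(root, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, name).text = str(value)
    data = ET.SubElement(root, "data")
    for record in records:
        ET.SubElement(data, "count", {k: str(v) for k, v in record.items()})
    return ET.tostring(root, encoding="unicode")

# Made-up example content.
package = build_package(
    {"species": "Rissa tridactyla", "colony": "Example Island", "units": "nests"},
    [{"year": 2008, "value": 412}, {"year": 2009, "value": 389}],
)

# Anyone receiving the package can read it back with the same library.
parsed = ET.fromstring(package)
species = parsed.find("metadata/species").text
counts = [int(c.get("value")) for c in parsed.findall("data/count")]
```

Once data sit in a structure like this, translating them into a more generic format becomes a matter of mapping element names, which is the job that can be left to others.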
Besides promoting the development of SRML, this paper also explores some possible mechanisms for digital accounting and crediting the use of data. I'll be interested to see if any such ideas catch on over the next decade or so. I think this could be the key to getting people over what is perhaps the greatest barrier to data sharing.
All the best,
Scott
P.S. I promise not to keep using this ad hoc email list for general discussion purposes (this may be the last time). In the wake of the conference, a group of enthusiasts (thus far consisting of Grant Humphries, Peter Kappes, and Michelle Kappes, who are looking for others to join their efforts) are showing a lot of initiative and ideas for enhancing communication through Seabirds.net. One great idea I've heard would be to set up a forum on Seabirds.net to deal (recursively, so to speak) with the development of Seabirds.net itself. Probably the single most valuable thing we can do initially is to get everyone in our profession to account for their existence on Seabirds.net by creating a record in a world directory of seabird personnel. Hopefully, everyone receiving this message will do that (as soon as the functionality is available), and will encourage all their colleagues in seabird research and conservation to do so as well.