scope of working group

Arlin Stoltzfus

Nov 19, 2012, 3:20:51 PM

to wg-...@googlegroups.com, Richard Pyle, Gaurav Vaidya, David Shorthouse

Dear all--

Now that the list is working, we can start a discussion. The most important thing to work out is the scope that defines what a working group hopes to accomplish (in 3 to 4 meetings of 10 to 12 people). Those who responded enthusiastically obviously see an opportunity here, but we are coming at this issue from different perspectives, and we may have different ideas about the nature of the opportunity.

I'll start with my own perspective. I think that the long-term vision should be world domination. Google's "did you mean?" service is currently a highly responsive spell-checking TNRS without taxonomic limitations (try "eschericia cola", "homo sipiens", etc., then try these with any other service). Google searches also reveal synonyms indirectly. Relative to the possible future in which Google is the uber-TNRS, I think we all want to ensure, instead, that the name-mappings developed by taxonomy providers are the go-to choice for users who want to integrate data via names. This means exposing those resources in a way that is convenient, scalable, and sustainable.

If that is the vision for the future, then the goal of the working group would be to make progress toward that vision, in some way that involves 10 to 12 people meeting 3 or 4 times. Those of us who initiated this effort tend to see this as a problem of creating a common web services standard for taxonomic name-resolution services, so that clients (including robots and aggregators) can leverage them in a standard way that satisfies their needs and doesn't change much over time. That is a doable strategy for a working group.

But of course, others may have a different conception, and I welcome discussion on that.

Arlin

-------
Arlin Stoltzfus (ar...@umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850
tel: 240 314 6208; web: www.molevol.org

David Patterson

Nov 19, 2012, 4:44:37 PM

to Arlin Stoltzfus, wg-...@googlegroups.com, Richard Pyle, Gaurav Vaidya, David Shorthouse

I am coming into this a little late, so you have my apologies, but I promise I will catch up.

Firstly, I think Biodiversity Informatics needs one good names-management environment that is open and flexible. It needs to be an environment that interconnects available services and authoritative data sources.

It is my presumption that a significant goal is to interconnect distributed data by using the names. That means we have to overcome the problem that the same species can have many names, written in various ways. Then we have to standardise the results in the context of any one of many authoritative taxonomies. These are the challenges that Global Names was set up to address, sitting on top of a 10-year discussion (boringly outlined at globalnames.org).

As a simple example of interconnecting content, http://escjam2012.shorthouse.net/tweet/587 shows tweets being intercepted, a name being discovered in the tweet, and that name being cross-linked to a classification and to some literature.

We are more or less in the position of delivering pretty solid services that will deal with some of the issues that Arlin has identified.

GN has name recognition and discovery tools, so we can run through sources from docs, to PDFs, to text, to HTML, to images and so on, find the names, and spit them out. This is scalable and is currently being run against the full corpus of the Biodiversity Heritage Library.

Once we have the names we can run a variety of services.

One is dealing with variant name strings. We currently apply the Tony Rees / Mike Giddens fuzzy match but should also be looking at other options that are available. In addition, we have already rendered down our reference system of about 22 million name strings to about 7 million groups. This environment also helps to open up 'Did you mean ...?' options.
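
To make the 'Did you mean ...?' idea concrete, here is a minimal sketch in Python. It is not the Rees / Giddens fuzzy match and not how GN does it; it simply uses the standard library's difflib as a stand-in. The reference list and the function name did_you_mean are made up for illustration.

```python
from difflib import get_close_matches

# Hypothetical reference list; a real service would hold millions of name strings.
REFERENCE_NAMES = [
    "Escherichia coli",
    "Homo sapiens",
    "Drosophila melanogaster",
    "Arabidopsis thaliana",
]


def did_you_mean(query, names=REFERENCE_NAMES, cutoff=0.8):
    """Return the reference name closest to `query`, or None if none is close enough.

    Uses difflib's similarity ratio as a generic stand-in for a
    taxonomy-aware fuzzy matcher.
    """
    # Compare case-insensitively, but return the name as spelled in the reference list.
    lookup = {name.lower(): name for name in names}
    # Lower-case the query and collapse runs of whitespace.
    normalized = " ".join(query.lower().split())
    matches = get_close_matches(normalized, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


if __name__ == "__main__":
    for typo in ("eschericia cola", "homo sipiens"):
        print(typo, "->", did_you_mean(typo))
```

With this toy list, Arlin's two test strings resolve to "Escherichia coli" and "Homo sapiens". A generic string-similarity score knows nothing about authorship strings, rank, or Latin endings, which is part of why a purpose-built matcher is worth having.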

Alongside that, we need to ensure that we cover the homonym problem. Through collaboration with Tony Rees and IRMNG, we have access to a vast amount of homonymy information, for much (but not all) of which we can offer taxonomic context that can be used to help in disambiguating homonyms.
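
As a rough illustration of how taxonomic context can disambiguate homonyms, here is a small Python sketch. The table layout and the function name disambiguate are assumptions for illustration only and do not reflect how IRMNG or GN store or serve this information; the two genera are well-known cross-kingdom homonyms.

```python
# Illustrative homonym table: each genus name maps to the taxa that share it.
# The structure is hypothetical, not an IRMNG or GN data format.
HOMONYMS = {
    "Ficus": [
        {"kingdom": "Plantae", "family": "Moraceae"},   # figs
        {"kingdom": "Animalia", "family": "Ficidae"},   # fig shells (gastropods)
    ],
    "Morus": [
        {"kingdom": "Plantae", "family": "Moraceae"},   # mulberries
        {"kingdom": "Animalia", "family": "Sulidae"},   # gannets
    ],
}


def disambiguate(genus, kingdom=None):
    """Return the candidate taxa for `genus`, narrowed by `kingdom` when given."""
    candidates = HOMONYMS.get(genus, [])
    if kingdom is not None:
        candidates = [c for c in candidates if c["kingdom"] == kingdom]
    return candidates


if __name__ == "__main__":
    print(disambiguate("Ficus"))              # no context: both candidates remain
    print(disambiguate("Morus", "Animalia"))  # context narrows to one candidate
```

Without context the caller gets every candidate back and has to choose; with context (here just a kingdom) the list narrows. Where no context is on record, the ambiguity simply remains.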

I am attaching a copy of our TREE article in which we laid out our approach.

I am sure there will be many more emails from me as I work through my inbox.

Paddy

--
___________________________________
David J Patterson

Senior Scientist, Marine Biological Laboratory
7 MBL Street, Woods Hole, MASS 02543, USA.

Research Professor
School of Life Sciences, Arizona State University
Tempe, AZ 85287-4501

Professor (MBL) Ecology and Evolutionary Biology
Brown University, Providence, Rhode Island

Life Sciences Lead, Data Conservancy dataconservancy.org

globalnames.org