Hi all,
Just got the word this morning that our working group proposal was
declined. The reviews are pasted below.
Regardless of this outcome, it would be good to keep in touch and keep
thinking about these issues; the need hasn't gone away. For example,
I just spoke with two genomics/disease researchers at the recent
Phenotype RCN meeting who both asked the questions we hoped to
address: should I use the NCBI "taxonomy", how do I know which is
better to use, what other options are there, etc.
Thanks, everyone, for your help. Selfishly, at minimum the exercise was
extremely useful for me: I learned a lot from everyone (e.g. I'm using
neo4j now for various things), and it was a good overall learning
experience with respect to how these things are done.
Cheers,
Matt
----
Review 1
Name standardization and computation is a key to phylogenetic
interoperability and data re-use. This proposal grew out of years of
NESCent-sponsored phyloinformatics interoperability projects, which
are now hosted under an umbrella research group, EvoIO (www.evoio.org).
Concrete informatics products and standards have grown out of
previous working group meetings and hackathons, including an
XML-based tree format (nexml), an ontology for phylogenetic data
(CDAO), and a workflow standard for querying phylogenetic information
through the web (PhyloWS). Unlike earlier working groups on
phylogenetic interoperability, this group would focus on a single,
more fundamental issue: taxonomic name resolution. Three working group
meetings have been planned. The first meeting aims to formulate a new
naming standard (MIRPAT) based on use cases, followed by a meeting to
implement a MIRPAT-based naming service (2nd generation TNRS). The
last meeting aims for documentation and dissemination. While this
project would enhance NESCent’s investment and mission in promoting
Open Source interoperable phylogenetic computing, my main concern is
the lack of wide adoption of new standards and practices. It would be
helpful if the authors presented 1-2 use cases that could excite
evolutionary biologists and (better) phylogenomics researchers in
general.
Rating 1 - Support
Review 2
This proposal's goal is a "unified," "next-generation" solution to the
problem of taxonomic name resolution, which they correctly describe as
arising from a broad and heterogeneous range of needs (use cases). The
authors provide a considerable amount of background and context about
TNRS developments, and the group's leaders and members have notable
experience in the relevant fields of expertise (informatics,
databases, etc.). The novelty of this proposal (at least in terms of
specific ideas) seems to boil down to something they call "MIRPAT
assertions", which appear to be provenance-backed statements about the
properties and relationships of names. It is not at all clear to me
how this will fundamentally advance the objectives of TNRS. I don't
doubt that a new graph database might help with some queries of
importance to EOL, OpenTree, etc. But the real worth of such an exercise
lies in the data that are input. There is no point in databasing 22
million name strings without the associated knowledge of how they are
linked together. Knowledge of this kind is the real limiting factor,
and technological innovations in acquiring/harvesting that knowledge
are needed more than the tools proposed for development here.
Rating 2 - Maybe
Review 3
For most groups of organisms nomenclature is not static. It changes
with some regularity. The only thing that stays the same is the
basionym of the type species. Sometimes existing taxa are moved into
different genera, sometimes they are sunk into other species and cease
to 'exist', and other times new taxa are described. The more we learn
about a clade the better we can get the nomenclature to reflect the
evolution of the group. It is complicated by the fact that there are
lots of misidentifications, especially in groups that have not been
monographed recently. So it is not a simple matter of taking a name
and looking up what genus it is in now. Each name needs some type of
estimation of how likely it is to be correct. There are many ways of
doing this. When was the most recent identification? Does it have
accompanying information (photos, publications, etc.)? Was it used as
a voucher in a publication? Does it have a 'concept'? Etc. If this
group could work out how to do this electronically it would be a
brilliant accomplishment and I would be ecstatic! It would save huge
amounts of time and be well worth the investment. Weaknesses: I liked
this proposal until I got to the end and realized I was left hanging. It
does not talk about how to implement it on a broad level. I always ask
four questions: What do you want to do? When will it be done? Who is
going to do the work? Who is going to pay for it? They have covered
these questions for this first bit, but what about scaling it up to
all names? Where are the answers to those questions? They do have one
sentence that says they will have a strategy and that it might include
an NSF proposal but not much else. Maybe this is enough. But, I was
left thinking that if they do a good job it will be much harder than
they seem to think and that finding funding for this might be
impossible. I was part of a group that tried to use a new tool to
provide a concept for every name in a large family (25,000 used names,
ca. 75,000 names in synonymy) and after 4 years we ran out of funds
and it was only 70% finished. It is now static, rapidly becoming out
of date and embarrassing to have around. I think a funding strategy
for the future has to be part of a project from the beginning and it
should include how it will be maintained.
Rating 2 - Maybe
Review 4
Resolving taxonomic names in the modern age is important and useful
for a variety of databases and integrative efforts. There are many
ongoing projects to sort out the mess of taxonomic names, but they are
not currently coordinated. Members of some of those projects have
apparently expressed enthusiasm (need) for this WG. The advantage of
this proposal is that it will pretty clearly produce a deliverable
useful to many large scale database efforts, including others
supported by NESCent. The weakness is that, unlike many other NESCent
proposals, it will not be helping to build a new field of inquiry.
There is a bimodal distribution of rank in the participants and very
few women.
Rating 1 - Support
Decision
Board Score 1.5
Final Decision 3 - Do Not Support
Summary: There was broad agreement that the problem is important.
However, there was concern that without expert curation of the names
themselves, the value would be limited to just updating names to the
current versions. Furthermore, taxonomies change so fast that the
tool will quickly decay unless kept constantly up to date. However,
there is clear value in getting the players from the different efforts
in the same room to help hash out differences and come to a standard,
although past experience shows this can be hard, as each team would
like the larger group to adopt their solution. It was not entirely
clear from the participant list if all the independent efforts were
represented.
---------- Forwarded message ----------
From: NESCent Proposal Team via RT <proposal...@nescent.org>
Date: Mon, Mar 4, 2013 at 8:53 AM
Subject: [help.nescent.org #15310] Working Group: Matthew Yoder -
notification from NESCent about your proposal
To: diap...@gmail.com
Dear Dr. Matthew Yoder,
We regret to tell you that your Working Group proposal to the National
Evolutionary Synthesis Center (NESCent), Improving data integration
with a unified approach to taxonomic name resolution, has been
declined. As you might expect, we received many more proposals than we
are able to support. Your proposal generated interest and support from
the board, but weaknesses were identified as well.
The individual reviews of your proposal are available on-line at
http://nead.nescent.org. You can use the login name and password you
established when you submitted your proposal. If you have forgotten
your password or have questions about the on-line system, please send
email to our help system (propos...@nescent.org).
Feel free to contact me if you have questions about your proposal or
the review and evaluation process. To do so, please reply to this
email rather than directly to me; that will ensure that the entire
proposal team (which includes me) is copied on the message.
Thank you for your interest in NESCent. We are always happy to discuss
ways that NESCent may support exciting research in synthetic
evolutionary science, and we look forward to the possibility of
working with you in the future.
Yours,
Allen Rodrigo
Director
a.ro...@nescent.org