Re: ABI proposal for phyloinformatics

Jim Leebens-Mack

Jun 7, 2011, 1:30:30 PM
to Karen Cranston, Arlin Stoltzfus, MIAPA, phy...@googlegroups.com, TreeBASE devel
Hi Folks,

The "pitch" - ABI Development: Towards a comprehensive, community-owned and sustainable repository of reusable phylogenetic knowledge - now in the googledocs document looks good. I think this would be a very ambitious project, but there is clearly a need for resources to improve access to phylogenetic knowledge.

I am a bit concerned about the tight connection between TB and ToLWeb that is outlined in the pitch. Are folks intending to re-engineer BOTH TB and ToLWeb? In my mind, ToLWeb would be just one of many platforms from which folks may want to delve into the phylogenetic knowledge that could be accessed in TreeBASE. I understand the desire to use ToLWeb as a resource for integrating data in TreeBASE relating to species relationships, but wonder if this connection could be better presented as an example or test-case for interoperability with TB rather than a primary feature of the proposal.

Along these lines, I would think the AVAToL panel/ideas lab participants would be very enthusiastic about development of ToLWeb as an integrator of phylogenetic trees/knowledge obtained from TreeBASE or any other source. Let's hope some of us have a chance to discuss this in August!

Karen,

Will you be sending the pitch to Reed and asking for feedback? I suspect he will want to talk about some of the technical points being discussed in the googledoc in order to be certain that this is an ABI development proposal. My sense from our last discussion is that development project proposals should include pretty watertight plans for software/database engineering and testing.

Bests,
Jim

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Jim Leebens-Mack
Department of Plant Biology
University of Georgia
Athens, GA 30602-7271

Phone: 706-583-5573
Fax: 706-542-1805
email: jleebe...@plantbio.uga.edu
url: http://www.plantbio.uga.edu/~jleebensmack/JLMmain.html
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


----- Original Message -----
From: Karen Cranston [mailto:karen.c...@nescent.org]
To: Arlin Stoltzfus [mailto:ar...@umd.edu]
Cc: MIAPA [mailto:miapa-...@googlegroups.com], phy...@googlegroups.com, TreeBASE devel [mailto:Treebas...@lists.sourceforge.net]
Sent: Mon, 06 Jun 2011 10:04:48 -0400
Subject: Re: ABI proposal for phyloinformatics


> There are several pitches now in the Google doc, with a fair bit of
> overlap between them. I am willing to consolidate into a single page
> and send to NSF (Reed?) and see what he has to say about the various
> components. It seems like these components are:
> 1. some level of re-engineering of TreeBASE
> 2. further development of MIAPA, with annotation tools and TreeBASE
> integration
> 3. use of ToLWeb as a crowd sourcing and data synthesis platform
> 4. NeXML refinement and development
>
> I don't think this one-pager needs to capture all of the ideas and
> details we currently have, but instead give a general sense of what we
> are proposing and if all / some of these ideas is potentially
> fundable.
>
> Everyone in agreement? I will post the single page in the doc later today.
>
> Karen
>
> On Fri, Jun 3, 2011 at 3:38 PM, Arlin Stoltzfus <ar...@umd.edu> wrote:
> > Today is the deadline for our 1-page synopsis to pitch to an NSF program
> > officer (before going further).   Currently we seem to have 3 pitches.  It
> > is time now for some energetic person to consolidate this, so that we can
> > move ahead.
> >
> > Arlin
> >
> > On May 31, 2011, at 12:19 PM, Karen Cranston wrote:
> >
> >> Tomorrow morning (Wed, June 1) looks to be good for everyone, and
> >> sooner seems better than later. I propose we talk at 9:00 am EST. I
> >> will send connection information later today.
> >>
> >> Cheers,
> >> Karen
> >>
> >> On Thu, May 26, 2011 at 3:00 PM, Karen Cranston
> >> <karen.c...@nescent.org> wrote:
> >>>
> >>> There has been some interest among various groups in an ABI proposal
> >>> for development of phyloinformatics resources. This email is an
> >>> attempt to connect those threads and move the process forward. The
> >>> conversations that have been happening up to this point are:
> >>>
> >>> 1. The Phyloinformatics Research Foundation (phylofoundation.org,
> >>> stewards of TreeBASE and ToLWeb) started a Google doc aimed at
> >>> TreeBASE
> >>> 2. MIAPA developers started a wiki page
> >>> (https://www.nescent.org/sites/evoio/NSF_ABI_2011), recognizing the
> >>> need for coordination with TreeBASE and other resources
> >>> 3. NESCent (Todd, Hilmar and myself), as the current TreeBASE host and
> >>> as a third party interested in coordinated development across
> >>> resources started a third document (now added to the already mentioned
> >>> Google doc)
> >>>
> >>> If you are interested in this discussion and do not already have
> >>> access to the Google doc entitled TreeBASE_ABI.doc, let me know and I
> >>> can grant you access. Hilmar and I made some substantial edits earlier
> >>> this morning. I point you specifically to the section at the end
> >>> entitled "An attempt to re-think all of this". Briefly, we wanted to
> >>> encourage some radical thinking and explore the idea of developing a
> >>> PhyloCommons that incorporates both TreeBASE and ToLWeb into the
> >>> proposal (as the data repository and the data sharing / dissemination
> >>> / synthesis platform, respectively).
> >>>
> >>> The ABI deadline is July 7, so we have a short period of time to pull
> >>> this together. Here is a link to a Doodle poll for an initial
> >>> teleconference.
> >>>
> >>> http://doodle.com/zf2tz7sftyk3naxy
> >>>
> >>> During this meeting, we hope to come to agreement on the broad
> >>> direction of the grant, identify possible leaders of the various
> >>> components and create a plan for getting this pulled together in time
> >>> for the deadline. Please feel free to continue the conversation on the
> >>> Google doc between now and the teleconference. If there are others who
> >>> you think should be invited, feel free to do so. Not everyone who
> >>> participates in this first phase will end up being named on the grant,
> >>> but these resources require input from a much larger group.
> >>>
> >>> Cheers,
> >>> Karen
> >>>
> >>>
> >>> --
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> Karen Cranston
> >>> Training Coordinator and Informatics Project Manager
> >>> nescent.org
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>>
> >>
> >>
> >>
> >> --
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> Karen Cranston
> >> Training Coordinator and Informatics Project Manager
> >> nescent.org
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> >> Groups "MIAPA" group.
> >> For more options, visit this group at
> >> http://groups.google.com/group/miapa-discuss?hl=en
> >
> > -------
> > Arlin Stoltzfus (ar...@umd.edu)
> > Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
> > IBBR, 9600 Gudelsky Drive, Rockville, MD
> > tel: 240 314 6208; web: www.molevol.org
> >
>
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Karen Cranston
> Training Coordinator and Informatics Project Manager
> nescent.org
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>

Hilmar Lapp

Jun 7, 2011, 1:48:38 PM
to Jim Leebens-Mack, Karen Cranston, Arlin Stoltzfus, MIAPA, phy...@googlegroups.com, TreeBASE devel
Jim:

On Jun 7, 2011, at 1:30 PM, Jim Leebens-Mack wrote:

> I am a bit concerned about the tight connection between TB and
> ToLWeb that is outlined in the pitch. Are folks intending to re-
> engineer BOTH TB and ToLWeb? In my mind, ToLWeb would be just one
> of many platforms from which folks may want to delve into the
> phylogenetic knowledge that could be accessed in TreeBASE.


Good points, and indeed what I had in mind technically(*). The way I
am envisioning this to be implemented is indeed using technologies
(HTTP/REST APIs, canonical resolvable identifiers, RDF) that allow
very loose coupling. I left that out as I thought the tech soup
shouldn't be in there, but I agree what's missing now is the notion
that this will be achieved through loose coupling.

So as for reengineering, the idea is to engineer (or reengineer where
that's necessary) components for *both* systems that allow that loose
coupling in a way that achieves the stated goals. Components that need
not change to achieve this would not be touched.
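
To make the loose-coupling idea concrete, here is a minimal sketch of how two resources could be linked purely through resolvable identifiers and RDF-style statements, without either one depending on the other's internals. The URI patterns, the `depictsClade` predicate, and both function names are hypothetical and made up for illustration; neither TreeBASE nor ToLWeb is assumed to expose them.

```python
# Hypothetical URI patterns: illustrative assumptions only, not the real
# TreeBASE or ToLWeb identifier schemes.
TREEBASE_TREE = "http://example.org/treebase/tree/{id}"
TOLWEB_CLADE = "http://example.org/tolweb/clade/{id}"
# Hypothetical predicate linking a deposited tree to a clade page.
DEPICTS = "http://example.org/vocab/depictsClade"


def link_triple(tree_id, clade_id):
    """Return one N-Triples line asserting that a tree depicts a clade."""
    subject = TREEBASE_TREE.format(id=tree_id)
    obj = TOLWEB_CLADE.format(id=clade_id)
    return f"<{subject}> <{DEPICTS}> <{obj}> ."


def trees_for_clade(triples, clade_id):
    """Find tree URIs linked to a clade, reading only the published triples."""
    target = f"<{TOLWEB_CLADE.format(id=clade_id)}>"
    found = []
    for line in triples:
        # Each line is "<subject> <predicate> <object> ." with no spaces in URIs.
        subject, _predicate, obj, _dot = line.split(" ")
        if obj == target:
            found.append(subject.strip("<>"))
    return found


triples = [
    link_triple("T1", "C9"),
    link_triple("T2", "C9"),
    link_triple("T3", "C4"),
]
print(trees_for_clade(triples, "C9"))
```

Because the link lives in published statements rather than in either code base, a third platform could consume the same triples without any change to the two systems.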

-hilmar

(*) Socially (as opposed to technically), I think ToLWeb can, and
should, play a much more important role in enhancing TreeBASE content
in the sense of turning data into knowledge than other platforms that
we would enable here. Perhaps a useful analogy to think about is
Genbank as the raw sequence data repository and NCBI Gene (and its
predecessor LocusLink) as well as RefSeq as resources that attempt to
turn this into curated knowledge.
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================

Karen Cranston

Jun 7, 2011, 1:53:08 PM
to Jim Leebens-Mack, Arlin Stoltzfus, MIAPA, phy...@googlegroups.com, TreeBASE devel
> Karen,
>
> Will you be sending the pitch to Reed and asking for feedback?  I suspect he will want to talk about some of the technical points being discussed in the googledoc in order to be certain that this is an ABI development proposal.   My sense from our last discussion is that development project proposals should include pretty watertight plans for software/database engineering and testing.

I am putting together two pitches of ~1 page each to send to NSF. One
is the grand ToLWeb + TreeBASE version, and the second is the TreeBASE
and MIAPA-focused ideas that you, Bill and Rutger submitted (I am
merging these into a single doc). This way, we can get a sense of
which version is more likely to be viewed favourably by the panel and
program officers. Working madly, hoping to send this ASAP.

Karen

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karen Cranston, PhD

Karen Cranston

Jun 7, 2011, 2:26:34 PM
to Jim Leebens-Mack, Arlin Stoltzfus, MIAPA, phy...@googlegroups.com, TreeBASE devel
There is now a link to the two pitches in the Google Doc. Please
review, particularly pitch A, where I have merged ideas from several
authors (and hopefully have not misrepresented anyone too poorly).

On Tue, Jun 7, 2011 at 1:53 PM, Karen Cranston

William Piel

Jun 7, 2011, 3:20:04 PM
to Hilmar Lapp, Jim Leebens-Mack, Karen Cranston, Arlin Stoltzfus, MIAPA, phy...@googlegroups.com, TreeBASE devel

Yeah, since ToLWeb provides an XML dump of the whole tree, from the TreeBASE side a cross-walk with ToLWeb (or mechanisms to benefit from the ToLWeb skeleton for enhancing the searching of TreeBASE) could be done without *any* modifications of ToLWeb (or at least we could provide ToLWeb with clade-level query links that they can add to each clade page if they choose to). Likewise, we will no doubt want to provide higher-name-sensitive queries using NCBI and ITIS (etc) without requiring permission to change their code (to say the least!).
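
As a rough sketch of the cross-walk idea, the snippet below indexes a nested XML tree so that a higher-taxon name can be expanded into the terminal names beneath it, which could then drive a name-based search on the TreeBASE side. The XML layout (nested `node` elements with a `name` attribute), the placeholder taxon names, and the function name `clade_index` are all assumptions for illustration; the actual ToLWeb dump format is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a tree dump; names are placeholders.
SAMPLE = """
<node name="CladeA">
  <node name="GenusB">
    <node name="species_1"/>
    <node name="species_2"/>
  </node>
  <node name="species_3"/>
</node>
"""


def clade_index(root):
    """Map every node name to the names of its terminal descendants."""
    index = {}

    def walk(elem):
        children = elem.findall("node")
        if not children:
            tips = [elem.get("name")]  # a terminal node maps to itself
        else:
            tips = []
            for child in children:
                tips.extend(walk(child))
        index[elem.get("name")] = tips
        return tips

    walk(root)
    return index


index = clade_index(ET.fromstring(SAMPLE))
# Expanding a higher name into the terminal names to search for:
print(index["GenusB"])
```

Nothing in this approach requires changes on the ToLWeb side: it only reads the dump, which is the point made above.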

But clearly we want something tighter than just consuming ToLWeb's XML -- and we want ToLWeb to benefit as much as TreeBASE. But in that case, I think some sort of tap on David Maddison and Andrew Lenards's shoulder is needed (David is on the phylorf mailing list, but is he following this?). I gather that ToLWeb code is not yet Open Source...

bp


On Jun 7, 2011, at 1:48 PM, Hilmar Lapp wrote:

> Jim:
>
> On Jun 7, 2011, at 1:30 PM, Jim Leebens-Mack wrote:
>

>> I am a bit concerned about the tight connection between TB and ToLWeb that is outlined in the pitch. Are folks intending to re-engineer BOTH TB and ToLWeb? In my mind, ToLWeb would be just one of many platforms from which folks may want to delve into the phylogenetic knowledge that could be accessed in TreeBASE.

Karen Cranston

Jun 7, 2011, 3:35:06 PM
to phy...@googlegroups.com, Hilmar Lapp, Jim Leebens-Mack, Arlin Stoltzfus, MIAPA, TreeBASE devel
I was thinking that ToLWeb would provide a framework for the community
engagement / annotation aspects outlined in the original list of
TreeBASE "problems". This may include metadata annotations (from
ToLWeb -> TreeBASE), as well as possibilities for synthesizing
published phylogenies into the tree of life (TreeBASE -> ToLWeb).
Hopefully, closer integration would also prevent parallel development
of similar search, browse or visualization interfaces in each
resource.

I will ping David Maddison about this discussion.

Karen

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karen Cranston, PhD

Jim Leebens-Mack

Jun 7, 2011, 4:08:55 PM
to Karen Cranston, Arlin Stoltzfus, MIAPA, phy...@googlegroups.com, TreeBASE devel
I would only change the second paragraph of specific objective 1:

- change "a future-proof" to "an extensible" 
- change "TreeBASE III will to ingest the large and complex..." to "TreeBASE III will be designed to ingest the large and complex..."


Thanks!
Jim



To remedy this situation, we envision an extensible TreeBASE III that is designed, architected, and implemented using modern information science-informed paradigms for engineering robust, highly efficient, cloud-ready online digital information repositories. This re-engineering also extends to the existing interfaces for data deposition, query and retrieval. TreeBASE III will to ingest the large and complex phylogenetic data packages produced by current research, add new search tools through integration with external services such as for taxon name resolution, georeferencing or BLAST and support large-scale data query and retrieval.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 
Jim Leebens-Mack 
4505 Miller Plant Sciences

Arlin Stoltzfus

Jun 7, 2011, 4:34:10 PM
to William Piel, Hilmar Lapp, Jim Leebens-Mack, Karen Cranston, MIAPA, phy...@googlegroups.com, TreeBASE devel
Regarding ToLWeb, two comments. I think this proposal is not going to
do as well if it focuses on updating specific resources and bringing
them closer to meeting user needs, as opposed to starting with the
user needs and bringing to bear whatever tools or resources solve the
problem most easily. But we can wait to see what the program
officers tell us.

Having said that, I'm pretty sure from the analysis of literature
(that Brian and I are doing) that re-use of large species trees is an
important use-case. In a sample of 40 recent papers that hit
"phylogen*" in the title or topic (obviously not a random sample, but
we wanted to find the folks who focus on trees), we found 5 that use
phylomatic or APG trees, and 1 that uses the animal supertree from
Bininda-Emonds.

APG and phylomatic appear to be leaving ToLWeb and TreeBASE in the
dust, in terms of scientific re-use. Whatever they are doing to bring
trees to users, we should be doing.

Arlin

Hilmar Lapp

Jun 8, 2011, 4:24:51 PM
to Karen Cranston, phy...@googlegroups.com, Jim Leebens-Mack, Arlin Stoltzfus, MIAPA, TreeBASE devel
It looks like a response from NSF is still pending. There is not a lot
of time left until the submission deadline, and I'll be out of
commission for at least 7 days during that time starting Wed next
week. So I suggest we start planning and get together independently of
the NSF response to hash out over a conference call possible
contributions and commitments. Here's a Doodle poll for scheduling.

http://www.doodle.com/8zvwbidtxm9gzxcp

To make sure that we can have a relatively targeted discussion, my
suggestion would be that everyone who is willing to play a role in
this proposal enter their availability, and come prepared for the
following questions:

1. What aims would a proposal need to have for you to commit to be
part of it, and conversely, what aims should it not have. (Ideally,
the aims would be from either pitch A or pitch B that Karen sent to
NSF for feedback.)

2. What aims, expertise, and partners are we missing from the group.
Do you have suggestions for how to pull those in.

3. What role are you interested in playing, for which aim(s). What
kind and how many resources do you anticipate requiring support for to
accomplish those aims.

At the end of this, ideally we have a concrete sense for whether there
are 0, 1, or 2 proposals that are viably going to come together, what
size of proposal(s) we are talking about, who would take
responsibility for what, and who else we need to reach out to.

Comments / suggestions / additional items for the enumeration above
welcome.

-hilmar

Karen Cranston

Jun 9, 2011, 12:58:14 PM
to phy...@googlegroups.com, MIAPA, TreeBASE devel
Hilmar and I talked to Anne Maglia from NSF this morning. The notes
are on the "Pitches for TreeBASE_ABI" document (which is now editable
by anyone with the link, BTW). She did not see any major issues and
had plenty of advice on how to avoid common pitfalls when writing for
the ABI panel.

Summary:
1. Making the MIAPA component into a separate Innovation proposal is
probably a good idea.
2. The TreeBASE / ToLWeb piece is well-suited for a Development
proposal, and we can discuss MIAPA in this proposal as long as we have
a concrete contingency plan for the possibility that this gets funded
and the MIAPA proposal does not.
3. There is no general rule about incremental improvement vs major
re-engineering, but the goals of the proposal must be novel in some
way and have intellectual merit. A re-engineering proposal could be
computationally novel, while a proposal with only incremental
improvements must instead have novel interface components or strong
biological motivations.
4. There seems to be an empty niche for proposals that include novel
front-end as well as back-end development, but we need to make sure we
have the appropriate expertise for the former.
5. She suggests sharing the draft with someone from BIO (perhaps
Maureen Kearney) to get the user community perspective

Please fill out the doodle poll so that we can plan the next course of action!

Cheers,
Karen

--

Karen Cranston

Jun 9, 2011, 2:01:20 PM
to phy...@googlegroups.com, MIAPA, TreeBASE devel
Tomorrow morning at 11 am EST is a winner so far. It would be good to
talk before the weekend, so I am going to tentatively schedule that
block unless I hear otherwise. Connection information will be in the
Google Doc - please ask if you need access.

Talk to you tomorrow,
Karen

Hilmar Lapp

Jun 9, 2011, 2:04:29 PM
to Karen Cranston, phy...@googlegroups.com, MIAPA, TreeBASE devel
Yes, I was going to suggest the same. -hilmar

--

Arlin Stoltzfus

Jun 13, 2011, 8:22:16 AM
to Karen Cranston, phy...@googlegroups.com, MIAPA, TreeBASE devel
After our telecon, which suggested that splitting out the MIAPA part was a better strategy, I started a separate doc for this here:

https://docs.google.com/document/d/16bno1sB3gBHHnew5TnoCLawScuoydG-i5LCPcB30OZY/edit?hl=en_US

The focus of this, as currently conceived, is to combine problem-solving with development of a draft standard. The problem-solving attempts to address relevant user needs (e.g., helping users to create a properly formatted and annotated archive submission). This way, we will be developing technology support at the same time as the draft standard (which, ideally, will encourage the broader community to try it out and work with us).

If you are interested, please take a look at the proposal, help us to identify problems to address and possible strategies to address them by leveraging available technologies and resources. Those who are interested will need to solidify partnerships as soon as possible, as there is only a month left to formulate the plan and write the proposal.

Arlin
________________________________________
From: miapa-...@googlegroups.com [miapa-...@googlegroups.com] On Behalf Of Karen Cranston [karen.c...@nescent.org]
Sent: Thursday, June 09, 2011 12:58 PM
To: phy...@googlegroups.com; MIAPA; TreeBASE devel


Subject: Re: ABI proposal for phyloinformatics

Hilmar and I talked to Anne Maglia from NSF this morning. The notes

Cheers,
Karen

--

Rutger Vos

Jun 14, 2011, 9:28:39 AM
to Arlin Stoltzfus, Karen Cranston, phy...@googlegroups.com, MIAPA, TreeBASE devel
I just shared some sketches (thinking out loud about TreeBASE3) with
various people on these lists (Arlin, Hilmar, Karen, Bill, Harry).
MIAPA might play a role in the automated submission process, so if
anyone else is interested in seeing these documents please let me or
any of the other people with access know and we can share them with you.

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com

David Maddison

Jun 14, 2011, 3:42:19 PM
to phy...@googlegroups.com, MIAPA, TreeBASE devel
Sorry, folks for being silent. Travelling, daughter's graduation, mom's visit, yada yada yada...

Here are some comments, and I must say I haven't read the proposal yet, just the pitches, and the emails.

I like the general plan.

I agree with Jim that fundamentally ToLWeb should be viewed as a test case for interoperability, even if there is specific effort in the project on it. I would go so far as to say that even TreeBASE should be viewed here as a test case. The most important things that would be built in the process of doing this are the vision, understanding of needs, standards, design of the tools, and community building and inspiration, rather than the particular implementations that will come out in TreeBASE and ToLWeb. The products should be done with enough abstraction that they will serve for other cases. I always find that it is good to have two test cases, just to force one to think about alternatives at various decision points. There is a cost to that, of course, and that is the added funds that would be needed to support other test cases. But if we could have a light-weight alternative to TreeBASE as an alternative test case at that end, and a light-weight alternative to ToLWeb to serve as a test case there, then it might be worth thinking about including a bit of effort there to force greater abstraction.

ToLWeb is "emotionally" open source. That is, it's fully available as far as I am concerned, but we haven't gone through the effort of actually making it open source. Anyone who wants the source can have it. So, I am enthusiastically in support of having a small portion of the budget devoted to the effort to push it onto Source Forge or somewhere. Ideally that would involve a bit of contract money to Andy Lenards, and possibly Danny Mandel (programmer that preceded Andy; Danny understands more of the source, I suspect).

OK, more comments after I look at the proposal.
David

---------------------------------
David R. Maddison
Department of Zoology
3029 Cordley Hall
Oregon State University
Corvallis, OR 97331 USA

david.m...@science.oregonstate.edu

http://david.bembidion.org
http://mesquiteproject.org
http://macclade.org
http://tolweb.org

(541) 737 2834


Rutger Vos

Jun 15, 2011, 12:49:33 PM
to Westneat, Mark, phy...@googlegroups.com, TreeBASE devel, MIAPA
> In talking with Rob G, we
> are both really interested in UI, and we both have programmers in our groups
> who are working on various UI features for Tree Viz, so that would be an
> area where I think we would be interested to help the project as the interop
> details are hammered out.

Tree visualization and interaction design of web applications are two
different things. There is a lot of community expertise of the former
(and there are other people we might involve in this, e.g. Tamara
Munzner), but not enough of the latter. People complain about TreeBASE
in part because the workflows and visual metaphors aren't informed by
good HCI design, and this is not going to be addressed by better tree
viz. People will complain less about TreeBASE if we know why facebook
is nicer than myspace and apply those principles.

Robert Guralnick

Jun 15, 2011, 1:14:54 PM
to phy...@googlegroups.com, Westneat, Mark, TreeBASE devel, MIAPA, Andrew Hill
I see TreeViz and UI are partly overlapping. There are a lot more
interesting ways to explore and annotate trees that push back to
overall UI design. Also, it might be wise for us to both look
internally to experts within the community, and also externally to
those who do design in different biological (or other) arenas. We
want to develop for a broad set of users, and not necessarily simply
for card carrying systematists. I am very sympathetic to the idea that
we really do need to bring in HCI expertise, but am even more keen to
find great and visually talented artists and designers who can carry us a
ways forward too, without getting too "academic". In developing some
UI and TreeViz software, our lab has found great value in having a
talented visual designer (Sander Pick;
http://plebeian.tv/#information) to help us think through some of the
UI/Viz challenges. Sander's skills transfer well across domains (he
is now doing design work for clean energy).

Best, Rob

Westneat, Mark

Jun 15, 2011, 12:32:30 PM
to phy...@googlegroups.com, MIAPA, TreeBASE devel
Hi everyone, sorry for joining this thread a bit late but I am catching up and have been discussing some things about ToLWeb and user interface ideas with the phylorf committee.  Regarding ToLWeb, we agreed that we need a point person and so I have volunteered to be that person, with David and Karen joining to form a three-person group to help steer ToLWeb issues in this ABI grant.  We are totally open to all ideas about where ToLWeb should head- we mainly thought having a group with a point person would help get decisions made.  In order to be effective, I am trying to aggregate information on ToLWeb features and site maps, existing strengths and weaknesses, steps needed to go open source, features we need/want, user interface ideas, past discussions like ToLWebII, etc.  Regarding user interface issues, I think we can probably all agree that UI design and implementation is a key issue for both TB and TW.  In talking with Rob G, we are both really interested in UI, and we both have programmers in our groups who are working on various UI features for Tree Viz, so that would be an area where I think we would be interested to help the project as the interop details are hammered out.  More soon-  Thanks, Mark
--
Mark W. Westneat
Curator of Zoology
Robert A. Pritzker Director, Biodiversity Synthesis Center of the Encyclopedia of Life
Field Museum of Natural History
1400 S Lake Shore Dr.,  Chicago, IL  60605-2496
(312) 665-7734

My website: http://synthesis.eol.org/users/mwestneat
BioSynC website: http://synthesis.eol.org

Karen Cranston

Jun 15, 2011, 1:23:36 PM
to phy...@googlegroups.com
These are some of the reasons that we invited Matt Bietz to
participate (and why we are thrilled that he has accepted). He comes
from the social sciences and has experience in HCI and
cyberinfrastructure development.

I agree that while there is overlap between visualization and
interface development, the pitch that we have outlined so far needs
usability and interface experts more than visualization experts.

Karen

--
~~~~~~~~~~~~~~~~~~~~~~~
karen.c...@gmail.com
~~~~~~~~~~~~~~~~~~~~~~~

Andrew Hill

Jun 15, 2011, 2:28:32 PM
to Robert Guralnick, phy...@googlegroups.com, Westneat, Mark, TreeBASE devel, MIAPA
Hi all,

Need to throw in my two cents here. I imagine these comments might
represent a slightly different pov than has come up.

Near-future grant proposals probably should not spend heavily on the
development of tree visualization tools with goals of long-term
sustainability. Instead, the next wave of proposals (I'm only talking
about viz here) should focus on a few pre-tool advancements. My idea
of what these advancements should be is based on

1) the future/now of viz is on the open-web (html5 etc).
2) the best data viz people are probably not in our community, they
are in their own or are unsuspecting tinkerers who will stumble upon
it.
3) the coolest innovations in tree viz aren't going to be wrapped into viz tools

So, with that, here is what I propose needs to happen,

1) PhyloJSON notations. XML on the web is slow and unnecessary. JSON
is quickly moving in on XML dominance. PhyloBox and other projects all
moved down their own paths for creating json phylogeny notation. This
is the normal way to do it, but our community could be developing some
common notations that we could all start using/expecting.

2) Advanced REST tree queries (TreeBASE) for strictly web-client based
consumption. What I mean here, is the use of cross-domain rest queries
to find, trim, scope, and combine trees. On top of that, wrapping the
results into the product of (1) above. This will allow web mashups and
tools to emerge that combine the data in unexpected ways.

3) Proof of concept tools and documentation. All I mean here, are
examples of how the products of (1) and (2) can be used to merge
phylogeny with external sources of data (wikipedia, eol, flickr, etc),
using the modern web, html5, css3, and javascript. This will provide
the foundations for a much larger pool of producer/consumers to do
interesting things with the data. Documentation is king. It will
enable those not at the table to understand the decisions made.
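
Points (1) and (2) above can be sketched together: a tree is parsed from Newick into nested dictionaries that serialize directly to JSON, and a "trim" query then reduces the tree to a requested set of taxa. This is only an illustration under stated assumptions: the key names (`name`, `length`, `children`) and the function names `parse_newick` and `prune` are invented here and are not an agreed community notation, nor the format used by PhyloBox or NeXML. The parser handles plain names and branch lengths only, with no quoting or comments.

```python
import json


def parse_newick(text):
    """Parse a simple Newick string into nested dicts (hypothetical layout)."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        result = {"name": text[start:pos]}
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            result["length"] = float(text[start:pos])
        if children:
            result["children"] = children
        return result

    return node()


def prune(tree, keep):
    """Return a copy of the tree restricted to tips named in keep, or None."""
    kids = tree.get("children")
    if not kids:
        return dict(tree) if tree["name"] in keep else None
    kept = [k for k in (prune(c, keep) for c in kids) if k is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Collapse a node left with one child, adding up branch lengths.
        only = dict(kept[0])
        if "length" in tree or "length" in only:
            only["length"] = tree.get("length", 0.0) + only.get("length", 0.0)
        return only
    out = {k: v for k, v in tree.items() if k != "children"}
    out["children"] = kept
    return out


tree = parse_newick("((A:1,B:2)AB:0.5,C:3)root;")
pruned = prune(tree, {"A", "C"})
print(json.dumps(pruned))
```

In a REST setting, the set of taxa to keep would arrive as a query parameter and the JSON would be the response body; that wiring is left out here.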

Each of those has clear integration points with treebase (especially
the meeting of 1 and 2) and other projects. More importantly, none of
them have been correctly explored with sufficient resources and
planning. On top of that, developing the three in coordination will
allow much better development of use-cases for phylogeny viz that feed
back into each of the three.

best,

a

--
Ecology and Evolutionary Biology
University of Colorado

http://biodiversity.colorado.edu
http://biodivertido.blogspot.com

Westneat, Mark

Jun 15, 2011, 2:28:34 PM
to phy...@googlegroups.com
Agreed with all the above- in mentioning visualization, I meant to point out that in that area we are faced with a number of UI needs that may be relevant.  Lots of UI is not viz, and vice versa of course.  But I see UI and viz as tightly linked for both TB and TW, in the sense that for TB it is hard to see the trees, and in TW we would like to have a more flexible way to view the structure at different zooms.  So, one key UI feature that both projects need is a fresh viz framework- preferably a shared one.  There are lots of other UI features that are not viz, and we need a list of those ASAP.  Lastly, on viz, my feeling is that we will not want to propose a whole new viz initiative here, creating new views, but rather implement existing ones to offer a range of view frames that are already available.  On the other hand, if the HCI leads us to new viz, that would be great.  -MW

Karen Cranston

Jun 15, 2011, 2:46:12 PM
to phy...@googlegroups.com
Visualization (of trees or other data) was not something that we have
discussed in our previous teleconferences, and not something we
discussed with an NSF program officer. With the proposed
re-engineering, interoperability, UI and user engagement components,
bringing in visualization as well might be too much. Not that I don't
personally have an interest in visualization software and principles,
but I do not think that should be a key piece of this proposal. One
fear I have is that viz may overlap too heavily with the AVAToL panel
- with the pitches we provided to NSF, they did not think that was an
issue. Incorporating a significant viz component might be a different
scenario.

I agree with Andrew, though, that the re-engineering should involve
well-designed interoperability goals so that people can re-use the
data as easily as possible in new and creative ways. But, we also need
to consider the users that will access these resources via the web
interface, not only the APIs.

Karen

--
~~~~~~~~~~~~~~~~~~~~~~~
karen.c...@gmail.com
~~~~~~~~~~~~~~~~~~~~~~~

Rutger Vos

Jun 15, 2011, 3:20:04 PM
to phy...@googlegroups.com
On the topic of tree viz and json I would just for the sake of
completeness want to bring (Andrew's competitor) jsPhyloSVG to mind -
that was the project with the sleek iPad demo at last year's iEvoBio.
Sam Smits has been very responsive in adding/fixing support for
json-serialized NeXML to his widget.

I agree with Andrew's general observation that tree viz widgets that
consume json from public APIs (phylows?) are the way forward, and
phylobox and jsPhyloSVG are both compelling examples.

Having said that, I think that a good user experience would include
intuitive integration of the viz tool with the rest of the UI, i.e.
the opposite of the way things work now with phylowidget on TreeBASE
(on the other hand, the way it's done on ToLWeb works rather well
IMO).

Westneat, Mark

Jun 15, 2011, 4:18:43 PM
to phy...@googlegroups.com
Hey all, OK so my group's main interest in participating would be on the user interface side, and improving the user experience in both TreeBASE and ToLWeb. Any re-engineering of either resource should keep in mind what people want to see and do when they arrive at a tree topology. Most people want to see the trees, right? So, among the goals should be a good way to see the trees (and tree sets) in TreeBASE and move around in ToLWeb, seeing the tree at various levels. I would call that visualization, I guess, nothing outrageous, but this kind of basic tree viewer should always be there as part of the end product. So I think we should include it at some level, certainly as part of any UI developed, even if (as I suggested earlier) we don't propose new viz frameworks and just use what we already have. Andrew's suggestions for web-readiness should make this relatively easy to do and not require that a large part of the proposal be focused on viz. I do think we should focus a chunk of the proposal on the interface, with some basic tree topology viz being a key part of that. But if that is maybe a different grant, that's totally cool with me too. Thanks, Mark

William Piel

Jun 15, 2011, 4:32:51 PM
to phy...@googlegroups.com

If someone asked me to list improvements / features to ToLWeb, I'd say that the main area for improvement is to get biologists to re-use the ToLWeb tree in their research. Currently this doesn't happen enough because the tree is incomplete and it's not easy to download the exact skeleton that you want in Newick (Mesquite lets you do this, but only in a very basic way -- we need something more akin to Phylomatic, i.e. give a list of OTUs and ToLWeb sends you the tree pruned to just those OTUs; or send ToLWeb OTUs x, y, and z, and it returns the smallest subtree that includes this triplet). To solve the "incomplete" problem, I would vote to tree-ify certain chunks of classifications and do a wholesale upload (perhaps with these added taxa flagged as "un-curated") -- e.g. using any classifications that make an effort to resolve synonyms, such as Tropicos, Plant List, ITIS, etc. Later, the curators can come back and move the new names/branches around. But at least this way we can start out with a "tree" of all, or most, species. At a minimum, ToLWeb should have every species that is found in NCBI as an OTU, marked up with the NCBI taxid (e.g., currently it has only three species of Araneidae -- but NCBI has more than 100, and if we took the Platnick catalog and tree-ified the family classification, we could boost it to all known species of Araneidae). Too often, biologists turn to NCBI's classification as a faux-tree to, for example, map gene trees to species trees. We want them to use ToLWeb instead. If every NCBI OTU has a presence somewhere in ToLWeb, I could imagine NCBI modifying Entrez to allow people to search GenBank using ToLWeb as an alternative to using its own classification. Maybe then people could query GenBank with critically important clades like "Ecdysozoa" (which currently they can't do).
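
The two extraction requests described above (prune the tree to a list of OTUs, or return the smallest subtree containing a few OTUs) could look roughly like the sketch below. This is only an illustration under stated assumptions: the tuple-based tree, the toy Carnivora topology, and the function names `leaves`, `prune`, `smallest_subtree` and `to_newick` are all hypothetical and are not part of ToLWeb, Mesquite or Phylomatic.

```python
# Toy tree as nested (name, children) tuples; leaves have an empty child list.
# The topology and all function names are illustrative assumptions only.
TREE = ("Carnivora", [
    ("Feliformia", [("Felis", []), ("Panthera", [])]),
    ("Caniformia", [
        ("Canis", []),
        ("Arctoidea", [("Mustela", []), ("Phoca", [])]),
    ]),
])


def leaves(node):
    """Return the set of leaf (OTU) names below a node."""
    name, children = node
    if not children:
        return {name}
    found = set()
    for child in children:
        found |= leaves(child)
    return found


def prune(node, keep):
    """Restrict the tree to the OTUs in `keep`; return None if none remain."""
    name, children = node
    if not children:
        return node if name in keep else None
    kept = [p for p in (prune(c, keep) for c in children) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # drop internal nodes left with a single child
    return (name, kept)


def smallest_subtree(node, taxa):
    """Return the smallest clade containing every name in `taxa`."""
    if not taxa <= leaves(node):
        raise ValueError("some requested OTUs are not in the tree")
    for child in node[1]:
        if taxa <= leaves(child):
            return smallest_subtree(child, taxa)
    return node


def to_newick(node):
    """Serialize a node as Newick text (without the trailing semicolon)."""
    name, children = node
    if not children:
        return name
    return "(" + ",".join(to_newick(c) for c in children) + ")" + name


# Tree pruned to three OTUs, then the smallest clade holding two OTUs.
print(to_newick(prune(TREE, {"Felis", "Canis", "Phoca"})) + ";")
print(to_newick(smallest_subtree(TREE, {"Mustela", "Phoca"})) + ";")
```

A real service would work against the stored tree and resolved taxon identifiers rather than bare name strings; the sketch only shows that both requests reduce to short recursive walks.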

Other improvements: make the UI more like Google Maps, in that it zooms seamlessly, and different levels of zoom show different features/metadata. Allow third-party users to "attach" things to the tree (e.g. like photos in Google Maps) or make overlays, without actually affecting/modifying the database, i.e. to be able to reuse parts of the tree on their own website with their own decorations, much the way people put overlays on Google Maps and stick them on their own websites. Allow other classes of curators, rather than strictly clade-specific curators -- e.g. allow people from TimeTree to run through the whole ToLWeb tree and put paleo anchor metadata on nodes, or identify Gondwana clades. I'd like it to be very easy for a teacher to say: make me a tree of x, y, and z, and output an SVG or JPG for a PowerPoint slide to show the class.

However... however... I think it is too grandiose to try to do a TreeBASE redesign and make major improvements / changes to ToLWeb in the same ABI proposal.  Either fork the proposals, or limit the ToLWeb improvements to modest, TreeBASE-centric changes. For example, limit it to: (1) the TreeBASE-ToLWeb crosswalk (which hardly requires any coding on ToLWeb's side), (2) harmonizing the API so that both TreeBASE and ToLWeb respond to the same query language syntax, including having ToLWeb export NeXML and NEXUS/Newick.  IMO, this is the extent to which we should co-develop with ToLWeb, before it overburdens a single proposal. 

bp


William Piel

Jun 15, 2011, 4:52:07 PM
to phy...@googlegroups.com

On Jun 15, 2011, at 4:18 PM, Westneat, Mark wrote:

> Hey all, OK so my group's main interest in participating would be on the user interface side, and improving the user experience in both TreeBASE and ToLWeb. Any re-engineering of either resource should keep in mind what people want to see and do when they arrive at a tree topology. Most people want to see the trees, right? So, among the goals should be a good way to see the trees (and tree sets) in TreeBASE and move around in ToLWeb, seeing the tree at various levels. I would call that visualization, I guess, nothing outrageous, but this kind of basic tree viewer should always be there as part of the end product. So I think we should include it at some level, certainly as part of any UI developed, even if (as I suggested earlier) we don't propose new viz frameworks and just use what we already have. Andrew's suggestions for web-readiness should make this relatively easy to do and not require that a large part of the proposal be focused on viz- I do think we should focus a chunk of the proposal on the interface, with some basic tree topology viz being a key part of that. But if that is maybe a different grant, thats totally cool with me too. Thanks, Mark

To me, the TreeBASE API issue is less about visualizing trees (although clearly we need, as Rutger says, a highly integrated and easy/powerful way to view and rummage through trees) and more about giving people the power to use all the metadata to their search advantage without having an interface that is too complicated. Metadata includes size of tree, size of matrix, kind of tree, matrix data type, taxonomic identifiers, gene names, author names, analysis methods, etc.

e.g.:

- find me trees that include Mustelids and Pinnipeds, but not Canids (or anything below that), and that result from molecular analyses of character sets with more than 1000 bases.

- which trees don't support monophyly of bats?

- find me matrices authored by Michael Donoghue that use morphological data and that don't deal with Viburnum.

- give me a dump of all trees that have at least 5 Elasmobranchii in them, but only one tree per "study", in a form that easily allows me to build a supertree with them (i.e. all OTUs are remapped to a common set of identifiers)
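
As a rough illustration of how the first and last example queries could be expressed once the metadata is exposed in a structured form, here is a minimal sketch. Everything in it is an assumption made for the example: the `TreeRecord` fields, the `find_trees` and `one_per_study` helpers, and the sample records are hypothetical and do not reflect the TreeBASE schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeRecord:
    """Hypothetical metadata record for one deposited tree."""
    tree_id: str
    study_id: str
    clades: frozenset  # higher taxa represented by at least one OTU
    data_type: str     # e.g. "molecular" or "morphological"
    n_chars: int       # number of characters (bases) in the matrix


def find_trees(records, include=(), exclude=(), data_type=None, more_than=0):
    """Filter records by required clades, forbidden clades, type and size."""
    include, exclude = set(include), set(exclude)
    return [
        r for r in records
        if include <= r.clades
        and not (exclude & r.clades)
        and (data_type is None or r.data_type == data_type)
        and r.n_chars > more_than
    ]


def one_per_study(records):
    """Keep only the first tree encountered for each study."""
    seen, kept = set(), []
    for r in records:
        if r.study_id not in seen:
            seen.add(r.study_id)
            kept.append(r)
    return kept


# Made-up sample data, for illustration only.
RECORDS = [
    TreeRecord("T1", "S1", frozenset({"Mustelidae", "Pinnipedia"}), "molecular", 2500),
    TreeRecord("T2", "S1", frozenset({"Mustelidae", "Pinnipedia", "Felidae"}), "molecular", 1800),
    TreeRecord("T3", "S2", frozenset({"Mustelidae", "Pinnipedia", "Canidae"}), "molecular", 3000),
    TreeRecord("T4", "S3", frozenset({"Mustelidae", "Pinnipedia"}), "morphological", 120),
    TreeRecord("T5", "S4", frozenset({"Mustelidae", "Pinnipedia"}), "molecular", 900),
]

hits = find_trees(RECORDS, include={"Mustelidae", "Pinnipedia"},
                  exclude={"Canidae"}, data_type="molecular", more_than=1000)
print([r.tree_id for r in hits])
print([r.tree_id for r in one_per_study(hits)])
```

The monophyly query ("which trees don't support monophyly of bats?") needs the tree topology itself rather than flat metadata, so it is deliberately left out of this sketch.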

bp

Karen Cranston

Jun 15, 2011, 7:49:52 PM
to phy...@googlegroups.com
I just want to highlight this comment of Rutger's:

> (on the other hand, the way it's done on ToLWeb works rather well
> IMO).

There is plenty of functionality that we can build around the type of
static tree browsing interface that ToLWeb has now. If we expose the
data in consistent ways according to the standard formats that treeviz
tools want, then others can build (perhaps even incorporate?) more
exciting visualization elements.

I also agree with Bill that a more crucial interface issue is giving
users the ability to find the trees they want, using a wealth of
available metadata and with an interface designed using modern
principles of UI design and usability.

--
~~~~~~~~~~~~~~~~~~~~~~~
karen.c...@gmail.com
~~~~~~~~~~~~~~~~~~~~~~~

Hilmar Lapp

Jun 15, 2011, 7:25:08 PM
to phy...@googlegroups.com, phy...@googlegroups.com
Great discussion. To throw in my two cents: I think it's important to keep this vision of creating a platform in mind. A platform doesn't solve all problems and doesn't serve all needs, but it provides unified and easily programmable access to useful content (it's the data, stupid!) such that people with the skills and drive can innovate. The content being useful in this case to me implies that it is much more comprehensive than it is now (most published phylogenetic trees are not deposited in TreeBASE, and the ToL is far more incomplete than the current state of published phylogenetic knowledge), and that it is much better annotated and curated than it is now, which I find hard to accomplish without a much improved connection between ToLWeb and TreeBASE, and without a well-thought-out approach to overcoming the social barriers to user engagement.

Having said that, small proof-of-concept viz (and other) apps can serve well to guide and validate the "platform" and API goals. But they shouldn't take on a scope that would start to distract from the main focus. Perhaps a good instrument for supporting these proof-of-concept efforts would be developer-engagement events (challenges, hack-days, competitions, ?).

-hilmar

Sent with a tap.

Westneat, Mark

Jun 16, 2011, 12:05:55 PM
to phy...@googlegroups.com, TreeBASE devel
Hi all, Picking this up again today, I agree with Bill that feasibility and scope are key for this grant, so if it is focused on the TreeBASE re-engineering and first steps of interop to ToLWeb, that seems like a feasible project.  I didn't get that impression, however, from reading the proposal draft (seemed like retooling of both projects), so it may need some honing to make that clear.  So, for our ToLWeb group, in this grant we should plan on open sourcing, and helping make the tie-ins with TreeBASE, and perhaps use this exercise as a way to plan for future more complex modifications to ToLWeb.
Trying to think this through, here are some questions, ideas, devil's advocate positions, etc.
1. How many phylogenetic trees have been published?  The totality of all phylogenetic knowledge is probably not very big.  Maybe 10,000 studies proposing topologies?  For maybe a half million taxa?  Just a guess.  We should map out the path to the capture of all phylogenetic knowledge, past and future, and declare how far this grant will get us toward that goal. 
2. What should the ultimate set of tools for phylogeny storage, retrieval, viewing, and use look like? Are we getting there by re-engineering TreeBASE? I think we are, but we had better make it really clear in the proposal that this time, a new TreeBASE is really going to be awesome and give user joy to the community. Reviewers will demand this, and I see the various criticisms of TB2 as a potential risk of having this proposal be too TreeBASE-centric.
3. Will TreeBASE and ToLWeb remain independent projects during and after this grant?  Or should they be somehow fundamentally joined to produce a resource for storage, retrieval, and public display of evolutionary trees?  I can see advantages of either path, but we are going to have to make a decision on this, defend it in the grant, and show reviewers that this is headed where the community wants it to. 
4. The idea of integrating social networking into TreeBASE (or ToLWeb) is risky, and I would advise against it. There are plenty of ways to connect with colleagues without trying to duplicate Facebook tools. IMO we would be better off building a way to view and browse the trees. Or, I like Hilmar's suggestion of building a platform, flexible enough to incorporate social tools and viz tools, etc.
5. The pushback on integrating tree viewers in this project is surprising to me. I recommend against a rework of TreeBASE plumbing and infrastructure without starting from what the users want, which is to be able to easily see the trees. Not sure if you have done any user needs survey or audience assessment, but the lack of a friendly UI and the lack of a tree viewer are the two most common complaints I hear about TB2.
6. I like the idea of the links between TB and TW.  Thinking user functionality, what will this provide? While browsing ToLWeb, an icon or link will automatically take me to a TreeBASE tree that has any of the same taxa.  Or while searching TreeBASE for primate trees, the ToLWeb site for primates is offered as an option.  This will require taxonomic resolution of clade names and tips between the two resources.  Is this name resolution functionality part of the grant- has it been tried?  Shouldn't be too hard to see what the matches and misses currently are, but an auto taxon mapping tool may be more challenging.
Those are my thoughts for now- probably not helpful for actually writing the grant, I'm afraid, but perhaps useful for deciding what kinds of things to include.  Cheers, Mark
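
On point 6, a first look at "what the matches and misses currently are" could be as simple as the sketch below. It is an assumption-laden illustration, not an existing tool: `normalize` and `compare_names` are hypothetical helpers, the name lists are made up, and the comparison is plain string matching after light normalization, with no synonym resolution.

```python
def normalize(name):
    """Lower-case a taxon name and collapse underscores and extra spaces."""
    return " ".join(name.replace("_", " ").split()).casefold()


def compare_names(treebase_names, tolweb_names):
    """Report names shared by both resources and names unique to each."""
    tb = {normalize(n) for n in treebase_names}
    tw = {normalize(n) for n in tolweb_names}
    return {
        "matched": sorted(tb & tw),
        "treebase_only": sorted(tb - tw),
        "tolweb_only": sorted(tw - tb),
    }


# Made-up example lists; real input would come from each resource's export.
report = compare_names(
    ["Araneus_diadematus", "Argiope bruennichi", "Nephila clavipes"],
    ["Araneus diadematus", "argiope  bruennichi", "Zygiella x-notata"],
)
for bucket, names in report.items():
    print(bucket, names)
```

The "misses" buckets are where the harder automatic taxon mapping (synonyms, misspellings, differing ranks) would have to take over.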

Arlin Stoltzfus

Jul 13, 2011, 10:00:27 AM
to Rutger Vos, Karen Cranston, phy...@googlegroups.com, MIAPA, TreeBASE devel
I think we should consider some ways to publicize elements of the
approaches we all have been advocating in various ways, which depend
on open standards and web services and so on, e.g., the idea that any
tool should be able to submit an annotated record to an archive using
a standard protocol. There might be other people around the world who
would cooperate with us if they knew what we were thinking. Of course
we are discussing these things on public lists, but still I would
guess that many people who would be interested in this have no idea
what we are considering.

Arlin

Enrico Pontelli

Jul 13, 2011, 10:25:05 AM
to Arlin Stoltzfus, Rutger Vos, Karen Cranston, phy...@googlegroups.com, MIAPA, TreeBASE devel
I agree. Some of the components are taking shape or are already well
defined (MIAPA 0.1/1.0, NeXML).
I hope that the standard record will have a semantic foundation (let it be
MIAPA?).

It would be good to start formulating the structure of such an annotated
record and perhaps move forward with the creation of a PhyloWS service that,
at least as a start, provides validation (and possibly completion) of the
record.
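
To make the validation and completion idea concrete, a service of that kind might start from something like the sketch below. It is purely illustrative: the field names in `REQUIRED` and `DEFAULTS` are invented for the example and are not taken from any MIAPA draft, and `validate` and `complete` are hypothetical helpers rather than PhyloWS operations.

```python
# Hypothetical field names; NOT taken from MIAPA or any other standard.
REQUIRED = ("tree", "otus", "method", "data_source")
DEFAULTS = {"format": "NeXML"}  # hypothetical optional field and default


def validate(record):
    """Return the required fields that are missing or empty in the record."""
    return [field for field in REQUIRED if not record.get(field)]


def complete(record):
    """Return a copy of the record with defaults filled in where absent."""
    filled = dict(DEFAULTS)
    filled.update(record)
    return filled


# A made-up submission with one empty field and one missing field.
submission = {"tree": "(A,B);", "otus": ["A", "B"], "method": ""}
print(validate(submission))
print(complete(submission))
```

A record with a semantic foundation would replace the flat field list with checks against an ontology, but the validate-then-complete flow would stay the same.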

Enrico

--
Dept. Computer Science,
New Mexico State University
MSC CS, Box 30001, Las Cruces, NM 88003
Voice: 575-646-6239 Fax: 575-646-1002
