Annotating trees/MIAPA/the mammal supertree use case


Rutger Vos

Feb 13, 2013, 11:58:16 AM
to MIAPA, h...@nescent.org
Hi all,

Attached is the spreadsheet that we used for collecting source trees to assemble the mammal supertree. Since this is a real use case, it might be useful to look at it to tell us what the required amount of minimal information actually is.

Best wishes,

Rutger

--
Dr. Rutger A. Vos
Bioinformaticist
Naturalis Biodiversity Center
Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com
Treedbase_ver1_2.xls

Arlin Stoltzfus

Feb 13, 2013, 12:14:47 PM
to MIAPA, HIP Hackathon Participants
On Feb 13, 2013, at 11:58 AM, Rutger Vos wrote:

Hi all,

Attached is the spreadsheet that we used for collecting source trees to assemble the mammal supertree. Since this is a real use case, it might be useful to look at it to tell us what the required amount of minimal information actually is.

To save you all the trouble of opening this, it is an empty spreadsheet with 18 columns with the titles: 

Ref ID
Full Ref.
Tree ID
Tree ID Descriptor
Taxa
# OTU's
Data Type
Data Quantity
Morphological Data Type
Molecular Data Type
Molecular Source
Analysis Method
Model of Evolution
Data Weighting
Node Support
Divergence Times
Outgroup(s)
Notes

One of the key things for the supertree analysis of Bininda-Emonds et al. (2007) was to ensure that the trees to be combined were actually independent, rather than being based on the same data. So I think they must have spent some time tracking this down.

I'd really like to see the populated spreadsheet, rather than the empty one.  This would reveal how much detail they were capturing in their free-text annotations. 

But I agree completely with the point that here is an actual case of tree re-use, on a scale of 100s or 1000s of re-used trees, and so it has a value (relative to the largely hypothetical exercise of imagining what kind of information would be ideal to have). 

Arlin


-------
Arlin Stoltzfus (ar...@umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850
tel: 240 314 6208; web: www.molevol.org

Hilmar Lapp

Feb 13, 2013, 12:23:35 PM
to Arlin Stoltzfus, MIAPA, HIP Hackathon Participants

On Feb 13, 2013, at 12:14 PM, Arlin Stoltzfus wrote:

I'd really like to see the populated spreadsheet, rather than the empty one.  This would reveal how much detail they were capturing in their free-text annotations. 

I agree, and indeed this is a good example use case.

-hilmar

-- 
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- informatics.nescent.org :
===========================================================



Rutger Vos

Feb 13, 2013, 12:29:48 PM
to Arlin Stoltzfus, MIAPA, HIP Hackathon Participants
I'd really like to see the populated spreadsheet, rather than the empty one.  This would reveal how much detail they were capturing in their free-text annotations. 

There is not really one single combined spreadsheet: the data were collected by different people who each took on a monophyletic major group (e.g. the Primates). Olaf suggested we try to contact the people responsible for these groups, in the hope, I guess, that they still have a six-year-old spreadsheet lying around. I must admit that I don't, somewhat to my dismay. Maybe on one of the USB drives I have in temporary storage in England?

Note that not all the columns are free-text: if you click on one of the cells for some of the models you can see that there is actually a pull-down menu, so it is a bit more controlled than just free-text. For example, under "Model of evolution" there are the options: "From sequences", "Fixed estimates/defaults", "Unspecified", "Not applicable". 
 

Arlin Stoltzfus

Feb 13, 2013, 12:54:40 PM
to MIAPA, HIP Hackathon Participants
On Feb 13, 2013, at 12:29 PM, Rutger Vos wrote:


I'd really like to see the populated spreadsheet, rather than the empty one.  This would reveal how much detail they were capturing in their free-text annotations. 

There is not really one single combined spreadsheet: the data were collected by different people who each took on a monophyletic major group (e.g. the Primates). Olaf suggested we try to contact the people responsible for these groups, in the hope, I guess, that they still have a six-year-old spreadsheet lying around. I must admit that I don't, somewhat to my dismay. Maybe on one of the USB drives I have in temporary storage in England?

I'm sure that information is stored here in the US, along with all of the other information we are tracking from around the world :-)

Note that not all the columns are free-text: if you click on one of the cells for some of the models you can see that there is actually a pull-down menu, so it is a bit more controlled than just free-text. For example, under "Model of evolution" there are the options: "From sequences", "Fixed estimates/defaults", "Unspecified", "Not applicable". 

Below I have expanded the list of columns to show any restrictions on their values. These restrictions appear as pull-down menu items in Excel, which will not allow any other values to be entered.

Ref ID
Full Ref.
Tree ID
Tree ID Descriptor
Taxa
# OTU's
Data Type : { Purely Morphological, Purely Molecular, Mixed Molecular and Morphological, Other }
Data Quantity
Morphological Data Type : { Purely Cranial/dental, Purely Post-Cranial, Purely Soft Anatomy, Mixed, Unspecified Morphological Data, Not Applicable }
Molecular Data Type : { DNA Sequences, RNA Sequences, Protein Sequences, Other Sequences, Immunological Distances, DNA-DNA Hybridisation, Restriction Maps, SINE/LINE Insertion Sites, Satellite DNA/RNA, Mixed Molecular data types, Unspecified, Not Applicable }
Molecular Source : { Mitochondrial Only, Nuclear Only, Mixed Mitochondrial and Nuclear, Unspecified Molecular Source, Not Applicable }
Analysis Method
Model of Evolution : { From Sequences, Fixed Estimates/Defaults, Unspecified, Not Applicable }
Data Weighting
Node Support : { Nodal Support Present, Nodal Support Not Present }
Divergence Times
Outgroup(s)
Notes
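
For anyone who wants to check annotations outside of Excel, the restrictions listed above could be expressed roughly as follows. This is only a sketch: the value lists are transcribed from the pull-down menus, while the dictionary layout, the function name `invalid_fields`, and the example record are my own assumptions and not part of the original spreadsheet. Columns without a pull-down menu are treated as free text.

```python
# Allowed values per column, transcribed from the pull-down menus listed above.
# Columns that do not appear here are assumed to be free text.
CONTROLLED_VALUES = {
    "Data Type": {
        "Purely Morphological", "Purely Molecular",
        "Mixed Molecular and Morphological", "Other",
    },
    "Morphological Data Type": {
        "Purely Cranial/dental", "Purely Post-Cranial", "Purely Soft Anatomy",
        "Mixed", "Unspecified Morphological Data", "Not Applicable",
    },
    "Molecular Data Type": {
        "DNA Sequences", "RNA Sequences", "Protein Sequences", "Other Sequences",
        "Immunological Distances", "DNA-DNA Hybridisation", "Restriction Maps",
        "SINE/LINE Insertion Sites", "Satellite DNA/RNA",
        "Mixed Molecular data types", "Unspecified", "Not Applicable",
    },
    "Molecular Source": {
        "Mitochondrial Only", "Nuclear Only", "Mixed Mitochondrial and Nuclear",
        "Unspecified Molecular Source", "Not Applicable",
    },
    "Model of Evolution": {
        "From Sequences", "Fixed Estimates/Defaults", "Unspecified", "Not Applicable",
    },
    "Node Support": {"Nodal Support Present", "Nodal Support Not Present"},
}


def invalid_fields(record):
    """Return {column: value} for each controlled column holding a disallowed value.

    Missing columns are skipped rather than reported (a design assumption).
    """
    problems = {}
    for column, allowed in CONTROLLED_VALUES.items():
        value = record.get(column)
        if value is not None and value not in allowed:
            problems[column] = value
    return problems


# Hypothetical record, for illustration only (not taken from the real data).
example = {
    "Tree ID": "hypothetical-tree-1",
    "Data Type": "Purely Molecular",
    "Molecular Data Type": "DNA Sequences",
    "Molecular Source": "Mitochondrial",  # not one of the menu options
    "Model of Evolution": "From Sequences",
    "Node Support": "Nodal Support Present",
}

print(invalid_fields(example))
```

Here only "Molecular Source" is flagged, since "Mitochondrial" is not one of the menu options. The same check could be run over rows exported from the individual group spreadsheets, if those turn up.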