We are writing to all the users of the SATé software, in preparation for developing a grant application to NIH to continue its development. We are requesting your feedback on your experience with SATé, comments on improvements and extensions to SATé, and information about your papers or studies that used SATé.
Although SATé can be used on amino-acid datasets, it has been designed for DNA sequence alignment and phylogeny estimation; most of the testing of SATé has been done with DNA sequence data. SATé provides highly accurate alignments and trees on datasets with up to a few thousand nucleotide sequences, but analyses of larger datasets are slowed by the limited use of parallelism and the reliance on RAxML for tree estimation. SATé can perform multi-gene analyses and estimate the phylogeny through a partitioned maximum likelihood search. However, it assumes that there is no incongruence between gene trees and species trees (an assumption that may not hold in the presence of incomplete lineage sorting or when datasets contain paralogs rather than orthologs). Finally, although SATé outputs the set of trees and alignments it computes in each iteration, it does not provide any statistics about these alignments and trees. Our proposed improvements will address these limitations, as follows:
1. Modification of SATé's design to produce highly accurate alignments and trees on amino-acid sequences,
2. Providing alternatives for multi-marker analyses, including methods that explicitly address incomplete lineage sorting and gene duplication and loss,
3. Improved speed for large dataset analyses, through re-engineering the software to use faster methods of maximum likelihood estimation (e.g., FastTree) and exploit parallel architectures,
4. Improving the accuracy of the tree estimation by developing new statistical methods that can better utilize the phylogenetic signal in gaps and new methods that can better assess the reliability of homology statements within an alignment, and
5. Building interfaces so that SATé can be used in larger pipelines, for example with pre-analysis steps such as selecting the AA model or detecting orthology, and post-analysis steps such as rooting trees, bootstrapping, modifying alignments using masking techniques, visualizing trees and alignments, computing supertrees from different markers, and comparing trees and alignments obtained using different techniques.
At this time, we are seeking your assistance in three regards: (1) comments on our proposed improvements and extensions, which are described above, (2) requests for additional features, and (3) lists of your own papers (submitted or accepted) that used SATé, or analyses that you have performed that have used SATé.
We are in the process of preparing more documentation for the software (an online tutorial and a manual, rather than only the README file). Any comments that you have about your experiences using the software would be very welcome and would help us improve the usability of SATé.
Please respond to this request by emailing Tandy Warnow at ta...@cs.utexas.edu (preferably with subject title "SATE Survey").
We thank you for your time.
Best wishes,
Tandy Warnow and Mark Holder
email sent by: Mark Holder mtho...@ku.edu
http://phylo.bio.ku.edu/mark-holder
==============================================
Department of Ecology and Evolutionary Biology
University of Kansas
6031 Haworth Hall
1200 Sunnyside Avenue
Lawrence, Kansas 66045
lab phone: 785.864.5789
fax (shared): 785.864.5860
==============================================