In making a phylogeny of a protein superfamily I have 15,000+ RefSeq sequences, many of them very similar to each other. Suggestions for reducing the number?


ramiro barrantes

Oct 7, 2015, 9:45:07 PM
to POY - Phylogenetic Analysis Software
I am revisiting a phylogeny I did years ago, but RefSeq in GenBank now has 15,000+ relevant protein sequences! I want to filter them down to a more manageable set. I am using T-Coffee (as I used to), but it is taking a long time, and I am wondering what people do these days. Are there other tools people use to automatically remove sequences that are very similar to each other? Since I am interested in the deep branches, I don't need all the sequences, just the few hundred most divergent ones (many of them are different strains of E. coli from the same subfamily, for example, where I could just use one). Any suggestions?

Thank you very much for any help,

Ramiro

alizoh...@gmail.com

Jan 22, 2016, 11:46:28 AM
to POY - Phylogenetic Analysis Software
Hi,
You can upload your sequences to an online server for analysis. Just export your sequences from BioEdit and upload them to ebi.ac.uk, choosing an appropriate alignment tool.
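
For the redundancy question in the original post, one possible approach is a greedy filter that keeps a sequence only if it is not too similar to any sequence already kept. The sketch below is only an illustration, loosely in the spirit of dedicated clustering tools such as CD-HIT and not a reimplementation of any of them. It assumes that the Jaccard similarity of 3-mer sets is an acceptable stand-in for percent identity; the function names, the threshold of 0.9, and k=3 are all hypothetical choices, not values from this thread.

```python
def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def greedy_filter(records, threshold=0.9, k=3):
    """Keep one representative per group of near-identical sequences.

    records: list of (name, sequence) tuples.
    A sequence is dropped when its k-mer Jaccard similarity to any
    already kept representative is >= threshold (an assumed cutoff).
    """
    # Visit the longest sequences first so that each group is
    # represented by its longest member; ties keep input order.
    ordered = sorted(records, key=lambda r: len(r[1]), reverse=True)
    kept = []  # (name, sequence, k-mer set) for each representative
    for name, seq in ordered:
        kmers = kmer_set(seq, k)
        if all(jaccard(kmers, rep) < threshold for _, _, rep in kept):
            kept.append((name, seq, kmers))
    return [(name, seq) for name, seq, _ in kept]
```

Calling greedy_filter(records) on a list of (name, sequence) tuples returns the retained representatives. Lowering the threshold keeps fewer, more divergent sequences. Each sequence is compared against every representative kept so far, so on 15,000+ sequences a purpose-built tool would likely be much faster than this sketch.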