In making a phylogeny of a protein superfamily, I have 15,000+ RefSeq sequences, many of them very similar to each other. Suggestions for reducing the number?

ramiro barrantes

Oct 7, 2015, 9:45:07 PM

to POY - Phylogenetic Analysis Software

I am revisiting a phylogeny I did years ago, but RefSeq at NCBI now has more than 15,000 relevant protein sequences! I want to filter these down to a more manageable set. I am using T-Coffee (as I used to), but it is taking a long time, and I am wondering what people use these days. Are there other tools for automatically removing sequences that are very similar to each other? Since I am interested in the deep branches, I don't need all the sequences, just the few hundred most divergent ones (for example, many of them are different strains of E. coli from the same subfamily, where a single sequence would do). Any suggestions?
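One way to automate what is described here is a two-step filter: greedily collapse near-identical sequences into clusters, then pick the most divergent cluster representatives. The Python sketch below is illustrative only. The function names are hypothetical, similarity is approximated by the Jaccard index of shared 3-mers (an alignment-free proxy, not true percent identity), and the 0.7 threshold is a placeholder, not a recommended value.

```python
"""Hypothetical sketch: thin a large protein set before alignment.

Assumption (not from the thread): similarity is approximated by the Jaccard
index of shared 3-mers, an alignment-free proxy, NOT true percent identity.
The default threshold is an illustrative placeholder.
"""


def kmers(seq: str, k: int = 3) -> frozenset:
    """Set of overlapping k-mers in an upper-cased sequence."""
    seq = seq.upper()
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_representatives(seqs: dict, threshold: float = 0.7, k: int = 3) -> dict:
    """Greedy incremental clustering in the spirit of CD-HIT.

    Sequences are visited longest first; each joins the first existing
    representative whose k-mer similarity is >= threshold, otherwise it
    becomes a new representative. Returns {rep_id: [member ids]}.
    """
    profiles = {sid: kmers(s, k) for sid, s in seqs.items()}
    clusters = {}
    # Longest first; ties broken by id so the output is deterministic.
    for sid in sorted(seqs, key=lambda x: (-len(seqs[x]), x)):
        for rep in clusters:
            if jaccard(profiles[sid], profiles[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]
    return clusters


def most_divergent(seqs: dict, ids: list, n: int, k: int = 3) -> list:
    """Farthest-point (max-min) selection of up to n ids.

    Starts from ids[0], then repeatedly adds the candidate whose closest
    already-chosen sequence is farthest away (distance = 1 - Jaccard).
    """
    ids = list(dict.fromkeys(ids))  # drop duplicate ids, keep order
    if not ids or n <= 0:
        return []
    profiles = {sid: kmers(seqs[sid], k) for sid in ids}
    chosen, chosen_set = [ids[0]], {ids[0]}
    # Distance from each id to its nearest chosen sequence so far.
    nearest = {sid: 1.0 - jaccard(profiles[sid], profiles[ids[0]]) for sid in ids}
    while len(chosen) < min(n, len(ids)):
        best = max((s for s in ids if s not in chosen_set), key=nearest.__getitem__)
        chosen.append(best)
        chosen_set.add(best)
        for sid in ids:
            nearest[sid] = min(nearest[sid], 1.0 - jaccard(profiles[sid], profiles[best]))
    return chosen
```

Typical use would be `reps = greedy_representatives(seqs)` followed by `most_divergent(seqs, list(reps), n=300)`, where `seqs` is a dict mapping accessions to sequences parsed from the FASTA download. Because this pure-Python version compares every sequence against every representative, it will be slow at 15,000 sequences; dedicated clustering tools such as CD-HIT or MMseqs2 implement the same greedy idea far more efficiently, and the sketch only shows the logic.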

Thank you very much for any help,

Ramiro

alizoh...@gmail.com

Jan 22, 2016, 11:46:28 AM

to POY - Phylogenetic Analysis Software

Dear Ramiro,

You can upload your sequences to online servers for analysis. Just download your sequences in BioEdit and upload them to