Re: number of proteins used in a user tree

Curtis Huttenhower

Oct 27, 2017, 9:23:50 AM
to Pedro J Cabello Yeves, phylophl...@googlegroups.com
Thanks for getting in touch, Pedro. I've CCed the PhyloPhlAn users group to help take a look. In general, you request a particular number of proteins to be used (400 is the default), but if others can provide more detail, or if you have a specific example, we can take a closer look.

Thanks again -
Curtis

On Fri, Oct 27, 2017 at 5:46 AM, Pedro J Cabello Yeves <pedrit...@gmail.com> wrote:


Dear C. Huttenhower,

I am a PhD student at UMH, Spain, working in Francisco Rodriguez-Valera's laboratory.

I have recently been working with the PhyloPhlAn package to build user phylogenies and trees for certain genomes (including MAGs, pure cultures and SAGs from different phyla/classes). I read that PhyloPhlAn uses around 400 conserved proteins to build trees, but I would like to know how I can find out how many conserved proteins/genes were used to make my user tree.

Is there any way to obtain this specific number of proteins, and the list of proteins that were used, for each independent tree (option -user)?

Thanks in advance,

Kind regards,

Pedro J Cabello-Yeves


Francesco Asnicar

Oct 27, 2017, 10:11:43 AM
to phylophl...@googlegroups.com, Pedro J Cabello Yeves
Dear Pedro,

First of all, many thanks for using PhyloPhlAn.

In the PhyloPhlAn folder, you should find a "data" folder. Inside the "data" folder, you should find a folder with the same name as the input folder you specified when you ran PhyloPhlAn.
From inside this data folder of your project, you can use the following bash code to count the number of markers identified in each of your inputs:
$ for i in *.b6o; do printf '%s\t' "${i%.b6o}"; wc -l < "${i}"; done
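If you have many inputs (for example, a mix of MAGs and SAGs that may be incomplete), it can help to sort these per-input counts so that the inputs with the fewest identified markers come first. Below is a minimal POSIX shell sketch; the function name markers_per_input_sorted is hypothetical and, like the command above, it assumes it is run from the data folder of your project and that each line of a .b6o file counts as one identified marker.

```shell
# Hypothetical helper (not part of PhyloPhlAn): print "<input>\t<line count>"
# for every *.b6o file in the current directory, sorted by count, smallest first.
# Assumes one line in a .b6o file = one identified marker, as in the loop above.
# The body runs in a subshell, so "tab" and "f" do not leak into your shell.
markers_per_input_sorted() (
    tab=$(printf '\t')
    for f in *.b6o; do
        # If nothing matches, the glob stays literal ("*.b6o"): skip it.
        [ -e "$f" ] || continue
        # Strip the .b6o suffix for the name; tr removes the padding some wc versions add.
        printf '%s\t%s\n' "${f%.b6o}" "$(wc -l < "$f" | tr -d ' ')"
    done | sort -t "$tab" -k2,2n
)
```

After pasting the function into your shell, you could run, from the same data folder:
$ markers_per_input_sorted | head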

Then, if you want to know how many of the 400 universal markers were used in total to build the tree, you can use the following bash code (again from the same data folder of your project):
$ ls p0*.sub.aln | wc -l
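The same count can also be obtained without parsing the output of ls, which additionally avoids the "No such file or directory" message when no alignment is present. This is only a sketch: the function name retained_marker_count is hypothetical, and it assumes, as the command above does, that each retained marker has exactly one p0*.sub.aln file in the data folder of your project.

```shell
# Hypothetical helper (not part of PhyloPhlAn): count the p0*.sub.aln files in
# the current directory and report the count against the 400 universal markers.
retained_marker_count() (
    n=0
    for f in p0*.sub.aln; do
        # If nothing matches, the glob stays literal: do not count it.
        [ -e "$f" ] || continue
        n=$((n + 1))
    done
    printf '%s of 400 markers retained\n' "$n"
)
```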

Finally, the 400 universal markers are numbered from p0000 to p0399. If you want the list of markers that were identified and retained for the phylogenetic analysis, you can use the following bash code (again from the data folder of your project):
$ ls p0*.sub.aln | cut -f1 -d'.' | sort
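Because the markers are numbered from p0000 to p0399, the complementary list (the markers that were not retained) can be obtained by comparing the full numbering against the retained files. A minimal sketch, assuming the marker ID is the part of each p0*.sub.aln file name before the first dot (as in the cut command above); the function name missing_markers is hypothetical.

```shell
# Hypothetical helper (not part of PhyloPhlAn): list the universal markers
# p0000..p0399 that have no matching p0*.sub.aln file in the current directory.
missing_markers() (
    export LC_ALL=C                      # same collation for sort and comm
    all=$(mktemp) || exit 1
    kept=$(mktemp) || exit 1
    trap 'rm -f "$all" "$kept"' EXIT     # clean up the temporary files
    # Full marker numbering, already in sorted order: p0000 ... p0399.
    i=0
    while [ "$i" -le 399 ]; do
        printf 'p%04d\n' "$i"
        i=$((i + 1))
    done > "$all"
    # Retained marker IDs: file name up to the first dot (like cut -f1 -d'.').
    for f in p0*.sub.aln; do
        [ -e "$f" ] || continue
        printf '%s\n' "${f%%.*}"
    done | sort -u > "$kept"
    # Lines only in the full numbering = markers not retained.
    comm -23 "$all" "$kept"
)
```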

Please let me know if this answers your questions or if you need any other information.

Many thanks,
Francesco
