About the nsites and fixage parameters of r8s

梁栋

Jun 14, 2018, 8:50:09 AM
to hahnlab-cafe
Hi everyone, I have the output of OrthoFinder, which provides me with one species_rooted tree and the gene family results. When I load this species_rooted tree into r8s to generate an ultrametric tree, the parameters nsites and fixage confuse me. How should I set them? Are there any tools or software that can help me infer these parameters? Also, by the way, is it suitable to use the OrthoFinder species tree as input for CAFE? Thanks very much!

Gregg Thomas

Jun 14, 2018, 11:30:55 AM
to 梁栋, hahnlab-cafe
Hi,

The nsites parameter is simply the total number of columns in the alignment(s) used to construct your species tree. Note that for input to r8s, the species tree must have branch lengths in terms of relative number of substitutions (as commonly given by maximum likelihood programs like RAxML). I'm not entirely sure how OrthoFinder does species tree inference, but as long as it uses a multiple-sequence alignment and infers a tree with branch lengths in relative number of substitutions, then you should easily be able to count nsites from the alignment.
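If it helps, counting the columns can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the alignments are in FASTA format, every sequence in an alignment has the same aligned length, and the function names (alignment_length, total_nsites) and toy alignments are made up for illustration.

```python
from io import StringIO


def alignment_length(handle):
    """Return the number of columns in one FASTA alignment.

    Raises ValueError if the sequences are not all the same length,
    which would mean the file is not a proper alignment.
    """
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        elif name is not None:
            # Sequences may be wrapped over several lines; gaps count as columns.
            lengths[name] += len(line)
    unique = set(lengths.values())
    if len(unique) != 1:
        raise ValueError(f"sequences differ in length: {lengths}")
    return unique.pop()


def total_nsites(handles):
    """Sum the column counts over all alignments used to build the tree."""
    return sum(alignment_length(h) for h in handles)


# Two toy alignments (hypothetical data); real use would pass open files.
aln1 = StringIO(">A\nACGT-A\n>B\nACGTTA\n>C\nAC-TTA\n")
aln2 = StringIO(">A\nGGC\nTA\n>B\nGGCTA\n>C\nGGCTT\n")
nsites = total_nsites([aln1, aln2])
print(nsites)  # 6 + 5 = 11
```

With real data you would pass open file handles for the alignment file(s) that went into the species tree instead of the StringIO objects.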

fixage is the parameter to set the calibration points for divergence time estimation. You will need at least one fossil calibration point to obtain a tree with branch lengths in terms of millions of years. To do this, you first need to define the name of an internal node with the mrca command. For example, given the simple example topology ((A,B),C), to name the internal node that is the direct ancestor of taxa A and B, the command would be:

mrca ABancestor A B;

Then, with some prior knowledge that this divergence occurred 1 million years ago, the fixage command is used to set that calibration point:

fixage taxon=ABancestor age=1;

Alternatively, you can provide ranges of possible ages for nodes with the constrain command. For example, if there is fossil evidence that the A and B lineages diverged between 1 and 5 million years ago:

constrain taxon=ABancestor minage=1 maxage=5;
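To show how these commands might fit together with nsites, here is a rough Python sketch that writes out an r8s input file for the ((A,B),C) example. The mrca, fixage and constrain lines follow the commands above. Everything else is an assumption on my part and should be checked against the r8s manual: the block layout, the blformat, divtime and describe lines, the branch lengths, the nsites value, and the helper names calibration_commands and build_r8s_input are all hypothetical or recalled from memory.

```python
def calibration_commands(name, taxa, age):
    """Return the r8s commands for one calibration point.

    age is either a single number (fixed age, in millions of years)
    or a (min, max) tuple (age range).
    """
    commands = [f"mrca {name} {' '.join(taxa)};"]
    if isinstance(age, tuple):
        min_age, max_age = age
        commands.append(f"constrain taxon={name} minage={min_age} maxage={max_age};")
    else:
        commands.append(f"fixage taxon={name} age={age};")
    return commands


def build_r8s_input(newick, nsites, calibrations):
    """Assemble a NEXUS file with a trees block and an r8s command block."""
    tree = newick.strip()
    if not tree.endswith(";"):
        tree += ";"
    lines = [
        "#NEXUS",
        "begin trees;",
        f"tree species_tree = {tree}",
        "end;",
        "begin rates;",
        # Assumption: branch lengths are substitutions per site.
        f"blformat nsites={nsites} lengths=persite ultrametric=no;",
    ]
    for name, taxa, age in calibrations:
        lines.extend(calibration_commands(name, taxa, age))
    # Assumption: penalized likelihood; other methods are described in the manual.
    lines += [
        "divtime method=pl algorithm=tn;",
        "describe plot=chronogram;",
        "end;",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical tree and nsites value for the ((A,B),C) example.
r8s_text = build_r8s_input(
    "((A:0.1,B:0.2):0.05,C:0.3);",
    nsites=11,
    calibrations=[("ABancestor", ["A", "B"], 1)],
)
print(r8s_text)
```

Passing age=(1, 5) instead of age=1 produces the constrain line rather than the fixage line for the same node.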

Fossil calibration points will have to be obtained from the literature, but if you're unsure of where to look a good place to start is http://www.timetree.org/, which compiles average and median divergence times from the literature. Simply search for the two species you wish to know the divergence time of and the website will show you average divergence estimates and the references from which they were obtained.

For more info on r8s, see the manual: https://web.bioinformatics.ic.ac.uk/doc/r8s
Hope that helps!

-Gregg Thomas

--
You received this message because you are subscribed to the Google Groups "hahnlab-cafe" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hahnlabcafe+unsubscribe@googlegroups.com.
To post to this group, send email to hahnl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hahnlabcafe/ef84c13a-3887-4ae7-a0af-2581e50e5f38%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gregg Thomas

Jun 18, 2018, 10:35:00 AM
to Dong Liang, hahnlab-cafe
Hi Dong,

Yes, the best way to determine the fossil calibration points for your phylogeny is to survey the literature for your set of species. If you need a place to start, though, http://www.timetree.org/ is helpful. They compile average and median divergence times from the literature. Simply search for the two species you wish to know the divergence time of and the website will show you average divergence estimates and the references from which they were obtained.

-Gregg

On Mon, Jun 18, 2018 at 10:02 AM, Dong Liang <liangd...@163.com> wrote:
Hi, dear Thomas,
Thanks for your answer, but I have another question: how can I determine the value for the fixage parameter? Should I just read some papers to find it?

Thanks
Dong

梁栋

Jun 19, 2018, 12:42:12 PM
to hahnlab-cafe
Thanks, Gregg!
But I still have two questions:
1. The tutorial runs genfamily tutorial_genfamily/rnd -t 100, but I cannot find tutorial_genfamily/rnd at the website https://iu.app.box.com/v/cafetutorial-files. So what does the example file tutorial_genfamily/rnd mean? Is it a raw protein FASTA file or the gene count table from mcl?
2. In the CAFE tutorial PDF, there are 7 runs of CAFE, so I want to know the relationship between the runs. If I only want to identify the expansion or contraction of certain families, must I finish all 7 runs, or something else?

Thanks for your patient help!

Cheers
Dong

Gregg Thomas

Jun 21, 2018, 3:53:12 PM
to 梁栋, hahnlab-cafe
Hi Dong,

1. The tutorial_genfamily/ directory should be created by you BEFORE you run the genfamily command. Then, the genfamily command will create simulated datasets and place them in the tutorial_genfamily/ directory. The number of datasets created is specified by -t, so -t 100 will create 100 datasets. Since the path specified is tutorial_genfamily/rnd, this means that within the tutorial_genfamily/ directory, you will create 100 files named rnd_1, rnd_2, ... rnd_100. Does that make sense?
2. I'm not sure I understand what you mean by there being 7 CAFE runs. Do you mean there are 7 runs throughout the whole tutorial or that there are 7 runs during a specific command? The tutorial is meant to walk you through various analyses one can perform with CAFE, so there are multiple times in which CAFE is run for different purposes. It's up to the user to decide which analyses they want to perform.
For instance, if you just want to estimate a single gain and loss rate to infer the number of expansions and contractions, then you only need to do step 3.1.1 (and possibly 3.1.2 if you had families with large sizes).
If you want to estimate a single gain and loss rate while accounting for error to infer the number of expansions and contractions, then you only need to do step 3.4 (and possibly 3.1.2 if you had families with large sizes).

If you mean that CAFE is running 7 times in a given step though, you'll have to clarify which step so I can help more.
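Going back to point 1, the directory setup and the file names genfamily should produce can be sketched in Python. This is only an illustration of the naming described above (rnd_1 through rnd_100). The helper names are hypothetical, and the sketch uses a temporary directory so that it does not touch your working directory; in practice you would create tutorial_genfamily/ wherever you run CAFE.

```python
import os
import tempfile


def prepare_genfamily_dir(prefix):
    """Create the directory part of a genfamily prefix if it does not exist yet."""
    outdir = os.path.dirname(prefix)
    if outdir:
        os.makedirs(outdir, exist_ok=True)
    return outdir


def expected_genfamily_files(prefix, n):
    """File names expected from `genfamily <prefix> -t <n>`, per the naming above."""
    return [f"{prefix}_{i}" for i in range(1, n + 1)]


# Temporary location for the demo only.
workdir = tempfile.mkdtemp()
prefix = os.path.join(workdir, "tutorial_genfamily", "rnd")

outdir = prepare_genfamily_dir(prefix)  # do this BEFORE running genfamily
names = expected_genfamily_files(prefix, 100)
print(len(names), os.path.basename(names[0]), os.path.basename(names[-1]))

# After CAFE has run, any simulated file that is still missing shows up here.
missing = [p for p in names if not os.path.exists(p)]
```

Before genfamily has been run, missing simply lists all 100 expected files.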

Hope that helps some!
-Gregg Thomas
