Can't get both best tree & distances?

bigdoyle

2017/05/04 17:06:15
To: raxml
Hi 

I'm new to RAxML. I must be missing something obvious, but I can't get both a best tree and pairwise distances from the same command. I've repeated this with two versions of raxmlHPC-PTHREADS-AVX (8.0.0 and 8.2.10), running on a Mac Pro (3 GHz 8-core Intel Xeon E5).

I assumed the '-f x' option would add a distances file to the usual tree-search output, but that's not what I get.

Please advise, Thanks!
-Doyle

-------------
Details
-------------

I have tried this with a large data set (an MSA of ~600 samples with ~250,000 nt sequences; takes hours with either 2 or 6 threads) and a small set of ~10 samples (takes seconds with either 1 or 2 threads).

When I execute:
RAXML -s ../snp.phylip -m GTRGAMMA -n out -T 4 -f x -p 1234

I get files:
RAxML_distances.out
RAxML_info.out
RAxML_parsimonyTree.out

The last few lines of the log read like this:

RAxML was called as follows:
RAXML -s ../snp.phylip -m GTRGAMMA -n out -T 4 -f x -p 1234
Log Likelihood Score after parameter optimization: -6391855.062250
Computing pairwise ML-distances ...
Time for pair-wise ML distance computation of 218791 distances: 569.084506 seconds
Distances written to file: /Volumes/CMR_30T/raxml_SA638+ref/raxml_3/RAxML_distances.sa_snps3
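
A side check on the log above: a full pairwise comparison of n sequences yields n(n-1)/2 distances, so the count in the log can be inverted to see how many sequences RAxML actually compared. A minimal Python sketch (the function name is made up for illustration):

```python
import math

def taxa_from_pair_count(pairs: int) -> int:
    """Invert pairs = n*(n-1)/2 to recover n; raise if no integer n fits."""
    # Solve n^2 - n - 2*pairs = 0 for the positive root using integer sqrt.
    n = (1 + math.isqrt(1 + 8 * pairs)) // 2
    if n * (n - 1) // 2 != pairs:
        raise ValueError(f"{pairs} is not n*(n-1)/2 for any integer n")
    return n

print(taxa_from_pair_count(218791))
```

For the 218791 distances reported above this gives 662 sequences, which is consistent with the "~600 sample" alignment.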

When I execute:
RAXML -s ../snp.phylip -m GTRGAMMA -n out -T 4 -p 1234

I get files:
RAxML_bestTree.out
RAxML_log.out
RAxML_result.out
RAxML_info.out
RAxML_parsimonyTree.out

The last few lines of the log read like this:

RAxML was called as follows:
/Applications/raxmlHPC-AVX-v8/raxml -s core_snp.phylip -m GTRGAMMA -n out -T 2 -p 1234 
Partition: 0 with name: No Name Provided
Base frequencies: 0.260 0.234 0.226 0.280 
Inference[0]: Time 0.670686 GAMMA-based likelihood -18496.720872, best rearrangement setting 5
alpha[0]: 1000.000000 rates[0] ac ag at cg ct gt: 0.772393 3.746670 1.359205 0.291902 3.546292 1.000000 
Conducting final model optimizations on all 1 trees under GAMMA-based models ....
Inference[0] final GAMMA-based Likelihood: -18493.101949 tree written to file /Users/WardD/Documents/WORK/UMMS/_ACTIVE_PROJECTS/Ellison/raxml/test2/RAxML_result.out
Starting final GAMMA-based thorough Optimization on tree 0 likelihood -18493.101949 .... 
Final GAMMA-based Score of best tree -18493.101949
Program execution info written to /Users/WardD/Documents/WORK/UMMS/_ACTIVE_PROJECTS/Ellison/raxml/test2/RAxML_info.out
Best-scoring ML tree written to: /Users/WardD/Documents/WORK/UMMS/_ACTIVE_PROJECTS/Ellison/raxml/test2/RAxML_bestTree.out
Overall execution time: 1.013804 secs or 0.000282 hours or 0.000012 days

Alexandros Stamatakis

2017/05/05 3:47:46
To: ra...@googlegroups.com
Page 23 of the manual
(http://sco.h-its.org/exelixis/resource/download/NewManual.pdf) should
answer your question about how those distances are computed.

There must also be some previous posts about this option on the Google
group, if I remember correctly.

Alexis


--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Sergios-Orestis Kolokotronis

2017/05/05 15:05:01
To: raxml
Hi Doyle,
like Alexis said, the manual explains everything. 

I am running v8.2.9. The pairwise distance command is not designed to yield the best tree estimate and distances in one go. It'll give you distances based on an MP tree, but there's no guarantee that this topology will be the same as the ML tree (esp. after a more efficient search with multiple starting trees using the -N flag).

I prefer to estimate a best-known ML tree first and then feed that into the -f x command with -t [bestML.tre]. Your first command line estimated distances on a random-addition stepwise MP tree. If you are happy with that tree topology, then you will find your distances in the relevant output file. That should be RAxML_distances.out. The log you cited below refers to RAxML_distances.sa_snps3 located in directory /Volumes/CMR_30T/raxml_SA638+ref/raxml_3/. I'm not sure why the filenames would be different. Maybe you typed a different command line or pasted from a different output file? In any case, in one of those output files you will find your distances. You can visualize them by building histograms, heatmaps, do an ordination (e.g. PCA), etc.
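
For the visualization step, the distances first have to be read into a usable structure. The Python sketch below assumes the file holds one pair per line as `taxonA taxonB distance`, separated by whitespace, which is how the excerpts posted in this thread look; it is not based on a format specification, and the function name is hypothetical. The sample rows are copied from output posted later in this thread.

```python
from collections import defaultdict

def read_raxml_distances(lines):
    """Parse 'taxonA taxonB distance' rows into a symmetric dict of dicts."""
    dist = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        a, b, d = fields[0], fields[1], float(fields[2])
        dist[a][b] = d
        dist[b][a] = d  # store both directions so lookups are order-independent
    return dist

sample = [
    "2014-VREF-114 2014-VREF-268 0.021516",
    "2014-VREF-114 2014-VREF-41 0.022170",
    "2014-VREF-114 64-3 0.079944",
]
dist = read_raxml_distances(sample)
print(dist["64-3"]["2014-VREF-114"])
```

With a real file, pass the open file object instead of the list, e.g. `read_raxml_distances(open("RAxML_distances.out"))`. The nested dictionary can then be fed to whatever plotting or ordination tool you prefer.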

Regarding your 2nd command line, you ran a simple ML search with one replicate. In the absence of most commands (too many to list here), RAxML defaults to a standard ML search with -f d -N 1. Meaning: a command line listing only the alignment, the output filename, the number of threads, and the random seed number implies the following commands: -f d -N 1.

One last note: I noticed "snps" in your alignment filename. If these are true SNPs indeed, then the standard GTRGAMMA model is unsuitable. You should use the model accommodating ascertainment bias correction: ASC_GTRGAMMA.

Have fun,
Sergios

Alexandros Stamatakis

2017/05/05 15:27:01
To: ra...@googlegroups.com
Thank you, Sergios :-)

By the way, the parsimony tree in the distance computation is only used
to estimate the likelihood model parameters, which are not very different
from those of the best-known ML tree.

The distances are then calculated pair by pair: for each pair of
sequences, only the length of the branch between them is optimized
under the likelihood model.
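
To illustrate what "optimizing the branch length between a pair" means, here is a toy Python sketch. It deliberately swaps in the much simpler JC69 model instead of GTR+GAMMA, and a plain ternary search instead of whatever optimizer RAxML uses, so it is not RAxML's implementation. All function names and the example counts (1000 sites, 50 differences) are invented for illustration.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood (up to a constant) of branch length d for one sequence pair under JC69."""
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # probability that a site differs
    # A differing site lands on one specific other base with probability p/3.
    return n_diff * math.log(p / 3.0) + (n_sites - n_diff) * math.log(1.0 - p)

def ml_distance(n_sites, n_diff, lo=1e-9, hi=10.0, iters=200):
    """Maximize the log-likelihood over d by ternary search (the function is unimodal in d)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if jc69_log_likelihood(m1, n_sites, n_diff) < jc69_log_likelihood(m2, n_sites, n_diff):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

n_sites, n_diff = 1000, 50
numeric = ml_distance(n_sites, n_diff)
# JC69 has a closed-form ML distance, which the numerical optimum should match.
closed_form = -0.75 * math.log(1.0 - 4.0 * (n_diff / n_sites) / 3.0)
print(numeric, closed_form)
```

Under GTR+GAMMA there is no such closed form, which is why the model parameters are estimated once (on the parsimony tree or a user-supplied tree) and only the single branch length is optimized numerically for each pair.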

Alexis




bigdoyle

2017/05/08 11:09:25
To: raxml


On Friday, May 5, 2017 at 3:05:01 PM UTC-4, Sergios-Orestis Kolokotronis wrote:
> Hi Doyle,
> like Alexis said, the manual explains everything.

It's a remarkably good manual, but a lot for the n00b to absorb. Every time I read the manual I stitch a little more together.

> I am running v8.2.9. The pairwise distance command is not designed to yield the best tree estimate and distances in one go. It'll give you distances based on an MP tree, but there's no guarantee that this topology will be the same as the ML tree (esp. after a more efficient search with multiple starting trees using the -N flag).
> I prefer to estimate a best-known ML tree first and then feed that into the -f x command with -t [bestML.tre].

Ah. Thank you. Didn't realize it needed to be performed step-wise.

> Your first command line estimated distances on a random-addition stepwise MP tree. If you are happy with that tree topology, then you will find your distances in the relevant output file. That should be RAxML_distances.out. The log you cited below refers to RAxML_distances.sa_snps3 located in directory /Volumes/CMR_30T/raxml_SA638+ref/raxml_3/. I'm not sure why the filenames would be different. Maybe you typed a different command line or pasted from a different output file? In any case, in one of those output files you will find your distances. You can visualize them by building histograms, heatmaps, do an ordination (e.g. PCA), etc.

My apologies. Tried to clean up my commands for clarity, but did a lazy job of it.

> Regarding your 2nd command line, you ran a simple ML search with one replicate. In the absence of most commands (too many to list here), RAxML defaults to a standard ML search with -f d -N 1. Meaning: a command line listing only the alignment, the output filename, the number of threads, and the random seed number implies the following commands: -f d -N 1.
>
> One last note: I noticed "snps" in your alignment filename. If these are true SNPs indeed, then the standard GTRGAMMA model is unsuitable. You should use the model accommodating ascertainment bias correction: ASC_GTRGAMMA.

Very helpful advice, good catch, and very much appreciated. Yes, I have reduced the input sequences to an MSA of core SNP-only variants.

So, assuming I have digested this correctly: if I want a best tree, bootstrap support, and the distances from a large core SNP alignment, I would do

RAXML -s input.phylip -m ASC_GTRGAMMA -n output -T 6 -f a -N autoMRE -x 12345 -p 12345

then

RAXML -s input.phylip -t best.tre -m ASC_GTRGAMMA -n output -f x

> Have fun,
> Sergios

:-)
-Doyle

bigdoyle

2017/05/08 11:32:09
To: raxml
Correction to my prior post: the first command would be as follows, with --asc-corr?

RAXML -s input.phylip -m ASC_GTRGAMMA --asc-corr=lewis -n output -T 6 -f a -N autoMRE -x 12345 -p 12345 

Alexandros Stamatakis

2017/05/08 14:53:38
To: ra...@googlegroups.com
yes that looks good :-)

alexis


bigdoyle

2017/05/12 16:23:10
To: raxml
Sorry, back again. I can't figure out where I'm going wrong: I get a pairwise distance matrix, but all the distances in the matrix are identical. I must be missing something fundamental, but can't see what. Please advise.  --Doyle

The details:
I'm building a core snp tree from E. faecium isolates and reference genomes. I built a beautiful tree with the command:

RAXML -s ENTFM_snps.phylip -m ASC_GTRGAMMA --asc-corr=lewis -n ENTFM -f a -N autoMRE -x 12545 -p 12545 

The tree file clearly has distances:
(((2014-VREF-41:0.00312082176755099127,2014-VREF-63:0.00293746320508300306):0.00192845758851553249,2014-VREF-268:0.00338235995718797313):0.00595338011022543325,(((
etc. etc.

Then I attempted to obtain the pair-wise distances with the command:

RAXML -s ENTFM_snps.phylip -t RAxML_bestTree.ENTFM -m ASC_GTRGAMMA --asc-corr=lewis -n ENTFM_dist -f x 

and every pairwise distance in the output is the same... here's a portion of the output for refseq reference genomes:
2014-VREF-114 2014-VREF-268 0.000001
2014-VREF-114 2014-VREF-41 0.000001
2014-VREF-114 2014-VREF-63 0.000001
2014-VREF-114 64-3 0.000001
2014-VREF-114 Aus0004 0.000001
2014-VREF-114 AUS0085 0.000001
2014-VREF-114 DO 0.000001
2014-VREF-114 E1 0.000001
2014-VREF-114 E39 0.000001
2014-VREF-114 E745 0.000001
2014-VREF-114 Ef-aus00233 0.000001
2014-VREF-114 EFE10021 0.000001
2014-VREF-114 ISMMS-VRE-11 0.000001
2014-VREF-114 ISMMS-VRE-12 0.000001
2014-VREF-114 ISMMS-VRE-1 0.000001
2014-VREF-114 ISMMS-VRE-7 0.000001
2014-VREF-114 ISMMS-VRE-9 0.000001
2014-VREF-114 NRRL-B-2354 0.000001
2014-VREF-114 strain6E6 0.000001
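
A result like this is easy to miss in a file with thousands of rows. A small Python check, assuming the same three-column layout as the excerpt above (the function name and tolerance are illustrative choices), flags a file in which every distance is the same:

```python
def distances_look_degenerate(rows, tol=1e-12):
    """Return True if every parsed distance is (numerically) the same value."""
    values = []
    for row in rows:
        fields = row.split()
        if len(fields) == 3:  # taxonA taxonB distance
            values.append(float(fields[2]))
    if not values:
        raise ValueError("no distance rows found")
    return max(values) - min(values) <= tol

bad = [
    "2014-VREF-114 2014-VREF-268 0.000001",
    "2014-VREF-114 2014-VREF-41 0.000001",
    "2014-VREF-114 64-3 0.000001",
]
print(distances_look_degenerate(bad))
```

Running it on the rows above reports a degenerate matrix; a matrix with real variation between pairs would not trigger it.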


the info file for distances reads:
Alignment has 19512 distinct alignment patterns
Proportion of gaps and completely undetermined characters in this alignment: 0.00%
RAxML Computation of pairwise distances
Using 1 distinct models/data partitions with joint branch length optimization
All free model parameters will be estimated by RAxML
GAMMA model of rate heterogeneity, ML estimate of alpha-parameter
GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units
Partition: 0
Alignment Patterns: 19512
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR
Correcting likelihood for ascertainment bias

RAxML was called as follows:
/Applications/RAxML_8/RAXML -s ENTFM_snps.phylip -t RAxML_bestTree.ENTFM -m ASC_GTRGAMMA --asc-corr=lewis -n ENTFM_dist -f x 

Log Likelihood Score after parameter optimization: -1549178.924832
Computing pairwise ML-distances ...
Time for pair-wise ML distance computation of 3403 distances: 12.759323 seconds
Distances written to file: RAxML_distances.ENTFM_dist

Alexandros Stamatakis

2017/05/14 4:56:39
To: ra...@googlegroups.com
Hm, could you maybe run the distance calculation without ascertainment
bias correction and tell me what distances you get?

It might well be that the pair-wise distance calculation is not properly
implemented for ascertainment bias correction models.

alexis


bigdoyle

2017/05/16 15:54:40
To: raxml
That appears to be the problem. I just ran
RAXML -s ENTFM_snps.phylip -t RAxML_bestTree.ENTFM -m GTRGAMMA --asc-corr=lewis -n ENTFM_dist_redo -f x

and the distances appear to be good. I suppose I could have left out --asc-corr in the command.
2014-VREF-114 2014-VREF-268 0.021516
2014-VREF-114 2014-VREF-41 0.022170
2014-VREF-114 2014-VREF-63 0.021959
2014-VREF-114 64-3 0.079944
2014-VREF-114 Aus0004 0.019694
2014-VREF-114 AUS0085 0.061360
2014-VREF-114 DO 0.035544

I'm not knowledgeable enough to gauge whether there's any significant impact on the reported distances with or without ASC. But, I think this should serve my needs quite well. Thanks!

-Doyle

Alexandros Stamatakis

2017/05/17 3:13:31
To: ra...@googlegroups.com
okay, that looks reasonable, I'll make a note that ASC correction is not
implemented for distance calculations.
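
Putting the thread's conclusion together, the two-step workflow can be sketched as a small Python helper that only builds the command lines and does not run anything. It uses only flags that appear in this thread. The function name, the `_dist` suffix for the second run name, and the default binary name `RAXML` (how it was invoked in this thread) are assumptions. Dropping `--asc-corr` from the distance step is also an assumption: the run that worked above still included it together with `-m GTRGAMMA`.

```python
import shlex

def raxml_two_step(alignment, run_name, seed=12345, binary="RAXML"):
    """Return (tree_cmd, dist_cmd) argument lists; nothing is executed here."""
    # Step 1: bootstrapping plus best ML tree search, with ascertainment bias correction.
    tree_cmd = [
        binary, "-s", alignment, "-m", "ASC_GTRGAMMA", "--asc-corr=lewis",
        "-n", run_name, "-f", "a", "-N", "autoMRE",
        "-x", str(seed), "-p", str(seed),
    ]
    # Step 2: pairwise distances on the step-1 tree, using the non-ASC model,
    # since ASC models produced identical distances in this thread.
    dist_cmd = [
        binary, "-s", alignment, "-t", f"RAxML_bestTree.{run_name}",
        "-m", "GTRGAMMA", "-n", f"{run_name}_dist", "-f", "x",
    ]
    return tree_cmd, dist_cmd

tree_cmd, dist_cmd = raxml_two_step("ENTFM_snps.phylip", "ENTFM", seed=12545)
print(shlex.join(tree_cmd))
print(shlex.join(dist_cmd))
```

The second run gets its own run name so its output files do not clash with those of the tree search.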

cheers,

alexis
