MCMCTree fails during Counting frequencies step

161 views
Skip to first unread message

Shane Widanagama

unread,
Aug 11, 2021, 1:56:38 PM8/11/21
to PAML discussion group
Hi everyone,

When I run mcmctree using the mcmctree.ctl control file, it fails during or maybe after the 'Counting frequencies..' step, which you can see in the SLURM output file:


Lmod is automatically replacing "intel/2020.1.217" with "gcc/9.3.0".


Due to MODULEPATH changes, the following have been reloaded:
  1) openmpi/4.0.3

MCMCTREE in paml version 4.9j, February 2020

Reading options from mcmctree.ctl..
Reading master tree.
(((Ananas_comosus, Typha_latifolia), ((Brachypodium_distachyon, Oryza_sativa), Sorghum_bicolor)), Arabidopsis_thaliana);

Reading sequence data..  1 loci


*** Locus 1 ***

processing fasta file
reading seq# 1 Arabidopsis_thaliana                               1005108 sites
reading seq# 2 Brachypodium_distachyon                            1005108 sites
reading seq# 3 Oryza_sativa                                       1005108 sites
reading seq# 4 Sorghum_bicolor                                    1005108 sites
reading seq# 5 Typha_latifolia                                    1005108 sites
reading seq# 6 Ananas_comosus                                     1005108 sitesns = 6   ls = 1005108
Reading sequences, sequential format..
Reading seq # 6: Ananas_comosus        n       
Sequences read..
Counting site patterns..  0:01
Compressing, 195480 patterns at 1005108 / 1005108 sites (100.0%),  0:03
Collecting fpatt[] & pose[], 195480 patterns at 950000 / 1005108 sites (94.5%),  0:03
Error: usedata = 1 for nucleotides only.
Collecting fpatt[] & pose[], 195480 patterns at 1005108 / 1005108 sites (100.0%),  0:04
Counting frequencies..
195480 patterns, messy


This is the the mcmctree.ctl file:
          seed = 7
       seqfile = concatAln.fa
      treefile = tlprot.tree
       outfile = out.txt

         ndata = 1
       seqtype = 2  * 0: nucleotides; 1:codons; 2:AAs
       usedata = 1    * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
         clock = 2    * 1: global clock; 2: independent rates; 3: correlated rates
       RootAge = <1.0  * safe constraint on root age, used if no fossil for root.

         model = 0    * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85
         alpha = 0    * alpha for gamma rates at sites
         ncatG = 5    * No. categories in discrete gamma

     cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)?

       BDparas = 1 1 0.1    * birth, death, sampling
   kappa_gamma = 6 2      * gamma prior for kappa
   alpha_gamma = 1 1      * gamma prior for alpha

   rgene_gamma = 2 20 1   * gamma prior for overall rates for genes
  sigma2_gamma = 1 10 1   * gamma prior for sigma^2     (for clock=2 or 3)

         print = 1
        burnin = 5000
      sampfreq = 10
       nsample = 50000

*** Note: Make your window wider (100 columns) before running the program.
mcmctree.ctl (END)

Here is a tail end of the out.txt file:
    1   11    2    1   11    4    2    1    1    1    1    2  268    1    1
    1   10    1    2    1   15    1    1    6    1    1    2    3    3  175
    1    6    1  172    1    2    1   33    2    1   59    1    1    1    3
    1    1    3    2    4    3    1    1   15    1    1    1    1    1    1
    2    4    4    1    2    1    1    1    5    4    1    5    1    1    4
    2    2    9    1    2    1    4    1    2    3   76    3   24    6    1
  161   59    2    2    7    3    2    2    7   10    3    3    4    1 11609



Frequencies..
                                    A      R      N      D      C      Q      E      G      H      I      L      K      M      F      P      S      T      W      Y      V
Arabidopsis_thaliana           0.06919 0.05475 0.04068 0.05441 0.01662 0.03340 0.06817 0.06489 0.02153 0.05204 0.09725 0.06356 0.02424 0.04190 0.04739 0.09218 0.04981 0.01231 0.02695 0.06872
Brachypodium_distachyon        0.09724 0.06215 0.03261 0.05507 0.01709 0.03301 0.06184 0.07240 0.02298 0.04284 0.09841 0.05126 0.02359 0.03666 0.05647 0.08372 0.04636 0.01236 0.02510 0.06882
Oryza_sativa                   0.10005 0.06309 0.03254 0.05481 0.01690 0.03200 0.06158 0.07410 0.02309 0.04273 0.09713 0.05044 0.02361 0.03656 0.05680 0.08185 0.04555 0.01258 0.02544 0.06914
Sorghum_bicolor                0.09967 0.06285 0.03215 0.05520 0.01697 0.03304 0.06157 0.07240 0.02315 0.04216 0.09760 0.05074 0.02308 0.03639 0.05726 0.08251 0.04634 0.01240 0.02505 0.06947
Typha_latifolia                0.07718 0.05909 0.03724 0.05348 0.01753 0.03285 0.06531 0.06767 0.02273 0.05069 0.10148 0.05674 0.02366 0.04053 0.05175 0.09082 0.04554 0.01265 0.02650 0.06654
Ananas_comosus                 0.08641 0.06250 0.03589 0.05287 0.01692 0.03103 0.06583 0.06937 0.02235 0.04731 0.10124 0.05425 0.02207 0.03940 0.05523 0.08870 0.04394 0.01254 0.02591 0.06623

Homogeneity statistic: X2 = 0.02248 G = 0.02267 

Average                        0.08813 0.06071 0.03522 0.05430 0.01701 0.03256 0.06408 0.07009 0.02264 0.04635 0.09889 0.05454 0.02337 0.03860 0.05412 0.08670 0.04626 0.01247 0.02583 0.06814
(Ambiguity characters are used to calculate freqs.)

And this is the newick file I used as the tree:
6 1
(((Ananas_comosus,Typha_latifolia),((Brachypodium_distachyon,Oryza_sativa),Sorghum_bicolor) '>.42<.52'),Arabidopsis_thaliana);

Ziheng

unread,
Nov 3, 2021, 8:09:54 AM11/3/21
to PAML discussion group
Perhaps the program is still running, using CPU, for example.
however, with more than 1M sites, the exact likelihood calculation will be too slow.  you need to use the approximate-likelihood calculation.  
take a look at this tutorial :
  
best, ziheng
Reply all
Reply to author
Forward
0 new messages