MCMCTree "Error: file format" when summarizing MCMC samples


Juan Manuel Vazquez

Jul 13, 2023, 10:29:04 AM
to PAML discussion group
Hello! I've been trying to run MCMCTree on a set of 59 species, but no matter how many sites or partitions I use, I keep getting the same error:

```
Summarizing MCMC samples . ..

Data file has a header line.

Error: file format.
text found on line 2.%
```

I've created and attached a minimal example plus the resulting output log file in the hopes someone can help me with this. I've also tried to run this using both PAML 4.10 and PAML 4.9.

mcmctree_failing.tgz
test3.txt

Sishuo Wang

Jul 13, 2023, 11:35:16 PM
to PAML discussion group
Hi Juan,

Looking at the mcmc.txt file, it seems that some values are nan and the others stay the same throughout the MCMC. In your in.BV file, the gradients (of lnL, evaluated at the MLEs) also look unexpected: most if not all should be zero, but in your file they are very large, as highlighted below. Do you know which program was used to generate in.BV?

image.png

Best,
Sishuo
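A check like the one described above (nan values, and values that never change) can be scripted. The following is a minimal sketch, assuming the sample file is whitespace-delimited with one header line, as the "Data file has a header line" message suggests. The function name and the column names in the example are hypothetical, not taken from Juan's files.

```python
import math


def scan_mcmc_samples(lines):
    """Report non-numeric tokens, NaN columns, and columns that never change.

    Assumes a whitespace-delimited table whose first line is a header.
    """
    header = lines[0].split()
    rows = [ln.split() for ln in lines[1:] if ln.strip()]
    report = {"bad_tokens": [], "nan_columns": [], "constant_columns": []}
    columns = {name: [] for name in header}

    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        for name, token in zip(header, row):
            try:
                columns[name].append(float(token))
            except ValueError:
                # Text where a number was expected.
                report["bad_tokens"].append((line_no, name, token))

    for name, values in columns.items():
        if any(math.isnan(v) for v in values):
            report["nan_columns"].append(name)
        elif len(values) > 1 and len(set(values)) == 1:
            report["constant_columns"].append(name)
    return report


# Hypothetical sample lines, for illustration only.
example = [
    "Gen\tt_n60\tmu1\tlnL",
    "0\t0.55\tnan\t-10.0",
    "10\t0.55\tnan\t-9.5",
    "20\t0.55\tnan\t-9.7",
]
report = scan_mcmc_samples(example)
print(report)
```

To run it on a real file, pass `open("mcmc.txt").read().splitlines()` instead of `example`.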

Juan Manuel Vazquez

Jul 15, 2023, 1:18:48 PM
to PAML discussion group
Hi Sishuo, 

In a separate folder, I used the full dataset (3 partitions corresponding to the first, second, and third position of a codon-based alignment; 8,915,239 sites each) with the same model settings as in this mcmc control file, specifying "usedata = 3" to generate the "in.BV" file. 

The image isn't loading, sadly. I can confirm, though, that I see some really high values in the "in.BV", and some strange priors. I am trying to generate an "out.BV" from the test dataset, and it seems that the same is true there, with very weird priors as a result...

Sishuo Wang

Jul 16, 2023, 10:29:35 PM
to PAML discussion group
Hi Juan,

Here is the figure that failed to upload. In the 8th row of your in.BV, there seems to be something wrong with the gradient.

help.png

I re-ran mcmctree on your data with usedata = 3 and seqtype = 0, and it seemed to work properly. See the attachment.

Could you please try again?

Best,
Sishuo
out.BV
mcmctree1.ctl

Juan Manuel Vazquez

Jul 17, 2023, 6:37:42 AM
to PAML discussion group
Hi Sishuo, 

I can confirm that rerunning it leads to a comparable out.BV (Seed: 707212877). However, if you use this out.BV file as in.BV for a re-run (in a clean folder) and change usedata to 3 (Seed: 67513341 & 1854917565), it fails with the same file format error... 

As a different test, I reran the analysis with usedata = 0 to see what happens, and it leads to the same failure. This led me to check what happens if I remove all fossil constraints from the tree (e.g., '>47.8<66' at the root); in this case it leads to a new error, which is the same regardless of whether I use usedata=0 or usedata=2:
Screenshot 2023-07-17 033612.png

So it seems like there might be two different issues...

- Juan

Sandra AC

Aug 25, 2023, 6:36:26 AM
to PAML discussion group
Hi Juan, 

It seems that you have not defined your prior on node ages without fossil calibrations (`BDparas`), your rate prior (`rgene_gamma`), nor the prior on the variation on the clock (`sigma2_gamma`):

```
         seed = -1
      seqfile = ../../../../data/genes/allbats_fasta/alignments_concat_noproblemgenes_all/allBatGenes_noproblemgenes.phy
     treefile = ../../../../data/allBats.GHOST.mcmcInputTree
      outfile = test3.txt
     mcmcfile = test3.txt

        ndata = 3
      seqtype = 0
      usedata = 2    * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
        clock = 3    * 1: global clock; 2: independent rates; 3: correlated rates

        model = 4    * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85
        alpha = 0.5  * alpha for gamma rates at sites
        ncatG = 5    * No. categories in discrete gamma

    cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)?

        print = 1
       burnin = 100
     sampfreq = 10
      nsample = 100
```

I believe that your MCMC settings (i.e., `burnin`, `sampfreq`, `nsample`) may have been defined for a test run, but bear in mind that you are running "burnin + sampfreq * nsample" iterations (i.e., "100 + 10*100 = 1,100" iterations in your example). To save disk space, you may want to collect between 10,000 and 20,000 samples and lengthen the chain by increasing `sampfreq` rather than `nsample`. I suggest you read "A biologist's guide to Bayesian phylogenetic analysis" (Nascimento et al., 2017) for more details on how to choose priors, background on MCMC in phylogenetics, MCMC diagnostics, etc. In addition, you may want to read the book chapter Mario dos Reis and Ziheng Yang wrote some years ago (dos Reis and Yang 2019), which goes together with the GitHub repository "divtime", to learn how to better define the settings in the control file to run MCMCtree.
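The iteration count quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the helper name is hypothetical, and the values for the longer run are illustrative assumptions, not recommended settings for this dataset.

```python
def total_iterations(burnin, sampfreq, nsample):
    """Total MCMC iterations: burn-in plus sampfreq iterations per kept sample."""
    return burnin + sampfreq * nsample


# Settings from the control file quoted above.
test_run = total_iterations(burnin=100, sampfreq=10, nsample=100)
print(test_run)  # 1100

# Assumed values for a longer run: the number of samples written to disk
# stays at 20,000 while a larger sampfreq lengthens the chain.
longer_run = total_iterations(burnin=100_000, sampfreq=100, nsample=20_000)
print(longer_run)  # 2100000
```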

Cheers!
S. 

Sishuo Wang

Sep 6, 2023, 2:21:07 AM
to PAML discussion group
Hi Sandra,

You are quite right that those rate prior parameters are missing, but even with them in the control file there was still an error when running on Juan's data. I tried some solutions but none worked.

Sishuo

Sandra AC

Sep 8, 2023, 5:14:51 AM
to PAML discussion group
Hi Sishuo and Juan, 

My previous answer just focused on the last error that Juan reported with regards to the priors getting `0` values -- I thought the other problem had been sorted, my bad! I think that the main problem may be that Juan's control file is redirecting the output from MCMCtree and the samples collected to the same output file: `outfile = test3.txt` and `mcmcfile = test3.txt`.

I suggest you use something like `outfile = out.txt` and `mcmcfile = mcmc.txt`. I ran a quick test (and also added some dummy priors for the rate and sigma2), which seemed to fix both problems -- let me know if that works for you!
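Both issues raised in this thread (the shared output file name and the missing priors) can be caught before launching a run. Below is a minimal sketch, assuming the control file uses `key = value` lines with `*` starting a trailing comment, as in the file quoted earlier. The function names are hypothetical, and the parser is deliberately simplified: it would mishandle a value that itself contains `*`.

```python
def parse_ctl(text):
    """Parse 'key = value * comment' lines into a dict (simplified)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("*", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_ctl(settings):
    """Return a list of human-readable problems found in the settings."""
    problems = []
    out, mcmc = settings.get("outfile"), settings.get("mcmcfile")
    if out is not None and out == mcmc:
        problems.append(f"outfile and mcmcfile both point to {out}")
    for key in ("BDparas", "rgene_gamma", "sigma2_gamma"):
        if key not in settings:
            problems.append(f"{key} is not set")
    return problems


# A few lines from the control file quoted earlier in the thread.
ctl_text = """
      outfile = test3.txt
     mcmcfile = test3.txt
      usedata = 2    * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
        clock = 3    * 1: global clock; 2: independent rates; 3: correlated rates
"""
problems = check_ctl(parse_ctl(ctl_text))
for problem in problems:
    print(problem)
```

With `outfile = out.txt`, `mcmcfile = mcmc.txt`, and the three priors defined, `check_ctl` returns an empty list.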

P.S.: Juan, when you run your analyses, make sure that you pick the correct rate and sigma2 priors according to your data and the time unit you are using with the calibrations -- by looking at your tree file, I think that you have calibrations with time unit = 1Myr, if I am not wrong. You can take a look at how to deal with time units while setting these priors in the MCMCtree tutorial (section "Changing the time scale").

Hope this sorts out both problems!
Sandra