Computing alignment with posterior decoding: error in sequence length

Dec 5, 2018, 12:12:41 PM
to bali-phy-users
Dear all,

I am using BAli-Phy on a new computer, and I started by running an analysis with a small dataset (14 sequences, the longest 139 bp, no missing data). The analysis is named Analysis_ITS1_BAliPhy.
I executed 6 parallel runs of 50 000 generations each, and everything seemed to be fine. I have not noticed any problems with mixing or convergence, and I ended up with high ESS values (>4500 for each parameter in each run).
I computed the 50% majority-rule consensus tree and everything is as expected.

The problems came when I tried to compute the "consensus" alignment. I used the following command to discard the first 25% of the runs as burn-in:

cut-range --skip=1250 < C1.P1.fastas | alignment-max > P1-max.fasta

And this command did not work (No such file or directory/ No alignment found).

So, I tried:

cut-range --skip=1250 < Analysis_ITS1_BAliPhy-1/C1.P1.fastas | alignment-max > P1-max.fasta

And then I got this message:

alignment-max: Error! Sequence 26: length 133 differs from expected length 129

I have tried with the C1.P1.fastas file in each folder (-1, -2, etc.), and I always get the same kind of error, only with different values. Interestingly, the error always appears on sequences 15-26, which I guess correspond to the ancestral sequences (AXX).

I have also tried with the six alignments at once:

cut-range --skip=1250 < Analysis_ITS1_BAliPhy-1/C1.P1.fastas Analysis_ITS1_BAliPhy-2/C1.P1.fastas Analysis_ITS1_BAliPhy-3/C1.P1.fastas Analysis_ITS1_BAliPhy-4/C1.P1.fastas Analysis_ITS1_BAliPhy-5/C1.P1.fastas Analysis_ITS1_BAliPhy-6/C1.P1.fastas | alignment-max > P1-max.fasta

And I got the same error again. (I have not found in the tutorial or other documentation how to compute the alignment from several .fastas files, which is why I tried the previous command, but it may well be wrong.)

I have changed the value of the burn-in, with no success (same error).

The weird thing is that I have checked the .fastas files, and I do not see any difference in sequence length, neither in the original sequences nor in the ancestral sequences. I repeated the same analysis twice, in case I was getting any random error or any file was miswritten. I have also checked the sequence names, and they do not contain any space, special characters or dashes; only letters, numbers and underscores.
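For anyone who wants to repeat this length check programmatically rather than by eye, here is a minimal Python sketch. It assumes (I have not confirmed this against the BAli-Phy documentation) that a .fastas file is a series of FASTA alignments separated by blank lines, possibly with non-FASTA lines such as an iteration counter between them. The function names `read_alignments` and `length_mismatches` are made up for this example and are not part of BAli-Phy.

```python
from typing import Iterable, Iterator, List, Tuple

Alignment = List[Tuple[str, str]]  # (sequence name, aligned sequence)


def read_alignments(lines: Iterable[str]) -> Iterator[Alignment]:
    """Yield one alignment per blank-line-separated block of FASTA records.

    Assumption: blocks are separated by blank lines. Lines that appear
    before the first '>' header of a block are ignored.
    """
    block: List[List[str]] = []
    in_record = False
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current block, if it holds any records.
            if block:
                yield [(name, seq) for name, seq in block]
            block, in_record = [], False
        elif line.startswith(">"):
            parts = line[1:].split()
            block.append([parts[0] if parts else "", ""])
            in_record = True
        elif in_record:
            # Sequence data may be wrapped over several lines.
            block[-1][1] += line
    if block:
        yield [(name, seq) for name, seq in block]


def length_mismatches(alignment: Alignment) -> List[Tuple[int, str, int]]:
    """Return (1-based index, name, length) for every sequence whose
    length differs from that of the first sequence in the alignment."""
    if not alignment:
        return []
    expected = len(alignment[0][1])
    return [
        (i, name, len(seq))
        for i, (name, seq) in enumerate(alignment, start=1)
        if len(seq) != expected
    ]
```

To use it, open one of the C1.P1.fastas files, pass the file handle to `read_alignments`, and call `length_mismatches` on each alignment it yields. An empty list means every row in that sample has the same length. The 1-based index is meant to line up with the "Sequence 26" style of numbering in the error message, although whether alignment-max counts sequences the same way is also an assumption.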

I am using version 3.3 in Cygwin (Windows 10).

Any idea what I am doing wrong?

Thank you very much!

Juan Carlos

Juan Carlos Zamora Señoret

Dec 5, 2018, 12:30:13 PM
to bali-phy-users
Just as additional information, I worked around the problem by doing all the calculations with bp-analyze -skip=1250 Analysis_ITS1_BAliPhy-1/ Analysis_ITS1_BAliPhy-2/ Analysis_ITS1_BAliPhy-3/ Analysis_ITS1_BAliPhy-4/ Analysis_ITS1_BAliPhy-5/ Analysis_ITS1_BAliPhy-6/

Everything worked fine. Note that I used "-skip" instead of "--burnin", otherwise I got an error.

An additional question I have now is that in my original dataset one sequence had an ambiguous base (W), but in the computed alignment (P1-max.fasta) this base is now an A. I guess this is because the analysis treats it as either A or T (as explained in the FAQ) and then reports the one sampled most often. This is, however, somewhat inconvenient if the alignment is going to be used for subsequent analyses and we want to keep the uncertainty. Is there any way to prevent this change?

Thanks again!

Juan Carlos