This is a case of different character coding between the two
analyses. See below.
On May 8, 2008, at 4:39 PM, Ulf wrote:
>
> Andres,
>
>
> Here is the script that I have used for the heuristic searches:
>
> read(prealigned:("*.aln", tcm:(1,1)),"ssuatt.fas")
> set(seed:133717, log:"all_data_search.log", root:"Corethronhystrix")
> report(timer:"search start")
Here is the difference:
>
> transform (tcm:(1,1), gap_opening:1)
In POY 4 the gap_opening cost is charged in addition to the individual
indel cost. So effectively, the alignment
A-A
AAA
has cost 2 under your transform settings: 1 from the individual indel
cost in the tcm, plus 1 from the gap_opening parameter. Just use
transform (tcm:(1,1)) and things will look better.
Anyway, if some of the differences in length are due to sampling
errors, you should probably break the sequences using a multiple
sequence alignment from something like MUSCLE. That way, you can leave
the portions that are really missing, as missing data. An example of a
broken dataset is the following:
>taxon1
AAA#CCC#GGG
>taxon2
AAA#CCCCC#
>taxon3
##GGGGGG
In this case, taxon1 has all 3 fragments present, taxon2 is missing the
last one, and taxon3 is missing the first two.
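If you want to check a broken file before handing it to POY, something like the following Python sketch may help. It only illustrates the reading of '#' described above (a fragment separator, with an empty fragment meaning missing data). It is not POY's parser, and the function name read_fragments is hypothetical.

```python
def read_fragments(fasta_text):
    """Split each FASTA record on '#' into its fragments.
    An empty string marks a fragment that is missing for that taxon."""
    sequences = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line  # records may span several lines
    return {taxon: seq.split("#") for taxon, seq in sequences.items()}


example = """>taxon1
AAA#CCC#GGG
>taxon2
AAA#CCCCC#
>taxon3
##GGGGGG
"""

parsed = read_fragments(example)
for taxon, frags in parsed.items():
    missing = [i + 1 for i, frag in enumerate(frags) if not frag]
    print(taxon, "missing fragments:", missing)
```

Every taxon should end up with the same number of fragments. If the counts differ, a '#' was probably dropped or misplaced when the alignment was cut.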
best,
Andres