what is the proper use of trailing.insertion

danjanies

May 9, 2008, 4:41:02 PM
to poy4
Hello
I want to set up some costs like this:
read (blah)
select (foo)
set(something)
transform ((all, tcm:(1,2)))
transform ((trailing.insertion:(1)))
transform ((leading.insertion:(1)))

I'm getting an error in the second transform command:
line 5 between characters 12 and 20: [identifiers] expected after [left_parenthesis] (in [transform_argument])
Thanks
Dan

Andres Varon

May 12, 2008, 10:20:50 AM
to po...@googlegroups.com
Hello Dan,


transform takes a list of arguments. Each argument can be either:

1. A transformation by itself which is interpreted to be applied to
every character, as in:

transform (tcm:(1,1))

This means that every character to which tcm:(1,1) can be applied
will be transformed.

2. A pair enclosed in parentheses, where the first element in the pair
is the set of characters to be transformed and the second element in
the pair is the transformation to be applied. For example:

transform ( (static, weight:4) )

I added spaces to make it more readable, but they make no difference.
Note that we have only one transform (the inner pair of parentheses),
which is applied only to the static homology characters, using
weight:4. Your first transform below belongs to the second class:

transform ( (all, tcm:(1,2)) )

The transformations you want to apply are trailing_insertion and
trailing_deletion. Each takes as its value a list of costs, one for
each element of the alphabet, giving the cost of an insertion (or
deletion) at the tail and head of the sequences.
Assuming that you are dealing with nucleotides only, the nucleotide
alphabet consists of A, C, G, T, and -, and so we would have:

transform ( (all, trailing_insertion:(1,1,1,1,0)) )
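Since the cost list is positional, it is easy to put a value in the wrong slot. As a minimal sketch in Python (not part of POY), the hypothetical helper indel_cost_argument below builds the argument text from a symbol-to-cost mapping. It assumes the costs are listed in the same order as the alphabet given above (A, C, G, T, -); that ordering is an assumption of this sketch.

```python
# Symbol order assumed from the alphabet listed above: A, C, G, T, gap.
NUCLEOTIDE_ALPHABET = ("A", "C", "G", "T", "-")


def indel_cost_argument(name, costs_by_symbol, alphabet=NUCLEOTIDE_ALPHABET):
    """Format e.g. trailing_insertion:(1,1,1,1,0) from a symbol -> cost mapping.

    Hypothetical helper: it only produces text to paste into a script.
    """
    missing = [symbol for symbol in alphabet if symbol not in costs_by_symbol]
    if missing:
        raise ValueError(f"no cost given for: {missing}")
    # One cost per alphabet element, in alphabet order.
    costs = ",".join(str(costs_by_symbol[symbol]) for symbol in alphabet)
    return f"{name}:({costs})"


costs = {"A": 1, "C": 1, "G": 1, "T": 1, "-": 0}
argument = indel_cost_argument("trailing_insertion", costs)
print(f"transform ( (all, {argument}) )")
# prints: transform ( (all, trailing_insertion:(1,1,1,1,0)) )
```

Leaving out a symbol raises an error instead of silently producing a list that is too short.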

You could write them all in one line, as follows:

transform ( (all, tcm:(1,2)), (all, trailing_insertion:(1,1,1,1,0)),
(all, trailing_deletion:(1,1,1,1,0)) )

And you can further simplify this as follows:

transform (tcm:(1,2), trailing_insertion:(1,1,1,1,0),
trailing_deletion:(1,1,1,1,0))
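The two argument forms can be compared side by side with a small Python sketch (again not part of POY). The function transform_command is a hypothetical name; it takes (characters, transformation) pairs, and a characters value of None stands for the bare form that applies to every applicable character.

```python
def transform_command(arguments):
    """Build transform command text from (characters, transformation) pairs.

    characters=None yields the bare form; otherwise the pair is wrapped in
    its own parentheses, as in (all, tcm:(1,2)).
    """
    parts = []
    for characters, transformation in arguments:
        if characters is None:
            parts.append(transformation)
        else:
            parts.append(f"({characters}, {transformation})")
    return "transform (" + ", ".join(parts) + ")"


indel_costs = "(1,1,1,1,0)"
transformations = [
    "tcm:(1,2)",
    f"trailing_insertion:{indel_costs}",
    f"trailing_deletion:{indel_costs}",
]

# Explicit form: every transformation is paired with the character set "all".
explicit = transform_command([("all", t) for t in transformations])
# Bare form: no character set is given.
bare = transform_command([(None, t) for t in transformations])

print(explicit)
print(bare)
```

Both strings correspond to the one-line commands shown above, apart from spacing, which makes no difference.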

IMPORTANT:

I must say that these trailing indels are not something that I would
recommend. If your sequences are broken into fragments, then this will
modify the cost even at the tips of the internal fragments (I'm sure
you don't want this). It is also a rather poor way to deal with missing
data in the tips, affecting the overall alignment and interior regions
in a way that is very difficult to predict. In addition, transform
(static_approx) has no functions to handle this kind of coding scheme
properly.

I'd rather break the sequences into proper fragments, and assign a
different weight to the fragments located at the tips.

best,

Andres


danjanies

May 12, 2008, 12:48:07 PM
to poy4
Hi Andres
Thanks for the thoughtful answers - will try them out.
What I am trying to do is set costs similar to, say, -molecularmatrix 111
-noleading, or -molecularmatrix (sub 1 indel 2) -noleading.
Any advice specific to that, or are these commands more comparable to
-trailinggap 1?
Thanks
Dan