A few PS2 questions...

Orion

Nov 20, 2010, 8:31:49 PM

to csc2417-f10

1.b. I presume we need to deal with overlapping hyper-rectangles in
this one as well?

2.a. Previous messages suggest that this should be O(n) where n is the
size of the tree. This is supposed to be an optimal solution as well, and
not an approximation, yes? So the simple greedy approach is
insufficient....

3. Two things seem ambiguous. In order to compare the phylogenetic
alignment and the sum-of-pairs alignment with the consensus sequence,
you must define how to extract some sort of a single sequence from
them for us to show that they differ. In the phylogenetic alignment, is it the
sequence of the root? And in the sum-of-pairs, is it the consensus
sequence of the alignment?

Thanks,
Orion

Michael Brudno

Nov 20, 2010, 9:19:10 PM

to csc24...@googlegroups.com

On Sat, Nov 20, 2010 at 8:31 PM, Orion <orion...@gmail.com> wrote:
> 1.b. I presume we need to deal with overlapping hyper-rectangles in
> this one as well?

Yes.

> 2.a. Previous messages suggest that this should be O(n) where n is the
> size of the tree. This is supposed to be an optimal solution as well, and
> not an approximation, yes? So the simple greedy approach is
> insufficient....

Yes, optimal

> 3. Two things seem ambiguous. In order to compare the phylogenetic
> alignment and the sum-of-pairs alignment with the consensus sequence,
> you must define how to extract some sort of a single sequence from
> them for us to show that they differ. In the phylogenetic alignment, is it the
> sequence of the root? And in the sum-of-pairs, is it the consensus
> sequence of the alignment?

I am not sure I understand the question. The consensus string is the
most common letter in each column. In the phylogeny alignment, the
strings at all nodes are the same length (i.e. they contain
gaps).
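
As a rough illustration of the "most common letter in each column" definition, here is a minimal Python sketch. The function name `consensus`, the choice to count the gap character like any other letter, the alphabetical tie-break, and the three input rows are all assumptions made for illustration, not part of the assignment.

```python
from collections import Counter


def consensus(rows):
    """Return the most common character in each column of an alignment."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must be non-empty in number and equal in length")
    out = []
    for column in zip(*rows):
        counts = Counter(column)
        # Assumed tie-break: highest count first, then alphabetical order.
        best = min(counts, key=lambda ch: (-counts[ch], ch))
        out.append(best)
    return "".join(out)


print(consensus(["AGCC", "A-CC", "T-CC"]))  # prints A-CC
```

Note that the second column is won by the gap character here, which only happens because gaps are counted as letters in this sketch.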

-Mike

> Thanks,
> Orion

Orion Buske

Nov 20, 2010, 9:45:05 PM

to csc24...@googlegroups.com

On 20 Nov 2010, at 9:19 PM, Michael Brudno wrote:

>>
>> 3. Two things seem ambiguous. In order to compare the phylogenetic
>> alignment and the sum-of-pairs alignment with the consensus sequence,
>> you must define how to extract some sort of a single sequence from
>> them for us to show that they differ. In the phylogenetic alignment, is it the
>> sequence of the root? And in the sum-of-pairs, is it the consensus
>> sequence of the alignment?
>
> I am not sure I understand the question. The consensus string is the
> most common letter in each column. In the phylogeny alignment, the
> strings at all nodes are the same length (i.e. they contain
> gaps).

Okay, so how do we say whether the consensus string is the "same" as or "different" from the phylogeny alignment or the sum-of-pairs alignment? It's comparing strings against alignments, which is not straightforward. It would be tempting to say that they are always different, since one is a phylogeny alignment (labeled nodes), one is a sum-of-pairs alignment (n aligned sequences), and one is a single consensus sequence. How can they ever be equal?

Michael Brudno

Nov 20, 2010, 9:51:42 PM

to csc24...@googlegroups.com

You are only comparing alignments -- alignments optimal according to
different criteria (consensus, sum of pairs, etc).

Brian

Nov 21, 2010, 4:00:44 AM

to csc2417-f10

I think the point is that the phylogeny prescribes an alignment, via
homology. That is, in the example given:

A-CC - root
AGCC - X
A-CC - Y
T-CC - Z

and the latter 3 lines represent the alignment, because the A from X
and Y is homologous to the T in Z, and the G in X is an insertion, and
thus not homologous to any nucleotide in the other sequences.

Or for example, if I had the phylogeny

A -> GA
  -> AC

Then the alignment of GA and AC would be

GA-
-AC

rather than

GA
AC
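
The idea that a labeled phylogeny prescribes an alignment can be sketched in a few lines of Python: take the gapped labels of the leaves and drop any column in which every leaf has a gap. The function name `leaf_alignment`, the dictionary layout, and the gapped root label `-A-` for the second example are assumptions made for illustration.

```python
def leaf_alignment(labels, leaves):
    """Read the leaf alignment off a tree whose nodes carry gapped labels."""
    rows = [labels[name] for name in leaves]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("gapped labels must all have the same length")
    # Keep only columns where at least one leaf has a real character.
    keep = [col for col in zip(*rows) if any(ch != "-" for ch in col)]
    if not keep:
        return ["" for _ in rows]
    return ["".join(chars) for chars in zip(*keep)]


# First example: root A-CC with leaves X, Y, Z.
first = {"root": "A-CC", "X": "AGCC", "Y": "A-CC", "Z": "T-CC"}
print(leaf_alignment(first, ["X", "Y", "Z"]))  # ['AGCC', 'A-CC', 'T-CC']

# Second example: root A with leaves GA and AC (gapped root assumed to be -A-).
second = {"root": "-A-", "GA": "GA-", "AC": "-AC"}
print(leaf_alignment(second, ["GA", "AC"]))  # ['GA-', '-AC']
```

In the second case the shared A sits in the middle column, which is why the result is GA- over -AC rather than GA over AC.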

Michael Brudno

Nov 21, 2010, 8:46:46 AM

to csc24...@googlegroups.com

Yes, that's pretty much right.

Orion

Nov 22, 2010, 1:49:35 AM

to csc2417-f10

But, for your example, aren't there a number of better
assignments for the root node? Given two leaf nodes, GA and AC, their
parent should be labeled one of {GA, AC, GC}, all of which result in a
total distance of 4. Your choice of A as the parent resulted in a
total distance of 6.
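
The totals quoted here can be reproduced with a small dynamic program. The sketch below assumes a mismatch cost of 2 and a gap cost of 3; those values are a guess that happens to match the 4 and 6 above, not something stated in the thread, and `edit_cost` and `tree_cost` are hypothetical names.

```python
def edit_cost(s, t, mismatch=2, gap=3):
    """Weighted edit distance between two ungapped strings (assumed costs)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * gap]
        for j, b in enumerate(t, 1):
            substitute = prev[j - 1] + (0 if a == b else mismatch)
            delete = prev[j] + gap
            insert = cur[j - 1] + gap
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


def tree_cost(parent, leaves):
    """Sum of parent-to-leaf costs for a single internal node."""
    return sum(edit_cost(parent, leaf) for leaf in leaves)


for parent in ["GA", "AC", "GC", "A"]:
    print(parent, tree_cost(parent, ["GA", "AC"]))
# GA 4
# AC 4
# GC 4
# A 6
```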

I guess I'm still confused about this case as well. Let's say you have
two leaf nodes, A, AG. And, based upon the rest of the tree, the
optimal parent is 'A'. How do we align A and AG? Is -A, AG just as
good as A-, AG? Both result in a distance of 3...

Very, very confused.

Thanks,
Orion

Michael Brudno

Nov 22, 2010, 1:53:36 AM

to csc24...@googlegroups.com

The length of the strings assigned to all nodes is the same (once the
gaps are added). So the root node is assigned either A- or -A, and the
scores will be very different, as the alignment of A- to -A has a very
negative score.
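
To see why the placement of the gap matters once all labels have the same length, here is a minimal sketch that compares two gapped strings column by column. It reports a cost (lower is better) rather than a score, and the mismatch cost of 2 and gap cost of 3 are placeholder assumptions, not the assignment's scoring scheme; `column_cost` is a hypothetical name.

```python
def column_cost(a, b, mismatch=2, gap=3):
    """Column-by-column cost of two equal-length gapped strings."""
    if len(a) != len(b):
        raise ValueError("gapped strings must have the same length")
    total = 0
    for x, y in zip(a, b):
        if x == y:
            continue  # a match, or a gap opposite a gap, costs nothing
        total += gap if "-" in (x, y) else mismatch
    return total


print(column_cost("A-", "A-"))  # 0
print(column_cost("A-", "-A"))  # 6
print(column_cost("A-", "AG"))  # 3
print(column_cost("-A", "AG"))  # 5
```

Under these assumed costs, a parent labeled A- with children A- and AG totals 3, while a parent labeled -A with children -A and AG totals 5, so the two gap placements are no longer equally good.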

Orion

Nov 22, 2010, 1:58:24 AM

to csc2417-f10

Okay, the example in the handout is wrong and was confusing, since
gaps were not included in the sequence assignments, but all were the
same length.

Thanks for clearing this up. Sorry for being so dense on this problem.

Michael Brudno

Nov 22, 2010, 2:03:21 AM

to csc24...@googlegroups.com

Yes, you are right, the hw did not have this properly explained. Apologies.

-M

Brian

Nov 22, 2010, 8:58:33 PM

to csc2417-f10

Sorry for any confusion. My example was not the -optimal- phylogeny
(or most parsimonious, technically). It was just a random phylogeny I
came up with to demonstrate how phylogeny -> alignment. The key to the
question would be: given leaf sequences and a particular tree
structure, find any/all optimal phylogenies, convert those into their
corresponding alignments, and demonstrate that those alignments are
distinct from the optimal consensus/sum-of-pairs alignments.

Although I suspect that this is too late now.
