Re: CSC 2417 A1

80 views
Skip to first unread message

Michael Brudno

unread,
Oct 14, 2010, 12:42:20 PM10/14/10
to csc24...@googlegroups.com
> I have some question regarding the assignment. In general I have spent a lot
> of time and gotten no where on a few question and feel I lack the intuition.
> I know you are on vacation and some of these question may be better to ask
> in person but I feel that would provide too little time.
>
> 1. (c) Finding overlaps:
> Q : We never covered efficient algorithms for string overlap in general. I
> know of KMP but is there others that may be useful or used in practice?

Knuth-Morris-Pratt is not relevant here. We want an alignment, which
allows for indels, and not a common substring. Hint on this problem,
as people seem to be struggling: the answer is discussed elsewhere in
the homework!

> 2. (b) In the linear space alignment, the original problem of size mn is
> reduced to two subproblems
> of sizes km/2 and (n-k)m/2. In a fast, parallel implementation of sequence
> alignment, it is desirable to have a balanced partitioning that breaks the
> original problem
> into sub-problems of equal sizes. Design a linear space alignment algorithm
> with
> balanced partitioning.
>
> Q : What should be the runtime?

O(NM), the same as for usual Hirschberg's

> 3. (a) The score of a local alignment is not normalized over the length of
> the matching
> region. As a result, a local alignment with score 100 and length 100 will be
> chosen over a
> local alignment with score 99 and length 10, although the latter one is
> probably more
> important biologically. To reflect the length of the local alignment in
> scoring, the score
> F(I,J) of local alignment involving substrings I and J may be adjusted by
> dividing F(I,J)
> by the total length of the aligned regions: F(I,J)/(|I|+|J|). The normalized
> local alignment
> problem is to find substrings I and J that maximize F(I,J)/(|I|+|J|) among
> all substrings I
> and J with |I|+|J| >= k, where k is a threshold for the minimum overall
> length of I and J.
> Devise an algorithm for solving the normalized local alignment problem.
>
> Q : What should be the target runtime?

As fast as you can make it. O(NM) is prefered.

> 3. (d) Bacterial DNA is often organized into circular molecules. Given two
> strings x and y
> of length n and m, respectively, there are n circular shifts of x, and m
> circular shifts of y;
> therefore, there are nm pairs of circular shifts.
> Build an efficient algorithm to find the best global alignment among the nm
> different
> pairs of circular shifts. The trivial O(n2m2) algorithm is not good enough.
> A cubic time
> solution (i.e. O(n^2m)) will only get partial credit.
>
> Q : I have no intuition on how to solve this problem. Is there any hints you
> could give on this problem?

Think about whether two "optimal" alignments between distinct start
and end points can ever intersect.

Reply all
Reply to author
Forward
0 new messages