hw1 on the web

23 views
Skip to first unread message

Michael Brudno

unread,
Sep 29, 2010, 5:11:55 PM9/29/10
to csc2417-f10
I have updated the readong for next week, and also posted hw1 on the
website (officially it will be "out" on the coming monday). HW1 is due
October 18th.

-M

Yue Li

unread,
Oct 5, 2010, 6:28:34 PM10/5/10
to csc24...@googlegroups.com
Hi Michael,

I have a question regarding to the assignment 1 question 1 (c). It says "two reads overlap if they (have, i guess) fewer than D discrepancies (substitutions, insertion or deletions, not counting any initial and final gap)." Based on this criterion, say we set D = 3. Then two reads such as:


AAAAAAAAAT
TCCCCCCCCC

are considered as overlap since they have 0 discrepancy, but

AAAAGCTTGC
AACGTAGCAA

are not overlap since they have 3 discrepancies. So should we develop a faster algorithm (than semi-global alignment) that runs the above way or also has to take into account the overall score of the two alignment (i.e., match score - gap/mismatch penalties). If it is the latter, then shouldn't another threshold for the overall score have to be considered too?

Also, will the A1 be due next Wed or Thu? Could the submission be handwritten rather than typed?

Thanks very much,
Yue


Michael Brudno

unread,
Oct 6, 2010, 5:50:23 PM10/6/10
to csc24...@googlegroups.com
Good point. Let's say that the overlap may have at most D
discrepancies in an alignment of length L. For this problem I am only
interested in the number of discrepancies (mismatches and indels), not
the alignment score.

Let's make the homework due by 10am on thursday, in my office (Pratt
286C). If I am not around, you can also give it to Rebecca in Pratt
283.

-Mike

Farah Juma

unread,
Oct 9, 2010, 3:17:47 PM10/9/10
to csc24...@googlegroups.com
Hi Michael,

I also have a question related to 1)c). This question says that we should not count any initial or final gaps as discrepancies. Let's consider the following two reads:

AACA
       GTTT    

In this example, since we are not counting the initial or final gap, there is only 1 discrepancy (a substitution). So, for a positive integer constant D, these two reads overlap because they have at most D discrepancies. Because we are not counting the initial or final gap, wouldn't this mean that any two arbitrary reads could be considered to overlap since we could just match the last base of one read with the first base of the second read and we would trivially have at most D discrepancies? Am I missing something?

Thanks,
Farah

Frank Vanderzwet

unread,
Oct 12, 2010, 12:47:50 PM10/12/10
to csc2417-f10
Hello,

I also have the same question for question 1c). What are the
conditions for us considering something an overlap? Even if we say D=0
does having the same first and last character constitute an overlap?

Michael Brudno

unread,
Oct 12, 2010, 3:03:40 PM10/12/10
to csc24...@googlegroups.com
As I said earlier, let's assume there is also a minimum overlap length
parameter.

-M

Orion

unread,
Oct 12, 2010, 8:04:58 PM10/12/10
to csc2417-f10
Out of curiosity, does anyone actually have a solution to this? I just
want to know that it's possible. :)

Thanks!
-Orion

Michael Brudno

unread,
Oct 12, 2010, 9:57:10 PM10/12/10
to csc24...@googlegroups.com
I am willing to settle for algorithms that will work fast only on
non-low0complexity sequences (e.g. you can ignore overlaps between
strings of all A's.

Orion

unread,
Oct 13, 2010, 11:54:44 AM10/13/10
to csc2417-f10
How should be regard complexity in L? Is O(L*n) asymptotically faster
than O(n^2)? It would be very nice if it were.

Thank you,
Orion

Michael Brudno

unread,
Oct 13, 2010, 12:15:10 PM10/13/10
to csc24...@googlegroups.com
No, L should not be a parameter in the running time.
Reply all
Reply to author
Forward
0 new messages