When: Thursday, March 31, 14.00-15.00
Location: bldg. 306 aud 38
Title: Efficient Dynamic Programming for Large Scale Biological
Sequence Comparison
Speaker: Morten Støckel
Supervisor: Philip Bille
External examiner: Thore Husfeldt, ITU
Abstract:
This thesis deals with the classical algorithmic problem of
\emph{approximate pattern matching}, which is a way of measuring
similarity between two input strings by an approximate metric.
Algorithms that solve instances of this class of problems is of
great interest, as there are many applications within
bioinformatics in particular, where the approximate similarity
measurement is used to determine similarity between biological
sequences such as protein and DNA.
As the algorithms used to solve the approximate pattern matching
are to be used on large input, new algorithms that improve upon
the performance bounds of previous solutions are of great research
interest.
This thesis presents a brief survey of classic as well as state of
the art dynamic programming algorithms for approximate pattern
matching. The main result of the thesis is a new algorithm which
combines two state of the art algorithms to create a hybrid of the
two, which is shown to have equal or better performance bounds
than both of them, analyzed using two models of computation; The
RAM model and the I/O model. Finally practical experiments
including all algorithms mentioned in the thesis show that the
algorithm presented in the thesis perform the best for nearly all
input sizes across several processor architectures.