嚙調士嚙論歹蕭f嚙踝蕭 嚙篆嚙褐綽蕭 (2014/5/27)
嚙瞎嚙諍大嚙褒賂蕭T嚙線嚙緹嚙褒系嚙調士嚙褒佗蕭f嚙踝蕭
嚙褒生姓嚙磕嚙瘦嚙篆嚙褐綽蕭
嚙踝蕭伀訇癒G嚙踝蕭嚙窮嚙瞑嚙請授、嚙踝蕭q嚙請梧蕭
嚙篆嚙調委嚙踝蕭G嚙踝蕭嚙窮嚙瞑嚙請授、嚙踝蕭q嚙請授、嚙踝蕭嚙論教授、
嚙箱嚙璀嚙踝蕭嚙請授、嚙踝蕭禮嚙請授、嚙踝蕭嚙踝蕭嚙緹嚙請梧蕭
嚙踝蕭嚙踝蕭伅嚙踝蕭G103 嚙羯 05 嚙踝蕭 27 嚙踝蕭 (嚙瞑嚙踝蕭嚙瘦) 14:00-16:00
嚙篆嚙調地嚙瘢嚙瘦嚙踝蕭q嚙稽550嚙罵議嚙踝蕭
嚙篆嚙踝蕭嚙瘩嚙諍:Efficient Exact and Approximate String Matching Algorithms
嚙論歹蕭K嚙緯嚙瘦
In this thesis, we first propose two algorithms for exact string matching
problem, which aim to find all the positions i's in a given text where a
given pattern occurs. Our algorithms find the optimal selective comparing
order of the characters of the pattern so that we could have a better
performance in the searching phase. To find the optimal comparing order, we
adopt the branch and bound approach. Moreover, our proposed algorithm can be
combined with other existing exact string matching algorithms to improve the
searching efficiency. The experimental results show that our algorithms
indeed have the smallest number of character comparisons and are also
efficient in time as compared with other existing exact string matching
algorithms.
Second, we propose a new filtration algorithm, as well as a hybrid filtration
strategy, to efficiently solve the approximate string matching problem (also
called the k-difference problem), which aims to find all the positions i's in
a given text such that there exists a substring of the text ending at
position i whose edit distance from a given pattern is less than or equal to
a given error bound k. Our experimental results on simulated datasets of DNA
sequences show that when compared with other filtration algorithms, our
filtration algorithm has better performance on the efficiency to filter out
those positions of the text at which the pattern does not occur
approximately. Moreover, our hybrid filtration strategy further improves the
effectiveness of our filtration algorithm.
Third, we propose a progressive approach to solve the DNA resequencing
problem which is defined as follows: We are given an unknown DNA sequence X
and a known reference sequence R. Our task is to see whether X and R are
similar or not. The present popular approach is to break up X into
subsequences by the next generation sequencing (NGS) technologies, called
reads. We then map the reads of X onto R with a suitable error bound.
However, if the similarity between X and R is not very high (<95%), there
would be many reads unmapped, and we then cannot obtain the mutations inside
the unmapped regions. One can use a large error bound to increase the number
of reads mapped. But it is not a good solution because increasing error
bound will also increase the probability of false positive mapping. Our
approach would use a small error bound and to increase the number of reads
mapped, our approach modify R each time after the reads are mapped. Thus our
approach is a progressive approach. Compared with other available tools, our
approach allows us to be able to map more reads to the reference sequence.
In our simulated experiments, we also show the high correctness of our
mapping algorithm.
嚙瘩嚙緯嚙諛作嚙瘦
Journal Papers
1. Chia Wei Lu, Chin Lung Lu and R.C.T. Lee*, A new filtration method and a hybrid strategy for approximate string matching,Theoretical Computer Science, 2013, vol.481, pp. 9-17. [SCI, 5-Year Impact
Factor: 0.697]
2. Chih-Chin Liu, Chuan Yi Tang, Han-Yueh Kuo, Chia-Wei Lu, Kai-Chih Chang and
Ming-Li Liou*, The origin of Acinetobacter baumannii TYTH-1: a comparative genomics study, International Journal of Antimicrobial Agents, 2013, vol. 41, no. 4, pp.
318-324. [SCI, 5-Year Impact Factor: 3.896]
3. Ming-Li Liou, Chih-Chin Liu, Chia-Wei Lu, Ming-Feng Hsieh, Kai-Chih Chang,
Han-Yueh Kuo, Chi-Ching Lee, Chun-Tien Chang, Cheng-Yao Yang and Chuan Yi
Tang*, Genome sequence of Acinetobacter baumannii TYTH-1, Journal of Bacteriology, 2012, vol. 194, no. 24, p. 6974. [SCI, 5-Year Impact
Factor: 3.586]
4. Li-Fang Chou, Yu-Tin Chen, Chia-Wei Lu, Yi-Ching Ko, Chuan-Yi Tang, Ming-Jeng
Pan, Ya-Chung Tian, Cheng-Hsun Chiu, Cheng-Chieh Hung and Chih-Wei Yang*, Sequence of Leptospira santarosai serovar Shermani genome and prediction of
virulence-associated genes, GENE, 2012, vol. 511, no. 2, pp. 364-370. [SCI, 5-Year Impact Factor: 2.371]
Conference Papers
1. Chia Wei Lu and R. C. T. Lee*, An Exact String Matching Algorithm Based upon Selective Matching Order and
Branch and Bound Approach, in: Proceedings of the 30th Workshop on Combinatorial Mathematics and
Computation Theory, 2013, pp. 131-137.
2. Chia Wei Lu, Chin Lung Lu and R.C.T. Lee*, A new filtration method and a hybrid strategy for approximate string matching, in: Proceedings of the International Computer Symposium (ICS 2012), Advances
in Intelligent Systems and Applications, Springer-Verlag, vol. 1, 2012, pp.
143-155.
3. Chia Wei Lu and R. C. T. Lee*, A New Filtration Method Based on the Locality Property for Approximate String
Matching, in: Proceedings of the 29th Workshop on Combinatorial Mathematics and
Computation Theory, 2012, pp. 52-59.
4. Chia Wei Lu, Chuan Yi Tang and R. C. T. Lee*, A Progressive Strategy for DNA Resequencing Problem, in: Proceedings of the 27th Workshop on Combinatorial Mathematics and
Computation Theory, 2010, pp. 45-49.
5. Chia Wei Lu, Chun-Yuan Lin*, Chuan-Yi Tang and Wing-Kai Hon, An algorithm for the multiple patterns approximate string matching problem, in: Proceedings of the 26th Workshop on Combinatorial Mathematics and
Computation Theory, 2009, pp. 36-42.
--
[;32m嚙踝蕭 Origin: 嚙踝蕭嚙踝蕭嚙賣站<
bbs.cs.nthu.edu.tw>
[;32m嚙踝蕭 From: cwlu @
r717g.cs.nthu.edu.tw [m