hi there,
On Aug 8, 2013, at 23:51, Avery Louie wrote:
> It is also a lot simpler than the other algos, as far as I can tell. It just takes the two sequences and overlaps them, and gives you an idea of where the two overlap (spike on the graph). It doesn't need a reference sequence either.
I haven't had a close look at your code, so I'm not sure whether you've addressed this problem, but the overlaps between your forward and reverse reads might not be 100% perfect because of sequencing errors. A naive exact-match approach may therefore miss overlaps that a proper alignment method would still find.
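To make the sequencing-error point concrete, here is a minimal sketch (not Avery's code, and not what usearch does) of an ungapped suffix/prefix overlap scan that tolerates a fraction of mismatches. The function name `best_overlap` and its parameters `min_overlap` and `max_mismatch_rate` are hypothetical, and it assumes the reverse read has already been reverse-complemented. With `max_mismatch_rate=0.0` it behaves like an exact-match approach.

```python
def best_overlap(left, right, min_overlap=10, max_mismatch_rate=0.1):
    """Return (length, mismatches) for the longest suffix of `left` that
    lines up with a prefix of `right` with at most max_mismatch_rate
    mismatches per base, or None if no such overlap exists.

    Hypothetical helper: no gaps are allowed, so indels still break it.
    """
    longest = min(len(left), len(right))
    # Never test length 0: left[-0:] would be the whole string.
    shortest = max(min_overlap, 1)
    # Try the longest possible overlap first and shrink from there.
    for length in range(longest, shortest - 1, -1):
        suffix, prefix = left[-length:], right[:length]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * length:
            return length, mismatches
    return None
```

For example, `best_overlap("AAAAACCCCCGGGGG", "CCCCCGGTGGTTTTT")` still finds the 10-base overlap despite one simulated sequencing error, while the same call with `max_mismatch_rate=0.0` finds nothing.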
I had a similar problem of aligning overlapping Illumina read pairs just this week, and I did the alignment using usearch (http://www.drive5.com/usearch/), which allows semi-global ("glocal") alignments. It gives you back the aligned overlaps in a format of your choice:
http://drive5.com/usearch/manual/allpairs_global.html
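For intuition about what an end-gap-free alignment of an overlap means, here is a small dynamic-programming sketch. It is a textbook overlap-alignment recurrence, not usearch's implementation; the function name and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions. Unaligned bases before the overlap in `a` and after it in `b` cost nothing, while gaps and mismatches inside the overlap are penalised.

```python
def overlap_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score the best gapped alignment of a suffix of `a` with a prefix of `b`.

    Returns (score, j), where j is how many bases of `b` take part in the
    overlap. Hypothetical helper with illustrative scoring parameters.
    """
    n, m = len(a), len(b)
    # dp[i][j]: best score aligning a suffix of a[:i] with all of b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap  # b must be used from its first base
    for i in range(1, n + 1):
        dp[i][0] = 0  # skipping leading bases of a is free
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    # a must be aligned up to its last base; trailing bases of b are free
    best_j = max(range(m + 1), key=lambda j: dp[n][j])
    return dp[n][best_j], best_j
```

Unlike the exact-offset idea, this recurrence can also absorb insertions and deletions inside the overlap, at the cost of O(len(a) * len(b)) time per pair.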
Just make sure to reverse-complement one of the reads before aligning them. I wrote a small Python script that iterated over the two FASTQ files of the paired-end Illumina reads, created a simple FASTA file for each read pair and fed those into usearch. I think that approach might work for your data as well.
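A rough sketch of that kind of preprocessing script follows. It is not the actual script described above. It assumes uncompressed FASTQ files with exactly four lines per record and the two files in the same read order. The helper names and the `pair_<n>.fasta` output naming are hypothetical, and the usearch invocation itself is left out (see the manual linked above for the options).

```python
from pathlib import Path

# Complement table for DNA bases (upper and lower case, N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(path):
    """Yield (name, sequence) from a plain FASTQ file, 4 lines per record."""
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                return
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line (unused here)
            yield header[1:].strip(), seq


def write_pair_fastas(fastq_r1, fastq_r2, out_dir):
    """Write one FASTA file per read pair, reverse-complementing read 2.

    Note: zip() stops at the shorter file, so mismatched files are not
    reported; check the record counts separately if that matters.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    pairs = zip(read_fastq(fastq_r1), read_fastq(fastq_r2))
    for index, ((name1, seq1), (name2, seq2)) in enumerate(pairs):
        path = out / f"pair_{index}.fasta"
        path.write_text(
            f">{name1}\n{seq1}\n>{name2}_rc\n{reverse_complement(seq2)}\n"
        )
        written.append(path)
    return written
```

Writing one file per pair mirrors the workflow described above. For large runs it may be more practical to write all pairs into a single file, if the aligner's input format allows it.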
Cheers,
Bastian
--
// Bastian Greshake
// Zehnthofstraße 36
// 55252 Mainz-Kastel, Germany
// cell: +49 176 213 044 66
// web: www.ruleofthirds.de