DNA alignment tool (better than MS Word)

Avery louie

Aug 8, 2013, 6:14:03 PM

to diy...@googlegroups.com, diybio...@googlegroups.com

A few months ago, I asked "hey, how do I align overlapping forward and reverse reads", and the answer was to either:

1) use ClustalW and a reference sequence

2) use MS Word to align them, using a combination of your eyes and the search function

I wrote a little Python script to help you do something like #2 by showing you which amount of overlap produces the largest number of matching base pairs and the longest "run" of matching base pairs.
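For readers who want the idea in code, here is a minimal sketch of that kind of overlap scoring. It is not the script linked in this post; the function name `score_overlaps`, its `min_overlap` parameter, and the toy sequences are made up for illustration, and it assumes the reverse read has already been reverse-complemented.

```python
def score_overlaps(fwd, rev, min_overlap=1):
    """Score every overlap of the end of `fwd` with the start of `rev`.

    Returns a list of (overlap_length, matches, longest_run) tuples.
    Hypothetical helper, not the original script.
    """
    results = []
    for n in range(min_overlap, min(len(fwd), len(rev)) + 1):
        tail, head = fwd[-n:], rev[:n]  # last n bases of fwd vs first n of rev
        matches = longest = run = 0
        for a, b in zip(tail, head):
            if a == b:
                matches += 1
                run += 1
                longest = max(longest, run)
            else:
                run = 0  # a mismatch ends the current run
        results.append((n, matches, longest))
    return results


# Toy reads (assumption: `rev` is already reverse-complemented).
fwd = "ACGTTGCAAGGCT"
rev = "AAGGCTTTCAGA"
scores = score_overlaps(fwd, rev)
best_by_matches = max(scores, key=lambda s: s[1])
best_by_run = max(scores, key=lambda s: s[2])
print(best_by_matches, best_by_run)
```

With these toy reads both maxima land on the 6-base overlap `AAGGCT`. On real reads the two criteria can disagree, which is why it may be worth looking at both.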

All the details and code can be had here.

Let me know what you think of it, and feel free to hack the output to be more useful. Right now it's kind of grotesque and ugly.

Nathan McCorkle

Aug 8, 2013, 6:34:34 PM

to diybio

So how is this different from using Biopython and phred/phrap/consed?

> --
> -- You received this message because you are subscribed to the Google Groups
> DIYbio group. To post to this group, send email to diy...@googlegroups.com.
> To unsubscribe from this group, send email to
> diybio+un...@googlegroups.com. For more options, visit this group at
> https://groups.google.com/d/forum/diybio?hl=en
> Learn more at www.diybio.org
> ---
> You received this message because you are subscribed to the Google Groups
> "DIYbio" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to diybio+un...@googlegroups.com.
> To post to this group, send email to diy...@googlegroups.com.
> Visit this group at http://groups.google.com/group/diybio.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/diybio/CAL4KOmiNccTY6MBj3izgTaiDw2TJwxegY33kBbEs%3D6iVPT7TAg%40mail.gmail.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

--
-Nathan

Nathan McCorkle

Aug 8, 2013, 6:39:09 PM

to diybio

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc78

--
-Nathan

Avery louie

Aug 8, 2013, 6:51:55 PM

to diy...@googlegroups.com

I guess the main differences are that my documentation is about one page of large text, and that it's mostly for forward/reverse reads in FASTA format. I did it because it was faster and more fun for me to write than figuring out what algorithm to use.

It is also a lot simpler than the other algos, as far as I can tell. It just takes the two sequences, slides one over the other, and gives you an idea of where they overlap (a spike on the graph). It doesn't need a reference sequence either.
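To make the "spike" concrete, here is a small sketch that prints a text bar chart of matching bases per overlap length. The names `match_counts` and `text_plot` and the toy sequences are assumptions for illustration; the real script's output format may look quite different.

```python
def match_counts(fwd, rev):
    """Number of matching bases for each overlap length 1..min(len)."""
    return [
        sum(a == b for a, b in zip(fwd[-n:], rev[:n]))
        for n in range(1, min(len(fwd), len(rev)) + 1)
    ]


def text_plot(counts):
    """One line per overlap length: the length, then one '#' per match."""
    return "\n".join(f"{n:3d} {'#' * c}" for n, c in enumerate(counts, start=1))


# Toy reads (assumption: the second one is already reverse-complemented).
counts = match_counts("ACGTTGCAAGGCT", "AAGGCTTTCAGA")
print(text_plot(counts))
```

The longest bar marks the most likely overlap. No reference sequence is involved, only the two reads.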

--A

John Griessen

Aug 8, 2013, 7:08:56 PM

to diy...@googlegroups.com

On 08/08/2013 05:51 PM, Avery louie wrote:
> a lot simpler than the other algos, as far as i can tell. It just takes the two sequences and overlaps them, and gives you an
> idea of where the two overlap (spike on the graph). It doesn't need a reference sequence either.

Sounds like an app in the conceptual phases... A visual method would be to create strip charts that can be dragged and dropped
into slots, then shifted (by mouse or track pad or ??) in the slots until that spike is roughly located. The app then magnifies the image and
changes the mouse drag sensitivity so you can line them up better.

Avery louie

Aug 8, 2013, 7:11:54 PM

to diy...@googlegroups.com

That's how my idea started: I thought it would be easy to align if you could just color the letters and drag them. That's a little harder than some list comprehensions though (at least to me). If someone wants to wizard one up though, that would be cool.

--A

Lisa Thalheim

Aug 9, 2013, 7:07:40 AM

to diy...@googlegroups.com

Hey Avery,

concerning the visual thing: I hacked something like that together a while ago. It's far from perfect and likely buggy, but maybe you want to expand on it.

It's a Processing application that reads two sequences from a text file called seq.txt (example included) and displays the bases as little coloured squares in three tracks: the top track indicates matches (dark for match, light for mismatch), and the other two show the sequences.

The Up/Down keys move the sequence selection; the gray bar indicates which sequence you're currently on. Clicking on a base defines the start point of a subsequence selection, and clicking on another base defines its end point. A subsequence selection can be moved with the Left/Right keys. Pressing r reverse-complements a subsequence selection, and 0 clears the selection. There's also a Sequence input field at the top: typing in a sequence and hitting Enter adds an additional sequence track. Valid characters are agctn, case insensitive.
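As a rough illustration of two of those operations (input validation and reverse-complementing a selected subsequence), here is a short Python sketch. It is not taken from the attached Processing sketch; `is_valid` and `revcomp_selection` are hypothetical names, and the inclusive start/end indices are an assumption about how a click-defined selection might be stored.

```python
COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def is_valid(seq):
    """True if the sequence uses only a, g, c, t, n (case insensitive)."""
    return set(seq.lower()) <= set("agctn")


def revcomp_selection(seq, start, end):
    """Reverse-complement seq[start..end] (inclusive); leave the rest alone."""
    if start > end:  # allow the end point to be clicked before the start point
        start, end = end, start
    selected = seq[start:end + 1]
    flipped = selected.translate(COMPLEMENT)[::-1]
    return seq[:start] + flipped + seq[end + 1:]


print(is_valid("AGCTN"), is_valid("agcx"))
print(revcomp_selection("aacgtatt", 2, 4))
```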

I attached the Processing project directory. Untar it inside Processing's sketchbook directory. You'll also need to copy the controlP5.jar file into your sketchbook/libraries folder.

controlP5.jar

bases1.tar

SC

Aug 9, 2013, 11:48:50 AM

to diy...@googlegroups.com, diybio...@googlegroups.com

Hi Avery,

If you just have a few clones you were planning on doing by hand anyway, how about this:

http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::merger

There are other simple assemblers online as well that can handle multiple reads, forward and reverse, etc.

Stacy

Bastian Greshake

Aug 10, 2013, 5:04:17 AM

to diy...@googlegroups.com

hi there,

On Aug 8, 2013, at 23:51 , Avery louie wrote:

> It is also a lot simpler than the other algos, as far as i can tell. It just takes the two sequences and overlaps them, and gives you an idea of where the two overlap (spike on the graph). It doesn't need a reference sequence either.

I haven't had a close look at your code, so I'm not sure whether you've addressed this problem, but the overlaps of your forward/reverse reads might not match 100% because of sequencing errors. So a naive approach might not work in every case, and a more advanced alignment method might give you better overlaps.

I had a similar problem this week, aligning overlapping Illumina read pairs, and I did the alignment with usearch (http://www.drive5.com/usearch/), which allows semi-global or glocal alignments. It gives you back the aligned overlaps in a format of your choice: http://drive5.com/usearch/manual/allpairs_global.html
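For anyone curious what a semi-global alignment does differently from exact matching, here is a minimal sketch of an ends-free overlap alignment using plain dynamic programming. It is not how usearch is implemented; the function name `overlap_align` and the scoring values (match +1, mismatch -1, gap -2) are assumptions picked for illustration.

```python
def overlap_align(a, b, match=1, mismatch=-1, gap=-2):
    """Best-scoring alignment of a suffix of `a` with a prefix of `b`.

    Skipping the start of `a` and the end of `b` is free (ends-free);
    mismatches and gaps inside the overlap are penalised.
    Returns (score, a_start, b_end): a[a_start:] aligns with b[:b_end].
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # start[i][j]: index in `a` where the best path reaching (i, j) began
    start = [[i] * cols for i in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap  # a prefix of b cannot be skipped for free
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            if best == diag:
                start[i][j] = start[i - 1][j - 1]
            elif best == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
    # `a` must be used up to its end; the remainder of `b` is free
    b_end = max(range(cols), key=lambda j: score[-1][j])
    return score[-1][b_end], start[-1][b_end], b_end


# Toy reads with one simulated sequencing error inside the overlap.
result = overlap_align("ACGTTGCAAGGCT", "AAGCCTTTCAGA")
print(result)
```

In this toy case the 6-base overlap is still found even though one base differs, which is the situation where exact matching starts to struggle.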

Just remember that you also need to reverse-complement one of the reads before aligning them. I wrote a small Python script that iterates over the two FASTQ files of the paired-end Illumina reads, creates a simple FASTA file for each read pair, and feeds it to usearch. That idea might work for your data as well, I think.
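A rough sketch of that kind of preprocessing is below. It is not the script mentioned above: the names `reverse_complement`, `read_fastq`, and `paired_fasta` are made up, the parser assumes plain 4-line FASTQ records with no blank lines, and the usearch call itself is left out.

```python
import io

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(handle):
    """Yield (name, sequence) from a 4-line-per-record FASTQ file handle."""
    while True:
        header = handle.readline().strip()
        if not header:  # end of file (assumes no blank lines between records)
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, ignored in this sketch
        yield header[1:], seq  # drop the leading '@'


def paired_fasta(fwd_handle, rev_handle):
    """Yield one two-record FASTA string per read pair, reverse read flipped."""
    pairs = zip(read_fastq(fwd_handle), read_fastq(rev_handle))
    for (name1, seq1), (name2, seq2) in pairs:
        yield f">{name1}\n{seq1}\n>{name2}_rc\n{reverse_complement(seq2)}\n"


# In-memory stand-ins for the two FASTQ files (hypothetical data).
fwd_file = io.StringIO("@r1/1\nACGT\n+\nIIII\n")
rev_file = io.StringIO("@r1/2\nAAGC\n+\nIIII\n")
for fasta in paired_fasta(fwd_file, rev_file):
    print(fasta, end="")
```

Each yielded string could be written to its own file and handed to whichever aligner you use.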

Cheers,
Bastian

--
// Bastian Greshake
// Zehnthofstraße 36
// 55252 Mainz-Kastel, Germany
// cell: +49 176 213 044 66
// web: www.ruleofthirds.de

Xabier Vázquez Campos

Aug 11, 2013, 10:09:19 PM

to diy...@googlegroups.com, bgre...@googlemail.com

Have you tried any program or platform that includes CAP3? UGENE and BioEdit have it among their tools.

Personally, I prefer DNA Baser. In my experience it gives better results without having to deal with the parameters. It's not free, but you can get a trial period and sometimes an extension. The good thing is that you can see the chromatograms while it does the alignments, so it's really useful for loci that are poorly resolved.
