whole genome aligner with output to IGV?

Arthur Pastuer

Sep 22, 2010, 2:28:59 PM
to abyss...@googlegroups.com
Hi,

After getting a good assembly with ABySS, what whole-genome alignment tools are folks using to align contigs to reference genomes or closely related species? I have used MUMmer in the past, but it would be nice to get alignment output compatible with IGV. Any suggestions would be most appreciated!

Best, Art

Shaun Jackman

Sep 22, 2010, 3:36:45 PM
to Arthur Pastuer, abyss...@googlegroups.com
Hi Art,

I've been using GMAP, BLAT or exonerate. I'll be interested to hear
which tools others are using.

Which tools are folk using to align the reads to the assembled contigs?
I've been using bowtie, BWA or GSNAP.

Cheers,
Shaun

Nathaniel Street

Sep 23, 2010, 2:53:11 AM
to Shaun Jackman, Arthur Pastuer, abyss...@googlegroups.com
Hi

I'm using exonerate for transcript assemblies and lastz for genomic assemblies. I'm happy with exonerate for the transcripts, especially as I can use a few parsers to get properly formatted GFF3 files for use in GBrowse.

I'm not happy with my genomic solution. I wanted to use mummer but it produces an error message when I try to run all contigs against all chromosomes (the same error Art posted to the mummer mailing list).

What about browsers? So far I haven't found a browser for viewing contigs aligned to a reference that I like at all. The browsers I've tried tend to have one or two pros and many cons. Things are OK for bacterial assemblies (or in my case the chloroplast) but scaling up to larger multi-chromosome genomes becomes horrible (at least in my opinion).

For aligning reads to contigs I'm using bowtie or bwa - and I can't decide which to settle on. Again, I haven't found a good browser.

We were recently talking about what metrics we want after aligning reads to contigs and from the assembler output. For example, a list of contigs with lower or higher than average coverage, contigs with a higher than average number of graph connections, per-base coverage, etc. If anyone wants to join our discussions and our efforts to script these summary stats, we would welcome the help: just drop me an email. We are also trying to make this as easy to compare across assemblers as possible, to help us get past a purely N50-based comparison.
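As a rough illustration of the first metric (contigs with lower or higher than average coverage), here is a minimal Python sketch. It assumes you already have a mean read coverage per contig from some upstream step; the function name `flag_coverage_outliers`, the `factor` threshold and the example values are all hypothetical, not part of any existing tool.

```python
from statistics import mean


def flag_coverage_outliers(coverage, factor=2.0):
    """Return (average, low, high) for a dict of contig name -> mean coverage.

    A contig is 'low' if its coverage is below average / factor and
    'high' if it is above average * factor. The factor is an arbitrary
    illustrative threshold, not a recommended value.
    """
    avg = mean(coverage.values())
    low = sorted(c for c, x in coverage.items() if x < avg / factor)
    high = sorted(c for c, x in coverage.items() if x > avg * factor)
    return avg, low, high


if __name__ == "__main__":
    # Made-up coverage values, for illustration only.
    example = {"contig1": 30.0, "contig2": 28.0, "contig3": 5.0, "contig4": 97.0}
    avg, low, high = flag_coverage_outliers(example)
    print("mean coverage:", avg)
    print("low-coverage contigs:", low)
    print("high-coverage contigs:", high)
```

Using a simple multiple of the mean keeps the sketch assembler-agnostic; a median-based cutoff may be more robust when a few repeat contigs dominate the coverage.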

Nat

Alejandro Sanchez

Sep 23, 2010, 7:16:57 AM
to abyss...@googlegroups.com
Hi guys,

About aligning and viewing genomes, what we use here is ABACAS and ACT:

http://sourceforge.net/projects/abacas/
http://www.sanger.ac.uk/resources/software/act/

ABACAS will align your contigs (using nucmer) to a reference, which should
be a single-molecule (single-sequence) FASTA file. The output can be loaded
into ACT along with the contigs and the reference so you can visualize the
matches. Also, if you BLAST or BLAT your query contigs against the reference,
the m8-format output can be loaded into ACT as well.
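On the IGV side of the original question, one possible route is to convert m8 hits to BED, which IGV can load. Below is a minimal Python sketch, assuming the standard 12-column tab-separated m8 layout and that minus-strand hits are reported with subject start greater than subject end (as blastn does). The function name `m8_to_bed` and the identity-based score are my own choices for illustration.

```python
import sys


def m8_to_bed(lines):
    """Yield one BED6 line per m8 hit, placed on the subject (reference)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        # Assumption: reversed subject coordinates mean a minus-strand hit.
        strand = "+" if sstart <= send else "-"
        start = min(sstart, send) - 1  # BED starts are 0-based
        end = max(sstart, send)        # BED ends are exclusive
        score = int(round(identity * 10))  # map percent identity to 0-1000
        yield "\t".join([subject, str(start), str(end), query, str(score), strand])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python m8_to_bed.py hits.m8 > hits.bed
    with open(sys.argv[1]) as handle:
        for bed_line in m8_to_bed(handle):
            print(bed_line)
```

Each hit becomes its own BED feature, so a contig split across several hits will appear as several separate blocks in the browser.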

Cheers.

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

Arthur Pastuer

Sep 23, 2010, 1:14:13 PM
to Nathaniel Street, Shaun Jackman, abyss...@googlegroups.com
Hi Shaun & Nat,
   
As Nat noted, I tried to align my contigs to the reference with MUMmer. This has worked well for me previously and has never produced errors for contigs produced with earlier versions of ABySS, though it could also be the data set. I got an error that nucmer returned a non-zero value. The response from the MUMmer list was:

"Most of the time this wonderfully obscure error is due to non-unique sequence identifiers in the input. Double check that no sequence identifier (the part after the '>' and before the first space in the fasta files) is repeated in the same file."

Does anyone have a script for finding duplicate sequence identifiers in a FASTA file?

Best, Art

damian kao

Sep 23, 2010, 2:26:47 PM
to Arthur Pastuer, Nathaniel Street, Shaun Jackman, abyss...@googlegroups.com
Here's something simple I wrote in a few minutes in Python that'll find duplicate FASTA IDs. Copy and paste it into a text document and save it as yourName.py. Keep the spacing as it is: Python uses indentation to delimit code blocks.

To run it:
python yourName.py fastaFile.fa

----------------------------------
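import sys

# Count how many times each FASTA sequence ID occurs.
# The ID is the text after '>' and before the first whitespace,
# which is the part that must be unique for MUMmer.
idHash = {}
with open(sys.argv[1], 'r') as inFile:
   for line in inFile:
      if line.startswith(">"):
         fields = line[1:].split()
         faID = fields[0] if fields else ""
         idHash[faID] = idHash.get(faID, 0) + 1

# Report every ID that occurs more than once.
for k, v in idHash.items():
   if v > 1:
      print(str(v) + " entries with sequence id: " + k)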
------------------------------------------
--

________________________________
Damian Kao

Durrell Kapan

Sep 23, 2010, 4:33:56 PM
to Nathaniel Street, Shaun Jackman, Arthur Pastuer, abyss...@googlegroups.com
Nat et al.,

For either de novo contig re-alignments and/or reference (distant spp.) alignments we've tried bowtie, bwa, RMAP, SSAHA2 and others. For browsers, we've tried Tablet and commercial products.

Keep me in the loop if the "Metrics" discussion drops off the ABySS list.

Durrell
