core dumped on small test

Pascal Hingamp

Jul 8, 2015, 2:56:14 AM
to mosaik-...@googlegroups.com
Hi,
I'm interested in MOSAIK because it supports IUPAC ambiguity codes; I want to map short 20 bp oligos (not reads) to a large reference database. I cloned the repository from git, ran make, and got a core dump on the first test below. Any idea what I did wrong?
Cheers,
Pascal


$ ./MosaikBuild -fr nucleotides.fasta -oa toto.dat
------------------------------------------------------------------------------
MosaikBuild 2.2.30                                                  2014-06-27
Wan-Ping Lee & Michael Stromberg  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- converting nucleotides.fasta to a reference sequence archive.

- parsing reference sequences:
ref seqs: 9 (18.0 ref seqs/s)

- writing reference sequences:
100%[=======================================================================================]      9.00 ref seqs/s        in  1 s 

- calculating MD5 checksums:
100%[=======================================================================================]      9.00 ref seqs/s        in  1 s 

- writing reference sequence index:
100%[=======================================================================================]      9.00 ref seqs/s        in  1 s 

- creating concatenated reference sequence:
100%[=======================================================================================]      9.00 ref seqs/s        in  1 s 

- writing concatenated reference sequence...        finished.
- creating concatenated 2-bit reference sequence... finished.
- writing concatenated 2-bit reference sequence...  finished.
- writing masking vector...                         finished.

MosaikBuild CPU time: 0.003 s, wall time: 2.503 s



$ ./MosaikBuild -assignQual 20 -st sanger -fr oligo.fna -out oligo.dat
------------------------------------------------------------------------------
MosaikBuild 2.2.30                                                  2014-06-27
Wan-Ping Lee & Michael Stromberg  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- setting read group ID to: ZRZLTSALBU8
- setting sample name to: unknown
- setting sequencing technology to: sanger

- parsing FASTA files:
reads: 1 (inf reads/s)

Filtering statistics:
============================================
# reads written:                 1
# bases written:                70

MosaikBuild CPU time: 0.002 s, wall time: 0.002 s


$ ./MosaikAligner -in oligo.dat -out oligo.out -ia toto.dat -annpe ../src/networkFile/2.1.26.pe.100.0065.ann -annse ../src/networkFile/2.1.26.se.100.005.ann -hs 15 -mm 12 -act 35
------------------------------------------------------------------------------
MosaikAligner 2.2.30                                                2014-06-27
Wan-Ping Lee & Michael Stromberg  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- Using the following alignment algorithm: all positions
- Using the following alignment mode: aligning reads to all possible locations
- Using a maximum mismatch threshold of 12
- Using a hash size of 15
- Using a Smith-Waterman bandwidth of 31
- Using an alignment candidate threshold of 35bp.
- Setting hash position threshold to 200
- loading reference sequence... finished.

Hashing reference sequence:
100%[======================================================================================]  40,187.0 ref bases/s        in  1 s 


Aligning read library (1):
 0% [                                                                                          ]                                  |Segmentation fault (core dumped)

Pascal Hingamp

Jul 8, 2015, 3:18:42 AM
to mosaik-...@googlegroups.com
I downloaded the precompiled version (https://code.google.com/p/mosaik-aligner/downloads/detail?name=MOSAIK-2.2.3-Linux-x64.tar), and with the same commands this one doesn't core dump. However, it fails at the alignment stage, complaining "ERROR: Only the following bases are supported in the BAM format: {=, A, C, G, T, N}. Found [W]", even though the software documentation (which, by the way, is not in sync with these versions of the program; see the cryptic -annpe and -annse options) explicitly states "MOSAIK uses the full set of IUPAC ambiguity codes during alignment". I can't help but wonder what use a fully IUPAC-compliant alignment engine is if it refuses to report its results because of a limitation in the output format. I searched the lengthy documentation but found nothing suggesting that the output format can be changed from BAM...

I'm also worried about the warning that states "2014-03-26 - A bug causing incorrect bases of reverse complement alignments has been fixed. Please check any version greater than 2.2.19 for the fix". This suggests that even if I can find a workaround for the above error, version 2.2.3 is seriously buggy, since it makes errors on reverse-complement alignments. And the latest version core dumps...

Quite a frustrating experience...
Thanks for any help you might provide,
Cheers,
Pascal

$ ./MosaikAligner -in oligo.dat -out oligo.out -ia toto.dat -annpe ../MOSAIK/src/networkFile/2.1.26.pe.100.0065.ann -annse ../MOSAIK/src/networkFile/2.1.26.se.100.005.ann -hs 15 -mm 12 -act 35
------------------------------------------------------------------------------
MosaikAligner 2.2.3                                                 2013-09-20

Wan-Ping Lee & Michael Stromberg  Marth Lab, Boston College Biology Department
------------------------------------------------------------------------------

- Using the following alignment algorithm: all positions
- Using the following alignment mode: aligning reads to all possible locations
- Using a maximum mismatch threshold of 12
- Using a hash size of 15
- Using a Smith-Waterman bandwidth of 31
- Using an alignment candidate threshold of 35bp.
- Setting hash position threshold to 200
- loading reference sequence... finished.

Hashing reference sequence:
100%[======================================================================================]  40,187.0 ref bases/s        in  1 s 


Aligning read library (1):
 0% [                                                                                          ]                                  |ERROR: Only the following bases are supported in the BAM format: {=, A, C, G, T, N}. Found [W]
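
For anyone hitting the same error, a quick way to see which input bases trigger it is to scan the FASTA file for characters outside the set quoted in the message. The following is a minimal Python sketch, not part of MOSAIK: the function name `unsupported_bases` and the record name `oligo1` are made up for illustration, and the accepted set is copied from the error text rather than from any format specification. The example sequence is the 50-base query excerpt shown in the gmap alignment further down the thread.

```python
import io

# Base set quoted in the MosaikAligner 2.2.3 error message above
# (taken from the message text only; an assumption about what that build accepts).
ACCEPTED = set("=ACGTN")


def unsupported_bases(fasta_handle, accepted=ACCEPTED):
    """Return {record_name: [(1-based position, base), ...]} for bases not in `accepted`."""
    hits = {}
    name = None
    pos = 0
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: remember its name and restart the position counter.
            name = line[1:].strip()
            pos = 0
            continue
        for base in line.upper():
            pos += 1
            if base not in accepted:
                hits.setdefault(name, []).append((pos, base))
    return hits


# Hypothetical record name; sequence is the query excerpt from the gmap output.
QUERY = "TACTAGTATTGGAGCGTTTAATAAATGTAWAAATTTATGGTATAATGAAA"
example = io.StringIO(">oligo1\n" + QUERY + "\n")
print(unsupported_bases(example))  # {'oligo1': [(30, 'W')]}
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` object, for example `unsupported_bases(open("oligo.fna"))`.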


Pascal Hingamp

Jul 8, 2015, 3:57:06 AM
to mosaik-...@googlegroups.com
Should anyone else be looking for a short-read / oligonucleotide aligner with full IUPAC ambiguity support and be having difficulty with MOSAIK: I downloaded (http://research-pub.gene.com/gmap/), compiled, and tested the gmap aligner, and it worked out of the box, even correctly reporting 100% identity for an alignment with an ambiguity code:

$ gmap --db=test -D ./ -A oligo.fna
GMAP version 2015-06-23 called with args: gmap --db=test -D ./ -A oligo.fna

    Coverage: 100.0 (query length: 70 bp)
    Trimmed coverage: 100.0 (trimmed length: 70 bp, trimmed region: 1..70)
    Percent identity: 100.0 (69 matches, 0 mismatches, 0 indels, 1 unknowns)

Alignments:
  Alignment for path 1:

    +TOTO  :351-420  (1-70)   100%

             0     .    :    .    :    .    :    .    :    .    :
aa.g         1  T  S  I  G  A  F  N  K  C  K  N  L  W  Y  N  E  T
 +TOTO    :351 TACTAGTATTGGAGCGTTTAATAAATGTAAAAATTTATGGTATAATGAAA
               ||||||||||||||||||||||||||||| ||||||||||||||||||||
             1 TACTAGTATTGGAGCGTTTAATAAATGTAWAAATTTATGGTATAATGAAA
aa.c         1  T  S  I  G  A  F  N  K  C  X  N  L  W  Y  N  E  T


Processed 1 queries in 0.01 seconds (100.00 queries/sec)
 

This gmap solution fits my needs, so please feel free to ignore my request for assistance with MOSAIK.
Cheers,
Pascal
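
The "69 matches, 0 mismatches, 0 indels, 1 unknowns" line in the gmap output above can be illustrated with a small IUPAC-aware comparison. This is a hedged Python sketch of one plausible reading of that tally, not gmap's actual algorithm: it assumes a position counts as "unknown" when the two symbols differ (or are ambiguous) but their IUPAC base sets overlap, and as a mismatch when the sets are disjoint. The function name `tally` is made up for illustration; the two sequences are the 50-base excerpt printed in the alignment.

```python
# Standard IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def tally(ref, query):
    """Return (matches, mismatches, unknowns) for two equal-length, gap-free sequences."""
    if len(ref) != len(query):
        raise ValueError("sequences must have the same length")
    matches = mismatches = unknowns = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r == q and r in "ACGT":
            matches += 1        # identical unambiguous bases
        elif set(IUPAC.get(r, "")) & set(IUPAC.get(q, "")):
            unknowns += 1       # compatible only through an ambiguity code
        else:
            mismatches += 1     # no base in common (or unrecognised symbol)
    return matches, mismatches, unknowns


# First 50 aligned bases as printed in the gmap output (reference, then query).
REF = "TACTAGTATTGGAGCGTTTAATAAATGTAAAAATTTATGGTATAATGAAA"
QRY = "TACTAGTATTGGAGCGTTTAATAAATGTAWAAATTTATGGTATAATGAAA"
print(tally(REF, QRY))  # (49, 0, 1)
```

Here the W (A or T) at position 30 of the query is compatible with the reference A, so it is counted as an unknown rather than a mismatch, which is consistent with the 100% identity reported for the full 70 bp alignment.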

