I cannot use mitobim to analyse my data

pass...@ku.th

Oct 25, 2017, 5:00:47 PM
to MITObim-users
Dear Christoph,

I am trying to use MITObim to assemble mitochondrial genome sequences obtained from an Illumina sequencer. I first followed your instructions to set up the MITObim Docker image on a Google Cloud instance (1 vCPU, 3.75 GB memory, 10 GB standard persistent boot and local disk). The setup went fine and I could run your test data without problems. However, when I applied the same procedure from tutorial 1, step 1, to my own data, I got the following error message and obtained no assembled sequence. What should I do to fix the problem?

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence
Assembly Using Trace Signals and Additional Sequence Information.
Computer Science and Biology: Proceedings of the German Conference on
Bioinformatics (GCB) 99, pp. 45-56.

To (un-)subscribe the MIRA mailing lists, see:

After subscribing, mail general questions to the MIRA talk mailing list:


To report bugs or ask for features, please use the SourceForge ticketing
system at:
This ensures that requests do not get lost.


Compiled by: bach
Fri Apr 18 14:57:20 CEST 2014
On: Linux vk10464 2.6.32-41-generic #94-Ubuntu SMP Fri Jul 6 18:00:34 UTC 2012 x86_64 GNU/Linux
Compiled in boundtracking mode.
Compiled in bugtracking mode.
Compiled with ENABLE64 activated.
Runtime settings (sorry, for debug):
Size of size_t  : 8
Size of uint32  : 4
Size of uint32_t: 4
Size of uint64  : 8
Size of uint64_t: 8
Current system: Linux 901418c34dee 4.10.0-37-generic #41~16.04.1-Ubuntu SMP Fri Oct 6 22:42:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Looking for files named in data ...Pushing back filename: "reference.fa"
Pushing back filename: "reads.fastq"
Manifest:
projectname: initial-mapping-testpool-to-Lsubstriata-mt
job: genome,mapping,accurate
parameters: -NW:mrnl=0 -AS:nop=1 SOLEXA_SETTINGS -CO:msr=no
Manifest load entries: 2
MLE 1:
RGID: 1
RGN: SN: Lsubstriata-mt-genome
SP: SPio: 0 SPC: 0 IF: -1 IT: -1 TSio: 0
ST: 5 (Text) namschem: 6 SID: 0
DQ: 30
BB: 1 Rail: 0 CER: 0

reference.fa MLE 2:
RGID: 2
RGN: reads SN: testpool
SP: SPio: 0 SPC: 0 IF: -1 IT: -1 TSio: 0
ST: 6 (Solexa) namschem: 4 SID: 0
DQ: 30
BB: 0 Rail: 0 CER: 0

reads.fastq 

Parameters parsed without error, perfect.

-CL:pec and -CO:emeas1clpec are set, setting -CO:emea values to 1.
------------------------------------------------------------------------------
Parameter settings seen for:
Sanger data

Used parameter settings:
  General (-GE):
Project name                                : initial-mapping-testpool-to-Lsubstriata-mt
Number of threads (not)                     : 2
Automatic memory management (amm)           : yes
   Keep percent memory free (kpmf)         : 15
   Max. process size (mps)                 : 0
EST SNP pipeline step (esps)                : 0
Colour reads by hash frequency (crhf)       : yes

  Load reads options (-LR):
Wants quality file (wqf)                    :  [sxa]  yes

Filecheck only (fo)                         : no

  Assembly options (-AS):
Number of passes (nop)                      : 1
   Skim each pass (sep)                    : yes
Maximum number of RMB break loops (rbl)     : 1
Maximum contigs per pass (mcpp)             : 0

Minimum read length (mrl)                   :  [sxa]  20
Minimum reads per contig (mrpc)             :  [sxa]  10
Enforce presence of qualities (epoq)        :  [sxa]  yes

Automatic repeat detection (ard)            : yes
   Coverage threshold (ardct)              :  [sxa]  2
   Minimum length (ardml)                  :  [sxa]  200
   Grace length (ardgl)                    :  [sxa]  20
   Use uniform read distribution (urd)     : no
     Start in pass (urdsip)                : 3
     Cutoff multiplier (urdcm)             :  [sxa]  1.5

Spoiler detection (sd)                      : yes
   Last pass only (sdlpo)                  : yes

Use genomic pathfinder (ugpf)               : yes

Use emergency search stop (uess)            : yes
   ESS partner depth (esspd)               : 500
Use emergency blacklist (uebl)              : yes
Use max. contig build time (umcbt)          : no
   Build time in seconds (bts)             : 10000

  Strain and backbone options (-SB):
Bootstrap new backbone (bnb)                : yes
Start backbone usage in pass (sbuip)        : 0
Backbone rail from strain (brfs)            : 
Backbone rail length (brl)                  : 0
Backbone rail overlap (bro)                 : 0
Trim overhanging reads (tor)                : yes

(Also build new contigs (abnc))             : no

  Dataprocessing options (-DP):
Use read extensions (ure)                   :  [sxa]  no
   Read extension window length (rewl)     :  [sxa]  30
   Read extension w. maxerrors (rewme)     :  [sxa]  2
   First extension in pass (feip)          :  [sxa]  0
   Last extension in pass (leip)           :  [sxa]  0

  Clipping options (-CL):
SSAHA2 or SMALT clipping:
   Gap size (msvsgs)                       :  [sxa]  1
   Max front gap (msvsmfg)                 :  [sxa]  2
   Max end gap (msvsmeg)                   :  [sxa]  2
   Strict front clip (msvssfc)             :  [sxa]  0
   Strict end clip (msvssec)               :  [sxa]  0
Possible vector leftover clip (pvlc)        :  [sxa]  no
   maximum len allowed (pvcmla)            :  [sxa]  18
Min qual. threshold for entire read (mqtfer):  [sxa]  0
   Number of bases (mqtfernob)             :  [sxa]  15
Quality clip (qc)                           :  [sxa]  no
   Minimum quality (qcmq)                  :  [sxa]  20
   Window length (qcwl)                    :  [sxa]  30
Bad stretch quality clip (bsqc)             :  [sxa]  no
   Minimum quality (bsqcmq)                :  [sxa]  5
   Window length (bsqcwl)                  :  [sxa]  20
Masked bases clip (mbc)                     :  [sxa]  no
   Gap size (mbcgs)                        :  [sxa]  5
   Max front gap (mbcmfg)                  :  [sxa]  12
   Max end gap (mbcmeg)                    :  [sxa]  12
Lower case clip front (lccf)                :  [sxa]  no
Lower case clip back (lccb)                 :  [sxa]  no
Clip poly A/T at ends (cpat)                :  [sxa]  no
   Keep poly-a signal (cpkps)              :  [sxa]  no
   Minimum signal length (cpmsl)           :  [sxa]  12
   Max errors allowed (cpmea)              :  [sxa]  1
   Max gap from ends (cpmgfe)              :  [sxa]  9
Clip 3 prime polybase (c3pp)                :  [sxa]  yes
   Minimum signal length (c3ppmsl)         :  [sxa]  15
   Max errors allowed (c3ppmea)            :  [sxa]  3
   Max gap from ends (c3ppmgfe)            :  [sxa]  9
Clip known adaptors right (ckar)            :  [sxa]  yes
Ensure minimum left clip (emlc)             :  [sxa]  no
   Minimum left clip req. (mlcr)           :  [sxa]  0
   Set minimum left clip to (smlc)         :  [sxa]  0
Ensure minimum right clip (emrc)            :  [sxa]  no
   Minimum right clip req. (mrcr)          :  [sxa]  10
   Set minimum right clip to (smrc)        :  [sxa]  20

Apply SKIM chimera detection clip (ascdc)   : no
Apply SKIM junk detection clip (asjdc)      : no

Propose end clips (pec)                     :  [sxa]  yes
   Bases per hash (pecbph)                 : 31
   Handle Solexa GGCxG problem (pechsgp)   : yes
   Front freq (pffreq)                     :  [sxa]  0
   Back freq (pbfreq)                      :  [sxa]  0
   Minimum kmer for forward-rev (pmkfr)    : 1
   Front forward-rev (pffore)              :  [sxa]  yes
   Back forward-rev (pbfore)               :  [sxa]  yes
   Front conf. multi-seq type (pfcmst)     :  [sxa]  yes
   Back conf. multi-seq type (pbcmst)      :  [sxa]  yes
   Front seen at low pos (pfsalp)          :  [sxa]  no
   Back seen at low pos (pbsalp)           :  [sxa]  no

Clip bad solexa ends (cbse)                 :  [sxa]  yes
Search PhiX174 (spx174)                     :  [sxa]  yes
   Filter PhiX174 (fpx174)                 :  [sxa]  no

Rare kmer mask (rkm)                        :  [sxa]  0

  Parameters for SKIM algorithm (-SK):
Number of threads (not)                     : 2

Also compute reverse complements (acrc)     : yes
Bases per hash (bph)                        : 10
   Automatic increase per pass (bphaipp)   : 1
   Automatic incr. cov. threshold (bphaict): 20
Hash save stepping (hss)                    : 1
Percent required (pr)                       :  [sxa]  60

Max hits per read (mhpr)                    : 2000
Max megahub ratio (mmhr)                    : 0

SW check on backbones (swcob)               : yes

Max hashes in memory (mhim)                 : 15000000
MemCap: hit reduction (mchr)                : 4096

  Parameters for Hash Statistics (-HS):
Freq. cov. estim. min (fcem)                : 0
Freq. estim. min normal (fenn)              : 0.4
Freq. estim. max normal (fexn)              : 1.6
Freq. estim. repeat (fer)                   : 1.9
Freq. estim. heavy repeat (fehr)            : 8
Freq. estim. crazy (fecr)                   : 20
Mask nasty repeats (mnr)                    : no
   Nasty repeat ratio (nrr)                : 100
   Nasty repeat coverage (nrc)             : 0
   Lossless digital normalisation (ldn)    : no

Repeat level in info file (rliif)           : 6

Million hashes per buffer (mhpb)            : 16
Rare kmer early kill (rkek)                 : no

  Pathfinder options (-PF):
Use quick rule (uqr)                        :  [sxa]  yes
   Quick rule min len 1 (qrml1)            :  [sxa]  -90
   Quick rule min sim 1 (qrms1)            :  [sxa]  100
   Quick rule min len 2 (qrml2)            :  [sxa]  -80
   Quick rule min sim 2 (qrms2)            :  [sxa]  100
Backbone quick overlap min len (bqoml)      :  [sxa]  20
Max. start cache fill time (mscft)          : 5

  Align parameters for Smith-Waterman align (-AL):
Bandwidth in percent (bip)             :  [sxa]  20
Bandwidth max (bmax)                   :  [sxa]  80
Bandwidth min (bmin)                   :  [sxa]  20
Minimum score (ms)                     :  [sxa]  15
Minimum overlap (mo)                   :  [sxa]  20
Minimum relative score in % (mrs)      :  [sxa]  60
Solexa_hack_max_errors (shme)          :  [sxa]  -1
Extra gap penalty (egp)                :  [sxa]  no
   extra gap penalty level (egpl)     :  [sxa] reject_codongaps
   Max. egp in percent (megpp)        :  [sxa]  100

  Contig parameters (-CO):
Name prefix (np)                                         : initial-mapping-testpool-to-Lsubstriata-mt
Reject on drop in relative alignment score in % (rodirs) :  [sxa]  30
Mark repeats (mr)                                        : yes
   Only in result (mroir)                               : no
   Assume SNP instead of repeats (asir)                 : no
   Minimum reads per group needed for tagging (mrpg)    :  [sxa]  3
   Minimum neighbour quality needed for tagging (mnq)   :  [sxa]  20
   Minimum Group Quality needed for RMB Tagging (mgqrt) :  [sxa]  30
   End-read Marking Exclusion Area in bases (emea)      :  [sxa]  1
       Set to 1 on clipping PEC (emeas1clpec)           : yes
   Also mark gap bases (amgb)                           :  [sxa]  yes
       Also mark gap bases - even multicolumn (amgbemc) :  [sxa]  yes
       Also mark gap bases - need both strands (amgbnbs):  [sxa]  yes
Force non-IUPAC consensus per sequencing type (fnicpst)  :  [sxa]  no
Merge short reads (msr)                                  :  [sxa]  no
   Max errors (msrme)                                   :  [sxa]  0
   Keep ends unmerged (msrkeu)                          :  [sxa]  -1
Gap override ratio (gor)                                 :  [sxa]  66

  Edit options (-ED):
Mira automatic contig editing (mace)        : yes
   Edit kmer singlets (eks)                : yes
   Edit homopolymer overcalls (ehpo)       :  [sxa]  no

  Misc (-MI):
Large contig size (lcs)                     : 500
Large contig size for stats (lcs4s)         : 5000

I know what I do (ikwid)                    : no

Extra flag 1 / sanity track check (ef1)     : no
Extra flag 2 / dnredreadsatpeaks (ef2)      : yes
Extra flag 3 / pelibdisassemble (ef3)       : yes
Extended log (el)                           : no

  Nag and Warn (-NW):
Check NFS (cnfs)                            : stop
Check multi pass mapping (cmpm)             : stop
Check template problems (ctp)               : stop
Check duplicate read names (cdrn)           : stop
Check max read name length (cmrnl)          : stop
   Max read name length (mrnl)             : 0
Check average coverage (cac)                : stop
   Average coverage value (acv)            : 160

  Directories (-DI):
Top directory for writing files   : initial-mapping-testpool-to-Lsubstriata-mt_assembly
For writing result files          : initial-mapping-testpool-to-Lsubstriata-mt_assembly/initial-mapping-testpool-to-Lsubstriata-mt_d_results
For writing result info files     : initial-mapping-testpool-to-Lsubstriata-mt_assembly/initial-mapping-testpool-to-Lsubstriata-mt_d_info
For writing tmp files             : initial-mapping-testpool-to-Lsubstriata-mt_assembly/initial-mapping-testpool-to-Lsubstriata-mt_d_tmp
Tmp redirected to (trt)           : 
For writing checkpoint files      : initial-mapping-testpool-to-Lsubstriata-mt_assembly/initial-mapping-testpool-to-Lsubstriata-mt_d_chkpt

  Output files (-OUTPUT/-OUT):
Save simple singlets in project (sssip)      :  [sxa]  no
Save tagged singlets in project (stsip)      :  [sxa]  yes

Remove rollover tmps (rrot)                  : yes
Remove tmp directory (rtd)                   : no

    Result files:
Saved as CAF                       (orc)     : yes
Saved as MAF                       (orm)     : yes
Saved as FASTA                     (orf)     : yes
Saved as GAP4 (directed assembly)  (org)     : no
Saved as phrap ACE                 (ora)     : no
Saved as GFF3                     (org3)     : no
Saved as HTML                      (orh)     : no
Saved as Transposed Contig Summary (ors)     : yes
Saved as simple text format        (ort)     : no
Saved as wiggle                    (orw)     : yes

    Temporary result files:
Saved as CAF                       (otc)     : yes
Saved as MAF                       (otm)     : no
Saved as FASTA                     (otf)     : no
Saved as GAP4 (directed assembly)  (otg)     : no
Saved as phrap ACE                 (ota)     : no
Saved as HTML                      (oth)     : no
Saved as Transposed Contig Summary (ots)     : no
Saved as simple text format        (ott)     : no

    Extended temporary result files:
Saved as CAF                      (oetc)     : no
Saved as FASTA                    (oetf)     : no
Saved as GAP4 (directed assembly) (oetg)     : no
Saved as phrap ACE                (oeta)     : no
Saved as HTML                     (oeth)     : no
Save also singlets               (oetas)     : no

    Alignment output customisation:
TEXT characters per line (tcpl)              : 60
HTML characters per line (hcpl)              : 60
TEXT end gap fill character (tegfc)          :  
HTML end gap fill character (hegfc)          :  

    File / directory output names:
CAF             : initial-mapping-testpool-to-Lsubstriata-mt_out.caf
MAF             : initial-mapping-testpool-to-Lsubstriata-mt_out.maf
FASTA           : initial-mapping-testpool-to-Lsubstriata-mt_out.unpadded.fasta
FASTA quality   : initial-mapping-testpool-to-Lsubstriata-mt_out.unpadded.fasta.qual
FASTA (padded)  : initial-mapping-testpool-to-Lsubstriata-mt_out.padded.fasta
FASTA qual.(pad): initial-mapping-testpool-to-Lsubstriata-mt_out.padded.fasta.qual
GAP4 (directory): initial-mapping-testpool-to-Lsubstriata-mt_out.gap4da
ACE             : initial-mapping-testpool-to-Lsubstriata-mt_out.ace
HTML            : initial-mapping-testpool-to-Lsubstriata-mt_out.html
Simple text     : initial-mapping-testpool-to-Lsubstriata-mt_out.txt
TCS overview    : initial-mapping-testpool-to-Lsubstriata-mt_out.tcs
Wiggle          : initial-mapping-testpool-to-Lsubstriata-mt_out.wig
------------------------------------------------------------------------------
Deleting old directory initial-mapping-testpool-to-Lsubstriata-mt_assembly ... done.
Creating directory initial-mapping-testpool-to-Lsubstriata-mt_assembly ... done.
Creating directory initial-mapping-testpool-to-Lsubstriata-mt_assembly/initial-mapping-testpool-to-Lsubstriata-mt_d_results ... done.
Creating directory initial-mapping-testpool-to-Lsubstriata-mt_assembly/initial-mapping-testpool-to-Lsubstriata-mt_d_info ... done.
Creating directory initial-mapping-testpool-to-Lsubstriata-mt_assembly/initial-mapping-testpool-to-Lsubstriata-mt_d_chkpt ... done.
Creating directory initial-mapping-testpool-to-Lsubstriata-mt_assembly/initial-mapping-testpool-to-Lsubstriata-mt_d_tmp ... done.

Tmp directory is not on a NFS mount, good.

Localtime: Wed Oct 25 20:39:55 2017

Loading reference backbone from reference.fa type fa
Localtime: Wed Oct 25 20:39:55 2017
Loading data from FASTA file:
 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] 
Localtime: Wed Oct 25 20:39:55 2017
rnm size: 0
No FASTA quality file given, using default qualities for all reads just loaded.
Localtime: Wed Oct 25 20:39:55 2017

Done.
Loaded 1 reads with 0 reads having quality accounted for.
Loading reads from reads.fastq type fastq
Localtime: Wed Oct 25 20:39:55 2017
Loading data from FASTQ file: reads.fastq
(sorry, no progress indicator for that, possible only with zlib >=1.34)
src/central_freelist.cc:322] tcmalloc: allocation failed 8192 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 
src/central_freelist.cc:322] tcmalloc: allocation failed 73728 


========================== Memory self assessment ==============================
Running in 64 bit mode.

src/central_freelist.cc:322] tcmalloc: allocation failed 16384 
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
Failure, wrapped MIRA process aborted.
  
Thank you in advance for your help. I look forward to hearing from you.

Best regards,
Passorn Wonnapinij
         

Chris H

Nov 16, 2017, 6:57:22 PM
to MITObim-users
Hi,

Thanks for your patience. It could be that your system is running out of memory; the tcmalloc "allocation failed" messages and the final std::bad_alloc in your log point in that direction. You can try to downsample your data to, say, 20% and run it again, just to see if it runs through. There is a script for this in the misc_scripts directory of the repo. Then try to increase the fraction gradually.

If you get it running with a subset of your data, that may already give a reasonable result. The coverage of mt genomes is often extremely high, far higher than needed, so I often advise downsampling. You really only need ~70-100x coverage for good assemblies. More coverage sometimes even makes assemblies worse.

Anyway, once you have it running with low coverage, you can use the miramem program to estimate the memory MIRA would need to assemble all of your data. Just execute miramem and answer the questions you are asked. Of course, you'd have to extrapolate the read count for the whole dataset from the number of reads that are actually assembled in whatever subset you decide to use.
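
For readers who want to see what the downsampling step amounts to, here is a minimal Python sketch. It is not the script from misc_scripts, only an illustration under stated assumptions: the input is an uncompressed FASTQ file with exactly four lines per record, and the names `downsample_fastq`, `estimated_coverage`, and the `group` parameter are hypothetical helpers invented for this example. The coverage helper is just the usual reads × read length / genome length estimate, which can help judge whether a subset lands in the ~70-100x range mentioned above.

```python
import random
from itertools import islice


def downsample_fastq(lines, fraction, seed=1, group=1):
    """Yield the lines of a random subset of FASTQ records.

    lines    : iterable of FASTQ lines (assumes 4 lines per record)
    fraction : probability of keeping each unit, e.g. 0.2 for roughly 20%
    seed     : fixed seed so the same subset can be reproduced
    group    : records per unit; 2 keeps interleaved read pairs together
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    it = iter(lines)
    size = 4 * group
    while True:
        unit = list(islice(it, size))
        if not unit:
            break
        if len(unit) != size:
            raise ValueError("truncated FASTQ input")
        # Keep or drop the whole unit, so records are never split.
        if rng.random() < fraction:
            yield from unit


def estimated_coverage(n_reads, read_length, genome_length):
    """Rough average coverage: total assembled bases / genome length."""
    return n_reads * read_length / genome_length


# Example use (file names are placeholders):
#
#   with open("reads.fastq") as src, open("reads.20pct.fastq", "w") as dst:
#       dst.writelines(downsample_fastq(src, 0.2))
```

The file is streamed record by record, so the sketch itself needs very little memory regardless of input size. The kept fraction is only approximately the requested one, because each record is kept or dropped independently.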

Hope that helps and sorry for the delay!

cheers,
Christoph