Segmentation fault in SOAPdenovo multi-kmer

fyusufi

May 6, 2013, 8:03:06 AM
to bgi-...@googlegroups.com
Hello,

I have been trying the new SOAPdenovo multi-kmer feature, but I keep getting a segmentation fault when the Iteration step of the contig phase begins. I have tried many different files and settings, but I cannot figure out why this error is occurring. First, some cases in which the program works correctly:

1) Running multi-kmer with just one FASTA file works correctly.
2) Running single-kmer with multiple FASTA files works correctly.

The segmentation fault occurs when multiple FASTA files are used with the multi-kmer setting. I am running on a Linux server (Ubuntu 12.04) with 80 CPUs and 1024 GB of RAM. At the time of the segmentation fault, only 20-30 GB of RAM is in use. I am using the standard pregraph, and I have tried both the precompiled binary and a build compiled from source.

Command: SOAPdenovo-127mer all -s test_config_file -K 31 -m 35 -p 64 -o test_assembly

The Pregraph stage runs without any problems. The problem occurs in the contig stage:

There are 6439650 contig(s) longer than 100, sum up 1301510819 bp, with average length 202.
The longest length is 4421 bp, contig N50 is 217 bp, contig N90 is 114 bp.

Iteration start.

***************************
Iteration 1, kmer: 32
Edge number: 36837275
Construct 32mer graph.
Construct new kmer hash.
0 edge(s) deleted in length of 0.
33543456 new kmer(s).
36837275 edge(s) updated to 32mer edge.
Time spent on building hash graph: 221s.

Add arcs to graph.
In file: test_config_file, max seq len 200, max name len 256.
64 thread(s) initialized.
Import reads from file:
 lib1.fasta
Segmentation fault (core dumped)

In one instance running on the same dataset with different parameters I got a more informative error message.

Command: SOAPdenovo-127mer all -s test_config_file -K 61 -m 65 -p 64 -o test_assembly

With the following error message:

Iteration start.

***************************
Iteration 1, kmer: 62
Edge number: 4535012
Construct 62mer graph.
Construct new kmer hash.
0 edge(s) deleted in length of 0.
4350584 new kmer(s).
4535012 edge(s) updated to 62mer edge.
Time spent on building hash graph: 27s.

Add arcs to graph.
In file: test_config_file, max seq len 200, max name len 256.
Ran out of memory while applying -114285712bytes
There may be errors as follows:
1) Not enough memory.
2) The ARRAY may be overrode.
3) The wild pointers.

Even if the program asked for 114GB of RAM that should not be a problem as the server has 1024GB available. Has anyone else encountered such problems? Any help or advice on how to fix this will be greatly appreciated. I have attached my config file at the end of this message.

Thanks very much!

#test_config_file
#maximal read length
max_rd_len=200
[LIB]
#average insert size
avg_ins=180
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#use only first 125 bps of each read
rd_len_cutoff=125
#in which order the reads are used while scaffolding
rank=1
# cutoff of pair number for a reliable connection (default 3)
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location (default 32)
map_len=32
#fasta file for reads
p=lib1.fasta
[LIB]
#average insert size
avg_ins=180
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#use only first 125 bps of each read
rd_len_cutoff=125
#in which order the reads are used while scaffolding
rank=1
# cutoff of pair number for a reliable connection (default 3)
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location (default 32)
map_len=32
#fasta file for reads
p=lib2.fasta
[LIB]
#average insert size
avg_ins=3000
#if sequence needs to be reversed 
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#use only first 54 bps of each read
rd_len_cutoff=54
#in which order the reads are used while scaffolding
rank=5
# cutoff of pair number for a reliable connection (default 3)
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location (default 32)
map_len=32
#fasta file for reads 
p=lib3.fasta



Yogesh

Jun 11, 2013, 1:58:24 AM
to bgi-...@googlegroups.com
Hi,
I have a similar problem. Did you find a solution? Below are the parameters I used and the error message.

Version 2.04: released on July 13th, 2012
Compile Jun  4 2013 11:47:05

********************
Contig
********************

Parameters: contig -g /srv/mds01/users/paude004/denovoSV/SOAPoutputs/TJT_graph -M 2 -R -s TJT_config.config -p 20 

There are 347133057 kmer(s) in vertex file.
There are 1125346406 edge(s) in edge file.
Kmers sorted.
1125346406 edge(s) input.
1522960778 pre-arcs loaded.
21804875982 markers overall.
21804875982 markers loaded.
281270312 none-palindrome edge(s) swapped, 0 palindrome edge(s) processed.
Ran out of memory while applying 54016627536bytes
There may be errors as follows:
1) Not enough memory.
2) The ARRAY may be overrode.
3) The wild pointers.


Thanks,
/yogesh

bhkwan

Feb 6, 2014, 11:20:10 PM
to bgi-...@googlegroups.com
Hello,

I have the same problem. What could be causing it, and how can it be solved?

Topulaneus Hattum

Mar 17, 2014, 12:31:28 PM
to bgi-...@googlegroups.com
On Monday, May 6, 2013 8:03:06 AM UTC-4, fyusufi wrote:
...
Ran out of memory while applying -114285712bytes
...
Even if the program asked for 114GB of RAM that should not be a problem as the server has 1024GB available.

I don't think it was trying to allocate 114G.  I think it was trying to allocate negative 114M.  Notice the minus sign.  I think some counting or sizing variable has overflowed.

Were you ever able to resolve this?  I'm experiencing a segfault, though I'm too new to SOAPdenovo to figure out if mine is happening at the same stage as yours.  Googling for "soapdenovo segfault" gives 5,000 hits.  Your thread is the first. I am hopeful that one of them will guide me toward a solution.

T. Hattum

Manoj Samanta

Mar 17, 2014, 5:21:02 PM
to bgi-...@googlegroups.com
Based on my understanding, the SOAPdenovo2 multi-kmer method performs
worse than a single-kmer SOAPdenovo2 run. A better approach is to use
some kind of pre-QC module (e.g. kmergenie by Rayan Chikhi or preqc by
Jared Simpson) to find the best k-mer size and then run the assembly at
that k-mer size.
> --
> You received this message because you are subscribed to the Google Groups
> "BGI-SOAP" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to bgi-soap+u...@googlegroups.com.
> To post to this group, send email to bgi-...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/bgi-soap/d0f29e83-e8fd-496d-b9cb-74e2a94593ad%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

LUO Ruibang

Mar 17, 2014, 10:34:41 PM
to bgi-...@googlegroups.com, lizhenyu
This is a bug.

We will fix it.

Ruibang

a.w...@uni-bayreuth.de

Feb 26, 2017, 10:40:46 PM
to BGI-SOAP
Hello,
I'm using v1.03, and the core dump bug is still there. Has the fix mentioned in Ruibang's March 2014 post been applied?
Best regards!

谢寅龙(Long Xie)

Feb 28, 2017, 2:37:13 AM
to bgi-...@googlegroups.com
Hi

Did you try the latest version?

https://github.com/aquaskyline/SOAPdenovo2

You should also check that the input files are valid, e.g. read length, end of file, and quality values.

Best,
Yinlong Xie
MGI

From: bgi-...@googlegroups.com [bgi-...@googlegroups.com] on behalf of a.w...@uni-bayreuth.de [a.w...@uni-bayreuth.de]
Sent: Friday, February 17, 2017 0:05
To: BGI-SOAP
Subject: [BGI-SOAP:1257] Re: Segmentation fault in SOAPdenovo multi-kmer

Weig, Alfons

Mar 5, 2017, 8:48:23 PM
to bgi-...@googlegroups.com

Hi,

No, I did not use SOAPdenovo2, because I assumed it is not suited to RNA-seq reads (transcriptome assemblies). Is that right?

Best regards

Alfons

谢寅龙(Long Xie)

Mar 6, 2017, 1:49:34 AM
to A.W...@uni-bayreuth.de, bgi-...@googlegroups.com
To assemble RNA-seq data, you'd better try SOAPdenovo-Trans:

From: bgi-...@googlegroups.com [bgi-...@googlegroups.com] on behalf of Weig, Alfons [A.W...@uni-bayreuth.de]
Sent: Wednesday, March 01, 2017 23:46
To: bgi-...@googlegroups.com
Subject: AW: [BGI-SOAP:1260] Re: Segmentation fault in SOAPdenovo multi-kmer

Weig, Alfons

Mar 6, 2017, 2:04:27 AM
to bgi-...@googlegroups.com

Hi, I have used SOAPdenovo-Trans, but I experienced the core dump failure explained below.

Is there an upper limit for the number of reads that can be processed?

谢寅龙(Long Xie)

Mar 6, 2017, 2:16:18 AM
to bgi-...@googlegroups.com, A.W...@uni-bayreuth.de
If you hit the same problem in SOAPdenovo-Trans, you should check that the input files are valid.

Best,
Yinlong Xie
MGI

Sent: Monday, March 06, 2017 15:04
To: bgi-...@googlegroups.com
Subject: AW: [BGI-SOAP:1262] Re: Segmentation fault in SOAPdenovo multi-kmer

Weig, Alfons

Mar 7, 2017, 3:52:47 AM
to bgi-...@googlegroups.com

Hi,

I have checked the FASTQ files using FastQC, and they seem to be OK (see screenshot). In addition, the same FASTQ files were used successfully by the Trinity assembler.

 

My question was whether there is an upper limit on the number of reads that SOAPdenovo-Trans can analyze. In the project below, I have 2 × ca. 832 million reads, and my computer has 190 GB of RAM and 24 cores, so it should not be a hardware problem.

 

If you have no idea what might have gone wrong, then I will have to live with the fact that SOAPdenovo-Trans is not suitable for large projects.

 

Thank you for your comments.

Alfons

 

谢寅龙(Long Xie)

Mar 7, 2017, 4:04:05 AM
to A.W...@uni-bayreuth.de, bgi-...@googlegroups.com
SOAPdenovo can handle at least 100X human WGS data, that is, 3,000 million 100 bp reads.
Perhaps the assembly target is so complex that it generates too many graph elements.
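The read count and coverage figures are related by the standard estimate C = N × L / G (number of reads × read length / genome size). Taking the human genome as roughly 3 × 10^9 bp, an approximation that is not stated in the thread:

```latex
C = \frac{N \times L}{G}
  = \frac{(3 \times 10^{9}) \times 100\,\text{bp}}{3 \times 10^{9}\,\text{bp}}
  = 100\text{X}
```

By read count alone, the 2 × 832 million ≈ 1.66 × 10^9 reads mentioned in the previous message is a little over half of that.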


Best,
Yinlong Xie
MGI

Sent: Tuesday, March 07, 2017 16:52
To: bgi-...@googlegroups.com
Subject: AW: [BGI-SOAP:1264] Re: Segmentation fault in SOAPdenovo multi-kmer

Weig, Alfons

Mar 10, 2017, 2:04:32 AM
to bgi-...@googlegroups.com

Hi,

I'm really sorry to bother you again, but I am having great trouble finding the most recent version of SOAPdenovo-Trans, v1.04.

Your website only offers v1.03, which has bugs and stops with errors.

 

Chris Boursnell has fixed some bugs and told me that the fixed version is available on GitHub, but I cannot find it.

In addition, your website at http://soap.genomics.org.cn/SOAPdenovo-Trans.html also says that the most recent version is v1.03, but I assume this site is not up to date.

 

I would really appreciate it if you could send me a link to the .gz package of v1.04.

 

Alfons

 

From: bgi-...@googlegroups.com [mailto:bgi-...@googlegroups.com] On Behalf Of 谢寅龙(Long Xie)
Sent: Monday, March 6, 2017 07:49
To: Weig, Alfons <A.W...@uni-bayreuth.de>
Cc: bgi-...@googlegroups.com

谢寅龙(Long Xie)

Mar 10, 2017, 2:14:30 AM
to A.W...@uni-bayreuth.de, bgi-...@googlegroups.com
The GitHub repository I sent you before is the latest version, v1.04.


Best,
Yinlong Xie
MGI

Sent: Friday, March 10, 2017 15:04
To: bgi-...@googlegroups.com
Subject: AW: [BGI-SOAP:1266] Re: Segmentation fault in SOAPdenovo multi-kmer
