Segmentation fault at the alignment step

Arnaud Kerhornou

May 19, 2014, 7:31:08 AM
to rna-...@googlegroups.com
Hi all,


I'm trying to align bread wheat ESTs to the genomes of its wild progenitors, in this case Aegilops tauschii,
which is 3.2 Gbp in 209,222 scaffolds (I have filtered out all scaffolds shorter than 500 bp; otherwise we would get twice as many scaffolds).

I tried both STAR_2.3.1z4 and STAR_2.3.1z

here are the genome indexing parameters:

STAR --runMode genomeGenerate --genomeDir [...] --genomeFastaFiles [...] --runThreadN 4 --limitGenomeGenerateRAM 60000000 --genomeSAsparseD 2 --genomeSAindexNbases 15 --genomeChrBinNbits 15

and the alignment parameters:

STARlong --genomeDir [...] --runThreadN 4 --alignIntronMax 25000 --readFilesIn ./data/atauschii/my_chunks/my_chunk_18.fasta --outStd SAM --outFilterMultimapScoreRange 20 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0.66 --outFilterMismatchNmax 1000 --winAnchorMultimapNmax 200 --seedSearchLmax 30 --seedSearchStartLmax 12 --seedPerReadNmax 100000 --seedPerWindowNmax 100 --alignTranscriptsPerReadNmax 100000 --alignTranscriptsPerWindowNmax 10000 > ./data/atauschii/aegilops_tauschii.18.sam

Segmentation fault (core dumped)

(See Log.out attached)

Is there a way to debug this and find out what's happening?

Thanks,
Arnaud

PS: I tried to compile STARdebug, but the build fails:

make debug
g++ -c -O3 -DDEBUG -D'SVN_VERSION_COMPILED="STAR_2.3.1z4_r419"' -D'COMPILATION_TIME_PLACE="Mon May 19 12:28:38 BST 2014 :/nfs/panda/ensemblgenomes/external/src/STAR_2.3.1z4"' Genome.cpp
In file included from Genome.h:5,
                 from Genome.cpp:1:
Parameters.h:49: error: ‘readFilesNames’ was not declared in this scope
Parameters.h:49: error: ‘>>’ should be ‘> >’ within a nested template argument list
make: *** [debug] Error 1

Log.out.gz

Alexander Dobin

May 20, 2014, 10:35:00 PM
to rna-...@googlegroups.com
Hi Arnaud,

Please try increasing --seedPerWindowNmax to 1000. I suspect this is the parameter that caused the seg-fault: since your reads are probably relatively long and you use aggressive seed searching, there might be more than 100 seeds per window for some of the reads.
If that does not help, you can also try debugging with gdb; you would need to compile with:
$ make clean
$ make gdb

If none of the above works, please send me the link to the genome and a minimal portion of the EST sequences that causes the seg-fault.
Cheers
Alex

Arnaud Kerhornou

May 21, 2014, 7:11:42 AM
to rna-...@googlegroups.com
Hi Alex,

Thanks for looking into this. After increasing --seedPerWindowNmax to 1000, I'm still getting the seg fault.

Here is the trace from gdb:

[...]
@CO    user command line: /nfs/production/panda/ensemblgenomes/external/src/STAR_2.3.1z4/STAR --genomeDir /nfs/panda/ensemblgenomes/development/arnaud/new_est_hive_pipeline/data/atauschii/ --runThreadN 4 --alignIntronMax 25000 --readFilesIn /nfs/panda/ensemblgenomes/development/arnaud/new_est_hive_pipeline/test_18_less_500.fa --outStd SAM --outFilterMultimapScoreRange 20 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0.66 --outFilterMismatchNmax 1000 --winAnchorMultimapNmax 200 --seedSearchLmax 30 --seedSearchStartLmax 12 --seedPerReadNmax 100000 --seedPerWindowNmax 1000 --alignTranscriptsPerReadNmax 100000 --alignTranscriptsPerWindowNmax 10000
[New Thread 0x2ab13da88700 (LWP 64203)]
[New Thread 0x2ab13dc89700 (LWP 64204)]
[New Thread 0x2ab13de8a700 (LWP 64205)]
[Thread 0x2ab13de8a700 (LWP 64205) exited]
[Thread 0x2ab13dc89700 (LWP 64204) exited]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2ab13da88700 (LWP 64203)]
0x0000000000445bd3 in stitchWindowAligns (iA=390, nA=397, Score=36, WAincl=0x67a8c0, tR2=199, tG2=6203512884, trA=..., Lread=295,
    WA=0x24c380d0,
    R=0x451243b0 "\t\t\t\t\003\003\003\003\001\001\001\001\002\002\002\002\002\002\002\001\002", '\003' <repeats 16 times>, "\001\001\001\001\002\002\002\003\003\003\003\003\003\002\002\002\002\t\002\001\003\003\003\003\003\002", Q=0x4513ca70 "", G=0x2aaaaaae50d8 "\002", sigG=0x0,
    P=0x673430, wTr=0x45014898, nWinTr=0x44ff8948, RA=0x230b1010) at stitchWindowAligns.cpp:302
302            dScore=stitchAlignToTranscript(tR2, tG2, WA[iA][WA_rStart], WA[iA][WA_gStart], WA[iA][WA_Length], WA[iA][WA_iFrag],  WA[iA][WA_sjA], P, R, Q, G, &trAi);       
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libgomp-4.4.6-4.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64 zlib-1.2.3-27.el6.x86_64


I generated a subset FASTA file containing only the ESTs < 500 bp (the file used in the gdb run above). STAR seems to run fine on the subset of ESTs > 500 bp.

cheers,
Arnaud

Arnaud Kerhornou

May 21, 2014, 7:55:56 AM
to rna-...@googlegroups.com
The backtrace report might help too:

[...]

   WA=0x24c380d0,
    R=0x451243b0 "\t\t\t\t\003\003\003\003\001\001\001\001\002\002\002\002\002\002\002\001\002", '\003' <repeats 16 times>, "\001\001\001\001\002\002\002\003\003\003\003\003\003\002\002\002\002\t\002\001\003\003\003\003\003\002", Q=0x4513ca70 "", G=0x2aaaaaae50d8 "\002", sigG=0x0,
    P=0x673430, wTr=0x45014898, nWinTr=0x44ff8948, RA=0x230b1010) at stitchWindowAligns.cpp:337
#390 0x0000000000446040 in stitchWindowAligns (iA=0, nA=397, Score=0, WAincl=0x67a8c0, tR2=0, tG2=0, trA=..., Lread=295, WA=0x24c380d0,
    R=0x451243b0 "\t\t\t\t\003\003\003\003\001\001\001\001\002\002\002\002\002\002\002\001\002", '\003' <repeats 16 times>, "\001\001\001\001\002\002\002\003\003\003\003\003\003\002\002\002\002\t\002\001\003\003\003\003\003\002", Q=0x4513ca70 "", G=0x2aaaaaae50d8 "\002", sigG=0x0,
    P=0x673430, wTr=0x45014898, nWinTr=0x44ff8948, RA=0x230b1010) at stitchWindowAligns.cpp:329
#391 0x000000000042de36 in ReadAlign::stitchPieces (this=0x230b1010, R=0x67b300, Q=0x67b320, G=0x2aaaaaae50d8 "\002", SA=..., Lread=295)
    at ReadAlign_stitchPieces.cpp:405
#392 0x000000000042f3d4 in ReadAlign::mapOneRead (this=0x230b1010) at ReadAlign_mapOneRead.cpp:88
#393 0x000000000043f24c in ReadAlign::oneRead (this=0x230b1010) at ReadAlign_oneRead.cpp:67
#394 0x0000000000432660 in ReadAlignChunk::mapChunk (this=0x67a570) at ReadAlignChunk_mapChunk.cpp:25
#395 0x0000000000431f7d in ReadAlignChunk::processChunks (this=0x67a570) at ReadAlignChunk_processChunks.cpp:97
#396 0x000000000040848a in ThreadControl::threadRAprocessChunks (RAchunk=0x67a570) at ThreadControl.h:19
#397 0x0000003435007851 in start_thread () from /lib64/libpthread.so.0
#398 0x0000003434ce767d in clone () from /lib64/libc.so.6


Alexander Dobin

May 22, 2014, 12:42:42 AM
to rna-...@googlegroups.com
Hi Arnaud,

I cannot see what the problem is with the line that seg-faults. The only array accessed on it is WA[iA], whose first dimension is set by --seedPerWindowNmax, which you have now increased to 1000, so iA=390 should not cause a problem. I think I will need to replicate it on my server to look for the problem. If you can send me the link to the genome and at least one sequence that causes the problem, I will have a look at it. You can find the ID of the read that caused the problem in your example within gdb:
(gdb) fr 391
(gdb) p readName
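
Once gdb has printed the read ID, a one-record FASTA file can be cut out to serve as a minimal test case. A rough sketch, assuming the ID matches the first word of the FASTA header (strip any leading marker character from the gdb output by hand if it does not); the function name and file names are placeholders, not part of STAR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the single FASTA record whose ID
# (the header text between ">" and the first whitespace) equals $2.
extract_fasta_record() {
    local in_fa=$1 read_id=$2
    awk -v id="$read_id" '
        /^>/ {
            name = substr($1, 2)   # drop the leading ">"
            keep = (name == id)    # start or stop copying at every header
        }
        keep { print }
    ' "$in_fa"
}

# Usage (SOME_READ_ID stands for whatever gdb printed):
#   extract_fasta_record test_18_less_500.fa SOME_READ_ID > one_read.fa
```

The resulting one_read.fa can then be passed to --readFilesIn with the rest of the command unchanged.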

By the way, before you did 'make STARlong', did you run 'make clean'? I vaguely remember a user reporting a seg-fault because of that.

Cheers
Alex

Arnaud Kerhornou

Jun 2, 2014, 11:03:31 AM
to rna-...@googlegroups.com
Hi again,

Following on from this: after removing wheat ESTs with too many low-complexity regions, I no longer get the original segfault, but I now get another one, which seems to be at a different location in the code (still using the STAR z4 binary).

Here is what gdb reports:
[...]
[Thread debugging using libthread_db enabled]
[New Thread 0x2ab0a8336700 (LWP 28772)]
[New Thread 0x2ab0a8537700 (LWP 28773)]
[New Thread 0x2ab0a8738700 (LWP 28774)]
[Thread 0x2ab0a8738700 (LWP 28774) exited]
[Thread 0x2ab0a8537700 (LWP 28773) exited]


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2ab0a8336700 (LWP 28772)]
0x0000000000406739 in PackedArray::operator[] (this=0x2393b9a0, ii=4611686018516860860) at PackedArray.h:26
26       uint a1 = *((uint*) (charArray+B));

Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64 libgcc-4.4.6-4.el6.x86_64 libgomp-4.4.6-4.el6.x86_64 libstdc++-4.4.6-4.el6.x86_64 zlib-1.2.3-27.el6.x86_64

How can I find out if it relates to a particular EST sequence?
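
One generic way to narrow this down without gdb is to bisect the input: split the FASTA file in half, rerun the same command on each half, keep the half that still crashes, and repeat. A minimal sketch, assuming plain (uncompressed) FASTA input; `bisect_fasta` and the file names are hypothetical, not STAR tools:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the first half of the records in $1 to $2
# and the remaining records to $3.
bisect_fasta() {
    local in_fa=$1 first_fa=$2 second_fa=$3
    local total half
    # Count records; grep exits non-zero when there are none, hence "|| true".
    total=$(grep -c '^>' "$in_fa" || true)
    half=$(( (total + 1) / 2 ))
    # Create both outputs even if one of them ends up empty.
    : > "$first_fa"
    : > "$second_fa"
    awk -v half="$half" -v first="$first_fa" -v second="$second_fa" '
        /^>/  { n++ }                                   # n = current record number
        n > 0 { print >> ((n <= half) ? first : second) }
    ' "$in_fa"
}

# Usage:
#   bisect_fasta my_chunk_18.fasta half_a.fa half_b.fa
```

Repeating this on the failing half each time should reach a single record in about log2(N) rounds, provided the crash is caused by one read on its own.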

Thanks,
Arnaud

Alexander Dobin

Jun 2, 2014, 5:08:01 PM
to rna-...@googlegroups.com
Hi Arnaud,

sorry about the delay, I am out of town, will get back to you on Wednesday.

Cheers
Alex

Arnaud Kerhornou

Jun 3, 2014, 9:43:10 AM
to rna-...@googlegroups.com
No problem, thanks Alex.

cheers,
Arnaud

Alexander Dobin

Jun 5, 2014, 1:15:52 AM
to rna-...@googlegroups.com
Hi Arnaud,

I think I figured out what's causing the problems: the --seedSearchLmax parameter cannot work with reads that contain Ns (or any non-ACGT calls).
After omitting this parameter I could map your test_18_less_500_more_300.fa.gz without a problem.
I am going to fix this limitation in the future, but at the moment you would need to drop this parameter, at least for those reads that contain non-ACGTs.
For good-quality ESTs this should not affect accuracy; it's more important for poorer-quality sequencing such as PacBio or 454.
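
A rough sketch of how the reads could be split so that --seedSearchLmax is dropped only where needed, assuming plain (uncompressed) FASTA input; `split_fasta_by_alphabet` and the output file names are placeholders, not part of STAR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split FASTA file $1 into records whose sequence is
# pure ACGT (written to $2) and records containing any other character,
# such as N or IUPAC ambiguity codes (written to $3).
split_fasta_by_alphabet() {
    local in_fa=$1 clean_fa=$2 other_fa=$3
    # Create both outputs even if one of them ends up empty.
    : > "$clean_fa"
    : > "$other_fa"
    awk -v clean="$clean_fa" -v other="$other_fa" '
        # Append the buffered record to the matching output file.
        function flush_record() {
            if (header == "") return
            out = (toupper(seq) ~ /[^ACGT]/) ? other : clean
            printf "%s\n%s\n", header, seq >> out
        }
        /^>/ { flush_record(); header = $0; seq = ""; next }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }   # join wrapped sequence lines
        END { flush_record() }
    ' "$in_fa"
}

# Usage:
#   split_fasta_by_alphabet my_chunk_18.fasta acgt_only.fa with_non_acgt.fa
```

The pure-ACGT file can then be mapped with the original parameters, and the other file with the same command minus --seedSearchLmax. Note that each sequence is written on a single line.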

Cheers
Alex

Arnaud Kerhornou

Jun 5, 2014, 10:29:34 AM
to rna-...@googlegroups.com
Hi Alex,

Many thanks for looking into this. Following your suggestion, I can now align the wheat ESTs to the genome without any errors.

cheers,
Arnaud
