segmentation fault when using long reads


Corey Ruhno

Jan 15, 2014, 11:18:30 AM
to rna-...@googlegroups.com
Hi, I recently downloaded some data from the paper http://www.nature.com/nbt/journal/v31/n11/full/nbt.2705.html . It is long-read RNA-seq data with an average read length of about 1 kb. I first ran STAR with the following parameters, using the hg19 genome provided on the STAR FTP site:

../STAR --genomeDir ../hg19/hg19 --readFilesIn ../../../Documents/PacBio/emi22260.css.fastq --outFilterMatchNminOverLread .25 --outFilterMismatchNmax 100 --outFilterScoreMinOverLread .25

This caused a "Segmentation fault (core dumped)" error almost immediately after the mapping phase began. The log file was empty apart from the header. I then tried the default parameters:

../STAR --genomeDir ../hg19/hg19 --readFilesIn ../../../Documents/PacBio/emi22260.css.fastq 

and it ran a bit longer. The log file printed two progress lines, although nothing was written to Aligned.out.sam:

Jan 15 10:57:25      0.7       14653     1004    19.1%    772.9     0.6%     1.2%     0.0%     0.0%    78.7%     1.0%
Jan 15 10:58:26      0.8       29428      999    25.3%    777.9     0.5%     2.2%     0.0%     0.0%    71.6%     0.9%

It ended with the same error. Relatedly, the alignment rate is extremely low (the paper reports 98% overall). I'm guessing that reads were hitting the maximum number of mismatches and being thrown out, but I'm not sure.

So my question is: what is the best way to use STAR with very long reads?
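
Before tuning any parameters, it can help to confirm how long the reads actually are. The following is a minimal sketch, not part of STAR: `fastq_length_stats` is a hypothetical helper name, and it assumes an uncompressed FASTQ with exactly four lines per record.

```shell
# Hypothetical helper (not a STAR tool): print the read count, mean length,
# and maximum length for an uncompressed 4-line-per-record FASTQ file.
fastq_length_stats() {
  awk 'NR % 4 == 2 {
         # Line 2 of every 4-line record holds the sequence.
         n++; len = length($0); sum += len
         if (len > max) max = len
       }
       END {
         if (n > 0) printf "reads=%d mean=%.1f max=%d\n", n, sum / n, max
         else print "reads=0"
       }' "$1"
}

# Path taken from the command lines above; skipped if the file is absent.
FASTQ=../../../Documents/PacBio/emi22260.css.fastq
if [ -r "$FASTQ" ]; then
  fastq_length_stats "$FASTQ"
fi
```

A mean around 1000 would agree with the average input read length shown in the second numeric column of the log lines above.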

Alexander Dobin

Jan 16, 2014, 6:28:09 PM
to rna-...@googlegroups.com
Hi Corey,

The RNA-seq reads in this paper are from PacBio. STAR will not work with "raw" PacBio reads, since they have a very high error rate (12-15%).
However, STAR can map the error-corrected "circular consensus" (CCS) reads, which I believe are the ones you are trying to map.

To make it work you need to do the following:

Re-compile STAR with 'make STARlong' in the source directory (get the latest source from here). This will generate a 'STAR' executable for long reads.
Then run the alignments with the following parameters:
--outFilterMultimapScoreRange 20   --outFilterScoreMinOverLread 0   --outFilterMatchNminOverLread 0.66   --outFilterMismatchNmax 1000   --winAnchorMultimapNmax 200   --seedSearchLmax 30   --seedSearchStartLmax 12   --seedPerReadNmax 100000   --seedPerWindowNmax 100   --alignTranscriptsPerReadNmax 100000   --alignTranscriptsPerWindowNmax 10000
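
Combined with the genome and read paths from the original post, the full invocation might look like the sketch below. `STAR_BIN` and `STAR_CMD` are hypothetical variable names added for illustration, and the executable is assumed to be the one produced by 'make STARlong'; the options themselves are exactly those listed above.

```shell
# Sketch only. STAR_BIN is a hypothetical variable; it should point at the
# executable built with 'make STARlong'. Paths are those from the first post.
STAR_BIN="${STAR_BIN:-../STAR}"

# Collect the arguments in the positional parameters so that each option
# and value is passed to STAR as a separate word.
set -- \
  --genomeDir ../hg19/hg19 \
  --readFilesIn ../../../Documents/PacBio/emi22260.css.fastq \
  --outFilterMultimapScoreRange 20 \
  --outFilterScoreMinOverLread 0 \
  --outFilterMatchNminOverLread 0.66 \
  --outFilterMismatchNmax 1000 \
  --winAnchorMultimapNmax 200 \
  --seedSearchLmax 30 \
  --seedSearchStartLmax 12 \
  --seedPerReadNmax 100000 \
  --seedPerWindowNmax 100 \
  --alignTranscriptsPerReadNmax 100000 \
  --alignTranscriptsPerWindowNmax 10000

# Human-readable copy of the command, useful for logging.
STAR_CMD="$STAR_BIN $*"

if [ -x "$STAR_BIN" ]; then
  "$STAR_BIN" "$@"
else
  # Do not fail if the binary is missing; just show what would be run.
  echo "STAR executable not found at $STAR_BIN; would run:"
  echo "$STAR_CMD"
fi
```

Keeping the options in one place makes it easier to compare a long-read run against a default run, where only the --genomeDir and --readFilesIn arguments are given.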

I doubt the mapping rate is really 98% with reasonable acceptance criteria; even for Illumina reads it never gets this high.

Cheers
Alex

Corey Ruhno

Jan 24, 2014, 11:12:27 AM
to rna-...@googlegroups.com
Thanks Alex. Worked great.

But I did have a problem compiling the latest versions of STAR from source. The newest version I could get to work was 2.3.1u; 2.3.1v and 2.3.1x gave the following error:

g++ -c -std=c++0x -O3 -Wall -Wextra -fopenmp -D'SVN_VERSION_COMPILED="STAR_2.3.1x_r380"' -D'COMPILATION_TIME_PLACE="Fri Jan 24 11:11:35 EST 2014 :/home/corey/Bioinformatics/STAR_2.3.1x"'   Parameters.cpp
Parameters.cpp: In member function ‘void Parameters::inputParameters(int, char**)’:
Parameters.cpp:614:39: error: assigning to an array from an initializer list
Parameters.cpp:409:45: warning: ignoring return value of ‘int system(const char*)’, declared with attribute warn_unused_result [-Wunused-result]
make: *** [Parameters.o] Error 1


Just wanted to let you know. Thanks again

pbczyd

Jan 24, 2014, 3:12:15 PM
to rna-...@googlegroups.com
Hi,
I guess you are also using gcc 4.7. Did you try gcc 4.4?

Regards,
Sheng

Alexander Dobin

Jan 24, 2014, 4:30:50 PM
to rna-...@googlegroups.com
Thanks for reporting this problem!
I have fixed it in this patch.
It compiles with gcc 4.7.0 without any problems.

Please let me know if you have any issues.
Cheers
Alex