Hi, I recently downloaded some data from the paper
http://www.nature.com/nbt/journal/v31/n11/full/nbt.2705.html . It is long-read RNA-seq data with an average read length of about 1 kb. Initially I ran STAR with the following parameters against the hg19 genome provided on the STAR FTP site:
../STAR --genomeDir ../hg19/hg19 --readFilesIn ../../../Documents/PacBio/emi22260.css.fastq --outFilterMatchNminOverLread .25 --outFilterMismatchNmax 100 --outFilterScoreMinOverLread .25
This caused a Segmentation fault (core dumped) almost immediately after the mapping phase began, and the log file was empty aside from the header. I then tried the default parameters:
../STAR --genomeDir ../hg19/hg19 --readFilesIn ../../../Documents/PacBio/emi22260.css.fastq
and it ran a bit longer. This time the log file printed two lines, although nothing was written to Aligned.out.sam:
Jan 15 10:57:25 0.7 14653 1004 19.1% 772.9 0.6% 1.2% 0.0% 0.0% 78.7% 1.0%
Jan 15 10:58:26 0.8 29428 999 25.3% 777.9 0.5% 2.2% 0.0% 0.0% 71.6% 0.9%
It ended with the same error. Relatedly, the alignment rate is extremely low (in the paper they report 98% overall). I'm guessing reads were hitting the maximum number of mismatches and being discarded, but I'm not sure.
So my question is: what is the best way to configure STAR when working with very long reads?
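For reference, here is the kind of invocation I was thinking of trying next. This is a sketch only: STARlong is the long-read executable built alongside the regular STAR binary, but the specific parameter values below are my own guesses for ~1 kb reads, not settings taken from the paper.

```shell
# Sketch, not a tested recipe: STARlong is STAR's long-read binary;
# the values below are guesses for ~1 kb PacBio reads.
../STARlong --genomeDir ../hg19/hg19 \
    --readFilesIn ../../../Documents/PacBio/emi22260.css.fastq \
    --outFilterMismatchNmax 1000 \
    --seedSearchStartLmax 50 \
    --seedPerWindowNmax 1000 \
    --winAnchorMultimapNmax 200
```

The idea is to relax the per-read mismatch cap (long reads accumulate many errors) and to let STAR seed and anchor more aggressively along each read, but I don't know whether these values are sensible.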