Hi, I recently downloaded some data from the paper
http://www.nature.com/nbt/journal/v31/n11/full/nbt.2705.html . It is long-read RNA-seq data with an average read length of about 1 kb. Initially I ran STAR with the following parameters against the hg19 genome provided on the STAR FTP site:
../STAR --genomeDir ../hg19/hg19 --readFilesIn ../../../Documents/PacBio/emi22260.css.fastq --outFilterMatchNminOverLread .25 --outFilterMismatchNmax 100 --outFilterScoreMinOverLread .25
This caused a Segmentation fault (core dumped) almost immediately after the mapping phase began, and the log file was empty aside from the header. I then tried the default parameters:
../STAR --genomeDir ../hg19/hg19 --readFilesIn ../../../Documents/PacBio/emi22260.css.fastq
and it ran a bit longer. This time the log file printed two lines, although nothing was written to Aligned.out.sam:
Jan 15 10:57:25 0.7 14653 1004 19.1% 772.9 0.6% 1.2% 0.0% 0.0% 78.7% 1.0%
Jan 15 10:58:26 0.8 29428 999 25.3% 777.9 0.5% 2.2% 0.0% 0.0% 71.6% 0.9%
It ended with the same error. Relatedly, the alignment rate is extremely low (in the paper they report 98% overall). I'm guessing reads were hitting the maximum number of mismatches and being discarded, but I'm not sure.
So my question is: what is the best way to configure STAR when working with very long reads?
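For reference, here is the kind of invocation I was thinking of trying next. This is a sketch only: STARlong is the long-read executable built alongside the regular STAR binary, but the specific parameter values below are my own guesses for ~1 kb reads, not settings taken from the paper.

```shell
# Sketch, not a tested recipe: STARlong is STAR's long-read binary;
# the values below are guesses for ~1 kb PacBio reads.
../STARlong --genomeDir ../hg19/hg19 \
    --readFilesIn ../../../Documents/PacBio/emi22260.css.fastq \
    --outFilterMismatchNmax 1000 \
    --seedSearchStartLmax 50 \
    --seedPerWindowNmax 1000 \
    --winAnchorMultimapNmax 200
```

The idea is to relax the per-read mismatch cap (long reads accumulate many errors) and to let STAR seed and anchor more aggressively along each read, but I don't know whether these values are sensible.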