STAR crashes when trying to map reads on small viral genome


Kostas

Aug 4, 2014, 9:22:47 PM
to rna-...@googlegroups.com
Hi,
I am trying to map about 30M RNA-seq reads to viral genomes (e.g. Epstein-Barr virus). The genome is a single chromosome of 171,823 bp.
I generated the genome index with the following command:
STAR --runMode genomeGenerate --genomeDir STARIndex/ --runThreadN 32 --sjdbFileChrStartEnd Annotation/NC_007605.intron.tbl --sjdbOverhang 99 --genomeFastaFiles WholeGenome/genome.fa
which finished without complaints:

Aug 04 15:06:31 ... Starting to generate Genome files
Loaded database junctions from file: Annotation/NC_007605.intron.tbl: 190 junctions

WholeGenome/genome.fa : chr # 0  "EBV" chrStart: 0
Aug 04 15:06:31 ... finished processing splice junctions database ...
Writing genome to disk... done.
Number of SA indices: 364238
SA size in bytes: 1502482
Aug 04 15:06:31 ... starting to sort  Suffix Array. This may take a long time...
Number of chunks: 35;   chunks size limit: 93992 bytes
Aug 04 15:06:31 ... sorting Suffix Array chunks and saving them to disk...
Aug 04 15:06:32 ... loading chunks from disk, packing SA...
Aug 04 15:06:32 ... writing Suffix Array to disk ...
Aug 04 15:06:32 ... Finished generating suffix array
Aug 04 15:06:32 ... starting to generate Suffix Array index...
0%  done
Aug 04 15:07:55 ... writing SAindex to disk
Aug 04 15:07:57 ..... Finished successfully
DONE: Genome generation, EXITING



Now, when I try to map my transcriptome reads to the reference genome, the program crashes with a segmentation fault a few seconds after it starts.

STAR_2.3.1z15/STAR --genomeDir /Epstein-Barr/STARIndex --outSAMmode Full --outSAMattributes All --readFilesIn /fastq/S1_R1.fq.gz /fastq/S1_R2.fq.gz --readFilesCommand zcat --runThreadN 1 --genomeLoad LoadAndRemove --outStd SAM --outSAMunmapped Within --outFileNamePrefix /STAR-bamfiles-EBV/S1 --outSAMattrRGline ID:S1 PL:illumina PU:S1 SM:S1 --chimSegmentMin 15 --chimJunctionOverhangMin 15



Log.progress.out contains only the headers:
           Time    Speed        Read     Read   Mapped   Mapped   Mapped   Mapped Unmapped Unmapped Unmapped Unmapped
                    M/hr      number   length   unique   length   MMrate    multi   multi+       MM    short    other


while Log.out ends with:
Number of real (reference) chromosmes= 1
1       EBV     171823  0
Processing splice junctions database sjdbN=52,   sjdbOverhang=99
alignIntronMax=alignMatesGapMax=0, the max intron size will be approximately determined by (2^winBinNbits)*winAnchorDistNbins=589824
Opening the file: /STAR-bamfiles-EBV/S1_STARtmp//Chimeric.out.sam.thread0 ... ok
Opening the file: /STAR-bamfiles-EBV/S1_STARtmp//Chimeric.out.junction.thread0 ... ok
Starting to map file # 0
mate 1:   /fastq/S1_R1.fq.gz
mate 2:   /fastq/S1_R2.fq.gz

The versions of STAR I tried are 2.3.1z and 2.3.1z15


Thanks in advance for any help
K

Alexander Dobin

Aug 12, 2014, 4:33:09 PM
to rna-...@googlegroups.com
Hi Kostas,

You need to use --genomeSAindexNbases 7 (or even smaller) at the genome generation step. Please see this post:
Also, if you are trying to detect EBV in human transcriptome data, I strongly recommend adding the EBV genome to the human genome rather than using just the EBV genome; see this post:

Cheers
Alex
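
Editor's note: the value 7 suggested above is consistent with the rule of thumb in the STAR manual, which says that for small genomes --genomeSAindexNbases should be scaled down to min(14, log2(GenomeLength)/2 - 1). The sketch below computes that value for the 171823 bp EBV genome and, if STAR and the FASTA file are present, reruns the genome generation command from the original post with the extra option. The helper name sa_index_nbases is made up for illustration, and rounding down is an assumption on my part; the manual wording may also differ between STAR versions.

```shell
# Hypothetical helper: floor(min(14, log2(genome_length)/2 - 1)).
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        v = int(log(len) / log(2) / 2 - 1)   # awk log() is natural log, so divide by log(2)
        if (v > 14) v = 14                   # 14 is the STAR default and the upper bound here
        print v
    }'
}

genome_len=171823                            # EBV length reported in the original post
nbases=$(sa_index_nbases "$genome_len")
echo "genomeSAindexNbases=$nbases"           # prints genomeSAindexNbases=7

# Rebuild the index only if STAR and the input FASTA are actually available.
# All other options are copied from the original genomeGenerate command.
if command -v STAR >/dev/null 2>&1 && [ -f WholeGenome/genome.fa ]; then
    STAR --runMode genomeGenerate --genomeDir STARIndex/ --runThreadN 32 \
         --sjdbFileChrStartEnd Annotation/NC_007605.intron.tbl --sjdbOverhang 99 \
         --genomeFastaFiles WholeGenome/genome.fa \
         --genomeSAindexNbases "$nbases"
fi
```

For a human-sized genome the same formula gives a value above 14, so the cap applies and the default is kept.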

Kostas

Aug 13, 2014, 6:21:49 PM
to rna-...@googlegroups.com
Hi Alex,
Both suggestions worked.
In the end I created a combined genome with both the human and viral sequences.
Thanks
Kostas
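
Editor's note: the thread does not show how the combined genome was built, so the following is only a sketch of one way to do it. It concatenates a human FASTA and the EBV FASTA, checks that no two sequences share a name, and then generates a STAR index from the result. The helper combine_fasta, the path human/genome.fa, the output name combined.fa and the directory STARIndexCombined/ are all hypothetical. The splice junction options from the original command are left out for brevity. STAR also accepts several files after --genomeFastaFiles, so the concatenation step is a convenience rather than a requirement.

```shell
# Hypothetical helper: combine_fasta OUT.fa IN1.fa IN2.fa ...
combine_fasta() {
    out=$1
    shift
    # "awk 1" copies every line and terminates it with a newline, so a file
    # without a trailing newline cannot glue the next header onto a sequence line.
    awk 1 "$@" > "$out"
    # Sequence names (first word of each header) must be unique in the index.
    dups=$(awk '/^>/ {print $1}' "$out" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate sequence names: $dups" >&2
        return 1
    fi
}

# Run only if STAR and both input files exist; all paths are placeholders.
if command -v STAR >/dev/null 2>&1 && [ -f human/genome.fa ] && [ -f WholeGenome/genome.fa ]; then
    mkdir -p STARIndexCombined
    combine_fasta combined.fa human/genome.fa WholeGenome/genome.fa &&
    STAR --runMode genomeGenerate --genomeDir STARIndexCombined/ --runThreadN 32 \
         --genomeFastaFiles combined.fa
fi
```

With the human chromosomes included, the genome is large enough that --genomeSAindexNbases should not need to be reduced.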