Core dump error in genomeGenerate step

AT

Oct 20, 2015, 2:02:54 PM

to rna-star

I am trying to run STAR for some RNA seq data that I have. However, when I run the first step as given below:

STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR --runThreadN 8 --runMode genomeGenerate --genomeDir genomeDir/ --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa --sjdbGTFfile ../Software/gencode.v23.annotation.gtf --sjdbOverhang 75

I get the following error:

Oct 16 14:03:16 ..... Started STAR run

Oct 16 14:03:16 ... Starting to generate Genome files

Oct 16 14:04:27 ... starting to sort Suffix Array. This may take a long time...

Oct 16 14:04:47 ... sorting Suffix Array chunks and saving them to disk...

Oct 16 14:20:09 ... loading chunks from disk, packing SA...

Oct 16 14:21:24 ... Finished generating suffix array

Oct 16 14:21:24 ... starting to generate Suffix Array index...

Oct 16 14:44:24 ..... Processing annotations GTF

terminate called after throwing an instance of 'std::out_of_range'

what(): vector::_M_range_check

[3]+ Aborted (core dumped)

How do I fix this error? Any help would be great. Thanks!

Kirill Tsyganov

Oct 20, 2015, 5:37:38 PM

to AT, rna-star

Hi Arti,

I'm pretty sure the problem is that your chromosome names don't match between the GTF file and the FASTA file. I don't think it's a good idea to mix reference files from different organisations. Your FASTA file appears to be from Ensembl, which uses bare numbers for its chromosomes (e.g. 1 for chromosome 1), whereas your GTF file is from GENCODE, which uses a chr prefix before the chromosome numbers (e.g. chr1 for chromosome 1). Your error is most likely because `1 != chr1`. Pick both the FASTA and the GTF from either GENCODE or Ensembl; don't mix them together.
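If you want to confirm the mismatch before re-running the (slow) genome generation, here is a minimal Python sketch that compares the sequence names in the two files. The function names are made up for illustration and are not part of STAR. It assumes the FASTA name is the first whitespace-delimited token after `>` and that the GTF sequence name is column 1 of every non-comment line.

```python
import gzip


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def fasta_names(path):
    """Collect sequence names from FASTA header lines (text after '>' up to whitespace)."""
    names = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    names.add(tokens[0])
    return names


def gtf_names(path):
    """Collect sequence names from column 1 of every non-comment, non-empty GTF line."""
    names = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def compare_names(fasta_path, gtf_path):
    """Return (names present in both files, GTF names missing from the FASTA), both sorted."""
    fasta, gtf = fasta_names(fasta_path), gtf_names(gtf_path)
    return sorted(fasta & gtf), sorted(gtf - fasta)
```

Calling `compare_names("Homo_sapiens.GRCh38.dna.primary_assembly.fa", "../Software/gencode.v23.annotation.gtf")` on the files from the original command should, if the diagnosis above is right, return an empty first list, since none of the `chr`-prefixed GTF names would appear in the Ensembl FASTA.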

As an aside, here is a discussion about the latest human reference, GRCh38: https://groups.google.com/forum/#!searchin/rna-star/mapping$20to$20hg38$2FGRCh38/rna-star/mo1QZ-7QPkc/989_AVNlCgAJ...

cheers,

Kirill

AT

Oct 27, 2015, 11:40:56 AM

to rna-star

Hi Kirill,

Thanks a lot for the tip, it's running now. I had a quick question regarding the two pass run for each sample. I am running the first pass using the following command:

STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR --runThreadN 4 --genomeDir ../../genomeDir/ --readFilesIn ../../../CutadaptOutput/Default/D1/D1_R1.fastq.gz ../../../CutadaptOutput/Default/D1/D1_R2.fastq.gz --readFilesCommand zcat --outSAMtype BAM Unsorted --twopassMode Basic &

For the second pass, do I run the same command with the same set of parameters?

Also, I am sharing the ../../genomeDir/ folder among the runs for various samples (including variations of the input files for the same sample, i.e. runs where I trim some reads and runs where I don't). Is that OK, or will it cause errors? Should I copy the genomeDir for each sample and each run?

Thanks,

Arti

Kirill Tsyganov

Oct 28, 2015, 1:27:05 AM

to rna-star

Hi Arti,

I've never used 2-pass mapping (not yet anyway). According to the docs, you can do 'per-sample 2-pass mapping', and for that you just need to include the additional flag `--twopassMode Basic`, just like you did. According to the docs, STAR will do both passes for you, so you shouldn't need to re-enter any commands.

From the docs:

To run STAR 2-pass mapping for each sample separately, use --twopassMode Basic option. STAR will perform the 1st pass mapping, then it will automatically extract junctions, insert them into the genome index, and, finally, re-map all reads in the 2nd mapping pass.

As for your second question: yes, you can share it, and no, don't make multiple copies! Naturally you would want to loop over your fastq files and submit the right ones for mapping, so that all of your samples are mapped in parallel. There might be a better approach using the `--genomeLoad` flag, but I never got around to trying it. The idea (I think) is that you load `yourGenomeDir` into memory once and keep it there for all of your runs, but then I think you need to run an explicit command to "Remove" `yourGenomeDir` from memory.
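To make the looping idea concrete, here is a rough Python sketch that builds one STAR command per sample, all pointing at the same genome directory. The helper names, the `D2` sample, and the `star_out` directory are hypothetical. The flags are the ones from your command plus `--outFileNamePrefix`, which I'm assuming you'd want so that each run writes its output (and, as far as I understand, the temporary 2-pass files) into its own directory instead of touching the shared `genomeDir`.

```python
from pathlib import Path


def star_command(sample, r1, r2, genome_dir, out_root, threads=4, star="STAR"):
    """Build the argument list for one per-sample 2-pass STAR run."""
    # Trailing slash so STAR treats the prefix as a directory; create it before running.
    out_prefix = f"{Path(out_root) / sample}/"
    return [
        star,
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),        # shared, read-only index
        "--readFilesIn", str(r1), str(r2),
        "--readFilesCommand", "zcat",
        "--outSAMtype", "BAM", "Unsorted",
        "--twopassMode", "Basic",
        "--outFileNamePrefix", out_prefix,     # separate output location per sample
    ]


def commands_for(samples, genome_dir, out_root):
    """Map each sample name to its STAR command; samples is {name: (r1, r2)}."""
    return {
        name: star_command(name, r1, r2, genome_dir, out_root)
        for name, (r1, r2) in samples.items()
    }


# Hypothetical sample sheet; only D1 appears in this thread.
samples = {
    "D1": ("D1_R1.fastq.gz", "D1_R2.fastq.gz"),
    "D2": ("D2_R1.fastq.gz", "D2_R2.fastq.gz"),
}
commands = commands_for(samples, "../../genomeDir/", "star_out")
```

Each command can then be run with `subprocess.run(cmd, check=True)` or handed to a job scheduler. On `--genomeLoad`: the manual lists `LoadAndKeep` and `Remove` among its values, but as far as I know the shared-memory modes can't be combined with `--twopassMode Basic`, so check the manual for your STAR version before mixing the two.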