STAR stuck during 1st pass mapping

neeraj bharti

Jun 1, 2015, 10:44:04 AM
to rna-...@googlegroups.com
Dear Alex,

I am mapping reads to a small genome (3.5 Mbp), and the mapping always gets stuck at the step "Created thread # 15".
The genome index was created successfully.

Command for indexing:

/STAR-STAR_2.4.1d/bin/Linux_x86_64_static/STAR  --runMode genomeGenerate --runThreadN 16 --genomeDir /test_6803/Annotation/star_index --genomeFastaFiles /test_6803/Annotation/star_index/NC_000911.fa --genomeSAindexNbases 10 --sjdbGTFfile /test_6803/Annotation/NC_000911.gtf --sjdbOverhang 99

The output files are as follows:
787 genomeParameters.txt
29 chrName.txt
8 chrLength.txt
10 chrStart.txt
37 chrNameLength.txt
85K exonGeTrInfo.tab
28K geneInfo.tab
128K transcriptInfo.tab
27K exonInfo.tab
0 sjdbList.fromGTF.out.tab
0 sjdbList.out.tab
5 sjdbInfo.txt
3.5M Genome
29M SA
5.9M SAindex
17K Log.out

..... Started STAR run
... Starting to generate Genome files
... starting to sort  Suffix Array. This may take a long time...
... sorting Suffix Array chunks and saving them to disk...
... loading chunks from disk, packing SA...
... Finished generating suffix array
... starting to generate Suffix Array index...
..... Processing annotations GTF
... writing Genome to disk ...
... writing Suffix Array to disk ...
... writing SAindex to disk
..... Finished successfully


The mapping command is as follows:
/STAR-STAR_2.4.1d/bin/Linux_x86_64/STAR --runThreadN 16 --genomeDir /test_6803/Annotation/star_index  --readFilesIn /test_6803/inputs/1.fastq --sjdbGTFfile /test_6803/Annotation/NC_000911.gtf --sjdbOverhang 99 --outSAMtype BAM SortedByCoordinate --twopassMode Basic --genomeLoad  NoSharedMemory -outSAMmapqUnique Integer0to255 --outFileNamePrefix SRR1019362/


It gets stuck after:
..... Started STAR run
..... Loading genome
..... Processing annotations GTF
..... Started 1st pass mapping

Log.out says:

checking Genome sizefile size: 3670016 bytes; state: good=1 eof=0 fail=0 bad=0
checking SA sizefile size: 29481128 bytes; state: good=1 eof=0 fail=0 bad=0
checking /SAindex sizefile size: 6116787 bytes; state: good=1 eof=0 fail=0 bad=0
Read from SAindex: genomeSAindexNbases=10  nSAi=1398100
nGenome=3670016;  nSAbyte=29481128
GstrandBit=32   SA number of indices=7146940
Shared memory is not used for genomes. Allocated a private copy of the genome.
Genome file size: 3670016 bytes; state: good=1 eof=0 fail=0 bad=0
Loading Genome ... done! state: good=1 eof=0 fail=0 bad=0; loaded 3670016 bytes
SA file size: 29481131 bytes; state: good=1 eof=0 fail=0 bad=0
Loading SA ... done! state: good=0 eof=1 fail=1 bad=0; loaded 29481128 bytes
Loading SAindex ... done: 6116787 bytes
Finished loading the genome: Sat May 30 02:16:10 2015

Number of real (reference) chromosmes= 1
1       gi|16329170|ref|NC_000911.1|    3573470 0
alignIntronMax=alignMatesGapMax=0, the max intron size will be approximately determined by (2^winBinNbits)*winAnchorDistNbins=589824
May 30 02:16:10 ..... Processing annotations GTF
Processing sjdbGTFfile=/home/seq/rnaseq/test_6803/Annotation/NC_000911.gtf, found:
                3229 transcripts
                3229 exons (non-collapsed)
                0 collapsed junctions
May 30 02:16:10 ..... Finished GTF processing
May 30 02:16:10   Loaded database junctions from the GTF file: /home/seq/rnaseq/test_6803/Annotation/NC_000911.gtf: 0 total junctions

May 30 02:16:10   Finished preparing junctions
May 30 02:16:10 ..... Finished inserting junctions into genome
Created thread # 1
Created thread # 2
Created thread # 3
Created thread # 4
Created thread # 5
Created thread # 6
Created thread # 7
Created thread # 8
Created thread # 9
Created thread # 10
Created thread # 11
Created thread # 12
Created thread # 13
Created thread # 14
Created thread # 15

It gets stuck here and goes no further.
Thanks in advance.

Alexander Dobin

Jun 2, 2015, 11:49:54 AM
to rna-...@googlegroups.com, neeraj4...@gmail.com
Hi Neeraj,

It looks like you do not have any splice sites in your annotations; in this case the GTF file is not required.
Could you please re-generate the genome without the GTF options, and then map again?

In the -outSAMmapqUnique Integer0to255 option, there should be two dashes, and the value should be an integer.

Cheers
Alex
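
As a rough sketch of how these two suggestions could be applied to Neeraj's commands: the genome is generated without --sjdbGTFfile and --sjdbOverhang, and the mapping option is written with two dashes and an integer value. All paths and remaining options are copied from the original post. The STAR_BIN variable and the two function names are hypothetical, added only for illustration. The value 255 is STAR's default for --outSAMmapqUnique as listed in the parameter dump later in this thread. Dropping the GTF options from the mapping step as well is an assumption, not something stated above.

```shell
# Hypothetical wrapper variable; point STAR_BIN at your own STAR binary.
STAR_BIN="${STAR_BIN:-/STAR-STAR_2.4.1d/bin/Linux_x86_64_static/STAR}"

# Genome generation without the GTF options (--sjdbGTFfile, --sjdbOverhang).
generate_genome_no_gtf() {
    "$STAR_BIN" --runMode genomeGenerate \
        --runThreadN 16 \
        --genomeDir /test_6803/Annotation/star_index \
        --genomeFastaFiles /test_6803/Annotation/star_index/NC_000911.fa \
        --genomeSAindexNbases 10
}

# Mapping with the corrected option: two dashes and an integer value.
# Assumption: the GTF options are dropped here too, since the annotation
# contains no junctions.
map_reads() {
    "$STAR_BIN" --runThreadN 16 \
        --genomeDir /test_6803/Annotation/star_index \
        --readFilesIn /test_6803/inputs/1.fastq \
        --outSAMtype BAM SortedByCoordinate \
        --twopassMode Basic \
        --genomeLoad NoSharedMemory \
        --outSAMmapqUnique 255 \
        --outFileNamePrefix SRR1019362/
}
```

Running generate_genome_no_gtf and then map_reads would repeat both steps with the changes applied.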

Shani Amarasinghe

Jun 9, 2015, 2:58:17 PM
to rna-...@googlegroups.com, neeraj4...@gmail.com
Hi Alex and Neeraj,

I'm interested to know whether this issue was solved and if so how?

I am experiencing an issue similar to the one Neeraj described, in my case when mapping reads to rRNA sequences. My sequences are from the well-annotated model plant Arabidopsis thaliana.

The genome generation command I used is:
 /usr/local/Programs/STAR/STAR-2.4.1c/STAR \
--runThreadN 25 \
--runMode genomeGenerate \
--genomeDir /mnt/storage/storage1/project_data_directories/shani/Rnaseq/salt_Arabidopsis/Reference_files/Star_rRNA/ \
--genomeFastaFiles /mnt/storage/storage1/project_data_directories/shani/Rnaseq/salt_Arabidopsis/Reference_files/TAIR10_rRNA.fasta \
--sjdbOverhang 100 \
--genomeSAindexNbases 5

The command used for mapping the two read files looks something like this:
/usr/local/Programs/STAR/STAR-2.4.1c/STAR  \
--outTmpDir /tmp/shani/star/  \
--outWigType bedGraph  \
--outFilterIntronMotifs RemoveNoncanonical  \
--alignIntronMax 2000 \
--alignMatesGapMax 2000 \
--runThreadN 25  \
--genomeDir /mnt/storage/storage1/project_data_directories/shani/Rnaseq/salt_Arabidopsis/Reference_files/Star_rRNA/  \
--readFilesCommand 'pigz -dcp2'  \
--readFilesIn /mnt/storage/storage1/project_data_directories/shani/Rnaseq/salt_Arabidopsis/raw_data/Root/Sample_A17/A17_CGATGT_L001_R1_001.fastq.gz  /mnt/storage/storage1/project_data_directories/shani/Rnaseq/salt_Arabidopsis/raw_data/Root/Sample_A17/A17_CGATGT_L001_R2_001.fastq.gz  \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix A17_CGATGT_L001_ \
--outSAMattrRGline ID:A17_CGATGT_L001 DS:"Root Null Control (0Mm)" LB:TruSeq_Stranded PL:Illumina SM:A25 \
--outReadsUnmapped Fastx

It is stuck at
Jun 09 11:34:35 ..... Started STAR run
Jun 09 11:34:35 ..... Loading genome
Jun 09 11:34:35 ..... Started mapping

for nearly an hour now.

I believe this is unusual for STAR, which is known for its high speed.

Can I therefore please know what I am doing wrong here? I have attached the Log.out for this particular mapping here.

Thank you very much.

Regards,
Shani.
A17_CGATGT_L001_Log.out

Alexander Dobin

Jun 11, 2015, 6:54:55 PM
to rna-...@googlegroups.com, sha19...@gmail.com, neeraj4...@gmail.com
Hi Shani,

If you map your general RNA-seq reads to rRNA-only sequences, the mapping will be very slow.
Please check this post for detailed discussion:

The best solution is to include the rRNA sequence(s) with the main genome.
Also, please switch to 2.4.1d; it contains some serious bug fixes.

Cheers
Alex
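
A minimal sketch of this suggestion, assuming the main genome and the rRNA sequences are available as two separate FASTA files. The function name, the STAR_BIN variable, the combined.fa file name and the default thread count are hypothetical. Only --runMode genomeGenerate, --runThreadN, --genomeDir and --genomeFastaFiles are taken from the commands earlier in this thread. The sketch also assumes that sequence names do not clash between the two files and that each file ends with a newline.

```shell
# Hypothetical wrapper variable; point STAR_BIN at your own STAR binary.
STAR_BIN="${STAR_BIN:-STAR}"

# Concatenate the main genome and the rRNA sequences into one FASTA file,
# then build a single STAR index from the combined file.
# Usage: build_combined_index GENOME_FASTA RRNA_FASTA OUT_DIR [THREADS]
build_combined_index() {
    genome_fa=$1
    rrna_fa=$2
    out_dir=$3
    threads=${4:-8}   # assumed default; adjust to your machine

    mkdir -p "$out_dir" || return 1
    cat "$genome_fa" "$rrna_fa" > "$out_dir/combined.fa" || return 1

    "$STAR_BIN" --runMode genomeGenerate \
        --runThreadN "$threads" \
        --genomeDir "$out_dir" \
        --genomeFastaFiles "$out_dir/combined.fa"
}
```

Reads would then be mapped against the combined index in the usual way, and rRNA-derived reads can be identified afterwards by the reference name they align to.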

Victor

Jun 19, 2015, 11:03:02 AM
to rna-...@googlegroups.com, sha19...@gmail.com, neeraj4...@gmail.com
Hi Alex,

I am experiencing the exact same issue, even after making sure STAR had been upgraded to the latest version. The aligner has now been running for over 12 hours with no results. I've been attempting to resolve this for a couple of weeks without success. The log data appears below.

Thank you for all your assistance in helping to resolve this issue!
V

STAR version=STAR_2.4.1d_modified
STAR compilation time,server,dir=Wed Jun 17 17:05:00 EDT 2015 modena.cshl.edu:/sonas-hs/gingeras/nlsas_norepl/user/dobin/STAR/STAR.sandbox/source
##### DEFAULT parameters:
versionSTAR                       20201
versionGenome                     20101   20200   
parametersFiles                   -   
sysShell                          -
runMode                           alignReads
runThreadN                        1
runDirPerm                        User_RWX
genomeDir                         ./GenomeDir/
genomeLoad                        NoSharedMemory
genomeFastaFiles                  -   
genomeSAindexNbases               14
genomeChrBinNbits                 18
genomeSAsparseD                   1
readFilesIn                       Read1   Read2   
readFilesCommand                  -   
readMatesLengthsIn                NotEqual
readMapNumber                     18446744073709551615
inputBAMfile                      -
bamRemoveDuplicatesType           -
bamRemoveDuplicatesMate2basesN    0
limitGenomeGenerateRAM            31000000000
limitIObufferSize                 150000000
limitOutSAMoneReadBytes           100000
limitOutSJcollapsed               1000000
limitOutSJoneRead                 1000
limitBAMsortRAM                   0
limitSjdbInsertNsj                1000000
outFileNamePrefix                 ./
outTmpDir                         -
outStd                            Log
outReadsUnmapped                  None
outQSconversionAdd                0
outSAMtype                        SAM   
outSAMmode                        Full
outSAMstrandField                 None
outSAMattributes                  Standard   
outSAMunmapped                    None
outSAMorder                       Paired
outSAMprimaryFlag                 OneBestScore
outSAMreadID                      Standard
outSAMmapqUnique                  255
outSAMflagOR                      0
outSAMflagAND                     65535
outSAMattrRGline                  -   
outSAMheaderHD                    -   
outSAMheaderPG                    -   
outSAMheaderCommentFile           -
outBAMcompression                 1
outBAMsortingThreadN              0
outSJfilterReads                  All
outSJfilterCountUniqueMin         3   1   1   1   
outSJfilterCountTotalMin          3   1   1   1   
outSJfilterOverhangMin            30   12   12   12   
outSJfilterDistToOtherSJmin       10   0   5   10   
outSJfilterIntronMaxVsReadN       50000   100000   200000   
outWigType                        None   
outWigStrand                      Stranded   
outWigReferencesPrefix            -
outWigNorm                        RPM   
outFilterType                     Normal
outFilterMultimapNmax             10
outFilterMultimapScoreRange       1
outFilterScoreMin                 0
outFilterScoreMinOverLread        0.66
outFilterMatchNmin                0
outFilterMatchNminOverLread       0.66
outFilterMismatchNmax             10
outFilterMismatchNoverLmax        0.3
outFilterMismatchNoverReadLmax    1
outFilterIntronMotifs             None
clip5pNbases                      0   
clip3pNbases                      0   
clip3pAfterAdapterNbases          0   
clip3pAdapterSeq                  -   
clip3pAdapterMMp                  0.1   
winBinNbits                       16
winAnchorDistNbins                9
winFlankNbins                     4
winAnchorMultimapNmax             50
scoreGap                          0
scoreGapNoncan                    -8
scoreGapGCAG                      -4
scoreGapATAC                      -8
scoreStitchSJshift                1
scoreGenomicLengthLog2scale       -0.25
scoreDelBase                      -2
scoreDelOpen                      -2
scoreInsOpen                      -2
scoreInsBase                      -2
seedSearchLmax                    0
seedSearchStartLmax               50
seedSearchStartLmaxOverLread      1
seedPerReadNmax                   1000
seedPerWindowNmax                 50
seedNoneLociPerWindow             10
seedMultimapNmax                  10000
alignIntronMin                    21
alignIntronMax                    0
alignMatesGapMax                  0
alignTranscriptsPerReadNmax       10000
alignSJoverhangMin                5
alignSJDBoverhangMin              3
alignSplicedMateMapLmin           0
alignSplicedMateMapLminOverLmate    0.66
alignWindowsPerReadNmax           10000
alignTranscriptsPerWindowNmax     100
alignEndsType                     Local
alignSoftClipAtReferenceEnds      Yes
chimSegmentMin                    0
chimScoreMin                      0
chimScoreDropMax                  20
chimScoreSeparation               10
chimScoreJunctionNonGTAG          -1
chimJunctionOverhangMin           20
chimOutType                       SeparateSAMold
sjdbFileChrStartEnd               -   
sjdbGTFfile                       -
sjdbGTFchrPrefix                  -
sjdbGTFfeatureExon                exon
sjdbGTFtagExonParentTranscript    transcript_id
sjdbGTFtagExonParentGene          gene_id
sjdbOverhang                      100
sjdbScore                         2
sjdbInsertSave                    Basic
quantMode                         -   
quantTranscriptomeBAMcompression    1
quantTranscriptomeBan             IndelSoftclipSingleend
twopass1readsN                    18446744073709551615
twopassMode                       None
##### Command Line:
/home/sequencing/star/STAR/bin/Linux_x86_64_static/STAR --runThreadN 12 --genomeDir /mnt/data/refData/1045 --readFilesIn 2000_1.fq.gz 2000_2.fq.gz --readFilesCommand zcat
##### Initial USER parameters from Command Line:
###### All USER parameters from Command Line:
runThreadN                    12     ~RE-DEFINED
genomeDir                     /mnt/data/refData/1045     ~RE-DEFINED
readFilesIn                   2000_1.fq.gz   2000_2.fq.gz        ~RE-DEFINED
readFilesCommand              zcat        ~RE-DEFINED
##### Finished reading parameters from all sources

##### Final user re-defined parameters-----------------:
runThreadN                        12
genomeDir                         /mnt/data/refData/1045
readFilesIn                       2000_1.fq.gz   2000_2.fq.gz   
readFilesCommand                  zcat   

-------------------------------
##### Final effective command line:
/home/peter/star/STAR/bin/Linux_x86_64_static/STAR   --runThreadN 12   --genomeDir /mnt/data/refData/1045   --readFilesIn 2000_1.fq.gz   2000_2.fq.gz      --readFilesCommand zcat   

##### Final parameters after user input--------------------------------:
versionSTAR                       20201
versionGenome                     20101   20200   
parametersFiles                   -   
sysShell                          -
runMode                           alignReads
runThreadN                        12
runDirPerm                        User_RWX
genomeDir                         /mnt/data/refData/1045
genomeLoad                        NoSharedMemory
genomeFastaFiles                  -   
genomeSAindexNbases               14
genomeChrBinNbits                 18
genomeSAsparseD                   1
readFilesIn                       2000_1.fq.gz   2000_2.fq.gz   
readFilesCommand                  zcat   
readMatesLengthsIn                NotEqual
readMapNumber                     18446744073709551615
inputBAMfile                      -
bamRemoveDuplicatesType           -
bamRemoveDuplicatesMate2basesN    0
limitGenomeGenerateRAM            31000000000
limitIObufferSize                 150000000
limitOutSAMoneReadBytes           100000
limitOutSJcollapsed               1000000
limitOutSJoneRead                 1000
limitBAMsortRAM                   0
limitSjdbInsertNsj                1000000
outFileNamePrefix                 ./
outTmpDir                         -
outStd                            Log
outReadsUnmapped                  None
outQSconversionAdd                0
outSAMtype                        SAM   
outSAMmode                        Full
outSAMstrandField                 None
outSAMattributes                  Standard   
outSAMunmapped                    None
outSAMorder                       Paired
outSAMprimaryFlag                 OneBestScore
outSAMreadID                      Standard
outSAMmapqUnique                  255
outSAMflagOR                      0
outSAMflagAND                     65535
outSAMattrRGline                  -   
outSAMheaderHD                    -   
outSAMheaderPG                    -   
outSAMheaderCommentFile           -
outBAMcompression                 1
outBAMsortingThreadN              0
outSJfilterReads                  All
outSJfilterCountUniqueMin         3   1   1   1   
outSJfilterCountTotalMin          3   1   1   1   
outSJfilterOverhangMin            30   12   12   12   
outSJfilterDistToOtherSJmin       10   0   5   10   
outSJfilterIntronMaxVsReadN       50000   100000   200000   
outWigType                        None   
outWigStrand                      Stranded   
outWigReferencesPrefix            -
outWigNorm                        RPM   
outFilterType                     Normal
outFilterMultimapNmax             10
outFilterMultimapScoreRange       1
outFilterScoreMin                 0
outFilterScoreMinOverLread        0.66
outFilterMatchNmin                0
outFilterMatchNminOverLread       0.66
outFilterMismatchNmax             10
outFilterMismatchNoverLmax        0.3
outFilterMismatchNoverReadLmax    1
outFilterIntronMotifs             None
clip5pNbases                      0   
clip3pNbases                      0   
clip3pAfterAdapterNbases          0   
clip3pAdapterSeq                  -   
clip3pAdapterMMp                  0.1   
winBinNbits                       16
winAnchorDistNbins                9
winFlankNbins                     4
winAnchorMultimapNmax             50
scoreGap                          0
scoreGapNoncan                    -8
scoreGapGCAG                      -4
scoreGapATAC                      -8
scoreStitchSJshift                1
scoreGenomicLengthLog2scale       -0.25
scoreDelBase                      -2
scoreDelOpen                      -2
scoreInsOpen                      -2
scoreInsBase                      -2
seedSearchLmax                    0
seedSearchStartLmax               50
seedSearchStartLmaxOverLread      1
seedPerReadNmax                   1000
seedPerWindowNmax                 50
seedNoneLociPerWindow             10
seedMultimapNmax                  10000
alignIntronMin                    21
alignIntronMax                    0
alignMatesGapMax                  0
alignTranscriptsPerReadNmax       10000
alignSJoverhangMin                5
alignSJDBoverhangMin              3
alignSplicedMateMapLmin           0
alignSplicedMateMapLminOverLmate    0.66
alignWindowsPerReadNmax           10000
alignTranscriptsPerWindowNmax     100
alignEndsType                     Local
alignSoftClipAtReferenceEnds      Yes
chimSegmentMin                    0
chimScoreMin                      0
chimScoreDropMax                  20
chimScoreSeparation               10
chimScoreJunctionNonGTAG          -1
chimJunctionOverhangMin           20
chimOutType                       SeparateSAMold
sjdbFileChrStartEnd               -   
sjdbGTFfile                       -
sjdbGTFchrPrefix                  -
sjdbGTFfeatureExon                exon
sjdbGTFtagExonParentTranscript    transcript_id
sjdbGTFtagExonParentGene          gene_id
sjdbOverhang                      100
sjdbScore                         2
sjdbInsertSave                    Basic
quantMode                         -   
quantTranscriptomeBAMcompression    1
quantTranscriptomeBan             IndelSoftclipSingleend
twopass1readsN                    18446744073709551615
twopassMode                       None
----------------------------------------


   Input read files for mate 1, from input string 2000_1.fq.gz
-rw-r--r-- 1 root root 1053329552 May 28 09:58 2000_1.fq.gz

   readsCommandsFile:
exec > "./_STARtmp/tmp.fifo.read1"
echo FILE 0
zcat      "2000_1.fq.gz"


   Input read files for mate 2, from input string 2000_2.fq.gz
-rw-r--r-- 1 root root 1051794125 May 28 09:59 2000_2.fq.gz

   readsCommandsFile:
exec > "./_STARtmp/tmp.fifo.read2"
echo FILE 0
zcat      "2000_2.fq.gz"

Finished loading and checking parameters
Reading genome generation parameters:
versionGenome                 20201        ~RE-DEFINED
genomeFastaFiles              1045.fa        ~RE-DEFINED
genomeSAindexNbases           14     ~RE-DEFINED
genomeChrBinNbits             18     ~RE-DEFINED
genomeSAsparseD               1     ~RE-DEFINED
sjdbOverhang                  100     ~RE-DEFINED
sjdbFileChrStartEnd           -        ~RE-DEFINED
sjdbGTFfile                   -     ~RE-DEFINED
sjdbGTFchrPrefix              -     ~RE-DEFINED
sjdbGTFfeatureExon            exon     ~RE-DEFINED
sjdbGTFtagExonParentTranscripttranscript_id     ~RE-DEFINED
sjdbGTFtagExonParentGene      gene_id     ~RE-DEFINED
sjdbInsertSave                Basic     ~RE-DEFINED
Genome version is compatible with current STAR version
--sjdbOverhang = 100 taken from the generated genome
Started loading the genome: Thu Jun 18 08:11:21 2015

checking Genome sizefile size: 3091464192 bytes; state: good=1 eof=0 fail=0 bad=0
checking SA sizefile size: 24235659369 bytes; state: good=1 eof=0 fail=0 bad=0
checking /SAindex sizefile size: 1565873619 bytes; state: good=1 eof=0 fail=0 bad=0
Read from SAindex: genomeSAindexNbases=14  nSAi=357913940
nGenome=3091464192;  nSAbyte=24235659369
GstrandBit=32   SA number of indices=5875311362
Shared memory is not used for genomes. Allocated a private copy of the genome.
Genome file size: 3091464192 bytes; state: good=1 eof=0 fail=0 bad=0
Loading Genome ... done! state: good=1 eof=0 fail=0 bad=0; loaded 3091464192 bytes
SA file size: 24235659372 bytes; state: good=1 eof=0 fail=0 bad=0
Loading SA ... done! state: good=0 eof=1 fail=1 bad=0; loaded 24235659369 bytes
Loading SAindex ... done: 1565873619 bytes
Finished loading the genome: Thu Jun 18 08:33:36 2015

Number of real (reference) chromosmes= 25
1	chr1	248956422	0
2	chr2	242193529	249036800
3	chr3	198295559	491257856
4	chr4	190214555	689700864
5	chr5	181538259	880017408
6	chr6	170805979	1061683200
7	chr7	159345973	1232601088
8	chr8	145138636	1391984640
9	chr9	138394717	1537212416
10	chr10	133797422	1675624448
11	chr11	135086622	1809580032
12	chr12	133275309	1944846336
13	chr13	114364328	2078277632
14	chr14	107043718	2192834560
15	chr15	101991189	2300051456
16	chr16	90338345	2402287616
17	chr17	83257441	2492727296
18	chr18	80373285	2576089088
19	chr19	58617616	2656567296
20	chr20	64444167	2715287552
21	chr21	46709983	2779774976
22	chr22	50818468	2826698752
23	chrM	16569	2877554688
24	chrX	156040895	2877816832
25	chrY	57227415	3034054656
alignIntronMax=alignMatesGapMax=0, the max intron size will be approximately determined by (2^winBinNbits)*winAnchorDistNbins=589824
Created thread # 1
Created thread # 2
Created thread # 3
Created thread # 4
Starting to map file # 0
mate 1:   2000_1.fq.gz
mate 2:   2000_2.fq.gz
Created thread # 5
Created thread # 6
Created thread # 7
Created thread # 8
Created thread # 9
Created thread # 10
Created thread # 11

Alexander Dobin

Jun 22, 2015, 12:27:49 PM
to rna-...@googlegroups.com, thed...@existencehealth.com, sha19...@gmail.com, neeraj4...@gmail.com
Hi Victor,

From the Log.out file, it seems that the SA file got corrupted somehow - it was not loaded properly.
Please re-generate the genome with the latest version of STAR if you have not done so.
If it still does not work, please send me the Log.out files from both the genome generation and mapping steps.

Cheers
Alex
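
One way to spot this condition without reading the whole log is to search Log.out for a "Loading ... done!" line whose stream state is not good=1, such as the "Loading SA ... done! state: good=0 eof=1 fail=1" line in Victor's log (the same pattern also appears in the log excerpt in the first post). This is only a sketch: the function name is hypothetical, and treating good=0 as a sign of a bad index follows Alex's reading above rather than any documented STAR behavior.

```shell
# Print every "Loading ... done!" line whose stream state starts with good=0.
# Returns 1 if such a line is found, 2 if the log cannot be read, 0 otherwise.
# Usage: check_star_load Log.out
check_star_load() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "Cannot read log file: $log" >&2
        return 2
    fi
    # grep prints the matching lines and succeeds only if at least one matches.
    if grep 'Loading .* done! state: good=0' "$log"; then
        echo "An index file may not have loaded cleanly; consider regenerating the genome." >&2
        return 1
    fi
    return 0
}
```

For example, `check_star_load Log.out || echo "check the genome index"` would flag a log like the one above before any time is spent waiting on the mapping step.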

Christine Yang

Sep 27, 2018, 6:08:23 PM
to rna-star
Hi Alex, 

I also have a similar error where I got a Segmentation fault during the mapping stage, and the log stopped at "Created thread #7". I'm trying to map reads to a combined human/mouse genome. 

I have my genome generation log and mapping log attached. Thanks!
-Christine-
Log.out
t60_1ALog.out

Alexander Dobin

Oct 1, 2018, 11:46:02 AM
to rna-star
Hi Christine,

The 2.6.0a release was buggy; please try the latest one:

If this does not help, please send me the Log.out file, links to fasta files, and a minimal set of reads that reproduces the error.

Cheers
Alex