Hello,
I'm new to STAR and wanted to check whether I'm making an obvious mistake, because my alignment rate isn't that good. I previously aligned the same samples with TopHat2 and hisat2 and got alignment rates of over 90%.
These are the commands I used to generate the index and to align:
$ STAR --runMode genomeGenerate --runThreadN 64 --genomeDir /home/STAR/genome --genomeFastaFiles /home/STAR/Mus_musculus.GRCm38.dna.toplevel.fa --sjdbGTFfile /home/STAR/Mus_musculus.GRCm38.92.gtf --sjdbOverhang 100 --limitGenomeGenerateRAM=33524399489
(I added the last option because of an earlier error that said: "SOLUTION: please specify --limitGenomeGenerateRAM not less than 33524399488 and make that much RAM available".)
$ STAR --runThreadN 12 --genomeDir /home/STAR/genome --sjdbGTFfile /home/STAR/Mus_musculus.GRCm38.92.gtf --sjdbOverhang 100 --readFilesIn /home/5_S4_R1.fastq.gz /home/5_S4_R2.fastq.gz --readFilesCommand zcat --outFileNamePrefix Star_E13_5/Star_E13_5_peripheral --outSAMtype BAM Unsorted SortedByCoordinate
This is the Log.progress.out for one of the samples.
Time             Speed  Read      Read    Mapped  Mapped  Mapped  Mapped  Unmapped  Unmapped  Unmapped  Unmapped
                 M/hr   number    length  unique  length  MMrate  multi   multi+    MM        short     other
May 31 15:20:34  24.9   421749    158     61.9%   157.0   0.3%    7.8%    0.2%      0.0%      29.8%     0.2%
May 31 15:21:35  66.3   2247279   158     61.9%   157.0   0.3%    7.8%    0.3%      0.1%      29.8%     0.2%
May 31 15:22:36  80.1   4069270   158     61.8%   157.0   0.3%    7.9%    0.3%      0.1%      29.8%     0.2%
May 31 15:23:45  84.2   5891252   158     61.8%   157.0   0.3%    7.9%    0.3%      0.1%      29.8%     0.2%
May 31 15:24:46  87.1   7572704   158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      29.9%     0.2%
May 31 15:25:50  89.7   9394245   158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      29.9%     0.2%
May 31 15:26:50  92.4   11210419  158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      30.0%     0.2%
May 31 15:27:50  94.3   13023396  158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      30.0%     0.2%
May 31 15:28:58  94.5   14836357  158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      30.0%     0.2%
May 31 15:30:01  95.4   16649264  158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      30.0%     0.2%
May 31 15:31:02  95.7   18322660  158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      30.0%     0.2%
May 31 15:32:04  95.9   19996309  158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      30.0%     0.2%
May 31 15:33:04  97.6   21990591  158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      30.0%     0.2%
May 31 15:34:35  89.4   22409080  158     61.7%   157.0   0.3%    7.8%    0.3%      0.1%      30.0%     0.2%
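To put a single number on this log, the mapped columns of a row can be added up. This is only a minimal sketch, assuming the column order of the header above (unique and multi count as mapped; multi+, MM, short and other count as unmapped). `star_mapped_pct` is a hypothetical helper name, not part of STAR:

```shell
# Hypothetical helper: read one Log.progress.out data row on stdin and
# print the mapped percentage (Mapped unique + Mapped multi).
star_mapped_pct() {
    awk '{
        gsub(/%/, "")                # drop percent signs; awk re-splits the fields
        # $1-$3 are the timestamp, $7 is Mapped unique, $10 is Mapped multi
        printf "%.1f\n", $7 + $10
    }'
}

# Last row of the log above
echo "May 31 15:34:35 89.4 22409080 158 61.7% 157.0 0.3% 7.8% 0.3% 0.1% 30.0% 0.2%" | star_mapped_pct
```

For the last row this gives 69.5% mapped, with nearly all of the remainder sitting in the "Unmapped short" column (30.0%).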
For reference, here is the hisat2 run for the same sample. I built the hisat2 index from the same FASTA file I used for STAR. The alignment command was the following, and the summary it wrote comes after it:
$ hisat2 -p 12 -x /home/HISAT2_indexing/Mus_musculus.GRCm38.dna.toplevel_hisat2 -1 /home/5_S4_R1.fastq.gz -2 /home/5_S4_R2.fastq.gz -S /home/Hisat2_E13_5/Hisat2_E13_5.sam 2>Hisat2_E13_5/summary.txt
22409080 reads; of these:
  22409080 (100.00%) were paired; of these:
    2953534 (13.18%) aligned concordantly 0 times
    17515772 (78.16%) aligned concordantly exactly 1 time
    1939774 (8.66%) aligned concordantly >1 times
    ----
    2953534 pairs aligned concordantly 0 times; of these:
      83480 (2.83%) aligned discordantly 1 time
    ----
    2870054 pairs aligned 0 times concordantly or discordantly; of these:
      5740108 mates make up the pairs; of these:
        3615214 (62.98%) aligned 0 times
        1699149 (29.60%) aligned exactly 1 time
        425745 (7.42%) aligned >1 times
91.93% overall alignment rate
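The numbers in this summary suggest that the 91.93% is counted per mate rather than per pair, so it also includes pairs where only one mate aligned. A rough sketch that recomputes both the per-mate and the per-pair (concordant) figures from the counts above; the helper names are hypothetical and only awk is assumed:

```shell
# Hypothetical helper: overall rate = aligned mates / total mates.
# $1 = number of read pairs, $2 = mates that aligned 0 times
hisat2_overall_pct() {
    awk -v pairs="$1" -v unaligned="$2" 'BEGIN {
        mates = 2 * pairs
        printf "%.2f\n", 100 * (mates - unaligned) / mates
    }'
}

# Hypothetical helper: share of pairs that aligned concordantly.
# $1 = number of read pairs, $2 = concordant exactly once, $3 = concordant >1 times
hisat2_concordant_pct() {
    awk -v pairs="$1" -v once="$2" -v multi="$3" 'BEGIN {
        printf "%.2f\n", 100 * (once + multi) / pairs
    }'
}

hisat2_overall_pct 22409080 3615214              # per mate
hisat2_concordant_pct 22409080 17515772 1939774  # per pair
```

This prints 91.93 and 86.82. The per-pair figure is arguably the closer counterpart to a percentage computed per read pair, so part of the gap between the two tools may come from how the rate is counted.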
Do you know if there's a way I could improve my alignment rate with STAR?
Thank you!
Time             Speed  Read      Read    Mapped  Mapped  Mapped  Mapped  Unmapped  Unmapped  Unmapped  Unmapped
                 M/hr   number    length  unique  length  MMrate  multi   multi+    MM        short     other
Jun 02 11:40:21  232.0  3930598   79      79.8%   78.8    0.4%    14.8%   0.8%      0.0%      3.9%      0.8%
Jun 02 11:41:21  258.7  8696869   79      79.7%   78.8    0.4%    14.8%   0.8%      0.0%      4.0%      0.7%
Jun 02 11:42:21  275.8  13864269  79      79.8%   78.8    0.4%    14.7%   0.8%      0.0%      4.0%      0.7%
Jun 02 11:43:22  280.9  18885252  79      79.8%   78.8    0.4%    14.7%   0.8%      0.0%      4.0%      0.7%
Jun 02 11:45:02  235.9  22409080  79      79.8%   78.8    0.4%    14.7%   0.8%      0.0%      4.0%      0.6%
ALL DONE!