This is my first STAR (version 2.5.2b) run, and I have a few questions about the statistics in the Log.final.out file. I'm using HOMER's
map-star.pl script to map 100 bp paired-end (PE) reads.
$ STAR --genomeLoad LoadAndKeep --outReadsUnmapped Fastx --genomeDir /path/to/genome/star-indexes/hg38-starIndex --runThreadN 72 --readFilesIn SM01_R1_merged_val_1.fq SM01_R2_merged_val_2.fq --outFileNamePrefix SM01_R1_merged_val_1.fq.hg38-starIndex.
Log.final.out output:
Started job on | Oct 19 15:13:30
Started mapping on | Oct 19 15:15:14
Finished on | Oct 19 15:20:37
Mapping speed, Million of reads per hour | 309.93
Number of input reads | 27807734
Average input read length | 194
UNIQUE READS:
Uniquely mapped reads number | 20450138
Uniquely mapped reads % | 73.54%
Average mapped length | 192.66
Number of splices: Total | 9387626
Number of splices: Annotated (sjdb) | 9106647
Number of splices: GT/AG | 9247743
Number of splices: GC/AG | 76354
Number of splices: AT/AC | 7323
Number of splices: Non-canonical | 56206
Mismatch rate per base, % | 0.18%
Deletion rate per base | 0.01%
Deletion average length | 1.47
Insertion rate per base | 0.01%
Insertion average length | 1.36
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 5996936
% of reads mapped to multiple loci | 21.57%
Number of reads mapped to too many loci | 61402
% of reads mapped to too many loci | 0.22%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 4.63%
% of reads unmapped: other | 0.04%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
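As a sanity check on the log above, I recomputed the headline percentages from the raw read counts. This is only a minimal sketch: it assumes Log.final.out keeps the `key | value` layout shown here, and `parse_star_log` and `percent` are hypothetical helper names of my own, not part of STAR or HOMER. The sample text is a subset of the log lines above.

```python
def parse_star_log(text):
    """Parse 'key | value' lines from a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Subset of the Log.final.out shown above (values copied verbatim)
SAMPLE = """\
Number of input reads | 27807734
UNIQUE READS:
Uniquely mapped reads number | 20450138
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 5996936
Number of reads mapped to too many loci | 61402
"""

stats = parse_star_log(SAMPLE)
total = int(stats["Number of input reads"])
unique_pct = percent(int(stats["Uniquely mapped reads number"]), total)
multi_pct = percent(int(stats["Number of reads mapped to multiple loci"]), total)
too_many_pct = percent(int(stats["Number of reads mapped to too many loci"]), total)

print(unique_pct, multi_pct, too_many_pct)  # 73.54 21.57 0.22
```

The recomputed values match the percentages STAR reports, so all of them are expressed relative to the number of input reads.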
Based on the STAR result guide (BioCloud RNA-Seq (STAR) Result Documentation), I have the following questions:
- Is a uniquely mapped rate of 73.54% OK?
- The mismatch rate per base is 0.18%. If a good library typically falls between 0.5% and 0.8%, does that mean my library is very high quality?
- The % of reads mapped to multiple loci is 21.57%. As I understand it, this number is very high. What could be the issue here?
- The % of reads unmapped because they were "too short" is 4.63%. Is this value acceptable? What counts as a good or poor % with respect to sequencing quality?
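To put the percentages in these questions side by side, here is a small sketch that checks whether the read categories account for every input read and whether the splice motif counts add up to the reported total. All values are copied from the log above; the variable names are my own, and the annotated-splice share is just one extra ratio I found useful, not something STAR reports directly.

```python
# Read-category percentages copied from the Log.final.out above
read_pct = {
    "uniquely mapped": 73.54,
    "multiple loci": 21.57,
    "too many loci": 0.22,
    "unmapped: too many mismatches": 0.00,
    "unmapped: too short": 4.63,
    "unmapped: other": 0.04,
}
total_pct = round(sum(read_pct.values()), 2)
print(total_pct)  # 100.0

# Splice counts copied from the same log
splices_total = 9387626
splices_annotated = 9106647
splices_by_motif = {
    "GT/AG": 9247743,
    "GC/AG": 76354,
    "AT/AC": 7323,
    "Non-canonical": 56206,
}

# The four motif classes should partition the total splice count
motif_sum = sum(splices_by_motif.values())
print(motif_sum == splices_total)  # True

# Share of splices that match an annotated (sjdb) junction
annotated_share = 100.0 * splices_annotated / splices_total
print(f"{annotated_share:.2f}% of splices are annotated")
```

So the six read categories cover all input reads, and the chimeric line (0 reads) does not add anything on top of them in this run.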
Any other advice or critique of output is welcome and appreciated!
Thanks!