--alignEndsType Local
Sample             Total_reads  Uniq_Rate  Multi_Rate  Unmap_Rate
00004_nonstranded  61617213     92.57      3.64        3.79
00005_nonstranded  63096109     93.02      3.53        3.45
00004_stranded     62893222     90.56      3.73        5.71
00005_stranded     61184208     90.17      3.66        6.17
--alignEndsType EndToEnd
Sample             Total_reads  Uniq_Rate  Multi_Rate  Unmap_Rate
00004_nonstranded  61617213     89.44      3.40        7.16
00005_nonstranded  63096109     89.89      3.29        6.82
00004_stranded     62893222     85.53      3.37        11.10
00005_stranded     61184208     84.63      3.28        12.09
STAR itself is NOT strand-aware when mapping the reads to the reference genome. In general:
1. Stranded vs non-stranded sequencing: slightly more reads are uniquely mapped in non-stranded sequencing (accordingly, fewer reads are unmapped in non-stranded sequencing).
2. When --alignEndsType is changed from Local to EndToEnd, the impact on stranded sequencing is much bigger than on non-stranded sequencing. For instance, for uniquely mapped reads (%) in sample 00004:
a. Nonstranded: 92.57 (Local) -> 89.44 (EndToEnd): ~3 percentage points difference
b. Stranded: 90.56 (Local) -> 85.53 (EndToEnd): ~5 percentage points difference
Especially for the 2nd fact, I don't have a good explanation.
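To put numbers on this for all four samples rather than just 00004, here is a minimal Python sketch that computes the drop in uniquely mapped rate (Local minus EndToEnd, in percentage points). The rates are copied from the two tables above; the names `uniq_rate` and `uniq_drop` are only illustrative.

```python
# Uniquely mapped rates (%) copied from the two tables above.
uniq_rate = {
    "00004_nonstranded": {"Local": 92.57, "EndToEnd": 89.44},
    "00005_nonstranded": {"Local": 93.02, "EndToEnd": 89.89},
    "00004_stranded": {"Local": 90.56, "EndToEnd": 85.53},
    "00005_stranded": {"Local": 90.17, "EndToEnd": 84.63},
}


def uniq_drop(rates):
    """Return Local minus EndToEnd per sample, in percentage points."""
    return {
        sample: round(modes["Local"] - modes["EndToEnd"], 2)
        for sample, modes in rates.items()
    }


drops = uniq_drop(uniq_rate)
for sample, drop in drops.items():
    # Left-align the sample name, then print the drop with two decimals.
    print(f"{sample:<20}{drop:5.2f}")
```

With these numbers the drop is about 3.1 points for both non-stranded samples and roughly 5.0 to 5.5 points for the stranded ones, so the pattern holds for both samples.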
My command line looks like:
STAR --genomeDir /hpc/grid/shared/ngsdb/STAR/hg19_gencode19 --readFilesIn /hpc/grid/ngsws/molmed/data/BGIpilotNov2014/CleanData/RC-140808-00004_stranded_1.fq.gz /hpc/grid/ngsws/molmed/data/BGIpilotNov2014/CleanData/RC-140808-00004_stranded_2.fq.gz --readFilesCommand zcat --runThreadN 8 --alignSJDBoverhangMin 1 --outReadsUnmapped Fastx --alignEndsType EndToEnd --outFilterMismatchNoverLmax 0.05 --alignIntronMax 1000000 --outSAMtype BAM SortedByCoordinate;
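
For the comparison, the two runs should differ only in --alignEndsType. A minimal Python sketch of how the paired command lines could be generated is below; it only builds and prints the commands, it does not run STAR. The function name `star_command`, the read file names, and the output prefixes are hypothetical placeholders; the other options are the ones from my command above, plus --outFileNamePrefix so the two runs do not overwrite each other.

```python
import shlex


def star_command(read1, read2, align_ends_type, out_prefix):
    """Build a STAR argument list that varies only in --alignEndsType."""
    if align_ends_type not in ("Local", "EndToEnd"):
        raise ValueError(f"unexpected alignEndsType: {align_ends_type}")
    return [
        "STAR",
        "--genomeDir", "/hpc/grid/shared/ngsdb/STAR/hg19_gencode19",
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",
        "--runThreadN", "8",
        "--alignSJDBoverhangMin", "1",
        "--outReadsUnmapped", "Fastx",
        "--alignEndsType", align_ends_type,
        "--outFilterMismatchNoverLmax", "0.05",
        "--alignIntronMax", "1000000",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


# Hypothetical read files and output prefixes, for illustration only.
commands = {
    mode: star_command("sample_1.fq.gz", "sample_2.fq.gz", mode, f"sample_{mode}_")
    for mode in ("Local", "EndToEnd")
}
for cmd in commands.values():
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```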