STAR \
--genomeDir ${ref_genome} \
--readFilesIn reads.fastq \
--outFileNamePrefix reads \
--outFilterIntronMotifs RemoveNoncanonical \
--outSJfilterReads Unique \
--outFilterMultimapNmax 1 \
--outFilterMismatchNoverLmax 0.03 \
--outStd SAM > reads.sam
From my understanding, I am filtering out only those reads that are mapped uniquely, removing non canonical reads, allowing only 1 alignment that can be multi mapped and if there is more, it is considered as unmapped. I get the following as the result:
Number of input reads | 31515501
Average input read length | 50
UNIQUE READS:
Uniquely mapped reads number | 24654387
Uniquely mapped reads % | 78.23%
Average mapped length | 50.75
Number of splices: Total | 3656883
Number of splices: Annotated (sjdb) | 3639208
Number of splices: GT/AG | 3637727
Number of splices: GC/AG | 17677
Number of splices: AT/AC | 1479
Number of splices: Non-canonical | 0
Mismatch rate per base, % | 0.17%
Deletion rate per base | 0.00%
Deletion average length | 1.49
Insertion rate per base | 0.00%
Insertion average length | 1.08
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 0
% of reads mapped to multiple loci | 0.00%
Number of reads mapped to too many loci | 5898515
% of reads mapped to too many loci | 18.72%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.03%
% of reads unmapped: too short | 2.25%
% of reads unmapped: other | 0.78%
Although the number of unmapped reads is less, almost 19% of the reads go into multimapping reads? I am not going to use them while I count features using a tool like HTSeq downstream.
My question is : How do I get only those reads that gets mapped to different loci ? I would like to investigate which reads .
Thanks,
RV