Dear reader,
I am trying to map my mRNA-seq reads with STAR. They are paired-end reads from a mosquito.
When I map the reads as pairs, a high percentage (15%) is reported as "unmapped: too short".
(I also have a high multi-mapping rate, but I am not concerned about that, as this mosquito genome has many gene duplications.)
When I map R1 and R2 separately, I get 5.11% and 7.82% "unmapped: too short", respectively.
It seems strange to me that the "unmapped: too short" percentage is so much higher when I map the paired-end reads together. Any idea why this is happening?
I read in other questions in this group that reducing --outFilterMatchNminOverLread and --outFilterScoreMinOverLread could help. I tried a value of 0.5, but it barely made a difference (the percentage went down to 14%).
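As a rough illustration of what these two ratios mean for 59-base mates, here is a minimal sketch. It assumes, following the STAR manual's description of these parameters, that the ratio is applied to the read length, which for paired-end reads is the sum of the mates' lengths. The function name and the simple rounding are illustrative only and are not STAR's exact internal calculation.

```python
import math


def min_required_bases(ratio, mate_length, paired):
    """Approximate number of bases that must align to pass an OverLread filter.

    Assumption: for paired-end input the ratio is applied to the sum of
    both mates' lengths; for single-end input, to the single read length.
    """
    read_length = mate_length * 2 if paired else mate_length
    return math.ceil(ratio * read_length)


# 0.66 is STAR's default for both parameters; 0.5 is the value tried above.
for ratio in (0.66, 0.5):
    single = min_required_bases(ratio, 59, paired=False)
    pair = min_required_bases(ratio, 59, paired=True)
    print(f"ratio={ratio}: single-end >= {single} bases, paired-end >= {pair} bases")
```

Under that assumption, a pair in which only one 59-base mate aligns would fall below the default paired-end threshold and be counted as too short, while the same mate could pass when mapped on its own.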
I would still like to check whether the unmapped reads map to something else, such as bacteria from the microbiome.
Curious if you can help me out!
Best,
Jurgen
Here is the STAR output:
jurgenm@octarine:~/23JPM015/Test/Alignments$ cat Mapping_Test_58_R1_20231108_Log.final.out
Started job on | Nov 08 16:46:27
Started mapping on | Nov 08 16:46:44
Finished on | Nov 08 16:48:35
Mapping speed, Million of reads per hour | 413.90
Number of input reads | 12762067
Average input read length | 59
UNIQUE READS:
Uniquely mapped reads number | 7066426
Uniquely mapped reads % | 55.37%
Average mapped length | 58.59
Number of splices: Total | 613490
Number of splices: Annotated (sjdb) | 597850
Number of splices: GT/AG | 582959
Number of splices: GC/AG | 1642
Number of splices: AT/AC | 224
Number of splices: Non-canonical | 28665
Mismatch rate per base, % | 1.29%
Deletion rate per base | 0.04%
Deletion average length | 1.71
Insertion rate per base | 0.03%
Insertion average length | 1.61
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 4943932
% of reads mapped to multiple loci | 38.74%
Number of reads mapped to too many loci | 81524
% of reads mapped to too many loci | 0.64%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 5.11%
% of reads unmapped: other | 0.14%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
jurgenm@octarine:~/23JPM015/Test/Alignments$ cat Mapping_Test_58_R2_20231108_Log.final.out
Started job on | Nov 08 16:46:27
Started mapping on | Nov 08 16:46:44
Finished on | Nov 08 16:48:39
Mapping speed, Million of reads per hour | 399.51
Number of input reads | 12762067
Average input read length | 59
UNIQUE READS:
Uniquely mapped reads number | 6562891
Uniquely mapped reads % | 51.42%
Average mapped length | 58.66
Number of splices: Total | 734259
Number of splices: Annotated (sjdb) | 715745
Number of splices: GT/AG | 703118
Number of splices: GC/AG | 1584
Number of splices: AT/AC | 145
Number of splices: Non-canonical | 29412
Mismatch rate per base, % | 1.28%
Deletion rate per base | 0.04%
Deletion average length | 1.72
Insertion rate per base | 0.03%
Insertion average length | 1.60
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 4699170
% of reads mapped to multiple loci | 36.82%
Number of reads mapped to too many loci | 57172
% of reads mapped to too many loci | 0.45%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 7.82%
% of reads unmapped: other | 3.48%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
jurgenm@octarine:~/23JPM015/Test/Alignments$ cat Mapping_Test_58_20231108_Log.final.out
Started job on | Nov 08 15:11:12
Started mapping on | Nov 08 15:14:50
Finished on | Nov 08 15:16:57
Mapping speed, Million of reads per hour | 361.76
Number of input reads | 12762067
Average input read length | 118
UNIQUE READS:
Uniquely mapped reads number | 7427395
Uniquely mapped reads % | 58.20%
Average mapped length | 116.95
Number of splices: Total | 1505430
Number of splices: Annotated (sjdb) | 1470402
Number of splices: GT/AG | 1450780
Number of splices: GC/AG | 3662
Number of splices: AT/AC | 429
Number of splices: Non-canonical | 50559
Mismatch rate per base, % | 1.24%
Deletion rate per base | 0.04%
Deletion average length | 1.82
Insertion rate per base | 0.03%
Insertion average length | 1.70
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 3339406
% of reads mapped to multiple loci | 26.17%
Number of reads mapped to too many loci | 55758
% of reads mapped to too many loci | 0.44%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 15.14%
% of reads unmapped: other | 0.06%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
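To compare the three runs in absolute read counts rather than percentages, the logs can be parsed with a small script. This is only a minimal sketch: `parse_final_log` and `too_short_reads` are hypothetical helper names, and it assumes every metric line has the form `name | value`, as in the output above.

```python
def parse_final_log(text):
    """Return a dict mapping metric name -> raw string value."""
    metrics = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        key, value = line.split("|", 1)
        metrics[key.strip()] = value.strip()
    return metrics


def too_short_reads(metrics):
    """Estimate how many input reads (or pairs) were 'unmapped: too short'."""
    total = int(metrics["Number of input reads"])
    pct = float(metrics["% of reads unmapped: too short"].rstrip("%"))
    return round(total * pct / 100)


# Two lines copied from the paired-end log above; a full Log.final.out
# read from disk can be passed to parse_final_log in the same way.
example = """Number of input reads | 12762067
% of reads unmapped: too short | 15.14%"""
print(too_short_reads(parse_final_log(example)))
```

Note that in the paired-end run each input read is a pair, so the count refers to pairs rather than individual mates.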