Paired-end sequencing: high rate of reads "unmapped: too short"

Jurgen Moonen

Nov 13, 2023, 11:34:50 AM
to rna-star
Dear reader,

I am trying to map my mRNA-seq reads with STAR. They are paired-end reads from a mosquito genome.

When I map the reads as pairs, I get a high percentage of reads "unmapped: too short": 15%.
(I also have a high multimapping rate, but I am not concerned about that, as this mosquito genome has many gene duplications.)

When I map R1 and R2 separately, I get 5.11% and 7.82% "unmapped: too short", respectively.

It seems strange to me that I get a much higher rate of "unmapped: too short" when I map the paired-end reads together. Any idea why this is happening?

I read in other threads in this group that reducing --outFilterMatchNminOverLread and --outFilterScoreMinOverLread could help. I reduced them to 0.5, but it barely made a difference (the rate only went down to 14%).
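The STAR manual describes both of these filters as fractions of the read length, where for paired-end reads the length is the sum of the two mates' lengths (default 0.66 for each). The Python sketch below is a simplified model of that rule, not STAR's actual code: it ignores how STAR rounds the threshold, and the helper names `min_required` and `passes_length_filter` are hypothetical. It uses the read lengths from the logs in this thread (59 nt per mate, 118 nt per pair) to show what a pair needs in order to pass when only one mate aligns.

```python
def min_required(ratio: float, total_read_length: int) -> float:
    """Smallest matched-base count that passes the filter (simplified model).

    For paired-end reads the ratio is applied to the sum of the mates' lengths.
    """
    return ratio * total_read_length


def passes_length_filter(matched_bases: int, total_read_length: int, ratio: float) -> bool:
    """True if an alignment with `matched_bases` would survive the filter."""
    return matched_bases >= min_required(ratio, total_read_length)


# Read lengths taken from the Log.final.out files in this thread.
MATE_LEN = 59
PAIR_LEN = 2 * MATE_LEN

for ratio in (0.66, 0.5):
    print(f"ratio={ratio}: a pair needs >= {min_required(ratio, PAIR_LEN):.2f} matched bases")
    # Only one mate aligns, end to end, with no mismatches or soft-clipping.
    print("  one mate fully aligned:", passes_length_filter(MATE_LEN, PAIR_LEN, ratio))
    # Only one mate aligns, and one of its bases is soft-clipped.
    print("  one mate, 1 base clipped:", passes_length_filter(MATE_LEN - 1, PAIR_LEN, ratio))
```

Under this model, a single aligned mate can never reach 0.66 of the pair length, and at 0.5 it passes only if all 59 bases match. That may be one reason lowering both ratios to 0.5 changed the rate so little.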

I still want to check whether the unmapped reads map to bacteria (e.g., from the microbiome).
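One way to collect the unmapped reads for such a check is STAR's `--outReadsUnmapped Fastx` option, which writes them to `Unmapped.out.mate1` and `Unmapped.out.mate2` (prefixed by `--outFileNamePrefix`). The Python sketch below only builds the command line. The genome directory, FASTQ names and thread count are hypothetical placeholders, and the `zcat` handling assumes gzipped input.

```python
import shlex


def star_unmapped_command(genome_dir: str, r1: str, r2: str, prefix: str,
                          threads: int = 8) -> list[str]:
    """Build a paired-end STAR command that also writes unmapped reads to FASTQ.

    With --outReadsUnmapped Fastx, STAR writes the unmapped mates to
    <prefix>Unmapped.out.mate1 and <prefix>Unmapped.out.mate2.
    """
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", r1, r2,
        "--outFileNamePrefix", prefix,
        "--outReadsUnmapped", "Fastx",
    ]
    if r1.endswith(".gz"):
        # Let STAR decompress gzipped input on the fly.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


# Hypothetical paths: replace with your own genome index and FASTQ files.
cmd = star_unmapped_command("genome_index/", "sample_R1.fastq.gz",
                            "sample_R2.fastq.gz", "Mapping_Test_58_")
print(shlex.join(cmd))
```

The two `Unmapped.out.mate` files can then be used as a read pair for whatever tool you choose to screen against bacterial genomes.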

I hope you can help me out!

Best,
Jurgen

Here is the STAR output:
jurgenm@octarine:~/23JPM015/Test/Alignments$ cat Mapping_Test_58_R1_20231108_Log.final.out
                                 Started job on |       Nov 08 16:46:27
                             Started mapping on |       Nov 08 16:46:44
                                    Finished on |       Nov 08 16:48:35
       Mapping speed, Million of reads per hour |       413.90

                          Number of input reads |       12762067
                      Average input read length |       59
                                    UNIQUE READS:
                   Uniquely mapped reads number |       7066426
                        Uniquely mapped reads % |       55.37%
                          Average mapped length |       58.59
                       Number of splices: Total |       613490
            Number of splices: Annotated (sjdb) |       597850
                       Number of splices: GT/AG |       582959
                       Number of splices: GC/AG |       1642
                       Number of splices: AT/AC |       224
               Number of splices: Non-canonical |       28665
                      Mismatch rate per base, % |       1.29%
                         Deletion rate per base |       0.04%
                        Deletion average length |       1.71
                        Insertion rate per base |       0.03%
                       Insertion average length |       1.61
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       4943932
             % of reads mapped to multiple loci |       38.74%
        Number of reads mapped to too many loci |       81524
             % of reads mapped to too many loci |       0.64%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |       0.00%
                 % of reads unmapped: too short |       5.11%
                     % of reads unmapped: other |       0.14%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

jurgenm@octarine:~/23JPM015/Test/Alignments$ cat Mapping_Test_58_R2_20231108_Log.final.out
                                 Started job on |       Nov 08 16:46:27
                             Started mapping on |       Nov 08 16:46:44
                                    Finished on |       Nov 08 16:48:39
       Mapping speed, Million of reads per hour |       399.51

                          Number of input reads |       12762067
                      Average input read length |       59
                                    UNIQUE READS:
                   Uniquely mapped reads number |       6562891
                        Uniquely mapped reads % |       51.42%
                          Average mapped length |       58.66
                       Number of splices: Total |       734259
            Number of splices: Annotated (sjdb) |       715745
                       Number of splices: GT/AG |       703118
                       Number of splices: GC/AG |       1584
                       Number of splices: AT/AC |       145
               Number of splices: Non-canonical |       29412
                      Mismatch rate per base, % |       1.28%
                         Deletion rate per base |       0.04%
                        Deletion average length |       1.72
                        Insertion rate per base |       0.03%
                       Insertion average length |       1.60
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       4699170
             % of reads mapped to multiple loci |       36.82%
        Number of reads mapped to too many loci |       57172
             % of reads mapped to too many loci |       0.45%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |       0.00%
                 % of reads unmapped: too short |       7.82%
                     % of reads unmapped: other |       3.48%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

jurgenm@octarine:~/23JPM015/Test/Alignments$ cat Mapping_Test_58_20231108_Log.final.out
                                 Started job on |       Nov 08 15:11:12
                             Started mapping on |       Nov 08 15:14:50
                                    Finished on |       Nov 08 15:16:57
       Mapping speed, Million of reads per hour |       361.76

                          Number of input reads |       12762067
                      Average input read length |       118
                                    UNIQUE READS:
                   Uniquely mapped reads number |       7427395
                        Uniquely mapped reads % |       58.20%
                          Average mapped length |       116.95
                       Number of splices: Total |       1505430
            Number of splices: Annotated (sjdb) |       1470402
                       Number of splices: GT/AG |       1450780
                       Number of splices: GC/AG |       3662
                       Number of splices: AT/AC |       429
               Number of splices: Non-canonical |       50559
                      Mismatch rate per base, % |       1.24%
                         Deletion rate per base |       0.04%
                        Deletion average length |       1.82
                        Insertion rate per base |       0.03%
                       Insertion average length |       1.70
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       3339406
             % of reads mapped to multiple loci |       26.17%
        Number of reads mapped to too many loci |       55758
             % of reads mapped to too many loci |       0.44%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |       0.00%
                 % of reads unmapped: too short |       15.14%
                     % of reads unmapped: other |       0.06%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%
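To compare the three runs above in absolute numbers rather than percentages, a small parser for `Log.final.out` helps. This Python sketch assumes the `field | value` layout shown above (section headers without `|` are skipped); the function names are hypothetical. Note that for a paired-end run, "Number of input reads" counts read pairs.

```python
def parse_log_final(text: str) -> dict[str, str]:
    """Parse STAR's Log.final.out into {field name: value string}.

    Data lines look like 'Field name |  value'; section headers such as
    'UNIQUE READS:' contain no '|' and are skipped.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def too_short_count(stats: dict[str, str]) -> int:
    """Approximate number of reads (pairs, for paired-end runs) unmapped as too short."""
    n_input = int(stats["Number of input reads"])
    pct = float(stats["% of reads unmapped: too short"].rstrip("%"))
    return round(n_input * pct / 100)


# Excerpt of the paired-end log above; for a real file use
# parse_log_final(open("Mapping_Test_58_20231108_Log.final.out").read())
paired_log = """
                          Number of input reads |       12762067
                                  UNMAPPED READS:
               % of reads unmapped: too short |       15.14%
"""
stats = parse_log_final(paired_log)
print(f"{too_short_count(stats):,} read pairs unmapped as too short")
```

By this arithmetic, 15.14% of the paired-end input is about 1.93 million pairs (about 3.86 million mate sequences), while the single-end runs correspond to about 652,000 (R1) and 998,000 (R2) individual mates.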

Alexander Dobin

Dec 6, 2023, 3:53:32 PM
to rna-star
Hi Jurgen,

When reads are mapped together, they have to map concordantly; discordant alignments are not allowed, which may reduce the mapping rate.
Also, if you trimmed the reads, you may get alignments where the mates are protruding. You can try --alignEndsProtrude 10 to see whether it increases the mapping rate.
You can increase the mapping rate by reducing --outFilterMatchNminOverLread and --outFilterScoreMinOverLread, but I usually do not recommend it, as you will be letting in poor-quality alignments.
