too many short reads failed to map

Assa Yeroslaviz

Jan 13, 2023, 4:04:17 AM
to rna-star
I have a Ribo-Seq dataset I am trying to map using STAR 2.7.10a with the following parameters:

--limitBAMsortRAM 7062663806 
--winAnchorMultimapNmax 20  
--seedSearchStartLmax  15 
--outFilterMultimapNmax  8 
--outFilterMismatchNmax 4 

Since I'm mapping against the transcriptome, I am not using a GTF file, either for building the index or for the mapping itself.

Before mapping, I run cutadapt to remove the adapter (also discarding untrimmed reads) and fastx_trimmer to remove the first base after trimming. I then map against the rRNA to remove those reads, and the remaining reads are mapped against my transcriptome.
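One way to check what the preprocessing steps above leave behind is to look at the read-length distribution of the FASTQ file that goes into STAR. The following is only a minimal Python sketch, assuming a standard four-lines-per-record FASTQ (plain or gzip-compressed); the function name `read_length_histogram` is hypothetical and not part of any of the tools mentioned.

```python
import gzip
from collections import Counter


def read_length_histogram(fastq_path):
    """Count reads per length in a 4-line-per-record FASTQ file (plain or .gz)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    lengths = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The second line of every 4-line record holds the sequence.
            if line_number % 4 == 1:
                lengths[len(line.strip())] += 1
    return lengths


def print_histogram(lengths):
    """Print one 'length<TAB>count' row per observed read length."""
    for length, count in sorted(lengths.items()):
        print(length, count, sep="\t")
```

Calling `print_histogram(read_length_histogram("trimmed.fastq.gz"))` on the trimmed, rRNA-depleted file (the file name here is a placeholder) would show whether most reads fall in the length range expected for ribosome footprints.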

However, over 80% of my reads fail to align because they are classified as "too short":

less GSM3152879/GSM3152879.Log.final.out 

                                 Started job on |       Jan 12 16:10:05
                             Started mapping on |       Jan 12 16:10:10
                                    Finished on |       Jan 12 16:59:26
       Mapping speed, Million of reads per hour |       414.20

                          Number of input reads |       340105409
                      Average input read length |       36
                                    UNIQUE READS:
                   Uniquely mapped reads number |       4841867
                        Uniquely mapped reads % |       1.42%
                          Average mapped length |       17.95
                       Number of splices: Total |       14760
            Number of splices: Annotated (sjdb) |       0
                       Number of splices: GT/AG |       14723
                       Number of splices: GC/AG |       27
                       Number of splices: AT/AC |       0
               Number of splices: Non-canonical |       10
                      Mismatch rate per base, % |       4.84%
                         Deletion rate per base |       0.00%
                        Deletion average length |       1.29
                        Insertion rate per base |       0.00%
                       Insertion average length |       1.09
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       34426996
             % of reads mapped to multiple loci |       10.12%
        Number of reads mapped to too many loci |       1648189
             % of reads mapped to too many loci |       0.48%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       299183031
                 % of reads unmapped: too short |       87.97%
                Number of reads unmapped: other |       5326
                     % of reads unmapped: other |       0.00%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%
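
The percentages in this log can be recomputed from the raw counts, which is handy when comparing several samples. This is a minimal Python sketch, assuming the `name | value` layout shown above; the function names `parse_star_final_log` and `summarize` are hypothetical helpers, not part of STAR.

```python
def parse_star_final_log(text):
    """Return {field name: value string} for the 'name | value' lines of Log.final.out."""
    stats = {}
    for line in text.splitlines():
        if "|" in line:
            key, value = line.split("|", 1)
            stats[key.strip()] = value.strip()
    return stats


def summarize(stats):
    """Compute mapped and 'too short' percentages from the raw read counts."""
    total = int(stats["Number of input reads"])
    unique = int(stats["Uniquely mapped reads number"])
    multi = int(stats["Number of reads mapped to multiple loci"])
    too_short = int(stats["Number of reads unmapped: too short"])
    return {
        "mapped_pct": 100.0 * (unique + multi) / total,
        "too_short_pct": 100.0 * too_short / total,
    }


# Counts copied from the log in this post.
example_log = """
Number of input reads | 340105409
Uniquely mapped reads number | 4841867
Number of reads mapped to multiple loci | 34426996
Number of reads unmapped: too short | 299183031
"""

summary = summarize(parse_star_final_log(example_log))
print(f"mapped: {summary['mapped_pct']:.2f}%")
print(f"too short: {summary['too_short_pct']:.2f}%")
```

In practice the text would come from reading the `Log.final.out` file of each sample rather than from an inline string.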


What is the reason for this very high number of "too short" reads?
Is it possible to map these reads as well, and does it make sense to try?

Thanks,
Assa

Alexander Dobin

Feb 3, 2023, 11:35:14 AM
to rna-star
Hi Assa,

The reads are short to begin with, and for the reads that did map, the mapped portion is only about 18 bases on average, while the mismatch rate is quite high (~5%).
Two possible reasons are (i) poor-quality sequencing and (ii) a divergent genome. Both are hard to fix with such short reads.
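
For context on the "too short" label: STAR assigns it when the best alignment covers too small a fraction of the read, not when the read itself is short. The sketch below estimates the minimum number of matched bases a read of a given length needs, assuming the default value of 0.66 for `--outFilterMatchNminOverLread` and `--outFilterScoreMinOverLread` is in effect (the command in the original post does not override either option). The function name is hypothetical, and the rounding is an approximation rather than STAR's exact internal arithmetic.

```python
import math

# Assumed: STAR's documented default for --outFilterMatchNminOverLread
# and --outFilterScoreMinOverLread.
DEFAULT_MIN_OVER_LREAD = 0.66


def approx_min_matched_bases(read_length, min_over_lread=DEFAULT_MIN_OVER_LREAD):
    """Approximate number of matched bases needed for a read to pass the filter."""
    return math.ceil(min_over_lread * read_length)


for read_length in (18, 27, 36):
    print(read_length, approx_min_matched_bases(read_length))
```

Under this assumption, a 36-base read needs roughly 24 matched bases, so an alignment of about 18 bases would only pass for reads of about 27 bases or shorter. Lowering the two fractions would let shorter alignments through, at the cost of more spurious hits, so whether that is sensible for this dataset is a judgment call.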