different number of mapped reads with/without quantMode


Federico Ansaloni

Jul 8, 2019, 7:17:13 AM
to rna-star
Hi Alex,
I ran STAR on the very same genome and FASTQ files with and without quantMode. The two sets of commands I used are reported below:

Standard mapping:

STAR --runMode genomeGenerate --genomeDir $wd --genomeFastaFiles $genome

STAR --outSAMunmapped Within --outMultimapperOrder Random --outSAMtype BAM Unsorted --outStd BAM_Unsorted --runThreadN 20 --genomeDir $wd --readFilesIn $reads --readFilesCommand zcat > Aligned_allreads.out.bam


quantMode:

STAR --runMode genomeGenerate --genomeDir $wd --genomeFastaFiles $genome --sjdbGTFfile $gtf

STAR --outSAMunmapped Within --outMultimapperOrder Random --outSAMtype BAM Unsorted --outStd BAM_Unsorted --genomeDir $wd --readFilesIn $reads --readFilesCommand zcat --quantMode GeneCounts > Aligned_allreads.out.bam


The Log.final.out of the first run reports 6,032,682 uniquely mapped reads, while the second reports 6,045,145.

Standard mapping Log.final.out

                                 Started job on | Jul 07 19:20:08

                             Started mapping on | Jul 07 19:21:08

                                    Finished on | Jul 07 19:22:27

       Mapping speed, Million of reads per hour | 323.28


                          Number of input reads | 7094252

                      Average input read length | 200

                                    UNIQUE READS:

                   Uniquely mapped reads number | 6032682

                        Uniquely mapped reads % | 85.04%

                          Average mapped length | 197.81

                       Number of splices: Total | 3103886

            Number of splices: Annotated (sjdb) | 0

                       Number of splices: GT/AG | 3066008

                       Number of splices: GC/AG | 27326

                       Number of splices: AT/AC | 1791

               Number of splices: Non-canonical | 8761

                      Mismatch rate per base, % | 0.57%

                         Deletion rate per base | 0.03%

                        Deletion average length | 2.41

                        Insertion rate per base | 0.03%

                       Insertion average length | 2.07

                             MULTI-MAPPING READS:

        Number of reads mapped to multiple loci | 170011

             % of reads mapped to multiple loci | 2.40%

        Number of reads mapped to too many loci | 7546

             % of reads mapped to too many loci | 0.11%

                                  UNMAPPED READS:

       % of reads unmapped: too many mismatches | 0.00%

                 % of reads unmapped: too short | 12.42%

                     % of reads unmapped: other | 0.04%

                                  CHIMERIC READS:

                       Number of chimeric reads | 0

                            % of chimeric reads | 0.00%


quantMode Log.final.out

                                 Started job on | Jul 06 10:54:20

                             Started mapping on | Jul 06 10:54:33

                                    Finished on | Jul 06 10:56:00

       Mapping speed, Million of reads per hour | 293.56


                          Number of input reads | 7094252

                      Average input read length | 200

                                    UNIQUE READS:

                   Uniquely mapped reads number | 6045145

                        Uniquely mapped reads % | 85.21%

                          Average mapped length | 198.57

                       Number of splices: Total | 3811119

            Number of splices: Annotated (sjdb) | 3740803

                       Number of splices: GT/AG | 3761944

                       Number of splices: GC/AG | 36356

                       Number of splices: AT/AC | 3715

               Number of splices: Non-canonical | 9104

                      Mismatch rate per base, % | 0.57%

                         Deletion rate per base | 0.04%

                        Deletion average length | 2.40

                        Insertion rate per base | 0.03%

                       Insertion average length | 2.06

                             MULTI-MAPPING READS:

        Number of reads mapped to multiple loci | 164632

             % of reads mapped to multiple loci | 2.32%

        Number of reads mapped to too many loci | 6726

             % of reads mapped to too many loci | 0.09%

                                  UNMAPPED READS:

       % of reads unmapped: too many mismatches | 0.00%

                 % of reads unmapped: too short | 12.33%

                     % of reads unmapped: other | 0.04%

                                  CHIMERIC READS:

                       Number of chimeric reads | 0

                            % of chimeric reads | 0.00%
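
To compare the two logs field by field rather than by eye, something like the following could be used. It is only a sketch: the helper names star_stat and star_diff are made up for this example, and it assumes each statistic sits on a "label | value" line, as in the logs above.

```shell
#!/usr/bin/env bash
# star_stat LOG LABEL
# Print the value that follows "LABEL |" in a STAR Log.final.out.
# (Hypothetical helper, not part of STAR.)
star_stat() {
  awk -F'|' -v label="$2" '
    {
      key = $1
      gsub(/^[ \t]+|[ \t]+$/, "", key)    # strip the padding around the label
      if (key == label) {
        val = $2
        gsub(/^[ \t]+|[ \t]+$/, "", val)  # strip the padding around the value
        print val
        exit
      }
    }' "$1"
}

# star_diff LOG_A LOG_B LABEL
# Print (value in LOG_B) minus (value in LOG_A) for a numeric field.
# A trailing "%" is ignored because awk converts only the leading number.
star_diff() {
  local a b
  a=$(star_stat "$1" "$3")
  b=$(star_stat "$2" "$3")
  awk -v a="$a" -v b="$b" 'BEGIN { print b - a }'
}
```

For example, if the two logs were saved as standard/Log.final.out and quant/Log.final.out (paths assumed here), `star_diff standard/Log.final.out quant/Log.final.out "Uniquely mapped reads number"` would print 12463, that is 6045145 - 6032682. The same call with "Number of reads mapped to multiple loci" shows how many reads left the multi-mapping class.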


I was expecting these two numbers to be the same. My hypothesis is that providing the GTF file in the index generation command somehow changes the mapping, and therefore the number of mapped reads, but I would like to understand why and how this happens.

Thank you.
Federico




Alexander Dobin

Jul 9, 2019, 7:07:37 PM
to rna-star
Hi Federico,

This is the effect of including annotations, not of the counting per se. You can check it by running your 2nd example *without* --quantMode GeneCounts (but with the genome generated *with* the GTF).
Annotations affect mapping because STAR utilizes information about locations of annotated junctions.
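
The control run described above could look roughly like this. It is a sketch, not a tested recipe: $wd, $genome, $gtf and $reads are the variables from the original commands, while the function name run_control and the control_ output prefix are assumptions added for this example so the control does not overwrite the earlier Log.final.out.

```shell
#!/usr/bin/env bash
# Control run: index built WITH the GTF, mapping WITHOUT --quantMode GeneCounts.
# run_control and the "control_" prefix are hypothetical names for this sketch.
run_control() {
  # Same genome generation as in the quantMode example (annotations included).
  STAR --runMode genomeGenerate --genomeDir "$wd" \
       --genomeFastaFiles "$genome" --sjdbGTFfile "$gtf" || return 1

  # Same mapping command as in the quantMode example, minus --quantMode GeneCounts.
  # $reads is left unquoted on purpose so that two mate files split into two arguments.
  STAR --outSAMunmapped Within --outMultimapperOrder Random \
       --outSAMtype BAM Unsorted --outStd BAM_Unsorted \
       --genomeDir "$wd" --readFilesIn $reads --readFilesCommand zcat \
       --outFileNamePrefix control_ > control_Aligned_allreads.out.bam
}
```

After calling run_control, the "Uniquely mapped reads number" in control_Log.final.out can be compared with the two logs posted above. If the difference comes from the annotations alone, it should match the quantMode run (6045145) rather than the unannotated one (6032682).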

Cheers
Alex