Very low mapping quality score in single-end RNA-seq alignment

1,078 views
Skip to first unread message

Chilamakuri C.S. Reddy

unread,
May 4, 2016, 9:32:34 AM5/4/16
to rna-star
Hello All,
I am trying to align, single-end 50 bp, stranded, RNAseq reads to Mouse genome.
Strangely  I get maximum mapping quality score of 3. as a result I could not call variants using GATK.
I have 70 samples and all the sample have the same problem.
I tried using default parameters using 2.4 and 2.5 versions.


Here is the  *_Log.final.out file output.

                                 Started job on |   May 04 13:40:45
                             Started mapping on |   May 04 13:43:01
                                    Finished on |   May 04 13:47:43
       Mapping speed, Million of reads per hour |   626.90

                          Number of input reads |   49107549
                      Average input read length |   50
                                    UNIQUE READS:
                   Uniquely mapped reads number |   40934748
                        Uniquely mapped reads % |   83.36%
                          Average mapped length |   49.46
                       Number of splices: Total |   3637834
            Number of splices: Annotated (sjdb) |   0
                       Number of splices: GT/AG |   3607777
                       Number of splices: GC/AG |   21219
                       Number of splices: AT/AC |   2838
               Number of splices: Non-canonical |   6000
                      Mismatch rate per base, % |   0.20%
                         Deletion rate per base |   0.00%
                        Deletion average length |   1.67
                        Insertion rate per base |   0.00%
                       Insertion average length |   1.35
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   7351441
             % of reads mapped to multiple loci |   14.97%
        Number of reads mapped to too many loci |   638796
             % of reads mapped to too many loci |   1.30%
                                  UNMAPPED READS:
       % of reads unmapped: too many mismatches |   0.00%
                 % of reads unmapped: too short |   0.00%
                     % of reads unmapped: other |   0.37%




Here is the counts table from bam file.

ReadCounts      SamFlag (column 2 from sam)          MappingQuality Score (column 5 from sam)
20380407          0                                                        0
993778              0                                                        1
2243744            0                                                        3
21583208         16                                                       0
882720             16                                                       1
1656114           16                                                       3
3802639           256                                                     0
2118242           256                                                     1
1500345           256                                                     3
3184061           272                                                     0
2241623           272                                                    1
2399513           272                                                    3

Your inputs are appreciated.

Many thanks in advance.

Best Regards
Chandu



Kirill Tsyganov

unread,
May 4, 2016, 6:27:47 PM5/4/16
to rna-star
Hi Chandu, 

This can be a little confusing, but 3 is actually good. Tophat/bowtie and STAR don’t use MAPQ as it's defined in samtools specifications ! As far as I know only BWA sticks to the original samtools specification since Heng Li was the author of both. Here is the break down of possible mapq values for STAR. 

255 = unique mapping
3 = maps to 2 locations in the target
2 = maps to 3 locations
1 = maps to 4-9 locations
0 = maps to 10 or more locations

3 means your read mapped to only one other location, so potentially you can look and see which one out of two location makes more sense for that read for example.., but you wouldn't do this for all of your reads obviously.

You will ever get at most 5 different numbers in you column five. You can check this with sort and uniq commands as following samtools view yourBamFile.bam | cut -f 5 | sort -n | uniq -c 

I'm pretty sure that bowtie and tophat use 50 instead of 255 to mark uniquely mapped reads, everything else is the same

Cheers, 

Kirill

Alexander Dobin

unread,
May 4, 2016, 6:29:03 PM5/4/16
to rna-star
Hi Chandu,

which parameters are you using for mapping? How did you count the reads with different MAPQ?
The unique mappers (83% of all reads)  should have the MAPQ=255.  Please post a few SAM lines with  NH:i:1 attribute.

Cheers
Alex

Chilamakuri C.S. Reddy

unread,
May 5, 2016, 5:11:30 AM5/5/16
to rna-star
Hello Kirill,

Thank you so much. I appreciate your response.
My question was why STAR fail to give unique mapping quality score of 255 in the alignment. I have 70 samples and all the samples gave maximum quality score of 3, indicating there is not a single uniquely mapped read.

Best Regards
Chandu

Chilamakuri C.S. Reddy

unread,
May 5, 2016, 5:22:44 AM5/5/16
to rna-star
Hello Dobin,
Thank you for your reply.

I counted MAPQ from bam file using the following command.
samtools view file.bam | awk -F "\t" '{print $2"\t"$5}' | sort | uniq -c 

I tried with three STAR commands and I got the same results.
First two commands using 2.4.1 version and third command using version 2.5 version.

/STAR_2.4.1c/bin/Linux_x86_64_static/STAR --runThreadN 8 --genomeDir Mouse/GRCm38/STAR2.4 --readFilesIn E33.r_1.fq.gz --readFilesCommand zcat --outFileNamePrefix E33_ --outSAMmapqUnique Integer0to255 --outSAMtype BAM SortedByCoordinate --outSAMattrRGline ID:E33 PL:ILLUMINA LB:E33 SM:E33

/STAR_2.4.1c/bin/Linux_x86_64_static/STAR --runThreadN 8 --genomeDir Mouse/GRCm38/STAR2.4 --readFilesIn E33.r_1.fq.gz --readFilesCommand zcat --outFileNamePrefix E33_ --outSAMmapqUnique Integer0to255 --outSAMtype BAM SortedByCoordinate --outSAMattrRGline ID:E33 PL:ILLUMINA LB:E33 SM:E33 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLre
ad 0

/STAR/2.5/Linux_x86_64_static/STAR --runThreadN 8 --genomeDir Mouse/GRCm38/STAR2.5 --readFilesIn E33.r_1.fq.gz --readFilesCommand zcat --outFileNamePrefix E33_ --outSAMmapqUnique Integer0to255 --outSAMtype BAM SortedByCoordinate --outSAMattrRGline ID:E33 PL:ILLUMINA LB:E33 SM:E33

samtools_1.2 view file.bam | head

K00254:17:H5H5YBBXX:6:1104:19055:32437 16 MT 69 0 50M * 0 0 CAAAGGTTTGGTCCTGGCCTTATAATTAATTAGAGGTAAAATTACACATG JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJFFFAA NH:i:1 HI:i:1 AS:i:49 nM:i:0 RG:Z:E33
D00491:307:C8A68ANXX:5:2116:1832:54133 16 MT 81 0 50M * 0 0 CCTGGCCTTATAATTAATTAGAGGTAAAATTACACATGCAAACCTCCATA GGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGCGGCCCCC NH:i:1 HI:i:1 AS:i:49 nM:i:0 RG:Z:E33
D00491:307:C8A68ANXX:5:1307:6218:85686 16 MT 81 0 50M * 0 0 CCTGGCCTTATAATTAATTAGAGGTAAAATTACACATGCAAACCTCCATA GGGGGGEGFGGGGGGGGGGGGGGGGGGGGFGCGGGGGGGGFFGGG??BBA NH:i:1 HI:i:1 AS:i:49 nM:i:0 RG:Z:E33
D00491:307:C8A68ANXX:5:1311:11835:28649 16 MT 84 0 50M * 0 0 GGCCTTATAATTAATTAGAGGTAAAATTACACATGCAAACCTCCATAGAC GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC NH:i:1 HI:i:1 AS:i:49 nM:i:0 RG:Z:E33
K00254:15:H5FFKBBXX:4:1208:31243:6009 16 MT 87 0 2S48M * 0 0 TACTTATAATTAATTAGAGGTAAAATTACACATGCAAACCTCCATAGACC JJJJJJJFJJJJJJJJJJJJJJJJJJJFJJJJJJJFJFJJFJJJJFFAA< NH:i:1 HI:i:1 AS:i:47 nM:i:0 RG:Z:E33
K00254:17:H5H5YBBXX:5:1205:27681:34301 16 MT 89 0 50M * 0 0 TATAATTAATTAGAGGTAAAATTACACATGCAAACCTCCATAGACCGGTG JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA NH:i:1 HI:i:1 AS:i:49 nM:i:0 RG:Z:E33
K00254:15:H5FFKBBXX:3:2120:6340:48561 16 MT 89 0 50M * 0 0 TATAATTAATTAGAGGTAAAATTACACATGCAAACCTCCATAGACCGGTG JJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJFJFA-JJJJFFFAA NH:i:1 HI:i:1 AS:i:49 nM:i:0 RG:Z:E33
K00254:15:H5FFKBBXX:4:2115:8440:35251 16 MT 89 0 50M * 0 0 TATAATTAATTAGAGGTAAAATTACACATGCAAACCTCCATAGACCGGTG JFF<FJJJJJAF-7F-7JAFFFFFFFJAF-FF77JJFJJJJFJJFFFAAA NH:i:1 HI:i:1 AS:i:49 nM:i:0 RG:Z:E33
K00254:17:H5H5YBBXX:5:1101:3234:12023 16 MT 97 0 50M * 0 0 ATTAGAGGTAAAATTACACATGCAAACCTCCATAGACCGGTGTAAAATCC JJJJFJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA NH:i:1 HI:i:1 AS:i:49 nM:i:0 RG:Z:E33
K00254:15:H5FFKBBXX:4:1105:14346:32367 16 MT 97 0 50M * 0 0 ATTAGAGGTAAAATTACACATGCAAACCTCCATAGACCGGTGTAAAATCC JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA NH:i:1 HI:i:1 AS:i:49 nM:i:0 RG:Z:E33

Best Regards
Chandu

Alexander Dobin

unread,
May 5, 2016, 9:51:03 AM5/5/16
to rna-star
Hi Chandu,

in your command line, you have --outSAMmapqUnique Integer0to255
but you need to put the actual number, e.g.
--outSAMmapqUnique 60

Cheers
Alex

Chilamakuri C.S. Reddy

unread,
May 5, 2016, 12:01:53 PM5/5/16
to rna-star
Thank you Alex. What a silly mistake I made.
Reply all
Reply to author
Forward
0 new messages