Tophat reports discordant sam entries for concordant pairs

667 views
Skip to first unread message

Hyunsung John Kim

unread,
Oct 31, 2013, 7:41:53 PM10/31/13
to tuxedo-to...@googlegroups.com
Hi,

I ran tophat2(v2.0.9) and bowtie2(v1.1.0) with default settings using a GTF for UCSC known genes downloaded from the UCSC genome browser. I noticed a fairly large discrepency between the align_summary.txt and samtools flagstat (see below). I have bolded the discrepancies between the alignment summaries.


> cat align_summary.txt
Left reads:
               Input:   4000000
              Mapped:   2988680 (74.7% of input)
            of these:    218271 ( 7.3%) have multiple alignments (292 have >20)
Right reads:
               Input:   4000000
              Mapped:   2916894 (72.9% of input)
            of these:    209473 ( 7.2%) have multiple alignments (12 have >20)
73.8% overall read alignment rate.

Aligned pairs:   2805475
     of these:    134377 ( 4.8%) have multiple alignments
          and:     89469 ( 3.2%) are discordant alignments
67.9% concordant pair alignment rate.

> samtools view -u -F 0x100 accepted_hits.sorted.bam | samtools flagstat -
6114505 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
6114505 + 0 mapped (100.00%:nan%)
6114505 + 0 paired in sequencing
3142876 + 0 read1
2971629 + 0 read2
405380 + 0 properly paired (6.63%:nan%)
5610950 + 0 with itself and mate mapped
503555 + 0 singletons (8.24%:nan%)
135560 + 0 with mate mapped to a different chr
45998 + 0 with mate mapped to a different chr (mapQ>=5)

To investigate what was going on, I examined a single read with a high mapping quality in the accepted_hits.bam file. It showed that this read mapped uniquely to the genome, but the forward read was mapping to chr1 while the reverse read was mapping to chr6

FORWARD_READ 129 chr1 98853 50 100M chr6 170993888 0 GTGGGGAGGGAAGAAAAAAGGAGGGGGAGAGAAAGAGAAATAAGAACCAAGTTTATTATACTGTATTCAGGGGGAAAAAATTTTCCCAAGGTCCTAACAG B@@FDFDDHHFFFHIIJGIIIBGHIJI/BAAEEHF>@CEEEEEEDDDDDDB>ADDCDDEEDED>CDDDDDDD;BBBBADDBCCDDA@>?AB:@@>:C@C> AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1
 
REVERSE_READ 65 chr6 170993888 50 100M chr1 98853 0 ACACCTGTAATCCCAGCTACTTGGGCGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCAGAAGTTGCAGTGAGTCAAGATTGCGCCAATGCACTCCA @C@FFEFFHFHHHJJJJIIJIIJJIIGGIIIJJJIIIGDGGIGHFHHHCEFFFCDEC?BDDDBBDC:AACCD3>>::@@@C@CDDC><B@BCCCCCACDA AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU NH:i:1

To investigate further, I checked the pair using blat (http://genome.ucsc.edu/cgi-bin/hgBlat?command=start). In the sequence below, 'REV_PLUS_RC_FORWARD' is the second read sequence and the reverse complement of the first sequence concatenated.

>FORWARD_READ
GTGGGGAGGGAAGAAAAAAGGAGGGGGAGAGAAAGAGAAATAAGAACCAAGTTTATTATACTGTATTCAGGGGGAAAAAATTTTCCCAAGGTCCTAACAG
>REVERSE_READ
ACACCTGTAATCCCAGCTACTTGGGCGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCAGAAGTTGCAGTGAGTCAAGATTGCGCCAATGCACTCCA
>REV_PLUS_RC_FORWARD
ACACCTGTAATCCCAGCTACTTGGGCGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCAGAAGTTGCAGTGAGTCAAGATTGCGCCAATGCACTCCACTGTTAGGACCTTGGGAAAATTTTTTCCCCCTGAATACAGTATAATAAACTTGGTTCTTATTTCTCTTTCTCTCCCCCTCCTTTTTTCTTCCCTCCCCAC

The results of the blat showed that the best hit to FORWARD_READ was chr1 and the best hit to REVERSE_READ was in chr6. However, these paired reads mapped concordantly to both chr1 and chr6. Tophat had reported the best alignment for a single read rather than the read pair! I suspect tophat was counting this pair as "concordantly mapped", yet reported a discordant mapping to the output file.

Is this working as intended?

Best Regards,
Hyunsung John Kim
PhD candidate
UC Santa Cruz

Saad Arif

unread,
Nov 22, 2013, 8:55:14 AM11/22/13
to tuxedo-to...@googlegroups.com
I'm have similar trouble. Anybody got any clues between this disrepancy in the alignment_summary and the accepted_hits.bam file?

Pierre Lindenbaum

unread,
Nov 24, 2013, 4:28:51 AM11/24/13
to tuxedo-to...@googlegroups.com
Your sam record 'FORWARD_READ 129' means:

read mapped in proper pair
    read unmapped
    mate unmapped
    read reverse strand
    mate reverse strand
    first in pair

i think it's the same problem that I had: https://groups.google.com/d/topic/tuxedo-tools-users/jstOyQ-zRgs/discussion
Reply all
Reply to author
Forward
0 new messages