Tophat2 mapq bug when using -g 1

103 views
Skip to first unread message

Simon Andrews

unread,
Jul 1, 2014, 5:14:00 AM7/1/14
to tuxedo-to...@googlegroups.com
We've hit a bug which seems to be an interaction between the -g option and the way that mapping qualities are calculated.

The short version is that if you run tophat with -g1 then all of the hits reported are given a mapping quality of 50, even if there are multiple perfect hits with the same strength in the file.  It looks like tophat is calculating the mapq score after the output filtering is done, rather than before and is giving false confidence in hits which can actually be very poor.

As a quick way to demonstrate this, if you run this sequence:

@someid
tggctcaggggtagagcccctgcctagaatcctccagtgagggg
+
99999999999999999999999999999999999999999999

...against the mouse genome using -g1 you get:

someid  0       5       135979427       50      44M     *       0       0       TGGCTCAGGGGTAGAGCCCCTGCCTAGAATCCTCCAGTGAGGGG    99999999999999999999999999999999999999999999    AS:i:0  XN:i:0  XM:i:0   XO:i:0  XG:i:0  NM:i:0  MD:Z:44 YT:Z:UU NH:i:1

If you run it using g2 then you get:

someid  256     5       135979374       3       44M     *       0       0       TGGCTCAGGGGTAGAGCCCCTGCCTAGAATCCTCCAGTGAGGGG    99999999999999999999999999999999999999999999    AS:i:0  XN:i:0  XM:i:0   XO:i:0  XG:i:0  NM:i:0  MD:Z:44 YT:Z:UU NH:i:2  CC:Z:=  CP:i:135979427  HI:i:0
someid  0       5       135979427       3       44M     *       0       0       TGGCTCAGGGGTAGAGCCCCTGCCTAGAATCCTCCAGTGAGGGG    99999999999999999999999999999999999999999999    AS:i:0  XN:i:0  XM:i:0   XO:i:0  XG:i:0  NM:i:0  MD:Z:44 YT:Z:UU NH:i:2  HI:i:1

So you can see the exact same match in the same genome goes from a mapq of 50 down to 5.  The other fields seem to be OK, but the mapq value really shouldn't depend on the report filtering options set.

If this could be addressed then we'd be very grateful.

Thanks

Simon.
Reply all
Reply to author
Forward
0 new messages