We've hit a bug which seems to be an interaction between the -g option and the way that mapping qualities are calculated.
The short version is that if you run tophat with -g1 then all of the hits reported are given a mapping quality of 50, even if there are multiple perfect hits with the same strength in the file. It looks like tophat is calculating the mapq score after the output filtering is done, rather than before and is giving false confidence in hits which can actually be very poor.
As a quick way to demonstrate this, if you run this sequence:
@someid
tggctcaggggtagagcccctgcctagaatcctccagtgagggg
+
99999999999999999999999999999999999999999999
...against the mouse genome using -g1 you get:
someid 0 5 135979427 50 44M * 0 0 TGGCTCAGGGGTAGAGCCCCTGCCTAGAATCCTCCAGTGAGGGG 99999999999999999999999999999999999999999999 AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:44 YT:Z:UU NH:i:1
If you run it using g2 then you get:
someid 256 5 135979374 3 44M * 0 0 TGGCTCAGGGGTAGAGCCCCTGCCTAGAATCCTCCAGTGAGGGG 99999999999999999999999999999999999999999999 AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:44 YT:Z:UU NH:i:2 CC:Z:= CP:i:135979427 HI:i:0
someid 0 5 135979427 3 44M * 0 0 TGGCTCAGGGGTAGAGCCCCTGCCTAGAATCCTCCAGTGAGGGG 99999999999999999999999999999999999999999999 AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:44 YT:Z:UU NH:i:2 HI:i:1
So you can see the exact same match in the same genome goes from a mapq of 50 down to 5. The other fields seem to be OK, but the mapq value really shouldn't depend on the report filtering options set.
If this could be addressed then we'd be very grateful.
Thanks
Simon.