Entropy and Quality filtering


Robert Fitak

May 13, 2016, 1:20:50 PM
to MafFilter
In MafFilter, what equation is used to calculate entropy? And how are quality scores encoded? The best description of the MAF format doesn't require a 'quality' line, but programs like LAST can output a line for each block beginning with "p", followed by Sanger-like quality scores indicating the probability that the alignment column is in error (see the bottom of http://last.cbrc.jp/doc/last-tutorial.html). The UCSC website (http://genome.ucsc.edu/FAQ/FAQformat.html#format5) also mentions a quality line beginning with "q", with values of 0-9 and F.
I would like to do some quality filtering of my alignments, but I cannot set the thresholds in MafFilter until I understand what it expects.
Thanks for your time!
Bob

Julien Yann Dutheil

Jul 12, 2016, 3:04:23 AM
to MafFilter
Oops, for some reason I had not seen this message!

Entropy is calculated based on the frequencies of each state, including N as a separate state. The Shannon entropy is used, with a log base of 5.
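To make the thresholds easier to reason about, here is a rough Python sketch of that calculation for a single alignment column: H = -sum(p_i * log5(p_i)) over the states A, C, G, T and N, so values fall between 0 (fully conserved) and 1 (all five states equally frequent). This is not MafFilter's code. The function name is made up for illustration, and skipping gaps and other characters is an assumption, since their handling is not described above.

```python
import math
from collections import Counter

def column_entropy(column, states="ACGTN", base=5):
    """Shannon entropy of one alignment column, log base 5 by default.

    N is counted as its own state. Characters outside `states`
    (e.g. gaps) are skipped; that is an assumption of this sketch.
    """
    counts = Counter(ch for ch in column.upper() if ch in states)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # nothing to count in this column
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log(p, base)
    return entropy

if __name__ == "__main__":
    for col in ("AAAA", "AACC", "ACGTN"):
        print(col, round(column_entropy(col), 4))
```

With base 5 and five states, the maximum is 1, so a threshold can be read as a fraction of the maximum possible disorder in a column.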

As for the quality scores, the UCSC format is expected.
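For reference, a small Python sketch of that encoding as I read the UCSC FAQ: each "q" line carries one character per alignment column, where 0-9 is min(floor(quality / 5), 9), F marks finished sequence, and - marks a gap. The function names are hypothetical, and mapping F to 9 is only an assumption for this example; how MafFilter itself treats F is not stated above.

```python
def maf_quality_char(phred):
    """Compress a non-negative Phred-style score to a UCSC MAF digit."""
    return str(min(phred // 5, 9))

def decode_q_line(line, finished_value=9):
    """Parse a MAF 'q' line into (source name, list of scores).

    Gap columns ('-') become None. 'F' (finished sequence) is mapped to
    `finished_value`, which is an assumption of this sketch.
    """
    fields = line.split()
    if len(fields) != 3 or fields[0] != "q":
        raise ValueError("not a MAF 'q' line: %r" % line)
    source, quals = fields[1], fields[2]
    scores = []
    for ch in quals:
        if ch == "-":
            scores.append(None)
        elif ch == "F":
            scores.append(finished_value)
        elif ch.isdigit():
            scores.append(int(ch))
        else:
            raise ValueError("unexpected quality character: %r" % ch)
    return source, scores

if __name__ == "__main__":
    print(maf_quality_char(42))
    print(decode_q_line("q panTro1.chr6 99F-0"))
```

Note that LAST's "p" lines use a different, Sanger-like encoding, so they would need converting before being used as "q" lines.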

I apologize for the delayed reply!

J.