Hello Everyone,
Does anyone know how FIMO calculates the matched score?
When I used FIMO to scan this sequence (ACACGTGGTA) with this specific motif:
Background letter frequencies (from file `./background'):
A 0.27670 C 0.22330 G 0.22330 T 0.27670

MOTIF 1 ACACGTGKCA-MEME-1

letter-probability matrix: alength= 4 w= 10 nsites= 600 E= 5.3e-610
0.463333  0.223333  0.21      0.103333
0.083333  0.611667  0.075     0.23
0.895     0         0.105     0
0.05      0.8       0.066667  0.083333
0.015     0.001667  0.951667  0.031667
0         0.023333  0         0.976667
0         0         1         0
0         0.045     0.425     0.53
0.098333  0.83      0.011667  0.06
0.506667  0.145     0.235     0.113333

I get this final output:
# motif_id motif_alt_id sequence_name start stop strand score p-value q-value matched_sequence
1 ACACGTGKCA-MEME-1 1 1 10 - 11.3939 6.55e-05 0.000786 ACACGTGGTA
From what I have read, the score is calculated by summing the appropriate entries from each column of the position-dependent scoring matrix, where each entry is log2(p/q):
p: normalized frequency of a specific nucleotide at a specific position in the motif (for example, for an adenine at position 1, you would use 0.463333);
q: background frequency of that nucleotide in the genome.
The score I get from this calculation is 11.40194089, which is slightly different from the one reported by FIMO (11.3939).
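For reference, here is a minimal Python sketch of the calculation as I understand it. The names `pwm`, `background`, and `log_odds_score` are my own and are not part of FIMO. It scores the sequence as written (forward orientation) and applies no pseudocounts or rounding, which may not match what FIMO does internally.

```python
import math

# Background frequencies copied from the motif file above.
background = {"A": 0.27670, "C": 0.22330, "G": 0.22330, "T": 0.27670}

# Letter-probability matrix: one row per motif position, columns in A, C, G, T order.
pwm = [
    [0.463333, 0.223333, 0.21,     0.103333],
    [0.083333, 0.611667, 0.075,    0.23],
    [0.895,    0.0,      0.105,    0.0],
    [0.05,     0.8,      0.066667, 0.083333],
    [0.015,    0.001667, 0.951667, 0.031667],
    [0.0,      0.023333, 0.0,      0.976667],
    [0.0,      0.0,      1.0,      0.0],
    [0.0,      0.045,    0.425,    0.53],
    [0.098333, 0.83,     0.011667, 0.06],
    [0.506667, 0.145,    0.235,    0.113333],
]

ALPHABET = "ACGT"


def log_odds_score(seq, pwm, background):
    """Sum log2(p/q) over the positions of seq (assumed to be as long as the motif).

    Assumption: raw probabilities are used directly, so a position with
    p == 0 raises ValueError from math.log2.
    """
    score = 0.0
    for row, base in zip(pwm, seq):
        p = row[ALPHABET.index(base)]  # motif probability of this base here
        q = background[base]           # background probability of this base
        score += math.log2(p / q)
    return score


score = log_odds_score("ACACGTGGTA", pwm, background)
print(f"{score:.4f}")
```

Run on ACACGTGGTA, this gives about 11.40, which is the value I quoted above rather than the 11.3939 that FIMO reports.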
Does anyone know the exact formula FIMO uses to calculate the score? Am I doing something wrong?
I would appreciate it if someone could help me with this.
Thanks,