Guidance on Selecting Reliable Transcription Factors Using the FIMO Tool

Ouhao Lin

Jun 20, 2023, 8:56:40 AM
to MEME Suite Q&A
Dear Professor,

I hope this message finds you well.

I am writing to you regarding the use of the FIMO tool, which I understand you are highly experienced with. I have recently utilized this tool in my research and have some inquiries about the interpretation of the results.

I've had a look at the example results on the following page: https://meme-suite.org/meme/doc/examples/fimo_example_output_files/fimo.html?man_type=web. From the results, I understand that one of the key metrics provided by the FIMO tool is the p-value, and it is suggested to select results with a p-value of less than 0.0001. However, I'm uncertain about the thresholds for the 'score' and 'q-value' metrics.

In the case of the 'score' metric, I have noticed that some are negative. My current understanding is that a negative score suggests that the particular base at that specific position does not align with the expected pattern. Hence, should I consider only the positive and larger 'score' values as more reliable? Additionally, is there a generally accepted threshold for the 'score' and 'q-value'?

In summary, how should I identify the most likely matches?

Any advice or resources you could provide to help me understand these concepts better would be greatly appreciated.

Thank you in advance for your time and consideration. I am looking forward to your insightful guidance.

Best regards,
ouhao

cegrant

Jun 28, 2023, 12:14:47 AM
to MEME Suite Q&A
Hi Ouhao,

The FIMO score is the log of the ratio of the likelihood of the observed sequence assuming it is an instance of the motif to the likelihood of the sequence assuming it is described by the background model. A key point is that the raw score is not well calibrated. A negative score just means that the ratio is less than one, but depending on the width of the motif and the quality of the background model, it may still turn out to be a decent match. All you really know from the score is that, for a given motif, a larger score indicates a better match than a smaller score. It's particularly important to be aware that you can't compare scores for different motifs. This is why the p-value and q-value are important.
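
To make the definition concrete, here is a minimal sketch of a log-likelihood-ratio score. The three-position motif, the uniform background, and the use of log base 2 are all assumptions made for illustration; this is not FIMO's own code.

```python
import math

# Hypothetical 3-position motif: per-position base probabilities (not a real TF).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
# Assumed uniform background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_score(site, motif=MOTIF, background=BACKGROUND):
    """Sum over positions of log2(P(base | motif) / P(base | background))."""
    if len(site) != len(motif):
        raise ValueError("site must be as wide as the motif")
    return sum(math.log2(col[b] / background[b]) for col, b in zip(motif, site))


best = log_odds_score("AGC")   # preferred base at every position
mixed = log_odds_score("AGT")  # one disfavoured base
worst = log_odds_score("TTT")  # disfavoured base at every position
```

In this toy case `worst` comes out negative because each base is less likely under the motif than under the background, while `mixed` stays positive despite one mismatch. The numbers are only meaningful relative to other sites scored with the same motif.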

Internally, FIMO turns the possible scores into integers, and uses dynamic programming to calculate all possible scores for a given motif. It then calculates the probability distribution of the scores, and calculates a p-value for each possible score. p-values for a match can be compared across motifs. The smaller the p-value for a score, the smaller the probability that you would see a score as high or higher entirely by chance. A match whose p-value is lower than the threshold you choose is said to be statistically significant. The traditional threshold for statistical significance is 0.01, but this is an arbitrary standard, and we recommend that you use the stricter standard of 0.0001.
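
The dynamic programming idea can be sketched as follows. This is an illustration of the general technique, not FIMO's implementation: the integer score matrix and the uniform background are hypothetical, and the scaling and rounding step that produces integer scores is assumed to have happened already.

```python
from collections import defaultdict

# Hypothetical integer score matrix, one dict per motif position.
INT_SCORES = [
    {"A": 3, "C": -2, "G": -2, "T": -2},
    {"A": -2, "C": -2, "G": 3, "T": -2},
    {"A": -2, "C": 3, "G": -2, "T": -2},
]
# Assumed uniform background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def score_distribution(int_scores, background):
    """Return {total score: probability under the background}, built column by column."""
    dist = {0: 1.0}
    for column in int_scores:
        nxt = defaultdict(float)
        for total, prob in dist.items():
            for base, s in column.items():
                # Extend every partial score by each base, weighted by its background probability.
                nxt[total + s] += prob * background[base]
        dist = dict(nxt)
    return dist


def p_value(score, dist):
    """Probability of seeing a score at least this high by chance."""
    return sum(p for s, p in dist.items() if s >= score)


DIST = score_distribution(INT_SCORES, BACKGROUND)
```

Because the scores are integers, the number of distinct totals stays small, so the full distribution can be tabulated without enumerating every possible sequence of the motif's width.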

There is a problem with the p-value. Strictly speaking, it's only valid if you are doing a single test for significance. FIMO is calculating a p-value at every position in your sequences. If your sequences are long enough, you are guaranteed to get many matches with p-values that pass your significance threshold, but are entirely due to chance. This is called the multiple testing problem. The q-value is a p-value that has been corrected for multiple testing. You can interpret the q-value as the fraction of the matches you accept as significant that would actually be chance matches, if you use that q-value as the threshold for significance. The typical accepted threshold for a q-value is 0.01.
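
A small worked example may help with the multiple testing point. The sketch below first estimates how many chance matches to expect at a given p-value threshold, then applies a Benjamini-Hochberg adjustment, which is one common way to obtain false-discovery-rate values. Whether this matches FIMO's internal q-value calculation exactly is an assumption, and the sequence length and p-values are made up for illustration.

```python
def expected_chance_matches(n_positions, p_threshold):
    """Expected number of positions passing the threshold purely by chance."""
    return n_positions * p_threshold


def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q[i] = running_min
    return q


# Scanning both strands of a 1 Mb sequence is roughly 2,000,000 tests.
chance = expected_chance_matches(2_000_000, 1e-4)

# Hypothetical p-values for five candidate matches.
pvals = [0.0001, 0.03, 0.002, 0.04, 0.5]
qvals = bh_q_values(pvals)
```

With these made-up numbers, about 200 positions would pass a p-value threshold of 0.0001 by chance alone, which is why filtering on the q-value is the safer choice when scanning long sequences.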

This earlier post has links to references on q-values that you may find helpful.
