FIMO results for different lengths of "promoter" region

Bharata Kalbuaji

May 19, 2017, 3:32:14 AM

to MEME Suite Q&A

Hello, I have a question about FIMO results. I want to find "transcription factors" that regulate a gene of interest. I use around 4000 PWMs that I obtained from TRANSFAC Pro. I define the promoter region as the XX nucleotides upstream of the first exon (the reverse strand is already taken into account).

I calculate this position with a simple method based on the position of the first exon. The exon information comes from a gene annotation in GTF format. I then use bedtools to extract the sequences from those locations. To find a "good" promoter length, I tried several values of XX.
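The coordinate step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual script: the function name `promoter_interval` and its parameters are hypothetical, GTF coordinates are taken to be 1-based inclusive, and the output is a 0-based half-open BED interval. On the minus strand the "first" exon is the one with the largest coordinates, and upstream means towards larger coordinates.

```python
def promoter_interval(chrom, exon_start, exon_end, strand, length, chrom_size=None):
    """Return a BED-style (chrom, start, end, strand) interval upstream of the first exon.

    exon_start and exon_end are 1-based inclusive (GTF convention).
    The returned start/end are 0-based half-open (BED convention).
    All names here are hypothetical, chosen for illustration only.
    """
    if strand == "+":
        # Upstream lies at smaller coordinates; the region ends just before the exon.
        end = exon_start - 1
        start = max(0, end - length)  # clip at the start of the chromosome
    elif strand == "-":
        # Upstream lies at larger coordinates; the region starts just after the exon.
        start = exon_end
        end = start + length
        if chrom_size is not None:
            end = min(end, chrom_size)  # clip at the end of the chromosome
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return chrom, start, end, strand


# Made-up coordinates, for illustration only.
plus_region = promoter_interval("chr1", 1001, 1200, "+", 500)
minus_region = promoter_interval("chr1", 1801, 2000, "-", 500)
clipped_region = promoter_interval("chr1", 101, 300, "+", 500)
```

When such intervals are written to a BED file with the strand in the sixth column, the minus-strand sequences still need to be reverse complemented at extraction time so that every promoter reads in the direction of transcription.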

So far I have checked 500 nt and 1500 nt upstream of the first exon. I expected that the longer the sequence I scan, the more matches FIMO would report. For example, if the 500 nt region of a promoter gives 5 matches, I would expect the 1500 nt region to give more.

Surprisingly, the 1500 nt run gives far fewer results than the 500 nt run. I use a q-value threshold of 0.001.

Is there an explanation for why I get fewer results with a longer promoter? My theory is that using longer sequences results in larger p-values, which are then no longer significant and so are not included in the output. What do you think about this?

CharlesEGrant

May 23, 2017, 6:34:48 PM

to MEME Suite Q&A

> My theory is that using longer sequences results in larger p-values, which are then no longer significant and so are not included in the output. What do you think about this?

First, it would be good to check that FIMO is running successfully to completion. FIMO keeps all the matches in memory until the results are output so that it can sort them by p-value/q-value. Given that you are scanning with 4000 motifs, FIMO may be running out of memory and truncating the results. Are you seeing any warning or error messages as FIMO runs?

The p-value for a FIMO match depends only on the motif PWM, the background model, and the matching sequence. The surrounding sequence doesn't affect the p-value. The q-value is derived from the observed distribution of p-values, and so could be influenced by your choice of the upstream sequences. If the data set with longer sequences contained repeats or low-complexity sequences, this could definitely skew your results. If this seems to be the case, you might want to direct FIMO to use a p-value threshold, but then manually apply a q-value threshold to FIMO's results.
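To illustrate how a q-value can shift while the p-value of a match stays fixed, here is a minimal sketch that uses the Benjamini-Hochberg adjustment as a stand-in. That choice is an assumption: FIMO's own q-value estimation may differ in detail. The function name `bh_qvalues` and the toy p-values are made up for illustration. The same function could also be applied manually to the p-values of a run that was thresholded on p-value only.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Used here as a stand-in for a q-value; FIMO's own estimate may differ.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum
    # of p * n / rank over its own rank and all higher ranks.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Toy p-values, not real FIMO output: one strong match among many weak ones.
strong_p = 5e-7
q_short = bh_qvalues([strong_p] + [0.5] * 999)    # fewer positions scanned
q_long = bh_qvalues([strong_p] + [0.5] * 2999)    # three times as many positions

passes_short = q_short[0] < 0.001
passes_long = q_long[0] < 0.001
```

In this toy setting the strong match keeps the same p-value in both runs, but its adjusted value passes a 0.001 cutoff only in the smaller set, because the adjustment scales with the total number of p-values considered.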

Bharata Kalbuaji

May 23, 2017, 9:58:32 PM

to MEME Suite Q&A

Oh yes, that is actually what I meant. The more sequence I scan, the more p-values I get, and this can skew the q-value calculation if low-complexity regions produce significant p-values. After checking some of the results, I noticed that many matched sequences are low complexity, for example A-rich regions and several types of repeats. Of the roughly 4000 matrices I scanned, only 30 give significant matches, with more than 30,000 matches in total. So, on average, one matrix is significant for around 1000 sequences, while the other 3970 matrices don't give a good q-value.
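A check like the one described above could be sketched as follows. It is only an illustration: the function names, the motif IDs `MOTIF_A` to `MOTIF_C`, the sequences, and the 0.8 cutoff are all hypothetical, and real input would come from the motif ID and matched-sequence columns of the FIMO output.

```python
from collections import Counter


def motif_shares(motif_ids):
    """Return (motif, match count, share of all matches), largest first."""
    counts = Counter(motif_ids)
    total = sum(counts.values())
    return [(motif, n, n / total) for motif, n in counts.most_common()]


def dominant_base_fraction(seq):
    """Fraction of the sequence taken up by its most common base."""
    seq = seq.upper()
    return max(seq.count(base) for base in "ACGT") / len(seq)


# Toy data, not real FIMO output: (motif ID, matched sequence) pairs.
toy_matches = (
    [("MOTIF_A", "A" * 7 + "G" + "A" * 7)] * 6
    + [("MOTIF_B", "TGACGTCATGACGTC")] * 3
    + [("MOTIF_C", "GGGCGGGGCCAGTCA")] * 1
)

shares = motif_shares(motif for motif, _ in toy_matches)
# Flag matches where a single base makes up at least 80% of the sequence
# (an arbitrary cutoff chosen for this sketch).
low_complexity = [seq for _, seq in toy_matches if dominant_base_fraction(seq) >= 0.8]
```

Sorting by share makes it easy to see whether a handful of matrices account for most of the matches, and the base-composition check gives a rough first indication of whether those matches sit in low-complexity sequence.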

For example, the matrix V_HDAC2_06 has 27,000 matching sequences (almost 2/3 of the overall result), and it only matches A-rich sequences (for example AAAGAAAAAAAAAAA). Its PWM is shown below. I understand that this matrix will match A-rich regions, but it seems odd to me that almost 2/3 of my results come from this one matrix.

MOTIF V_HDAC2_06 hdac2
letter-probability matrix: alength= 4 w= 15 nsites= 1 E= 0
0.533000 0.117000 0.183000 0.167000
0.600000 0.000000 0.300000 0.100000
0.383000 0.000000 0.500000 0.117000
0.566434 0.000000 0.416583 0.016983
0.550000 0.000000 0.450000 0.000000
0.966034 0.016983 0.000000 0.016983
0.750000 0.017000 0.233000 0.000000
0.316683 0.000000 0.616384 0.066933
0.750000 0.117000 0.133000 0.000000
0.883000 0.067000 0.050000 0.000000
0.767000 0.000000 0.200000 0.033000
0.533000 0.017000 0.300000 0.150000
0.450000 0.000000 0.417000 0.133000
0.616384 0.316683 0.000000 0.066933
0.800000 0.000000 0.017000 0.183000
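To see why this matrix favours A-rich sequence, the rows above can be scored directly. The sketch below assumes a uniform background of 0.25 per base and a small pseudocount of 0.1; both values, and the function name `log_odds_score`, are illustrative choices and not necessarily what FIMO uses internally. The columns are taken to be in A, C, G, T order.

```python
import math

# Rows of V_HDAC2_06 copied from the post; columns are A, C, G, T.
PWM = [
    (0.533000, 0.117000, 0.183000, 0.167000),
    (0.600000, 0.000000, 0.300000, 0.100000),
    (0.383000, 0.000000, 0.500000, 0.117000),
    (0.566434, 0.000000, 0.416583, 0.016983),
    (0.550000, 0.000000, 0.450000, 0.000000),
    (0.966034, 0.016983, 0.000000, 0.016983),
    (0.750000, 0.017000, 0.233000, 0.000000),
    (0.316683, 0.000000, 0.616384, 0.066933),
    (0.750000, 0.117000, 0.133000, 0.000000),
    (0.883000, 0.067000, 0.050000, 0.000000),
    (0.767000, 0.000000, 0.200000, 0.033000),
    (0.533000, 0.017000, 0.300000, 0.150000),
    (0.450000, 0.000000, 0.417000, 0.133000),
    (0.616384, 0.316683, 0.000000, 0.066933),
    (0.800000, 0.000000, 0.017000, 0.183000),
]


def log_odds_score(seq, pwm, background=0.25, pseudocount=0.1):
    """Sum of per-position log2(p / background) for a sequence of motif width.

    The uniform background and the pseudocount are assumptions of this sketch.
    """
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal the motif width")
    score = 0.0
    for base, row in zip(seq.upper(), pwm):
        p = row["ACGT".index(base)]
        # Smooth zero probabilities so the logarithm is always defined.
        p = (p + pseudocount * background) / (1 + pseudocount)
        score += math.log2(p / background)
    return score


a_rich = "AAAG" + "A" * 11       # an A-rich 15-mer like the one quoted in the post
mixed = "ACGT" * 3 + "ACG"       # a mixed-composition 15-mer for comparison
score_a_rich = log_odds_score(a_rich, PWM)
score_mixed = log_odds_score(mixed, PWM)
```

Because A is more likely than the uniform background at every position of this matrix, any long run of A collects a positive contribution at each column, which is consistent with the large number of A-rich matches reported above.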
