Oh yes, that is actually what I mean. The more sequences I scan, the more p-values I get, and this can skew the q-value calculation if matches in low-complexity regions produce significant p-values. After checking some of the results, I noticed that many of the matched sequences are low complexity, for example A-rich regions and several types of repeats. Of the roughly 4,000 matrices I scanned with, only 30 gave significant matches, with more than 30,000 matches in total. So, on average, each of those 30 matrices is significant for around 1,000 sequences, while the other 3,970 matrices don't give good q-values.
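To make the q-value effect concrete, here is a minimal sketch of the standard Benjamini-Hochberg procedure. It is not necessarily the estimator your scanning tool uses (some tools add a Storey-style correction for the fraction of true nulls), so treat it as an illustration of the general behaviour, not as a reproduction of your numbers.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each value is the minimum over k >= rank of p_(k) * m / k, capped at 1.
    """
    m = len(pvalues)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every q-value at 1
    # walk from the largest p-value down so q-values stay monotone in p
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    hit = 1e-4  # one genuine-looking match
    # few tests: the hit stays significant
    print(bh_qvalues([hit] + [0.5] * 9)[0])
    # many extra non-significant tests: the same hit loses significance
    print(bh_qvalues([hit] + [0.5] * 10009)[0])
    # many very small p-values (e.g. low-complexity matches) ranked ahead of it
    print(bh_qvalues([1e-6] * 1000 + [hit] + [0.5] * 9)[1000])
```

With 10 tests the hit's q-value is about 0.001; adding 10,000 non-significant tests raises it to 0.5, while putting 1,000 very small p-values ahead of it pulls it down to roughly 1e-4. In other words, the composition of the whole set of p-values, including the low-complexity hits, changes the q-value of every individual match.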
For example, one matrix, V_HDAC2_06, has 27,000 matching sequences (almost 2/3 of the overall result), and it matches only A-rich sequences (for example AAAGAAAAAAAAAAA). Its PWM is below. I would expect this matrix to find matches in A-rich regions, but it seems odd to me that almost 2/3 of my results come from this one matrix.
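One way to check how many of the matches are low complexity is to compute the Shannon entropy of each matched site and flag the ones below a cut-off. This is a simple swapped-in heuristic, not DUST or any specific masking tool, and the 1.0-bit threshold is a hypothetical value you would need to tune; masking the input sequences before scanning (for example with a DUST-style masker such as NCBI's `dustmasker`) is another option.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of a DNA string; 2.0 means uniform ACGT."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical cut-off in bits; tune it on your own matches.
LOW_COMPLEXITY_BITS = 1.0


def is_low_complexity(site: str, threshold: float = LOW_COMPLEXITY_BITS) -> bool:
    """Flag a matched site whose base composition is too uniform."""
    return shannon_entropy(site) < threshold


if __name__ == "__main__":
    for site in ("AAAGAAAAAAAAAAA", "ACGTTGCAGTCAGTA"):
        print(site, round(shannon_entropy(site), 3), is_low_complexity(site))
```

The A-rich example site has an entropy of about 0.35 bits, far below the roughly 2 bits of a mixed 15-mer, so it would be flagged; counting flagged matches per matrix would show whether V_HDAC2_06's hits are dominated by such sites.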
MOTIF V_HDAC2_06 hdac2
letter-probability matrix: alength= 4 w= 15 nsites= 1 E= 0
0.533000 0.117000 0.183000 0.167000
0.600000 0.000000 0.300000 0.100000
0.383000 0.000000 0.500000 0.117000
0.566434 0.000000 0.416583 0.016983
0.550000 0.000000 0.450000 0.000000
0.966034 0.016983 0.000000 0.016983
0.750000 0.017000 0.233000 0.000000
0.316683 0.000000 0.616384 0.066933
0.750000 0.117000 0.133000 0.000000
0.883000 0.067000 0.050000 0.000000
0.767000 0.000000 0.200000 0.033000
0.533000 0.017000 0.300000 0.150000
0.450000 0.000000 0.417000 0.133000
0.616384 0.316683 0.000000 0.066933
0.800000 0.000000 0.017000 0.183000
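A quick look at the matrix explains the A-rich hits: A has probability of at least 0.3 in every column and is the most probable base in 13 of the 15 columns. The sketch below scores sites against this PWM with a plain log-odds sum; the uniform 0.25 background and the 0.01 pseudocount are assumptions, and your scanning tool may use a different background model and pseudocount.

```python
import math
from typing import List

# The V_HDAC2_06 letter-probability matrix from above (columns are A, C, G, T).
MOTIF_TEXT = """\
letter-probability matrix: alength= 4 w= 15 nsites= 1 E= 0
0.533000 0.117000 0.183000 0.167000
0.600000 0.000000 0.300000 0.100000
0.383000 0.000000 0.500000 0.117000
0.566434 0.000000 0.416583 0.016983
0.550000 0.000000 0.450000 0.000000
0.966034 0.016983 0.000000 0.016983
0.750000 0.017000 0.233000 0.000000
0.316683 0.000000 0.616384 0.066933
0.750000 0.117000 0.133000 0.000000
0.883000 0.067000 0.050000 0.000000
0.767000 0.000000 0.200000 0.033000
0.533000 0.017000 0.300000 0.150000
0.450000 0.000000 0.417000 0.133000
0.616384 0.316683 0.000000 0.066933
0.800000 0.000000 0.017000 0.183000
"""

ALPHABET = "ACGT"
BACKGROUND = 0.25   # assumption: uniform background base frequencies
PSEUDOCOUNT = 0.01  # hypothetical pseudocount to avoid log(0)


def parse_matrix(text: str) -> List[List[float]]:
    """Read the numeric rows of a MEME letter-probability matrix."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("letter-probability"):
            continue
        rows.append([float(x) for x in line.split()])
    return rows


def consensus(pwm: List[List[float]]) -> str:
    """Most probable base at each position."""
    return "".join(ALPHABET[row.index(max(row))] for row in pwm)


def log_odds_score(seq: str, pwm: List[List[float]]) -> float:
    """Sum of log2(P(base | motif) / P(base | background)) over the site."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal motif width")
    score = 0.0
    for base, row in zip(seq.upper(), pwm):
        # add the pseudocount to every letter, then renormalise the column
        p = (row[ALPHABET.index(base)] + PSEUDOCOUNT) / (1 + len(ALPHABET) * PSEUDOCOUNT)
        score += math.log2(p / BACKGROUND)
    return score


if __name__ == "__main__":
    pwm = parse_matrix(MOTIF_TEXT)
    for site in (consensus(pwm), "AAAAAAAAAAAAAAA", "AAAGAAAAAAAAAAA", "CCCCCCCCCCCCCCC"):
        print(site, round(log_odds_score(site, pwm), 2))
```

Under these assumptions a pure poly-A 15-mer scores above 85% of the best possible (consensus) score, so almost any long A-run in the input is likely to pass a p-value threshold for this matrix, which fits the large number of matches you see for V_HDAC2_06.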