Hi,
I am running experiments to understand how MEME reacts to changes in its input. These experiments seemed worthwhile once I saw that, for the same sequence A, changing the number of repetitions of a single motif from 3 to 4 completely changed the result: with 4 repetitions the motif is found, while with 3 repetitions it is not.
So I started creating my own artificial sequences to find out how MEME's efficiency depends on the parameters of the input file (M: the size of the motif model; B: the size of the background model), and where the limit lies.
I worked with a single amino acid sequence using the anr (any number of repetitions) distribution model. To evaluate the results, I recorded the E-value (e-v) and the percentage (%) of the motif found compared to the real motif.
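As an illustration only, the percentage could be computed along the following lines. This is a sketch under an assumption: it takes the percentage to mean the fraction of the real motif's positions that are covered by the site MEME reports. The function name and the example positions are hypothetical.

```python
def percent_recovered(true_start, found_start, true_width, found_width):
    """Percentage of the real motif's positions covered by the found site.

    Assumption: positions are 0-based starts on the same sequence.
    """
    true_end = true_start + true_width
    found_end = found_start + found_width
    # Length of the intersection of the two intervals (0 if they are disjoint).
    overlap = max(0, min(true_end, found_end) - max(true_start, found_start))
    return 100.0 * overlap / true_width


# Hypothetical example: a 12-residue motif at position 50,
# reported by the tool with a shift of 3 positions.
print(percent_recovered(50, 53, 12, 12))  # 75.0
```

With this definition, a site counts as found at the 60% threshold when at least 8 of the 12 real positions are covered.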
In my artificial sequence the motif appears twice. NB: if M=24, the real motif size is 12, since the motif occurs twice.
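To make the setup concrete, an artificial sequence of this kind could be generated roughly as follows. This is a minimal sketch under assumptions: the background is drawn uniformly from the 20 standard amino acids, B is taken as the number of background residues, and the motif string, function names and sequence name are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def make_sequence(motif, copies, background_len, seed=0):
    """Plant `copies` non-overlapping copies of `motif` in a random background.

    Returns the sequence and the 0-based start position of each copy.
    Assumption: uniform background; a real background would be skewed.
    """
    rng = random.Random(seed)
    # Cut the background into copies + 1 chunks at random positions.
    cuts = sorted(rng.randint(0, background_len) for _ in range(copies))
    bounds = [0] + cuts + [background_len]
    parts, starts, pos = [], [], 0
    for i in range(copies + 1):
        chunk_len = bounds[i + 1] - bounds[i]
        parts.append("".join(rng.choice(AMINO_ACIDS) for _ in range(chunk_len)))
        pos += chunk_len
        if i < copies:  # a motif copy follows every chunk except the last
            starts.append(pos)
            parts.append(motif)
            pos += len(motif)
    return "".join(parts), starts


def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


motif = "WHCKDERTYPLM"  # hypothetical 12-residue motif, so M = 2 * 12 = 24
seq, starts = make_sequence(motif, copies=2, background_len=200)
print(to_fasta("artificial_M24_B200", seq))
```

Varying `background_len` while keeping the motif fixed gives one series of inputs per value of M, and the returned start positions can be compared with the sites MEME reports.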
The results were:
For M<=24, MEME found the motif (%>=60%) as long as B was below a certain limit. Above this limit the motif was not found.
For M>24, the results became more confusing. The motif was found only for very low background levels ...
After that I studied the Expectation Maximization process but I couldn't figure out how to explain those results.
Given the complexity of the algorithm and the interdependence of symbols in amino acid sequences, I understand that my experiments were not exhaustive. Still, are there criteria that the input structure (for M and B) should satisfy to maximize MEME's efficiency?
Best regards,
Kelian