Hi,
I was looking for motif enrichment with AME in a set of ~800 regions from the human genome, each 10 kb long. The top 3 motifs, with E-values ~ 1e-500, were ZN770, ZN12 and PAX5
(HOCOMOCO v11 core as the motif database and shuffled input sequences as control).
Unexpectedly, a colleague of mine found exactly the same 3 motifs, in the same order, using a completely different set of 1 kb regions (again with HOCOMOCO v11 core and shuffled input sequences as control).
I then started randomly picking sets of ~800 10 kb regions from the human genome, and EVERY time I found the same 3 motifs on top!
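For anyone who wants to reproduce the random test, here is a minimal sketch of how such a draw could be done. It is an illustration rather than my exact script: the function name `sample_regions`, the fixed seed and the three-chromosome subset of hg38 sizes are assumptions, and it does not exclude gaps or repeats.

```python
import random

def sample_regions(chrom_sizes, n, length, seed=0):
    """Draw n random fixed-length regions, weighting chromosomes by size."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # Keep only chromosomes that can hold a region of this length.
    chroms = [c for c, size in chrom_sizes.items() if size >= length]
    # Weight each chromosome by its number of valid start positions.
    weights = [chrom_sizes[c] - length + 1 for c in chroms]
    regions = []
    for _ in range(n):
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        start = rng.randrange(chrom_sizes[chrom] - length + 1)
        regions.append((chrom, start, start + length))  # 0-based, half-open (BED style)
    return regions

# Illustrative subset of hg38 chromosome sizes (assumption: a real run
# would use the full chrom.sizes file for the assembly in use).
chrom_sizes = {"chr1": 248_956_422, "chr2": 242_193_529, "chr21": 46_709_983}

regions = sample_regions(chrom_sizes, n=800, length=10_000)
for chrom, start, end in regions[:3]:
    print(f"{chrom}\t{start}\t{end}")
```

The resulting BED-style coordinates would then be converted to FASTA (for example with a tool such as bedtools getfasta) before being passed to AME.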
Does using shuffled input sequences as the control introduce a bias?
What's happening?
Moreover, if I use the JASPAR database, I always get the same 3 motifs whatever set of genomic regions I use as input. However, these JASPAR motifs are different from the HOCOMOCO ones, which never show up when using JASPAR.
Any hints?
Thank you
Davide