ZN770 ALWAYS highly enriched in the genome

Davide
Jun 22, 2023, 10:27:38 AM
to MEME Suite Q&A
Hi,
I was looking for motif enrichment with AME in a set of ~800 10 kb regions from the human genome, and the top 3 motifs, with E-values of ~1e-500, were ZN770, ZN12 and PAX5
(HOCOMOCO v11 core motifs, with shuffled input sequences as the control).
Unexpectedly, a colleague of mine found exactly the same 3 motifs, in the same order, using a completely different set of 1 kb regions (again HOCOMOCO v11 core, with shuffled input sequences as the control).
I then started randomly picking ~800 10 kb regions from the human genome, and EVERY time I found the same 3 motifs on top!
Is there a bias from using shuffled input sequences as the control?
What's happening?
Moreover, if I use the JASPAR database I always get the same 3 motifs whatever set of genomic regions I use as input (although the JASPAR motifs are different from the HOCOMOCO ones, which never show up when using JASPAR).
Any hint?
Thank you

Davide
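A toy sketch of why a shuffled control can keep favouring the same homopolymer-matching motifs no matter which regions are used: shuffling keeps the base composition but tends to break long single-base runs apart, so a motif that scores well on poly-G (or poly-A) tracts can look enriched in almost any set of real genomic regions compared with their shuffled copies. This uses a plain single-base shuffle and a made-up sequence; AME's own control shuffling is, as far as I know, k-mer preserving rather than single-base, so treat this only as an illustration of the general effect. The helper names `longest_run` and `mono_shuffle` are hypothetical.

```python
import random
import re


def longest_run(seq: str, base: str) -> int:
    """Length of the longest uninterrupted run of `base` in `seq`."""
    runs = re.findall(f"{base}+", seq)
    return max((len(r) for r in runs), default=0)


def mono_shuffle(seq: str, seed: int = 0) -> str:
    """Shuffle bases, keeping only single-base composition (a simplification)."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


# Hypothetical toy sequence: mixed bases around one long poly-G tract.
seq = "ACGT" * 50 + "G" * 40 + "TGCA" * 50
shuffled = mono_shuffle(seq)

print(sorted(seq) == sorted(shuffled))  # base composition is preserved
print(longest_run(seq, "G"), longest_run(shuffled, "G"))  # run before vs after
```

In the toy sequence the longest G run is 40 before shuffling and far shorter afterwards, while the overall G content is unchanged.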

cegrant
Jun 28, 2023, 12:21:20 AM
to MEME Suite Q&A
It's difficult to say without more details. Can you forward us copies of the motif and sequence files you are using, and let us know the exact AME command line?

One possible issue to be aware of: are you screening your sequences for repeats and low-complexity regions? It looks like those motifs might score well against runs of 'G' or 'A'.
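A minimal sketch of the kind of screening suggested above, assuming the input comes from a soft-masked genome (repeats in lowercase, as in the UCSC hg38 FASTA): soft-masked bases are converted to N, and any single-base run of at least `min_len` bases is also replaced with N. The function names and the `min_len=10` threshold are hypothetical choices for illustration only; dedicated tools such as RepeatMasker or NCBI dustmasker are the usual way to find repeats and low-complexity regions, and it is worth checking how your motif tool scores N before relying on this.

```python
import re


def hardmask_softmasked(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with N."""
    return re.sub(r"[acgtn]", "N", seq)


def mask_homopolymers(seq: str, min_len: int = 10) -> str:
    """Replace every run of one base (A, C, G or T) of length >= min_len with N."""
    # ([ACGT]) captures one base; \1{k,} requires at least k more copies of it.
    pattern = re.compile(r"([ACGT])\1{" + str(min_len - 1) + ",}")
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)


def masked_fraction(seq: str) -> float:
    """Fraction of positions that are N after masking."""
    return seq.count("N") / len(seq) if seq else 0.0


# Hypothetical toy region: a poly-G tract, a soft-masked repeat and a poly-A tract.
region = "ACGTTGCA" + "G" * 12 + "acgtacgt" + "A" * 15 + "TTGCA"
masked = mask_homopolymers(hardmask_softmasked(region))
print(masked)
print(round(masked_fraction(masked), 2))
```

Reporting the masked fraction per region makes it easy to see whether a candidate set is dominated by repeats or homopolymer runs before running the enrichment analysis.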
