Thank you so much for getting back to me, Charles. I highly appreciate
it. I understand that MEME tries to find the minimum number of sites for
a motif and, as soon as it finds them, starts looking for other motifs.
I have tried this new command:
meme ISEQUENCES.txt -oc MEMEMOTIF -dna -revcomp -maxsites 100 -mod anr -nmotifs 50 -minw 6 -maxw 8 -bfile BG_5.model -maxsize 19810000 -p 16
Based on the manual, the default minimum number of sites for this
command should be 465, correct? That is min(5 * 93, 600), as I am
submitting only 93 sequences as my input.
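To make that arithmetic explicit, here is a minimal sketch. The function name and its parameters are hypothetical, and the formula is only the one I quoted above from my reading of the manual; I have not confirmed it against the MEME source.

```python
def default_site_bound(n_sequences, per_sequence=5, cap=600):
    """Evaluate min(per_sequence * n_sequences, cap).

    Assumption: this is the formula as I read it in the manual,
    not something taken from the MEME code itself.
    """
    return min(per_sequence * n_sequences, cap)


# 93 input sequences, as in my run.
print(default_site_bound(93))  # 465
```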
I am getting the following scenario:
--------------------------------------------------------------------------------
Motif 21 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name Strand Start P-value Site
------------- ------ ----- --------- ------
scaffold_1:2079505-20800 + 412 1.36e-04 AAACTGTTTC CGACGA TCTTTTTCGA
scaffold_1:2079505-20800 - 165 1.36e-04 GAAATTCAGA CGACGA TTGTGTATTT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 29 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name Strand Start P-value Site
------------- ------ ----- --------- ------
scaffold_17:1203250-1203 + 191 1.36e-04 AACCGAAAAA CGACGA TTGTTGGACA
scaffold_9:1776808-17773 - 327 1.36e-04 ATCCTCGTCC CGACGA TTACATCGTC
--------------------------------------------------------------------------------
My question is: if it is the same motif occurring in all these
sequences, then why doesn't MEME combine all the occurrences to make a
better consensus? I am allowing up to 100 sites with -maxsites, and
clearly there are more than 2 sites for this motif.
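One way to check how many sites there really are is to count exact matches to CGACGA (and its reverse complement, since I am using -revcomp) in the input sequences. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, it assumes the input file is in FASTA format, and exact string matching is not how MEME scores sites, so the counts are a rough sanity check rather than a prediction of MEME's output.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(seq, word):
    """Count (possibly overlapping) exact matches to word on either strand."""
    seq, word = seq.upper(), word.upper()
    # A set, so a palindromic word is not counted twice at one position.
    targets = {word, reverse_complement(word)}
    k = len(word)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Toy input: the site plus flanks from the first Motif 21 row above.
toy = [">example", "AAACTGTTTCCGACGATCTTTTTCGA"]
for name, seq in read_fasta(toy):
    print(name, count_sites(seq, "CGACGA"))
```

On the real data, the same loop could be run over an open file handle, for example `read_fasta(open("ISEQUENCES.txt"))`, again assuming that file is FASTA.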
I am confused because this behavior is not consistent across all the
motifs MEME discovers. It also gives me motifs such as this:
Sequence name Strand Start P-value Site
------------- ------ ----- --------- --------
scaffold_138:281271-2817 - 68 8.62e-06 GTGCATACAG GGAGGAGG AGACTCTCTC
scaffold_21:1032274-1032 + 245 8.62e-06 ATACAATAGC GGAGGGGG GGGGGGGGTG
scaffold_8:401504-402004 - 464 8.62e-06 TCCTTGAAGC GGAGGGGG AAAAGAAACC
scaffold_86:153800-15429 - 224 8.62e-06 TTGTGTTTTT GGAGGGGG ATGTTCTTAT
scaffold_86:153800-15429 - 128 8.62e-06 TGAAGACTAG GGAGGAGG GCTTTTAATC
scaffold_7:572197-572697 - 16 8.62e-06 GTGGGAAGTA GGAGGGGG AGTCACATGA
scaffold_17:688200-68869 + 404 8.62e-06 TGTGGTTGGT GGAGGAGG AGGCTGGCCG
scaffold_4:702324-702823 + 441 8.62e-06 CGAATAGGAA GGAGGGGG GGGGGGAACG
scaffold_1329:2586-3085 - 161 8.62e-06 CTAATAGGAA GGAGGAGG GAGGTGGGAA
scaffold_32:261278-26177 + 57 8.62e-06 GAAACAGGGT GGAGGGGG GGATACAAAA
scaffold_5:1253243-12537 + 386 8.62e-06 CGTAAGGGCA GGAGGAGG GGGGTCCAAA
scaffold_471:17995-18495 + 354 8.62e-06 AGGCGGAAGT GGAGGAGG ATTCCTTGTT
scaffold_58:309773-31027 - 212 1.20e-05 GGAAGGCCGT GGAGGCGG AAAAAGAGGA
scaffold_9:1776808-17773 + 292 1.20e-05 TCTACATATA GGAGGCGG GCGATCGGCT
scaffold_12:923707-92420 - 249 1.20e-05 TTACATGGCT GGAGGCGG TGATGGCAAA
scaffold_2:2191051-21915 - 405 1.76e-05 AGGGTTTTTT GGGGGAGG CCTCTTTATA
scaffold_7:992552-993052 + 44 1.76e-05 GCCCCGTTGA GGGGGAGG ATGTGACAAT
scaffold_37:989489-98998 + 360 1.76e-05 GGCTCTCTGT GGGGGAGG AATTTTCAAG
scaffold_21:601673-60217 - 361 1.97e-05 GGAGAGCGGC GGGGGCGG GTCAATGGCC
scaffold_5:1259883-12603 - 113 2.83e-05 TTGTGGTGGT GGTGGAGG TGGTTGTGGA
scaffold_60:721319-72181 + 14 2.83e-05 GTATTAGTAT GGTGGGGG AACTTATGGT
scaffold_86:158050-15854 + 7 2.83e-05 CATGAA GGTGGAGG GTTCCAGCTC
scaffold_3:1327506-13280 - 272 3.17e-05 GCAGTCCCAG GGTGGCGG CTCCTGTTTT
--------------------------------------------------------------------------------
How come it takes multiple occurrences into account for this motif,
GCT[GC]CTG[CG], but in the former case it splits them into two separate
motifs when it is actually the same motif? Why does it not split them
into sets of 2 occurrences here?
I hope this makes sense. I highly appreciate your answers and will be
grateful for your help.
This is an excellent resource for us, and for that I am very grateful.
Rocky