meme may write maxsites as a floating-point number, which causes an error in meme_xml_to_html


Yusuke Takahashi

Mar 3, 2020, 11:23:19 PM
to MEME Suite Q&A
Dear MEME Suite Team,

meme_xml_to_html returned an error: 'maxsites has invalid value "1.34582e+06"', though I did not specify the "maxsites" parameter in meme.
In a related question ( https://groups.google.com/d/topic/meme-suite/UTIYMNt-jV0/discussion ), it is said that "maxsites" should be an integer.

I used meme with "-mod anr" (via meme-chip), in which case "maxsites" would be set to 5 times the number of primary sequences.
I had 269163 primary sequences, so "maxsites" would be 5 x 269163 = 1345815, which is close to the "1.34582e+06" in the error.
According to my resulting meme.txt, the motif finding part seems to have completed successfully.
However, both meme.txt and meme.xml report maxsites as the floating-point value "1.34582e+06".
I therefore suspect that the reporting part of meme has a bug when writing large integers.
I would appreciate it if this problem could be resolved.
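
The arithmetic above can be checked with a short sketch. It assumes, without confirmation from the MEME source, that maxsites is held as a floating-point value and written with a C-style "%g" format (6 significant digits); the helper names `format_like_c_g` and `looks_like_integer` are hypothetical and only illustrate the kind of strict integer check that would reject the value.

```python
import re

def format_like_c_g(value):
    """Format a number with a C-style %g conversion (6 significant digits)."""
    return "%g" % float(value)

def looks_like_integer(text):
    """Strict integer check of the kind a validator might apply (hypothetical)."""
    return re.fullmatch(r"[+-]?\d+", text) is not None

n_sequences = 269163          # primary sequences reported in this post
maxsites = 5 * n_sequences    # value described above for -mod anr

as_float_text = format_like_c_g(maxsites)   # assumed buggy output path
as_int_text = "%d" % maxsites               # what an integer format would give

print(as_float_text, looks_like_integer(as_float_text))  # 1.34582e+06 False
print(as_int_text, looks_like_integer(as_int_text))      # 1345815 True
```

If that assumption holds, the problem would only appear once maxsites reaches 1000000, because "%g" switches to exponent notation at that point (999999 is still printed as "999999"). That would explain why smaller data sets do not trigger the error.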

Version: MEME Suite 5.1.1 (local installation; I have confirmed that "make test" gave no errors)
Initial meme-chip command: meme-chip -oc result -seed $SEED -ccut 100 -fdesc description -bfile control_background -db $MEMEPREFIX/motif_databases/WORM/uniprobe_worm.meme -meme-mod anr -meme-minw 4 -meme-maxw 30 -meme-nmotifs 8 -meme-searchsize 100000 -meme-p 10 -dreme-e 0.05 -centrimo-score 5.0 -centrimo-ethresh 10.0 -neg MY_CONTROL MY_INPUT

Invoked meme command: meme result/seqs-centered -oc result/meme_out -mod anr -nmotifs 8 -minw 4 -maxw 30 -bfile result/control_background -dna -searchsize 100000 -p 10 -objfun de -neg result/control-centered -revcomp -nostatus

Detailed error message:

Starting meme: meme result/seqs-centered -oc result/meme_out -mod anr -nmotifs 8 -minw 4 -maxw 30 -bfile result/control_background -dna -searchsize 100000 -p 10 -objfun de -neg result/control-centered -revcomp -nostatus
Failed to write HTML output due to errors processing the XML:
maxsites has invalid value "1.34582e+06" at /home/USER/memesuite_5_1_1_install/libexec/meme-5.1.1/meme_xml_to_html line 453, <$fh> line 216.
Warning: meme_xml_to_html exited abnormally and may have failed to create HTML output.
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:



Process name: [[12841,1],0]
Exit code: 1
--------------------------------------------------------------------------
meme exited with error code 1

cegrant

Mar 21, 2020, 11:42:31 PM
to meme-...@googlegroups.com
That definitely sounds like a bug. Thanks for reporting it to us. We'll take a look and get it fixed.

However, your input parameters sound very strange for a MEME-ChIP job, in particular submitting almost 270,000 sequences. With so much data, MEME is unlikely to find much of anything. The use of '-mod anr' is very unusual too. Most MEME-ChIP jobs are run with '-mod zoops' (the default).

Are these by any chance the raw reads from a ChIP-Seq experiment? You may want to review the analysis pipeline for MEME-ChIP published in this paper:

 Nat Protoc 9, 1428–1450 (2014). https://doi.org/10.1038/nprot.2014.083

Yusuke Takahashi

Mar 25, 2020, 3:26:48 AM
to MEME Suite Q&A
Thank you for your comment.
Actually, I am using MEME-ChIP for DNA modification loci.
As you mentioned, I have noticed that keeping the MEME "maxsites" parameter much lower than the number of input sequences is crucial for finding distinct motifs (with either the Classic or the Differential Enrichment objective function).
To be honest, I don't know whether this behavior is theoretically expected, and I would like to hear your thoughts.

cegrant

Mar 28, 2020, 6:58:49 PM
to MEME Suite Q&A
Hi Yusuke,

I'm not familiar with the protocols for working with DNA modification loci, and so I'm not altogether sure that MEME-ChIP is an appropriate tool for your purposes. MEME-ChIP's primary purpose is to support the analysis of ChIP-Seq experiments and similar technologies. ChIP-Seq experiments typically produce sequences a few hundred bp long, with the central 100 bp strongly enriched for the motifs of interest. MEME-ChIP trims the input sequences to their central 100 bp before passing them on to MEME. Is that appropriate in your context? Also, in ChIP-Seq, each sequence typically contains only 1 motif site. This is why ZOOPS is the model usually chosen, and ANR would be an odd choice. How long are your sequences? Do you really expect that a single sequence will contain more than one motif site?
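
To make the trimming step concrete, here is a minimal sketch of keeping only the central 100 bp of a sequence. The function name `center_trim` is hypothetical, and the handling of short or odd-length sequences is an assumption for illustration, not a description of what MEME-ChIP actually does internally.

```python
def center_trim(seq, width=100):
    """Return the central `width` characters of seq.

    Assumption: sequences no longer than `width` are returned unchanged,
    and for odd leftovers the extra base is dropped from the right side.
    """
    if len(seq) <= width:
        return seq
    start = (len(seq) - width) // 2
    return seq[start:start + width]

# A 200 bp toy sequence whose middle 100 bp are all "C".
toy = "A" * 50 + "C" * 100 + "G" * 50
print(center_trim(toy) == "C" * 100)
```

Whether motif sites in DNA modification data are concentrated near the sequence centers in the same way is exactly the question raised above.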

As far as the effect of maxsites: yes, choosing an inappropriately large value for maxsites would be expected to give poor results. Typically motif sites are rare compared to the non-motif sites in your sequences. MEME is simultaneously trying to figure out the Position Weight Matrix (PWM) for a motif, and the sequence sites containing that motif. MEME may start with a decent guess of the motif sites, and use those to build an initial PWM. If maxsites is too big, MEME will start adding spurious sites that loosely match the initial PWM. MEME will then adjust the PWM to accommodate the new spurious sites, choose another set of (mostly spurious) sites that somewhat match the new PWM, and so on. The ideal choice for the value of maxsites would be the exact number of true motif sites. Of course, you don't know what that is, so you have to guess the largest likely number of motif sites, and choose maxsites to be less than that.
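
The dilution effect described above can be illustrated with a toy sketch. This is not MEME's algorithm: it replaces the PWM with a simple consensus match count, uses made-up sites, and performs a single selection step instead of iterating. All function names and data are hypothetical.

```python
import math
from collections import Counter

def information_content(sites):
    """Total information content in bits of the frequency matrix built from
    equal-length DNA sites (uniform background, no pseudocounts)."""
    n = len(sites)
    total = 0.0
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total

def match_score(site, consensus):
    """Number of positions where the site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))

def pick_sites(candidates, consensus, maxsites):
    """Keep the maxsites best-matching candidates (stand-in for site selection)."""
    ranked = sorted(candidates, key=lambda s: match_score(s, consensus), reverse=True)
    return ranked[:maxsites]

true_sites = ["ACGTAC"] * 4                           # made-up true motif sites
spurious = ["ACGTTT", "ACCTAC", "TCGTAG", "GCGAAC"]   # made-up loose matches
candidates = true_sites + spurious

tight = pick_sites(candidates, "ACGTAC", maxsites=4)  # matches the true count
loose = pick_sites(candidates, "ACGTAC", maxsites=8)  # twice the true count

print(information_content(tight))   # 12.0 bits: every column is fully conserved
print(information_content(loose) < information_content(tight))  # True
```

With maxsites equal to the number of true sites, the rebuilt matrix stays sharp. With a larger value, the loose matches are pulled in and the matrix loses information, which in an iterative procedure would in turn admit even weaker matches on the next round.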

The number of sequences you are trying to analyze is also concerning. Because motifs are rare, MEME is trying to find a needle in a haystack, and the more hay you have in the haystack, the less likely MEME will be to find anything. Do you really think that any significant fraction of your 270,000 sequences is likely to contain a common motif?
 