FIMO Warning: The maximum size of the heap cannot be decreased.


Raphael V

May 11, 2021, 4:37:02 AM
to MEME Suite Q&A
Hi everyone, 

I hope someone can help me out here.
I'm running command line FIMO in ubuntu server, with the following:

$ fimo --max-stored-scores 10000000000 --bfile /annotation/motifs/meme_background_hg19 --oc annotation/motifs/boxes/boxes_hg19 /annotation/motifs/boxes/boxes.meme /Genomes_ref/bowtie2/Hsapiens/hg19.fa


The output is fine. However, I keep receiving this warning:

Warning: The maximum size of the heap cannot be decreased.

Warning: Reached max stored scores (100000).

Motif matches with p-value >= 5.7e-05 have been dropped to reclaim memory.

Warning: Reached max stored scores (100000).

Motif matches with p-value >= 2.4e-05 have been dropped to reclaim memory.

Warning: Reached max stored scores (100000).

Motif matches with p-value >= 2e-05 have been dropped to reclaim memory.

 

The problem is that I specified --max-stored-scores 10000000000 precisely so that every match would be kept. Why, then, does it keep throwing the warning "Reached max stored scores (100000)"?

Is it memory? I have 500 GB of RAM available.

How can I fix this? Can anyone help me out here?

Thank you for your time!

cegrant

May 13, 2021, 12:04:13 AM
to MEME Suite Q&A
The max-stored-scores option has a limited range, and the number you requested is so large that it is interpreted as a negative number. A negative max-stored-scores isn't allowed, so FIMO falls back to the default value of 100000. This is a bug on our end: we should produce a sensible warning when the requested number is out of range. Thank you for reporting the problem; we'll work on fixing it in the next release. The maximum allowed value of max-stored-scores is 2,147,483,647, the largest 32-bit signed integer.
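
Until FIMO reports this itself, a small guard in a wrapper script can catch an out-of-range value before the run starts. The following is only a minimal sketch: the function name check_max_stored_scores is hypothetical, and the only fact taken from FIMO is the 2,147,483,647 limit stated above.

```shell
#!/bin/sh
# Largest value FIMO accepts for --max-stored-scores (32-bit signed limit).
INT32_MAX=2147483647

# Hypothetical helper: print the value if it is usable; otherwise explain why
# on stderr and return non-zero so the calling script can stop before fimo runs.
check_max_stored_scores() {
    case "$1" in
        '' | *[!0-9]*)
            echo "error: '$1' is not a non-negative integer" >&2
            return 1 ;;
    esac
    # Anything longer than 10 digits is rejected without arithmetic, so the
    # shell's own integer range is never exceeded (leading zeros are not
    # stripped, so zero-padded values are rejected as well).
    if [ "${#1}" -gt 10 ] || [ "$1" -gt "$INT32_MAX" ]; then
        echo "error: $1 exceeds the maximum of $INT32_MAX" >&2
        return 1
    fi
    echo "$1"
}
```

In a wrapper, the checked value could be captured with n=$(check_max_stored_scores 10000000000) || exit 1 and then passed on as --max-stored-scores "$n"; with the value from the original command, the script would stop with an error instead of silently using 100000.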

I don't think your command line would have done quite what you want, and you are likely to run into several headaches even on a machine with 500 GB of RAM available. First, keep in mind that by default FIMO applies a p-value threshold of 1e-4 to all matches. That is, matches whose p-value is greater than 0.0001 are ignored from the very start. As you saw, once FIMO reaches the maximum number of stored scores, it starts dropping the least significant scores to try to reclaim memory. The underlying problem is that, to compute q-values, FIMO has to hold in memory all the matches that pass the current p-value threshold. This requires a couple of hundred bytes for each match. When scanning full genomes, this quickly exceeds the available memory on common machines. On top of the RAM requirement, FIMO writes out its results in several formats, including XML, which is unfortunately quite verbose. You might end up needing a terabyte of disk space just to hold FIMO's XML output. If you are scanning with multiple motifs, the problem gets even worse.
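
The memory pressure can be estimated with simple arithmetic: stored matches times bytes per match. The sketch below assumes 200 bytes per match as a stand-in for "a couple of hundred bytes"; the real figure will vary, and the function name is hypothetical.

```shell
#!/bin/sh
# Rough RAM estimate for stored matches, in decimal megabytes (10^6 bytes).
# The default of 200 bytes per match is an assumption, not a measured value.
estimate_match_memory_mb() {
    matches=$1
    bytes_per_match=${2:-200}
    # Integer arithmetic, rounded down; needs a shell with 64-bit integers.
    echo $(( $matches * $bytes_per_match / 1000000 ))
}

estimate_match_memory_mb 100000        # default --max-stored-scores
estimate_match_memory_mb 2147483647    # largest allowed value
```

Under that assumption the default limit of 100,000 matches needs about 20 MB, while the largest allowed limit needs roughly 430 GB, which is already close to the 500 GB available here.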

Keep in mind that on the scale of a genome-wide search, almost all of the matches are going to be spurious and of no biological interest. Over the full human genome, a typical transcription factor binds to a few thousand, or maybe a few tens of thousands, of sites, but a 10 bp wide motif may have hundreds of thousands of perfect matches entirely by chance. To avoid being overwhelmed by false positives, you'll need to choose a stringent q-value threshold. But depending on the size of the motif, even a perfect match to the motif may not reach the required q-value threshold.
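
To put a rough number on this: if the sequence behaved like the background model, a fraction of scanned positions equal to the p-value threshold would pass it purely by chance. The back-of-the-envelope sketch below assumes about 3.1 billion bases for hg19, both strands scanned, and ignores the motif width and runs of N. These are approximations chosen for illustration, not FIMO output, and the function name is hypothetical.

```shell
#!/bin/sh
# Expected number of chance matches = scanned positions x p-value threshold.
# Arguments: sequence length in bp, number of strands scanned, p-value threshold.
expected_chance_matches() {
    awk -v len="$1" -v strands="$2" -v thresh="$3" \
        'BEGIN { printf "%.0f\n", len * strands * thresh }'
}

# Approximate hg19 size (assumption), both strands, FIMO's default 1e-4 threshold.
expected_chance_matches 3100000000 2 1e-4
```

Under these assumptions, about 620,000 matches per motif would be expected by chance alone at the default threshold, compared with the few thousand to few tens of thousands of sites a typical transcription factor binds.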

If for some reason you really do want all of FIMO's match scores for the full human genome, you'll need to try a command like this:

fimo --thresh 1.0 --text --bfile /annotation/motifs/meme_background_hg19 /annotation/motifs/boxes/boxes.meme /Genomes_ref/bowtie2/Hsapiens/hg19.fa > fimo.txt

This directs FIMO to record all matches regardless of their p-value. It will not compute q-values, so it doesn't have to keep all the results in memory, and it writes the results to standard output in tab-delimited format, avoiding the disk space required by XML. The good news is that this will run quite quickly. The bad news is that, without q-values, judging statistical significance will be hard. If you want to correct the p-values for multiple testing, you'll have to do your own computation.
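
One way to do that computation yourself is a Benjamini-Hochberg adjustment on the p-value column of fimo.txt. The sketch below is an illustration under several assumptions, not FIMO's own method: it expects the first line to be a tab-delimited header containing a column named exactly "p-value", it skips blank lines and lines starting with "#", it needs a sort that supports -g (GNU and BSD sort do), and it treats the number of data rows as the number of tests, which only holds when the file was produced with --thresh 1.0. The function name bh_adjust and the output column name bh_q-value are hypothetical.

```shell
#!/bin/sh
# Usage: bh_adjust fimo.txt > fimo_bh.txt
# Appends a Benjamini-Hochberg adjusted p-value as a new last column.
# The input must be a regular file, because it is read three times.
bh_adjust() {
    infile=$1
    tab=$(printf '\t')

    # Find the column whose header is exactly "p-value" (assumed header name).
    col=$(head -n 1 "$infile" |
        awk -F '\t' '{ for (i = 1; i <= NF; i++) if ($i == "p-value") print i }')
    if [ -z "$col" ]; then
        echo "error: no p-value column found in header" >&2
        return 1
    fi

    # m = number of tests = number of data rows.
    m=$(awk 'NR > 1 && !/^#/ && NF { n++ } END { print n + 0 }' "$infile")

    # Emit the header, then the rows from largest to smallest p-value.
    printf '%s\tbh_q-value\n' "$(head -n 1 "$infile")"
    awk 'NR > 1 && !/^#/ && NF' "$infile" |
        sort -t "$tab" -k "${col},${col}gr" |
        awk -F '\t' -v OFS='\t' -v m="$m" -v col="$col" '
            BEGIN { running_min = 1 }
            {
                rank = m - NR + 1          # rank in ascending p-value order
                q = $col * m / rank        # raw BH value for this row
                if (q < running_min) running_min = q
                print $0, running_min      # running minimum keeps q monotone
            }'
}
```

Because the sorting is done by sort rather than in memory, this approach can in principle handle files far larger than RAM, although sorting billions of rows will still take considerable time and temporary disk space. The rows come out ordered from least to most significant.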

Raphael V

May 19, 2021, 7:36:08 AM
to MEME Suite Q&A
Many thanks for your reply.
I need to retrieve every putative motif to compare motifs with and without TF binding.
I'll let you know if it works out.
Thanks

edit: typo error