Hello MEME Suite dev team,
First of all, great job on this very helpful suite of tools!
I have managed to run STREME on a cluster using the provided Docker container with Singularity. Everything seems to work well for us, but I am bothered by the WARNING messages that appear in all of our log files.
They look like this:
# Warning: Ignoring sequence '>chr1:180798-180803' because it is too short (5 < 8).
# Warning: Ignoring sequence '>chr1:804891-804897' because it is too short (6 < 8).
# Warning: Ignoring sequence '>chr1:844107-844112' because it is too short (5 < 8).
# Warning: Ignoring sequence '>chr1:855004-855009' because it is too short (5 < 8).
# Warning: Ignoring sequence '>chr1:858106-858112' because it is too short (6 < 8).
# Warning: Ignoring sequence '>chr1:858238-858245' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:858257-858263' because it is too short (6 < 8).
# Warning: Ignoring sequence '>chr1:865068-865075' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:865828-865835' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:869830-869837' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:869903-869910' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:904739-904745' because it is too short (6 < 8).
# Warning: Ignoring sequence '>chr1:904746-904751' because it is too short (5 < 8).
# Warning: Ignoring sequence '>chr1:905179-905186' because it is too short (7 < 8).
These are followed by hundreds of thousands of warnings like this:
# Warning: Skipping control sequence '>chr11:46680252-46680293' of length 41 because
# it causes total database length to be exceeded (400041 > 400000).
# Warning: Skipping control sequence '>chr1:15975972-15975990' of length 18 because
# it causes total database length to be exceeded (400018 > 400000).
# Warning: Skipping control sequence '>chr2:176269394-176269409' of length 15 because
# it causes total database length to be exceeded (400015 > 400000).
# Warning: Skipping control sequence '>chr1:25034373-25034395' of length 22 because
# it causes total database length to be exceeded (400022 > 400000).
# Warning: Skipping control sequence '>chr2:178194603-178194637' of length 34 because
# it causes total database length to be exceeded (400034 > 400000).
# Warning: Skipping control sequence '>chr1:8409341-8409355' of length 14 because
# it causes total database length to be exceeded (400014 > 400000).
# Warning: Skipping control sequence '>chr9:85717113-85717146' of length 33 because
# it causes total database length to be exceeded (400033 > 400000).
# Warning: Skipping control sequence '>chr17:67359968-67360003' of length 35 because
# it causes total database length to be exceeded (400035 > 400000).
# Warning: p-values will be inaccurate if primary and control
# sequences have different length distributions.
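To get a sense of scale, here is a small awk-based sketch we could use to tally each kind of warning in a log. It assumes the exact wording shown above, and `summarize_warnings` is a hypothetical helper of ours, not a MEME Suite tool. It also computes the running total implied by each "Skipping control sequence" line (the reported total minus the sequence length). For every example above this comes to exactly 400000. That suggests the control set had already reached the limit, so each remaining control sequence then triggers its own warning.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MEME Suite): tally STREME-style warnings in a
# log file by category, assuming the warning wording seen in our logs.
summarize_warnings() {
  awk '
    /^# Warning: Ignoring sequence .*too short/ { short++ }
    /^# Warning: Skipping control sequence/ {
      skipped++
      # The sequence length is the number after "of length " (10 characters).
      if (match($0, /of length [0-9]+/)) len = substr($0, RSTART + 10, RLENGTH - 10) + 0
    }
    /total database length to be exceeded/ {
      # Take N from "(N > LIMIT)" and record N minus the preceding length,
      # i.e. the running total before the skipped sequence.
      if (match($0, /\([0-9]+ >/)) {
        total = substr($0, RSTART + 1, RLENGTH - 3) + 0
        implied[total - len]++
      }
    }
    /^# Warning: p-values will be inaccurate/ { pval++ }
    END {
      printf "too_short=%d skipped_control=%d pvalue_warning=%d\n", short, skipped, pval
      for (t in implied) printf "implied_total_before_skip=%d (x%d)\n", t, implied[t]
    }
  ' "$1"
}
```

Running `summarize_warnings streme.log` (the file name is only an example) prints one line of counts, followed by one line per distinct implied total.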
The final warning, about p-values, also raises some doubts about our analysis.
My questions are:
1. Are these normal warnings, i.e. not something we need to pay much attention to?
2. Can the last warning be ignored, given that we ran STREME without a control sequence file (so the control sequences are generated internally by STREME anyway)?
3. Finally, is there a way, or any plan, to add a further level of verbosity suppression? These warnings produce huge log files that are cumbersome to work with, especially since all we want to check is the exit code of the cluster job, which is appended to the end of each log file.
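As a stopgap on our side, here is a minimal sketch of a bash wrapper. It runs a command, drops warning blocks from what is written to the log, and still returns the command's own exit status. `run_quiet` is a hypothetical name of ours. The sketch assumes that a warning block is a line starting with `# Warning:`, followed by zero or more continuation lines starting with `# `. Any other `# ` output printed immediately after a warning would be dropped as well.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of MEME Suite): run a command with stdout and
# stderr merged, filter out warning blocks, write the rest to a log file, and
# return the exit status of the wrapped command.
run_quiet() {
  local log="$1"
  shift
  "$@" 2>&1 | awk '
    /^# Warning:/ { in_warn = 1; next }   # first line of a warning: drop it
    in_warn && /^# / { next }             # continuation line of that warning
    { in_warn = 0; print }                # everything else is kept
  ' > "$log"
  # PIPESTATUS[0] is the exit status of "$@", not of awk.
  return "${PIPESTATUS[0]}"
}

# Example use (input file name is hypothetical):
# run_quiet streme_filtered.log streme --verbosity 1 --dna --p footprints.fa
# echo "streme exit code: $?"
```

If STREME writes these warnings to stderr, a simpler alternative is to redirect stderr to its own file (for example `2> streme_warnings.txt`). The job log would then keep only stdout and the exit code.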
Below is the command we run:
streme --verbosity 1 --oc /mnt/outputs/streme-io/outputs/"${out_prefix}" --dna --totallength 4000000 --minw 8 --maxw 15 --thresh 0.05 --align center --p /mnt/outputs/streme-io/inputs/"${out_prefix}"_footprints.fa