Warning in STREME output explanation

Suffi

Jul 10, 2023, 10:42:25 PM
to MEME Suite Q&A
Hello MEME Suite dev team,

First of all, great job on this very helpful suite of tools!

I have managed to run STREME on a cluster using the provided Docker container with Singularity. I think everything works well for us, but I am very bothered by this particular WARNING message in all of our log files.

They look like this:
# Warning: Ignoring sequence '>chr1:180798-180803' because it is too short (5 < 8).
# Warning: Ignoring sequence '>chr1:804891-804897' because it is too short (6 < 8).
# Warning: Ignoring sequence '>chr1:844107-844112' because it is too short (5 < 8).
# Warning: Ignoring sequence '>chr1:855004-855009' because it is too short (5 < 8).
# Warning: Ignoring sequence '>chr1:858106-858112' because it is too short (6 < 8).
# Warning: Ignoring sequence '>chr1:858238-858245' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:858257-858263' because it is too short (6 < 8).
# Warning: Ignoring sequence '>chr1:865068-865075' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:865828-865835' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:869830-869837' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:869903-869910' because it is too short (7 < 8).
# Warning: Ignoring sequence '>chr1:904739-904745' because it is too short (6 < 8).
# Warning: Ignoring sequence '>chr1:904746-904751' because it is too short (5 < 8).
# Warning: Ignoring sequence '>chr1:905179-905186' because it is too short (7 < 8).

These are followed by hundreds of thousands of warnings that look like this:

# Warning: Skipping control sequence '>chr11:46680252-46680293' of length 41 because
#          it causes total database length to be exceeded (400041 > 400000).
# Warning: Skipping control sequence '>chr1:15975972-15975990' of length 18 because
#          it causes total database length to be exceeded (400018 > 400000).
# Warning: Skipping control sequence '>chr2:176269394-176269409' of length 15 because
#          it causes total database length to be exceeded (400015 > 400000).
# Warning: Skipping control sequence '>chr1:25034373-25034395' of length 22 because
#          it causes total database length to be exceeded (400022 > 400000).
# Warning: Skipping control sequence '>chr2:178194603-178194637' of length 34 because
#          it causes total database length to be exceeded (400034 > 400000).
# Warning: Skipping control sequence '>chr1:8409341-8409355' of length 14 because
#          it causes total database length to be exceeded (400014 > 400000).
# Warning: Skipping control sequence '>chr9:85717113-85717146' of length 33 because
#          it causes total database length to be exceeded (400033 > 400000).
# Warning: Skipping control sequence '>chr17:67359968-67360003' of length 35 because
#          it causes total database length to be exceeded (400035 > 400000).
# Warning: p-values will be inaccurate if primary and control
#          sequences have different length distributions.

The final warning (the p-value message on the last two lines above) also raises some doubt about our analysis.

My questions are:

1. Are these normal warnings/not something we should pay much attention to?

2. Can the last warning be ignored, considering that we ran STREME without a control sequence file (so the control sequences are generated internally by STREME anyway)?

3. Finally, is there a way, or any plan, to provide an extra level of verbosity suppression? These warning lines generate huge log files, which I find a bit cumbersome to work with, especially when all we want to look at is the exit code of the cluster job, which is appended at the end of the log files.

Below is the command we run:

streme --verbosity 1 --oc /mnt/outputs/streme-io/outputs/"${out_prefix}" --dna --totallength 4000000 --minw 8 --maxw 15 --thresh 0.05 --align center --p /mnt/outputs/streme-io/inputs/"${out_prefix}"_footprints.fa


cegrant

Jul 16, 2023, 9:58:22 PM
to MEME Suite Q&A
The message 

# Warning: Ignoring sequence '>chr1:180798-180803' because it is too short (5 < 8).

is telling you that a number of your sequences are shorter than the shortest motif you are looking for. Your command sets '--minw 8', so STREME will not look for any motifs fewer than 8 positions wide. STREME can't make any use of those short sequences, so it ignores them. This does no harm other than wasting running time reading in those sequences and, of course, producing the annoying warning message. I'd recommend filtering out those sequences before analyzing them with STREME.
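
One way to do that filtering, as a minimal sketch: the helper below is hypothetical (the name filter_short_seqs and its arguments are assumptions, not part of the MEME Suite). It uses awk to keep only FASTA records whose sequence is at least a given length, joining sequences that are wrapped over several lines.

```shell
# Hypothetical helper, not part of the MEME Suite.
# Usage: filter_short_seqs MIN_LEN INPUT.fa > OUTPUT.fa
# Keeps only FASTA records whose sequence has at least MIN_LEN characters.
# Wrapped sequences are joined and written back on a single line.
filter_short_seqs() {
  min_len="$1"
  in_fasta="$2"
  awk -v min="$min_len" '
    # Print the record collected so far if it is long enough.
    function emit() {
      if (header != "" && length(seq) >= min) {
        print header
        print seq
      }
    }
    /^>/ { emit(); header = $0; seq = ""; next }  # new record starts
    { seq = seq $0 }                              # accumulate sequence lines
    END { emit() }                                # do not forget the last record
  ' "$in_fasta"
}
```

For the command in this thread, the minimum length would match the --minw value, for example: filter_short_seqs 8 footprints.fa > footprints.min8.fa (file names here are placeholders).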

The message

# Warning: Skipping control sequence '>chr11:46680252-46680293' of length 41 because
#          it causes total database length to be exceeded (400041 > 400000).

is coming up because you've set the '--totallength' option to 4000000. That is, you directed STREME to limit the size of the input database to a total of 400,000 bases. Apparently your input sequence file is larger than that, so STREME is ignoring sequences that go beyond that limit. You can either reduce the size of your sequence data in the input file or increase the value you pass to the '--totallength' option. The STREME options are described here.
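
To judge how far an input file is from a given limit, it may help to count its total sequence length first. This is only a rough sketch: fasta_total_length is a hypothetical helper, and it simply sums the lengths of all non-header lines, which may not match exactly how STREME accounts for sequence length internally.

```shell
# Hypothetical helper, not part of the MEME Suite.
# Usage: fasta_total_length INPUT.fa
# Prints the total number of sequence characters in a FASTA file,
# i.e. the summed length of every line that is not a '>' header.
fasta_total_length() {
  awk '!/^>/ { total += length($0) } END { print total + 0 }' "$1"
}
```

The printed number can then be compared against the value passed to --totallength.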

cegrant

Jul 16, 2023, 10:02:07 PM
to MEME Suite Q&A
I should also note that log messages from streme are going to the standard error output. You can use standard Linux shell syntax for redirecting that. For example, to totally throw away the log messages:

streme -dna -p fool.fa 2> /dev/null
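
If the warnings should be kept for later inspection while the main job log stays short, the same redirection can point at a file instead of /dev/null. The wrapper below is a sketch under that assumption; run_quiet is a hypothetical name, and it relies only on standard shell redirection, not on any STREME option.

```shell
# Hypothetical wrapper, not part of the MEME Suite.
# Usage: run_quiet STDERR_LOG COMMAND [ARGS...]
# Runs COMMAND, writes everything it sends to standard error into STDERR_LOG,
# prints a single line with the exit code, and returns that same exit code.
run_quiet() {
  err_log="$1"
  shift
  "$@" 2> "$err_log"
  exit_code=$?
  echo "exit code: $exit_code"
  return "$exit_code"
}
```

For example: run_quiet streme.stderr.log streme --dna --p input.fa (file names are placeholders). The warnings end up in streme.stderr.log, and the job log contains only the exit-code line plus whatever the command writes to standard output.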
