Number of sequences in a FASTA file for STREME


Olivia Woodward

Dec 28, 2023, 7:09:14 PM
to MEME Suite Q&A
Is there a limit on the number of sequences an input file can contain? STREME works on my smaller files, each of which holds the sequences for one particular gene. As controls, however, I intend to use files containing sequences from various genes, and these files have a significantly larger number of sequences. Whenever I try to analyze the control files with STREME, the process is terminated.

cegrant

Dec 28, 2023, 7:10:08 PM
to MEME Suite Q&A
The ‘killed’ message is almost always an indication that your computer is running low on memory, and the system has killed STREME to keep it from interfering with other programs and the operating system.

You could try finding a machine with more memory, but note that throwing more sequences at STREME isn’t always the best way to get improved results. STREME needs 1-5% of the input sequences to contain instances of the motif to have a fighting chance of identifying it, so adding more sequences may just add more noise. If your data set is larger than 20-40MB, you may want to consider downsampling.
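In case it helps, here is a minimal sketch of one way to downsample a FASTA file before handing it to STREME. It is not part of the MEME Suite; the function names, the file paths and the choice of reservoir sampling are all assumptions made for illustration. It keeps a uniform random subset of `k` records in a single pass, so the full file never has to be held in memory.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, k, seed=0):
    """Pick k records uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)      # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                sample[j] = record     # replace with probability k / (i + 1)
    return sample


def downsample_fasta(in_path, out_path, k, seed=0):
    """Write k randomly chosen records from in_path to out_path (hypothetical paths)."""
    with open(in_path) as fin:
        sample = reservoir_sample(read_fasta(fin), k, seed)
    with open(out_path, "w") as fout:
        for header, seq in sample:
            fout.write(f"{header}\n{seq}\n")  # sequence is written unwrapped, on one line
    return len(sample)
```

For example, `downsample_fasta("controls.fa", "controls_small.fa", 20000)` would write 20,000 randomly chosen records to a new file; both file names and the count are placeholders. Pick `k` so that the output lands under the 20-40MB range mentioned above, and fix `seed` if you need the same subset on every run.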