STREME multiple threads/reducing run time

Corbin Machatzke

May 15, 2023, 10:55:58 AM

to MEME Suite Q&A

Hello,

I am analyzing a set of Illumina sequencing data. I have three FASTA files: two contain samples with (hopefully) enriched motifs, and the third is an unbiased sequencing set used for comparison. Every sequence in all three files is exactly 30 nt long, and the files contain 0.4 M, 1.5 M and 0.9 M sequences, respectively.

I planned to run motif discovery with STREME, using a command like this:

streme --p inputfile1 --o somefolder --n referencefile --maxw 30

The computation takes over 50 hours on an i7-7700K with 32 GB of RAM (after about 50 hours the workstation crashed for an unrelated reason, so the run never finished). Is there a way to speed STREME up? I read on the STREME manual page that it uses only one thread, but I cannot find an option to add more threads.

Thank you for your help

Corbin

cegrant

May 15, 2023, 2:29:07 PM

to MEME Suite Q&A

Hi Corbin,

MEME is the only tool in the MEME Suite that does parallel processing.

Something certainly seems amiss in how long STREME is taking to analyze your data set. That should only take a couple of minutes given the size of the sequence database you are using.

Can you forward us copies of the input files? That would help us troubleshoot the problem. You should be able to attach files to a reply using the paper clip icon in the toolbar. If that fails, try mailing them to meme-...@uw.edu

Corbin Machatzke

May 16, 2023, 9:13:14 AM

to MEME Suite Q&A

Hello,

I can only attach images in this forum, so I uploaded the files to sciebo instead: https://tu-dortmund.sciebo.de/s/m45QOFI2DOtjBMY

There are three files: two input files (I wanted to run STREME once on each) and a reference file for the --n option.

Thank you for your help!

Corbin

tlawb...@gmail.com

Jun 14, 2023, 4:42:53 PM

to MEME Suite Q&A

Corbin,

Sorry that STREME is taking so long on your inputs. I'm afraid that with over 72 Mb of sequence, a very long run time is to be expected.

I tested your 72 Mb input (46 Mb of positive sequences + 26 Mb of control sequences) on my iMac (quad-core i7 at 4 GHz, 16 GB RAM), and it took 115 hours to find the first motif. This is not out of line with the timing results given at the bottom of the manual page for STREME: https://meme-suite.org/meme/doc/streme.html
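
To check how much sequence a FASTA file holds before starting a long run, something like the following could be used. It is only a minimal sketch with standard awk; the function name fasta_size is made up for illustration and is not part of the MEME Suite.

```shell
# fasta_size FILE: print "<number of sequences> <total residues>" for a FASTA file.
# Hypothetical helper (not a MEME Suite tool). Lines starting with ">" are
# counted as headers; every other line is counted as sequence.
fasta_size() {
  awk '
    /^>/ { nseq++; next }                          # one header per sequence
    { gsub(/[ \t\r]/, ""); nres += length($0) }    # ignore whitespace and CRs
    END { printf "%d %d\n", nseq, nres }
  ' "$1"
}
```

For example, fasta_size inputfile1 (the placeholder file name from the original command) prints the sequence count and the total length. As a sanity check, 1.5 M sequences of 30 nt each come to about 45 Mb, in line with the roughly 46 Mb of positive sequence quoted above.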

In cases like yours, the best approach is to subsample the input sequence datasets to make them smaller. If you have the MEME Suite installed on your computer (from source, MacPorts or Docker), you can use the fasta-subsample utility for that purpose: https://meme-suite.org/meme/doc/fasta-subsample.html
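
If fasta-subsample is not at hand, the same idea can be sketched with plain awk. The function below is a hypothetical stand-in, not the MEME Suite utility: it uses reservoir sampling to keep a fixed number of randomly chosen records, and the name subsample_fasta, the seed argument and the sample size in the comments are all assumptions made for illustration.

```shell
# subsample_fasta FILE N [SEED]: print N randomly chosen records from FILE.
# Hypothetical stand-in for fasta-subsample, using reservoir sampling
# (every record has the same chance of being kept, in a single pass).
subsample_fasta() {
  awk -v n="$2" -v seed="${3:-1}" '
    function store(   j) {                 # file the finished record, if any
      if (rec == "") return
      count++
      if (count <= n) { keep[count] = rec }        # fill the reservoir first
      else {
        j = int(rand() * count) + 1                # uniform in 1..count
        if (j <= n) keep[j] = rec                  # replace a kept record
      }
    }
    BEGIN { srand(seed) }
    /^>/ { store(); rec = $0; next }               # header starts a new record
    rec != "" { rec = rec "\n" $0 }                # sequence line(s)
    END {
      store()
      m = (count < n) ? count : n
      for (i = 1; i <= m; i++) print keep[i]
    }
  ' "$1"
}

# Example (placeholder file names from the original command; 100000 is an
# arbitrary sample size, not a recommendation):
#   subsample_fasta inputfile1 100000 1 > inputfile1.sub.fa
#   subsample_fasta referencefile 100000 1 > referencefile.sub.fa
#   streme --p inputfile1.sub.fa --n referencefile.sub.fa --o somefolder --maxw 30
```

Fixing the seed makes the subsample reproducible. For the exact options of fasta-subsample itself, the manual page linked above is the place to look.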

In the future, we hope to make STREME multi-threaded, which will make it run faster. However, its run time grows faster than linearly with the size of the input sets, as you can see from the timing results mentioned above, so there will always be a practical limit to how big an input it can handle in a reasonable amount of time.
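
One rough way to find that limit for your own data is to time STREME on two small subsamples and extrapolate. The sketch below assumes the run time follows a power law t = c * n^k, which is only an assumption made for this estimate (the manual page reports timings, not a formula). The function name estimate_runtime is hypothetical, and any numbers fed into it should be your own measurements.

```shell
# estimate_runtime N1 T1 N2 T2 N
# Fit t = c * n^k through two timed runs (N1, T1) and (N2, T2), then print
# the fitted exponent k and the predicted time for input size N.
# The power-law form is an assumption, not a documented STREME property.
estimate_runtime() {
  awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" -v n="$5" 'BEGIN {
    k = log(t2 / t1) / log(n2 / n1)        # slope on a log-log plot
    c = t1 / exp(k * log(n1))              # scale factor from the first run
    printf "k=%.2f predicted=%.1f\n", k, c * exp(k * log(n))
  }'
}
```

The sizes and times can be in any units as long as they are used consistently. An exponent k above 1 means that doubling the input more than doubles the run time, which is the behavior described above.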