STAR --runThreadN benefit plateau?

Joshua Mincer

May 22, 2023, 8:03:47 PM

to rna-star

Hello all,

I am putting together a workflow using the STAR aligner. I'm wondering if there are resources or publications that benchmark alignment speed and show how much benefit the --runThreadN parameter provides.

For example, if I have 32 threads available, would it be faster to run 2 samples in parallel with --runThreadN 16, 4 samples in parallel with --runThreadN 8, or 8 samples in parallel with --runThreadN 4? In essence, does adding more and more threads lead to a plateau in time saved beyond some point? I am curious about this, but before I go testing, I wanted to see if anyone else has information on it!
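For anyone who wants to run this kind of test, a minimal timing sketch might look like the following. It runs the same alignment once per thread count and prints the elapsed seconds. The `STAR_BIN` variable, the `bench_threads` function, `genome/`, and the read file names are all hypothetical placeholders, and only the STAR options already used in this thread are passed.

```shell
# STAR_BIN is a hypothetical override so the loop can be dry-run (e.g. STAR_BIN=true).
STAR_BIN="${STAR_BIN:-STAR}"

# bench_threads N...: run one alignment per thread count and print
# "threads<TAB>elapsed_seconds" for each run.
# genome/ and sample_1.fq / sample_2.fq are placeholders, not real paths.
bench_threads() {
    for n in "$@"; do
        mkdir -p "bench_t${n}"            # separate output directory per run
        start=$(date +%s)
        "$STAR_BIN" --runThreadN "$n" \
            --genomeDir genome/ \
            --readFilesIn sample_1.fq sample_2.fq \
            --outFileNamePrefix "bench_t${n}/" \
            > /dev/null 2>&1 || echo "run with $n threads failed" >&2
        end=$(date +%s)
        printf '%s\t%s\n' "$n" "$((end - start))"
    done
}
```

Calling `bench_threads 4 8 16 32` would then give one line per thread count. Later runs may benefit from files already cached by the operating system, so the numbers are only a rough guide.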

Alexander Dobin

Jun 8, 2023, 1:05:55 PM

Hi @mincej20

I have not run thorough tests myself, and have not seen anyone else do so.

For a single run, there is definitely a plateau somewhere between 10 and 30 threads.

Of course, it depends strongly on hardware and datasets.

One of the main bottlenecks is read/write disk bandwidth, which will probably not be solved by concurrent runs, unless you run them on different physical partitions.

Cheers

Alex

Dicer

Jul 10, 2023, 11:05:06 AM

Sharing some performance profiling.

Base command:

```
STAR --runMode alignReads \
    --genomeDir ref_data/human/STAR \
    --sjdbGTFfile ref_data/human/gencode.v43.primary_assembly.basic.annotation.gtf \
    --sjdbOverhang 299 \
    --quantMode GeneCounts \
    --outSAMtype BAM SortedByCoordinate \
    --readFilesIn 1.fq 2.fq \
    --outFileNamePrefix [SAMPLE]_map/
```

On a system with 128 GB RAM and 48 CPUs (12 cores x 4 threads):

`--runThreadN 42` = 9 min
`--runThreadN 26` = 10.5 min
`--runThreadN 16` = 12 min

All three runs seemed to use roughly the same amount of RAM, hovering around 45 GB.

Takeaway: going from 16 to 42 threads only cut the run time from 12 min to 9 min, so it's better to run two samples in parallel with half of your threads each than to max out one sample with all threads.
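The arithmetic behind that takeaway can be sketched as follows, using only the timings and RAM figures above. The `samples_per_hour` helper is made up for illustration, and the two-run figure is a best case that assumes concurrent runs do not slow each other down, which was not measured here.

```shell
# samples_per_hour JOBS MINUTES
# Best-case throughput when JOBS concurrent runs each finish in MINUTES.
# Assumption: no disk or memory contention between concurrent runs.
samples_per_hour() {
    awk -v jobs="$1" -v minutes="$2" 'BEGIN { printf "%.1f\n", jobs * 60 / minutes }'
}

echo "1 run  x 42 threads: $(samples_per_hour 1 9) samples/hour"
echo "2 runs x 16 threads: $(samples_per_hour 2 12) samples/hour (upper bound)"

# How many ~45 GB runs fit into 128 GB of RAM (integer division).
echo "runs that fit in RAM: $((128 / 45))"
```

This works out to about 6.7 samples/hour for one 42-thread run versus at most 10.0 for two 16-thread runs, and RAM limits this machine to 2 concurrent runs.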

Would simultaneous processes conflict with each other? e.g. overwrite each other's files?

Alexander Dobin

Jul 10, 2023, 3:27:21 PM

Simultaneous processes will not conflict if you use a distinct --outFileNamePrefix for each.

However, they may still compete for disk and RAM bandwidth, so the speed-up might be limited, but it should still be better than increasing the number of threads for a single run.
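As a rough sketch of what concurrent runs with distinct prefixes could look like, the function below starts one STAR process per sample in the background and waits for all of them. The `align_batch` function, the `STAR_BIN` override, and the `<sample>_1.fq` / `<sample>_2.fq` naming scheme are assumptions for illustration. The STAR options are taken from the base command earlier in the thread.

```shell
# STAR_BIN is a hypothetical override so the function can be dry-run (e.g. STAR_BIN=echo).
STAR_BIN="${STAR_BIN:-STAR}"

# align_batch THREADS SAMPLE...: start one STAR process per sample, then wait for all.
# Assumes reads are named <sample>_1.fq and <sample>_2.fq (made-up naming scheme).
align_batch() {
    threads="$1"; shift
    for sample in "$@"; do
        mkdir -p "${sample}_map"          # one output directory per sample
        "$STAR_BIN" --runMode alignReads \
            --runThreadN "$threads" \
            --genomeDir ref_data/human/STAR \
            --outSAMtype BAM SortedByCoordinate \
            --readFilesIn "${sample}_1.fq" "${sample}_2.fq" \
            --outFileNamePrefix "${sample}_map/" \
            > "${sample}_map/star.stdout" 2>&1 &
    done
    wait                                  # block until every background run has exited
}
```

For example, `align_batch 16 sampleA sampleB` would run two samples side by side with 16 threads each, and each run writes only under its own `<sample>_map/` directory.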
