STAR --runThreadN benefit plateau?


Joshua Mincer

May 22, 2023, 8:03:47 PM
to rna-star
Hello all, 
I am putting together a workflow using the STAR aligner. I'm wondering whether there are resources or publications that benchmark alignment speed and show how much benefit the --runThreadN parameter provides as the thread count increases.

For example, if I have 32 threads available, would it be faster to run 2 samples in parallel with --runThreadN 16, 4 samples in parallel with --runThreadN 8, 8 samples in parallel with --runThreadN 4, etc.? In essence, does adding more and more threads to a single run eventually lead to a plateau in the time saved? I am curious about this, but before I start testing, I wanted to see if anyone else has information on it!
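The candidate splits in the question can be listed mechanically before timing each one. This is only a sketch: the helper name `list_splits` and the 4-thread lower bound are assumptions for illustration, not anything STAR provides.

```bash
#!/usr/bin/env bash
# Sketch: enumerate "parallel jobs x threads per job" splits of a thread budget.
# list_splits and the default minimum of 4 threads are hypothetical choices.

# list_splits TOTAL [MIN_THREADS]: print "jobs threads_per_job", doubling the
# number of jobs until each job would get fewer than MIN_THREADS threads.
list_splits() {
  local total=$1 min=${2:-4} jobs=1
  while [ $((total / jobs)) -ge "$min" ]; do
    echo "$jobs $((total / jobs))"
    jobs=$((jobs * 2))
  done
}

list_splits 32
```

For a budget of 32 threads this prints the four configurations from the question (1 x 32, 2 x 16, 4 x 8, 8 x 4), each of which would then need to be timed on the actual hardware and data.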

Alexander Dobin

Jun 8, 2023, 1:05:55 PM
to rna-star
Hi @mincej20

I have not run thorough tests myself, and have not seen anyone else do so.
For a single run, there is definitely a plateau somewhere between 10 and 30 threads.
Of course, it depends strongly on hardware and datasets.
One of the main bottlenecks is read/write disk bandwidth, which will probably not be solved by concurrent runs, unless you run them on different physical partitions.

Cheers
Alex

Dicer

Jul 10, 2023, 11:05:06 AM
to rna-star
Sharing some performance profiling.

Base command:
```
STAR --runMode alignReads \
--genomeDir ref_data/human/STAR \
--sjdbGTFfile ref_data/human/gencode.v43.primary_assembly.basic.annotation.gtf \
--sjdbOverhang 299 \
--quantMode GeneCounts \
--outSAMtype BAM SortedByCoordinate \
--readFilesIn 1.fq 2.fq \
--outFileNamePrefix [SAMPLE]_map/
```

On a system with 128 GB RAM and 48 CPUs (12 cores x 4 threads):
  • `--runThreadN 42` = 9 min
  • `--runThreadN 26` = 10.5 min
  • `--runThreadN 16` = 12 min
All three runs seemed to use roughly the same amount of RAM, hovering around 45 GB.

Takeaway: going from 16 to 42 threads only cut the runtime from 12 to 9 minutes, so it's better to run two samples in parallel with half of your threads each than to max out one sample with all threads.
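A rough way to see why, using only the timings and RAM figure above. The sketch assumes that concurrent runs do not slow each other down, which is optimistic since they share disk and memory bandwidth; the helper names `samples_per_hour` and `max_jobs_by_ram` are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: back-of-the-envelope throughput from the measured timings.
# Assumption: parallel runs do not contend for disk or memory bandwidth.

# samples_per_hour JOBS MINUTES_PER_SAMPLE
samples_per_hour() {
  awk -v jobs="$1" -v mins="$2" 'BEGIN { printf "%.1f\n", jobs * 60 / mins }'
}

# max_jobs_by_ram TOTAL_GB GB_PER_JOB: how many runs fit in memory at once.
max_jobs_by_ram() {
  echo $(( $1 / $2 ))
}

echo "1 job  x 42 threads: $(samples_per_hour 1 9) samples/hour"
echo "1 job  x 26 threads: $(samples_per_hour 1 10.5) samples/hour"
echo "2 jobs x 16 threads: $(samples_per_hour 2 12) samples/hour"
echo "jobs that fit in 128 GB at ~45 GB each: $(max_jobs_by_ram 128 45)"
```

Under that assumption, two 16-thread runs would finish about 10 samples per hour versus about 6.7 for a single 42-thread run. On this machine RAM, not CPU count, caps the number of concurrent runs at two.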

Would simultaneous processes conflict with each other? e.g. overwrite each other's files?

Alexander Dobin

Jul 10, 2023, 3:27:21 PM
to rna-star
Simultaneous processes will not conflict if you use a distinct --outFileNamePrefix for each one.
However, they may still compete for disk and RAM bandwidth, so the speed-up might be limited, but it should still be better than increasing the number of threads.
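A minimal sketch of launching two alignments side by side, each with its own --outFileNamePrefix. It uses a trimmed version of the base command from this thread. The sample names, the `<sample>_1.fq` / `<sample>_2.fq` naming, the 16-thread split, and the `DRY_RUN` switch are all assumptions for illustration; by default the script only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: one STAR process per sample, each writing under its own prefix.
# Sample names, FASTQ naming and DRY_RUN are hypothetical, not from STAR.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; 0 = actually run STAR

# run_star SAMPLE THREADS: build (and optionally run) one alignment command.
run_star() {
  local sample=$1 threads=$2
  local cmd=(STAR --runMode alignReads
    --runThreadN "$threads"
    --genomeDir ref_data/human/STAR
    --outSAMtype BAM SortedByCoordinate
    --readFilesIn "${sample}_1.fq" "${sample}_2.fq"
    --outFileNamePrefix "${sample}_map/")   # distinct prefix per sample
  if [ "$DRY_RUN" = 1 ]; then
    echo "${cmd[*]}"
  else
    mkdir -p "${sample}_map"
    "${cmd[@]}"
  fi
}

# run_parallel THREADS SAMPLE...: start every sample in the background.
run_parallel() {
  local threads=$1; shift
  local pids=() sample pid
  for sample in "$@"; do
    run_star "$sample" "$threads" &
    pids+=("$!")
  done
  # Wait on each PID so that a failed alignment makes the script fail.
  for pid in "${pids[@]}"; do
    wait "$pid"
  done
}

run_parallel 16 sampleA sampleB
```

Setting `DRY_RUN=0` would run the alignments for real. Since disk bandwidth is one of the bottlenecks mentioned earlier in the thread, pointing each prefix at a different physical disk may help, though that is untested here.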
