Trinity long runtime for De Novo assembly

matthia...@gmx.de

Feb 18, 2021, 7:02:22 AM
to trinityrnaseq-users
Hey y'all!

I am fairly new to de novo transcriptome assembly, but my assemblies take a really long time to finish, which doesn't line up with the performance benchmarking on http://trinityrnaseq.github.io/performance/

I am running Trinity v2.11.0 through a Singularity container with 50 cores and 500GB of RAM like this:

singularity exec trinityrnaseq.v2.11.0.simg Trinity --seqType fq --max_memory 500G --samples_file sample_list.txt --CPU 50 


My input files are 9 samples (RNAseq experiments with 3 replicates of 3 different tissues), each of them with about 70M reads (paired). Reads have been preprocessed with fastp.

After 5 days Trinity reached Phase 2, and after another 8 days Phase 2 is 34% complete and currently progressing at 2-5% a day.
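For a rough sense of scale, the numbers above can be turned into an estimate of the remaining Phase 2 time. This is only a back-of-the-envelope sketch: it assumes progress stays linear at the quoted daily rate, which may not hold, and `eta_days` is a hypothetical helper name, not part of Trinity.

```shell
# eta_days PERCENT_DONE PERCENT_PER_DAY
# Prints the days remaining, assuming progress stays linear at that rate.
eta_days() {
    awk -v pct="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", (100 - pct) / rate }'
}

eta_days 34 2   # slow end of the range: prints 33.0
eta_days 34 5   # fast end of the range: prints 13.2
```

So at the current pace, Phase 2 alone would need roughly another two to five weeks.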

I have different runs from different species (all comparable regarding read number and length), which seem to also progress very slowly. 

Looking through the log, it does seem to run without errors.

Is this normal, or am I missing something?

Brian Haas

Feb 18, 2021, 8:11:53 AM
to matthia...@gmx.de, trinityrnaseq-users
hi,

Trinity definitely runs more efficiently on some systems than others. If it's not a RAM or CPU issue, the speed of the file system is a big player, and running at high CPUs with a file system that's stressed will probably reduce the overall efficiency. I tend to not run it with CPU set > 10 or 20, but it totally depends on what your system's capabilities are.
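One rough way to compare file systems before committing to a multi-day run is to time the creation and deletion of many small files in the directory that would hold the Trinity output. This is only a sketch: `small_file_bench` is a hypothetical helper, not a Trinity tool, the assumption that small-file throughput is what limits Phase 2 may not hold on every system, and `date +%s` gives only one-second resolution, so use a large file count.

```shell
# small_file_bench DIR [N]
# Creates N small files in a scratch subdirectory of DIR, removes them again,
# and prints the elapsed wall-clock time in whole seconds.
small_file_bench() {
    dir="$1"
    n="${2:-1000}"
    tmp=$(mktemp -d "$dir/fsbench.XXXXXX") || return 1
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        printf 'ACGT\n' > "$tmp/f_$i.txt"
        i=$((i + 1))
    done
    rm -rf "$tmp"
    end=$(date +%s)
    echo "$((end - start))"
}

# Example (run once per candidate file system and compare the numbers):
#   small_file_bench /path/to/output/dir 20000
#
# Following the advice above, the original command with a lower thread count
# would look like this (20 is just the upper end of the range mentioned):
#   singularity exec trinityrnaseq.v2.11.0.simg Trinity --seqType fq \
#       --max_memory 500G --samples_file sample_list.txt --CPU 20
```

If one file system takes many times longer than another for the same count, it is probably the better place to look before adding more cores.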

For fast phase-2, I use the --grid_exec functionality and spawn the commands on our internal compute farm tied to a shared networked file system.

We also have Trinity set up on Terra to leverage the cloud: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-On-Terra (but that costs $ like most any similar cloud computing system on Google or AWS).

hth,

~b 

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

matthia...@gmx.de

Feb 22, 2021, 5:30:53 AM
to trinityrnaseq-users
Thanks for the reply, Brian!

I transferred everything to a different computing environment and the job finished over the course of the weekend (even without --grid_exec), while the original is still running and is currently at 40%. So yeah, it might indeed be the system. I would never have thought it'd make that much of a difference.

Thank you for the help

- Matthias

Brian Haas

Feb 22, 2021, 7:34:31 AM
to matthia...@gmx.de, trinityrnaseq-users
great to hear.

all the best!

~brian
