Trinity Phase 2: Assembling Clusters of Reads running very slowly


Luke Gardner

Mar 23, 2015, 7:32:15 PM
to trinityrn...@googlegroups.com
Hi all,

I'm running Trinity 2.0.6 on a high-performance cluster, but it has been going for nearly 8 days now and still seems to have a long way to go in "Trinity Phase 2: Assembling Clusters of Reads". I was hoping someone could comment on whether this is normal or whether something is wrong. I'm trying to assemble ~210 million paired-end reads for a transcriptome. I'm limited to 2 days of run time on my HPC before the job times out. Each time it times out, I restart the exact same Trinity script and it resumes from the last completed checkpoint.

Below is my trinity script:
Trinity --normalize_reads --seqType fq --max_memory 120G --left /share/PI/bblock/Luke/raw_reads/11344_cat_forward_paired.fq --right /share/PI/bblock/Luke/raw_reads/11344_cat_reverse_paired.fq --output /share/PI/bblock/Luke/trinity --CPU 10

My log.err file shows no errors other than the HPC timeout. My latest log.out, however, shows the following:

Trinity version: v2.0.6
-currently using the latest production release of Trinity.

Monday, March 23, 2015: 06:46:17        CMD: java -Xmx64m -jar /home/lgardner/trinityrnaseq-2.0.6/util/support_scripts/ExitTester.jar 0
Monday, March 23, 2015: 06:46:17        CMD: java -Xmx64m -jar /home/lgardner/trinityrnaseq-2.0.6/util/support_scripts/ExitTester.jar 1


----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads  ---------------------
----------------------------------------------------------------------------------

---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 50 Coverage --
-- /share/PI/bblock/Luke/trinity/insilico_read_normalization --
---------------------------------------------------------------



#######################################################################
Inchworm file: /share/PI/bblock/Luke/trinity/inchworm.K25.L25.DS.fa detected.
Skipping Inchworm Step, Using Previous Inchworm Assembly
#######################################################################

--Skipping cmd: /home/lgardner/trinityrnaseq-2.0.6/util/misc/fasta_filter_by_min_length.pl /share/PI/bblock/Luke/trinity/inchworm.K25.L25.DS.fa 100 > /share/PI/bblock/Luke/trinity/chrysalis/inchworm.K25.L25.DS.fa.min100, checkpoint exists.
--Skipping cmd: bowtie-build -q /share/PI/bblock/Luke/trinity/chrysalis/inchworm.K25.L25.DS.fa.min100 /share/PI/bblock/Luke/trinity/chrysalis/inchworm.K25.L25.DS.fa.min100 2>/dev/null, checkpoint exists.
--Skipping cmd: bash -c " set -o pipefail; bowtie -a -m 20 --best --strata --threads 10  --chunkmbs 512 -q -S -f /share/PI/bblock/Luke/trinity/chrysalis/inchworm.K25.L25.DS.fa.min100 both.fa  | samtools view -@ 10 -F4 -Sb - | samtools sort -@ 10 -no - - > /share/PI/bblock/Luke/trinity/chrysalis/iworm.bowtie.nameSorted.bam"  2>/dev/null, checkpoint exists.
--Skipping cmd: /home/lgardner/trinityrnaseq-2.0.6/util/support_scripts/scaffold_iworm_contigs.pl /share/PI/bblock/Luke/trinity/chrysalis/iworm.bowtie.nameSorted.bam /share/PI/bblock/Luke/trinity/inchworm.K25.L25.DS.fa > /share/PI/bblock/Luke/trinity/chrysalis/iworm_scaffolds.txt 2>/dev/null, checkpoint exists.
--Skipping cmd: /home/lgardner/trinityrnaseq-2.0.6/Chrysalis/GraphFromFasta -i /share/PI/bblock/Luke/trinity/inchworm.K25.L25.DS.fa -r both.fa -min_contig_length 200 -min_glue 2 -glue_factor 0.05 -min_iso_ratio 0.05 -t 10 -k 24 -kk 48  -scaffolding /share/PI/bblock/Luke/trinity/chrysalis/iworm_scaffolds.txt  > /share/PI/bblock/Luke/trinity/chrysalis/GraphFromIwormFasta.out, checkpoint exists.
--Skipping cmd: /home/lgardner/trinityrnaseq-2.0.6/Chrysalis/CreateIwormFastaBundle -i /share/PI/bblock/Luke/trinity/chrysalis/GraphFromIwormFasta.out -o /share/PI/bblock/Luke/trinity/chrysalis/bundled_iworm_contigs.fasta -min 200 2>/dev/null , checkpoint exists.
--Skipping cmd: /home/lgardner/trinityrnaseq-2.0.6/Chrysalis/ReadsToTranscripts -i both.fa -f /share/PI/bblock/Luke/trinity/chrysalis/bundled_iworm_contigs.fasta -o /share/PI/bblock/Luke/trinity/chrysalis/readsToComponents.out -t 10 -max_mem_reads 10000000  2>/dev/null, checkpoint exists.
--Skipping cmd: /bin/sort -T . -S 120G -k 1,1n /share/PI/bblock/Luke/trinity/chrysalis/readsToComponents.out > /share/PI/bblock/Luke/trinity/chrysalis/readsToComponents.out.sort 2>/dev/null , checkpoint exists.


--------------------------------------------------------------------------------
------------ Trinity Phase 2: Assembling Clusters of Reads ---------------------
--------------------------------------------------------------------------------

Monday, March 23, 2015: 06:46:18        CMD: /home/lgardner/trinityrnaseq-2.0.6/trinity-plugins/parafly/bin/ParaFly -c recursive_trinity.cmds -CPU 10 -v
warning, command: /home/lgardner/trinityrnaseq-2.0.6/util/support_scripts/../../Trinity --single "/share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_0/c0.trinity.reads.fa" --output "/share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_0/c0.trinity.reads.fa.out" --CPU 1 --max_memory 1G --full_cleanup --seqType fa --trinity_complete   has successfully completed from a previous run.  Skipping it here.
warning, command: /home/lgardner/trinityrnaseq-2.0.6/util/support_scripts/../../Trinity --single "/share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_0/c1.trinity.reads.fa" --output "/share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_0/c1.trinity.reads.fa.out" --CPU 1 --max_memory 1G --full_cleanup --seqType fa --trinity_complete   has successfully completed from a previous run.  Skipping it here.
warning, command: /home/lgardner/trinityrnaseq-2.0.6/util/support_scripts/../../Trinity --single "/share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_0/c2.trinity.reads.fa" --output "/share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_0/c2.trinity.reads.fa.out" --CPU 1 --max_memory 1G --full_cleanup --seqType fa --trinity_complete   has successfully completed from a previous run.  Skipping it here.

This continues for all the completed commands until it reaches the end, which reads as follows:


warning, command: /home/lgardner/trinityrnaseq-2.0.6/util/support_scripts/../../Trinity --single "/share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_0/c59.trinity.reads.ty/read_partitions/Fb_0/CBin_0/c59.trinity.reads.fa.out" --CPU 1 --max_memory 1G --full_cleanup --seqType fa --trinity_complete   has successfully completed from a previous r
warning, command: /home/lgardner/trinityrnaseq-2.0.6/util/support_scripts/../../Trinity --single "/share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_0/c60.trinity.reads.ty/read_partitions/Fb_0/CBin_0/c60.trinity.reads.fa.out" --CPU 1 --max_memory 1G --full_cleanup --seqType fa --trinity_complete   has successfully completed from a previous r
[lgardner@sherlock-ln01 ~/trinityrnaseq-2.0.6]$ emacs Trinity_coho_assembly.sh
[lgardner@sherlock-ln01 ~/trinityrnaseq-2.0.6]$ tail COHO4.out
succeeded(6)   0.00465893% completed.    WARNING, cannot remove output directory /share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_811/c81163.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(7)   0.00543542% completed.    WARNING, cannot remove output directory /share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_811/c81164.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(8)   0.0062119% completed.    WARNING, cannot remove output directory /share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_811/c81173.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(10)   0.00776488% completed.    WARNING, cannot remove output directory /share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_811/c81172.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(11)   0.00854137% completed.    WARNING, cannot remove output directory /share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_811/c81162.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(13)   0.0100943% completed.    WARNING, cannot remove output directory /share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_811/c81170.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(15)   0.0116473% completed.    WARNING, cannot remove output directory /share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_811/c81174.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(17)   0.0132003% completed.    WARNING, cannot remove output directory /share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_811/c81147.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(21)   0.0163062% completed.    WARNING, cannot remove output directory /share/PI/bblock/Luke/trinity/read_partitions/Fb_0/CBin_811/c81146.trinity.reads.fa.out, since not created in this run. (safety precaution)
succeeded(422)   0.327678% completed.

When I look in the read_partitions directory I find around 2,100 CBin_* directories, and I've only completed 811 of them after 8 days. My concern is that with each successive restart of the Trinity command, the percentage of the ParaFly -c recursive_trinity.cmds -CPU 10 -v commands completed within the two-day window gets smaller and smaller: the first log.out indicated that a total of 20% of these commands were completed, the next restart completed only 11%, the next 3%, etc.
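The progress lines in the log above carry enough information to recover the number of commands ParaFly is measuring against, since the percentage appears to be 100 × succeeded / total. The following is a minimal sketch in Python under that assumption; the function names are hypothetical and not part of Trinity or ParaFly.

```python
import re

# Matches ParaFly progress lines such as "succeeded(422)   0.327678% completed."
PROGRESS = re.compile(r"succeeded\((\d+)\)\s+([0-9.]+)% completed")


def parse_progress(line):
    """Return (succeeded, percent) from a progress line, or None if absent."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2))


def implied_total(succeeded, percent):
    """Infer the command count, assuming percent = 100 * succeeded / total."""
    return round(100.0 * succeeded / percent)


# Last progress line from the log quoted in this message.
sample = "succeeded(422)   0.327678% completed."
done, pct = parse_progress(sample)
print(done, pct, implied_total(done, pct))
```

Applied to the last progress line of the log, this suggests a list of roughly 128,785 commands. Comparing that figure across restarts may help show whether the shrinking percentages reflect fewer commands finishing or a different denominator.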

Memory doesn't seem to be the limiting factor at this stage, as 'top' shows my 10 java commands running simultaneously and using only 5 G each.

I'm hoping someone can tell me whether this slowdown is reasonable and expected, or whether something has gone wrong.

Thanks for your help,

Luke



Brian Haas

Mar 23, 2015, 8:30:29 PM
to Luke Gardner, trinityrn...@googlegroups.com, ctat_t...@googlegroups.com
Hi Luke,

That does sound pretty odd.   I'm not sure that we've experienced a slowdown in the rate of butterfly executions over the course of a Trinity run. I'm CC'ing our performance group for comment here.

Note, you might try running it via our Trinity/Galaxy service available here:

I'd be curious to know how it runs there.

best,

~brian


--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Fulton, Ben

Mar 26, 2015, 9:21:39 PM
to Brian Haas, Luke Gardner, trinityrn...@googlegroups.com, ctat_t...@googlegroups.com

I think I’ve seen this happen before. I wonder if it has something to do with the total number of files, especially since the output directories from previous runs aren’t being removed?

 

Luke, what sort of file system does your system have? Would it be possible to add the --monitoring flag to one of your runs so we can get a profile?
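One rough way to test the file-count hypothesis is to count how many directories and files sit under the read_partitions tree. The sketch below uses only the Python standard library; `count_entries` is a hypothetical helper, and what counts as "too many" entries depends on the file system, which this does not measure.

```python
import os


def count_entries(root):
    """Return (n_dirs, n_files) found anywhere under root."""
    n_dirs = 0
    n_files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_dirs, n_files


# Example call, using the output path from Luke's command:
# print(count_entries("/share/PI/bblock/Luke/trinity/read_partitions"))
```

Running it before and after a restart would show whether the tree keeps growing while leftover output directories are not being removed.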

 

--

Ben Fulton

Research Technologies

Scientific Applications and Performance Tuning

Indiana University

E-Mail: befu...@iu.edu

Ulrike Pfreundt

Apr 16, 2015, 11:43:10 AM
to trinityrn...@googlegroups.com, bh...@broadinstitute.org, luke.ga...@gmail.com, ctat_t...@googlegroups.com
Hi,

I am encountering the same problem, if it is a problem at all. I have never used Trinity before, so I don't know whether it is normal for "Phase 2: Assembling Clusters of Reads" to run this slowly.
I am running Trinity 2.0.6 with the following command:

time /data/Uli/programs/trinityrnaseq-2.0.6/Trinity --seqType fq --max_memory 300G --left T4T8_R1_non_rRNA_concat.fastq --right T4T8_R2_non_rRNA_concat.fastq --CPU 50 --SS_lib_type FR

Each FASTQ file contains 40,957,444 reads.
I counted the CBin directories Luke mentioned and found 6,014 of them.

This is what is happening right now (Thursday, April 16th, 17:40):

Wednesday, April 15, 2015: 22:46:59     CMD: /data/Uli/programs/trinityrnaseq-2.0.6/trinity-plugins/parafly/bin/ParaFly -c recursive_trinity.cmds -CPU 50 -v
Number of Commands: 601435
succeeded(26209)   4.35774% completed.

As you can see from the time the command was started, it takes a few hours to complete each percent, so this will run forever.
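The figures above allow a back-of-the-envelope projection. This is a minimal sketch that assumes progress is linear in time and that the log timestamp and the "17:40" wall-clock time are in the same time zone; neither assumption is confirmed, and the function name is hypothetical.

```python
from datetime import datetime


def projected_total_hours(start, now, percent_done):
    """Extrapolate total run time in hours, assuming linear progress."""
    elapsed_h = (now - start).total_seconds() / 3600.0
    return elapsed_h * 100.0 / percent_done


# Start time from the ParaFly CMD line, current time and progress as reported.
start = datetime(2015, 4, 15, 22, 46, 59)
now = datetime(2015, 4, 16, 17, 40, 0)
total_h = projected_total_hours(start, now, 4.35774)
print(f"{total_h:.0f} h total, about {total_h / 24:.1f} days")
```

Under those assumptions the projection comes out at roughly 18 days for this phase. A linear extrapolation is only a rough guide, because individual commands can differ greatly in how long they take.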

As this is the first run of Trinity, there are obviously no folders from previous runs.
Can anyone tell me whether I should cancel this job and restart differently?

Thank you.
Best, Ulrike


Will Holtz

Apr 16, 2015, 12:23:22 PM
to Ulrike Pfreundt, trinityrn...@googlegroups.com, Brian Haas, luke.ga...@gmail.com, ctat_t...@googlegroups.com
Are your processes fitting within your RAM or have you started hitting your swap space on disk? If you are hitting swap, you should kill it and find some new parameters that can run without the swap space.
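On Linux, one way to answer this is to compare SwapTotal and SwapFree in /proc/meminfo. The sketch below parses text in that format; `swap_used_kib` is a hypothetical helper, and the sample values are illustrative rather than taken from any system in this thread.

```python
def swap_used_kib(meminfo_text):
    """Return swap currently in use (KiB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


# Illustrative sample only; on a real Linux host read the file instead:
#   with open("/proc/meminfo") as fh:
#       print(swap_used_kib(fh.read()))
sample = "MemTotal: 131072000 kB\nSwapTotal: 8388604 kB\nSwapFree: 8388604 kB\n"
print(swap_used_kib(sample))
```

A value that stays well above zero and keeps growing during Phase 2 would suggest the run no longer fits in RAM.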

-Will



Luke Gardner

Apr 16, 2015, 12:33:04 PM
to Ulrike Pfreundt, trinityrn...@googlegroups.com, bh...@broadinstitute.org, ctat_t...@googlegroups.com
Hi Ulrike

I found that only a handful of my CBin directories were responsible for the slowdown. After I got through them (maybe 30 or so), the remaining CBin directories completed quite rapidly and in line with the estimates offered by the developers. One thing I noticed in a quick spot check was that the CBin directories that took a long time to complete had more and larger contigs in their '*.trinity.reads.fa.out.Trinity.fasta' files, and many more paths went into generating those contigs, compared with the files in the CBin directories that took only minutes to complete. Sorry I can't be of more help, other than to say that it did finish for me but took longer than expected.
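To look for such heavy clusters ahead of time, one option is to rank the per-cluster read files by size. This is only a sketch: it assumes the `c*.trinity.reads.fa` inputs seen in the log are still on disk, and that input size is a reasonable proxy for assembly time, which the thread does not confirm. The helper name is hypothetical.

```python
from pathlib import Path


def largest_partitions(read_partitions_dir, top_n=30):
    """Return the top_n (size_in_bytes, path) pairs, largest first."""
    root = Path(read_partitions_dir)
    # Layout taken from the log: read_partitions/Fb_*/CBin_*/c*.trinity.reads.fa
    files = root.glob("Fb_*/CBin_*/c*.trinity.reads.fa")
    sized = [(p.stat().st_size, p) for p in files if p.is_file()]
    sized.sort(key=lambda pair: pair[0], reverse=True)
    return sized[:top_n]


# Example call, using the output path from the original command:
# for size, path in largest_partitions("/share/PI/bblock/Luke/trinity/read_partitions"):
#     print(size, path)
```

If the largest files cluster in a few CBin directories, that would be consistent with a small number of clusters dominating the run time.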

Ulrike Pfreundt

Apr 16, 2015, 4:05:15 PM
to Luke Gardner, who...@lygos.com, trinityrn...@googlegroups.com, bh...@broadinstitute.org, ctat_t...@googlegroups.com
Hi,
thanks for your quick answers.
At the moment no swap space is being used.
The process is at 5.37% now, so it didn't really get any faster since this afternoon.
Maybe it just takes this long...
If there are more ideas, I am happy to provide more information.

Cheers, Ulrike

Mark Chapman

Apr 16, 2015, 6:29:33 PM
to Ulrike Pfreundt, Luke Gardner, who...@lygos.com, trinityrn...@googlegroups.com, Brian Haas, ctat_t...@googlegroups.com
Hi Ulrike,

It would help some, though maybe not a lot, to run the normalisation step first (http://trinityrnaseq.github.io/trinity_insilico_normalization.html) then assemble your normalised reads.

It seems your slowness isn't related to this, but maybe it would make your prohibitively slow assembly into an annoyingly-slow-but-got-there-eventually assembly?

BW, Mark




--
Dr. Mark A. Chapman
+44 (0)2380 594396
------------------------------------
Centre for Biological Sciences
University of Southampton
Life Sciences Building 85
Highfield Campus
Southampton
SO17 1BJ

Ulrike Pfreundt

Apr 28, 2015, 5:03:22 AM
to trinityrn...@googlegroups.com, luke.ga...@gmail.com, who...@lygos.com, ctat_t...@googlegroups.com, upfr...@gmail.com, bh...@broadinstitute.org
I have now run the digital normalization as well.
I think the step that was so slow before was quicker this time.
Still, the run took a total of 11 days. The input was around 40 million paired-end sequences of 150 nt.
Maybe that's just the way it is...
:)

Best, Ulrike

Matias

Feb 3, 2017, 7:43:11 AM
to trinityrnaseq-users
Hello, I am having the same original problem of phase 2 running too slowly (approx. 1% every 2 days) with the latest version of Trinity on a local computer. I am running trinity norm; the command line is below. The data are 60 million 100 bp paired-end reads. I have never had this problem before.

Any idea on the solution?

trinityrnaseq-Trinity-v2.3.2/Trinity --seqType fq --max_memory 64G --left reads1.fastq --right reads2.fastq --CPU 8

Thanks for your help


Matias

Brian Haas

Feb 3, 2017, 8:04:33 AM
to Matias, trinityrnaseq-users
When phase 2 slows down, it's sometimes due to the file system being very slow (e.g. someone else is copying huge amounts of data around), another process using up available RAM, or something screwy happening with the multithreading.

If you run top, do you see multiple concurrent Trinity processes running commensurate with your CPU settings? Any other hints about available RAM or other competing processes consuming resources?
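As an alternative to eyeballing top, the concurrency check can be scripted by counting processes whose command line contains a chosen substring. This is a minimal sketch: the helper names are hypothetical, the sample text is illustrative, and the right substring to search for (for example "java") is an assumption that depends on what the worker processes look like on your system.

```python
import subprocess


def count_matching(ps_text, needle):
    """Count rows of `ps -eo pid,args` output whose command contains needle."""
    count = 0
    for line in ps_text.splitlines()[1:]:  # skip the header row
        _pid, _, args = line.strip().partition(" ")
        if needle in args:
            count += 1
    return count


def running_now(needle):
    """Run ps and count the matching processes on this machine."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return count_matching(result.stdout, needle)


# Illustrative ps output, not taken from the thread.
sample = "  PID COMMAND\n  101 java -jar example.jar\n  102 java -jar example.jar\n  103 bash\n"
print(count_matching(sample, "java"))
```

A count well below the --CPU value for an extended period would point toward the multithreading or scheduling side rather than slow individual commands.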

If you need to you can kill the process and restart it.  It'll pick up where it left off and hopefully pick up the pace.

-Brian
(by iPhone)
