Trinity resource usage monitoring in v2.4.0


Brian Haas

Feb 10, 2017, 9:51:50 AM
to trinityrn...@googlegroups.com
Greetings all,

For those interested in exploring Trinity resource usage, we have some additional monitoring capabilities that we've made more accessible in the latest release.  You can explore this here:


best,

~brian


--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Mikhail

Feb 14, 2017, 8:00:50 PM
to trinityrnaseq-users
Hi Brian!

I am currently trying to optimize Trinity and need to check how the resources are being used, so I tried the runtime profiling system. I get really weird results (please see the attached PDF file generated by examine_resource_usage_profiling.pl). It looks like not all the relevant processes are captured.
In case they are needed, I have also attached the Trinity run output and the .dat file generated by collectl.

All the best,
Mikhail
collectl.plot.pdf
Trinity.out
collectl.dat

Brian Haas

Feb 14, 2017, 9:02:04 PM
to Mikhail, trinityrnaseq-users
Hi Mikhail,

The plot looks weird here because the whole job took only a couple of minutes to run, and the monitoring is set to poll the system just once a minute.

You can change the polling interval via the

      --monitor_sec

parameter.  For this very short job, you can set it to one second:


      --monitor_sec  1

and hopefully the plots will make more sense.
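
To illustrate why the polling interval matters for short jobs, here is a rough Python sketch. It is not part of Trinity: the function name `expected_samples` is hypothetical, and the 120-second runtime is only an assumed stand-in for "a couple of minutes".

```python
def expected_samples(runtime_sec: int, monitor_sec: int) -> int:
    """Approximate number of polls taken while a job runs.

    Assumes one poll every `monitor_sec` seconds and ignores any
    startup delay in the monitor (a simplification).
    """
    if monitor_sec <= 0:
        raise ValueError("monitor_sec must be positive")
    return runtime_sec // monitor_sec


# Hypothetical 2-minute job, polled once a minute versus once a second.
print(expected_samples(120, 60))
print(expected_samples(120, 1))
```

With once-a-minute polling, a job this short yields only a couple of data points, which is too few for a meaningful plot.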


best,


~b



--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-users+unsub...@googlegroups.com.
To post to this group, send email to trinityrnaseq-users@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

Mikhail

Feb 16, 2017, 12:13:27 PM
to trinityrnaseq-users, mik...@lji.org
That was a test run involving a very small assembly, so I created another test subset which takes a couple of hours to assemble.
This is what the resource usage looks like (attached).
What bothers me is that although I requested 24 CPUs and 100 GB of RAM, there is no point at which the resources are fully utilized. Does this mean that the assembly could potentially be sped up? Also, my cluster quota is charged for all 24 CPUs in this case  : - )) which is not cost-effective.
What if I use grid parallelization by providing a --grid_exec script? Would that give any boost in performance?
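
One rough way to quantify "never fully utilized" is to compare the sampled usage with what was requested. The Python sketch below is illustrative only: the sample values are made up (they are not taken from the attached collectl data), and the function names are hypothetical.

```python
from typing import Sequence


def mean_utilization(samples: Sequence[float], requested: float) -> float:
    """Average fraction of the requested resource that was in use."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    if not samples:
        return 0.0
    return sum(samples) / (len(samples) * requested)


def peak_utilization(samples: Sequence[float], requested: float) -> float:
    """Highest fraction of the requested resource seen in any sample."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return max(samples, default=0.0) / requested


# Made-up CPU readings (cores in use at each poll), for illustration only.
cpu_samples = [1.0, 6.0, 12.0, 6.0, 5.0]
requested_cpus = 24

print(mean_utilization(cpu_samples, requested_cpus))
print(peak_utilization(cpu_samples, requested_cpus))
```

A peak well below 1.0 would suggest the request is larger than the job can use, at least as far as the captured processes go.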
collectl.plot.pdf
Trinity_output.txt

Brian Haas

Feb 16, 2017, 4:18:50 PM
to Mikhail, trinityrnaseq-users
The grid exec should make the 2nd phase run much faster.   Also, I think there are some processes that are not being captured that should be accounted for in the reporting.  I'll look into that shortly.

~b

Mikhail

Feb 16, 2017, 4:43:49 PM
to trinityrnaseq-users, mik...@lji.org
If there are any debug runs I can do on my system, I am ready to help.

Brian Haas

Feb 16, 2017, 4:46:10 PM
to Mikhail, trinityrnaseq-users
Sounds good.  

I'd suggest cloning the 'devel' branch of Trinity, and then you can just do a git pull to update it once I've got further updates for you.


   git checkout devel


more later,

~b

Brian Haas

Feb 16, 2017, 5:39:32 PM
to Mikhail, trinityrnaseq-users
Hi Mikhail,

ParaFly is now included in the monitoring output, and most of what it's doing is running the 2nd phase trinity process, so it should reflect the number of 'active' 2nd phase trinity processes running concurrently.


I'll run a few tests on our end to see how things look with our test data sets.

more later,

~brian

Mikhail

Feb 16, 2017, 8:50:54 PM
to trinityrnaseq-users, mik...@lji.org
Great! Now it is clear that ParaFly actually fully exploits the CPU resources.

It is also clear that the memory demands of genome-guided (GG) assembly are really low and that CPU is the bottleneck. I should try grid execution, but I have a question about it: where is the number of grid jobs to be submitted specified? Is it related to the --CPU parameter?

-Mikhail
collectl.plot.pdf

Brian Haas

Feb 16, 2017, 9:12:44 PM
to Mikhail, trinityrnaseq-users

Not bad.  Mine is attached w/ --CPU 10.  It didn't quite get to 10 - mostly around 5.  I'm trying again w/ --CPU 20 for comparison.   It might be that ParaFly could use some optimization wrt the multithreading settings.


Our info on configuring execution for a compute farm (LSF, SGE, SLURM, or PBS) is here:

The --CPU settings won't matter when that's set, since the grid exec configuration will determine how jobs are batched, and your throughput on your compute farm will ultimately determine how much parallelization you can achieve.
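
As a loose illustration of what "batched" means here, the Python sketch below splits a list of commands into fixed-size groups, one group per grid job. This is not the actual grid runner logic: the function name, the `cmds_per_job` parameter, and the placeholder command strings are all hypothetical.

```python
from typing import List


def batch_commands(commands: List[str], cmds_per_job: int) -> List[List[str]]:
    """Split commands into consecutive batches of at most cmds_per_job."""
    if cmds_per_job <= 0:
        raise ValueError("cmds_per_job must be positive")
    return [
        commands[i:i + cmds_per_job]
        for i in range(0, len(commands), cmds_per_job)
    ]


# 25 placeholder commands, grouped 10 per (hypothetical) grid job.
commands = [f"phase2_cmd_{i}" for i in range(25)]
batches = batch_commands(commands, 10)

print(len(batches))
print([len(b) for b in batches])
```

In this picture the number of submitted jobs follows from the batch size and the number of commands, not from --CPU.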

~b
Screen Shot 2017-02-16 at 8.58.52 PM.png

Brian Haas

Feb 17, 2017, 3:23:37 PM
to Mikhail, trinityrnaseq-users
I made a few more updates to improve the monitoring.

Here's what the latest (next release) reports will look like:


I'll put a couple more examples together for comparison.

best,

~b


Brian Haas

Feb 18, 2017, 10:12:33 AM
to Mikhail, trinityrnaseq-users

I added a plot for running our 50 M PE mouse data set through:


There are still a few tweaks we should make to it, like putting the I/O on a log scale, but overall I think it's looking pretty good now.

best,

~brian

Mikhail

Feb 23, 2017, 1:43:47 PM
to trinityrnaseq-users, mik...@lji.org
Here are the results for the big dataset. 
(Redundant lines in log file are removed to reduce size)

Looks good. Here is what I noticed, though:
1. Very low-level processes (like touch, which, etc.) are now captured. They are not very informative.
2. In this genome-guided run, the first phase (which actually takes most of the run time) still does not seem to use all of the memory and CPU. Is there anything I can do to speed it up?
BAM file parsing seems to be a parallelizable process.

Thanks,
Mikhail 
collectl.plot.pdf
Trinity_out.txt

Brian Haas

Feb 23, 2017, 7:53:25 PM
to Mikhail, trinityrnaseq-users
Thanks, Mikhail!

The latest devel branch code for the monitoring should capture just the subset of trinity processes and not everything as shown here.  (Note, I don't recommend using the rest of the devel trinity code right now... just the monitoring code if you want it.  The other pieces there are under various stages of development).

The genome-guided setup needs to be optimized... it just hasn't been a huge priority and resources are tight.

~b

Brian Haas

Feb 25, 2017, 9:42:56 AM
to Mikhail, trinityrnaseq-users

I should be able to get the parallel genome-guided prep step integrated in the next release.

~b