LSF grid configuration


Dan Browne

Mar 17, 2015, 12:18:43 PM3/17/15
to trinityrn...@googlegroups.com
Hi all,

For utilizing a grid with the --grid_conf flag, the website says to have the following entries in the grid_configuration_file.txt:

grid=LSF
cmd=bsub -q regevlab -R "rusage[mem=10]"
mount_test=T
max_nodes=500
cmds_per_node=100

However, reading through the LSF Users Manual, I noticed that it says the -I (uppercase i) flag is required to submit an interactive job, which I assume is what Trinity is doing? Also, in my case (and maybe in others' as well) I will not be submitting to a queue. Given our 20-core NeXtScale nodes, I am thinking my command should look more like this:

cmd=bsub -I -n 20 -R "select[nxt] span[ptile=20] rusage[mem=60000]" -M 3000

I think this command template should allocate one node per job, utilizing all 20 cores within the node. The LSF system will only select nodes that have at least 60 GB of memory (they all have at least 64 GB), and the per-process enforceable memory limit is 3 GB (with potentially 20 processes running in parallel, 3 GB each would eat up a total of 60 GB).
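
Spelled out, the memory math is (a quick sanity check only, not tied to any Trinity or LSF internals):

```shell
# Sanity check of the memory math above: 20 parallel processes, each capped
# at 3000 MB, should fit within the 60 GB reserved via rusage[mem=60000].
procs=20
per_proc_mb=3000
total_mb=$((procs * per_proc_mb))
echo "worst-case total: ${total_mb} MB"
```

One caveat: the units that bsub -M and rusage[] use depend on the cluster's LSF_UNIT_FOR_LIMITS setting in lsf.conf, so it's worth confirming with your admins that -M 3000 really means 3 GB on your system.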

Any feedback on the accuracy of my thoughts here would be appreciated.

Thanks,
Dan

Brian Haas

Mar 17, 2015, 12:29:19 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Hi Dan,

This is for the massively parallel steps that require little RAM and few cores.  It might be best if you first try running the sample data through with *our* defaults and see how that goes.

The jobs should not be interactive.  In short, what the system does under the hood is to batch the jobs, write a single shell script that runs each batch while on the node, and write status info to certain files.  The primary job (the main Trinity job) checks the status of all these non-interactive jobs.
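
The batch-and-poll pattern described above can be sketched roughly as follows. All the file names here are made up for illustration (they are not Trinity's actual internals): commands are chunked into batches, each batch records its own exit status in a file, and the parent inspects the status files rather than the jobs themselves.

```shell
# Illustrative sketch of a batch-and-poll job scheme; names are hypothetical.
mkdir -p batches
printf 'echo job1\necho job2\necho job3\necho job4\n' > cmds.txt

# Chunk the command list, 2 commands per batch (cf. cmds_per_node).
split -l 2 cmds.txt batches/batch.

for b in batches/batch.??; do
    # Each batch runs its commands and records success/failure in a status
    # file; on a real cluster this loop body would be submitted with
    # "bsub <batch script>" rather than run locally.
    if bash "$b" > "$b.out"; then
        echo ok > "$b.status"
    else
        echo failed > "$b.status"
    fi
done

# Parent job: check the status files, not the jobs themselves.
if grep -q failed batches/*.status; then
    echo "some batches failed"
else
    echo "all batches ok"
fi
```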

Ideally, these individual jobs use one core and can execute with under 5G RAM, and you can get greater runtime efficiency by just increasing your parallel throughput.

best,

~brian


--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Dan Browne

Mar 17, 2015, 1:57:17 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Hey Brian,

Thanks for the quick feedback. I'm a little confused about how this is supposed to work. Let me give you a little background about how our system is configured and how we submit jobs - maybe it's different from what you've experienced?

We start by writing a job script contained in a text file. For example, the following job script (we'll call it Example.job) is one I used recently to submit a Trinity assembly job:

#BSUB -J 2015.03.08_Race.A_Trinity_Assembly
#BSUB -L /bin/bash
#BSUB -W 200:00
#BSUB -n 40
#BSUB -R "span[ptile=40]"
#BSUB -q xlarge
#BSUB -R "rusage[mem=1000000]"
#BSUB -M 800000
#BSUB -o %J_Output
#BSUB -P Race.A_Transcriptome_Assembly
#
##
module load Westmere
#
module load Trinity/2014-07-17-ictce-6.3.5-jellyfish-2.1.3-nooptarch
#
module load Bowtie/1.0.0-ictce-7.1.2
#
module load SAMtools/1.0-ictce-6.3.5
#
cd $SCRATCH/2015.03.08_Race.A_Transcriptome_Assembly
#
Trinity --seqType fq --JM 500G --single /scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.03.fastq,/scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.05.fastq,/scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.08.fastq,/scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.13.fastq,/scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.22.fastq --run_as_paired --CPU 40 --min_contig_length 200 --normalize_reads --grid_conf ~dbrowne/B.braunii_race.A_Transcriptome_Assembly/2015.03.08_Race.A_Trinity_Test_Assembly/Trinity_Grid_Configuration.txt --trimmomatic --quality_trimming_params "SLIDINGWINDOW:10:25  MINLEN:100" --output 2015.03.08.Race.A_trinity_assembly
#

I would then initiate the job by doing the following:

$ bsub < Example.job

Is this sort of setup compatible with the way Trinity handles the LSF job submission system? Maybe I would be better off not trying to take advantage of the grid utilization capability of Trinity?

Thanks,
Dan


Brian Haas

Mar 17, 2015, 2:04:42 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Hi Dan,

I think it should be fine, as long as your initial Trinity job can then run 'bsub' to submit additional jobs.

Try it out on the small sample data set and see if it works.  If it doesn't, there's probably a little bit of tuning to do to make it work, and we can hopefully help you through it.

best,

~brian


Dan Browne

Mar 17, 2015, 2:25:00 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Okay - please forgive my ignorance, but I need a little guidance on how to run this test.

In the sample_data/ directory, I see the following:

[XXX dbrowne]$ cd XXX/Trinity/trinityrnaseq-2.0.6-westmere/sample_data/
[XXX dbrowne]$ ls
Makefile                           test_edgeR_diff_expr/              test_GOSeq_trinotate_pipe/         test_Trinity_Assembly/
__regression_tests/                test_full_edgeR_pipeline/          test_Inchworm/                     test_Trinity_Coding_Extraction/
test_align_and_estimate_abundance/ test_GenomeGuidedTrinity/          test_InSilicoReadNormalization/


Where should I go from here? I see in the test_Trinity_Assembly directory that there is some sample data, e.g. reads2.left.fq.gz and reads2.right.fq.gz.

Shall I construct a job script to run Trinity with the --grid_conf flag and using the above mentioned data?

Finally, what exactly is the default option for the grid_configuration_file.txt? Is there an example somewhere in the sample_data directory?

Thanks,
Dan

Dan Browne

Mar 17, 2015, 2:28:24 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Oh sorry, I guess I should've searched a little more before posting that! I see in the misc_run_tests/ directory that there is a file called __runMe_using_Grid_LSF.sh

I'll run that real quick and see what happens!

Dan

Brian Haas

Mar 17, 2015, 2:36:44 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Right - just change the --grid_conf to point to your grid.conf file with your queue name.

best,

~b

Dan Browne

Mar 17, 2015, 3:28:44 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Alright, I got it working on the sample data! The default command pretty much works fine. I also tried the following command:

cmd=bsub -n 20 -R "select[nxt] span[ptile=20] rusage[mem=60000]" -M 5000

And that worked as well!

I've attached a file of the stdout I got with the default command (with modified queue specification), just in case you're interested.

Thanks again for the quick help. Now I can make this work a lot faster!

Dan
2015.03.17_Trinity_Grid_LSF_Testing_1.txt

Brian Haas

Mar 17, 2015, 3:36:12 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Fantastic!

Pretty easy, right?   :)

-Brian
(by iPhone)

<2015.03.17_Trinity_Grid_LSF_Testing_1.txt>

Dan Browne

Mar 17, 2015, 3:49:59 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Yes, not too hard at all - once I got past a few mistakes in the files and some permission errors on our system. A couple of things I noted:

In the file __runMe_using_Grid_LSF.sh, the command is:

./../Trinity --seqType fq --JM 2G --left reads.left.fq.gz --right reads.right.fq.gz --SS_lib_type RF --CPU 4 --grid_conf_file ../../htc_conf/BroadInst_LSF.test.conf

The flag --grid_conf_file should be --grid_conf

The flag --JM should be --max_memory

../../htc_conf doesn't exist, it should be ../../hpc_conf

To get it working, I fixed the above errors, copied reads.left.fq.gz and reads.right.fq.gz into my own scratch directory, and ran the command from there; it worked. Permission errors prevented me from running the command from the intended sample_data/test_Trinity_Assembly directory.

Thanks again! I'm sure it won't be too long before I have more questions haha.

Dan


Brian Haas

Mar 17, 2015, 4:01:32 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Excellent.

yeah - that __runMe script hasn't been updated to reflect the new usage for Trinity 2.0.

glad it's working!


~brian

Sara Haines

Apr 25, 2015, 2:43:00 PM4/25/15
to trinityrn...@googlegroups.com, dbrow...@gmail.com
Brian and Dan:  Thanks for having this discussion.  This post has been most helpful for speeding up Phase 2 (Butterfly) on our cluster.  We are struggling to get our metatranscriptome assembly to finish before getting kicked off the "week" queue, so we want to utilize the massively parallel option.  I'm trying to figure out how to tune it to work on our system.

Following along with this post, I have been able to get our cluster to accept the --grid_conf configuration file for our data.  

grid=LSF
cmd=bsub -q hour -M 10 -n 8 -R "span[ptile=8]"
mount_test=T
max_nodes=20
cmds_per_node=1000

This is the overarching job:

bsub -q week -n 8 -R "span[hosts=1]" -M 128 Trinity --seqType fq --max_memory 128G \
    --normalize_reads --left ${R1} --right ${R2} \
    --grid_conf ./grid.conf --CPU 8

I thought that this would run at least 8 commands from the generated shell script (with 1000 commands, e.g. J13795.S0.sh) at a given time on one node.  However, using top on the given node, it shows one command running, and I never see other CPU/core utilization.  It is a little confusing because it is one bsub job firing off other bsubs.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                        
27685 haines    25   0 89932 8148 1952 S  1.7  0.0   0:00.05 perl                           
27786 haines    25   0  402m  60m 1356 R  0.3  0.1   0:00.01 GraphFromFasta                 
10308 haines    15   0 19780 2316 1340 S  0.0  0.0   0:00.00 res                            
10321 haines    17   0 65964 1164  964 S  0.0  0.0   0:00.00 1429982431.3508                
10323 haines    16   0 65968 1236  988 S  0.0  0.0   0:00.24 J13795.S0.sh                   
10465 haines    15   0 91044 1812 1036 S  0.0  0.0   0:00.04 sshd                           
10469 haines    18   0 68320 1800 1300 S  0.0  0.0   0:00.01 bash                           
10610 haines    15   0 13032 1388  848 R  0.0  0.0   0:00.23 top                            
27785 haines    25   0 65964 1152  948 S  0.0  0.0   0:00.00 sh

Whereas when we run without --grid_conf on a given node, it utilizes more of the CPUs requested:

bsub -q week -n 8 -R "span[hosts=1]" -M 128 Trinity --seqType fq --max_memory 128G \
    --normalize_reads --left ${R1} --right ${R2} --CPU 8             

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                        
  424 haines    18   0  402m 386m 1364 R 23.5  0.0   0:00.71 GraphFromFasta                 
  426 haines    18   0  402m 386m 1364 R 23.2  0.0   0:00.70 GraphFromFasta                 
  597 haines    18   0 4653m  38m  10m S 21.9  0.0   0:00.66 java                           
  640 haines    18   0  466m 450m 1364 R 13.9  0.0   0:00.42 GraphFromFasta                 
14493 haines    15   0  854m 692m 1260 S  9.6  0.1 446:56.85 ParaFly                        
  435 haines    18   0 90036 8152 1952 S  4.3  0.0   0:00.13 perl                           
  445 haines    18   0 90036 8152 1952 S  4.3  0.0   0:00.13 perl                           
  446 haines    18   0 90036 8156 1956 S  4.3  0.0   0:00.13 perl                           
32255 haines    17   0 90036 8188 1956 S  4.3  0.0   0:00.13 perl                           
32265 haines    18   0 90036 8160 1956 S  4.3  0.0   0:00.13 perl                           
32314 haines    18   0 90036 8148 1952 S  4.3  0.0   0:00.13 perl                           
  322 haines    18   0 90036 8156 1952 S  3.6  0.0   0:00.11 perl                           
  769 haines    18   0 90036 8148 1944 S  3.3  0.0   0:00.10 perl                           
  950 haines    17   0  402m  88m 1356 R  2.0  0.0   0:00.06 GraphFromFasta                 
  959 haines    17   0  402m  54m 1356 R  1.0  0.0   0:00.03 GraphFromFasta                 
  418 haines    18   0 65960 1156  952 S  0.3  0.0   0:00.01 sh                             
12527 haines    16   0 13152 1516  848 R  0.3  0.0   0:00.19 top                            
  421 haines    18   0 65960 1160  952 S  0.0  0.0   0:00.00 sh                             
  590 haines    17   0 18072 1340 1152 S  0.0  0.0   0:00.00 ParaFly                        
  639 haines    18   0 65960 1160  952 S  0.0  0.0   0:00.00 sh                             
  947 haines    19   0 65960 1160  952 S  0.0  0.0   0:00.00 sh                             
  958 haines    19   0 65960 1156  952 S  0.0  0.0   0:00.00 sh                             
  976 haines    18   0  357m 7600 4964 S  0.0  0.0   0:00.00 java                           
  980 haines    18   0 65960 1160  956 S  0.0  0.0   0:00.00 bash                           
  982 haines    18   0 18192 1588 1364 R  0.0  0.0   0:00.00 inchworm                       
 6863 haines    15   0 90256 1808 1032 S  0.0  0.0   0:00.00 sshd                           
 6867 haines    17   0 68316 1812 1304 S  0.0  0.0   0:00.02 bash                           
14860 haines    15   0 19788 2320 1340 S  0.0  0.0   0:00.20 res                            
14861 haines    17   0 65960 1184  976 S  0.0  0.0   0:00.00 1429566354.8596                
14863 haines    18   0  158m  11m 2040 S  0.0  0.0   3:31.51 perl                           

Is there something to specify in grid.conf to make each shell command run on parallel CPUs for a given node?  Or should I just tune max_nodes and cmds_per_node for multiple nodes but one CPU?  

Also, how does grid_conf help Chrysalis?  (I started this on our data after getting through Chrysalis.)


Dan wrote:
cmd=bsub -n 20 -R "select[nxt] span[ptile=20] rusage[mem=60000]" -M 5000

And that worked as well!

Dan: How did you get yours to run parallel CPUs on a given node?  

BTW, we are using Trinity 2.0.6.  

Thanks so much again for all the previous details.  We are so much farther along because of them.

s

Brian Haas

Apr 25, 2015, 3:05:58 PM4/25/15
to Sara Haines, trinityrn...@googlegroups.com, Dan Browne
Hi Sara,

responses below

On Sat, Apr 25, 2015 at 2:43 PM, Sara Haines <sara.m...@gmail.com> wrote:
Brian and Dan:  Thanks for having this discussion.  This post has been most helpful for speeding up Phase 2 (Butterfly) on our cluster.  We are struggling to get our metatranscriptome assembly to finish before getting kicked off the "week" queue.  So want to utilize the massively parallel option.  I'm trying to figure out how to tune it to work in our system.

Following along with this post, I have been able to get our cluster to accept the --grid_conf configuration file for our data.  

grid=LSF
cmd=bsub -q hour -M 10 -n 8 -R "span[ptile=8]"
mount_test=T
max_nodes=20
cmds_per_node=1000

This is the overarching job, 

bsub -q week -n 8 -R "span[hosts=1]" -M 128 Trinity --seqType fq --max_memory 128G \
    --normalize_reads --left ${R1} --right ${R2} \
    --grid_conf ./grid.conf --CPU 8

I thought that this would run at least 8 commands from the generated shell (with 1000 commands, e.g. J13795.S0.sh) at a given time on one node.  However, using top on the given node, it shows one command running and never see other CPU/core utilization.   It is a little confusing because it is one bsub job firing off other bsubs.



The --CPU 8 will leverage 8 cores for the initial phase (read clustering) of Trinity. If you don't specify --grid_conf, then it'll spawn 8 parallel jobs for doing the assemblies in the second phase. If you use --grid_conf, then all the parent process does in phase 2 is submit assemblies to LSF, according to the params in your conf file.


 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                        
27685 haines    25   0 89932 8148 1952 S  1.7  0.0   0:00.05 perl                           
27786 haines    25   0  402m  60m 1356 R  0.3  0.1   0:00.01 GraphFromFasta                 
10308 haines    15   0 19780 2316 1340 S  0.0  0.0   0:00.00 res                            
10321 haines    17   0 65964 1164  964 S  0.0  0.0   0:00.00 1429982431.3508                
10323 haines    16   0 65968 1236  988 S  0.0  0.0   0:00.24 J13795.S0.sh                   
10465 haines    15   0 91044 1812 1036 S  0.0  0.0   0:00.04 sshd                           
10469 haines    18   0 68320 1800 1300 S  0.0  0.0   0:00.01 bash                           
10610 haines    15   0 13032 1388  848 R  0.0  0.0   0:00.23 top                            
27785 haines    25   0 65964 1152  948 S  0.0  0.0   0:00.00 sh

Whereas when we run without --grid_conf on a given node, it utilizes more of the CPUs  requested

It depends on which phase it's in.  If it's in phase 1, then both jobs will have identical performance and resource utilization. If it's in phase 2, then they'll behave very differently (as described above).

I would keep it at one core per node and just crank up the number of parallel instances.  I tend to do this at 10 to 100 jobs per node, and use up to 500 nodes.
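
To put rough numbers on that (using only the figures mentioned in this thread, as a back-of-the-envelope sketch):

```shell
# Back-of-the-envelope throughput for single-core grid submissions scaled
# wide: with one core per job, the parallelism equals the number of
# concurrent submissions, and one full round covers max_nodes * cmds_per_node
# commands.
max_nodes=500        # concurrent grid submissions held at steady state
cmds_per_node=100    # commands batched into each submission
echo "cores working in parallel: ${max_nodes}"
echo "commands covered per round: $((max_nodes * cmds_per_node))"
```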


 
Also, how does grid_conf help chrysalis?  (I started this on our data after getting thru chrysalis.)



It depends... We now use Chrysalis in phase 1 as well as in phase 2, so in phase 1 it's not going to help, but in phase 2 it should help tremendously.  The original architecture of Trinity was such that phases 1 and 2 were intertwined; in Trinity 2.0 they're decoupled.



 

Dan Browne

Jun 10, 2015, 11:03:33 AM6/10/15
to trinityrn...@googlegroups.com, dbrow...@gmail.com
Hey Sara,

Sorry for the really delayed response! Maybe you've already figured out some of these questions, but I'll chime in anyways just in case.

The --grid_conf option isn't going to make a huge difference to the processes of the main job that you're running; instead, it's going to farm out the processing for stage 2 in lots of little chunks, submitting independent jobs to the cluster through LSF. You'll be able to see this happening with the "bjobs" command.

You'll want to be sure to coordinate the cmd template in your grid.conf file with the --grid_node_CPU and --grid_node_max_memory options in the overarching command. For example, if your grid.conf file looks like:

grid=LSF
cmd=bsub -q hour -M 10 -n 5 -R "span[ptile=5]"
mount_test=T
max_nodes=500
cmds_per_node=20

Then you'll want your overarching Trinity command to look like:

Trinity --seqType fq --max_memory 128G --normalize_reads --left ${R1} --right ${R2} --CPU 8 --grid_conf ./grid.conf --grid_node_CPU 5 --grid_node_max_memory 10G

As Brian mentioned, it's better to have a larger max_nodes and fewer cmds_per_node. This will really take advantage of the parallel nature of the cluster and dramatically reduce your assembly time. I can start and finish assemblies in 5-8 hours utilizing the --grid_conf option on our cluster. I'm not sure how big your cluster is, but you should try to use as many nodes as you can. For example, our cluster has about 700 nodes with 20 cores each. I usually have my grid.conf file set very similarly to what I've described above. The command template submits jobs requesting 5 cores at a time, running 20 commands per job with the 5 cores running in parallel. Many 5-core jobs are submitted in parallel - I think the most cores I've been using in parallel at one time was about 4,000 (out of the roughly 14,000 available on our cluster).
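
One way to keep the grid.conf cmd template and the Trinity flags in sync is a quick check before submitting. This is only an illustrative sketch (it assumes the cmd line contains a single "-n <cores>" flag, as in the example above; the check itself is not part of Trinity):

```shell
# Illustrative consistency check between a grid.conf cmd template and the
# --grid_node_CPU value you intend to pass to Trinity.
cat > grid.conf <<'EOF'
grid=LSF
cmd=bsub -q hour -M 10 -n 5 -R "span[ptile=5]"
mount_test=T
max_nodes=500
cmds_per_node=20
EOF

grid_node_cpu=5   # the value you plan to pass as --grid_node_CPU

# Pull the core count (-n) out of the cmd template.
conf_cores=$(sed -n 's/^cmd=.* -n \([0-9][0-9]*\).*/\1/p' grid.conf)

if [ "$conf_cores" = "$grid_node_cpu" ]; then
    echo "OK: grid.conf requests ${conf_cores} cores, matching --grid_node_CPU"
else
    echo "MISMATCH: grid.conf -n ${conf_cores} vs --grid_node_CPU ${grid_node_cpu}"
fi
```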

Hope this helps!

Dan

Brian Haas

Jun 14, 2015, 10:50:20 AM6/14/15
to Dan Browne, trinityrn...@googlegroups.com
Dan - it seems you've got this working pretty well.  In case you weren't aware, we have:


so you can use the grid dispatch system for all sorts of general computes.

best,

~brian



Abhijit Sanyal

Oct 11, 2018, 12:08:53 AM10/11/18
to trinityrnaseq-users
Hi Brian,

I am reviving this old post since I am facing a challenge running Trinity on LSF. The post describes the parameters for an older version of Trinity, and I have the latest version. I have used the below command to execute Trinity on LSF.

${TRINITY_HOME}/Trinity --seqType fq --max_memory 2G --left reads.left.fq.gz --right reads.right.fq.gz --SS_lib_type RF --CPU 8 --grid_exec "$GRID_SCRIPT" --output /anno/sanyalab/TESTAREA/trinity_test --full_cleanup > test_trinity.out 2>test_trinity.err

My "$GRID_SCRIPT" is as follows

GRID_SCRIPT="$TRINITY_HOME/trinity-plugins/HpcGridRunner-1.0.2/hpc_cmds_GridRunner.pl --grid_conf $TRINITY_HOME/trinity-plugins/HpcGridRunner-1.0.2/hpc_conf/Phi_LSF.100.conf -c"


The config file I am giving to "hpc_cmds_GridRunner.pl" is as follows

# grid type:
grid=LSF

# template for a grid submission
cmd=bsub -q prod -P ti-assembly-txpt -M 20 -R "rusage[mem=20,scr=100]"
# note: -e error.file and -o out.file are set internally, so don't set them in the above cmd.

# uses the LSF feature to pre-exec and check that the file system is mounted before executing.
# this helps when you have some misbehaving grid nodes that lost certain file mounts.
mount_test=T

##########################################################################################
# settings below configure the Trinity job submission system, not tied to the grid itself.
##########################################################################################

# number of grid submissions to be maintained at steady state by the Trinity submission system
max_nodes=200

# number of commands that are batched into a single grid submission job.
cmds_per_node=50


Attached is the output file "test_trinity.out", which has the errors. Another file I am attaching is "recursive_trinity.cmds.hpc-cache_success.__failures"; in this file the options are all changed from what I gave on the command line.

Please let me know what changes I need to make to the command to get this working.

Thanks
Abhijit


test_trinity.out
recursive_trinity.cmds.hpc-cache_success.__failures

Brian Haas

Oct 11, 2018, 9:25:27 AM10/11/18
to Abhijit Sanyal, trinityrnaseq-users
Hi,

I'm not sure what the LSF errors are.  The best thing to do is to troubleshoot GridRunner directly before running Trinity with it: create some test commands and see if you can run them through GridRunner.  We can troubleshoot GridRunner on its own forum.

best,

~b
