LSF grid configuration


Dan Browne

Mar 17, 2015, 12:18:43 PM3/17/15
to trinityrn...@googlegroups.com
Hi all,

For utilizing a grid with the --grid_conf flag, the website says to have the following entries in the grid_configuration_file.txt:

grid=LSF
cmd=bsub -q regevlab -R "rusage[mem=10]"
mount_test=T
max_nodes=500
cmds_per_node=100

However, reading through the LSF Users Manual, I noticed that it says the -I (uppercase i) flag is required to submit an interactive job, which I assume is what Trinity is doing? Also, in my case (and maybe in others' as well) I will not be submitting to a queue. Given our 20-core NeXtScale nodes, I am thinking my command should look more like this:

cmd=bsub -I -n 20 -R "select[nxt] span[ptile=20] rusage[mem=60000]" -M 3000

I think this command template should allocate one node per job, utilizing all 20 cores within the node. The LSF system will only select nodes that have at least 60 GB of memory (they all have at least 64 GB), and the per-process enforceable memory limit is 3 GB (with potentially 20 processes running in parallel, 3 GB each would eat up a total of 60 GB).
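
Spelled out, the memory math is (a quick sanity check only, not tied to any Trinity or LSF internals):

```shell
# Sanity check of the memory math above: 20 parallel processes, each capped
# at 3000 MB, should fit within the 60 GB reserved via rusage[mem=60000].
procs=20
per_proc_mb=3000
total_mb=$((procs * per_proc_mb))
echo "worst-case total: ${total_mb} MB"
```

One caveat: the units that bsub -M and rusage[] use depend on the cluster's LSF_UNIT_FOR_LIMITS setting in lsf.conf, so it's worth confirming with your admins that -M 3000 really means 3 GB on your system.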

Any feedback on the accuracy of my thoughts here would be appreciated.

Thanks,
Dan

Brian Haas

Mar 17, 2015, 12:29:19 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Hi Dan,

This is for the massively parallel steps that require little RAM and few cores.  It might be best if you first try running the sample data through with *our* defaults and see how that goes.

The jobs should not be interactive.  In short, what the system does under the hood is to batch the jobs, write a single shell script that runs each batch while on the node, and write status info to certain files.  The primary job (the main Trinity job) checks the status of all these non-interactive jobs.
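
The batch-and-poll pattern described above can be sketched roughly as follows. All the file names here are made up for illustration (they are not Trinity's actual internals): commands are chunked into batches, each batch records its own exit status in a file, and the parent inspects the status files rather than the jobs themselves.

```shell
# Illustrative sketch of a batch-and-poll job scheme; names are hypothetical.
mkdir -p batches
printf 'echo job1\necho job2\necho job3\necho job4\n' > cmds.txt

# Chunk the command list, 2 commands per batch (cf. cmds_per_node).
split -l 2 cmds.txt batches/batch.

for b in batches/batch.??; do
    # Each batch runs its commands and records success/failure in a status
    # file; on a real cluster this loop body would be submitted with
    # "bsub <batch script>" rather than run locally.
    if bash "$b" > "$b.out"; then
        echo ok > "$b.status"
    else
        echo failed > "$b.status"
    fi
done

# Parent job: check the status files, not the jobs themselves.
if grep -q failed batches/*.status; then
    echo "some batches failed"
else
    echo "all batches ok"
fi
```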

Ideally, these individual jobs use one core and can execute with under 5G RAM, and you can get greater runtime efficiency by just increasing your parallel throughput.

best,

~brian


--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Dan Browne

Mar 17, 2015, 1:57:17 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Hey Brian,

Thanks for the quick feedback. I'm a little confused about how this is supposed to work. Let me give you a little background about how our system is configured and how we submit jobs - maybe it's different from what you've experienced?

We start by writing a job script contained in a text file. For example, the following job script (we'll call it Example.job) is one I used recently to submit a Trinity assembly job:

#BSUB -J 2015.03.08_Race.A_Trinity_Assembly
#BSUB -L /bin/bash
#BSUB -W 200:00
#BSUB -n 40
#BSUB -R "span[ptile=40]"
#BSUB -q xlarge
#BSUB -R "rusage[mem=1000000]"
#BSUB -M 800000
#BSUB -o %J_Output
#BSUB -P Race.A_Transcriptome_Assembly
#
##
module load Westmere
#
module load Trinity/2014-07-17-ictce-6.3.5-jellyfish-2.1.3-nooptarch
#
module load Bowtie/1.0.0-ictce-7.1.2
#
module load SAMtools/1.0-ictce-6.3.5
#
cd $SCRATCH/2015.03.08_Race.A_Transcriptome_Assembly
#
Trinity --seqType fq --JM 500G --single /scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.03.fastq,/scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.05.fastq,/scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.08.fastq,/scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.13.fastq,/scratch/user/dbrowne/B.braunii_NGS_Data/B.braunii_RNAseq_Data/A_Race_RNAseq_Data/Race.A_Time_Course/B.braunii_race.A_Yamanaka_Day.22.fastq --run_as_paired --CPU 40 --min_contig_length 200 --normalize_reads --grid_conf ~dbrowne/B.braunii_race.A_Transcriptome_Assembly/2015.03.08_Race.A_Trinity_Test_Assembly/Trinity_Grid_Configuration.txt --trimmomatic --quality_trimming_params "SLIDINGWINDOW:10:25  MINLEN:100" --output 2015.03.08.Race.A_trinity_assembly
#

I would then initiate the job by doing the following:

$ bsub < Example.job

Is this sort of setup compatible with the way Trinity handles the LSF job submission system? Maybe I would be better off not trying to take advantage of the grid utilization capability of Trinity?

Thanks,
Dan


Brian Haas

Mar 17, 2015, 2:04:42 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Hi Dan,

I think it should be fine, as long as your initial Trinity job can then run 'bsub' to submit additional jobs.

Try it out on the small sample data set and see if it works.  If it doesn't, there's probably a little bit of tuning to do to make it work, and we can hopefully help you through it.

best,

~brian


Dan Browne

Mar 17, 2015, 2:25:00 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Okay - please forgive my ignorance, but I need a little guidance on how to run this test.

In the sample_data/ directory, I see the following:

[XXX dbrowne]$ cd XXX/Trinity/trinityrnaseq-2.0.6-westmere/sample_data/
[XXX dbrowne]$ ls
Makefile                           test_edgeR_diff_expr/              test_GOSeq_trinotate_pipe/         test_Trinity_Assembly/
__regression_tests/                test_full_edgeR_pipeline/          test_Inchworm/                     test_Trinity_Coding_Extraction/
test_align_and_estimate_abundance/ test_GenomeGuidedTrinity/          test_InSilicoReadNormalization/


Where should I go from here? I see in the test_Trinity_Assembly directory that there is some sample data, e.g. reads2.left.fq.gz and reads2.right.fq.gz.

Shall I construct a job script to run Trinity with the --grid_conf flag and using the above mentioned data?

Finally, what exactly is the default option for the grid_configuration_file.txt? Is there an example somewhere in the sample_data directory?

Thanks,
Dan

Dan Browne

Mar 17, 2015, 2:28:24 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Oh sorry, I guess I should've searched a little more before posting that! I see in the misc_run_tests/ directory that there is a file called __runMe_using_Grid_LSF.sh

I'll run that real quick and see what happens!

Dan

Brian Haas

Mar 17, 2015, 2:36:44 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Right - just change the --grid_conf to point to your grid.conf file with your queue name.

best,

~b

Dan Browne

Mar 17, 2015, 3:28:44 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Alright, I got it working on the sample data! The default command pretty much works fine. I also tried the following command:

cmd=bsub -n 20 -R "select[nxt] span[ptile=20] rusage[mem=60000]" -M 5000

And that worked as well!

I've attached a file of the stdout I got with the default command (with modified queue specification), just in case you're interested.

Thanks again for the quick help. Now I can make this work a lot faster!

Dan
2015.03.17_Trinity_Grid_LSF_Testing_1.txt

Brian Haas

Mar 17, 2015, 3:36:12 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Fantastic!

Pretty easy, right?   :)

-Brian
(by iPhone)

<2015.03.17_Trinity_Grid_LSF_Testing_1.txt>

Dan Browne

Mar 17, 2015, 3:49:59 PM3/17/15
to Brian Haas, trinityrn...@googlegroups.com
Yes, not too hard at all - once I got past a few mistakes in the files and some permission errors on our system. A couple of things I noted:

In the file __runMe_using_Grid_LSF.sh, the command is:

./../Trinity --seqType fq --JM 2G --left reads.left.fq.gz --right reads.right.fq.gz --SS_lib_type RF --CPU 4 --grid_conf_file ../../htc_conf/BroadInst_LSF.test.conf

The flag --grid_conf_file should be --grid_conf

The flag --JM should be --max_memory

../../htc_conf doesn't exist, it should be ../../hpc_conf

To get it working, I fixed the above errors, copied reads.left.fq.gz and reads.right.fq.gz into my own scratch directory, and ran the command from there; it worked. Permission errors prevented me from running the command from the intended sample_data/test_Trinity_Assembly directory.

Thanks again! I'm sure it won't be too long before I have more questions haha.

Dan


Brian Haas

Mar 17, 2015, 4:01:32 PM3/17/15
to Dan Browne, trinityrn...@googlegroups.com
Excellent.

yeah - that __runMe script hasn't been updated to reflect the new usage for Trinity 2.0.

glad it's working!


~brian

Sara Haines

Apr 25, 2015, 2:43:00 PM4/25/15
to trinityrn...@googlegroups.com, dbrow...@gmail.com
Brian and Dan:  Thanks for having this discussion.  This post has been most helpful for speeding up Phase 2 (Butterfly) on our cluster.  We are struggling to get our metatranscriptome assembly to finish before getting kicked off the "week" queue, so we want to utilize the massively parallel option.  I'm trying to figure out how to tune it to work on our system.

Following along with this post, I have been able to get our cluster to accept the --grid_conf configuration file for our data.  

grid=LSF
cmd=bsub -q hour -M 10 -n 8 -R "span[ptile=8]"
mount_test=T
max_nodes=20
cmds_per_node=1000

This is the overarching job:

bsub -q week -n 8 -R "span[hosts=1]" -M 128 Trinity --seqType fq --max_memory 128G \
    --normalize_reads --left ${R1} --right ${R2} \
    --grid_conf ./grid.conf --CPU 8

I thought that this would run at least 8 commands from the generated shell script (with 1000 commands, e.g. J13795.S0.sh) at a given time on one node.  However, using top on the given node, it shows one command running, and I never see other CPU/core utilization.  It is a little confusing because it is one bsub job firing off other bsubs.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                        
27685 haines    25   0 89932 8148 1952 S  1.7  0.0   0:00.05 perl                           
27786 haines    25   0  402m  60m 1356 R  0.3  0.1   0:00.01 GraphFromFasta                 
10308 haines    15   0 19780 2316 1340 S  0.0  0.0   0:00.00 res                            
10321 haines    17   0 65964 1164  964 S  0.0  0.0   0:00.00 1429982431.3508                
10323 haines    16   0 65968 1236  988 S  0.0  0.0   0:00.24 J13795.S0.sh                   
10465 haines    15   0 91044 1812 1036 S  0.0  0.0   0:00.04 sshd                           
10469 haines    18   0 68320 1800 1300 S  0.0  0.0   0:00.01 bash                           
10610 haines    15   0 13032 1388  848 R  0.0  0.0   0:00.23 top                            
27785 haines    25   0 65964 1152  948 S  0.0  0.0   0:00.00 sh

Whereas when we run without --grid_conf on a given node, it utilizes more of the CPUs requested:

bsub -q week -n 8 -R "span[hosts=1]" -M 128 Trinity --seqType fq --max_memory 128G \
    --normalize_reads --left ${R1} --right ${R2} --CPU 8             

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                        
  424 haines    18   0  402m 386m 1364 R 23.5  0.0   0:00.71 GraphFromFasta                 
  426 haines    18   0  402m 386m 1364 R 23.2  0.0   0:00.70 GraphFromFasta                 
  597 haines    18   0 4653m  38m  10m S 21.9  0.0   0:00.66 java                           
  640 haines    18   0  466m 450m 1364 R 13.9  0.0   0:00.42 GraphFromFasta                 
14493 haines    15   0  854m 692m 1260 S  9.6  0.1 446:56.85 ParaFly                        
  435 haines    18   0 90036 8152 1952 S  4.3  0.0   0:00.13 perl                           
  445 haines    18   0 90036 8152 1952 S  4.3  0.0   0:00.13 perl                           
  446 haines    18   0 90036 8156 1956 S  4.3  0.0   0:00.13 perl                           
32255 haines    17   0 90036 8188 1956 S  4.3  0.0   0:00.13 perl                           
32265 haines    18   0 90036 8160 1956 S  4.3  0.0   0:00.13 perl                           
32314 haines    18   0 90036 8148 1952 S  4.3  0.0   0:00.13 perl                           
  322 haines    18   0 90036 8156 1952 S  3.6  0.0   0:00.11 perl                           
  769 haines    18   0 90036 8148 1944 S  3.3  0.0   0:00.10 perl                           
  950 haines    17   0  402m  88m 1356 R  2.0  0.0   0:00.06 GraphFromFasta                 
  959 haines    17   0  402m  54m 1356 R  1.0  0.0   0:00.03 GraphFromFasta                 
  418 haines    18   0 65960 1156  952 S  0.3  0.0   0:00.01 sh                             
12527 haines    16   0 13152 1516  848 R  0.3  0.0   0:00.19 top                            
  421 haines    18   0 65960 1160  952 S  0.0  0.0   0:00.00 sh                             
  590 haines    17   0 18072 1340 1152 S  0.0  0.0   0:00.00 ParaFly                        
  639 haines    18   0 65960 1160  952 S  0.0  0.0   0:00.00 sh                             
  947 haines    19   0 65960 1160  952 S  0.0  0.0   0:00.00 sh                             
  958 haines    19   0 65960 1156  952 S  0.0  0.0   0:00.00 sh                             
  976 haines    18   0  357m 7600 4964 S  0.0  0.0   0:00.00 java                           
  980 haines    18   0 65960 1160  956 S  0.0  0.0   0:00.00 bash                           
  982 haines    18   0 18192 1588 1364 R  0.0  0.0   0:00.00 inchworm                       
 6863 haines    15   0 90256 1808 1032 S  0.0  0.0   0:00.00 sshd                           
 6867 haines    17   0 68316 1812 1304 S  0.0  0.0   0:00.02 bash                           
14860 haines    15   0 19788 2320 1340 S  0.0  0.0   0:00.20 res                            
14861 haines    17   0 65960 1184  976 S  0.0  0.0   0:00.00 1429566354.8596                
14863 haines    18   0  158m  11m 2040 S  0.0  0.0   3:31.51 perl                           

Is there something to specify in grid.conf to make each shell command run on parallel CPUs for a given node?  Or should I just tune max_nodes and cmds_per_node for multiple nodes but one CPU?  

Also, how does grid_conf help Chrysalis?  (I started this on our data after getting through Chrysalis.)


Dan wrote:
cmd=bsub -n 20 -R "select[nxt] span[ptile=20] rusage[mem=60000]" -M 5000

And that worked as well!

Dan: How did you get yours to run parallel CPUs on a given node?  

BTW, we are using Trinity 2.0.6.  

Thanks so much again for all the previous details.  We are so much farther along because of them.

s

Brian Haas

Apr 25, 2015, 3:05:58 PM4/25/15
to Sara Haines, trinityrn...@googlegroups.com, Dan Browne
Hi Sara,

responses below

On Sat, Apr 25, 2015 at 2:43 PM, Sara Haines <sara.m...@gmail.com> wrote:
Brian and Dan:  Thanks for having this discussion.  This post has been most helpful for speeding up Phase 2 (Butterfly) on our cluster.  We are struggling to get our metatranscriptome assembly to finish before getting kicked off the "week" queue.  So want to utilize the massively parallel option.  I'm trying to figure out how to tune it to work in our system.

Following along with this post, I have been able to get our cluster to accept the --grid_conf configuration file for our data.  

grid=LSF
cmd=bsub -q hour -M 10 -n 8 -R "span[ptile=8]"
mount_test=T
max_nodes=20
cmds_per_node=1000

This is the overarching job, 

bsub -q week -n 8 -R "span[hosts=1]" -M 128 Trinity --seqType fq --max_memory 128G \
    --normalize_reads --left ${R1} --right ${R2} \
    --grid_conf ./grid.conf --CPU 8

I thought that this would run at least 8 commands from the generated shell (with 1000 commands, e.g. J13795.S0.sh) at a given time on one node.  However, using top on the given node, it shows one command running and never see other CPU/core utilization.   It is a little confusing because it is one bsub job firing off other bsubs.



The --CPU 8 will leverage 8 cores for the initial phase (read clustering) of Trinity. If you don't specify --grid_conf, then it'll spawn 8 parallel jobs for doing the assemblies in the second phase. If you use --grid_conf, then all the parent process does in phase 2 is submit assemblies to LSF, according to the params in your conf file.


 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                        
27685 haines    25   0 89932 8148 1952 S  1.7  0.0   0:00.05 perl                           
27786 haines    25   0  402m  60m 1356 R  0.3  0.1   0:00.01 GraphFromFasta                 
10308 haines    15   0 19780 2316 1340 S  0.0  0.0   0:00.00 res                            
10321 haines    17   0 65964 1164  964 S  0.0  0.0   0:00.00 1429982431.3508                
10323 haines    16   0 65968 1236  988 S  0.0  0.0   0:00.24 J13795.S0.sh                   
10465 haines    15   0 91044 1812 1036 S  0.0  0.0   0:00.04 sshd                           
10469 haines    18   0 68320 1800 1300 S  0.0  0.0   0:00.01 bash                           
10610 haines    15   0 13032 1388  848 R  0.0  0.0   0:00.23 top                            
27785 haines    25   0 65964 1152  948 S  0.0  0.0   0:00.00 sh

Whereas when we run without --grid_conf on a given node, it utilizes more of the CPUs  requested

It depends on which phase it's in.  If it's in phase 1, then both jobs will have identical performance and resource utilization. If it's in phase 2, then they'll behave very differently (as described above).

I would keep it at one core per node and just crank up the number of parallel instances.  I tend to do this at 10 to 100 jobs per node, and use up to 500 nodes.
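
To put rough numbers on that (using only the figures mentioned in this thread, as a back-of-the-envelope sketch):

```shell
# Back-of-the-envelope throughput for single-core grid submissions scaled
# wide: with one core per job, the parallelism equals the number of
# concurrent submissions, and one full round covers max_nodes * cmds_per_node
# commands.
max_nodes=500        # concurrent grid submissions held at steady state
cmds_per_node=100    # commands batched into each submission
echo "cores working in parallel: ${max_nodes}"
echo "commands covered per round: $((max_nodes * cmds_per_node))"
```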


 
Also, how does grid_conf help chrysalis?  (I started this on our data after getting thru chrysalis.)



It depends... We now use Chrysalis in phase 1 as well as in phase 2, so in phase 1 it's not going to help, but in phase 2 it should help tremendously.  The original architecture of Trinity was such that phases 1 and 2 were intertwined; in Trinity 2.0 they're decoupled.



 

Dan Browne

Jun 10, 2015, 11:03:33 AM6/10/15
to trinityrn...@googlegroups.com, dbrow...@gmail.com
Hey Sara,

Sorry for the really delayed response! Maybe you've already figured out some of these questions, but I'll chime in anyways just in case.

The --grid_conf option isn't going to make a huge difference to the processes of the main job that you're running; instead, it's going to farm out the processing for stage 2 in lots of little chunks, submitting independent jobs to the cluster through LSF. You'll be able to see this happening with the "bjobs" command.

You'll want to be sure to coordinate the cmd template in your grid.conf file with the --grid_node_CPU and --grid_node_max_memory options in the overarching command. For example, if your grid.conf file looks like:

grid=LSF
cmd=bsub -q hour -M 10 -n 5 -R "span[ptile=5]"
mount_test=T
max_nodes=500
cmds_per_node=20

Then you'll want your overarching Trinity command to look like:

Trinity --seqType fq --max_memory 128G --normalize_reads --left ${R1} --right ${R2} --CPU 8 --grid_conf ./grid.conf --grid_node_CPU 5 --grid_node_max_memory 10G

As Brian mentioned, it's better to have a larger max_nodes and fewer cmds_per_node. This will really take advantage of the parallel nature of the cluster and dramatically reduce your assembly time. I can start and finish assemblies in 5-8 hours utilizing the --grid_conf option on our cluster. I'm not sure how big your cluster is, but you should try to use as many nodes as you can. For example, our cluster has about 700 nodes with 20 cores each. I usually have my grid.conf file set very similarly to what I've described above. The command template submits jobs requesting 5 cores at a time, running 20 commands per job with the 5 cores running in parallel. Many 5-core jobs are submitted in parallel - I think the most cores I've been using in parallel at one time was about 4,000 (out of the roughly 14,000 available on our cluster).
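
One way to keep the grid.conf cmd template and the Trinity flags in sync is a quick check before submitting. This is only an illustrative sketch (it assumes the cmd line contains a single "-n <cores>" flag, as in the example above; the check itself is not part of Trinity):

```shell
# Illustrative consistency check between a grid.conf cmd template and the
# --grid_node_CPU value you intend to pass to Trinity.
cat > grid.conf <<'EOF'
grid=LSF
cmd=bsub -q hour -M 10 -n 5 -R "span[ptile=5]"
mount_test=T
max_nodes=500
cmds_per_node=20
EOF

grid_node_cpu=5   # the value you plan to pass as --grid_node_CPU

# Pull the core count (-n) out of the cmd template.
conf_cores=$(sed -n 's/^cmd=.* -n \([0-9][0-9]*\).*/\1/p' grid.conf)

if [ "$conf_cores" = "$grid_node_cpu" ]; then
    echo "OK: grid.conf requests ${conf_cores} cores, matching --grid_node_CPU"
else
    echo "MISMATCH: grid.conf -n ${conf_cores} vs --grid_node_CPU ${grid_node_cpu}"
fi
```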

Hope this helps!

Dan

Brian Haas

Jun 14, 2015, 10:50:20 AM6/14/15
to Dan Browne, trinityrn...@googlegroups.com
Dan - it seems you've got this working pretty well.  In case you weren't aware, we have:


so you can use the grid dispatch system for all sorts of general computes.

best,

~brian



Abhijit Sanyal

Oct 11, 2018, 12:08:53 AM10/11/18
to trinityrnaseq-users
Hi Brian,

I am reviving this old post since I am facing a challenge running Trinity on LSF. The post describes the parameters for an older version of Trinity, and I have the latest version. I have used the below command to execute Trinity on LSF.

${TRINITY_HOME}/Trinity --seqType fq --max_memory 2G --left reads.left.fq.gz --right reads.right.fq.gz --SS_lib_type RF --CPU 8 --grid_exec "$GRID_SCRIPT" --output /anno/sanyalab/TESTAREA/trinity_test --full_cleanup > test_trinity.out 2>test_trinity.err

My "$GRID_SCRIPT" is as follows

GRID_SCRIPT="$TRINITY_HOME/trinity-plugins/HpcGridRunner-1.0.2/hpc_cmds_GridRunner.pl --grid_conf $TRINITY_HOME/trinity-plugins/HpcGridRunner-1.0.2/hpc_conf/Phi_LSF.100.conf -c"


The config file I am giving to "hpc_cmds_GridRunner.pl" is as follows

# grid type:
grid=LSF

# template for a grid submission
cmd=bsub -q prod -P ti-assembly-txpt -M 20 -R "rusage[mem=20,scr=100]"
# note: -e error.file and -o out.file are set internally, so don't set them in the above cmd.

# uses the LSF feature to pre-exec and check that the file system is mounted before executing.
# this helps when you have some misbehaving grid nodes that lost certain file mounts.
mount_test=T

##########################################################################################
# settings below configure the Trinity job submission system, not tied to the grid itself.
##########################################################################################

# number of grid submissions to be maintained at steady state by the Trinity submission system
max_nodes=200

# number of commands that are batched into a single grid submission job.
cmds_per_node=50


Attached is the output file "test_trinity.out", which has the errors. Another file I am attaching is "recursive_trinity.cmds.hpc-cache_success.__failures"; in this file the options are all changed from what I gave on the command line.

Please let me know what changes I need to make to the command to get this working.

Thanks
Abhijit


test_trinity.out
recursive_trinity.cmds.hpc-cache_success.__failures

Brian Haas

Oct 11, 2018, 9:25:27 AM10/11/18
to Abhijit Sanyal, trinityrnaseq-users
Hi,

I'm not sure what the LSF errors are.  The best thing to do is to troubleshoot GridRunner directly before running Trinity with it: create some test commands and see if you can run them through GridRunner.  We can troubleshoot GridRunner on its own forum.

best,

~b
