Running abyss-pe on PBS cluster


Gautam Singh

Mar 6, 2014, 5:46:16 AM
to abyss...@googlegroups.com
Hi,

I want to run my assembly on a PBS cluster with 17 nodes of 12 CPUs each. But whenever I run it, it starts on only 1 node (there are no jobs on the other nodes). Please find below the script I used, and correct me if anything is wrong with it.

Command used:

#!/bin/bash
#PBS -l select=17:ncpus=12:mpiprocs=12
#PBS -M er.gaut...@gmail.com
#PBS -m abc
cd $PBS_O_WORKDIR
export LD_LIBRARY_PATH=/opt/software/opempi/1.7.4/lib:$LD_LIBRARY_PATH
export PATH=/opt/software/openmpi/1.7.4/bin:$PATH
/scratch/ABYSS_1.3.7/bin/abyss-pe v=-v k=57 name=illumina_assembly_cluster_k57 lib='s1 s2 s3 s5' s1='s1_1.fastq s1_2.fastq' s2='s2_1.fastq s2_2.fastq' s3='s3_1.fastq s3_2.fastq' s5='s5_1.fastq s5_2.fastq'


Gautam Singh
New Delhi
India

Haruna Cofer

Mar 6, 2014, 8:06:51 AM
to Gautam Singh, abyss...@googlegroups.com
Make sure you specify how many cores to use with the -np option.  For example to use 204 cores: abyss-pe -np 204
 
Also with OpenMPI, I found that I needed to set the OPAL_PREFIX environment variable.  For example: export OPAL_PREFIX=/opt/software/openmpi/1.7.4



Ben Vandervalk

Mar 6, 2014, 12:37:03 PM
to Haruna Cofer, Gautam Singh, abyss...@googlegroups.com
Hi Gautam,

I am not familiar with PBS but that script looks reasonable to me.

I can suggest a few things to check/try:

1) When compiling ABySS, if no MPI library can be found, only the single-machine version of the ABySS binary will be built.  (The single-machine binary is named "ABYSS".)  Check to make sure that there is an "ABYSS-P" binary installed in your /scratch/ABYSS_1.3.7/bin/ directory.  If not, try compiling again and using the "--with-mpi" option during the configure step to explicitly tell ABySS where to find the MPI library on your system.

2) Double check that the format of your "#PBS -l ..." line is correct.

3) As Haruna suggests, you can try explicitly setting the number of MPI processes.  I noticed a small mistake in her syntax, though.  It should be:

$ abyss-pe np=204 <other_args>

Good luck!

- Ben
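
Ben's checks can be combined into a small pre-flight step at the top of the PBS script. The following is only a sketch. It assumes that `$PBS_NODEFILE` lists one line per requested MPI process (with `select=17:ncpus=12:mpiprocs=12` that would be 17 × 12 = 204 lines) and that ABySS lives in the directory from Gautam's script. The helper names `has_abyss_p` and `count_mpi_slots` are made up for illustration, and the script prints the abyss-pe command instead of running it.

```shell
#!/bin/bash
# Pre-flight sketch for an abyss-pe PBS job (helper names are hypothetical).

# Directory from the original post; override by exporting ABYSS_BIN beforehand.
ABYSS_BIN=${ABYSS_BIN:-/scratch/ABYSS_1.3.7/bin}

# Succeeds only if the MPI-enabled binary ABYSS-P exists and is executable.
has_abyss_p() {
    [ -x "$1/ABYSS-P" ]
}

# Counts the non-empty lines of a node file. Assumption: the scheduler
# writes one line per MPI process slot.
count_mpi_slots() {
    grep -c . "$1"
}

if ! has_abyss_p "$ABYSS_BIN"; then
    echo "ABYSS-P not found in $ABYSS_BIN: ABySS may have been built without MPI" >&2
elif [ -r "${PBS_NODEFILE:-}" ]; then
    np=$(count_mpi_slots "$PBS_NODEFILE")
    # Dry run: print the command; remove 'echo' to actually launch it.
    echo "$ABYSS_BIN/abyss-pe" np="$np" k=57 name=illumina_assembly_cluster_k57 "<other_args>"
fi
```

If the printed `np` is smaller than expected, the resource request line is the first thing to re-check, since the node file reflects what the scheduler actually granted.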

Ben Vandervalk

Mar 6, 2014, 12:42:40 PM
to Haruna Cofer, Gautam Singh, abyss...@googlegroups.com
Hi Gautam,

Also: If those things don't help, please post the log of your abyss-pe job and we will probably be able to see what is going wrong.

- Ben

George Willian Condomitti

Mar 19, 2014, 10:53:52 AM
to abyss...@googlegroups.com, Haruna Cofer, Gautam Singh
Hi guys,

A question on this: when running abyss-pe with PBS I don't call mpiexec, because abyss-pe does that internally.

What about when I'm calling each step separately, like abyss-map? Do I just call abyss-map on its own, or do I need mpiexec ... abyss-map?


Thanks,
Condomitti.

Ben Vandervalk

Mar 19, 2014, 12:21:16 PM
to George Willian Condomitti, abyss...@googlegroups.com, Haruna Cofer, Gautam Singh
Hi George,

Only the first step of the abyss-pe script (ABYSS-P) uses mpiexec (actually "mpirun").

The remaining steps run on a single machine, so you don't need mpiexec for those.

You can get a good idea what is going on by looking at the abyss-pe file in a text editor (it is a Makefile).

Hope that helps,

- Ben
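
Because abyss-pe is a Makefile, make's own options pass through to it, so a dry run with `-n` prints the commands each step would run without executing them. A minimal sketch for spotting which of those commands go through an MPI launcher follows. The helper name `mpi_steps` and the read file names are hypothetical, and the dry run itself needs abyss-pe installed and the input files present.

```shell
#!/bin/bash
# Keep only the lines of a command listing that invoke an MPI launcher.
# 'mpi_steps' is a hypothetical helper name, not part of ABySS.
mpi_steps() {
    grep -wE 'mpirun|mpiexec'
}

# Usage (dry run; -n is make's option to print commands without running them):
#   abyss-pe -n np=4 k=25 name=test in='reads_1.fastq reads_2.fastq' | mpi_steps
```

Any step that does not show up in the filtered output runs on a single machine, so wrapping it in mpiexec would not spread it across nodes.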



George Willian Condomitti

Mar 20, 2014, 1:56:55 PM
to abyss...@googlegroups.com, George Willian Condomitti, Haruna Cofer, Gautam Singh
Thanks, Ben!

So that means it won't make any difference if I run that step on a cluster of nodes? I mean, I won't make use of the benefits of extra nodes in terms of reduced walltime, right?


Condomitti.

Ben Vandervalk

Mar 20, 2014, 1:59:13 PM
to George Willian Condomitti, abyss...@googlegroups.com, Haruna Cofer, Gautam Singh
Exactly :-)

- Ben

George Willian Condomitti

Mar 20, 2014, 4:03:42 PM
to abyss...@googlegroups.com, George Willian Condomitti, Haruna Cofer, Gautam Singh
Thanks Ben!
That explains why I wasn't getting to the end of the job even when using a large number of nodes ;-)

Cheers,
Condomitti.

Tarzíciusz Pál Simon

Nov 3, 2016, 4:21:16 PM
to ABySS
Hi, 

I have read the questions and answers again and again, and I think this is the proper place to raise my problem.
I have set up a PBS/TORQUE environment with an NFS share, not for the whole home directory but for /home/user/workdir, and I have put my binaries in that directory. I am using Ubuntu 16 and installed ABySS (abyss-pe) with the apt-get package installer.

Nodes: master, client1, client2 (clients are running the torque-mom, master runs the scheduler and server)

If I run the qsub command (posted below) with only one free node, it works. For example, if client1 is free (client2 disabled) and I send the job to client1, it works. If I enable client2, disable client1, and send the job to client2, it also works.
If I enable (free) torque-mom on both client1 and client2 and submit the job to them, only client2 runs the job. I can see this from the network traffic and memory usage.

I would like to use all of my nodes (client1 and client2) in parallel.

I have also tried enabling both client1 and client2 and forcing mpirun (-H client1) to use only the client1 host, but it says "Host key verification fault" and I also get the following:
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------
/usr/bin/abyss-pe:470: recipe for target 'abysstest-1.fa' failed

My shell script (b2.sh) looks like this:
#PBS -N abysstest 
#PBS -o /home/simpa/workdir/abysstest.log 
#PBS -e /home/simpa/workdir/abysstest.err 
#PBS -l nodes=2:ppn=1 
#PBS -l walltime=700:00:00 
#PBS -r y 
cd /home/simpa/workdir/
module load openmpi
abyss-pe mpirun='mpirun -H client1' np=2 k=25 in='/home/simpa/workdir/SRR1955491_1.fastq /home/simpa/workdir/SRR1955491_2.fastq' > runinfo.txt
cat $PBS_NODEFILE > runnodes.txt

I submit the job like: 
qsub -V -N abysstest b2.sh


$PBS_NODEFILE always contains the correct node names. I have checked it.

I also thought I would experiment with mpirun.mpich, using MPICH instead of Open MPI.
I modified the script accordingly, changing mpirun to mpiexec and setting mpirun='/pathtonfsshare/mpirun.mpich -hosts client1', but in the end client2 still started the job...

Thank you for your patience in reading these lines. I have been struggling with this for a long time and really need some help or ideas.

Regards!
Tarziciusz
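
The "Host key verification" message comes from SSH rather than from ABySS, which suggests that mpirun is starting its remote daemons over SSH and that the target host's key is not yet trusted for the submitting user. Below is a sketch of one way to check this from inside a job, assuming non-interactive SSH between nodes is how the MPI installation launches processes. `unique_hosts` and `check_ssh` are hypothetical helper names.

```shell
#!/bin/bash
# Print each distinct host name from a node file (one host name per line).
unique_hosts() {
    sort -u "$1"
}

# Try a non-interactive SSH login to every host and report the outcome.
# BatchMode=yes makes ssh fail instead of prompting for a password or host key.
check_ssh() {
    local host
    for host in $(unique_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: ssh failed"
        fi
    done
}

# Inside a job: check_ssh "$PBS_NODEFILE"
```

A host that reports a failure here would also be unreachable for an SSH-based mpirun, independent of anything abyss-pe does.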

Shaun Jackman

Nov 3, 2016, 5:55:50 PM
to ABySS
Hi, Tarzíciusz. Have you been able to successfully run other MPI programs? What are the contents of the file $PBS_NODEFILE or runnodes.txt? For troubleshooting, I'd suggest getting a simpler MPI program running, like http://mpitutorial.com/tutorials/mpi-hello-world/
You can also try running `mpirun ABYSS-P` directly rather than using the abyss-pe script (just for troubleshooting).

Cheers,
Shaun
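
Before compiling an MPI hello-world, an even smaller check along the lines Shaun suggests is to launch `hostname` through mpirun and count how many processes land on each node. This is a sketch assuming Open MPI's mpirun with its `--hostfile` option and the two-process request from the script above. `summarize_hosts` is a hypothetical helper name.

```shell
#!/bin/bash
# Turn a list of host names (one per line) into "host: count" lines.
summarize_hosts() {
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Inside a PBS job, start 2 copies of 'hostname' on the allocated nodes.
if [ -r "${PBS_NODEFILE:-}" ] && command -v mpirun >/dev/null 2>&1; then
    mpirun -np 2 --hostfile "$PBS_NODEFILE" hostname | summarize_hosts
fi
```

If both client1 and client2 appear in the summary, MPI itself can span the nodes and the problem is more likely in how abyss-pe invokes mpirun. If only one appears, the MPI or TORQUE setup is the place to look first.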