How to run abyss-pe on a cluster

archana bhardwaj

Aug 6, 2013, 8:26:32 AM
to abyss...@googlegroups.com

Hello everyone

Please tell me how to run abyss-pe on a cluster with multiple processors. I have posted on other discussion sites but have had no response so far. If anyone has an idea, please help.

Connor

Aug 6, 2013, 1:14:57 PM
to abyss...@googlegroups.com
Shaun has a solution which uses the abyss-pe makefile.  I always used to invoke ABYSS-P with MPI, but this seems simpler for sure.


Best of luck,

Connor Cameron

archana bhardwaj

Aug 7, 2013, 12:59:50 AM
to abyss...@googlegroups.com
Respected Connor 

I want to correct myself. What I actually want is to run abyss-pe across multiple cluster nodes on a paired-end Illumina dataset. The data is so large that it cannot be assembled on a single node.

It would therefore be very helpful if I could run abyss-pe in parallel mode on the 14 nodes (more than 100 processors in total) available at my institute.

I used the following command. It runs on a single machine, but it does not do an MPI run across 7 nodes to use the 50 processors. It seems that it is not reading the hostfile. Why is that?

export="/usr/mpi/gcc/mvapich2-1.6/bin/mpirun -machinefile /storage/home/cdac/data/hostfile_8 np 50" /state/partition1/apps/ABYSS/bin/abyss-pe k=31 lib='lib1 lib2' lib1='LIBSG311_R1_ada_ambi_quality_none.fastq LIBSG311_R2_ada_ambi_quality_none.fastq' lib2='LIBSG312_R1_ada_ambi_quality_none.fastq LIBSG312_R2_ada_ambi_quality_none.fastq' name=IC1_IC2

Please check it, sir.

Waiting for your reply.



Connor

Aug 7, 2013, 9:20:52 PM
to abyss...@googlegroups.com
Hey, sorry I'm late getting to this.

I got that you wanted to use distributed computing on your cluster.  I usually use something like 20 nodes at 8 cores a pop.

The solution that Shaun had used should work the way you want (I have actually seen it in a number of other topics, now that I am looking at it), but I have not used it this way (I should probably figure out how).

I think your problem is that you are invoking the abyss-pe makefile with mpirun.  You don't want to do this.  Instead, give abyss-pe the process count through its np parameter (np=50), not as part of an mpirun command line like you have here.  You will also want to give abyss-pe the hostfile, through its mpirun parameter.  This will work assuming you compiled ABySS with MPI enabled.

Try something like this:

 abyss-pe np=50 mpirun='mpirun -hostfile hostfile' k=31 lib='lib1 lib2' lib1='LIBSG311_R1_ada_ambi_quality_none.fastq LIBSG311_R2_ada_ambi_quality_none.fastq' lib2='LIBSG312_R1_ada_ambi_quality_none.fastq LIBSG312_R2_ada_ambi_quality_none.fastq' name=IC1_IC2
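
A small sketch of how the np value could be derived from the hostfile rather than typed by hand. It assumes an Open MPI style hostfile (one host per line, optionally followed by slots=N); MVAPICH2 and other MPI implementations use different hostfile syntax and flags, so adjust to match your mpirun. The function name count_slots and the file name hostfile are made up for illustration.

```shell
# Sum the slots declared in an Open MPI style hostfile, e.g. "node01 slots=8".
# Assumptions: a host line without slots=N counts as 1 slot; blank lines and
# lines whose first field starts with '#' are ignored.
count_slots() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }
        {
            n = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=[0-9]+$/)
                    n = substr($i, 7) + 0   # text after "slots="
            total += n
        }
        END { print total + 0 }
    ' "$1"
}

# Usage (echo prints the command instead of running it):
#   np=$(count_slots hostfile)
#   echo abyss-pe np="$np" mpirun="'mpirun -hostfile hostfile'" k=31 name=IC1_IC2
```

Keeping np in step with the hostfile avoids asking mpirun for more processes than the listed nodes provide.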

Here is how I do things.  This has been made much easier by the abyss-pe makefile solution, but I'll post it for the sake of completeness, and it also lets you customize the pipeline a bit more (use --recon to see what abyss-pe does).  I posted it initially, but deleted it because I found that other topic link.

I usually do not use the abyss-pe makefile (I probably should be), but instead invoke ABYSS-P directly with MPI.

My launcher does:

qsub -N ${name1} ${scripath}/run-abyss-se.bash ${kmer} ${datadir}

run-abyss-se.bash (request the parallel environment with -pe orte in the #$ directives at the top of the script, right after the shebang):

#! /bin/bash
#$ -S /bin/bash
#$ -m e
#$ -pe orte 160
#$ -cwd

# <lots of exporting and environment checking stuff>

mpirun \
 --prefix ${openmpihome} \
 -v \
 -np $NSLOTS \
 --hostfile hosts.dat \
 ${abysshome}/bin/ABYSS-P \
 --out=abyssrun-1.fa \
 --kmer=${kmer} \
 --graph=se-graph.dot \
 --snp=popped-bubbles.fa \
 --coverage-hist=k-mer_coverage.hist \
 --coverage=5 \
 --verbose \
 ${datadir}/*/*/*.fastq

I would try to get the abyss-pe makefile working before trying to directly invoke ABYSS-P with MPI unless you really want to.
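
If you do take the direct route under Grid Engine, the hosts.dat file given to --hostfile has to be produced somewhere in the job script. A minimal sketch, assuming the scheduler exports $PE_HOSTFILE with one "host slots queue processors" line per node (the usual Grid Engine layout) and that mpirun is Open MPI, which accepts "host slots=N" lines. The function name pe_to_hostfile is made up; how hosts.dat is actually built in the script above is not shown in this thread.

```shell
# Convert a Grid Engine $PE_HOSTFILE ("host slots queue processors" per line)
# into an Open MPI style hostfile ("host slots=N" per line).
# Lines with fewer than two fields are skipped.
pe_to_hostfile() {
    awk 'NF >= 2 { print $1 " slots=" $2 }' "$1"
}

# Usage inside the job script, before calling mpirun:
#   pe_to_hostfile "$PE_HOSTFILE" > hosts.dat
```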

There is one advantage to using the strategy above as opposed to the makefile strategy and that is the ability to use only the parts of the pipeline that you want.  For instance, if you just want the unitigs, you can run ABYSS-P then run AdjList, abyss-filtergraph, PopBubbles, and MergeContigs yourself.

If you are ever curious about what programs ABySS is actually running when you invoke the abyss-pe makefile, just add a --recon to the end of your command.  This will tell make to output the command that it would run, but not actually execute anything.
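
The dry run described above can be wrapped in a couple of small helpers. This is only a sketch: abyss_dry_run and list_programs are made-up names, and ABYSS_PE is a hypothetical override variable, not something ABySS itself reads. The second helper assumes each command in the dry-run output starts on its own line, which may not hold for long piped commands.

```shell
# Print the commands abyss-pe would run, without executing them.
# ABYSS_PE (hypothetical) lets you point at a specific install;
# it defaults to the abyss-pe found on PATH.
abyss_dry_run() {
    "${ABYSS_PE:-abyss-pe}" "$@" --recon
}

# Reduce dry-run output (read from stdin) to the distinct program names,
# in order of first appearance. Only the first word of each line is used.
list_programs() {
    awk 'NF > 0 && !seen[$1]++ { print $1 }'
}

# Usage (add the lib1=... and lib2=... arguments from your real command):
#   abyss_dry_run np=50 k=31 name=IC1_IC2 lib='lib1 lib2' | list_programs
```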

Happy Assembling,

Connor Cameron

archana bhardwaj

Aug 12, 2013, 9:17:13 AM
to abyss...@googlegroups.com
Hello Connor sir 

Thanks for your help. I am now able to run abyss-pe on multiple nodes at a time.

Please make one thing clear to me: should I use ABYSS-P or abyss-pe for parallel assembly of paired-end data?

Can I use ABYSS-P, which is the actual MPI executable in ABySS, for paired-end sequence data, or is it suitable for single-end data only? Or can abyss-pe also be used for this task?






Ben Vandervalk

Aug 12, 2013, 12:35:10 PM
to archana bhardwaj, abyss...@googlegroups.com
Hi Archana,

It is recommended to use abyss-pe instead of ABYSS-P.  The abyss-pe Makefile calls ABYSS-P, but also does many post-processing steps on ABYSS-P output (e.g. scaffolding with paired end reads) to improve the quality of the assembly.

As you say, ABYSS-P treats its input files as single end reads.  It is the later stages of abyss-pe that use the paired end information.

- Ben



Connor

Aug 12, 2013, 12:44:08 PM
to abyss...@googlegroups.com
Awesome, good to hear you got it running.

If you want to use the abyss-pe workflow (paired end workflow consisting of unitigs, contigs, and scaffolds phases) it is probably best to use the abyss-pe makefile.

This does not mean that you must use abyss-pe to run the full ABySS pipeline.  

If you run abyss-pe with the required inputs and add the make flag --recon on the end, you will see, in stdout, the workflow that you would be running had you invoked make without --recon.

This is useful if you want to select the parts of the pipeline that you think are appropriate for you.  However, be aware of how the various programs interact and how they operate.

The first phase (ABYSS-P) is single end only: all of the input reads are used to build the de Bruijn graph.  If you would like to run the rest of the unitigs phase, you will need to view the recon output and see that AdjList, abyss-filtergraph, PopBubbles, and MergeContigs are run before the unitigs symlink is produced.

I would suggest this if you are comfortable with the ABySS pipeline and you are looking for a few more useful ways to use the tools it provides.
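
One possible middle ground, offered as an assumption rather than something confirmed in this thread: the abyss-pe man page lists per-stage targets such as unitigs, so the makefile itself may be able to stop after the unitigs phase. The sketch below just appends that target to whatever arguments you pass; run_unitigs is a made-up name and ABYSS_PE is a hypothetical override variable. Adding --recon first is a cheap way to confirm what would actually run on your version.

```shell
# Ask abyss-pe for the unitigs target only.
# Assumption: this abyss-pe version provides a "unitigs" target.
# ABYSS_PE (hypothetical) defaults to the abyss-pe found on PATH.
run_unitigs() {
    "${ABYSS_PE:-abyss-pe}" "$@" unitigs
}

# Usage (dry run first; add your lib1=... and lib2=... arguments):
#   run_unitigs np=50 k=31 name=IC1_IC2 lib='lib1 lib2' --recon
```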

Connor