deconseq and GNU parallel?


yax...@gmail.com

Jan 30, 2017, 5:32:49 PM
to Edwards Lab Tools
Hello, I wonder if deconseq can be modified for use with GNU parallel. I am running deconseq on my large next-gen dataset and see from resource usage that, out of 16 processors, only 2-3 are occupied, at 1-3%, by bsw2_aln. On this machine it is going to take about a week of non-stop computation...

Vladimir

gisleves...@gmail.com

May 30, 2017, 4:09:32 AM
to Edwards Lab Tools
This very thorough tutorial is for combining BLAST with GNU Parallel, but the principle is the same.
https://www.biostars.org/p/63816/

aaronmich...@gmail.com

Oct 12, 2017, 4:16:52 PM
to Edwards Lab Tools
I wanted to use GNU parallel as well but am having trouble: the program seems to run longer and gives multiple outputs. I'm sure it's a problem with my parallel command and not deconseq, but I'm not sure what. Here is my command, modeled on the BLAST example linked above:

time cat case.part-02.fastq | parallel -j15 --block 24M --recstart '>' --pipe perl deconseq.pl -dbs btref -f case.part-02.fastq > ~/results

I picked the number of cores and the block size based on the total input file size and the number of reads deconseq appears to process per pass (~44K). The total number of reads in the data should be ~660K.

I started the command about 90 minutes ago on our CentOS cluster and it is still running. But it only took 6 minutes for deconseq to screen 21K reads with 1 core, so I'm not getting any speedup (and am probably slowing things down). Sorry if this is too off topic. How do you parallelize?

Aaron

gisleves...@gmail.com

Oct 12, 2017, 5:17:12 PM
to Edwards Lab Tools
--recstart '>' should be '^@', since @ is the first character of the header line of a FASTQ record. A bare '@' will not work, because @ is also used as a quality score character.
Gisle
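
Editorial sketch, not part of the original reply: the point about '@' in quality strings can be checked on a tiny made-up FASTQ. The file name `demo.fastq` and its contents are invented for illustration, and the sketch assumes the common 4-lines-per-read FASTQ layout (no wrapped sequence lines). It suggests that even an anchored match on '@' at the start of a line can overcount reads whenever a quality line happens to begin with '@'.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny made-up FASTQ: 2 reads; the second read's quality line begins with '@'.
printf '@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n@III\n' > demo.fastq

# Count lines that start with '@' (headers plus any such quality lines).
at_lines=$(grep -c '^@' demo.fastq)

# Count reads from the fixed 4-line layout instead.
records=$(( $(wc -l < demo.fastq) / 4 ))

echo "lines starting with @: $at_lines"
echo "records (lines / 4):   $records"
```

When the two counts disagree, splitting on the header character alone can cut a read in half, so splitting on a multiple of 4 lines is usually the safer choice for this layout.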

aaronmich...@gmail.com

Oct 13, 2017, 9:29:02 AM
to Edwards Lab Tools
When I saw your answer I thought for sure that was it! But I changed to

time cat case.part-02.fastq | parallel -j15 --block 24M --recstart '@' --pipe perl deconseq.pl -dbs btref -f case.part-02.fastq > ~/results

and got the same outcome. Is stating the file name in both the parallel and deconseq parts of the command causing confusion?

aaronmich...@gmail.com

Oct 13, 2017, 9:38:51 AM
to Edwards Lab Tools
Sorry, that post was premature. I missed the caret '^'. My new command is

time cat case.part-02.fastq | parallel -j15 --block 24M --recstart '^@' --pipe perl deconseq.pl -dbs btref -f case.part-02.fastq > ~/results

Then I get
parallel: Warning: A record was longer than 25165824. Increasing to --blocksize 32715573.
parallel: Warning: A record was longer than 32715573. Increasing to --blocksize 42530246.
parallel: Warning: A record was longer than 42530246. Increasing to --blocksize 55289321.
parallel: Warning: A record was longer than 55289321. Increasing to --blocksize 71876119.
parallel: Warning: A record was longer than 71876119. Increasing to --blocksize 93438956.
parallel: Warning: A record was longer than 93438956. Increasing to --blocksize 121470644.
parallel: Warning: A record was longer than 121470644. Increasing to --blocksize 157911839.
parallel: Warning: A record was longer than 157911839. Increasing to --blocksize 205285392.
parallel: Warning: A record was longer than 205285392. Increasing to --blocksize 266871011.
parallel: Warning: A record was longer than 266871011. Increasing to --blocksize 346932316.

Then parallel ended up using only a single core. That doesn't make any sense: my reads shouldn't be that big, at most 250 nt.
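
Editorial sketch, not part of the original thread: one likely reading of the warnings is that GNU parallel treats `--recstart` as a literal string unless `--regexp` is also given, so it searches for the two characters `^@` and keeps enlarging the block when it finds no record boundary. Separately, `-f case.part-02.fastq` appears to make every job read the whole file rather than the piped chunk. A workaround under those assumptions is to split the file into chunk files on a multiple of 4 lines and give each chunk to its own deconseq job. The file names, chunk size, and data below are made up, and the layout is assumed to be 4 lines per read.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up FASTQ with 4 reads.
for i in 1 2 3 4; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > demo.fastq

# grep -F searches for the fixed string "^@", mimicking a literal record
# separator. It matches nothing here, so no boundary would be found.
literal_hits=$(grep -cF '^@' demo.fastq || true)
echo "literal '^@' matches: $literal_hits"

# Split on a multiple of 4 lines so no read is cut in half
# (8 lines = 2 reads per chunk in this toy example).
split -l 8 demo.fastq chunk_
chunks=$(ls chunk_* | wc -l | tr -d ' ')
echo "chunks: $chunks"

# Each chunk is itself a complete FASTQ file.
for f in chunk_*; do
  echo "$f: $(( $(wc -l < "$f") / 4 )) reads"
done

# Hypothetical follow-up, not run here: one deconseq job per chunk file,
# reusing only the options that appear in the thread.
#   parallel -j15 perl deconseq.pl -dbs btref -f {} ::: chunk_*
```

For a real run the chunk size would be much larger (still a multiple of 4), and it is worth checking in `deconseq.pl -h` how output files are named so that concurrent jobs do not overwrite each other.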
