Strange 'stalling' of bpipe execution


Sue Grimes

Dec 12, 2017, 3:23:13 AM
to bpipe-discuss
Not sure where to start with troubleshooting this. We have been running bpipe successfully for several years, and a few days ago we started encountering this issue on several servers, with several different users and several different scripts. We are running on standalone servers with no job queuing system. If anyone has any thoughts on what could be happening, please let me know!

What happens is:
- Bash script is submitted which calls a bpipe pipeline
- An empty .bpipe/logs/<pid>.log is created, but no <pid>_bpipe.log is created, and no commandlog.txt
- No bpipe stages actually start executing
- On a couple of jobs, the bpipe stages did start after 10-24 hours and then appeared to run normally.

This is occurring in bpipe 0.9.8.7 and 0.9.9.
- There are sufficient resources on the servers to run the submitted jobs
- As far as I know there were no environment changes on the servers between when the bpipes were running normally, and now.

- For one job that is currently 'stalled', ps uax shows:

sgrimes  44311  0.0  0.0  12568  3212 ?        SN   Dec11   0:00 bash bpipe_wgs_somatic_cnvonly.sh

sgrimes  44314  0.0  0.0  12840  3468 ?        SN   Dec11   0:00 /bin/bash /mnt/ix1/Resources/tools/bpipe-0.9.8.7/bin/bpipe run -n 32 -l mem=96 -rf wgs_somatic_1712111515.html -p tn_split=_PE -p nt_bwa=32 /mnt/IPGcrc/00_Code/bpipes/pipe_scripts/wgs_somatic_cnvonly.pipe P05874_15600A_mp_R1.fastq.gz P05874_15600A_mp_R2.fastq.gz P05874_15601A_nt_R1.fastq.gz P05874_15601A_nt_R2.fastq.gz  (..plus more fastq input files)

sgrimes  44336  0.0  0.0 8218352 241616 ?      SNl  Dec11   0:31 java -Xmx2g -noverify -classpath /mnt/ix1/Resources/tools/bpipe-0.9.8.7/bin/../lib/*:/home/sgrimes/bpipes/extra-lib.jar -Dbpipe.pid=44314 -Dbpipe.home=/mnt/ix1/Resources/tools/bpipe-0.9.8.7/bin/.. -Dbpipe.version=0.9.8.7 -Dbpipe.builddate=1427611867141 org.codehaus.groovy.tools.GroovyStarter --classpath /mnt/ix1/Resources/tools/bpipe-0.9.8.7/bin/../lib/*:/home/sgrimes/bpipes/extra-lib.jar --main bpipe.Runner -n 32 -l mem=96 -rf wgs_somatic_1712111515.html -p tn_split=_PE -p nt_bwa=32 /mnt/IPGcrc/00_Code/bpipes/pipe_scripts/wgs_somatic_cnvonly.pipe P05874_15600A_mp_R1.fastq.gz P05874_15600A_mp_R2.fastq.gz P05874_15601A_nt_R1.fastq.gz P05874_15601A_nt_R2.fastq.gz P05875_15602A_mp_R1.fastq.gz (..plus more fastq input files)


Simon Sadedin

Dec 16, 2017, 2:24:54 AM
to bpipe-discuss on behalf of Sue Grimes
That's pretty weird indeed Sue.

If you look at how Bpipe initializes, you can see the two log files are made one directly after the other:


If you are certain that you're seeing one log file but not the other, then it would seem that something is stalling Bpipe while it is trying to initialize the logs. For example, it could be blocked while creating the directories it needs (does the .bpipe/logs directory exist when it is stalled?).

Perhaps you can look at the code there and see if you can figure out at exactly which step it's stalled. You could even clone it from Github, build it, and then put some print statements in to work out what exactly is happening.
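
A lighter-weight option than rebuilding Bpipe might be to take a thread dump of the stalled JVM and see which call it is sitting in. Below is a minimal sketch in Python, assuming a JDK with `jstack` on the PATH and that it is run as the same user as the java process (pid 44336 in the ps output above, not the bash wrapper). The helper names `thread_dump` and `main_thread_stack` are made up for illustration, and whether the stall actually shows up on the thread named "main" is an assumption.

```python
import subprocess
from typing import List


def thread_dump(java_pid: int) -> str:
    """Return the thread dump of a running JVM using the JDK's jstack tool."""
    result = subprocess.run(
        ["jstack", str(java_pid)],
        capture_output=True,
        text=True,
        check=True,  # raise if jstack fails, e.g. wrong user or wrong pid
    )
    return result.stdout


def main_thread_stack(dump: str) -> List[str]:
    """Extract the stack frames of the thread named "main" from a dump."""
    frames = []
    in_main = False
    for line in dump.splitlines():
        if line.startswith('"'):
            # A quoted thread name in column 0 starts a new thread section.
            in_main = line.startswith('"main"')
        elif in_main and line.strip().startswith("at "):
            frames.append(line.strip())
    return frames
```

Calling `main_thread_stack(thread_dump(44336))` a few times, some seconds apart, and getting the same top frames each time would suggest that is where the process is blocked.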

Not sure this helps, but let us know what you find!

Simon



--
You received this message because you are subscribed to the Google Groups "bpipe-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bpipe-discuss+unsubscribe@googlegroups.com.
To post to this group, send email to bpipe-...@googlegroups.com.
Visit this group at https://groups.google.com/group/bpipe-discuss.
For more options, visit https://groups.google.com/d/optout.

Sue Grimes

Dec 16, 2017, 12:11:36 PM
to bpipe-discuss on behalf of Simon

Hi Simon,

Thanks for the response and the code/troubleshooting hints. What we were seeing was the /logs directory created and an empty <pid>.log, but no <pid>_bpipe.log and no commandlog.txt. Eventually (if we waited long enough, sometimes a couple of days!) the log file and commandlog.txt would get written and bpipe would run normally.

We narrowed it down to some sort of interaction between bpipe and our ix storage cluster (the same environment we have always been running on). Bpipe would run normally if we ran it in a directory on the server hard drive, but not if we ran it in a directory on our ix storage cluster. Even a hello-world bpipe pipeline would exhibit the symptoms if we ran it in an ix directory. We didn't encounter any issues running software directly (e.g. a shell script calling the same tools that the bpipe pipeline was calling); tools which use java (e.g. picard or gatk) were still running as expected on the ix cluster.
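
For anyone who hits something similar, one way to compare two locations without involving bpipe at all is to time basic file operations in each. This is only a rough sketch in Python, not something from bpipe itself: `time_file_ops` and the probe file names are invented for illustration, and it measures latency only, so it would not explain why a storage system is slow. A call that never returns would itself be the signal.

```python
import os
import tempfile
import time


def time_file_ops(directory: str, rounds: int = 5) -> dict:
    """Time mkdir, create+fsync and delete in `directory`.

    Returns the slowest time in seconds seen for each operation.
    """
    worst = {"mkdir": 0.0, "create": 0.0, "delete": 0.0}
    for _ in range(rounds):
        start = time.perf_counter()
        probe_dir = tempfile.mkdtemp(prefix="probe_", dir=directory)
        worst["mkdir"] = max(worst["mkdir"], time.perf_counter() - start)

        path = os.path.join(probe_dir, "probe.log")
        start = time.perf_counter()
        with open(path, "w") as handle:
            handle.write("probe\n")
            handle.flush()
            os.fsync(handle.fileno())  # force the write out to the storage
        worst["create"] = max(worst["create"], time.perf_counter() - start)

        start = time.perf_counter()
        os.remove(path)
        os.rmdir(probe_dir)
        worst["delete"] = max(worst["delete"], time.perf_counter() - start)
    return worst
```

Running `time_file_ops` once on a local directory and once on a directory on the storage cluster, then comparing the two results, should show whether plain file creation is already slow there.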

 

Our sysadmin talked to ix system support, and it does appear there was something strange going on with the ix caching. He rebooted the storage system, and now bpipe is running normally again. So we seem to be fine at the moment, but it's not a totally satisfying answer as to what was going on!

Sue.
