Broken pipe when executing println on a remote worker

David Parks

Jun 28, 2016, 7:30:07 PM
to julia-users
I've launched workers on remote servers using my own cluster manager. It appears to be configured correctly: the workers launch and I can execute remotecall on them. But when I try to run a remote `println` command I get a broken pipe. The workers' `stdout` doesn't seem to be forwarded to the master as I would expect.

julia> remotecall_fetch(90, gethostname)
"gpu-8.local"

julia> remotecall_fetch(90, println, "test")
ERROR: On worker 90:
write: broken pipe (EPIPE)
 in yieldto at ./task.jl:71
 in wait at ./task.jl:371
 in stream_wait at ./stream.jl:60
 in uv_write at stream.jl:962
 in buffer_or_write at stream.jl:972
 in write at stream.jl:1011
 in print at strings/io.jl:46
 in print at strings/io.jl:18
 in println at strings/io.jl:25
 in println at strings/io.jl:28
 in anonymous at multi.jl:923
 in run_work_thunk at multi.jl:661
 [inlined code] from multi.jl:923
 in anonymous at task.jl:63
 in remotecall_fetch at multi.jl:747
 in remotecall_fetch at multi.jl:750

julia> versioninfo()
Julia Version 0.4.5
Commit 2ac304d (2016-03-18 00:58 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY NEHALEM)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3


I posted this previously on SO but didn't get any takers, so I wonder if anyone here has an idea why this might occur.


Thanks,
David

David Parks

Jun 28, 2016, 8:19:42 PM
to julia-users
Is it the case that the cluster manager must continue to redirect stdout to the master after the master/slave handshake has been completed?

I noticed this line in the documentation:
  • The cluster manager captures the stdout’s of each worker and makes it available to the master process
I had originally taken that to apply only during the initial handshake, when the workers write their IP/port to stdout and the master needs to capture it to initiate the session. But now I'm wondering whether the cluster manager needs to continuously redirect stdout from the workers to the master. If I'm correct about that, what API would we send the workers' stdout messages to? The documentation isn't specific about this point.


David Parks

Jun 28, 2016, 10:06:24 PM
to julia-users
Answered my own question after a few more hours of sweat and tears. I had misunderstood the documentation, and what I suggested in my previous message was correct: the cluster manager must keep each worker's stdout IO stream open and pass it to the WorkerConfig.io field.
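
For anyone hitting the same error, here is a minimal sketch of what that looks like. It is not the cluster manager from this thread. `PipeManager` is a hypothetical name, the workers are started on the local machine to keep the example short, and the code assumes a current Julia 1.x release, where these functions live in the Distributed standard library (on the 0.4 series they were part of Base). The point is the last few lines of `launch`: the pipe connected to the worker's stdout is stored in `wconfig.io` and is left open for the lifetime of the worker.

```julia
using Distributed
import Distributed: launch, manage

# Hypothetical manager that starts `np` workers on the local machine.
struct PipeManager <: ClusterManager
    np::Int
end

function launch(manager::PipeManager, params::Dict, launched::Array, c::Condition)
    exename  = params[:exename]    # path to the julia executable
    exeflags = params[:exeflags]   # extra command-line flags, empty by default
    dir      = params[:dir]        # working directory for the worker

    # Hand the cluster cookie to the worker on the command line, so the
    # worker does not need to read it from stdin.
    cookie = Distributed.cluster_cookie()

    for _ in 1:manager.np
        cmd = Cmd(`$exename $exeflags --worker=$cookie`; dir=dir, detach=true)

        # In read mode, `proc.out` is the read end of the worker's stdout.
        proc = open(cmd, "r")

        wconfig = WorkerConfig()
        wconfig.process = proc
        # The master reads the worker's host:port from this stream first and
        # keeps reading from it afterwards, so it must not be closed or
        # replaced once the handshake is done.
        wconfig.io = proc.out
        push!(launched, wconfig)
    end

    # Tell the master that new entries are available in `launched`.
    notify(c)
end

# Called on events such as :register and :deregister; nothing to do here.
function manage(manager::PipeManager, id::Integer, config::WorkerConfig, op::Symbol)
    return nothing
end
```

With these definitions, `addprocs(PipeManager(2))` should start two workers, and a remote `println` (in current Julia the function comes first: `remotecall_fetch(println, workers()[1], "test")`) should be forwarded to the master instead of failing with EPIPE. A manager that launches on remote hosts would need to obtain the equivalent stream from whatever transport it uses to start the process.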