[zeromq-dev] Disconnecting from permanently dead endpoints

Pierre Ynard

Jan 25, 2012, 10:25:59 AM

to zerom...@lists.zeromq.org

Hello,

I want to set up a messaging system where a zeromq socket connects to
several endpoints (with TCP). Some of these endpoints may go permanently
down, and the corresponding transport addresses could even be reused
by a totally unrelated service. So I need a way to "disconnect" the
zeromq socket from this endpoint, as keeping around zombie endpoints to
periodically try to reconnect to them is unnecessary, undesirable and
does not scale. The only way I see to do this is to close the socket,
create a new one, and reconnect it to all the remaining endpoints, but
that's quite suboptimal. Did I overlook something?

Regards,

--
Pierre Ynard
"Une âme dans un corps, c'est comme un dessin sur une feuille de papier."
_______________________________________________
zeromq-dev mailing list
zerom...@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev

Pieter Hintjens

Jan 25, 2012, 11:29:38 AM

to ZeroMQ development list

On Wed, Jan 25, 2012 at 9:25 AM, Pierre Ynard <link...@yahoo.fr> wrote:

> I want to set up a messaging system where a zeromq socket connects to
> several endpoints (with TCP). Some of these endpoints may go permanently
> down, and the corresponding transport addresses could even be reused
> by a totally unrelated service. So I need a way to "disconnect" the
> zeromq socket from this endpoint, as keeping around zombie endpoints to
> periodically try to reconnect to them is unnecessary, undesirable and
> does not scale. The only way I see to do this is to close the socket,
> create a new one, and reconnect it to all the remaining endpoints, but
> that's quite suboptimal. Did I overlook something?

Mainly, it depends on the type of messaging pattern.

For pub-sub, subscribers can come and go; 0MQ handles this automatically.

For request-reply, you need to use a load-balancing pattern such as
least-recently used.

For other patterns it gets harder and some people resort to creating
one socket per connection.

-Pieter

Andrew Hume

Jan 25, 2012, 11:44:54 AM

to ZeroMQ development list

pieter is of course correct in what he says, but to me,
your issue is more about murkiness about the goal.

if the goal is to move data from source to several sinks,
each of which can come and go, then normally the model
is a known publisher and an unspecified set of clients getting that data.
if data should go to exactly one client, then use PUSH-PULL.
if data can or should go to each client, then use PUB-SUB.
in both cases, 0mq handles what you want; you needn't worry about teardown etc.

life gets harder if the publisher needs to know the client's name.

------------------
Andrew Hume (best -> Telework) +1 623-551-2845
and...@research.att.com (Work) +1 973-236-2014
AT&T Labs - Research; member of USENIX and LOPSA

Pierre Ynard

Jan 25, 2012, 12:08:04 PM

to zerom...@lists.zeromq.org

> if the goal is to move data from source to several sinks,
> each of which can come and go, then normally the model
> is a known publisher and an unspecified set of clients getting that data.
> if data should go to exactly one client, then use PUSH-PULL.
> if data can or should go to each client, then use PUB-SUB.
> in both cases, 0mq handles what you want; you needn't worry about teardown etc.

I'm doing PUSH-PULL, and pulling data from several sources.

> > For other patterns it gets harder and some people resort to creating
> > one socket per connection.

Having to handle several sockets would defeat the ease of use of zeromq;
in my case I'd rather just reset all the connections.

--
Pierre Ynard
"Une âme dans un corps, c'est comme un dessin sur une feuille de papier."

Andrew Hume

Jan 25, 2012, 12:14:11 PM

to ZeroMQ development list

if you're doing a PUSH-PULL, and the PUSH side is doing the bind,
then there shouldn't be a problem with the (PULL) clients coming and going.
are you saying there is?

On Jan 25, 2012, at 10:08 AM, Pierre Ynard wrote:

> > if the goal is to move data from source to several sinks,
> > each of which can come and go, then normally the model
> > is a known publisher and an unspecified set of clients getting that data.
> > if data should go to exactly one client, then use PUSH-PULL.
> > if data can or should go to each client, then use PUB-SUB.
> > in both cases, 0mq handles what you want; you needn't worry about teardown etc.
>
> I'm doing PUSH-PULL, and pulling data from several sources.

Pierre Ynard

Jan 25, 2012, 12:34:19 PM

to zerom...@lists.zeromq.org

> if you're doing a PUSH-PULL, and the PUSH side is doing the bind,
> then there shouldn't be a problem with the (PULL) clients coming and going.
> are you saying there is?

No, but the PULL client will have a problem with the PUSH servers, many
of which will come and then go forever; I don't want to keep around
connections to servers that are definitely gone and won't come back.

Pieter Hintjens

Jan 25, 2012, 12:36:13 PM

to ZeroMQ development list

On Wed, Jan 25, 2012 at 11:14 AM, Andrew Hume <and...@research.att.com> wrote:

> if you're doing a PUSH-PULL, and the PUSH side is doing the bind,
> then there shouldn't be a problem with the (PULL) clients coming and going.
> are you saying there is?

PUSH/PULL doesn't work very well when workers come and go, since
requests can get sent to workers that are going away, and thus get
lost.

My general strategy in such dynamic topologies is to use ROUTER
sockets and custom routing. You could for example use credit-based
flow control, or LRU, or round-robin with sliding acknowledgements. In
any case you need explicit logic to either only send to known
available workers, or recover from messages lost. 0MQ will not do this
for you.
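
A minimal sketch of the "only send to known available workers" idea, in the least-recently-used style mentioned above. It is plain C and does not touch any 0MQ socket; the names `ready_queue`, `worker_ready` and `next_worker` are hypothetical, and worker ids are assumed to be non-negative integers so that -1 can signal "nobody is ready". In a real ROUTER-based design the ids would be the workers' routing identities.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 8

/* FIFO of worker ids that have announced they are ready for work. */
struct ready_queue {
    int ids[MAX_WORKERS];
    size_t head;   /* index of the oldest ready worker */
    size_t count;  /* number of ready workers currently queued */
};

/* A worker reports ready; returns 0, or -1 if the queue is full. */
int worker_ready(struct ready_queue *q, int id)
{
    assert(q != NULL);
    if (q->count == MAX_WORKERS)
        return -1;
    q->ids[(q->head + q->count) % MAX_WORKERS] = id;
    q->count++;
    return 0;
}

/* Pick the worker that has waited longest; returns -1 if none is ready. */
int next_worker(struct ready_queue *q)
{
    int id;
    assert(q != NULL);
    if (q->count == 0)
        return -1;
    id = q->ids[q->head];
    q->head = (q->head + 1) % MAX_WORKERS;
    q->count--;
    return id;
}
```

The sender only dispatches when `next_worker()` returns an id, so a worker that has gone away and never reports ready again simply stops receiving messages.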

It was meant to be a topic for the Guide, how to build reliable pipelines.

-Pieter

Pieter Hintjens

Jan 25, 2012, 12:39:17 PM

to ZeroMQ development list

On Wed, Jan 25, 2012 at 11:34 AM, Pierre Ynard <link...@yahoo.fr> wrote:

> No, but the PULL client will have a problem with the PUSH servers, many
> of which will come and then go forever; I don't want to keep around
> connections to servers that are definitely gone and won't come back.

In theory this should be managed happily by 0MQ. In practice on some
systems (Android, at least), it will eventually cause problems. This
is a bug, and something we're looking into.

-Pieter

Martin Lucina

Jan 25, 2012, 8:10:06 PM

to zerom...@lists.zeromq.org

Hi Pierre,

On Wed, 25 Jan 2012 16:25:59 +0100
Pierre Ynard <link...@yahoo.fr> wrote:

> I want to set up a messaging system where a zeromq socket connects to
> several endpoints (with TCP). Some of these endpoints may go permanently
> down, and the corresponding transport addresses could even be reused
> by a totally unrelated service. So I need a way to "disconnect" the
> zeromq socket from this endpoint, as keeping around zombie endpoints to
> periodically try to reconnect to them is unnecessary, undesirable and
> does not scale.

Understood.

> The only way I see to do this is to close the socket,
> create a new one, and reconnect it to all the remaining endpoints, but
> that's quite suboptimal. Did I overlook something?

Correct, recreating the socket is the only way to accomplish
what you need right now.
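
A rough sketch of the bookkeeping behind that workaround, assuming the application keeps its own array of endpoint strings. Only the list handling is compilable code; `drop_endpoint` is a hypothetical helper, and the libzmq calls that would perform the actual reset are shown in a comment, with `old_socket` and `context` as placeholder names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Remove every entry equal to `dead` from `endpoints` (an array of `n`
 * endpoint strings) and return the new count. Order is preserved.
 */
size_t drop_endpoint(const char **endpoints, size_t n, const char *dead)
{
    size_t kept = 0;
    size_t i;
    assert(endpoints != NULL && dead != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(endpoints[i], dead) != 0)
            endpoints[kept++] = endpoints[i];
    }
    return kept;
}

/*
 * With the surviving list in hand, the reset itself would look roughly
 * like this (error handling omitted):
 *
 *     zmq_close(old_socket);
 *     void *fresh = zmq_socket(context, ZMQ_PULL);
 *     for (i = 0; i < kept; i++)
 *         zmq_connect(fresh, endpoints[i]);
 */
```

Every surviving connection is torn down and re-established along the way, which is the cost the original question calls suboptimal.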

An API for zmq_disconnect() and zmq_unbind() would probably not be
controversial, but the implementation would be ... hard. This dives
right in to the termination mechanisms in libzmq which are really
complex.

-mato
--
Martin Lucina <mar...@lucina.net>

john skaller

Jan 25, 2012, 8:32:45 PM

to ZeroMQ development list

On 26/01/2012, at 4:39 AM, Pieter Hintjens wrote:

> On Wed, Jan 25, 2012 at 11:34 AM, Pierre Ynard <link...@yahoo.fr> wrote:
>
>> No, but the PULL client will have a problem with the PUSH servers, many
>> of which will come and then go forever; I don't want to keep around
>> connections to servers that are definitely gone and won't come back.
>
> In theory this should be managed happily by 0MQ. In practice on some
> systems (Android, at least), it will eventually cause problems. This
> is a bug, and something we're looking into.

Isn't it also a design fault? I mean, if you can connect/bind to multiple endpoints dynamically
you obviously should be able to un-connect and un-bind.

Given that, and a function to test if an endpoint is alive (no idea how .. )
removal of dead endpoints becomes the client's responsibility.

After all only the client can judge when an endpoint is to be deemed
dead because only the client knows the overall architecture.

--
john skaller
ska...@users.sourceforge.net

john skaller

Jan 25, 2012, 10:00:55 PM

to ZeroMQ development list

On 26/01/2012, at 12:32 PM, john skaller wrote:

>
> Isn't it also a design fault? I mean, if you can connect/bind to multiple endpoints dynamically
> you obviously should be able to un-connect and un-bind.

API:

int disconnect(void *socket, void *connection);
int unbind(void *socket, void *connection);

Semantics:

When you connect or bind a socket, 0MQ associates the char* address
of the endpoint name with the internal representation of the endpoint.

When you call disconnect or unbind, ALL the endpoints which are
associated with that address are detached from the socket.

0MQ does not use the address for anything so it is perfectly safe
to free the buffer. It is also safe to use the same buffer for two connections,
in fact this is useful if you wish to disconnect or unbind them together.
If you want to distinguish endpoints, ensure the addresses passed in
are unique (i.e. if you're using wildcard connections make sure
to copy the buffer if you want to distinguish them).

No change to the existing C API is required for this extension.
No code is broken by this extension.

I guess the association of the address with the endpoint
internally is easy enough? The exact method of disconnecting
is open, but would be a bit similar to doing a close. Disconnecting
all endpoints does NOT relieve the client of the need to close the socket.
I am guessing the disconnect would block in the same circumstances
as close would.

The return value is -1 on error, or the number of endpoints disconnected
or unbound. 0 is a valid result. I do not know what errors might occur ..
except perhaps trying to unbind a connection or disconnect a binding.
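
The semantics proposed above can be modelled in a few lines of plain C. This is a toy illustration only, not libzmq code: `toy_socket`, `toy_connect` and `toy_disconnect` are hypothetical names, and the "socket" is just a table of buffer addresses. The key is the pointer value, never the string contents.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENDPOINTS 16

/* Toy stand-in for a socket: it remembers only the address of each name buffer. */
struct toy_socket {
    const void *keys[MAX_ENDPOINTS];
    size_t count;
};

/* Record a connection keyed by the buffer's address; -1 if the table is full. */
int toy_connect(struct toy_socket *s, const char *name)
{
    assert(s != NULL && name != NULL);
    if (s->count == MAX_ENDPOINTS)
        return -1;
    s->keys[s->count++] = name;
    return 0;
}

/* Detach every endpoint registered under `key`; return how many were removed. */
int toy_disconnect(struct toy_socket *s, const void *key)
{
    size_t kept = 0;
    size_t i;
    int removed = 0;
    assert(s != NULL);
    for (i = 0; i < s->count; i++) {
        if (s->keys[i] == key)
            removed++;
        else
            s->keys[kept++] = s->keys[i];
    }
    s->count = kept;
    return removed;
}
```

Two connections made through the same buffer are removed together, two buffers holding identical text are distinct, and disconnecting an unknown address returns 0 rather than an error.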

Martin Lucina

Jan 25, 2012, 10:54:15 PM

to zerom...@lists.zeromq.org

On Thu, 26 Jan 2012 14:00:55 +1100
john skaller <ska...@users.sourceforge.net> wrote:

>
> On 26/01/2012, at 12:32 PM, john skaller wrote:
>
> >
> > Isn't it also a design fault? I mean, if you can connect/bind to multiple endpoints dynamically
> > you obviously should be able to un-connect and un-bind.
>
> API:
>
> int disconnect(void *socket, void *connection);
> int unbind(void *socket, void *connection);

Yup, this is almost the same as what I was thinking.

> Semantics:
>
> When you connect or bind a socket, 0MQ associates the char* address
> of the endpoint name with the internal representation of the endpoint.

"void *connection" won't work; you would then need to get a handle
for the internal representation of the endpoint out of libzmq when you
call connect() or bind(), in order to be able to call unbind() or
disconnect() on it.

I would go with just calling unbind() or disconnect()
with the char* you originally passed to bind() or connect().

Semantics are pretty much the same; it is obviously an error to
unbind() or disconnect() an endpoint which was not bound or connected
in the first place.

-mato
--
Martin Lucina <mar...@lucina.net>

Pieter Hintjens

Jan 26, 2012, 12:11:29 AM

to ZeroMQ development list

On Wed, Jan 25, 2012 at 9:54 PM, Martin Lucina <mar...@lucina.net> wrote:
> On Thu, 26 Jan 2012 14:00:55 +1100
> john skaller <ska...@users.sourceforge.net> wrote:
>
>>
>> On 26/01/2012, at 12:32 PM, john skaller wrote:
>>
>> >
>> > Isn't it also a design fault? I mean, if you can connect/bind to multiple endpoints dynamically
>> > you obviously should be able to un-connect and un-bind.
>>
>> API:
>>
>> int disconnect(void *socket, void *connection);
>> int unbind(void *socket, void *connection);
>
> Yup, this is almost the same as what I was thinking.

+1

-Pieter

john skaller

Jan 26, 2012, 1:50:19 AM

to ZeroMQ development list

On 26/01/2012, at 2:54 PM, Martin Lucina wrote:

>
> I would go with just calling unbind() or disconnect()
> with the char* you originally passed to bind() or connect().

That's exactly what I said :)

>
> Semantics are pretty much the same; it is obviously an error to
> unbind() or disconnect() an endpoint which was not bound or connected
> in the first place.

You may be right, however that's not what my spec says:
when you disconnect you are disconnecting a SET of endpoints.

This is because connect/bind can be called with a char* to a buffer
into which the endpoint name has been memcpy()'d. So all such
endpoints have the same char* address.

So since you're disconnecting a SET .. well there's nothing wrong
with an empty set!

Instead, the number of matched endpoints is returned by the function.
The client can then decide if 0 is an error or not.

--
john skaller
ska...@users.sourceforge.net

Martin Lucina

Jan 26, 2012, 2:28:08 AM

to zerom...@lists.zeromq.org

On Thu, 26 Jan 2012 17:50:19 +1100
john skaller <ska...@users.sourceforge.net> wrote:

>
> On 26/01/2012, at 2:54 PM, Martin Lucina wrote:
>
> >
> > I would go with just calling unbind() or disconnect()
> > with the char* you originally passed to bind() or connect().
>
> That's exactly what I said :)
>
> >
> > Semantics are pretty much the same; it is obviously an error to
> > unbind() or disconnect() an endpoint which was not bound or connected
> > in the first place.
>
> You may be right, however that's not what my spec says:
> when you disconnect you are disconnecting a SET of endpoints.
>
> This is because connect/bind can be called with a char* to a buffer
> which can be memcpy() into with the end point name. So all such
> endpoints have the same char* address.

*click* Oh, I get it - saw the void * and didn't realise you wanted to
use the char * address as a handle to the endpoint. Neat trick, but
rather too far in terms of overloading for my taste.

Also, it won't work terribly well for language bindings since it relies
on a string also being a pointer, which just happens to be the case for
C.

What I have in mind is more conventional: ZeroMQ takes a copy of the
string passed to bind() or connect(), stashes it in the socket.
unbind() or disconnect() are again passed a string, trawl thru all the
bound/connected endpoints for the socket, and succeed if a match is
found.
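
A toy model of this string-matching approach, for illustration only. It is plain C, not libzmq code; `str_socket`, `str_connect` and `str_disconnect` are hypothetical names. It assumes that connecting twice to the same endpoint string is rejected, which the thread suggests the implementation of the time may not actually enforce.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 16

/* Toy stand-in for a socket that owns a private copy of each endpoint string. */
struct str_socket {
    char *names[MAX_NAMES];
    size_t count;
};

/* Copy `name` into the socket; -1 if already present, table full, or out of memory. */
int str_connect(struct str_socket *s, const char *name)
{
    size_t i, len;
    char *copy;
    assert(s != NULL && name != NULL);
    for (i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return -1;
    if (s->count == MAX_NAMES)
        return -1;
    len = strlen(name) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);
    s->names[s->count++] = copy;
    return 0;
}

/* Remove the endpoint whose string matches; 0 on success, -1 if none matches. */
int str_disconnect(struct str_socket *s, const char *name)
{
    size_t i;
    assert(s != NULL && name != NULL);
    for (i = 0; i < s->count; i++) {
        if (strcmp(s->names[i], name) == 0) {
            free(s->names[i]);                    /* the copy is owned by the socket */
            s->names[i] = s->names[--s->count];   /* fill the hole with the last entry */
            return 0;
        }
    }
    return -1;
}
```

Because the socket keeps its own copy, the caller may reuse or free its buffer right after connecting, and a disconnect that matches nothing is reported as an error rather than as an empty set.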

> So since you're disconnecting a SET .. well there's nothing wrong
> with an empty set!
>
> Instead, the number of matched endpoints is returned by the function.
> The client can then decide if 0 is an error or not.

Sure, this makes sense if the char * is also considered as a handle. If
it's just a string then there is no set involved since it is an error
to bind or connect[1] to the same endpoint more than once.

-mato

[1] Actually, Martin Sustrik tells me you can probably connect (but not
bind) more than once to the same endpoint in the current
implementation, but that makes no real sense anyway so should return an
error if it does not already.
--
Martin Lucina <mar...@lucina.net>

john skaller

Jan 26, 2012, 3:13:16 AM

to ZeroMQ development list

On 26/01/2012, at 6:28 PM, Martin Lucina wrote:
>
> *click* Oh, I get it - saw the void * and didn't realise you wanted to
> use the char * address as a handle to the endpoint. Neat trick, but
> rather too far in terms of overloading for my taste.

Agreed, but the objective is to get something that is fully upwards
compatible and easy to implement.

The only hard bit here is how to actually do the disconnection.
My thought is that 0MQ would close the socket (async) and
throw away the associated infrastructure, i.e. just lose any
buffers etc. The reason is the existing use case: the connection
is dead anyhow.

>
> Also, it won't work terribly well for language bindings since it relies
> on a string also being a pointer, which just happens to be the case for
> C.

That should be no problem: the binding has to supply an actual
char * to call the C function, so it can wrap it up as an abstract
handle and return it.

>
> What I have in mind is more conventional: ZeroMQ takes a copy of the
> string passed to bind() or connect(), stashes it in the socket.
> unbind() or disconnect() are again passed a string, trawl thru all the
> bound/connected endpoints for the socket, and succeed if a match is
> found.

Slightly risky if addresses aren't unique, e.g. "tcp://...." vs "TCP://.."
or whatever. Also a bit harder to implement because the strings
have to be free()d. And a bit slower if you have millions of
connections.
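
To make the uniqueness concern concrete, here is a small sketch of a comparison that treats two endpoint strings as equal when they differ only in the case of the scheme. `endpoint_equal` is a hypothetical helper; whether 0MQ would accept an upper-case scheme at all is not established in this thread, and a fuller normalisation (host names, default ports) is deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if two endpoint strings match, ignoring case in the scheme only. */
int endpoint_equal(const char *a, const char *b)
{
    const char *sep;
    size_t scheme_len, i;
    assert(a != NULL && b != NULL);
    sep = strstr(a, "://");
    scheme_len = sep ? (size_t)(sep - a) : 0;
    /* Compare the scheme (if any) without regard to case. */
    for (i = 0; i < scheme_len; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    /* The remainder, including "://", must match exactly. */
    return strcmp(a + scheme_len, b + scheme_len) == 0;
}
```

A plain strcmp() would treat "tcp://..." and "TCP://..." as two different endpoints, which is the mismatch described above.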

--
john skaller
ska...@users.sourceforge.net
