multiple connections


Frédéric Logier

Apr 9, 2013, 8:51:15 AM
to zer...@googlegroups.com
Hi Paul,
Because of REP/REQ I can't have a pool of workers connected to zerogw. So when an HTTP POST takes some time, it's not possible for another client to POST. Is there a way, with one instance of zerogw, to handle more requests at the same time, or should I launch more instances?

thx

Paul Colomiets

Apr 9, 2013, 3:11:13 PM
to zer...@googlegroups.com
Hi Frédéric,
I don't understand your question, actually. Zerogw has supported multiple
simultaneous requests since the first version with a working HTTP
implementation (say v0.1 :)). But I'll try to guess what you mean:

1. Client-side requests are received simultaneously, up to the limits of
max-connections, max-requests, listen-backlog, and the file descriptor
limit. I don't think you have reached those limits (if you have, just
raise them).

2. Client-side requests are not streamed: the whole POST body is
downloaded into zerogw's memory first. So if you are uploading very large
files and expect to see the request at the backend immediately, that is
not what happens. The whole POST body has to be uploaded before the
request reaches the backend, be it the first request or any other one.

3. If you use a REP socket at the backend, you can process only one
request at a time. That's a limitation of zeromq REP sockets. You have
three options: use an XREP (ROUTER) socket, connect multiple zeromq
sockets to zerogw, or start multiple backend processes (see the sketch
after point 6).

4. The load balancing for zeromq REQ/REP sockets is not guaranteed to
be fair. Consider the following situation. You start a single backend
worker; it gets two requests queued, but is still processing the first
one. Then you spawn another worker: the latter will process new requests,
but the ones already queued in the first worker will stay queued there.
After the second worker is started, both will get the same number of new
requests (without accounting for the first one being more loaded). This
can be smoothed a bit by setting high water marks on the zeromq sockets
(the sketch after point 6 shows one way to set them).

5. If you are confused because zerogw declares a `!zmq.Req` socket in its
config, don't be fooled :) It's actually an XREQ (DEALER) socket; we just
consider that an implementation detail.

6. You may have a `!zmq.Connect` socket in zerogw. Just either change it
to `!zmq.Bind` or put multiple `!zmq.Connect` lines in the configuration
file to connect a pool of workers. A sketch follows below.
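
To make points 3, 4 and 6 concrete, here is a minimal worker sketch in
the spirit of examples/echo.py. The endpoint and the echoed reply are
placeholders, not zerogw's real message layout (see examples/echo.py for
that); run several copies of the script to get a pool.

    # rep_worker.py -- a sketch of one backend worker; run several copies
    # to process requests in parallel (point 3). The endpoint is made up
    # and the echoed reply is a placeholder; see examples/echo.py for the
    # real zerogw message layout.
    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)

    # Point 4: a low high water mark keeps a busy worker from hoarding
    # requests that an idle worker could take. The option name depends
    # on the zeromq version.
    try:
        sock.setsockopt(zmq.RCVHWM, 1)    # zeromq 3.x and later
        sock.setsockopt(zmq.SNDHWM, 1)
    except AttributeError:
        sock.setsockopt(zmq.HWM, 1)       # zeromq 2.x

    # Point 6: each worker connects to the single port that zerogw binds
    # (assuming `!zmq.Bind` in zerogw's config).
    sock.connect("tcp://127.0.0.1:7001")

    while True:
        frames = sock.recv_multipart()    # a REP socket takes one request at a time
        sock.send_multipart(frames)       # placeholder echo reply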

All in all, you may run `zerogw -c examples/zerogw.yaml` and a few
instances of `python2.7 examples/echo.py` and see what happens. Put in a
sleep call and more print statements to see what happens simultaneously
and what does not.

Unless you are trying to saturate a 10Gbit link, you shouldn't have to
worry about multiple zerogw instances.

--
Paul

Frédéric Logier

Apr 9, 2013, 7:02:54 PM
to zer...@googlegroups.com
Hi Paul,
sorry about my poor explanation (and my English too :), so I'll try to be clearer.
You are right, I had assumed that you used REQ/REP and not DEALER. In fact I have never used it, only REQ/REP, PUB/SUB and PUSH/PULL.

I have set up multiple `!zmq.Connect` entries and, indeed, that's what I need. So should I connect a second worker to this second `!zmq.Connect`, or should I use a broker like this one: http://zguide.zeromq.org/cpp:rrbroker ? I suppose that using a broker is better, because workers and clients do not need to know each other, so I can launch a pool of workers/zerogw instances.
Anyway, I'm using Qt/C++ and it's a bit more complicated than these Python examples: because of the Qt event loop I can't use a blocking (zmq::socket_t) recv. (If you are curious, my code is here: https://github.com/nodecast/ncs/blob/master/zerogw.cpp#L206 )
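
Roughly the idea, translated to the Python of the examples (the endpoint
and the tick function are just placeholders):

    # Poll with a zero timeout from a periodic event-loop callback
    # instead of blocking in recv().
    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.connect("tcp://127.0.0.1:7001")

    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)

    def on_timer_tick():                   # imagine the event loop calls this
        if sock in dict(poller.poll(timeout=0)):
            frames = sock.recv_multipart()
            sock.send_multipart(frames)    # placeholder reply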

Thanks again for your help, Paul.



--
http://fredix.wordpress.com

Paul Colomiets

Apr 10, 2013, 4:07:14 PM
to zer...@googlegroups.com
Hi Frédéric,

It's up to you. I have a few hints that you may find valuable.

1. You shouldn't use a device unless you have more than one zerogw.
A device adds latency and an additional data copy.

2. A device's performance is the same order of magnitude as zerogw's. So
if you use a device and you have multiple zerogw instances for
performance, you need about the same number of devices to handle the
load.

3. If you have several zerogw instances for redundancy, you need a
similar number of devices for redundancy too.

4. A device may be useful for load-balancing reasons if you have high
water marks set on the sockets (and if you don't know what HWM is and how
to set it up right, forget about this option), or if your device handles
load balancing better than zeromq does (which the sample you linked above
doesn't).

5. You don't need a device to connect any number of workers to zerogw
(this is a zeromq feature, so any zeromq service is expected to have it;
see below).


So, how to connect multiple workers? You are thinking the way nginx
works (and, in fact, most other pre-zeromq software proxies). The scheme
is the following:

http://www.slideshare.net/PaulColomiets/zeromq-and-web/19

This works by allocating a port for each worker and having zerogw connect
to each worker separately. But a much nicer way to connect using zeromq
is the following:

http://www.slideshare.net/PaulColomiets/zeromq-and-web/20

You do `!zmq.Bind` for a single port in zerogw. Then you can spawn any
number of workers at any time, and they `zmq_connect` to that single
port (7001 in the picture). Remember, zeromq is symmetrical with respect
to `bind` and `connect`. You just need to write `!zmq.Bind` instead of
`!zmq.Connect` in zerogw's configuration. A sketch follows below.

Regards,
--
Paul

Frédéric Logier

Apr 15, 2013, 5:08:28 PM
to zer...@googlegroups.com
Hi Paul,
finally I'm testing zerogw -> zmq_router -> zmq_dealer -> workers. It seems OK, and I need at least 2 zerogw instances or more.



Paul Colomiets

Apr 15, 2013, 5:35:42 PM
to zer...@googlegroups.com
Hi Frédéric,

On Tue, Apr 16, 2013 at 12:08 AM, Frédéric Logier <fre...@gmail.com> wrote:
> Hi Paul,
> finally I'm testing zerogw -> zmq_router -> zmq_dealer -> workers.

Not sure what you mean. Do you use two standalone devices?

> It seems ok,

Nice.

> and I need at least 2 zerogw instances or more.

Just curious, what's the goal? Redundancy?


Anyway, thanks for the success report!

--
Paul

Frédéric Logier

Apr 17, 2013, 5:03:01 AM
to zer...@googlegroups.com
Hi Paul,
I don't understand what you mean by "devices"; I'm just using zmq ROUTER and DEALER sockets, not a zmq::proxy or zmq::forwarder. My project acts as a daemon, and zerogw is connected to it. A thread receives your payloads and forwards them to a pool of workers.

My goal is to have the lowest latency when there are many HTTP POSTs with files in the body. I have tried XREP on my workers instead of REQ, but because my workers do not use a blocking zmq recv, and because the payloads arrive unordered, I can't tell the payloads apart. So it's simplest for me to stay with a REQ socket and a pool of zerogw instances in front.
 




Frédéric Logier

Apr 17, 2013, 5:50:40 AM
to zer...@googlegroups.com
"I have tried XREP on my workers instead of REQ" I means instead of REP.



Jiří Sedláček

May 10, 2013, 8:47:08 AM
to zer...@googlegroups.com
Hi,

I just stumbled upon the "fairness" of work distribution. I have a problem that might be solved by putting a dealer <--> router proxy between zerogw and the workers, I guess. The problem is this:

1. zerogw accepts a request and forwards it to a worker
2. the worker calls another process via zeromq
3. the called process does DELETE /some/resource, which goes back to zerogw
4. even though there are 8 workers connected to zerogw, it just happens that it is time to queue this request to the very worker that is waiting for its execution :-)
 
What a beautiful deadlock.

If there were a dealer in between, the request would have gone to the first available worker. It's sad that I will have to put a middle man in there.
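
The middle man would be something like this (addresses made up; strictly
speaking a DEALER round-robins among connected workers rather than truly
picking the first available one):

    # A zeromq queue device: zerogw connects its request socket to the
    # frontend, workers connect their REP sockets to the backend.
    import zmq

    ctx = zmq.Context()
    frontend = ctx.socket(zmq.ROUTER)
    frontend.bind("tcp://127.0.0.1:7001")     # zerogw's `!zmq.Connect` points here
    backend = ctx.socket(zmq.DEALER)
    backend.bind("tcp://127.0.0.1:7002")      # workers connect here
    zmq.device(zmq.QUEUE, frontend, backend)  # blocks, shuttling both ways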

Jiri

Paul Colomiets

May 10, 2013, 1:28:12 PM
to zer...@googlegroups.com
Hi Jiří,

On Fri, May 10, 2013 at 3:47 PM, Jiří Sedláček
<yirie.se...@gmail.com> wrote:
> Hi,
>
> I just stumbled upon the "fairness" of work distribution. I have a problem
> that might be solved by putting dealer <--> router proxy between zerogw and
> workers. I guess. The problem is this:
>
> 1. zerogw accepts requests and forwards it to worker
> 2. worker calls another process via zeromq
> 3. the called process does DELETE /some/resource which goes back to zerogw
> 4. even though there are 8 workers connected to zerogw it just happens that
> it is time to queue this request to the worker that is waiting for its
> execution :-)
>
> What a beautiful deadlock.
>

Yes. That's expected. It's not that the request will always be forwarded
to the same worker, but the probability is 0.125 (one in eight), which is
pretty high if you get thousands of requests per second.

> If there were a dealer in between, the request would have gone to the
> first available worker. It's sad that I will have to put a middle man in there.
>

That's how zeromq works and we can't do anything about it on the zerogw
side. However, it's not clear why you need to make another request to
the same set of workers anyway. It should be easy to handle the DELETE
request internally, and that would save you a network round-trip on
each request.

--
Paul

Jiří Sedláček

May 13, 2013, 3:26:02 AM
to zer...@googlegroups.com
You are right, I am doing a request to the same set of workers. The reason is that in production our system is spread across different machines and we use HTTP for communication. In development the system runs on a single PC. That's why there is a local HTTP call.

There is a way for ZeroMQ to distribute the workload based on worker availability (like the Lazy Pirate pattern), but I guess it is not justified to add so much code to zerogw just to handle this special case. The small code base is of course one of the reasons it performs so well.

I added a branch to my code to handle local calls differently. So it works now.

Jiri

Paul Colomiets

May 13, 2013, 2:14:31 PM
to zer...@googlegroups.com
Hi Jiří,

On Mon, May 13, 2013 at 10:26 AM, Jiří Sedláček
<yirie.se...@gmail.com> wrote:
> You are right I am doing a request to the same set of workers. The reason is
> that in production our system is spread across different machines and we use
> HTTP for communication. In development the system runs on a single PC.
> That's why there is a local HTTP call.
>

Sure, we have the same problem here. We have two ways of solving it:

1. If the resource is local, just call the method directly (or do an
internal redirect). If the resource is remote, execute it with RPC
(this seems to be what you have implemented).

2. If method X needs method Y, split the process that serves requests
for X and Y into two processes A and B, where A serves only method X and
B serves only method Y. Then you have different pools of processes, so
they never deadlock (see the sketch below).

BTW, in our setups both variants work over zeromq-based RPC rather than
HTTP, since for internal requests HTTP adds unnecessary overhead.

Note also that the second option is superior in the amount of monitoring
information you can get. I.e. you can monitor the CPU usage and traffic
created by external users and by internal RPC separately.
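
A minimal sketch of the second option, with made-up endpoints and
payloads (a single B worker binds its own port for simplicity; in a real
setup pool B would sit behind its own zerogw or bound socket):

    # Option 2 sketched: X and Y live in different pools, so the nested
    # call always crosses pools and cannot deadlock on its own pool.
    import multiprocessing
    import zmq

    def worker_b():
        # Pool B: serves only method Y; binds directly for simplicity.
        ctx = zmq.Context()
        sock = ctx.socket(zmq.REP)
        sock.bind("tcp://127.0.0.1:7002")
        while True:
            req = sock.recv()
            sock.send(b"Y(" + req + b")")

    def worker_a():
        # Pool A: serves method X, which needs Y from pool B.
        ctx = zmq.Context()
        front = ctx.socket(zmq.REP)
        front.connect("tcp://127.0.0.1:7001")  # zerogw's bound port (made up)
        back = ctx.socket(zmq.REQ)
        back.connect("tcp://127.0.0.1:7002")   # the nested call leaves the pool
        while True:
            req = front.recv()
            back.send(req)                     # call Y in pool B...
            front.send(back.recv())            # ...and relay its answer

    if __name__ == "__main__":
        multiprocessing.Process(target=worker_b).start()
        multiprocessing.Process(target=worker_a).start()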

> There is a way for ZeroMQ to distribute the workload based on worker
> availability (like Lazy Pirate pattern) but I guess it is not justified to
> add so much code to zerogw just to handle this special case. The small code
> base is of course one of the reasons it performs so well
>

This actually doesn't solve the problem. When you have 8 workers, and
all 8 workers get a request that needs to call another worker, you're
deadlocked. It's the kind of deadlock that seems very unlikely,
but happens every day :)

> I added a branch to my code to handle local calls differently. So it works
> now.
>

Great!

--
Paul

Jiří Sedláček

May 17, 2013, 1:46:01 AM
to zer...@googlegroups.com
Hi Paul,

that is a very deep insight. I had to think it through; that's why it took me so long to reply. There is an inherent flaw in our design causing these potential deadlocks from nested calls. Somehow we have managed to survive by adding more HTTP workers or by removing the "nested-ness" when a bug was reported.

We have about 9 components in the system. I have to start rigorously checking whether there is a nested call anywhere. If there is, then that component must be split in two, as you correctly suggest. I think there isn't one, but I am not sure, and that's no good. Every piece of new code has the potential to introduce this deadlock. I'll try to find a way to test for it automatically.

Thank you Paul

Paul Colomiets

May 17, 2013, 2:43:26 AM
to zer...@googlegroups.com
Hi Jiří,

On Fri, May 17, 2013 at 8:46 AM, Jiří Sedláček
<yirie.se...@gmail.com> wrote:
> We have about 9 components in the system. I have to start rigorously
> checking whether there is a nested call anywhere. If there is, then that
> component must be split in two, as you correctly suggest. I think there
> isn't one, but I am not sure, and that's no good. Every piece of new code
> has the potential to introduce this deadlock. I'll try to find a way to
> test for it automatically.
>

There is another possible solution: asynchronous calls
(notifications). If you don't wait for a reply, you don't have
deadlocks. In some cases it's easier to turn requests into
asynchronous ones (e.g. send them using a PUSH socket) than to split a
component into two. If you need to check the whole system, it may save
you some time.
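
A minimal sketch of the fire-and-forget variant, with a made-up endpoint
and payload:

    # PUSH/PULL notification: the sender never calls recv(), so there is
    # no reply to wait for and nothing to deadlock on.
    import zmq

    ctx = zmq.Context()

    pull = ctx.socket(zmq.PULL)            # the notified worker's end
    pull.bind("tcp://127.0.0.1:7003")

    push = ctx.socket(zmq.PUSH)            # inside the request handler
    push.connect("tcp://127.0.0.1:7003")
    push.send(b"DELETE /some/resource")    # no recv() follows

    print(pull.recv())                     # the worker picks it up when ready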

> Thank you Paul
>

You're welcome!

--
Paul

Jiří Sedláček

May 17, 2013, 3:32:50 AM
to zer...@googlegroups.com
I happily report that the only deadlock possibility in the system was the one I removed earlier.

To your point about async calls:

I agree it's the best way to go. We use it in parts of the system. However, I don't yet see a way to do this:

Function X in process A needs something from function Y in process B. If this is to be done asynchronously, then process A would have to have a call register where function X would be put on hold until the response from B is received. When it is received, the computation would resume.

I mean, I see a way to do this, but it would require completely changing the API of X so that it would be easy to tell upfront what needs to be fetched.
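
Something like this, I imagine; everything here is hypothetical (the
endpoint, the framing, and a peer in process B that echoes the id frame
back in front of its reply):

    # The "call register" sketched: tag each request with an id, park a
    # continuation under that id, resume it when the matching reply comes.
    import uuid
    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.DEALER)
    sock.connect("tcp://127.0.0.1:7002")   # process B's endpoint (made up)

    pending = {}                           # request id -> continuation

    def call_async(payload, on_reply):
        rid = uuid.uuid4().bytes
        pending[rid] = on_reply            # function X parks here
        sock.send_multipart([rid, payload])

    def handle_replies():                  # driven from the event loop
        while sock.poll(timeout=0):        # non-blocking check for replies
            rid, reply = sock.recv_multipart()
            pending.pop(rid)(reply)        # resume the parked computation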

Paul Colomiets

May 17, 2013, 7:24:30 PM
to zer...@googlegroups.com
Hi Jiří,

On Fri, May 17, 2013 at 10:32 AM, Jiří Sedláček
<yirie.se...@gmail.com> wrote:
> Function X in process A needs something from function Y in process B. If
> this is to be done asynchronously, then process A would have to have a call
> register where function X would be put on hold until the response from B is
> received. When it is received, the computation would resume.
>

Sure. It doesn't work well for all tasks. I just mention it because
there are people coming from the traditional RPC world who don't even
realize that it's possible to live without a reply :)

--
Paul