On 12 February 2015 at 01:14:56, Emil Aguinaldo (
emil.ag...@gmail.com) wrote:
> We currently have 150K transient clients that will be connecting
> to the cluster. The initial design is creating a shared pool of
> connections. Each client will use/reuse a connection from the
> pool and then create two channels each for publishing and consuming.
> Each client gets an auto-delete queue. Clients can be UI pages
> or devices and most of them are transient. It seems expensive
> to create one connection per client (we're planning on hitting
> 1 million). We wanted to know what a good ratio is for the
> connection pool vs clients.
I'm still a bit confused as to what the clients are, but with numbers like these,
pooling connections (if you can) is a good idea. I cannot suggest a specific ratio as it is
application-specific; I'd try ratios between 10:1 and 25:1.
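For what it's worth, the borrowing side of such a pool can look roughly like this. This is a minimal sketch: `DummyConnection`, the class names, and the ratio arithmetic are made up for illustration; a real client library would supply the actual connection type.

```python
# Sketch of a bounded connection pool with a configurable
# client-to-connection ratio. DummyConnection stands in for a real
# RabbitMQ client connection; the pooling/ratio logic is the point.
import queue


class DummyConnection:
    """Placeholder for a real broker connection."""
    def __init__(self, conn_id):
        self.conn_id = conn_id


class ConnectionPool:
    def __init__(self, clients, ratio):
        # e.g. 1000 clients at a 25:1 ratio -> 40 pooled connections
        self.size = max(1, clients // ratio)
        self._pool = queue.Queue()
        for i in range(self.size):
            self._pool.put(DummyConnection(i))

    def acquire(self, timeout=None):
        # Blocks until a connection is free, which naturally bounds
        # the number of concurrent connections the broker sees.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)


pool = ConnectionPool(clients=1000, ratio=25)
conn = pool.acquire()
# ... open channels, publish/consume on conn ...
pool.release(conn)
```

The right ratio depends on how long each client holds a connection, which is why I can only suggest a starting range to measure from.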
> > > 2. Our clients produce and consume messages. Is it best practice
> > > to separate the send and receive channel?
> >
> > Publishers may be blocked if consumers do not keep up. If you
> > run into that, yes, it is a good idea to use separate connections.
> > Otherwise it's fine to both publish and consume on a shared one.
> >
> > > 3. Can we even share a single channel for sending messages?
> > > A pool of send channels?
> >
> > Channels must not be shared between threads (or similar concurrency
> > primitives). Other cases of sharing are typically fine.
>
> Are channels expensive? I can see from the sample tests and the
> test harness that the tests create one connection/channel for
> a consumer and another one for the producers. Given that we are
> going for 150K to a million subscribers, is it feasible to create
> a channel for every client or can we reuse the channels for different
> consumers?
Channels are relatively inexpensive compared to connections. That's one reason why they exist in the protocol ;)
If you can pull connection pooling off, then pooling channels may or may not be necessary.
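One common way to reuse channels without ever sharing one between threads is to keep a channel per thread over a shared connection. A sketch, with `DummyConnection` and `DummyChannel` standing in for a real client library's types:

```python
# Sketch: one lazily-opened channel per thread over a shared
# connection, since channels must never be shared between threads.
import threading


class DummyChannel:
    _counter = 0
    _lock = threading.Lock()

    def __init__(self):
        with DummyChannel._lock:
            DummyChannel._counter += 1
            self.channel_id = DummyChannel._counter


class DummyConnection:
    def create_channel(self):
        return DummyChannel()


_local = threading.local()
shared_connection = DummyConnection()


def get_channel():
    # Each thread opens its own channel on first use and then
    # reuses it; no channel ever crosses a thread boundary.
    if not hasattr(_local, "channel"):
        _local.channel = shared_connection.create_channel()
    return _local.channel
```

This keeps the channel count proportional to the number of worker threads rather than the number of clients.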
> > > 4. Our clients are temporary and we want to create a temporary
> > > queue for each one. Is it feasible to support 500 queues?
> >
> > 500 or 500K? Queues take up some RAM so 500K would require spreading
> > them between nodes (e.g. using HAProxy with the leastconn balancing mode).
>
> Sorry for the confusion, so my question was actually about creating
> a queue for a client and tearing it down when client disconnects
> (autodelete). Given the number of clients (150K - 500K -
> 1M), we are talking about a lot of queues. Is this just a matter
> of adding more nodes for memory and spreading them out? Wouldn't
> it be too much traffic and binding churn for the cluster to handle?
Yes, pretty much. As for binding churn, we need to profile it but I don't
expect it to be a pain point compared to some other things discussed in this thread.
> > 500 queues is nothing, of course.
> >
> > > 5. Rabbit seems to take a long time to clean up resources. Is
> > > this normal?
> >
> > What resources? How long is "long time"?
> >
> We actually have a cluster of websocket servers that sits in the
> middle of our clients and our rabbitmq cluster. We create a pool
> of rabbitmq connections on the websocket servers so that when
> clients connect to them they would have a rabbit connection available
> to them. We find that closing a rabbitmq connection takes a while
> compared to the socket connection. When we run tests and then
> the test clients suddenly all die for whatever reason, the websocket
> servers need to wait until all the rabbitmq connections have had
> enough time to clean up. This takes a while, I find: a few minutes
> to clear out around 200K channels, 100K queues and consumers.
We have an open issue for speeding up channel teardown. I doubt that closing 200K connections
in a loop can take less than 1 minute, though.
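On the client side, one thing that can shorten overall teardown is closing pooled connections concurrently rather than in a single serial loop. A sketch (here, `DummyConnection` is a stand-in for a real client connection object, and the worker count is an arbitrary illustration):

```python
# Sketch: close many pooled connections concurrently instead of
# serially, spreading the per-close latency across worker threads.
from concurrent.futures import ThreadPoolExecutor


class DummyConnection:
    """Placeholder for a real broker connection."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


connections = [DummyConnection() for _ in range(1000)]

# The with-block waits for all close() calls to finish.
with ThreadPoolExecutor(max_workers=32) as executor:
    list(executor.map(lambda c: c.close(), connections))
```

This only parallelizes the client side, of course; the broker still has to tear down its own per-connection state.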
> > Closing 500K connections in a loop will put some stress on a node
> > but it shouldn't take particularly long.
> >
> > > 6. One of our tests shows that the cluster performs well when
> > > the queues and the producer are on the same node. We assume that
> > > the cluster should be able to handle and not crash when you have
> > > that many clients producing and consuming on different nodes.
> >
> > Your assumption is reasonable. If you use queue mirroring and see
> > high memory use on mirror nodes, you may be running into a known issue.
> >
> > Otherwise please provide log files and what RabbitMQ and Erlang
> versions are used.
>
> RabbitMQ 3.4.4, Erlang R16B03
> I will do another run and I will provide the rabbit logs. Are there
> any other logs that might be needed? Do you want me to paste or attach
> them here?
If they are large (and with this number of connections, they probably will be moderately sized),
I'd compress them and upload them to the group, or put them anywhere else we can download
them from.
Thank you!
> > > 7. We are currently using a single topic exchange where transient
> > > clients connect and bind queues to. Will it be able to handle the
> > > load or is it causing the issues we are having?
> >
> > Exchanges are just routing tables; actual routing is performed by
> > channel processes. This load will be spread across nodes roughly
> > evenly if you spread connections roughly evenly.
> Our DevOps team is reporting high traffic between nodes, much
> higher than expected; it seems to be saturating the backend
> network given how little traffic is coming in.
There are a few things that cause intra-cluster traffic:
* Queue operations (including publishes): you can publish to any node, but every queue has a master that messages go through to guarantee ordering.
* Mirroring of messages (needs no explanation)
* Binding operations (they have to be distributed cluster-wide)
* Emission of stats that management UI presents
* Nodes send each other messages to know which other nodes are up (the interval is configurable but typically at least a few seconds)
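For the last point, the interval between those node liveness checks is Erlang's `net_ticktime` (60 seconds by default). If you ever need to tune it, it goes in the `kernel` section of rabbitmq.config, e.g.:

```erlang
%% rabbitmq.config: node liveness check interval, in seconds.
[
  {kernel, [{net_ticktime, 60}]}
].
```

The same value must be used on every node in the cluster.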