Hello,
I'll tell you what conclusions I've come to. When I started thinking about this, the target was the cloud, so my suggestions are meant to work in an environment without UDP broadcast. The aim was to have no single point of failure in a RabbitMQ setup.
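First, of course, use HA (mirrored) queues on a RabbitMQ cluster. You can have either one or two disk nodes. With a single disk node, losing it means the cluster can no longer change its metadata: declaring or deleting queues, exchanges and bindings (subscriptions), and management operations such as adding users, all stop working. With two disk nodes, those keep working when one of them goes down.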
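As an illustration of the mirroring part, here is a minimal Python sketch that builds a RabbitMQ management-API request (`PUT /api/policies/{vhost}/{name}`) setting `ha-mode: all` on matching queues. It assumes RabbitMQ 3.x, where mirroring is configured by policy (2.x releases used an `x-ha-policy` queue argument instead, and classic mirroring was later removed in 4.0 in favour of quorum queues). The host name, credentials, policy name `ha-all` and the match-everything pattern `^` are placeholder assumptions.

```python
import base64
import json
import urllib.parse
import urllib.request


def ha_policy_request(host: str, vhost: str = "/", name: str = "ha-all",
                      pattern: str = "^", user: str = "guest",
                      password: str = "guest", port: int = 15672) -> urllib.request.Request:
    """Build (but don't send) a management-API request that mirrors matching queues to all nodes."""
    # vhost and policy name are path segments, so "/" must be encoded as %2F
    url = "http://{}:{}/api/policies/{}/{}".format(
        host, port, urllib.parse.quote(vhost, safe=""), urllib.parse.quote(name, safe=""))
    body = {
        "pattern": pattern,                 # regex over queue names; "^" matches every queue
        "definition": {"ha-mode": "all"},   # mirror to every node in the cluster
        "apply-to": "queues",
    }
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
    )


# Usage (host name is a placeholder): urllib.request.urlopen(ha_policy_request("rabbit1"))
```

Applying the policy once is enough for the whole cluster, since policies are stored in the cluster metadata rather than per node.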
Then, you need a way of telling the clients that the broker they were connected to just went down. When that happens, you need a way of resending the messages that weren't successfully sent; that's where the async work on MT comes in. It also covers the occasional network error, so you don't lose messages to those either. When resending, you will be re-establishing a connection to the broker(s).
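MT's own code isn't shown here, so as a language-neutral illustration, this is a Python sketch of the same idea built on RabbitMQ publisher confirms: keep every message until the broker acknowledges its delivery tag (possibly with `multiple`), and on reconnect resend whatever is still unconfirmed. `ResendingPublisher` and the injected `send` callable are hypothetical stand-ins for a real client library's confirm-mode channel.

```python
from typing import Callable, Dict

Send = Callable[[int, bytes], None]


class ResendingPublisher:
    """Hypothetical helper: holds each message until the broker confirms its delivery tag."""

    def __init__(self, send: Send) -> None:
        self._send = send                      # stand-in for publishing on a confirm-mode channel
        self._next_tag = 1                     # confirm delivery tags start at 1 on each channel
        self._pending: Dict[int, bytes] = {}   # dicts keep insertion order (Python 3.7+)

    @property
    def unconfirmed(self) -> int:
        return len(self._pending)

    def publish(self, body: bytes) -> None:
        tag = self._next_tag
        self._next_tag += 1
        self._pending[tag] = body              # record first, so a failing send can't lose it
        self._send(tag, body)

    def on_ack(self, tag: int, multiple: bool = False) -> None:
        if multiple:                           # broker confirmed every tag up to and including `tag`
            for t in [t for t in self._pending if t <= tag]:
                del self._pending[t]
        else:
            self._pending.pop(tag, None)

    def on_reconnect(self, send: Send) -> None:
        """Switch to a fresh channel and resend everything still unconfirmed, in order."""
        bodies = list(self._pending.values())
        self._send = send
        self._next_tag = 1
        self._pending = {}
        for body in bodies:                    # re-key everything first so a failed resend drops nothing
            self._pending[self._next_tag] = body
            self._next_tag += 1
        for tag, body in list(self._pending.items()):
            self._send(tag, body)
```

Resending this way gives at-least-once delivery: a message whose ack was lost with the connection is sent again, so consumers should tolerate duplicates.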
The way I think this is best done is with HAProxy plus keepalived. HAProxy keeps tabs on the underlying Rabbits and redirects connections to those that are up, while itself being pretty non-intrusive. Stack Exchange seems to be running happily with it, the book RabbitMQ in Action recommends it as well, and it works in my virtual environment.
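To make "keeps tabs on the underlying Rabbits" concrete, here is a rough Python model (not HAProxy's code) of health-checked round-robin: a backend is marked down after `fall` consecutive failed checks and up again after `rise` successes (HAProxy's defaults are rise 2, fall 3), and new connections go to the next backend that is up. `BackendPool`, `tcp_check` and the example addresses are illustrative assumptions.

```python
import socket
from typing import Dict, List, Optional, Tuple

Backend = Tuple[str, int]


def tcp_check(backend: Backend, timeout: float = 1.0) -> bool:
    """Plain TCP connect check: succeeds if the port accepts a connection."""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False


class BackendPool:
    """Round-robin over backends, with rise/fall hysteresis on health-check results."""

    def __init__(self, backends: List[Backend], rise: int = 2, fall: int = 3) -> None:
        self.backends = list(backends)
        self.rise, self.fall = rise, fall
        self.up: Dict[Backend, bool] = {b: True for b in self.backends}  # this sketch starts all as up
        self._streak: Dict[Backend, int] = {b: 0 for b in self.backends}  # results contradicting state
        self._cursor = 0

    def record_check(self, backend: Backend, ok: bool) -> None:
        if ok == self.up[backend]:
            self._streak[backend] = 0          # result agrees with the current state
            return
        self._streak[backend] += 1
        if self._streak[backend] >= (self.rise if ok else self.fall):
            self.up[backend] = ok              # flip only after enough consecutive results
            self._streak[backend] = 0

    def pick(self) -> Optional[Backend]:
        """Next backend that is up, or None if all are down."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._cursor]
            self._cursor = (self._cursor + 1) % len(self.backends)
            if self.up[backend]:
                return backend
        return None


# Usage: every couple of seconds, for b in pool.backends: pool.record_check(b, tcp_check(b))
```

In HAProxy itself this corresponds roughly to a `mode tcp` section with `balance roundrobin` and `check` on each `server` line; the hysteresis is what keeps a single dropped check from bouncing a node in and out.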
But you've now introduced HAProxy as a single point of failure, so you need to get rid of that as well. DNS-based failover is held back by record TTLs (and I personally hate named, BIND's DNS server, because of its terrible configuration and debugging facilities), so for a general service registry you're probably best off with a lookup service in, e.g., ZooKeeper. The broker, however, is a "known point", so I would rather connect to HAProxy by IP.
That's where https://github.com/haf/puppet-keepalived comes in: you can configure keepalived together with HAProxy so that, when a failure is detected, the backup node takes over the IP of the previous master. Of course, this is a 'faulty failure detector' (it can suspect a node that is merely slow), but in practice it should be pretty stable. One could reason about network splits and the like, but more likely a machine will simply be rebooted now and then, and that is when you need the failover. The repository comes with some binaries that I've taken from HAProxy's patch enabling unicast in keepalived, because UDP multicast doesn't work in some clouds.
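keepalived implements VRRP, and the takeover rule is simple enough to sketch. The Python model below covers only a backup's timer: per the VRRP specification, a backup declares the master dead after Master_Down_Interval = 3 × advertisement interval + skew, where skew = (256 − priority)/256 of an interval, so the highest-priority backup fires first. `VrrpBackup` and its `claim_vip` hook are hypothetical; in keepalived, taking over means adding the virtual IP to the interface and announcing it with gratuitous ARP.

```python
from typing import Callable


def master_down_interval(priority: int, advert_int: float = 1.0) -> float:
    """VRRP: three missed advertisements plus a skew that is shorter for higher priorities."""
    skew = (256 - priority) * advert_int / 256
    return 3 * advert_int + skew


class VrrpBackup:
    """Hypothetical model of one keepalived node's takeover decision."""

    def __init__(self, priority: int, advert_int: float = 1.0,
                 claim_vip: Callable[[], None] = lambda: None) -> None:
        self.priority = priority
        self.advert_int = advert_int
        self.claim_vip = claim_vip     # hypothetical hook: add the VIP, send gratuitous ARP
        self.state = "BACKUP"
        self.last_heard = 0.0          # time of the last advertisement from the master

    def on_advertisement(self, now: float, sender_priority: int) -> None:
        self.last_heard = now
        if self.state == "MASTER" and sender_priority > self.priority:
            self.state = "BACKUP"      # a higher-priority node is advertising: yield the VIP

    def tick(self, now: float) -> None:
        """Called periodically; take over once the master has been silent long enough."""
        if (self.state == "BACKUP"
                and now - self.last_heard >= master_down_interval(self.priority, self.advert_int)):
            self.state = "MASTER"
            self.claim_vip()
```

This is also where the "faulty failure detector" shows: if advertisements are merely delayed past the interval, the backup claims the VIP while the old master still believes it holds it, which is the network-split case mentioned above.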
The actual configuration of RMQ and HAProxy is sampled here:
github.com/jussiheinonen/rabbitmq-on-vagrant, but its Puppet code is pretty shoddy for a production environment (e.g. it doesn't give its Exec resources 'unless => ...' arguments, so they re-run on every Puppet run). I've found a pretty decent RabbitMQ module here:
https://github.com/haf/puppet-rabbitmq
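The 'unless' complaint matters because Puppet runs an Exec's command only when its `unless` check exits non-zero, which is what makes a one-off command safe to apply repeatedly. A small Python sketch of that guard follows; `exec_unless` is a hypothetical name, and the commented usage (a `rabbitmqctl set_policy` guarded by `rabbitmqctl list_policies | grep`) is an assumed example, not taken from either module.

```python
import subprocess
from typing import Sequence


def exec_unless(command: Sequence[str], unless: Sequence[str]) -> bool:
    """Mimic Puppet's Exec 'unless': skip `command` if the check exits 0. Returns True if it ran."""
    if subprocess.run(list(unless)).returncode == 0:
        return False                            # desired state already present; do nothing
    subprocess.run(list(command), check=True)   # raises CalledProcessError if the command fails
    return True


# Hypothetical usage:
# exec_unless(["rabbitmqctl", "set_policy", "ha-all", "^", '{"ha-mode":"all"}'],
#             unless=["sh", "-c", "rabbitmqctl list_policies | grep -q ha-all"])
```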
Using these links, you could probably get a Vagrant setup for testing the above up and running in half a day. Tell me how it pans out!