Hi everyone!
We have a headers exchange that handles only about 0.2-10 messages/sec, even though the system load is low (only about 25-30%).
Our environment:
RabbitMQ 3.5.7
We have a RabbitMQ cluster (2 nodes) that serves as the single publishing point for our app servers (let's call it "Cluster"). This Cluster has one fanout exchange (called "EP").
We have several "Web" nodes that serve our clients via the web-stomp plugin. Every Web node also has a fanout "EP" exchange, which receives messages from the Cluster via the Federation plugin.
Next to the EP exchange, each Web node has a headers exchange (called "Headers"), which has one binding from EP.
Then we have 21 direct exchanges bound from Headers. Every direct exchange has one binding with 2 rules (for example: "some.our.site.name"=true (bool), "scope"="user" (str), plus "x-match"="all"). That is a very small number of objects in the routing table.
Finally, every direct exchange has bindings to many fanout exchanges. During the day there can be about 50-60k fanout exchanges at the same time.
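To make the routing path on a Web node concrete, here is a toy in-memory model of it (EP fanout, then Headers with an x-match=all binding, then a direct exchange, then per-client fanouts). It is only an illustration, not RabbitMQ's implementation or a client API: header values are compared with Python `==`, which only approximates how RabbitMQ compares typed AMQP header values, and the names "D-user", "user-N", "client-N" and "q-client-N" are hypothetical, since the post does not give routing keys or queue names.

```python
_MISSING = object()  # sentinel: header absent from the message


class Exchange:
    """Toy in-memory model of an exchange; not RabbitMQ's implementation or a client API."""

    def __init__(self, name, kind):
        self.name, self.kind = name, kind
        self.bindings = []  # list of (destination, routing_key, arguments)

    def bind(self, dest, routing_key="", arguments=None):
        # dest is another Exchange (exchange-to-exchange binding) or a queue name (str).
        self.bindings.append((dest, routing_key, arguments or {}))

    def matches(self, routing_key, headers, binding_key, args):
        if self.kind == "fanout":
            return True  # fanout ignores routing key and headers
        if self.kind == "direct":
            return routing_key == binding_key
        if self.kind == "headers":
            # Arguments starting with "x-" (such as x-match) are not matched against headers.
            rules = {k: v for k, v in args.items() if not k.startswith("x-")}
            hits = [headers.get(k, _MISSING) == v for k, v in rules.items()]
            return all(hits) if args.get("x-match", "all") == "all" else any(hits)
        raise ValueError(f"unsupported exchange type: {self.kind}")


def route(exchange, routing_key, headers, visited=None):
    """Return the set of queue names a message reaches, following exchange-to-exchange bindings."""
    visited = set() if visited is None else visited
    if exchange.name in visited:
        return set()  # each exchange is traversed at most once
    visited.add(exchange.name)
    queues = set()
    for dest, key, args in exchange.bindings:
        if not exchange.matches(routing_key, headers, key, args):
            continue
        if isinstance(dest, Exchange):
            queues |= route(dest, routing_key, headers, visited)
        else:
            queues.add(dest)
    return queues


# Web-node topology from the post; "D-user", "user-N", "client-N", "q-client-N" are hypothetical.
ep = Exchange("EP", "fanout")
headers_x = Exchange("Headers", "headers")
ep.bind(headers_x)
d_user = Exchange("D-user", "direct")  # one of the 21 direct exchanges
headers_x.bind(d_user, arguments={"x-match": "all", "some.our.site.name": True, "scope": "user"})
for uid in range(3):  # stand-in for the 50-60k per-client fanout exchanges
    fan = Exchange(f"client-{uid}", "fanout")
    d_user.bind(fan, routing_key=f"user-{uid}")
    fan.bind(f"q-client-{uid}")

print(route(ep, "user-1", {"some.our.site.name": True, "scope": "user"}))   # {'q-client-1'}
print(route(ep, "user-1", {"some.our.site.name": True, "scope": "guest"}))  # set()
```

With x-match=all, a message lacking either header (or carrying a different value) stops at Headers, so the direct and fanout stages are never reached.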

The previous topology had no EP and Headers exchanges; every direct exchange (D*) was synced by the Federation plugin with the exchange of the same name on the Cluster.
That topology handled 800 msgs/sec on average and 5000 msgs/sec at peak (which took 70% of CPU).
The new topology runs in parallel with the previous one and should deliver only 50-100 msgs/sec, but after some time we see only about 0.2-10 msgs/sec in the EP and Headers stats.
In addition, we've noticed that binary memory is growing abnormally. Average binary memory used to be about 100-150 MB, but now it's almost 5 GB and keeps growing, even though the number of clients and delivered messages is only about 20% of the daytime load.
On one Web node we replaced the Federation plugin with the Shovel plugin, which publishes directly to the Headers exchange. After that we still saw messages coming out of the EP exchange (even though nobody was publishing to it). It looks like the messages were being published from some internal Erlang queue. To confirm this theory, we bound an empty queue to the EP exchange and deleted the binding from EP to Headers. After that, system load rose to about 100% and the queue filled up with a lot of messages. This means that the Headers exchange was throttling our message publishing process.
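The post does not say how the Shovel and the probe queue were set up (management UI, rabbitmqctl, or the HTTP API). One possible way is the management plugin's HTTP API; the sketch below only builds the requests, assuming the API listens on localhost:15672, the default "/" vhost, and guest/guest credentials. The shovel name, source queue, URIs and the probe queue name "ep-probe" are hypothetical. The shovel body uses the 3.5-era dynamic shovel keys (src-uri, src-queue, dest-uri, dest-exchange), and props_key "~" assumes the EP to Headers binding has an empty routing key and no arguments.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API = "http://localhost:15672/api"  # assumption: management plugin on its default port
VHOST = "/"                         # assumption: the default virtual host
JSON = {"content-type": "application/json"}


def _seg(name):
    # Every path segment, including the "/" vhost, must be percent-encoded ("/" -> "%2F").
    return quote(name, safe="")


def shovel_request(name, src_uri, src_queue, dest_uri, dest_exchange):
    """Dynamic shovel (3.5-era keys): consume src_queue, publish to dest_exchange."""
    body = {"value": {"src-uri": src_uri, "src-queue": src_queue,
                      "dest-uri": dest_uri, "dest-exchange": dest_exchange}}
    return Request(f"{API}/parameters/shovel/{_seg(VHOST)}/{_seg(name)}",
                   data=json.dumps(body).encode(), method="PUT", headers=JSON)


def probe_requests(source="EP", probe_queue="ep-probe", target="Headers", props_key="~"):
    """Declare a probe queue, bind it to `source`, then remove the source -> target binding."""
    v = _seg(VHOST)
    return [
        Request(f"{API}/queues/{v}/{_seg(probe_queue)}", data=b"{}", method="PUT", headers=JSON),
        Request(f"{API}/bindings/{v}/e/{_seg(source)}/q/{_seg(probe_queue)}",
                data=b"{}", method="POST", headers=JSON),
        # props_key "~" identifies a binding with an empty routing key and no arguments.
        Request(f"{API}/bindings/{v}/e/{_seg(source)}/e/{_seg(target)}/{_seg(props_key)}",
                method="DELETE"),
    ]


def send(req, user="guest", password="guest"):  # assumption: default credentials
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urlopen(req) as resp:
        return resp.status
```

For example, `send(shovel_request("web1-shovel", "amqp://cluster-host", "web1-ep", "amqp://", "Headers"))` followed by `for req in probe_requests(): send(req)`. Before deleting, it is worth listing GET /api/bindings/%2F/e/EP/e/Headers to confirm the actual properties_key.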
After 3 hours on the test node (with less than 15% CPU usage):
1) The Shovel's queue on the Cluster

2) The Headers exchange on the Web node at the same time

The Shovel (and the Federation plugin too) takes all messages from the Cluster, but only 20% of them are published to Headers. However, everything is fine right after a node restart.
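To track the 20% figure over time, one option is to compare two rates from the management API: the consume rate of the Shovel's queue on the Cluster (message_stats.deliver_get_details.rate from GET /api/queues/{vhost}/{queue}) and the incoming publish rate of Headers on the Web node (message_stats.publish_in_details.rate from GET /api/exchanges/{vhost}/Headers). This is a minimal sketch that works on the already-parsed JSON; it assumes the Shovel publishes directly to Headers, as on the test node, so that publish_in reflects what the Shovel delivers.

```python
def _rate(stats_json, field):
    """Read <field>_details.rate (msgs/sec) from a management API object, or 0.0 if absent."""
    details = stats_json.get("message_stats", {}).get(f"{field}_details", {})
    return float(details.get("rate", 0.0))


def delivered_fraction(shovel_queue_json, headers_exchange_json):
    """Ratio of the Headers exchange's incoming publish rate to the shovel queue's consume rate."""
    consumed = _rate(shovel_queue_json, "deliver_get")
    published = _rate(headers_exchange_json, "publish_in")
    return published / consumed if consumed > 0 else 0.0
```

A value that stays well below 1.0 while the Shovel keeps draining its queue would match the behaviour described above, and a value close to 1.0 right after a restart would match the observation that the node is fine at first.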
Binary memory statistics:


Any thoughts about that?
Thanks for your help!