My application currently has a client that opens five channels to the MQ. One of those channels is in continuous use; the other four are used sporadically. The problem is that the firewall drops idle channels after 30 minutes of inactivity, so when a packet is sent over one of those channels, an errno 73 or errno 78 occurs, which causes POE::Component::Client::Stomp to go into reconnection mode. While that happens, the packet is silently dropped. Not good.
The first stab at fixing this was to have the client call "send_data" synchronously instead of asynchronously; that didn't work. The error code is generated some time after the call completes, so POE itself is doing some buffering deep down inside. Ummm, David, this might be a good case for transactions, hint, hint.
Now I figure I have three ways to solve this problem:
1) Turn on the undocumented AutoFlush feature for POE::Wheel::ReadWrite. This would presumably cause the errors to surface sooner, allowing me to preserve the packet. It could also hamper performance if the flush blocks. And the option is undocumented, while the whole flushing scheme is marked as experimental.
2) Turn on SO_KEEPALIVE within POE::Wheel::SocketFactory. This always seems to be the firewall admin's answer to these types of problems. It would require mucking around inside the object and directly manipulating the socket itself, since there is no way to do this through the documented options. That leads to brittle code, as there is no documented interface into the object. Also, I figure that some bright young firewall OS engineer will eventually realize that all those idle channels with keepalive active could be dropped to free up firewall resources, which is presumably why they are silently dropping inactive open channels now.
3) Rewrite the client to send all the packets over one channel. This would certainly fix the problem, as there is a continuous flow over that one channel. But it leads to coordination problems between sending over the network and producing the data to send; i.e., if the network link goes down, you need to notify the producers to start buffering until the link comes back.
Options 1 and 2 would require modifications to POE::Component::Client::Stomp. That may help others, or it may break everybody else; I don't know. Option 3 only affects me, but there is a deadline on the client code, and options 1 and 2 may be easier to implement.
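For what it's worth, option 2 boils down to a one-line setsockopt() on the raw handle. A minimal sketch, assuming you can get at the socket handle (e.g. from the handle SocketFactory passes to its SuccessEvent; there is no documented accessor, so this is exactly the brittle poking-around described above). The example below just creates its own TCP socket to show the call:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM SOL_SOCKET SO_KEEPALIVE);

# Create a plain TCP socket; in the real client this would be the
# handle POE::Wheel::SocketFactory hands back in its SuccessEvent.
socket(my $sock, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
    or die "socket: $!";

# Enable TCP keepalive probes on the socket.
setsockopt($sock, SOL_SOCKET, SO_KEEPALIVE, 1)
    or die "setsockopt: $!";

# Read the option back to verify it actually took.
my $val = unpack('i', getsockopt($sock, SOL_SOCKET, SO_KEEPALIVE));
print 'SO_KEEPALIVE is ', ($val ? 'on' : 'off'), "\n";
```

Note that how often the keepalive probes are sent is an OS-level tunable (on Linux, net.ipv4.tcp_keepalive_time, which defaults to two hours), so you may also need kernel tuning to beat a 30-minute firewall timeout.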
So, I would like to know how others have handled this situation and any thoughts they may have.
We definitely had firewall related problems with our application at my
former employer. The problem was a bit different, though.
In our case, as well, idle connections were being closed. When the
connection was closed, an error would occur on the MQ server end and
the MQ would correctly clean up that connection. However, on the
client end, the connection didn't get any errors, and it would still
think the connection was active indefinitely. Messages sent on this
connection would go to oblivion and it would wait forever to receive
messages that never came.
I'm not sure what was at fault: the firewall, OS (RHEL 4.0), network
architecture, switches/routers, etc. Who knows. Our solution was to
get the network team to allow MQ clients to bypass the firewall when
communicating with the server. Lots of trial-and-error was involved.
My "official" position is that firewalls in between internal
components of an application are lame and should be avoided at all
costs. In some cases it can't be avoided and while it would be
awesome to have a generic solution in such a case, firewall/network
setups can be as unique as snowflakes.
As to your proposed solutions, they sound reasonable in the case of a
client that sends messages. In the case of a client that only listens
for messages and doesn't send any of its own, only #2 would work --
assuming the firewall would actually refrain from breaking such a
connection -- it didn't work for us. Some sort of "ping" to prevent
the connection being closed might work too. I'd be open to adding a
PING frame, which would be a no-op except that you could set a
"receipt" header on it.
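To make the idea concrete, such a hypothetical PING frame (not part of the current STOMP spec; the frame name and receipt value here are invented for illustration, with ^@ standing for the NUL byte that terminates every STOMP frame) might look like:

```
PING
receipt:ping-42

^@
```

The server would do nothing with it except send back a RECEIPT frame with receipt-id:ping-42, which keeps traffic flowing in both directions through the firewall.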
> Ummm, David, this might be a good
> case for transactions, hint, hint.
Heh, maybe. ;-) But I'm not sure it would help in this situation. It
sounds like the message is never making it to the server, right?
2009/7/2, Kevin Esteb <kes...@wsipc.org>:
>
> Or maybe an option 4 has more promise.
>
> When sending the packet ask for a receipt. The receipt's ID points to the
> resource. When the receipt comes back, remove the resource. Appears to be
> working; I think I will let this one run for a while...
Receipts definitely sound like the way to go here. This is
more or less the purpose of receipts: to make sure the message landed
on the server end.
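For reference, the receipt handshake in option 4 is plain STOMP. A SEND frame that asks for a receipt (the destination and receipt ID here are made up for illustration; ^@ is the NUL frame terminator):

```
SEND
destination:/queue/example
receipt:message-12345

hello world
^@
```

The server answers with:

```
RECEIPT
receipt-id:message-12345

^@
```

Only when the RECEIPT frame arrives is it safe to release the buffered resource; until then, keep it around so the message can be resent after a reconnect.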
Happy Hacking,
David.
--
Open Source Hacker and Language Learner
http://www.hackyourlife.org/