Peer Expire Time?

45 views
Skip to first unread message

Jason Dopson

unread,
Jan 26, 2012, 12:51:26 PM1/26/12
to mobicents-public
I have an issue where connections aren't being closed properly
(because the far end switch disconnects) and then the MUX is throwing
an exception (java.nio.channels.AsynchronousCloseException). The
switch in question is sending me a Diameter Rf feed. The switch has
two redundant blades with one floating IP address. The scenario is
that the switch fails over operations from one blade to another and
then tries to re-establish the Rf feed. The MUX gets a new CER for
the same peer and tries to route the CEA over one of the existing open
connections (which is now closed) and then runs into the above
exception. It then proceeds to loop over and over in this condition
for an extended period of time (hours). I noticed an exp_time config
param on jdiameter-config.xml but its not very clear from the
documentation what this value is (secs, mins, hours?) and what
triggers the expiration.

To clarify, let me depict the scenario.

1) Switch (switch-001.somedomain.com) is operating on its first
redundant blade and opens connections to the MUX. All connections are
opened and added to the peer table. ACR/ACA's are flowing properly.
switch-001.somedomain.com (1.1.1.1:1) - Valid, open
switch-001.somedomain.com (1.1.1.1:2) - Valid, open
switch-001.somedomain.com (1.1.1.1:3) - Valid, open
switch-001.somedomain.com (1.1.1.1:4) - Valid, open

2) Switch (switch-001.somedomain.com) fails over operations to a
second redundant blade. All existing connections are no longer
valid. It's assumed that the switch did not try to gracefully close
the connections. Switch (switch-001.somdomain.com) now tries to open
a new connection (5) via a CER. However, CEA is attempted to route
back over connection (1) which is no longer valid.
switch-001.somedomain.com (1.1.1.1:1) - Invalid, open, CEA is sent
over this connection
switch-001.somedomain.com (1.1.1.1:2) - Invalid, open
switch-001.somedomain.com (1.1.1.1:3) - Invalid, open
switch-001.somedomain.com (1.1.1.1:4) - Invalid, open
switch-001.somedomain.com (1.1.1.1:5) - New CER, CEA is then attempted
to be sent across connection (1) instead of this connection.

Here is the log, this entire section repeats over and over again.

2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] Message
[MessageImpl{commandCode=257, flags=128}] received to peer [aaa://
1.1.1.1:26897]
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] Origin-Host in new
received message is [switch-001.somedomain.com]
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] Origin-Realm in new
received message is [switch-001.somedomain.com]
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] Checking against
entry in predefinedPeerTable with URI [aaa://127.0.0.1:21812]
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] 127.0.0.1 !=
switch-001.somedomain.com
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] Peer is still null
[switch-001.somedomain.com]
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-001.somedomain.com:26886]
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] URI
[switch-001.somedomain.com] == [switch-001.somedomain.com] but peer
connection is not open
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-001.somedomain.com:26890]
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] URI
[switch-001.somedomain.com] == [switch-001.somedomain.com] but peer
connection is not open
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:46299]
2012-01-25 20:23:15,006 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:33070]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26871]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26879]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26881]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-001.somedomain.com:26888]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] URI
[switch-001.somedomain.com] == [switch-001.somedomain.com] but peer
connection is not open
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26873]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26892]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:37022]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:50231]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:50503]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-001.somedomain.com:26878]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] URI
[switch-001.somedomain.com] == [switch-001.somedomain.com] but peer
connection is not open
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:43206]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:41421]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26887]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-001.somedomain.com:26872]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] URI
[switch-001.somedomain.com] == [switch-001.somedomain.com] but peer
connection is not open
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-001.somedomain.com:26876]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] URI
[switch-001.somedomain.com] == [switch-001.somedomain.com] but peer
connection is not open
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:36710]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26889]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-001.somedomain.com:26882]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] URI
[switch-001.somedomain.com] == [switch-001.somedomain.com] but peer
connection is not open
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:44800]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26885]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26883]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-001.somedomain.com:26874]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] URI
[switch-001.somedomain.com] == [switch-001.somedomain.com] but peer
connection is not open
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-003.somedomain.com:48775]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-000.somedomain.com:26877]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] PeerTable contains an
URI of [aaa://switch-001.somedomain.com:26891]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] URI
[switch-001.somedomain.com] == [switch-001.somedomain.com] and peer
connection is open
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.server.impl.MutablePeerTableImpl] Add
[org.jdiameter.client.impl.transport.tcp.TCPClientConnection@3f437b82]
connection to peer [SPeer{Uri=aaa://switch-001.somedomain.com:26891;
State=OKAY;
con=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@5429647c;
incCon{aaa://
1.1.1.1:26897=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@3f437b82} }]
2012-01-25 20:23:15,007 DEBUG [org.jdiameter.server.impl.PeerImpl]
Releasing connection [aaa://1.1.1.1:26897]
2012-01-25 20:23:15,007 DEBUG
[org.jdiameter.client.impl.transport.tcp.TCPTransportClient] Stopping
transport. Socket is [Socket[addr=/1.1.1.1,port=26897,localport=3868]]
2012-01-25 20:23:15,008 DEBUG
[org.jdiameter.client.impl.transport.tcp.TCPTransportClient] Transport
exception
java.nio.channels.AsynchronousCloseException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:
185)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:270)
at
org.jdiameter.client.impl.transport.tcp.TCPTransportClient.run(TCPTransportClient.java:
146)
at java.lang.Thread.run(Thread.java:662)



Jason Dopson

unread,
Jan 26, 2012, 3:12:52 PM1/26/12
to mobicents-public
More investigation has lead to some interesting details. After the
failover depicted above the TCP stack still shows the old connections
as ESTABLISHED (verified through a netstat). This may end up being an
issue with the switch or a firewall not tearing down the connections
properly. However, I think the application should be robust enough to
determine that those connections are no longer valid (since data can
not flow) and tear them down from the MUX side.

Jason Dopson

unread,
Jan 27, 2012, 1:00:50 PM1/27/12
to mobicents-public
Not sure why I didn't think of this earlier but shouldn't the MUX send
DWR's to its peers? I don't see that happening here.
> > connection is open...
>
> read more »

Jason Dopson

unread,
Jan 27, 2012, 1:18:58 PM1/27/12
to mobicents-public
After further reading and investigation, it appears DWR's are sent
only if all connections from a peer go idle. This makes sense after
reading the description in RFC3588. However, there must be a way to
timeout individual connections.
> > > [org.jdiameter.server.impl.MutablePeerTableImpl]...
>
> read more »

Bartosz Baranowski

unread,
Jan 30, 2012, 5:22:15 AM1/30/12
to mobicent...@googlegroups.com
Hi Jason

Nice investigation :)
Is there any way we can get a test case which reproduces this problem ?
Bartosz Baranowski
JBoss R & D
==================================
Word of criticism meant to improve is always step forward.

Jason Dopson

unread,
Jan 31, 2012, 3:54:31 PM1/31/12
to mobicents-public
I'm not sure how you could replicate it. I'm thinking it may have
something to do with the device being behind a NAT. I'm going to
attempt to put my CCF on the same network as the device to see if that
eliminates the issue. I'll keep you posted.
> ...
>
> read more »

Jason Dopson

unread,
Feb 6, 2012, 6:07:46 PM2/6/12
to mobicents-public
The issue doesn't seem to be NAT related. It happens when a Diameter
connected device doesn't close its connections gracefully (due to a
crash). To replicate:

1) Establish a Diameter feed from a device.
2) To simulate a hard disconnect, temporarily place a firewall
(iptables in my case) on the diameter server for port 3868.
3) Reset the connections on the far end device manually so it will
attempt to reconnect (via CER).
4) Remove the firewall (flush iptables) and let the far end device
attempt to reestablish the feed.

This all needs to be done fairly quickly so that the stack doesn't
start to send DWR's/DWA's (less than 30 seconds I believe). What
happens is the old diameter connection never gets torn down (because a
TCP FIN was never sent). The Diameter stack starts receiving CER's
over a new connection and then the Async Close errors (detailed above)
are generated. The only way to fix this is to restart the stack
within JSLEE (which forces the connections to close).

Ultimately what *should* happen is once the stack has an error with
any connection, it should send out a Diameter Peer Disconnect to close
it out.

-- JD

On Jan 31, 3:54 pm, Jason Dopson <jdop...@gmail.com> wrote:
> I'm not sure how you could replicate it.  I'm thinking it may have
> something to do with the device being behind a NAT.  I'm going to
> attempt to put my CCF on the same network as the device to see if that
> eliminates the issue.  I'll keep you posted.
>
> On Jan 30, 5:22 am, Bartosz Baranowski <baran...@gmail.com> wrote:
>
>
>
>
>
>
>
> > Hi Jason
>
> > Nice investigation :)
> > Is there any way we can get a test case which reproduces this problem ?
> > Bartosz Baranowski
> > JBoss R & D
> > ==================================
> > Word of criticism meant to improve is always step forward.
>
> > On Fri, Jan 27, 2012 at 7:18 PM, Jason Dopson <jdop...@gmail.com> wrote:
> > > After further reading and investigation, it appears DWR's are sent
> > > only if all connections from apeergo idle.  This makes sense after
> > > reading the description in RFC3588.  However, there must be a way to
> > > timeout individual connections.
>
> > > On Jan 27, 1:00 pm, Jason Dopson <jdop...@gmail.com> wrote:
> > > > Not sure why I didn't think of this earlier but shouldn't the MUX send
> > > > DWR's to its peers?  I don't see that happening here.
>
> > > > On Jan 26, 3:12 pm, Jason Dopson <jdop...@gmail.com> wrote:
>
> > > > > More investigation has lead to some interesting details.  After the
> > > > > failover depicted above the TCP stack still shows the old connections
> > > > > as ESTABLISHED (verified through a netstat).  This may end up being an
> > > > > issue with the switch or a firewall not tearing down the connections
> > > > > properly.  However, I think the application should be robust enough to
> > > > > determine that those connections are no longer valid (since data can
> > > > > not flow) and tear them down from the MUX side.
>
> > > > > On Jan 26, 12:51 pm, Jason Dopson <jdop...@gmail.com> wrote:
>
> > > > > > I have an issue where connections aren't being closed properly
> > > > > > (because the far end switch disconnects) and then the MUX is throwing
> > > > > > an exception (java.nio.channels.AsynchronousCloseException).  The
> > > > > > switch in question is sending me a Diameter Rf feed.  The switch has
> > > > > > two redundant blades with one floating IP address.  The scenario is
> > > > > > that the switch fails over operations from one blade to another and
> > > > > > then tries to re-establish the Rf feed.  The MUX gets a new CER for
> > > > > > the samepeerand tries to route the CEA over one of the existing
> > > open
> > > > > > connections (which is now closed) and then runs into the above
> > > > > > exception.  It then proceeds to loop over and over in this condition
> > > > > > for an extended period of time (hours).  I noticed an exp_time config
> > > > > > param on jdiameter-config.xml but its not very clear from the
> > > > > > documentation what this value is (secs, mins, hours?) and what
> > > > > > triggers the expiration.
>
> > > > > > To clarify, let me depict the scenario.
>
> > > > > > 1) Switch (switch-001.somedomain.com) is operating on its first
> > > > > > redundant blade and opens connections to the MUX.  All connections
> > > are
> > > > > > opened and added to thepeertable.  ACR/ACA's are flowing properly.
> > > > > > switch-001.somedomain.com (1.1.1.1:1) - Valid, open
> > > > > > switch-001.somedomain.com (1.1.1.1:2) - Valid, open
> > > > > > switch-001.somedomain.com (1.1.1.1:3) - Valid, open
> > > > > > switch-001.somedomain.com (1.1.1.1:4) - Valid, open
>
> > > > > > 2) Switch (switch-001.somedomain.com) fails over operations to a
> > > > > > second redundant blade.  All existing connections are no longer
> > > > > > valid.  It's assumed that the switch did not try to gracefully close
> > > > > > the connections.  Switch (switch-001.somdomain.com) now tries to
> > > open
> > > > > > a new connection (5) via a CER.  However, CEA is attempted to route
> > > > > > back over connection (1) which is no longer valid.
> > > > > > switch-001.somedomain.com (1.1.1.1:1) - Invalid, open, CEA is sent
> > > > > > over this connection
> > > > > > switch-001.somedomain.com (1.1.1.1:2) - Invalid, open
> > > > > > switch-001.somedomain.com (1.1.1.1:3) - Invalid, open
> > > > > > switch-001.somedomain.com (1.1.1.1:4) - Invalid, open
> > > > > > switch-001.somedomain.com (1.1.1.1:5) - New CER, CEA is then
> > > attempted
> > > > > > to be sent across connection (1) instead of this connection.
>
> > > > > > Here is the log, this entire section repeats over and over again.
>
> > > > > > 2012-01-25 20:23:15,006 DEBUG
> > > > > > [org.jdiameter.server.impl.MutablePeerTableImpl] Message
> > > > > > [MessageImpl{commandCode=257, flags=128}] received topeer[aaa://
> > > > > > 1.1.1.1:26897]
> > > > > > 2012-01-25 20:23:15,006 DEBUG
> > > > > > [org.jdiameter.server.impl.MutablePeerTableImpl] Origin-Host in new
> > > > > > received message is [switch-001.somedomain.com]
> > > > > > 2012-01-25 20:23:15,006 DEBUG
> > > > > > [org.jdiameter.server.impl.MutablePeerTableImpl] Origin-Realm in new
> > > > > > received message is [switch-001.somedomain.com]
> > > > > > 2012-01-25 20:23:15,006 DEBUG
> > > > > > [org.jdiameter.server.impl.MutablePeerTableImpl] Checking against
> > > > > > entry in predefinedPeerTable with URI [aaa://127.0.0.1:21812]
> > > > > > 2012-01-25 20:23:15,006 DEBUG
> > > > > > [org.jdiameter.server.impl.MutablePeerTableImpl] 127.0.0.1 !=
> > > > > > switch-001.somedomain.com
> > > > > > 2012-01-25 20:23:15,006 DEBUG
> > > > > > [org.jdiameter.server.impl.MutablePeerTableImpl]Peeris still null
> > > > > > 2012-01-25 20:23:15,007 DEBUG...
>
> read more »

Jason Dopson

unread,
Feb 7, 2012, 10:57:12 AM2/7/12
to mobicents-public
FYI, I've opened issue 3090 to track this as it does appear to be a
bug.
> > > > > > > 2012-01-25 20:23:15,007 DEBUG...
>
> read more »
Reply all
Reply to author
Forward
0 new messages