Could someone shed any light on where the problem may lie?
Note that there are two relevant WARNING messages (not ERRORs) in the
application event log for this problem:
1. SNA DLC Link Service Eventid 226
Connection lost
Connection = ETHER1
Local MAC address = 407985012000
Local ring number = 0000
Remote MAC address = 4000CBA00400
Routing data =
Source SAP = 04
Destination SAP = 04
Connection data = N(S) N(R) PFRC LRNR FRMD CRCV CSND LKST
                  0046 0056 0000 0046 0128 0001 0001 0000
EXPLANATION
The specified connection was lost, either because of some problem with the
local NDIS adapter, a problem at the remote adapter, or a problem in the
network itself (for example, a bridge failure).
The connection data fields are as follows:
N(S)  N(S) value of last transmitted frame
N(R)  N(R) value of last transmitted frame
PFRC  Number of I-frames that have not been acknowledged
LRNR  N(R) value of last received frame
FRMD  Frame modulus (should be 128)
CRCV  Last command/response received
CSND  Last command/response sent
LKST  Link state
      Bit 6--local busy
      Bit 4--remote busy
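The field list above can be applied mechanically to the "Connection data" line. A minimal Python sketch, assuming the values are printed in decimal (FRMD shows 0128, matching "should be 128") and that LKST bits are numbered from the least significant bit; the event text states neither, and `decode_connection_data` is a hypothetical helper, not part of HIS:

```python
# Hypothetical helper for the "Connection data" line of Event 226.
# Assumptions: values are decimal; LKST bit 0 is the least significant bit.
FIELDS = ["N(S)", "N(R)", "PFRC", "LRNR", "FRMD", "CRCV", "CSND", "LKST"]

LOCAL_BUSY_BIT = 6   # event text: "Bit 6--local busy"
REMOTE_BUSY_BIT = 4  # event text: "Bit 4--remote busy"


def decode_connection_data(values: str) -> dict:
    """Map the eight numbers under the header to their field names and
    flag the busy bits in the link state (LKST)."""
    numbers = [int(v, 10) for v in values.split()]
    if len(numbers) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(numbers)}")
    data = dict(zip(FIELDS, numbers))
    lkst = data["LKST"]
    data["local_busy"] = bool(lkst & (1 << LOCAL_BUSY_BIT))
    data["remote_busy"] = bool(lkst & (1 << REMOTE_BUSY_BIT))
    data["frame_modulus_ok"] = data["FRMD"] == 128
    return data


if __name__ == "__main__":
    decoded = decode_connection_data("0046 0056 0000 0046 0128 0001 0001 0000")
    for key, value in decoded.items():
        print(f"{key:>16}: {value}")
```

For this event, PFRC = 0 (no unacknowledged I-frames) and LKST = 0, so neither side reported itself busy when the connection was lost; that holds whichever way the bits are numbered.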
ACTION
If another error message signifies a local adapter fault, perform the
action recommended in that message. If no error message signifies a problem
with the local adapter, contact support personnel for your network.
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
2. SNA Server Eventid 23
Connection Failure
Connection = ETHER1
Link Service = SNADLC1
Outage Code = 00AF
EXPLANATION
A link service reported a connection failure to the node. See below for
more information on the outage code that caused the connection to fail.
TOKEN RING, ETHERNET OR FDDI
0029
The remote system is not responding to the Host Integration Server
computer's attempt to activate an SNA connection.
00AE
The connection has failed due to either (1) a connection timeout, due to
slow response from the remote system, or (2) the remote system has
deactivated the connection by sending Host Integration Server SNA Service a
DISC (disconnect) or DM frame.
If slow response from the remote system is suspected, then the Host
Integration Server SNA connection t1 and ti timers should be increased.
If you are connecting to an AS/400 and there are no active user sessions,
the AS/400 will drop the connection if the AS/400 APPC controller "Switched
Disconnect" setting is set to YES.
To isolate the problem, a Microsoft Network Monitor trace (or similar
utility) must be used to capture an occurrence of the failure.
00AF
The connection to the remote system has been lost. This could occur due to
a media error on your LAN, a failure of an intermediate bridge or router, or
if the remote system has stopped responding.
00AC
The Windows NT DLC driver has detected a DLC protocol error in a frame
received from the remote system and has sent a Frame Reject (FRMR), causing the
connection to end. Event 228 should be logged when this error occurs. See the
FRMR code in Event 228 for more information describing why the FRMR occurred.
00AD
The remote system has detected a DLC protocol error in a frame sent by
Windows NT, and has sent a Frame Reject (FRMR) causing the connection to end.
Event 227 should be logged when this error occurs.
SDLC
0011
Data Set Ready (DSR) failure. The remote system has dropped the line, or
the modem has disconnected.
If this problem is occurring while initiating a dial-up SDLC connection,
confirm that the switched PU on the host is not already in use, and that the
Host Integration Server SNA SDLC Local Node ID matches a valid switched PU
definition in VTAM.
0012
Clear to Send (CTS) failure. The modem or line has failed while the Host
Integration Server SNA Service attempted to send data.
This error may indicate a problem with the modem, a loose/bad cable, or a
line failure.
0014
Data Carrier Detect (DCD) failure. The modem or connection to the remote
system has failed.
Try replacing the SDLC cable and check the modem configuration settings. If
this doesn't solve the problem, an SDLC line trace should be captured to
determine the cause of the problem.
0024
Non-productive receive retry limit exceeded. Either the remote system is
inactive but has not broken the modem connection, or the line quality is very
poor.
0025
Idle timeout retry exceeded. The SNA connection has not received an SDLC
response from the remote system, causing a connection timeout. Common causes
include: (1) wrong duplex setting, (2) wrong NRZI value, (3) a bad SDLC cable,
(4) a line speed too fast for the adapter, (5) a modem configuration issue, or
(6) a bad SDLC adapter.
0029
The remote system is not accepting an SNA connection. If the SNA SDLC
adapter is connecting through a Frame Relay Access Device (FRAD), confirm
that the device is configured to pass XID messages.
002C
Invalid SDLC command received. The SDLC driver has reported that an invalid
SDLC command has been received. Either an SDLC I-frame was received out of
sequence, an unrecognized command was received, or the SDLC frame exceeded
the maximum frame size that the SDLC link service can support.
ACTION: Try the following suggestions: (1) configure the SNA SDLC
connection for "half duplex" instead of "full duplex", (2) configure the SNA
SDLC connection for XID Type 3, and set the Max BTU length to 265, (3)
contact your SDLC adapter vendor and confirm that your SDLC adapter/driver
can operate properly in the machine you are using.
002D
Abnormal response. The SDLC adapter has received an unexpected response
from the modem, causing the connection to fail.
ACTION: Try the following: (1) Confirm that the modem is configured
properly, (2) determine if the SDLC link speed is faster than your SDLC
adapter can support (for example, the IBM SDLC adapter does not support
56Kbps), (3) confirm that the SNA SDLC link service is configured properly as
a leased, multidrop or switched link, and try disabling "constant request to
send" if this has been selected, (4) try configuring the SNA SDLC connection
for "half duplex" instead of "full duplex".
002E
Write time-out retry exceeded. This error can indicate an SDLC link service
or configuration problem in Host Integration Server SNA Manager, or a problem
with the modem or SDLC line.
ACTION: See ACTION for 002D.
0015
Connection terminated by host. The remote system has sent a DISC (disconnect)
to the Host Integration Server connection, causing the connection to end.
This could occur if VTAM (or host operator) has brought down the line or PU,
possibly after receiving an unexpected SDLC frame or being notified of a
problem with the SDLC connection. This error can occur due to a transient
problem with the SDLC line or an SDLC configuration mismatch with the host.
ACTION: See ACTION for 002C.
0080
A Disconnect Mode (DM) was received from the remote system. The remote
system returned to initialization state.
Action: Contact your host administrator.
0088
A Request Initialization Mode (RIM) was received from the remote system.
The remote system returned to initialization state.
Action: Contact your host administrator.
0081
Disconnect retry limit. The remote end did not reply during a
DISC (disconnect) attempt. Host Integration Server SNA Service tried to
disconnect from the remote system, but the remote system did not reply.
Action: Contact your host administrator.
0082
Contact retry limit. The remote end did not reply to a connection
activation attempt. Host Integration Server SNA Service tried to contact the
remote system but the remote system did not reply to this attempt.
ACTION: Confirm that the Host Integration Server SNA connection is
configured with the correct encoding (NRZ or NRZI) to match the host. Try
configuring the Host Integration Server SNA connection to use "half duplex"
instead of "full duplex". If this doesn't solve the problem, confirm the
cable and modem configuration, and that the remote system is ready to accept
a connection.
0084
Host Integration Server SNA Service is not receiving Receiver Ready (RR)
poll responses from the remote system, causing the connection to time out.
While the remote system may not be responding properly to polls, this problem
has also been observed when the SDLC link speed is faster than the SDLC
adapter can support (for example, when attempting to use an IBM SDLC adapter at
56Kbps). If connecting to an AS/400 over SDLC, this problem has been observed
if the APPC device associated with the Host Integration Server SNA Manager
Local APPC LU name has not been manually created on the AS/400.
0085
The remote system has become busy. The Host Integration Server SNA link
has received too many Receiver Not Ready (RNR) messages from the remote
system and has assumed that the remote system will not recover from this busy
state.
Action: Determine the SDLC connection status of the remote system.
GENERAL
0016
The Link Service has stopped.
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
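When working through several Event 23 entries, it can help to keep the outage code descriptions above in a small lookup table. A minimal, partial sketch: the table is hand-transcribed from the event text, "LAN" stands for the TOKEN RING, ETHERNET OR FDDI section, and `describe_outage` is a hypothetical helper. It is keyed by (medium, code) because the same code, such as 0029, appears with different wording under both LAN and SDLC:

```python
# Hypothetical, partial lookup table transcribed from the Event 23 text.
# "LAN" = the TOKEN RING, ETHERNET OR FDDI section.
OUTAGE_CODES = {
    ("LAN", 0x0029): "Remote system not responding to SNA connection activation",
    ("LAN", 0x00AE): "Connection timeout, or remote sent DISC or DM",
    ("LAN", 0x00AF): "Connection lost: LAN media error, bridge/router failure, "
                     "or remote system stopped responding",
    ("LAN", 0x00AC): "Local DLC driver sent FRMR (see Event 228)",
    ("LAN", 0x00AD): "Remote system sent FRMR (see Event 227)",
    ("SDLC", 0x0029): "Remote system not accepting an SNA connection",
    ("SDLC", 0x0084): "No RR poll responses from remote system",
    ("GENERAL", 0x0016): "Link service has stopped",
}


def describe_outage(medium: str, code: str) -> str:
    """Look up a hex outage code (e.g. "00AF") for a medium, falling back
    to the GENERAL section and then to a placeholder."""
    number = int(code, 16)
    return (OUTAGE_CODES.get((medium.upper(), number))
            or OUTAGE_CODES.get(("GENERAL", number))
            or "Unknown outage code")


if __name__ == "__main__":
    # Outage Code = 00AF from the Event 23 quoted in the question
    print(describe_outage("LAN", "00AF"))
```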
Hi Peter
The Events in the Log describe the problem fairly accurately: The
specified connection was lost, either because of some problem with the
local NDIS adapter, a problem at the remote adapter, or a problem in the
network itself (for example, a bridge failure).
You will probably need to do some further troubleshooting, to determine
exactly what problem caused the connection to be lost.
When you migrated the server to HIS 2009, was it on the same physical
hardware? Or is the server a new machine? Was the operating system also
upgraded? Were there any changes to the network infrastructure at the
same time? Are there any NIC or network-related errors in the Event Log
as well?
For this kind of problem, a network trace using a regular network
sniffer, such as Microsoft NetMon or Wireshark, will show you if the DLC
frames are being correctly sent from the network card. If DLC traffic is
leaving the HIS server for the mainframe correctly, then the problem is
probably further out in the network.
Hope this helps,
Andrew
--
amclar at optusnet dot com dot au
"Andrew McLaren" wrote:
Odds are that frames are being dropped/lost in one direction or the other.
This is almost always the reason.
Since you changed to a new server, the HIS server now has a new MAC
address, and something in the network path may still be set up for the old
MAC address.
Thanks...
--
Stephen Jackson
Microsoft® HIS Support
Please do not send e-mail directly to this alias. This alias is for
newsgroup purposes only. This posting is provided "AS IS"
with no warranties, and confers no rights.
"Peter Lee" <Pete...@discussions.microsoft.com> wrote in message
news:BFACB2EE-7562-4425...@microsoft.com...
I took the MAC address (LAA) from the old server and used it on the new
server. I also removed the LAA from the old server.
There is no simple answer which solves this problem. The DLC connection
between HIS and the mainframe is failing. You will need to do some
troubleshooting to find out where the failure is occurring.
Usually a DLC link is established like this:
HIS           Host
TEST     -->
         <--  TEST
XID      -->
         <--  XID
SNRM     -->
         <--  UA
RNR      -->
         <--  RR
I frame  -->
         <--  I frame
(etc.)
DISC     -->  or  <--  DISC
You may need to get a trace of the connection failing (i.e., when Event 23 is
logged) to see what is dropping the connection, and why.
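Once a trace has been captured and decoded into (direction, frame type) pairs, the exchange above can be checked against it. A minimal sketch of that comparison; the simplified expected sequence is taken from the diagram, while the pair format and the helper names `first_divergence` and `disc_sender` are hypothetical. Real captures may legitimately repeat TEST or XID frames, so a divergence is a hint, not a verdict:

```python
# Simplified start-up exchange from the diagram above.
# "->" = HIS to host, "<-" = host to HIS.
EXPECTED_STARTUP = [
    ("->", "TEST"), ("<-", "TEST"),
    ("->", "XID"), ("<-", "XID"),
    ("->", "SNRM"), ("<-", "UA"),
    ("->", "RNR"), ("<-", "RR"),
]


def first_divergence(captured):
    """Index of the first captured frame that departs from the expected
    start-up exchange, or None if every compared frame matches."""
    for index, (expected, actual) in enumerate(zip(EXPECTED_STARTUP, captured)):
        if expected != actual:
            return index
    return None


def disc_sender(captured):
    """Which side sent the first DISC: "HIS", "host", or None."""
    for direction, frame in captured:
        if frame == "DISC":
            return "HIS" if direction == "->" else "host"
    return None


if __name__ == "__main__":
    capture = EXPECTED_STARTUP + [("->", "I frame"), ("<-", "I frame"), ("<-", "DISC")]
    print(first_divergence(capture), disc_sender(capture))
```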
It is a "well-known problem", in the sense that the events show you have
a networking problem. This specific error is not usually caused by a bug
or misconfiguration in HIS.
Hope it helps,
Andrew
-
amclar at optusnet dot com dot au
The first thing that springs to mind is NIC negotiation issues, i.e. check that
the link speed/duplex settings on the server NIC and the switch are compatible,
e.g. 100/FULL on both sides, or auto-auto on both sides.
If that's OK, then check every switch/router along the path between the server
NIC and the mainframe NIC, and see whether any of those interfaces go down/up
at the same time as your issue occurs.
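Checking for interfaces going down/up "at the same time" amounts to matching Event 23 timestamps against interface state changes from the switch/router logs. A minimal sketch, assuming both lists have already been extracted as Python datetimes and that the clocks agree to within the chosen window (30 seconds here is an arbitrary assumption); `correlate` and the sample data are hypothetical:

```python
from datetime import datetime, timedelta


def correlate(outages, flaps, window=timedelta(seconds=30)):
    """Map each outage time to the interface up/down events that occurred
    within +/- `window` of it. Outages with no nearby flap are omitted."""
    matches = {}
    for outage in outages:
        nearby = [(name, when) for name, when in flaps
                  if abs(when - outage) <= window]
        if nearby:
            matches[outage] = nearby
    return matches


if __name__ == "__main__":
    # Hypothetical sample data: Event 23 times and interface flaps from switch logs
    event23 = [datetime(2010, 1, 4, 9, 15, 2)]
    flaps = [("Gi0/1", datetime(2010, 1, 4, 9, 14, 50)),
             ("Gi0/7", datetime(2010, 1, 4, 11, 0, 0))]
    for outage, nearby in correlate(event23, flaps).items():
        print(outage, nearby)
```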
If it is a new server, is it plugged into the same switch port as the old
server?
Neil Pike. Protech Computing Ltd
http://www.linkedin.com/in/neilpike
Also, is this a physical server or a virtual machine under VMware? If it is
under VMware, check the patch level, as there were some bugs with SNA/DLC comms
a while back.
Have you verified, by looking at the MAC/ARP table on the switch/router, that
the LAA is taking effect? Some drivers have historically not set it correctly.
You'd think that is unlikely with a newer Windows 2008 driver, but it is worth
checking.
There are 2 MAC addresses in the capture: the server, 40:79:85:01:20:00, and
the remote SNA device, 02:00:D3:05:20:00.
Frame 131 marks the end of some transactions, and then communication goes
idle. Every 5 sec during idle periods, the server sends an RR frame with the
Poll (P) bit set, and the remote end responds with an RR with the Final (F)
bit set. The P/F exchange occurs in frames 132/133, 134/135, 136/137, 138/139
and 140/141. Five seconds later, at frame 142, the server sends another Poll
but does not receive an answer, so it sends the Poll again: a total of 10
times at 0.4 sec intervals, with no response. After 10 retries the server
gives up and marks that SNA connection as disconnected.
One second later, Polls are received from the remote end at 1 sec intervals,
for a total of 9 Polls. The server does not respond to these Polls, as it has
already marked this connection as disconnected.
The remote end then sends a DISC.
Server then starts sending TEST frames and the connection is eventually
re-established.
It looks like either the Polls sent by the server did not get to the remote
end, or the Final responses sent by the remote end did not get to the server.
A capture at the remote end is needed to determine which.
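The analysis above, which looks for Polls that never receive a Final, can be repeated on each capture (including the one from the remote end) once the frames have been exported. A minimal sketch, assuming a hypothetical export in which the LLC command/response and P/F bits are already decoded; the `Frame` record is not a NetMon or Wireshark API:

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    number: int
    src: str        # "server" or "remote"
    kind: str       # e.g. "RR", "DISC"
    command: bool   # True = command frame, False = response frame
    pf: bool        # P bit on a command, F bit on a response


def unanswered_polls(frames: List[Frame], poller: str = "server") -> List[int]:
    """Frame numbers of Polls sent by `poller` that got no Final back from
    the other side before `poller` polled again (or the capture ended)."""
    unanswered: List[int] = []
    pending = None
    for f in frames:
        if not f.pf:
            continue
        if f.src == poller and f.command:         # a Poll from `poller`
            if pending is not None:
                unanswered.append(pending)        # previous Poll never answered
            pending = f.number
        elif f.src != poller and not f.command:   # a Final from the peer
            pending = None
    if pending is not None:
        unanswered.append(pending)
    return unanswered


if __name__ == "__main__":
    sample = [Frame(140, "server", "RR", True, True),
              Frame(141, "remote", "RR", False, True),
              Frame(142, "server", "RR", True, True)]
    print(unanswered_polls(sample))  # -> [142]
```

Running the same function with `poller="remote"` on the remote-end capture would show whether the remote's own Polls went unanswered as well.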
Still awaiting the second capture at the remote end.
I know the problem is not related to HIS misconfiguration, as other HIS server
migrations did not have the same problem.
"Andrew McLaren" wrote:
The new server is a physical server.
"Neil Pike" wrote:
The old server (Windows 2003 with HIS 2000) was replaced by a Windows 2008
server with HIS 2009.
"Neil Pike" wrote:
The server NIC is set to 100 Mb Full. The switch should also be set to the
same 100 Mb Full, according to the comms supplier's standard.
The new server is plugged into a new switch port. The old server is still
using the old switch port, as it hasn't been decommissioned yet. We tried the
old switch port on the first Monday after the cutover, and the dropout also
occurred on the old switch port.
"Neil Pike" wrote:
Note that switch ports are set to auto-auto by default unless manually
configured. If the switch is auto-auto and you are 100Mbit/full, regular
renegotiations will occur, during which all packets will be lost. Between those
renegotiations the link will work, but the switch side will be running at half
duplex, so you will lose/retransmit packets if there are any collisions between
send and receive.
Your symptoms do match the above!
Set the server NIC to auto-auto.
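The mismatch Neil describes can be modelled in a few lines. Under IEEE 802.3 parallel detection, an auto-negotiating port facing a port with a fixed setting can sense the speed but falls back to half duplex. A simplified sketch, assuming both NICs support 100/full; `PortConfig`, `effective_mode` and `check_link` are hypothetical names, and real switches may behave differently:

```python
from typing import NamedTuple, Optional, Tuple

Mode = Tuple[Optional[int], Optional[str]]  # (speed in Mbit/s, "full" or "half")


class PortConfig(NamedTuple):
    auto: bool
    speed: Optional[int] = None   # used only when auto is False
    duplex: Optional[str] = None  # used only when auto is False


def effective_mode(local: PortConfig, peer: PortConfig,
                   best_common: Mode = (100, "full")) -> Mode:
    """Simplified outcome for `local`: a fixed port keeps its setting; two
    auto ports negotiate `best_common` (assumed 100/full here); an auto port
    facing a fixed port senses the speed but falls back to half duplex."""
    if not local.auto:
        return (local.speed, local.duplex)
    if peer.auto:
        return best_common
    return (peer.speed, "half")


def check_link(server: PortConfig, switch: PortConfig):
    """Return (server mode, switch mode, whether they match)."""
    server_mode = effective_mode(server, switch)
    switch_mode = effective_mode(switch, server)
    return server_mode, switch_mode, server_mode == switch_mode


if __name__ == "__main__":
    fixed = PortConfig(auto=False, speed=100, duplex="full")
    auto = PortConfig(auto=True)
    for label, server, switch in [("fixed/auto", fixed, auto),
                                  ("auto/auto", auto, auto),
                                  ("fixed/fixed", fixed, fixed)]:
        print(label, check_link(server, switch))
```

Only the fixed/auto combination produces a mismatch in this model, which is why the advice is to make both ends the same: either both fixed at 100/full or both auto.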
I don't know how many switches/routers the packets need to go through. My
company doesn't look after the client's network, but the packets are
encapsulated by a DLSw router.
"Neil Pike" wrote:
Thanks for the update. Yes, as Neil and yourself both remark: this looks
like something is getting held up in the network. I'm not sure why that
would happen at the same time as an HIS upgrade; perhaps it is just
coincidence. In general terms, upgrading HIS would not cause the problem
you are seeing.
Good luck,
Andrew
Not sure it will resolve the issue, but you could try increasing the 802.2
timeouts in the DLC connection properties, on the 802.2 DLC tab.
The following KB article describes the 802.2 DLC connection timers:
129786 SNA Server and 802.2 connection timers (t1, t2, ti)
http://support.microsoft.com/default.aspx?scid=kb;EN-US;129786
Since it looks like the Response (t1) timer is coming into play (it is 0.4
seconds by default), you could try increasing this timer to see if the
longer delay between the retries allows the responses to arrive before the
connection drops.
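Some rough arithmetic shows what a larger t1 buys. In the capture, the server sent 10 Polls at 0.4-second intervals, so it gave up roughly 4 seconds after the first unanswered Poll. A minimal sketch, assuming one Poll per t1 interval and that the number of Polls stays at the 10 observed (whether HIS exposes that count separately is not covered here); the 5-second interruption is purely hypothetical:

```python
DEFAULT_T1 = 0.4      # seconds; the default response timer quoted above
OBSERVED_POLLS = 10   # Polls the server sent before giving up, per the capture


def give_up_after(t1: float, polls: int = OBSERVED_POLLS) -> float:
    """Approximate seconds from the first unanswered Poll until the server
    declares the link down, assuming one Poll every t1 seconds."""
    return t1 * polls


def rides_through(t1: float, interruption: float, polls: int = OBSERVED_POLLS) -> bool:
    """True if the retry window outlasts an interruption of the given length."""
    return give_up_after(t1, polls) > interruption


if __name__ == "__main__":
    for t1 in (DEFAULT_T1, 1.0, 2.0):
        print(f"t1={t1:.1f}s -> gives up after ~{give_up_after(t1):.1f}s; "
              f"survives a hypothetical 5 s gap: {rides_through(t1, 5.0)}")
```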
Thanks...
--
Stephen Jackson
Microsoft® HIS Support
"Peter Lee" <Pete...@discussions.microsoft.com> wrote in message
news:4096D1E4-B819-45C3...@microsoft.com...
Neil