Associations systematically rejected in the DICOM receiver


pierre....@serphydose.com

Jan 18, 2013, 6:33:25 AM
to dcm...@googlegroups.com
Hi all,

We are using the DCM4CHE toolkit (version 2.0.25) to receive image instances (dcmrcv tool).
We are not sure what actually happened, but after several days the receiver started rejecting all associations, with the following reason:
INFO: 2013-01-15 10:15:35 - Association - Association(2455) accepted Socket[addr=/172.16.11.2,port=59358,localport=6667]
INFO: 2013-01-15 10:15:35 - Association - Association(2455): A-ASSOCIATE-RQ CANON_CCR >> DW_RAW_SCP
INFO: 2013-01-15 10:15:35 - PDUEncoder - CANON_CCR(2455) << A-ASSOCIATE-RJ[result=2, source=2, reason=1]: transient no-reason-given
INFO: 2013-01-15 10:15:35 - Association - CANON_CCR(2455): close Socket[addr=/172.16.11.2,port=59358,localport=6667]

In the logs there was a warning when this issue started:
WARN: 2013-01-15 10:12:25 - Association - i/o exception in State Sta1

I don't know if it will be useful, but the only thing we saw in the Java logs was the following error:
2013-01-15 10:12:25 - SEVERE: java.net.SocketException: Connection reset
     at java.net.SocketInputStream.read(SocketInputStream.java:168)
     at org.dcm4che2.net.PDUDecoder.readFully(PDUDecoder.java:101)
     at org.dcm4che2.net.PDUDecoder.nextPDU(PDUDecoder.java:220)
     at org.dcm4che2.net.PDUDecoder.nextPDV(PDUDecoder.java:584)
     at org.dcm4che2.net.PDUDecoder.isEOF(PDUDecoder.java:632)
     at org.dcm4che2.net.PDUDecoder.copyTo(PDUDecoder.java:721)

After restarting the receiver, everything was working correctly again.

We tried to understand what kind of event could cause the receiver to reject all associations with this reason, but we could not see the logic.
What could be the root cause of this issue?

Thank you!


Mark Messer

Jan 18, 2013, 5:02:09 PM
to dcm...@googlegroups.com
The problem appears to be an inability to recover from a bad TCP/IP read. This is a low level I/O bug in dcmrcv.


DICOM networking is modeled on the OSI seven layer model and runs on top of TCP/IP. The lower levels of DICOM are described in part 8 of the DICOM standard. The current version is here: http://medical.nema.org/Dicom/2011/11_08pu.pdf. Reading it isn't always easy, but it can help you decipher the logs. Start with
    Figure 1-1, ISO OSI BASIC REFERENCE MODEL
    3.7 DICOM COMMUNICATION SUPPORT DEFINITIONS
    Figure 6.1, DICOM NETWORK PROTOCOL ARCHITECTURE.


INFO: 2013-01-15 10:15:35 - Association - Association(2455) accepted Socket[addr=/172.16.11.2,port=59358,localport=6667]
INFO: 2013-01-15 10:15:35 - Association - Association(2455): A-ASSOCIATE-RQ CANON_CCR >> DW_RAW_SCP
INFO: 2013-01-15 10:15:35 - PDUEncoder - CANON_CCR(2455) << A-ASSOCIATE-RJ[result=2, source=2, reason=1]: transient no-reason-given
INFO: 2013-01-15 10:15:35 - Association - CANON_CCR(2455): close Socket[addr=/172.16.11.2,port=59358,localport=6667]

The take-away is that TCP/IP is working, but that dcmrcv rejected the DICOM association before even beginning negotiation. The reason is an unknown transient problem.

In more detail, these lines mean
    A TCP/IP connection was successfully established.
    CANON_CCR requested a DICOM association from DW_RAW_SCP (dcmrcv).
    CANON_CCR received a response that dcmrcv rejected the DICOM association attempt.
    CANON_CCR closed the TCP/IP connection.

A-ASSOCIATE-RJ means the association was rejected before it opened.
Services related to establishing a DICOM association are listed in PS 3.8-2009, 7 OSI upper layer service for DICOM application entities, Table 7-1 UPPER LAYER SERVICES
    A-ASSOCIATE creates a new association.
    A-RELEASE, A-ABORT, or A-P-ABORT terminates an existing association. These were never used.
    P-DATA carries the DICOM messages (commands and data sets) once an association has been established.

The standard describes the parameters of the A-ASSOCIATE message starting at PS 3.8-2011, 7.1.1.7 Result, p16.  

The source identifies the creator of the result and reason. source=2 means UL service-provider (ACSE related function), or the Upper Layer of dcmrcv (Association Control Service Element related function).

The result is rejected (transient), not rejected (permanent). rejected (transient) means more or less "Try again later." rejected (permanent) would be appropriate for an error like an unknown AE Title, where another attempt would not be expected to succeed.

The diagnostic (or reason) is no-reason-given. This just means that the reason is not one of the other reasons listed in PS 3.8-2011, 7.1.1.9 Diagnostic.
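
To make the three numbers in the log line easier to read, here is a minimal Python sketch that decodes them. It assumes the log format shown above, and the code tables are my transcription of the A-ASSOCIATE-RJ PDU fields in PS 3.8 (section 9.3.4), so check them against the standard before relying on them. The function name decode_rejection is made up for this example and is not part of dcm4che.

```python
import re

# Code tables transcribed from the A-ASSOCIATE-RJ PDU description in PS 3.8;
# reserved values are left out and reported as "reserved or unknown".
RESULTS = {1: "rejected-permanent", 2: "rejected-transient"}
SOURCES = {
    1: "UL service-user",
    2: "UL service-provider (ACSE related function)",
    3: "UL service-provider (Presentation related function)",
}
# The meaning of the reason code depends on the source.
REASONS = {
    1: {
        1: "no-reason-given",
        2: "application-context-name-not-supported",
        3: "calling-AE-title-not-recognized",
        7: "called-AE-title-not-recognized",
    },
    2: {1: "no-reason-given", 2: "protocol-version-not-supported"},
    3: {1: "temporary-congestion", 2: "local-limit-exceeded"},
}

# Matches the bracketed part of the dcmrcv log line quoted in this thread.
RJ_PATTERN = re.compile(
    r"A-ASSOCIATE-RJ\[result=(\d+), source=(\d+), reason=(\d+)\]"
)


def decode_rejection(log_line):
    """Return (result, source, reason) as text, or None if the line has no RJ."""
    match = RJ_PATTERN.search(log_line)
    if match is None:
        return None
    result, source, reason = (int(group) for group in match.groups())
    return (
        RESULTS.get(result, "unknown"),
        SOURCES.get(source, "unknown"),
        REASONS.get(source, {}).get(reason, "reserved or unknown"),
    )


line = (
    "INFO: 2013-01-15 10:15:35 - PDUEncoder - CANON_CCR(2455) << "
    "A-ASSOCIATE-RJ[result=2, source=2, reason=1]: transient no-reason-given"
)
print(decode_rejection(line))
```

For the line from your log this gives rejected-transient, from the UL service-provider (ACSE related function), with no-reason-given, which matches the text dcmrcv printed after the colon.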



The Java log is useful.

A PDU is a Protocol Data Unit, or low level DICOM data packet.
A PDV is a Presentation Data Value. A PDV carries a presentation context ID and a fragment of a DICOM command or data set. PDVs are sent inside P-DATA PDUs to transfer data over an established DICOM association.

So it appears that the trouble started while dcmrcv was reading data from an association, since the stack trace goes through nextPDV and copyTo. dcmrcv tried to read the next PDU from the socket. This failed because of an I/O error: the connection was reset.
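
To show what reading a PDU involves, here is a rough Python sketch under my own assumptions. It is not the dcm4che code: read_fully and next_pdu are hypothetical names that only loosely mirror the methods in the stack trace. Every PDU starts with a 6 byte header (1 byte type, 1 reserved byte, 4 byte big-endian length), and the reader has to keep reading until it has all the bytes it was promised. If the connection drops in the middle, that read fails.

```python
import io
import struct

# PDU type codes from PS 3.8, section 9.3.
PDU_TYPES = {
    0x01: "A-ASSOCIATE-RQ",
    0x02: "A-ASSOCIATE-AC",
    0x03: "A-ASSOCIATE-RJ",
    0x04: "P-DATA-TF",
    0x05: "A-RELEASE-RQ",
    0x06: "A-RELEASE-RP",
    0x07: "A-ABORT",
}


def read_fully(stream, count):
    """Read exactly `count` bytes or raise EOFError if the stream ends early."""
    data = b""
    while len(data) < count:
        chunk = stream.read(count - len(data))
        if not chunk:
            raise EOFError("stream ended after %d of %d bytes" % (len(data), count))
        data += chunk
    return data


def next_pdu(stream):
    """Read one PDU and return (type name, body bytes)."""
    header = read_fully(stream, 6)
    # B = PDU type, B = reserved, I = body length (big-endian, unsigned 32 bit)
    pdu_type, _reserved, length = struct.unpack(">BBI", header)
    body = read_fully(stream, length)
    return PDU_TYPES.get(pdu_type, "unknown"), body


# An A-ASSOCIATE-RJ PDU like the one in the log:
# body = reserved byte, result=2, source=2, reason=1.
rejection = bytes([0x03, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x02, 0x02, 0x01])
print(next_pdu(io.BytesIO(rejection)))

# A PDU cut off mid-body, roughly what a reader sees when the peer disappears.
try:
    next_pdu(io.BytesIO(rejection[:8]))
except EOFError as error:
    print("read failed:", error)
```

With a real socket the failure would show up as an exception from the read call (in Java, the SocketException in your log) rather than a short read, but the position in the protocol is the same: the decoder was waiting for bytes that never arrived.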



The socket itself recovered. When next opened, it transmitted data normally. But the PDUDecoder in dcmrcv seems to have been left in an abnormal state. It could not recover and process any more DICOM data packets.

This type of intermittent bug can be hard to find and test. Let us know if it happens again.

pierre....@serphydose.com

Feb 5, 2013, 10:51:24 AM
to dcm...@googlegroups.com
Hi,

Thank you very much for your answer!
It was very helpful in guiding our investigation into the root cause of the issue.

The issue happened again, so we monitored the sockets opened on the server to check the TCP connections.
It seems that we may have an issue with the dimseRspTimeout of dcmrcv, which was set to 0. In that case, when a TCP connection is broken (this can happen often when a modality is connected over Wi-Fi), the TCP connection between the client and the server is not properly closed. After a while the system is not able to create any new TCP connection, and we get the "transient no-reason-given" error message.
Is that possible? Does dcmrcv behave this way if too many TCP connections are created?

We have set this timeout to 10000, do you think this could help? Is there another timeout we should verify?
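
One way to check this hypothesis from the dcmrcv log itself is to list the associations that were accepted but never closed. Below is a minimal Python sketch that assumes the log format quoted earlier in this thread; the function name unclosed_associations and the extra sample line with id 2456 are invented for illustration.

```python
import re

# "Association(2455) accepted Socket[...]" marks a new association.
ACCEPTED = re.compile(r"Association\((\d+)\) accepted Socket")
# "CANON_CCR(2455): close Socket[...]" marks the end of that association.
CLOSED = re.compile(r"\((\d+)\): close Socket")


def unclosed_associations(log_lines):
    """Return the sorted ids of associations that were accepted but never closed."""
    open_ids = set()
    for line in log_lines:
        match = ACCEPTED.search(line)
        if match:
            open_ids.add(int(match.group(1)))
            continue
        match = CLOSED.search(line)
        if match:
            open_ids.discard(int(match.group(1)))
    return sorted(open_ids)


sample_log = [
    "INFO: 2013-01-15 10:15:35 - Association - Association(2455) accepted Socket[addr=/172.16.11.2,port=59358,localport=6667]",
    "INFO: 2013-01-15 10:15:35 - Association - Association(2455): A-ASSOCIATE-RQ CANON_CCR >> DW_RAW_SCP",
    "INFO: 2013-01-15 10:15:35 - Association - CANON_CCR(2455): close Socket[addr=/172.16.11.2,port=59358,localport=6667]",
    # Hypothetical line: an association that is accepted and never closed.
    "INFO: 2013-01-15 10:15:36 - Association - Association(2456) accepted Socket[addr=/172.16.11.2,port=59359,localport=6667]",
]
print(unclosed_associations(sample_log))
```

If the list keeps growing over several days, that would support the idea that broken connections are never cleaned up.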

Thanks again for this very complete and helpful answer!

Pierre

Docjay

Mar 28, 2013, 12:04:26 PM
to dcm...@googlegroups.com
Hi,

This has been happening to me lately using DCMSND from the dcm4che-2.0.27 toolkit. I have been sending images to our dcm4chee archive v2.17.2. It just happens randomly, and I have set the option '-reaper 30000' in the hope that this goes away. I have many folders and subfolders to go through, with millions of images to send.

-thanks