rejected files for DicomStorageService


George Kowalski

Mar 3, 2021, 11:40:22 AM
to RSNA MIRC CTP/TFS User Group
Hello All, 

I'm setting up a very simple pipeline: read in files from one directory, anonymise them, and output them to a final directory.

Here is the config.xml:

<Configuration>
    <Server
        maxThreads="20"
        port="8282"/>
    <Pipeline name="DICOM De-identification to Local Storage">
        <DirectoryImportService
            class="org.rsna.ctp.stdstages.DirectoryImportService"
            import="roots/DirectoryImportService/import"
            interval="100"
            name="DirectoryImportService"
            quarantine="quarantines/DirectoryImportService"
            root="roots/DirectoryImportService"/>
        <IDMap
            class="org.rsna.ctp.stdstages.IDMap"
            name="IDMap"
            root="roots/IDMap"/>
        <DicomAnonymizer
            class="org.rsna.ctp.stdstages.DicomAnonymizer"
            lookupTable="scripts/LookupTable.properties"
            name="DicomAnonymizer"
            quarantine="quarantines/DicomAnonymizer"
            root="roots/DicomAnonymizer"
            script="scripts/DicomAnonymizer.script"/>
        <DirectoryStorageService
            acceptDuplicates="yes"
            acceptFileObjects="no"
            acceptXmlObjects="no"
            acceptZipObjects="no"
            class="org.rsna.ctp.stdstages.DirectoryStorageService"
            defaultString=""
            logDuplicates="yes"
            name="DirectoryStorageService"
            quarantine="quarantines/DirectoryStorageService"
            root="roots/DirectoryStorageService"
            setStandardExtensions="yes"
            structure="{PatientID}/{AccessionNumber}"
            whitespaceReplacement="_"/>
    </Pipeline>
</Configuration>

It does process some of the incoming files, but most are not processed. I know this because if I select only the acceptDicomObjects setting in the CTP Launcher, most of the files just go away and don't enter the quarantines/DirectoryStorageService dir. What is used to determine whether a file is DICOM or not? If I enable the XML Files and ZIP options, the files come through (though they are not stored according to the "structure" attribute), so I know they are being treated as "bad DICOMs".

I then pull these into Horos and view them, and they are all good DICOM files.

I normally run this on Windows, but this is the first time on an x86 Mac with the 64-bit Oracle JDK:

/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home   

I assumed this would not be an issue, as I'm not planning to decompress the images at any point.

Thanks

G
        

John Perry

Mar 3, 2021, 7:26:10 PM
to rsnas-ctpmir...@googlegroups.com
George:
 
The DirectoryImportService adds all the files in its import directory into its queue as they appear.
 
When the Pipeline is ready to process the next file, it asks the import service for a file, and the DirectoryImportService supplies it, if one is available. To be more precise, the import service gets a file from its queue and parses it. It first tries to parse it as a DicomObject. If that fails, it tries to parse it as an XmlObject. If that fails, it tries to parse it as a ZipObject. If all else fails, it instantiates it as a FileObject, which is the parent class of DicomObject, XmlObject, and ZipObject. Once it has instantiated an object for the file, it returns that object to the Pipeline for processing.
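
The fallback order described above can be sketched in Python. This is only an illustration, not CTP's code: the function name classify and the individual checks (the "DICM" marker at offset 128, a well-formedness parse for XML, the standard zipfile signature test) are assumptions standing in for CTP's real parsers, which may be stricter or more lenient.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def classify(data: bytes) -> str:
    """Return the first object type whose check succeeds (hypothetical helper)."""
    # 1. DICOM: assume a part 10 file, identified by "DICM" at offset 128.
    if len(data) >= 132 and data[128:132] == b"DICM":
        return "DicomObject"
    # 2. XML: accept anything that parses as a well-formed document.
    try:
        ET.fromstring(data)
        return "XmlObject"
    except ET.ParseError:
        pass
    # 3. Zip: rely on the standard library's signature check.
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "ZipObject"
    # 4. Nothing matched: fall back to the generic parent type.
    return "FileObject"
```

Under these assumptions, a file that fails the first check is not rejected outright; it simply ends up as one of the later types, which is why the accept... attributes of downstream stages decide what happens to it.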
 
The object that flows down the pipeline is the object the Pipeline got from the import service.
 
Certain pipeline stages only accept certain object types. They pass other object types on to the next stage without doing anything to or with them. An example is the DicomAnonymizer stage.
 
If a stage can handle multiple object types, it provides accept... attributes to allow you to skip objects of certain types. The default in these kinds of stages is to accept all object types. Any objects skipped are just passed on to the next stage.
 
Many stages also support filter scripts that allow you to decide whether to process an object based on the contents of the object itself. If no script is supplied, the default script accepts the object. Again, any objects skipped are just passed on to the next stage.
 
If files are being quarantined in your pipeline, it might be the DicomAnonymizer that is doing it, although if you are using the default script, I don't see how, because the default script doesn't call the @quarantine() function.

We can see which stage is quarantining the object by putting ObjectLogger stages before and after the DicomAnonymizer stage.
 
The DirectoryStorageService might be quarantining an object if it doesn't contain an AccessionNumber element. In that case, it would try to use the empty string for the PatientID/AccessionNumber directory path, and maybe that will cause the system to fail to store the object. Look at one of the files in the quarantine and see if this might be the case. The QuarantineServlet has nice features that show you this kind of information. If this is the problem, you can change the defaultString attribute to "null" and see if that fixes it.
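
To illustrate how an empty default could produce an unusable path, here is a small Python sketch of substituting element values into the structure attribute. The function resolve_structure and its handling of missing values are hypothetical; CTP's actual implementation may behave differently.

```python
import re


def resolve_structure(structure, elements, default_string="", whitespace_replacement="_"):
    """Fill {ElementName} placeholders from a dict of element values (hypothetical helper)."""

    def substitute(match):
        value = elements.get(match.group(1), "").strip()
        if not value:
            # A missing or empty element falls back to the default string.
            value = default_string
        return re.sub(r"\s+", whitespace_replacement, value)

    return re.sub(r"\{(\w+)\}", substitute, structure)


# An object with no AccessionNumber, using the two defaults discussed above.
elements = {"PatientID": "PAT 1"}
print(resolve_structure("{PatientID}/{AccessionNumber}", elements))
print(resolve_structure("{PatientID}/{AccessionNumber}", elements, default_string="null"))
```

In this sketch the empty default leaves a path ending in an empty segment, while "null" gives every object a real directory name.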
 
I hope this helps... JP
--
You received this message because you are subscribed to the Google Groups "RSNA MIRC CTP/TFS User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rsnas-ctpmirc-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rsnas-ctpmirc-user-group/23e46587-0eff-41e4-a4d9-a36f042fbcebn%40googlegroups.com.

George Kowalski

Mar 4, 2021, 10:22:26 AM
to RSNA MIRC CTP/TFS User Group
Well, it got me closer. Using the QuarantineServlet and setting the DirectoryImportService to accept nothing other than DICOM objects put almost everything into the ~/DirectoryImportService folder.

I then looked at these files using the command-line tool dcm2xml, grepping for PatientID, AccessionNumber, SeriesInstanceUID, and StudyInstanceUID. They all have them. Interestingly, the only issue I see when doing this is the following set of warnings on each of these files:


W: DcmMetaInfo: Invalid Element (0008,0005) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0008) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0016) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0018) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0020) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0021) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0022) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0023) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0030) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0031) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0032) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0033) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0050) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0060) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0070) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0080) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,0090) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,1010) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,1030) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,103e) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,1060) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,1090) found in Meta Information Header
W: DcmMetaInfo: Invalid Element (0008,1140) found in Meta Information Header

I looked for those elements because the QuarantineServlet showed nothing:

quarantine.png

John Perry

Mar 4, 2021, 1:19:50 PM
to rsnas-ctpmir...@googlegroups.com
George:
 
Interesting. The files are being quarantined by the DirectoryImportService.
 
Can you send me one of the quarantined files?
 
JP

George Kowalski

Mar 5, 2021, 8:54:51 AM
to RSNA MIRC CTP/TFS User Group
The images I used were 10-year-old MRI images. If I use the most recent set from 2019, they process as planned. I'm looking into whether I can release an image after de-identification. There has to be something in these old images.

John Perry

Mar 5, 2021, 10:43:52 AM
to rsnas-ctpmir...@googlegroups.com
George:

The messages you saw make me think that the quarantined objects are not DICOM part 10 files.
 
There is an easy way to see what's going on.
 
Download this program from the RSNA MIRC site:
 
 
Put it in a folder somewhere and run it. It will display a blank window. Drag one of the quarantined files onto the window and drop it. The program will display the binary of the file.
 
BinaryDump knows the formats of a number of file types (DICOM, WAVE, JPEG, PNG, etc.). Its parsers are very forgiving, so if a strict DICOM parser rejects a corrupt DICOM file, BinaryDump may still be able to make some sense of it. If it detects a file as a DICOM object, it will color code the binary.
 
Here is an example of the dump of a part 10 file:
 
image
 
A part 10 file has a preamble in the first 128 bytes (normally all zeroes), followed by 4 bytes containing the ASCII characters "DICM" starting at address 80 hex. If those things are missing, the object isn't a part 10 file.
 
If the parser can make any sense out of the object, it will show what it detects as the content type (or in the case of a DICOM object, the transfer syntax) in the footer:
 
image
 
It will also provide a DICOM menu with a number of items, including List Elements. Selecting that item shows an attached frame like this:
 
image
 
If you click any line in the element listing, it jumps the binary listing to the address containing the element.
 
We should see a preamble, the DICM identifier, and then a series of elements in group 0002 (which constitute the file metadata that was referenced in the error messages you saw). The elements starting in group 0008 are not file metadata; they are the data of the object itself.
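
The same layout check can be approximated without a viewer. The Python sketch below is a rough illustration under stated assumptions: inspect_part10 is a hypothetical helper, and it reads the first element's group number as a little-endian 16-bit value, which holds for the file metadata of a part 10 file but is only a guess for a file that starts directly with the data set.

```python
import struct


def inspect_part10(data: bytes) -> dict:
    """Report the preamble, the DICM identifier, and the group of the first element."""
    has_dicm = len(data) >= 132 and data[128:132] == b"DICM"
    # The preamble is normally all zeroes, but that is a convention, not a requirement.
    preamble_all_zero = data[:128] == b"\x00" * 128
    # With the identifier present, elements start at offset 132; otherwise try offset 0.
    offset = 132 if has_dicm else 0
    first_group = None
    if len(data) >= offset + 2:
        (first_group,) = struct.unpack_from("<H", data, offset)
    return {
        "has_dicm": has_dicm,
        "preamble_all_zero": preamble_all_zero,
        "first_group": first_group,
    }


# Usage (the path is a placeholder for one of the quarantined files):
# with open("quarantined-file.dcm", "rb") as f:
#     print(inspect_part10(f.read(4096)))
```

For a part 10 file one would expect has_dicm to be true and first_group to be 0x0002. A file with no identifier whose first group reads as 0x0008 would match the older headerless layout described here.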
 
I haven't seen this lately, but in the distant past I encountered a lot of images that omitted the preamble and started with the identifier or just group 0008 itself.
 
It will be interesting to see what your quarantined objects contain.
image[1].png
image[5].png
image[7].png

George Kowalski

Mar 19, 2021, 4:05:05 PM
to RSNA MIRC CTP/TFS User Group
John,

Thanks for the input; I hadn't seen this program before. I compared the images that made it through as DICOM to those quarantined, and there is no real difference. Here is one that ended up in ~quarantines/DirectoryImportService/:

Screen Shot 2021-03-19 at 2.56.56 PM.png


And at the bottom:

Screen Shot 2021-03-19 at 2.59.18 PM.png


John Perry

Mar 19, 2021, 8:47:20 PM
to rsnas-ctpmir...@googlegroups.com
George:
 
Are the files that fail the same length as the files that succeed?
 
Without knowing the values of the elements that define the size of the stored image, I can't say for sure, but a file length of 43818 bytes sounds pretty small. For example, I would expect a 256x256 MR image to be larger than 131,072 bytes. I'm wondering if the file is truncated.
 
I'd like to see a full image file (jp3...@gmail.com), but if you can't do that, I'd at least like to see the listing of element values from (0028,0010) through (0028,0102).
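
The size estimate above can be reproduced with a few lines of Python. This is a sketch that assumes uncompressed pixel data; the function name expected_pixel_bytes is made up for illustration, and its arguments correspond to Rows (0028,0010), Columns (0028,0011), Bits Allocated (0028,0100), Samples per Pixel (0028,0002), and Number of Frames (0028,0008).

```python
def expected_pixel_bytes(rows, columns, bits_allocated, samples_per_pixel=1, frames=1):
    """Minimum size of uncompressed pixel data, in bytes."""
    bytes_per_sample = (bits_allocated + 7) // 8  # round up to whole bytes
    return rows * columns * samples_per_pixel * bytes_per_sample * frames


# A 256x256 image with 16 bits allocated per pixel, as in the example above.
needed = expected_pixel_bytes(256, 256, 16)
file_length = 43818  # length of the quarantined file mentioned above
print(needed, file_length < needed)
```

If the file's transfer syntax is a compressed one, the stored pixel data can legitimately be smaller than this figure, so the comparison only suggests truncation for uncompressed images.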

George Kowalski

Mar 23, 2021, 9:52:34 AM
to RSNA MIRC CTP/TFS User Group
Thanks, I've emailed the metadata in question for this image.


George Kowalski

Mar 23, 2021, 10:31:53 AM
to RSNA MIRC CTP/TFS User Group
To answer your other question: no, they are not all the same length. Most are 44 KB, but others go up to 5.7 MB.

Screen Shot 2021-03-23 at 9.30.00 AM.png
