Status: New
Owner: ----
Labels: Type-Defect Priority-Medium Difficulty-Medium
New issue 113 by d.j.hun...@gmail.com: read_file() returns incomplete
dataset for DICOM file with nested private sequences
http://code.google.com/p/pydicom/issues/detail?id=113
I have a DICOM file that contains a couple of private sequences of
undefined length, which themselves contain undefined length sequences
nested within them. The transfer syntax is implicit VR. When I attempt to
read this file, many of the data elements (including the pixel data) are
missing from the returned dataset.
Looking at the code, the problem appears to originate when
data_element_generator() reaches a private sequence whose VR is unknown.
Under this condition, the sequence is treated as binary data of undefined
length and read_undefined_length_value() is called, which parses the file
until a sequence delimiter tag is reached. However, in the case of nested
sequences, the next sequence delimiter to be reached corresponds to the end
of the first nested sequence rather than that of the parent sequence.
As such, the parent sequence is only partially read, and the rest of the
sequence is parsed as if it were part of the top-level dataset. When the
parent’s actual sequence delimiter is reached, it is detected by
read_dataset(), ultimately causing read_file() to terminate early.
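To illustrate the failure mode, here is a minimal, self-contained sketch that does not use pydicom at all. It hand-builds the value of an undefined-length private sequence containing one nested undefined-length sequence (implicit VR, little endian) and then scans for the first sequence delimiter, which is roughly what treating the value as binary data amounts to. The private tags (0009,1002) and (0009,1003) and the helper names are made up for illustration.

```python
from struct import pack

UNDEFINED = 0xFFFFFFFF  # "undefined length" marker


def element(group, elem, length, value=b""):
    """Encode one implicit VR little endian element: tag, 4-byte length, value."""
    return pack("<HHL", group, elem, length) + value


ITEM = (0xFFFE, 0xE000)                   # Item tag
ITEM_DELIM = element(0xFFFE, 0xE00D, 0)   # Item Delimitation Item
SEQ_DELIM = element(0xFFFE, 0xE0DD, 0)    # Sequence Delimitation Item

# Nested private sequence (hypothetical tag) with a single item.
inner = (element(0x0009, 0x1002, UNDEFINED)
         + element(*ITEM, UNDEFINED)
         + element(0x0009, 0x1003, 2, b"AB")
         + ITEM_DELIM
         + SEQ_DELIM)

# Value of the parent sequence: one item holding the nested sequence.
outer_value = element(*ITEM, UNDEFINED) + inner + ITEM_DELIM + SEQ_DELIM


def naive_value_length(value):
    """Offset of the first sequence delimiter tag, ignoring nesting."""
    return value.find(pack("<HH", 0xFFFE, 0xE0DD))


naive_end = naive_value_length(outer_value)
true_end = len(outer_value) - len(SEQ_DELIM)

# The naive scan stops at the nested sequence's delimiter, well before
# the delimiter that actually closes the parent sequence.
print(naive_end, true_end)
```

In this sketch the naive scan ends 16 bytes early, so the parent's own item delimiter and sequence delimiter are left in the stream to be parsed as if they belonged to the enclosing dataset.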
As a workaround, I’ve modified data_element_generator() to check all data
elements of undefined length to see if they are sequences (based on the
assumption that the first four bytes of an SQ data element value will
always be an Item Tag):
--- a/src/oxmorf/dicom/filereader.py
+++ b/src/oxmorf/dicom/filereader.py
@@ -247,7 +247,13 @@
VR = dictionaryVR(tag)
except KeyError:
pass
- if VR == 'SQ':
+
+ bytes = fp_read(4)
+ fp.seek(fp_tell()-4)
+ possible_group, possible_elem = unpack(endian_chr+"HH", bytes)
+ possible_item_tag = TupleTag((possible_group, possible_elem))
+
+ if (VR == 'SQ') or (possible_item_tag == ItemTag):
if debugging:
logger_debug("%04x: Reading and parsing undefined length sequence"
% fp_tell())
The hope is that this should prevent any sequences from being read as binary
data (it seems to work OK so far, although I've not properly tested it).
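For anyone who wants to try the idea outside of filereader.py, the peek-ahead check can be sketched as a standalone function. This is only an illustration of the approach in the patch, not pydicom code: the function name value_starts_with_item_tag and its is_little_endian parameter are hypothetical, and the short-read guard is an addition that the patch above does not have.

```python
from io import BytesIO
from struct import unpack

ITEM_TAG = (0xFFFE, 0xE000)  # (group, element) of the DICOM Item tag


def value_starts_with_item_tag(fp, is_little_endian=True):
    """Peek at the next four bytes of fp and report whether they are an Item tag.

    The file position is restored before returning, so the caller can
    go on to parse the value as a sequence or as raw bytes.
    """
    start = fp.tell()
    head = fp.read(4)
    fp.seek(start)
    if len(head) < 4:
        # Too close to the end of the stream to hold a full tag.
        return False
    fmt = ("<" if is_little_endian else ">") + "HH"
    return unpack(fmt, head) == ITEM_TAG


# Usage: an undefined-length item header, little endian.
stream = BytesIO(b"\xfe\xff\x00\xe0\xff\xff\xff\xff")
print(value_starts_with_item_tag(stream), stream.tell())
```

Saving the position with tell() before reading avoids the seek arithmetic in the patch and keeps the check safe when fewer than four bytes remain.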
If needed, I should shortly be able to provide the DICOM file in question.
Many thanks,
David