At the breast tomosynthesis forum at the SIIM meeting last June, there was a discussion comparing the very large official BTO objects (~ 500 Meg) with the very much smaller Hologic secondary capture objects (SCOs ~30 Meg), which contain the pixel data in private data elements. As a result, I had a look to see if I could determine the encoding of those objects by comparing them with the decompressed versions (see
http://www.dclunie.com/pixelmedimagearchive/upmcdigitalmammotomocollection/index.html for examples).
I rapidly concluded:
1) There are no JPEG markers or other clues to the format
2) There are features of both Huffman and run-length encoding
3) Both the Huffman and run-length encoding show adaptive behaviour
4) The intervals between successive decoded pixel values were mostly multiples of 9 (with a few 8s and 10s, so not all final pixel values were multiples of 9)
This looked too complex to decipher, so I gave up :-(
However, last month I wrote my own JPEG-LS codec, and recognised the above pattern in images compressed using the lossy ("near lossless") mode, with a maximum permitted error (NEAR) of 4 (2 x NEAR + 1 = 9). I therefore revisited the Hologic examples, and it was immediately apparent that they are actually encoded using JPEG-LS, but without any of the expected JPEG markers. It was only a small amount of work to identify the other features required for full pixel data decoding, so I include below a recipe for re-writing such images into a form in which the pixel data for all frames may be accessed by any DICOM application which supports JPEG-LS without needing to expand to fully decompressed form. I have not attempted to re-create all the non-pixel data required to make valid BTO objects - I'll leave that to others!
Basic description of the private data
a) There are 2 versions of the set of frames - they appear to me to be essentially the same images, but with one of them at approximately half the resolution of the other.
b) The data is encoded using JPEG-LS in near-lossless mode, with a maximum error of 4. Due to the way that JPEG-LS works, this value must be correct, as any attempt to use a different value produces "garbage" from the decoder.
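To illustrate why a NEAR of 4 shows up as steps of 9, here is a simplified Python sketch of the prediction-error quantisation used in JPEG-LS near-lossless mode. It deliberately leaves out context modelling, bias correction, clamping and the entropy coder, and the function names are invented for illustration, so treat it as a sketch of the arithmetic rather than as a codec.

```python
NEAR = 4
STEP = 2 * NEAR + 1  # 9 for NEAR = 4

def quantize_error(err, near=NEAR):
    """Quantise a prediction error to the nearest multiple of 2*near+1."""
    if err > 0:
        return (near + err) // (2 * near + 1)
    return -((near - err) // (2 * near + 1))

def reconstruct(prediction, err, near=NEAR):
    """Value the decoder would rebuild from the quantised error."""
    return prediction + quantize_error(err, near) * (2 * near + 1)

# Every rebuilt value is the prediction plus a multiple of 9,
# and is never more than NEAR away from the true value.
prediction = 100
for actual in (100, 103, 105, 112, 87):
    rebuilt = reconstruct(prediction, actual - prediction)
    print(actual, rebuilt, abs(actual - rebuilt))
```

Because the decoder has to apply the same step size when it rebuilds each pixel, a wrong NEAR makes every later prediction drift, which fits the "garbage" seen with any other value.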
Decoding procedure:
1) Make a new SOP instance to hold the pixel data - and copy any attributes you wish to preserve from the original (not including the private data)
2) Decide whether you wish to access the full resolution or lower resolution version of the data, and (from the original SCO), select the appropriate private data sequence - 7E01,1010 for full resolution or 7E01,1011 for lower resolution
3) Within the chosen sequence, there will be a collection of DICOM datasets, each of which will contain element 7E01,1012 - which is of type OB - these should be concatenated in order into an array of byte values (called hereafter, "data").
4) Locate the following values in the data, and set into appropriate DICOM attributes in the new image:
data[20-21] : Frame Count => 0028,0008
data[24-25] : Columns => 0028,0011
data[28-29] : Rows => 0028,0010
data[32] : Bits Stored => 0028,0101
Set corresponding values in 0028,0100 (Bits Allocated = 16) and 0028,0102 (High Bit = Bits Stored - 1), as well as 0028,0004 = MONOCHROME2
5) Construct a matching JPEG-LS header, using these and other values:
0xFF, 0xD8, // Start of image (SOI) marker
0xFF, 0xF7, // Start of JPEG-LS frame (SOF55) marker – marker segment follows
0x00, 0x0B, // Length of marker segment = 11 bytes including the length field
0x08, // P = Precision [Set from data[32] ]
0x00, 0x00, // Y = Number of lines [Set from data[28-29] - reversed to make big endian]
0x00, 0x00, // X = Number of columns [Set from data[24-25] - reversed to make big endian]
0x01, // Nf = Number of components in the frame = 1
0x01, // C1 = Component ID = 1 (first and only component)
0x11, // Sub-sampling: H1 = 1, V1 = 1
0x00, // Tq1 = 0 (this field is always 0)
0xFF, 0xDA, // Start of scan (SOS) marker
0x00, 0x08, // Length of marker segment = 8 bytes including the length field
0x01, // Ns = Number of components for this scan = 1
0x01, // Ci = Component ID = 1
0x00, // Tm1 = Mapping table index = 0 (no mapping table)
0x00, // NEAR [Set from data[36] ]
0x00, // ILV = 0 (interleave mode = non-interleaved)
0x00 // Al = 0, Ah = 0 (no point transform)
6) Locate the index to the frame positions in the encoded data. This starts 1024 bytes before the end of "data" and consists of a series of 4 byte little-endian values, giving the start of each frame relative to the start of data. It includes an extra value at the end to allow the calculation of the length of the last frame.
7) Reconstruct each frame as JPEG-LS data, by extracting the encoded data from data[index[frame]] to data[index[frame+1]-1], then prepending the JPEG-LS header from step 5, and appending an EOI marker (FF, D9). If the frame length is odd, append an additional 00 byte. Each frame can now be packed as a new fragment into the new SOP instance (remembering to include a basic offset table as the first fragment).
8) Add an 0002 group meta-header, indicating that the transfer syntax is 1.2.840.10008.1.2.4.81 (JPEG-LS in near-lossless mode)
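Steps 4 to 7 can be sketched in Python roughly as follows. This assumes "data" has already been assembled as in step 3, that the 16-bit fields are little-endian (as the byte reversal in step 5 implies), and that the padding byte is applied to the complete fragment so that its length is even. All function names are made up for illustration, and writing the result into an actual DICOM file is left to whatever toolkit you use.

```python
import struct

def parse_fields(data):
    """Step 4: read the empirically located fields (little-endian)."""
    frames, = struct.unpack_from("<H", data, 20)
    columns, = struct.unpack_from("<H", data, 24)
    rows, = struct.unpack_from("<H", data, 28)
    return {"frames": frames, "columns": columns, "rows": rows,
            "bits_stored": data[32], "near": data[36]}

def jpegls_header(f):
    """Step 5: SOI + SOF55 + SOS for a single-component frame."""
    soi = b"\xff\xd8"
    sof = struct.pack(">BBHBHHBBBB", 0xFF, 0xF7, 11, f["bits_stored"],
                      f["rows"], f["columns"], 1, 1, 0x11, 0)
    sos = bytes([0xFF, 0xDA, 0x00, 0x08, 0x01, 0x01, 0x00,
                 f["near"], 0x00, 0x00])
    return soi + sof + sos

def frame_index(data, frames):
    """Step 6: frames + 1 32-bit offsets, starting 1024 bytes from the end."""
    return struct.unpack_from("<%dI" % (frames + 1), data, len(data) - 1024)

def jpegls_frames(data):
    """Step 7: wrap each frame's raw data as a complete JPEG-LS stream."""
    f = parse_fields(data)
    header = jpegls_header(f)
    index = frame_index(data, f["frames"])
    out = []
    for i in range(f["frames"]):
        frame = header + data[index[i]:index[i + 1]] + b"\xff\xd9"
        if len(frame) % 2:          # assumption: pad the whole fragment
            frame += b"\x00"
        out.append(frame)
    return f, out

def item(payload):
    """One encapsulated item: tag FFFE,E000, 32-bit length, payload."""
    return struct.pack("<HHI", 0xFFFE, 0xE000, len(payload)) + payload

def encapsulate(frames):
    """Basic offset table, one fragment per frame, sequence delimiter."""
    offsets, pos = [], 0
    for frame in frames:
        offsets.append(pos)         # offset of this frame's item tag
        pos += 8 + len(frame)
    table = struct.pack("<%dI" % len(offsets), *offsets)
    delimiter = struct.pack("<HHI", 0xFFFE, 0xE0DD, 0)
    return item(table) + b"".join(item(fr) for fr in frames) + delimiter

# Typical use: fields, frames = jpegls_frames(data)
#              pixel_data = encapsulate(frames)
```

The returned fields supply the values for 0028,0008, 0028,0010, 0028,0011 and 0028,0101, and the output of encapsulate() is intended as the value of an undefined-length Pixel Data (7FE0,0010) element.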
The positions of the important values in data are empirical, but the numbers of frames, width and height are well-established, as they vary between images, and are consistent. The values for precision (10) and NEAR (4) are less certain, as all images I have seen contain the same values, but their positions seem logical, so they are probably correct.
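Since these positions are empirical, it may be worth checking that the values look plausible before trusting them. The Python sketch below is one way to do that. The limits are assumptions rather than anything taken from Hologic: at most 255 frames (so that the index fits in the final 1024 bytes), a precision of 2 to 16 bits, NEAR no larger than the JPEG-LS limit of min(255, MAXVAL/2), and frame offsets that increase and stop before the index. The function name is hypothetical.

```python
import struct

INDEX_SIZE = 1024  # the frame index starts this many bytes before the end

def check_private_data(data, max_frames=INDEX_SIZE // 4 - 1):
    """Return a list of problems; an empty list means the layout looks plausible."""
    problems = []
    if len(data) < INDEX_SIZE + 40:
        return ["data is too short to hold the fields and the frame index"]
    frames, = struct.unpack_from("<H", data, 20)
    bits, near = data[32], data[36]
    if not 2 <= bits <= 16:
        problems.append("implausible precision %d" % bits)
    elif near > min(255, ((1 << bits) - 1) // 2):
        problems.append("NEAR %d is too large for %d-bit data" % (near, bits))
    if not 1 <= frames <= max_frames:
        problems.append("implausible frame count %d" % frames)
        return problems  # cannot read the index without a sane frame count
    start = len(data) - INDEX_SIZE
    index = struct.unpack_from("<%dI" % (frames + 1), data, start)
    if any(a >= b for a, b in zip(index, index[1:])):
        problems.append("frame offsets are not strictly increasing")
    if index[-1] > start:
        problems.append("last frame runs into the index")
    return problems
```

An empty result only means the layout matches the pattern described here, not that the pixel data will necessarily decode correctly.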
Interestingly, the images generated this way are even smaller than the Hologic originals, as they only contain one of the 2 resolution versions - the high resolution one works out at about 75% of the original size, and the lower one at 25% of the original.
The above recipe has worked for all Hologic SCO objects I have tried it on so far, and gives pixel values identical to the decompressed versions available on the site from which they were obtained, but being entirely empirical, it is quite possible that there may be subtleties which have escaped me, so if you find any images for which it does not work (or if you need any guidance on how to use the above notes), please do get in touch.
Dave Harvey