save_as and int16 auto scaling

James Hawkins

Jan 28, 2013, 3:16:14 PM

to pyd...@googlegroups.com

I'm reading in dicom files, writing some disclaimer text to the image data, and trying to save a new dicom with that new data. I read in the dicom as follows:

self.ds = dcm.read_file(dicom_file)

self.data = self.ds.pixel_array

The image data has a dtype of int16, values ranging from -20 to 988, and PixelRepresentation has a value of 1. I then burn in the text using PIL, and eventually end up with a numpy array, of type int16, that I want to put into PixelData. I do that with:

self.ds.PixelData = dic_data.tostring()

self.ds.save_as(output)

I have tried three variants of dic_data: np.int16 as is; np.int16 after scaling, setting all the negative values to 0; and np.uint16 after scaling, setting all the negative values to 0 and changing PixelRepresentation to 0. In every case the saved dicom has data scaled to fill the entire range of available integers (-32768 through 32767). What should I be doing here? When I look at the dic_data array's values, everything is correct. Where is this scaling happening, and what should I do to prevent it?
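One check that may be worth running before the assignment: PixelData holds raw bytes, so the array's dtype has to match what the header declares, and an image-library round trip can change it (PIL's integer mode `I` is 32-bit, for instance). The sketch below is only an illustration in plain numpy; `to_int16_checked` is a hypothetical helper, not part of pydicom.

```python
import numpy as np

def to_int16_checked(arr):
    """Return arr as int16, refusing values that do not fit (hypothetical helper)."""
    arr = np.asarray(arr)
    info = np.iinfo(np.int16)
    if arr.min() < info.min or arr.max() > info.max:
        raise ValueError("values outside the int16 range")
    return arr.astype(np.int16)

# An int32 array holding the value range mentioned in the post.
wide = np.array([-20, 0, 988], dtype=np.int32)
narrow = to_int16_checked(wide)

print(wide.dtype, len(wide.tobytes()))      # int32 12 (4 bytes per pixel)
print(narrow.dtype, len(narrow.tobytes()))  # int16 6 (2 bytes per pixel)
```

If the byte count of the array does not equal rows x columns x 2 for a 16-bit image, a reader will pair up the wrong bytes and the result can look like rescaled data.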

Thanks for the help.

Jonathan Suever

Jan 28, 2013, 4:16:12 PM

to pyd...@googlegroups.com

James,

Just some initial thoughts on this.

First off, are you sure that your data is actually 16 bit? Look at the BitsStored field to check. In most DICOMs I've dealt with, the image is actually 12 bit (but is stored in 16 bits so that each pixel uses a whole number of bytes).

Also, I'm not sure how much it matters, but you probably want to update SmallestImagePixelValue and LargestImagePixelValue if they exist. I'm not entirely sure whether you need to touch the HighBit field, but that's another potential thing to consider.
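The fields mentioned above can be collected in one pass. This is a minimal sketch that assumes the dataset exposes DICOM keywords as attributes, as pydicom datasets do; `pixel_format_summary` is a hypothetical helper, and the stand-in object and its values are made up for illustration.

```python
from types import SimpleNamespace

PIXEL_FORMAT_FIELDS = (
    "BitsAllocated", "BitsStored", "HighBit", "PixelRepresentation",
    "SmallestImagePixelValue", "LargestImagePixelValue",
)

def pixel_format_summary(ds):
    """Collect pixel-format attributes; absent ones are reported as None."""
    return {name: getattr(ds, name, None) for name in PIXEL_FORMAT_FIELDS}

# Stand-in for a dataset so the sketch runs without a DICOM file.
fake_ds = SimpleNamespace(BitsAllocated=16, BitsStored=12, HighBit=11,
                          PixelRepresentation=1)
summary = pixel_format_summary(fake_ds)
for name, value in summary.items():
    print(name, value)
```

With a real file, the object returned by the reader would take the place of `fake_ds`.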

-Jonathan

--
You received this message because you are subscribed to the Google Groups "pydicom" group.
To post to this group, send email to pyd...@googlegroups.com.
To unsubscribe from this group, send email to pydicom+u...@googlegroups.com.
Visit this group at http://groups.google.com/group/pydicom?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

James Hawkins

Jan 28, 2013, 5:23:54 PM

to pyd...@googlegroups.com

Thanks for the reply. Largest and smallest pixel values are set to 988 and -20 (I've tried setting smallest to 0 when getting rid of the negative values). Bits stored is 16.

This also happens when I just create an array from pixel_array, and without any modifications, feed that back into PixelData.

Jonathan Suever

Jan 28, 2013, 6:05:44 PM

to pyd...@googlegroups.com

So the only thing that I can think of is that this may be an endianness issue. Currently, when we write OW VRs back to the file (or when you reassign to PixelData for that matter), we don't do any byte-swapping (see note in write_OWvalue of filewriter.py). When ds.pixel_array is called, the proper byte-swapping is performed so it would look OK in numpy upon loading it. But when you convert that numpy array back using .tostring() it doesn't do any byte-swapping (at least back to the kind that the DICOM is set to handle). You could fix this by trying data.byteswap(True) on your numpy array before reassigning to PixelData.

What is the value of ds.file_meta.TransferSyntaxUID.name?

That's the only thing I can think of that would cause the results you're getting.

Also, if this is actually the issue, I THINK that you can specify the endianness during creation of the numpy array (inside of ds.pixel_array) which would allow tostring() to return the proper value and not require byte-swapping on write.

This could be wrong but it's worth testing out.
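The idea of fixing the byte order in the dtype can be tried in plain numpy without touching a file. This is a sketch under the assumption that the pixels are signed 16-bit and the file is big endian; the sample bytes are made up to encode 988 and -20.

```python
import numpy as np

# Two int16 pixels (988 and -20) as they would be stored in a big-endian file.
big_endian_bytes = b"\x03\xdc\xff\xec"

# '>i2' means big-endian signed 16-bit; the dtype remembers the byte order.
pixels = np.frombuffer(big_endian_bytes, dtype=">i2")
print(pixels.tolist())                       # [988, -20]
print(pixels.tobytes() == big_endian_bytes)  # True, no swap needed

# Arithmetic results may come back in native order, so cast back explicitly.
edited = (pixels + 1).astype(">i2")
print(edited.tobytes() == b"\x03\xdd\xff\xed")  # True
```

The explicit `astype(">i2")` after editing is the step that is easy to forget: the values look identical either way, and only the bytes differ.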

-Jonathan

James Hawkins

Jan 28, 2013, 7:07:48 PM

to pyd...@googlegroups.com

image.ds.file_meta.TransferSyntaxUID.name is 'Explicit VR Big Endian'

When I create an array that's a byteswapped version of the data, I get an array with max and min values of 2130903040 and -2147483648, and writing that to PixelData gives a dicom with the same issues...

Darcy Mason

Jan 28, 2013, 8:24:28 PM

to pyd...@googlegroups.com

I don't have any ideas beyond what has been said ... but I've attached a script which samples some data. Please edit to change the dicom file loaded and run the script. If the pixel data sampled is uninteresting (e.g. zeros) then change i_from and i_to to sample somewhere in the middle of the image (ideally do a sample of the pixel_array in the same row and column as a numpy subarray). Let us know if that helps figure something out ... you should at least see the two PixelData samples be the same.

-Darcy

test2.py

Darcy Mason

Jan 28, 2013, 11:43:24 PM

to pyd...@googlegroups.com

On reading the thread again, I'm thinking Jonathan was right about the endian issue, and that's the key thing I was hoping to dig into with that script. James, you already said that assigning the tostring() output without manipulating the data gave the same problems, so we should see that in the bytes displayed, unless it happens during the actual writing. If you try running it on one of the images included with pydicom (all little endian except the one named BigEnd), the bytes should all match.

On the other hand, byte-swapping didn't seem to help your problem. Still, if we can break it down step by step it might help figure out some other options. Even better, would it be possible to provide an anonymized file? I'd be happy to dig into this further. Another choice might be to fake a file with known (simple) pixel patterns -- alternating bars of the same pixel value, for example.

Jonathan Suever

Jan 29, 2013, 9:28:01 AM

to pyd...@googlegroups.com

Ok well if this isn't the answer to this particular problem, then I have at least found a problem that I think needs to be addressed with regard to endianness.

I've attached a script that demonstrates the issue and describes what's happening.

Basically:

Numpy arrays created from byte strings use the native endianness by default, which we account for in pixel_array creation. The problem is that the resulting array has the endianness of the system, so if the system is LE and the data is BE (as in this case), then .tostring() and the subsequent re-assignment to PixelData produce little-endian bytes (the system ordering). When we save this to file, we don't do any byte-swapping on write, so it gets written as LE. When we re-read the file, the transfer syntax specifies Big Endian, so pixel_array decodes the data as big endian, which is obviously incorrect.
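The mismatch described above can be reproduced with the standard library alone. The sketch packs three sample values the way a little-endian machine would and then decodes them the way a reader trusting a big-endian transfer syntax would; the values are illustrative, taken from the range James reported.

```python
import struct

values = [-20, 0, 988]

# Bytes as a little-endian machine would produce them ("<" = little endian,
# "h" = signed 16-bit).
little_endian_bytes = struct.pack("<3h", *values)

# A reader that trusts a big-endian transfer syntax decodes them like this.
misread = list(struct.unpack(">3h", little_endian_bytes))
print(misread)  # [-4865, 0, -9213]
```

Small values turn into large ones spread across the int16 range, which would look like the data had been rescaled.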

I see three possible fixes here:

1) Specify the endianness in the numpy.dtype when creating the pixel_array numpy array. This would allow the user to manipulate that array (keeping the same byte order) and then .tostring() will return the proper string. If the user created their own custom numpy array from scratch this obviously wouldn't work but I'd say that creating pixel data from scratch is a pretty advanced thing and the user should probably know what Transfer Syntax they're using.

2) Assume (always bad IMO) that all of the PixelData assignment values are using the system's natural byte-ordering (numpy arrays use this by default) and therefore if the Transfer Syntax differs from the system's byte-ordering, then we'd perform a byteswap on it.

3) Create a method for re-assigning PixelData internally. i.e. take a numpy array as input (ds.PixelData.from_array(np.array)). This way you could check the byte ordering of the dtype (np.array.dtype.byteorder) and make sure that it matches the Transfer Syntax
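As a rough illustration of the third option, a helper could inspect the array's dtype and emit bytes in whatever order the file needs. This is a sketch only: `array_to_pixel_bytes` is a hypothetical name, not an existing pydicom function, and in a real dataset the `little_endian` flag would have to come from the transfer syntax.

```python
import numpy as np

def array_to_pixel_bytes(arr, little_endian):
    """Return arr's raw bytes in the requested byte order (hypothetical helper)."""
    arr = np.asarray(arr)
    target = arr.dtype.newbyteorder("<" if little_endian else ">")
    # astype keeps the values and rewrites the bytes only when the order differs.
    return arr.astype(target).tobytes()

pixels = np.array([1, 256], dtype=np.int16)
print(array_to_pixel_bytes(pixels, little_endian=False))  # b'\x00\x01\x01\x00'
print(array_to_pixel_bytes(pixels, little_endian=True))   # b'\x01\x00\x00\x01'
```

Because the conversion goes through the dtype, the same call works whether the input array is native, explicitly little endian, or explicitly big endian.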

I personally think the first solution is more elegant but it's open to discussion.

James: (sorry about the tangent)

As far as another thing to try with your data, check out line 54 of the script I have attached. It shows how you could ensure your numpy data is encoded as Big Endian. Also, your use of PIL could complicate things a little further. As Darcy said, if you provide an anonymized DICOM that may help out tremendously.

-Jonathan

pydicom_pixel_data_issue.py

Jonathan Suever

Jan 29, 2013, 10:06:56 AM

to pyd...@googlegroups.com

James,

Another quick thought. You could also maybe send an image of what pixel_array looks like (using one of the contrib files). I've found visualization to be helpful in deciding if something is an endianness issue, a processing issue, scaling issue, etc.

-Jonathan

Darcy Mason

Jan 29, 2013, 5:30:05 PM

to pyd...@googlegroups.com

On Tuesday, January 29, 2013 9:28:01 AM UTC-5, Suever wrote:

Ok well if this isn't the answer to this particular problem, then I have at least found a problem that I think needs to be addressed with regard to endianness.

I've attached a script that demonstrates the issue and describes what's happening....

Jonathan, can you repost this on the pydicom-dev list ... I was wanting to discuss these issues anyway (how to handle pixel_array/PixelData gotchas). I have some opinions on the options you presented, but I'd like to discuss over there. With the pydicom 1.0 release we can plan for some major changes. Those who are interested are welcome to join the dev discussion if not already on that group.

-Darcy

James Hawkins

Jan 30, 2013, 3:58:36 PM

to pyd...@googlegroups.com

Thanks for looking into this; I appreciate it. Here's an anonymized file, and I'm looking into those scripts you two posted...

Anonymized.DCM.zip

Jonathan Suever

Jan 31, 2013, 11:03:04 AM

to pyd...@googlegroups.com

James,

I'm not seeing any strange behavior with your DICOM on my end when converting to a numpy array and then re-assigning to PixelData and saving. Are you sure you sent the right file? This one appears to be Little Endian.

-Jonathan

Jonathan Suever

Jan 31, 2013, 1:02:57 PM

to pyd...@googlegroups.com, pyd...@googlegroups.com

Maybe try anonymizing with pydicom by just looking through the fields and changing sensitive info. This should keep everything else intact. Who knows what your anonymizer is doing.

-Jonathan

iTyped with my iThumbs

On Jan 31, 2013, at 9:49, James Hawkins <dr.ka...@gmail.com> wrote:

Huh. Yep, things are working fine with the anonymized file. I was originally using a non-anonymized file, as is from the scanner, and ran it through our anonymizer to post here...

James Hawkins

Mar 13, 2013, 1:36:17 PM

to pyd...@googlegroups.com

Sorry for long delay on this... The problem seems to be that some of the scanners here are writing the pixel data tag with a VR of OW, and when I write back the modified image, the VR needs to be either 'OB' or 'OW or OB' (things work out when I run the script on dicoms where the pixel data tag VR is 'OW or OB'). What is best practice for changing the VR of a tag?

Thanks,

James

Jonathan Suever

Mar 13, 2013, 2:05:37 PM

to pyd...@googlegroups.com

James,

The difference between OB and OW tags is that OW requires byte-swapping if the Transfer Syntax of your DICOM doesn't match the endianness of your system. Changing the VR to 'OB' may work because that will basically tell DICOM readers and writers to not perform any byte-swapping. I'm not sure if this is a very robust fix or not, but you could at least test it out.

To modify the VR of a data element, you need to modify the DataElement instance that holds the pixel data.

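import dicom  # pydicom releases before 1.0; later releases use "import pydicom"

ds = dicom.read_file('example.dcm')

data_elem = ds[0x7fe0, 0x0010]  # the PixelData element

# Then set the VR to whatever you want

data_elem.VR = 'OB'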

I still think that a more robust solution could be to byte-swap your modified array prior to writing back to the PixelData element.

On a side note, it doesn't look like pydicom currently factors in whether PixelData is OW or OB to determine if a byte-swap is needed. Byte-swapping should really only occur for OW VRs. Because of this, you may have difficulty reading the image back into pydicom.

-Jonathan

James Hawkins

Mar 13, 2013, 3:41:05 PM

to pyd...@googlegroups.com

How should I properly byteswap? When I try, my values are all changed...

In [147]: data = test.image_with_text

In [148]: data.max()

Out[148]: 988

In [149]: data.min()

Out[149]: -20

In [150]: bs = data.byteswap()

In [151]: bs.max()

Out[151]: 2130903040

In [152]: bs.min()

Out[152]: -2147483648

- James

James Hawkins

Mar 13, 2013, 5:14:14 PM

to pyd...@googlegroups.com

Never mind. Even though byte-swapping bs.max() gives 895 instead of 988, all the values work out when written to a dicom. Thanks for the help!
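The numbers in this exchange can be reproduced in plain numpy, which may help explain them. A value of 2130903040 cannot occur in an int16 array, so the array being swapped was presumably 32-bit by then (an assumption; the posts do not say where the widening happened). byteswap() reverses the bytes of every element, so the max and min of the swapped array are not the swapped forms of the original max and min. The sample values below are chosen to show this.

```python
import numpy as np

# Assumption: the array was int32 at this point, not int16.
data = np.array([-20, 128, 895, 988], dtype=np.int32)
bs = data.byteswap()

print(bs.max())  # 2130903040, the swapped form of 895 (0x0000037F -> 0x7F030000)
print(bs.min())  # -2147483648, the swapped form of 128 (0x00000080 -> 0x80000000)

# Swapping a second time restores the original values.
restored = bs.byteswap()
print(restored.tolist())  # [-20, 128, 895, 988]
```

The swapped values are not meant to be read as numbers; they only matter as the bytes that get written out.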

Jonathan Suever

Mar 13, 2013, 7:38:06 PM

to pyd...@googlegroups.com

James,

I've attached a basic script that has a function that essentially ensures that the byte-ordering works regardless of whether your system is big or little endian. I'm sure there is a better way to determine the value of "needs_swap" using both the system endianness and the byteorder of the numpy array but this should get the job done.
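The attached script is not reproduced in the thread, so the following is only a guess at the shape of such a function, not Jonathan's code. It derives a `needs_swap` decision from the system endianness and the array's dtype; `bytes_for_file` and the `file_is_little_endian` flag are hypothetical names.

```python
import sys
import numpy as np

def needs_swap(arr, file_is_little_endian):
    """True if arr's in-memory byte order differs from the file's (hypothetical)."""
    order = arr.dtype.byteorder  # '=', '<', '>' or '|'
    if order == "|":             # single-byte types have no byte order
        return False
    if order == "=":             # native order: ask the interpreter
        arr_is_little = sys.byteorder == "little"
    else:
        arr_is_little = order == "<"
    return arr_is_little != file_is_little_endian

def bytes_for_file(arr, file_is_little_endian):
    """Raw bytes of arr in the file's byte order."""
    if needs_swap(arr, file_is_little_endian):
        arr = arr.byteswap()
    return arr.tobytes()

native = np.array([988, -20], dtype=np.int16)
print(bytes_for_file(native, file_is_little_endian=False))  # b'\x03\xdc\xff\xec'
```

The output is the same on little-endian and big-endian machines, which is the point of checking both the system and the dtype.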

As Darcy mentioned before, we're figuring out a better way to reassign pixel values in the new version of pydicom.