[Numpy-discussion] create a numpy array of images


Asmi Shah

Jan 28, 2011, 10:01:36 AM
to numpy-di...@scipy.org
Hi guys,

I have been using Python for a while now, and I need to create a numpy array of microscopic TIFF images (the data is 3D: there are 100 z-slices of 512 x 512 pixels). How can I create an array of images? I would then like to use visvis to visualize this in 3D.

Any help to get me started is highly appreciated. Thanks!

--
Regards,
Asmi Shah

Christopher Barker

Jan 28, 2011, 1:57:43 PM
to Discussion of Numerical Python
On 1/28/11 7:01 AM, Asmi Shah wrote:
> I have been using Python for a while now, and I need to create a numpy
> array of microscopic TIFF images (the data is 3D: there are 100 z-slices
> of 512 x 512 pixels). How can I create an array of images?

It's quite straightforward to create a 3-d array to hold this kind of data:

image_block = np.empty((100, 512, 512), dtype=??)

now you can load it up by using some lib (PIL, or ???) to read the TIFF
images, and then:

for i, image in enumerate(images):
    image_block[i, :, :] = image


Note that I set dtype to ?? up there. What dtype you want depends on
what's in the TIFF images -- TIFF can hold just about anything. So if
they are, say, 16-bit greyscale, you'd want:

dtype=np.uint16

If they are 24-bit RGB, you might want a custom dtype (I don't think
there is a built-in 24-bit dtype):

RGB_type = np.dtype([('r', np.uint8), ('g', np.uint8), ('b', np.uint8)])

For 32-bit RGBA, you can use the same approach, or just a 32-bit integer.

The cool thing is that you can make views of this array with different
dtypes, depending on what's easiest for the given use case. You can even
break the RGB parts out into a separate axis:

image_block = np.empty((100, 512, 512), dtype=RGB_type)

image_block_rgb = image_block.view(dtype=np.uint8).reshape((100, 512, 512, 3))

The two arrays now share the same data block, but you can look at them
differently.

I think this is a really cool feature of numpy.

> I would then like to use visvis to visualize this in 3D.

you'll have to see what visvis is expecting in terms of data types, etc.

HTH,

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris....@noaa.gov

Friedrich Romstedt

Jan 30, 2011, 2:29:16 PM
to Discussion of Numerical Python
2011/1/28 Christopher Barker <Chris....@noaa.gov>:

> On 1/28/11 7:01 AM, Asmi Shah wrote:
>> I have been using Python for a while now, and I need to create a numpy
>> array of microscopic TIFF images (the data is 3D: there are 100 z-slices
>> of 512 x 512 pixels). How can I create an array of images?
>
> It's quite straightforward to create a 3-d array to hold this kind of data:
>
> image_block = np.empty((100, 512, 512), dtype=??)
>
> now you can load it up by using some lib (PIL, or ???) to read the TIFF
> images, and then:
>
> for i, image in enumerate(images):
>     image_block[i, :, :] = image

Notice that since PIL 1.1.6, PIL Image objects support the numpy
interface: http://effbot.org/zone/pil-changes-116.htm

>>> import PIL.Image
>>> im = PIL.Image.open('P1010102.JPG')
>>> im
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=3264x2448 at 0x4CA0A8>
>>> a = numpy.asarray(im)
>>> a.shape
(2448, 3264, 3)
>>> a.dtype
dtype('uint8')

You can use the image just as any other ndarray:

>>> stack = numpy.empty((5, 2448, 3264, 3))
>>> stack[0] = im
and so on

for 5 images in a stack. Notice that the dtype of the initially empty
ndarray is float (float64) by default!

It works also vice-versa:

>>> im_copy = PIL.Image.fromarray(a)

but this seems to require integer-valued ndarrays as input, except
when the ndarray is monochrome.

This might be even simpler than the dtype proposed by Christopher.

For more info on PIL: http://www.pythonware.com/library/pil/handbook/

Friedrich

toton...@gmail.com

Jan 31, 2011, 6:19:57 AM
to Discussion of Numerical Python
I've done that, but with CT and MRI DICOM files, and the cool thing is
that with numpy I can do something like this:

# getting an axial slice
axial = slices[n, :, :]

# getting a coronal slice
coronal = slices[:, n, :]

# getting a sagittal slice
sagittal = slices[:, :, n]

Zachary Pincus

Jan 31, 2011, 3:55:05 PM
to Discussion of Numerical Python
>>> I have been using Python for a while now, and I need to create a
>>> numpy array of microscopic TIFF images (the data is 3D: there are
>>> 100 z-slices of 512 x 512 pixels). How can I create an array of
>>> images?
>>
>> It's quite straightforward to create a 3-d array to hold this kind
>> of data:
>>
>> image_block = np.empty((100, 512, 512), dtype=??)
>>
>> now you can load it up by using some lib (PIL, or ???) to read the
>> TIFF images, and then:
>>
>> for i, image in enumerate(images):
>>     image_block[i, :, :] = image
>
> Notice that since PIL 1.1.6, PIL Image objects support the numpy
> interface: http://effbot.org/zone/pil-changes-116.htm

For even longer than this, PIL has been somewhat broken with regard to
16-bit images (very common in microscopy); you may run into strange
byte-ordering issues that scramble the data on reading or writing.
Also, PIL's numpy interface is somewhat broken in similar ways.
(Numerous people have provided patches to PIL, but these are never
incorporated into any releases, as far as I can tell.)

So try PIL, but if the images come out all wrong, you might want to
check out the scikits.image package, which has hooks for various other
image read/write tools.

Zach

Asmi Shah

Feb 1, 2011, 3:39:12 AM
to numpy-di...@scipy.org
Thanks a lot, Friedrich and Chris. Using PIL with numpy came in handy. :)

@Zach, I am aware of the poor handling of 16-bit images in PIL; for that, I am using ImageMagick to convert them to 8-bit first and then PIL for the rest of the processing.

I have one more question: how do I avoid the MemoryError in numpy? I have about 200 images to stack in the numpy array, at say 1024x1344 resolution. Any ideas apart from downsampling?

toton...@gmail.com

Feb 1, 2011, 5:20:52 AM
to Discussion of Numerical Python
Hi,

On Tue, Feb 1, 2011 at 6:39 AM, Asmi Shah <asmi....@gmail.com> wrote:
> Thanks a lot, Friedrich and Chris. Using PIL with numpy came in handy. :)
> @Zach, I am aware of the poor handling of 16-bit images in PIL; for that,
> I am using ImageMagick to convert them to 8-bit first and then PIL for
> the rest of the processing.

You could try VTK to open those files and use VTK functions to convert
them to numpy arrays.

> I have one more question: how do I avoid the MemoryError in numpy? I
> have about 200 images to stack in the numpy array, at say 1024x1344
> resolution. Any ideas apart from downsampling?

Take a look at numpy.memmap or h5py [1].



[1] - http://code.google.com/p/h5py/

Sturla Molden

Feb 1, 2011, 8:49:39 AM
to Discussion of Numerical Python

On 1 Feb 2011, at 11:20, "toton...@gmail.com" <toton...@gmail.com> wrote:

>> I have one more question: how do I avoid the MemoryError in numpy? I
>> have about 200 images to stack in the numpy array, at say 1024x1344
>> resolution. Any ideas apart from downsampling?
>
> Take a look at numpy.memmap or h5py [1].
>

memmap will not help unless he uses 64-bit Python, in which case he
can just buy more RAM if he has too little. I suspect he is running
out of virtual memory, not physical, for which 64-bit is the easiest
solution. It is not possible to compensate for a lack of virtual memory
(typically a 2 GB limit with 32-bit) by memory-mapping a file into the
already exhausted address space.

Using an on-disk store like h5py will help too, unless he tries to load
them all at once.

Sturla

Asmi Shah

Feb 1, 2011, 9:07:35 AM
to numpy-di...@scipy.org
Hi Zach and Sturla,

Well, I am a "she" :)) Thanks for your inputs. I am using 32-bit Python as I have so many libraries integrated with it; moreover, I plan to put this volume rendering on a web page or distribute the exe in the end, so I want to keep the memory requirements on the clients' systems minimal.
Physical memory should not be a problem, as I have 8 GB RAM. It especially gets into trouble when the images are RGB, as that already adds a 4th dimension in my case.

- asmi

Sturla Molden

Feb 1, 2011, 11:16:55 AM
to numpy-di...@scipy.org
On 01.02.2011 15:07, Asmi Shah wrote:
> Hi Zach and Sturla,
>
> Well, I am a "she" :))

I apologize; I did not deduce the correct gender from your name :)



> Thanks for your inputs. I am using 32-bit Python as I have so many
> libraries integrated with it; moreover, I plan to put this volume
> rendering on a web page or distribute the exe in the end, so I want to
> keep the memory requirements on the clients' systems minimal. Physical
> memory should not be a problem, as I have 8 GB RAM. It especially gets
> into trouble when the images are RGB, as that already adds a 4th
> dimension in my case.


With 32-bit, each process has only 2 or 3 GB (depending on the OS setting) of those 8 GB available in user space. If 32-bit is a requirement, you have to keep some of the data on disk to avoid exhausting the 2 GB virtual memory limit.

Actually, if you have 200 16-bit images in 1024 x 1344 resolution, that is only 525 MB if stored compactly with dtype np.uint16. So check how those images are stored in your stack. Do you use floating point instead of the smallest possible integer? Do you use separate dimensions for RGBA channels? Is there anything else wasting memory besides this stack of images? Do you have a memory leak?

Sturla

Friedrich Romstedt

Feb 1, 2011, 11:31:59 AM
to Discussion of Numerical Python
2011/2/1 Asmi Shah <asmi....@gmail.com>:

> Thanks a lot, Friedrich and Chris. Using PIL with numpy came in handy. :)

:-)

> I have one more question: how do I avoid the MemoryError in numpy? I
> have about 200 images to stack in the numpy array, at say 1024x1344
> resolution. Any ideas apart from downsampling?

In case you *have* to downsample:

I also ran into this, with the example about my 5 images ...
im.resize((newx, newy), PIL.Image.ANTIALIAS) will be your friend:
http://www.pythonware.com/library/pil/handbook/image.htm

Note, you might take advantage of floating-point images (the 'F' mode);
I don't know what the trade-offs are here. 'F' most probably takes
4 x 8 bits per pixel, so ...

The PIL handbook does not state what PIL.Image.ANTIALIAS actually does;
we can only hope that it's real sinc interpolation or similar (if your
images are frequency-bounded, this would be best to my knowledge). In
this case you do not even lose information, as long as the spatial
resolution of the downsampled images is still sufficient to keep the
signal frequency-bounded.

You might do an FFT (spatial) to check whether your images *are* actually
bounded in the frequency domain. I think it does not need to be perfect.

I strongly believe sinc is in scipy, but I never looked for it.

Friedrich

Christopher Barker

Feb 1, 2011, 12:58:09 PM
to Discussion of Numerical Python
On 2/1/11 12:39 AM, Asmi Shah wrote:
> I have one more question: how do I avoid the MemoryError in numpy? I
> have about 200 images to stack in the numpy array, at say 1024x1344
> resolution. Any ideas apart from downsampling?

If I'm doing my math right, that's 262 MB -- shouldn't be a problem on
modern systems. That's at 8-bit; it would be 786 MB for 24-bit RGB.

If you are careful about how many copies you're keeping around
(including temporaries), you may still be OK.

But if you really have big collections of images, you might try
memory-mapped arrays -- as Sturla pointed out, they won't let you create
monster arrays on a 32-bit Python, but maybe they help with not clogging
up memory too much? I don't know -- I haven't used them -- presumably
they have a purpose.

Also, pytables is worth a look, as another way to get HDF5 on disk, but
with, I think, more "natural" access.

-Chris


Christopher Barker

Feb 1, 2011, 12:58:17 PM
to Discussion of Numerical Python
On 2/1/11 8:31 AM, Friedrich Romstedt wrote:
> In case you *have* to downsample:
>
> I also ran into this, with the example about my 5 images ...
> im.resize((newx, newy), PIL.Image.ANTIALIAS) will be your friend:
> http://www.pythonware.com/library/pil/handbook/image.htm

If you want to downsample by an integer amount (e.g. a factor of 2) in
each dimension, I have some Cython code that optimizes that. I'm happy
to send it along.

-Chris


Sturla Molden

Feb 1, 2011, 1:58:16 PM
to Discussion of Numerical Python
On 01.02.2011 18:58, Christopher Barker wrote:
> But if you really have big collections of images, you might try
> memory-mapped arrays -- as Sturla pointed out, they won't let you
> create monster arrays on a 32-bit Python,

But they will on 64-bit Python :D We can just memory-map a temporary
file and fake as much "memory" as we need. The OS will optimize the
disk access. Also consider that hardware is cheap compared to labour, at
least in Europe. Is programming for memory-limited 32-bit Python worth
the effort?

Sturla

Francesc Alted

Feb 1, 2011, 2:57:57 PM
to Discussion of Numerical Python
On Tuesday 01 February 2011 19:58:16, Sturla Molden wrote:

> On 01.02.2011 18:58, Christopher Barker wrote:
> > But if you really have big collections of images, you might try
> > memory-mapped arrays -- as Sturla pointed out, they won't let you
> > create monster arrays on a 32-bit Python,
>
> But they will on 64-bit Python :D We can just memory-map a temporary
> file and fake as much "memory" as we need. The OS will optimize the
> disk access.

What do you mean by 'optimize the disk access'? One of the drawbacks of
memory-mapped files is precisely that the OS cannot distinguish between
data that belongs to 'disk' and data that belongs to 'memory'. This
normally makes other programs extremely slow when the datasets in files
exceed physical memory but have to be loaded by the OS -- the reason
being that the OS swaps out most of the programs/shared libraries that
were in memory in order to be able to load the new 'disk' data.

The other important drawback of memory-mapped files is that you need at
the very least enough virtual memory to hold all of these data files. In
general, you only have virtual memory between 1.5x and 2x the physical
memory (having more than this is generally regarded as a waste of disk
space).

This is why I much prefer reading directly from a file: in this case,
the OS is able to distinguish between data belonging to 'disk' and data
belonging to 'memory'. It is in this case that the OS can really
optimize disk access (unless you have a complicated setup).

> Also consider that hardware is cheap compared to labour,
> at least in Europe. Is programming for memory-limited 32-bit Python
> worth the effort?

--
Francesc Alted

Asmi Shah

Feb 2, 2011, 5:22:15 AM
to Discussion of Numerical Python
Hi all,

It seems that using 64-bit Python is the solution. But the thing is, I would compile my code and want to distribute it to clients, and that is the only reason I want to work on a 32-bit system. Sturla, how can I make sure that some part of the data is kept on disk and only the necessary part in memory? That seems to be a solution to my problem. As I said, I want a 3D visualization out of the numpy array. It works fine for the downsampled dataset. To visualize, I have to convert the 16-bit data into 8-bit, as PIL doesn't support 16-bit data. The only thing I do to create my array is this:

stack = numpy.empty((120, 1024, 1024))

i = 0

os.chdir(dirr)
for f in os.listdir(dirr):

    im = Image.open(f)
    im = im.convert("L")
    a = numpy.asarray(im)
    print a.dtype
    stack[i] = a
    i += 1

One more thing: it really doesn't work for TIFF files at all; I have to convert them into JPEGs as a prior step. And it lets me create an array for only around 60 slices at most, whereas my requirement would be around 100 to 200 images.
Any ideas? Can you diagnose the problem?

thanks a lot.. asmi

toton...@gmail.com

Feb 2, 2011, 5:41:28 AM
to Discussion of Numerical Python

Give pylibtiff [1] a try; the cool thing is that it supports reading
meta-information from TIFF files. There is also support for reading
TIFF files in VTK [2].

[1] - http://code.google.com/p/pylibtiff/
[2] - http://www.vtk.org/doc/nightly/html/classvtkTIFFReader.html

Christopher Barker

Feb 2, 2011, 12:12:47 PM
to Discussion of Numerical Python
> It seems that using 64-bit Python is the solution.

It's certainly the easy way to access a lot of memory -- and memory is
cheap these days.

> But the thing is, I would compile my code and want to distribute it to
> clients..

I don't think 64-bit gets in the way of that -- except that it will only
run on 64-bit systems, which may be an issue.

> that is the only reason I want to work on a 32-bit system. Sturla, how
> can I make sure that some part of the data is kept on disk and only the
> necessary part in memory? That seems to be a solution to my problem.

You can "roll your own" and have a disk cache of some sort -- it would
be pretty easy to store each image in a *.npz file and load the up as
you need them.

But it would probably be even easier to use one of the HDF-based
libraries, such as pytables -- I think it will do it all for you.

One other option, that I've never tried, is carray, which is an array
compressed in memory. Depending on your images, perhaps they would
compress a lot (or not ....):

https://github.com/FrancescAlted/carray
http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052378.html

> As I said, I want a 3D visualization out of the numpy array. It works
> fine for the downsampled dataset. To visualize, I have to convert the
> 16-bit data into 8-bit, as PIL doesn't support 16-bit data.

It's unclear to me what your native data really is: 16-bit greyscale?
8-bit greyscale? Either one should fit OK into 32-bit memory, and if
8-bit is accurate enough for your needs, then it should be pretty easy.


> stack = numpy.empty((120, 1024, 1024))

numpy defaults to double-precision float, np.float64, i.e. 8 bytes per
element -- you probably don't want that if you are concerned about
memory use and have 8- or 16-bit greyscale images. Try:

stack = np.empty((120, 1024, 1024), dtype=np.uint8) # (or dtype=np.uint16)

> i = 0
>
> os.chdir(dirr)
> for f in os.listdir(dirr):
>
>     im = Image.open(f)
>     im = im.convert("L")

You might want to try mode "I". That should give you 32-bit integer
greyscale, which should hold all the 16-bit data without loss -- then
you can convert to 16-bit when you bring it into numpy.

>     a = numpy.asarray(im)
>     print a.dtype

What does this print? It should be np.uint8.

>     stack[i] = a

Here, if a is uint8, numpy will convert it to float64 to fit into the
stack array -- that's why you want to set the dtype of stack when you
create it.

> One more thing: it really doesn't work for TIFF files at all; I have to
> convert them into JPEGs as a prior step.

Probably not the best choice either; JPEG is lossy -- it will smear
things out a bit, which you may not want. It also only holds 24-bit RGB
(I think), which is both a waste and will lose information from 16-bit
greyscale. If you have to convert, try PNG, though I'm not sure whether
it handles 16-bit greyscale either.

I'd look at a lib that can read TIFF properly -- some have been
suggested here. You can also use GDAL, which is meant for
geo-referenced data, but you can ignore the geo information and just get
an image if you want.

-Chris


Francesc Alted

Feb 3, 2011, 6:33:08 AM
to Discussion of Numerical Python
On Wednesday 02 February 2011 18:12:47, Christopher Barker wrote:

> One other option, that I've never tried, is carray, which is an array
> compressed in memory. Depending on your images, perhaps they would
> compress a lot (or not ....):
>
> https://github.com/FrancescAlted/carray
> http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052378.html

Nice idea. In the 0.3.1 release I've just implemented preliminary
support for multidimensional data, so I was curious about the kind of
compression that can be achieved on images:

# Preliminaries: load the numpy, matplotlib and carray libs
>>> import numpy as np
>>> import matplotlib.image as mpimg
>>> import matplotlib.pyplot as plt
>>> import carray as ca

First I tried the classic Lenna (http://en.wikipedia.org/wiki/Lenna):

>>> img = mpimg.imread('Lenna.png')
>>> cimg = ca.carray(img)
>>> cimg.nbytes/float(cimg.cbytes)
1.2450163377998429

So, just a 25% compression, not too much. But trying another example
(http://matplotlib.sourceforge.net/_images/stinkbug.png) gives a
significantly better ratio:

>>> img2 = mpimg.imread('stinkbug.png')
>>> cimg2 = ca.carray(img2)
>>> cimg2.nbytes/float(cimg2.cbytes)
2.7716869102466184

And finally, the beautiful NumPy container drawing by Stéfan van der
Walt (slide 31 of his presentation in our latest advanced Python course,
https://portal.g-node.org/python-autumnschool/materials/advanced_numpy):

>>> img3 = mpimg.imread('numpy-container.png')
>>> cimg3 = ca.carray(img3)
>>> cimg3.nbytes/float(cimg3.cbytes)
3.7915321810785132

So, yeah, depending on the images, carray could be a nice way to keep
them in-memory. And although, as I said, multidimensional support is
still preliminary, matplotlib already understands carray beasts:

# plotting the image
>>> plt.imshow(cimg3)
<matplotlib.image.AxesImage object at 0x27d2150>

Cheers,

--
Francesc Alted
