It's quite straightforward to create a 3-d array to hold this kind of data:
image_block = np.empty((100, 512, 512), dtype=???)
Now you can load it up by using some lib (PIL, or ???) to read the tif
images, and then:

for i, im in enumerate(images):
    image_block[i, :, :] = im
Note that I put dtype=??? up there. What dtype you want depends on
what's in the tiff images -- tiff can hold just about anything. So if
they are, say, 16-bit greyscale, you'd want:
dtype=np.uint16
If they are 24-bit RGB, you might want a custom dtype (I don't think
there is a 24-bit dtype built in):
RGB_type = np.dtype([('r', np.uint8), ('g', np.uint8), ('b', np.uint8)])
For 32-bit RGBA, you can use the same approach, or just a 32-bit integer.
The cool thing is that you can make views of this array with different
dtypes, depending on what's easiest for the given use case. You can even
break out the RGB parts onto a separate axis:
image_block = np.empty((100, 512, 512), dtype=RGB_type)
image_block_rgb = image_block.view(dtype=np.uint8).reshape((100, 512, 512, 3))
The two arrays now share the same data block, but you can look at them
differently.
I think this is a really cool feature of numpy.
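A quick way to convince yourself that the data really is shared (a
minimal sketch; the tiny shape is just for the demo):

import numpy as np

RGB_type = np.dtype([('r', np.uint8), ('g', np.uint8), ('b', np.uint8)])
block = np.zeros((2, 4, 4), dtype=RGB_type)
rgb = block.view(np.uint8).reshape(2, 4, 4, 3)
rgb[0, 0, 0, 0] = 255        # write through the uint8 view...
print(block['r'][0, 0, 0])   # ...and the structured view sees 255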
> I then would like to use visvis for visualizing this in 3D.
You'll have to see what visvis expects in terms of data types, etc.
HTH,
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris....@noaa.gov
Notice that since PIL 1.1.6, PIL Image objects support the numpy
interface: http://effbot.org/zone/pil-changes-116.htm
>>> import PIL.Image
>>> im = PIL.Image.open('P1010102.JPG')
>>> im
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=3264x2448 at 0x4CA0A8>
>>> import numpy
>>> a = numpy.asarray(im)
>>> a.shape
(2448, 3264, 3)
>>> a.dtype
dtype('uint8')
You can use the image just as any other ndarray:
>>> stack = numpy.empty((5, 2448, 3264, 3))
>>> stack[0] = im
and so on, for 5 images in a stack. Notice that the dtype of the
initially empty ndarray is float64, so each assigned image gets
converted to double precision!
It also works vice versa:
>>> im_copy = PIL.Image.fromarray(a)
but this seems to require integer-valued ndarrays as input, except
when the ndarray is monochrome.
This might be even simpler than the dtype proposed by Christopher.
For more info on PIL: http://www.pythonware.com/library/pil/handbook/
Friedrich
# getting an axial slice
axial = slices[n, :, :]
# getting a coronal slice
coronal = slices[:, n, :]
# getting a sagittal slice
sagittal = slices[:, :, n]
For even longer than this, PIL has been somewhat broken with regard to
16-bit images (very common in microscopy); you may run into strange
byte-ordering issues that scramble the data on reading or writing.
Also, PIL's numpy interface is somewhat broken in similar ways.
(Numerous people have provided patches to PIL, but these are never
incorporated into any releases, as far as I can tell.)
So try PIL, but if the images come out all wrong, you might want to
check out the scikits.image package, which has hooks for various other
image read/write tools.
Zach
On Tue, Feb 1, 2011 at 6:39 AM, Asmi Shah <asmi....@gmail.com> wrote:
> Thanks a lot Friedrich and Chris.. It came in handy to use PIL and numpy..
> :)
> @Zach, I'm aware of the poor handling of 16-bit images in PIL; for that I am
> using imagemagick to convert them to 8 bit first and then PIL for the rest
> of the processing..
You could try VTK to open those files and use VTK functions to
convert them to numpy arrays.
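A minimal sketch of that route, for a single-component (greyscale)
image -- the file name is hypothetical:

import vtk
from vtk.util.numpy_support import vtk_to_numpy

reader = vtk.vtkTIFFReader()
reader.SetFileName('slice_000.tif')
reader.Update()
img = reader.GetOutput()
# point data is a flat array ordered x-fastest; reshape accordingly
nx, ny, nz = img.GetDimensions()
a = vtk_to_numpy(img.GetPointData().GetScalars()).reshape(nz, ny, nx)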
> I have one more question: how to avoid the limitation of MemoryError in
> numpy, as I have like 200 images to stack in the numpy array of say
> 1024x1344 resolution.. have any idea apart from downsampling?
Take a look at numpy.memmap or h5py [1].
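For instance, a minimal numpy.memmap sketch (file name, shape and dtype
are placeholders):

import numpy as np

# back the stack with a file on disk instead of RAM
stack = np.memmap('stack.dat', dtype=np.uint16, mode='w+',
                  shape=(200, 1024, 1344))
stack[0] = 0    # assignments go to the mapped file
stack.flush()   # make sure everything is written out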
[1] - http://code.google.com/p/h5py/
>> I have one more question: how to avoid the limitation of MemoryError in
>> numpy, as I have like 200 images to stack in the numpy array of say
>> 1024x1344 resolution.. have any idea apart from downsampling?
>
> Take a look at numpy.memmap or h5py [1].
memmap will not help unless he uses 64-bit Python, in which case he
can just buy more RAM if he has too little. I suspect he is running
out of virtual memory, not physical, for which 64 bit is the easiest
solution. It is not possible to compensate for lack of virtual memory
(typically a 2 GB limit with 32 bit) by memory-mapping a file into the
already exhausted address space.
Using an HDF5-based store like h5py will help too, unless he tries to
extract them all at once.
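For example, with h5py only the slices you index are actually pulled
into memory (file and dataset names assumed):

import h5py

with h5py.File('stack.h5', 'r') as f:
    dset = f['stack']      # no pixel data read yet
    one_slice = dset[100]  # only this 2-D slice is loaded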
Sturla
Hi Zach and Sturla,
Well I am a "she" :))
Thanks for your inputs.. I am using 32-bit Python as I have so many
libraries integrated with it.. and moreover, I plan to put this volume
rendering on a web page or distribute the exe in the end, so I want the
memory requirements on the clients' systems to be minimal.. Physical
memory should not be a problem as I have 8GB RAM.. it especially gets
into trouble when the images are RGB, as that already adds a 4th
dimension in my case..
:-)
> I have one more question: how to avoid the limitation of MemoryError in
> numpy, as I have like 200 images to stack in the numpy array of say
> 1024x1344 resolution.. have any idea apart from downsampling?
In case you *have* to downsample:
I also ran into this, with the example about my 5 images ...
im.resize((newx, newy), PIL.Image.ANTIALIAS) will be your friend:
http://www.pythonware.com/library/pil/handbook/image.htm
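A self-contained version of that call (the halving factor is just an
arbitrary choice here):

from PIL import Image

im = Image.open('P1010102.JPG')
newx, newy = im.size[0] // 2, im.size[1] // 2
small = im.resize((newx, newy), Image.ANTIALIAS)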
Note, you might take advantage of floating-point images ('F' mode); I
don't know what the trade-offs are here. 'F' most probably takes
4x (8 bit), so ...
The PIL handbook does not state what PIL.Image.ANTIALIAS actually
does; we can only hope that it's real sinc interpolation or similar
(if your images are band-limited, this would be best to my knowledge).
In that case you do not even lose information, as long as the spatial
resolution of the downsampled images is still sufficient to keep the
signal band-limited.
You might do an FFT (spatial) to check whether your images *are*
actually bounded in the frequency domain. I think it does not need to
be perfect.
I strongly believe sinc is in scipy, but I never looked for it.
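A rough sketch of such a check with numpy.fft (the file name and the
"outer half of the frequency plane" cutoff are arbitrary choices for
illustration):

import numpy as np
from PIL import Image

img = np.asarray(Image.open('slice_000.tif').convert('L'), dtype=float)
energy = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
# fraction of spectral energy outside the central half of the
# frequency plane; if it is tiny, downsampling by 2 loses little
ny, nx = energy.shape
inner = np.zeros_like(energy, dtype=bool)
inner[ny//4:3*ny//4, nx//4:3*nx//4] = True
print(energy[~inner].sum() / energy.sum())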
Friedrich
If I'm doing my math right, that's 262 MB, which shouldn't be a problem
on modern systems. That's for 8 bit, but 786 MB if 24-bit RGB.
If you are careful about how many copies you're keeping around
(including temporaries), you may be OK still.
But if you really have big collections of images, you might try
memory-mapped arrays -- as Sturla pointed out, they won't let you create
monster arrays on a 32-bit Python, but maybe they do help with not
clogging up memory too much? I don't know -- I haven't used them --
presumably they have a purpose.
Also, pytables is worth a look, as another way to get HDF5 on disk, but
with, I think, more "natural" access.
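A minimal PyTables sketch of that idea (file and node names are
placeholders; this uses the open_file/create_carray spelling of the
current API):

import numpy as np
import tables

h5 = tables.open_file('stack.h5', mode='w')
stack = h5.create_carray(h5.root, 'stack',
                         atom=tables.UInt8Atom(),
                         shape=(200, 1024, 1344))
stack[0, :, :] = np.zeros((1024, 1344), dtype=np.uint8)
h5.close()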
-Chris
If you want to downsample by an integer amount (e.g. a factor of 2) in
each dimension, I have some Cython code that optimizes that. I'm happy
to send it along.
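For reference, a plain-numpy version of that kind of factor-of-2
downsampling (just a sketch, not the Cython code; it averages
non-overlapping 2x2 blocks):

import numpy as np

def downsample2(a):
    """Average non-overlapping 2x2 blocks of a 2-D array."""
    h = a.shape[0] // 2 * 2
    w = a.shape[1] // 2 * 2
    a = a[:h, :w]  # trim odd edges
    return a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))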
-Chris
But they will on 64-bit Python :D We can just memory-map a temporary
file and fake as much "memory" as we need. The OS will optimize the
disk access. Also consider that hardware is cheap compared to labour,
at least in Europe. Is programming for memory-limited 32-bit Python
worth the effort?
Sturla
What do you mean by 'optimize the disk access'? One of the drawbacks of
memory-mapped files is precisely that the OS cannot distinguish between
data that belongs to 'disk' and data that belongs to 'memory'. This
normally introduces extreme slowness in other programs when datasets in
files exceed physical memory but have to be loaded by the OS -- the
reason being that the OS swaps out most of the programs/shared
libraries that were in memory in order to be able to load the new
'disk' data.
The other important drawback of memory-mapped files is that you need to
have, at the very least, an amount of virtual memory that is enough to
keep all of these data files. In general, you only have virtual memory
that is between 1.5x and 2x the physical memory (having more than this
is generally regarded as a waste of disk space).
This is why I much prefer reading directly from a file: in this case
the OS is able to distinguish between data belonging to 'disk' and data
belonging to 'memory'. It is in this case that the OS can really
optimize disk access (unless you have complicated setups).
> Also consider that hardware is cheap compared to labour,
> at least in Europe. Is programming for memory limited 32 bit Python
> worth the effort?
--
Francesc Alted
Give pylibtiff [1] a try; the cool thing is that it supports reading
meta-information from tiff files. There is also support for reading
tiff files in VTK [2].
[1] - http://code.google.com/p/pylibtiff/
[2] - http://www.vtk.org/doc/nightly/html/classvtkTIFFReader.html
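A minimal sketch with pylibtiff, assuming its TIFF.open/read_image
interface (check the project page for the exact API):

from libtiff import TIFF

tif = TIFF.open('slice_000.tif', mode='r')
a = tif.read_image()              # one page as a numpy array
for frame in tif.iter_images():   # or walk all pages of a multi-page tiff
    print(frame.shape, frame.dtype)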
It's certainly the easy way to access a lot of memory -- and memory is
cheap these days.
> But the thing is I would
> compile my code and wanna distribute it to the clients..
I don't think 64 bit gets in the way of that -- except that it will only
run on 64-bit systems, which may be an issue.
> only reason why I want to work on a 32-bit system. Sturla, how can I make
> sure that some part of the data is kept on disk and only the necessary
> part in memory, as this seems to be a solution to my problem?
You can "roll your own" and have a disk cache of some sort -- it would
be pretty easy to store each image in a *.npz file and load them up as
you need them.
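For example (file name hypothetical):

import numpy as np

a = np.zeros((1024, 1344), dtype=np.uint16)  # one image slice
np.savez_compressed('slice_042.npz', img=a)  # park it on disk
a = np.load('slice_042.npz')['img']          # reload it on demand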
But it would probably be even easier to use one of the HDF-based
libraries, such as pytables -- I think it will do it all for you.
One other option, which I've never tried, is carray, which is an array
compressed in memory. Depending on your images, perhaps they would
compress a lot (or not ....):
https://github.com/FrancescAlted/carray
http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052378.html
> As I said, I
> want a 3d visualization out of the numpy array. It works fine for the
> downsampled dataset. And to visualize, I have to convert the 16-bit data
> into 8 bit as PIL doesn't support 16-bit data.
It's unclear to me what your native data really is: 16-bit greyscale?
8-bit greyscale? Either one should fit OK into 32-bit memory, and if 8
bit is accurate enough for your needs, then it should be pretty easy.
> stack = numpy.empty((120, 1024, 1024))
numpy defaults to double precision float, np.float64, i.e. 8 bytes per
element -- you probably don't want that if you are concerned about
memory use, and have 8 or 16 bit greyscale images. Try:
stack = np.empty((120, 1024, 1024), dtype=np.uint8) # (or dtype=np.uint16)
> i = 0
> os.chdir(dirr)
> for f in os.listdir(dirr):
>     im = Image.open(f)
>     im = im.convert("L")
You might want to try mode "I". That should give you a 32-bit integer
grey scale, which should hold all the 16-bit data without loss -- then
you can convert to 16 bit when you bring it into numpy.
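A sketch of that route (file name hypothetical):

from PIL import Image
import numpy as np

im = Image.open('slice_000.tif').convert('I')  # 32-bit integer pixels
a32 = np.asarray(im)                           # comes out as int32
a16 = a32.astype(np.uint16)                    # narrow back to 16 bit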
> a = numpy.asarray(im)
> print a.dtype
What does this print? It should be np.uint8.
> stack[i] = a
Here, if a is uint8, numpy will convert it to float64 to fit into the
stack array -- that's why you want to set the dtype of stack when you
create it.
> one more thing, it really doesn't work for tiff files at all, I have to
> convert them into jpgs as a prior step to this.
Probably not the best choice either; jpeg is lossy -- it will smear
things out a bit, which you may not want. It also only holds 24-bit RGB
(I think), which is both a waste and will lose information from 16-bit
greyscale. If you have to convert, try PNG, though I'm not sure if it
handles 16-bit greyscale, either.
I'd look at a lib that can read tiff properly -- some have been
suggested here, and you can also use GDAL, which is meant for
geo-referenced data, but you can ignore the geo information and just get
an image if you want.
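With GDAL, that would look roughly like this (file name hypothetical;
ReadAsArray returns a numpy array directly):

from osgeo import gdal

ds = gdal.Open('slice_000.tif')
a = ds.ReadAsArray()  # (rows, cols), or (bands, rows, cols) for RGB
print(a.shape, a.dtype)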
-Chris
Nice idea. In the 0.3.1 release I've just implemented preliminary
support for multidimensional data. So I was curious about the kind of
compression that can be achieved on images:
# Preliminaries: load numpy, matplotlib and carray libs
>>> import numpy as np
>>> import matplotlib.image as mpimg
>>> import matplotlib.pyplot as plt
>>> import carray as ca
First I tried the classic Lenna (http://en.wikipedia.org/wiki/Lenna):
>>> img = mpimg.imread('Lenna.png')
>>> cimg = ca.carray(img)
>>> cimg.nbytes/float(cimg.cbytes)
1.2450163377998429
So, just a 25% compression, not too much. But trying another example
(http://matplotlib.sourceforge.net/_images/stinkbug.png) gives a
significantly better ratio:
>>> img2 = mpimg.imread('stinkbug.png')
>>> cimg2 = ca.carray(img2)
>>> cimg2.nbytes/float(cimg2.cbytes)
2.7716869102466184
And finally, the beautiful NumPy container drawing by Stéfan van der
Walt (slide 31 of his presentation in our latest advanced Python course,
https://portal.g-node.org/python-autumnschool/materials/advanced_numpy):
>>> img3 = mpimg.imread('numpy-container.png')
>>> cimg3 = ca.carray(img3)
>>> cimg3.nbytes/float(cimg3.cbytes)
3.7915321810785132
So, yeah, depending on the images, carray could be a nice way to keep
them in-memory. And although, as I said, multidimensional support is
still preliminary, matplotlib already understands carray beasts:
# plotting image
>>> plt.imshow(cimg3)
<matplotlib.image.AxesImage object at 0x27d2150>
Cheers,
--
Francesc Alted