Re: [SciPy-User] How to create multi-page tiff files with python tools?

David Warde-Farley

Sep 29, 2009, 8:59:16 PM
to SciPy Users List, scikit...@googlegroups.com
On 29-Sep-09, at 2:24 PM, Ralf Gommers wrote:

> On Tue, Sep 29, 2009 at 11:14 AM, Sebastian Haase
> <seb....@gmail.com> wrote:
>
> Hi Sebastian, this is very useful functionality for me as well.
>
> The question I have is if your patched PIL includes fixes for 16-bit
> images. Right now I'm using a patched PIL kindly provided to me by
> Zachary Pincus that fixes 16-bit issues. I saw that some improvements
> for 16-bit were included in PIL trunk but not his patches. Your patch
> is included, it seems, so I could also run PIL trunk if someone can
> confirm that 16-bit TIF images work. I'd prefer Priithon though,
> because then I could stop asking my users to compile PIL themselves...


I've been following this discussion somewhat and I wanted to point out
that (as far as I can remember) image I/O free of PIL dependence was
one of the stated goals of the image scikit. I'm not sure much
progress has been made on that front yet.

It seems that common requirements not being met by PIL are
a) full support for multipage TIFF (loading, creating, saving)
b) 16-bit multipage TIFF

Rather than monkeypatching PIL four ways from Sunday, maybe it would
be best to direct efforts towards building a PIL-free alternative?
Incorporation of very specific code from PIL shouldn't be an issue
given that PIL is quite liberally licensed.

David

(P.S. I'm CCing the scikits-image list as well, should you want to
join it, etc.)

Ralf Gommers

Sep 29, 2009, 10:11:13 PM
to scikit...@googlegroups.com, SciPy Users List

That would be great. I don't know much about PIL internals but I am up for contributing tests and documentation if such an effort is made.

Cheers,
Ralf
 

Stéfan van der Walt

Oct 2, 2009, 7:19:51 PM
to SciPy Users List, scikit...@googlegroups.com, Ralf Gommers
Apologies, this message was meant for the scikits-image list. Please
continue discussions there.

2009/10/3 Stéfan van der Walt <ste...@sun.ac.za>:
> 2009/9/30 Ralf Gommers <ralf.g...@googlemail.com>:
>>> I think the problem is that Fredrik Lundh is the only one who has
>>> permission to add to or change the code base.
>>> I still find it very suspicious that somewhere on the PIL website it
>>> states that you can pay (a lot of money) for a "special license" to
>>> get early access to the development version - so even if you are
>>> providing (free) patches via the mailing list, you would have to pay
>>> to get access to the patched version !?
>>> A couple months ago I asked for an explanation but didn't get a reply.
>>>
>> Yeah that is very odd. An attempt to put the I/O part of PIL in a scikit may
>> be enough of a push to improve that situation. The only other important
>> Python library I can think of that was this inert is setuptools, and look
>> what happened there.
>
> I wonder if we shouldn't take the plunge and add OpenImageIO as a dependency?
>
> Here's the list of features (from their website):
>
> - Extremely simple but powerful ImageInput and ImageOutput APIs for
> reading and writing 2D images that is format agnostic -- that is, a
> "client app" doesn't need to know the details about any particular
> image file formats. Specific formats are implemented by DLL/DSO
> plugins.
>
> - Format plugins for TIFF, JPEG/JFIF, OpenEXR, PNG, HDR/RGBE, Targa,
> JPEG-2000, BMP, and ICO formats. More coming! The plugins are really
> good at understanding all the strange corners of the image formats,
> and are very careful about preserving image metadata (including Exif,
> GPS, and IPTC data).
>
> - An ImageCache class that transparently manages a cache so that it
> can access truly vast amounts of image data (thousands of image files
> totaling hundreds of GB) very efficiently using only a tiny amount
> (tens of megabytes at most) of runtime memory. Additionally, a
> TextureSystem class provides filtered MIP-map texture lookups, atop
> the nice caching behavior of ImageCache.
>
> - Supported on Linux, OS X, and Windows.  All available under the BSD
> license, so you may modify it and use it in both open source or
> proprietary apps.
>
>
> I really don't have much hope for PIL.  The development process is
> closed and slow.  Once you ignore your community, you are pretty much
> done for.  The only reason PIL still exists is because it is useful,
> but let's face it: we can easily rewrite 80% of its capabilities at a
> multi-day sprint.  Perhaps we should.
>
> Regards
> Stéfan
>

Zachary Pincus

Oct 3, 2009, 1:25:13 PM
to scikit...@googlegroups.com
Stéfan:

> I wonder if we shouldn't take the plunge and add OpenImageIO as a
> dependency?

...

> - Supported on Linux, OS X, and Windows. All available under the BSD
> license, so you may modify it and use it in both open source or
> proprietary apps.
>
>
> I really don't have much hope for PIL. The development process is
> closed and slow. Once you ignore your community, you are pretty much
> done for. The only reason PIL still exists is because it is useful,
> but let's face it: we can easily rewrite 80% of its capabilities at a
> multi-day sprint. Perhaps we should.

The only downside to OpenImageIO is that it has some not-always-
standard dependencies, such as boost and cmake (neither of which is
mentioned in the build instructions), which make it a bit tricky to
install, at least from an end-user perspective (especially as a
replacement for PIL, which is just a "python setup.py install" away).
The situation on Windows is also not super-simple. Perhaps a
streamlined build could be shoehorned into distutils (but the boost
thing is still a bit of a pain)...

Any thoughts on this matter? I think it looks like a great library,
and it would be great to have some ctypes wrappers for it, but I'm not
sure how simple a dependency it will wind up being, especially given
that they don't have platform binaries available yet for OpenImageIO...

Zach

Stéfan van der Walt

Oct 3, 2009, 4:27:25 PM
to scikit...@googlegroups.com
Hey Zach

2009/10/3 Zachary Pincus <zachary...@yale.edu>:


> The only downside to OpenImageIO is that it has some not-always-
> standard dependencies, such as boost and cmake (neither of which

I didn't realise there was a boost dependency -- I'd rather avoid
boost if at all possible.

You spent some time writing a replacement IO reader in pure Python, if
I recall correctly; did you have any practically usable results?

Another option may be GraphicsMagick:

http://www.graphicsmagick.org/

Regards
Stéfan

Damian Eads

Oct 3, 2009, 4:36:40 PM
to scikit...@googlegroups.com
One alternative is LIBCVD, which reads and writes many common formats
including BMP, PNG, PPM, JPEG, etc. It has a simple, easy-to-use image
loading function, img_load:

Image <float> img(img_load("image.png"));

It also reads and writes video. All of its dependencies are optional
so a reader/writer is only compiled if the development library is
available during ./configure.

Damian

2009/10/3 Stéfan van der Walt <ste...@sun.ac.za>:
>

--
-----------------------------------------------------
Damian Eads Ph.D. Candidate
University of California Computer Science
1156 High Street Machine Learning Lab, E2-489
Santa Cruz, CA 95064 http://www.soe.ucsc.edu/~eads

Chris Colbert

Oct 3, 2009, 4:55:20 PM
to scikit...@googlegroups.com
There is also imagemagick, which is included in the ubuntu repos:

http://www.imagemagick.org/script/formats.php

which supports a ton of formats, and also has python bindings...

Chris Colbert

Oct 3, 2009, 4:56:56 PM
to scikit...@googlegroups.com
but then again, if we want to include anything from OpenCV we might as
well use that imageIO because it supports quite a bit as well..

Stéfan van der Walt

Oct 3, 2009, 5:31:29 PM
to scikit...@googlegroups.com
2009/10/3 Chris Colbert <scco...@gmail.com>:

>
> but then again, if we want to include anything from OpenCV we might as
> well use that imageIO because it supports quite a bit as well..

This is an important issue that we should clarify: "general" vs.
"specific" dependencies.

With a "general" dependency, I refer to a library that developers are
encouraged to use throughout the scikit code. If OpenCV is chosen as
such a library, we can use its image loading, processing, vision etc.
routines.

On the other hand, a specific dependency states that only a certain
function depends on, for example, OpenCV. We could say: "If you want
to execute optical_flow(...) you'll have to have OpenCV installed."

With a general dependency, code becomes inextricably intertwined, and
you won't be able to get rid of the dependency without invasive
surgery. A specific dependency is much more easily removed.
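
A specific dependency of this kind can be kept local by importing the optional library only inside the function that needs it. The following is a minimal sketch, not scikits.image code: `require` is a hypothetical helper, and `cv2` is assumed to be the import name of the OpenCV bindings.

```python
import importlib


def require(module_name, feature):
    """Import an optional module, or explain which feature needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional package {module_name!r}"
        ) from exc


def optical_flow(prev_frame, next_frame):
    """Hypothetical routine with a specific dependency on OpenCV."""
    cv2 = require("cv2", "optical_flow()")  # imported only when called
    # The computation itself is omitted; this sketch shows the import pattern.
    raise NotImplementedError("illustration only")
```

Because nothing outside `optical_flow` touches the optional module, dropping the dependency later means changing one function rather than the whole package.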

My personal feeling is that we should stay away from general
dependencies, if possible. I don't intend for scikits.image to become
a wrapper around libcvd or opencv -- those wrappers already exist.
Rather, I want to focus on implementing novel image processing
techniques that are not easily available elsewhere. [Of course, if a
function is easy enough to implement and useful for general purpose
image processing (such as the color conversion routines), there's
little reason to exclude it.]

So, for image reading, I'm OK following a path such as:

1. Attempt to use ImageMagick
2. Not found, attempt to use PIL
3. Try built-in png reader (we can adapt matplotlib's)
4. Give up
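
The first three steps amount to probing for backends in a fixed order. A minimal sketch, assuming backends are identified by importable module names; `PythonMagick` and `builtin_png` are illustrative names, not something settled in this thread. Step 4 (giving up) would happen at read time, if the chosen reader cannot handle the file.

```python
import importlib.util

# Preference order from the list above. The module names are assumptions:
# "PythonMagick" stands in for whichever ImageMagick binding is used.
BACKENDS = ("PythonMagick", "PIL")
FALLBACK = "builtin_png"  # hypothetical name for a bundled PNG reader


def is_installed(module_name):
    """Return True if the module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def choose_backend(candidates=BACKENDS, probe=is_installed):
    """Return the first usable backend name, else the built-in fallback."""
    for name in candidates:
        if probe(name):
            return name
    return FALLBACK
```

Passing `probe` as a parameter keeps the selection logic testable on machines where none of the optional libraries are installed.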

I'd like to hear your further opinions regarding dependencies.

Thanks
Stéfan

Chris Colbert

Oct 3, 2009, 6:11:31 PM
to scikit...@googlegroups.com
I agree that the core functionality should not have dependencies. And
I feel that IO falls under this core functionality. So if we choose an
existing library for IO, I think it should be statically linked.

So then it becomes a question of which of the existing libraries
would make it easiest to separate out the IO code in order to
statically link it.

I'm not familiar with other libraries, but all IO functionality is
performed with libhighgui in opencv. This includes all video IO as
well... and also some basic routines for creating minimal gui windows
and widgets.

-Chris

2009/10/3 Stéfan van der Walt <ste...@sun.ac.za>:
>

Damian Eads

Oct 3, 2009, 6:47:33 PM
to scikit...@googlegroups.com
2009/10/3 Stéfan van der Walt <ste...@sun.ac.za>:
>
> 2009/10/3 Chris Colbert <scco...@gmail.com>:
>>
>> but then again, if we want to include anything from OpenCV we might as
>> well use that imageIO because it supports quite a bit as well..
>
> This is an important issue that we should clarify: "general" vs.
> "specific" dependencies.

It is an important distinction. Along these lines, LIBCVD has no
general dependencies other than a C++ compiler and compiles on both
GCC and Visual Studio. If chosen as a specific dependency, it wouldn't
increase the size of our dependency DAG by very much at all.

> With a "general" dependency, I refer to a library that developers are
> encouraged to use throughout the scikit code.  If OpenCV is chosen as
> such a library, we can use its image loading, processing, vision etc.
> routines.
>
> On the other hand, a specific dependency states that only a certain
> function depends on, for example, OpenCV.  We could say: "If you want
> to execute optical_flow(...) you'll have to have OpenCV installed."
>
> With a general dependency, code becomes inextricably intertwined, and
> you won't be able to get rid of the dependency without invasive
> surgery.  A specific dependency is much more easily removed.
>
> My personal feeling is that we should stay away from general
> dependencies, if possible.  I don't intend for scikits.image to become
> a wrapper around libcvd or opencv -- those wrappers already exist.
> Rather, I want to focus on implementing novel image processing
> techniques that are not easily available elsewhere.  [Of course, if a
> function is easy enough to implement and useful for general purpose
> image processing (such as the color conversion routines), there's
> little reason to exclude it.]

Novel image processing algorithms not available elsewhere? Like what?
Can you give examples? If we restrict our attention to novel
algorithms, then we greatly limit the breadth of functionality, and
scikits.image is less likely to be adopted by other researchers.
Python is not currently the preferred language for Computer Vision or
Image Processing. Most researchers use MATLAB, C++, or a combination
of both. We should think of ways to broaden the appeal of Python to
such researchers and the development of scikits.image should reflect
it.

Damian

Stéfan van der Walt

Oct 3, 2009, 7:49:14 PM
to scikit...@googlegroups.com
2009/10/4 Damian Eads <ea...@soe.ucsc.edu>:

> It is an important distinction. Along these lines, LIBCVD has no
> general dependencies other than a C++ compiler and compiles on both
> GCC and Visual Studio. If chosen as a specific dependency, it wouldn't
> increase the size of our dependency DAG by very much at all.

I downloaded both ImageMagick and CVD earlier this evening and started
to compile both. ImageMagick completed fairly quickly, but CVD seems
to take extremely long (could be a platform/compiler specific issue,
I'm not sure. Are they making heavy use of templates?). We could
look at extracting the IO part of CVD or ImageMagick to keep things
light-weight. But like I mentioned earlier, we could just wrap
existing solutions -- the user is bound to have PIL or matplotlib or
imagemagick or ... installed (and we can encourage them to do so in
the readme, for example).

>> My personal feeling is that we should stay away from general
>> dependencies, if possible.  I don't intend for scikits.image to become
>> a wrapper around libcvd or opencv -- those wrappers already exist.
>> Rather, I want to focus on implementing novel image processing
>> techniques that are not easily available elsewhere.  [Of course, if a
>> function is easy enough to implement and useful for general purpose
>> image processing (such as the color conversion routines), there's
>> little reason to exclude it.]
>
> Novel image processing algorithms not available elsewhere? Like what?

Sorry, I should have said "novel OR not easily available elsewhere".
My main thought was that we should not try to replicate the wrappers
for OpenCV, for example.

> Image Processing. Most researchers use MATLAB, C++, or a combination
> of both. We should think of ways to broaden the appeal of Python to
> such researchers and the development of scikits.image should reflect
> it.

Absolutely, but since we can't be everything to all people, I'd rather
make a difference where it is needed: adding algorithms not already
easily accessible to Python users.

Cheers
Stéfan

Damian Eads

Oct 3, 2009, 8:17:23 PM
to scikit...@googlegroups.com, Edward Rosten
2009/10/4 Stéfan van der Walt <ste...@sun.ac.za>:

>
> 2009/10/4 Damian Eads <ea...@soe.ucsc.edu>:
>> It is an important distinction. Along these lines, LIBCVD has no
>> general dependencies other than a C++ compiler and compiles on both
>> GCC and Visual Studio. If chosen as a specific dependency, it wouldn't
>> increase the size of our dependency DAG by very much at all.
>
> I downloaded both ImageMagick and CVD earlier this evening and started
> to compile both.  ImageMagick completed fairly quickly, but CVD seems
> to take extremely long (could be a platform/compiler specific issue,
> I'm not sure.  Are they making heavy use of templates?).

It's not a template issue in this case. The FAST corner detector,
which compiles by default in LIBCVD, requires a lot of memory and
computation. To disable it, try ./configure --disable-fast7
--disable-fast8 --disable-fast9. This should greatly speed up
compilation.

> We could
> look at extracting the IO part of CVD or ImageMagick to keep things
> light-weight.

Could do. LIBCVD is pretty lightweight and much smaller than OpenCV.
For example, there aren't high-level systems like face detectors in
LIBCVD nor will there ever be. LIBCVD just contains basic data
structures (an Image and ImageRef class), basic image/video loaders,
and easy-to-use, highly optimized image processing operators. Its
interface is designed to be functional rather than object-oriented.

> But like I mentioned earlier, we could just wrap
> existing solutions -- the user is bound to have PIL or matplotlib or
> imagemagick or ... installed (and we can encourage them to do so in
> the readme, for example).

That could work too. :)

>>> My personal feeling is that we should stay away from general
>>> dependencies, if possible.  I don't intend for scikits.image to become
>>> a wrapper around libcvd or opencv -- those wrappers already exist.
>>> Rather, I want to focus on implementing novel image processing
>>> techniques that are not easily available elsewhere.  [Of course, if a
>>> function is easy enough to implement and useful for general purpose
>>> image processing (such as the color conversion routines), there's
>>> little reason to exclude it.]
>>
>> Novel image processing algorithms not available elsewhere? Like what?
>
> Sorry, I should have said "novel OR not easily available elsewhere".
> My main thought was that we should not try to replicate the wrappers
> for OpenCV, for example.

Agreed, I don't think we want to replicate efforts to wrap existing
libraries. However, I think rewrapping is acceptable if we offer a
much simpler, functional interface than what has been done before. One
of the reasons why MATLAB is so popular is its functional style and
use of arrays to represent most data. If we can greatly reduce
boilerplating then duplicating efforts may be worthwhile.

>> Image Processing. Most researchers use MATLAB, C++, or a combination
>> of both. We should think of ways to broaden the appeal of Python to
>> such researchers and the development of scikits.image should reflect
>> it.
>
> Absolutely, but since we can't be everything to all people, I'd rather
> make a difference where it is needed: adding algorithms not already
> easily accessible to Python users.

Yes, I still need to integrate the morphology code into your branch,
once I get around to figuring out how Git works.

Damian

Stéfan van der Walt

Oct 3, 2009, 8:53:24 PM
to scikit...@googlegroups.com
2009/10/4 Damian Eads <ea...@soe.ucsc.edu>:

> Agreed, I don't think we want to replicate efforts to wrap existing
> libraries. However, I think rewrapping is acceptable if we offer a
> much simpler, functional interface than what has been done before. One
> of the reasons why MATLAB is so popular is its functional style and
> use of arrays to represent most data. If we can greatly reduce
> boilerplating then duplicating efforts may be worthwhile.

Right on.

> Yes, I still need to integrate the morphology code into your branch,
> once I get around to figuring out how Git works.

The easiest way may be to:

1. Branch off the current master
2. Copy your changes in and commit as necessary
3. Push back to the server using

   git push origin <your_current_branch_name>

4. Click on "Request Merge"

The most important thing is not to merge with my or other branches
while developing. If you feel you'd like to provide a patch that
would apply cleanly, you can rebase, as long as you are aware of the
problems that can cause (for example, never rebase published changes).

Re: morphology -- should we consider including the code from the
other Python library as well?

Regards
Stéfan

Gary Ruben

Oct 3, 2009, 10:39:33 PM
to scikits-image
Re: IO, has anyone looked into any of the binary file parser libraries
for Python?
For example, there's pyffi, construct and bdec.
Pyffi http://pyffi.sourceforge.net/ looks to me like the best
candidate if this approach was to be considered and it's BSD licensed.
The advantages are that this approach should be robust against faulty
files, there's a gui file editor, it provides access to all the file
contents (not just the image planes) and it may provide a nice general
way to read more general (non-image) binary files in numpy. A possible
disadvantage is that it doesn't take advantage of any of numpy's
binary file machinery so it may be slower, but maybe this could be
improved. It's not clear whether specifying the file format with
something like this makes life easier, but I thought I'd put it out
there.

Construct may be worth a look, but I can't see any license info.
http://construct.wikispaces.com/

There's also bdec, but it's lgpl'ed so not a candidate:
http://www.hl.id.au/projects/bdec/

Gary

On Oct 4, 11:53 am, Stéfan van der Walt <ste...@sun.ac.za> wrote:
> 2009/10/4 Damian Eads <e...@soe.ucsc.edu>:

Ralf Gommers

Oct 5, 2009, 3:52:52 AM
to scikit...@googlegroups.com
On Sun, Oct 4, 2009 at 4:39 AM, Gary Ruben <gary....@gmail.com> wrote:

> Re: IO, has anyone looked into any of the binary file parser libraries
> for Python?
> For example, there's pyffi, construct and bdec.
> Pyffi http://pyffi.sourceforge.net/ looks to me like the best
> candidate if this approach was to be considered and it's BSD licensed.
> The advantages are that this approach should be robust against faulty
> files, there's a gui file editor, it provides access to all the file
> contents (not just the image planes) and it may provide a nice general
> way to read more general (non-image) binary files in numpy. A possible
> disadvantage is that it doesn't take advantage of any of numpy's
> binary file machinery so it may be slower, but maybe this could be
> improved. It's not clear whether specifying the file format with
> something like this makes life easier, but I thought I'd put it out
> there.

With binary machinery do you mean memmap? I thought that that only helps you when the file does not fit in memory. Other than that numpy only has save/load which uses a very simple format for saving/loading ndarrays.

 
> Construct may be worth a look, but I can't see any license info.
> http://construct.wikispaces.com/

No activity there in 18 months it seems.

Another project that looks good is Hachoir http://bitbucket.org/haypo/hachoir/ .

These binary parsers seem like they make it easier to add new file formats, however none of them have parsers for image formats yet so it would be a lot of work.

Cheers,
Ralf

Gary Ruben

Oct 5, 2009, 7:13:44 AM
to scikit...@googlegroups.com
See inline comments below.

Ralf Gommers wrote:
> On Sun, Oct 4, 2009 at 4:39 AM, Gary Ruben <gary....@gmail.com
> <mailto:gary....@gmail.com>> wrote:
>
> Re: IO, has anyone looked into any of the binary file parser libraries
> for Python?
> For example, there's pyffi, construct and bdec.
> Pyffi http://pyffi.sourceforge.net/ looks to me like the best
> candidate if this approach was to be considered and it's BSD licensed.
> The advantages are that this approach should be robust against faulty
> files, there's a gui file editor, it provides access to all the file
> contents (not just the image planes) and it may provide a nice general
> way to read more general (non-image) binary files in numpy. A possible
> disadvantage is that it doesn't take advantage of any of numpy's
> binary file machinery so it may be slower, but maybe this could be
> improved. It's not clear whether specifying the file format with
> something like this makes life easier, but I thought I'd put it out
> there.
>
> With binary machinery do you mean memmap? I thought that that only helps
> you when the file does not fit in memory. Other than that numpy only has
> save/load which uses a very simple format for saving/loading ndarrays.

No. I just meant numpy's binary file reading straight into numpy arrays,
rather than indirectly via some python-native container that I assume
requires intermediate storage using python types before being
transferred to numpy types.
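
The difference can be illustrated with a small buffer. This is a sketch, with an in-memory byte string standing in for file contents: `struct.unpack` produces Python integers that are then converted, whereas `np.frombuffer` interprets the same bytes as an array without the intermediate objects (`np.fromfile` does the equivalent for a file on disk).

```python
import struct

import numpy as np

# Four little-endian unsigned 16-bit samples, standing in for file contents
raw = struct.pack("<4H", 10, 20, 30, 40)

# Route 1: unpack into Python ints first, then build the array from them
via_python = np.array(struct.unpack("<4H", raw), dtype=np.uint16)

# Route 2: interpret the bytes directly as little-endian uint16 values
direct = np.frombuffer(raw, dtype="<u2")
```

Both routes give the same values; the second skips creating one Python object per sample, which is where a generic parser library would be expected to lose time on large images.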

> Construct may be worth a look, but I can't see any license info.
> http://construct.wikispaces.com/
>
> No activity there in 18 months it seems.
>
> Another project that looks good is Hachoir
> http://bitbucket.org/haypo/hachoir/ .
>
> These binary parsers seem like they make it easier to add new file
> formats, however none of them have parsers for image formats yet so it
> would be a lot of work.
>
> Cheers,
> Ralf

Looks like construct does have some image types supported
<http://sebulbasvn.googlecode.com/svn/trunk/construct/formats/graphics/>
but they may not be very robust. Anyway, I agree that it would be a lot
of work, but if the work has to be done anyway, I don't think that's an
argument against this approach if there are clear benefits.

Gary

Damian Eads

Oct 5, 2009, 8:58:15 AM
to scikit...@googlegroups.com
Ed Rosten meant to send this to everyone so I'm forwarding this.


---------- Forwarded message ----------
From: Edward Rosten <edward...@gmail.com>
Date: Mon, Oct 5, 2009 at 10:33 AM
Subject: Re: OpenImageIO
To: Damian Eads <ea...@soe.ucsc.edu>


On Sun, Oct 4, 2009 at 1:17 AM, Damian Eads <ea...@soe.ucsc.edu> wrote:

>> I downloaded both ImageMagick and CVD earlier this evening and started
>> to compile both.  ImageMagick completed fairly quickly, but CVD seems
>> to take extremely long (could be a platform/compiler specific issue,
>> I'm not sure.  Are they making heavy use of templates?).
>
> It's not a template issue in this case. The FAST corner detector,
> which compiles by default in LIBCVD, requires a lot of memory and
> computation. To disable, try ./configure --disable-fast7
> --disable-fast8 --disable-fast9. This should greatly speed up
> compilation.

There are some compiler specific issues. Quite a few from the gcc 3.x
series failed to compile the code at all (getting stuck in an infinite
loop or failing entirely with an internal error). GCC 4.0.x and 4.1.x
are pretty slow with this code. The more recent iterations of 4.x work
well.

By the way, the makefile supports parallel builds flawlessly, which
speeds things up a great deal. The entire thing compiles in 30 seconds
on a fast, single socket computer.

>> We could
>> look at extracting the IO part of CVD or ImageMagick to keep things
>> light-weight.
>
> Could do. LIBCVD is pretty lightweight and much smaller than OpenCV.

It is possible, not that hard even. Due to the amount of generic code
in libCVD, once you have the image IO and video loading (with the
requisite colourspace conversion code), you will have most of the
compilable source code. This step wouldn't save you very much over
all. Much of the remaining code is in templates, so it won't add
anything unless you try to instantiate specific algorithms.

If you are still concerned about the extras, then I could put an
option in configure to compile only the image IO part, or we could
figure some other way of making it work. Although the build system is
quite monolithic, the library itself is very modular. I think it would
be worth trying to prevent fragmentation if possible, since it makes
maintenance, feature enhancements etc easier.

> For example, there aren't high-level systems like face detectors in
> LIBCVD nor will there ever be. LIBCVD just contains basic data
> structures (an Image and ImageRef class), basic image/video loaders,
> and easy-to-use, highly optimized image processing operators. Its
> interface is designed to be functional rather than object-oriented.

I would like to add that libCVD will stay this way.

-Ed

--
(You can't go wrong with psycho-rats.)  (http://mi.eng.cam.ac.uk/~er258)

/d{def}def/f{/Times findfont s scalefont setfont}d/s{11}d/r{roll}d f 2/m
{moveto}d -1 r 230 350 m 0 1 179{1 index show 88 rotate 4 mul 0 rmoveto}
 for /s 12 d f pop 235 420 translate 0 0 moveto 1 2 scale show showpage

--

Zachary Pincus

Oct 5, 2009, 10:04:16 AM
to scikit...@googlegroups.com
> You spent some time writing a replacement IO reader in pure Python, if
> I recall correctly; did you have any practically usable results?

I looked into this for a while, and came to the conclusion that it
would be very annoying yet technically simple to write a bunch of
basic image format parsers in pure python (using the PIL image plugins
as a guide). Any compression beyond what exists in the python stdlib
(which is to say, zlib, basically), though, becomes rather more of a
pain -- either you'd have to disallow jpeg IO, or write/wrap a jpeg
decoder -- neither of which sound particularly fun.

That is, I think that one could write simple PNG and TIFF decoders
(which do not support all the corners of the spec, but neither do
those in the PIL) in pure python / numpy in a day or so. This would be
useful for many people, but lacking jpeg would be a big issue. Perhaps
we could grab just the C core of some jpeg decoder/encoder somewhere
and use that?
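
To give a sense of how little code a restricted decoder needs, here is a sketch using only `struct` and `zlib` from the standard library. It is an illustration under narrow assumptions, not the reader discussed above: it handles only 8-bit grayscale, non-interlaced PNGs whose scanlines all use filter type 0, and it does not verify chunk CRCs. `write_gray8` and `read_gray8` are hypothetical names; the writer exists only so the example is self-contained.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def write_gray8(rows):
    """Encode equal-length rows of 0-255 ints as a grayscale PNG."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (gray), no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 ("None")
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def read_gray8(data):
    """Decode the restricted PNG subset described above into rows of ints."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, header, idat = 8, None, b""
    while pos < len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if kind == b"IHDR":
            header = struct.unpack(">IIBBBBB", body)
        elif kind == b"IDAT":
            idat += body  # image data may be split over several chunks
        elif kind == b"IEND":
            break
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color, _, _, interlace = header
    if (depth, color, interlace) != (8, 0, 0):
        raise ValueError("only 8-bit grayscale, non-interlaced PNGs are handled")
    raw = zlib.decompress(idat)
    stride = width + 1  # one filter byte plus one byte per pixel
    rows = []
    for y in range(height):
        line = raw[y * stride:(y + 1) * stride]
        if line[0] != 0:
            raise ValueError("scanline filters other than 0 are not handled")
        rows.append(list(line[1:]))
    return rows
```

A usable reader would also need the other four scanline filters, further bit depths and color types, and CRC checks, which is where the "annoying" part of the work lies.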

Otherwise, I think the best option is to find a simple, dependency-
free C image IO library to wrap. CVD looks OK here.

Zach

Stéfan van der Walt

Oct 5, 2009, 12:14:54 PM
to scikit...@googlegroups.com
Hi Zach

2009/10/5 Zachary Pincus <zachary...@yale.edu>:


>
>> You spent some time writing a replacement IO reader in pure Python, if
>> I recall correctly; did you have any practically usable results?

[...]

> That is, I think that one could write simple PNG and TIFF decoders
> (which do not support all the corners of the spec, but neither do
> those in the PIL) in pure python / numpy in a day or so. This would be
> useful for many people, but lacking jpeg would be a big issue. Perhaps
> we could grab just the C core of some jpeg decoder/encoder somewhere
> and use that?

libjpeg and libpng are both fairly easy to wrap with a couple of
cython / ctypes calls, so I might just do that at a next sprint.

Looking back at this conversation, I believe a plugin system would be
a practical solution that can be implemented right away. For example:
plugins (be it for PIL, CVD, Magick, etc.) are asked to load an image.
If a plugin fails because the format is not supported, it raises a
FormatError and the next plugin is used.

With a plugin system in place, we can later replace as much of the
functionality under the hood as we want, while having developed a
consistent interface that can be exposed to the user right away (via
imread).
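
A rough sketch of that loop, assuming each plugin is a plain callable that takes a file name. `png_plugin` and `catch_all_plugin` are toy stand-ins, and the real plugin interface is still open:

```python
class FormatError(Exception):
    """Raised by a plugin that cannot handle the given file format."""


def imread(fname, plugins):
    """Ask each plugin in turn; return the first result that succeeds."""
    failures = []
    for plugin in plugins:
        try:
            return plugin(fname)
        except FormatError as exc:
            failures.append(f"{plugin.__name__}: {exc}")
    details = "; ".join(failures)
    raise FormatError(f"no plugin could read {fname!r} ({details})")


# Toy plugins standing in for PIL, CVD, Magick, etc.
def png_plugin(fname):
    if not fname.endswith(".png"):
        raise FormatError("only PNG is supported")
    return ("png_plugin", fname)


def catch_all_plugin(fname):
    return ("catch_all_plugin", fname)
```

Only `FormatError` triggers the fallback here; any other exception propagates, so genuine bugs in a plugin are not silently masked by the next one.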

Let me know your thoughts.

Regards
Stéfan

Ralf Gommers

Oct 7, 2009, 5:48:07 AM
to scikit...@googlegroups.com


2009/10/5 Stéfan van der Walt <ste...@sun.ac.za>


> Hi Zach
>
> 2009/10/5 Zachary Pincus <zachary...@yale.edu>:
>>
>>> You spent some time writing a replacement IO reader in pure Python, if
>>> I recall correctly; did you have any practically usable results?
>
> [...]
>
>> That is, I think that one could write simple PNG and TIFF decoders
>> (which do not support all the corners of the spec, but neither do
>> those in the PIL) in pure python / numpy in a day or so. This would be
>> useful for many people, but lacking jpeg would be a big issue. Perhaps
>> we could grab just the C core of some jpeg decoder/encoder somewhere
>> and use that?
>
> libjpeg and libpng are both fairly easy to wrap with a couple of
> cython / ctypes calls, so I might just do that at a next sprint.
>
> Looking back at this conversation, I believe a plugin system would be
> a practical solution that can be implemented right away.  For example:
> plugins (be it for PIL, CVD, Magick, etc.) are asked to load an image.
> If a plugin fails because the format is not supported, it raises a
> FormatError and the next plugin is used.

A plugin system sounds like a good idea. Maybe it needs a little more than waiting for a format error, because it is possible for a format to be supported but in a buggy way. Then you'd get back an array filled with garbage.

It should be possible for the user to specify the order in which libraries are tried, to exclude libraries completely, as well as easily register their own library as a plugin.
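
The three requirements in the previous paragraph (user-defined order, exclusion, registration of custom libraries) could be met by a small registry. The class below is a hypothetical sketch with invented method names, not an agreed interface, and it does not address the separate problem of a plugin returning garbage without raising an error.

```python
class PluginRegistry:
    """Keeps named reader callables and the order in which to try them."""

    def __init__(self):
        self._readers = {}  # name -> callable
        self._order = []    # names, first tried first

    def register(self, name, reader, first=False):
        """Add (or replace) a reader; optionally try it before the others."""
        self._readers[name] = reader
        if name in self._order:
            self._order.remove(name)
        if first:
            self._order.insert(0, name)
        else:
            self._order.append(name)

    def exclude(self, name):
        """Stop trying a reader without forgetting it."""
        if name in self._order:
            self._order.remove(name)

    def set_order(self, names):
        """Replace the search order; every name must be registered."""
        unknown = [n for n in names if n not in self._readers]
        if unknown:
            raise KeyError(f"unregistered plugins: {unknown}")
        self._order = list(names)

    def active(self):
        """Return the names that would be tried, in order."""
        return list(self._order)
```

Keeping excluded readers in `_readers` means a user can re-enable one later with `set_order` without registering it again.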


> With a plugin system in place, we can later replace as much of the
> functionality under the hood as we want, while having developed a
> consistent interface that can be exposed to the user right away (via
> imread).

Do you want a single function for everything, or different functions for single-page / multi-page images? Having to do something like:

img = open(fname)
img2d = imread(img)
img.seek()
img2d = imread(img)
img.seek()

would be less than ideal.

Anyway, a big thumbs up for a plugin system no matter what the interface will look like exactly.

Cheers,
Ralf

Stéfan van der Walt

Oct 7, 2009, 7:34:31 AM
to scikit...@googlegroups.com
2009/10/7 Ralf Gommers <ralf.g...@googlemail.com>:

> Do you want a single function for everything, or different functions for
> single-page / multi-page images? Having to do something like:
>
> img = open(fname)
> img2d = imread(img)
> img.seek()
> img2d = imread(img)
> img.seek()
>
> would be less than ideal.

I have some code waiting to be merged that implements an
ImageCollection. Typically, you'd have

ic = ImageCollection('*.png')

where all PNGs are accessed only as necessary, and are cached once
they've been read from disk. You can also index into or iterate over
an ImageCollection (yielding the image arrays). It sounds like a
multi-image could be interpreted as an ImageCollection.
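
For illustration, the load-on-demand behaviour described here could look roughly like the sketch below. This is not the code waiting to be merged: the class name `LazyImageCollection` and the `load_func` argument are assumptions, and any function that turns a file name into an array could be passed in.

```python
import glob


class LazyImageCollection(object):
    """Load files matching a pattern on demand and cache the results."""

    def __init__(self, pattern, load_func):
        # Only the file names are collected here; nothing is read yet.
        self._files = sorted(glob.glob(pattern))
        self._load_func = load_func
        self._cache = {}

    def __len__(self):
        return len(self._files)

    def __getitem__(self, n):
        fname = self._files[n]
        if fname not in self._cache:
            # First access: read from disk and remember the result.
            self._cache[fname] = self._load_func(fname)
        return self._cache[fname]

    def __iter__(self):
        for n in range(len(self)):
            yield self[n]
```

A multi-page file could expose the same indexing and iteration interface, with the frame number taking the place of the file name as the cache key.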

> Anyway, a big thumbs up for a plugin system no matter what the interface
> will look like exactly.

OK, I'll implement this over the weekend. If someone else has time,
feel free to jump in.

Cheers
Stéfan

Ralf Gommers

Oct 9, 2009, 5:35:09 AM
to scikit...@googlegroups.com


2009/10/7 Stéfan van der Walt <ste...@sun.ac.za>


2009/10/7 Ralf Gommers <ralf.g...@googlemail.com>:
> Do you want a single function for everything, or different functions for
> single-page / multi-page images? Having to do something like:
>
> img = open(fname)
> img2d = imread(img)
> img.seek()
> img2d = imread(img)
> img.seek()
>
> would be less than ideal.

I have some code waiting to be merged that implements an
ImageCollection.  Typically, you'd have

ic = ImageCollection('*.png')

where all PNGs are accessed only as necessary, and are cached once
they've been read from disk.  You can also index into or iterate over
an ImageCollection (yielding the image arrays).  It sounds like a
multi-image could be interpreted as an ImageCollection.

That sounds like a good option. Let me know if you want me to test it / work on it / send you some multi-image files.

Cheers,
Ralf
 

Stéfan van der Walt

Oct 9, 2009, 5:48:02 AM
to scikit...@googlegroups.com
Hey Ralf

2009/10/9 Ralf Gommers <ralf.g...@googlemail.com>:


>> where all PNGs are accessed only as necessary, and are cached once
>> they've been read from disk.  You can also index into or iterate over
>> an ImageCollection (yielding the image arrays).  It sounds like a
>> multi-image could be interpreted as an ImageCollection.
>
> That sounds like a good option. Let me know if you want me to test it / work
> on it / send you some multi-image files.

I'd appreciate it if you could investigate a bit further. The code I
was referring to is at

http://bazaar.launchpad.net/~stefanv/supreme/main/annotate/head%3A/supreme/misc/io.py

As you can see, it is very simplistic. It also returns a bunch of
Image objects, that we don't need. But the basic idea is there: a
container over which you can iterate, that loads images on demand and
keeps a cache as necessary. I've never played with loading of
multi-layer images, so I hope you can get something going.

Cheers
Stéfan

Ralf Gommers

Oct 9, 2009, 7:19:23 AM
to scikit...@googlegroups.com


2009/10/9 Stéfan van der Walt <ste...@sun.ac.za>

Sure, I'll give it a go. I cloned your scikits.image repo on GitHub; I will add a new branch and push to my cloned repo once it works. That is the best way to do it, right?

Another git question, for scipy I followed this guide: http://projects.scipy.org/numpy/wiki/GitMirror. Now I have it here: http://github.com/rgommers/scipy. Would it not be better to clone another scipy repo already on github, like David's or Pauli's? Or does it not matter?  

Cheers,
Ralf



Stéfan van der Walt

Oct 9, 2009, 8:04:10 AM
to scikit...@googlegroups.com
Hi Ralf

2009/10/9 Ralf Gommers <ralf.g...@googlemail.com>:


> Sure, I'll give it a go. I cloned your scikits.image repo on github, will
> add a new branch and push to my cloned repo once it works. That is the best
> way to do it, right?

I added some sparse instructions to

http://stefanv.github.com/scikits.image/contribute.html#development-process

but patches are welcome to flesh out the description.

> Another git question, for scipy I followed this guide:
> http://projects.scipy.org/numpy/wiki/GitMirror. Now I have it here:
> http://github.com/rgommers/scipy. Would it not be better to clone another
> scipy repo already on github, like David's or Pauli's? Or does it not
> matter?

The idea is that, eventually, we have an official git repo that
everybody clones. As is, it seems we all have our own clones hanging
around, but David and Pauli's were probably made from the official
scipy.git repo. I agree, though, that the instructions can be
improved -- a lot! Hopefully we'll be switching to git and redmine
soon, then these problems will go away.

Cheers
Stéfan

Ralf Gommers

Oct 9, 2009, 2:36:52 PM
to scikit...@googlegroups.com


2009/10/9 Stéfan van der Walt <ste...@sun.ac.za>

Thanks Stefan, that was a useful start. I added a MultiImg class which is quite similar to your ImgCollection. There are enough differences between a multi-image file and a collection of single image files to justify creating a separate class I think. The code is here:

http://github.com/rgommers/scikits.image/blob/imgcollection/scikits/image/io/io.py

It works with my multi-frame TIFF files (only PIL trunk, not 1.1.6), and once I figure out how to create a correct TIFF header/file (does anyone have code for this?) I can add a self-contained example and tests.

Things that would be useful to add:
- caching a configurable number of frames (now 1 or all)
- a dtype keyword
- switch to the new IO plugin system once it's ready
- add a MultiImgCollection
- what else?
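
As a rough illustration of the first item, a cache that holds a configurable number of frames could be built on `collections.OrderedDict`. The `FrameCache` class and its `load_frame` callback are hypothetical and not part of the branch linked above; this is one possible design under those assumptions.

```python
from collections import OrderedDict


class FrameCache(object):
    """Keep at most `maxsize` frames, evicting the least recently used."""

    def __init__(self, load_frame, maxsize=1):
        self._load_frame = load_frame  # callable: frame number -> frame
        self._maxsize = maxsize
        self._cache = OrderedDict()

    def get(self, n):
        if n in self._cache:
            self._cache.move_to_end(n)  # mark as most recently used
            return self._cache[n]
        frame = self._load_frame(n)
        self._cache[n] = frame
        if len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)  # drop the least recently used
        return frame
```

Setting `maxsize` to 1 or to the number of frames would reproduce the two modes that exist now.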

Questions:
- do you want to keep the Image class in that form? It seems either a plain ndarray or ndarray + tags dict is enough.
- can I remove the EXIF stuff or move it to a subclass of Image? I don't think it belongs in the base Image class.
- should imread be moved into io.py?

I'd appreciate any feedback on the basic design and new feature suggestions.

Cheers,
Ralf





Stéfan van der Walt

Oct 10, 2009, 2:12:04 AM
to scikit...@googlegroups.com
Hey Ralf

2009/10/9 Ralf Gommers <ralf.g...@googlemail.com>:
> http://github.com/rgommers/scikits.image/blob/imgcollection/scikits/image/io/io.py

Thanks for working on this!

> Questions:
> - do you want to keep the Image class in that form? It seems either a plain
> ndarray or ndarray + tags dict is enough.

I'd like to remove the image class entirely.

> - can I remove the EXIF stuff or move it to a subclass of Image? I don't
> think it belongs in the base Image class.

Yes, although having an EXIF reader as a separate function might be
handy! That code is BSD-licensed AFAIK.

> - should imread be moved into io.py?

Let's leave it where it is for now. It is accessible as
scikits.image.io.imread, which is fine from a user perspective.

Other notes:

If you require PIL trunk, you need to check that it is available
explicitly. Also, the PIL import test is already done by imread.

About naming: I'd prefer if we expand the names, i.e. MultiImage
instead of MultiImg. I've learnt this the hard way, but it seems I
can never remember my own shorthand :)

The example markup does not require the "::".

The description of MultiImage could be changed to reflect what it is
storing, i.e. something like

class MultiImage(object):
    """Class for loading multi-layer images."""

When using try-except statements, keep the code snippet contained as
small as possible. In this case, there's no problem really, because
you specifically wait for an EOFError. In general, however, it's
safer to use the form:

i = 0
while True:
    i += 1
    try:
        img.seek(i)
    except EOFError:
        break
return i

Not sure whether you'll ever come across images without any frames
inside, but in those cases you need a return statement as well, as
above.
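
The loop above can be exercised without PIL by using a stand-in object. In the sketch below, `FakeMultiPageImage` is a hypothetical substitute that mimics PIL's behaviour of raising `EOFError` when seeking past the last frame; `count_frames` is just the loop wrapped in a function. It assumes frame 0 always exists, so an empty file would still be reported as having one frame.

```python
class FakeMultiPageImage(object):
    """Stand-in for a PIL image: seek() raises EOFError past the last frame."""

    def __init__(self, n_frames):
        self._n_frames = n_frames

    def seek(self, i):
        if i >= self._n_frames:
            raise EOFError("no more frames")


def count_frames(img):
    """Count frames by seeking forward until EOFError is raised."""
    i = 0
    while True:
        i += 1
        try:
            img.seek(i)  # only the call that can fail sits inside the try
        except EOFError:
            break
    return i
```

Keeping a single statement inside the `try` means an unrelated `EOFError` raised elsewhere in the loop body cannot be mistaken for the end of the file.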

_getallframes can be simplified using _getframe:

frames = []
for i in range(len(self)):
    frames.append(self._getframe(i))
return frames

The numframes variable should not be exposed, since len() is already available.

The string representation can also include information on the number
of frames, e.g.

cat.tiff [50 frames]

Finally, ensure that read-only members are defined as properties:

@property
def filename(self):
    return self._filename
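
Put together, the suggestions above could give a skeleton like the following. It is only a sketch: `MultiImageSketch` takes its frames as a constructor argument instead of reading them from disk, which is an assumption made to keep the example self-contained.

```python
class MultiImageSketch(object):
    """Hypothetical skeleton of a multi-frame image container."""

    def __init__(self, filename, frames):
        self._filename = filename
        self._frames = list(frames)  # a real class would read these lazily

    @property
    def filename(self):
        # No setter is defined, so assigning to .filename raises AttributeError.
        return self._filename

    def __len__(self):
        # len() replaces a public numframes attribute.
        return len(self._frames)

    def _getframe(self, n):
        return self._frames[n]

    def _getallframes(self):
        return [self._getframe(i) for i in range(len(self))]

    def __str__(self):
        return "%s [%d frames]" % (self._filename, len(self))
```

For a 50-frame file named cat.tiff, `str()` on the object yields the "cat.tiff [50 frames]" form suggested above.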

As always with review comments: they may be overly pedantic, so use
what you find applicable and discard the rest.

Cheers
Stéfan

Ralf Gommers

Oct 10, 2009, 3:56:39 AM
to scikit...@googlegroups.com

Hi Stefan,

2009/10/10 Stéfan van der Walt <ste...@sun.ac.za>


Hey Ralf

2009/10/9 Ralf Gommers <ralf.g...@googlemail.com>:
Thanks for working on this!

> Questions:
> - do you want to keep the Image class in that form? It seems either a plain
> ndarray or ndarray + tags dict is enough.

I'd like to remove the image class entirely.

> - can I remove the EXIF stuff or move it to a subclass of Image? I don't
> think it belongs in the base Image class.

Yes, although having an EXIF reader as a separate function might be
handy!  That code is BSD-licensed AFAIK.

What I added is BSD-licensed as well.
 
> - should imread be moved into io.py?

Let's leave it where it is for now.  It is accessible as
scikits.image.io.imread, which is fine from a user perspective.

Other notes:

If you require PIL trunk, you need to check that it is available
explicitly.  Also, the PIL import test is already done by imread.

OK. I'll move the import into the MultiImg class then, so ImageCollection still works if trunk is not available.
 
About naming: I'd prefer if we expand the names, i.e. MultiImage
instead of MultiImg.  I've learnt this the hard way, but it seems I
can never remember my own shorthand :)

The reason was the Image class, which conflicted with the PIL import. That is solved now, so I'll expand all the names again.

The example markup does not require the "::".

I saw Pauli do that for examples that are not self-contained, i.e. can't be run with doctest. Alternatively I can use the "# doctest: +SKIP" markup (ugly as well...).
 
The description of MultiImage could be changed to reflect what it is
storing, i.e. something like

class MultiImage(object):
  """Class for loading multi-layer images."""

When using try-except statements, keep the code snippet contained as
small as possible.  In this case, there's no problem really, because
you specifically wait for an EOFError.  In general, however, it's
safer to use the form:

i = 0
while True:
   i += 1
   try:
       img.seek(i)
   except EOFError:
       break
return i

Sure, I'll change that.
 
Not sure whether you'll ever come across images without any frames
inside, but in those cases you need a return statement as well, as
above.

_getallframes can be simplified using _getframe:

frames = []
for i in range(len(self)):
   frames.append(self._getframe(i))
return frames

_getframe opens and closes the file each time, so _getallframes should be a little faster. And it's still simple code, so I think it's worth the few lines of duplication.

The numframes variable should not be exposed, since len() is already available.

Sure, I'll make it private.
 
The string representation can also include information on the number
of frames, e.g.

cat.tiff [50 frames]

Makes sense.
 
Finally, ensure that read-only members are defined as properties:

@property
def filename(self):
   return self._filename

Sure.

As always with review comments: they may be overly pedantic, so use
what you find applicable and discard the rest.

Don't worry, I find the above very useful. Thanks for the feedback.

Cheers,
Ralf

 

Stéfan van der Walt

Oct 11, 2009, 4:50:42 AM
to scikit...@googlegroups.com
2009/10/10 Ralf Gommers <ralf.g...@googlemail.com>:

>> The example markup does not require the "::".
>>
> I saw Pauli do that for examples that are not self-contained, i.e. can't be
> run with doctest. Alternatively I can use the "# doctest: +SKIP" markup
> (ugly as well...).

OK, that's fine then!

Let me know when you're done, then I'll have a look at the patch.

Cheers
Stéfan

Ralf Gommers

Oct 11, 2009, 8:41:03 PM
to scikit...@googlegroups.com


2009/10/11 Stéfan van der Walt <ste...@sun.ac.za>

It's done. Nitpick away!

Also, what do you think about adding a dtype keyword to imread? I find it useful to be able to get images as float for example so you don't have to worry about division problems.

Cheers,
Ralf
 


Stéfan van der Walt

Oct 12, 2009, 2:25:04 AM
to scikit...@googlegroups.com
2009/10/12 Ralf Gommers <ralf.g...@googlemail.com>:

> Also, what do you think about adding a dtype keyword to imread? I find it
> useful to be able to get images as float for example so you don't have to
> worry about division problems.

That sounds like a useful addition. It should probably default to
int8 or uint8 -- whatever is currently returned.

Stéfan

Stéfan van der Walt

Oct 12, 2009, 3:09:06 AM
to scikit...@googlegroups.com
2009/10/12 Ralf Gommers <ralf.g...@googlemail.com>:

>> Let me know when you're done, then I'll have a look at the patch.
>
> It's done. Nitpick away!

Thanks, Ralf! I've merged your changes:

http://github.com/stefanv/scikits.image/commits/

Cheers
Stéfan

Ralf Gommers

Oct 12, 2009, 7:59:15 AM
to scikit...@googlegroups.com


2009/10/12 Stéfan van der Walt <ste...@sun.ac.za>
Thanks. I've fixed a test that broke due to the io -> collection rename, and added dtype keywords to imread and MultiImage. Defaults to None, which keeps the current behavior.

Can you pull those changes as well?

Cheers,
Ralf
 

Stéfan van der Walt

Oct 12, 2009, 9:32:14 AM
to scikit...@googlegroups.com
2009/10/12 Ralf Gommers <ralf.g...@googlemail.com>:

> Thanks. I've fixed a test that broke due to the io -> collection rename, and
> added dtype keywords to imread and MultiImage. Defaults to None, which keeps
> the current behavior.
>
> Can you pull those changes as well?

Thanks, done (will push soon).

In the future, it may be easier not to merge with the master branch.
I'm still figuring out the best way to do this, but I think that will
be easier since I can then just merge your branch, instead of cherry
picking out the changes.

Thanks!
Stéfan

Ralf Gommers

Oct 12, 2009, 9:43:22 AM
to scikit...@googlegroups.com
2009/10/12 Stéfan van der Walt <ste...@sun.ac.za>

2009/10/12 Ralf Gommers <ralf.g...@googlemail.com>:

Hmm, not sure how else I would have fixed that test, since it only broke after you renamed io.py in the master branch. Why did you have to cherry pick, instead of just merging back my imgcollection branch into your master?

Disclaimer: I am also quite new to this way of doing things.

Cheers,
Ralf
 


Stéfan van der Walt

Oct 12, 2009, 9:48:38 AM
to scikit...@googlegroups.com
2009/10/12 Ralf Gommers <ralf.g...@googlemail.com>:

>> In the future, it may be easier not to merge with the master branch.
>> I'm still figuring out the best way to do this, but I think that will
>> be easier since I can then just merge your branch, instead of cherry
>> picking out the changes.
>
> Hmm, not sure how else I would have fixed that test, since it only broke
> after you renamed io.py in the master branch. Why did you have to cherry
> pick, instead of just merging back my imgcollection branch into your master?
>
> Disclaimer: I am also quite new to this way of doing things.

You could simply have created a new branch, and made your changes
there. One branch per change (or related set of changes) sounds about
right.

If I simply merged, we would have had messages in the commit log such as:

Stefan merged Ralf's branch.
Ralf merged Stefan's main branch.
Ralf changed this and that.

Now, we just have:

Stefan merged Ralf's branch.
Ralf changed this and that.

Have a look at these two articles:

http://article.gmane.org/gmane.comp.video.dri.devel/34744
http://lwn.net/Articles/328436/

Cheers
Stéfan

Ralf Gommers

Oct 12, 2009, 10:14:04 AM
to scikit...@googlegroups.com
2009/10/12 Stéfan van der Walt <ste...@sun.ac.za>

2009/10/12 Ralf Gommers <ralf.g...@googlemail.com>:

Makes sense, thanks for the lesson :)

Cheers,
Ralf


