Let's take a moment to think about what names our (small) API should expose, and what their signatures should be. I'll start with an overview of the names and sigs for some popular software. Note that most allow additional arguments.
Matlab:
imread(filename,fmt)
imwrite(m, filename, fmt)
PIL:
open(filename, mode=None)
im.save(filename, format=None) # Method of the Image class
visvis:
imread(filename, format=None)
imwrite(filename, array, format=None)
mahotas:
imread(filename, as_grey=False, formatstr=None)
imsave(filename, array, formatstr=None)
tifffile:
imread(filename)
imsave(filename, array)
I would prefer to use names that match up as much as possible with existing packages. Matching existing Python packages is probably more important than matching Matlab. Interestingly, in all Python "write" functions the filename comes before the array to save, while in Matlab the array comes first. For reading, "imread" clearly wins, and the signature is no problem. For writing, both "imwrite" and "imsave" are used. I think I like imsave, because that avoids confusion with Matlab's "imwrite". (I wish I'd thought of that when I made visvis.imwrite.)
It has been argued that we could just use "read" and "write" (or "save"). But I prefer the "im" prefix. Not only for compatibility with existing packages, but also because it allows a user to do "from imageio import imread, imsave".
I think it's probably a good idea to allow the user to explicitly specify the format. Especially when we are going to introduce new specific formats, there may come a day when two formats use the same extension, and some formats (e.g. DICOM) don't usually stick to one extension. Therefore, my proposal is:
imread(filename, format=None, **kwargs)
imsave(filename, array, format=None, **kwargs)
The (optional) keyword arguments are passed to the plugins. For FreeImage formats, they will mostly be converted to FreeImage flags.
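To make the role of the format argument concrete, here is a minimal sketch of how an explicit format could override detection by extension. The extension table and the names resolve_format and plan_imread are hypothetical and only illustrate the idea; they are not an existing imageio API.

```python
import os

# Hypothetical table mapping file extensions to format names.
EXTENSION_TO_FORMAT = {
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
}


def resolve_format(filename, format=None):
    """Pick the format: an explicit format wins, else use the extension."""
    if format is not None:
        return format.upper()
    ext = os.path.splitext(filename)[1].lower()
    if ext not in EXTENSION_TO_FORMAT:
        raise ValueError(
            f"Cannot determine the format of {filename!r}; pass format=..."
        )
    return EXTENSION_TO_FORMAT[ext]


def plan_imread(filename, format=None, **kwargs):
    """Stand-in for imread: report which format and kwargs a plugin would get."""
    return resolve_format(filename, format), kwargs


# The extension decides when no format is given.
print(plan_imread("photo.JPG"))
# An explicit format covers files that do not stick to one extension.
print(plan_imread("slice_0001", format="dicom", some_flag=True))
```

The keyword arguments are passed through untouched, which is the part a plugin would interpret (for FreeImage formats, by turning them into flags).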
(This post is about functionality for reading/writing 2D images. For animated/multipage images (and maybe volume data) I think I'd like to introduce separate functions. But let's discuss that in another thread.)
I think there are arguments in favor of not using different functions,
but rather handling multiple-image formats using keyword arguments to
fetch particular subsets of the data (or even only a single image at a
time). In particular, I think it allows developers using the library
to use fewer code paths, as there can be a single place in the code
where images are read, and the code leading up to it builds up the
keyword arguments as necessary.
For efficiency, it might be worth separating reading/writing image
into separate phases: opening the file and reading/writing the data,
similar to how h5py wraps HDF5 files or matplotlib handles multipage
PDFs. It might be limiting to require all of the data be ready in
order to write a multi-image format.
How about an object with number_of_images(), read_info(img_num) and
read_image(img_num) methods (just to start with something)? The
module's default read_image could just be to load image number 0.
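As a rough sketch of that object, assuming the images are already decoded and held in memory: the class name InMemoryReader and the way it is constructed are made up for illustration, only the three method names come from the suggestion above.

```python
class InMemoryReader:
    """Toy reader over already-decoded images (hypothetical helper)."""

    def __init__(self, images, infos=None):
        self._images = list(images)
        # One metadata dict per image; empty dicts when none are given.
        if infos is None:
            infos = [{} for _ in self._images]
        self._infos = list(infos)

    def number_of_images(self):
        return len(self._images)

    def read_info(self, img_num):
        return self._infos[img_num]

    def read_image(self, img_num=0):
        # Defaulting to image 0 mirrors the proposed module-level default.
        return self._images[img_num]


reader = InMemoryReader(
    images=[[[0, 1], [2, 3]], [[4, 5], [6, 7]]],
    infos=[{"name": "first"}, {"name": "second"}],
)
print(reader.number_of_images())
print(reader.read_image())
print(reader.read_info(1))
```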
Bioformats is written for the bio-image world with an emphasis on 2D
images, possibly with timelapse and/or Z stacks, and so the TZ
coordinates (and series) become first class parts of the interface.
I'd prefer a solution that avoids this sort of data-defined interface.
There are some valid questions that are brought up by it, though. What
should a library do when asked to "read an image" from each of the
following:
- a TIFF file with a single RGB image?
- a TIFF file with 3 Z-stacks?
- a TIFF file with 3 different series, each of a single grayscale image?
- a TIFF file with three channels, but not really RGB (i.e., three
different cell stains)?
- a TIFF file with 3 different grayscale timepoints?
Some of these would reasonably return a MxNx3 array, some might return
a MxN grayscale image. I'm sure there are other random formats where
there's another way to separate images rather than
channel/Z/time/series.
RGB vs. grayscale is a continual pain in the
ass, as well, as sometimes 3-channel data is stored in RGB, even though
it's not actual RGB (just some random 3-channel image).
>> How about an object with number_of_images(), read_info(img_num) and
>> read_image(img_num) methods (just to start with something)? The
>> module's default read_image could just be to load image number 0.
>
> Yes, something like that looks great. Although I'd prefer to make it look
> like an iterator, so the user can use the object in a for-loop, and do
> len(readerObject) to get the number of images.
As long as it's possible to get the metadata about the images, I think
that would be fine.
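For what it's worth, the iterator flavour needs very little on top of that. A minimal sketch, where FrameReader and its read_info method are hypothetical names:

```python
class FrameReader:
    """Toy reader that supports len() and for-loops (hypothetical)."""

    def __init__(self, frames, infos):
        self._frames = list(frames)
        self._infos = list(infos)

    def __len__(self):
        # Lets the user call len(readerObject) for the number of images.
        return len(self._frames)

    def __iter__(self):
        # Yields the images one by one, in file order.
        return iter(self._frames)

    def read_info(self, index):
        # Metadata stays reachable per image, independent of iteration.
        return self._infos[index]


reader = FrameReader(["frame0", "frame1", "frame2"],
                     [{"t": 0.0}, {"t": 0.5}, {"t": 1.0}])
print(len(reader))
for frame in reader:
    print(frame)
print(reader.read_info(2))
```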
TIFF, through private tags, can store all kinds of things. A common
format in biological imaging is the LSM format, which can add all
kinds of extra dimensions. For an example, search for LSMInfo on the
pylibtiff page: http://code.google.com/p/pylibtiff/
Perhaps this is something that imageio doesn't need to deal with, but
there does need to be *some* way to get at the metadata, I think.
The RGB vs. three Z-stacks is made more ambiguous by some strangeness
in TIFF, in that an RGB image can be stored all together (as RGB) or
as separate grayscale images. So I think it's sometimes difficult to
do something correct for get_image(image_index=0). Is that the R
channel of an RGB image, or all the RGB data together?
Maybe all of this can be sidestepped in the interest of simplicity,
but I think it's important to balance the simplicity against the power
of whatever you end up with. TIFF is just an egregiously difficult
case, but one that many scientists have to deal with.
> TIFF, through private tags, can store all kinds of things. A common
> format in biological imaging is the LSM format, which can add all
> kinds of extra dimensions. For an example, search for LSMInfo on the
> pylibtiff page: http://code.google.com/p/pylibtiff/
> Perhaps this is something that imageio doesn't need to deal with, but
> there does need to be *some* way to get at the metadata, I think.
So, if I understand correctly, TIFF allows one to put multiple images in a file, and to store arbitrary tags (metadata) with it. LSM is a format that "defines" some tag names to give them meaning and thereby more explicit structure to the file. If that is the case, then the plugin that deals with TIFF can be relatively agnostic about what the data mean, while the LSM plugin (if we get one) would use the meta information to give meaning to the data and uses it to structure what it returns. Does that make sense?
> The RGB vs. three Z-stacks is made more ambiguous by some strangeness
> in TIFF, in that an RGB image can be stored all together (as RGB) or
> as separate grayscale images. So I think it's sometimes difficult to
> do something correct for get_image(image_index=0). Is that the R
> channel of an RGB image, or all the RGB data together?
Yuk. I suppose the best solution would be to combine them to RGB if the user used imread() and give the separate images if multiple_imread() (or whatever we call it) was used.
> Maybe all of this can be sidestepped in the interest of simplicity,
> but I think it's important to balance the simplicity against the power
> of whatever you end up with. TIFF is just an egregiously difficult
> case, but one that many scientists have to deal with.
Yes, we should offer simplicity for the simple cases, and allow more power for the more complex cases. But in a way that keeps our lives easy :) The hard part should be for the person that implements the "plugin" to read a certain format.
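To illustrate the "combine them to RGB" case, here is a small sketch that stacks separately stored planes into one MxNx3 image. It uses plain nested lists to stay dependency-free, and combine_planes is a hypothetical helper, not something a TIFF library provides.

```python
def combine_planes(planes):
    """Stack equally sized 2D planes into one M x N x C nested list."""
    height = len(planes[0])
    width = len(planes[0][0])
    for plane in planes:
        if len(plane) != height or any(len(row) != width for row in plane):
            raise ValueError("All planes must have the same shape")
    # Pixel (y, x) of the result holds one value from each plane, in order.
    return [
        [[plane[y][x] for plane in planes] for x in range(width)]
        for y in range(height)
    ]


red = [[10, 11], [12, 13]]
green = [[20, 21], [22, 23]]
blue = [[30, 31], [32, 33]]

rgb = combine_planes([red, green, blue])
print(rgb[0][0])  # one pixel: its R, G and B values
```

An imread()-style call would return the combined result, while the multi-image function would hand back red, green and blue as they are stored.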
Ok, let me propose the following. Please let me know if this sounds right to you...
- There shall be 4 functions for reading: read(), imread(), mimread() (the M is for multiple) and volread().
- The function read(filename, format=None, expect=None, **kwargs) returns a reader object. Using the expect keyword the user can specify whether they expect an image, multiple images or a volume (or other?).
- The reader has a read_array(*indices) method and a read_meta(*indices) method to read the image data and meta info.
- The reader object has a __len__ method and can be used as an iterator, which will simply yield all images/volumes.
- The reader object may have additional methods. For instance, LSM may implement reader.stacks() to iterate over the z-stacks, or reader.timepoints() to iterate over the timepoints, or expose whatever API works for the specific format. Naturally, we should try and keep the APIs of such "exotic" formats similar, but we can work out the details as they are introduced in imageio.
- imread() does something like read(filename, format, expect=IMAGE).read_array(0)
- mimread() does something like [im for im in read(filename, format, expect=MULTI_IMAGE)]
- A similar story applies to save(), imsave(), mimsave(), and volsave().
So each plugin for imageio consists of a Reader and a Writer class, which will be instantiated and returned by read() and save(). The "expect" argument can be used by the read() function to check whether the selected/detected plugin supports what is expected. In some cases the plugin can also use it to yield the right data: for instance, to combine the color planes of an RGB TIFF that stores them separately, or, for DICOM, specifying VOLUME will combine the multiple images into a volume for you.
There should also be an easy way for the user to get access to the documentation of a plugin, to see which keyword arguments can be used, and in some cases what the reader/writer API looks like for a specific format.
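A minimal sketch of how the reading side of this proposal could fit together. Everything beyond the names in the proposal is an assumption: the in-memory FAKE_FILES table stands in for real files, ListReader is a toy plugin, and defaulting to the "LIST" format replaces real format detection.

```python
IMAGE, MULTI_IMAGE, VOLUME = "image", "multi_image", "volume"

# Hypothetical stand-in for files on disk: name -> list of decoded images.
FAKE_FILES = {"frames.list": [[[0]], [[1]], [[2]]]}


class Reader:
    """Base class that a format plugin would subclass."""
    supports = (IMAGE,)

    def __init__(self, source, expect=None, **kwargs):
        self.source, self.expect, self.kwargs = source, expect, kwargs

    def __len__(self):
        raise NotImplementedError

    def read_array(self, *indices):
        raise NotImplementedError

    def read_meta(self, *indices):
        return {}

    def __iter__(self):
        # Iterating simply yields all images in order.
        for i in range(len(self)):
            yield self.read_array(i)


class ListReader(Reader):
    """Toy plugin whose 'file' is a list of images."""
    supports = (IMAGE, MULTI_IMAGE)

    def __len__(self):
        return len(self.source)

    def read_array(self, *indices):
        (index,) = indices  # this toy format takes exactly one index
        return self.source[index]

    def read_meta(self, *indices):
        (index,) = indices
        return {"index": index}


PLUGINS = {"LIST": ListReader}


def read(filename, format=None, expect=None, **kwargs):
    fmt = (format or "LIST").upper()  # simplification: no real detection
    if fmt not in PLUGINS:
        raise ValueError(f"No plugin for format {fmt!r}")
    plugin = PLUGINS[fmt]
    # read() checks that the plugin supports what the caller expects.
    if expect is not None and expect not in plugin.supports:
        raise ValueError(f"Format {fmt!r} cannot provide {expect!r}")
    return plugin(FAKE_FILES[filename], expect=expect, **kwargs)


def imread(filename, format=None, **kwargs):
    return read(filename, format, expect=IMAGE, **kwargs).read_array(0)


def mimread(filename, format=None, **kwargs):
    return [im for im in read(filename, format, expect=MULTI_IMAGE, **kwargs)]


print(imread("frames.list"))
print(len(mimread("frames.list")))
```

The writing side (save(), imsave(), mimsave(), volsave()) would presumably mirror this with a Writer class per plugin.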