Draft version of ImageArray


Tim Holy

Mar 19, 2012, 4:58:07 PM
to juli...@googlegroups.com
Hi folks,

If you're interested in images, see
https://github.com/timholy/julia/blob/imagelib/extras/image.jl
The principal advantage of this framework is that one can ensure correctness
for (hopefully) any desired image type. It should also take some of the pain
out of geometric transformations ("which direction is up?").

Comments requested. I did not change any of the algorithms to operate on an
ImageArray yet, because I wanted feedback on the API before proceeding
further.

You need to be up-to-the-minute with master to test this. The attached test
script shows most of the things you can do with this (not much, yet!). You'll
have to change the paths in the first two lines, and perhaps the coordinate
ranges if the image you choose to load in is smaller than the "peppers" image.

Best,
--Tim

image_test.jl

Stefan Kroboth

Mar 20, 2012, 5:45:11 AM
to juli...@googlegroups.com
Hi Tim,

> Comments requested.

I have to admit that I didn't have time yet to try your code, but I have
a few comments. First of all, I think this looks very good!

1. It would be nice to also represent the value range in the data
structure. It's often good to know what values are to be expected. For
instance, to compute the mean square error, the maximum value needs to
be known.

2. I would rename data_size to dim, just to make it less to type.

3. Is it possible to make the color space a type? I haven't looked into
the details of types, so I don't know how to do this properly yet. The
goal of this could be to use multiple dispatch to decide which algorithm
to use depending on the color space. I was thinking of having several
"well known" color spaces defined as types and an "Other" type which is
defined further in color_space (and which is interpreted as RGB for
instance).

4. If I'm not mistaken, you define "valid" as "false" in the empty
constructor. If invalid pixels are represented as "true", wouldn't it be
more appropriate to name this field "invalid"?

5. Just a note for the future: we should also define a binary image
type. I'm thinking of binary masks, which aren't necessarily square like
a region of interest.

6. Another note for the future: Instead of reading the image using
imread, ImageArray can take care of this when passed a path to an image.
I'm sure you thought of this, I just wanted to mention it. I'll
implement this as soon as your changes are merged into Julia.

> # Test snipping by copy
> imgcopy = copy_pfields(img)
> imgcopy.arrayi_range = rng3

7. In this case, data_size is still the same as in img, right? If so, I
think this could lead to confusion. Since it's a copy, I think data_size
should be adapted as well.

8. In OpenCV, when you define a ROI, it creates a new header which
contains a pointer to the defined ROI of the original image, which means
that changes are actually performed in place on the original image. I
know that this can be achieved with arrayi_range (which I would call roi
(again, less to type)), but one often needs several ROIs. With the
current implementation, one would often need to change the ROI several
times instead of just defining several headers with different ROIs.

I'm sorry if I misunderstood some of your intentions. I hope that my
comments are still helpful.

I'm looking forward to working with this!

Best,
Stefan

Tim Holy

Mar 20, 2012, 12:59:50 PM
to Stefan Kroboth, juli...@googlegroups.com
Hi Stefan,

On Tuesday, March 20, 2012 04:45:11 am Stefan Kroboth wrote:
> 1. It would be nice to also represent the value range in the data
> structure. It's often good to know what values are to be expected. For
> instance, to compute the mean square error, the maximum value needs to
> be known.

Agreed. Will do. (Though I'm not sure why one needs it to compute mean square
error, won't mean((x-y).^2) do the trick?)

>
> 2. I would rename data_size to dim, just to make it less to type.

Good idea. It probably wasn't obvious, but the real way to find the _current_
size of the image is just size(img). The field that will soon be called "dim"
is intended to support snipping out chunks from larger images. The structure
keeps track of the indices used for snipping, and you always know that the
original started at [1,1,1,...]. But you don't know how big the original was
without "dim". If you intend to paste things back together again later, it's
nice to know in advance how large an array to allocate to hold the final
assembled image. But aside from that usage, I don't think "dim" will be used
heavily.

>
> 3. Is it possible to make the color space a type? I haven't looked into
> the details of types, so I don't know how to do this properly yet. The
> goal of this could be to use multiple dispatch to decide which algorithm
> to use depending on the color space. I was thinking of having several
> "well known" color spaces defined as types and an "Other" type which is
> defined further in color_space (and which is interpreted as RGB for
> instance).

Yet another very good idea.

>
> 4. If I'm not mistaken, you define "valid" as "false" in the empty
> constructor. If invalid pixels are represented as "true", wouldn't it be
> more appropriate to name this field "invalid"?

I intended for a pixel i to be taken seriously only if valid[i] is true. I set
it to false in the default constructor simply because there is no image, so
the whole thing is invalid (note it gets set to true when you construct with
an array).

I think it will work better for outer products if we keep it as valid: suppose
you have a "camera mask" that specifies good/bad pixels on your camera, and a
"temporal mask" that specifies individual good/bad frames. Index camera pixels
by i and time slices by j. The problem is that
camera_invalid[i]*temporal_invalid[j] requires that a pixel be invalid both in
the camera sense and in the temporal sense, whereas
camera_valid[i]*temporal_valid[j] will mark it as invalid if it is invalid in
either case. I think we want the latter behavior.
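The outer-product argument can be sketched as follows (a minimal illustration in current Julia broadcasting syntax, which postdates this thread):

```julia
# A pixel/frame pair is usable only when BOTH masks agree, which the
# "valid" convention expresses directly as an outer product (logical AND).
camera_valid   = [true, false, true]     # good/bad camera pixels, index i
temporal_valid = [true, true, false]     # good/bad frames, index j

valid = camera_valid .& temporal_valid'  # valid[i,j]

# With "invalid" masks, the outer product instead means "bad in BOTH
# senses", which is not the behavior we want:
invalid_both = .!camera_valid .& .!temporal_valid'
```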

> 5. Just a note for the future: we should also define a binary image
> type. I'm thinking of binary masks, which aren't necessarily square like
> a region of interest.

Absolutely. Obviously a vector of indices will serve to represent it, but we
might want to define natural operations with them, so a binary mask type makes
a lot of sense.

> 6. Another note for the future: Instead of reading the image using
> imread, ImageArray can take care of this when passed a path to an image.
> I'm sure you thought of this, I just wanted to mention it. I'll
> implement this as soon as your changes are merged into Julia.

Good. Can we parse png/tif directly rather than converting to ppm? Or get
convert to spit things out like the color model? Right now if you load a
grayscale image (cameraman.tif, for example), it converts it into an RGB.

>
> > # Test snipping by copy
> > imgcopy = copy_pfields(img)
> > imgcopy.arrayi_range = rng3
>
> 7. In this case, data_size is still the same as in img, right? If so, I
> think this could lead to confusion. Since it's a copy, I think data_size
> should be adapted as well.

It's intended to be, see the discussion above about using size(img) in normal
cases and only using data_size (changing to dim) in rare cases involving
re-stitching things together from their pieces.

> 8. In OpenCV, when you define a ROI, it creates a new header which
> contains a pointer to the defined ROI of the original image, which means
> that changes are actually performed in place on the original image. I
> know that this can be achieved with arrayi_range (which I would call roi
> (again, less to type)), but one often needs several ROIs. With the
> current implementation, one would often need to change the ROI several
> times instead of just defining several headers with different ROIs.

We can do that too, using sub rather than ref to take out snippets. I agree
that it's nice to do "virtual snippets" in some cases, but in others you want
to copy the data so that you can throw away the original (big) image to make
room for loading a new big image.

It sounds like it could be worthwhile to have a clean syntax for both, but I
haven't yet thought that through. Ideas welcome.
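Both styles are easy to demonstrate (a sketch in current Julia terms, where `view` plays the role of the 2012 `sub` and plain indexing the role of `ref`):

```julia
A = collect(reshape(1.0:20.0, 4, 5))

roi_view = view(A, 2:3, 2:4)  # shares memory: writes go to the original
roi_copy = A[2:3, 2:4]        # independent copy: the original can be freed

roi_view[1, 1] = -1.0         # modifies A[2, 2] in place
```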

This is great feedback, I really appreciate it! Keep it coming!

I'll make some changes and submit a pull request. Today is busy, though, so it
may not be until tomorrow.

Best,
--Tim


Stefan Kroboth

Mar 20, 2012, 5:31:03 PM
to juli...@googlegroups.com
Hi Tim,

> Agreed. Will do. (Though I'm not sure why one needs it to compute mean square
> error, won't mean((x-y).^2) do the trick?)

Sorry, I was thinking of the peak signal-to-noise ratio:

PSNR = 10*log10(max^2/mse(img1,img2))
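A quick sketch of why the maximum representable value belongs in the data structure (the helper names here are illustrative, not part of the proposed API):

```julia
# MSE needs no value range, but PSNR does: the signal peak `maxval`
# must come from the image type, not from the data itself.
mse(x, y) = sum((x .- y) .^ 2) / length(x)
psnr(x, y, maxval) = 10 * log10(maxval^2 / mse(x, y))
```

For 8-bit images maxval would be 255; for floating-point images normalized to [0,1] it would be 1.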

> Good idea. It probably wasn't obvious, but the real way to find the _current_
> size of the image is just size(img). The field that will soon be called "dim"
> is intended to support snipping out chunks from larger images. The structure
> keeps track of the indices used for snipping, and you always know that the
> original started at [1,1,1,...]. But you don't know how big the original was
> without "dim". If you intend to paste things back together again later, it's
> nice to know in advance how large an array to allocate to hold the final
> assembled image. But aside from that usage, I don't think "dim" will be used
> heavily.

I must have misunderstood this, because I was thinking of it as being
similar to OpenCV's "img.rows", "img.cols" and "img.channels".
So, to be certain, "dim" always holds the information of the dimensions
of the original image and "arrayi_range" holds the size of the current
snippet with its relation to the original image?

> I intended for a pixel i to be taken seriously only if valid[i] is true. I set
> it to false in the default constructor simply because there is no image, so
> the whole thing is invalid (note it gets set to true when you construct with
> an array).

I think "valid" is a perfectly fine name, I just got confused because it
got set to "false". But now it's clear :)



> I think it will work better for outer products if we keep it as valid: suppose
> you have a "camera mask" that specifies good/bad pixels on your camera, and a
> "temporal mask" that specifies individual good/bad frames. Index camera pixels
> by i and time slices by j. The problem is that
> camera_invalid[i]*temporal_invalid[j] requires that a pixel be invalid both in
> the camera sense and in the temporal sense, whereas
> camera_valid[i]*temporal_valid[j] will mark it as invalid if it is invalid in
> either case. I think we want the latter behavior.

On the other hand, assuming that there are in general more valid than
invalid pixels, an "invalid" array could be represented by a sparse
matrix.


> Absolutely. Obviously a vector of indices will serve to represent it, but we
> might want to define natural operations with them, so a binary mask type makes
> a lot of sense.

Representing it with a vector of indices is a good idea. Maybe a sparse
matrix would be suitable here too, because it makes it possible to perform
pointwise matrix-matrix multiplications without explicitly looping
through the image/vector of indices. I'm not sure if this will really
cause a performance gain, or if it's just my Matlab-vectorize-everything
thinking that makes me assume that it will be faster.

> Good. Can we parse png/tif directly rather than converting to ppm? Or get
> convert to spit things out like the color model? Right now if you load a
> grayscale image (cameraman.tif, for example), it converts it into an RGB.

I'm not sure, I don't have much experience with convert (imread is
Jeff's creation, and I have to admit, he probably solved this in the
most awesome way with just a few lines of code). Do png/tiff even save
in other color spaces than RGB? One way could be to specify that it's a
grayscale image when loading an image and then automatically perform a
conversion to grayscale after loading. But I would be surprised if
convert didn't have an option to output just the grayscale values. If
it does, it's important that it produces the same conversion as rgb2gray
does, so that both ways produce the same image. For instance, it could
be possible that convert uses different weights for each color channel
than rgb2gray does.

> > > # Test snipping by copy
> > > imgcopy = copy_pfields(img)
> > > imgcopy.arrayi_range = rng3

> It's intended to be, see the discussion above about using size(img) in normal
> cases and only using data_size (changing to dim) in rare cases involving
> re-stitching things together from their pieces.

This makes total sense now.

> We can do that too, using sub rather than ref to take out snippets. I agree
> that it's nice to do "virtual snippets" in some cases, but in others you want
> to copy the data so that you can throw away the original (big) image to make
> room for loading a new big image.

I was thinking of this as an addition to copying a snippet.

> This is great feedback, I really appreciate it! Keep it coming!

Thanks, I'm glad I can help :)

Best,
Stefan

Tim Holy

Mar 20, 2012, 6:44:35 PM
to Stefan Kroboth, juli...@googlegroups.com
Hi Stefan,

On Tuesday, March 20, 2012 04:31:03 pm Stefan Kroboth wrote:
> So, to be certain, "dim" always holds the information of the dimensions
> of the original image and "arrayi_range" holds the size of the current
> snippet with its relation to the original image?

Exactly.

> > I intended for a pixel i to be taken seriously only if valid[i] is true.
> > I set it to false in the default constructor simply because there is no
> > image, so the whole thing is invalid (note it gets set to true when you
> > construct with an array).
>
> I think "valid" is a perfectly fine name, I just got confused because it
> got set to "false". But now it's clear :)
>
> > I think it will work better for outer products if we keep it as valid:
> > suppose you have a "camera mask" that specifies good/bad pixels on your
> > camera, and a "temporal mask" that specifies individual good/bad frames.
> > Index camera pixels by i and time slices by j. The problem is that
> > camera_invalid[i]*temporal_invalid[j] requires that a pixel be invalid
> > both in the camera sense and in the temporal sense, whereas
> > camera_valid[i]*temporal_valid[j] will mark it as invalid if it is
> > invalid in either case. I think we want the latter behavior.
>
> On the other hand, assuming that there are in general more valid than
> invalid pixels, an "invalid" array could be represented by a sparse
> matrix.

Yes, that's true, and I think that's a great suggestion. There are cases where
it might not be very sparse (say, after a registration process that introduces
a lot of invalid pixels from drift "over the edge"), but in that case you
won't have a good outer-product form, either. Maybe we should just default to
a sparse matrix? On the other hand, there are cases where the outer-product
form would be even more efficient; you can represent an entire bad frame with a
single bool, for example.

Of course, one could always make it an outer product of sparse matrices, so in
fact your suggestion is always useful.

> > Absolutely. Obviously a vector of indices will serve to represent it, but
> > we might want to define natural operations with them, so a binary mask
> > type makes a lot of sense.
>
> Representing it with a vector of indices is a good idea. Maybe a sparse
> matrix would be suitable here too, because it allows to perform
> pointwise matrix-matrix multiplications without explicitly looping
> through the image/vector of indices. I'm not sure if this will really
> cause a performance gain, or if it's just my Matlab-vectorize-everything
> thinking that makes me assume that it will be faster.

If you're just talking 1 ROI, I think in most cases that multiplication is
equivalent to sum(img[mask]), unless of course you want to assign uneven
weights to the pixels in the ROI. In Matlab it's absolutely true that if
you're doing many ROIs simultaneously, it's better to use a sparse matrix (we
do that in our own analyses). And we are starting to use weighted ROIs, so
again your idea makes a lot of sense.

> > Good. Can we parse png/tif directly rather than converting to ppm? Or get
> > convert to spit things out like the color model? Right now if you load a
> > grayscale image (cameraman.tif, for example), it converts it into an RGB.
>
> I'm not sure, I don't have much experience with convert (imread is
> Jeff's creation, and I have to admit, he probably solved this in the
> most awesome way with just a few lines of code).

Worth looking into.

Best,
--Tim

Tim Holy

Mar 22, 2012, 7:03:46 AM
to Stefan Kroboth, juli...@googlegroups.com
Hi Stefan and others,

First question: would the maintainers prefer that we do our daily work on the
image processing library within the main repository? Or is it better to do
daily work in a separate repository and then issue more infrequent pull
requests when big updates are ready for use? I'm beginning to favor the latter.
Stefan, I'd be happy to give you read/write access to my github repository.

On Tuesday, March 20, 2012 04:31:03 pm Stefan Kroboth wrote:

> On the other hand, assuming that there are in general more valid than
> invalid pixels, an "invalid" array could be represented by a sparse
> matrix.

I am finding this argument more and more compelling; we may want to switch this
over to an "invalid" rather than "valid" field. I'd say treat this field as
work-in-progress for now.

Going forward, I see two main directions:
1. Modifying the algorithms to use the ImageArray type.
2. Writing ImageFileArray and ImageFileBricks

Both seem useful---the first will give us feedback about how the design works
in practice, and the second will give us information about how well the
choices we've made thus far generalize to different representations. Stefan,
are you interested in tackling #1 (ignoring valid/invalid until this settles a
bit more), and having me do #2?

Best,
--Tim

Stefan Kroboth

Mar 22, 2012, 8:57:14 AM
to juli...@googlegroups.com
Hi Tim,

Sorry that it took me so long to answer you; I was quite busy yesterday.

I think "size_ancestor" is much better than "dim". If it's not supposed
to be used very often, clarity is more important than short-to-type.

> First question: would the maintainers prefer that we do our daily work on the
> image processing library within the main repository? Or is it better to do
> daily work in a separate repository and then issue more infrequent pull
> requests when big updates are ready for use? I'm beginning to favor the latter.

Maybe we could even think about a separate repository for julia-vision
(or whatever it's supposed to be called).

> Stefan, I'd be happy to give you read/write access to my github repository.

Thanks!

> I am finding this argument more and more compelling; we may want to switch this
> over to an "invalid" rather than "valid" field. I'd say treat this field as
> work-in-progress for now.

I'll just trust your experience on this point, since I've never needed
to define invalid pixels. I think there was a discussion some time ago
about NaN/NA values, which could be of interest. I remember that NaNs
are not what you are looking for, but NA might be.

> Going forward, I see two main directions:
> 1. Modifying the algorithms to use the ImageArray type.
> 2. Writing ImageFileArray and ImageFileBricks
>
> Both seem useful---the first will give us feedback about how the design works
> in practice, and the second will give us information about how well the
> choices we've made thus far generalize to different representations. Stefan,
> are you interested in tackling #1 (ignoring valid/invalid until this settles a
> bit more), and having me do #2?

This is exactly what I would have suggested if I would have had time
yesterday*. I'll start to modify some of the algorithms later today and
then push it to your repository if that's ok for you.

Best,
Stefan

* I probably screwed up the tenses in this sentence... sorry ;)

Tim Holy

Mar 22, 2012, 9:26:35 AM
to Stefan Kroboth, juli...@googlegroups.com
Hi Stefan,

On Thursday, March 22, 2012 07:57:14 am Stefan Kroboth wrote:
> > Stefan, are you interested in tackling #1 (ignoring
> > valid/invalid until this settles a bit more), and having me do #2?
>
> This is exactly what I would have suggested if I would have had time
> yesterday*. I'll start to modify some of the algorithms later today and
> then push it to your repository if that's ok for you.

Great. It should be set up (https://github.com/timholy/julia, the "imagelib"
branch). If you can't push, let me know. I pushed a couple of cleanups there
this morning, so you will probably want to start from there.

--Tim

Stefan Karpinski

Mar 22, 2012, 10:18:16 AM
to juli...@googlegroups.com
On Thu, Mar 22, 2012 at 8:57 AM, Stefan Kroboth <stefan....@gmail.com> wrote:

> First question: would the maintainers prefer that we do our daily work on the
> image processing library within the main repository? Or is it better to do
> daily work in a separate repository and then issue more infrequent pull
> requests when big updates are ready for use? I'm beginning to favor the latter.

Maybe we could even think about a separate repository for julia-vision
(or whatever it's supposed to be called).

This could be the first official Julia package! That might actually be helpful to have for figuring out how a package manager for Julia ought to work.
 
> I am finding this argument more and more compelling; we may want to switch this
> over to an "invalid" rather than "valid" field. I'd say treat this field as
> work-in-progress for now.

I'll just trust your experience on this point, since I've never needed
to define invalid pixels. I think there was a discussion some time ago
about NaN/NA values, which could be of interest. I remember that NaNs
are not what you are looking for, but NA might be.

I've actually been having an ongoing conversation with Harlan Harris about how to do an R-like data frame type with support for NAs. It's a bit long to put inline here, so I'll start a separate thread proposing the model so people can hack away at it with feedback.

One thing I'll say here is that I suspect it makes sense to make this field be an abstract parametric type, maybe invalid::AbstractMatrix{Bool}. That way the compiler knows it can only ever get boolean values out of the thing, but you can put any implementation in there if you want to. A sparse matrix is probably a good default implementation, but there may be cases where someone wants something else, and there's no real reason they shouldn't be able to construct an ImageArray object that uses some other representation to indicate where the invalid pixels are. The only thing the ImageArray object really cares about is the ability to query for invalidity.
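The proposal might look roughly like this (type and field names are hypothetical, written in post-1.0 syntax):

```julia
using SparseArrays

# The invalid field is constrained only by its abstract interface, so any
# Bool-valued matrix works -- dense, sparse, or a custom outer-product
# wrapper -- and indexing it is still known to yield Bool.
struct MaskedImage{T}
    data::Matrix{T}
    invalid::AbstractMatrix{Bool}
end

img_dense  = MaskedImage(zeros(3, 3), falses(3, 3))         # dense mask
img_sparse = MaskedImage(zeros(3, 3), spzeros(Bool, 3, 3))  # sparse mask
```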

> Going forward, I see two main directions:
> 1. Modifying the algorithms to use the ImageArray type.
> 2. Writing ImageFileArray and ImageFileBricks
>
> Both seem useful---the first will give us feedback about how the design works
> in practice, and the second will give us information about how well the
> choices we've made thus far generalize to different representations. Stefan,
> are you interested in tackling #1 (ignoring valid/invalid until this settles a
> bit more), and having me do #2?

This is exactly what I would have suggested if I would have had time
yesterday*. I'll start to modify some of the algorithms later today and
then push it to your repository if that's ok for you.

Best,
Stefan

* I probably screwed up the tenses in this sentence... sorry ;)

Close: "if I had had time" — which is super awkward. Not English's finest moment ;-)

Stefan Kroboth

Mar 22, 2012, 7:41:59 PM
to juli...@googlegroups.com
Hi,

> This could be the first official Julia package! That might actually be
> helpful to have for figuring out how a package manager for Julia ought to
> work.

Sounds good. Should one of us create a repository or should it be part
of "JuliaLang"?

> Close: "if I had had time" — which is super awkward. Not English's finest
> moment ;-)

Thanks for the feedback :) "Learn proper English" is at the top of my
imaginary TODO list ;)

@Tim: I just pushed to your repository. I adapted most of the functions
to work with the new ImageArray type. It was quite easy, although I had
to define a lot of functions for the new type.

One thing I'd like to discuss with you is the representation of the
color space. I was thinking of using multiple dispatch on (for instance)
ImageArray{T,sRGB}. This would make it possible to get rid of all the
color space conversion routines and replace them all with one convert().

Also, I often use different algorithms based on whether the image is 1- or
3-channel, therefore something like ImageArray{T,sRGB,3} would be nice
to have.

What do you think about these ideas?
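One way the dispatch could work, sketched with hypothetical singleton types in current Julia syntax (the thread predates 1.0):

```julia
# Color spaces as types: algorithms are selected by multiple dispatch
# instead of string comparisons, and "Other" falls back to RGB handling,
# as suggested earlier in the thread.
abstract type ColorSpace end
struct SRGB  <: ColorSpace end
struct CGray <: ColorSpace end
struct Other <: ColorSpace end   # details carried elsewhere in the image

channels(::SRGB)  = 3
channels(::CGray) = 1
channels(::Other) = channels(SRGB())   # interpret as RGB by default
```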

Best,
Stefan

Tim Holy

Mar 23, 2012, 12:34:53 AM
to Stefan Kroboth, juli...@googlegroups.com
Hi Stefan,

On Thursday, March 22, 2012 06:41:59 pm Stefan Kroboth wrote:
> @Tim: I just pushed to your repository. I adapted most of the functions
> to work with the new ImageArray type. It was quite easy, although I had
> to define a lot of functions for the new type.

You are much faster than me!

A few comments:
1. I see several places where there are checks like ndims(img) == 2, which
seem to effectively be "is this a grayscale image?" checks. Better would be to
compare img.color_space to CSgray.
2. It might be interesting to write an "imshow_flip" (or something), that
parses the arrayi_order string to determine which direction a given coordinate
is increasing, and maybe even takes transposes. (See the part in the
documentation about "br" encoding.) At least in my experience, it is common to
have to manipulate the image in these ways before displaying it. I am not sure
this is a good idea, and maybe we shouldn't even try, but see what you think.
3. It would be good to make the filtering multi-dimensional capable. I know
this is probably not a small task. One of my favorite tricks is to prepare a
vector of ranges, and then iterate over the dimensions, replacing the one
you're working on now with a restricted range. My _image_named_coords_sub
contains an example of this technique.
4. For something like the gaussian kernel, it would probably be best to pass a
vector for both sigma and filter_size (this would be especially important for
cases where the resolution is not the same in all coordinates).
5. Similarly, in functions like rgb2gray, rather than assuming the color
channel is the third dimension, you can parse the arrayi_order string to figure
out which one it is:
matchc = match(r"c", img.arrayi_order)
if !is(matchc, nothing)
    cindx = matchc.offset  # this is the dimension with color information
    ...
end
However, for color space conversions, it may be fine to leave them in their
current form as long as you're not worried that their assumptions will ever
be violated. Biological images will frequently violate the "color in the third
dimension" assumption, but I'm not sure how often anyone doing biological
imaging will need to convert to an "ntsc" colorspace...
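The vector-of-ranges trick mentioned in point 3 can be sketched like this (a generic illustration, not the actual code from `_image_named_coords_sub`):

```julia
# Keep one index range per dimension; to work along dimension d,
# temporarily restrict that range while the others stay at full extent.
A = reshape(1:60, 4, 5, 3)
rngs = Any[1:size(A, d) for d in 1:ndims(A)]

sizes = Tuple[]
for d in 1:ndims(A)
    rngs[d] = 2:size(A, d)-1        # restrict only the current dimension
    push!(sizes, size(A[rngs...]))  # splat the ranges as indices
    rngs[d] = 1:size(A, d)          # restore before the next dimension
end
```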

> One thing I'd like to discuss with you is the representation of the
> color space. I was thinking of using multiple dispatch on (for instance)
> ImageArray{T,sRGB}. This would make it possible to get rid of all the
> color space conversion routines and replace them all with one convert().

You can achieve this now by writing conversion functions with this syntax:
convert{T}(csout::ColorSpace,img::Array{T,3},csin::ColorSpace)

Just generate different versions for each conversion, e.g.,
convert{T}(CSsrgb,img::Array{T,3},CSntsc)

Since the color space is stored as a field in an ImageArray, you might also
want something like this:
function convert{T}(csout::ColorSpace, img::ImageArray{T})
    if ndims(img) < 3 || img.arrayi_order[3] != 'c'
        error("Must have color channel as the third array dimension")
    end
    imgout = copy_pfields(img)
    copy_metadata(imgout, img)
    imgout.data = convert(csout, img.data, img.color_space)
    return imgout
end

Note this just uses the array-based conversion functions above to do the heavy
lifting.

> Also, I often use different algorithms based on whether the image is 1- or
> 3-channel, therefore something like ImageArray{T,sRGB,3} would be nice
> to have.

With multiple dispatch you have to write those two algorithms anyway; the
advantage of multiple dispatch is that selecting the right one is surely
faster than an "if color_space == ..." test. (In the cases where you'd need to
test the types of multiple inputs, it's easier, too.) But in the case of image
processing, a single scalar test is a trivial computational overhead compared
to the image manipulations themselves.

Moreover, in my lab images are 4-dimensional even before we start talking
about color. I don't want to write separate types for dimensions 1 through 5,
plus all the possible variants (is the color channel the third out of 5, or
first, or last, or...). I think it will be much easier for the algorithms to
just look for the color channel and test its format with the color_space field.

Best,
--Tim

Jeff Bezanson

Mar 23, 2012, 12:48:38 AM
to juli...@googlegroups.com
I just want to second the idea of definitions to abstract things where
appropriate, for example

isgrayscale(img::Array) = ndims(img)==2

or whatever is appropriate. These will generally be inlined, and go a
long way towards preserving sanity as the codebase grows. It can
allow, for example, representing an image as either a simple Array or
a fancier Image type with the same code. Admittedly I didn't set a
good example with my original image.jl, but you guys are taking this
much much farther.

Stefan Kroboth

Mar 23, 2012, 4:10:10 PM
to juli...@googlegroups.com
Hi Tim,

> A few comments:
> 1. I see several places where there are checks like ndims(img) == 2, which
> seem to effectively be "is this a grayscale image?" checks. Better would be to
> compare img.color_space to CSgray.

Yes, you are right. I have to admit that I probably didn't look closely
enough. I will change that too.

> 2. It might be interesting to write an "imshow_flip" (or something), that
> parses the arrayi_order string to determine which direction a given coordinate
> is increasing, and maybe even takes transposes. (See the part in the
> documentation about "br" encoding.) At least in my experience, it is common to
> have to manipulate the image in these ways before displaying it. I am not sure
> this is a good idea, and maybe we shouldn't even try, but see what you think.

I'm not sure I fully understand what you mean. In my understanding, the
upper left is always A[1,1].
In general, imshow isn't a masterly achievement anyway and probably
needs a total makeover. For instance, having to write the data on the
disk isn't really a good solution. I just quickly wrote this so that I
don't need to reload the images every time in an external viewer.
If we want to flip images bevor displaying them, it would be a lot
easier if we had a way to pipe the data to an external viewer.
It's interesting what you wrote about MRI images ... this might explain
why my reconstructed images are always tilted ;D

> 3. It would be good to make the filtering multi-dimensional capable. I know
> this is probably not a small task. One of my favorite tricks is to prepare a
> vector of ranges, and then iterate over the dimensions, replacing the one
> you're working on now with a restricted range. My _image_named_coords_sub
> contains an example of this technique.

I skipped imfilter for now because, honestly, I was afraid. This
function will definitely take a lot more time to adapt. For instance, if
one wants to filter a snippet of an image, it might not be wise to see
the snippet as a separate image because of the border treatment. The
borders could be taken from the original image rather than replicating
or mirroring them. From what I remember, you often have to filter very
big images which don't even fit into the memory by dividing them into
subproblems. We should address this issue very carefully, and I think
there is a lot of discussion necessary, especially for high-dimensional
data where the order of the dimensions is "unusual".
Also, most users will want to just use imfilter the way it is used in
Matlab, so it should not be too complicated for them.

> 4. For something like the gaussian kernel, it would probably be best to pass a
> vector for both sigma and filter_size (this would be especially important for
> cases where the resolution is not the same in all coordinates).

You are completely right, I should have thought of that ;)
Do you need 3-dimensional kernels for your work? We should address this
at some point too (but I think this can wait).

> 5. Similarly, in functions like rgb2gray, rather than assuming the color
> channel is the third dimension, you can parse the arrayi_order string to figure
> out which one it is:
> matchc = match(r"c",img.arrayi_order)
> if !is(matchc,nothing)
> cindx = matchc.offset # this is the dimension with color information
> ...
> end
> However, for color space conversions, it may be fine to leave them in their
> current form as long as you're not worried that their assumptions will ever
> be violated. Biological images will frequently violate the "color in the third
> dimension" assumption, but I'm not sure how often anyone doing biological
> imaging will need to convert to an "ntsc" colorspace...

Hm, good point. We might want to leave it like it is for now. If the
need for conversion functions for "unusual" image data arises, we can
think about how to implement them properly.


> > One thing I'd like to discuss with you is the representation of the
> > color space. I was thinking of using multiple dispatch on (for instance)
> > ImageArray{T,sRGB}. This would make it possible to get rid of all the
> > color space conversion routines and replace them all with one convert().
>
> You can achieve this now by writing conversion functions with this syntax:
> convert{T}(csout::ColorSpace,img::Array{T,3},csin::ColorSpace)
>
> Just generate different versions for each conversion, e.g.,
> convert{T}(CSsrgb,img::Array{T,3},CSntsc)
>
> Since the color space is stored as a field in an ImageArray, you might also
> want something like this:
> convert{T}(csout::ColorSpace,img::ImageArray{T})
> if ndims(img) < 3 || img.arrayi_order[3] != 'c'
> error("Must have color channel as the third array dimension")
> end
> imgout = copy_pfields(img)
> copy_metadata(imgout,img)
> imgout.data = convert(csout,img.data,img.color_space)
> return imgout
> end
>
> Note this just uses the array-based conversion functions above to do the heavy
> lifting.

This is a good idea, I will implement it this way.

> > Also, I often use different algorithms based on wether the image is 1 or
> > 3 channel, therefore something like ImageArray{T,sRGB,3} would be nice
> > to have.

This paragraph is the result of sleep deprivation, just ignore it ;) If
I already know it's RGB, why would I even need the number of channels...

I was also thinking about an "FFT" type which stores the fourier
transform of an image and all the other relevant information. This could
be useful for operations on MRI data.

If I have time this evening I'll do some more work.

Best,
Stefan

Tim Holy

unread,
Mar 23, 2012, 5:00:13 PM3/23/12
to Stefan Kroboth, juli...@googlegroups.com
Hi Stefan,

On Friday, March 23, 2012 03:10:10 pm Stefan Kroboth wrote:
> I'm not sure I fully understand what you mean. In my understanding, the
> upper left is always A[1,1].

That's how it's displayed, but sometimes the data aren't organized that way
(for instance, some cameras provide data in row-major organization, rather
than column-major organization). I was wondering if we could piggyback the
dimension-tag into instructions for display. But that may be overengineering.

> I skipped imfilter for now because, honestly, I was afraid.

I don't blame you :-).

> This
> function will definitely take a lot more time to adapt. For instance, if
> one wants to filter a snippet of an image, it might not be wise to see
> the snippet as a separate image because of the border treatment.

Yep. My plan, though, is to pass "snippets" that are larger than what we
ultimately want to get out. The snippets will be padded, hopefully by enough
that the border won't be an issue (this only works for FIR filters, not IIR
filters). If that plan holds, then we can just write the in-memory imfilter
routine as if it's all straightforward.

> You are completely right, I should have thought of that ;)
> Do you need 3-dimensional kernels for your work?

Sometimes, yes indeed. But 2d is more common, and a great start.

Best,
--Tim

Stefan Kroboth

unread,
Mar 23, 2012, 7:04:28 PM3/23/12
to juli...@googlegroups.com
Hi Tim,

> That's how it's displayed, but sometimes the data aren't organized that way
> (for instance, some cameras provide data in row-major organization, rather
> than column-major organization). I was wondering if we could piggyback the
> dimension-tag into instructions for display. But that may be overengineering.

I think this is a good feature, though I also think that we should just
keep this in mind for now and implement it later.

> Yep. My plan, though, is to pass "snippets" that are larger than what we
> ultimately want to get out. The snippets will be padded, hopefully by enough
> that the border won't be an issue (this only works for FIR filters, not IIR
> filters). If that plan holds, then we can just write the in-memory imfilter
> routine as if it's all straightforward.

The biggest problem will be efficiency, especially if one is fetching
data from the disk. We should solve this in a smart way such that we
minimize the access to the disk by reusing already fetched data.
Should imfilter take care of the fetching of the data or is this up to
the user?

Best,
Stefan

Tim Holy

unread,
Mar 24, 2012, 6:01:44 AM3/24/12
to Stefan Kroboth, juli...@googlegroups.com
Hi Stefan,

On Friday, March 23, 2012 06:04:28 pm Stefan Kroboth wrote:
> > Yep. My plan, though, is to pass "snippets" that are larger than what we
> > ultimately want to get out. The snippets will be padded, hopefully by
> > enough that the border won't be an issue (this only works for FIR
> > filters, not IIR filters). If that plan holds, then we can just write
> > the in-memory imfilter routine as if it's all straightforward.
>
> The biggest problem will be efficiency, especially if one is fetching
> data from the disk. We should solve this in a smart way such that we
> minimize the access to the disk by reusing already fetched data.

The OS will cache for us, so we won't have to explicitly implement re-use. Keep
in mind that the purpose of this architecture is to handle data sets that
range from hundreds of gigabytes to terabytes or even petabytes in size.
(There are people thinking about electron microscopy images of the full mouse
brain at a resolution of a few tens of nanometers in all three axes, and that
works out to well over a petabyte.) I suspect the architecture for petabytes
will be fundamentally different (distributed computing in the sense of "across
continents"), but I'm definitely aiming for good support for many terabytes.
There's only so much data-reuse one can do in such circumstances.

I think the key is efficient disk usage patterns. The "Bricks" architecture is
intended to facilitate this.

> Should imfilter take care of the fetching of the data or is this up to
> the user?

I'm definitely _not_ aiming for "keep the algorithm writer from having to worry
about how the image is represented"---the differences in efficiency for some
algorithms would be huge, and so awareness/control is needed. Instead, I'm
aiming for "try to make it relatively trivial to implement the right algorithm
in the right circumstance by providing a well-designed set of elementary
tools."

I bet that FIR filtering, with small and/or separable kernels, can be
implemented at the algorithmic level in a general way, so the end-user doesn't
have to worry about the details. For a separable but not small kernel it might
take more than one pass and the storage of temporary images. I'm hoping to
make that simple for the algorithm-writer. And when the user tries to do
something truly stupid (say, a non-separable IIR filter on a multi-terabyte
image), just throw an error (don't waste time even trying to code it).
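The multi-pass separable scheme Tim sketches (one 1-D FIR pass per dimension, with a temporary image between passes) can be illustrated like this in NumPy-flavored Python (the helper name and zero-padded "same" border handling are my choices, not a spec of imfilter):

```python
import numpy as np

def separable_filter(img, kernel1d):
    """Filter an N-D image with a separable kernel by convolving the
    same 1-D FIR kernel along each axis in turn, storing a temporary
    image between passes ('same' size, zero-padded borders)."""
    out = img.astype(float)
    for axis in range(img.ndim):
        # move the current axis last, convolve every 1-D lane, move it back
        moved = np.moveaxis(out, axis, -1)
        flat = moved.reshape(-1, moved.shape[-1])
        filt = np.array([np.convolve(row, kernel1d, mode="same") for row in flat])
        out = np.moveaxis(filt.reshape(moved.shape), -1, axis)
    return out
```

For a separable kernel this does d passes of O(n*k) work instead of one pass of O(n*k^d), which is why it scales to the large images discussed above.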

I'd say we should write the "simple" ImageArray version of the algorithm, and
then if/when we need to do filtering on something bigger we write the
ImageFileArray and/or ImageFileBricks versions. Those will, of course, call
the version that works on ImageArray as a subroutine.

Best,
--Tim

Stefan Karpinski

unread,
Mar 24, 2012, 1:03:17 PM3/24/12
to juli...@googlegroups.com
On Sat, Mar 24, 2012 at 6:01 AM, Tim Holy <tim....@gmail.com> wrote:

I'm aiming for "try to make it relatively trivial to implement the right algorithm in the right circumstance by providing a well-designed set of elementary tools."

This quote should go on a Julia philosophy page or something.

Patrick O'Leary

unread,
Mar 24, 2012, 6:35:57 PM3/24/12
to juli...@googlegroups.com
On Saturday, March 24, 2012 12:03:17 PM UTC-5, Stefan Karpinski wrote:
On Sat, Mar 24, 2012 at 6:01 AM, Tim Holy wrote:

I'm aiming for "try to make it relatively trivial to implement the right algorithm in the right circumstance by providing a well-designed set of elementary tools."

This quote should go on a Julia philosophy page or something.

Can we hook up special behavior for load("this.jl")? (Pythonistas in the audience should catch my drift.)

Stefan Karpinski

unread,
Mar 24, 2012, 7:08:34 PM3/24/12
to juli...@googlegroups.com
I had no idea what this meant, but the first thing I tried did reveal it:

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Tim Holy

unread,
Mar 25, 2012, 6:35:58 AM3/25/12
to juli...@googlegroups.com
Thanks!

I'm not sure I would have discovered "import this"; that is a neat set of
programming aphorisms.

--Tim

Stefan Kroboth

unread,
Mar 25, 2012, 6:04:21 PM3/25/12
to juli...@googlegroups.com
Hi Tim,

> I think the key is efficient disk usage patterns. The "Bricks" architecture is
> intended to facilitate this.

Exactly, this is what I meant to say. The elements of one snippet are
also the borders of another snippet.

> I'd say we should write the "simple" ImageArray version of the algorithm, and
> then if/when we need to do filtering on something bigger we write the
> ImageFileArray and/or ImageFileBricks versions. Those will, of course, call
> the version that works on ImageArray as a subroutine.

This probably is the best approach, let's go with this.

I just tried to make some more changes, but I ran into a problem with
imROF. I seem to be unable to solve this... any ideas?

julia> img = imread("img.jpg");

julia> typeof(img.data)
Array{Float64,3}

julia> f = imROF(img.data, 0.25, 20);
type error: ImageArray: in DataType, expected Number, got Type{Ptr{T}}
in abstract_call_gf at inference.jl:421
in abstract_call at inference.jl:500
in abstract_eval_call at inference.jl:541
in abstract_eval at inference.jl:581
in typeinf at inference.jl:1029
in typeinf at inference.jl:839
in abstract_call_gf at inference.jl:454
in abstract_call at inference.jl:500
in abstract_eval_call at inference.jl:541
in abstract_eval at inference.jl:581
in anonymous at no file
in map at tuple.jl:33
in abstract_eval_call at inference.jl:530
in abstract_eval at inference.jl:581
in _jl_abstract_interpret at inference.jl:708
in typeinf at inference.jl:988
in typeinf at inference.jl:839
in abstract_call_gf at inference.jl:454
in abstract_call at inference.jl:500
in abstract_eval_call at inference.jl:541
in abstract_eval at inference.jl:581
in _jl_abstract_interpret at inference.jl:708
in typeinf at inference.jl:988
in typeinf at inference.jl:839
in abstract_call_gf at inference.jl:454
in abstract_call at inference.jl:500
in abstract_eval_call at inference.jl:541
in abstract_eval at inference.jl:581
in anonymous at no file
in map at tuple.jl:32
in abstract_eval_call at inference.jl:530
in abstract_eval at inference.jl:581
in typeinf at inference.jl:1029
in typeinf at inference.jl:839
in abstract_call_gf at inference.jl:454
in abstract_call at inference.jl:500
in abstract_eval_call at inference.jl:541
in abstract_eval at inference.jl:581
in anonymous at no file
in map at tuple.jl:32
in abstract_eval_call at inference.jl:530
in abstract_eval at inference.jl:581
in _jl_abstract_interpret at inference.jl:700
in typeinf at inference.jl:988
in typeinf at inference.jl:839
in abstract_call_gf at inference.jl:454
in abstract_call at inference.jl:500
in abstract_eval_call at inference.jl:541
in abstract_eval at inference.jl:581
in _jl_abstract_interpret at inference.jl:700
in typeinf at inference.jl:988
in typeinf_ext at inference.jl:833

I pulled the latest master from Julia (8522dc22b) and merged it into the
imagelib branch. It works in my fork (which is also up-to-date).
The error is odd, because there is no mention of ImageArray in imROF and
I have no idea where I could start looking.

One desperate attempt to solve this was

julia> f = imROF(copy(img.data), 0.25, 20);

which didn't work either and gave me a similar error, BUT:

julia> f = imROF(img.data, 0.25, 20);

works now. Weird. Restarting Julia gets it back to the non-working
state.

Best,
Stefan

Tim Holy

unread,
Mar 26, 2012, 10:00:57 PM3/26/12
to Stefan Kroboth, juli...@googlegroups.com
Hi Stefan,

Sorry to be slow to respond; I was in the middle of a big chunk of code that
made it hard to load image.jl.

On Sunday, March 25, 2012 05:04:21 pm Stefan Kroboth wrote:
> I just tried to make some more changes, but I ran into a problem with
> imROF. I seem to be unable to solve this... any ideas?

The problem below is indeed weird. I don't understand it, but it has something
to do with the section "#Mathematical operations", where you define +,-,*, etc
operators. If I comment out all of those functions, it doesn't happen. I note
that when you first load your version of image.jl, you get a warning about
ambiguity.

This must indicate that there's something weird going on with the type system,
almost like there's free interconversion between Array and ImageArray. I
looked, but didn't find a place where you added additional constructor methods
(if you allowed the user to skip the arrayi_order string, then indeed trouble
could result, because you might get automatic creation of ImageArrays from
normal Arrays).

So I'm a bit puzzled about what is going on. It might be good to do some
experiments on a "trimmed down" version, by just deleting sections wholesale,
to try to determine what's happening. Right now I'm up to my ears in
iterators, so I won't be of help for a couple of days.

Best,
--Tim

Stefan Kroboth

unread,
Mar 27, 2012, 6:13:41 PM3/27/12
to juli...@googlegroups.com
Hi Tim,

> Sorry to be slow to respond; I was in the middle of a big chunk of code that
> made it hard to load image.jl.

No problem, unfortunately I'm also too busy to work on image.jl at the
moment :/

> The problem below is indeed weird. I don't understand it, but it has something
> to do with the section "#Mathematical operations", where you define +,-,*, etc
> operators. If I comment out all of those functions, it doesn't happen. I note
> that when you first load your version of image.jl, you get a warning about
> ambiguity.

I noticed that, but since I defined the functions the warning is asking
for, I didn't pay much attention to it. Loading image.jl a second time
doesn't give the warnings.

> This must indicate that there's something weird going on with the type system,
> almost like there's free interconversion between Array and ImageArray. I
> looked, but didn't find a place where you added additional constructor methods
> (if you allowed the user to skip the arrayi_order string, then indeed trouble
> could result, because you might get automatic creation of ImageArrays from
> normal Arrays).

Interconversion would indeed be an odd behaviour. I think the problem
lies in accessing the .data field, which is somehow interpreted as an
ImageArray, although I checked the types. This leaves the question of
why commenting out the mathematical operations solves the problem.



> So I'm a bit puzzled about what is going on. It might be good to do some
> experiments on a "trimmed down" version, by just deleting sections wholesale,
> to try to determine what's happening. Right now I'm up to my ears in
> iterators, so I won't be of help for a couple of days.

Sounds like a good idea, I'll have a look in the next few days.

Thanks for your time and help! :)

Best,
Stefan

Jeff Bezanson

unread,
Aug 5, 2012, 7:34:47 PM8/5/12
to juli...@googlegroups.com, Stefan Kroboth
Just happened to look at this code; reviving thread. I have two comments.

First, there seems to be a lot of pulling channels out of images.
Would it be better to store channels in separate arrays? True, I
imagine there isn't one universally optimal representation. We might
want some kind of a pixel abstraction to go along with this, to
naturally express pixel-wise operations.

Second and related, functions like hsi2rgb are really excessively
vectorized. While vector code is data parallel, I think this pushes it
to the breaking point. Since this is a pixel-wise operation, the right
way to express it going forward will be to implement the function for
pixels and then do a map(), which can be serial or parallel.
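Jeff's "write it for one pixel, then map it" pattern can be sketched as follows (NumPy-flavored Python for illustration; the toy luma function stands in for the real hsi2rgb math, and both names are mine):

```python
import numpy as np

def gray_of_pixel(px):
    """Toy pixel-wise operation (Rec. 601 luma weights); stands in for
    a real per-pixel colorspace conversion such as hsi2rgb."""
    r, g, b = px
    return 0.299 * r + 0.587 * g + 0.114 * b

def apply_pixelwise(img_hwc, f):
    """Express the conversion once per pixel, then map it over the
    image; the map() here could equally be a parallel map."""
    h, w, _ = img_hwc.shape
    flat = img_hwc.reshape(-1, img_hwc.shape[-1])
    return np.array(list(map(f, flat))).reshape(h, w)
```

Because the per-pixel function is independent of its neighbours, the map can be serial or parallel without changing the pixel code.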

Tim Holy

unread,
Aug 6, 2012, 3:01:43 PM8/6/12
to juli...@googlegroups.com, Stefan Kroboth
I agree with both of these points. I haven't yet touched any of the colorspace
code, but I will see what I can do. It may be a week or two; I am trying to
add a whole bunch of new functionality to the image processing code, and will
likely have to do some refactoring as part of that effort anyway.

Certainly for people who manipulate independent color channels frequently (I'm
not one of them), separate storage for each channel does seem to make a lot of
sense.

The issue of image representation is probably the trickiest one of all.
There's a balance to be struck in consistency vs. agnosticism about the
format. For big data, you'd really rather not force people to convert a 1
terabyte file into another format just so it matches Julia's conventions, which
argues for flexibility. But too much flexibility might also quickly turn into
unwelcome code baggage. I'll see what I can come up with, and then invite
comments.

Best,
--Tim

Stefan Karpinski

unread,
Aug 6, 2012, 3:08:18 PM8/6/12
to juli...@googlegroups.com, Stefan Kroboth
So the question is either a "vector of pixels" or a "bundle of channel vectors". They're both logically matrices, just transposes of each other. Seems like a matrix representation might make sense, which would allow both access patterns? It would also allow linear algebraic color transforms in a very natural way (e.g. mapping to grayscale).
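The "linear algebraic color transform" idea amounts to this (sketched in NumPy-flavored Python; the Rec. 601 luma weights are used purely as an example):

```python
import numpy as np

# Pixels as the rows of a matrix (n_pixels x n_channels): a linear
# color transform such as RGB -> grayscale is then just a
# matrix-vector product.
rgb = np.array([[1.0, 0.0, 0.0],   # pure red pixel
                [0.0, 1.0, 0.0],   # pure green pixel
                [1.0, 1.0, 1.0]])  # white pixel
weights = np.array([0.299, 0.587, 0.114])
gray = rgb @ weights               # one grayscale value per pixel
```

A full 3x3 matrix in place of `weights` gives three output channels, i.e. a general colorspace change in the same one-liner.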


Tim Holy

unread,
Aug 6, 2012, 4:12:58 PM8/6/12
to juli...@googlegroups.com
On Monday, August 06, 2012 03:08:18 PM Stefan Karpinski wrote:
> So the question is either "vector or pixels" or a "bundle of channel
> vectors". They're both are logically matrices, just transposes of each
> other. Seems like a matrix representation might make sense, which would
> allow both access patterns? It would also allow linear algebraic color
> transforms in a very natural way (e.g. mapping to grayscale).

I've indeed been assuming array representations up till now. Below is a
smattering of the issues I'm grappling with. A lot of these ideas are factors
that motivated the ImageArray type I started on in image.jl. However, my
current thought is that it is too heavy; for many applications, people won't
know or care about most of the fields. Since I'm a little uneasy with it, I'm
looking to rework that, and therefore it's a good time for people to chime in
with suggestions.

Issues:

1. An RGB image that is w pixels wide and h pixels tall might be an array of
size [h w 3] (the Matlab convention, which also fits with Array row/column
syntax) or of size [3 w h] (the convention for most on-disk image formats,
e.g., PNG). For small images you can easily convert from one format to any
other, but this becomes increasingly impractical the larger the image size. So
ideally we allow "whatever."
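The two layouts differ only by a permutation of the dimensions, which is cheap to express but expensive to materialize (a sketch in NumPy-flavored Python; the shapes are illustrative):

```python
import numpy as np

h, w = 4, 6
hw3 = np.arange(h * w * 3).reshape(h, w, 3)   # Matlab-style [h w 3]
# PNG-style [3 w h]: reverse the dimension order. np.transpose only
# creates a view, but writing the result out (or calling .copy())
# touches every element -- which is what becomes impractical for
# very large images.
cwh = np.transpose(hw3, (2, 1, 0))
assert cwh.shape == (3, w, h)
assert cwh[0, 1, 2] == hw3[2, 1, 0]
```

This is why a format-agnostic design that records the dimension order, rather than converting the data, pays off as images get large.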

2. If an array is [h w 3], that 3 could mean color or it could mean something
else. (It could be a "short stack" of 3 grayscale images at slightly different
focal planes.) I hate writing code that makes guesses, because this can lead
to unexpected behavior. So whatever the solution is, I'd really like it to be
explicit about what things mean.

The solution I started with, back when I first joined Julia and started working
on this (and therefore what's in image.jl now), was a short ASCII string,
e.g., "cxy" indicating that the first dimension is color, the second is the
horizontal (width) dimension, and the third the vertical (height) dimension. I
think there are some good things about this, but some weaknesses too. First,
is text the way to go here? If so, what about cases where you might actually
prefer to look at the images turned sideways, at which point that "x" and "y"
will start to get annoying and downright confusing. So perhaps better would be
"csst", where s means spatial, t temporal (for images taken over time), and c
"color" channel. Of course, the geologists analyzing acoustic images of the
earth's crust and mantle won't work in RGB, they'll work in "longitudinal
reflectivity" and "lateral reflectivity" for their two "color channels". You'd
like to denote that, too.

In other words, giving meaning to each dimension of the array seems quite
desirable but not to be entirely straightforward. I may have some solutions,
but I'm definitely interested in ideas that others have.

3. Metadata can sometimes be quite helpful. Know the min/max values of your
detector? Great, use the appropriate image type that stores that information
for you, and the default display will autoscale nicely for you (overridable of
course, for images that don't fill this range), and could even mark saturated
pixels by coloring them in a special way. Don't know that information? No
problem, use a simpler type and then expect to supply a range manually when
needed. But assuming that any Float64 image is normalized to 1.0 in each color
channel (again, the Matlab convention) seems a bit braindead.

However, too much metadata about fields that you don't know or care about seems
bad, and is what is contributing to my unease that ImageArray is too heavy.
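The detector-range idea in point 3 can be sketched like this (NumPy-flavored Python; the function name and fallback behaviour are my assumptions, not existing image.jl API):

```python
import numpy as np

def autoscale(img, lo=None, hi=None):
    """Scale an image to [0, 1] for display. If the detector's true
    range (lo, hi) is stored as metadata, use it; otherwise fall back
    to the data's own min/max rather than assuming a Float64 image is
    normalized to 1.0."""
    lo = float(img.min()) if lo is None else lo
    hi = float(img.max()) if hi is None else hi
    return (img.astype(float) - lo) / (hi - lo)
```

With metadata present, pixels at `hi` map exactly to 1.0, so saturated pixels are trivially identifiable and could be specially colored as suggested above.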



Instead of "one true image type," I'm starting to play with a set of possible
image classes, all inheriting from an abstract type, that can be customized to
allow people to use whatever aspect of metadata they deem important. Accessor
functions whose behaviors are customized via multiple dispatch abstract the
access to the data. It's just too early to tell whether this can work.

--Tim

Stefan Karpinski

unread,
Aug 7, 2012, 7:56:25 AM8/7/12
to juli...@googlegroups.com
I like the many types with a common set of interfaces approach. I think that's the right way to go. The hard part then becomes deciding what that interface should be.



Stefan Kroboth

unread,
Aug 8, 2012, 6:56:38 AM8/8/12
to juli...@googlegroups.com
> Just happened to look at this code; reviving thread. I have two comments.
>
> First, there seems to be a lot of pulling channels out of images.
> Would it be better to store channels in separate arrays? True, I
> imagine there isn't one universally optimal representation. We might
> want some kind of a pixel abstraction to go along with this, to
> naturally express pixel-wise operations.
>
> Second and related, functions like hsi2rgb are really excessively
> vectorized. While vector code is data parallel, I think this pushes it
> to the breaking point. Since this is a pixel-wise operation, the right
> way to express it going forward will be to implement the function for
> pixels and then do a map(), which can be serial or parallel.

When I wrote this, I was still thinking in Matlab (luckily I got rid of
that ;)). I would love to rewrite the algorithms to make them juliaesque,
but unfortunately I don't have any time at the moment due to my master's
thesis. I will probably have time to give it another try in mid-October.

Cheers,
Stefan

Stefan Karpinski

unread,
Aug 8, 2012, 11:16:48 AM8/8/12
to juli...@googlegroups.com
Good luck with the thesis! When you're done, you and Tim can hammer this out.


