Array dimensionality inference


Toivo Henningsson

Apr 15, 2012, 5:12:07 PM
to julia-dev
I've been thinking a bit more about what really happens when an array
gets extra singleton dimensions added, and what you want to happen.

Some of my conclusions so far are that:
* For + and -, we should have

ndims(x+y) = ndims(x-y) = min(ndims(x), ndims(y))

(Yes, you read correctly, min, not max!)
If you happen to add a column matrix to a vector,
the result you are after is really a vector, not a matrix.
* Any function that always expects a scalar for an argument
should ideally accept a single-element Array of any dimension,
and convert it to a scalar.
I believe rules like these are very reasonable, can actually be very
consistent, and will be pretty good at doing what you want them to. If
you keep adding singleton dimensions to an Array, sooner or later you
will want to remove them. Why not sooner?
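To make the proposed min-ndims rule concrete, here is a small sketch in Python/NumPy (used only as an analogue; the function name and error message are mine, and this is hypothetical behavior, not what any + operator actually does):

```python
import numpy as np

def add_min_ndims(x, y):
    """Hypothetical '+' that trims trailing dimensions so the result
    has ndims equal to min(x.ndim, y.ndim)."""
    n = min(x.ndim, y.ndim)

    def drop(a):
        # Trailing dimensions beyond n must all be singleton,
        # otherwise the shapes genuinely conflict.
        if any(s != 1 for s in a.shape[n:]):
            raise ValueError("non-singleton trailing dimension")
        return a.reshape(a.shape[:n])

    return drop(x) + drop(y)

v = np.array([1.0, 2.0])        # vector, shape (2,)
c = np.array([[3.0], [4.0]])    # column matrix, shape (2, 1)
result = add_min_ndims(v, c)    # a vector, not a 2x1 matrix
```

Adding a column matrix to a vector then yields a vector, as argued above.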

Please read the argument at

https://github.com/toivoh/Julia-Design-Thoughts/wiki/Array-dimensionality-inference

I'm not done thinking through the implications for all kinds of
operations yet, I'll get back when I have considered some more.

Jeff Bezanson

Apr 15, 2012, 7:11:53 PM
to juli...@googlegroups.com
I concede that since we now drop trailing dimensions on ref, it might
make sense to drop dimensions instead of add them when combining
shapes. But there is an important subtlety, which is that "singleton
dimensions" is not really the appropriate concept. We don't drop
dimensions just because they are of size 1, but because they were
indexed with scalars. So I'm not too sure. Actually, we might be able
to go back to insisting on exact shape matches, since A[:,i]+v will
now work under that rule.

It is weird that v'*v gives a size (1,1) or (1,) array which is
supposed to be a scalar, but a scalar is more naturally a 0-d array.
We can't have a vector behave like a scalar just because it has length
1. No idea what to do about that. This makes me think that
mathematically v'*v is not really the same operation as dot(v,v) even
though they compute the same number. But now I'm out of my depth and
must step aside.

Stefan Karpinski

Apr 15, 2012, 7:26:44 PM
to juli...@googlegroups.com
I'm going to propose something entirely different, which is that we up-cast array shapes to the largest in an expression and broadcast singleton dimensions (including trailing dimensions that are only implied) when they are paired with non-singletons. This makes a lot of very common operations much easier and more efficient to express. For example, you can normalize matrices very easily:

X./sum(X,1)  # column stochastic
X./sum(X,2)  # row stochastic

There are a ton of other use-cases. I believe this is something that Alan has specifically requested before.
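As a point of reference, NumPy already broadcasts singleton dimensions this way; a quick analogue of the two normalizations above (assuming Julia's sum(X,1) corresponds to summing along the first dimension, i.e. axis=0 here, with the singleton dimension kept so it broadcasts back against X):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 2.0]])

# Column stochastic: divide by the 1xN row of column sums.
col = X / X.sum(axis=0, keepdims=True)

# Row stochastic: divide by the Nx1 column of row sums.
row = X / X.sum(axis=1, keepdims=True)
```

Each division pairs a singleton dimension of the sums against a non-singleton dimension of X and broadcasts it, which is exactly the behavior being proposed.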

Stefan Karpinski

Apr 15, 2012, 7:33:33 PM
to juli...@googlegroups.com
Dot versus v'*v is a really interesting case. It's an example where it actually really matters whether v is a vector or a column matrix. Because in the case of a vector, the product eliminates the dimension that's being summed over, and since this is the only dimension, what's left is logically a scalar. In the case of a column matrix, however, the result is a 1x1 matrix, which is not a scalar at all.
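NumPy happens to make exactly this distinction, so it serves as a concrete illustration (an analogue only; the variable names are mine):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # a true vector, shape (3,)
C = v.reshape(3, 1)             # a column matrix, shape (3, 1)

s = np.dot(v, v)    # the only dimension is summed over: a scalar
M = C.T @ C         # one dimension survives from each operand: 1x1 matrix
```

Here `s` is a plain scalar while `M` has shape (1, 1), matching the argument that the vector and column-matrix cases are logically different results.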

On Sun, Apr 15, 2012 at 7:11 PM, Jeff Bezanson <jeff.b...@gmail.com> wrote:

Stefan Karpinski

Apr 15, 2012, 7:47:15 PM
to juli...@googlegroups.com
I think this may arguably be an advantage of using row matrices instead of having vectors and covectors: if you had vectors and covectors and v' was a covector when v is a vector, then v'*v where v is a vector should be a scalar whereas when v is a column matrix, the result should be a 1x1 matrix. In our current scheme since v' is a row matrix, things come out the same regardless of whether v is vector or a column matrix. That strikes me as a good thing.

Jeff Bezanson

Apr 15, 2012, 7:49:16 PM
to juli...@googlegroups.com

Makes me question the behavior of v'. Maybe v*v should be dot, with a different operator for outer product?

Stefan Karpinski

Apr 15, 2012, 7:51:53 PM
to juli...@googlegroups.com
I see no problem with dot(v,v). It's a feature not a bug :-)

Toivo Henningsson

Apr 16, 2012, 7:59:49 AM
to julia-dev
Ok, I guess I should clarify a few things:


All I'm suggesting at this point is that:

* The + and - operators should discard dimensions down to the minimum
ndims of both operands.

* Suppose the function f(x) gives an error if ndims(x) > N.
Then, it should be allowable to extend the domain of f by letting it
attempt to discard the dimensions of x beyond N
--- which will of course still give an error if they are not all
singleton.
- Also, supposing f(x) always expects x to be scalar, f should be
allowed to promote any single-element Array argument x to be scalar.

* The operators .+ .- .* ./ should provide elementwise operations,
broadcasting both present and implied trailing dimensions to make
their arguments match.


Why? The way I see it is that:

* There's two kinds of dimensions in an Array: _intentional_ and
_nuisance_.
Example: eye(n) has two intentional dimensions, even when n=1. The
intent is to create a matrix; it just happened to be 1x1 in this case.

* All nuisance dimensions are singletons, but not all singletons are
nuisance.

* Nuisance dimensions are created when there is no other way to
represent the result as an Array, e.g. by v' and [1 2 3].
(On the other hand, [A v] will not create a nuisance dimension: the
first dimension is already present in both the matrix A and the vector
v.)

* The only rule that would be sensible to use for all functions
regarding how to add/remove dimensions from Array arguments before the
actual operation is: Don't!
Example: The broadcast rule makes sense for many operations (I really
really want to have the broadcast operations .+ .- .* ./ etc), but
using it for matrix multiply would give that

eye(2)*[1,2] = [1 1; 2 2]

hardly what you want! (The vector [1, 2] is broadcast to match the
size of eye(2) before the operation.)
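The eye(2) example can be checked concretely in Python/NumPy (an analogue only; `v_bc` is my name for the broadcast intermediate):

```python
import numpy as np

A = np.eye(2)
v = np.array([1.0, 2.0])

mv = A @ v    # the intended matrix-vector product: [1., 2.]

# If * first broadcast v up to A's shape and only then multiplied,
# the column-replicated matrix would be used instead:
v_bc = np.broadcast_to(v[:, None], (2, 2))   # [[1, 1], [2, 2]]
bad = A @ v_bc                               # a 2x2 matrix, not a vector
```

The hypothetical broadcast-first result `bad` is [1 1; 2 2], exactly the wrong answer described above, which is why one promotion rule cannot serve all operations.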

* My conclusion is that _different promotion rules_ (for adding/
dropping dimensions) are needed depending on what an operation _is
supposed to do._ Maybe you only need a few rules, but unless it's
"don't", you will need more than one.

* It makes sense to try to reason about the intents of the user and
try to give him what he wants,
under some pretty harsh restrictions:
- The resulting rules should be simple
- The actual operation carried out and ndims of the result should not
depend on the values of dimension sizes; typically only on the
function itself and the ndims of its arguments.
[Side remark:] squeeze doesn't satisfy this. But it's really the
prototype of a non-singleton-safe function! (If it were singleton
safe, it'd be a noop :) I'd rather specify which dimensions to remove/
keep manually.)

* If we can safely say (based on the arguments' types) that an
operation would be in error unless the last dimension of an Array
argument were absent, it is allowable to attempt to discard it:
- Throw an error if it's non-singleton (then it is, surely, not
absent!)
- Discard it and carry on if it is in fact singleton; it might still
have been a nuisance dimension.

* A few simple rules such as the rule for + and - suggested above
would go a long way to spare the user from much manual nuisance
dimension removal. I believe that this would be a good thing.

* If you really want to add x+y and promote them to their greatest
common ndims, there would still be x.+y.

* When + throws an error, it should be informative about its
difference from the .+ operator. It will probably be possible to
identify some other common errors you get if you write + when you
wanted .+ but + itself does not throw an error. Those errors could
then point out that some dimensions may have been stripped.


Though the word intentional sounds a bit fuzzy and imprecise, I hope
you will find that the way I use it is not imprecise at all:
* The user's intentions are only interpreted through assertions such
as

"It's an error to intentionally add arrays with different ndims
using +"

* I acknowledge that nuisance dimensions will sometimes be created.
* I define them not to be intentional.
That's really all that I put into the term intentional!


Please please tell me at which point (points? ;) of my argument
you begin to disagree!

Toivo Henningsson

Apr 16, 2012, 8:27:35 AM
to julia-dev


On Apr 16, 1:11 am, Jeff Bezanson <jeff.bezan...@gmail.com> wrote:
> I concede that since we now drop trailing dimensions on ref, it might
> make sense to drop dimensions instead of add them when combining
> shapes.

Yes, sometimes! I argue that not all operations can use the same rule
for this.

> But there is an important subtlety, which is that "singleton
> dimensions" is not really the appropriate concept. We don't drop
> dimensions just because they are of size 1, but because they were
> indexed with scalars. So I'm not too sure.

I'm against depending on whether an Array dimension is singleton
for pretty much any purpose other than deciding whether to throw an
error.
Maybe my frequent use of the word "singleton" was confusing, so I
changed my wording to _intentional_ and _nuisance_ dimensions (please
see the end of my last mail for the precise definition of
"intentional"!)

What I want to do is to see when I can figure out that some dimension
is a nuisance, relying only on the types of function arguments. The
tool for doing this is a small set of assumptions about which kinds of
operation/argument combinations the user intends, or should be allowed,
to invoke.

> Actually, we might be able
> to go back to insisting on exact shape matches, since A[:,i]+v will
> now work under that rule.

I haven't thought it through, but I don't think it's a good idea.
There are still too many sources of nuisance dimensions, and there
always will be as long as Array has to overapproximate the set of
intentional dimensions with a range 1:ndims.

> It is weird that v'*v gives a size (1,1) or (1,) array which is
> supposed to be a scalar, but a scalar is more naturally a 0-d array.
> We can't have a vector behave like a scalar just because it has length
> 1. No idea what to do about that. This makes me think that
> mathematically v'*v is not really the same operation as dot(v,v) even
> though they compute the same number. But now I'm out of my depth and
> must step aside.

The way I think about what happens is that
* When v is a vector, the intention of v'*v is to produce a scalar, or
possibly a 0-dimensional array.
* Unfortunately, the intermediate step of forming v' adds a nuisance
dimension: the first dimension of v'. (The second dimension of v' is
the original first of v; the intentional one.)
* When it's time to evaluate the product (v')*v, there's no way for *
to know that the second dimension of v' is not intentional.
* So the nuisance dimension (the first of v') remains as a nuisance
dimension in v'*v; this is why it becomes a vector instead of a
scalar.

Mathematically, v'*v should have been a scalar, or at least a 0-d
Array, but v' changed the result.
Now, by writing dot(v, v), the user asserts that the intended result
was indeed a scalar (that's the whole difference between v'*v and
dot(v, v) ), so dot is free to create a scalar result, or throw an
error if it's not possible.

I'm definitely not saying to make all single-element vectors behave
like scalars. I'm just saying that if the only type of value that is
valid in a given situation is a scalar, it's fine to promote an Array
that's passed in to become one (if it has numel=1, of course!)
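That promotion rule is easy to sketch in Python/NumPy (an analogue only; the function name `as_scalar` is mine, and this is the proposed behavior, not anything existing):

```python
import numpy as np

def as_scalar(x):
    """Hypothetical promotion: a function that only accepts scalars may
    convert a single-element Array of any ndims into a scalar."""
    a = np.asarray(x)
    if a.size != 1:
        raise ValueError("only single-element arrays promote to scalar")
    # All dimensions of a single-element array are nuisance dimensions
    # here, so discard them all.
    return a.reshape(()).item()

x1 = as_scalar(3.0)               # scalars pass through
x2 = as_scalar(np.ones((1, 1)))   # a 1x1 matrix promotes to 1.0
```

A multi-element array is rejected rather than silently truncated, which keeps the rule value-independent in the sense described above: only numel matters, not which dimensions happen to be singleton.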

Toivo Henningsson

Apr 16, 2012, 8:33:16 AM
to julia-dev
On Apr 16, 1:26 am, Stefan Karpinski <ste...@karpinski.org> wrote:
> I'm going to propose something entirely different, which is that we up-cast
> array shapes to the largest in an expression and broadcast singleton
> dimensions (including trailing dimensions that are only implied) when they
> are paired with non-singletons. This makes a lot of very common operations
> much easier and more efficient to express. For example, you can normalize
> matrices very easily:
>
> X./sum(X,1)  # column stochastic
> X./sum(X,2)  # row stochastic
>
> There are a ton of other use-cases. I believe this is something that Alan
> has specifically requested before.

Yes yes yes I also really want those to work! Please add it as soon
as you can :)

What I argue is that they are not in conflict!
We could have + and - reducing dimensions and not broadcasting, and .+
and .- broadcasting just like ./ in your example. I love those
broadcasting elementwise operations, but they don't make much sense
for pure linear algebra, when you're working with matrices, vectors,
and matrix products. That is when you want to use + and - (and probably
in many other cases when broadcasting doesn't make sense either).
Broadcasting is powerful, but it's not always what you want. I think
the user should be allowed to say when he does.

Toivo Henningsson

Apr 16, 2012, 8:50:53 AM
to julia-dev
On Apr 16, 1:47 am, Stefan Karpinski <ste...@karpinski.org> wrote:
> I think this may arguably be an advantage of using row matrices instead of
> having vectors and covectors: if you had vectors and covectors and v' was a
> covector when v is a vector, then v'*v where v is a vector should be a
> scalar whereas when v is a column matrix, the result should be a 1x1
> matrix. In our current scheme since v' is a row matrix, things come out the
> same regardless of whether v is vector or a column matrix. That strikes me
> as a good thing.

They are not the same right now:
v = ones(3) makes v'*v a vector.
v = ones(3,1) makes it a matrix.

I argue that when you have some variable that is intended to be a
vector, you want to represent it as one, not a matrix. Column matrices
should ideally only be used for things that are actually matrices,
such as ones(n, m), when m happened to be one.

Of course, we can't expect 1-d vectors to be produced at all times when
a vector is intended. But we could try to help the user convert them
back in some cases (exercising all the care I've outlined previously).

On the other hand, we must be very careful not to remove a dimension
that might be intentional.
I argue that there is a real difference between v'*v and C'*C (the
second being a column matrix):
* v is surely meant to be a vector (or at least have ndims <= 1), so
looking at the operation as a whole, we could safely say that
ndims(v'*v) = 0.
* C is ambiguous: it could be meant as a vector, or a matrix. In the
latter case, the correct result would be a 1x1 matrix.
Our working assumption must be that C is actually a matrix, or we
might cause code to break when dimensions become singleton.
So we must let ndims(C'*C) = 2. Hopefully, if this is not the intent,
some code down the line will fix it; by explicit conversion or
implicitly as C'*C + scalar would do by my suggestion.

Toivo Henningsson

Apr 16, 2012, 9:04:50 AM
to julia-dev
> I see no problem with dot(v,v). It's a feature not a bug :-)

I think that's a good idea; it would at least get rid of some
ambiguities (by making intentions clearer).

Perhaps there could even be an infix form

t = v dot v

But that's probably too much :) In that case, dot would have to be a
keyword (but wouldn't that be kinda cool? :)
Also, dot would have to bind more tightly than *, since it's
really asserting that the product (x dot y) should be a scalar. That
would break if A*x dot y (= matrix) were allowed to be evaluated as
(A*x) dot y (= scalar).

While I'm on the subject of operators: One thing that would avoid
introducing some nuisance dimensions as well would be if it were
possible to define a '* operator: u'*v could be evaluated as a whole,
and the nuisance dimension avoided. The default implementation for '*
would of course be just to call ' first and then *. But there's
probably way too much trickery involved in this approach compared to
the gains. It might also feel a bit awkward that the result's ndims
from a computation would depend on whether the compiler recognized the
'* operation, or interpreted it as ' and *.

Toivo Henningsson

Apr 16, 2012, 9:17:04 AM
to julia-dev


On Apr 16, 1:49 am, Jeff Bezanson <jeff.bezan...@gmail.com> wrote:
> Makes me question the behavior of v'.

Yes, I think that's what causes this! But I don't think there's an
easy fix.

I might write a nuisance-dimension-free array on top of Array, which
would keep a bitmask (as part of its type) of which dimensions are
intentional.
Then for a vector v, v' would be represented with only one intentional
dimension: the second.
Same for [1 2 3], if I could override hcat to produce an NFArray (or
whatever I should call it).
v'*v would produce a scalar.
matrix + vector would give an error
matrix .+ vector would still broadcast
etc
It would be a stricter version of Array: if you wrote code using
NFArray and then changed it to use Array instead, it should produce
the same results (but it would have been easier to find errors while
you were writing it).

I still think it's worthwhile trying to make Array handle these
matters in a reasonable way, even though bitmask of intentional
dimensions is too heavy machinery to put into it.
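A minimal sketch of the NFArray idea, in Python/NumPy for concreteness (the class name comes from the mail above, but the design, method names, and the restriction to the row-times-vector case are mine):

```python
import numpy as np

class NFArray:
    """Array wrapper carrying one flag per dimension saying whether that
    dimension is intentional, so nuisance dimensions can be dropped."""

    def __init__(self, data, intentional):
        self.data = np.asarray(data)
        self.intentional = tuple(intentional)
        assert self.data.ndim == len(self.intentional)

    def transpose(self):
        # v' on a vector: the new first dimension is a nuisance.
        if self.data.ndim == 1:
            return NFArray(self.data[None, :], (False, True))
        return NFArray(self.data.T, self.intentional[::-1])

    def matmul(self, other):
        # Only the row (1, n) times vector (n,) case, to keep it short.
        # The surviving dimension comes from self's first dimension;
        # drop it from the result if it was a nuisance.
        result = self.data @ other.data            # shape (1,)
        flags = (self.intentional[0],)
        shape = tuple(s for s, f in zip(result.shape, flags) if f)
        return NFArray(result.reshape(shape), tuple(f for f in flags if f))

v = NFArray([1.0, 2.0, 3.0], (True,))
s = v.transpose().matmul(v)   # the nuisance dimension is dropped
```

With the bitmask available, v'*v comes out as a 0-d scalar result rather than a length-1 vector, which is the behavior argued for earlier in the thread.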

> Maybe v*v should be dot, with a
> different operator for outer product?

I don't think that would be a good idea. * is already overloaded for
scalar and matrix multiplication, adding yet another operation could
really confuse things :)
