Size and Dimensionality

324 views
Skip to first unread message

Edmund Jackson

unread,
Apr 2, 2013, 10:29:57 AM4/2/13
to numerica...@googlegroups.com
Hi,

   I'm attempting to push clatrix up to the new release of core.matrix.  I keep becoming tangled in issues of size and dimensionality. 

This is currently how it does things, and makes sense to me.  Is it what the API wants ?

  (def M (matrix [[1 2 3] [4 5 6] [7 8 9]]))  ;; 3x3 Matrix
  (get-shape M) ;; [3 3]
  (dimensionality M) ;; 2
 
  (def r (get-row M 0)) ;; 1x3 Matrix
  (get-shape r)  ;; [1 3]
  (dimensionality r) ;; 2, as its a matrix.  Should this be one ?

  (get-column r 1) ;; 1x1 matrix: [2]
  (get-shape (get-column r 1)) ;; [1 1]
  (dimensionality (get-column r 1)) ;; 2.  As its still a matrix.  Should this be 0 ?
  (get-row r 0) ;; 1x3 matrix

Thanks for the help,

Edmund

Simone Mosciatti

unread,
Apr 2, 2013, 3:55:19 PM4/2/13
to numerica...@googlegroups.com
That is a good point...

Matrix 1xN are matrix or vector ?

And there are any difference between 1xN and Nx1 (horizontal and vertical vector) ?

Matrix 1xN are equals to vector N with the same elements, and whats about Nx1 matrix ?

Edmund Jackson

unread,
Apr 2, 2013, 4:59:27 PM4/2/13
to numerica...@googlegroups.com
Yes vectors are different from matrices, and core.matrix has a concept of a row-matrix an a column-matrix as opposed to vectors.  As I understand it they result when you slice out a matrix, or create a 1xm or nx1 matrix.  Slicing doesn't create vectors.  Vectors are explicitly created.

Matthew:  can you perhaps explicitly doc out the semantics of vectors, as opposed to row- and column-matrix.  Coming from matlab these vectors are a new creature to me.

Mike Anderson

unread,
Apr 2, 2013, 7:25:07 PM4/2/13
to numerica...@googlegroups.com
On Wednesday, 3 April 2013 04:59:27 UTC+8, Edmund Jackson wrote:
Yes vectors are different from matrices, and core.matrix has a concept of a row-matrix an a column-matrix as opposed to vectors.  As I understand it they result when you slice out a matrix, or create a 1xm or nx1 matrix.  Slicing doesn't create vectors.  Vectors are explicitly created.

Slicing should create 1D vectors when you slice a 2D matrix. Slicing always reduced the dimensionality by 1.

I think this is the most useful operation: it is much more likely that the user wants a vector when they take a slice of a matrix than a Nx1 matrix for example.

If you want a Nx1 matrix you can always use "submatrix" to achieve this.

Mike Anderson

unread,
Apr 2, 2013, 7:32:16 PM4/2/13
to numerica...@googlegroups.com
I think that get-row and get-column should always reduce dimensionality by 1.

This is for several reasons:
- consistency with other operations like "slice"
- potentially more efficient (1D vectors can have less overhead than 2D matrices)
- mathematically makes sense, e.g. you often want to do vector dot products with matrix rows or columns.

Comments on the specific examples below:

On Tuesday, 2 April 2013 22:29:57 UTC+8, Edmund Jackson wrote:
Hi,

   I'm attempting to push clatrix up to the new release of core.matrix.  I keep becoming tangled in issues of size and dimensionality. 

This is currently how it does things, and makes sense to me.  Is it what the API wants ?

  (def M (matrix [[1 2 3] [4 5 6] [7 8 9]]))  ;; 3x3 Matrix
  (get-shape M) ;; [3 3]
  (dimensionality M) ;; 2
 
  (def r (get-row M 0)) ;; 1x3 Matrix
  (get-shape r)  ;; [1 3]
  (dimensionality r) ;; 2, as its a matrix.  Should this be one ?

I think it should be dimensionality 1, shape [3]
 

  (get-column r 1) ;; 1x1 matrix: [2]

should be an error if r is a 1D vector (1D vectors don't have columns...)

Mike Anderson

unread,
Apr 2, 2013, 7:38:19 PM4/2/13
to numerica...@googlegroups.com
Here's what happens with vectorz-clj, for comparison:

(def m (matrix [[1 2 3] [4 5 6] [7 8 9]]))
=> #<Matrix33 [[1.0,2.0,3.0][4.0,5.0,6.0][7.0,8.0,9.0]]>

(def r (get-row m 0))
=> #<MatrixRow [1.0,2.0,3.0]>

(def s (get-row (get-row m 0) 1))
=> #<VectorIndexScalar mikera.vectorz.impl.VectorIndexScalar@40000000>
;; A scalar value... hmmm must fix the toString representation.

(mget s)   ;; zero dimensional lookup
=> 2.0

Why the scalar wrapper? mostly becuase vectorz-clj supports 0-D views so you can use the indexed scalar to alter the originla array

(mset! s 10.0)
m
=> #<Matrix33 [[1.0,10.0,3.0][4.0,5.0,6.0][7.0,8.0,9.0]]>

The 0-D scalar views are an optional feature of course - it would be fine to just return the double value itself.

On Tuesday, 2 April 2013 22:29:57 UTC+8, Edmund Jackson wrote:

Mike Anderson

unread,
Apr 2, 2013, 9:04:16 PM4/2/13
to numerica...@googlegroups.com
On Wednesday, 3 April 2013 03:55:19 UTC+8, Simone Mosciatti wrote:
That is a good point...

Matrix 1xN are matrix or vector ?

It's a matrix with dimensionality 2. It is of course isomorphic to an N-element vector.


And there are any difference between 1xN and Nx1 (horizontal and vertical vector) ?

Yes - they are different shapes entirely. In particular, you will get very different results doing matrix multiplication.
 

Matrix 1xN are equals to vector N with the same elements, and whats about Nx1 matrix ?

All three are a bit different. 

Some examples with mul are useful:

(mul [[1 2 3]] [1 2 3])   ;; 1x3 * 3
=> [14]  ;; 1 element vector

(mul [[1 2 3]] [[1] [2] [3]])  ;; 1x3 * 3x1
=> [[14]]   ;; 1x1 matrix

(mul [[1] [2] [3]] [[1 2 3]] )  ;; 3x1 * 1x3
=> [[1 2 3] [2 4 6] [3 6 9]]    ;; 3x3 matrix result

Mike Anderson

unread,
Apr 3, 2013, 1:18:36 AM4/3/13
to numerica...@googlegroups.com
Hey Edmund,

Just been hacking around with Clatrix and I think the best way to solve these issues is to create a separate Vector deftype that wraps a Nx1 DoubleMatrix. That way it makes the distinction clear, and you don't need to do all the nasty conditional checks to see if you are dealing with a 1D vector or 2D matrix.

I'll see If I can get the tests to pass and push up a proof-of-concept branch.

BTW it looks like JBlas doesn't support mutable views.... is my understanding correct? Fine if that is the case, it just means that clatrix will mostly behave like an immutable matrix library from a core.matrix perspective.

  Mike.


On Tuesday, 2 April 2013 22:29:57 UTC+8, Edmund Jackson wrote:

Edmund Jackson

unread,
Apr 3, 2013, 4:06:01 AM4/3/13
to numerica...@googlegroups.com
So here's the thing: JBlas has no concept of Vectors, everything is a 2D matrix, so I'm hesitant to try bolt a Vector type onto it as I'm unlikely to cover all the cases.

1. What advantage does Vector offer us ?
 - potentially more efficient (1D vectors can have less overhead than 2D matrices). 
   - Not to JBlas, its all the same.

- mathematically makes sense, e.g. you often want to do vector dot products with matrix rows or columns.
 - the same operations are well defined for matrix rows/columns.

2. Can we do without it in core.matrix ? 

FWIW: Here are the examples in clatrix (the contents of the matrices are obviously the same !)

(* (matrix [[1 2 3]]) (matrix [1 2 3]))       ;; 1x3 * 3x1 -> 1x1 matrix
(* (matrix [[1 2 3]]) (matrix [[1] [2] [3]])) ;; 1x1 * 3x1 -> 1x1 matrix
(* (matrix [[1] [2] [3]]) (matrix [[1 2 3]])) ;; 3x1 * 1*3 -> 3x3 matrix

Mutable views: unsure, will look it up.

Mike Anderson

unread,
Apr 3, 2013, 4:44:45 AM4/3/13
to numerica...@googlegroups.com
Agreed, you can do everything with 2D matrices if you like. There is a question about whether it would confuse users who expect 1D vectors to work though. 

What do you do with things like [1 2 3] for example? Silently convert them into 3x1 or 1x3 matrices? I think you'll find that the semantics get ill-defined if you start doing things like that.....

Anyway, I made an experimental proof of concept with a Vector type that wraps a JBlas DoubleMatrix, and otherwise pretends to be a 1D vector.

Take a look, see if you like it. The branch probably has some bugs still, but at least it passes the core.matrix compliance tests for both 2D and 1D vectors.


--
You received this message because you are subscribed to the Google Groups "Numerical Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to numerical-cloj...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Edmund Jackson

unread,
Apr 3, 2013, 7:02:07 AM4/3/13
to numerica...@googlegroups.com
Thanks.  I took a look at the branch.  Its certainly worth thinking about.

My concerns are
- There a lot of repetition.  I wonder if this can be cut down cleverly.
- Keeping the Vector separate from the Matrix is hard.  For instance all the PMathsFunctions spit out Matrix, as do some of constructors like rnorm and constant.

I guess I'm pushing back because I still don't understand why we need a Vector at all.  What does it do ?  Why does it exists ?   Please help me to understand.

Mike Anderson

unread,
Apr 3, 2013, 7:45:53 AM4/3/13
to numerica...@googlegroups.com
On 3 April 2013 19:02, Edmund Jackson <edmunds...@gmail.com> wrote:
Thanks.  I took a look at the branch.  Its certainly worth thinking about.

My concerns are
- There a lot of repetition.  I wonder if this can be cut down cleverly.

Probably. Some of the protocol implementations only apply to one of matrices or vectors, so quite a few could be culled. Could also use code generation for some of it, though I think this is overkill for just 2 cases.
 
- Keeping the Vector separate from the Matrix is hard.  For instance all the PMathsFunctions spit out Matrix, as do some of constructors like rnorm and constant.

Hmmm yes I missed these. Should just be a case of making the functions spit out whatever they get passed in?
 

I guess I'm pushing back because I still don't understand why we need a Vector at all.  What does it do ?  Why does it exists ?   Please help me to understand.

To be clear: it isn't strictly needed, and it isn't a requirement for core.matrix. You can, as you rightly observe, do everything with 2D matrices.

Nevertheless, many (most?) matrix implementations do have the concept of 1D vectors, and it's a natural consequence of N-Dimensional array support that they exist. It would be a very odd NDArray implementation that supported 2,3,4.... dimensions but not 1. Thus it's essential that core.matrix supports 1D vectors in the API.

Also I like the clear logical distinction between matrices and vectors, though I freely admit that might just be my own subjective opinion :-) To me it's like the comparing a double[1][100] and a double[100] - they contain the same data but have logically quite different structure.

Anyway here's where it gets tricky: given the API is designed around having 1D vectors available, various functions assume 1D vectors rather than "simulating" them with matrices. So while you can multiply two 2D matrices together to "simulate" a vector dot product, that isn't the way dot products are expected to work in the API. e.g: you're actually creating a 1x1 matrix result which contradicts the API's contract that a dot product should always produce a scalar value. slice is another good example: the API assumes that doing a slice always produces a result with dimensionality reduced by 1. You can't implement slice correctly if you always return a 2D matrix.

So a key issue here is really around how to make sure we don't violate any API contracts and ensure we get consistent behaviour across implementations. I think it's a big no-go to have inconsistent API contracts and/or implementations freely violating the contracts that exist.

So for Clatrix I think we have a couple of options:
a) Provide 1D vector support, using something like what I hacked together
b) Support 2D only, and provide separate functions to simulate the 1D operations with 2D matrices (i.e. avoiding the core.matrix API so we don't violate any contracts)

My ingoing preference is for a) since I think it's manageable and a better solution for users than b) which implies them having to use a separate Clatrix-specific API.

Interestingly BLAS has vectors, they just don't seem to have made the jump to jblas.....








Daniel Slutsky

unread,
Apr 3, 2013, 8:14:29 AM4/3/13
to numerica...@googlegroups.com
It might be interesting here to learn some lesson from the design decisions of the R language.

Indeed, such dimensionality issues are responsible for some of the good and bad surprises in R, which determine much of the ease, difficulty, consistency and inconsistency in everyday use of that language.

See, for example, this discussion:
http://radfordneal.wordpress.com/2008/08/20/design-flaws-in-r-2-%E2%80%94-dropped-dimensions/



Mike Anderson

unread,
Apr 3, 2013, 8:36:18 AM4/3/13
to numerica...@googlegroups.com
Hmmm great link Daniel - that is a horrendous design flaw indeed!

Basically that's what you get when you create functions with inconsistent behaviour  / strange API contracts (sometimes it drops a dimension, sometimes it doesn't.....)

I'm hoping we can avoid such nastiness in core.matrix, we just need to be very vigilant and strict about definitions to ensure nothing like this creeps in.

Edmund Jackson

unread,
Apr 3, 2013, 8:53:04 AM4/3/13
to numerica...@googlegroups.com
Right!  Are we not doing precisely the same thing with slice: Matrix -> Vector in the case of 2D Matrices, but Matrix -> Matrix when nD ?  It looks like the drop=TRUE case from R.  The same bug he identifies comes up if we have a function defined for Matrix but not Vector.

Mike Anderson

unread,
Apr 3, 2013, 9:37:20 AM4/3/13
to numerica...@googlegroups.com
slice is totally consistent - it *always* goes from N dimensions to N-1 dimensions, for any value of N. That's the purpose of slice.

If you want to keep the same number of dimensions, then there is the "submatrix" function in core.matrix, which always goes from N dimensions to N dimensions. You can use submatrix to select a specific column or row as a 2D matrix if you like.

Edmund Jackson

unread,
Apr 3, 2013, 12:10:14 PM4/3/13
to numerica...@googlegroups.com
My concern isn't that, its that its changing types to Vector from Matrix.  This is precisely what the [ ,drop=TRUE] is doing in R. Nobody has yet said why we need a Vector - I don't understand it.  How does it behave differently to 1xn on nx1, and why is that beneficial ?

I hate to hold up Matlab as an example but it is a pretty good matrix algebra abstraction that does without such a type.

I hope I'm not sounding argumentative for its own sake, I genuinely want to understand this.

Robert Lachlan

unread,
Apr 3, 2013, 12:51:56 PM4/3/13
to numerica...@googlegroups.com
This has me worried.   This definition of "slice" (always going from N to N-1 dimensions) isn't consistent with what I'd expect coming from python, go etc.  Slicing is ubiquitous, and the most common usage is from 1-dimension to 1-dimension.  In numpy, with NDArrays, this is usually called slicing:

>>> x = reshape(range(16), (4, 4))
>>> x
array([[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11],
            [12, 13, 14, 15]])
>>> x[0:2,0:2]
array([[0, 1],
            [4, 5]])

So while slice as one dimension less makes loads of intuitive sense, I'm very worried about it not being what people have come to expect from slicing.

Konrad Hinsen

unread,
Apr 3, 2013, 1:05:16 PM4/3/13
to numerica...@googlegroups.com
Edmund Jackson writes:

> My concern isn't that, its that its changing types to Vector from
> Matrix. This is precisely what the [ ,drop=TRUE] is doing in R.

There are two important differences:

1) We don't have the "drop" argument. We have slice which behaves like
R's "drop=True" and submatrix which behaves like R's
"drop=False". They are distinct operations.

2) In R, the default behaviour is "drop=True" if the result of a
slicing operation has length 1, and "drop=False" otherwise. The
number of dimensions of the result depends on how much of the input
array was selected in the slice. That's the design flaw in R that
the blog post refers to. You can't easily write an algorithm that
selects, say, the first half of a matrix (the first N/2 rows), because
you have to handle the cases N=2 and N=3 specially.


> Nobody has yet said why we need a Vector - I don't understand it.
> How does it behave differently to 1xn on nx1, and why is that
> beneficial ?

I'll try to explain it in terms of equivalent Clojure vectors:

vector: [1 2 3]
N x 1 matrix: [[1] [2] [3]]
1 x N matrix: [[1 2 3]]

This is how vectors behave differently from row and column matrices.

Not making the distinction between these values is the same as
implicitly flattening nested vectors in all operations. Clojure
doesn't do that, for good reasons. A list of numbers is not the same
as a list of lists of numbers.

Suppose we didn't have vectors, only 2D matrices. What would the matrix
equivalent of (range 10) be? An Nx1 matrix or a 1xN matrix? Why one and
not the other? And why not a three-dimensional Nx1x1 array?

Ultimately, the answer to your question is: decades of experience with
array programming, starting with APL in the 1960's, have shown that it
is useful to keep the distinction between arrays of different rank but
identical elements.

Konrad.

Edmund Jackson

unread,
Apr 3, 2013, 4:37:03 PM4/3/13
to numerica...@googlegroups.com
Konrad,

 Thanks for a really lucid answer.  I'm starting to get it.  I understand that [1 2 3] is distinct from the other two forms, and relying on convention to unflatten to a column/row matrix doesn't necessarily make sense.

So to follow on: 
  - In linear algebra a vector is always 1xn on nx1.   So operations like x'*x and x*x' (where ' denotes transpose and * matrix multiplication) are distinct operations.  How does a Vector behave in these operations, or do they not apply ?  In general what is a Vector in terms of linear algebra ?
  - When should a user expect/use a Vector rather than a row/column matrix ?
 
Edmund

Mike Anderson

unread,
Apr 3, 2013, 7:57:36 PM4/3/13
to numerica...@googlegroups.com
On Thursday, 4 April 2013 04:37:03 UTC+8, Edmund Jackson wrote:
Konrad,

 Thanks for a really lucid answer.  I'm starting to get it.  I understand that [1 2 3] is distinct from the other two forms, and relying on convention to unflatten to a column/row matrix doesn't necessarily make sense.

So to follow on: 
  - In linear algebra a vector is always 1xn on nx1.   So operations like x'*x and x*x' (where ' denotes transpose and * matrix multiplication) are distinct operations.  How does a Vector behave in these operations, or do they not apply ?  In general what is a Vector in terms of linear algebra ?

An important point about a vector is that it only has one dimension, so x == transpose(x) (given the conventional definition of transpose that it reverses the dimensions). so both x*x' and x'*x are the same. Of course, * could be either an inner product or an outer product which determines whether you get a scalar or a matrix result.

When you multiply vectors with matrices, the order matters. If you put the vector first, it behaves like a row matrix (x' * M) If you put the vector second, it behaves like a column matrix (M * x). In both cases the result is a vector.

Having said all that, I personally tend to think of vectors as column vectors by default since:
- that is how they are most commonly presented in mathematics texts
- it matches the behaviour you expect when you put the matrix first in multiplications (which seems to be the most common convention).
- it matches the intuition that "slices" returns a sequence of the individual rows (in this case scalars)
 
  - When should a user expect/use a Vector rather than a row/column matrix ?

It's sort of up to the user, given that they understand the equivalences. 

But as a data point, I almost never use row / column matrices: whenever something is 1xN or Nx1 I generally find it is simpler to use a vector instead. I find this is a useful approach:
- It catches some runtime errors where the code expects a vector but got passed a matrix instead
- Accessing a vector is simpler than accessing a matrix (only 1D indexing needed)
- Performance of vectors is marginally better (in vectorz-clj at least..... probably true of other implementations too)

This has been a useful conversation to clarify things, I think I might copy some of the discussion into a Wiki page.....


Mike Anderson

unread,
Apr 3, 2013, 9:01:54 PM4/3/13
to numerica...@googlegroups.com
Maybe this is a naming problem then.... happy to change if we can come up with a better naming scheme that fits better with general expectations.

the python slice is more like what is currently called "submatrix" in core.matrix:

(def a (reshape (range 16) [4 4]))
=> [[0 1 2 3] [4 5 6 7] [8 9 10 11] [12 13 14 15]]

(submatrix a [[0 2] [0 2]])
=> #<NDWrapper [[0 1] [4 5]]>

I still think we need the current "slice" and "slices" operations, but I'm personally OK if we call them something else at this stage: one of the key things to do during this incubation period for core.matrix is get all the naming right.

Edmund Jackson

unread,
Apr 4, 2013, 4:09:34 AM4/4/13
to numerica...@googlegroups.com
I think a wiki page would be very helpful for people like myself who've not met Vector before.

 I'll turn this over for a while and see if we can simplify the clatrix implementation.

Konrad Hinsen

unread,
Apr 4, 2013, 4:36:54 AM4/4/13
to numerica...@googlegroups.com
Edmund Jackson writes:

> So to follow on:
> - In linear algebra a vector is always 1xn on nx1. So operations like x'*x and x*x'
> (where ' denotes transpose and * matrix multiplication) are distinct operations. How
> does a Vector behave in these operations, or do they not apply ? In general what is a
> Vector in terms of linear algebra ?

There are two conventions in use. The proponents of each one consider
theirs the "obviously right" one and the other one "strange".

One convention is the one you describe: linear algebra deals with
matrices, which are 2D objects. There are "row vectors" and "column vectors"
which are special cases of matrices having only one row/column. This convention
is probably the most familiar one because it is most frequently used in textbooks
for scientists and engineers.

The other convention starts from the definition of a vector space,
whose elements are vectors, and goes on to define transformations and
mappings, which transform one vector to another vector and thus need
two indices. Higher-order operations (-> tensors) take even more
indices. Vectors and transformations are abstract objects,
representations are defined in a second step. In numerical
applications, these representations are 1D and 2D arrays usually
called "vectors" and "matrices". The terms "row vector" and "column
vector" are never used, nor are transposes of vectors.

The difficulty for language implementers is that they have to choose
between the approach that is familiar to most potential users (the
first one) and the one that is more flexible because it extends to
higher dimensions (the second one). Matlab and R have chosen the first
one, APL and NumPy the second one. NumPy has later added a "Matrix"
class, which is a wrapper around arrays, to cater for adepts of the
first convention, but I don't see it much used in practice.

I have been arguing for the second approach on this list since the
beginning for two reasons: First, I need it because I work a lot with
higher-dimensional arrays. Second, it is more flexible, to the point
that one can always define an additional API layer implementing the
first convention in terms of the second one.

> - When should a user expect/use a Vector rather than a row/column matrix ?

When the result of an operation is by definition one-dimensional, it
should be a vector. Matrices with a single column or row should only occur
as special cases of general matrices.

For example, matrix-vector multiplication should return a vector and
matrix-matrix multiplication should return a matrix, such that the
rank of the input arguments determines the rank of the result, no
matter what the concrete shapes are.

Konrad.

Edmund Jackson

unread,
Apr 4, 2013, 12:40:12 PM4/4/13
to numerica...@googlegroups.com
OK, I get it, I'm simply seeing the world from a different upbringing.  I'm in favour of doing the most flexible thing implementation wise, so will get behind these vectors.  I'll have a think about how to do this tidily in clatrix.

I'm in favour of clearly documenting this.

Thanks again for a very clear and thoughtful answer Konrad.

Mike Anderson

unread,
Apr 4, 2013, 9:03:20 PM4/4/13
to numerica...@googlegroups.com
Great stuff.

I've set up a Wiki: so please feel free to add whatever you think is most useful!

Also on the documentation front I've been going through and trying to improve the docstrings in core.matrix to make the API contracts clearer. Any extra help on this / doc patches much appreciated - when you spend a lot of time in a code base it is easy to forget what is and what isn't obvious to a newcomer.....
Reply all
Reply to author
Forward
0 new messages