What was (is?) slow was a call from an untyped module A to a function exported
from a typed module B. The functions in B must check at runtime that
the values coming from A have the correct types. If A were written
in Typed Racket, the types would be known at compile time.
Here math/matrix is written in Typed Racket, so if you are writing an
untyped module, you will in general want to minimize the use of, say,
matrix-ref. Instead, operations that work on entire matrices or
rows/columns are preferred.
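To see the boundary cost, here is a small sketch (the matrix size and loop are made up for illustration): an untyped module pays a contract check on every matrix-ref call, but only one boundary crossing for a whole-matrix operation.

```racket
#lang racket                ; an untyped module
(require math/matrix math/array)

(define M (build-matrix 100 100 (λ (i j) (exact->inexact (+ i j)))))

;; slow: 10,000 crossings of the typed/untyped boundary,
;; each checking its arguments and result at runtime
(for*/sum ([i (in-range 100)] [j (in-range 100)])
  (matrix-ref M i j))

;; faster: a single boundary crossing for the whole-matrix operation
(array-all-sum M)
```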
> (: sum : Integer Integer -> Flonum)
> (define (sum i n)
> (let loop ((j 0) (acc 0.0))
> (if (>= j mx) acc
> (loop (+ j 1) (+ acc (matrix-ref A i j))) )))
>
> (: b : (Matrix Flonum))
> (define b (build-matrix mx 1 sum))
The matrix b contains the sums of the rows of the matrix.
Since matrices are a subset of arrays, you can use array-axis-sum,
which computes the sum along a given axis (i.e. a row or a column when
speaking of matrices).
(define A (matrix [[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]))
> (array-axis-sum A 1)
- : (Array Flonum)
(array #[3.0 12.0 21.0])
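So the element-by-element sum loop could plausibly be replaced by a single whole-matrix operation; a sketch, assuming mx and A as defined in the program below:

```racket
;; build the column vector of row sums without per-entry matrix-ref calls;
;; ->col-matrix converts the one-axis array of sums into an mx×1 matrix
(define b (->col-matrix (array-axis-sum A 1)))
```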
However, as Eric points out, matrix-solve is an O(n^3) algorithm,
so the majority of the time is spent in matrix-solve.
Apart from finding a way to exploit the relationship between your
matrix A and the column vector b, I see no obvious way of
speeding up the code.
Note that when you benchmark with
time racket matrix.rkt
you will include startup and compilation time.
Therefore, if you want to time just the matrix code,
wrap it in a literal (time ...) call.
--
Jens Axel Søgaard
The math/matrix library uses the arrays from math/array to represent matrices.
If you want to try the same representation as Bigloo, you could try Will Farr's
matrix library:
I am interested in hearing the results.
/Jens Axel
--
/Jens Axel
Neil ⊥
____________________
#lang typed/racket
(require math/matrix
math/array
math/private/matrix/utils
math/private/vector/vector-mutate
math/private/unsafe
(only-in racket/unsafe/ops unsafe-fl/)
racket/fixnum
racket/list)
(define-type Pivoting (U 'first 'partial))
(: flonum-matrix-gauss-elim
(case-> ((Matrix Flonum) -> (Values (Matrix Flonum) (Listof Index)))
((Matrix Flonum) Any -> (Values (Matrix Flonum) (Listof Index)))
((Matrix Flonum) Any Any -> (Values (Matrix Flonum) (Listof Index)))
((Matrix Flonum) Any Any Pivoting -> (Values (Matrix Flonum) (Listof Index)))))
(define (flonum-matrix-gauss-elim M [jordan? #f] [unitize-pivot? #f]
[pivoting 'partial])
(define-values (m n) (matrix-shape M))
(define rows (matrix->vector* M))
(let loop ([#{i : Nonnegative-Fixnum} 0]
[#{j : Nonnegative-Fixnum} 0]
[#{without-pivot : (Listof Index)} empty])
(cond
[(j . fx>= . n)
(values (vector*->matrix rows)
(reverse without-pivot))]
[(i . fx>= . m)
(values (vector*->matrix rows)
;; None of the rest of the columns can have pivots
(let loop ([#{j : Nonnegative-Fixnum} j] [without-pivot without-pivot])
(cond [(j . fx< . n) (loop (fx+ j 1) (cons j without-pivot))]
[else (reverse without-pivot)])))]
[else
(define-values (p pivot)
(case pivoting
[(partial) (find-partial-pivot rows m i j)]
[(first) (find-first-pivot rows m i j)]))
(cond
[(zero? pivot) (loop i (fx+ j 1) (cons j without-pivot))]
[else
;; Swap pivot row with current
(vector-swap! rows i p)
;; Possibly unitize the new current row
(let ([pivot (if unitize-pivot?
(begin (vector-scale! (unsafe-vector-ref rows i)
(unsafe-fl/ 1. pivot))
(unsafe-fl/ pivot pivot))
pivot)])
(elim-rows! rows m i j pivot (if jordan? 0 (fx+ i 1)))
(loop (fx+ i 1) (fx+ j 1) without-pivot))])])))
(: flonum-matrix-solve
(All (A) (case->
((Matrix Flonum) (Matrix Flonum) -> (Matrix Flonum))
((Matrix Flonum) (Matrix Flonum) (-> A) -> (U A (Matrix Flonum))))))
(define flonum-matrix-solve
(case-lambda
[(M B) (flonum-matrix-solve
M B (λ () (raise-argument-error 'flonum-matrix-solve
"matrix-invertible?" 0 M B)))]
[(M B fail)
(define m (square-matrix-size M))
(define-values (s t) (matrix-shape B))
(cond [(= m s)
(define-values (IX wps)
(parameterize ([array-strictness #f])
(flonum-matrix-gauss-elim (matrix-augment (list M B)) #t #t)))
(cond [(and (not (empty? wps)) (= (first wps) m))
(submatrix IX (::) (:: m #f))]
[else (fail)])]
[else
(error 'flonum-matrix-solve
"matrices must have the same number of rows; given ~e and ~e"
M B)])]))
(: mx Index)
(define mx 600)
(: r (Index Index -> Flonum))
(define (r i j) (random))
(: A : (Matrix Flonum))
(define A (build-matrix mx mx r))
(: sum : Integer Integer -> Flonum)
(define (sum i n)
(let loop ((j 0) (acc 0.0))
(if (>= j mx) acc
(loop (+ j 1) (+ acc (matrix-ref A i j))) )))
(: b : (Matrix Flonum))
(define b (build-matrix mx 1 sum))
(time
(let [(m (flonum-matrix-solve A b))]
(matrix-ref m 0 0)))
(time
(let [(m (matrix-solve A b))]
(matrix-ref m 0 0)))
(time
(let [(m (flonum-matrix-solve A b))]
(matrix-ref m 0 0)))
(time
(let [(m (matrix-solve A b))]
(matrix-ref m 0 0)))
(time
(let [(m (flonum-matrix-solve A b))]
(matrix-ref m 0 0)))
(time
(let [(m (matrix-solve A b))]
(matrix-ref m 0 0)))
You were absolutely right. The version below cuts the time in half.
It is mostly cut and paste from the existing functions, removing the
non-Flonum cases.
/Jens Axel
#lang typed/racket
(require math/matrix
math/array
math/private/matrix/utils
math/private/vector/vector-mutate
math/private/unsafe
(only-in racket/unsafe/ops unsafe-fl/)
racket/fixnum
racket/flonum
racket/list)
;; (the only change in flonum-matrix-gauss-elim: the elimination step
;; now calls the specialized flonum-elim-rows! defined below)
(flonum-elim-rows! rows m i j pivot (if jordan? 0 (fx+ i 1)))
(loop (fx+ i 1) (fx+ j 1) without-pivot))])])))
(: flonum-elim-rows!
((Vectorof (Vectorof Flonum)) Index Index Index Flonum
Nonnegative-Fixnum -> Void))
(define (flonum-elim-rows! rows m i j pivot start)
(define row_i (unsafe-vector-ref rows i))
(let loop ([#{l : Nonnegative-Fixnum} start])
(when (l . fx< . m)
(unless (l . fx= . i)
(define row_l (unsafe-vector-ref rows l))
(define x_lj (unsafe-vector-ref row_l j))
(unless (= x_lj 0)
(flonum-vector-scaled-add! row_l row_i (fl* -1. (fl/ x_lj pivot)) j)
;; Make sure the element below the pivot is zero
(unsafe-vector-set! row_l j (- x_lj x_lj))))
(loop (fx+ l 1)))))
(: flonum-matrix-solve
(All (A) (case->
((Matrix Flonum) (Matrix Flonum) -> (Matrix Flonum))
((Matrix Flonum) (Matrix Flonum) (-> A) -> (U A (Matrix Flonum))))))
(define flonum-matrix-solve
(case-lambda
[(M B) (flonum-matrix-solve
M B (λ () (raise-argument-error 'flonum-matrix-solve
"matrix-invertible?" 0 M B)))]
[(M B fail)
(define m (square-matrix-size M))
(define-values (s t) (matrix-shape B))
(cond [(= m s)
(define-values (IX wps)
(parameterize ([array-strictness #f])
(flonum-matrix-gauss-elim (matrix-augment (list M B)) #t #t)))
(cond [(and (not (empty? wps)) (= (first wps) m))
(submatrix IX (::) (:: m #f))]
[else (fail)])]
[else
(error 'flonum-matrix-solve
"matrices must have the same number of rows; given ~e and ~e"
M B)])]))
(define-syntax-rule (flonum-vector-generic-scaled-add! vs0-expr vs1-expr v-expr start-expr + *)
(let* ([vs0 vs0-expr]
[vs1 vs1-expr]
[v v-expr]
[n (fxmin (vector-length vs0) (vector-length vs1))])
(let loop ([#{i : Nonnegative-Fixnum} (fxmin start-expr n)])
(if (i . fx< . n)
(begin (unsafe-vector-set! vs0 i (+ (unsafe-vector-ref vs0 i)
(* (unsafe-vector-ref vs1 i) v)))
(loop (fx+ i 1)))
(void)))))
(: flonum-vector-scaled-add!
(case-> ((Vectorof Flonum) (Vectorof Flonum) Flonum -> Void)
((Vectorof Flonum) (Vectorof Flonum) Flonum Index -> Void)))
(define (flonum-vector-scaled-add! vs0 vs1 s [start 0])
(flonum-vector-generic-scaled-add! vs0 vs1 s start + *))
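For reference, a small usage example of the scaled-add helper (the vectors are made up):

```racket
(define v0 : (Vectorof Flonum) (vector 1.0 2.0 3.0))
(define v1 : (Vectorof Flonum) (vector 10.0 10.0 10.0))
;; destructively add 0.5 * v1 to v0, starting at index 1
(flonum-vector-scaled-add! v0 v1 0.5 1)
v0  ; => '#(1.0 7.0 8.0)
```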
/Jens Axel
The question is now how to automate this sort of thing.
/Jens Axel
We need a predicate like
(: flonum-matrix? (All (A) (-> (Matrix A) Boolean : (Matrix Flonum))))
Then `matrix-solve` could dispatch to `flmatrix-solve` and still be
well-typed. We could/should do something similar for every operation for
which checking flonum-ness is cheap compared to computing the result,
which at least includes everything O(n^3).
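With such a predicate, the dispatch could look something like this sketch; `flonum-matrix?` and `flmatrix-solve` are the proposed names, not existing APIs:

```racket
(: matrix-solve* (All (A) ((Matrix A) (Matrix A) -> (Matrix Any))))
(define (matrix-solve* M B)
  (if (and (flonum-matrix? M) (flonum-matrix? B))
      ;; occurrence typing: in this branch M and B
      ;; are known to have type (Matrix Flonum)
      (flmatrix-solve M B)
      (matrix-solve M B)))
```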
One thing we should really do is get your LAPACK FFI into the math
library and have `flmatrix-solve` use it, but fail over to Racket code
on systems that don't have LAPACK. If I remember right, it would have to
transpose the data because LAPACK is column-major.
Some thoughts, in no particular order:
1. Because of transposition and FFI overhead, there's a matrix size
threshold under which we ideally should use the code below, even on
systems with LAPACK installed.
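Such a threshold could be a simple size check in the dispatch; a sketch with made-up names (`lapack-available?`, `lapack-solve`) and a made-up cutoff:

```racket
(define lapack-threshold 64)  ; hypothetical cutoff, to be tuned by benchmarking
(define (flmatrix-solve M B)
  (if (and (lapack-available?)
           (>= (square-matrix-size M) lapack-threshold))
      (lapack-solve M B)           ; FFI call, pays transposition + call overhead
      (flonum-matrix-solve M B)))  ; pure-Racket solver
```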
2. Because of small differences in how it finds pivots, LAPACK's
solver can return slightly different results. Should we worry about that
at all?
3. A design decision: if a matrix contains just one flonum, should we
convert it to (Matrix Flonum) and solve it quickly with
`flmatrix-solve`, or use the current `matrix-solve` to preserve some of
its exactness?
I lean toward regarding a matrix with one flonum as a flonum matrix.
It's definitely easier to write library code for, and would make it
easier for users to predict when a result entry will be exact.
Currently, we have this somewhat confusing situation, in which how
pivots are chosen determines which result entries are exact:
> (matrix-row-echelon (matrix ([1 2 3] [4.0 5 4])) #t #t 'first)
(mutable-array #[#[1 0 -2.333333333333333]
#[-0.0 1.0 2.6666666666666665]])
> (matrix-row-echelon (matrix ([1 2 3] [4.0 5 4])) #t #t 'partial)
(mutable-array #[#[1.0 0.0 -2.333333333333333]
#[0 1.0 2.6666666666666665]])
I doubt this has caused problems for anyone, but it bothers me a little.
Neil ⊥
> We need a predicate like
>
> (: flonum-matrix? (All (A) (-> (Matrix A) Boolean : (Matrix Flonum))))
I think in our world of types we could even have
(: triangular-matrix? (All (A) (-> (Matrix A) Boolean : (TriangularMatrix A))))
and such, and then dispatch to even more specialized solvers. It's kind
of like a generalization of the number hierarchy. Just a thought.
See my post in this thread 3 days ago. It is attached.
The code works (to my knowledge), but improvements are
certainly possible. For example, it ought to be reasonably
straightforward to support more than Flonum matrices.
That said, I hope someone takes the code and turns it into
something more.
I have tested it on OS X; on other systems you might need
to add the names/paths of the libraries in question.
> > Some thoughts, in no particular order:
> >
> > 1. Because of transposition and FFI overhead, there's a matrix size
> > threshold under which we ideally should use the code below, even on
> > systems with LAPACK installed.
>
> Transposition can be avoided in many practically relevant situations
> (e.g. symmetric matrices), so such decisions should be taken on a
> case-by-case basis.
The problem is that (Matrix Flonum) and LAPACK have different
representations for matrices. In order to take a (Matrix Flonum)
and turn it into a LAPACK one, the entries in the underlying vector
must be transposed.
If we (I) had realized this sooner, the representation in (Matrix Flonum)
would have been made the same.
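The conversion is a straight copy with transposed indexing; a sketch using ffi/vector (the function name is made up):

```racket
(require math/matrix ffi/vector)

;; copy a row-major (Matrix Flonum) into the column-major
;; f64vector layout that LAPACK expects
(define (matrix->col-major-f64vector M)
  (define-values (m n) (matrix-shape M))
  (define v (make-f64vector (* m n)))
  (for* ([j (in-range n)] [i (in-range m)])
    (f64vector-set! v (+ i (* j m)) (matrix-ref M i j)))
  v)
```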
> As for the FFI overhead, even after reading the introduction to the FFI
> documentation I have no clear idea of how important it is. It seems
> (but I am not at all certain) that there are different vector-like
> data structures, some of which are optimized for access from Racket and
> others for access via C pointers. If that's true, it may be of interest
> to have a constructor for arrays and matrices that live in "C-optimized"
> space. Or even in "Fortran-optimized" space, using column-major storage.
I think it is negligible for all but very small matrices.
> > 2. Because of small differences in how it finds pivots, LAPACK's
> > solver can return slightly different results. Should we worry about that
> > at all?
>
> I'd say documenting the issue is sufficient.
Agree.
> > 3. A design decision: if a matrix contains just one flonum, should we
> > convert it to (Matrix Flonum) and solve it quickly with
> > `flmatrix-solve`, or use the current `matrix-solve` to preserve some of
> > its exactness?
> >
> > I lean toward regarding a matrix with one flonum as a flonum matrix.
> > It's definitely easier to write library code for, and would make it
> > easier for users to predict when a result entry will be exact.
>
> I agree. I'd like to see a clear use case for any other approach.
> Predictability is important.
Following the inexactness-is-contagious principle, I agree.
--
Jens Axel Søgaard
I think it was a mistake to represent matrices simply as naked arrays.
If the representation were
(struct matrix (representation-type representation properties))
then one could take advantage of the properties of the matrix.
E.g. for a symmetric matrix it is enough to store the upper triangular
part (the same goes for a triangular matrix).
This would also allow mixing computations between matrices represented
as two-dimensional arrays and LAPACK ones.
Problems with a dispatch based on inspecting the matrix values:
* there are too many interesting properties to check
  (symmetric, Hermitian, upper/lower-triangular, sparse)
* the time saved is less than the time used to check
  (consider the case of adding two symmetric matrices)
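The upper-triangular storage mentioned above can be sketched as packed row-major storage: an n×n symmetric matrix needs only n(n+1)/2 entries, with a helper like the one below (made up for illustration) mapping (i, j) to the packed index.

```racket
;; row i (0-based) starts at offset i*n - i*(i-1)/2 in the packed vector;
;; for a symmetric matrix, (i, j) and (j, i) share the same entry
(define (packed-ref v n i j)
  (let-values ([(i j) (if (<= i j) (values i j) (values j i))])
    (vector-ref v (+ (- (* i n) (quotient (* i (- i 1)) 2))
                     (- j i)))))
```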