Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

changing operator associativity in matrix multiplication

25 views

Skip to first unread message

Edd

unread,

Sep 27, 2007, 9:36:37 PM9/27/07

Hi folks,

I'm putting together a little fixed-size matrix and vector library for
some game/simulation software I'm going to be working on. I've got one
in reserve, but I thought I'd experiment with a different approach for
a giggle.

Essentially, the new library will revolve around concepts -- anything
that looks like a vector can be used as one. Same for matrices.
Roughly speaking, a type M models the concept of a Matrix if each
object m of that type supports access of elements using the syntax
m(i, j). There are a bunch of class-scope typedefs and constants
required of a model, too, but they aren't really relevant here, except
to say that they are of course known at compile time (which helps with
TMP).

Now, if P and Q are each of a type that models the Matrix concept, and
v is of a type that models the Vector concept, then the usual C++
precedence and associativity rules make the following calculation less
efficient than it needs to be:

P * Q * v

as it is equivalent to (P * Q) * v, rather than P * (Q * v), which
would involve fewer arithmetic operations. So what I've done is to
have the operation P * Q create an object of type
delayed_matrix_multiply<typeof(P), typeof(Q)>, which is also a model
of the Matrix concept, that simply references P and Q and does not
perform any multiplication at the point of construction.

If a delayed_matrix_multiply<> is multiplied with a vector, the code
knows to take the referenced "right hand side" of the
delayed_matrix_multiply<> (Q in the example) and multiply with the
vector before multiplying in the referenced "left hand side" (P in the
example).

This seems to be working nicely, including cases where there are more
than two matrices involved (though I confess I haven't benchmarked any
of this, yet). There are some extensions to be made, such as
"optimising the associativity" in the general case of arbitrary
chained matrix multiplication (see
http://en.wikipedia.org/wiki/Chain_matrix_multiplication
for example). But with a little more effort I think I can obtain the
optimal sequence of multiplications at compile-time with a generous
sprinkling of TMP. However, a problem arises here. Consider:

// P, Q and R are each of a type that models the Matrix concept.
// matrix<> does too.
matrix<double, 4, 4> S(P * Q * R); // note: no vector involved

This calculation will generate a
delayed_matrix_multiply<delayed_matrix_multiply<M1, M2>, M3>, which is
then used to construct S. This constructor simply fills its internal
array by accessing the elements of (P * Q * R) via the (i, j) syntax.
But here, identical calculations are performed multiple times[1], as
each element access essentially involves performing a large number of
inner products of rows with columns. Not good.

I'm wondering if anyone can see a way of working around this
inefficiency? Or should I just go back to doing things the easy way
and use brackets to manually affect associativity?

FWIW, I have looked at Blitz++ and boost::ublas and I *think* I'm
correct in saying that neither attempts the same kind "associativity
munging".

I hope that made sense :)

Thanks,

Edd

[1] consider:

[x y] * [a b] = [xa+yc xb+yd]
[z w] [c d] [za+wc zb+wd]

Note how each element of either operand is accessed twice. If this
access involves a calculation (such as an row/column inner product),
the multiplication operation is far more costly than it needs to be.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Kai-Uwe Bux

unread,

Oct 1, 2007, 11:41:32 PM10/1/07

Edd wrote:

Absolutely. Note that not only efficiency is concerned. Also numerical
accuracy can depend on the grouping. You don't want these issues to be
decided by the compiler.

> FWIW, I have looked at Blitz++ and boost::ublas and I *think* I'm
> correct in saying that neither attempts the same kind "associativity
> munging".
>
> I hope that made sense :)

Well, it nevertheless is an interesting exercise in template programming. So
here is some code that does ruthless compile time term-regrouping to
minimize the number of multiplications.

#include <cstddef>
#include <iostream>
#include <string>
#include <cassert>

template < bool, typename A, typename B >
struct if_then_else;

template < typename A, typename B >
struct if_then_else<true,A,B> {

static bool const value = true;
typedef A type;

};

template < typename A, typename B >
struct if_then_else<false,A,B> {

static bool const value = false;
typedef B type;

};

template < std::size_t ROWS, std::size_t COLS >
class Matrix {

std::string name;

public:

static std::size_t const rows = ROWS;
static std::size_t const cols = COLS;
static std::size_t const cost = 0;
static std::size_t const size = 1;
static std::size_t const depth = 1;

std::size_t expense ( void ) const {
return ( cost );
}

Matrix ( std::string par )
: name ( par )
{}

friend
std::ostream & operator<< ( std::ostream & ostr,
Matrix const & m ) {
return ( ostr << m.name );
}

};

template < typename ExprA, typename ExprB >
struct Product {

typedef ExprA left;
typedef ExprB right;

left expr_a;
right expr_b;

static std::size_t const rows = left::rows;
static std::size_t const cols = right::cols;

static std::size_t const cost =
left::cost +
rows * cols * left::cols +
right::cost;

static std::size_t const size =
left::size + right::size;

static std::size_t const depth
= left::size > right::size ?
left::depth + 1 : right::depth + 1;

Product ( left const & a,
right const & b )
: expr_a ( a )
, expr_b ( b )
{}

std::size_t expense ( void ) const {
return ( cost );
}

friend
std::ostream & operator<< ( std::ostream & ostr,
Product const & m ) {
ostr << "( " << m.expr_a << " * " << m.expr_b;
return ( ostr << " @cost " << Product::cost << " )" );
}

};

template < typename ExprA, typename ExprB >
Product< ExprA, ExprB >
operator* ( ExprA const & lhs,
ExprB const & rhs ) {
return ( Product< ExprA, ExprB >( lhs, rhs ) );
}

template < typename Expr >
struct left_grouping;

template < typename Expr, std::size_t N >
struct right_N;

template < std::size_t K, std::size_t L >
struct left_grouping< Matrix<K,L> > {

typedef Matrix<K,L> type;

static
type convert ( type const & m ) {
return m;
}

};

template < typename ExprA, std::size_t L, std::size_t M >
struct left_grouping< Product< ExprA, Matrix<L,M> > > {

typedef typename left_grouping< ExprA >::type left;
typedef Matrix<L,M> right;
typedef Product< left, right > type;

typedef Product< ExprA, Matrix<L,M> > argument_type;
typedef type result_type;

static
result_type convert ( argument_type const & m ) {
left expr_a ( left_grouping< ExprA >::convert( m.expr_a ) );
right expr_b ( m.expr_b );
return ( result_type( expr_a, expr_b ) );
}

};

template < typename ExprA, typename ExprB >
struct left_grouping< Product< ExprA, ExprB > > {

typedef left_grouping< ExprA > GroupA;
typedef left_grouping< ExprB > GroupB;
typedef typename GroupA::type a;
typedef typename GroupB::left b;

typedef left_grouping< Product<a,b> > GroupLeft;
typedef typename GroupLeft::type left;
typedef typename GroupB::right right;
typedef Product< left, right > type;

typedef Product< ExprA, ExprB > argument_type;
typedef type result_type;

static
result_type convert ( argument_type const & m ) {
a expr_aa ( GroupA::convert( m.expr_a ) );
b expr_ab ( GroupB::convert( m.expr_b ).expr_a );
left expr_a
( GroupLeft::convert( Product< a, b >( expr_aa, expr_ab ) ) );
right expr_b
( GroupB::convert( m.expr_b ).expr_b );

return ( result_type( expr_a, expr_b ) );
}

};

template < typename Expr >
struct right_N< Expr, 1 > {

typedef left_grouping< Expr > LeftGroup;
typedef typename LeftGroup::type type;

static
type convert ( Expr const & m ) {
return ( LeftGroup::convert( m ) );
}

};

template < typename Expr >
struct right_N< Expr, 0 >;

template < typename Expr, std::size_t N >
struct right_N {

typedef right_N< Expr, N-1 > Previous;
typedef typename Previous::type::left::left left;
typedef typename Previous::type::left::right middle;
typedef Product< middle, typename Previous::type::right > raw_right;
typedef typename left_grouping< raw_right >::type right;
typedef Product< left, right > type;

typedef Expr argument_type;
typedef type result_type;

static
result_type convert ( argument_type const & m ) {
typename Previous::type prev = Previous::convert( m );
left expr_a = prev.expr_a.expr_a;
middle mid = prev.expr_a.expr_b;
raw_right r ( mid, prev.expr_b );
right expr_b = left_grouping< raw_right >::convert( r );
return ( result_type( expr_a, expr_b ) );
}

};

template < typename From, typename To >
struct convert {

typedef right_N< From, To::right::size > TopSplit;
typedef typename TopSplit::type::left Left;
typedef typename TopSplit::type::right Right;
typedef typename To::left left;
typedef typename To::right right;

static
To eval ( From const & expr ) {
typename TopSplit::type dummy = TopSplit::convert( expr );
left expr_a = convert< Left, left >::eval( dummy.expr_a );
right expr_b = convert< Right, right >::eval( dummy.expr_b );
return ( To( expr_a, expr_b ) );
}
};

template < typename Expr >
struct convert< Expr, Expr > {

static
Expr eval ( Expr const & e ) {
return ( e );
}

};

template < typename Expr >
struct best_grouping;

template < typename Expr, std::size_t N >
struct try_grouping {

typedef right_N< Expr, N > Split;
typedef typename Split::type split;
typedef typename best_grouping< typename split::left >::type left_try;
typedef typename best_grouping< typename split::right >::type right_try;
typedef Product< left_try, right_try > type_try;

typedef typename try_grouping< Expr, N-1 >::type last_try;

typedef typename if_then_else<
last_try::cost < type_try::cost,
last_try,
type_try >::type type;

};

template < typename Expr >
struct try_grouping< Expr, 1 > {
typedef typename right_N< Expr, 1 >::type split;
typedef typename best_grouping< typename split::left >::type left;
typedef typename best_grouping< typename split::right >::type right;
typedef Product< left, right > type;
};

template < typename Expr >
struct try_grouping< Expr, 0 > {
typedef Expr type;
};

template < typename Expr >
struct best_grouping {

typedef typename try_grouping< Expr, Expr::size-1 >::type type;

};

template < typename Expr >
typename best_grouping< Expr >::type evaluate ( Expr const & e ) {
return
convert< Expr, typename best_grouping< Expr >::type >::eval( e );
}

template < std::size_t K, std::size_t L >
std::ostream & evaluate ( std::ostream & ostr,
Matrix<K,L> const & m ) {
return ( ostr << m );
}

#define SHOW(expr) \
std::cout << #expr " --> " << expr << '\n';

int main ( void ) {
Matrix<20,20> a ( "a" );
Matrix<20,1> b ( "b" );
Matrix<1,20> c ( "c" );
SHOW( evaluate( a * a * b ) );
SHOW( evaluate( a * a * a * b ) );
SHOW( evaluate( a * a * a * a * b ) );
SHOW( evaluate( ( a * a ) * ( a * a * b ) ) );
SHOW( evaluate( c * a * b ) );
SHOW( evaluate( c * a * a * b ) );
SHOW( evaluate( c * ( a * ( a * a ) * a ) * b ) );
}

Best

Kai-Uwe Bux

Edd

unread,

Oct 2, 2007, 4:50:31 PM10/2/07

Hi! Thanks for your response.

On 2 Oct, 04:41, Kai-Uwe Bux <jkherci...@gmx.net> wrote:

> Edd wrote:
> > I'm wondering if anyone can see a way of working around this
> > inefficiency? Or should I just go back to doing things the easy way
> > and use brackets to manually affect associativity?
>
> Absolutely. Note that not only efficiency is concerned. Also numerical
> accuracy can depend on the grouping. You don't want these issues to be
> decided by the compiler.

Indeed, but that would be a concern whether I did it this way or not.
Besides, I need to get some reasonable code up-and-running to assess
whether or not any undesirable rounding effects appear.

> Well, it nevertheless is an interesting exercise in template programming. So
> here is some code that does ruthless compile time term-regrouping to
> minimize the number of multiplications.

[ 8< --- LOTS of code snipped --- ]

Thanks very much for that! I can tell you enjoyed writing it :) It's
really rather close to one of the variants that I had come up with.
However, it doesn't really solve the problem I was trying to explain,
which is that code such as the following is actually *less* efficient
than it is in a straight-forward multiplication implementation:

matrix<4, 4> M(P * Q * R * S); // P, Q, R, S are also matrices

The problem lies in where to store the intermediate results of each
matrix multiplication. The problem doesn't show up in your code
because you don't actually attempt to do the multiplication proper.
Indeed it becomes impossible as far as I can tell (I hope someone
smarter than me disagrees) to introduce storage for the intermediate
results unless you detect that you're constructing the matrix M from a
delayed_matrix_multiplication<>, and then code differently for this
particular case. This is certainly possible, but it feels ugly in a
library that is supposed to work generically with all models of a
concept, as opposed to recognising specific concrete implementations.

Thanks,

Edd

jkher...@gmx.net

unread,

Oct 3, 2007, 2:59:51 AM10/3/07

Edd wrote:

> Hi! Thanks for your response.
>
> On 2 Oct, 04:41, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
>> Edd wrote:
>> > I'm wondering if anyone can see a way of working around this
>> > inefficiency? Or should I just go back to doing things the easy way
>> > and use brackets to manually affect associativity?
>>
>> Absolutely. Note that not only efficiency is concerned. Also numerical
>> accuracy can depend on the grouping. You don't want these issues to be
>> decided by the compiler.
>
> Indeed, but that would be a concern whether I did it this way or not.
> Besides, I need to get some reasonable code up-and-running to assess
> whether or not any undesirable rounding effects appear.
>
>> Well, it nevertheless is an interesting exercise in template programming.
>> So here is some code that does ruthless compile time term-regrouping to
>> minimize the number of multiplications.
>
> [ 8< --- LOTS of code snipped --- ]
>
> Thanks very much for that! I can tell you enjoyed writing it :)

Sure did :-)

> It's really rather close to one of the variants that I had come up with.
> However, it doesn't really solve the problem I was trying to explain,
> which is that code such as the following is actually *less* efficient
> than it is in a straight-forward multiplication implementation:
>
> matrix<4, 4> M(P * Q * R * S); // P, Q, R, S are also matrices
>
> The problem lies in where to store the intermediate results of each
> matrix multiplication.

What about the stack, i.e., local variables of a function?

Add a

Matrix< rows, cols > value ( void ) const;

method to the delayed_matrix_multiplication class and a constructor

Matrix ( delayed_matrix_multiplication const & )

that uses the value() method.

> The problem doesn't show up in your code
> because you don't actually attempt to do the multiplication proper.
> Indeed it becomes impossible as far as I can tell (I hope someone
> smarter than me disagrees) to introduce storage for the intermediate
> results unless you detect that you're constructing the matrix M from a
> delayed_matrix_multiplication<>, and then code differently for this
> particular case.

A totally different idea would be to implement a cached version of
coefficient access. It would compute a matrix entry only once and
subsequent calls would retrieve the computed answer.

> This is certainly possible, but it feels ugly in a
> library that is supposed to work generically with all models of a
> concept, as opposed to recognising specific concrete implementations.

Well, you could go as far as have the compiler check for the existence of a
value() method and code uniformly for that optimization. Also, it's not at
all that ugly and the point of a library is not so much elegance on the
inside but on the outside. As long as the user has an orthogonal and
transparent interface, there is nothing wrong with the library doing
optimization for important cases internally.

Anyway, here is a modified version that illustrates the idea of value()
method and corresponding constructor.

#include <cstddef>
#include <iostream>
#include <string>
#include <cassert>

template < bool, typename A, typename B >
struct if_then_else;

template < typename A, typename B >
struct if_then_else<true,A,B> {

static bool const value = true;
typedef A type;

};

template < typename A, typename B >
struct if_then_else<false,A,B> {

static bool const value = false;
typedef B type;

};

template < std::size_t ROWS, std::size_t COLS >

class Matrix;

template < typename ExprA, typename ExprB >

struct Product;

template < typename Expr >
struct best_grouping;

template < typename From, typename To >
struct convert;

template < typename Expr >
typename best_grouping< Expr >::type

regroup ( Expr const & e ) {

return
convert< Expr, typename best_grouping< Expr >::type >::eval( e );
}

template < std::size_t ROWS, std::size_t COLS >
class Matrix {

std::string name;

public:

static std::size_t const rows = ROWS;
static std::size_t const cols = COLS;
static std::size_t const cost = 0;
static std::size_t const size = 1;
static std::size_t const depth = 1;

Matrix ( std::string par )
: name ( par )
{}

Matrix ( char const * str )
: name ( str )
{}

template < typename Expr >
Matrix ( Expr const & expr )
: name ()
{
regroup( expr ).value().get_name().swap( name );
}

std::string get_name ( void ) const {
return ( name );
}

Matrix value ( void ) const {
return ( *this );
}

friend
std::ostream & operator<< ( std::ostream & ostr,
Matrix const & m ) {

ostr << "[ "
<< rows
<<" x "
<< cols
<< " : "
<< m.name
<< " ]";
return ( ostr );
}

};

template < std::size_t K, std::size_t L, std::size_t M >
Matrix<K,M> multiply ( Matrix<K,L> const & lhs, Matrix<L,M> const & rhs ) {
return ( Matrix<K,M>( std::string("(") + lhs.get_name() + rhs.get_name()
+ ")" ) );
}

template < typename ExprA, typename ExprB >

struct Product {

typedef ExprA left;
typedef ExprB right;

left expr_a;
right expr_b;

static std::size_t const rows = left::rows;
static std::size_t const cols = right::cols;

static std::size_t const cost =
left::cost +
rows * cols * left::cols +
right::cost;

static std::size_t const size =
left::size + right::size;

static std::size_t const depth
= left::size > right::size ?
left::depth + 1 : right::depth + 1;

Product ( left const & a, right const & b )
: expr_a ( a )
, expr_b ( b )
{}

Matrix< rows, cols > value ( void ) const {
return ( multiply( expr_a.value(), expr_b.value() ) );
}

friend
std::ostream & operator<< ( std::ostream & ostr,
Product const & m ) {
ostr << "( " << m.expr_a << " * " << m.expr_b;
return ( ostr << " @cost " << Product::cost << " )" );
}

};

template < typename ExprA, typename ExprB >
Product< ExprA, ExprB >
operator* ( ExprA const & lhs,
ExprB const & rhs ) {
return ( Product< ExprA, ExprB >( lhs, rhs ) );
}

template < typename Expr >
struct left_grouping;

template < typename Expr, std::size_t N >

struct group_N_right;

typedef Matrix<K,L> type;

};

template < typename Expr >
struct group_N_right< Expr, 1 > {

typedef left_grouping< Expr > LeftGroup;
typedef typename LeftGroup::type type;

static
type convert ( Expr const & m ) {
return ( LeftGroup::convert( m ) );
}

};

template < typename Expr >
struct group_N_right< Expr, 0 >;

template < typename Expr, std::size_t N >

struct group_N_right {

typedef group_N_right< Expr, N-1 > Previous;

typedef typename Previous::type::left::left left;
typedef typename Previous::type::left::right middle;
typedef Product< middle, typename Previous::type::right > raw_right;
typedef typename left_grouping< raw_right >::type right;
typedef Product< left, right > type;

typedef Expr argument_type;
typedef type result_type;

};

template < typename From, typename To >
struct convert {

typedef group_N_right< From, To::right::size > TopSplit;

typedef typename TopSplit::type::left Left;
typedef typename TopSplit::type::right Right;
typedef typename To::left left;
typedef typename To::right right;

template < typename Expr >
struct convert< Expr, Expr > {

static
Expr eval ( Expr const & e ) {
return ( e );
}

};

template < typename Expr, std::size_t N >
struct try_grouping {

typedef group_N_right< Expr, N > Split;

typedef typename Split::type split;
typedef typename best_grouping< typename split::left >::type left_try;
typedef typename best_grouping< typename split::right >::type right_try;
typedef Product< left_try, right_try > type_try;

typedef typename try_grouping< Expr, N-1 >::type last_try;

typedef typename if_then_else<
last_try::cost < type_try::cost,
last_try,
type_try >::type type;

};

template < typename Expr >
struct try_grouping< Expr, 1 > {

typedef typename group_N_right< Expr, 1 >::type split;

typedef typename best_grouping< typename split::left >::type left;
typedef typename best_grouping< typename split::right >::type right;
typedef Product< left, right > type;
};

template < typename Expr >
struct try_grouping< Expr, 0 > {
typedef Expr type;
};

template < typename Expr >
struct best_grouping {

typedef typename try_grouping< Expr, Expr::size-1 >::type type;

};

#define SHOW(expr) \
std::cout << #expr " --> " << expr << '\n';

int main ( void ) {
Matrix<20,20> a ( "a" );
Matrix<20,1> b ( "b" );
Matrix<1,20> c ( "c" );

Matrix<40,20> d ( "d" );
Matrix<40,40> e ( "e" );
Matrix<20,40> f ( "f" );
Matrix<20,1> M ( ( f * ( e * e ) * d ) * ( a * a ) );
std::cout << M << '\n';
}

Best

Kai-Uwe Bux

Edd

unread,

Oct 4, 2007, 5:46:34 PM10/4/07

Hello again!

On Oct 3, 7:59 am, jkherci...@gmx.net wrote:

> > [...] such as the following is actually *less* efficient

> > than it is in a straight-forward multiplication implementation:
>
> > matrix<4, 4> M(P * Q * R * S); // P, Q, R, S are also matrices
>
> > The problem lies in where to store the intermediate results of each
> > matrix multiplication.

> > The problem doesn't show up in your code

> > because you don't actually attempt to do the multiplication proper.
> > Indeed it becomes impossible as far as I can tell (I hope someone
> > smarter than me disagrees) to introduce storage for the intermediate
> > results unless you detect that you're constructing the matrix M from a
> > delayed_matrix_multiplication<>, and then code differently for this
> > particular case.
>
> A totally different idea would be to implement a cached version of
> coefficient access. It would compute a matrix entry only once and
> subsequent calls would retrieve the computed answer.

Yes, I thought about that. But if fear the amount of branching that
would involve (unless there's some technique that eludes me). Of
course, this is entirely unjustified -- I haven't done benchmarks.

> > This is certainly possible, but it feels ugly in a
> > library that is supposed to work generically with all models of a
> > concept, as opposed to recognising specific concrete implementations.
>
> Well, you could go as far as have the compiler check for the existence of
a
> value() method and code uniformly for that optimization. Also, it's not at
> all that ugly and the point of a library is not so much elegance on the
> inside but on the outside. As long as the user has an orthogonal and
> transparent interface, there is nothing wrong with the library doing
> optimization for important cases internally.

Yes. I think my primary objection was that Matrix models supplied with
the library would have an "advantage" over models supplied by the
client. However, by extending the Model to include an optional feature
-- such as the one you suggest -- would be playing fair.

[ 8< -- Even MORE generous code snipped -- ]

I think that is the way forward! I will start from scratch later this
week now my thoughts have settled. Thanks for letting me bounce some
ideas off of you.

I'm not sure yet whether I'll go exactly the same route as you.

Another way around it would be to allow clients to specialise a
template class, something like:

template<typename T, unsigned Height, unsigned Width>
struct optimise_element_access
{
typedef T optimsed_model;

static const optimised_model &optimise(const T &matrix) const
{ return matrix; }
};

Perhaps that's overkill. But it may be possible that a client has a
Matrix model that can be "consolidated" more efficiently in to a type
other than the matrix<> template I provide. This approach would allow
them to take advantage of that. That said, I can't think of any such
cases off the top of my head...

Thanks again -- I've never seen so much code in a single response!
It's an interesting little problem, though :)

I'll make sure to post a link back to whatever I come up with, if
you're interested.

Edd

0 new messages