Difference of approach.

Richard Parker

Aug 8, 2012, 7:44:04 AM

to m4ri

I think I have now understood your paper, and can see how you
achieve the results you manage.

My own approach has been to put "must scale well" at the top of
my priority list, forcing me into cache where you use main memory,
thereby losing a factor of around 2 on grease level (the "k" for
the four Russians, particularly with row echelon form) and a similar
factor for Strassen.

My impression is that we have both operated close to a local optimum, and
that neither approach gives much that is useful to improve the other if
applied in isolation. Exceptions include me using Strassen at a slightly
earlier stage, and you chopping up the rows in the base-case multiply
which, I still think, extends the range of your base case a bit.

Both are pretty marginal.

Your system beats mine hands down on a single core, whereas I suspect
mine will scale better and overtake once a package contains a given
number of cores. My guess is that the crossover is around 10-20 cores on
a package.

Whether Intel will ever make such a chip (e.g. a 24-core processor)
is the key question that may decide which approach is the long-term fastest.

In some ways I suspect they will not, since I am one of the few people
trying to learn how to program such a beast, so the market will not
be large enough for them.

In practice, I suspect it is the GPU that is the key thing for the
future, and we will probably both have to start again! I have tried to
modularize with this in mind, so that the brick can live in a GPU, but
that is as far as I have got with this so far.

I am not a great one for publishing stuff, but I am entirely happy to divulge
everything I know if you wish for further information.

Maybe a phone call or a beer somewhere might be useful for us both.

Richard Parker

Martin Albrecht

Aug 8, 2012, 8:21:10 AM

to m4ri-...@googlegroups.com

On Wednesday 08 Aug 2012, Richard Parker wrote:
> I think I have now understood your paper, and can see how you
> achieve the results you manage.
>
> My own approach has been to put "must scale well" at the top of
> my priority list, forcing me into cache where you use main memory,
> thereby losing a factor of around 2 on grease level (the "k" for
> the four Russians, particularly with row echelon form) and a similar
> factor for Strassen.
>
> My impression is that we have both operated close to a local optimum, and
> that neither approach gives much that is useful to improve the other if
> applied in isolation. Exceptions include me using Strassen at a slightly
> earlier stage, and you chopping up the rows in the base-case multiply
> which, I still think, extends the range of your base case a bit.
>
> Both are pretty marginal.
>
> Your system beats mine hands down on a single core, whereas I suspect
> mine will scale better and overtake once a package contains a given
> number of cores. My guess is that the crossover is around 10-20 cores on
> a package.

Well, you have definitely convinced me to implement your approach (at some point),
partly because I have so little experience with parallel code and want to learn
more about it. My impression is that things are going more and more multi-CPU/core,
so your approach might actually be more robust in the future. Looking forward
to playing with it in more detail.

> Whether Intel will ever make such a chip (e.g. a 24-core processor)
> is the key question that may decide which approach is the long-term
> fastest.
>
> In some ways I suspect they will not, since I am one of the few people
> trying to learn how to program such a beast, so the market will not
> be large enough for them.
>
> In practice, I suspect it is the GPU that is the key thing for the
> future, and we will probably both have to start again! I have tried to
> modularize with this in mind, so that the brick can live in a GPU, but
> that is as far as I have got with this so far.
>
> I am not a great one for publishing stuff, but I am entirely happy to
> divulge everything I know if you wish for further information.
>
> Maybe a phone call or a beer somewhere might be useful for us both.

I tend to prefer beer over phone calls :)

Cheers,
Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de

Bill Hart

Aug 8, 2012, 9:04:01 AM

to m4ri-...@googlegroups.com

On 8 August 2012 13:21, Martin Albrecht <martinr...@googlemail.com> wrote:
> I tend to prefer beer over phone calls :)

Phones were those quaint 20th century devices which allowed two people
to contact each other without the internet with only the government
listening in, right?

Bill.

Tom Boothby

Aug 9, 2012, 5:28:47 PM

to m4ri-...@googlegroups.com

Correct. In the 20th century, telephony was not routed through the
internet, and only the government listened in.

>
> Bill.
