fair

Martin Albrecht

Dec 13, 2011, 1:12:29 PM
to linbox-use
Hi,

I am using the attached code to benchmark LinBox's performance over GF(2^e)
for e small. I know there is no specialised implementation for it in
LinBox/FFLAS; that's the point I want to demonstrate in my benchmark.

But I want to make sure I'm doing it fairly. Hence my question: will the
attached code (based on code by Clément) choose the fastest code path?

Cheers,
Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de

test-fgemm-gfq.C
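
For readers without the attachment, a rough sketch of what such an fgemm
benchmark over GF(2^e) might look like; the field type, headers, random
initialisation and timing below are guesses, and the actual test-fgemm-gfq.C
may well differ:

----
// Sketch only -- not the attached test-fgemm-gfq.C.
// Assumes a Givaro extension field (Givaro::GFqDom) and FFLAS::fgemm.
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <givaro/gfq.h>
#include "fflas-ffpack/fflas/fflas.h"

int main(int argc, char** argv) {
    typedef Givaro::GFqDom<int64_t> Field;
    const size_t n        = (argc > 1) ? atoi(argv[1]) : 1000;  // dimension
    const unsigned long e = (argc > 2) ? atoi(argv[2]) : 2;     // GF(2^e)

    Field F(2, e);
    Field::RandIter G(F);

    Field::Element *A = new Field::Element[n * n];
    Field::Element *B = new Field::Element[n * n];
    Field::Element *C = new Field::Element[n * n];
    for (size_t i = 0; i < n * n; ++i) {
        G.random(A[i]);
        G.random(B[i]);
        F.assign(C[i], F.zero);
    }

    std::clock_t t0 = std::clock();
    // C <- 1*A*B + 0*C over GF(2^e)
    FFLAS::fgemm(F, FFLAS::FflasNoTrans, FFLAS::FflasNoTrans,
                 n, n, n, F.one, A, n, B, n, F.zero, C, n);
    double secs = double(std::clock() - t0) / CLOCKS_PER_SEC;

    std::cout << "GF(2^" << e << "); n = " << n << ": " << secs << "s" << std::endl;

    delete[] A; delete[] B; delete[] C;
    return 0;
}
----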

Jean-Guillaume Dumas

Dec 14, 2011, 7:24:16 AM
to linbo...@googlegroups.com, Martin Albrecht
Martin Albrecht wrote, on 13/12/2011 19:12:
> Hi,
>
> I am using the attached code to benchmark LinBox's performance over GF(2^e)
> for e small. I know there is no specialised implementation for it in
> LinBox/FFLAS; that's the point I want to demonstrate in my benchmark.
>
> But I want to make sure I'm doing it fairly. Hence my question: will the
> attached code (based on code by Clément) choose the fastest code path?
>
> Cheers,
> Martin
>
Dear Martin,
Yes, this is our fastest!
It is roughly a small constant factor (2, 3, 4, ...) faster than peak
floating-point performance.
Thanks for the benchmarking; please let us know the results ...
Best,

--
Jean-Guillaume Dumas.
____________________________________________________________________
Jean-Guill...@imag.fr Tél.: +33 476 514 866
Université Joseph Fourier, Grenoble I. Fax.: +33 476 631 263
Laboratoire Jean Kuntzmann, Mathématiques Appliquées et Informatique
51, avenue des Mathématiques. LJK/IMAG - BP53. 38041 Grenoble FRANCE
http://ljk.imag.fr/membres/Jean-Guillaume.Dumas
____________________________________________________________________

Martin Albrecht

Dec 14, 2011, 8:05:18 AM
to linbo...@googlegroups.com
On Wednesday 14 December 2011, Jean-Guillaume Dumas wrote:
> Martin Albrecht wrote, on 13/12/2011 19:12:
> > Hi,
> >
> > I am using the attached code to benchmark LinBox's performance over
> > GF(2^e) for e small. I know there is no specialised implementation for
> > it in LinBox/FFLAS; that's the point I want to demonstrate in my
> > benchmark.
> >
> > But I want to make sure I'm doing it fairly. Hence my question: will the
> > attached code (based on code by Clément) choose the fastest code path?

> >
> > Cheers,
> > Martin
>
> Dear Martin,
> Yes, this is our fastest!
> It is roughly a small constant factor (2, 3, 4, ...) faster than peak
> floating-point performance.
> Thanks for the benchmarking; please let us know the results ...
> Best,

Hi,

Currently it takes about 10 seconds on my computer to multiply two 4000 x 4000
matrices over GF(2^2), but I am apparently linking against the wrong BLAS. In
Sage, which uses a custom-built ATLAS + LinBox, GF(3) takes about 6 seconds. For
comparison, my dedicated implementation needs 0.24 seconds.

Jean-Guillaume Dumas

Dec 14, 2011, 8:43:23 AM
to linbo...@googlegroups.com, Martin Albrecht
Martin Albrecht wrote, on 14/12/2011 14:05:
> On Wednesday 14 December 2011, Jean-Guillaume Dumas wrote:
>
>> Martin Albrecht wrote, on 13/12/2011 19:12:
>>
>>> Hi,
>>>
>>> I am using the attached code to benchmark LinBox's performance over
>>> GF(2^e) for e small. I know there is no specialised implementation for
>>> it in LinBox/FFLAS; that's the point I want to demonstrate in my
>>> benchmark.
>>>
>>> But I want to make sure I'm doing it fairly. Hence my question: will the
>>> attached code (based on code by Clément) choose the fastest code path?

>>>
>>> Cheers,
>>> Martin
>>>
>> Dear Martin,
>> Yes, this is our fastest!
>> It is roughly a small constant factor (2, 3, 4, ...) faster than peak
>> floating-point performance.
>> Thanks for the benchmarking; please let us know the results ...
>> Best,
>>
> Hi,
>
> Currently it takes about 10 seconds on my computer to multiply two 4000 x 4000
> matrices over GF(2^2), but I am apparently linking against the wrong BLAS.
With Goto on my quad 2.8GHz, I have
GF(2^2); n = 4000: 5.64035s

> In Sage, which uses a custom-built ATLAS + LinBox, GF(3) takes about 6 seconds.
With the same BLAS and machine I have
GF(3); n = 4000: 5.35234s
so this seems consistent.

> For
> comparison my dedicated implementation needs 0.24 seconds.
>
Very good, that's great!

Martin Albrecht

Dec 14, 2011, 9:26:04 AM
to linbo...@googlegroups.com
Hi,

> > Hi,
> >
> > Currently it takes about 10 seconds on my computer to multiply two 4000 x
> > 4000 matrices over GF(2^2), but I am apparently linking against the
> > wrong BLAS.
>
> With Goto on my quad 2.8GHz, I have
> GF(2^2); n = 4000: 5.64035s
>
> > In Sage, which uses a custom-built ATLAS + LinBox, GF(3) takes about 6
> > seconds.
>
> With the same BLAS and machine I have
> GF(3); n = 4000: 5.35234s
> so this seems consistent.
>
> > For
> > comparison my dedicated implementation needs 0.24 seconds.

Thanks for those timings. GF(2^2) is probably a choice where LinBox is
particularly disadvantaged, because it's so small. That is, the time it takes
LinBox to perform the computation doesn't seem to increase between GF(2^2) and
GF(2^10). So, GF(2^8) still takes only 10 seconds (still using OpenBLAS on
Debian) whereas my code takes 2.5 seconds. For GF(2^10) LinBox still only takes
10 seconds, but my implementation takes 36 seconds (I didn't implement the
bitslice business yet for e > 8).

One more question: does LinBox compute echelon forms over GF(2^e)?

Jean-Guillaume Dumas

Dec 14, 2011, 9:39:31 AM
to linbo...@googlegroups.com, Martin Albrecht
Martin Albrecht wrote, on 14/12/2011 15:26:
> Hi,
>
>
>>> Hi,
>>>
>>> Currently it takes about 10 seconds on my computer to multiply two 4000 x
>>> 4000 matrices over GF(2^2), but I am apparently linking against the
>>> wrong BLAS.
>>>
>> With Goto on my quad 2.8GHz, I have
>> GF(2^2); n = 4000: 5.64035s
>>
>>
>>> In Sage, which uses a custom-built ATLAS + LinBox, GF(3) takes about 6
>>> seconds.
>>>
>> With the same BLAS and machine I have
>> GF(3); n = 4000: 5.35234s
>> so this seems consistent.
>>
>>
>>> For
>>> comparison my dedicated implementation needs 0.24 seconds.
>>>
> Thanks for those timings. GF(2^2) is probably a choice where LinBox is
> particularly disadvantaged, because it's so small. That is, the time it takes
> LinBox to perform the computation doesn't seem to increase between GF(2^2) and
> GF(2^10). So, GF(2^8) still takes only 10 seconds (still using OpenBLAS on
> Debian) whereas my code takes 2.5 seconds. For GF(2^10) LinBox still only takes
> 10 seconds, but my implementation takes 36 seconds (I didn't implement the
> bitslice business yet for e > 8).
>
Hum, wait, I am not sure about this anymore.
To compute with Kronecker substitution, we need to specialize the bound
computations for FFLAS.
We worked on it at some point, but at first glance I don't find it in the
SVN.
Clément, do you remember where we put it?

> One more question: does LinBox compute echelon forms over GF(2^e)?
>
Potentially, yes. In practice we would also need to specialize the bounds
there, which I don't think we did.
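
For readers unfamiliar with the Kronecker substitution mentioned above, a
minimal standalone sketch of the idea over GF(2)[x] (plain C++, nothing
LinBox/Givaro-specific; the 4-bit slots are arbitrary). It also shows why the
bound computations matter: the packed coefficients must stay below the slot
size, which limits how many products can be accumulated before a reduction.

----
#include <cstdint>
#include <cassert>

int main() {
    // Pack polynomial coefficients into an integer, s bits per coefficient,
    // so that one machine multiplication performs the polynomial product.
    const unsigned s = 4;                       // bits per coefficient slot
    const uint64_t mask = (1u << s) - 1;

    uint64_t a = (1u << 0*s) | (1u << 1*s);     // 1 + x
    uint64_t b = (1u << 1*s);                   // x
    uint64_t c = a * b;                         // (1 + x)*x = x + x^2

    assert(((c >> 1*s) & mask) == 1);           // coefficient of x
    assert(((c >> 2*s) & mask) == 1);           // coefficient of x^2

    // The coefficients are then reduced mod 2 and mod the defining polynomial
    // (x^2 + x + 1 for GF(2^2)) to recover the field element.  As soon as an
    // accumulated coefficient could reach 2^s, a reduction is needed -- this
    // is the kind of bound FFLAS would have to specialise.
    return 0;
}
----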

Martin Albrecht

Dec 14, 2011, 9:42:44 AM
to Jean-Guillaume Dumas, linbo...@googlegroups.com
One more question: I'm having trouble convincing myself that the computation does
what I expect. I added printing of A, B and C and get, for example:

$ ./test-fgemm-gfq 1 2
A:
3

B:
2

C:
3

But (a+1)*a = 1 in GF(2^2). Am I interpreting/doing this wrong?

test-fgemm-gfq.C

Jean-Guillaume Dumas

Dec 15, 2011, 9:17:16 AM
to Martin Albrecht, linbo...@googlegroups.com
Martin Albrecht wrote, on 14/12/2011 15:42:
> One more question: I'm having trouble convincing myself that the computation does
> what I expect. I added printing of A, B and C and get, for example:
>
> $ ./test-fgemm-gfq 1 2
> A:
> 3
>
> B:
> 2
>
> C:
> 3
>
> But (a+1)*a = 1 in GF(2^2). Am I interpreting/doing this wrong?
>
> Cheers,
> Martin
>
Well, this is because random is not yet correct.
For now it uses a random value without the shifts.
For instance, if there is a shift then 1+a should be stored in binary as, say, 1 001,
so a plain 3 is actually interpreted as 0 011, which has no meaning ...
Best,

--
Jean-Guillaume Dumas.
____________________________________________________________________
Jean-Guill...@imag.fr Tél.: +33 476 514 866
Université Joseph Fourier, Grenoble I. Fax.: +33 476 631 263
Laboratoire Jean Kuntzmann, Mathématiques Appliquées et Informatique
51, avenue des Mathématiques. LJK/IMAG - BP53. 38041 Grenoble FRANCE
http://ljk.imag.fr/membres/Jean-Guillaume.Dumas
____________________________________________________________________

Jean-Guillaume Dumas

Dec 15, 2011, 10:01:30 AM
to Martin Albrecht, linbo...@googlegroups.com
Martin Albrecht wrote, on 14/12/2011 15:42:
> One more question: I'm having trouble convincing myself that the computation does
> what I expect. I added printing of A, B and C and get, for example:
>
> $ ./test-fgemm-gfq 1 2
> A:
> 3
>
> B:
> 2
>
> C:
> 3
>
Also be careful about the difference between values and their representation in the field.

Your 'one', for instance, is not correct: you should use F.one or F.init(one,1),
and printing should also use F.write.
Then I have, for instance:

A: 2 is x
B: 3 is 1+x
C: 1 is 1, which is correctly x*(1+x) mod x^2+x+1

Best,

--
Jean-Guillaume Dumas.
____________________________________________________________________


Jean-Guill...@imag.fr Tél.: +33 476 514 866

Université Joseph Fourier, Grenoble I. Fax.: +33 476 631 263
Laboratoire Jean Kuntzmann, Mathématiques Appliquées et Informatique
51, avenue des Mathématiques. LJK/IMAG - BP53. 38041 Grenoble FRANCE
http://ljk.imag.fr/membres/Jean-Guillaume.Dumas
____________________________________________________________________
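
For reference, a minimal sketch of that initialise-through-the-field pattern
(the field type and header are assumptions, and how an integer maps to a
polynomial depends on the field's representation, as discussed above):

----
// Sketch only: create, multiply and print elements via the field interface,
// never via raw element values.
#include <iostream>
#include <givaro/gfq.h>

int main() {
    typedef Givaro::GFqDom<int64_t> Field;
    Field F(2, 2);                  // GF(2^2) = GF(2)[x]/(x^2 + x + 1)

    Field::Element a, b, c;
    F.init(a, 2);                   // same inputs as in the exchange above;
    F.init(b, 3);                   // whether 2 means x is field-dependent
    F.mul(c, a, b);                 // if a = x and b = 1+x, then c = x^2 + x = 1

    F.write(std::cout << "A: ", a) << std::endl;
    F.write(std::cout << "B: ", b) << std::endl;
    F.write(std::cout << "C: ", c) << std::endl;
    return 0;
}
----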

Jean-Guillaume Dumas

Dec 15, 2011, 11:23:07 AM
to linbo...@googlegroups.com, Martin Albrecht
OK, I worked things out.
1) First, there is one issue in fflas-ffpack: e.g. in fflas.h there are tests
like
---
if (F.areEqual(F.mOne, alpha)) alphad = -1.0;
---
which are not correct over characteristic 2 with givgfqext.h: indeed, in
characteristic 2 we have 1 = -1, but if init from givgfqext.h receives a
negative value it can segfault.
A first patch for this is to add tests for F.one beforehand, everywhere?
See the attached partial patch for fflas.
The bug becomes more important if Winograd is used, as more negative
values are needed ...
Clément, Brice, what should we do there?
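
A minimal sketch of the guard described in 1), in the spirit of the quoted
fflas.h excerpt (the attached mone_fflas.patch is the authoritative version):

----
// In characteristic 2, F.one == F.mOne, and feeding -1.0 to givgfqext.h's
// init can segfault, so test for F.one before testing for F.mOne:
if (F.areEqual(F.one, alpha))
    alphad = 1.0;
else if (F.areEqual(F.mOne, alpha))
    alphad = -1.0;
// (other values of alpha are handled as before)
----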

2) Second, I added a random generator in givgfqext.h, in the SVN:
----
template<class RandIter> Rep& random(RandIter& g, Rep& r) const {
    return init(r, static_cast<double>( (UTT)g() % _MODOUT ));
}
----

3) Third, the specializations of BaseCompute and Mantissa for FFLAS and
givgfqext.h are missing. Not sure where to put them ...
----
namespace FFLAS {
namespace Protected {
    template<> FFLAS_BASE BaseCompute (const Field& F, const size_t w) {
        return FflasDouble;
    }
    template<> unsigned long Mantissa (const Field& F, const FFLAS_BASE base) {
        return F.bits();
    }
}
}
----
See the attached code.

Then I have this behavior (my Winograd threshold is 2536):
GF(2^2); n = 2500: 3.1962s
GF(2^3); n = 2500: 3.42021s
GF(2^4); n = 2500: 7.19245s
GF(2^5); n = 2500: 20.4533s
GF(2^6); n = 2500: 38.7384s

GF(3^2); n = 2500: 3.09619s
GF(3^3); n = 2500: 4.8323s
GF(3^4); n = 2500: 18.6812s

GF(5^2); n = 2500: 3.05619s
GF(5^3); n = 2500: 10.8167s
GF(5^4); n = 2500: 68.5043s

etc.


Best regards,

mone_fflas.patch
test-fgemm-gfq.C

Martin Albrecht

Dec 16, 2011, 8:17:20 AM
to Jean-Guillaume Dumas, linbo...@googlegroups.com
Hi,

On Thursday 15 December 2011, Jean-Guillaume Dumas wrote:
> OK, I worked things out.
> 1) First, there is one issue in fflas-ffpack: e.g. in fflas.h there are tests
> like
> ---
> if (F.areEqual(F.mOne, alpha)) alphad = -1.0;
> ---
> which are not correct over characteristic 2 with givgfqext.h: indeed, in
> characteristic 2 we have 1 = -1, but if init from givgfqext.h receives a
> negative value it can segfault.
> A first patch for this is to add tests for F.one beforehand, everywhere?
> See the attached partial patch for fflas.
> The bug becomes more important if Winograd is used, as more negative
> values are needed ...

> Clément, Brice what should we do there ?

Great. Are there instructions posted somewhere, by any chance, on how to get
all the bits and pieces (i.e., SVN versions of the various libraries) and how
to install them?
