I am using the attached code to benchmarket LinBox's performance over GF(2^e)
for e small. I know there is no specialised implementation for it in
LinBox/FFLAS, that's the point that I want to demonstrate in my benchmark.
But I want to make sure I'm doing it fairly. Hence my question, will the
attached code (based on code by Clément) choose the fastest code path?
Cheers,
Martin
--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de
--
Jean-Guillaume Dumas.
____________________________________________________________________
Jean-Guill...@imag.fr T�l.: +33 476 514 866
Universit� Joseph Fourier, Grenoble I. Fax.: +33 476 631 263
Laboratoire Jean Kuntzmann, Math�matiques Appliqu�es et Informatique
51, avenue des Math�matiques. LJK/IMAG - BP53. 38041 Grenoble FRANCE
http://ljk.imag.fr/membres/Jean-Guillaume.Dumas
____________________________________________________________________
Hi,
currently it takes about 10 seconds on my computer to multiply two 4000 x 4000
matrices over GF(2^2). But I am apparently linking against the wrong BLAS. In
Sage GF(3) takes about 6 seconds which uses a custom built ATLAS + LinBox. For
comparison my dedicated implementation needs 0.24 seconds.
> > Hi,
> >
> > currently it takes about 10 seconds on my computer to multiply two 4000 x
> > 4000 matrices over GF(2^2). But I am apparently linking against the
> > wrong BLAS.
>
> With Goto on my quad 2.8GHz, I have
> GF(2^2); n = 4000: 5.64035sIn
>
> > Sage GF(3) takes about 6 seconds which uses a custom built ATLAS +
> > LinBox.
>
> With the same blas and machine I have
> GF(3); n = 4000: 5.35234s
> so this seems coherent.
>
> > For
> > comparison my dedicated implementation needs 0.24 seconds.
Thanks for those timings. GF(2^2) is probably a choice where LinBox is
particularly disadvantaged, because it's so small. That is, the time it takes
LinBox to perform the computation doesn't seem to increase between GF(2^2) and
GF(2^10). So, GF(2^8) still takes only 10 seconds (using OpenBLAS on Debian
still) whereas my code takes 2.5 seconds. For GF(2^10) LinBox still only takes
10 seconds but my implementation 36 seconds (I didn't implement the bitslice
business yet for e>8)
One more question: does linbox compute echelon forms for GF(2^e)?
$ ./test-fgemm-gfq 1 2
A:
3
B:
2
C:
3
But (a+1)*a = 1 in GF(2^2). Am I interpreting/doing this wrong?
--
Jean-Guillaume Dumas.
____________________________________________________________________
Jean-Guill...@imag.fr Tél.: +33 476 514 866
Université Joseph Fourier, Grenoble I. Fax.: +33 476 631 263
Laboratoire Jean Kuntzmann, Mathématiques Appliquées et Informatique
51, avenue des Mathématiques. LJK/IMAG - BP53. 38041 Grenoble FRANCE
http://ljk.imag.fr/membres/Jean-Guillaume.Dumas
____________________________________________________________________
your one for instance is not correct, you should use F.one or F.init(one,1),
then printing should also use F.write,
then I have for instance :
A:2 is x
B:3 is 1+x
C:1 is 1 which is correctly x*(1+x) mod x^2+x+1
Best,
--
Jean-Guillaume Dumas.
____________________________________________________________________
Jean-Guill...@imag.fr Tél.: +33 476 514 866
Université Joseph Fourier, Grenoble I. Fax.: +33 476 631 263
Laboratoire Jean Kuntzmann, Mathématiques Appliquées et Informatique
51, avenue des Mathématiques. LJK/IMAG - BP53. 38041 Grenoble FRANCE
http://ljk.imag.fr/membres/Jean-Guillaume.Dumas
____________________________________________________________________
2) Second I added a random generator in givgfqext.h, in the svn :
----
template<class RandIter> Rep& random(RandIter& g, Rep& r) const {
return init(r, static_cast<double>( (UTT)g() % _MODOUT));
}
---
3) Third the specializations of BaseCompute and Mantissa for FFLAS and
givgfqext.h are missing
Not sure where to put them ...
----
namespace FFLAS {
namespace Protected {
template<> FFLAS_BASE BaseCompute (const Field& F, const size_t w) {
return FflasDouble; }
template<> unsigned long Mantissa (const Field& F, const FFLAS_BASE
base) {
return F.bits(); }
}
}
----
see attached code.
Then I have this behavior (my winograd threshold is 2536) :
GF(2^2); n = 2500: 3.1962s
GF(2^3); n = 2500: 3.42021s
GF(2^4); n = 2500: 7.19245s
GF(2^5); n = 2500: 20.4533s
GF(2^6); n = 2500: 38.7384s
GF(3^2); n = 2500: 3.09619s
GF(3^3); n = 2500: 4.8323s
GF(3^4); n = 2500: 18.6812s
GF(5^2); n = 2500: 3.05619s
GF(5^3); n = 2500: 10.8167s
GF(5^4); n = 2500: 68.5043s
etc.
Best regards,
On Thursday 15 December 2011, Jean-Guillaume Dumas wrote:
> OK, I worked things out.
> 1) First there is one issue in fflas-ffpack :
> e.g. in fflas.h there are tests like
> ---
> F.areEqual(F.mOne, alpha) alphad= -1.0
> ---
> Which are not correct over char 2 with givgfqext.h :
> indeed in char 2, 1=-1 but if init from givgfqext.h receives a negative
> value it can segfault.
> A first patch for this is to add tests for F.one beforehand, everywhere
> ? see attached partial patch for fflas.
> The bug becomes more important if Winograd is used as more negative
> values are needed ...
> Clément, Brice what should we do there ?
Great. Are the instructions posted somewhere how to get all the bits and
pieces (i.e, SVN versions of the various libraries) and how to install them by
any chance?