slow factorization in finite field


Ralf Hemmecke

unread,
Oct 20, 2020, 8:40:34 AM10/20/20
to fricas-devel
Hello,

The attached file contains code that comes from the problem of creating
the algebraic closure of PrimeField(p) by dynamically extending with a
field with a new polynomial that does not completely factor. It basically
works, but when I tried it with the polynomials over GF(43) I noticed very
long running times.

The following maps this to FiniteField(43,84), the splitting field of
the 3 polynomials

p3 := x^3 - 29
p5 := x^5 - 29
p7 := x^7 - 29

When I factor them on my laptop I get:

Time: 8.57 (EV) + 0.00 (OT) = 8.57 sec
Time: 23.81 (EV) + 0.00 (OT) = 23.82 sec
Time: 35.38 (EV) = 35.38 sec

After analyzing where the time is spent I found that there is an
exptMod(t1, (p1 quo 2)::NNI, fprod) call in ddfact.spad
where t1 and fprod are polynomials of degree 1 and 7 (for the last case)
and (p1 quo 2)::NNI is

81343016389104495051429314429892710283748121052067002779751599748804821941
461709990823638183537929646810274525597886525946443695227097400

Clearly, that is a huge number and the coefficients of the polynomials
are (as elements of FF(43,84)) univariate polynomials of degree 83.
So it is expected to take a while.
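To see why that exponent hurts, here is a minimal Python sketch of what such an exptMod amounts to: square-and-multiply on polynomials modulo fprod. This is an illustration of the general technique, not the actual ddfact.spad code; the cost is roughly one polynomial multiplication plus remainder per bit of the exponent (about 490 bits here), and each of those operations in turn works on coefficients that are themselves degree-83 polynomials.

```python
# Toy model of exptMod: compute base^e mod (f, p) by square-and-multiply.
# Polynomials are coefficient lists over GF(p), lowest degree first.

def poly_mul(a, b, p):
    """Schoolbook multiplication of coefficient lists over GF(p)."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    return res

def poly_mod(a, f, p):
    """Remainder of a modulo f over GF(p) (p prime)."""
    a = a[:]
    inv_lead = pow(f[-1], p - 2, p)          # Fermat inverse of leading coeff
    while len(a) >= len(f):
        if a[-1]:
            q = a[-1] * inv_lead % p
            s = len(a) - len(f)
            for i, fi in enumerate(f):
                a[s + i] = (a[s + i] - q * fi) % p
        a.pop()
    while len(a) > 1 and a[-1] == 0:         # trim trailing zeros
        a.pop()
    return a

def expt_mod(base, e, f, p):
    """base^e mod (f, p): about 2 * e.bit_length() poly_mul/poly_mod calls."""
    result = [1]
    base = poly_mod(base, f, p)
    while e:
        if e & 1:
            result = poly_mod(poly_mul(result, base, p), f, p)
        base = poly_mod(poly_mul(base, base, p), f, p)
        e >>= 1
    return result
```

The point is that the total cost is the exponent's bit length times the cost of one multiplication-with-remainder modulo fprod, so any slowness in the basic polynomial arithmetic gets amplified several hundred times.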

However, I did the same computation with Magma in a fraction of a
second. Is FriCAS so bad here? :-(

I guess, we need fast polynomial multiplication here.
Actually, Marc Moreno Maza implemented it.

http://fricas.github.io/api/UnivariatePolynomialMultiplicationPackage

Shouldn't we somehow use it (at least for FiniteField of high degree)?

Ralf
fff.input

Dima Pasechnik

unread,
Oct 20, 2020, 8:59:47 AM10/20/20
to fricas...@googlegroups.com
it also takes a fraction of a second in SageMath.

Why don't you use an already available fast implementation in C or C++?
In this case, SageMath uses NTL https://shoup.net/ntl/
which is very fast, and written by an expert in computational number theory.



> I guess, we need fast polynomial multiplication here.
> Actually, Marc Moreno Maza implemented it.
>
> http://fricas.github.io/api/UnivariatePolynomialMultiplicationPackage
>
> Shouldn't we somehow use it (at least for FiniteField of high degree)?
>
> Ralf
>
> --
> You received this message because you are subscribed to the Google Groups "FriCAS - computer algebra system" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to fricas-devel...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/fricas-devel/90161cd5-23cf-ebfd-5cb7-a8c0ceff119f%40hemmecke.org.

Ralf Hemmecke

unread,
Oct 20, 2020, 9:08:59 AM10/20/20
to fricas...@googlegroups.com
> Why don't you use an already available fast implementation in C or C++?
> In this case, SageMath uses NTL https://shoup.net/ntl/
> which is very fast, and written by an expert in computational number theory.

Ah... thanks. Good suggestion!

It would be an option for finite fields in FriCAS since the internals
are hidden and could be mapped to use NTL or FLINT.

The question is, however: how can I call NTL from FriCAS?

Ralf

Dima Pasechnik

unread,
Oct 20, 2020, 9:24:45 AM10/20/20
to fricas...@googlegroups.com
One can call NTL from Lisp, as NTL is just a C++ library. In fact
people have done this, see e.g.
https://people.eecs.berkeley.edu/~fateman/papers/lisp2ntl/

I guess one uses CFFI, the de-facto standard Common Lisp foreign
function interface library.


>
> Ralf
>

Waldek Hebisch

unread,
Oct 20, 2020, 2:50:34 PM10/20/20
to fricas...@googlegroups.com
Yes.

> I guess, we need fast polynomial multiplication here.
> Actually, Marc Moreno Maza implemented it.
>
> http://fricas.github.io/api/UnivariatePolynomialMultiplicationPackage
>

Well, it would take significant effort to decide when to use
it and when not. And for your example I would expect
something between no speedup and, say, a 2x speedup.


You apparently looked at the profiler report but did not
see what is visible from the report. First, most
of the actual compute time goes into division with
remainder (Lisp TRUNCATE). Next, the actual compute
time is rather small compared to various overheads.
This example actually nicely illustrates why generic
code is frequently much slower than specialized code.
Namely, FiniteField uses univariate polynomials
as representation, that is, arithmetic operations
are taken from SparseUnivariatePolynomial, which
allocates a list node for each term and uses arithmetic
from the coefficient domain, that is, PrimeField(43).
Now, each operation in PrimeField uses division with
remainder as a final step (that is why TRUNCATE dominates
the runtime). This division is a single machine instruction,
but is done as a Lisp function call (because Lisp
does not know that the arguments are machine-size integers).

Efficient arithmetic with modular polynomials is more
like U32VectorPolynomialOperations. Namely, one
performs several arithmetic operations and does
division with remainder only as a last step.
One keeps coefficients in specialized arrays
(like U32Vector), to avoid the cost of memory management
needed for lists. And one uses type-specific
operations which can be compiled to inline code
instead of function calls.
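The delayed-reduction idea can be sketched like this (illustrative Python over plain lists; the real U32VectorPolynomialOperations works on machine-word vectors and must also reduce before the accumulator overflows, which Python's big integers let us ignore):

```python
# Two ways to multiply polynomials over GF(p), given as coefficient lists.

def mul_naive(a, b, p):
    """Generic-domain style: reduce mod p after every elementary operation."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + (ai * bj) % p) % p   # two reductions per term
    return res

def mul_delayed(a, b, p):
    """U32Vector style: accumulate raw products, reduce once per coefficient."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] += ai * bj        # no reduction inside the inner loop
    return [c % p for c in res]          # a single remainder at the very end
```

Both return the same result; the second does one division with remainder per output coefficient instead of two per term, which is where the bulk of the saving over the TRUNCATE-heavy generic path comes from.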

One possible way is to create a variant of FiniteField
which uses U32Vector as representation. To say how
much speedup one would get, one needs to implement it
and measure. I would expect 10 times faster than the
current version or more, but ATM this is just a wild
guess. One possible trap is that in sbcl allocating
vectors is much more expensive than allocating a list
node (it should be the opposite, as a vector actually
requires less work...), so for small degree (say 2 or 3)
the current implementation is likely to be faster.
Also, U32Vector and friends assume coefficients of
limited size.

There are other ways to speed up factorization.
I am not sure if they would help in this case.
Anyway, before looking at other tricks one should
invest in a fast low-level implementation, as
this would give the biggest speedup and decide if
anything more is really needed/helpful.


--
Waldek Hebisch

Waldek Hebisch

unread,
Oct 20, 2020, 3:41:23 PM10/20/20
to fricas...@googlegroups.com
On Tue, Oct 20, 2020 at 01:59:25PM +0100, Dima Pasechnik wrote:
>
> Why don't you use an already available fast implementation in C or C++?
> In this case, SageMath uses NTL https://shoup.net/ntl/
> which is very fast, and written by an expert in computational number theory.

Well, in this case it would help. However, many computations
in FriCAS use operations on polynomials with finite field
coefficients. Delegating all those users of polynomials
with finite field coefficients to external libraries is
unrealistic (I think that in some cases FriCAS is the
only system which can do a given computation). More attractive
would be to delegate, say, polynomial multiplication to an
external library. But this is tricky performance-wise.
FriCAS performs a lot of operations on relatively small
polynomials. For such polynomials the cost of the external
call and (usually) data conversion is significant.
To illustrate what I mean, let me say that when I
implemented U32VectorPolynomialOperations I did some
benchmarks comparing it with Sage. For small degrees
FriCAS was faster. I am pretty sure that the low-level
C libraries used by Sage were faster than the FriCAS
routines. But the Sage overhead apparently was enough
to change which routines were faster when used from
higher-level code. It is quite likely that calls
from FriCAS to C would cause similar overhead.

--
Waldek Hebisch

Ralf Hemmecke

unread,
Oct 20, 2020, 3:55:27 PM10/20/20
to fricas...@googlegroups.com
Hi Waldek, (and all others interested in this topic)

first of all, thanks for looking into it.

Yes, I cannot read/interpret the profile report.

> There are other ways to speed up factorization.
> I am not sure if they would help in this case.
> Anyway, before looking at other tricks one should
> invest in fast low-level implementation as
> this would give biggest speedup and decide if
> anything more is really needed/helpful.

Honestly, after Dima mentioned NTL, I think that would be the better way
to go. Unfortunately, I feel unable to program a general interface to
NTL. (In fact, it would be nice to have a simple way to call external
libraries, wouldn't it?)

NTL comes with Debian. If we could simply call functions from the
dynamic library, that would be great. I don't know yet how memory
allocation for NTL-polynomials (Z_p[x]) would work, but I guess, that
can be done.

I just need an example of how to call one NTL function from FriCAS. I
can probably do the rest.

Actually my intent is to program something like AlgebraicNumber, but
with the property that in cases like the one below at least one of the
zero tests returns true. Actually, it is already programmed. It relies on
keeping a projection into a finite field with enough roots (i.e. it has
to be dynamically extended, similar to being the closure of a prime field).

Only for that I need such a high degree finite field (and of course,
factorization of univariate polynomials over that field).

Ralf

(1) -> a2: AN := 2

(1) 2
Type: AlgebraicNumber
(2) -> a3: AN := 3

(2) 3
Type: AlgebraicNumber
(3) -> r2 := sqrt(a2)

+-+
(3) \|2
Type: AlgebraicNumber
(4) -> r3 := sqrt(a3)

+-+
(4) \|3
Type: AlgebraicNumber
(5) -> r6 := sqrt(a2*a3)

+-+
(5) \|6
Type: AlgebraicNumber
(6) -> zero?(r6 - r2*r3)

(6) false
Type: Boolean
(7) -> zero?(r6 + r2*r3)

(7) false
Type: Boolean
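The consistency phenomenon in the session above can be mimicked with a bare-hands Python sketch of the projection idea: pick a prime field in which 2, 3 and 6 are all squares, and choose a square root for each independently. The modulus p = 23 and the particular roots below are my own convenient choices, not anything from the FriCAS code.

```python
p = 23                      # 2, 3 and 6 are all quadratic residues mod 23

def sqrt_mod(a, p):
    """Smallest square root of a modulo p (brute force; fine for tiny p)."""
    for r in range(p):
        if r * r % p == a % p:
            return r
    raise ValueError("not a square mod p")

r2, r3, r6 = sqrt_mod(2, p), sqrt_mod(3, p), sqrt_mod(6, p)   # 5, 7, 11

# With independently chosen roots, exactly one of the two combinations
# projects to zero -- which one depends on the choices made:
print((r2 * r3 - r6) % p, (r2 * r3 + r6) % p)   # -> 1 0
```

Here r2*r3 = 35 ≡ 12 and r6 = 11, so it is the "+" combination that vanishes: the projection forces one consistent sign convention, just as in the zero? results above.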

Ralf Hemmecke

unread,
Oct 20, 2020, 4:05:56 PM10/20/20
to fricas...@googlegroups.com
On 10/20/20 9:41 PM, Waldek Hebisch wrote:
> More attractive would be to delegate, say, polynomial multiplication to
> an external library. But this is tricky performance-wise. FriCAS
> performs a lot of operations on relatively small polynomials. For
> such polynomials the cost of the external call and (usually) data
> conversion is significant.

I don't think that this is an argument against NTL. FriCAS allows
several representations of polynomials. That is its strength. So we
could have NTLPolynomial, say, and a user would choose that if he thinks
that this would better fit his needs. Just like you proposed to use
U32VectorPolynomialOperations.

> For small degrees FriCAS was faster. I am pretty sure that the low-level
> C libraries used by Sage were faster than the FriCAS routines. But the
> Sage overhead apparently was enough to change which routines were faster
> when used from higher-level code. It is quite likely that calls from
> FriCAS to C would cause similar overhead.

Might well be, but in my case FriCAS would not need to do lots of
conversions. I need a projection from (my) AlgebraicNumber into the
finite field and a check that it is a root of the (projected)
polynomial in question. I cannot believe that this would be the
bottleneck. Note that the (multivariate) polynomial representation of
elements of AlgebraicNumber is an implementation detail. It should not
interest the end user how it is done. So I guess being able to call NTL
would be a nice feature.

Ralf

Dima Pasechnik

unread,
Oct 20, 2020, 6:23:20 PM10/20/20
to fricas...@googlegroups.com
On Tue, Oct 20, 2020 at 8:41 PM Waldek Hebisch <heb...@math.uni.wroc.pl> wrote:
>
> On Tue, Oct 20, 2020 at 01:59:25PM +0100, Dima Pasechnik wrote:
> >
> > Why don't you use an already available fast implementation in C or C++?
> > In this case, SageMath uses NTL https://shoup.net/ntl/
> > which is very fast, and written by an expert in computational number theory.
>
> Well, in this case it would help. However, many computations
> in FriCAS use operations on polynomials with finite field
> coefficients. Delegating all those users of polynomials
> with finite field coefficients to external libraries is
> unrealistic (I think that is some cases FriCAS is the
> only system which can do given computation).

NTL (or Flint, which might be easier to use from FriCAS, as it is a
plain C library rather than C++: http://www.flintlib.org/features.html)
is quite feature-rich. You get multivariate polynomial arithmetic,
factorisation, gcds, etc. (it stops at Groebner bases).

Just as ultimately for arbitrary precision arithmetic you call GMP
(via Lisp), you can just as well call NTL or Flint for slightly
higher-order data.


> More attractive
> would be to delegate say polynomial multiplication to
> external library. But this is tricky performancewise.
> FriCAS performs a lot of operations on relativly small
> polynomials. For such polynomials cost of external
> call and (usually) data convertion is significant.
> To illustrate what I mean let me say that whan I
> implemented U32VectorPolynomialOperations I did some
> benchmarks comparing it with Sage. For small degrees
> FriCAS was faster. I am pretty sure that low level
> C libraries used by Sage were faster than FriCAS
> routines.

It is tricky to benchmark these things, as you might be in a
situation where a slow generic Python implementation was used for the
thing at hand. And also a cold start might be a problem: libraries take
a bit of time to load.

> But Sage overhead apparently was enough
> to change which routines where faster when used from
> higher level code. It is quite likely that calls
> from FriCAS to C would cause similar overhead.
>
> --
> Waldek Hebisch
>

Waldek Hebisch

unread,
Oct 21, 2020, 5:47:27 PM10/21/20
to fricas...@googlegroups.com
> One possible way is to create variant of FiniteField
> which uses U32Vector as representation. To say how
> much speedup one would get one needs to implement it
> and measure.

I have implemented a toy domain like this (just the operations
needed for the test above). On my machine it runs 3 times
faster with a default build. With sbcl at the highest
optimization level it runs 4 times faster. There
is significant room for easy improvement, as polynomial
remainder in U32VectorPolynomialOperations is rather slow.

--
Waldek Hebisch

Waldek Hebisch

unread,
Feb 20, 2021, 11:47:32 AM2/20/21
to fricas...@googlegroups.com
Just a little more about possible speed. I tried a toy
problem as a benchmark:

pF := PrimeField(nextPrime(10^7)) -- 10000019
uP := UnivariatePolynomial(x, pF)
pol := reduce(*, [x - (20 + 5*i)::pF for i in 1..84])

On my computer (rather slow one) I get the following times
(in seconds):

regular FriCAS factor 1.03
mfractor from MODFACT 0.018
flint 0.009
NTL 0.038

Note: it is possible that I am using NTL incorrectly. Namely,
NTL has a type for machine-sized integers modulo a prime, but
ATM I found no way to use polynomials of machine-sized integers
modulo a prime. Also, for the test I compiled critical FriCAS
routines at safety 0 (most of the time extra safety checks
are cheap, but in the low-level code involved here they make a
significant difference).

As you can see there is a possibility for a 50 times speedup, and
that brings us within a factor of 2 of flint and is better than
my current NTL result.

Concerning your original problem, the degree 3 case in flint
took 0.08s, and in NTL about 1.2s. The current regular FriCAS
factorizer needs 15.88s on my machine. AFAICS, using
similar methods as in MODFACT it should be possible
to get within a factor of 2-4 of flint. Let me add
that there are other possibilities for faster arithmetic
over finite fields, but my impression is that flint
does not use them: basically it seems that flint's
advantage is mainly due to better machine optimization
in gcc compared to sbcl.

Looking again at the problem, it is not clear what the
intent is. Is the intent to benchmark the finite field factorizer?
Then the example is somewhat atypical because the polynomials
split into linear factors. If the intent is to embed a
smaller field into a bigger one, then there are better
methods. For example, the factorizer spends some time to
find out that the polynomials split; this can be skipped
if we know it. For the purpose of embedding, a single
root should be enough, and that is cheaper than finding
all roots. In this case we know more: the polynomial
will split over a subfield, so we can bias factoring
to effectively work in this subfield. And working
in the opposite direction may be cheaper: computing powers
of a single element, with high probability we will get the
minimal polynomial for the subfield; factoring this
polynomial over the smaller field will give us the embedding.
On my machine I need 0.92s to compute the power and 0.3s
to compute the minimal polynomial (which is of degree 3).
Both operations should allow significant speed-up
by using more efficient low-level routines.
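The "compute powers to get the minimal polynomial" idea can be shown in miniature. Assuming we represent GF(43^3) as GF(43)[t]/(t^3 - 29) (a valid choice, since 29 is not a cube mod 43, so x^3 - 29 is irreducible there), the minimal polynomial of an element over GF(43) is the product of (x - beta^(43^i)) over its Frobenius orbit; for beta = t this reconstructs x^3 - 29. This is a toy Python sketch of the technique, not the FriCAS code:

```python
p = 43

def fmul(a, b):
    """Multiply two elements of GF(p)[t]/(t^3 - 29), given as 3-coeff lists."""
    c = [0] * 5
    for i in range(3):
        for j in range(3):
            c[i + j] += a[i] * b[j]
    # reduce using t^3 = 29 and t^4 = 29*t
    return [(c[0] + 29 * c[3]) % p, (c[1] + 29 * c[4]) % p, c[2] % p]

def fpow(a, e):
    """Square-and-multiply powering in the field."""
    r = [1, 0, 0]
    while e:
        if e & 1:
            r = fmul(r, a)
        a = fmul(a, a)
        e >>= 1
    return r

beta = [0, 1, 0]                                   # beta = t, a root of x^3 - 29
conjugates = [fpow(beta, p**i) for i in range(3)]  # Frobenius orbit of beta

# minimal polynomial = prod(x - conj); its x-coefficients live in GF(p^3)
minpoly = [[1, 0, 0]]                              # the constant polynomial 1
for c in conjugates:
    neg_c = [(p - v) % p for v in c]
    new = [[0, 0, 0] for _ in range(len(minpoly) + 1)]
    for k, coeff in enumerate(minpoly):
        new[k + 1] = [(new[k + 1][i] + coeff[i]) % p for i in range(3)]  # x * coeff
        prod = fmul(coeff, neg_c)
        new[k] = [(new[k][i] + prod[i]) % p for i in range(3)]           # -c * coeff
    minpoly = new

# every coefficient collapses into the prime field and we recover x^3 - 29
print([c[0] for c in minpoly])   # [14, 0, 0, 1], i.e. x^3 - 29 since -29 = 14 mod 43
```

The conjugates come out as t, 6t and 36t (6 being a primitive cube root of unity mod 43), and multiplying the three linear factors back together cancels all the t parts, leaving a polynomial over the prime field whose factorization there yields the embedding.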

--
Waldek Hebisch

Waldek Hebisch

unread,
Feb 20, 2021, 1:20:46 PM2/20/21
to fricas...@googlegroups.com
On Sat, Feb 20, 2021 at 05:47:27PM +0100, Waldek Hebisch wrote:
> On Wed, Oct 21, 2020 at 11:47:19PM +0200, Waldek Hebisch wrote:
> > On Tue, Oct 20, 2020 at 08:50:27PM +0200, Waldek Hebisch wrote:
> > > On Tue, Oct 20, 2020 at 02:40:32PM +0200, Ralf Hemmecke wrote:
> > > >
> > > > The attached file contains code that comes from the problem of creating the
> > > > algebraic closure of PrimeField(p) by dynamically extending with a field
> > > > with a new polynomial that does not completely factor. It basically
> > > > works, but when I tried with the polynomials over GF(43) I realized very
> > > > long running times.
> > > >
> > > > The following maps this to FiniteField(43,84) (the splitting field of
> > > > the 3 polynomials
> > > >
> > > > p3 := x^3 - 29
> > > > p5 := x^5 - 29
> > > > p7 := x^7 - 29
> > > >
> > > > When I factor them on my laptop I get:
> > > >
> > > > Time: 8.57 (EV) + 0.00 (OT) = 8.57 sec
> > > > Time: 23.81 (EV) + 0.00 (OT) = 23.82 sec
> > > > Time: 35.38 (EV) = 35.38 sec
> > > >
<snip>
> Just a little more about possible speed. I tried a toy
> problem as a benchmark:
>
> pF := PrimeField(nextPrime(10^7)) -- 10000019
> uP := UnivariatePolynomial(x, pF)
> pol := reduce(*, [x - (20 + 5*i)::pF for i in 1..84])
>
> On my computer (rather slow one) I get the following times
> (in seconds):
>
> regular FriCAS factor 1.03
> mfractor from MODFACT 0.018
> flint 0.009
> NTL 0.038
>
> Note: it is possible that I am using NTL incorrectly. Namely,
> NTL has a type for machine-sized integers modulo a prime, but
> ATM I found no way to use polynomials of machine-sized integers
> modulo a prime.

Correction: header files for such polynomials are hidden under the
names 'lzz_pX...' and 'lzz_pEX...'. Using them I get 0.006s
for the problem above (so slightly better than flint) and
0.345s for the original factoring in degree 3.

A little extra remark: MODFACT in trunk has a silly performance bug
and the time is much longer. The time above is using a fixed version.
and time is much longer. Time above is using fixed version.

--
Waldek Hebisch

Ralf Hemmecke

unread,
Feb 20, 2021, 2:01:53 PM2/20/21
to fricas...@googlegroups.com
Hi Waldek,

thank you for taking a look at this issue. A factor 50 speedup looks
promising. However, I do not think it is enough if I merely tweak the
factorizer to be compiled at a higher optimization level. At the very
least, one would need to explain exactly how anyone can easily do this.

Nevertheless, I think an interface to other existing libraries would be
a good idea.

> Looking again at the problem it is not clear what is the
> intent.

I had posted the original intent here.
https://groups.google.com/g/fricas-devel/c/jmEWZvM5Jdk/m/1j33ik34AAAJ

https://www.mail-archive.com/fricas...@googlegroups.com/msg13091.html

I did this some time ago, so I guess the code attached there is not
up-to-date. But you asked about the intent.

What I have posted at the beginning of the "slow factorization in finite
field" thread is connected to it. I experimented with my implementation
of a dynamic algebraically closed field and wondered about long running
times. The polynomials

p3 := x^3 - 29
p5 := x^5 - 29
p7 := x^7 - 29

are somewhat made up, because I wanted to test whether my code can
recognize zeros in an algebraic setup.

In fact, a dynamic algebraic closure is like AlgebraicNumber with
the property that the zero test really finds zeros. For example, it can
decide whether sqrt(2)*sqrt(3)-sqrt(6) is zero. That example is
too easy, but you get the point. Note that by independently creating
sqrt(2), sqrt(3) and sqrt(6), it could also be that, for consistency,
we would have

zero?(sqrt(2)*sqrt(3) - sqrt(6)) is false
and
zero?(sqrt(2)*sqrt(3) + sqrt(6)) is true.

Internally, the dynamic algebraic closure relies on a projection into a
finite field (with enough roots). Clearly with the dynamic extension of
the field one also must sometimes extend the underlying finite field.
And there I need to factor polynomials in finite fields.

If this is too vague, I can try to single out the respective code that I
have and make it public.

Ralf

Waldek Hebisch

unread,
Feb 20, 2021, 7:28:17 PM2/20/21
to fricas...@googlegroups.com
On Sat, Feb 20, 2021 at 08:01:51PM +0100, Ralf Hemmecke wrote:
> Hi Waldek,
>
> > Looking again at the problem it is not clear what is the
> > intent.
>
> I had posted the original intent here.
> https://groups.google.com/g/fricas-devel/c/jmEWZvM5Jdk/m/1j33ik34AAAJ
>
> https://www.mail-archive.com/fricas...@googlegroups.com/msg13091.html
>
> I did this some time ago, so I guess the code attached there is not
> up-to-date. But you asked about the intent.

I remember that you talked about algebraic closure. I forgot that
you posted the code.

> What I have posted at the beginning of the "slow factorization in finite
> field" thread is connected to it. I experimented with my implementation
> of a dynamic algebraically closed field and wondered about long running
> times.

OK, so the intent is to speed up your algebraic closure code.
I would say that the first step is to avoid factorization when
a simpler thing will do. Factorization has its cost. IME
factorization is cheaper than, say, resultants, so "rational"
algorithms that use resultants may be more expensive
than methods depending on factorization. Algebraic
extensions have a serious cost; with current FriCAS methods, at
best compute time will grow quadratically with the extension degree,
and frequently faster. For large extension degrees we would
need asymptotically fast methods (currently not present in
FriCAS). Doing things in smaller fields (if possible)
will usually be faster.

Of course, faster routines for basic operations will
speed up everything, but IMO also at a higher level
there is a possibility for savings.

As extra remarks: Expression is supposed to implement
algebraic closure. I spent some time thinking about
possible improvements to Expression and related code
(in particular handling of algebraic functions in the
integrator). I view using finite fields as a key
factor in possible improvements. In particular,
I expect to make more use of routines based on
U32Vector. OTOH I expect most algebraic quantities
to be radicals, and for radicals in many cases one
can do better than for roots of general polynomials.

> The polynomials
>
> p3 := x^3 - 29
> p5 := x^5 - 29
> p7 := x^7 - 29
>
> are somewhat made up, because I wanted to test whether my code can
> recognize zeros in an algebraic setup.

As I wrote, they are rather special.

> In fact, a dynamic algebraic closure is like AlgebraicNumber with
> the property that the zero test really finds zeros. For example, it can
> decide whether sqrt(2)*sqrt(3)-sqrt(6) is zero. That example is
> too easy, but you get the point. Note that by independently creating
> sqrt(2), sqrt(3) and sqrt(6), it could also be that, for consistency,
> we would have
>
> zero?(sqrt(2)*sqrt(3) - sqrt(6)) is false
> and
> zero?(sqrt(2)*sqrt(3) + sqrt(6)) is true.
>
> Internally, the dynamic algebraic closure relies on a projection into a
> finite field (with enough roots). Clearly with the dynamic extension of
> the field one also must sometimes extend the underlying finite field.
> And there I need to factor polynomials in finite fields.

In code you posted I see:

Exports ==> Join(AlgebraicallyClosedField, FiniteFieldCategory) with

This is a mathematical contradiction and we should not do this.

--
Waldek Hebisch

Grégory Vanuxem

unread,
Feb 22, 2021, 8:00:07 AM2/22/21
to fricas...@googlegroups.com
Hi Waldek, all,

If it matters, I know that Victor Shoup maintains a set of
benchmarks that compares NTL vs Flint. He even spotted a recently
introduced bug in Flint via this set (fixed since). If you want, here is
a link to some of his results: https://libntl.org/benchmarks.pdf.

Cheers,


On Sat, Feb 20, 2021 at 17:47, Waldek Hebisch
<heb...@math.uni.wroc.pl> wrote:



--
__
G. Vanuxem

Dima Pasechnik

unread,
Feb 22, 2021, 9:16:00 AM2/22/21
to fricas...@googlegroups.com
On Mon, Feb 22, 2021 at 1:00 PM Grégory Vanuxem <g.va...@gmail.com> wrote:
>
> Hi Waldek, all,
>
> If that matters, I know that Victor Shoup maintains a set of
> benchmarks that compares NTL vs Flint. He even spotted a recently
> introduced bug in Flint via his set (fixed since). If you want here is
> a link of some of his results : https://libntl.org/benchmarks.pdf.

flint nowadays does multivariate polynomial operations (no Groebner
bases though), something NTL does not do at all.

Grégory Vanuxem

unread,
Feb 22, 2021, 9:46:58 AM2/22/21
to fricas...@googlegroups.com
I saw that only recently too. At the beginning of COVID in the EU, I
compared Flint vs FriCAS for multivariate polynomial operations. At
that time nested univariate polynomials were needed. I use Nemo to do
those things, a Julia package (https://github.com/Nemocas/Nemo.jl).



--
__
G. Vanuxem

Grégory Vanuxem

unread,
Feb 22, 2021, 10:19:49 AM2/22/21
to fricas...@googlegroups.com
In fact I have been thinking for some time about optionally interfacing
Julia for basic numerical things (and numerical linear algebra). The
primary purpose of Julia is numerical mathematics. It is very easy to
link it from C (a script in Julia is given) and Julia is available for
Mac, Linux, Windows, FreeBSD, maybe others. That discussion reminds me
of this, but I have/had not in mind this interaction for anything other
than fast numerical mathematics; FriCAS is about symbolic mathematics
from my point of view.
--
__
G. Vanuxem

Grégory Vanuxem

unread,
Feb 22, 2021, 10:24:59 AM2/22/21
to fricas...@googlegroups.com
[/digression stop] sorry. Have a good day!
--
__
G. Vanuxem

Waldek Hebisch

unread,
Feb 26, 2021, 7:07:52 PM2/26/21
to fricas...@googlegroups.com
On Sat, Feb 20, 2021 at 08:01:51PM +0100, Ralf Hemmecke wrote:
>
> thank you for taking a look at this issue. A factor 50 speedup looks
> promising. However, I do not think it is enough if I merely tweak the
> factorizer to be compiled at a higher optimization level. At the very
> least, one would need to explain exactly how anyone can easily do this.

Forgot to write this: currently there is an '--enable-algebra-optimization'
option to configure. As explained in INSTALL, if you give:

--enable-algebra-optimization="((speed 3) (safety 0))"

you get the fastest code (but unsafe). Currently this is an all-or-nothing
choice, that is, it applies to the whole algebra. I will probably
rework it; in particular I think that low-level domains like
U32Vector, U32VectorPolynomialOperations and a bunch of similar
domains should be compiled at high optimization settings,
while the rest of the algebra should use the current settings. This should
give us most of the gain from optimization with only a small decrease
in safety.


--
Waldek Hebisch

Ralf Hemmecke

unread,
Feb 27, 2021, 4:13:19 AM2/27/21
to fricas...@googlegroups.com
> Forgot to write this: currently there is
> '--enable-algebra-optimization' option to configure. As explained in
> INSTALL, if you give:
>
> --enable-algebra-optimization="((speed 3) (safety 0))"
>
> you get fastest code (but unsafe).

Thanks. I knew this, but have not used it. I have bad experiences with
high optimization with the Aldor compiler. I am probably not telling you
anything new, but there were at least 3 kinds of errors:

(1) the compiler optimizes too much and produces garbage from correct
    sources
(2) the library sources were buggy and just getting a
    "Segmentation fault" is not very helpful for finding the
    source of the problem
(3) I write buggy user library extensions

With all-or-nothing and no concrete error message, a distinction between
(2) and (3) would be hard to make. But this is certainly not a big
problem for me, because you are just speaking about the library code
that is inside the FriCAS repository.

Distinguishing between (1) and (2) is probably more delicate. But don't we
need a bigger set of unit tests for these low-level domains if they are
going to be super-optimized?

> Currently this is all-or-nothing choice, that is applies to whole
> algebra. I probably will rework it, in particular I think that low
> level domains like U32Vector, U32VectorPolynomialOperations and bunch
> of similar domain should be compiled at high optimization settings,
> while rest of algebra should use current settings. This should give
> us most gain from optimization with only small decrease in safety.

As far as I remember, you introduced the U32... domains. How can
the optimization level of those influence the speed of the finite field
factorizer?

Nevertheless, don't you think that it would be a good idea to allow
including calls to FLINT in the FriCAS library? Unfortunately, I have no
idea how to do such an interface right, but it would be nice to have one.

Ralf

Waldek Hebisch

unread,
Feb 27, 2021, 10:25:04 AM2/27/21
to fricas...@googlegroups.com
On Sat, Feb 27, 2021 at 10:13:17AM +0100, Ralf Hemmecke wrote:
> > Forgot to write this: currently there is
> > '--enable-algebra-optimization' option to configure. As explained in
> > INSTALL, if you give:
> >
> > --enable-algebra-optimization="((speed 3) (safety 0))"
> >
> > you get fastest code (but unsafe).
>
> Thanks. I knew this, but have not used it. I have bad experiences with
> high optimization with the Aldor compiler. I probably don't tell you
> anything new, but there were at least 3 errors.
>
> (1) the compiler optimizes too much and produces garbage from correct
> sources
> (2) the library sources were buggy and just getting a
> "Segmentation fault" is not very helpful to find the
> source of the problem.
> (3) I write buggy user library extensions.
>
> With all-or-nothing and no concrete error message a distinction between
> (2) and (3) would be hard to make. But this is certainly not a big
> problem for me, because you are just speaking about the library code
> that is inside the FriCAS repository.
>
> Distinction between (1) and (2) is probably more delicate. But don't we
> need a bigger set of unit tests for these low-level domains if they are
> going to be super-optimized?

I do not think (1) is a problem. The current default is due to (2)
and (3). To explain more: the setting above essentially tells the
compiler to trust declarations. Without such a setting sbcl
inserts overflow checks and bounds checks for arrays.
In slow code such checks do not make much difference to
speed; in fast code they can take half of the execution time.
Skipping checks is unlikely to introduce/trigger bugs,
it just makes the causes of bugs harder to find. Note that
even with '(safety 0)' sbcl still inserts some checks,
so there is more runtime error checking than in typical
C code. You are somewhat comfortable with the idea of
running C code, which has no automatic error checking...

Concerning tests: the code is used intensively; a few bugs were
found in the past, but recently it has worked quite well.

> > Currently this is all-or-nothing choice, that is applies to whole
> > algebra. I probably will rework it, in particular I think that low
> > level domains like U32Vector, U32VectorPolynomialOperations and bunch
> > of similar domain should be compiled at high optimization settings,
> > while rest of algebra should use current settings. This should give
> > us most gain from optimization with only small decrease in safety.
>
> As far as I remember, you introduced the U32... domains. How can
> the optimization level of those influence the speed of the finite field
> factorizer?

The current finite field factorizer does not use the U32... types,
so ATM there is no influence. However, the factorizer in 'ffact.spad'
uses U32... types for basic operations and in fact
spends about 90% of its time in operations from
POLYVEC. Currently 'ffact.spad' only factorizes over prime
fields, but the algorithm is general, that is, it applies to
all finite fields. One possibility to generalize 'ffact.spad'
is to parametrise it by a basic polynomial and matrix domain
(like ModularAlgebraicGcd2). ATM the missing ingredient is an
implementation of these operations over non-prime fields.
I cannot say when it will be done, but we need such operations
to speed up gcd (and later factorization) for polynomials
with algebraic coefficients. Currently we have
ModularAlgebraicGcdTools2, which handles a single algebraic
extension but in a somewhat naive way, and ModularAlgebraicGcdTools3,
which handles multiple extensions (triangular systems)
by generic (that is, slow) code.
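For concreteness, here is a rough Python sketch (not the actual SPAD code, and the function name is made up) of the style of operation that U32VectorPolynomialOperations/POLYVEC provides: dense polynomials over GF(p) stored as flat arrays of machine-word coefficients, with essentially all of the time spent in a tight multiply-accumulate loop:

```python
# Rough sketch (not FriCAS/SPAD code) of flat-array polynomial
# arithmetic over GF(p), the kind of low-level operation POLYVEC
# provides.  Coefficient at index i is the coefficient of x^i.
from array import array

def mul_mod_p(a, b, p):
    """Multiply two dense polynomials over GF(p).

    Classical O(n*m) schoolbook multiplication; a real implementation
    would delay reductions mod p and switch to faster algorithms
    (Karatsuba, FFT) for large degrees.
    """
    res = array('L', [0] * (len(a) + len(b) - 1))
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    return res

# (x + 1) * (x + 42) over GF(43) = x^2 + 43x + 42 = x^2 + 42
p = 43
a = array('L', [1, 1])      # x + 1
b = array('L', [42, 1])     # x + 42
print(list(mul_mod_p(a, b, p)))   # [42, 0, 1]
```

With `(speed 3) (safety 0)` the Lisp compiler can turn such loops into raw machine-word arithmetic without bounds checks, which is where the factor-of-two speedups mentioned above come from.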

We need to replace ModularAlgebraicGcdTools2 by a faster
version. ATM it is not clear to me what to do with
ModularAlgebraicGcdTools3. Namely, triangular systems
seem to be inherently slow. For applications we
should be able to replace a triangular system by a single
polynomial (that is, use a primitive element), and this
may be faster (despite the cost of conversion). Anyway,
the speed of Expression depends very much on those
routines, so I want to make them faster. Besides the
speed of basic routines there are other issues,
like using the Zippel method (ATM the Axiom family seems
to be the only major CAS family that does not use the
Zippel method).
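As a concrete (textbook, not FriCAS-specific) illustration of the primitive-element idea: the triangular system {y^2 - 2, z^2 - 3} can be replaced by the single minimal polynomial of the primitive element w = y + z. A quick numeric sanity check in Python:

```python
# Classic primitive-element example (illustration only, not FriCAS code):
# the triangular system {y^2 - 2, z^2 - 3} is replaced by the minimal
# polynomial of w = y + z, which is  x^4 - 10*x^2 + 1.
# Derivation: w^2 = 5 + 2*sqrt(6), so (w^2 - 5)^2 = 24.
import math

w = math.sqrt(2) + math.sqrt(3)
value = w**4 - 10 * w**2 + 1
print(abs(value) < 1e-9)   # True: w is (numerically) a root
```

Arithmetic in the quotient by this one degree-4 polynomial replaces arithmetic modulo the two polynomials of the triangular system, at the price of converting between the two representations.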

Anyway, coming back to finite fields: we should
call routines in 'ffact.spad' from finite field
code.

> Nevertheless, don't you think that it would be a good idea to allow
> including calls to FLINT in the FriCAS library? Unfortunately, I have no
> idea how to do such an interface right, but it would be nice to have one.

Nice, yes. But who should do this? Currently there is reasonably
general machinery for foreign calls; one needs to write
some Lisp code, but this is a moderate effort. The problem with
FLINT (and NTL) is that the official FLINT interfaces are
abstract, which is very good design if you work at the C
level, but makes interfacing harder. So probably somebody
would need to write more interface-friendly wrappers.

Anyway, to have an interface like this as part of FriCAS
we would need configure machinery, to deal with different
machine architectures, etc.

--
Waldek Hebisch