learning go, or, yes, math/big.Mod, there *is* a better way :-)

Jeff Templon

Jul 19, 2017, 11:27:14 AM
to golang-nuts
Hi

A follow-up message ... I managed to speed up my little prime factorization program by a bit more than a factor of two; the exact win is hardware-dependent.

After I learned how to profile the code, I discovered a few things:

  • 70% of the time was taken in the math/big.Mod function (I already knew this)
  • that function calls QuoRem, which calls nat.divLarge, meaning the Mod is computed by actually doing the full division and then taking the remainder
  • the big surprise was that, of the 65% of the runtime taken by divLarge and the routines it calls, more than half was going to what is essentially memory management
Part of this was going to runtime.makeslice and ultimately runtime.mallocgc; the rest was going to sync.Pool Put and Get calls.  divLarge needs to create temporaries, and apparently, to relieve pressure on the garbage collector, it draws them from a sync.Pool of "big.nat" values.  I guess it's done this way (via the sync.Pool) so that divLarge stays thread-safe.
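For anyone who wants to do the same kind of digging: the CPU-profiling hook itself is tiny.  This is a minimal sketch, not lifted from my program, and the output file name is arbitrary:

    package main

    import (
        "log"
        "os"
        "runtime/pprof"
    )

    func main() {
        f, err := os.Create("cpu.prof") // arbitrary file name
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        if err := pprof.StartCPUProfile(f); err != nil {
            log.Fatal(err)
        }
        defer pprof.StopCPUProfile()

        // ... run the factorization work here ...
    }

Afterwards, "go tool pprof cpu.prof" gives the interactive top / list / web view of where the time went.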

I didn't need to do the division, and I didn't need thread safety (not yet, anyway).  My calculations are all of the form "a mod N" with a < N*N, and in that case one can use Barrett reduction to find a mod N; since it's millions of reductions against the same N, the constants for N can be precomputed once.  I made a pair of package-global temporaries for the function, which avoids all the calls into the sync.Pool, but now it's not thread safe, I guess.
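For concreteness, the reduction itself is only a few lines.  This is a sketch of the technique rather than my actual code (the names barrett, newBarrett and reduce are made up), and it assumes the value being reduced is below N*N:

    package main

    import (
        "fmt"
        "math/big"
    )

    // Precomputed constants for reducing modulo a fixed N.
    type barrett struct {
        n     *big.Int // the modulus N
        mu    *big.Int // floor(2^(2k) / N), computed once
        shift uint     // 2k, where k = N.BitLen()
    }

    func newBarrett(n *big.Int) *barrett {
        k := uint(n.BitLen())
        mu := new(big.Int).Lsh(big.NewInt(1), 2*k) // 2^(2k)
        mu.Quo(mu, n)                              // floor(2^(2k) / N)
        return &barrett{n: new(big.Int).Set(n), mu: mu, shift: 2 * k}
    }

    // reduce stores x mod N into r, using q as scratch; valid for 0 <= x < N*N.
    // No division in here, and once r and q have grown to their working size
    // they are simply reused.
    func (b *barrett) reduce(r, q, x *big.Int) *big.Int {
        q.Mul(x, b.mu)    // q = x * mu
        q.Rsh(q, b.shift) // q is now floor(x/N) or one below it
        r.Mul(q, b.n)     // r = q * N
        r.Sub(x, r)       // r = x - q*N, so 0 <= r < 2N
        for r.Cmp(b.n) >= 0 {
            r.Sub(r, b.n) // at most one pass when x < N*N
        }
        return r
    }

    func main() {
        n := big.NewInt(1000003)
        x := new(big.Int).Mul(big.NewInt(999999), big.NewInt(999998)) // x < N*N
        b := newBarrett(n)
        r, q := new(big.Int), new(big.Int)
        fmt.Println(b.reduce(r, q, x))      // same answer as ...
        fmt.Println(new(big.Int).Mod(x, n)) // ... big.Int.Mod, without the division
    }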

There were a number of other optimisations that had to do with avoiding makeslice.  One was to stop reusing a single big.Int variable for first small and then large values: I defined two temporaries, one to hold the large intermediate values and another for the small ones, so each always has enough capacity to be reused and the underlying "make" call has no work to do.
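A sketch of the pattern (the names and operations are made up, and Mod is only a stand-in for "some operation with a small result"):

    package main

    import (
        "fmt"
        "math/big"
    )

    // Each scratch value is dedicated to one size of result, so its capacity
    // settles once and the underlying "make" never has to grow it again.
    var (
        largeTmp big.Int // only ever holds the large intermediates (e.g. full products)
        smallTmp big.Int // only ever holds the small values (e.g. reduced results)
    )

    func step(a, n *big.Int) *big.Int {
        largeTmp.Mul(a, a)         // big result into the big scratch
        smallTmp.Mod(&largeTmp, n) // small result into the small scratch
        return &smallTmp
    }

    func main() {
        fmt.Println(step(big.NewInt(99991), big.NewInt(1000003)))
    }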

A final silly one, which saved 10% of the run time and eliminated almost all of the garbage collection remaining after the previous fixes, was swapping

for (loop over lots of iterations) {
    var tmp1 big.Int
    ...
}

for

var tmp1 big.Int
for (loop over lots of iterations) {
    ...
}

I had naively assumed that as long as I was executing the loop it was the same "tmp1", but in fact a new tmp1 was being allocated on every trip around the loop: 1.2 GB of memory allocated in a 25-second run :-)
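If you want to see this effect in isolation, testing.AllocsPerRun makes it easy to check.  A toy sketch (not my program; the values are arbitrary):

    package main

    import (
        "fmt"
        "math/big"
        "testing"
    )

    func main() {
        a, _ := new(big.Int).SetString("123456789123456789123456789", 10)
        b, _ := new(big.Int).SetString("987654321987654321987654321", 10)

        // tmp1 declared inside the closure: a fresh backing array on every call.
        inside := testing.AllocsPerRun(1000, func() {
            var tmp1 big.Int
            tmp1.Mul(a, b)
        })

        // tmp1 declared once outside: after the first call it has enough capacity.
        var tmp1 big.Int
        outside := testing.AllocsPerRun(1000, func() {
            tmp1.Mul(a, b)
        })

        fmt.Println("allocs per run, inside:", inside, "outside:", outside)
    }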

It's been a lot of fun, thanks for the discussions, and I am very impressed with the tooling.  pprof absolutely rocks.

 JT
