BigInt as a value class?

221 views
Skip to first unread message

Mark Hammons

unread,
Apr 26, 2013, 9:10:31 AM4/26/13
to scala-l...@googlegroups.com
I've been working a little on porting https://github.com/tbuktu/bigint to scala and using value classes and universal traits to implement BigInt. So far my implementation seems to have a linear speed improvement over scala.math.BigInt on repeated addition. I was curious, if I get this code into good shape would it be able to be added to the standard library?
--
Mark Edgar Hammons II - Research Engineer at the ISCPIF | Contributor to OpenMOLE

Cell - +33 06 03 69 56 56 | + 1 405 928 8206

Simon Ochsenreither

unread,
Apr 26, 2013, 9:35:57 AM4/26/13
to scala-l...@googlegroups.com, Erik Osheim
Hi Mark!


I've been working a little on porting https://github.com/tbuktu/bigint to scala and using value classes and universal traits to implement BigInt. So far my implementation seems to have a linear speed improvement over scala.math.BigInt on repeated addition. I was curious, if I get this code into good shape would it be able to be added to the standard library?

Nice to see that there is interest in having a better, native BigInt imIplementation!
I have been working on a clean-room implementation of BigInt in Scala since a while already, but didn't have much time in the last few months.

Recent versions of BigInt feature a big, fat @deprecatedInheritance annotation to allow making it a value class, as soon as multi-field value classes are supported (if that will ever happen). There are other issues, like the requirement to expose an array (publicly mutable elements in an immutable class!?) if BigInt is converted to a value class, but I want to keep this option open. (That's pretty much one fo the reasons why I implemented @deprecatedInheritarce in the first place.)

The huge issue with Tim Buktus' and Alan Eliasen's implementation is that the license is incompatible. I was planning to ask them whether they would be open towards separating their algorithm implementations from Sun/Oracle's BigInteger code and releasing it under a different license. This way, we would have an usable/portable implementation of many of the more tricky parts and could concentrate on getting the basics right.

I didn't communicate with them about that idea yet due to time constraints, but maybe we can get this going and figure out whether their algorithms can be used.
I'll CC Erik Osheim, who is working on the spire library. Maybe he has some ideas/suggestions about this topic.

Thanks and Bye,

Simon

Mark Hammons

unread,
Apr 26, 2013, 10:25:14 AM4/26/13
to scala-l...@googlegroups.com
I managed to skirt the accessible array issue by making BigInt a universal trait. I did this to side-step the issue that signum is pretty important in tim's implementation, by having three private implementations of BigInt(Zero, PosBigInt, NegBigInt).

I'll contact Tim Buktus and see if he'd be willing to work with us on the licence.


--
You received this message because you are subscribed to the Google Groups "scala-language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-languag...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Simon Ochsenreither

unread,
Apr 26, 2013, 10:59:29 AM4/26/13
to scala-l...@googlegroups.com

I'll contact Tim Buktus and see if he'd be willing to work with us on the licence.

I would prefer it if we could get at least Erik and someone from the Scala team on board first and figure out the content, the wordening, the reasoning and the value proposition of the mail together.

Rex Kerr

unread,
Apr 26, 2013, 11:21:37 AM4/26/13
to scala-l...@googlegroups.com
If it's going to be a non-negligible amount of work, I'd also want to make sure we're starting in the right place.  It's not clear to me that BigInteger even has the right model; it would be a shame to pour lots of effort into something that is going to end up being e.g. 2x slower than it needs to be due to details in how data is laid out etc..

  --Rex



On Fri, Apr 26, 2013 at 10:59 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

I'll contact Tim Buktus and see if he'd be willing to work with us on the licence.

I would prefer it if we could get at least Erik and someone from the Scala team on board first and figure out the content, the wordening, the reasoning and the value proposition of the mail together.

--

Simon Ochsenreither

unread,
Apr 26, 2013, 11:54:24 AM4/26/13
to scala-l...@googlegroups.com
Hi Rex,


If it's going to be a non-negligible amount of work, I'd also want to make sure we're starting in the right place.

having evaluated a lot options, I think Tim Buktu's and Alan Eliasen's is the right starting point. The code is extremely fast, has undergone an immense amount of testing and is currently intended to be integrated into OpenJDK.
 
It's not clear to me that BigInteger even has the right model; it would be a shame to pour lots of effort into something that is going to end up being e.g. 2x slower than it needs to be due to details in how data is laid out etc.

Pretty much the only thing I'm interested in (and which has the chance to be relicensed) are the algorithms. I have already looked into how the representation can be optimized and apart from leaving out deprecated, superfluous fields, the decision between a BigEndian and a LitteEndian representation of the magnitude and whether to represent the sign as a field or put it in the array in front of the magnitude are the only important decisions to make.

There is no questions that it will be faster than scala's BigInt. It even beats the shit out of the current Java one. We just need to benchmark it to make sure that no regression is introduced.

Thanks and bye,

Simon

Erik Osheim

unread,
Apr 26, 2013, 12:05:42 PM4/26/13
to scala-l...@googlegroups.com
On Fri, Apr 26, 2013 at 08:54:24AM -0700, Simon Ochsenreither wrote:
> having evaluated a lot options, I think Tim Buktu's and Alan Eliasen's is
> the right starting point. The code is extremely fast, has undergone an
> immense amount of testing and is currently intended to be integrated into
> OpenJDK.

I agree. The algorithms they have implemented seem to me to be the
right ones based on surveying the field (of course, if someone else
here is an expert I'd value their opinion here highly). And even just
getting the right to base other implementations on their code would be
great (even if we need to tweak it, or decide not to use all of it).

> Pretty much the only thing I'm interested in (and which has the chance to
> be relicensed) are the algorithms. I have already looked into how the
> representation can be optimized and apart from leaving out deprecated,
> superfluous fields, the decision between a BigEndian and a LitteEndian
> representation of the magnitude and whether to represent the sign as a
> field or put it in the array in front of the magnitude are the only
> important decisions to make.

Agreed. Getting permission here is really just about being able to use
their code as a reference without worrying about possible copyright
issues.

-- Erik

Simon Ochsenreither

unread,
Apr 26, 2013, 12:18:03 PM4/26/13
to scala-l...@googlegroups.com, er...@plastic-idolatry.com
Hi Erik,


Agreed. Getting permission here is really just about being able to use
their code as a reference without worrying about possible copyright
issues.

that's exactly what I'm thinking.

Imho, our question should be along the lines of

a) whether they would be willing to license their work under an additional license (can someone from EPFL/Typesafe please comment on the exact details?)
b) whether they would be willing to clearly separate the code they own from the code coming from OpenJDK (e. g. by placing it in a separate repository), so that there is no confusion about which parts can be relicensed and which not.

Bye,

Simon

Rex Kerr

unread,
Apr 26, 2013, 12:54:16 PM4/26/13
to scala-l...@googlegroups.com
On Fri, Apr 26, 2013 at 11:54 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:
Hi Rex,


If it's going to be a non-negligible amount of work, I'd also want to make sure we're starting in the right place.

having evaluated a lot options, I think Tim Buktu's and Alan Eliasen's is the right starting point. The code is extremely fast, has undergone an immense amount of testing and is currently intended to be integrated into OpenJDK.

If it's going into OpenJDK anyway, why do we need to do anything but wait?

Also, maybe the benchmarks are screwy, but it's not hard at all to beat BigInteger at small sizes also.
 
 
It's not clear to me that BigInteger even has the right model; it would be a shame to pour lots of effort into something that is going to end up being e.g. 2x slower than it needs to be due to details in how data is laid out etc.

Pretty much the only thing I'm interested in (and which has the chance to be relicensed) are the algorithms.

Well, the algorithms aren't novel; they're the result of numerical computing research and are used in a variety of places (e.g. GMP).  It's nice to have a reference implementation for the more complicated algorithms.
 
I have already looked into how the representation can be optimized and apart from leaving out deprecated, superfluous fields, the decision between a BigEndian and a LitteEndian representation of the magnitude and whether to represent the sign as a field or put it in the array in front of the magnitude are the only important decisions to make.

Er, really?  You don't think that creating log(n) extra temporary copies in recursive routines might be not the fastest/most memory-efficient way to do things (for example)?  Or not re-using existing data which will fit because your code is all in the immutable path?  And the 32 bit vs. 64 bit word size is likely to be important also--is it really true that going to a 64 bit word is not dramatically better on a 64 bit machine?  (And can we do both without extraordinary extra effort?)

It's very important to get the algorithms right first, especially when you have n^2 vs. n^1.58 vs. n^1.4 vs. n log n log log n with multiplication, for instance.  But the overall design of the library strikes me as wrong: you want to have a private mutable phase where you can reuse space for calculations.  I suppose one could move everything into MutableBigInteger, but then you have double object creations on everything which is a drag for not-so-big BigIntegers.

Anyway, I am far from convinced that the current framework is the right one.  It looks to me like it follows BigInteger's general design principles, which I think were right for ease-of-correctness but wrong for speed, and if you're going to spend a lot of effort on speed it makes sense to spend a bit more effort obtaining correctness in a slightly-less-easy-but-more-performant framework while protecting the library user from having to worry about it at all.

Then again, I don't really know what Mark Hammons is doing differently from Tim Buktu.
 

There is no questions that it will be faster than scala's BigInt. It even beats the shit out of the current Java one. We just need to benchmark it to make sure that no regression is introduced.

Right, but if it's going into BigInteger anyway, why take on extra work?  When it's in the JVM, scala.math.BigInt will automatically benefit.

  --Rex
 

Simon Ochsenreither

unread,
Apr 26, 2013, 1:32:20 PM4/26/13
to scala-l...@googlegroups.com
Hi Rex,

If it's going into OpenJDK anyway, why do we need to do anything but wait?

because not everyone uses/can use/wants to use OpenJDK. Additionally, BigInteger has a lot of legacy garbage, lacks operations and cannot be made into a value class. I want a good, Scala-native implementation when I'm deploying on Android, when I compile to native code and when I use the JavaScript backend.
Additionally, I want less dependencies on non-Scala class library in general, to make it easier for people to port Scala to new platforms, especially if those dependencies are in Predef and pulled in regardless

I have grown to accept that Typesafe's interest in that is close to zero because they are betting their whole company on Oracle not misbehaving, but I think Scala as an ecosystem will benefit from more portability and predictable performance. There is no reason why Scala has to be the only modern language which only runs on a specific implementation of one specific platform.

Also, maybe the benchmarks are screwy, but it's not hard at all to beat BigInteger at small sizes also.

As far as I remember, the original BigInteger was actually a bit slower than Buktu's one. This would probably be an interesting area for further improvements:
EDIT: Looking at the numbers, it seems they have fixed that issues and are now as fast as the original BigInteger: https://github.com/tbuktu/bigint/ (Ah, right: https://github.com/tbuktu/bigint/commit/37ce46b35addf973dd3f8430df1e471a8ab7c397)
 
 Well, the algorithms aren't novel; they're the result of numerical computing research and are used in a variety of places (e.g. GMP).  It's nice to have a reference implementation for the more complicated algorithms.

Exactly.
 
Er, really?  You don't think that creating log(n) extra temporary copies in recursive routines might be not the fastest/most memory-efficient way to do things (for example)?  Or not re-using existing data which will fit because your code is all in the immutable path?  And the 32 bit vs. 64 bit word size is likely to be important also--is it really true that going to a 64 bit word is not dramatically better on a 64 bit machine?  (And can we do both without extraordinary extra effort?)

It's very important to get the algorithms right first, especially when you have n^2 vs. n^1.58 vs. n^1.4 vs. n log n log log n with multiplication, for instance.  But the overall design of the library strikes me as wrong: you want to have a private mutable phase where you can reuse space for calculations.  I suppose one could move everything into MutableBigInteger, but then you have double object creations on everything which is a drag for not-so-big BigIntegers.

As far as I remember, it already makes use of MutableBigInteger and the algorithms use arrays anyway, but I agree. Regarding Int/Long: It would be interesting to see how the additional overhead of handling overflows would relate to the improved performance of using Long instead of Int.

I think it's a good idea to look into that.

Anyway, I am far from convinced that the current framework is the right one.  It looks to me like it follows BigInteger's general design principles, which I think were right for ease-of-correctness but wrong for speed, and if you're going to spend a lot of effort on speed it makes sense to spend a bit more effort obtaining correctness in a slightly-less-easy-but-more-performant framework while protecting the library user from having to worry about it at all.

Agree, but those algorithms don't really prevent from doing either. They would just be a nice starting point and that's why I want them.

Thanks and bye,

Simon
 

Simon Ochsenreither

unread,
May 5, 2013, 3:32:01 PM5/5/13
to scala-l...@googlegroups.com
Hi everyone,

should we start writing a mail so we can get this going?

Important points from my POV:
  • Figure out which license we need/want.
  • Ask for the exact algorithms we are interested in AND which is owned by the people we want to ask.

Anything else?

Thanks and bye,

Simon

Jason Zaugg

unread,
May 5, 2013, 3:37:26 PM5/5/13
to scala-l...@googlegroups.com
I'd like to see this sort of code incubated and matured in an independent open-source library before being included in the standard library (if at all.) Spire might provide a good home for this.

-jason

Erik Osheim

unread,
May 5, 2013, 3:50:18 PM5/5/13
to scala-l...@googlegroups.com
On Sun, May 05, 2013 at 09:37:26PM +0200, Jason Zaugg wrote:
> I'd like to see this sort of code incubated and matured in an independent
> open-source library before being included in the standard library (if at
> all.) Spire might provide a good home for this.
>
> -jason

I'd be happy to put this in Spire.

One thing that's going to be weird though is the continued existence
of scala.math.BigDecimal and scala.math.BigInt (not to mention the
rest of the scala.math package). What are the long-term plans for
these classes? Are there any?

Their current state is not great, but I don't see any obvious path for
them to get better (or a person with the time, interest, and authority
to fix them). Some design decisions questionable (like the treatment
of MathContext in BigDecimal) and many of the problems will be
difficult to correct given the requirement of binary compatibility.

I'd prefer to see these things deprecated and split out from the
standard library (similar to what was done for Scala actors). People
who like the current design and rely on it could continue to use
it. But it would create more room for third-party libraries (like
Spire, Breeze, Saddle, etc) to focus on the problem (and who have more
flexibility to fix bugs and improve the design).

-- Erik

Tom Switzer

unread,
May 5, 2013, 3:53:11 PM5/5/13
to scala-l...@googlegroups.com

Yes, we'd definitely welcome a faster Bigint type in Spire. For thone that don't know, amongst other things, Spire provides several new number types for Scala, including complex and rational numbers, floating point filters, a work with anything type, a number type for exact geometric computation, etc.

Simon Ochsenreither

unread,
May 5, 2013, 5:10:49 PM5/5/13
to scala-l...@googlegroups.com

I'd like to see this sort of code incubated and matured in an independent open-source library before being included in the standard library (if at all.) Spire might provide a good home for this.

Yes, that's pretty much what already happens (I just didn't push any new BigInt updates to my GitHub fork of the spire repo recently).

Simon Ochsenreither

unread,
May 7, 2013, 2:33:12 PM5/7/13
to scala-l...@googlegroups.com, er...@plastic-idolatry.com
Update:

Erik took the initiative, wrote a mail to Tim and Alan and was successful!
They made their algorithms available as patches under a BSD license: https://github.com/tbuktu/bigint/commit/df5f3ed717420d21fc8e68cf5344bbef470dbedd

I took those parts and plugged them into my BigInt prototype here: https://github.com/soc/spire/tree/bigint

What's missing?
  • Many of the operation's base cases (e. g. long multiplication: BigInt#multiplyByInt, BigInt#multiplyByLen)
  • Various bit twiddling stuff (BigInt#apply, BigInt#bitLength, ...)
  • Code for creating Strings from BigInts and BigInts from Strings (BigInt.apply, BigInt#toString)
  • Everything related to prime numbers
  • A lot of documentation
  • Tests!

I'm happy about every contribution here, because I really suck at some of these tasks. :-)

Thanks and bye,

Simon

Fernando Racca

unread,
May 7, 2013, 3:02:53 PM5/7/13
to scala-l...@googlegroups.com, er...@plastic-idolatry.com
Big win for Spire ! Another reason why it should soon become the standard Scala math library :)

Fernando

Paul Phillips

unread,
May 10, 2013, 4:58:36 PM5/10/13
to scala-l...@googlegroups.com
On Sun, May 5, 2013 at 12:50 PM, Erik Osheim <er...@plastic-idolatry.com> wrote:
One thing that's going to be weird though is the continued existence
of scala.math.BigDecimal and scala.math.BigInt (not to mention the
rest of the scala.math package). What are the long-term plans for
these classes? Are there any?

To the extent that I have any say, I would desperately like to get them out of the core library. I think we should aim at shipping an interface and let people compete to be the implementation. We will never be able to properly manage the super-fiddly, detail-oriented, easy to mis-design areas like math classes. There clearly are people who can and would do it better, and it's highly unfortunate that it's so hard to get to a place where they can do that.

But if I haven't been able to get rid of views given 4+ years of trying and their brokenness to the point of uselessness, not to mention the huge bytecode and complexity taxes they inflict, prospects are a bit dim for ejecting less broken code (that being everything else.)

Rex Kerr

unread,
May 10, 2013, 5:30:00 PM5/10/13
to scala-l...@googlegroups.com
On Fri, May 10, 2013 at 4:58 PM, Paul Phillips <pa...@improving.org> wrote:

On Sun, May 5, 2013 at 12:50 PM, Erik Osheim <er...@plastic-idolatry.com> wrote:
One thing that's going to be weird though is the continued existence
of scala.math.BigDecimal and scala.math.BigInt (not to mention the
rest of the scala.math package). What are the long-term plans for
these classes? Are there any?

To the extent that I have any say, I would desperately like to get them out of the core library.

This is one area where I rather strongly disagree.  Mathematics should be a core competency of any serious programming language.  Unlike many things, math has one right answer, and if you define a clean interface you can slot in the superior implementation once it clearly is. 
 
I think we should aim at shipping an interface and let people compete to be the implementation. We will never be able to properly manage the super-fiddly, detail-oriented, easy to mis-design areas like math classes.

But there's little benefit from maintaining a half dozen different implementations of an error function or arbitrary precision integers.  There is approximately one right way to do these things, so let's do them the right way.
 
There clearly are people who can and would do it better, and it's highly unfortunate that it's so hard to get to a place where they can do that.

Math doesn't regress!  That's one great thing about it.  Get it right once--with external contributions if necessary--and write some good tests.  Then you're done pretty much forever, except for implementing algorithmic improvements now and then--and those don't come along very often.  Witness the longevity of the ancient LAPACK Fortran code.  It still gives more modern code a run for its money.  Or look at GMP, which has made Java BigInteger look like a hibernating tortoise for as long as both have existed (for the same reason the entire time--GMP is using good algorithms, while BigInteger only implements halfway decent algorithms for very-slightly-larger-than-Long integers).

The only problem would be if someone writes a clearly superior library and then refuses to have it included with Scala.  I would rather solve that problem by asking nicely than by not having just-beyond-basic math in core Scala.

It would make more sense to remove collections, save maybe for List and Array.  There are plenty of different ways to approach everything else.
 
  --Rex

Paul Phillips

unread,
May 10, 2013, 5:35:09 PM5/10/13
to scala-l...@googlegroups.com

On Fri, May 10, 2013 at 2:30 PM, Rex Kerr <ich...@gmail.com> wrote:
Get it right once

My observations in this arena strongly suggest this is not how it happens. "Math" may have one right answer, but code never does.

Simon Ochsenreither

unread,
May 10, 2013, 7:02:31 PM5/10/13
to scala-l...@googlegroups.com
To the extent that I have any say, I would desperately like to get them out of the core library. I think we should aim at shipping an interface and let people compete to be the implementation. We will never be able to properly manage the super-fiddly, detail-oriented, easy to mis-design areas like math classes. There clearly are people who can and would do it better, and it's highly unfortunate that it's so hard to get to a place where they can do that.

I'm torn on this. I tried to introduce an interface/implementation distinction earlier but failed horribly.

My idea is to have a great native Scala implementation of BigInt available and thoroughly tested in Spire.
If this goes well, I hope that the authors of the various math libraries can agree on a minimal set of types and operations which should ship in the standard library (like it happened with Futures and Promises).
After that, I hope that we have gathered enough expertise, proven implementations and interest to sit down and create a SIP which overhauls the current math stuff in a way which preserves compatibility as much as possible but creates a dependency-free, maintainable math API on which more specialized math libraries can build upon.

Maybe you're seeing my grand plan behind the combination of this, my work on Avian (and native support for value types), trying to remove the java.lang default import (and moving some of the java.lang.{Integer, Long, ...} to extension methods on Int, Long, ... to provide a consistent API from basic types to BigInt) and reducing platform dependencies in general. :-)


My observations in this arena strongly suggest this is not how it happens. "Math" may have one right answer, but code never does.

I'm currently looking into diffing the bytecode of the important parts against the Java implementation ... no idea if this is feasible or whether the differences of converting basic imperative code to bytecode are vastly different between scalac and javac.

This should at least prevent introducing errors in the translation from Java to Scala. The Java version has been thoroughly tested, widely used and verified by the leading researchers in the field of long arithmetic so I trust them that the implementation is decent.

Paul Phillips

unread,
May 10, 2013, 8:35:15 PM5/10/13
to scala-l...@googlegroups.com
I don't have anything super specific I'm dead set on here; but what I can say with some confidence is what will not work, and that's whatever it is we have been doing. If we do not increase the surface area of the frontier, we will underperform forever. It is not primarily a technical problem; it is organizational. The more we centralize, the slower the resulting speed limit, and the more draconian the (metaphorical) enforcement.
Reply all
Reply to author
Forward
0 new messages