Power Series Example and Benchmarking Lazy Numeric Computation


Sophia Gold

Oct 10, 2016, 2:53:08 PM
to Numerical Clojure
Hi,

I was recently made aware of some of the things you're up to in the Numerics group when I proposed a patch on the Dev list related to a small symbolic algebra program I'd been working on. Mike happened to suggest I check out Expresso and also try core.matrix for speeding up scalar math. I was very impressed and am hoping to get involved in some form or another.

Although I have some background in hardware APIs from doing graphics work, my current interests lie in approaches outside the domain of numerical analysis, such as symbolic computation and lazy evaluation. With that in mind, I'll start by sharing an old favorite everyone has probably seen before: a tiny library for lazy computation of power series: https://github.com/Sophia-Gold/power-series.clj

What inspired me to whip this up in Clojure was being introduced to transducers while playing with core.async. At first I was impressed just by the flexibility of applying one algorithm to any input and output (two-thirds of the functions in this library are variations on map, while the rest build on top of those). But when I actually wrote my own transducer for the first time and ran some naive speed tests on it, I couldn't believe what I was seeing: it was doing all the algorithmic heavy lifting, yet pretty much anything I applied it to would slow it down by a polynomial factor.

I bring this up not just to wax theoretical or bask in awe, but for two reasons:
  1. There seems to be a perception that Clojure is somehow good for processing large amounts of data, but too slow for numerical computing. I've heard this even from people inside the community. On the one hand, as I've learned from Mike, this can be addressed by instituting best practices for time-sensitive math. But I also think a large part of the reason dynamic languages are being considered for numeric computing at all recently is the increasing relationship between performance and fungibility of parameters in the problem domain, which is exactly what such abstract combinators allow for.
  2. In Clojure this flexibility is implicitly lazy, which takes numerical computing in an entirely different direction than the work being done with core.matrix and parallelism in general. This seems underexploited when it comes to numerics, likely due to entrenched opinions on the viability of laziness for efficient problem solving. With that in mind, I decided to run some slightly more concrete benchmarks on examples from the power series code using Criterium and include them in the repo. I couldn't find much background on benchmarking lazy computation in general so would appreciate any feedback on them.
A cursory search of the list found this blog post benchmarking vectorz-clj vs. protocols, which provided a very helpful point of comparison: https://clojurefun.wordpress.com/2013/03/07/achieving-awesome-numerical-performance-in-clojure/

Using lazy evaluation, I was able to generate a cosine series (defined recursively) to the point of convergence at almost exactly the same speed as adding two core.matrix vectors using mutation (the former using core.math division on an exponentially greater number of elements). 
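For anyone who hasn't seen the trick, a lazy Maclaurin series for cosine can be sketched roughly like this (a simplified hypothetical, not the repo's actual definitions, and `cosine-series` here is just an illustrative name):

```clojure
;; Rough sketch (not the repo's actual code): the Maclaurin
;; coefficients of cos x are 1, 0, -1/2!, 0, 1/4!, 0, ...
(defn cosine-series []
  (map (fn [n]
         (if (odd? n)
           0
           ;; sign alternates on the even-degree terms
           (/ (if (zero? (mod n 4)) 1 -1)
              (reduce *' 1 (range 1 (inc n))))))  ; n!
       (range)))

(take 6 (cosine-series))
;; => (1 0 -1/2 0 1/24 0)
```

Nothing is computed until terms are demanded, which is what makes the convergence benchmarks meaningful.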

At the far other end of complexity, I was able to generate a tangent series to the point of convergence, dividing two other infinite series by calculating the compositional inverse of their Cauchy products (the convolution between terms), in roughly one-third of the time it took to add two regular Clojure vectors. This is the most expensive operation in the library, and it could surely be accelerated further both by sacrificing the integrity of the divisor and dividend and, more importantly, by giving type hints so as to avoid conversion of ratios to BigIntegers (the latter applying to nearly all series produced by the library).
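For reference, the Cauchy product at the heart of series multiplication is just the convolution c_n = sum over k of a_k * b_(n-k); a naive lazy version (a sketch for illustration, not the repo's implementation) might look like:

```clojure
;; Naive lazy Cauchy product: c_n = sum_{k=0..n} a_k * b_(n-k).
;; Re-walks the prefixes for each term, so it's O(n) per coefficient.
(defn mul-series [s1 s2]
  (map (fn [n]
         (reduce +' (map *'
                         (take (inc n) s1)
                         (reverse (take (inc n) s2)))))
       (range)))

;; (1 + x + x^2 + ...)^2 = 1 + 2x + 3x^2 + ...
(take 5 (mul-series (repeat 1) (repeat 1)))
;; => (1 2 3 4 5)
```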

Infinite series are somewhat paradoxical benchmarks: the number of terms grows linearly, but storage grows exponentially due to the expansion of each term. However, because these particular series are convergent, what would be expected to be exponential turns out to become constant rather quickly. For example, cosine and most similar recursive series complete in only roughly three times the overhead. In that sense it's quite a different metric for lazy evaluation than the Sieve of Eratosthenes, where computation takes longer the further it progresses. I would be interested to see comparisons between languages for both types of lazy numeric computation, particularly with the Haskell versions.

I'd like to see more benchmark comparisons for common numeric applications between Clojure and other languages in general, especially outside the JVM. Library and community support put us in a great position to make Clojure a serious dynamic-language option for HPC, among other things.

Lastly, and I know brevity isn't my strong suit, I always seem to leave even the tiniest project with a new weird idea, and I wanted to float it here. I had the idea for lazy versions of both the BigInteger and BigDecimal types that would allow essentially limitless precision at no storage cost, with of course the oddity that one would be streaming only a portion of the large number. Can anyone think of use cases for this? Applications involving lossless compression, such as arithmetic coding, come to mind, but other than having implemented versions as exercises that's far from my area of expertise. It seems almost trivial to implement, so I'll likely end up doing it regardless and come back to share :)
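As a toy illustration of the direction I'm imagining (purely hypothetical, nothing like a real implementation), streaming the decimal expansion of a ratio by lazy long division is only a few lines:

```clojure
;; Toy sketch: lazily stream the decimal digits of num/den
;; (with num < den) by long division. No digit is computed
;; until it's asked for, however deep the expansion goes.
(defn decimal-digits [num den]
  (map #(quot (* 10 %) den)
       (iterate #(rem (* 10 %) den) (rem num den))))

(take 6 (decimal-digits 1 7))
;; => (1 4 2 8 5 7)   i.e. 1/7 = 0.142857...
```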

Thanks,
Sophia

Christopher Small

Oct 11, 2016, 12:50:06 PM
to Numerical Clojure
Very cool work! I don't know that I'd have anything particular to use it on, but I'd be very interested in seeing what you come up with along your "lazy numerics" idea.

Dragan Djuric

Oct 11, 2016, 3:02:32 PM
to Numerical Clojure
Hi Sophia,

It's great that you like Clojure and decided to take the time to create something useful with it.

Now, instead of just giving you the obligatory thumbs up, I'll have to be that grumpy old skeptic who actually cares and cannot politely shut up in the name of general happiness and friendship :)

I've just glanced at the code and the benchmarks, but have not run them (so I might be wrong, but probably am not). I wanted to write this short note to point out that you don't seem to be measuring the execution of what you intended, because no execution is taking place at all. Namely, you are never realizing any of those sequences; they stay lazy the whole time. What's the problem with that, you may ask? In each of your benchmarks, you actually measure just the time to create a lazy Clojure object. Try to retrieve the 1000th or 1000000th element from each of those, and the running time would change from 40 nanoseconds to much, much, much more. Because, in any application, you have to retrieve the result that interests you sooner or later.

I know, it's a common mistake, so don't feel bad about it. It's better to realize it's happening early on and fix your assumptions before you commit too much time ;).
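To make the point concrete, something like the following forces the whole pipeline, so a benchmark around it measures real work rather than the cost of constructing a lazy object (a generic sketch, not your code):

```clojure
;; Building the pipeline is near-instant; nothing is computed here.
(def s (map inc (range 1000000)))

;; Forcing a deep element realizes everything before it,
;; so *this* is the expression a benchmark should wrap.
(first (drop 999999 s))
;; => 1000000
```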

Sophia

Oct 11, 2016, 4:54:18 PM
to numerica...@googlegroups.com
Ah! Well, thank you for pointing it out. That is why I asked for feedback on the benchmarks. 

I ran more accurate ones and, sure enough, they show a thousand-fold increase... Also, contrary to my comments about convergent series, you now see a polynomial rate of growth in time, the same as for space, over the length of the series (although keep in mind these are ratios peppered with BigInts when they reduce, so purposefully not the fastest choice):

power-series.core=> (bench (into [] (take 10 (cosine-series))))
Evaluation count : 1574520 in 60 samples of 26242 calls.
             Execution time mean : 38.838738 µs
    Execution time std-deviation : 445.092552 ns
   Execution time lower quantile : 38.240002 µs ( 2.5%)
   Execution time upper quantile : 39.891957 µs (97.5%)
                   Overhead used : 12.286567 ns

Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

power-series.core=> (bench (into [] (take 1000 (cosine-series))))
Evaluation count : 60 in 60 samples of 1 calls.
             Execution time mean : 1.869831 sec
    Execution time std-deviation : 19.002563 ms
   Execution time lower quantile : 1.843601 sec ( 2.5%)
   Execution time upper quantile : 1.918796 sec (97.5%)
                   Overhead used : 12.286567 ns

Found 4 outliers in 60 samples (6.6667 %)
low-severe 4 (6.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Now, exuberance aside... Storing a million values in a vector isn't a suitable application for lazy evaluation; it's what Clojure vectors were made for. What you suggested, retrieving a single value located very deep in the sequence, is a much better test:

power-series.core=> (bench (conj [] (take-nth 1000 (cosine-series))))
Evaluation count : 683765640 in 60 samples of 11396094 calls.
             Execution time mean : 75.009083 ns
    Execution time std-deviation : 0.959064 ns
   Execution time lower quantile : 73.493562 ns ( 2.5%)
   Execution time upper quantile : 76.835144 ns (97.5%)
                   Overhead used : 12.286567 ns

Found 2 outliers in 60 samples (3.3333 %)
low-severe 1 (1.6667 %)
low-mild 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

power-series.core=> (bench (conj [] (take-nth 1000000 (cosine-series))))
Evaluation count : 693054420 in 60 samples of 11550907 calls.
             Execution time mean : 74.754414 ns
    Execution time std-deviation : 1.375869 ns
   Execution time lower quantile : 73.148560 ns ( 2.5%)
   Execution time upper quantile : 78.131448 ns (97.5%)
                   Overhead used : 12.286567 ns

Found 4 outliers in 60 samples (6.6667 %)
low-severe 3 (5.0000 %)
low-mild 1 (1.6667 %)
 Variance from outliers : 7.7995 % Variance is slightly inflated by outliers

power-series.core=> (bench (conj [] (take-nth 1000000000 (cosine-series))))
Evaluation count : 685621980 in 60 samples of 11427033 calls.
             Execution time mean : 75.855787 ns
    Execution time std-deviation : 0.994341 ns
   Execution time lower quantile : 74.524899 ns ( 2.5%)
   Execution time upper quantile : 77.558569 ns (97.5%)
                   Overhead used : 12.286567 ns

Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

There you go. ~75ns constant-time retrieval for one element up to one billion deep (I didn't bother trying to go further). Those results seem pretty damn good, so please do let me know if my methodology could still be flawed.

And in addition to running all new benchmarks, I'm going to have to rethink how to discuss this disparity in the repo, so please give me about a day for that.

Sophia

--
You received this message because you are subscribed to a topic in the Google Groups "Numerical Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/numerical-clojure/P3XdLAkcFeM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to numerical-clojure+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dragan Djuric

Oct 11, 2016, 5:19:46 PM
to Numerical Clojure
I believe you've repeated the same mistake again: take-nth does not give you the nth element, but a *lazy* sequence of every nth element. What you really want here is something like

(first (drop 100000 (my-fn x)))

Sophia

Oct 11, 2016, 5:20:00 PM
to numerica...@googlegroups.com
Oops. Sorry, Dragan:

power-series.core=> (bench (conj [] (nth (cosine-series) 1000)))
Evaluation count : 60 in 60 samples of 1 calls.
             Execution time mean : 1.871578 sec
    Execution time std-deviation : 19.392301 ms
   Execution time lower quantile : 1.846093 sec ( 2.5%)
   Execution time upper quantile : 1.894695 sec (97.5%)
                   Overhead used : 12.286567 ns

Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers