Hi,
I’ve found myself writing a small library for efficiently composing synchronous and asynchronous functions (i.e. those returning scala.concurrent.Future[T]). The core abstraction is a trait Func[-A, +B] that is either a SyncFunc[-A, +B] or an AsyncFunc[-A, +B], with methods like compose, recover, etc. Synchronous functions are composed directly; asynchronous ones are composed via Future.map/flatMap.
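To make this concrete, here's a minimal sketch of the shape I mean (an illustration, not the actual draft; names and signatures are simplified):

import scala.concurrent.{ExecutionContext, Future}

sealed trait Func[-A, +B] {
  // Composes this function with the next step. The composition stays
  // synchronous only if both sides are synchronous.
  def compose[C](next: Func[B, C])(implicit ec: ExecutionContext): Func[A, C]
}

final case class SyncFunc[-A, +B](f: A => B) extends Func[A, B] {
  def compose[C](next: Func[B, C])(implicit ec: ExecutionContext): Func[A, C] =
    next match {
      case SyncFunc(g)  => SyncFunc(f andThen g)   // plain function call, no Future allocated
      case AsyncFunc(g) => AsyncFunc(a => g(f(a))) // sync prefix runs on the caller's thread
    }
}

final case class AsyncFunc[-A, +B](f: A => Future[B]) extends Func[A, B] {
  def compose[C](next: Func[B, C])(implicit ec: ExecutionContext): Func[A, C] =
    next match {
      case SyncFunc(g)  => AsyncFunc(a => f(a).map(g))
      case AsyncFunc(g) => AsyncFunc(a => f(a).flatMap(g))
    }
}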
Is there something similar in existence? If not, I’ll polish it and publish on github at some point. The current draft is here.
Side note: at first I wanted to make a DSL using macros that would return synchronously for synchronous functions and use async/await for asynchronous ones. Unfortunately there’s a bug in async/await that causes a stack overflow, which is beyond my ability to fix. I wish I could do something about it; it’s really annoying that I can’t safely use async/await.
Daniel Armak
Hi Julian,
My only motivation here was performance.
My hobby these days, apparently, is writing Future-based implementations of Reactive Streams. My first attempt failed in practice because, among other reasons, creating very many Futures (at least one object instance per equivalent function call) is quite expensive compared to ordinary function calls. Even when I used already-completed Futures without scheduling them, the performance wasn’t good enough. I wrote this Func abstraction as part of my second attempt (which may not be completed if akka-streams is mature enough for me to start playing with it instead).
However, back then I used instances created by Future.successful. This actually creates three objects: a Promise, a Future and a Success. And some of them carry extra fields in memory (e.g. KeptPromise.value is a val). So I’ll try benchmarking Johannes’s optimized FastFuture. If it’s not significantly slower than my Func for combining synchronous functions, I’ll probably go with it, since it’s easier to write logic in terms of map/flatMap.
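For reference, the core trick such a "fast" future exploits, as I understand it, is roughly this (a sketch of the idea, not the actual FastFuture code; fastMap is a name I’m making up here):

import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}
import scala.util.control.NonFatal

// If the future is already completed, apply f on the calling thread
// instead of scheduling a task on the ExecutionContext.
def fastMap[A, B](fut: Future[A])(f: A => B)(implicit ec: ExecutionContext): Future[B] =
  fut.value match {
    case Some(Success(a)) =>
      try Future.successful(f(a))
      catch { case NonFatal(e) => Future.failed(e) }
    case Some(Failure(e)) => Future.failed(e)
    case None => fut.map(f) // not yet completed: fall back to a normal scheduled map
  }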
Another, unrelated advantage of function composition is that all the composed functions appear on the stack, which is useful when reading the stack traces of exceptions. But that wasn’t my motivation.
--
I was going to run a lot of different tests, but the first two results are so impressive I might as well stop here (at least for tonight).
A simple synchronous while(true) loop does around 100k rounds (function calls) per ms. And akka’s FastFuture.map does about 87k rounds/ms.
This is really impressive - the cost of the extra allocations and indirections is barely felt! (Although I have unused CPU cores to spare, so the GC is probably using a bit of those in parallel.) And the lesson I draw - assuming these results hold up under more involved real-world testing - is that I probably don’t need the new Func abstraction after all. Ordinary Futures can be fast enough with a good implementation.
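For context, the rounds/ms numbers come from a loop along these lines (a simplified sketch of my quick-and-dirty harness, not the exact code; func stands for the function under test):

// Run the body `rounds` times and report rounds per millisecond.
def measure(rounds: Int)(body: => Unit): Double = {
  val start = System.nanoTime()
  var i = 0
  while (i < rounds) { body; i += 1 }
  rounds / ((System.nanoTime() - start) / 1e6)
}

// e.g. measure(10 * 1000 * 1000)(func(())) for the synchronous baseline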
For comparison, actually scheduling futures on the default EC does about 5300 rounds/ms (with scala 2.10.4). And that uses all 4 CPU cores, not just 1 like the other tests, because the Futures aren’t entirely serial:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Each future runs func once and then schedules the next round.
def next(): Unit = Future {
  func()
  next()
}
next()
(I vaguely feel there's some basic mistake in this code; it's late and I'm tired and might be missing something.)
Viktor, reading the code for KeptPromise.onComplete (even in the 2.12 tree), it seems that it always schedules a new future on the EC. Naturally this can’t compete with synchronous completion. Why not make it complete synchronously? It’s allowed by the Future.onComplete documentation. Do you think it would break a lot of user code that implicitly relies on the current method of scheduling?
My quick-and-dirty testing code is here. The function under test does the minimum necessary; it can be inlined by the JIT, but it has side effects, so it can’t be removed completely. You can test it against the 2.12 snapshot if you like. I’m afraid I don’t have time to do it myself right now, since it turns out allocation probably isn’t the cause of my performance woes with Future.successful. I’m not sure, but I may have missed something.
Conclusion: FastFuture seems almost as fast as regular functions, and I can use it and write much simpler Future-based code instead of introducing a new abstraction. I might also be able to use it to speed up my existing reactive streams implementation and maybe get it to live long enough to be replaced by akka-streams :-)
Thanks a lot to Johannes and to Julian!
Johannes,
You’re absolutely right. (I have a lot to learn about benchmarking and optimization.)
Given:
import java.util.concurrent.atomic.AtomicLong

val counter = new AtomicLong()
def makeFunc(): Unit => Unit = _ => counter.incrementAndGet()
def funcs = (0 to 100) map (_ => makeFunc())
This runs at ~ 100k rounds/millisecond:
Future {
val func = funcs(0)
while (true) {
func()
}
}
While this runs at only ~ 1300 rounds / millisecond:
Future {
while (true) {
val func = funcs(0)
func()
}
}
And this runs at just 700 rounds/ms:
Future {
var counter = 0
while (true) {
val func = funcs(counter)
func()
counter += 1
if (counter == funcs.size) counter = 0
}
}
I’ll do measurements with jmh next, but it may take me some time.
Thanks,
On Tue, Dec 9, 2014 at 9:39 AM, √iktor Ҡlang <viktor...@gmail.com> wrote:
No, that would not work: onComplete needs to execute its logic on the supplied EC. Hijacking the calling thread goes against the purpose (shielding producers from consumers).
I understand that code now relies on this behavior, so it probably shouldn't be changed. I was referring to the documentation for Future.onComplete which seems to allow synchronous completion:
If the future has already been completed, this will either be applied immediately or be scheduled asynchronously.
I think you’re raising very interesting questions.
For instance, with Futures there’s a common pattern that affects performance. We want to combine library methods that return Futures. Suppose we have:
def method1(): Future[String]
def method2(str: String): Future[Unit]
val combined: Future[Unit] = method1() flatMap method2
This will schedule three tasks on the ExecutionContext, not two. method2 has a synchronous beginning: even if it’s defined as def method2(str: String) = Future { println(str) }, the call to Future.apply itself runs outside the future it creates. This synchronous part runs in a callback scheduled (via Promise.tryComplete) when method1’s future completes, and it in turn schedules the third task.
For a long chain of mapped Futures, half of them will be small stubs like this. I can’t easily measure the impact on performance, and I don’t see how to get rid of it, but it might bear investigation.
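One way to see the three tasks (a sketch; the counting ExecutionContext and the method bodies are illustrative, and the exact count may differ between Scala versions):

import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

val scheduled = new AtomicInteger()

// An EC that counts each task it is asked to run, then runs it inline.
implicit val countingEc: ExecutionContext = new ExecutionContext {
  def execute(r: Runnable): Unit = { scheduled.incrementAndGet(); r.run() }
  def reportFailure(t: Throwable): Unit = t.printStackTrace()
}

def method1(): Future[String] = Future { "foobar" }
def method2(str: String): Future[Unit] = Future { println(str) }

Await.result(method1() flatMap method2, 1.second)
// Expect 3 tasks: method1's body, the flatMap callback (which runs
// method2's synchronous prefix), and method2's body.
println(scheduled.get())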
Here are my results using jmh. My jmh project is here.
Note: this is getting involved and I completely understand if some or all of you don’t want to keep investing the time in engaging with this issue. Thanks for all your help so far!
Benchmark                                   Mode  Cnt          Score          Error  Units
o.s.MyBenchmark.oneConstantFunc            thrpt   20  281306760.808 ±  9961644.729  ops/s
o.s.MyBenchmark.oneDirectCall              thrpt   20  315845747.813 ±  5705814.003  ops/s
o.s.MyBenchmark.oneFirstFunc               thrpt   20  135427280.115 ± 11090443.358  ops/s
o.s.MyBenchmark.oneManuallyInlined         thrpt   20  320860149.627 ±  3259196.119  ops/s
o.s.MyBenchmark.oneMapFastFuture           thrpt   20   84623485.218 ±  8221961.582  ops/s
o.s.MyBenchmark.oneMapFastFutureFirstFunc  thrpt   20   61425304.392 ±   450415.285  ops/s
o.s.MyBenchmark.oneMapFastFutureFunc       thrpt   20   83605488.102 ±   192865.554  ops/s
o.s.MyBenchmark.oneMapFastFutureSomeFunc   thrpt   20   51454221.847 ±   411261.329  ops/s
o.s.MyBenchmark.oneSomeFunc                thrpt   20  100908290.019 ±  1491151.514  ops/s
The baseline for comparison just puts the code under test (which increments a private long var by 1) inside the jmh test method. This is the test named oneManuallyInlined, and it runs at 320 million iterations per second on my i7-4600U (3.3GHz). That is slower than I expected (why does a loop incrementing a variable take ~10 CPU cycles per iteration?), so I’m suspicious, but let’s go with it.
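For reference, the baseline looks roughly like this (reconstructed from memory rather than copied from the project):

import org.openjdk.jmh.annotations._

@State(Scope.Thread)
class MyBenchmark {
  private var counter: Long = 0

  // Baseline: the work is inlined directly in the benchmark method.
  // Returning the value keeps the JIT from eliminating it entirely.
  @Benchmark
  def oneManuallyInlined(): Long = {
    counter += 1
    counter
  }
}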
A summary of the other tests:
oneDirectCall - call a method that does the actual work - 98.5% performance.
oneConstantFunc - call a function that calls the method that increments the field - 87% performance.
oneFirstFunc - access a constant location in an array of functions, and call that function which calls a method - 42% performance.
oneSomeFunc - on each call, access a different location in an array of 100 identical-but-separate functions which call the method - 31% performance.
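And the two array-based tests look roughly like this (again a reconstruction, meant to sit in the same class as the baseline above):

  private val funcs: Array[() => Unit] = Array.fill(100)(() => { counter += 1 })
  private var index = 0

  // Constant index: the call site always sees the same function instance.
  @Benchmark
  def oneFirstFunc(): Unit = funcs(0)()

  // Rotating index: a different (but identically-classed) closure each call.
  @Benchmark
  def oneSomeFunc(): Unit = {
    funcs(index)()
    index += 1
    if (index == funcs.length) index = 0
  }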
This is quite contrary to expectations. There’s a 50% drop in performance when I call the function indirectly via the array, even though I always call the same function; in my manual timing loops (without jmh) the drop here was much smaller. Conversely, in the oneSomeFunc test, which calls any one of 100 different functions inside the loop, my manual timing showed a 10x drop in performance, which doesn’t appear here.
I have no experience with jmh, but clearly it’s doing something differently from the ordinary behavior of my code. Maybe looking at its generated code with javap would help, but I don’t have the time to continue investigating this today…
The other tests in the list keep a completed Future variable between tests. Each call to the test method calls FastFuture.map on that future to produce a new one. The mapping closure increments the variable, or calls a function that increments it, etc. There’s nothing very surprising here: all of these variants are slower than the ones without Futures, because they’re doing more work; and the ordering of their performance is as expected.
I may continue to investigate this when I have time, but that probably won’t be before Saturday.
Thanks again,