Library for composing synchronous and asynchronous functions


Daniel Armak

Dec 7, 2014, 3:56:08 PM
to scala-user

Hi,

I’ve found myself writing a small library for efficiently composing synchronous and asynchronous (i.e. returning scala.concurrent.Future[T]) functions. The core abstraction is a trait Func[-A, +B] that may be either SyncFunc[-A, +B] or AsyncFunc[-A, +B], with methods like compose, recover, etc. Synchronous functions are composed directly, asynchronous ones are Future.mapped.
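
The description above might look roughly like the following. This is a hypothetical sketch, not the actual draft's API; the names, the `compose` direction (here andThen-style), and taking the `ExecutionContext` at composition time are all guesses:

```scala
import scala.concurrent.{ExecutionContext, Future}

sealed trait Func[-A, +B] {
  // Runs `next` after this function (andThen-style composition).
  def compose[C](next: Func[B, C])(implicit ec: ExecutionContext): Func[A, C]
  // Uniform invocation: sync functions are wrapped only at the boundary.
  def apply(a: A)(implicit ec: ExecutionContext): Future[B]
}

final case class SyncFunc[-A, +B](f: A => B) extends Func[A, B] {
  def compose[C](next: Func[B, C])(implicit ec: ExecutionContext): Func[A, C] =
    next match {
      case SyncFunc(g)  => SyncFunc(f andThen g)   // sync-sync: no Future allocated
      case AsyncFunc(g) => AsyncFunc(a => g(f(a))) // sync prefix runs directly
    }
  def apply(a: A)(implicit ec: ExecutionContext): Future[B] = Future.successful(f(a))
}

final case class AsyncFunc[-A, +B](f: A => Future[B]) extends Func[A, B] {
  def compose[C](next: Func[B, C])(implicit ec: ExecutionContext): Func[A, C] =
    next match {
      case SyncFunc(g)  => AsyncFunc(a => f(a).map(g))     // Future.map
      case AsyncFunc(g) => AsyncFunc(a => f(a).flatMap(g)) // Future.flatMap
    }
  def apply(a: A)(implicit ec: ExecutionContext): Future[B] = f(a)
}
```

The point of the sketch is the first case: a chain of synchronous functions collapses to a single `SyncFunc` with plain function composition, so no `Future` is created until an asynchronous step actually appears in the chain.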

Is there something similar in existence? If not, I’ll polish it and publish on github at some point. The current draft is here.

Side note: at first I wanted to make a DSL using macros that return synchronously for synchronous functions, and use async/await for asynchronous ones. Unfortunately there’s a bug in async/await that causes a stack overflow, and which is beyond my ability to fix. I wish I could do something about it; it’s really annoying that I can’t safely use async/await.

Daniel Armak

Julian Michael

Dec 7, 2014, 5:25:35 PM
to scala...@googlegroups.com
Hello Daniel,

Is there something similar in existence? If not, I’ll polish it and publish on github at some point. The current draft is here.

My first thought is that there is—map and flatMap on Future are exactly those functions that let you compose synchronous and asynchronous functions together. If you don't want those calls to map and flatMap everywhere then using for-comprehensions lets you avoid that while still maintaining a distinction between the two (which I would think is a plus for program readability).
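
Concretely, a for-comprehension can mix the two styles while keeping them visually distinct. A small made-up example (`fetchUser` and `score` are illustrative, not from Daniel's draft):

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def fetchUser(id: Int): Future[String] = Future.successful(s"user-$id") // asynchronous
def score(name: String): Int = name.length                              // synchronous

val result: Future[Int] =
  for {
    name <- fetchUser(42) // asynchronous step: bound with <-
    s     = score(name)   // synchronous step: plain = binding, no extra Future
  } yield s + 1
```

The `<-` vs `=` distinction marks which steps suspend on a Future and which run inline, which is the readability point made above.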

That said, if you want to think of asynchronous functions just like normal functions in the ways you can compose them together, you might be interested in looking at how for a monad M, a function A -> M[B] can be seen as an arrow in the Kleisli category corresponding to M. (This is relevant because we can think of Future as a monad.) Abstractions like this can be found in the scalaz library.
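
The Kleisli idea, specialised to Future, fits in a few lines. This is just a sketch of the concept; scalaz's `Kleisli[M, A, B]` generalises it to any monad `M`:

```scala
import scala.concurrent.{ExecutionContext, Future}

// An arrow A => Future[B] in the Kleisli category of the Future monad.
final case class FutureKleisli[A, B](run: A => Future[B]) {
  // Kleisli composition: (A => Future[B]) andThen (B => Future[C])
  // gives A => Future[C], via flatMap.
  def andThen[C](next: FutureKleisli[B, C])(implicit ec: ExecutionContext): FutureKleisli[A, C] =
    FutureKleisli(a => run(a).flatMap(next.run))
}
```

With this, asynchronous functions compose with `andThen` just like plain functions do, which is exactly the "think of them as normal functions" view.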

I don't think anything exactly like what you're doing already exists, but I think that's because we already have other abstractions to do similar things. Do you have a particular use case in mind where using for-comprehensions wouldn't work just as well?

Regards,
Julian

Johannes Rudolph

Dec 8, 2014, 4:09:29 AM
to Daniel Armak, scala-user
Hi Daniel,

we also had the need for something similar in spray / akka-http:
futures that only schedule something if really necessary. This is what
we came up with:

https://github.com/akka/akka/blob/release-2.3-dev/akka-http-core/src/main/scala/akka/http/util/FastFuture.scala
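
The core trick, in miniature (a sketch of the idea, not akka's actual code): when the future is already successfully completed, transform its value on the calling thread instead of handing a task to the ExecutionContext.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Success, Try}

def fastMap[A, B](fut: Future[A])(f: A => B)(implicit ec: ExecutionContext): Future[B] =
  fut.value match {
    case Some(Success(a)) => Future.fromTry(Try(f(a))) // already done: no task scheduled
    case _                => fut.map(f)                // pending or failed: normal path
  }
```

Akka's real `FastFuture` goes further (its own zero-allocation completed-future representations, `fast.map` syntax via a value class), but the schedule-only-if-necessary decision above is the heart of it.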

Johannes
> --
> You received this message because you are subscribed to the Google Groups
> "scala-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to scala-user+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
Johannes

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net

Daniel Armak

Dec 8, 2014, 9:46:57 AM
to Johannes Rudolph, scala-user
Hi Johannes,

Your approach is to construct already-completed Future instances whenever possible. That's a win when your Futures are often already completed at creation. I tried to achieve a similar goal with async/await previously, but now I'm staying away because of the bug I linked above (and also because async/await generates inefficient code).

My Func abstraction has a slightly different, complementary goal. It keeps synchronous functions from having to use futures at all, instead composing them directly. So I avoid creating lots of small wrapper objects (Future/Try/etc); I'm not sure how big a difference that makes to performance, given enough optimization of each alternative. I should really profile it.

Thanks!

Daniel Armak

Daniel Armak

Dec 8, 2014, 4:55:13 PM
to Julian Michael, scala-user

Hi Julian,

My only motivation here was performance.

My hobby these days, apparently, is writing Future-based implementations of Reactive Streams. My first attempt failed in practice because, among other reasons, creating very many Futures (at least 1 object instance per equivalent function call) is quite expensive compared to ordinary function calls. Even when I used already-completed Futures without scheduling them, the performance wasn’t good enough. I wrote this Func abstraction as part of my 2nd attempt (which may not be completed if akka-streams is mature enough for me to start playing with it instead).

However, back then I used instances created by Future.successful. This actually creates three objects: a Promise, a Future and a Success. And some of them have extra fields in memory (e.g. KeptPromise.value is a val). So I’ll try benchmarking Johannes’s optimized FastFuture. If it’s not significantly slower than my Func for combining synchronous functions, I’ll probably go with it, since it’s easier to write logic in terms of map/flatMap.

Another, unrelated advantage to function composition is that all the functions appear on the stack, which is useful when looking at stack traces in exceptions. But that wasn’t my motivation.


Daniel Armak


√iktor Ҡlang

Dec 8, 2014, 5:15:26 PM
to Daniel Armak, Julian Michael, scala-user
Hi Daniel,

could you test against a nightly of 2.12?

I've optimized things for KeptPromise: https://github.com/scala/scala/blob/2.12.x/src/library/scala/concurrent/impl/Promise.scala

Also, KeptPromise does not create a separate Future instance. So the only extra bit now would be the Try.
--
Cheers,

Daniel Armak

Dec 8, 2014, 6:40:49 PM
to √iktor Ҡlang, Julian Michael, scala-user

I was going to run a lot of different tests, but the first two results are so impressive I might as well stop here (at least for tonight).

A simple synchronous while(true) loop does around 100k rounds (function calls) per ms. And akka’s FastFuture.map does about 87k rounds/ms.

This is really impressive - the cost of the extra allocations and indirections is barely felt! (Although there are unused CPU cores, so the GC is probably using some of them in parallel.) And the lesson I draw - assuming these results hold up under more involved real-world testing - is that I probably don’t need the new Func abstraction after all. Ordinary Futures can be fast enough with a good implementation.

For comparison, actually scheduling futures on the default EC does about 5300 rounds / ms (with scala 2.10.4). And that uses all 4 CPU cores, and not just 1 like the other tests, because the Futures aren’t entirely serial:

  def next(): Unit = Future {
    func()
    next()
  }
  next()

(I vaguely feel there's some basic mistake in this code, it's late and I'm tired and might be missing something.)

Viktor, reading the code for KeptPromise.onComplete (even in the 2.12 tree), it seems that it always schedules a new future on the EC. Naturally this can’t compete with synchronous completion. Why not make it complete synchronously? It’s allowed by the Future.onComplete documentation. Do you think it would break a lot of user code that implicitly relies on the current method of scheduling?
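
The behavior in question is easy to observe with a trivial calling-thread ExecutionContext (a sketch; `callingThreadEc` is made up for illustration): even a pre-completed future routes its callback through the EC rather than running it inline.

```scala
import scala.concurrent.{ExecutionContext, Future}

var executions = 0
// Runs tasks on the calling thread, but counts every task handed to it.
val callingThreadEc: ExecutionContext = new ExecutionContext {
  def execute(r: Runnable): Unit = { executions += 1; r.run() }
  def reportFailure(t: Throwable): Unit = t.printStackTrace()
}

val done = Future.successful(1)
done.onComplete(_ => ())(callingThreadEc)
// executions is now 1: the callback went through execute() even though
// the future was already complete when onComplete was called.
```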

My quick-and-dirty testing code is here. The function under test does the minimum necessary; it can be inlined by the JIT, but it has side effects so it can’t be removed completely. You can test it against the 2.12 snapshot if you like. I’m afraid I don’t have time to do it myself right now, since it turns out allocation probably isn’t the cause of my performance woes with Future.successful. I’m not sure, but I may have missed

Conclusion: FastFuture seems almost as fast as regular functions, and I can use it and write much simpler Future-based code instead of introducing a new abstraction. I might also be able to use it to speed up my existing reactive streams implementation and maybe get it to live long enough to be replaced by akka-streams :-)

Thanks a lot to Johannes and to Julian!


Daniel Armak

Julian Michael

Dec 8, 2014, 11:18:56 PM
to Daniel Armak, √iktor Ҡlang, scala-user
Very interesting results! Thank you for sharing that with us and enlightening me about your code :)

Johannes Rudolph

Dec 9, 2014, 4:26:39 AM
to Daniel Armak, √iktor Ҡlang, Julian Michael, scala-user
Hi Daniel,

On Tue, Dec 9, 2014 at 12:40 AM, Daniel Armak <dana...@gmail.com> wrote:
> A simple synchronous while(true) loop does around 100k rounds (function
> calls) per ms. And akka’s FastFuture.map does about 87k rounds/ms.

From my experience I wouldn't trust those results if you don't use any
microbenchmarking tools to run them. Try jmh.

There are a lot of effects that will change the performance
characteristics of the `map` method. I would guess that it basically
suffers from the same problems a collection library based on higher
order functions like map/flatMap etc. suffers [1]: if the JIT isn't
able to inline both the HOF and the closure into the call site
performance will decrease a lot because the closure invocation in
`map` (et al) will in practice be megamorphic (in your benchmark try
calling `map` with several (> 2) different function implementations)
and will compile to a slow virtual call.
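
To make the call-site shape concrete (the `fs` sequence below is made up, not from Daniel's benchmark): each lambda compiles to its own class, so once three or more distinct closure classes flow through the same `map` call site, the JIT sees a megamorphic site and falls back to a virtual call instead of inlining.

```scala
// Three lambdas: three distinct Function1 implementation classes.
val fs: Seq[Int => Int] = Seq(_ + 1, _ * 2, _ - 3)

def mapAll(xs: List[Int]): List[Int] =
  fs.foldLeft(xs)((acc, f) => acc.map(f)) // this one `map` call site sees all three
```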

> My quick-and-dirty testing code is here. The function under test does the
> minimum necessary; it can be inlined by the JIT, but it has side effects so
> it can’t be removed completely. You can test it against the 2.12 snapshot if
> you like. I’m afraid I don’t have time to do it myself right now, since it
> turns out allocation probably isn’t the cause of my performance woes with
> Future.successful. I’m not sure, but I may have missed

Have you seen that we have a `FulfilledFuture` alternative to
`Future.successful` which tries to get rid of allocations as much as
possible? That's an important part of the performance story.

> Thanks a lot to Johannes and to Julian!

Credit for the current implementation of FastFuture should go to Mathias [2].

--
Johannes

[1] http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem
[2] https://github.com/akka/akka/commit/8966a56fbdc04b48ed96f4fc55110f480c6d1c99#diff-4

√iktor Ҡlang

Dec 9, 2014, 4:39:19 AM
to Daniel Armak, Julian Michael, scala-user

>Viktor, reading the code for KeptPromise.onComplete (even in the 2.12 tree), it seems that it always schedules a new future on the EC. Naturally this can’t compete with synchronous completion. Why not make it complete synchronously? It’s allowed by the Future.onComplete documentation. Do you think it would break a lot of user code that implicitly relies on the current method of scheduling?

No, that would not work, onComplete needs to execute its logic on the supplied EC. Hijacking the calling thread goes against the purpose. (shielding producers from consumers)


Daniel Armak

Dec 9, 2014, 5:34:13 AM
to Johannes Rudolph, √iktor Ҡlang, Julian Michael, scala-user

Johannes,

You’re absolutely right. (I have a lot to learn about benchmarking and optimization.)

Given:

  val counter = new AtomicLong()
  def makeFunc(): Unit => Unit = _ => counter.incrementAndGet()
  def funcs = (0 to 100) map (_ => makeFunc())

This runs at ~ 100k rounds/millisecond:

  Future {
    val func = funcs(0)
    while (true) {
      func()
    }
  }

While this runs at only ~ 1300 rounds / millisecond:

  Future {
    while (true) {
      val func = funcs(0)
      func()
    }
  }

And this runs at just 700 rounds/ms:

  Future {
    var counter = 0
    while (true) {
      val func = funcs(counter)
      func()
      counter += 1
      if (counter == funcs.size) counter = 0
    }
  }

I’ll do measurements with jmh next, but it may take me some time.

Thanks,


Daniel Armak

Daniel Armak

Dec 9, 2014, 5:37:03 AM
to √iktor Ҡlang, Julian Michael, scala-user

On Tue, Dec 9, 2014 at 9:39 AM, √iktor Ҡlang <viktor...@gmail.com> wrote:

No, that would not work, onComplete needs to execute its logic on the supplied EC. Hijacking the calling thread goes against the purpose. (shielding producers from consumers)

I understand that code now relies on this behavior, so it probably shouldn't be changed. I was referring to the documentation for Future.onComplete which seems to allow synchronous completion:

If the future has already been completed, this will either be applied immediately or be scheduled asynchronously.

√iktor Ҡlang

Dec 9, 2014, 5:39:30 AM
to Daniel Armak, Julian Michael, scala-user
Sounds like a documentation bug, I'll fix. Thanks for letting me know!

--
Cheers,

Oliver Ruebenacker

Dec 9, 2014, 7:43:30 AM
to Daniel Armak, Johannes Rudolph, √iktor Ҡlang, Julian Michael, scala-user

     Hello,

On Tue, Dec 9, 2014 at 5:33 AM, Daniel Armak <dana...@gmail.com> wrote:

You’re absolutely right. (I have a lot to learn about benchmarking and optimization.)

  I don't know much about benchmarking, but how about this approach:

  Create something that runs a few minutes. Try different sizes. Plot runtime versus size and extrapolate.
 
     Best, Oliver

--
Oliver Ruebenacker
Solutions Architect at Altisource Labs
Be always grateful, but never satisfied.

Daniel Armak

Dec 9, 2014, 9:02:48 AM
to Oliver Ruebenacker, Johannes Rudolph, √iktor Ҡlang, Julian Michael, scala-user
Hello Oliver,

I did all that the first time around. What I was missing here was what to measure, rather than how. If I can't tell that another way of writing the same code might be 10 times faster, I'll never think of writing it and never find out my code could be faster, even when I'm looking at a known hotspot and trying to optimize it.

I've been getting some very surprising results from jmh. I want to test more scenarios before coming to any conclusions.

Daniel Armak

Matthew Pocock

Dec 9, 2014, 11:00:12 AM
to Daniel Armak, Oliver Ruebenacker, Johannes Rudolph, √iktor Ҡlang, Julian Michael, scala-user
Let's assume that we used just map/flatMap and equivalents to compose various combinations of futures and functions. Run directly, this gives crazy overhead in monad-mediating nuisance objects. Now, let's say that at compile time we ran as much of the code as was possible without knowing the inputs. Could we evaluate away most of the nuisance objects? What, in principle, would prevent this approach? What extra knowledge is missing from the code itself that we would rely upon to do the optimizations that allow things to fuse and disappear? I realise that we can't currently evaluate arbitrary code any way we wish in the compilation phase, but I've an eye to the future ;)

Matthew
Dr Matthew Pocock
Turing ate my hamster LTD

Integrative Bioinformatics Group, School of Computing Science, Newcastle University

skype: matthew.pocock
tel: (0191) 2566550

Daniel Armak

Dec 9, 2014, 12:00:01 PM
to Matthew Pocock, Oliver Ruebenacker, Johannes Rudolph, √iktor Ҡlang, Julian Michael, scala-user

I think you’re raising very interesting questions.

For instance, with Futures there’s a common pattern that affects performance. We want to combine library methods that return Futures. Suppose we have:

def method1(): Future[String]
def method2(str: String): Future[Unit]
val combined: Future[Unit] = method1() flatMap method2

This will schedule three futures, not two. method2 has some synchronous beginning; even a method defined as def method1() = Future { "foobar" } calls Future.apply outside the future itself. This synchronous part runs in a future scheduled by calling Promise.tryComplete, and itself schedules a third future.

For a long chain of mapped Futures, half of them will be small stubs like this. I can’t easily measure the impact on performance, and I don’t see how to get rid of it, but it might bear investigation.
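
The count can be checked with an ExecutionContext that runs tasks inline but counts every task handed to it (a sketch; `countingEc` is made up, `method1`/`method2` are the examples from above):

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future}

val scheduled = new AtomicInteger(0)
implicit val countingEc: ExecutionContext = new ExecutionContext {
  def execute(r: Runnable): Unit = { scheduled.incrementAndGet(); r.run() }
  def reportFailure(t: Throwable): Unit = t.printStackTrace()
}

def method1(): Future[String] = Future { "foobar" }
def method2(str: String): Future[Unit] = Future { () }
val combined: Future[Unit] = method1() flatMap method2
// scheduled.get() is at least 3: Future.apply for method1, the flatMap
// callback dispatched when method1's future completes, and Future.apply
// for method2.
```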


Daniel Armak

Daniel Armak

Dec 10, 2014, 6:11:51 PM
to Johannes Rudolph, √iktor Ҡlang, Julian Michael, scala-user

Here are my results using jmh. My jmh project is here.

Note: this is getting involved and I completely understand if some or all of you don’t want to keep investing the time in engaging with this issue. Thanks for all your help so far!

o.s.MyBenchmark.oneConstantFunc              thrpt       20  281306760.808 ±  9961644.729  ops/s
o.s.MyBenchmark.oneDirectCall                thrpt       20  315845747.813 ±  5705814.003  ops/s
o.s.MyBenchmark.oneFirstFunc                 thrpt       20  135427280.115 ± 11090443.358  ops/s
o.s.MyBenchmark.oneManuallyInlined           thrpt       20  320860149.627 ±  3259196.119  ops/s
o.s.MyBenchmark.oneMapFastFuture             thrpt       20   84623485.218 ±  8221961.582  ops/s
o.s.MyBenchmark.oneMapFastFutureFirstFunc    thrpt       20   61425304.392 ±   450415.285  ops/s
o.s.MyBenchmark.oneMapFastFutureFunc         thrpt       20   83605488.102 ±   192865.554  ops/s
o.s.MyBenchmark.oneMapFastFutureSomeFunc     thrpt       20   51454221.847 ±   411261.329  ops/s
o.s.MyBenchmark.oneSomeFunc                  thrpt       20  100908290.019 ±  1491151.514  ops/s

The baseline for comparison is just putting the code under test (which increments a private long var by 1) inside the jmh test method. This is the test named oneManuallyInlined and it runs about 320 million rounds per second on my i7-4600U, 3.3GHz CPU. That is slower than I expected (why does a loop incrementing a variable take 10 CPU cycles per iteration?), so I’m suspicious, but let’s go with it.

A summary of the other tests:

oneDirectCall - call a method that does the actual work - 98.5% performance.

oneConstantFunc - call a function that calls the method that increments the field - 87% performance.

oneFirstFunc - access a constant location in an array of functions, and call that function which calls a method - 42% performance.

oneSomeFunc - on each call, access a different location in an array of 100 identical-but-separate functions which call the method - 31% performance.

This is quite contrary to expectations. There’s a 50% drop in performance when I call the function indirectly via the array, even though I always call the same function. When I tried timing loops manually without using jmh, there was a much smaller drop in performance here. But in the oneSomeFunc test, which calls any one of 100 different functions inside the loop, there was a 10x drop in performance, which doesn’t appear here.

I have no experience with jmh, but clearly it’s doing something different from the ordinary behavior of my code. Maybe looking at its output with javap would help, but I don’t have time to continue investigating this today…

The other tests in the list keep a completed Future variable between tests. Each call to the test method calls FastFuture.map on that future to produce a new one. The mapping closure increments the variable, or calls a function that increments it, etc. There’s nothing very surprising here: all of these variants are slower than the ones without Futures, because they’re doing more work; and the ordering of their performance is as expected.

I may continue to investigate this when I have time, but that probably won’t be before Saturday.

Thanks again,


Daniel Armak


Jason Zaugg

Dec 14, 2014, 10:48:29 PM
to Daniel Armak, scala-user
On Mon, Dec 8, 2014 at 6:55 AM, Daniel Armak <dana...@gmail.com> wrote:

Side note: at first I wanted to make a DSL using macros that return synchronously for synchronous functions, and use async/await for asynchronous ones. Unfortunately there’s a bug in async/await that causes a stack overflow, and which is beyond my ability to fix. I wish I could do something about it; it’s really annoying that I can’t safely use async/await.

I've just proposed a fix for async that avoids blowing the stack when control flows through synchronous paths. 

-jason