I'm working on a macro-based pickling library, upickle, and one thing it does is the whole implicit-resolution-to-find-picklers thing, e.g.
write[Int](...) -> write[Int](...)(intWriter)
write[Seq[Int]](...) -> write[Int](...)(SeqWriter(intWriter))
write[MyCaseClass](...) -> write[MyCaseClass](...)(Writer.macroWriter) -> write[MyCaseClass](...)(...expanded code...)
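To make the allocation cost concrete, here is a toy sketch of the pattern above (illustrative names only, not upickle's real API). Because the generic instance is an implicit `def`, every call site that needs a `Writer[Seq[Int]]` constructs a fresh composite writer:

```scala
// Toy sketch of the typeclass pattern; names are illustrative, not upickle's API.
trait Writer[T] { def write(t: T): String }

object Writer {
  // Non-generic instance: can be a val, so it is instantiated exactly once
  implicit val intWriter: Writer[Int] = new Writer[Int] {
    def write(t: Int) = t.toString
  }
  // Generic instance: must be a def, so every implicit lookup allocates anew
  implicit def seqWriter[T](implicit w: Writer[T]): Writer[Seq[T]] =
    new Writer[Seq[T]] {
      def write(t: Seq[T]) = t.map(w.write).mkString("[", ",", "]")
    }
}

def write[T](t: T)(implicit w: Writer[T]): String = w.write(t)

// Expands to write(Seq(1, 2, 3))(Writer.seqWriter(Writer.intWriter)):
// a brand-new SeqWriter is constructed every time this line executes.
write(Seq(1, 2, 3)) // "[1,2,3]"
```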
Well, one thing I realized recently is that in my benchmarks, the *act of instantiating the implicit* is sufficient to make a real impact on the performance of the pickling! For example, my benchmark basically tests:
sealed trait A
case class B(i: Int) extends A
case class C(s1: String, s2: String) extends A
sealed trait LL
case object End extends LL
case class Node(c: Int, next: LL) extends LL
case class ADT0()
case class ADTc(i: Int = 2, s: String, t: (Double, Double) = (1, 2))
type Data = ADT[Seq[(Int, Int)], String, A, LL, ADTc, ADT0]
read[Data](...: String)
write[Data](...: Data)
When I pre-instantiate the readers and writers outside my while-loop, my perf numbers (higher is better) are
[info] jvm/read Success(306933)
[info] jvm/write Success(330214)
[info] js/read Success(34296)
[info] js/write Success(25559)
On the other hand, if I leave the instantiation to the implicit resolution, and thus inside the while loop, the numbers are
[info] jvm/read Success(260349)
[info] jvm/write Success(313252)
[info] js/read Success(23268)
[info] js/write Success(18099)
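For reference, the two variants differ only in where the implicit resolution (and hence the allocation) happens. Sketched with a toy `Writer` typeclass (illustrative names, not the actual benchmark harness):

```scala
// Toy illustration of the two loop shapes; not the actual benchmark code.
trait Writer[T] { def write(t: T): String }
implicit val intWriter: Writer[Int] = new Writer[Int] {
  def write(t: Int) = t.toString
}
implicit def seqWriter[T](implicit w: Writer[T]): Writer[Seq[T]] =
  new Writer[Seq[T]] {
    def write(t: Seq[T]) = t.map(w.write).mkString("[", ",", "]")
  }
def write[T](t: T)(implicit w: Writer[T]): String = w.write(t)

val data = Seq(1, 2, 3)

// Fast variant: resolve (and allocate) the composite writer once, up front
val hoisted = implicitly[Writer[Seq[Int]]]
var i = 0
while (i < 1000) { write(data)(hoisted); i += 1 }

// Slow variant: implicit resolution inside the loop allocates a fresh
// SeqWriter on every single iteration
i = 0
while (i < 1000) { write(data); i += 1 }
```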
This is consistent over many runs, and all that stuff. It turns out that in Scala.js, around 1/3 of my time is spent simply instantiating the output of my (non-trivial) materializers! Even on the JVM, there is a noticeable 5-10% perf hit from doing this every time.
Now, every single one of these implicits and macro-materializers is "pure"; they serve no purpose other than making the whole typeclass-pattern-thing work, and the exact same structure is going to be instantiated every single time that line of code is executed.
This leads to my point: is there a way to mark an implicit as "cached", such that (similar to a lazy val) it gets stored somewhere after being computed the first time (with a mutex or whatever), and thereafter always returns the same instance rather than being re-instantiated on each use?
I imagine that for people using generic typeclasses (for non-generic ones you can just make the implicit a val), a large fraction of their implicits materialize "pure" objects with no internal state, which can safely be shared across any and all invocations of the callsite. That would save a significant amount of garbage from being generated and make a bunch of things run faster, similar to lifting non-capturing lambdas into static lambdas.
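What I'm asking for can be approximated by hand today, one concrete type at a time: shadow the generic implicit def with a monomorphic `lazy val`, which Scala's most-specific-implicit rule will prefer, so the composite instance is built once and shared thereafter. A sketch (all names illustrative):

```scala
// Manual approximation of a "cached implicit"; names are illustrative.
trait Writer[T] { def write(t: T): String }
implicit val intWriter: Writer[Int] = new Writer[Int] {
  def write(t: Int) = t.toString
}
implicit def seqWriter[T](implicit w: Writer[T]): Writer[Seq[T]] =
  new Writer[Seq[T]] {
    def write(t: Seq[T]) = t.map(w.write).mkString("[", ",", "]")
  }

// The manual cache: a monomorphic lazy val is more specific than the
// polymorphic def, so implicit search picks it, and lazy-val semantics
// give us thread-safe once-only initialization for free.
implicit lazy val seqIntWriter: Writer[Seq[Int]] = seqWriter[Int]

def write[T](t: T)(implicit w: Writer[T]): String = w.write(t)

// Both lookups now return the exact same instance:
implicitly[Writer[Seq[Int]]] eq implicitly[Writer[Seq[Int]]] // true
```

The obvious downside is that this must be written out per concrete type, which is exactly the boilerplate a compiler-level "cached" marker would eliminate.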
Thoughts?