Incremental compiler improvements

23 views

Skip to first unread message

Grzegorz Kossakowski

unread,

Nov 20, 2012, 7:28:09 PM11/20/12

to simple-b...@googlegroups.com

Hi,

I recently got interested in working on sbt's incremental compiler. Mark helped me to get up-to-speed with sbt's code-base and I got some initial prototype working. You can track my progress here: https://github.com/gkossakowski/xsbt/tree/inc

My plan is to implement a more fine-grained heuristics for invalidating dependencies. Initial idea (inspired by discussion with Jason Zaugg) is to look at names in source files and take them into account when invalidating dependencies. An example illustrates it best:

// A.scala

package test

class A { def foo(x: String): String = x}

// B.scala

package test

class B { def bar(a: A): String = a.foo("test") }

Let's imagine we compile it (so sbt's indexes those files by extracting and hashing public apis). Now, we add `xyz` method to class A:

// A.scala

package test

class A { def foo(x: String): String = x; def xyz: Int = 11 }

With current implementation, sbt would recompile A.scala, hash API of the whole file, get different sum and invalidate B.scala because it depends on A.scala.

The algorithm I have in mind would keep one hash sum per simple name (unqualified) in given source file. It would also track set of used names in given source file. For example above before change to A.scala we would get:

// A.scala

name hashes: Map("<init>" -> ..., "A" -> ..., "foo" -> ...)

used names: Set("test", "A", "<init>", "foo") // some names (like "String") skipped for simplicity

// B.scala

name hashes: Map("<init>" -> ..., "B" -> ..., "bar" -> ...)

used names: Set("test", "B", "<init>", "bar", "a", "A", "foo") // some names (like "String") skipped for simplicity

After the change to A.scala data structures presented above would become:

// A.scala

hashes: Map("<init>" -> ..., "A" -> ..., "foo" -> ..., "bar" -> ...)

used names: Set("test", "A", "<init>", "foo", "bar") // some names (like "String") skipped for simplicity

Now, when we consider if we should invalidate B.scala we just take intersections between used names for B.scala and keys for name hashes. If it's non-empty, we check if hashes has changed. Only then we invalidate. In this case, "bar" is not mentioned anywhere in B.scala so it doesn't get invalidated.

There are lots of tiny details that I didn't describe here but this should be enough to give you an idea where I'm heading.

Let me know if you have your own ideas how to improve incremental compilation or questions to what I presented above.