Generalized Type Inference Optimization

49 views
Skip to first unread message

Daniel Spiewak

unread,
Nov 6, 2008, 4:31:04 PM11/6/08
to Clojure
I've been perusing Stu Halloway's beta of "Programming Clojure" and I
came across the following example:

(defn describe-class [c]
{:name (.getName c)
:final (java.lang.reflect.Modifier/isFinal (.getModifiers c))})

As demonstrated by the *warn-on-reflection* flag, Clojure is unable to
determine a type for c, therefore all (or rather, almost all) of the
calls made upon it must be handled reflectively. The performance
"solution" suggested in the book is just to annotate the c parameter
with a type:

(defn describe-class [#^Class c] ...)

This is a perfectly valid solution, but it seems a little ugly to me.
The thought that I had is perhaps the Clojure compiler could actually
infer this type automatically. Languages like ML and Haskell have
been doing this for decades. Traditionally, Hindley-Milner isn't
applied to object-oriented languages (or rather, languages which deal
with uncontrolled OO constructs), but from what I understand, most of
the reasoning behind this does not apply to Clojure. For example,
Clojure doesn't have structural or parametric types because of its
dynamic nature. There would be no *need* to infer any type parameters
on calls out to Java, so the type system could be treated as nominal.

The algorithm I'm thinking of would be something like this:

* Maintain a multi-map of method and field names to classes (this
could be done incrementally based on what classes the compiler knows
are in scope)
* For each parameter with an unfixed type:
* Traverse the FnExpr AST and find all Java member access to that
parameter
* For each access, obtain a set of possible classes from the scope
multi-map
* At each step, intersect the previous set of possible classes with
the subsequent one (think: a fold)
* Final set can be optimized by choosing the most-specific type
(eliminating the inheritance case)
* If the final set has size of 1, then the type has been inferred
and can be used to avoid reflection
* If the set has size 0, then fallback on reflection
* If the set has size >1, either fallback on reflection or get more
sophisticated

Improvements on this could be added to the >1 case where the second
pass could look for methods of a given name *and* a specific arity.
Subsequent passes could also incorporate inferred types for the method
parameters.

It's not a complete inference algorithm, but it doesn't have to be.
There's nothing stopping Clojure from falling back on reflection for
unfixed types. The whole logic behind this dance is to avoid
reflection as much as possible, producing a fairly serious
optimization in Java-interop without requiring those ugly type
annotations.

I briefly looked at implementing this myself, but Clojure's compiler
code is...well, it's a compiler. :-) At first glance, I couldn't
even figure out where to place the appropriate hooks to kick off this
process. A better question though is: has this been tried/considered
before? What are your overall thoughts with regards to this?

Rich Hickey

unread,
Nov 6, 2008, 5:26:35 PM11/6/08
to Clojure
It's not a bad idea, and I'm sure Clojure could do more inference than
it does, but the devil's in the details as usual.

You'd want the least specific type.

The best you could do is make a type-inferred fastpath, since you
could always be wrong:

In the case above, let's say you had imported Class (actually
Clojure did that for you), and nothing else you imported had getName
and getModifiers methods. Making the inference 'must be Class' could
be wrong, since the function should work fine on
java.lang.reflect.Method, for instance, and maybe that was the intent.

So you'd need a runtime instanceof test for Class, and use the
fastpath if true, reflection if not.

Perf could be harder to pin down, as adding an import could cause
previously fast code to get slow.

Calls can be made only in specific conditional branches, which some
higher-level logic might understand maps to non-overlapping types, so
the inference should be of the call site, not the parameter itself -
i.e. it's perfectly fine to write fns that expect a set of unrelated
types, without being duck-typed. In fact it's even ok to type hint it
even though the hint is only correct in one path.

You have to allow for by-name duck-type calls or call them out
specifically. Right now they are allowed.

I think a simple version of this would be easy to do, when I get some
time.

I think it hasn't been a priority as it takes very few hints at the
moment to remove reflection, given that types are tracked from calls
to new and static calls, and from any other hinted calls. But it would
be great to not need them at all!

Rich

Daniel Spiewak

unread,
Nov 6, 2008, 8:42:48 PM11/6/08
to Clojure
> So you'd need a runtime instanceof test for Class, and use the
> fastpath if true, reflection if not.
>
> Perf could be harder to pin down, as adding an import could cause
> previously fast code to get slow.

Actually, I was thinking of performing the inference based on *all*
classes on the classpath, not just the ones which have been imported.
However, considering the fact that Clojure allows dynamic additions to
the classpath, coupled with the fact that not all available classes
need to be loaded, this might not be the best idea. The idea of call-
point import scoping is an interesting one, I'm not sure how solid you
could make it though. I mean, don't you look at the FnExpr types
prior to analyzing its usage? (at least, that's how I've always
written my compilers) At the very least, it might make library
functions a little more interesting (certainly ruling out any pre-
compilation).

With respect to the separate paths issue, I assume that you're talking
about something like this:

(defn bad-boy [c x]
(if c
(.getMethods x)
(.length x)))

I think in this case the proper inference would be #{} -- in other
words, reflection. The overhead of trying to fastpath and then
fallback in certain branches would probably outweigh any benefits in
this sort of edge case.

> The best you could do is make a type-inferred fastpath, since you
> could always be wrong

Yeah, that is annoying. I hadn't thought of the fact that you could
infer at declaration point and get a totally unambiguous type but
still be wrong (since calls in different scopes could pass unexpected
types that happen to fulfill the structural signature). How much
overhead does #getClass() impose these days? I heard they were
optimizing some of that in Java 6, but I haven't actually benchmarked
anything. I suppose a simple check of that sort wouldn't ever be more
expensive that a full-blown reflective call, but still.

At what point are functions actually compiled? Is it possible to
apply some of the techniques used in tracing VMs (e.g. V8) to get some
improved context before performing the inference? In that case, you
could actually create several different fast-path compilations
+fallback. There are just so many complicated places you could go
with this, it's just a question of implementation details (and of
course, time to do it).

Daniel

Rich Hickey

unread,
Nov 7, 2008, 8:16:41 AM11/7/08
to Clojure


On Nov 6, 8:42 pm, Daniel Spiewak <djspie...@gmail.com> wrote:
> > So you'd need a runtime instanceof test for Class, and use the
> > fastpath if true, reflection if not.
>
> > Perf could be harder to pin down, as adding an import could cause
> > previously fast code to get slow.
>
> Actually, I was thinking of performing the inference based on *all*
> classes on the classpath, not just the ones which have been imported.
> However, considering the fact that Clojure allows dynamic additions to
> the classpath, coupled with the fact that not all available classes
> need to be loaded, this might not be the best idea.

In my mind, the whole value of this is to leverage the imports. If you
could get fast performance by just importing the classes you know your
code is intending to utilize, without actually having to specify again
at the point of use, there's maximum leverage. Looking at the whole
classpath is definitely too much.

> The idea of call-
> point import scoping is an interesting one, I'm not sure how solid you
> could make it though. I mean, don't you look at the FnExpr types
> prior to analyzing its usage? (at least, that's how I've always
> written my compilers) At the very least, it might make library
> functions a little more interesting (certainly ruling out any pre-
> compilation).
>
> With respect to the separate paths issue, I assume that you're talking
> about something like this:
>
> (defn bad-boy [c x]
> (if c
> (.getMethods x)
> (.length x)))
>
> I think in this case the proper inference would be #{} -- in other
> words, reflection. The overhead of trying to fastpath and then
> fallback in certain branches would probably outweigh any benefits in
> this sort of edge case.
>

I think you missed my point here, which was that the job is not to
determine a single type for x above, but to determine the right method
at each method call. If only class Class has getMethods, then the
inferred type of the first call would be ((Class)x).getMethods,
similarly if only String had length, then the second would be
((String)x).length. There's no need to unify the types. There's
nothing 'bad' about the code above, that's why we're using Lisp - the
type systems are still struggling to be expressive enough. If some
third class had both methods, that would become the preferred branch.

I was not talking about analyzing the contexts of the calls to bad-
boy, or any whole-program analysis.

> > The best you could do is make a type-inferred fastpath, since you
> > could always be wrong
>
> Yeah, that is annoying. I hadn't thought of the fact that you could
> infer at declaration point and get a totally unambiguous type but
> still be wrong (since calls in different scopes could pass unexpected
> types that happen to fulfill the structural signature). How much
> overhead does #getClass() impose these days? I heard they were
> optimizing some of that in Java 6, but I haven't actually benchmarked
> anything. I suppose a simple check of that sort wouldn't ever be more
> expensive that a full-blown reflective call, but still.
>

A type check would be significantly faster than a reflective call, is
a subset of same, and is happening in any case, as the arguments to
all fns are Objects. Even if you found 2 or 3 candidates, making a
multiway branch would be faster.

> At what point are functions actually compiled?

When loaded.

> Is it possible to
> apply some of the techniques used in tracing VMs (e.g. V8) to get some
> improved context before performing the inference? In that case, you
> could actually create several different fast-path compilations
> +fallback. There are just so many complicated places you could go
> with this, it's just a question of implementation details (and of
> course, time to do it).
>

There's no need to go there, IMO. What I've described would be
relatively simple and effective.

Communication with the user about reflection might have to become more
nuanced.

Rich

Daniel Spiewak

unread,
Nov 7, 2008, 1:45:28 PM11/7/08
to Clojure
> ((String)x).length. There's no need to unify the types. There's
> nothing 'bad' about the code above, that's why we're using Lisp - the
> type systems are still struggling to be expressive enough. If some
> third class had both methods, that would become the preferred branch.

I actually hadn't thought of casting. That does make more sense than
restricting the method type since it avoids reflection without forcing
a specific nominal type.

I wasn't saying anything about bad-boy being an example of "bad
coding". I like type systems, but this is a dynamic language, one
where things like that are idiomatic and no cause for concern. I was
just trying to use an obviously untypeable function to illustrate my
point.

The problem I see with trying to infer different types along different
branches is there is nothing to stop me from creating different
branches without directly involving a known conditional construct:

(defn my-cond [pred then else] (if pred (then) (else)))

(defn bad-boy [c x]
(my-cond c
#(.getMethods x)
#(.length x)))

I don't see that you would be able to infer the separate paths for
this in the general sense. I suppose though that just recognizing
(if ...) and acting accordingly is a step in the right directly
without actually breaking anything else. Cases like (my-cond ...)
could either be left as reflection or unified (if unambiguous).

> There's no need to go there, IMO. What I've described would be
> relatively simple and effective.

Just spinning ideas, looking to see where performance can be had while
simultaneously trying to wrap my head around the approach you
suggested. I still think that tracing (even in a limited form) could
bring some serious performance gains on hard-hit code paths where
reflection could not be avoided, but maybe that's a thought for
another day. Even if you did want to go with a tracing-based
approach, the overhead of bottling that analysis onto the JVM would
probably outweigh most gains. And as you said, there's no need to go
there.

Daniel
Reply all
Reply to author
Forward
0 new messages