RPC's type dictionary cannot be perfectly calculated


Lex Spoon

Jun 30, 2008, 12:13:36 PM6/30/08
to Google Web Toolkit Contributors, Miguel Méndez, Scott Blum, Amit Manjhi, Rajeev Dayal
GWT's SerializableTypeOracleBuilder, STOB, is the part of the RPC
system that determines what classes might be involved when serializing
an object of a given type. It must identify all classes that might be
involved, but it should try to generate the shortest list possible.
The problem is, it cannot generate the shortest list possible unless
it includes a halting problem detector. As a result, STOB cannot
always be precise. We have to draw the line somewhere about where
precision will be lost. Currently, imprecision comes in when looking
at the types of fields of a type that is reachable. The types of fields
in Foo<Bar,Baz> are always looked up in the base type Foo; the
handling of Bar and Baz is then managed via a "type exposure
computer", but that component is necessarily imprecise.

I'm writing this email to explain the reasoning why STOB cannot be
maximally precise. It would be great to have more eyes on it, because
the implication gives a fundamental restriction on what GWT's RPC can
achieve.

This email starts by showing how to encode addition, and thus require
STOB to add. It then shows how to encode Turing machines, and thus
require STOB to run Turing machines.

For addition, suppose a user provides the following classes:

class Zero implements Serializable { }
class Succ<N> implements Serializable { }
class Add<A, B> implements Serializable { }
class Rule1<X,Y> extends Add<X, Succ<Y>> {
  Add<Succ<X>, Y> exposed;
}
class Rule2 extends Add<Succ<Succ<Zero>>, Zero> {
  RandomClass exposed;
}

Zero stands for the number zero, Succ for the number N plus 1, and Add
for a command to add two numbers via type reachability. Rule1 does
the addition. Rule2 says that if you ever add two numbers and the
result is two, then RandomClass should be exposed.

Our user proceeds to define an RPC service with the following
type returned from one of the methods:

Add<Succ<Zero>, Succ<Zero>>

The question for STOB is, could there be an instance of RandomClass
that needs to be deserialized as part of deserializing this type?
Let's find out.

This initial type can be thought of as adding 1 to 1. Now, this type
is "reachable", because it's directly returned from an RPC interface, so
the following type is also reachable due to being a subtype of it:

Rule1<Succ<Zero>, Zero>

Notice that a "Succ" disappeared from the second argument. When
something is reachable via subtyping, the type parameters of the
subtype are bits and pieces of the type parameters of the supertype.

Continuing, Rule1 has a field, so the type of that field is also
reachable:

Add<Succ<Succ<Zero>>, Zero>

This type does not have any Rule1 subtypes, but it does have a Rule2 subtype:

Rule2

Rule2 has a field, in this case of the following type:

RandomClass


So you see, if your root type is Add<something, somethingElse>, then
RandomClass is exposed iff something plus somethingElse is two. Thus,
with this pattern of types, STOB is required to perform addition and
figure out whether the input numbers add up to two.
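The trace just walked through can be sketched as an ordinary program.
The simulator below is purely illustrative (it is not part of STOB or
GWT); each loop iteration is one application of Rule1, and the final
check is Rule2:

```java
// Illustrative only: simulates the subtype-substitution trace STOB would
// have to perform on the Add encoding above. Peano numerals are modeled
// as plain ints; each loop iteration applies Rule1 once, rewriting
// Add<X, Succ<Y>> into the reachable field type Add<Succ<X>, Y>.
public class AddTrace {
    /** Returns true iff tracing Add<a, b> eventually reaches Rule2,
     *  i.e. iff RandomClass becomes reachable. */
    static boolean exposesRandomClass(int a, int b) {
        // Rule1: Add<X, Succ<Y>> exposes a field of type Add<Succ<X>, Y>
        while (b > 0) {
            a += 1;
            b -= 1;
        }
        // Rule2: a subtype of Add<Succ<Succ<Zero>>, Zero> exposes RandomClass
        return a == 2 && b == 0;
    }

    public static void main(String[] args) {
        // The root type from above: Add<Succ<Zero>, Succ<Zero>>, i.e. 1 + 1
        System.out.println(exposesRandomClass(1, 1)); // prints "true"
        System.out.println(exposesRandomClass(2, 1)); // prints "false": 2 + 1 != 2
    }
}
```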


Addition is not such a big deal. We can teach STOB to add. However,
a user could supply different definitions that would cause STOB to
trace a Turing Machine. Imagine our optimistic user defines the
following classes, all assumed to inherit from Serializable:

class Turing<State, Token, LeftTape, RightTape> { }
class Nil { } // empty list
class Cons<HD, TL> { } // non-empty list

Next, the user needs to specify the program this Turing Machine should
run. To do so, they add a class for each state and token, and a
RuleNNN class for each transition their Turing machine has. To see
what the RuleNNN classes look like, suppose the user desires that if
the Turing machine is in State1 and the head points to a Token1 token,
that the machine is supposed to write Token2 and shift to the left.
They would implement this rule as follows:

class State1 { }
class State2 { }
class Token1 { }
class Token2 { }
class Rule123<LTH, LTT, RT>
    extends Turing<State1, Token1, Cons<LTH, LTT>, RT> {
  Turing<State2, LTH, LTT, Cons<Token2, RT>> nextConfiguration;
}

Note that Rule123 triggers in State1 on Token1, for arbitrary contents
of the left tape (LTH, LTT) and right tape (RT). It changes the state
to State2, replaces the Token1 token by Token2, moves the head to the
left (so it now points at an LTH token), and updates the left and right
tapes accordingly. The new left tape is the old tail of the left tape,
and the updated right tape is what it was before plus the new Token2.

That's one rule. Users can specify any rules they want, thus encoding
an arbitrary Turing machine. Since the Turing machine is the canonical
Turing-complete model of computation, users can require STOB to trace
through any computable algorithm.
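A single transition like Rule123 can also be written out as ordinary
code. The sketch below is illustrative only; it mirrors the encoding
above, with each tape held as a stack whose top is the cell nearest the
head, just as Cons<HD, TL> exposes its head first:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: one machine configuration and one transition,
// mirroring Turing<State, Token, LeftTape, RightTape> and Rule123 above.
public class MachineConfig {
    String state;
    String head; // the token currently under the head
    Deque<String> leftTape = new ArrayDeque<>();
    Deque<String> rightTape = new ArrayDeque<>();

    /** Rule123: in State1 scanning Token1, write Token2 and shift left.
     *  Returns true if the rule matched and was applied. */
    boolean applyRule123() {
        if (!"State1".equals(state) || !"Token1".equals(head)
                || leftTape.isEmpty()) {
            return false; // rule does not match this configuration
        }
        rightTape.push("Token2"); // the rewritten cell joins the right tape
        head = leftTape.pop();    // LTH: the head now scans the cell to its left
        state = "State2";
        return true;
    }
}
```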

Already that sounds hard. To rub it in, you can make STOB compute
answers to the halting problem. Suppose the user writes something
like this, analogous to Rule2 from the addition example:

class Rule456<T, L, R> extends Turing<Friday, T, L, R> {
  WhoKnowsWhat willBeExposed;
}

They then provide a root type that looks like this:

// root type
Turing<_some state_, _some token_, Nil, Nil>


Given this setup, for STOB to find out whether to include WhoKnowsWhat
in the set of serializable types, it must decide whether the above
Turing Machine will ever reach Friday. To do this accurately requires
a halting detector, because the user's program might itself include an
interpreter of Turing Machines, and it might run the simulator on some
program and then progress to Friday iff the simulated machine halts.


Does this look right to people? If it is, then making STOB maximally
precise is impossible. There will always exist a STOB version n+1
that is a little more precise than version n. Barring new ideas,
therefore, we should try to keep STOB simple and precise enough for
practical programs. It would be nicer if we could identify some
restrictions on user code that are not onerous and that allow for a
maximally precise STOB; I don't have any ideas about that at the
moment, though.


-Lex

Ray Cromwell

Jun 30, 2008, 2:23:53 PM6/30/08
to Google-Web-Tool...@googlegroups.com, Miguel Méndez, Scott Blum, Amit Manjhi, Rajeev Dayal

Lex, best GWTC post ever! :) 

Ian Petersen

Jun 30, 2008, 11:29:29 PM6/30/08
to Google-Web-Tool...@googlegroups.com
On Mon, Jun 30, 2008 at 12:13 PM, Lex Spoon <sp...@google.com> wrote:
> Does this look right to people?

I read it very quickly, so there could be some flaw somewhere that I
missed, but it seems right to me.

Is there any way to improve matters by allowing some runtime
computation that could dynamically expand the set of types that can be
(de)serialized by the client?

RPC is, by design, asynchronous. There's an implied delay in invoking
a remote method so, to my mind, some arbitrary extra delays here and
there are indistinguishable from network bubbles. What if
GWT.createAsync() (or some other asynchronous method) could be used to
dynamically compute new code for (de)serializing types that couldn't
be statically detected as reachable?

This is a very vague idea on my part, so I can't really expand on the
above or provide implementation details or suggestions, but it seemed
like a brainstorm worth sharing.

Oh, and I agree with Ray: best GWTC post ever!

Ian

John Tamplin

Jul 1, 2008, 2:17:09 AM7/1/08
to Google-Web-Tool...@googlegroups.com
On Mon, Jun 30, 2008 at 11:29 PM, Ian Petersen <ispe...@gmail.com> wrote:
> Is there any way to improve matters by allowing some runtime
> computation that could dynamically expand the set of types that can be
> (de)serialized by the client?
>
> RPC is, by design, asynchronous.  There's an implied delay in invoking
> a remote method so, to my mind, some arbitrary extra delays here and
> there are indistinguishable from network bubbles.  What if
> GWT.createAsync() (or some other asynchronous method) could be used to
> dynamically compute new code for (de)serializing types that couldn't
> be statically detected as reachable?

runAsync can't create new code dynamically, it can only dynamically load precompiled chunks of code.  If you allow any runtime-determined serializable types, you essentially have to assume all types are serializable and generate code for them.  It isn't clear how you would partition those into useful chunks (otherwise you wind up with a round-trip for each type determined at runtime).

--
John A. Tamplin
Software Engineer, Google

Ray Cromwell

Jul 1, 2008, 3:19:04 AM7/1/08
to Google-Web-Tool...@googlegroups.com
Maybe Lex can restrict the STOB to just languages accepted by a Push
Down Automaton :)

-Ray

Lex Spoon

Jul 1, 2008, 8:28:37 AM7/1/08
to Google-Web-Tool...@googlegroups.com

Loading the deserializers lazily does sound nice, because if nothing
else it will speed up the initial download, perhaps by a lot.
However, it would be a shame to give up STOB's ability to reject
classes as possibly serialized, because every serializable class is
considered live by the optimizer. This can keep a lot of dead code
from being removed.


-Lex

Lex Spoon

Jul 1, 2008, 8:30:58 AM7/1/08
to Google-Web-Tool...@googlegroups.com
On Tue, Jul 1, 2008 at 3:19 AM, Ray Cromwell <cromw...@gmail.com> wrote:
>
> Maybe Lex can restrict the STOB to just languages accepted by a Push
> Down Automaton :)

That way sounds really nice. The only question is, what's the
language? :) It would be just great to find some non-onerous
restrictions on people's class hierarchies that allowed STOB to always
know what types they are pulling in.

-Lex

Ian Petersen

Jul 1, 2008, 12:51:46 PM7/1/08
to Google-Web-Tool...@googlegroups.com
On Tue, Jul 1, 2008 at 8:28 AM, Lex Spoon <sp...@google.com> wrote:
> Loading the deserializers lazily does sound nice, because if nothing
> else it will speed up the initial download, perhaps by a lot.
> However, it would be a shame to give up STOB's ability to reject
> classes as possibly serialized, because every serializable class is
> considered live by the optimizer. This can keep a lot of dead code
> from being removed.

I was thinking that STOB would be conservative in its calculations and
that the downloaded code would patch things up on the client wherever
necessary. Is that even possible? The fluidity of Javascript
makes it seem possible in principle, but it might not be a tractable
problem.

Ian

Ray Cromwell

Jul 1, 2008, 1:10:51 PM7/1/08
to Google-Web-Tool...@googlegroups.com
Lex,
What about some kind of annotation hint that allows the STOB to
terminate the search early?

e.g.

@Serialize({Integer.class, Double.class})
List<Number> serializableField = ...

therefore skip every other subclass of Number.
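Such a hint might be declared along these lines. This is hypothetical,
not an existing GWT API; the annotation name, retention policy, and the
Payload class are all illustrative:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

// Hypothetical hint: the listed classes are the only subtypes STOB needs
// to consider for the annotated field. Not an existing GWT API; the name
// and retention policy are illustrative (RUNTIME retention is used here
// only so the hint can be inspected reflectively).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Serialize {
    Class<?>[] value();
}

// Illustrative use: STOB could skip every other subclass of Number.
class Payload {
    @Serialize({Integer.class, Double.class})
    List<Number> serializableField;
}
```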

-Ray

Ian Petersen

Jul 1, 2008, 1:26:28 PM7/1/08
to Google-Web-Tool...@googlegroups.com

I was being rushed when I wrote that, but I think it deserves a little
more explanation.

I'm imagining a situation where STOB would figure out two bounds on
the problem: one that's definitely wide enough to include everything
and therefore might include too much, and one that's as narrow as
seems reasonable while still encompassing enough types to be generally
useful. I guess you could consider the wide bound "optimistic" (in
the sense that it will default to a type being necessary) and the
narrow bound "pessimistic".

Now, I was about to suggest that this would be a good (if complex)
approach because the type tightener could operate on a minimal set of
types but you'd still have the flexibility to patch the running code
later on if the server sends back unexpected types. It seems possible
to me, in principle, for the Javascript runtime to be patched to
support new types, but I realized while typing this email that the
type tightener probably can't be fooled, so I'm not sure if my
suggestion is tractable.

Would it be possible for the compiler to have a concept of speculative
types? Something like "this java.lang.Number is at least
java.lang.Integer and at most either java.lang.Integer or
java.lang.Double". Supposing that was possible, you could generate
code for the least situation (that tightens Numbers down to Integers)
and also remember what optimizations need to be undone in case an RPC
forces a re-widening to (Integer | Double).

Ian

John Tamplin

Jul 1, 2008, 2:02:58 PM7/1/08
to Google-Web-Tool...@googlegroups.com
On Tue, Jul 1, 2008 at 1:26 PM, Ian Petersen <ispe...@gmail.com> wrote:
> I'm imagining a situation where STOB would figure out two bounds on
> the problem: one that's definitely wide enough to include everything
> and therefore might include too much, and one that's as narrow as
> seems reasonable while still encompassing enough types to be generally
> useful.  I guess you could consider the wide bound "optimistic" (in
> the sense that it will default to a type being necessary) and the
> narrow bound "pessimistic".
>
> Now, I was about to suggest that this would be a good (if complex)
> approach because the type tightener could operate on a minimal set of
> types but you'd still have the flexibility to patch the running code
> later on if the server sends back unexpected types.  It seems possible
> to me, in principle, for the Javascript runtime to be patched to
> support new types, but I realized while typing this email that the
> type tightener probably can't be fooled, so I'm not sure if my
> suggestion is tractable.

I don't see how that can work, because if the compiler decides that Shape can be tightened to Square, it will have already inlined Square.area() at every call of Shape.area().  If you later find out that you can get another Shape from the server, you are toast since basically all of the code you have can be incorrect.

To make this work, you would have to have the type tightener consider any type in your pessimistic set, and it isn't clear that set can be restricted enough that you have a useful level of optimization.
 
> Would it be possible for the compiler to have a concept of speculative
> types?  Something like "this java.lang.Number is at least
> java.lang.Integer and at most either java.lang.Integer or
> java.lang.Double".  Supposing that was possible, you could generate
> code for the least situation (that tightens Numbers down to Integers)
> and also remember what optimizations need to be undone in case an RPC
> forces a re-widening to (Integer | Double).

This isn't like the Java JIT -- the compiler doesn't live in the browser, so it can't simply build new code to match the new assumptions when a check shows the previous assumptions failed.  The best you could do would be to download new code that was previously compiled for those assumptions, but even then if you are able to patch it in that means that you didn't inline any of those implementations, which is the largest win for type tightening/devirtualization.

Alexander Vasiljev

Jul 2, 2008, 2:46:04 AM7/2/08
to Google Web Toolkit Contributors
This is all about generics, but is it possible that the issue here is
somehow related to the Issue 2369?

WBR
Alexander Vasiljev

Lex Spoon

Jul 2, 2008, 9:32:11 AM7/2/08
to Google-Web-Tool...@googlegroups.com
On Wed, Jul 2, 2008 at 2:46 AM, Alexander Vasiljev
<A.A.Va...@gmail.com> wrote:
>
> This is all about generic, but is it possible that the issue here is
> somehow related to the Issue 2369?

No, that looks like an outright bug. Let's get to the bottom of it. -Lex

Lex Spoon

Jul 2, 2008, 9:40:08 AM7/2/08
to Google-Web-Tool...@googlegroups.com
On Tue, Jul 1, 2008 at 2:02 PM, John Tamplin <j...@google.com> wrote:
> On Tue, Jul 1, 2008 at 1:26 PM, Ian Petersen <ispe...@gmail.com> wrote:
>> Now, I was about to suggest that this would be a good (if complex)
>> approach because the type tightener could operate on a minimal set of
>> types but you'd still have the flexibility to patch the running code
>> later on if the server sends back unexpected types. It seems possible
>> to me, in principle, for the Javascript runtime to be patched to
>> support new types, but I realized while typing this email that the
>> type tightener probably can't be fooled, so I'm not sure if my
>> suggestion is tractable.
>
> I don't see how that can work, because if the compiler decides that Shape
> can be tightened to square, it will have already inlined Square.area() at
> every call of Shape.area(). If you later find out that you can get another
> Shape from the server, you are toast since basically all of the code you
> have can be incorrect.

That's a killer example. It does not make the approach impossible,
but it shows just how hard it is if you want to also get good
optimization. You'd have to handle speculative optimizations that can
be undone by later code loads. For example, you'd need to be able to
re-virtualize calls to the area() method. That seems possible, but
complicated. As John points out, it's much easier in the JVM, because
a JVM can inspect and rewrite the whole state of the program including
stacks.

-Lex

Ian Petersen

Jul 2, 2008, 2:20:23 PM7/2/08
to Google-Web-Tool...@googlegroups.com

I think Lex has said what I've been thinking. Suppose you've got
method A calls B calls C calls D. Suppose further that after
type-tightening, D can be inlined into C to produce C'. The result is
a Javascript function that implements B by calling C' by name. If an
RPC return value includes a type that breaks the assumptions that
allowed D to be inlined into C, then the application would have to
react by downloading a new definition for C that implements C by
actually calling D, rather than inlining D. Given that you can
redefine functions on the fly in Javascript, I think it would suffice
to do that.

As Lex said, it'd be complicated. Also, in Lex's other post, he
clarifies a bit and maybe invalidates the thrust of my argument. I
had understood that since the STOB can't, in general, compute the
exact set of required types, it would compute a set containing either
too many or too few types. Too many is bad because of bloat, but too
few is bad because of correctness. My suggestion was meant to make
the too few set useful by allowing dynamic download of the too many
set if necessary. It sounds like the STOB will never produce a set
that's too small, it's just that it's hard or impossible to compute
the set that's "just right". In that case, I figure there's no point
in worrying about dynamically deoptimizing running code.

Ian
