Like Matt, I'd love to hear feedback!
-Sarah
Background
Many library APIs implicitly limit methods to objects in particular
abstract states. The ResultSet class of the Java JDBC library
distinguishes between objects that are open or closed, updatable or
not updatable, sensitive or not sensitive, scrollable or not
scrollable, and so on. To use methods that are only available in a
subset of a ResultSet object's states, a programmer must adhere to the
API protocol.
Unfortunately, the restrictions that make up such API protocols are
often buried in multiple locations across API documentation, possibly
across several classes' documentation. The protocol may never be made
explicit, and it can be time consuming to discover and understand it
from prose. This makes it difficult for users to learn how to
interact with a new API, and research indicates that protocol
violations often result not in errors but in unusual runtime behavior.
One study concludes that a single such bug can require days of a
programmer's time, even with expert help. Further, even when there
are resources to teach new users about an API protocol, the protocols
typically are not represented formally. Naturally, informal protocol
representations cannot be used by automatic checkers to ensure
protocol adherence.
These issues motivate the creation of formal models of API protocols.
Unfortunately, discovering the protocols is far from trivial. The
ResultSet class alone can have 33 different abstract states and
approximately 140 methods. Using brute force to identify appropriate
orderings of method calls could perhaps be feasible, but even if we
cared to attempt that, the brute force approach would not be helpful
in identifying multi-object protocols. How would a brute force
approach recognize that Conditions and ReentrantLocks should be
considered together, or that Collection and Iterator should be
considered together?
Problem Statement
It seems clear from the days spent on protocol bugs that programmers
would find it difficult to hand code a formal model of a particular
API's protocol. And with brute force options off the table, it may be
difficult for a programmer to write a program to construct the model
instead. It is therefore desirable to be able to synthesize formal
models of API protocols.
I would like to synthesize deterministic state machines to formally
model API protocols. Each state transition from s1 to s2 would be
associated with the API call that transitions the system from state s1
to state s2. Each state would be associated with the API calls (for
all of the involved classes) that are possible in that state. This
state machine could serve as a guide to programmers attempting to use
an API for the first time, and could also be used in an automatic
checker to identify violations of the protocols.
Spec
Because of the existence of multi-object protocols, it is probably
insufficient to provide the synthesizer with the ability to generate
and run programs. That said, an API is itself the most complete
specification of the API's behavior. If it were feasible, the ideal
would be to use APIs themselves as the only specifications for their
protocols.
Perhaps more realistically, I propose that it may be possible to
provide a body of non-buggy programs as input. The synthesizer could
then make alterations to those programs (and observe the results) to
add constraints. Presumably, this specification would not be written
at the time of use, but would consist of pre-existing API clients
collected for the purpose. These programs would be unlikely to
constitute a complete specification,. Thus, we would allow the
synthesizer to mutate the base programs to construct more input-output
pairs.
> --
>
>