*Pip*
Paper Title: Pip: Detecting the Unexpected in Distributed Systems
Author(s): Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey
C. Mogul, Mehul A. Shah, and Amin Vahdat
Date: NSDI '06, May 2006
Novel Idea: Pip lets programmers express their expectations about
application correctness and performance through *recognizers* and
*aggregates*, roughly respectively. A recognizer defines a pattern that
matches a causal path assembled from per-thread views of tasks,
messages, and notices. A system under inspection must be traced at each
node (typically by annotating its source code). An offline system then
reconciles the trace data, reconstructs causal paths, and checks which
paths are valid and which are invalid against the programmer's stated
expectations.
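To make the recognizer/aggregate split concrete, here is a toy Python sketch. It is not Pip's expectations language or API. The names (Recognizer, classify, aggregate_at_least), the (kind, name) event tuples, exact-sequence matching, and the rule that a known-bad pattern wins over an expected one are all hypothetical simplifications. It shows only the idea described above: recognizers label individual reconstructed paths, and aggregates assert something about the set of labeled paths.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical event model (not Pip's): a reconciled causal path is a
# list of (kind, name) events, e.g. ("recv", "ReadReq").
Event = Tuple[str, str]


@dataclass
class Recognizer:
    """Toy stand-in for a Pip recognizer: one exact, ordered event pattern.

    validator=True: a matching path is expected behavior.
    validator=False: a matching path is known-bad behavior.
    """
    name: str
    pattern: List[Event]
    validator: bool = True

    def matches(self, path: List[Event]) -> bool:
        # Exact in-order match only; Pip's real language is far richer.
        return path == self.pattern


def classify(paths: List[List[Event]], recognizers: List[Recognizer]) -> List[str]:
    """Label each path 'invalid', 'valid', or 'unmatched' (unexpected)."""
    labels = []
    for path in paths:
        hits = [r for r in recognizers if r.matches(path)]
        if any(not r.validator for r in hits):
            labels.append("invalid")  # assumption: known-bad patterns take priority
        elif hits:
            labels.append("valid")
        else:
            labels.append("unmatched")
    return labels


def aggregate_at_least(labels: List[str], label: str, minimum: int) -> bool:
    """Toy aggregate: require at least `minimum` paths carrying `label`."""
    return labels.count(label) >= minimum


# Usage: one expected read path and one known-bad path (no reply sent).
read_ok = Recognizer("ReadOK", [("recv", "ReadReq"), ("task", "ReadBlock"), ("send", "Reply")])
read_no_reply = Recognizer("ReadNoReply", [("recv", "ReadReq"), ("task", "ReadBlock")], validator=False)

paths = [
    [("recv", "ReadReq"), ("task", "ReadBlock"), ("send", "Reply")],
    [("recv", "ReadReq"), ("task", "ReadBlock")],
    [("recv", "WriteReq")],
]
labels = classify(paths, [read_ok, read_no_reply])
print(labels)  # ['valid', 'invalid', 'unmatched']
print(aggregate_at_least(labels, "valid", 1))  # True
```

The third path matches no recognizer at all. That "unmatched" bucket is where unexpected behavior surfaces, even when nobody anticipated it with a known-bad pattern.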
Main Result(s): The paper describes the Pip system, provides an overview
of the syntax and features of the expectations language and annotations
library (libannotate), and reviews four case studies where Pip was used
to find bugs in real distributed systems. See "Evidence" for further
discussion of the case studies and performance metrics.
Impact: A simple way to describe system expectations seems like a
powerful tool to me, although it remains to be seen how readily
programmers and organizations would adopt such a system. Is it clear
that the effort of writing "expectations" is justified by the number of
bugs found or avoided?
Evidence: The authors present four case studies to illustrate the
flexibility and success of Pip at detecting bugs in distributed systems.
Profiled systems included FAB (2 bugs found), SplitStream (13 bugs),
Bullet (2 bugs), and RanSub (2 bugs). In all cases the authors used
source code annotations, leaving more fully automatic tracing for future
work.
Prior/Competitive Work: The authors divide related work into two
primary categories: path analysis tools and automated expectation
checking. The former includes Project5 and Magpie, which rely on
statistical inference to detect causal paths, and Pinpoint, which relies
on statistical inference to detect anomalous paths. The latter includes
PSpec and MC, which focus on single nodes only, and Paradyn, which
cannot "express [the] causal path structure of threads, tasks, and
messages."
Reproducibility: The full source code for Pip is available, making it
easy for researchers to examine its effectiveness. Because the case
studies are described only briefly, it may be harder to reproduce them
exactly. For FAB, the authors cite personal communication with one of
FAB's authors. For the other systems, the authors do not state the
versions in which the identified bugs were found.
Question: open-ended criticism: in a real environment, are Pip's
automatic expectation generation features ripe for abuse and misuse? If
"expectations" are not easier to reason about than the original,
well-structured application code, then developers might be tempted to
auto-generate "expectations" without verifying them.
Criticism:
1.) Their direct-annotation approach seems risky: Pip checks the paths
that programmers *claim* the program is executing. I see room for bugs
in the annotations themselves, perhaps something as simple as getting a
sequence number or a "bytes sent" count wrong. For this reason, I prefer
their approach of modifying the middleware.
2.) The distributed systems they describe all seem relatively
straightforward to model in Pip. How does the difficulty of describing
expectations grow as systems get larger and more complex? (The effort
may not grow linearly, given the increasing likelihood of getting the
*expectation itself* wrong.)
Future Work:
1.) From a software engineering perspective, it would be interesting to
see how much of Pip's "expectations" could be derived from UML behavior
models. Perhaps it is possible to rely entirely on such architectural
models, without the need to define a new language.
2.) As syntactic sugar, the authors should consider an async {} block as
a means of expressing that tasks, messages, and notices within a single
async block may occur in any order, but must all occur by the end of
the block.
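The proposed async {} semantics can be sketched as a small matcher. This is a hypothetical Python illustration, not part of Pip. The Async class and matches function are invented names, and events are plain strings. The sketch also assumes that no unrelated events are interleaved inside a group, which a real design would have to handle.

```python
from collections import Counter


class Async:
    """Hypothetical pattern item: events that may occur in any order,
    but must all occur before the pattern moves past this group."""

    def __init__(self, *events: str):
        self.events = Counter(events)
        self.size = len(events)


def matches(pattern, trace):
    """Return True if `trace` (a list of event names) matches `pattern`.

    Plain strings must appear in order. An Async group must be matched by
    the next len(group) events, in any order. Unrelated events may not be
    interleaved (a simplifying assumption of this sketch).
    """
    i = 0
    for item in pattern:
        if isinstance(item, Async):
            window = trace[i:i + item.size]
            if Counter(window) != item.events:
                return False  # a grouped event is missing or out of bounds
            i += item.size
        else:
            if i >= len(trace) or trace[i] != item:
                return False
            i += 1
    return i == len(trace)  # reject trailing, unexpected events


pattern = ["recv:Req", Async("send:A", "send:B", "send:C"), "send:Done"]
print(matches(pattern, ["recv:Req", "send:C", "send:A", "send:B", "send:Done"]))  # True
print(matches(pattern, ["recv:Req", "send:A", "send:B", "send:Done", "send:C"]))  # False
```

The second trace fails because "send:C" arrives after the block has closed, which is exactly the "must all occur by the end of the block" condition.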