Reviews: Pip


Rodrigo Fonseca

Nov 8, 2010, 7:24:10 PM
to CSCI2950-u Fall 10 - Brown
Please post your reviews of Pip here.

Dimitar

Nov 8, 2010, 10:23:50 PM
to CSCI2950-u Fall 10 - Brown
Pip: Detecting the Unexpected in Distributed Systems
Authors: Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey
C. Mogul, Mehul A. Shah, Amin
Vahdat

Date: May 2006

Novel Idea: The paper presents another tool for debugging distributed
systems. The new approach focuses on detecting discrepancies between the
system's behavior and the programmer's assumptions about that behavior.
Using this approach, the authors are able to detect both structural and
performance problems. The authors also claim that their tool can be used
by a broad range of users, such as developers and system administrators.

Main Result: A working implementation of the debugging tool, called Pip.
The tool allows programmers to write descriptions of the expected
behavior of the system in an expectations language. The behavioral model
takes two forms: recognizers, which validate paths, and aggregates, which
assert properties over sets of paths. Pip also provides a GUI so
programmers can visualize the causal and communication structure of
their system.
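To make the recognizer/aggregate distinction concrete, here is a toy
Python sketch (illustrative only; this is not Pip's actual expectation
language, and the event names are made up):

# Toy illustration: a "recognizer" validates the structure of a single
# path, and an "aggregate" asserts a property over the set of paths the
# recognizer matched.

def recognizer(path):
    """Accept a path only if it is: recv request -> process -> send reply."""
    expected = ["RECV_REQUEST", "PROCESS", "SEND_REPLY"]
    return [event["type"] for event in path] == expected

def aggregate(matched_paths):
    """Assert a property over all matched paths: at least one exists and
    none takes longer than 50 ms end to end."""
    assert len(matched_paths) >= 1, "expected at least one request path"
    for path in matched_paths:
        latency = path[-1]["timestamp"] - path[0]["timestamp"]
        assert latency <= 0.050, "path exceeded 50 ms latency budget"

paths = [
    [{"type": "RECV_REQUEST", "timestamp": 0.000},
     {"type": "PROCESS",      "timestamp": 0.010},
     {"type": "SEND_REPLY",   "timestamp": 0.020}],
]
matched = [p for p in paths if recognizer(p)]
unexpected = [p for p in paths if not recognizer(p)]   # what the GUI would surface
aggregate(matched)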

Impact: Pip provides significantly better bug detection compared to the
tools we have studied so far, but its main limitation is that the source
code for all modules must be available, and changes must be made to it so
the system can take advantage of Pip.

Evidence: The authors clearly demonstrate that the system works by
applying Pip to several distributed systems, such as FAB, SplitStream,
Bullet, and RanSub. In all cases Pip was able to find bugs, ranging from
correctness bugs to performance bugs.

Competitive Work: Project 5 is a similar tool, but unlike Pip it infers
causal paths from black-box network traces. It is also limited to
host-to-host granularity, and the inferred paths are often incorrect.

Reproducibility: I think the results in the tests are reproducible.

Criticism/Question: I think the paper was well written and the results
were convincing, except for the performance section (3.5.2), which was
lacking. How could we infer the overhead from those numbers?

Shah

Nov 8, 2010, 7:35:37 PM
to CSCI2950-u Fall 10 - Brown
Title:

Pip: Detecting the Unexpected in Distributed Systems

Authors:

[1] Patrick Reynolds
[2] Charles Killian
[3] Janet L. Wiener
[4] Jeffrey C. Mogul
[5] Mehul A. Shah
[6] Amin Vahdat

Source and Date:

3rd USENIX Symposium on Networked Systems Design & Implementation, San
Jose, CA. May 8-10, 2006.

Novel Idea:

The authors present a debugging aid for distributed systems called Pip.
The tool, which makes use of a declarative language, allows programmers
to compare actual system behavior with their expectations about that
behavior.

Main Result:

The scientists state that they found various bugs - large and small -
in an array of distributed systems using Pip.

Impact:

This paper has been cited a significant number of times (30?). The idea
of having a tool that specifically targets distributed system
environments seems useful - especially as more and more of these systems
are used.

Evidence:

The authors apply Pip to various distributed systems: FAB, SplitStream,
Bullet, and RanSub. In total, they found 18 bugs and fixed most of them.
They provide a decent amount of evidence.

Prior Work:

In Section 6, the authors mention two categories of debugging
approaches: path analysis tools and automated expectation checking.
They clearly state that Pip is the first to fuse both these
approaches.

Competitive Work:

In the first category mentioned above, the authors list several
examples including Project 5, Magpie and Pinpoint. In the second, they
list PSpec, Meta-level and Paradyn.

Reproducibility:

The source is available for download, so the experiments may be
reproducible. However, since little detail is given about the specific
tests, it would be difficult to find the exact bugs.

Questions:

Since this is a useful tool, has it gained popularity commercially?

Criticism:

Perhaps the authors could've listed the actual cases out in more
detail?

Ideas for Further Work:

Could this be tested on more distributed systems? Maybe even something
majorly commercial?

Duy Nguyen

Nov 8, 2010, 11:21:09 PM
to brown-csci...@googlegroups.com
Paper Title

Pip: Detecting the Unexpected in Distributed Systems

Authors
Reynolds, Patrick, Killian, Charles, Wiener, Janet L., Mogul, Jeffrey 
C., Shah, Mehul A., and Vahdat, Amin

Date
NSDI 2006

Novel Idea
Design a declarative language to describe the expected behaviors of
a system. These expectations are then checked against the actual
behaviors to find bugs.

Main Results
The paper mainly describes the syntax and functions of its declarative
language used to define the expected behaviors. This is definitely the
key point of this paper.

Impact
Unknown.

Evidence
Pip was evaluated on four different systems: FAB, SplitStream, Bullet,
and RanSub; over a dozen interesting bugs were found and described at length.

Prior Work
Path analysis tools: Project 5, Magpie, Pinpoint; expectation checkers:
PSpec, meta-level compilation, and Paradyn.

Competitive Work
Not mentioned.

Reproducibility
Yes

Criticism
Pip, like X-Trace, requires programmers to modify source code to add its
annotations, which makes it difficult to use with closed-source software.

Question
What if the programmers write buggy declarative code?

Sandy Ryza

Nov 8, 2010, 11:44:01 PM
to CSCI2950-u Fall 10 - Brown
Title:
Pip: Detecting the Unexpected in Distributed Systems

Authors:
Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey C. Mogul,
Mehul A. Shah, and Amin Vahdat

Date:
NSDI 2006

Novel Idea:
The authors present Pip, a debugging tool for distributed systems that
checks system behavior, defined in terms of paths consisting of tasks
and messages, against expectations. Their approach relies on annotating
code to log relevant events and validating the logs against expectations
of system behavior written in a declarative expectation language that
they created.
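A rough sketch of what that annotate-then-reconcile flow might look like
(the function names and record format here are hypothetical illustrations,
not Pip's real annotation API):

# Each annotated program emits (path_id, event) records; the records are
# later grouped by path_id offline to reconstruct causal paths, which are
# what the expectation checker validates.

import time
from collections import defaultdict

LOG = []   # in the real tool this would go to per-host trace files / a DB

def annotate(path_id, event_type, name):
    LOG.append({"path": path_id, "type": event_type,
                "name": name, "ts": time.time()})

# Instrumented code would call something like:
annotate("req-42", "TASK_START", "handle_request")
annotate("req-42", "SEND", "lookup_msg")
annotate("req-42", "TASK_END", "handle_request")

def reconcile(log):
    """Group logged events into per-path event lists, ordered by timestamp."""
    paths = defaultdict(list)
    for rec in log:
        paths[rec["path"]].append(rec)
    return {pid: sorted(evs, key=lambda e: e["ts"]) for pid, evs in paths.items()}

paths = reconcile(LOG)   # these path instances are what expectations check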

Main Result(s):
Pip was successful in finding both performance and structural bugs in
the four systems the authors tried it on. Debugging requires full
access to the target system's source code. Depending on the system,
lines of expectation code can range from about 15% to 40% of lines of
code in the system. Compiling logs and checking against expectations
completes within minutes.

Evidence:
The authors applied Pip to four different distributed systems: FAB,
SplitStream, Bullet, and RanSub. They described the effort it took to
put annotations in the source code and write expectations, as well as
how their system helped them find bugs. They also provided statistics
on how many lines of code they had to write, how long Pip took to
analyze results, and how many bugs were found.

Prior Work:
Two categories: path analysis tools and automated expectation checking.
Magpie, Pinpoint, and Project 5, which fall into the former category,
use statistical inference to make guesses about where bugs might be.
PSpec and Paradyn, which fall into the latter category, automatically
check performance expectations.

Competitive Work:
X-Trace and BorderPatrol both gather exact traces using non-inferential
methods. Unlike Pip, BorderPatrol is application-agnostic, but is
consequently unable to tie certain events together or check for
consistency inside a program.

Reproducibility:
Reproducing results would require both the Pip code and the
expectations for the programs they tested it on.

Criticism & Question:
Because of the design of the four programs they tested it on, Pip was
able to generate annotations automatically. Is this something we
could expect in the majority of cases, and, if not, how much extra
work would be required?

Ideas for further work:
Unless I misunderstood how it worked, it seems like their declarative
language could have been improved by adding the ability to define sub-
paths and behaviors for easier expectation-reuse.



James Chin

Nov 8, 2010, 11:58:13 PM
to CSCI2950-u Fall 10 - Brown
Paper Title: “Pip: Detecting the Unexpected in Distributed Systems”

Authors(s): Patrick Reynolds, Charles Killian, Janet L. Wiener,
Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat

Date: 2006 (NSDI ‘06)

Novel Idea: This paper presents Pip, an infrastructure for comparing
actual behavior and expected behavior to expose structural errors and
performance problems in distributed systems. Pip allows programmers
to express, in a declarative language, expectations about the system’s
communications structure, timing, and resource consumption. Pip also
includes system instrumentation and annotation tools to log actual
system behavior, and visualization and query tools for exploring
expected and unexpected behavior.
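As a purely illustrative sketch of the query side (the schema below is
hypothetical; other reviews here only note that Pip stores its traces in
a database such as MySQL), exploring unexpected behavior reduces to
simple queries once each path instance carries a validity flag:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE paths (id TEXT, recognizer TEXT, valid INTEGER, latency REAL)")
db.executemany("INSERT INTO paths VALUES (?, ?, ?, ?)", [
    ("req-1", "ReadRequest", 1, 0.012),
    ("req-2", "ReadRequest", 0, 0.350),   # structurally unexpected and slow
])

# Which paths matched no expectation, and how bad were they?
for row in db.execute(
        "SELECT id, recognizer, latency FROM paths "
        "WHERE valid = 0 ORDER BY latency DESC"):
    print(row)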

Main Result(s): The authors applied Pip to several applications,
generating the instrumentation for all of them automatically and
refining automatically generated expectations. Pip found unexpected
behavior in each application and helped to isolate the causes of poor
performance and incorrect behavior.

Impact: There are three major benefits of Pip. First, expectations
are a simple and flexible way to express system behavior. Second,
automatically checking expectations helps users find bugs that other
approaches would not find or would not find as easily. Finally, the
combination of expectations and visualization helps programmers
explore and learn about unfamiliar systems.

Evidence: The authors applied Pip to several distributed systems,
including FAB, SplitStream, Bullet, and RanSub. They found 18 bugs
and fixed most of them. Some of the bugs they found affected
correctness. For instance, some bugs would result in SplitStream
nodes not receiving data. Other bugs were pure performance
improvements, as they found places to improve read latency in FAB by
15% to 50%. Finally, they found correctness errors in SplitStream and
RanSub that were masked at the expense of performance.

Prior Work: The dominant tool for debugging distributed systems has
remained unchanged for over twenty years: printf to log files. The
programmer analyzes the resulting log files manually or with
application-specific validators written in a scripting or
string-processing language. In the authors' experience, incautious
addition of logging statements generates too many events, effectively
burying the few events that indicate or explain actual bugs.

Competitive Work: This includes path analysis tools like Project 5,
Magpie, and Pinpoint. This also includes tools that do automated
expectation checking, like PSpec, MC, and Paradyn.

Reproducibility: The findings appear to be reproducible if one follows
the testing procedures outlined in this paper and has access to the
code for FAB, SplitStream, Bullet, and RanSub.

Question: Is Pip being used in the industry right now?

Criticism: Only 4 specific distributed systems were tested.

Ideas for further work: Extend Pip to allow parameterized recognizers;
provide a way to constrain similar behavior.



Abhiram Natarajan

Nov 8, 2010, 7:25:43 PM
to brown-csci...@googlegroups.com
Paper Title: Pip: Detecting the Unexpected in Distributed Systems

Author(s): Patrick Reynolds, Charles Killian, Janet L Wiener, Jeffrey C Mogul, Mehul A Shah, Amin Vahdat

Date: 2006, NSDI

Novel Idea: Development of an automated system for checking the behaviour of a distributed system in order to expose structural errors and performance problems.

Main Result(s): Pip automatically checks actual behaviour against expected behaviour and helps programmers visualise the result to discover the causes of any unexpected behaviour.

Impact: Pip is an infrastructure that allows programmers to express, in a declarative language, expectations about the system's communications structure, timing, and resource consumption.

Evidence: The authors applied Pip to several applications, and found the needed expectations easy to write, starting in each case with automatically generated expectations.

Prior Work: There is quite a bit of relevant work in the following realms:
(1) Path analysis tools - Project 5, Magpie, Pinpoint
(2) Automated expectation checking - PSpec, MC, Paradyn & PCL
(3) Domain-specific languages - Estelle, Pi-Calculus, Join-Calculus, P2, Erlang, Mace

Competitive Work: There is no comparative work as such, because it is a unique system. They do apply Pip to applications like FAB, SplitStream, Bullet and RanSub. Pip found unexpected behaviour in each application, and helped to isolate the causes of poor performance and incorrect behaviour.

Reproducibility: Yes, one should be able to do it. They do generate some of their own instrumentation, so reproducing that would be hard. The system on the whole seems reproducible.


Tom Wall

Nov 8, 2010, 8:28:49 PM
to CSCI2950-u Fall 10 - Brown
Pip: Detecting the Unexpected in Distributed Systems
Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey C. Mogul,
Mehul A. Shah and Amin Vahdat
NSDI 2006

Novel Idea:
Pip is another distributed debugging system that attempts to pinpoint
bugs in large applications. It compares application traces against a
set of expectations to determine whether an application is buggy. These
expectations can be generated automatically or specified by the
programmer.

Main Result:
Pip can provide some valuable feedback and is useful for finding
correctness and performance bugs that result from unexpected behavior.

Impact:
It seems quite useful, especially if your application already uses
some of their supported middleware.

Evidence:
They run Pip on a number of distributed systems and find a number of
bugs and potential optimizations.

Reproducibility:
They only talk about the implementation at a high level, but the
source code is available. The evaluation was mostly anecdotal and
isn't easily verifiable.

Prior/Competitive Work:
Project 5, Magpie, and Pinpoint are similar debugging tools which do
causal tracing of distributed systems. Project 5 only reports these
traces, while Magpie and Pinpoint attempt to infer potential problems
based on trace data. Pip differs from these in that it relies on
programmer-specified expected behavior to identify problems rather
than statistical inference. The authors also note similarities to model
checking, though model checking explores all possible traces, which
makes it unsuitable for most distributed applications.

There are also many expectation languages and automatic expectation
checking tools. PSpec and Paradyn are similar but are geared more toward
performance and don't support causal paths. MC does static checking
at compile time and is more for finding the root causes of bugs.

Criticism:
While people expect and tolerate, to a degree, the runtime overhead
associated with a program's debug-mode execution, it is always good to
minimize this overhead. The authors don't spend much time on the
overhead associated with running Pip. A comparison of the run time of an
application with and without Pip would easily show this. I'd imagine
writing all that trace data to the database slows things down. If this
overhead is too large, it might become difficult to write performance
expectations.

Future Work Ideas:
There are a number of ways to enhance the expectation language. The
authors mention things like parameterized expectations and variables.
It also might be nice to test user-defined quantities (i.e., things
other than the built-in metrics of CPU time, context switches, latency,
etc.) in limit statements.
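A minimal sketch of that idea, assuming limits are just (metric, bound)
pairs so user-defined metrics plug in alongside the built-in ones (an
assumed design for illustration, not Pip's implementation):

def latency(path):
    return path[-1]["ts"] - path[0]["ts"]

def bytes_sent(path):                      # a hypothetical user-defined metric
    return sum(e.get("size", 0) for e in path if e["type"] == "SEND")

LIMITS = [
    (latency,    0.100),   # seconds
    (bytes_sent, 64_000),  # bytes
]

def check_limits(path):
    """Return (metric, value, bound) triples for every violated limit."""
    violations = []
    for metric, bound in LIMITS:
        value = metric(path)
        if value > bound:
            violations.append((metric.__name__, value, bound))
    return violations

path = [{"type": "TASK_START", "ts": 0.00},
        {"type": "SEND", "ts": 0.02, "size": 1200},
        {"type": "TASK_END", "ts": 0.05}]
print(check_limits(path))    # [] means the path satisfies all limits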


Siddhartha Jain

Nov 8, 2010, 8:34:29 PM
to brown-csci...@googlegroups.com
Name: Siddhartha Jain

Title: 
Pip

Novel Idea:  
Pip classifies system behavior as valid or invalid and groups behaviors into sets that one can reason about. Instead of using statistical inference, however, it uses path identifiers and input from the programmer, in the form of expected behavior, to check program behavior.

Main Results: 
Pip is described, along with the expectations language that programmers can use to encode their expectations. Pip was run on several distributed systems, such as FAB and Bullet, and bugs related to both performance and correctness were identified.

Evidence: 
Some statistics on the bugs found, the trace duration, etc. are given for the different distributed systems.
 
Prior Work: 
Path analysis tools like PinPoint and Magpie.

Question:
How would Pip perform if it didn't have to also classify behavior as valid or invalid - if that was totally the onus of the sysadmin?

Criticism:
Seems like too much work is required on the part of the programmer. Also, what about anomalous behavior that the programmer couldn't identify beforehand?
 
Reproducibility: 
 
Ideas for Further Work: 
It would be interesting to see comparisons with other bug-finding approaches, like NetMedic.


Matt Mallozzi

Nov 8, 2010, 10:29:24 PM
to brown-csci...@googlegroups.com
Matt Mallozzi
11/9/10

Title:
Pip: Detecting the Unexpected in Distributed Systems
Authors:
Reynolds, Killian, Wiener, Mogul, Shah, Vahdat
Date:
2006
Novel Idea:
Rather than take an unsupervised approach to detect bugs (epitomized in
Magpie, which treats behavioral outliers as possible bugs), Pip allows the
user to compare actual correctness/performance data to expected data.
Main Results:
A debugging system that is a combination of a declarative expected behavior
language, tools to gather data to compare against expected behavior, tools
to visualize discrepancies between observed and expected behavior, and
utilities to automatically generate expectations.
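One way the "automatically generate expectations" utility could plausibly
work (an assumption on my part for illustration, not the authors'
described algorithm) is to collapse observed paths into their distinct
shapes and emit one candidate recognizer per shape, which the programmer
then prunes or tightens by hand:

from collections import Counter

def observed_shapes(paths):
    """Count each distinct sequence of event types seen in the trace."""
    return Counter(tuple(e["type"] for e in p) for p in paths)

def generate_recognizers(paths):
    recognizers = []
    for shape, count in observed_shapes(paths).items():
        def rec(path, shape=shape):
            return tuple(e["type"] for e in path) == shape
        recognizers.append((shape, count, rec))
    return recognizers

# A path that matches none of the generated recognizers is "unexpected"
# and is what the visualization would surface to the programmer.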
Impact:
This could make debugging distributed systems much easier for developers.
Also, it is a very useful tool to help new developers or technical users to
fully understand the system, thus minimizing the learning curve.
Evidence:
Anecdotal evidence about how they were able to fix bugs in various
distributed systems using Pip. Some indication of the overhead caused by
Pip.
Prior Work:
Pip builds on the ideas of gdb and gprof, but for a distributed system
rather than a single node.
Competitive Work:
Similar to Project 5 in that causal paths form an important part of the
system information, but Pip deterministically discovers causal paths through
path identifiers rather than through statistics and inference, more similar
to X-Trace.
Reproducibility:
Open source!
Criticism:
A more detailed analysis of Pip's overhead would have been nice. They
mentioned that a lot of the overhead time came from MySQL, but the overhead
from some storage system cannot be completely eliminated.
Ideas For Further Work:
Extend Pip's visualization tools for comparing expected vs. actual behavior
to compare actual behavior from various code revisions, in order to see in
a very fine-grained way the effect of a diff.


Jake Eakle

Nov 8, 2010, 7:37:19 PM
to brown-csci...@googlegroups.com
Paper Title

Pip: Detecting the Unexpected in Distributed Systems 

Author(s)

Patrick Reynolds, Charles Killian, Janet L. Wiener, 

Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat 

Date 2006
Novel Idea Debug distributed systems by means of an expectation language in which users can express expectations for the semantics and resource utilization of both individual 'path instances' and aggregate system behavior.
Main Result(s) They provide a number of tools to help the developer follow this process from start to finish. 

The first of these is an annotation library. Applications linked against it produce output whenever they send, receive, or process messages. These are then reconciled into path instances offline. This aspect of Pip may be useful when it is important that performance penalties never be incurred, but in a testing environment, a BorderPatrol-like approach seems much preferable, since it makes much stronger guarantees about the causal linkage in the path instances it generates.

The next, and I would argue the heart of Pip's contribution, is an expectation language. This language enables programmers to express expectations about distributed systems in a natural way, hiding some parallelism and providing intuitive primitives, such as futures, for expressing expectations about parallelism that must be reasoned about.  
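My reading of the "future" primitive, written as a toy Python matcher
rather than Pip's actual syntax: a future declares an event that must
eventually appear in the path without fixing exactly where, which hides
benign reordering caused by parallelism.

def matches(path, spec):
    """spec is a list of ("next", type) and ("future", type) steps."""
    types = [e["type"] for e in path]
    pos, futures = 0, []
    for kind, etype in spec:
        if kind == "future":
            futures.append(etype)
            continue
        # "next": skip over events that satisfy an outstanding future
        while pos < len(types) and types[pos] in futures:
            futures.remove(types[pos])
            pos += 1
        if pos >= len(types) or types[pos] != etype:
            return False
        pos += 1
    remaining = types[pos:]
    for etype in futures:            # leftover futures must still occur
        if etype not in remaining:
            return False
        remaining.remove(etype)
    return True

spec = [("next", "RECV"), ("future", "ACK"), ("next", "PROCESS"), ("next", "REPLY")]
path = [{"type": t} for t in ["RECV", "PROCESS", "ACK", "REPLY"]]
print(matches(path, spec))   # True: the ACK is allowed to float later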

They also provide a GUI behavior explorer that displays Pip's findings.
Impact No time, no time! :( Though this probably inspired BorderPatrol to some degree.
Evidence They use Pip on a number of available distributed systems, and find that it reveals a number of bugs, both performance-related and correctness-related.
Prior Work They mention a number of previous inference-based distributed debugging tools, and tacitly assert that Pip is the first non-inference-based such tool, which may well be the case. Their expectation-checking language is built in the tradition of a number of prior tools, but makes the novel contributions of supporting expectations about complex causality in path instances.
Reproducibility They give a nearly complete specification for their expectation language, as well as a brief overview of its implementation. A similar system could be built with some effort.
Question They spend very, very little time discussing the reconciliation of annotations. Given how much effort the authors of BorderPatrol put into obtaining reliable traces, can Pip's reconciliation really be as good as they claim?
Criticism They evaluate Pip by using it on a bunch of software and showing that they were able to find bugs with it. This is exactly the sort of thing too many papers lack, and it's great that they have it, but I feel that they err in the opposite direction. They give brief, somewhat meaningless performance numbers, but don't spend any time talking about the correctness of any of their software. As already noted, they don't talk nearly enough about reconciliation, and they also don't really talk about how they tested their implementation of the language, or the annotation library.
Ideas for further work Integrate with BorderPatrol!

-- 
A warb degombs the brangy. Your gitch zanks and leils the warb.

Visawee

Nov 9, 2010, 4:40:54 AM
to CSCI2950-u Fall 10 - Brown
Paper Title :
Pip: Detecting the Unexpected in Distributed Systems


Author(s) :
Patrick Reynolds, Charles Killian, Janet L. Wiener,
Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat


Date :
In Proc. 3rd Symp. on Networked Systems Design and Implementation
(NSDI), San Jose, CA, May, 2006


Novel Idea :
(1) A system for automatically checking the behavior of a distributed
system against a programmer’s expectations about the system.
(2) Allows programmers to express expectations about the distributed
system’s communications structure, timing, and resource consumption in
a declarative language.


Main Result(s) :
Pip is able to help developers in pinpointing bugs in distributed
systems. It also help developers in finding a way to improve the
performance of the systems.


Impact :
Allows a developer to quickly understand and debug the distributed
systems.


Prior Work :
There are a number of works focusing on monitoring network status,
obtaining data from many devices and layers. However, those works were
about trying to obtain snapshots of the system as a whole. X-Trace, on
the other hand, aims to trace the actual paths taken by data messages
across many devices and layers.


Evidence :
Applying Pip to several distributed systems (FAB, SplitStream, Bullet,
and RanSub), the authors found 18 bugs including both the bugs
regarding correctness and regarding performance.


Reproducibility :
The results are reproducible. The authors explain about the
declarative language grammar and it’s implementation in detail. The
experiments are also explained in detail.


Basil Crow

Nov 8, 2010, 10:02:12 PM
to brown-csci...@googlegroups.com
Title: Pip: Detecting the Unexpected in Distributed Systems

Authors: Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat

Date: NSDI 2006

Novel idea: Writing and then verifying assertions about the behavior of a distributed system (on a level higher than each individual node) will make it easier to test and debug that system.

Main results: The authors describe a language for writing assertions about a large distributed system, as well as a tool to help generate these assertions. They also present a set of tools for verifying that a system meets these assertions.

Impact: This paper might result in more test-driven development of distributed systems.

Evidence: The authors tested the FAB, SplitStream, Bullet, and RanSub systems with Pip and discovered 18 bugs.

Prior work: Pip builds on prior distributed tracing tools such as Pinpoint and Magpie.

Competitive work: As is mentioned in the X-Trace paper, Pip and X-Trace are complementary.

Reproducibility: Source code, Debian packages, and a preconfigured virtual machine are available, greatly lowering the barrier to reproducibility.

Criticism: It's great that the authors provided developers with tools to help them create their assertions. But isn't this a bad habit, tantamount to copying the result of your program into the test-case?

Ideas for further work: The authors of the X-Trace paper believe that some of Pip’s analysis can be performed on X-Trace's task trees; has this been attempted yet?


Joost

Nov 9, 2010, 12:27:58 PM
to CSCI2950-u Fall 10 - Brown
Paper: Pip: Detecting the Unexpected in Distributed Systems
Authors: Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey
C. Mogul, Mehul A. Shah, and Amin Vahdat
Date: NSDI May 2006
Novel Idea: The authors developed an interface through which users can
specify the expected behavior and performance of a system in the
program, and bugs can be discovered by comparing those expectations to
the observed values.
Main Result: The authors succeeded in creating a debugging tool which
conforms to the standards they set above.
Impact: The ideas in the paper seem to have caught on a bit in
distributed systems diagnostics, as evidenced by over a hundred
citations since publication.
Evidence: While most of the paper was dedicated to the architectural
layout and the user interface, the authors did test their program on
four different systems and showed a decent level of diagnostic
success.
Reproducibility: Since the majority of the paper was about design, it
would be possible, using only the paper, to implement a system with
similar structure and an identical interface, given adequate time and
resources.
Criticism/Question: Given that the paper heralds a new method for
detecting bugs and errors in distributed systems, a comparison of
time-to-fix, both in CPU time and in man-hours, against a baseline and
other previously practiced methods would be useful for determining how
useful this system is.

