11/5 Reviews - BorderPatrol

Steve Gomez

Nov 5, 2009, 12:16:29 AM

to CSCI2950-u Fall 09 - Brown

Author: Eric Koskinen and John Jannotti
Paper Title: "BorderPatrol: Isolating Events for Black-box Tracing"
Date: In EuroSys '08

BorderPatrol is a system for obtaining traces of systems made up of
black boxes, using library interposition and witnessing to infer
input-output pathways and inter-box pathways. BorderPatrol uses active
tracing, but without relying on application-specific instrumentation.
System designers can therefore use BorderPatrol broadly, provided the
system itself isn't a black box and the authors' assumptions
(honesty, independence, immediacy in black boxes) hold reasonably
well.

This paper could have an important impact as a hands-off (active, but
not app-specific) tracing solution. The authors point out that traces
of causal paths are important to developers for diagnostic reasons,
and computing trace data is generally a difficult task. We've seen
(in "Performance Debugging for Distributed Systems of Black Boxes")
that trace data is useful (to find causal paths), difficult to obtain
passively, and difficult to generalize about when using most active
tracing techniques (which usually involve instrumentation).

The authors call out related work as either trying to simplify the
instrumentation burden (the nature of this, however simplified, is
still a pain) or use statistical methods that introduce some margin of
error. Systems relying on statistical methods for causal inference
include Pip and Whodunit (and BorderPatrol seems to be an enhancement
still in this category). Instrumentation and infrastructural efforts
include X-Trace, Magpie, Pinpoint, and Causeway.

The results seem reproducible with a reasonable amount of work.
Implementation details are mostly vague descriptions, but some
pseudocode is provided.

The performance evaluation is mostly clear. One aspect that isn't
demonstrated is how well BorderPatrol does in systems that don't obey
the assumptions perfectly, like immediacy and independence. For
instance, it would be nice (sort of like a control) to compare
BorderPatrol with Apache or Zeus service, then with a different web
server that breaks BorderPatrol, to see how the system degrades.

I have one general question about the independence assumption: If
alternative paths inside a black-box (processing input, or
simultaneous inputs) are fine under independence as long as they are
'correct' (but not necessarily identical), does that skew the trace
data for applications that review the trace? As an exaggerated
example, if one path through the black-box goes to a cache to receive
a correct value and another fetches from some source, the latency for
the box on an input may vary significantly -- taking the longer (but
still correct) path may look like an error. Maybe knowing something
about the black-box (e.g. generally knowing that there is some caching
going on) makes this a non-issue in real uses.
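
To make the caching concern concrete, here is a minimal sketch in Python. The latencies, the `flag_slow` helper, and the "3x the median" alarm rule are all made up for illustration; nothing here comes from the paper or from BorderPatrol itself. It only shows how a tool that consumes traces could mistake a correct but slow path for an error, and how knowing about the cache would avoid that.

```python
from statistics import median

def flag_slow(latencies_ms, factor=3.0):
    """Return latencies more than `factor` times the median (a naive alarm)."""
    cutoff = factor * median(latencies_ms)
    return [x for x in latencies_ms if x > cutoff]

# Made-up per-request latencies (ms) for one black box.
cache_hits = [2, 3, 2, 3, 2]
cache_misses = [40, 42]

# Pooled: the correct-but-slow cache misses look like errors.
pooled_alarms = flag_slow(cache_hits + cache_misses)

# Split by path: neither group contains outliers.
split_alarms = flag_slow(cache_hits) + flag_slow(cache_misses)

print(pooled_alarms)
print(split_alarms)
```

Under these assumed numbers the pooled check raises alarms only for the cache-miss requests, while the per-path check raises none, which is the "knowing something about the black-box" escape hatch suggested above.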

Dan Rosenberg

Nov 4, 2009, 11:14:57 PM

to brown-cs...@googlegroups.com

Paper Title
BorderPatrol: Isolating Events for Black-box Tracing

Authors
Eric Koskinen, John Jannotti

Date
April, 2008

Novel Idea
The authors present BorderPatrol, a system designed to trace the paths of
requests through loosely-coupled distributed applications composed of several
black-box components.

Main Result
BorderPatrol successfully reconstructs trace information for requests on
several real-world applications, such as Apache, Zeus, and PostgreSQL.

Impact
The techniques presented should provide a good starting point for implementing
tracing functionality in distributed settings.

Evidence
The authors evaluate the performance of BorderPatrol by testing it on live
systems, and determining the accuracy in reconstructing trace information, as
well as assessing overhead compared to un-traced applications. The evaluation
is clearly explained, but especially in assessing the accuracy of the traces,
more detailed and more complex examples may have been more convincing.

Prior Work
The authors expand upon previous tracing solutions, such as Pinpoint, Pip, and
Stardust.

Reproducibility
The project is freely available, its setup is clearly described, and the
experiments performed are relatively straightforward.

Criticism
Is there any way to optimize BorderPatrol so that there's not such a
significant performance hit in exec-bound workloads? Figures are a bit hard to
read, and the trace graphs especially could use additional clarification.

Questions/Ideas for Further Work
Relying on the LD_PRELOAD mechanism restricts BorderPatrol to Linux. How might
it be possible to implement a system like this on Windows (hint: overloading
DLLs on an application-basis is probably not a good idea)? In addition, it
might be desirable to be able to enable and disable tracing (and its associated
overhead) as needed, but dynamically loading a library that overrides libc
doesn't allow that. How could this be designed with an "off" switch that
doesn't require restarting the applications?
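
One possible shape for such an "off" switch, sketched in Python rather than as a real LD_PRELOAD shim: the wrapper stays installed for the life of the process, and only the logging is gated behind a flag that can be flipped at runtime. The names `tracing_enabled`, `interpose`, `trace_log`, and the stand-in `send` function are hypothetical and are not part of BorderPatrol.

```python
import threading

# Hypothetical runtime switch; a real shim might flip it from a signal
# handler or a shared-memory flag (an assumption, not from the paper).
tracing_enabled = threading.Event()
trace_log = []

def interpose(real_func):
    """Wrap real_func so that calls are logged only while tracing is on."""
    def shim(*args, **kwargs):
        if tracing_enabled.is_set():
            trace_log.append((real_func.__name__, args))
        return real_func(*args, **kwargs)
    return shim

def send(data):
    """Stand-in for a wrapped library call such as write()."""
    return len(data)

send = interpose(send)

send(b"a")                 # tracing off: nothing is logged
tracing_enabled.set()
send(b"bc")                # tracing on: this call is logged
tracing_enabled.clear()
send(b"def")               # tracing off again

print(trace_log)
```

The shim still costs one flag check per call when tracing is off, so this would reduce the overhead rather than remove it entirely.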

Andrew Ferguson

Nov 4, 2009, 11:38:35 PM

to brown-cs...@googlegroups.com

Paper Title
"BorderPatrol: isolating events for black-box tracing"

Authors
Koskinen, Eric and Jannotti, John

Date
EuroSys 2008

Novel Idea
The novel idea of this work is that it does exact tracing of requests
by interposing a shim between library calls so that a collection of
protocol processors can log the message stream in a coordinated
fashion. By carefully managing the flow of messages in the system,
BorderPatrol can trace the system exactly. BorderPatrol also makes a
number of reasonable assumptions about how the black boxes are
programmed, namely that they are programmed using common software
design patterns.

Main Results
The BorderPatrol system and collection of protocol processors (HTTP,
FastCGI, PostgreSQL, X11, DNS, etc.) are presented.

Impact
Unknown.

Evidence
The authors trace the website dearinter.net using BorderPatrol, then
add confessional debugging to the software stack in order to confirm
BorderPatrol's traces. They also show that BorderPatrol can trace
closed-source software such as the Zeus webserver; while they can't
prove that the Zeus trace is 100% accurate, it does produce reasonable
results.

Prior Work
BorderPatrol builds upon prior work in tracing such as Pip, Magpie,
Traceback, and Project5 from HP Labs.

Competitive Work
PinPoint, Causeway, and X-Trace can be considered the competitive
works because they are also pervasive frameworks which are producing
traces, although not always through several layers.

Reproducibility
Yes! The BorderPatrol system is publicly available.

Question/Criticism
What can be done about the performance hits? Some of those are quite
heavy. Also, in the left-most benchmark in Figure 11, was BorderPatrol
also writing to the same disk? If so, that might have skewed the test
results.

Ideas for further work
None.

Kevin Tierney

Nov 4, 2009, 10:48:13 PM

to brown-cs...@googlegroups.com

Title: BorderPatrol: Isolating Events for Black-box Tracing

Author(s): Eric Koskinen, John Jannotti
Date: EuroSys 08

Novel Idea
BorderPatrol is a system that is able to obtain request traces through
black-box systems without instrumenting or modifying the underlying
code. Through some protocol specific knowledge combined with wrappers
for standard library functions BorderPatrol is able to reconstruct a
trace through various programs. This is made possible by several
assumptions about the underlying black boxes, that they are honest,
immediate, and independent.

Main Result(s)
The authors find that tracing execution through various programs,
even when no source code is available, is indeed possible. They are
able to trace events through the Zeus webserver, a program with no
public source code available for instrumentation. The overhead
involved seems to be fairly low, although CPU bound workloads do not
seem to scale as well as other loads.

Impact
BorderPatrol seems like it could easily become a standard part of a
debugging toolkit for developers.

Evidence
The authors conduct a case study of dearinter.net and also evaluate a
test computer under "realistic workloads".

Prior Work
Besides approaches requiring access to source code, BorderPatrol
improves upon methods of making traces using statistical methods
(which may have errors).

Reproducibility
Given access to BorderPatrol, yes.

Question
How should BorderPatrol be deployed? It is said to be production
environment ready, but do we really need such a program on all
production servers all the time?

Criticism
A 10-15% overhead does not seem as low as the authors suggest, and
BorderPatrol does increase latency in some workloads (such as
exec-bound workloads) enough that, in systems requiring low latency,
BorderPatrol may only be usable part time or on some servers.

Ideas for further work
No

James Tavares

Nov 4, 2009, 10:45:21 PM

to CSCI2950u

*BorderPatrol*

Paper Title: BorderPatrol: Isolating Events for Black-box Tracing

Author(s): Eric Koskinen, John Jannotti

Date: EuroSys ‘08

Novel Idea: BorderPatrol obtains precise traces of requests as they flow
through black-box system components by actively observing protocol
communications for any protocol for which a Protocol Processor can be
written. Protocol Processors may operate by any means necessary to
intercept communications, but BorderPatrol typically uses library
interposition for general I/O and a kernel module for mmap’d I/O.
BorderPatrol’s novelty is that it is able to obtain traces without
any application-specific instrumentation and with only minimal
effort required in analyzing common protocols.

Main Result(s): The paper describes the BorderPatrol system, provides a
set of assumptions for its correct operation, presents two case studies
for its use, and details a set of experiments which illustrate
BorderPatrol’s low overhead.

Impact: Not sure, but certainly a far more interesting proposal than the
‘Project5’ paper as it provides for *precise* tracing as opposed to some
difficult-to-verify statistical trace.

Evidence: A series of micro-benchmarks is presented to show overhead under
varying workloads. Performance is also evaluated against the two case
studies detailed in the paper. In most cases, latency overhead was
measured to be relatively low (10-20%). These results were quite
impressive considering the invasiveness of BorderPatrol, a system which
essentially monitors all traffic in and out of every fragment of every
application at every layer of a network. On the other hand, there were a
few situations where overhead was measured in the hundreds of percent,
namely for fork/exec workloads and serving static pages in Zeus.

Prior/Competitive Work: The paper details a number of alternative
approaches, including full-blown application instrumentation works such
as Magpie and TraceBack, pervasive frameworks such as Pinpoint, Causeway
and X-Trace, and probabilistic correlation such as the other paper
presented today from HP Labs, and Whodunit. The authors argue that
BorderPatrol provides precise tracing (as opposed to probabilistic
models) without requiring changes to applications or infrastructure (as
required in instrumentation and pervasive frameworks.)

Reproducibility: Much more than most; full source and starter
documentation is provided on the Brown CS website for the BorderPatrol
system itself. I’m not sure of the availability of the log data which
the authors replayed to generate their workload.

Question: 1.) It appears that matching up traces linked across network
nodes may rely on comparing timestamps. Assuming my assertion is
correct, my question is: how well is clock skew tolerated? 2.) Is there
an explanation for why BorderPatrol+Zeus outperformed a pure Zeus
configuration in Figure 14?
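
To illustrate the first question, here is a small Python sketch of why skew would matter if cross-node linking did rely on timestamps. That premise is only the assumption stated above, not something confirmed from the paper, and `match_by_timestamp`, `skewed`, the event names, and the millisecond values are all invented for the example.

```python
def match_by_timestamp(sends, recvs, tolerance_ms):
    """Pair each send with the earliest unmatched receive that falls
    within tolerance_ms after it. Events are (name, timestamp_ms)."""
    unmatched = sorted(recvs, key=lambda r: r[1])
    pairs = {}
    for name, t_send in sorted(sends, key=lambda s: s[1]):
        pairs[name] = None
        for i, (rname, t_recv) in enumerate(unmatched):
            if 0 <= t_recv - t_send <= tolerance_ms:
                pairs[name] = rname
                del unmatched[i]
                break
    return pairs

def skewed(events, skew_ms):
    """Shift every timestamp, as a receiver with an offset clock would."""
    return [(name, t + skew_ms) for name, t in events]

sends = [("s1", 1000), ("s2", 1100)]
recvs = [("r1", 1020), ("r2", 1120)]   # true arrival times

in_sync = match_by_timestamp(sends, recvs, tolerance_ms=100)
off_by_50 = match_by_timestamp(sends, skewed(recvs, -50), tolerance_ms=100)

print(in_sync)
print(off_by_50)
```

With synchronized clocks each send pairs with its own receive. With the receiver's clock 50 ms behind, `s1` is paired with `r2` and `s2` is left unmatched, so in this toy model even a modest skew corrupts the trace.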

Criticism: Unless my printer is to blame, it was difficult to read the
dotted lines on Figure 12.

Future Work: While I think I know the answer already, I think it may be
interesting to look at the performance overhead of typical
application-level instrumentation techniques vs. BorderPatrol.

Rodrigo

Nov 5, 2009, 2:29:22 AM

to CSCI2950-u Fall 09 - Brown

Review for Qiao

BorderPatrol
BorderPatrol uses a tracing technique called "active observation" to
trace a request and construct its causal path. By using protocol
processors, BorderPatrol achieves precise request traces without the
need for application-specific instrumentation.
BorderPatrol relies on the assumptions that the internals of the
black boxes are honest, immediate, and independent.
The honesty assumption says that if an input request and an output
request have the same identifier, they are derived from the same
original request.
The immediacy assumption says that if requests go into a blackbox one
at a time, an input request and the output request that immediately
follows it are derived from the same original request.
The independence assumption says that the output requests of a blackbox
are the same whether the inputs enter concurrently or sequentially.
Protocol processors implement active observation in a modular way.
They are lightweight and highly reusable. BorderPatrol uses
library interposition to pass input to protocol processors before
passing it on to blackboxes. There are two methods for constructing
links between modules: message witness and event isolation.
The case studies show that BorderPatrol works in real-world
environments. The performance impact of BorderPatrol is studied using
micro-benchmarks.
Previous work either needed application-specific instrumentation or
sacrificed trace precision. BorderPatrol presents a model that
understands blackbox distributed systems using protocol-specific
knowledge.
Idea for future work: Design diagnosis tools using BorderPatrol's
results as input.
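
As a rough illustration of the immediacy assumption summarized above, the Python sketch below tags every output of a single blackbox with the id of the most recent input. It is a toy model under that one assumption, not BorderPatrol's implementation; `Event`, `link_by_immediacy`, and the `req-N` ids are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    kind: str                          # "in": enters the box, "out": leaves it
    payload: str
    request_id: Optional[str] = None   # filled in by the tracer

def link_by_immediacy(events):
    """Give each input a fresh id and tag every following output with it.
    Only valid if the box handles one request at a time (immediacy)."""
    current = None
    counter = 0
    for ev in events:
        if ev.kind == "in":
            counter += 1
            current = f"req-{counter}"
        ev.request_id = current
    return events

observed = [
    Event("in", "GET /a"),
    Event("out", "SELECT a"),
    Event("out", "200 OK /a"),
    Event("in", "GET /b"),
    Event("out", "200 OK /b"),
]
ids = [ev.request_id for ev in link_by_immediacy(observed)]
print(ids)
```

If two inputs were in flight inside the box at once, this rule would attribute every output to the later input, which is the case that isolating events is meant to prevent.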

小柯

Nov 5, 2009, 8:50:55 AM

to brown-cs...@googlegroups.com

Paper Title: BorderPatrol: Isolating Events for Black-box Tracing

Authors:        Eric Koskinen
                  John Jannotti

Date:        2008

Novel Idea:
    Given only protocol-related information about a complex distributed system, BorderPatrol can trace requests between components, relying on a few fundamental assumptions that basic software design practices easily satisfy.

Main Result:
    BorderPatrol is implemented and run on many real applications. It is thus shown to be an efficient tool that puts little burden on the existing system.

Impact:
    Monitoring and tracing tools that require no knowledge of the system implementation have become a new trend. Following this trend, BorderPatrol traces communication precisely based on protocol knowledge. In the future, such tools might perform tracing based on some general knowledge, which could help them gain better accuracy.

Evidence:
    The authors first state the assumptions and requirements for BorderPatrol to work, and then detail how the strategy and design are applied. Finally, case studies and an evaluation are presented.

Prior Work:


Competitive work:
    NetMedic

Reproducibility:
    Definitely: most details are listed, and the authors are at Brown.

Question:
    Compared with the other paper read today, the strategy used in BorderPatrol seems to have better accuracy. Is there any concrete comparison between the two? (Both treat system components as black boxes, but with more information provided to BorderPatrol, it is reasonable for it to be more precise.)

Criticism:

Ideas for further work:

Spiros E.

Nov 5, 2009, 7:30:37 AM

to CSCI2950-u Fall 09 - Brown

The paper presents BorderPatrol, a tracing framework for systems
composed of blackbox modules. The framework assumes knowledge of the
protocols that the modules use to communicate with each other. In
addition, it makes several assumptions about the way the blackbox
modules process requests and asserts that the assumptions hold for the
litany of components they wish to trace.

The performance overhead seems to be unacceptably high (10-15%). Add
to this the fact that tracing cannot be turned on or off without
restarting the services involved (clearing LD_PRELOAD), and this does
not seem to be a system that one could deploy in production.
Instead, it seems as though this system would be relegated to the
development environment to help catch bugs during testing, or while
attempting to reproduce bugs that arise in production.

A little birdy told me that lighttpd doesn't fit into this model,
specifically because it violates one of the assumptions of the paper
(immediacy?). Can we discuss this? And can we discuss the broader
applicability of these assumptions?

joeyp

Nov 5, 2009, 9:03:32 AM

to CSCI2950-u Fall 09 - Brown

BorderPatrol: Isolating Events for Black-box Tracing

Koskinen and Jannotti

EuroSys 2008

This paper presents a tool for tracing requests through systems that
contain black box components. It makes a strong distinction between
components' internal behavior and the protocols they use to
communicate. This is an important distinction, and gives a very clear
definition of a black box component as one that adheres strictly to a
protocol while leaving its internals unspecified. Without the
specification of a protocol, a black box effectively black-boxes the
things connected to it, since they don't know what protocol they are
inputting or receiving.

Based on the results of this paper, it is clear that a wide variety of
systems and components can be traced in this fashion. With clear
assumptions about the nature of the components in the system, it is
possible to perform this tracing at the protocol boundaries in the
system. The distinction of protocol from component internals is
important here, because it lets us reason about the way these
components communicate while ignoring internals. This makes the order
of effort in tracing somewhat proportional to the number of protocols
in use rather than the number of different components.

Here's a situation I'm confused about. A common approach to Rails
applications is to have a load-balancer that round-robins requests to
multiple application servers on the same machine. This is because
mongrel, for example, is single-threaded, so running multiple
instances achieves concurrency. Say the load balancer/HTTP server
receives a request and directs it to mongrel A, and this request is
going to take a while. Then it receives another request, for the
login page or something quick, which gets directed to mongrel B. Will
this break the model of immediacy for these two requests? I'm
imagining situations like the A mongrel makes DB queries before and
after the B mongrel makes its requests. Does this fall under the
category of work queues, where the load balancer has an implicit
queueing? Or does BorderPatrol accurately track the connections to
mongrels and therefore not have a problem?
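
The two outcomes in this question can be sketched in Python. The code below is a toy model, not a description of what BorderPatrol actually does; `link_globally`, `link_per_box`, and the event tuples are hypothetical. It contrasts attributing each output to the most recent input seen anywhere with attributing it to the most recent input of the same process.

```python
def link_globally(events):
    """Attribute each output to the most recent input seen on any box."""
    current = None
    attributed = []
    for box, kind, payload in events:
        if kind == "in":
            current = payload
        else:
            attributed.append((payload, current))
    return attributed

def link_per_box(events):
    """Attribute each output to the most recent input of the same box."""
    current = {}
    attributed = []
    for box, kind, payload in events:
        if kind == "in":
            current[box] = payload
        else:
            attributed.append((payload, current.get(box)))
    return attributed

# Interleaving from the scenario above: A is slow, B is quick.
events = [
    ("mongrel-A", "in", "GET /report"),
    ("mongrel-A", "out", "SELECT big_table"),
    ("mongrel-B", "in", "GET /login"),
    ("mongrel-B", "out", "SELECT users"),
    ("mongrel-A", "out", "SELECT totals"),
]
global_view = link_globally(events)
per_box_view = link_per_box(events)

print(global_view[-1])
print(per_box_view[-1])
```

In the global view the late query from mongrel A is charged to the login request; tracked per process it stays with the report request. Since each mongrel is single-threaded, immediacy would hold within each instance even though it fails for the group as a whole.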

As far as further work goes, it would be interesting to see what
happens at the boundary between a black box tracing system and an
instrumented tracing system. What extra information can the black box
system leverage at the boundary, and what can it provide to a system
based on instrumentation that couldn't otherwise get any useful
information from the uninstrumented components?
