BorderPatrol: Isolating Events for Black-box Tracing
Author(s)
Eric Koskinen, John Jannotti
Date
EuroSys, April 2008
Novel Idea
Modifying/parsing the input stream in order to provide event isolation
and improve trace correlation.
Main Result(s)
The paper presents BorderPatrol, a system that intercepts library
calls and, by identifying message boundaries with custom-designed
protocol processors (one per protocol rather than per application),
logs protocol events, identifies message "witnesses" (matching
request/response identifiers), and controls the flow of data into each
module so that the module's output can be attributed to the particular
protocol event that produced it (isolating events and improving trace
correlation).
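To make the protocol-processor idea concrete, here is a minimal
Python sketch of what such a processor might do for an HTTP-like
protocol. This is my own illustration, not the authors'
implementation (which interposes on libc calls); the function names
and the choice of the request URI as the witness are assumptions made
for the example.

    # Hypothetical sketch of a protocol processor: find message boundaries in
    # an intercepted byte stream, emit one event per message, and extract a
    # "witness" identifier that lets a later response be matched to its request.

    def split_messages(stream: bytes):
        """Yield complete HTTP-like messages (headers + body) from a byte stream."""
        while stream:
            header_end = stream.find(b"\r\n\r\n")
            if header_end < 0:
                break  # incomplete message; a real processor would buffer and wait
            headers = stream[:header_end].decode("latin-1")
            length = 0
            for line in headers.split("\r\n")[1:]:
                name, _, value = line.partition(":")
                if name.strip().lower() == "content-length":
                    length = int(value.strip())
            body_end = header_end + 4 + length
            yield stream[:body_end]
            stream = stream[body_end:]

    def witness(message: bytes) -> str:
        """Extract an identifier for correlation, e.g. the request URI."""
        first_line = message.split(b"\r\n", 1)[0].decode("latin-1")
        parts = first_line.split(" ")
        return parts[1] if len(parts) > 1 else first_line

    # Two pipelined requests read in a single chunk become two isolated events,
    # each with its own witness, instead of one opaque block of bytes.
    data = (b"GET /a HTTP/1.1\r\nHost: x\r\n\r\n"
            b"GET /b HTTP/1.1\r\nHost: x\r\n\r\n")
    for msg in split_messages(data):
        print("event:", witness(msg))

BorderPatrol additionally holds back the bytes after a boundary until
the module has handled the current event (the "controls the data flow"
part); the sketch shows only the parsing side.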
Impact
The technique improves inference about the causality of message
traces using the ideas described in the previous two sections, and it
is applicable whenever we can intercept the points of the system where
input is delivered. Important: the authors make some assumptions about
the system components (which they call honesty, immediacy, and
independence).
Evidence
In my opinion, they support their argument with four points:
(1) The authors describe how their techniques were implemented, which
provides evidence for the viability of the approach; (2) they describe
the common ways that real-world programs handle multiple concurrent
requests, and explain how BorderPatrol can attribute actions to the
correct originating request in the most common cases; (3) they show in
"micro-benchmarks" that the approach has acceptable overhead for
common workloads (the results are poor on exec-bound workloads, but
they imply that such cases are uncommon); (4) they show 8-16% overhead
in one real case study (note: they report 2% and 96% in two tests of
the other case study, but they discuss the 96% overhead result).
Prior Work + Competitive work
The related work section is extensive. They mention systems that use
application-provided information (Magpie, TraceBack, WebSphere, and
others), as well as systems that require inserting metadata into the
applications to perform the inspection (Causeway, X-Trace).
Reproducibility
I think that their "micro-benchmarks" can be reproduced (they give an
idea of what is necessary, and it appears feasible). The BorderPatrol
system is available online, and the "control" systems they used are
obtainable online. I think the case studies can also be reproduced,
but with much more work (for example, talking to the people
responsible for "dearinter.net" to obtain logs and validate traces).
Interestingly, setting up a custom system of common components would
make reproduction easier, but it would not be "real". :)
Questions + Criticism
Playing with isolation is a good idea, and the assumptions they make
appear reasonable at first sight. I am worried about (1) the task of
modifying all points of input delivery, and (2) the real complexity of
protocol processors.
For (1): as systems become increasingly complex, up to what point does
modifying every input-delivery point of the system remain viable?
For (2): isn't it the case that most black-box components use
"black-box messages" to communicate? Although interoperability implies
standards, the components could still, say, encrypt messages to
provide confidentiality. The protocol processors could still act as a
"man-in-the-middle", but then they no longer look like 100-line
processors.
Given concerns (1) and (2), the matter is viability. Isn't it easier
to provide hints in the application?
There is another question in the following section (I think it makes
more sense to put it there).
Ideas for further work
In view of the previous section, the authors could better detail, in
a follow-up article, their infrastructure for building protocol
processors. Another possibility is using fast-prototyping, library-rich
languages (Python, Ruby) to build the protocol processors; a small
sketch of that idea follows. [Another question] Is there already any
supporting infrastructure for these languages?
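On that last question, a rough illustration (my assumption, not from
the paper): Python's standard library already ships parsers that a
prototype protocol processor could reuse, e.g. email.parser for the
RFC 822-style headers shared by HTTP, SMTP, and IMAP. The X-Request-Id
header below is hypothetical.

    # Hypothetical example: reusing the standard library to parse headers from
    # an intercepted stream instead of writing a parser from scratch.
    from email.parser import BytesHeaderParser

    raw = (b"Host: example.org\r\n"
           b"Content-Length: 0\r\n"
           b"X-Request-Id: 42\r\n"  # hypothetical header used as a witness
           b"\r\n")

    headers = BytesHeaderParser().parsebytes(raw)
    print("witness:", headers["X-Request-Id"])         # -> 42
    print("body length:", headers["Content-Length"])   # -> 0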