*X-Trace*
Paper Title: X-Trace: A Pervasive Network Tracing Framework
Author(s): Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker,
Ion Stoica
Date: NSDI '07, April 2007.
Novel Idea: X-Trace presents a framework that collects traces across
both network nodes and network layers, capturing the actual path that a
particular request takes through the network by carrying trace metadata
in-band. Assuming no errors during data collection (which is an
out-of-band process), the precise causal path (a directed graph) that
the original request took can be recovered.
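To make the in-band/out-of-band split concrete, here is a minimal sketch of the offline reconstruction step. It assumes each relay point emits a report naming the task, its own operation ID, and the operation ID(s) it was derived from; grouping reports by task ID and joining on those IDs yields the directed graph. The `Report` record, its field names, and `build_task_graph` are hypothetical, not X-Trace's actual report format or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Report:
    """Hypothetical out-of-band report emitted at one relay point."""
    task_id: str                     # shared by every operation in one request
    op_id: str                       # unique ID of this operation
    parent_ids: List[str] = field(default_factory=list)  # causal predecessors
    label: str = ""                  # human-readable description of the hop


def build_task_graph(reports, task_id):
    """Return an adjacency map op_id -> sorted child op_ids for one task."""
    graph = defaultdict(list)
    for r in reports:
        if r.task_id != task_id:
            continue  # reports belonging to other requests are ignored
        graph.setdefault(r.op_id, [])  # ensure leaf operations get a node
        for parent in r.parent_ids:
            # A parent whose own report was lost still gets a node here.
            graph[parent].append(r.op_id)
    return {op: sorted(children) for op, children in graph.items()}


reports = [
    Report("T1", "a", [], "browser"),
    Report("T1", "b", ["a"], "HTTP proxy"),
    Report("T1", "c", ["b"], "TCP below the proxy"),
    Report("T1", "d", ["b"], "web server"),
    Report("T2", "x", [], "unrelated request"),
]
print(build_task_graph(reports, "T1"))
# -> {'a': ['b'], 'b': ['c', 'd'], 'c': [], 'd': []}
```

A lost report shows up as a node with children but no label, which is one way the "assuming no errors during data collection" caveat becomes visible in practice.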
Main Result(s): The paper lays out a set of design principles (trace
data in-band, report data out-of-band), discusses the format of X-Trace
messages and the operations performed at each relay point, presents
performance data (see "Evidence"), and outlines a number of typical
usage scenarios including a web request and DNS lookup, a web hosting
service, and an overlay network (I3).
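The two propagation operations performed at each relay point can be sketched as follows. This assumes the metadata holds a task ID, the current operation ID, a parent operation ID, and an edge type, and that every hop draws a fresh random operation ID; the class, the function names `push_next`/`push_down`, and the 4-byte ID size are illustrative assumptions rather than the reference implementation's API.

```python
import os
from dataclasses import dataclass, replace
from typing import Optional

NEXT, DOWN = "NEXT", "DOWN"  # edge types: same layer vs. layer below


@dataclass(frozen=True)
class XTraceMetadata:
    """Illustrative in-band metadata; field names and sizes are assumptions."""
    task_id: bytes               # identifies the whole request; never changes
    op_id: bytes                 # identifies the current operation
    parent_id: Optional[bytes] = None
    edge_type: Optional[str] = None


def _new_op_id() -> bytes:
    return os.urandom(4)  # random 4-byte operation ID (size assumed)


def _derive(md: XTraceMetadata, edge_type: str) -> XTraceMetadata:
    # The child records the current operation as its parent and gets a fresh ID.
    return replace(md, parent_id=md.op_id, op_id=_new_op_id(), edge_type=edge_type)


def push_next(md: XTraceMetadata) -> XTraceMetadata:
    """Metadata for the next hop in the same layer (e.g. proxy -> server)."""
    return _derive(md, NEXT)


def push_down(md: XTraceMetadata) -> XTraceMetadata:
    """Metadata handed to the layer below (e.g. HTTP -> TCP)."""
    return _derive(md, DOWN)


root = XTraceMetadata(task_id=os.urandom(4), op_id=_new_op_id())
at_server = push_next(root)
in_tcp = push_down(at_server)
assert in_tcp.parent_id == at_server.op_id and in_tcp.edge_type == DOWN
assert in_tcp.task_id == root.task_id  # the task ID ties all hops together
```

Keeping the metadata immutable means each layer works on its own copy, so a lower layer cannot overwrite the operation ID that the upper layer will later report.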
Impact: In the years since its release, the X-Trace framework has found
its way into multiple projects, including Coral CDN, Hadoop, Oasis, and
Thrift.
Evidence: The authors use Wikipedia as a motivating example of a
distributed web service with many layers. A failure in a service of this
size and complexity can be difficult to diagnose. As an example, the
authors discuss how a change made to a Wikipedia page may not appear to
the user who made it because of a stale copy served by any one of the
caching layers present in Wikipedia's architecture, and claim that a
pervasive tool such as X-Trace is necessary to pinpoint errors across
multiple layers, protocols, and systems. Some performance information is
provided in section 3.4, including the execution time of pushDown() and
pushNext() (measured at less than a microsecond) and the overhead of an
instrumented Apache server (measured at 15%).
Prior/Competitive Work: The authors cite a number of relevant works,
ranging from complementary systems such as Pip to projects that use
alternative trace collection methods, such as inference techniques that
reconstruct causal paths without explicit metadata. X-Trace differs from
this previous work for the reasons outlined in "Novel Idea" above.
Reproducibility: Getting started with research on X-Trace is easy, since
X-Trace is an open source project available for free download. Exactly
reproducing the authors' performance experiments on Apache would be
difficult without further consultation, as little detail is provided.
Question: 1.) The paper makes clear that X-Trace can be used as a fault
diagnosis tool. How could X-Trace be extended to support performance
monitoring? 2.) The 15% performance degradation on Apache reported in
section 3.4 seems high for operations as simple as pushDown() and
pushNext(). Could the performance hit be attributable to an inefficient
logging mechanism?
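For question 1, one possible direction, sketched under assumptions not taken from the paper: if each out-of-band report also carried a local timestamp, the same causal graph would yield per-edge latencies. The `TimedReport` record and `edge_latencies` function are hypothetical, each report is simplified to a single parent, and edges that cross hosts are only meaningful if the hosts' clocks are synchronized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimedReport:
    """Hypothetical report extended with a local timestamp in microseconds."""
    op_id: str
    parent_id: Optional[str]
    host: str
    timestamp_us: int


def edge_latencies(reports):
    """Map (parent_op, child_op) -> (elapsed_us, same_host).

    Simplification: each report has at most one parent. Edges that cross
    hosts compare two different clocks, so same_host=False marks them as
    unreliable unless the clocks are known to be synchronized.
    """
    by_op = {r.op_id: r for r in reports}
    latencies = {}
    for r in reports:
        parent = by_op.get(r.parent_id) if r.parent_id is not None else None
        if parent is None:
            continue  # root operation, or the parent's report was lost
        elapsed = r.timestamp_us - parent.timestamp_us
        latencies[(parent.op_id, r.op_id)] = (elapsed, parent.host == r.host)
    return latencies


reports = [
    TimedReport("a", None, "proxy", 1_000),
    TimedReport("b", "a", "proxy", 1_250),
    TimedReport("c", "b", "server", 9_000),
]
print(edge_latencies(reports))
# -> {('a', 'b'): (250, True), ('b', 'c'): (7750, False)}
```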
Criticism: As I understood the protocol, N layers would require N
distinct, concurrent X-Trace headers in order to capture the directed
graph information. While each field in the X-Trace header is described
well, a more detailed discussion of the expected message-size overhead
would have been useful. Then again, perhaps this is not a concern, given
that tracing is typically enabled only in the rare case when a problem
actually occurs?
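Under the reading above (one header per instrumented layer), the extra bytes grow linearly with the number of layers. A back-of-the-envelope sketch; the 20-byte header size, the 3-layer stack, and the payload sizes are purely illustrative assumptions, not figures from the paper.

```python
def metadata_overhead(num_layers: int, bytes_per_header: int, payload_bytes: int):
    """Return (extra_bytes, fraction of the message that is trace metadata).

    Assumes one X-Trace header per instrumented layer, as in the reading above.
    """
    extra = num_layers * bytes_per_header
    return extra, extra / (payload_bytes + extra)


# Illustrative numbers only: a hypothetical 20-byte header on 3 layers.
# 1460 B is a typical full-size TCP payload on Ethernet; 40 B stands in
# for a small message such as a short DNS query.
for payload in (1460, 40):
    extra, frac = metadata_overhead(3, 20, payload)
    print(f"payload={payload}B extra={extra}B overhead={frac:.1%}")
# -> payload=1460B extra=60B overhead=3.9%
# -> payload=40B extra=60B overhead=60.0%
```

Because the per-message cost is fixed, the relative overhead is worst for small messages, and DNS lookups are one of the paper's own usage scenarios.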
Future Work:
1.) Perhaps it would be possible to introduce authentication mechanisms
into X-Trace as a means of dealing with security concerns. An
"Authenticated X-Trace" could reject unauthorized X-Trace requests,
helping to prevent the X-Trace platform from being used to propagate
denial-of-service attacks. Of course, the overhead imposed by an
authentication scheme, especially one that depends on public key
cryptography, could far outweigh the overhead of X-Trace itself.
2.) The paper lists scoping as future work; I'm sure this has been
considered, but perhaps a simple TTL field applied to layer depth could
do the trick for dynamic scoping.
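Both future-work ideas above can be sketched together. To sidestep the public-key overhead concern, this sketch swaps in a symmetric-key HMAC (Python's standard `hmac` module) rather than public-key signatures, and adds a TTL that is decremented each time the metadata is pushed down a layer. The shared key, the fixed 4-byte ID layout, the 8-byte truncated tag, and all function names are hypothetical; key distribution is ignored, and a same-layer pushNext would simply carry the TTL unchanged (not shown).

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical key shared by trusted X-Trace nodes; key distribution is out of scope.
SHARED_KEY = b"example-key-not-for-production"


@dataclass(frozen=True)
class ScopedMetadata:
    task_id: bytes    # fixed length (4 bytes here) so concatenation is unambiguous
    op_id: bytes      # fixed length (4 bytes here)
    ttl: int          # how many more layers the trace may descend (0-255)
    tag: bytes        # truncated HMAC-SHA256 over the fields above


def _mac(task_id: bytes, op_id: bytes, ttl: int) -> bytes:
    msg = task_id + op_id + ttl.to_bytes(1, "big")
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).digest()[:8]


def sign(task_id: bytes, op_id: bytes, ttl: int) -> ScopedMetadata:
    return ScopedMetadata(task_id, op_id, ttl, _mac(task_id, op_id, ttl))


def verify(md: ScopedMetadata) -> bool:
    # compare_digest avoids leaking how many tag bytes matched via timing.
    return hmac.compare_digest(md.tag, _mac(md.task_id, md.op_id, md.ttl))


def push_down_scoped(md: ScopedMetadata, new_op_id: bytes) -> Optional[ScopedMetadata]:
    """Return signed metadata for the layer below, or None to stop tracing."""
    if not verify(md):
        return None  # unauthenticated or tampered request: drop trace metadata
    if md.ttl <= 0:
        return None  # scope exhausted: do not trace deeper layers
    return sign(md.task_id, new_op_id, md.ttl - 1)


root = sign(b"\x00\x00\x00\x01", b"\xaa" * 4, ttl=1)
tcp = push_down_scoped(root, b"\xbb" * 4)
assert tcp is not None and tcp.ttl == 0
assert push_down_scoped(tcp, b"\xcc" * 4) is None      # TTL exhausted
forged = ScopedMetadata(root.task_id, root.op_id, 200, root.tag)
assert push_down_scoped(forged, b"\xdd" * 4) is None   # tag no longer matches
```

An HMAC costs a hash computation per hop rather than a public-key operation, which is one way to keep authentication overhead closer to the cost of X-Trace itself; whether it is cheap enough in practice would need measuring.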