Reviews: X-Trace

Rodrigo Fonseca

Nov 8, 2010, 19:23:44

to CSCI2950-u Fall 10 - Brown

Please post your reviews of X-Trace here. Please note that I don't
mind criticism, and your grade won't be affected by negative comments.
Also, this means you don't have to be extra nice, unless you really
like the paper :)

Thanks,
Rodrigo

Dimitar

Nov 8, 2010, 23:36:05

to CSCI2950-u Fall 10 - Brown

X-Trace: A Pervasive Network Tracing Framework
Authors: Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker,
Ion Stoica

Date: April 2007

Novel Idea: There are many diagnostic tools for debugging distributed
components, but most of them are limited to a particular protocol.
These tools are unable to diagnose subtle interactions between
protocols or provide a comprehensive view of the system's behavior.
The paper presents X-Trace, which focuses on tracing multiple
applications, at different network layers, and across administrative
boundaries.

Main Result: X-Trace achieves integrated tracing by inserting X-Trace
metadata into requests, so that all network operations resulting from
a request carry the same identifier. The three main design principles
for X-Trace are:
1) trace requests are sent in-band
2) the collected trace data is sent out-of-band
3) the entity that requests tracing is decoupled from the entity that
receives the trace report
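
The three principles listed above can be illustrated with a minimal sketch. It is not the X-Trace API: the names `Metadata`, `Collector`, `handle_request`, and the `report_to` hint are hypothetical, and the random 4-byte operation ID is an assumption made only for the example.

```python
from dataclasses import dataclass, field
from typing import Optional
import secrets


@dataclass(frozen=True)
class Metadata:
    """Hypothetical in-band trace metadata carried inside each request."""
    task_id: str                     # same for every operation of the task
    op_id: str                       # operation that sent this request
    report_to: Optional[str] = None  # hint chosen by whoever asked for tracing


@dataclass
class Collector:
    """Stand-in for a report store local to one administrative domain."""
    reports: list = field(default_factory=list)

    def submit(self, report: dict) -> None:
        self.reports.append(report)


def handle_request(md: Optional[Metadata], local: Collector,
                   label: str) -> Optional[Metadata]:
    """Handle one request at one node.

    Returns the metadata to embed in the next outgoing request,
    or None if the incoming request was not traced.
    """
    if md is None:
        return None  # principle 1: tracing is requested in-band, per request
    op_id = secrets.token_hex(4)  # assumed: random ID, unique within the task
    # Principle 2: the report leaves the data path and goes to a collector.
    # Principle 3: the node's own domain picks that collector (`local`),
    # regardless of what the requester put in `md.report_to`.
    local.submit({"task": md.task_id, "parent": md.op_id,
                  "op": op_id, "label": label})
    return Metadata(md.task_id, op_id, md.report_to)


# Two nodes in two different domains handle the same traced task.
ad_a, ad_b = Collector(), Collector()
md = Metadata(task_id=secrets.token_hex(4), op_id="root",
              report_to="user.example")
md = handle_request(md, ad_a, "proxy")
md = handle_request(md, ad_b, "web server")
print(len(ad_a.reports), len(ad_b.reports))
```

Each domain ends up holding only its own report; joining them into one trace requires the domains to share what they collected.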

Impact: X-Trace can be useful in detecting problems across multiple
administrative domains and
network protocols

Evidence: The authors provide examples of how X-Trace could be used
to trace Web requests and DNS queries.

Prior Work: Traceroute, Splunk, Hussain et al., and others. The main
difference between them and X-Trace is that the latter traces across
devices and layers.

Reproducibility: I think the work is reproducible because the
implementation and design are clearly
explained.

Criticism: A comparison between X-Trace and other diagnostic tools
would have strengthened the authors' arguments.

Hammurabi Mendes

Nov 8, 2010, 23:36:34

to brown-csci...@googlegroups.com

Paper Title

X-Trace: A Pervasive Network Tracing Framework

Authors

Rodrigo Fonseca, George Porter, Randy Katz, Scott Shenker, Ion Stoica

Date

NSDI'07 - Networked Systems Design and Implementation, April 2007

Novel Idea

Describe and implement a tracing framework that works across different
layers of network protocols providing holistic tracing information.

Main Results

The authors describe and implement X-Trace, a tracing framework that
inserts metadata at various layers of the communication channels, and
is thus able to provide an integrated tracing of events. The framework
respects administrative domain restrictions when generating results.

Impact

The X-Trace framework provides the ability to tackle multiple network
layers and multiple applications in a single "tracing context".

Evidence

The authors start by describing the framework design principles,
justifying some architectural features. The architecture itself,
including aspects such as metadata propagation and issues involving
report generation, is described with sound arguments.

The paper provides some microbenchmarks on the previously mentioned
aspects and some usage scenario analysis, showing appropriate results.
The usage scenario section provides three examples, and the paper
actually discusses how the system "fits" in the sense of whether it
would indeed provide relevant information in each case (including
information that reveals failures). The "fitting" of a multi-layer and
integrated approach to tracing is particularly well discussed in the
third usage scenario (see Questions + Criticism section below).

Prior Work

They mention network tools and instrumentation protocols such as
traceroute and SNMP. They also cite a work by Hussain et al, focused
on network tracing, and also a work by Kompella et al, focused on
tracing state changes (is it?) in different network layers, and not on
the data flow like X-Trace.

Competitive Work

They mention Splunk, but argue that its log-based approach may not
reveal proper event correlation. They also cite Pinpoint,
but they say this system focuses on inferring fault causality by
analyzing a J2EE-based data flow.

Pip allows its users to express how the system should behave, which is
compared to its actual behavior. Magpie correlates information
obtained at various levels and infers event causality. The paper
argues that Magpie is mostly focused on a single system or distributed
systems that are instrumented in a particular manner.

Finally, they mention two projects, AND and Constellation, as projects
that use inference techniques to produce data flow diagrams.

Reproducibility

The microbenchmarks on metadata propagation are reproducible, as is
the report infrastructure testing, since the system is available
online.

The usage scenarios require more work if one decides to reproduce
them. I think some details are left out, but I think the purpose of
the section is to analyze the "fitness" of the framework.

Questions + Criticism

[Criticism] It is a really nice paper (honestly). I liked very much
the fact that the usage scenarios section analyzes the "fitting" of
the system, and how relevant information could be generated under
particular situations (particularly, in the third example, the process
vs host failure cases). When proposing a framework, a technique, this
is the most important metric indeed.

I have, though, a bunch of questions and some comments:

(1) [Question] In sizable tracings, how big is the effect of
collisions in the unique() function? Is it appropriate for these
cases?

(2) [Criticism] I think the packet sniffing application that sends
reports is actually something very interesting, and I think it would
deserve more discussion in the paper. I believe so because it directly
affects the feasibility of applying the system in bigger cases.
[Question] Are other protocols implemented besides IP and TCP?

(3) [Questions + Criticisms] How feasible is it to use X-Trace in a
reasonably big distributed system, changing multiple applications? Is
there any usage scenario evaluation suggesting the framework's
"scalability"?

There are more [Questions] in the following section (it makes more
sense there, again).
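
On question (1) above, the collision risk can be estimated with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N) for n IDs drawn uniformly from a space of size N. The sketch below assumes uniformly random IDs and uses a 32-bit space as its default; both are assumptions for illustration and should be checked against the metadata format in the paper.

```python
import math


def collision_probability(n_ops: int, id_bits: int = 32) -> float:
    """Approximate chance that at least two of n_ops random IDs coincide.

    Birthday approximation; assumes IDs are drawn uniformly at random
    from a space of 2**id_bits values (the 32-bit default is an assumption).
    """
    space = 2.0 ** id_bits
    return -math.expm1(-n_ops * (n_ops - 1) / (2.0 * space))


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} operations in one task: p = {collision_probability(n):.4f}")
```

Under these assumptions the risk is negligible for a task with a thousand operations, about 1% at ten thousand, and more than one half at a hundred thousand, so very large traces would want a wider ID or a collision check.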

Ideas for Further Work

Doing something analogous to the packet sniffing application for
report generation, but now to generate metadata.

The idea is to take input from sockets in the kernel and try to
identify an application-level protocol. If it is a widely known one, say
HTTP, modify the data flow to include X-Trace metadata in the HTTP
header. Also, generate the metadata in the lower levels of the network stack.

[Question] Does it appear viable? (application throughput and latency,
protocol processing inside the kernel, etc)

It is crazy, but could be awesome if we had a big system that
communicated using standard protocols in a system where we can modify
the kernel/runtime system (like the BSDs and Linux).

Visawee

Nov 8, 2010, 21:45:03

to CSCI2950-u Fall 10 - Brown

Paper Title :
X-Trace: A Pervasive Network Tracing Framework

Author(s) :
Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion
Stoica

Date :
4th USENIX Symposium on Networked Systems Design & Implementation
(NSDI’07), April 2007

Novel Idea :
A tracing framework that can trace multiple applications, at different
network layers, and across administrative boundaries.

Main Result(s) :
X-Trace is able to reconstruct a task tree (on X-Trace-enabled
devices) which gives the user a comprehensive view of what network
operations were executed as part of a task. According to the
performance test, X-Trace decreases the system’s throughput by roughly
15%.

Impact :
(1) Helps users in analysing distributed systems in order to find
faults in the system or to improve performance of the system.
(2) Helps users in developing a new algorithm for distributed systems.

Prior Work :
There are a number of works focusing on monitoring network status,
obtaining data from many devices and layers. However, those works were
about trying to obtain snapshots of the system as a whole. X-Trace, on
the other hand, aims to trace the actual paths taken by data messages
across many devices and layers.

Evidence :
The authors deployed and evaluated X-Trace in two concrete scenarios
(1) A web hosting site: X-Trace is able to reconstruct a task tree.
When the authors introduce several faults into the system, X-Trace is
able to help the authors in pinpointing the issues.
(2) An overlay network: X-Trace is able to reconstruct a task tree on
three application layers - SNP, I3, and Chord. Again, the authors
inject several faults into the system, and X-Trace is able to pinpoint
the issues.

Reproducibility :
The results are reproducible. The experiments are well explained.
However, a lot of effort is needed to do so, because many modules
need to be modified to support X-Trace.

Criticism :
This framework would be very powerful if it became a standard
adopted by many parties.


Jake Eakle

Nov 8, 2010, 19:34:38

to brown-csci...@googlegroups.com

Paper Title

X-Trace: A Pervasive Network Tracing Framework

Author(s)	Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion Stoica
Date	2007
Novel Idea	Multi-level, cross-AD tracing via in-band metadata inserted into communication protocols.
Main Result(s)	If all relevant layers and nodes of the network implement X-Trace, it can provide a full, multilayered trace of any request that asks to be traced. Each node and protocol must be extended to recognize incoming messages with piggybacked XTrace metadata, add to that metadata, write local reports, and send them to a database. An XTrace implementation for a node provides the primitives pushNext and pushDown, which propagate the XTrace metadata to the next node in this node's layer and to the next node in a lower layer (if the outgoing request is built on such a layer and it is also XTrace-enabled), respectively. Anyone on an XTrace-enabled network can initiate a trace request by inserting XTrace metadata into some message on the network. However, the reports generated thereby are not necessarily returned to the person who initiated the trace. In fact, if the network crosses ADs (but all nodes still implement XTrace), the owners of all involved ADs' XTrace implementations will receive separate batches of reports.
Impact	Dunno really, but possibly inspired Tracelytics, a local company where I might possibly get an internship this winter maybe?
Evidence	Tests of the reporting framework with ab showed a 15% decrease in system throughput. They give a few simple examples of what an XTrace tree looks like under various failure conditions, and how it can be used to detect the source of issues.
Reproducibility	Not much. No code, bare bones algorithms. They mention the challenges associated with implementing XTrace for protocols with complex message causality semantics, but don't describe any of the strategies they used to overcome them successfully in very great detail.
Question	XTrace metadata contains an extensible options field. Does this mean it only works with protocols that themselves contain such a field? What about protocols with maximum message lengths? What about protocols with no free space at all for XTrace to piggyback on?
Criticism	Especially given the cross-AD nature of XTrace, I feel that the concern over malicious injection of packets with XTrace metadata is understated. The performance section indicates a slowdown of 15% while processing a lot of XTraced messages; if the owner of one AD decides to enable XTrace on machines that communicate with another AD, what is to prevent the other from causing a similar performance hit on the first at will?
Ideas for further work	My instinct is that one of the highest-priority extensions to XTrace is to make it work with a wider variety of non-tree call graph structures. To do this, the modification to a node must be more significant. For instance, to provide XTracing capabilities for a server adjudicating a quorum, the XTrace code must be able to detect that an incoming vote is part of a quorum, generate a report about the outcome of the vote, and attach XTrace metadata to the outgoing message(s) reporting the result: perhaps just the one heading back to the node that originally voted, or perhaps a message reporting the outcome to some other specific node. It's not always clear what semantics are desired -- if the XTraced node is on the losing side of a quorum vote, is it causally connected to the outcome of that vote? Do we want to XTrace the responses to every node that voted? If the vote has many inputs but only one output, what happens when two inputs are both XTraced? There is a lot of work to be done here, and doing it well could potentially lead to tools that help diagnose problems in the confusing call graphs where they are most needed.
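
The pushNext and pushDown primitives described under Main Result(s) above can be sketched as follows. This is a simplified illustration, not the X-Trace library: the field names, the edge labels, and the counter used in place of a random ID generator are all assumptions.

```python
from dataclasses import dataclass
import itertools

# Assumed stand-in for a random per-task unique-ID generator.
_ids = itertools.count(1)


@dataclass(frozen=True)
class XTraceMetadata:
    task_id: int    # shared by every operation in the task
    parent_id: int  # operation that caused this one (0 for the root)
    op_id: int      # this operation
    edge: str       # "ROOT", "NEXT" (same layer) or "DOWN" (lower layer)


def start_task(task_id: int) -> XTraceMetadata:
    """Create metadata for the first operation of a traced task."""
    return XTraceMetadata(task_id, 0, next(_ids), "ROOT")


def push_next(md: XTraceMetadata) -> XTraceMetadata:
    """Metadata for the next hop in the same layer."""
    return XTraceMetadata(md.task_id, md.op_id, next(_ids), "NEXT")


def push_down(md: XTraceMetadata) -> XTraceMetadata:
    """Metadata handed to the layer below for the same message."""
    return XTraceMetadata(md.task_id, md.op_id, next(_ids), "DOWN")


# An HTTP request that is carried over TCP and then forwarded by a proxy.
http_client = start_task(task_id=42)
tcp_segment = push_down(http_client)  # HTTP -> TCP, one layer down
http_proxy = push_next(http_client)   # HTTP client -> HTTP proxy, same layer
for md in (http_client, tcp_segment, http_proxy):
    print(md)
```

Every derived record keeps the task ID and points back at the operation that caused it, which is what lets the reports be joined into a tree later.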

--
A warb degombs the brangy. Your gitch zanks and leils the warb.

Zikai

Nov 8, 2010, 23:16:01

to CSCI2950-u Fall 10 - Brown

Paper Title: X-Trace: A Pervasive Network Tracing Framework

Author(s):
Rodrigo Fonseca George Porter Randy H. Katz Scott Shenker Ion Stoica

(UC Berkeley)

Date/Conference: NSDI 07

Novel Idea: (1) Construct traces of causal paths by inserting metadata
with task identifiers in requests and propagating them along the
causal paths that handle the requests, through the different services
and network layers involved.
(2) Propose a reporting scheme in which the entity that requests
tracing is decoupled from the entity that receives trace reports so
that cross-administrative-domain traces are possible while respecting
policies of each AD.

Main Results: Design, implement and evaluate X-Trace, a cross-layer,
cross-application, cross-administrative-domain tracing framework
designed to reconstruct the user’s task tree.

Evidence: In part 4, the authors implement X-Trace and deploy it in three
different scenarios: a simple web request with recursive DNS queries,
a web hosting site, and an overlay network. The authors intentionally inject
faults into the systems and use the task trees generated by X-Trace to
locate the bugs.

Prior Work: Automate propagation of metadata by instrumenting
concurrency libraries and runtime environments [9, 20, 8]
Libasync[17]
EDNS0 [26]

Competitive Work: Pinpoint [9], Pip[20], Magpie[6], AND and
Constellation projects [4]

Reproducibility: X-trace is available on http://www.x-trace.net/wiki/doku.php.
For extensions to protocols which are not implemented currently, the
paper’s illustrations are detailed enough to come up with a new
implementation. So the system and its evaluation are reproducible.

Question: (1) As the authors point out, when X-Trace is only partially
deployed, the ability to trace those parts of the network is impaired
partly or entirely. Deploying X-Trace in all network layers is
definitely expensive, considering the updates needed for a huge number
of routers and servers. So what is the cost-effective point at which we
can obtain useful information while investing a relatively small amount?
A related question is how we know where to deploy X-Trace, that is,
which routers and servers we should update. Do we need some other
tracing tools to produce coarse traces that tell us where to
deploy?
(2) In the networking world, networking and host failures are
inevitable. How does X-Trace come up with a useful task tree in a
faulty environment?

Criticism: It is interesting to compare X-Trace and Pip, considering
that both systems need modifications to the target systems in
order to produce valid results, and both modifications are
sometimes costly.

X-Trace does not need deep knowledge of the design and implementation of
the target distributed systems as Pip does, because it works on networking
protocols, which are the medium between distributed entities. This
means the two systems are good at handling different types of bugs.
Pip is able to handle both structural and performance bugs, while
X-Trace can only handle the former. Pip can find bugs inside each
distributed host, like the one in FAB, while X-Trace focuses more on the
architecture level. Pip’s results allow authors to identify bugs,
debug, and optimize an existing system, while X-Trace is good only for
identification.

X-Trace is a cross-layer, cross-application, and
cross-administrative-domain framework, while Pip has no notion of
layers and is a single-application, single-domain system.

X-Trace has better extensibility than Pip. Because Pip’s expectations
and code annotations require deep knowledge of the design and
implementation, Pip is hard to apply when multiple applications
come from different vendors and cross different administrative
domains, while X-Trace can be a kind of standard followed by all
networking service providers. Furthermore, if X-Trace includes more
information in its metadata, such as latency, it may handle more and deeper
bugs.


Abhiram Natarajan

Nov 8, 2010, 19:26:36

to CSCI2950-u Fall 10 - Brown

Paper Title: X-Trace: A Pervasive Network Tracing Framework

Author(s): <GUESS WHO!>, George Porter, Randy H Katz, Scott Shenker,
Ion Stoica

Date: 2007, NSDI

Novel Idea: Implementation of a framework that provides a
comprehensive view of service behaviour of a system.

Main Result(s): X-Trace, a tracing framework that provides a
comprehensive view of the systems which adopt it; it works across
diverse layers of operations and applications.

Impact: X-Trace is a framework that provides a comprehensive interface
that allows users to monitor the behaviour of a network.

Evidence: The authors deploy X-Trace in a number of usage scenarios
and provide evidence that X-Trace makes diagnosing the network
simpler.

Prior Work: Traceroute, Splunk, Systems by (a) Hussain et al. (b)
Kompella et al (c) Aguilera et al., ARM, Pinpoint, Pip, Magpie,
Constellation projects, Causeway, SDI

Competitive Work: Once again, the framework is unique in itself and
one does not really need to specifically compare it with other
frameworks. The authors do deploy it in a diverse set of scenarios and
give sufficient evidence of X-Trace's usability.

Reproducibility: With Prof. Fonseca an email away, one could probably
reproduce it in good measure.

Question: Can we see a live instance of it working?

Criticism: As mentioned in the paper itself, the fact that the network
elements need to be meddled with is not ideal. However, I cannot
imagine how one could obviate that necessity that easily. I liked the
fact that they were open in discussion about the framework's
shortcomings.


Joost

Nov 8, 2010, 23:29:03

to CSCI2950-u Fall 10 - Brown

Paper: X-Trace: A Pervasive Network Tracing Framework
Authors: Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker,
and Ion Stoica
Date: NSDI April 2007
Novel Idea: This paper presents a new form of tracing a task from
initial request to completion across the internet. This trace is
completed on all levels of the path which are XTrace enabled.
Main Results: The paper presents a working model that is implementable
across multiple platforms that catalogs the paths and routes of
individual tasks. This is accomplished through sending additional
metadata from each node along the route to a unique collector of the
trace information which can then reconstruct a layered graph of the
traffic. This layered graph can be used to localize the point of
failure when a request fails to complete as expected, and is thus a
useful error log.
Impact: The protocols presented in the paper, and the framework laid
out seem to have been used quite a bit since publication. At first
glance the methods seem to be limited to Flash and opensource run
servers/domains.
Evidence: The authors described in detail the manner of implementation
and ran through a set of defined failures and demonstrated how the
subsequent XTrace graphs correctly assessed the point of failure in
the network.
Prior Work/Competitive Work: Pinpoint, Pip, Cisco Systems’ NetFlow,
Causeway, SDI, just to name a few
Reproducibility: Since most of the paper was the creation of the
architecture of the system, and there were few actual experiments run
on the system, creating a similar structure would be possible.
Question/Criticism: How harmful is an AD's restriction on sending
information to an external recording log when diagnosing an overall
routing failure for, say, an ISP? Or is the fact that the fault can be
traced to a particular domain (even though there is no data from that
domain) enough for the ISP debuggers to conclude that the failure was
not in their domain? Also, there was talk of scoping and sampling, but
given the nature of these requests, in particular when not in a given
domain, how reliable are these metrics for actually determining the
location of a fault in a system, or even catching it for that matter?


Matt Mallozzi

Nov 8, 2010, 21:24:23

to brown-csci...@googlegroups.com

Matt Mallozzi

11/9/10

Title:
X-Trace: A Pervasive Network Tracing Framework

Authors:
Fonseca, Porter, Katz, Shenker, Stoica

Date:
2007

Novel Idea:
Supplementing network protocols at multiple layers of the stack to contain
metadata, which allows each layer at each node in a networked system to
log data about actions that are causally related to the original task. This
works across applications and in multiple administrative domains.

Main Results:
A tracing system that logs relevant data about network task execution to a
location particular to each administrative domain, and through offline
processing constructs a task tree for each top-level request. These task
trees can become richer through the cooperation of multiple administrative
domains, but the decision of how much to share rests in the hands of the
individual domains rather than the person who initiated the traced request.
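
The offline step described above, turning stored reports into one task tree per request, can be sketched as below. The report fields (`task`, `op`, `parent`, `label`) and the function name `build_task_trees` are hypothetical and chosen only for this example; the real report format may differ.

```python
from collections import defaultdict


def build_task_trees(reports):
    """Group reports by task and link each operation to its parent.

    Returns {task: {"children": {op: [child ops]}, "roots": [ops]}}.
    An operation whose parent has no report of its own is listed as a
    root, so a task with several roots hints at a lost report or a hop
    that was not instrumented.
    """
    by_task = defaultdict(list)
    for report in reports:
        by_task[report["task"]].append(report)

    trees = {}
    for task, task_reports in by_task.items():
        known_ops = {r["op"] for r in task_reports}
        children = defaultdict(list)
        roots = []
        for r in task_reports:
            if r["parent"] in known_ops:
                children[r["parent"]].append(r["op"])
            else:
                roots.append(r["op"])
        trees[task] = {"children": dict(children), "roots": roots}
    return trees


# Sample reports for one task; the report for operation "d" never arrived.
sample = [
    {"task": "t1", "op": "a", "parent": None, "label": "HTTP client"},
    {"task": "t1", "op": "b", "parent": "a", "label": "HTTP proxy"},
    {"task": "t1", "op": "c", "parent": "b", "label": "HTTP server"},
    {"task": "t1", "op": "e", "parent": "d", "label": "DB query"},
]
print(build_task_trees(sample))
```

In the sample, the orphaned "DB query" operation shows up as a second root, which is one simple way a gap in the reports becomes visible.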

Impact:
This could have a large impact on how networks and the distributed systems
utilizing them are maintained and debugged. However, the modification of
all communication protocols is very prohibitive, so the impact of X-Trace
will likely be much smaller than it should be.

Evidence:
The authors provide evidence that X-Trace does not introduce an undue
overhead into the system being investigated (about 15% decrease in
throughput, which seems to be fairly standard for tracing/logging in a
network). There is also some evidence provided to support the claim that
X-Trace is useful to aid in debugging.

Prior Work:
The Application Response Measurement (ARM) project uses a similar approach
to X-Trace, but with an aim of diagnosing performance problems in
applications rather than performance problems and faults in multiple network
layers.

Competitive Work:
Many other systems try to evaluate network health and performance as a whole,
while X-Trace focuses on tracing individual request paths through the
network. No other system seems to have nearly the support that X-Trace does
for tracing requests over multiple ADs while allowing administrators of
different ADs to maintain their own privacy and easily cooperate when
desired.

Reproducibility:
The paper contains a good description of the relevant implementation
details, although it would probably be easier for someone with more
networking background to implement. Unfortunately I am on a plane right now,
so I cannot make my usual comment about whether it is open source or not
(unless I remember to check online before turning this in).

Question:
The trace reporting mechanism seems to be dependent on IP, which may be a
protocol that is being debugged by the system (perhaps someone is testing a
new implementation). Does this reliance hurt the results in such a
situation, or could the system operate under the assumption that the version
of IP under scrutiny either successfully sends packets or it doesn't, as in
it does not corrupt anything along the way?

Criticism:
There are a few pieces of the evaluation that could use improvement. First,
the paper mentions the downsides of trace reports being lost in transit, and
that no report loss was observed during testing, but the system/network load
at which reports start to be lost, as well as the maximum report loss rate
before result degradation, are not tested.

Second, the evidence that X-Trace provides useful debugging information is
not as strong as possible, since the diagnosis seems to be undertaken by
those who generated the faults. A better approach would have been to bring
in distributed system administrators and developers and have them diagnose
faults in the system, some with and some without X-Trace. This weakness in
the evidence presentation is known to the authors.

Third, there is no mention of how much space all this reporting data takes
up. If it is stored in a PostgreSQL database, the storage size and the load
on the database server could start to become prohibitive as the system
scales. The sampling and scaling techniques mentioned would help with this,
but it is unknown if this is a problem or how bad such a problem would be.

Ideas For Further Work:
The human side of the evaluation mentioned above. Or a thorough test of how
far the system can scale before trace report loss becomes a problem.

Tom Wall

Nov 8, 2010, 20:31:25

to CSCI2950-u Fall 10 - Brown

X-Trace: A Pervasive Network Tracing Framework

Rodrigo Fonseca George Porter Randy H. Katz Scott Shenker Ion Stoica

NSDI 2007

Novel Idea:
X-Trace tracks requests across protocols and between transport layers.
They do this by embedding trace information in the request, so that
the trace follows exactly along the data path. The resulting trace
data provides an easy way to track the causal relationships between
the various hops in an application request.

Main Result:
They show that X-Trace can provide very detailed tracing information
across layers and protocols, providing a useful resource for
diagnostics and debugging.

Impact:
It is hard to judge how much this will catch on because of the
requirements for using it. However, it does seem like it could be an
invaluable resource if the organization doing the trace has complete
control over the AD(s) involved and their application was designed to
use it.

Evidence:
They provide a few (quite different) scenarios in which X-Trace can
provide detailed information that other tools couldn't offer.

Reproducibility:
They did a good job of explaining how X-Trace works at a high level.
The details for each individual protocol weren't there, but the paper
at least gives an explanation of what is required of a protocol to
support X-Trace. Using these requirements it wouldn't be too
difficult to re-implement a particular protocol.

Prior/Competitive Work:
Many commercial tools provide enterprise-wide tracing across
protocols, but they primarily rely on tools like traceroute and SNMP.
Splunk gathers and centralizes various logs and seems similar to X-Trace's
reporting mechanism. There are numerous other systems that aim to
provide a wide view of a distributed system, though none can really
display the causal relationships between the various hardware and
protocols like X-Trace.

Questions:
The fact that X-Trace requires modifications to software and protocols
might make people reluctant to give X-Trace a try on their system.
Has it been widely adopted?

It is mentioned how ADs have complete control over their own
information and they can expose only what they choose to a report. It
seems like an AD would either not participate at all or disclose
everything it has to offer. When would this not be the case?

Criticism:
The paper provided a good overview, but it did not really address most
of the interesting questions and concerns. I was more interested in an
evaluation, analysis of the impact of ADs beyond one's control, issues
with report generation, basically all the stuff in section 6 :)

Section 3.4 on performance was a little weak. Their test of pushNext()
may be able to handle 1.4 million packets per second, but how many
routers run at 3.2 GHz?
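
The pushNext() figure quoted above can be turned into a per-packet cycle budget and rescaled to a slower processor. The calculation below is only a back-of-the-envelope sketch: it assumes the cost per packet in cycles stays constant across CPUs, and the 400 MHz target is a hypothetical value, not one taken from the paper.

```python
def cycles_per_packet(clock_hz: float, packets_per_second: float) -> float:
    """CPU cycles available per packet at a given clock and packet rate."""
    return clock_hz / packets_per_second


def scaled_rate(measured_pps: float, measured_hz: float,
                target_hz: float) -> float:
    """Packet rate on another CPU, assuming the same cycles per packet."""
    return measured_pps * target_hz / measured_hz


# Figures from the review: 1.4 million packets per second at 3.2 GHz.
budget = cycles_per_packet(3.2e9, 1.4e6)
print(f"about {budget:.0f} cycles per packet")

# Hypothetical 400 MHz router CPU (an assumption for illustration).
slow = scaled_rate(1.4e6, 3.2e9, 400e6)
print(f"about {slow:.0f} packets per second at 400 MHz")
```

Real routers differ in memory bandwidth and instruction set, so the linear scaling is at best a rough lower-fidelity estimate of how the microbenchmark would translate.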

Future Work Ideas:
It was mentioned here and in class project discussions that much work
can be done to improve the reporting infrastructure. One such example
that is mentioned in the paper is that for dynamic configuration of
the scope of a trace. Another big improvement would be to add support
for protocols whose traces tend to form graphs rather than trees.


Siddhartha Jain

Nov 8, 2010, 20:24:07

to brown-csci...@googlegroups.com

Name: Siddhartha Jain

Title:
X-trace

Novel Idea:

Associating X-trace metadata with a unique task identifier and inserting it in a request such that the metadata is propagated to the lower layers through protocol interfaces and through recursive requests triggered by the original one.

Main Results:

The format of X-trace metadata is described in detail. The framework including propagation to lower layers and the task tree construction is described. Providing support for protocols for carrying X-trace metadata is described.

Evidence:

The task trees for a few component failures are shown. A few usage scenarios are discussed. Some analysis of how the network throughput is affected by the extra metadata is given.

Prior Work:
Prior work in traceroute, Splunk. ARM inserts identifiers in transactional protocols as it targets the application layer.

Reproducibility:
Open source! 'Nuff said.

Ideas for Further Work:

How useful would such information be in systems like NetMedic? It would help identify
very well what requests are related and what are not, and it seems it could be very useful
to diagnose problems.


James Chin

Nov 8, 2010, 23:57:27

to CSCI2950-u Fall 10 - Brown

Paper Title: “X-Trace: A Pervasive Network Tracing Framework”

Authors(s): Rodrigo Fonseca, George Porter, Randy H. Katz, Scott
Shenker, Ion Stoica

Date: 2007 (NSDI ‘07)

Novel Idea: This paper presents X-Trace, a tracing framework that
provides a comprehensive view of service behavior for diagnosing
complex systems issues. Current network diagnostic tools only focus
on one particular protocol layer, and the insights they provide on the
application cannot be shared between the user, service, and network
operators. X-Trace is a cross-layer, cross-application tracing
framework designed to reconstruct the user’s task tree. The trace
data generated by X-Trace is published to a reporting infrastructure,
ensuring that different parties can access it in a way that respects
the visibility requirements of network and service operators.

Main Result(s): The authors developed and evaluated X-Trace in two
concrete scenarios: a web hosting site and an overlay network. They
found that with X-Trace, they were able to quickly identify the
location of six injected faults. These faults were chosen because
they are difficult to detect using current diagnostic tools. The
authors also considered the scenario of web requests and recursive DNS
queries.

Impact: X-Trace aims to help users and system administrators find
problems faster than using ordinary tools or problems that they
wouldn’t be able to find otherwise. It enables people to diagnose
complex system problems in ways that haven’t been done before, as it
is truly comprehensive.

Evidence: The authors implemented and described examples and usage
scenarios to provide an indication of the usefulness of X-Trace in
diagnosing and debugging distributed systems. However, the ultimate
measure of success for X-Trace is whether it measurably helps users
and system administrators find problems faster than they could with
ordinary tools, or find problems that they would otherwise miss.
Unfortunately, such an analysis was beyond the authors’ means for this
paper, but they are working on moving in this direction.

Prior Work: There has been much prior work on the study of application
behavior, network monitoring, and request tracking. However, X-Trace
is unique in that it focuses on tracing multiple applications, at
different network layers, and across multiple administrative
boundaries.

Competitive Work: A number of tools focus on monitoring network status
and aggregating data from many devices and layers, including
traceroute and Splunk. Other competitive work includes the
Application Response Measurement (ARM) project, Pinpoint, Pip, Magpie,
the AND and Constellation projects, Causeway, and SDI. However,
X-Trace is different in that it captures causal connections between
requests at different layers, recovers the task trees associated with
multi-layer protocols, and produces deterministic traces of individual
task executions that are useful for examining their individual
characteristics.

Reproducibility: X-Trace is publicly available, so the findings appear
to be reproducible if one follows the testing procedures outlined in
this paper.

Question: What’s the status of this project? Has X-Trace been tested
in industrial applications?

Criticism: Unfortunately, a “real-world” analysis was beyond the
authors’ means for this paper, but they are working on moving in this
direction. Also, I noticed one minor typographical error: in the last
sentence of section 7, “metadaa” should be spelled “metadata.”

Ideas for further work: Deploy and test X-Trace in the industry
somewhere; use the data generated by X-Trace’s instrumented systems
for new and existing algorithms.

Basil Crow

Nov 8, 2010, 22:04:14

to brown-csci...@googlegroups.com

Title: X-Trace: A Pervasive Network Tracing Framework

Authors: Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion Stoica

Date: NSDI 2007

Novel idea: X-Trace is a network tracing framework which traces causal paths between different network layers and administrative domains.

Main results: The authors present instrumentation libraries for applications in C, C++, Ruby, and Java. There is also a daemon process which collects and stores reports, reconstructs the causal paths, and displays visualizations.
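As a rough illustration of the reconstruction step, here is a minimal sketch of how a collector could rebuild a task tree from reports. It assumes each report carries an operation ID, its parent's operation ID, and a label. That report format and the function names `build_task_tree` and `render` are hypothetical, chosen for illustration; they are not the actual X-Trace report schema or daemon API.

```python
from collections import defaultdict


def build_task_tree(reports):
    """Group reports by parent op ID; report arrival order does not matter."""
    labels = {}
    children = defaultdict(list)
    roots = []
    for r in reports:
        labels[r["op"]] = r["label"]
        if r["parent"] is None:
            roots.append(r["op"])          # a report with no parent starts a task
        else:
            children[r["parent"]].append(r["op"])
    return roots, children, labels


def render(op, children, labels, depth=0):
    """Return the subtree rooted at `op` as indented text lines."""
    lines = ["  " * depth + labels[op]]
    for child in sorted(children.get(op, [])):
        lines.extend(render(child, children, labels, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical reports, deliberately out of order.
    sample = [
        {"op": 3, "parent": 1, "label": "HTTP proxy"},
        {"op": 1, "parent": None, "label": "HTTP client"},
        {"op": 2, "parent": 1, "label": "TCP connection"},
    ]
    roots, children, labels = build_task_tree(sample)
    for root in roots:
        print("\n".join(render(root, children, labels)))
```

Because the tree is keyed on IDs rather than arrival order, reports that reach the collector late or out of sequence still land in the right place.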

Impact: X-Trace might enable sysadmins to find problems faster than they would using ordinary tools.

Evidence: The authors deployed X-Trace in a hosted web service and an overlay network, then injected six faults which would normally be difficult to diagnose. They quickly identified each fault using X-Trace.

Prior work: X-Trace builds on prior distributed tracing tools such as Pinpoint and Magpie.

Competitive work: Unlike its predecessors, X-Trace focuses on tracing causality between different network layers and administrative domains. X-Trace is cited as an inspiration for Google's Dapper tracing infrastructure.

Reproducibility: The BSD-licensed source code is available on GitHub.

Question: Section 3.2 mentions a packet sniffing application which sends reports on behalf of services and applications that cannot be modified to include libxtrreport. What are some examples of these services?

Criticism: The authors of Google's Dapper framework mention one inefficiency in X-Trace: "traces are collected not only at node boundaries but also whenever control is passed between different software layers within a node."

Ideas for further work: Almost every web application or distributed system has a basic logging framework baked in, but very few have X-Trace baked in. Lowering the barrier to entry would be helpful to many developers who might benefit from causal tracing.

Sandy Ryza

Nov 9, 2010, 1:13:22

to CSCI2950-u Fall 10 - Brown

Title:
X-Trace: A Pervasive Network Tracing Framework

Authors:
Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion
Stoica

Date:

NSDI '07

Novel Idea:
The authors present an approach to obtaining precise causal traces
through distributed systems. It relies on modifying system
components and protocols to pass metadata along with their messages. It
transmits logging information out-of-band, which, once compiled, can
be used to reconstruct full causal traces or partial traces of the
parts of the system that support X-Trace.
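A minimal sketch of that mechanism, assuming a simplified metadata record of (task ID, operation ID). The paper names its propagation primitives pushNext and pushDown, but the signatures, the `Metadata` class, the `start_task` helper, and the report format below are illustrative assumptions, not the X-Trace library API.

```python
import itertools
from dataclasses import dataclass

_op_ids = itertools.count(1)   # hypothetical source of fresh operation IDs
reports = []                   # stand-in for the out-of-band report collector


@dataclass(frozen=True)
class Metadata:
    """In-band metadata: one task ID for the whole task, a fresh op ID per operation."""
    task_id: str
    op_id: int


def _report(md, parent_op, edge, label):
    # Reports are recorded separately from the data path (out-of-band).
    reports.append({"task": md.task_id, "op": md.op_id,
                    "parent": parent_op, "edge": edge, "label": label})


def start_task(task_id, label):
    """Create the root metadata for a new task."""
    md = Metadata(task_id, next(_op_ids))
    _report(md, None, None, label)
    return md


def push_next(md, label):
    """Propagate metadata to the next operation in the same layer."""
    child = Metadata(md.task_id, next(_op_ids))
    _report(child, md.op_id, "next", label)
    return child


def push_down(md, label):
    """Propagate metadata to the layer below."""
    child = Metadata(md.task_id, next(_op_ids))
    _report(child, md.op_id, "down", label)
    return child


if __name__ == "__main__":
    http = start_task("task-1", "HTTP client")
    tcp = push_down(http, "TCP connection")
    proxy = push_next(http, "HTTP proxy")
    for r in reports:
        print(r)
```

Every operation keeps the same task ID, and each report names its parent operation, so a collector can later join the reports into a tree even if only some components are instrumented.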

Main Result(s):
The authors show that X-Trace is useful in tracing execution and
finding bugs and faults in a variety of different distributed systems
ranging from overlay networks to tunnels. Its limitations primarily
lie in its reliance on modifications to both the applications and
protocols that it traces through.

Evidence:
The authors describe in detail the process of adding X-Trace support
in three different usage scenarios - a basic web request and its
recursive DNS queries, a multi-tiered web hosting site, and an overlay
network. In each one, they introduce faults into the systems and
describe how X-Trace is able to help identify them.

Impact:
At the time of its writing, it was being used by the developers of
DONA, a content-based routing scheme for the Internet. I also heard
someone was trying to start a company around it, but I don't remember
the details.

Prior Work & Competitive Work:
Project5 and NetMedic attempt to use statistical methods to infer
faulty areas and heavily trafficked paths in a distributed
application.

BorderPatrol is application-agnostic, but not protocol-agnostic, and
is able to find certain types of exact causal traces.

In comparison to these, X-Trace is perhaps the most intrusive on
applications of anything we've looked at, but because of that, it is
able to provide the most detailed and complete causal traces. Pip is
also intrusive and likely requires more interaction with and
understanding of higher-level application dynamics to write expectations.

Reproducibility:
The X-Trace code is open source. It might be difficult to reconstruct
the latter two systems it was tested on without further detail.

Criticism:
More attention could have been given to the consequences of node
failures and their interaction with report losses.

Question:
What's the importance of the difference between the causal relations
recorded by pushDown and pushNext?

In practice, has report loss been an issue? If so, how much?

Ideas for further work:
http://en.wikipedia.org/wiki/X_(Trace_Adkins_album)

Duy Nguyen

Nov 8, 2010, 23:20:19

to brown-csci...@googlegroups.com

Title:
X-Trace: A Pervasive Network Tracing Framework

Authors:

Rodrigo Fonseca, George Porter, Randy H. Katz, Scott Shenker, Ion Stoica

Date:
NSDI 2007

Novel Idea:
Inserting tracing data into messages at all layers. These messages are
carried through the protocol stack. One can obtain a comprehensive view of
the whole system by examining the reports that X-Trace-enabled
nodes generate when they receive tracing data.

Main Result(s):
X-Trace instrumentation has been added to many large-scale systems, such as
CoralCDN and Hadoop. In this paper, the authors ran three experiments: a simple
HTTP request, a web hosting site, and an overlay network.

Impact:
X-Trace is becoming popular. The idea is simple but effective, so it can be
adopted by many software vendors as long as they have control of their
source code.

Evidence:
The experiments are described in detail. Two main cases show that X-Trace
instrumentation can be easy in some systems and difficult in others.

Prior Works:
SNMP, traceroute, Pinpoint, Magpie, ...

Competitive works:
From a pure tracing perspective, there are many tools that can be considered
X-Trace's competitors. But in terms of giving a comprehensive view of
large-scale distributed systems, I'm not sure whether any other tools are available.

Reproducibility:
Yes

Question/Criticism:
One may hesitate to instrument one's systems with X-Trace because it requires
changing software and protocols. But I think that's fair if you want reliable
and accurate tracing data.

Shah

Nov 10, 2010, 3:10:39

to CSCI2950-u Fall 10 - Brown

Title:

X-Trace: A Pervasive Network Tracing Framework

Authors:

[1] Rodrigo Fonseca
[2] George Porter
[3] Randy H. Katz
[4] Scott Shenker
[5] Ion Stoica

Source and Date:

4th USENIX Symposium on Networked Systems Design & Implementation,
Cambridge, MA. April 11-13, 2007.

Novel Idea:

The authors present a framework called X-Trace that, unlike other
diagnostic tools, provides a comprehensive view of the entire system -
including layers and applications.

Main Result:

The authors state that, with X-Trace, service operators can meet their
visibility requirements. This is a novel idea that knows no parallels
since it involves looking at several layers and culling information
that can be used among several parties.

Impact:

This paper has been cited a significant number of times (30 or so).
The idea is novel and certainly holds promise for the future.

Evidence:

The scientists provide a fair amount of evidence to back up their
work. Specifically, they describe three scenarios in detail:

[1] Web request and recursive DNS queries

[2] A web hosting site

[3] An overlay network

Further they conduct experiments related to propagation, in
particular, checking the ‘pushnext’ function.

Prior Work:

The authors' idea seems novel. As they mention in detail in Section 6,
there’s a lot of competitive or related work but the idea that X-Trace
puts forth is fresh.

Competitive Work:

In Section 7, the authors cover a large body of competitive work in
detail. They mention ‘traceroute’, SNMP, and NetFlow (by Cisco) as
peer protocols that allow operators to inspect instrumentation data.
They also refer to Splunk as a commercial rival that is used
in IT. Then they list the work of other researchers such as Hussain et
al., Kompella et al., and Aguilera et al. On the products front they
mention such products as Application Response Measurement (ARM),
Pinpoint, Magpie, AND, and Constellation. They conclude by mentioning
Causeway and SDI.

Reproducibility:

The fact that the source code is freely available for downloading
along with the details specified in the paper suggests strongly that
the experiments are reproducible.

Questions:

I’ve meant to ask this before but what does the term ‘shepherd’ mean?
Why do researchers typically thank their ‘shepherds’?

Criticism:

None. Actually, the scientists do a great job of listing out several
shortcomings of X-Trace early in the paper. This is a first among
the papers we’ve covered.

Ideas for Further Work:

Perhaps making X-Trace more popular would be cool. The idea of having
a startup based on this idea is great.

Rodrigo Fonseca

Nov 16, 2010, 18:01:21

to brown-csci...@googlegroups.com

Hi Vinit,

A shepherd is a concept some conferences have, and it works like this: you submit a paper, and it gets accepted. You also get back a bunch of comments from reviewers, with points to improve the paper, things to fix, requests for experiments, etc. The shepherd is a member of the program committee of the conference who is responsible for making sure you follow all of these recommendations and fix the paper. After the acceptance, and before the conference, there is a deadline for the camera-ready version of the paper, which is then published. The shepherd could actually reject the paper if the authors don't do a good job, but I have yet to see that happen (meaning the authors will generally do the best they can to fix the paper).

I hope that clarifies it.

Thanks,
Rodrigo
