General-purpose request context: Baggage, Tracing Plane, and Baggage Buffers


Jonathan Mace

Jan 20, 2017, 5:14:29 PM
to Distributed Tracing Workgroup
Hi all,

Some of you might know me from my work on Pivot Tracing with my advisor Rodrigo Fonseca at Brown.  The purpose of this e-mail is to alert you to our "Tracing Plane" prototype along with "Baggage Buffers", an IDL for generating context interfaces: https://github.com/JonathanMace/tracingplane
This is an ongoing research project, but is at the stage where we can show what we've done and what we're doing.

I've recently been working on the problems of "what should propagated context look like?" and "can different tracing tools use the same context?".  The 'Baggage' abstraction that we introduced in Pivot Tracing heads in this direction, but doesn't really provide a complete solution.  The benefit of truly general-purpose metadata propagation is that you no longer have to re-instrument your systems if you want to try out new tracing tools or if you want to change existing tracing tools.  They can just plug in to the existing instrumented system.  This is especially useful if redeploying all of your services is a non-starter.

Side note on terminology: When I use the term 'Baggage', I'm talking about an overarching, general purpose context that is propagated throughout a system (one per request).  When I talk about contexts specific to a tracing tool, I mean the metadata that that tool specifically wants to have propagated.

What is the tracing plane?
1. The APIs that you use in your system to pass around baggage
2. The underlying data format of the baggage that is passed around
3. The underlying protocol for accessing and manipulating tool-specific contexts within a baggage
4. An IDL, Baggage Buffers, for easily specifying the layout of tool-specific contexts.
5. The accessor APIs (generated by Baggage Buffers) that you use to get and set values in contexts.

The tracing plane has the following features:

1. You only need to instrument your system once.  You do this using an instrumentation API that treats baggage as an opaque object (i.e., you think of it as just a blob, rather than as metadata for a specific tool such as Zipkin headers).
2. You can easily specify new contexts using the Baggage Buffers IDL (similar to protocol buffers).  Baggage and Baggage Buffers are general purpose, so there are a wide variety of simple (integers, bytes, strings) and complex (sets, maps, counters, state-based CRDTs) data types.
3. If you want to deploy a new tool and propagate some new context, you don't need to make changes to any of the components in your system.  Your components will seamlessly propagate the new context of this new tool in the existing baggage instrumentation.
4. If you want to deploy a new tool and many of your components won't be using the tool at all, those components don't need any changes whatsoever; they will blindly (but correctly) propagate the baggage.  For example, suppose you're auditing a database -- you add tags to each request in your front end to specify the user, then at the database you check who is the user.  Intermediate layers do not use or modify the tags, and they will continue working with no updates.
5. Baggage can transparently carry multiple contexts from different tracing tools simultaneously.  If you have several different tracing tools that all want to propagate their own contexts, they coexist without interfering.
6. If you wish to update the specification of a context (e.g., in Baggage Buffers), updates can be incrementally deployed.  Old versions of the code will continue to work with the old context specification, and just ignore & propagate any fields they do not recognize.
7. If components in your system have size restrictions, you can trim baggage.  Later on, you are able to detect if your context used to be present in baggage, but was lost along the way.
8. Propagating context is very cheap, as is inserting or retrieving context values.  I believe it is cheap enough that even resource-constrained components such as middleboxes can participate.
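As a rough illustration of point 1 above, here is a minimal Python sketch of what an opaque-baggage instrumentation API could look like. The header name and base64 encoding are made up for illustration and are not the Tracing Plane's actual wire format; the point is that the service handles baggage as an uninterpreted blob.

```python
import base64

BAGGAGE_HEADER = "X-Baggage"  # hypothetical header name, for illustration only

def extract_baggage(headers: dict) -> bytes:
    """Pull the opaque baggage blob off an inbound request, if present."""
    encoded = headers.get(BAGGAGE_HEADER)
    return base64.b64decode(encoded) if encoded else b""

def inject_baggage(headers: dict, baggage: bytes) -> None:
    """Attach the opaque baggage blob to an outbound request."""
    if baggage:
        headers[BAGGAGE_HEADER] = base64.b64encode(baggage).decode("ascii")

# A service propagates baggage without ever interpreting its contents:
inbound = {BAGGAGE_HEADER: base64.b64encode(b"\x01\x02tool-data").decode("ascii")}
baggage = extract_baggage(inbound)
outbound = {}
inject_baggage(outbound, baggage)
assert extract_baggage(outbound) == b"\x01\x02tool-data"
```

Because the service only ever calls extract and inject, deploying a new tracing tool changes what is inside the blob, not the instrumentation.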

----------------------

A quick example.

Based on some of the discussions about trace context header propagation, the headers might look as follows in Baggage Buffers:

bag TraceContext {
    fixed32 traceId = 1;
    fixed32 spanId = 2;
    TraceOptions options = 3;
}    

bag TraceOptions {
    bool sampled = 1;
    int8 traceLevel = 2;
}


Baggage Buffers will generate the code for this context as well as accessors.  All you need to do is start using it, e.g.:

TraceContext.get()

which accesses the baggage and returns the TraceContext if it exists.

The main focus of the tracing plane is having a carefully designed underlying serialized representation for contexts and baggage.  We separate this into layers called the Atom Layer and the Baggage Layer.  The data representation in the Atom Layer enables baggage merging, serialization, and propagation, without needing to know about the baggage contents.  The data representation in the Baggage Layer enables contexts from different tools to exist side-by-side, enables nested data structures and sets, and enables accessing contexts and fields.
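The actual serialized representation is documented in the repo; purely as an illustration of the Atom Layer idea, merging two baggage instances from concurrent branches can be sketched as a lexicographic merge of ordered atom sequences that keeps shared atoms once, without interpreting what any atom means:

```python
def merge_atoms(a: list, b: list) -> list:
    """Merge two baggage atom sequences lexicographically, emitting
    atoms shared by both sides only once.  A sketch of the idea, not
    the Tracing Plane's actual wire protocol."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1   # shared atom: keep once
        elif a[i] < b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

left  = [b"\x00trace", b"\x01span-A"]   # hypothetical atoms
right = [b"\x00trace", b"\x01span-B"]
assert merge_atoms(left, right) == [b"\x00trace", b"\x01span-A", b"\x01span-B"]
```

Because the merge is deterministic and content-agnostic, a component joining two execution branches can combine their baggage without knowing which tools contributed which atoms.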

----------------------

So, this message was just a quick intro to our work.  It's early days but exciting!  There is more documentation (no tutorials yet) on the GitHub repo: https://github.com/JonathanMace/tracingplane
We will be sharing a preprint paper on the work soon as well.  Hopefully I can give a talk at the February workshop.

Looking forward to discussion on this!

Jon


Adrian Cole

Jan 31, 2017, 2:13:00 AM
to Jonathan Mace, Distributed Tracing Workgroup
Hi, Jon.

Thanks for the overview. This is interesting indeed.

I'm curious which responsibilities one would expect the output of this
to address. For example, I peeked at the repo and it seems in-process
attachment of bags is in-scope of the Baggage apis. I agree with the
general sentiment that this part is laborious and relatively
undifferentiated heavy lifting between libraries that need to
propagate a context in-process.

Case in point: right now, regardless of htrace, opentracing, or any
number of instrumentation (or logging) libraries, each has
incompatible code that performs similar mechanics to deal with the
issue of propagating contexts.

So, since this is a promise to obviate a very real pain point :) I'd
suggest making an example which shows, for example, gRPC, htrace, and
opentracing actually participating in the same layer without
overwriting each other, while still being able to collaborate with
each other. For example, gRPC metadata is wider in scope than tracing,
whereas htrace and opentracing have similar propagation apis.

This sort of thing might make the value more concrete, as even if the
abstract notion is exciting, many of us may have seen failures to
achieve this too often :P

hope this helps,
-A
> --
> You received this message because you are subscribed to the Google Groups
> "Distributed Tracing Workgroup" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to distributed-tra...@googlegroups.com.
> To post to this group, send email to distribut...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/distributed-tracing/e8ee96cb-c2ac-4cb1-ba03-3b77dd9cb11f%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Blaine...@sony.com

Apr 27, 2017, 8:35:31 PM
to Distributed Tracing Workgroup
Hi all,

I am very interested in the idea of Baggage or metadata that should flow through a system along with the functional information.  In our case we are moving to a microservice architecture and have needs for several pieces of data, some of which have been discussed here.

So far we have identified:
General:
  Request ID, for logging and tracing
  Session ID, for logging and state recreation
Metrics:
  Timing (receipt time, response time, ..)
Flags:
  Sampled
  Debug
  Feature on/off

This data needs to be accessible to the tracing, logging, and business code.  As we look at microservices, we noticed that although they deploy independently, several services may collaborate to deliver new functionality to the end user.  The services may not be in direct communication.  In a system as large as the one we manage, we are faced with two choices for rolling out new features: a monolithic deploy of the new feature to a whole new stack, or flagging a subset of messages to enable the new feature once all needed components have been released.  We currently do a large monolithic deploy, and that is becoming unwieldy.  Some of us were looking into OpenZipkin when we realized that feature flags need to be metadata on messages as well.  This allows code paths to be launched dark, acceptance tested, released to a subset of users, and then made generally available.
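A minimal sketch of the feature-flag idea, with a made-up baggage shape (a plain dict carried alongside the request): intermediate services propagate the baggage untouched, and only the services that implement the new code path branch on the flag.

```python
def feature_enabled(baggage: dict, flag: str) -> bool:
    """True if this request was flagged into the new code path."""
    return flag in baggage.get("features", set())

# The front end tags the request; the flag rides along with the
# request and session IDs already being propagated.
baggage = {"request_id": "r-42", "session_id": "s-7",
           "features": {"new-checkout"}}   # hypothetical flag name

# A downstream service branches on the flag; services in between
# never look at it and simply pass the baggage through.
path = "new" if feature_enabled(baggage, "new-checkout") else "old"
assert path == "new"
```

This is what makes dark launches work: components released earlier simply never see the flag set, and components that don't know about it forward it unchanged.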

I know I am jumping in the middle of things, but what can we do to help move this concept forward in an OpenZipkin-compatible way?

Thanks
Blaine

Adrian Cole

Apr 27, 2017, 8:44:41 PM
to Blaine...@sony.com, Distributed Tracing Workgroup
> I know I am jumping in the middle of things, but what can we do to help move this concept forward in an OpenZipkin-compatible way?
I'll respond to this point separately :) Zipkin, the server, has no idea what applications propagate to each other. It only sees the span information. For example, Finagle, the first Zipkin tracer, propagates user identification and deadline info in the same propagated baggage as the tracing header. gRPC similarly propagates its own baggage regardless of whether spans are sent to Zipkin or not. So, basically, to help scope things: I think what you are looking for is something that Zipkin-compatible (and other 3rd-party) tracers are likely to be able to adopt or at least not interfere with. Make sense? I'll respond to the meat of the discussion as well.

Yuri Shkuro

Apr 27, 2017, 9:20:11 PM
to Blaine...@sony.com, Distributed Tracing Workgroup
Hi Blaine,

The problem you're describing is primarily about how you write your microservices to support distributed context propagation. It involves two main pieces: (1) extracting the context from inbound RPC calls and serializing it into outbound ones, and (2) propagating the context in-process. While simple in theory, it requires cooperation of potentially many different frameworks, possibly in different languages. So you'd want an instrumentation standard that can be supported by many frameworks & languages, as well as your own services. The OpenTracing API is one such standard that is supported in many languages and many open source frameworks. http://opentracing.io/
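A minimal Python sketch of the two pieces above, with a made-up header name and format; in practice an instrumentation standard such as OpenTracing defines the real inject/extract APIs, and frameworks define the wire encoding.

```python
import contextvars

# Piece (2): in-process propagation via a context-local slot,
# so the context follows the request across async tasks.
_request_ctx = contextvars.ContextVar("request_context", default="")

# Piece (1): serializing the context across RPC boundaries.
HEADER = "x-request-context"  # hypothetical header name

def extract(headers: dict) -> None:
    """Read the context off an inbound request and make it current."""
    _request_ctx.set(headers.get(HEADER, ""))

def inject(headers: dict) -> None:
    """Write the current context onto an outbound request."""
    headers[HEADER] = _request_ctx.get()

extract({HEADER: "session=abc;sampled=1"})
outbound = {}
inject(outbound)
assert outbound[HEADER] == "session=abc;sampled=1"
```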

The scope of OpenTracing is larger than just context propagation, since it actually includes tracing and building a semantic call graph, but you will probably want that anyway, not just Baggage.

--YS

Adrian Cole

Apr 27, 2017, 9:38:13 PM
to Blaine...@sony.com, Distributed Tracing Workgroup
Hi, Blaine.

I have drawn a diagram below.

Most tracers currently maintain their own context, even if they allow arbitrary data to be inserted. Some tracer apis (like OpenTracing) encourage other systems' data to be placed into this context via the tracer api.

Many frameworks provide a propagated context (e.g. gRPC, Finagle). In these cases, other systems can use a generic api to retrieve and insert data (thus not relying on the tracer api to do that). These usually support arbitrary data, though how it is sent across the wire may or may not be defined.

In rare cases, (Baggage, Finagle), how data is sent across the wire is defined generically, so subsystems do not need to know how to marshal things into headers or whatnot.

Hope this helps!
-A

[inline image: diagram (not preserved)]


Blaine...@sony.com

Apr 28, 2017, 2:26:58 PM
to Distributed Tracing Workgroup
Yuri & Adrian,

Thanks for the info.  Here is what I am bringing back to my teams:

Here is what I have learned.
In the community the nomenclature is as follows:

Zipkin: a service that consumes tracing info
Tracing Library: the code that collects tracing info and sends it to Zipkin; can be independent or part of a larger framework

Brave is a common tracing lib for OpenZipkin

Frameworks can also propagate OpenTracing data, mentioned were gRPC and Finagle

Did I grok that correctly?

Is OpenTracing the preferred Google search term for further research?
Is there a location where I can find a good collection of relevant tools (Brave, OpenZipkin, ...)?
If not, is there a wiki where I can start a list that is regularly used by this community? (Looks like the creation of a FAQ page may be an outstanding issue.)

Thanks for the clarification and help
-Blaine

Yuri Shkuro

Apr 28, 2017, 2:50:25 PM
to Blaine...@sony.com, Distributed Tracing Workgroup
Hi Blaine,

you may find this video helpful https://vimeo.com/177303440, it explains why OpenTracing exists and the problems it solves.

Brave is an instrumentation library for Zipkin, and Zipkin itself is a tracing backend. Zipkin currently has the largest support in open source since it's been around the longest, but if you start instrumenting your code with Brave, for example, you will be tied to Zipkin nomenclature and data format. 

On the other hand, if you instrument your code with OpenTracing API, you can still use Zipkin backend, but also any number of other backends (e.g. we just open sourced Jaeger - http://uber.github.io/jaeger).




Adrian Cole

Apr 28, 2017, 9:46:32 PM
to Yuri Shkuro, Blaine...@sony.com, Distributed Tracing Workgroup
Nice pitch Yuri!

Lock-in is a funny thing. Literally using headers named uber from copyrighted Uber code is not the usual way out of lock-in. I will avoid falling further prey to the trolling, but nice one!

Back to business,

There is in fact work in this space, which is Jon Mace and co on Baggage. This is solving the problem at a lower abstraction than any particular library. With sufficient investment here, customers don't need to care about what version of what open api is being used for tracing. As mentioned before, OpenTracing encourages people to use the tracing api for arbitrary data (also called baggage), so that's a choice.

Beyond the mechanics of adapting libraries, there's no reason except *lack of a wire format* preventing an open ecosystem from working. By open I mean choosing to use whatever library you want, even tools not prefixed "open".

Outside the things mentioned so far, you can also look at wire format posts on this google group, including TraceContext (led by Google, but without a specific feature for generic baggage) and more recently one by Microsoft (which does permit variable headers).

The solution to this problem will end up being one or more libraries, depending on your deployment, and a propagation format that is either implicit or explicit and shared by all of these libraries.

It really doesn't matter if your backend is Zipkin, X-Ray, Jaeger, StackDriver, LightStep, HTrace, or what have you, unless... they are conflated with the runtime. For example, if you deploy to Amazon, their services will have a preferred trace identifier and maybe even a "baggage" format, similarly to other clouds like Google's. If you choose to use tools that don't agree with that format, and the tools cannot or will not change to support it, you might end up with broken traces.

To this point, I think the best thing to do is look at existing art like OpenTracing, commonly used libraries in Zipkin, or heck, Jaeger (if Yuri's pitch worked :p). Personally, I would like to see Baggage solve this, but I admit it is early days on that.

Most importantly, you should categorize your deployment, as this is very much needed to make any of this relevant. Do you need to pass through Amazon or other cloud apis or gateways? Which libraries and frameworks do you primarily use? The right fit often involves off-the-shelf components, custom ones, and even, in some cases, agents or sidecars. Since they all need to agree, knowing the context of who needs to agree narrows the problem.

Hope this helps,
-A


Adrian Cole

Apr 28, 2017, 10:38:46 PM
to Blaine...@sony.com, Distributed Tracing Workgroup
> Is there a location where I can find a good collection of relevant tools? (Brave, OpenZipkin, ....)?
I've carved this out and given folks participating here direct write access. I think that's a good way to give you a jump start on things we've discussed in the past, and also a way for the next person asking to skip ahead!


Please do edit as you see fit ( goes for all )

adrian.f.cole

May 2, 2017, 9:46:37 PM
to Distributed Tracing Workgroup
Hi, Blaine and anyone else interested. It turns out that this feature is needed not just for arbitrary data, but also in any case where the inbound propagation data is wider than the "default" context. While I'm rather excited about Jon's work, at least in Brave we need to do this anyway, or else we will break traces with X-Ray.

Here's a design which is specific to zipkin/brave (java), but obviously welcome if anyone cares to consider it elsewhere https://github.com/openzipkin/brave/issues/390