Trace encoding - traceEv(String/Stack) deferred output always sent at tail of trace file leaving it incomplete until stopped.

chrisst...@gmail.com

unread,

Dec 31, 2016, 11:26:53 AM12/31/16

to golang-nuts

Hello, I wrote (rather derived from the stdlib) a reentrant Parser for the Go trace files, so I could stream partial events as they are returned from runtime.ReadTrace(). After learning about the format and reading the go15trace design doc everything makes sense. I've noticed that runtime.traceEvString/traceEvStack are not written until the trace is shutting down. A hashtable/map is kept for the stack traces and strings and it greatly condenses the trace but does have the caveat that a stack trace is not complete until it is stopped. I can reliably work around getting the traceEvString data by ignoring the seq and correlating the PC with FuncForPC but can't think of a similar work around for traceEvStack. Is there a way to do this?

As a quick hack I added a flush() method to traceStackTable and flush before waiting for trace to be ready to dequeue to get the behavior I want. Would there be any interest of a proper implementation of this? The exact proposal being to send new items in traceStackTable and the string map to the buffer queue so ReadTrace can pick them up as they are first seen. They can still be tracked the same way to prevent duplication of strings/stacks in the trace. Based on my testing and limited understanding of ReadData, it will send pending data in the fullHead traceBufPtr, there is not a threshold of N buffers before writes. But if I'm incorrect may need modified to have runtime.ReadTrace() always flush any available trace data or a runtime(|/trace).FlushTrace() so a user could signal this explicitly. This doesn't affect any of the tooling for traces, they don't have a concept of a footer, just a header and events so they will continue to work as is.

Use cases could be numerous, you could stream events of functions / goroutines as they are created / stopped with line numbers and files in real time for debugging. Basically anything that you get from go tool trace, but (near) real time which may enable some neat visualization tools. What sparked my interest is probably not as interesting to others, but I was issuing tokens to track counts/ordering of functions by wrapping a inner func. The caveat here being function signatures have to match and it doesn't look very clean and has edge cases for reliability and is limited to "was this called and how many times".. but it is a bit simpler to reason with then complicated signaling / sync prims. With a parser for trace files at runtime I can Watch(fn interface{}) any function during a test by taking the func ptr (reflect.ValueOf(fn).Pointer()) and calling FuncForPC. Then using the file/line/name and correlating it to the trace to validate all kinds of additional data points available in trace events. Might be a better way to do this, it's just an experiment but it works.

For now I can simply Start() and Stop() at the beginning and end of each test for my own use case, so if this makes no sense to anyone else, no worries thanks for reading regardless.

-Chris

Tarmigan

unread,

Jan 4, 2017, 1:19:12 AM1/4/17

to chrisst...@gmail.com, golang-nuts, dmi...@stackimpact.com, Dmitry Vyukov

Hi Chris,

I think that the parsing of the runtime trace information like you are
describing is very interesting. I have been thinking about a similar
continuous tracing and automatic anomaly detection for my application,
but have not yet tried to implement anything, so your experience is
helpful. My application has limited connectivity and I am only
interested in anomalous high latencies that occur < 0.01% of the time,
so the existing manual and live-streaming tracing options are not a
good fit for my needs.

Did you see the "StackImpact" post to gonuts from a few months ago?
Their github repo includes an internal package to parse and filter the
trace format. You could also see how they handle the traceEvStack
issue (I have not looked myself).

Online parsing, filtering, and analyzing the trace output seems like a
common desire and enhancement to the standard pprof/trace
functionality (which is great itself). At least 3 of us want to use
the runtime trace output without having to manually run a trace and
define start and stop points for the trace.

Dmitry, would you be open in establishing a stable trace format with
these extra features that Chris is asking about? It would be nice to
have a common package to parse and extract features in realtime from a
streaming trace.

Thanks,
Tarmigan

> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Dmitry Vyukov

unread,

Jan 4, 2017, 4:22:36 AM1/4/17

to Tarmigan, chrisst...@gmail.com, golang-nuts, dmi...@stackimpact.com

Streaming stacks and strings sounds fine to me. I don't see any
potential issues here.

Streaming data in real time (not waiting for an internal buffer to be
full) is somewhat more problematic, but probably doable.

Stable public format is problematic. The format proven to be unstable
over the last releases. However maybe we could just expose parser for
each version without particular guarantees.

If you want anything from this to happen, the first step would be to
file an issue at http://golang.org/issue/new

chrisst...@gmail.com

unread,

Jan 22, 2017, 12:52:40 PM1/22/17

to golang-nuts, tarmiga...@gmail.com, chrisst...@gmail.com, dmi...@stackimpact.com

Hello, thanks for the response, excuse the late reply here got a bit busy over the holidays.

On Wednesday, January 4, 2017 at 2:22:36 AM UTC-7, Dmitry Vyukov wrote:

Streaming stacks and strings sounds fine to me. I don't see any
potential issues here.

Awesome, I think this alone would add some cool possibilities for tooling since you can get a lot of information from stack & event data alone. The other thing is timing- I am curious if there is anyway to emit the frequency occasionally- say the start of the batch, or some other method? I'm still not sure how timing works exactly I was having trouble getting the timing information from a 1.5 event normalized to a 1.7, the request may make no sense all together but if there was a way to be able to make a best-guess at timing information from purely look behind state that would be nice as well.

Streaming data in real time (not waiting for an internal buffer to be
full) is somewhat more problematic, but probably doable.

Now that I understand more about the trace internals, particularly how it batches and is tied to the buffer I understand how this could be difficult. Real time may be an overreach, perhaps we could send a new batch every 250ms for programs we can determine are not very busy with a simple heuristic of some sort. That would help for programs that are not doing much work while you debug them, since you could see the very few events that are happening very close to the time they occur giving some additional sensory to what's happening in your application.

Stable public format is problematic. The format proven to be unstable
over the last releases. However maybe we could just expose parser for
each version without particular guarantees.

I've noticed it's had some changes, the biggest being 1.5 -> 1.7 but other than that I found it wasn't too bad to normalize past events to the latest format. Using the trace code and go tool as reference I was able to write a basic Encoder/Decoder https://github.com/cstockton/go-trace/tree/master/encoding I feel like it wasn't too difficult of an exercise, but I also didn't do the hardest part (all the post processing and "consistency" stuff in the "order.go" state machine.). The changes thus far have been made in a way that I think you can bring forward prior version events to begin correlation on a single structured event aside from the timing which I mentioned above (which I think is just my expertise lacking rather than a technical challenge?).

If you want anything from this to happen, the first step would be to
file an issue at http://golang.org/issue/new

I filed one last night https://github.com/golang/go/issues/18744, thanks again for the response

chrisst...@gmail.com

unread,

Jan 22, 2017, 1:03:36 PM1/22/17

to golang-nuts, chrisst...@gmail.com, dmi...@stackimpact.com, dvy...@google.com, tarmiga...@gmail.com

Hello,

On Tuesday, January 3, 2017 at 11:19:12 PM UTC-7, Tarmigan wrote:

Hi Chris,

I think that the parsing of the runtime trace information like you are
describing is very interesting. I have been thinking about a similar
continuous tracing and automatic anomaly detection for my application,
but have not yet tried to implement anything, so your experience is
helpful. My application has limited connectivity and I am only
interested in anomalous high latencies that occur < 0.01% of the time,
so the existing manual and live-streaming tracing options are not a
good fit for my needs.

Hi there, I agree with you here that it could be helpful! I particularly think that being able to see file / line / function names of Go routines as the start / stop (or very close to when they do) might be able to help build mental imagery to support trouble shooting. You could technically do this now with a sampling of your data and building graphs / tools that treat the sampling as a real time- but it removes the ability for you to give the program inputs. This approach in some cases may tighten the feedback loop when trying to have a high level understanding before you deep dive into the granular debugging with go tool, pprof, fmt.Println(`WHYISTHISHAPPENING`, myvalue) etc :)

Did you see the "StackImpact" post to gonuts from a few months ago?
Their github repo includes an internal package to parse and filter the
trace format. You could also see how they handle the traceEvStack
issue (I have not looked myself).

I just looked, I did see a internal package but it appears to be a copy of the same parser from the go tool. They may have a different one somewhere I didn't see though?

Online parsing, filtering, and analyzing the trace output seems like a
common desire and enhancement to the standard pprof/trace
functionality (which is great itself). At least 3 of us want to use
the runtime trace output without having to manually run a trace and
define start and stop points for the trace.

Cool, well I published the work I did do in another thread, it's not complete or of the same quality you will find in standard library but it does have a bunch of documentation inline and a couple commands that filter/grep events that may give ya some ideas to play with. Thanks for the reply.

Reply all

Reply to author

Forward