Kernel tracing in distributed systems


loic....@gmail.com

Apr 23, 2018, 4:09:20 PM
to OpenTracing
Hi everyone,

I am working in the DORSAL Lab (https://www.dorsal.polymtl.ca/en), which focuses on understanding the performance of complex systems using tracing. The lab has been developing the LTTng kernel and user-space tracer, which is now supported by EfficiOS. Analyzing kernel traces has proved very useful for understanding complex contention problems, virtual machine boot-up times, and virtual machine performance. Of course, the problem with the current approach taken by LTTng is that trace analysis is performed offline, which means the integration with monitoring tools or dashboards is poor.

I was thinking of working on integrating LTTng kernel traces into other tracing or monitoring tools so as to add useful insight into the behavior of distributed systems based on microservices and containers. From what I understand of the OpenTracing ecosystem, the tools currently available make it possible to detect complex performance problems, but not always to explain them. Adding selected traces from the Linux kernel and analyzing them live along with OpenTracing events could make it possible to explain subtle bottlenecks or errors in complex systems.

I have a few options in mind for that integration, but I'd like to know if some people here have already thought about kernel tracing as an additional source of information in analyses. Comments on that are very welcome! Also, if someone has an example of a performance problem that might be hard to explain using the current tools, I'd be very happy to know.

Cheers,
Loïc.

Yuri Shkuro

Apr 23, 2018, 6:20:05 PM
to loic....@gmail.com, OpenTracing
I think this was an attempt to do that, not sure how far it went: https://github.com/opentracing-contrib/perfevents

--
You received this message because you are subscribed to the Google Groups "OpenTracing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opentracing+unsubscribe@googlegroups.com.
To post to this group, send email to opent...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/opentracing/d261d6da-8f07-400a-b517-458a83c9a73f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Sigelman

Apr 23, 2018, 6:54:47 PM
to Yuri Shkuro, Sambasivan, Raja R, loic....@gmail.com, OpenTracing
I'm cc'ing Raja who was recently talking with me about a similar effort. Definitely valuable and I'd love to hear more about where you take this, Loic.

At Google we did some experiments tying Dapper trace data (from userspace) with ktrace data from the kernel. It didn't have the "zero observable performance overhead" guarantee of "regular Dapper" and thus was not enabled everywhere-by-default. I didn't work directly on that effort, but they were definitely able to diagnose and quantify some fascinating problems in the boundary between the userspace networking stack (GRPC-like) and kernel network queuing.

Best wishes,
Ben



loic....@gmail.com

Apr 23, 2018, 7:30:58 PM
to OpenTracing
Hi Yuri,

Thanks for the pointer! It looks like it needs explicit instrumentation to use it though. I can explain what I had in mind in my reply to Ben below.



loic....@gmail.com

Apr 23, 2018, 7:38:15 PM
to OpenTracing
Hi Ben,

Thanks again for your time. Let's discuss this with Raja if you have some time in the coming days.

I wouldn't worry too much about the overhead, as I am planning to use LTTng's snapshot capabilities. Here is how it would work: we use what you call "regular" Dapper in production (little overhead), but when an alert is raised on a metric (CPU usage too high, a request taking too long to process), a signal is sent to the faulting machine so as to get a snapshot of the events LTTng collected during, say, the last minute. When no alert is raised, the events from LTTng are discarded, so there is no overhead processing them. When we collect a snapshot, we have two options:
- either we pre-process it so as to send high-level information using the OpenTracing API (for example, each process execution time would be represented by a span; this way Jaeger can display the density of context switches just like any other trace, which can help understand what happened)
- or we process the snapshot + the OpenTracing trace separately and display the results of the analysis somewhere else (in Grafana for example).
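For reference, a snapshot-mode LTTng session along these lines could be configured roughly as follows (the session name and the chosen kernel events are illustrative, not a recommendation):

```shell
# Create a session in snapshot mode: events accumulate in an
# in-memory ring buffer and are only written out when a snapshot
# is explicitly recorded.
lttng create alert-session --snapshot

# Record a few scheduling-related kernel events.
lttng enable-event --kernel sched_switch,sched_wakeup
lttng start

# ...later, when the monitoring system raises an alert on this
# machine, capture the recent ring-buffer contents:
lttng snapshot record

# Tear down when done.
lttng stop
lttng destroy
```

With no alert, the ring buffer simply wraps around and old events are overwritten, which is what keeps the steady-state cost low.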

I'd really like to think about integration with tools like Jaeger or Grafana so as to make the most of the current toolchain.
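As a rough sketch of the first option above (one span per process execution), the sched_switch events in a snapshot could be folded into per-PID run intervals, which then map directly onto spans. The event shape here is made up for illustration (single CPU, `(timestamp, prev_pid, next_pid)` tuples), not LTTng's actual trace format:

```python
def runs_from_sched_switch(events):
    """Fold sched_switch events into per-PID execution intervals.

    Each interval (pid, start, end) is one candidate span. This toy
    version assumes a single CPU and already-sorted timestamps.
    """
    running_since = {}   # pid -> timestamp at which it was scheduled in
    runs = []            # completed (pid, start, end) intervals
    for ts, prev_pid, next_pid in events:
        if prev_pid in running_since:
            runs.append((prev_pid, running_since.pop(prev_pid), ts))
        running_since[next_pid] = ts
    return runs

# Example: pid 1 runs 0-10, pid 2 runs 10-25, pid 1 again 25-30.
events = [(0, 0, 1), (10, 1, 2), (25, 2, 1), (30, 1, 0)]
print(runs_from_sched_switch(events))
# -> [(1, 0, 10), (2, 10, 25), (1, 25, 30)]
```

Each resulting interval could then be reported as a span through the OpenTracing API so that Jaeger renders the context-switch density alongside the application trace.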

Cheers. 

loic....@gmail.com

Apr 24, 2018, 6:24:41 AM
to Sambasivan, Raja R, Ben Sigelman, Yuri Shkuro, OpenTracing
Hi Raja,

I was actually thinking that both approaches (the one taken by Google and the one taken by your students) make sense, and I'd really like to know more about your students' project! The main drawbacks of their approach could be the overhead and the difficulty of integrating these changes into the LTTng mainline, but it looks really promising. On the other hand, the approach taken by Google has the problem that only events from within the span are collected, which can be an issue if you consider how the system transitions into and out of the span.

Cheers!

On Apr 24, 2018, at 02:07, Sambasivan, Raja R <r...@bu.edu> wrote:

Hi Loïc, 

What you are proposing sounds really cool and I think it’d be very useful.  

I do have two (amazingly smart) high-school students working on extending end-to-end tracing into the kernel. They are propagating the trace context into the kernel and modifying LTTng's trace points to accept it. I'm not sure about the performance overheads of this approach yet…but their final project is due in a month, so we should know by then :).

You might also want to check out this recent paper from Google: http://bit.ly/2qUXONS . Instead of propagating context into the kernel, they use a specific pattern of syscalls to demarcate the start and end of application-level work (i.e., spans) within kernel traces.   (See Section 3.3.2.)  This allows them to attribute work done by syscalls executed between start and ends of a span to that span.  (See the paper for how they handle context switches.)
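To make the demarcation idea concrete, here is a toy sketch. The marker names and event shapes are made up, and the paper's actual mechanism (especially its handling of context switches) is more involved:

```python
def attribute_syscalls(kernel_events):
    """Attribute syscalls between span-start and span-end markers
    to that span.

    Events are (name, arg) tuples from a single thread's kernel
    trace; "SPAN_START"/"SPAN_END" stand in for the recognizable
    syscall patterns the paper uses as demarcation points.
    """
    spans = {}       # span_id -> syscalls attributed to that span
    current = None   # span the thread is currently inside, if any
    for name, arg in kernel_events:
        if name == "SPAN_START":
            current = arg
            spans.setdefault(current, [])
        elif name == "SPAN_END":
            current = None
        elif current is not None:
            spans[current].append(name)
    return spans

trace = [("SPAN_START", "s1"), ("read", None), ("write", None),
         ("SPAN_END", "s1"), ("poll", None)]
print(attribute_syscalls(trace))
# -> {'s1': ['read', 'write']}
```

Note how the trailing `poll` falls outside any span and is dropped, which is exactly the "only events within the span" limitation discussed in this thread.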

Anyway, I’m happy to chat more and hear about what direction you decide to go in. 


Regards,
Raja

Loïc Gellé

May 2, 2018, 8:20:16 AM
to Sambasivan, Raja R, Ben Sigelman, Yuri Shkuro, OpenTracing
Hi Raja,

Just reacting to something in your message: could you quickly describe how your high-school students managed to propagate the trace context into the kernel (in terms of modifying the tracers and patching the kernel)?

Thanks,
Loïc.

