L2met Logging convention questions

richard schneeman

Jul 18, 2013, 1:54:54 PM

to l2...@googlegroups.com

Hey,

We're in the process of instrumenting the buildpack so that it emits metrics to a log that can then be parsed. I'd rather use existing standards than create my own, and I'd like your opinion on a couple of needs of our system.

## Correlated Measurements

To get the measurements we want, we need two correlated numbers (any two of start time, end time, or duration) so that we can generate a Gantt-style chart showing how different code segments relate to one another.

Right now l2met treats multiple measurements as separate entities:

2013-07-07T19:02:40+00:00 app[lpxc]: measure.foo.start=1 measure.foo.duration=43

Do you have any thoughts on how to represent that two measurements are related in a standard way?
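One possible convention, sketched here purely as an assumption and not as anything l2met defines: treat a shared name prefix as the correlation key and `.start`, `.end`, and `.duration` as reserved suffixes. The helper name `correlate` is hypothetical, and the units are whatever the emitter used.

```ruby
# Hypothetical helper: group measure.<name>.<field>=<value> pairs by <name>
# and derive whichever of start/end/duration is missing from the other two.
def correlate(line)
  spans = Hash.new { |h, k| h[k] = {} }
  line.scan(/measure\.([\w.]+)\.(start|end|duration)=(\d+(?:\.\d+)?)/) do |name, field, value|
    spans[name][field.to_sym] = value.to_f
  end
  spans.each_value do |s|
    if s[:start] && s[:duration]
      s[:end] ||= s[:start] + s[:duration]
    elsif s[:start] && s[:end]
      s[:duration] = s[:end] - s[:start]
    elsif s[:end] && s[:duration]
      s[:start] = s[:end] - s[:duration]
    end
  end
  spans
end

line = "2013-07-07T19:02:40+00:00 app[lpxc]: measure.foo.start=1 measure.foo.duration=43"
foo = correlate(line)["foo"]
puts "foo: start=#{foo[:start]} end=#{foo[:end]}"
```

For the sample line above, this yields a single `foo` span with the end derived as start plus duration, which is enough to place one bar on a Gantt-style chart.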

## Tagged Measurements

Once the data is parsed and saved to a backend, we want to be able to work backwards: for example, find a really slow app asset compilation and then find the app id so we can contact the owner later. Something maybe like this:

Ryan Smith

Jul 22, 2013, 11:02:43 AM

to richard schneeman, l2...@googlegroups.com

With respect to correlated measurements, I have a few suggestions. You might consider measuring each section of code independently. In your log output, include some id to correlate measurements out of band. For example,

heroku-app-id=1234 measure.buildpack.download=50ms

heroku-app-id=1234 measure.app.compile=10ms

This will allow you to analyze app compiles and buildpack downloads. If you detect problems with either of these components, you can go back to your logs and try to find the offending app-id. You could use the max value of the measurements to give you a number to aid in your log search.
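The "use the max value to find the offending app-id" step can be sketched roughly as follows. The helper names `slowest` and `app_id` are hypothetical, the log lines are the illustrative ones from this message plus one made-up slow entry, and the sketch assumes every line reports the measurement in the same unit.

```ruby
# Hypothetical helper: return the log line with the largest value
# for one measurement name, or nil if no line mentions it.
def slowest(lines, name)
  pattern = /measure\.#{Regexp.escape(name)}=(\d+(?:\.\d+)?)/
  lines.select { |l| l =~ pattern }
       .max_by { |l| l[pattern, 1].to_f }
end

# Hypothetical helper: pull the heroku-app-id tag out of a log line.
def app_id(line)
  line[/heroku-app-id=(\S+)/, 1]
end

lines = [
  "heroku-app-id=1234 measure.buildpack.download=50ms",
  "heroku-app-id=1234 measure.app.compile=10ms",
  "heroku-app-id=5678 measure.app.compile=900ms" # made-up slow entry
]

worst = slowest(lines, "app.compile")
puts app_id(worst)
```

In practice the same search would run against drained logs instead of an in-memory array.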

On tagged measurements: it seems that your email got cut off, but I hope my previous example, which puts an app-id into the log message, will get you what you need. Just drain your logs into Papertrail and search for heroku-app-id=1234.

Aseem Kishore

Jul 23, 2013, 12:34:44 AM

to Ryan Smith, richard schneeman, l2...@googlegroups.com

Hey Ryan,

> If you detect problems with either of these components, you can go back to your logs and try to find the offending app-id.

This is a step we struggle with -- going from e.g. seeing issues in Librato to finding the offending lines in the logs. Do you have any tips? Or, what's your own workflow?

Thanks much for any insight you can provide!

Aseem

Aseem Kishore

Jul 23, 2013, 12:35:33 AM

to Ryan Smith, richard schneeman, l2...@googlegroups.com

Sorry, I missed the max value tip. That's certainly a good tip. Is there anything else you do?

Ryan Smith

Jul 23, 2013, 1:47:40 PM

to Aseem Kishore, richard schneeman, l2...@googlegroups.com

This is still a rough area IMHO. Also, I am not exactly sure where to make improvements. One thing I have done in the past (which is quite rugged) is to put thresholds in my logging functions. Something like this:

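    # Time a block and log it in measure.<name>=<elapsed> form.
    # th is an optional threshold in seconds.
    def measure(name, th = nil)
      start = Time.now
      result = yield
      elapsed = Time.now - start
      msg = ""
      # Flag slow calls so they are easy to search for in the logs.
      msg += "warning=above-threshold " if th && elapsed >= th
      msg += "measure.#{name}=#{elapsed}"
      $stdout.puts(msg)
      result # return the block's value rather than the nil from puts
    end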
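Combining the threshold idea with the app-id tagging from earlier in the thread might look something like the sketch below. The name `measure_tagged` and its `tags:`, `th:`, and `out:` keyword arguments are hypothetical and not part of l2met; the threshold is assumed to be in seconds.

```ruby
# Hypothetical variant of measure: prefixes key=value tags to the log line
# and takes an optional threshold in seconds. `out` is any object that
# responds to puts, which makes the helper easy to test.
def measure_tagged(name, tags: {}, th: nil, out: $stdout)
  start = Time.now
  result = yield
  elapsed = Time.now - start
  parts = tags.map { |k, v| "#{k}=#{v}" }
  parts << "warning=above-threshold" if th && elapsed >= th
  parts << "measure.#{name}=#{elapsed}"
  out.puts(parts.join(" "))
  result # hand the block's value back to the caller
end

measure_tagged("app.compile", tags: { "heroku-app-id" => 1234 }, th: 5) do
  sleep 0.01 # stand-in for the real work
end
```

Searching the logs for `warning=above-threshold` then surfaces the slow calls together with the app id on the same line.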

Ryan Smith

Aug 1, 2013, 11:26:34 PM

to Matt Button, l2...@googlegroups.com

Right on! Distributed tracing is a great addition to rate/time analysis.

On Tue, Jul 23, 2013 at 12:41 PM, Matt Button <that.mat...@gmail.com> wrote:

> A bit off topic, but it might be worth looking at roadtrip (or Zipkin) for ideas. Zipkin's very heavy, but seems to do all the Gantt chart stuff.
