Zipkin 1.3 includes highlighting of spans in error state and improvements to Cassandra storage


Adrian Cole

Jul 10, 2016, 4:18:31 AM
to zipkin-user

Zipkin 1.3 includes highlighting of spans in error state and improvements to the Cassandra storage component.

Below is a copy of the release notes, which can be found here.

Error annotations

Inspired by recent work in OpenTracing, we've added a new annotation, "error". As an annotation value, it marks the time a potentially transient error occurred. As a binary annotation key, its value is a human-readable message describing an error that caused the span to fail. See #1140 for details.

Thanks to @virtuald the UI acts according to these rules, highlighting degraded spans yellow, and failed ones red.
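
As a rough illustration of the two rules above, here is a minimal Python sketch that classifies a span. It assumes spans are plain dicts shaped like Zipkin's JSON, with "annotations" and "binaryAnnotations" lists; the function name span_state and the sample data are hypothetical, and this is not the UI's actual code.

```python
def span_state(span):
    """Classify a span dict as 'failed', 'degraded', or 'ok'."""
    # A binary annotation keyed "error" means the span failed.
    for binary in span.get("binaryAnnotations", []):
        if binary.get("key") == "error":
            return "failed"
    # An annotation whose value is "error" marks a potentially transient error.
    for annotation in span.get("annotations", []):
        if annotation.get("value") == "error":
            return "degraded"
    return "ok"


# Hypothetical sample spans, trimmed to the fields the function reads.
failed = {"binaryAnnotations": [{"key": "error", "value": "connection refused"}]}
degraded = {"annotations": [{"timestamp": 1468124311000000, "value": "error"}]}
healthy = {"annotations": [{"timestamp": 1468124311000000, "value": "sr"}]}

for name, span in [("failed", failed), ("degraded", degraded), ("healthy", healthy)]:
    print(name, "->", span_state(span))
```

Checking the binary annotation first means a span carrying both markers is reported as failed, which is an assumption about precedence rather than something the release notes state.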

Instrumentation (such as Brave and zipkin-tracer) needs to change to support this. Please help if you have time!

Span.timestamp, duration 0 coerce to null

We've noticed that some instrumentation logs a timestamp or duration of 0 when it meant to log null. A timestamp or duration of 0 microseconds is invalid or doesn't explain latency, so we now coerce these 0s to null. If a span genuinely took less than a microsecond, round its duration up to 1. See #1155 and #1176.
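
For instrumentation authors, the coercion and rounding rules might look like the following Python sketch. The function names and the nanosecond clock inputs are assumptions made for illustration; returning None for a non-positive elapsed time is also a choice made here, not something the release notes specify.

```python
def coerce_zero_to_none(value):
    """Treat a timestamp or duration of 0 as unknown (None)."""
    return None if value == 0 else value


def duration_micros(start_ns, end_ns):
    """Duration in microseconds from two nanosecond clock readings.

    A positive elapsed time below one microsecond is rounded up to 1,
    so a real (if tiny) duration is never reported as 0.
    """
    elapsed_ns = end_ns - start_ns
    if elapsed_ns <= 0:
        # Assumption: a non-positive reading is treated as unknown.
        return None
    return max(1, elapsed_ns // 1000)


print(coerce_zero_to_none(0))
print(duration_micros(0, 500))
print(duration_micros(0, 2500))
```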

Elasticsearch daily bucket fix

We found and fixed a concurrency bug that could put spans into the wrong daily buckets. See #1175
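
To make "daily bucket" concrete, here is a small Python sketch that derives a per-day index name from a span timestamp in epoch microseconds. The "zipkin" prefix, the yyyy-MM-dd suffix, and the function name daily_index are assumptions for illustration. The sketch shows the bucketing idea only and does not reproduce the bug or its fix.

```python
from datetime import datetime, timezone


def daily_index(timestamp_us, prefix="zipkin"):
    """Name the daily bucket (UTC) for a timestamp in epoch microseconds."""
    # Integer division avoids float rounding near day boundaries.
    day = datetime.fromtimestamp(timestamp_us // 1_000_000, tz=timezone.utc)
    return f"{prefix}-{day:%Y-%m-%d}"


# Midnight UTC on 2016-07-10, in microseconds.
midnight_us = 1468108800 * 1_000_000
print(daily_index(midnight_us))
print(daily_index(midnight_us - 1))
```

Because the function keeps no shared state between calls, concurrent callers cannot influence each other's result.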

Cassandra

Schema bug fix

We found a bug where traces against the same service in the same millisecond weren't indexed. This affects indexes only (trace data itself wasn't lost). For example, you might find a trace that exists in Cassandra, but you can't query it using the API.

Specifically, the following indexes now have trace_id added to their PRIMARY KEY definitions.

  • service_span_name_index
  • service_name_index
  • annotations_index
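
The effect of the missing key column can be sketched in Python. In Cassandra, a write whose primary key matches an existing row replaces that row, so the sketch models a table as a dict keyed by the primary key columns. The column names service_name and ts are simplified stand-ins, not the real schema.

```python
def write_index(rows, primary_key_columns):
    """Simulate upserts: a later row with the same primary key replaces the earlier one."""
    table = {}
    for row in rows:
        key = tuple(row[column] for column in primary_key_columns)
        table[key] = row
    return list(table.values())


# Two traces against the same service in the same millisecond.
rows = [
    {"service_name": "frontend", "ts": 1468124311000, "trace_id": 1},
    {"service_name": "frontend", "ts": 1468124311000, "trace_id": 2},
]

old_index = write_index(rows, ["service_name", "ts"])
new_index = write_index(rows, ["service_name", "ts", "trace_id"])
print(len(old_index), "row(s) without trace_id in the key")
print(len(new_index), "row(s) with trace_id in the key")
```

Without trace_id in the key, the second write silently replaces the first, which matches the symptom of a trace that exists but cannot be found by query.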

There's no automatic data migration available. The most straightforward way to address this in an existing cluster is to drop the indexes listed above and restart a zipkin server (which will recreate them as long as CASSANDRA_ENSURE_SCHEMA=true). You can also update the indexes manually based on the schema.

Tuning

We've done a lot of work tuning the amount of data written to indexes on a per-span basis. Those using Cassandra should see a significant drop in index size due to reasons documented in the tuning section of the README.

Query logging

Those supporting zipkin may need to debug query latency. We now use the QueryLogger, which is enabled when the log category "com.datastax.driver.core.QueryLogger" is at debug or trace level. Trace level also includes bound values. See #1156.