grpc++: issues with opencensus tracing [cmake]

132 views
Skip to first unread message

natha...@skeyecode.com

unread,
Oct 3, 2018, 10:51:33 AM10/3/18
to grpc.io

Hello,


I have some issues setting up grpc++ and Opencensus.

Before anything else, I have built grpc_opencensus_plugin(and its dependencies) with Cmake. Could I have missed something regarding some kind of static init? Dynamic linking?


Anyway, the prometheus stats are correctly exported, but I can not get the traces to work(both stdout and zipkin).
The traces are correctly logged/exported, but they always have the same “Parent SpanId: 30c6ffffff7f0000”.


Based on the stacktrace below, I can confirm one of CensusContext’s constructors is indeed called for each request, but it calls StartSpanWithRemoteParent instead of StartSpan.

The result is that in zipkin, I basically have only one “meta span” for the whole duration of the server. That span is typically 200-300s, depending on how often I restart the server(local dev server).


Example: I start something in the client, a span with 2-3 requests appears in zipkin. Then 5 minutes later, I do something else client side and the same span will be modified, and now have a duration of 300s.


grpc::CensusContext::CensusContext(grpc::CensusContext * this, absl::string_view name, const opencensus::trace::SpanContext & parent_ctxt) (....grpc/src/cpp/ext/filters/census/context.h:54)

grpc::GenerateServerContext(absl::string_view tracing, absl::string_view stats, absl::string_view primary_role, absl::string_view method, grpc::CensusContext * context) (....grpc/src/cpp/ext/filters/census/context.cc:34)

grpc::CensusServerCallData::OnDoneRecvInitialMetadataCb(void * user_data, grpc_error * error) (....grpc/src/cpp/ext/filters/census/server_filter.cc:113)

exec_ctx_run(grpc_closure * closure, grpc_error * error) (....grpc/src/core/lib/iomgr/exec_ctx.cc:40)

grpc_closure_run(const char * file, int line, grpc_closure * c, grpc_error * error) (....grpc/src/core/lib/iomgr/closure.h:258)

recv_initial_metadata_ready(void * arg, grpc_error * error) (....grpc/src/core/ext/filters/deadline/deadline_filter.cc:298)

exec_ctx_run(grpc_closure * closure, grpc_error * error) (....grpc/src/core/lib/iomgr/exec_ctx.cc:40)

grpc_closure_run(const char * file, int line, grpc_closure * c, grpc_error * error) (....grpc/src/core/lib/iomgr/closure.h:258)

hs_recv_initial_metadata_ready(void * user_data, grpc_error * err) (....grpc/src/core/ext/filters/http/server/http_server_filter.cc:289)

exec_ctx_run(grpc_closure * closure, grpc_error * error) (....grpc/src/core/lib/iomgr/exec_ctx.cc:40)

grpc_core::ExecCtx::Flush(grpc_core::ExecCtx * this) (....grpc/src/core/lib/iomgr/exec_ctx.cc:134)

pollset_work(grpc_pollset * pollset, grpc_pollset_worker ** worker_hdl, grpc_millis deadline) (....grpc/src/core/lib/iomgr/ev_epollex_linux.cc:1195)

pollset_work(grpc_pollset * pollset, grpc_pollset_worker ** worker, grpc_millis deadline) (....grpc/src/core/lib/iomgr/ev_posix.cc:313)

grpc_pollset_work(grpc_pollset * pollset, grpc_pollset_worker ** worker, grpc_millis deadline) (....grpc/src/core/lib/iomgr/pollset.cc:48)

cq_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) (....grpc/src/core/lib/surface/completion_queue.cc:1030)

grpc_completion_queue_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) (....grpc/src/core/lib/surface/completion_queue.cc:1106)

grpc::CompletionQueue::AsyncNextInternal(grpc::CompletionQueue * this, void ** tag, bool * ok, gpr_timespec deadline) (....grpc/src/cpp/common/completion_queue_cc.cc:56)

grpc::CompletionQueue::Next(grpc::CompletionQueue * this, void ** tag, bool * ok) (....grpc/include/grpcpp/impl/codegen/completion_queue.h:171)

ServerImpl::HandleRpcs(ServerImpl * this) (.../grpc_servers/api_server_async.cpp:105)

ServerImpl::Run(ServerImpl * this, const std::__cxx11::string & server_address) (.../grpc_servers/api_server_async.cpp:69)



What can I do to have 1 span per request?

Should I instrument the route manually? But in that case what is the point of the census plugin?

Should this come from the client?


PS: I don’t know if that is relevant, but I’m using the async routes(unary, not streaming).

PS2: the stats(prometheus) works fine

PS3: client is grpc-java, android

g-easy

unread,
Oct 4, 2018, 10:22:56 PM10/4/18
to grpc.io
On Thursday, October 4, 2018 at 12:51:33 AM UTC+10, natha...@skeyecode.com wrote:

Hello,


I have some issues setting up grpc++ and Opencensus.

Before anything else, I have built grpc_opencensus_plugin(and its dependencies) with Cmake. Could I have missed something regarding some kind of static init? Dynamic linking?


There's no init other than registering exporters, which is sounds like you've done.
 


Anyway, the prometheus stats are correctly exported, but I can not get the traces to work(both stdout and zipkin).
The traces are correctly logged/exported, but they always have the same “Parent SpanId: 30c6ffffff7f0000”.


Based on the stacktrace below, I can confirm one of CensusContext’s constructors is indeed called for each request, but it calls StartSpanWithRemoteParent instead of StartSpan.


That part makes sense. The server-side span is a child span of the client-side span.
What trace context is the client sending?

The behavior you describe is consistent with the client always sending the same span id to the server (which would be a bug)
Message has been deleted
Message has been deleted

natha...@skeyecode.com

unread,
Oct 5, 2018, 6:34:44 AM10/5/18
to grpc.io
I have not set up anything related to opencensus client-side, so I suppose the client is not sending a trace/span ID.
But in that case, the server should generate a new span no?
Or if that's a requirement, at least assert there is a client span?

edit: In fact the client(grpc-java) was indeed not sending a traceid, my bad.[I was missing a dep on 'opencensus-impl-lite']
In that case, why is the server using a fakeid?

Should not the behavior be something like:
Check if the client has sent a grpc-trace-bin:
if yes create a child span
if not create a new pan

OT: As I'm also testing opentracing: is there a proper/clean way to add a new context?
Right now I'm hacking it in via grpc_census_call_set_context, using my custom class based on the one used by census in server_filter.h.
Is this possible to easily increase GRPC_CONTEXT_COUNT?

natha...@skeyecode.com

unread,
Oct 5, 2018, 12:23:05 PM10/5/18
to grpc.io
additional data(sorry, can't edit a post)

With opencensus-impl-lite as: implementation, api, runtimeOnly [ie client-side tracing enabled]


env var API_GRPC_SERVER_ADDRESS is not set: default: localhost:11043

 

INFO
: env var API_GRPC_DB_POOL_SIZE is not set

INFO
: DbWorker2: nb_workers: 2

API_GRPC_POSTGRES_URI
: ***

API_GRPC_POSTGRES_URI
: ***

D1005
18:09:23.033466067 7152 ev_posix.cc:170] Using polling engine: epollex

D1005
18:09:23.033715624 7152 dns_resolver.cc:348] Using native dns resolver

Server listening on localhost:11043

I1005
18:09:25.520007955 7152 server_filter.cc:97] OnDoneRecvInitialMetadataCb: tracing_slice: 0000004b562beaac41cd2d97f7fab19de57801dcbfdb506d4b1d270200

I1005
18:09:25.520063817 7152 server_filter.cc:108] PrintSlice: tracing_slice 1 = 0000004b562beaac41cd2d97f7fab19de57801dcbfdb506d4b1d270200

I1005
18:09:25.520071226 7152 server_filter.cc:140] OnDoneRecvInitialMetadataCb: tracing_str = , tracing_str_len = 29

I1005
18:09:25.520075789 7152 server_filter.cc:151] OnDoneRecvInitialMetadataCb: census_str = , census_str_len = 0

I1005
18:09:25.520085429 7152 context.cc:52] GenerateServerContext: tracing: 0000004b562beaac41cd2d97f7fab19de57801dcbfdb506d4b1d270200

I1005
18:09:25.520094241 7152 context.cc:56] GenerateServerContext: trace_ctxt.span_id: dcbfdb506d4b1d27

I1005
18:09:25.520098174 7152 context.cc:58] GenerateServerContext: trace_ctxt.trace_id: 004b562beaac41cd2d97f7fab19de578

I1005
18:09:25.520101802 7152 context.cc:60] GenerateServerContext: trace_ctxt.trace_options: 00

D1005
18:09:25.520517309 7180 grpc_auth.h:45] DebugPrintAuthContext: transport_security_type = ssl

D1005
18:09:25.520533911 7180 grpc_auth.h:45] DebugPrintAuthContext: x509_common_name = 5f457e1df42371574b417e7c8b0bce6d

D1005
18:09:25.520542979 7180 grpc_auth.h:45] DebugPrintAuthContext: x509_pem_cert = -----BEGIN CERTIFICATE-----


With no runtime dependency on opencensus:

env var API_GRPC_SERVER_ADDRESS is not set: default: localhost:11043

 

INFO
: env var API_GRPC_DB_POOL_SIZE is not set

INFO
: DbWorker2: nb_workers: 2

API_GRPC_POSTGRES_URI
: ***

API_GRPC_POSTGRES_URI
: ***

D1005
18:14:19.863397659 13625 ev_posix.cc:170] Using polling engine: epollex

D1005
18:14:19.863636976 13625 dns_resolver.cc:348] Using native dns resolver

Server listening on localhost:11043

I1005
18:14:24.985050778 13625 server_filter.cc:97] OnDoneRecvInitialMetadataCb: tracing_slice:

I1005
18:14:24.985181932 13625 server_filter.cc:108] PrintSlice: tracing_slice 1 =

I1005
18:14:24.985200299 13625 server_filter.cc:140] OnDoneRecvInitialMetadataCb: tracing_str = , tracing_str_len = 0

I1005
18:14:24.985209776 13625 server_filter.cc:151] OnDoneRecvInitialMetadataCb: census_str = , census_str_len = 0

I1005
18:14:24.985229555 13625 context.cc:52] GenerateServerContext: tracing:

I1005
18:14:24.985247689 13625 context.cc:56] GenerateServerContext: trace_ctxt.span_id: 0000000000000000

I1005
18:14:24.985259110 13625 context.cc:58] GenerateServerContext: trace_ctxt.trace_id: 00000000000000000000000000000000

I1005
18:14:24.985273832 13625 context.cc:60] GenerateServerContext: trace_ctxt.trace_options: 00

D1005
18:14:24.985863180 13656 grpc_auth.h:45] DebugPrintAuthContext: transport_security_type = ssl

D1005
18:14:24.985875911 13656 grpc_auth.h:45] DebugPrintAuthContext: x509_common_name = 5f457e1df42371574b417e7c8b0bce6d

D1005
18:14:24.985883892 13656 grpc_auth.h:45] DebugPrintAuthContext: x509_pem_cert = -----BEGIN CERTIFICATE-----

Both logs are after a fresh server start, and one client request(the same one in both cases).
It's grpc++ v1.1.5, with a few log statements added in server_filter.cc and context.cc

Only in the second case I get a trace:

Name: Recv.skcapipb.SkcApi/DevicePinCreateTx

TraceId-SpanId-Options: 00000000000000000000000000000000-605b2c23170cd2bc-01

Parent SpanId: 0000000000000000 (remote: true)

Start time: 2018-10-05T18:14:24.985405404+02:00

End time: 2018-10-05T18:14:25.029826624+02:00

Attributes: (0 dropped)

Annotations: (0 dropped)

Message events: (0 dropped)

Links: (0 dropped)

Span ended: true

Status: OK

In the first one, nothing is logged nor exported to zipkin.

Jayant Kolhe

unread,
Oct 5, 2018, 12:33:47 PM10/5/18
to natha...@skeyecode.com, census-d...@googlegroups.com, grpc-io
Adding census-developers@

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/ea959c19-f2e8-4159-8c73-6e5031d7c20e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bogdan Drutu

unread,
Oct 5, 2018, 1:04:34 PM10/5/18
to Jayant Kolhe, Emil Mikulic, natha...@skeyecode.com, census-d...@googlegroups.com, grpc-io

You received this message because you are subscribed to the Google Groups "census-developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to census-develop...@googlegroups.com.
To post to this group, send email to census-d...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/census-developers/CALLfn8DFSyfLOHWuh8epyhhB2nFomhcGx%3D7FjOwZ7nmEkehvEQ%40mail.gmail.com.

Emil Mikulic

unread,
Oct 8, 2018, 5:53:29 AM10/8/18
to natha...@skeyecode.com, census-d...@googlegroups.com, grp...@googlegroups.com, Jayant Kolhe, Bogdan Drutu
To me it looks like there's a bug in GenerateServerContext where missing (and invalid) parent contexts are being passed through.

I sent out a pull request to fix this: https://github.com/grpc/grpc/pull/16805

Nathan, could you please take a look and see if this fixes your problem?

Without the PR, I was able to reproduce the problem when hello_server doesn't receive the grpc-trace-bin metadata. I ended up with bogus parent context like: a0f7ba65967f0000a55c0054967f0000-2300000000000000-c3

You received this message because you are subscribed to a topic in the Google Groups "grpc.io" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/grpc-io/WkA-MgFxLz4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to grpc-io+u...@googlegroups.com.

To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.

Nathan Prat

unread,
Oct 8, 2018, 7:59:48 AM10/8/18
to grpc.io
It seems to be a step in the good direction, a context without parent is indeed created when I do not add a dependency on openceus-impl in the java client.
But the traces are still not exported(zipkin and stdout).
Do you have see traces in zipkin/console with your patch?

I'm guessing I'm missing something in the setup/config, but I can't figure out what.

random facts:
- c++ server, async
- using a custom AuthMetadataProcessor

I will try to get a basic client(helloworld sample) that can reproduce this.

Emil Mikulic

unread,
Oct 9, 2018, 1:06:12 AM10/9/18
to Nathan Prat, grp...@googlegroups.com
On Mon, Oct 8, 2018 at 10:59 PM Nathan Prat <natha...@skeyecode.com> wrote:
It seems to be a step in the good direction, a context without parent is indeed created when I do not add a dependency on openceus-impl in the java client.
But the traces are still not exported(zipkin and stdout).

Both the Java client and the C++ server have a zipkin exporter registered, yes?
And the spans are sampled for export?
 
Do you have see traces in zipkin/console with your patch?

If the client span is sampled, then the server span should also be sampled.

If there is no client span, then (with my PR) the server should create a span without a parent, and use the default sampler.
 
I'm guessing I'm missing something in the setup/config, but I can't figure out what.

random facts:
- c++ server, async
- using a custom AuthMetadataProcessor

I will try to get a basic client(helloworld sample) that can reproduce this.

Yes please, this would be very helpful. :)

There's also the standalone example https://github.com/census-ecosystem/opencensus-cpp-example/tree/master/grpc but you need to edit the WORKSPACE to point it at a modified version of grpc. You could just point it at https://github.com/g-easy/grpc/tree/ocplugin which has my fix.
 

Nathan Prat

unread,
Oct 9, 2018, 7:03:59 AM10/9/18
to grpc.io
Both the Java client and the C++ server have a zipkin exporter registered, yes?
And the spans are sampled for export?

If the client span is sampled, then the server span should also be sampled.

My bad, I was using the default sampling of the client(1/10000). Setting the client to "alwaysSample" everything seems to work
as expected.

Thank you very much for your help, and sorry if I have made you lose your time.
I should have read the docs better...

g-easy

unread,
Nov 7, 2018, 1:04:59 AM11/7/18
to grpc.io
On Tuesday, October 9, 2018 at 10:03:59 PM UTC+11, Nathan Prat wrote:
Thank you very much for your help

You're very welcome. I'm glad you got it working. :)
Reply all
Reply to author
Forward
0 new messages