Re: [grpc-io] Re: grpc++: issues with opencensus tracing [cmake]


Jayant Kolhe

Oct 5, 2018, 12:33:47 PM
to natha...@skeyecode.com, census-d...@googlegroups.com, grpc-io
Adding census-developers@

On Fri, Oct 5, 2018 at 3:34 AM <natha...@skeyecode.com> wrote:
I have not set up anything related to OpenCensus client-side, so I suppose the client is not sending a trace/span ID.
But in that case, shouldn't the server generate a new span?
Or, if a client span is a requirement, at least assert that one is present?

Edit: in fact the client (grpc-java) was indeed not sending a trace ID, my bad (I was missing a dependency on 'opencensus-impl-lite').
In that case, why is the server using a fake ID?

Shouldn't the behavior be something like this? Check whether the client has sent grpc-trace-bin:
if yes, create a child span
if not, create a new span

OT: as I'm also testing OpenTracing, is there a proper/clean way to add a new context?
Right now I'm hacking it in via grpc_census_call_set_context, using a custom class based on the one census uses in server_filter.h.
Is it possible to easily increase GRPC_CONTEXT_COUNT?

On Friday, October 5, 2018 at 4:22:56 AM UTC+2, g-easy wrote:
On Thursday, October 4, 2018 at 12:51:33 AM UTC+10, natha...@skeyecode.com wrote:

Hello,


I have some issues setting up grpc++ with OpenCensus.

Before anything else: I built grpc_opencensus_plugin (and its dependencies) with CMake. Could I have missed something regarding some kind of static initialization, or dynamic linking?


There's no init other than registering exporters, which it sounds like you've done.
 


Anyway, the Prometheus stats are correctly exported, but I cannot get the traces to work (tried both stdout and Zipkin).
The traces are correctly logged/exported, but they always have the same “Parent SpanId: 30c6ffffff7f0000”.


Based on the stack trace below, I can confirm one of CensusContext’s constructors is indeed called for each request, but it calls StartSpanWithRemoteParent instead of StartSpan.


That part makes sense. The server-side span is a child span of the client-side span.
 

The result is that in Zipkin I basically have only one “meta span” for the whole lifetime of the server. That span is typically 200-300 s long, depending on how often I restart the server (it's a local dev server).


Example: I start something in the client, and a span with 2-3 requests appears in Zipkin. Five minutes later, I do something else client-side and the same span is modified, now with a duration of 300 s.

 


grpc::CensusContext::CensusContext(grpc::CensusContext * this, absl::string_view name, const opencensus::trace::SpanContext & parent_ctxt) (....grpc/src/cpp/ext/filters/census/context.h:54)

grpc::GenerateServerContext(absl::string_view tracing, absl::string_view stats, absl::string_view primary_role, absl::string_view method, grpc::CensusContext * context) (....grpc/src/cpp/ext/filters/census/context.cc:34)

grpc::CensusServerCallData::OnDoneRecvInitialMetadataCb(void * user_data, grpc_error * error) (....grpc/src/cpp/ext/filters/census/server_filter.cc:113)

exec_ctx_run(grpc_closure * closure, grpc_error * error) (....grpc/src/core/lib/iomgr/exec_ctx.cc:40)

grpc_closure_run(const char * file, int line, grpc_closure * c, grpc_error * error) (....grpc/src/core/lib/iomgr/closure.h:258)

recv_initial_metadata_ready(void * arg, grpc_error * error) (....grpc/src/core/ext/filters/deadline/deadline_filter.cc:298)

exec_ctx_run(grpc_closure * closure, grpc_error * error) (....grpc/src/core/lib/iomgr/exec_ctx.cc:40)

grpc_closure_run(const char * file, int line, grpc_closure * c, grpc_error * error) (....grpc/src/core/lib/iomgr/closure.h:258)

hs_recv_initial_metadata_ready(void * user_data, grpc_error * err) (....grpc/src/core/ext/filters/http/server/http_server_filter.cc:289)

exec_ctx_run(grpc_closure * closure, grpc_error * error) (....grpc/src/core/lib/iomgr/exec_ctx.cc:40)

grpc_core::ExecCtx::Flush(grpc_core::ExecCtx * this) (....grpc/src/core/lib/iomgr/exec_ctx.cc:134)

pollset_work(grpc_pollset * pollset, grpc_pollset_worker ** worker_hdl, grpc_millis deadline) (....grpc/src/core/lib/iomgr/ev_epollex_linux.cc:1195)

pollset_work(grpc_pollset * pollset, grpc_pollset_worker ** worker, grpc_millis deadline) (....grpc/src/core/lib/iomgr/ev_posix.cc:313)

grpc_pollset_work(grpc_pollset * pollset, grpc_pollset_worker ** worker, grpc_millis deadline) (....grpc/src/core/lib/iomgr/pollset.cc:48)

cq_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) (....grpc/src/core/lib/surface/completion_queue.cc:1030)

grpc_completion_queue_next(grpc_completion_queue * cq, gpr_timespec deadline, void * reserved) (....grpc/src/core/lib/surface/completion_queue.cc:1106)

grpc::CompletionQueue::AsyncNextInternal(grpc::CompletionQueue * this, void ** tag, bool * ok, gpr_timespec deadline) (....grpc/src/cpp/common/completion_queue_cc.cc:56)

grpc::CompletionQueue::Next(grpc::CompletionQueue * this, void ** tag, bool * ok) (....grpc/include/grpcpp/impl/codegen/completion_queue.h:171)

ServerImpl::HandleRpcs(ServerImpl * this) (.../grpc_servers/api_server_async.cpp:105)

ServerImpl::Run(ServerImpl * this, const std::__cxx11::string & server_address) (.../grpc_servers/api_server_async.cpp:69)



What can I do to get one span per request?

Should I instrument the route manually? But in that case, what is the point of the census plugin?

Should this come from the client?


PS: I don’t know if it's relevant, but I’m using the async routes (unary, not streaming).

PS2: the stats (Prometheus) work fine.

PS3: the client is grpc-java, on Android.


What trace context is the client sending?

The behavior you describe is consistent with the client always sending the same span ID to the server (which would be a bug).

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/ea959c19-f2e8-4159-8c73-6e5031d7c20e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bogdan Drutu

Oct 5, 2018, 1:04:32 PM
to Jayant Kolhe, Emil Mikulic, natha...@skeyecode.com, census-d...@googlegroups.com, grpc-io


Emil Mikulic

Oct 8, 2018, 5:53:27 AM
to natha...@skeyecode.com, census-d...@googlegroups.com, grp...@googlegroups.com, Jayant Kolhe, Bogdan Drutu
To me it looks like there's a bug in GenerateServerContext, where missing (and invalid) parent contexts are passed through.

I sent out a pull request to fix this: https://github.com/grpc/grpc/pull/16805

Nathan, could you please take a look and see if this fixes your problem?

Without the PR, I was able to reproduce the problem when hello_server doesn't receive the grpc-trace-bin metadata. I ended up with a bogus parent context like: a0f7ba65967f0000a55c0054967f0000-2300000000000000-c3
