quick question about pollset_kick() in ev_epoll1_linux.cc

25 views
Skip to first unread message

cau...@gmail.com

unread,
Jan 8, 2018, 7:51:50 PM1/8/18
to grpc.io
Hi,

We are trying to upgrade to grpc 1.8.3 from 1.6 and seeing an issue with one of our internal tests that works in 1.6 but fails in 1.8.3, where we're seeing a bunch of these message:

E0108 23:10:24.009205427   68294 ev_epoll1_linux.cc:1098]    assertion failed: next_worker->state == KICKED
E0108 23:10:24.011880853   68330 ev_epoll1_linux.cc:1098]    assertion failed: next_worker->state == KICKED
E0108 23:10:24.011978910   68369 ev_epoll1_linux.cc:1098]    assertion failed: next_worker->state == KICKED

Looking at the code in branch 1.8.x [1], I see these two lines:

1098:  GPR_ASSERT(next_worker->state == KICKED);
1099:  SET_KICK_STATE(next_worker, KICKED);

I'm curious why the code is asserting X right before it's making X true? It makes me look further and I see:

1044  } else if (next_worker->state == KICKED) {
        ...
        goto done;
      ...
1063  } else if (next_worker->state == UNKICKED) {
        ...
        goto done;
      ...
1072  } else if (next_worker->state == DESIGNATED_POLLER) {
        ...
        goto done;
      } else {
1098    GPR_ASSERT(next_worker->state == KICKED);

that seems to be a bug somewhere, since if we reach line 1098, then next_worker->state cannot be KICKED?

Ideas?

Thanks!

Sree Kuchibhotla

unread,
Jan 9, 2018, 5:33:50 PM1/9/18
to cau...@gmail.com, grpc.io
Hi,

This looks like a bug in the code. 

 - Does this failure happen consistently ?
 - Could you give more details on your test that is causing this? (would be ideal if I can create something similar)

Also, would it be possible for you to rerun your tests by replacing the assert with the following log line ?
(I am interested in seeing what is the state the root and root->next workers are)

     if (next_worker->state != KICKED) {
gpr_log(GPR_ERROR, "root: %p (%d), next: %p (%d)", root_worker, root_worker->state, next_worker, next_worker->state); }

thanks,
Sree


--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+unsubscribe@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/11b1743c-1f45-4f6c-b983-64559b36d040%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Giang Nguyen

unread,
Jan 9, 2018, 7:46:06 PM1/9/18
to Sree Kuchibhotla, grpc.io
On Tue, Jan 9, 2018 at 2:33 PM, Sree Kuchibhotla <sr...@google.com> wrote:
Hi,

This looks like a bug in the code. 

 - Does this failure happen consistently ?
 - Could you give more details on your test that is causing this? (would be ideal if I can create something similar)

Also, would it be possible for you to rerun your tests by replacing the assert with the following log line ?
(I am interested in seeing what is the state the root and root->next workers are)

     if (next_worker->state != KICKED) {
gpr_log(GPR_ERROR, "root: %p (%d), next: %p (%d)", root_worker, root_worker->state, next_worker, next_worker->state); }

Hi Sree,

Yea, kind of 100% failure rate. We believe it's related to https://github.com/grpc/grpc/issues/13873. Moving "import grpc" after fork'ing seems to resolve the issue for us.

Unfortunately it's non-trivial to run the tests against a modified grpc because our environment is bazelized and we have not been able to build grpc in our bazel environment.

Thanks,

Sree Kuchibhotla

unread,
Jan 9, 2018, 7:55:16 PM1/9/18
to Giang Nguyen, grpc.io
oh.. I didn't realize you were doing a fork() call.  grpc actually does not support fork  and is known to create strange issues like the one you reported.
We did some work to mitigate some specific uses of fork (https://github.com/grpc/grpc/pull/13025) but by and large it is not it is not supported.

Moving "import grpc" after forking seems like a great idea. 

thanks,
Sree

Giang Nguyen

unread,
Jan 10, 2018, 11:39:33 AM1/10/18
to Sree Kuchibhotla, grpc.io
On Tue, Jan 9, 2018 at 4:55 PM, Sree Kuchibhotla <sr...@google.com> wrote:
oh.. I didn't realize you were doing a fork() call.  grpc actually does not support fork  and is known to create strange issues like the one you reported.

Ah, right, OK. We'll work around this as we run into weird problems.

Though I'm still curious about the logic in that function, the forking issue notwithstanding: should that be an "assert not reached" on line 1098?

Sree Kuchibhotla

unread,
Jan 10, 2018, 1:00:01 PM1/10/18
to Giang Nguyen, grpc.io
>> Though I'm still curious about the logic in that function, the forking issue notwithstanding: should that be an "assert not reached" on line 1098?

Yes, the following line 1099 is redundant after the assert. 
SET_KICK_STATE(next_worker, KICKED);
I will remove it.

-Sree

Giang Nguyen

unread,
Jan 10, 2018, 1:13:27 PM1/10/18
to Sree Kuchibhotla, grpc.io
OK cool. And I think just to make the error clearer we should change
the assert to some kind of "assert not reached" since by three
possible kick enum values have already been checked by that time.
Thanks!
Reply all
Reply to author
Forward
0 new messages