GPR_ASSERT in grpc++

3,083 views
Skip to first unread message

Andrey Myznikov

unread,
Mar 10, 2015, 8:02:41 AM3/10/15
to grp...@googlegroups.com
Hello Folks,

   I found the use of grpc++ in real applications to be much problematic due to wide usage of GPR_ASSERT() macro in grpc++ for error checking.
The macro from gprc/support/log.h is intended only for internal invariants checking, and calls abort() if assertion fails.

But in grpc++ it is used for all error checks, including recoverable run-time errors.

For example, the fragment of code from BlockingUnaryCall():
  GPR_ASSERT((cq.Pluck(&buf) && buf.got_message) || !status.IsOk());
  return status;

For what to "return status" from here if it always will be OK ?

This leads to abort my application on connection loss or on other recoverable errors.

The comment to this macro says: If the error can be recovered from,  without the possibility of corruption, or might best be reflected via  an exception in a higher-level language, consider returning error code.

I expect to get some error code in my application, or some c++ exception as the last resort.
But do not abort() my application.

What you thinks should be the best way to get rid these abort's on recoverable errors?

Can I hope that the problem will solved in the immediate future releases of grpc++ ?


Thanks!









Nicolas Noble

unread,
Mar 10, 2015, 11:48:04 AM3/10/15
to Andrey Myznikov, grp...@googlegroups.com
Our usage of GPR_ASSERT in these cases is generally more of a "//TODO: implement recovery" than anything else. We usually like to test every codepath thoroughly, and this is one codepath that might be harder for us to simulate during tests - but not impossible. If you have specific examples other than the one you just described, feel free to let us know. Also, I am going to file a github issue for the one you just reported but feel free to directly file github issues on other similar issues that are currently preventing you from using the code.

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/23fa8c1b-6807-48a7-a703-c31e200e841f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrey Myznikov

unread,
Mar 10, 2015, 12:12:51 PM3/10/15
to Nicolas Noble, grp...@googlegroups.com
Nicolas, thank you for the reply!
    But it still not clear for me what yuo mean  "to directly file github issues" ?
How exactly I can "directly file github issues" ?

Vijay Pai

unread,
Mar 10, 2015, 12:13:23 PM3/10/15
to grp...@googlegroups.com, andrey....@gmail.com
Hi Andrey,

As Nico noted, we can look into this issue, but let me note that the specific line there doesn't require that status is ok. It is an or condition, so it will only fail if status is OK despite the first logical test (pluck AND got message) failing. We would expect that first test to fail only if status is not ok, which is why there's still a return of status in the next line, and that should not abort. That said, if the code is aborting there, either there is an internal invariant failure or there is a bug that needs fixing because we have not exercised all failure cases. When you say "connection loss or other recoverable errors", are you finding that this code fails whenever the remote drops the connection, when this happens at certain points, or under some other circumstances?

Regards,
Vijay

Andrey Myznikov

unread,
Mar 10, 2015, 12:26:42 PM3/10/15
to Vijay Pai, grp...@googlegroups.com
Vijay,   thank you for the reply!

Actually the line preventing me from using application is in grpc/src/server/server.cc:278 ( in function void Server::PerformOpsOnCall())

 assertion failed:GRPC_CALL_OK == grpc_call_start_batch(call->call(), ops, nops, buf)

The example how to get this abort():
    I create clent and server pair, start server, start client,  then at client side begin receive stream of messages from server, and at some point manually kill the client (pressing Ctrl+C or by any other mean).
I expect to get some kind of error code at server side, but server just craches instead :( .

Such behavior prevents me from usnig this (excellent !) API in my production applications.

How can we solve this issue ?

Andrey Myznikov

unread,
Mar 10, 2015, 12:42:35 PM3/10/15
to grp...@googlegroups.com, andrey....@gmail.com
Vijay,
  
   Concerning to BlockingUnaryCall() assertion:  I do not like to get abort() of my application if ANY part of the "OR" condition fails.
I would like to get some sensible error code if my call was unseccessfull (grpc had transport problems, server connection lost, server just crached, etc).
Therefore I think that abort() is not appropriate here.

Thanks!





вторник, 10 марта 2015 г., 18:13:23 UTC+2 пользователь Vijay Pai написал:

Abhishek Kumar

unread,
Mar 10, 2015, 1:00:39 PM3/10/15
to Andrey Myznikov, grp...@googlegroups.com
Hi Andrey,

The assertion in  BlockingUnaryCall() is specifying the list of possible outcomes guaranteed by the API. i.e., as long as the implementation is true to the API there should be no way to trigger this assertion at run time. 
 
(We still need to investigate and fix the issue that triggered the other assertion at grpc/src/server/server.cc:278)

That said, we understand the reluctance to having aborts() in production application and will look into alternatives that will allow us to turn the assertions into loud errors when compiled with the right options.

Thanks for reporting the issue
-Abhishek 

Andrey Myznikov

unread,
Mar 10, 2015, 1:23:04 PM3/10/15
to grp...@googlegroups.com, andrey....@gmail.com
Abhishek,

For the assertion in BlockingUnaryCall() I still unclear:

Status BlockingUnaryCall(..)
{
   ...

  GPR_ASSERT((cq.Pluck(&buf) && buf.got_message) || !status.IsOk());
  return status;
}

Allow us assume that cq.Pluck() fails due to TRANSPORT problems, and (cq.Pluck(&buf) && buf.got_message) is false (no message received due to server unavailable).  Should the 'status' be OK here ? I expect the value of status someting like StatusCode::UNAVAILABLE, or so., but not OK.
Therefore I expect to get GPR_ASSERT(false || false) here.
But instead of abort() I would like to the code  it in my application.

Clarify me please what I missed.

Thanks!









вторник, 10 марта 2015 г., 19:00:39 UTC+2 пользователь Abhishek Kumar написал:

Andrey Myznikov

unread,
Mar 10, 2015, 1:27:19 PM3/10/15
to grp...@googlegroups.com, andrey....@gmail.com
Oh, Sorry for this, I found my misstake with BlockingUnaryCall(), do not repy please.


вторник, 10 марта 2015 г., 19:23:04 UTC+2 пользователь Andrey Myznikov написал:

Vijay Pai

unread,
Mar 10, 2015, 3:37:15 PM3/10/15
to Andrey Myznikov, grp...@googlegroups.com
Andrey,
To get some more information: are you using code pulled from HEAD recently, or are you using code from the original alpha release, or otherwise more than a week ago? A situation that would trip the abort in that same line of code was resolved in PR #920 (https://github.com/grpc/grpc/pull/920), and I'm trying to make sure that this isn't a duplicate of that problem.

Thanks!
Vijay

Andrey Myznikov

unread,
Mar 10, 2015, 5:13:20 PM3/10/15
to grp...@googlegroups.com, andrey....@gmail.com
Right now pulled the code  from g...@github.com:grpc/grpc.git.
Problem still exists - server aborts when client craches due to Ctrl+C or other reason.
The status code returned from grpc_call_start_batch() in Server::PerformOpsOnCall() is 5 (GRPC_CALL_ERROR_ALREADY_INVOKED).
Loks like grpc engine does not detect that client disapears and keeps previous call in queue for infinite time.

But in most of real-life cases the engine could detect that clients disapears - via ICMP, Keep-Alive, some kind of timeout on worst case....

The sample code to reproduce the problem:

/////////////////
// test.proto
syntax = "proto3";

package tester;

service Test {
  rpc ReadEvents (ReadEventsRequest) returns (stream TestEvent) {}
}

message ReadEventsRequest {
}

message TestEvent {
  string event_name = 1;
}

///////////////////////////
// client.cc

int main()
{
  grpc_init();

  TestEvent e;
  ClientContext context;

  unique_ptr<Test::Stub> tester(
      Test::NewStub(CreateChannel("localhost:7000", InsecureCredentials(), ChannelArguments())));

  unique_ptr<ClientReader<TestEvent>> rd =
        tester->ReadEvents(&context, ReadEventsRequest());

  printf("begin read\n");
  while ( rd->Read(&e) ) {
    printf("event: %s\n", e.event_name().c_str());
  }
  printf("end read\n");

  rd->Finish();

  grpc_shutdown();
  return 0;
}

///////////////////////
// server.cc

class TestService
  : public Test::Service
{
  unique_ptr<Server> server_;

public:

  TestService(const string & address) {
    ServerBuilder builder;
    builder.AddPort(address, InsecureServerCredentials());
    builder.RegisterService(this);
    server_ = builder.BuildAndStart();
  }

  void run() {
    server_->Wait();
  }

private:
  Status ReadEvents(ServerContext* context, const ReadEventsRequest* request, ServerWriter< TestEvent>* writer)
  {
    TestEvent e;
    char line[256];

    (void)(context); // unused
    (void)(request); // unused

    while ( fgets(line, sizeof(line) - 1, stdin) && strncmp(line, "quit", 4) != 0 ) {
      e.set_event_name(line);
      if ( !writer->Write(e) ) {
        fprintf(stderr,"leave\n");
        break;
      }
    }
    return Status::OK;
  }
};

int main()
{
  grpc_init();

  TestService service("localhost:7000");
  service.run();

  grpc_shutdown();

  return 0;
}





вторник, 10 марта 2015 г., 21:37:15 UTC+2 пользователь Vijay Pai написал:

Vijay Pai

unread,
Mar 10, 2015, 5:22:21 PM3/10/15
to Andrey Myznikov, grp...@googlegroups.com
Andrey, thank you for your detailed bug report. This is now being tracked as issue #999 (lucky!) : https://github.com/grpc/grpc/issues/999 . We should be able to take care of this one as it is similar to a problem that we dealt with before in the async server case with PR #920 that I referenced earlier.

Lei Wang

unread,
Mar 14, 2019, 5:03:56 AM3/14/19
to grpc.io
TOTALLY agree with you. I only use `Status` to check whether the service response is good.
Reply all
Reply to author
Forward
0 new messages