Recovery in grpc-go

Johan Jian An Sim

Apr 1, 2016, 1:01:01 AM
to grpc.io
Instead of panicking and exiting the program on a bad request (because of a nil pointer access, for example), is there a way to recover from it gracefully?

Recently I mistakenly returned (nil, nil) from one of my handlers, and it caused the program to exit (https://github.com/grpc/grpc-go/blob/master/server.go#L421) because of the marshaling error. I can easily correct this error, but I am wondering what happens if, somewhere else in the code, I have accidentally done something wrong that causes a panic; I would like a way to recover from it. One way would be to insert a recovery block in every handler. Is there a better place to do so?

The panic-and-exit situation is made worse by the fact that my program runs inside a Kubernetes cluster. The bad request kills the server, and Kubernetes brings the program back up. The sender, having received no response, then sends the bad request again, and the cycle continues.

Qi Zhao

Apr 1, 2016, 1:58:33 AM
to Johan Jian An Sim, grpc.io
The general rule is that the code should NOT crash or panic because of an error from the peer (e.g., the peer sends you a malformed message), but it is okay to panic when dealing with a local fatal error that might put the library into an undefined state. The code you pointed to deals with a local marshaling error and, as I mentioned in the comments, that is almost always a fatal error.

Regarding a bad request killing the server and then being resent: please refer to my argument above. This is NOT the case in gRPC-Go. A peer error should not kill the process.

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+unsubscribe@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/9e5ca88b-6237-4ef0-b585-f4c2688e1ab4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Thanks,
-Qi

Johan Sim

Apr 1, 2016, 2:31:39 AM
to Qi Zhao, grpc.io
Well, I understand that a library definitely has reason to panic in some cases. I am just wondering whether there is a way for us to insert a recovery middleware somewhere to guard against unexpected panics in the code.


Qi Zhao

Apr 1, 2016, 4:35:59 AM
to Johan Sim, grpc.io
You probably do not get it. The point is that the process will crash or panic if there is a high chance that it has entered an unusable state because of some fatal error. There is no reliable recovery from a fatal error (e.g., writing to a closed channel), and there is no reason to keep running a process that is in a fatal error state.


Johan Sim

Apr 1, 2016, 5:08:10 AM
to Qi Zhao, grpc.io
But in the same way, there is no reliable way to prevent unexpected panics either (an unchecked type assertion, indexing an array out of bounds, a nil pointer access). These are things that might slip through the code review process.

Instead of crashing the process for these types of errors, can't we just return an Internal error code, so that the process is still able to serve the next request?

In case you misunderstood, I am not talking specifically about that portion of the code.


Johan Sim

Apr 1, 2016, 5:20:41 AM
to Qi Zhao, grpc.io
Basically, I just want a way to avoid putting a recover at the beginning of every handler, such as:

func (s *Server) DoSomeRequest(ctx context.Context, req *SomeRequest) (res *SomeResponse, err error) {
    // Turn any panic raised below into an Internal error instead of crashing the process.
    defer func() {
        if r := recover(); r != nil {
            err = grpc.Errorf(codes.Internal, "Something is very wrong")
        }
    }()

    ......
}

Qi Zhao

Apr 1, 2016, 5:52:28 PM
to Johan Sim, grpc.io
Okay, I was talking about fatal errors in the gRPC library, and it seems you want to recover from non-fatal errors in your application. Your case can easily be handled in an interceptor, which will be out very soon (it is in the process of Google-internal code review for a codegen change).
