Getting "all SubConns are in TransientFailure" sending to local grpc service.

Paul Breslin

Nov 13, 2017, 4:36:19 PM
to grpc.io

I'm running local grpc services under Docker for Mac. All has been fine but today I started getting intermittent failures:
rpc error: code = Unavailable desc = all SubConns are in TransientFailure
when my test code sends a message to one of the services. The test code also runs inside a docker container.

Sometimes restarting the Docker daemon would make this go away, but for some reason the problem is now happening consistently.

I've tried updating to the latest stable Docker for Mac and updating to the current grpc release code.

I'm stuck - not sure what to try next. We're currently using: go version go1.8.3 linux/amd64

Suggestions are welcome.

Menghan Li

Nov 14, 2017, 6:59:33 PM
to grpc.io
This error means the connection to your server is down for some reason.
Can you look at the client side logs and see if there's anything interesting there?

Thanks,
Menghan

yuf...@chope.co

Dec 4, 2017, 7:22:22 AM
to grpc.io
Hi Paul,

May I ask whether you have solved the issue? I have the same problem.

Paul Breslin

Dec 4, 2017, 8:44:41 AM
to grpc.io
We didn't really solve it but discovered a work-around. For some reason if I start my services in one script and then run the tests from a separate script it seems to work fine. So it may have to do with some extra delay time between starting the containers and then attempting to run the client code.

rav...@gmail.com

Dec 21, 2017, 9:57:42 PM
to grpc.io
How can I fix or debug this kind of issue?

I keep getting this error:
rpc error: code = Unavailable desc = all SubConns are in TransientFailure

The same client - server logic works fine if I remove the TLS credentials ... any help to resolve would be appreciated!

Yufeng Liu

Dec 21, 2017, 10:18:11 PM
to rav...@gmail.com, grpc.io
Hi Ravijo,

I have fixed the issue; I just changed the service code as shown below. The certificate is a regular one purchased from "https://www.rapidssl.com/".

    certificate, err := credentials.NewServerTLSFromFile(conf.CRT, conf.KEY)
    if err != nil {
        log.Errorf("could not load server key pair: %s", err)
    }

I don't know whether this will help you.


-- 
You received this message because you are subscribed to a topic in the Google Groups "grpc.io" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/grpc-io/yCUwuHycNWk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to grpc-io+u...@googlegroups.com.
To post to this group, send email to grp...@googlegroups.com.
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/44f304ae-3d12-49ce-9931-0b8608be6a27%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ravi

Dec 22, 2017, 2:08:42 PM
to grpc.io
Hi Yufeng,

My server-side code is exactly like yours.
My certificates and keys are fine, because when I plug them into the example route_guide code (grpc-go/examples/route_guide) they work.

My server-client logic is also fine without certificates. The moment I enable certificates, I get this error:
rpc error: code = Unavailable desc = all SubConns are in TransientFailure


My server side code:

    lis, err := net.Listen("tcp", port)
    if err != nil {
        return fmt.Errorf("Failed to listen: %s", err)
    }
    creds, err := credentials.NewServerTLSFromFile(certFile, keyFile)
    if err != nil {
        return fmt.Errorf("could not load keys: %s", err)
    }

    opts := []grpc.ServerOption{grpc.Creds(creds)}
    grpcServer := grpc.NewServer(opts...)

    pb.RegisterHelloServer(grpcServer, newServer())

    if err := grpcServer.Serve(lis); err != nil {
        return fmt.Errorf("Failed to start Hello Server: %s", err)
    }


My client side code:

    creds, err := credentials.NewClientTLSFromFile(certFile, "")
    if err != nil {
        log.Fatalf("could not load cert: %s", err)
    }
    conn, err = grpc.Dial(port, grpc.WithTransportCredentials(creds))
    if err != nil {
        log.Fatalf("Failed to connect to server: %s", err)
        return
    }

    defer conn.Close()
    c := pb.NewHelloClient(conn)

    r, err := c.HelloServer(context.Background(), &pb.Request{Name: "Myname", Id:10})


Josh Humphries

Dec 22, 2017, 6:43:34 PM
to Ravi, grpc.io
If you specify the "insecure" dial option in the gRPC client and use a custom dialer that handles TLS itself, you can get at the actual error messages that are causing the transport failure.

Here's an example I used in a command-line tool, where I wanted to be able to show users a good error message when there was a TLS issue preventing things from working:
https://github.com/fullstorydev/grpcurl/blob/master/grpcurl.go#L916

I've considered filing a bug with the grpc-go project about this. The ClientConn has information about the actual errors that cause the SubConn transient failure, but it provides no API to access them (for logging or error reporting, for example): https://github.com/grpc/grpc-go/blob/master/clientconn.go#L989.



----
Josh Humphries
jh...@bluegosling.com


Ravi Jonnadula

Dec 22, 2017, 8:56:40 PM
to Josh Humphries, grpc.io
Hi Josh,

Thanks for sharing your thoughts.

In my case, grpc.Dial is successful, there is no error for this call.
The error occurs when the rpc call is invoked.

Josh Humphries

Dec 22, 2017, 10:55:11 PM
to Ravi Jonnadula, grpc.io
Hi, Ravi,
Yes, I understand. That is because grpc.Dial doesn't actually return an error just because there are issues establishing socket connections -- it asynchronously starts a client that will transparently retry dialing as needed (possibly continuously dialing, with some backoff, depending on the nature of the connection failure).

While you can try to use dial options grpc.WithBlock() and grpc.FailOnNonTempDialError(true), in my experience this still usually results in only a timeout error from grpc.Dial. In order to get visibility into the actual errors, you need a custom dialer that also performs the TLS handshake so that you can adequately capture the error (log it or otherwise). This will likely shed much light on why all connections are always in transient failure state.



----
Josh Humphries
jh...@bluegosling.com

rav...@gmail.com

Jan 2, 2018, 2:14:48 PM
to grpc.io
Hi Josh,

Thanks a lot for your help.

Yes, the custom dialer helped to understand the actual error message (in my case the certificate is valid for the "hostname", but the client is using "localhost" to connect to the server).
After changing it, things started to work.

thanks.
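
The mismatch described above can be reproduced with the standard library alone. The sketch below generates a throwaway self-signed certificate and checks it against two names; `selfSigned` and the name `myserver.example` are hypothetical and chosen only for illustration. In grpc-go, the usual remedies are to dial a name the certificate covers, or to pass the expected name as the second argument (`serverNameOverride`) of `credentials.NewClientTLSFromFile`, which is the empty string in the client code earlier in this thread.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned creates a throwaway self-signed certificate that is valid
// only for the given DNS names.
func selfSigned(dnsNames ...string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{Organization: []string{"example"}},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	// Using the template as its own parent makes the certificate self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := selfSigned("myserver.example")
	if err != nil {
		fmt.Println("could not create certificate:", err)
		return
	}
	// The client dials "localhost", but the certificate does not list that name.
	fmt.Println("localhost:", cert.VerifyHostname("localhost"))
	// Dialing the name listed in the certificate passes the hostname check.
	fmt.Println("myserver.example:", cert.VerifyHostname("myserver.example"))
}
```

The first check returns a hostname error and the second returns nil, which matches the behavior reported here: the same certificate works or fails depending only on the name the client uses to connect.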
