Cosmos DB: Invalid handshake

Dmitry N

Nov 4, 2019, 6:01:18 AM
to Gremlin-users
Hello,

I am unable to connect to Azure Cosmos DB with the Gremlin Console or the Java client 3.4.4. Both fail with: Invalid handshake response getStatus: 404 Not Found.

The connection works fine with 3.4.3.

Issue occurs with endpoints .gremlin.cosmosdb.azure.com and .gremlin.cosmosdb.azure.com.

Configuration file:


hosts: [<name>.gremlin.cosmosdb.azure.com]
port: 443
username: /dbs/<db>/colls/<collection>
password: <password>
connectionPool: { enableSsl: true }
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { serializeResultToString: true }}



Any advice on how this could be resolved?

Thank you in advance.

Full exception text:

➤ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> :remote connect tinkerpop.server /home/dmitry/work/tinkerpop/config/cosmosdb.yaml
ERROR org.apache.tinkerpop.gremlin.driver.Handler$GremlinResponseHandler  - Could not process the response
io.netty.handler.codec.http.websocketx.WebSocketHandshakeException: Invalid handshake response getStatus: 404 Not Found
    at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker13.verify(WebSocketClientHandshaker13.java:226)
    at io.netty.handler.codec.http.websocketx.WebSocketClientHandshaker.finishHandshake(WebSocketClientHandshaker.java:276)
    at org.apache.tinkerpop.gremlin.driver.handler.WebSocketClientHandler.channelRead0(WebSocketClientHandler.java:69)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297)
    at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1478)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1227)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1274)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:617)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:534)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:748)


Regards,
Dmitry

Stephen Mallette

Nov 4, 2019, 1:12:57 PM
to gremli...@googlegroups.com
Hmm, there weren't many changes to the Java driver in 3.4.4 - maybe this had something to do with it?


I also note this change in the CHANGELOG:

> Fixed Java driver authentication problems when calling the driver from multiple threads.

Note that this line really should have been removed, because we actually ended up reverting that change. So that issue should not be the problem. As I've not noticed this problem with other graph providers, we would probably need some help trying to get to the bottom of this one.




Oliver Towers

Nov 4, 2019, 11:36:28 PM
to gremli...@googlegroups.com
Here's my guess as to what's happened:

Connecting to a Cosmos DB Gremlin endpoint requires a host name that includes the Cosmos DB account name.

Cosmos DB has a single public IP address per region, where inbound connections are load balanced across gateway instances. The gateway instance attempts to resolve the account by parsing it from the host name of the inbound connection.

This commit changed the way host URIs were built, switching from using host name to using the address instead.


So without the full host name, the gateway can't resolve the account name, resulting in a NotFound error.

Assuming this is what's happening (will need to confirm), it seems like the change to using IP addresses should be behind a configuration?
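
To illustrate the difference described above, here is a minimal Java sketch (not the driver's actual code). It builds a WebSocket URI from the same resolved socket address in two ways: once from the host name and once from the IP address. The class and method names and the `/gremlin` path are assumptions made for illustration; the host name and IP are the ones shown in the Windows console output in this thread.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical sketch: contrasts a URI built from the host name with one
// built from the IP address. Not taken from the TinkerPop driver source.
public class HostUriSketch {

    // Builds a resolved address with a known host name and IP, without any DNS lookup.
    static InetSocketAddress sampleAddress() {
        try {
            InetAddress addr = InetAddress.getByAddress(
                    "olivert-dev.gremlin.cosmos.azure.com",
                    new byte[] {40, 65, 106, (byte) 154});
            return new InetSocketAddress(addr, 443);
        } catch (UnknownHostException e) {
            // Only thrown if the byte array has an illegal length.
            throw new IllegalStateException(e);
        }
    }

    // The host name (and so the account name) survives into the URI.
    static URI byHostName(InetSocketAddress a) {
        return URI.create("wss://" + a.getHostName() + ":" + a.getPort() + "/gremlin");
    }

    // Only the IP survives; the account name is lost.
    static URI byIpAddress(InetSocketAddress a) {
        return URI.create("wss://" + a.getAddress().getHostAddress() + ":" + a.getPort() + "/gremlin");
    }

    public static void main(String[] args) {
        InetSocketAddress a = sampleAddress();
        System.out.println(byHostName(a));  // wss://olivert-dev.gremlin.cosmos.azure.com:443/gremlin
        System.out.println(byIpAddress(a)); // wss://40.65.106.154:443/gremlin
    }
}
```

With the second form, a gateway that parses the account from the host name has nothing to parse, which would match the 404 seen during the handshake.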

Oliver

Stephen Mallette

Nov 5, 2019, 7:03:01 AM
to gremli...@googlegroups.com
Oliver, thanks for that analysis. Please let us know if you can confirm that this is in fact the problem. 

> Assuming this is what's happening (will need to confirm), it seems like the change to using IP addresses should be behind a configuration?

maybe. this all seems really "hard" and i'm not so smart about gateways to the DNS to the hostname to the load balancer stuff anymore. if the answer is "configuration" that's fine, but it seems like this should be "easy" for users: provide an IP or hostname to whatever you're connecting to and the driver just "figures it out". does anyone know how to make it dead simple in such a way that we support all these different backend setups without more dials/buttons for users to fuss with?
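
One possible direction, as a sketch only: the JDK's `InetSocketAddress.getHostString()` returns the host name when the address carries one and the literal IP otherwise, without doing a reverse lookup. The class and helper names below are hypothetical, and whether this fits the driver's connection code is an assumption that would need checking.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: pick the host part of the connection URI from
// whatever the user supplied, host name or IP literal.
public class HostStringSketch {

    // Returns the host name if one is known, else the IP literal.
    // getHostString() never triggers a reverse DNS lookup.
    static String connectHost(InetSocketAddress a) {
        return a.getHostString();
    }

    // A resolved address that carries both a host name and an IP (no DNS lookup).
    static InetSocketAddress namedAddress() {
        try {
            InetAddress addr = InetAddress.getByAddress(
                    "olivert-dev.gremlin.cosmos.azure.com",
                    new byte[] {40, 65, 106, (byte) 154});
            return new InetSocketAddress(addr, 443);
        } catch (UnknownHostException e) {
            // Only thrown if the byte array has an illegal length.
            throw new IllegalStateException(e);
        }
    }

    // An address created from an IP literal only; parsing a literal needs no DNS.
    static InetSocketAddress literalAddress() {
        return new InetSocketAddress("40.65.106.154", 443);
    }

    public static void main(String[] args) {
        System.out.println(connectHost(namedAddress()));   // olivert-dev.gremlin.cosmos.azure.com
        System.out.println(connectHost(literalAddress())); // 40.65.106.154
    }
}
```

Under this approach a user who supplies a host name keeps it in the request, and a user who supplies an IP keeps the IP, with no extra setting to configure.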



Oliver Towers

Nov 5, 2019, 5:59:24 PM
to gremli...@googlegroups.com
I was able to repro the scenario as I described it and confirmed it from the server logs. So the underlying issue is that using an IP address for the socket address is invalid for Cosmos DB Gremlin.

One thing to note is that the error occurs at a different point depending on whether you run from a Linux or a Windows environment.

Here is the output on Windows:

gremlin> :remote connect tinkerpop.server ../apache-tinkerpop-gremlin-console-3.4.4\conf\remote-secure.yaml
log4j:WARN No appenders could be found for logger (com.jcabi.manifests.Manifests).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
==>Configured olivert-dev.gremlin.cosmos.azure.com/40.65.106.154:443
gremlin> :> g.inject(0)
Host did not respond in a timely fashion - check the server status and submit again.

In terms of what a simple fix would be, I'm not sure.

Robert Dale

Nov 5, 2019, 7:55:40 PM
to gremli...@googlegroups.com
I think we're going to have to back out https://issues.apache.org/jira/browse/TINKERPOP-2289 as a breaking change and discuss what it is we want to support.

Robert Dale


Stephen Mallette

Nov 6, 2019, 7:28:29 AM
to gremli...@googlegroups.com