Strange nREPL network issue


Colin Fleming

Mar 5, 2014, 5:56:28 AM
to clojur...@googlegroups.com
Hi all,

One of my users is reporting a very strange issue with nREPL, Cursive #290. The summary is that for their large main application, when they connect with the REPL from Cursive they get a ConnectException: Operation timed out. The REPL is started with Leiningen, and this works fine from the command line. This affects everyone in their office and also happens when they work from home, which makes me think it's not an obvious issue with their network config, at least. Oddly, smaller toy projects work fine so I think I can rule out firewall issues.

One potential problem I've found is that I was binding the server to 127.0.0.1 but connecting the client to localhost. I saw in the lein code that they changed that a while back because of unspecified IPv6 issues. Reading around, I believe that's just in case the user has ::1 ahead of 127.0.0.1 in their hosts file. In any case, if that were the problem I'd expect a connection refused, not a timeout. Either way, I've given them a dev build to test and I'm waiting to hear back.
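To see what the JVM itself makes of the name, a minimal sketch along these lines lists the addresses a host name resolves to and how long the lookup takes. The function names `resolve-all` and `timed-resolve` are made up for illustration; the only thing assumed is the standard `java.net.InetAddress` class.

```clojure
(import '(java.net InetAddress))

(defn resolve-all
  "Returns the textual addresses the JVM resolves `host` to, in resolver order.
  Hypothetical helper, not part of nREPL or Cursive."
  [host]
  (mapv #(.getHostAddress ^InetAddress %) (InetAddress/getAllByName host)))

(defn timed-resolve
  "Resolves `host` and reports the addresses plus the elapsed time in ms."
  [host]
  (let [start (System/nanoTime)
        addrs (resolve-all host)]
    {:host      host
     :addresses addrs
     :millis    (/ (- (System/nanoTime) start) 1e6)}))
```

Evaluating `(timed-resolve "localhost")` under the same JVM that runs the client would show whether the lookup is slow or returns ::1 ahead of 127.0.0.1. The JVM caches successful lookups, so the first call in a fresh process is the informative one.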

The code I'm using to connect is fairly unexceptional:

(repl/print state (str "Connecting to " description "...\n"))
(if-let [connection (try
                      (nrepl/connect :host host :port port)
                      (catch Exception e
                        (print-exception state "Error connecting" e)
                        (stop state)
                        nil))]

According to their report, the error is definitely in the nrepl/connect call.
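A raw socket probe with an explicit timeout could help tell a refused connection, a connect timeout, and a failed name lookup apart, independently of nREPL. This is only a sketch: `probe` is a hypothetical helper, and it uses nothing beyond `java.net.Socket` and `java.net.InetSocketAddress`.

```clojure
(import '(java.net ConnectException InetSocketAddress Socket
                   SocketTimeoutException UnknownHostException))

(defn probe
  "Tries to open a TCP connection to host:port, waiting at most timeout-ms
  for the connect itself. Hypothetical helper for diagnosis only."
  [host port timeout-ms]
  (try
    (with-open [s (Socket.)]
      ;; Name resolution happens in the InetSocketAddress constructor and is
      ;; NOT bounded by timeout-ms; only the connect call below is.
      (.connect s (InetSocketAddress. ^String host (int port)) (int timeout-ms))
      {:status :connected})
    (catch SocketTimeoutException e
      {:status :timed-out :message (.getMessage e)})
    (catch UnknownHostException e
      {:status :unresolved :message (.getMessage e)})
    (catch ConnectException e
      ;; Covers "Connection refused" as well as OS-level timeouts such as
      ;; "Operation timed out"; the message says which one it was.
      {:status :connect-failed :message (.getMessage e)})))
```

Comparing `(probe "localhost" port 5000)` with `(probe "127.0.0.1" port 5000)` against the running REPL, where `port` is the port it reports, would show whether the two names behave differently.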

One thing that is different between Cursive and lein is the JVM used. They're on Macs, so they're running Cursive under an Apple 1.6 JVM but lein under an Oracle (or maybe OpenJDK, now that I think of it) 1.7 JVM. This gave them previous issues with client certs when connecting to Clojars, but I'm not sure how that would affect this.

Does anyone have any idea what might be going on here? I must admit I'm running out of ideas - any suggestions gratefully accepted.

Cheers,
Colin

Chas Emerick

Mar 18, 2014, 5:04:54 AM
to clojur...@googlegroups.com
This is absolutely a networking issue, esp. given the mention that "when disabling the WiFi it starts up instantly".  As far as home vs. office, the discussion doesn't mention whether it's the same machines in question or not; if so, then that convinces me further that there might be something unusual about the network config in question.

That it is a timeout instead of a connection refusal is confusing, for sure.  Not sure what to say there.

I'm far from a network engineer, so my tangible advice is going to be minimal.  I'd ask them for the output of `ifconfig -a`, and see if there's anything obviously fishy, e.g. two interfaces with the same IP or something.

Finally, if they're patient enough to attempt to telnet to the ostensibly open nREPL endpoint, that might yield a clue.  You could give them a bencode command string to send, and see what comes back (or doesn't).
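A command string for that telnet test can be produced with a few lines of code. The sketch below assumes the server uses nREPL's default bencode transport and supports the `describe` op; `bencode` here is a hypothetical, ASCII-only helper written for illustration, not nREPL's own encoder.

```clojure
(defn- key-name
  "Dictionary keys may be given as strings or keywords."
  [k]
  (if (keyword? k) (name k) (str k)))

(defn bencode
  "Encodes strings, keywords, integers, sequential collections and maps as a
  bencode string. Lengths are counted in characters, so this is only correct
  for ASCII input."
  [x]
  (cond
    (string? x)     (str (count x) ":" x)
    (keyword? x)    (bencode (name x))
    (integer? x)    (str "i" x "e")
    ;; Bencode dictionaries list their keys in sorted order.
    (map? x)        (str "d"
                         (apply str
                                (mapcat (fn [[k v]]
                                          [(bencode (key-name k)) (bencode v)])
                                        (sort-by (comp key-name key) x)))
                         "e")
    (sequential? x) (str "l" (apply str (map bencode x)) "e")
    :else           (throw (IllegalArgumentException.
                             (str "Cannot bencode: " (pr-str x))))))
```

`(bencode {"op" "describe"})` yields `d2:op8:describee`, which can be pasted into the telnet session; a working server should answer with a bencoded dictionary of its own.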

Hope that's not entirely useless.

Good luck,

- Chas

Colin Fleming

Mar 18, 2014, 8:31:32 AM
to clojur...@googlegroups.com
Thanks Chas, I'll try the telnet idea if they're willing, that sounds like it might throw up what's happening.

Cheers,
Colin

Colin Fleming

Mar 19, 2014, 4:50:04 AM
to guns, clojur...@googlegroups.com, Colin Fleming
Thanks for the suggestions everyone. In the latest version of Cursive I now connect to 127.0.0.1 as Leiningen does and that helps, so I'm going with guns's suggestion about some OSX DNS funkiness. I'll read that post carefully and make sure I understand it since this is the second OSX DNS weirdness report I've had recently.

Thanks again for all the advice!

Cheers,
Colin


On 19 March 2014 02:05, guns <se...@sungpae.com> wrote:
anagrius, from https://github.com/cursiveclojure/cursive/issues/290

> All dependencies have already been downloaded and when disabling the
> WiFi it starts up instantly.

On Wed  5 Mar 2014 at 11:56:28PM +1300, Colin Fleming wrote:

> They're on macs,

> One potential problem I've found is that I was binding the server to
> 127.0.0.1 but connecting the client to localhost. I saw in the lein
> code that they changed that a while back because of unspecified IPV6
> issues. Reading around, I believe that's just in case the user has ::1
> ahead of 127.0.0.1 in their hosts file. In any case if that were the
> problem I'd expect a connection refused, not a timeout. Either way,
> I've given them a dev build to test and I'm waiting to hear back.

You might get a timeout if DNS resolution for "localhost" fails. I can
think of one outrageous scenario where this might be the case:

- User has accidentally deleted the "127.0.0.1 localhost" entry from
  /etc/hosts (perhaps while adding locally resolving domains for web
  development or ad blocking)

- User is using OS X, which infuriatingly changes the active DNS
  resolver depending on which interface is active and if an Internet
  connection is available¹

- The User's top DNS resolver during a WiFi connection can resolve
  "localhost", but the top resolver when offline cannot.

This sounds pretty unlikely, I admit, but I'd put some money on it if I
was feeling lucky.

    guns

¹ cf. my serverfault answer on a related topic:
  https://serverfault.com/questions/22419/set-dns-server-on-os-x-even-when-without-internet-connection/164215#164215
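The first condition in that scenario is easy to check mechanically. As a rough sketch, the hypothetical helper `hosts-addresses` below scans hosts-file text for entries naming a given host; it assumes the usual "address name [name ...]" layout with `#` comments.

```clojure
(require '[clojure.string :as str])

(defn hosts-addresses
  "Returns, in file order, the addresses that the hosts-file text `content`
  maps `hostname` to. Hypothetical helper for illustration."
  [content hostname]
  (vec (for [line  (str/split-lines content)
             :let  [entry          (str/trim (first (str/split line #"#" 2)))
                    [addr & names] (str/split entry #"\s+")]
             ;; Skip blank or comment-only lines and entries for other hosts.
             :when (and (seq entry) (some #{hostname} names))]
         addr)))
```

`(hosts-addresses (slurp "/etc/hosts") "localhost")` returning an empty vector would mean the entry is missing and the lookup falls through to whichever DNS resolver is active.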

Laurent PETIT

Apr 3, 2014, 3:16:25 PM
to clojur...@googlegroups.com, guns, Colin Fleming
Yeah, I also changed every "localhost" occurrence to 127.0.0.1 a while back for ccw, and it helped (I haven't heard about such problems since then).

