Help debugging handin client connection problem

William J. Bowman

Sep 18, 2021, 7:13:13 PM
to Racket Users
I need some help debugging an issue with the handin package. The handin plugin (client) displays “Making secure connection to <handin server> …”, and simply hangs. Closing the dialog and trying again never resolves the issue.

The only method that seems to resolve the issue, although inconsistently, is restarting DrRacket, opening a new file, and trying to submit that new file. This sometimes, but not always, enables the client to connect. Once it does connect, the issue doesn't seem to recur for some time. The client can make multiple successful submissions, at least until the end of lecture (maybe related to the next time they disconnect/reconnect to the internet).

We are running Racket 7.8 on the server and 8.1 BC on the clients. We've seen the issue occur on many operating systems: old and new versions of macOS, Windows 10, and at least one report on Linux.

I can't just upgrade the clients to 8.2, since there's a bug in 8.2 that affects rendering inexact numbers in BSL, so I really want some confidence about what the issue is before I start upgrading versions.

Anecdotally, the problem seems more common this semester compared to the previous semester, and we upgraded the clients to 8.1 this semester, suggesting the clients are at fault.

When this problem occurs, there is nothing in the log on the handin server, suggesting the client did not even manage to initiate the connection to the server. In particular, the server never seems to make it to this log line:
https://github.com/racket/handin/blob/ac08937cc6b1eca8abe3d4d4df59876f95cbea17/handin-server/main.rkt#L679
This is one of the earliest log lines, and it runs before pretty much anything else happens, so we're *PRETTY SURE* the client is blocking.

Right now, my best guess is that we might be affected by this bug, which causes SSL ports to block incorrectly:
https://github.com/racket/racket/issues/3804

If so, it would probably be in the client, unless `(ssl-addresses r)` can block in the same way on the server, since otherwise the above log line would execute.

However, if it is the client, I don't have any explanation for why restarting DrRacket would work around the bug, or why restarting sometimes doesn't help.

I'd appreciate any help.

--
William J. Bowman

William J. Bowman

Sep 18, 2021, 9:59:55 PM
to Sam Tobin-Hochstadt, Racket Users
I just tried this, but I can't seem to connect.
http://cs110.students.cs.ubc.ca:7979/
gives "connection reset", and
https://cs110.students.cs.ubc.ca:7979/
gives "secure connection failed".

There's no prompt to accept the certificate (which I wouldn't expect, because we're using a CA signed certificate through Let's Encrypt, not a self-signed certificate).

I'm currently experiencing the problem on my own client. I'm not sure if that's related; I also couldn't connect from my phone.

--
William J. Bowman

On Sat, Sep 18, 2021 at 09:24:05PM -0400, Sam Tobin-Hochstadt wrote:
> Have you tried visiting the server with a browser? That should work,
> although you'll have to accept the certificate. It might also indicate some
> aspect of the behavior.
>
> Sam

William J. Bowman

Sep 18, 2021, 11:06:10 PM
to Sam Tobin-Hochstadt, Racket Users
Since I'm currently experiencing the issue, I've been able to get some better data. I've managed to reproduce it in 8.2.0.2 CS, which suggests it's not https://github.com/racket/racket/issues/3804.

Restarting DrRacket twice hasn't helped, nor has resetting my wifi connection.

After connecting via a browser, I notice a lot of the following in the log that seem to correlate with my attempts in the browser:
> [-|2021-09-18T19:37:45] handin: unknown protocol: #"GET / HTTP/1.1"
> ...
> [-|2021-09-18T19:37:53] ERROR: ssl-accept/enable-break: accept failed (error:1408F09C:SSL routines:ssl3_get_record:http request)

As expected, nothing in the log seems to correlate with my attempts to connect from the handin plugin.

This makes me suspect the server, but I can't reconcile that with why there's nothing in the logs.

--
William J. Bowman

William J. Bowman

Sep 18, 2021, 11:29:07 PM
to Racket Users
I've confirmed it's definitely client side, by redirecting the handin server's address to 127.0.0.1 in /etc/hosts, and listening with `nc -l`. The handin client hangs on "Making secure connection ..." and `nc` displays nothing at all. After a few restarts, `nc -l` displays a bunch of gibberish that I'm guessing is the handin protocol, and killing `nc` triggers the handin client to report a connection error.

So it's:
- handin client side
- maybe related to openssl
- nondeterministic
- when it occurs, it will recur until you restart DrRacket
- when it doesn't occur, it will not recur until you restart DrRacket
- affects 8.1 BC
- affects 8.1 CS
- affects 8.2.0.2 CS
- results in the client failing to send anything to the network
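
The localhost check described above (point the server's hostname at 127.0.0.1 and watch for any bytes) could also be scripted in Racket rather than with `nc -l`. The following is only a rough sketch under stated assumptions: `bytes-seen-from-client` is a hypothetical helper name, not part of the handin package, and it only reports whether a client sent anything within a timeout. It makes no attempt to speak SSL or the handin protocol.

```racket
#lang racket/base
;; Hypothetical stand-in for `nc -l`: accept one TCP connection and report
;; how many bytes the client sends before a timeout expires.
(require racket/tcp racket/port)

;; Returns the number of bytes received in the first read, or #f if no
;; client connected, or the client connected but sent nothing (or closed
;; the connection), within `timeout` seconds for each step.
(define (bytes-seen-from-client listener timeout)
  (define accepted (sync/timeout timeout (tcp-accept-evt listener)))
  (cond
    [(not accepted) #f]
    [else
     (define in (car accepted))
     (define out (cadr accepted))
     (define buf (make-bytes 4096))
     ;; Ready as soon as at least one byte (or eof) is available.
     (define n (sync/timeout timeout (read-bytes-avail!-evt buf in)))
     (close-input-port in)
     (close-output-port out)
     (and (exact-nonnegative-integer? n) n)]))
```

For example, with the hostname mapped to 127.0.0.1 in /etc/hosts, `(bytes-seen-from-client (tcp-listen 7979 4 #t "127.0.0.1") 30)` would listen on the port used in the URLs earlier in this thread. A result of `#f` would match the symptom where the client sends nothing at all.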

--
William J. Bowman

William J. Bowman

Sep 19, 2021, 11:58:51 PM
to Racket Users
I think I've debugged the issue. It's only present in our locally modified version of the client, although the root cause could affect others. I'm writing this up in case others have minor modifications to the client, or anyone modifies the client in the future:

It was a race condition between some error checking logic and connection initialization. If the error occurred before the connection initialized, then the connection would hang. I'm guessing this is related to the `go-sema`, but I'm not entirely sure.

We added some additional error checking that happens at line:
https://github.com/racket/handin/blob/ac08937cc6b1eca8abe3d4d4df59876f95cbea17/handin-client/client-gui.rkt#L353
We simply checked that the current file was saved and raised an error if not:
> (unless filename
>   (report-error "File is not saved. Please save the file and try again."))

This occurs in parallel with initializing the connection:
https://github.com/racket/handin/blob/ac08937cc6b1eca8abe3d4d4df59876f95cbea17/handin-client/client-gui.rkt#L345

If the error checking raises an error before the connection is established, it seems that the connection logic completely hangs, and the connection can never be used.

We can't move the error checking BEFORE the initialization, since `report-error` relies on the `comm-cust` variable, which is initialized through mutation by `(init-comm)`.

Instead, I've moved the error reporting to happen AFTER the connection has definitely been established, right before a user tries to submit. This is a shame, since in principle it can happen in parallel with initialization, but I can't figure out how to untangle this code enough to do that without risking the race condition.
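
To illustrate the ordering described above, here is a minimal sketch of deferring a check until a background connection has signaled that it is ready. This is not the handin client's code: `conn`, `start-connection`, and `validate-after-connect` are hypothetical names, and the sketch assumes readiness is signaled by a semaphore, which may not match how `go-sema` is actually used in `client-gui.rkt`.

```racket
#lang racket/base
;; Sketch only: every name here is hypothetical, not the handin client's
;; real code. It shows a check that cannot run before the connection is up.

(struct conn (ready-sema [state #:mutable]))

;; Start "connecting" in a background thread and post `ready-sema` once
;; `connect-thunk` has returned.
(define (start-connection connect-thunk)
  (define c (conn (make-semaphore 0) 'connecting))
  (thread (lambda ()
            (connect-thunk)
            (set-conn-state! c 'connected)
            (semaphore-post (conn-ready-sema c))))
  c)

;; Block until the connection is established, then run the saved-file
;; check. Returns 'ok or an error-message string; it never reports an
;; error while the connection is still being initialized.
(define (validate-after-connect c filename)
  ;; Peeking waits for the post without consuming it, so later callers
  ;; can also proceed.
  (sync (semaphore-peek-evt (conn-ready-sema c)))
  (if filename
      'ok
      "File is not saved. Please save the file and try again."))
```

The trade-off is the one noted above: the check no longer overlaps with connection setup, in exchange for never reporting an error before initialization has finished.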

--
William J. Bowman