Issue with Pantheon multi-processing testing


贾连晨

Apr 17, 2024, 9:02:53 AM
to Pantheon


Hello,

I'm trying to use the server's multiple CPU cores to run Pantheon tests in parallel so that they finish faster, but I've run into a problem. When I run the tests, they fail with the error "Failed to connect to tunnel server after 5 tries, exiting..", which suggests that the tunnel server and tunnel client cannot establish a working channel.
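The retry behavior behind that error message can be pictured as a simple UDP hello/reply exchange: the client sends a datagram, waits a short time for the server's answer, and gives up after a fixed number of attempts. The sketch below is a minimal Python illustration under assumed details, not mahimahi's actual tunnel code. The 8-byte big-endian sentinel format, the `CLIENT_HELLO`/`SERVER_REPLY` constants (loosely mirroring the `(uint64_t) -1` and `(uint64_t) -2` values passed to `send_wrapper_only_datagram` in the tunnel shells), and the helper names `serve_once` and `client_handshake` are all hypothetical.

```python
import socket
import struct
import threading

# Assumed wire format: one big-endian unsigned 64-bit value per datagram.
# The sentinels loosely mirror the (uint64_t) -1 / (uint64_t) -2 values
# passed to send_wrapper_only_datagram; mahimahi's real framing may differ.
FMT = "!Q"
CLIENT_HELLO = (1 << 64) - 1  # (uint64_t) -1
SERVER_REPLY = (1 << 64) - 2  # (uint64_t) -2


def serve_once(sock: socket.socket) -> None:
    """Wait for one client hello and answer it from the same socket."""
    data, client_addr = sock.recvfrom(64)
    if len(data) >= 8 and struct.unpack(FMT, data[:8])[0] == CLIENT_HELLO:
        sock.sendto(struct.pack(FMT, SERVER_REPLY), client_addr)


def client_handshake(server_addr, tries=5, timeout=0.2):
    """Send a hello and wait for the reply, for up to `tries` attempts.

    Returns the address the reply came from, or None if every attempt
    times out (the analogue of "Failed to connect ... after 5 tries").
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for attempt in range(1, tries + 1):
            sock.sendto(struct.pack(FMT, CLIENT_HELLO), server_addr)
            try:
                data, addr = sock.recvfrom(64)
            except socket.timeout:
                print(f"attempt {attempt}/{tries}: no response")
                continue
            if len(data) >= 8 and struct.unpack(FMT, data[:8])[0] == SERVER_REPLY:
                return addr
        return None


if __name__ == "__main__":
    # Loopback demo: a server thread answers exactly one hello.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(2.0)  # so the thread cannot block forever
    t = threading.Thread(target=serve_once, args=(server,))
    t.start()
    print("reply from", client_handshake(server.getsockname()))
    t.join()
    server.close()
```

In this sketch the server answers from the same socket that received the hello, so the reply arrives from exactly the address the client sent to; a peer that receives the hello but never answers reproduces the "no response, retrying" pattern.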

To narrow down the problem, I added some debug output. In `tunnelservershell.cc`, just before line 135 (`send_wrapper_only_datagram( listening_socket, (uint64_t) -2 );`), I added `cerr<<"return "<<listening_socket.local_address().ip()<<" "<<listening_socket.local_address().port()<<endl;`. In `tunnelclientshell.cc`, at line 131 (`send_wrapper_only_datagram( server_socket, (uint64_t) -1 );`), I added `cerr<<"send:"<<server_socket.local_address().ip()<<" "<<server_socket.local_address().port()<<endl;`. This way, each side prints the local IP address and port of the socket it sends that datagram from.

With this debug output in place, the log looks like this:

```
send:100.64.0.2 41649
Tunnelserver got connection from tunnelclient
return 100.64.0.1 57822
Tunnelclient got connection from tunnelserver at 100.64.0.1 41649
Tunnel is connected
[tsm] tunnel 2 python /home/jlc/disk/pantheon/src/wrappers/cubic.py receiver 45667 None
Tunnelclient received no response from tunnelserver, retrying 1/5
send:100.64.0.2 48302
Tunnelclient received no response from tunnelserver, retrying 2/5
send:100.64.0.2 48302
Tunnelclient received no response from tunnelserver, retrying 3/5
send:100.64.0.2 48302
[tcm] tunnel 1 python /home/jlc/disk/pantheon/src/wrappers/cubic.py sender 100.64.0.3 44859 None 1
[tcm] tunnel 2 python /home/jlc/disk/pantheon/src/wrappers/cubic.py sender 100.64.0.3 45667 None 2
Tunnelclient received no response from tunnelserver, retrying 4/5
send:100.64.0.2 48302
Tunnelclient received no response from tunnelserver, retrying 5/5
send:100.64.0.2 48302
Failed to connect to tunnel server after 5 tries, exiting..
Tunnel connection timeout
[tcm] tunnel 1 mm-tunnelclient $MAHIMAHI_BASE 36048 100.64.0.4 100.64.0.3 --ingress-log=/home/jlc/disk/pantheon/tmp/cubic_acklink_run1_flow1_uidcffa5765-98d2-4b87-a57d-ee6e05676611.log.ingress --egress-log=/home/jlc/disk/pantheon/tmp/cubic_datalink_run1_flow1_uidcffa5765-98d2-4b87-a57d-ee6e05676611.log.egress
[tcm] tunnel 1 readline
Tunnelclient listening for server on port 46617
send:100.64.0.2 46617
Tunnelclient received no response from tunnelserver, retrying 1/5
send:100.64.0.2 46617
Tunnelclient received no response from tunnelserver, retrying 2/5
send:100.64.0.2 46617
Tunnelclient received no response from tunnelserver, retrying 3/5
send:100.64.0.2 46617
Tunnelclient received no response from tunnelserver, retrying 4/5
send:100.64.0.2 46617
Tunnelclient received no response from tunnelserver, retrying 5/5
send:100.64.0.2 46617
Failed to connect to tunnel server after 5 tries, exiting..
Tunnel connection timeout
[tcm] tunnel 1 mm-tunnelclient $MAHIMAHI_BASE 36048 100.64.0.4 100.64.0.3 --ingress-log=/home/jlc/disk/pantheon/tmp/cubic_acklink_run1_flow1_uidcffa5765-98d2-4b87-a57d-ee6e05676611.log.ingress --egress-log=/home/jlc/disk/pantheon/tmp/cubic_datalink_run1_flow1_uidcffa5765-98d2-4b87-a57d-ee6e05676611.log.egress
[tcm] tunnel 1 readline
Tunnelclient listening for server on port 58701
send:100.64.0.2 58701
Tunnelclient received no response from tunnelserver, retrying 1/5
send:100.64.0.2 58701
Tunnelclient received no response from tunnelserver, retrying 2/5
send:100.64.0.2 58701
Tunnelclient received no response from tunnelserver, retrying 3/5
send:100.64.0.2 58701
```

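It looks like there is a port number problem: the first handshake succeeds (the client sends from port 41649 and the server replies from port 57822), but every later attempt, from client ports 48302, 46617 and 58701, gets no response from the tunnel server. I have attached the complete log and the `test_mp.py` script that reproduces the problem.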
log.txt
test_mp.py
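To check the port pattern across the whole attached log rather than by eye, a small script can group the `send:` debug lines by client port and record whether each port was followed by a successful connection or by the final failure message. This is a minimal sketch that assumes exactly the line formats quoted in the log above; the function name `summarize` is hypothetical, and because output from parallel processes is interleaved, attributing each message to the most recent `send:` line is only a heuristic.

```python
import re

# Patterns for the debug lines quoted above. "send:" comes from the cerr
# statement added to tunnelclientshell.cc; the others are tunnel messages.
SEND_RE = re.compile(r"^send:(\S+) (\d+)$")
CONNECTED_RE = re.compile(r"^Tunnelclient got connection from tunnelserver at ")
FAILED_RE = re.compile(r"^Failed to connect to tunnel server after \d+ tries")


def summarize(lines):
    """Group client hello datagrams by the client's local port.

    Connection and failure messages are attributed to the most recent
    "send:" line (a heuristic, since parallel output is interleaved).
    Returns {port: {"sends": n, "connected": bool, "failed": bool}}.
    """
    summary = {}
    current = None
    for raw in lines:
        line = raw.strip()
        m = SEND_RE.match(line)
        if m:
            current = int(m.group(2))
            entry = summary.setdefault(
                current, {"sends": 0, "connected": False, "failed": False}
            )
            entry["sends"] += 1
        elif current is None:
            continue  # nothing to attribute this line to yet
        elif CONNECTED_RE.match(line):
            summary[current]["connected"] = True
        elif FAILED_RE.match(line):
            summary[current]["failed"] = True
    return summary
```

Passing an open file (for example `summarize(open("log.txt"))`) works because iterating over it yields lines. Applied to the first part of the excerpt above, it reports port 41649 as connected after one `send:` line and port 48302 as failed after five.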