[BUG] Fail to setup connection when running multiple Indigo flows

Xudong Liao

Aug 12, 2021, 2:24:02 AM
to Pantheon
Hi,

I am Xudong Liao, a PhD student at HKUST. I am using Pantheon to evaluate the performance of various congestion control algorithms. Thanks for providing such an excellent playground. 

As described in the title, I ran into a problem when running multiple Indigo flows in Pantheon over an emulated link. I have not yet found the root cause of this bug and would appreciate your help.

Below are the shell commands, the corresponding error messages, and the output of tcpdump. Some observations about the bug are also included.

The shell command (CMD1) used to run Indigo is:

python src/experiments/test.py local --schemes "indigo" --runtime 150 -f 3 --interval 40 --uplink-trace src/experiments/12mbps.trace --downlink-trace src/experiments/12mbps.trace --append-mm-cmds "--uplink-queue=droptail --uplink-queue-args=bytes=1000000" --prepend-mm-cmds "mm-delay 10"

I tried varying the shell command used to run Pantheon and found:
- The error seems to occur only with schemes that are sender_first, such as Indigo and QUIC. Receiver_first schemes such as CUBIC and Copa do not show the connection error.
- The error occurs when "mm-delay" or "mm-loss" is passed via "--prepend-mm-cmds". Without that option, the following shell command works well:

python src/experiments/test.py local --schemes "indigo" --runtime 150 -f 3 --interval 40 --uplink-trace src/experiments/12mbps.trace --downlink-trace src/experiments/12mbps.trace --append-mm-cmds "--uplink-queue=droptail --uplink-queue-args=bytes=1000000" 
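
To narrow this down systematically, the variations above can be generated rather than typed by hand. This is only a rough sketch: the flags are the ones used in CMD1, while the helper names (build_cmd, command_matrix) and the lowercase scheme name "cubic" are assumptions for illustration. It is written for Python 3 (Pantheon itself runs on Python 2.7, where pipes.quote would replace shlex.quote).

```python
import shlex

# Arguments shared by every variant; all flags are taken from CMD1.
BASE = [
    "python", "src/experiments/test.py", "local",
    "--runtime", "150", "-f", "3", "--interval", "40",
    "--uplink-trace", "src/experiments/12mbps.trace",
    "--downlink-trace", "src/experiments/12mbps.trace",
    "--append-mm-cmds",
    "--uplink-queue=droptail --uplink-queue-args=bytes=1000000",
]


def build_cmd(scheme, prepend=None):
    """Return the argv list for one run (hypothetical helper)."""
    cmd = BASE + ["--schemes", scheme]
    if prepend:
        # Only add the option when a prepended shell is requested.
        cmd += ["--prepend-mm-cmds", prepend]
    return cmd


def command_matrix(schemes, prepends):
    """Cross every scheme with every --prepend-mm-cmds setting."""
    return [(s, p, build_cmd(s, p)) for s in schemes for p in prepends]


if __name__ == "__main__":
    # "cubic" is an assumed scheme name; None means no prepended shell.
    for scheme, prepend, cmd in command_matrix(["indigo", "cubic"],
                                               [None, "mm-delay 10"]):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Running each printed command in turn gives a small table of scheme versus prepended shell, which makes it easier to see which combination triggers the failure.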

The error output from running CMD1 is:
===============
Testing scheme indigo for experiment run 1/1...
$ /home/xudong/tmp/pantheon/src/wrappers/indigo.py run_first
[tunnel server manager (tsm)] $ python /home/xudong/tmp/pantheon/src/experiments/tunnel_manager.py
tunnel manager is running
prompt [tsm]
[tunnel client manager (tcm)] $ mm-delay 10 mm-link src/experiments/12mbps.trace src/experiments/12mbps.trace --uplink-log=/home/xudong/tmp/pantheon/src/experiments/data/indigo_mm_datalink_run1.log --downlink-log=/home/xudong/tmp/pantheon/src/experiments/data/indigo_mm_acklink_run1.log --uplink-queue=droptail --uplink-queue-args=bytes=1000000 python /home/xudong/tmp/pantheon/src/experiments/tunnel_manager.py
tunnel manager is running
prompt [tcm]
[tsm] tunnel 1 mm-tunnelserver --ingress-log=/home/xudong/tmp/pantheon/tmp/indigo_datalink_run1_flow1_uidfa235f5f-a35c-40b0-baf7-64d7fa0f789a.log.ingress --egress-log=/home/xudong/tmp/pantheon/tmp/indigo_acklink_run1_flow1_uidfa235f5f-a35c-40b0-baf7-64d7fa0f789a.log.egress
[tsm] tunnel 1 readline
[tcm] tunnel 1 mm-tunnelclient $MAHIMAHI_BASE 42201 100.64.0.4 100.64.0.3 --ingress-log=/home/xudong/tmp/pantheon/tmp/indigo_acklink_run1_flow1_uidfa235f5f-a35c-40b0-baf7-64d7fa0f789a.log.ingress --egress-log=/home/xudong/tmp/pantheon/tmp/indigo_datalink_run1_flow1_uidfa235f5f-a35c-40b0-baf7-64d7fa0f789a.log.egress
[tcm] tunnel 1 readline
Tunnelclient listening for server on port 52104
Tunnel is connected
[tcm] [tsm] tunnel 1 python /home/xudong/tmp/pantheon/src/wrappers/indigo.py sender 40051
tunnel 2 mm-tunnelserver --ingress-log=/home/xudong/tmp/pantheon/tmp/indigo_datalink_run1_flow2_uid28f93ad7-de4a-4075-a48d-ec867a3f1863.log.ingress --egress-log=/home/xudong/tmp/pantheon/tmp/indigo_acklink_run1_flow2_uid28f93ad7-de4a-4075-a48d-ec867a3f1863.log.egress
[tsm] tunnel 2 readline
[tcm] tunnel 2 mm-tunnelclient $MAHIMAHI_BASE 36323 100.64.0.4 100.64.0.3 --ingress-log=/home/xudong/tmp/pantheon/tmp/indigo_acklink_run1_flow2_uid28f93ad7-de4a-4075-a48d-ec867a3f1863.log.ingress --egress-log=/home/xudong/tmp/pantheon/tmp/indigo_datalink_run1_flow2_uid28f93ad7-de4a-4075-a48d-ec867a3f1863.log.egress
[tcm] tunnel 2 readline
Tunnelclient listening for server on port 54989
Tunnel is connected
[tcm] tunnel 2 python /home/xudong/tmp/pantheon/src/wrappers/indigo.py sender 36787
[tsm] tunnel 3 mm-tunnelserver --ingress-log=/home/xudong/tmp/pantheon/tmp/indigo_datalink_run1_flow3_uid361eef63-b1bf-4732-b71e-606ecff395b8.log.ingress --egress-log=/home/xudong/tmp/pantheon/tmp/indigo_acklink_run1_flow3_uid361eef63-b1bf-4732-b71e-606ecff395b8.log.egress
[tsm] tunnel 3 readline
[tcm] tunnel 3 mm-tunnelclient $MAHIMAHI_BASE 48974 100.64.0.4 100.64.0.3 --ingress-log=/home/xudong/tmp/pantheon/tmp/indigo_acklink_run1_flow3_uid361eef63-b1bf-4732-b71e-606ecff395b8.log.ingress --egress-log=/home/xudong/tmp/pantheon/tmp/indigo_datalink_run1_flow3_uid361eef63-b1bf-4732-b71e-606ecff395b8.log.egress
[tcm] tunnel 3 readline
Tunnelclient listening for server on port 33384
Tunnel is connected
[tcm] tunnel 3 python /home/xudong/tmp/pantheon/src/wrappers/indigo.py sender 41739
[sender] Listening on port 40051
[sender] Listening on port 36787
[sender] Listening on port 41739

[tsm] tunnel 1 python /home/xudong/tmp/pantheon/src/wrappers/indigo.py receiver 100.64.0.4 40051
[sender] Handshake success! Receiver's address is 100.64.0.3:49698

[tsm] tunnel 2 python /home/xudong/tmp/pantheon/src/wrappers/indigo.py receiver 100.64.0.4 36787
[receiver] Handshake timed out and retrying...
Traceback (most recent call last):
  File "/home/xudong/tmp/pantheon/third_party/indigo/env/run_receiver.py", line 40, in <module>
    main()
  File "/home/xudong/tmp/pantheon/third_party/indigo/env/run_receiver.py", line 31, in main
    receiver.handshake()
  File "/home/xudong/tmp/pantheon/third_party/indigo/env/receiver.py", line 66, in handshake
    self.sock.sendto('Hello from receiver', self.peer_addr)
socket.error: [Errno 101] Network is unreachable
Traceback (most recent call last):
  File "/home/xudong/tmp/pantheon/src/wrappers/indigo.py", line 33, in <module>
    main()
  File "/home/xudong/tmp/pantheon/src/wrappers/indigo.py", line 28, in main
    check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/xudong/tmp/pantheon/third_party/indigo/env/run_receiver.py', '100.64.0.4', '36787']' returned non-zero exit status 1
^Ckill_proc_group: killed process group with pgid 20332
kill_proc_group: killed process group with pgid 20333
kill_proc_group: killed process group with pgid 20351
kill_proc_group: killed process group with pgid 20352
kill_proc_group: killed process group with pgid 20360
kill_proc_group: killed process group with pgid 20356
kill_proc_group: killed process group with pgid 20363
tunnel_manager: caught signal 15 and cleaned up

Traceback (most recent call last):
  File "src/experiments/test.py", line 808, in main
    run_tests(args)
  File "src/experiments/test.py", line 782, in run_tests
    Test(args, run_id, cc).run()
  File "src/experiments/test.py", line 735, in run
    if not self.run_congestion_control():
  File "src/experiments/test.py", line 686, in run_congestion_control
    return self.run_with_tunnel()
  File "src/experiments/test.py", line 569, in run_with_tunnel
    if not self.run_second_side(send_manager, recv_manager, second_cmds):
  File "src/experiments/test.py", line 501, in run_second_side
    time.sleep(self.interval)
KeyboardInterrupt
Error in tests!
===============
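
The log above contains two different symptoms for tunnel 2: a handshake timeout, then Errno 101 raised by sendto. A timeout means the datagram was handed to the kernel but no reply came back, whereas Errno 101 (ENETUNREACH) means the kernel had no route to the destination at the moment of the send. The following is a minimal Python 3 sketch that tells the two apart; it is not Indigo's actual handshake code, and probe_udp is a hypothetical helper.

```python
import errno
import socket


def probe_udp(sock, peer_addr, payload=b"Hello from receiver", timeout=1.0):
    """Send one datagram and classify the outcome.

    Returns "unreachable" if the kernel refuses the send for lack of a
    route, "timeout" if the send succeeds but nothing comes back, and
    "reply" if any datagram is received.
    """
    sock.settimeout(timeout)
    try:
        sock.sendto(payload, peer_addr)
    except OSError as exc:
        if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
            return "unreachable"
        raise  # some other socket error; do not hide it
    try:
        sock.recvfrom(1600)
    except socket.timeout:
        return "timeout"
    return "reply"


if __name__ == "__main__":
    # The address and port are the ones from the tunnel 2 receiver
    # command in the log; adjust them for your own run.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        print(probe_udp(s, ("100.64.0.4", 36787)))
```

Calling this repeatedly from inside the tunnel shell while the other flows start up would show whether the route disappears at some point or is missing from the start.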

I used tcpdump to capture the packet trace of the Pantheon tunnel. The following is the tcpdump output for tunnel 2's UDP socket while running CMD1.
===============
$ sudo tcpdump -B 4096 -i any -n port 36323 -vvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
13:34:10.263775 IP (tos 0x0, ttl 64, id 38719, offset 0, flags [DF], proto UDP (17), length 83)
    100.64.0.1.36323 > 100.64.0.2.54989: [udp sum ok] UDP, length 55
^C
1 packet captured
1 packet received by filter
0 packets dropped by kernel
===============
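
The capture shows a single datagram from port 36323 (the tunnel 2 server port passed to mm-tunnelclient) to port 54989 (the port the tunnel client listens on) and nothing in the reverse direction. For longer captures, a small Python 3 sketch like the one below can tally packets per direction; count_flows is a hypothetical helper, and the regular expression assumes the verbose IPv4/UDP line format shown above.

```python
import re
from collections import Counter

# Matches lines such as:
#   100.64.0.1.36323 > 100.64.0.2.54989: [udp sum ok] UDP, length 55
FLOW_RE = re.compile(
    r"^\s*(\d+(?:\.\d+){3})\.(\d+) > (\d+(?:\.\d+){3})\.(\d+): "
    r".*UDP, length (\d+)"
)


def count_flows(tcpdump_text):
    """Count UDP packets per (source, destination) address pair."""
    counts = Counter()
    for line in tcpdump_text.splitlines():
        match = FLOW_RE.match(line)
        if match:
            src = (match.group(1), int(match.group(2)))
            dst = (match.group(3), int(match.group(4)))
            counts[(src, dst)] += 1
    return counts


SAMPLE = """\
13:34:10.263775 IP (tos 0x0, ttl 64, id 38719, offset 0, flags [DF], proto UDP (17), length 83)
    100.64.0.1.36323 > 100.64.0.2.54989: [udp sum ok] UDP, length 55
"""

if __name__ == "__main__":
    for (src, dst), n in count_flows(SAMPLE).items():
        print("%s:%d -> %s:%d  %d packet(s)" % (src + dst + (n,)))
```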


The error output from running CMD1 with QUIC instead of Indigo is:
===============
[tsm] tunnel 2 python /home/xudong/tmp/pantheon/src/wrappers/quic.py receiver 100.64.0.4 41855
[0812/140834.022588:ERROR:quic_connection.cc(1650)] Client: Write failed with error: -109 (Unknown error -109)
[0812/140834.022693:ERROR:quic_connection.cc(1563)] Client: failed writing 1350 bytes from host Uninitialized address to address 100.64.0.4:41855 with error code -109
Failed to connect to 100.64.0.4:41855. Error: QUIC_PACKET_WRITE_ERROR
[0812/140835.103378:ERROR:quic_simple_client.cc(77)] Connect failed: ERR_ADDRESS_UNREACHABLE
Failed to initialize client.
[0812/140836.179137:ERROR:quic_simple_client.cc(77)] Connect failed: ERR_ADDRESS_UNREACHABLE
Failed to initialize client.
[0812/140837.255141:ERROR:quic_simple_client.cc(77)] Connect failed: ERR_ADDRESS_UNREACHABLE
Failed to initialize client.
[0812/140838.331121:ERROR:quic_simple_client.cc(77)] Connect failed: ERR_ADDRESS_UNREACHABLE
Failed to initialize client.
===============

Thanks for your time.

Best regards,
Xudong


Francis Y. Yan

Aug 16, 2021, 3:25:17 PM
to Xudong Liao, Pantheon
Hello Xudong,

Thanks for your email. I wish I could reproduce the problem on my side -- the "Network is unreachable" error is not common and seems to be a routing issue, which can be tricky to resolve.

Could you try creating two Pantheon tunnels (mm-tunnelserver/mm-tunnelclient) by following the printed commands and running Indigo inside them? Do you still see the "Network is unreachable" error?
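
One way to collect those printed commands is to pull them out of the saved test.py output. The snippet below is only a rough Python 3 sketch: extract_tunnel_cmds is a hypothetical helper, and the pattern assumes the "tunnel N mm-tunnelserver ..." and "tunnel N mm-tunnelclient ..." lines look like the ones in the log earlier in this thread. It searches anywhere in the line because the [tsm]/[tcm] prompts are sometimes interleaved.

```python
import re

# Capture the tunnel id and the full mm-tunnelserver/mm-tunnelclient command.
TUNNEL_CMD_RE = re.compile(r"tunnel (\d+) (mm-tunnel(?:server|client)\b.*)$")


def extract_tunnel_cmds(log_text):
    """Map each tunnel id to the tunnel commands printed for it, in order."""
    cmds = {}
    for line in log_text.splitlines():
        match = TUNNEL_CMD_RE.search(line)
        if match:
            tunnel_id = int(match.group(1))
            cmds.setdefault(tunnel_id, []).append(match.group(2).strip())
    return cmds


if __name__ == "__main__":
    # Shortened sample lines; the /tmp paths are placeholders, not real output.
    sample = (
        "[tsm] tunnel 1 mm-tunnelserver --ingress-log=/tmp/a.ingress\n"
        "[tsm] tunnel 1 readline\n"
        "[tcm] tunnel 1 mm-tunnelclient $MAHIMAHI_BASE 42201 100.64.0.4 100.64.0.3\n"
    )
    for tunnel_id, lines in sorted(extract_tunnel_cmds(sample).items()):
        for cmd in lines:
            print(tunnel_id, cmd)
```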

Best,
Francis


Xudong Liao

Aug 17, 2021, 1:35:41 AM
to Francis Y. Yan, Pantheon
Hello Francis,

Thanks for your reply. I manually launched the Indigo sender and receiver inside the Pantheon tunnels, and it works well.

Best regards,
Xudong

Francis Y. Yan

Aug 17, 2021, 1:18:10 PM
to Xudong Liao, Pantheon
Hmm, then I'm stumped as to why the automated scripts didn't work.

Best,
Francis