UDP packet loss (diagnosis help)

Fernando Lopez-Lezcano

Feb 28, 2022, 2:25:13 PM
to jacktri...@googlegroups.com, na...@ccrma.stanford.edu
Hi, anyone out there with some recipe to help debug UDP packet loss?
That is, to try to pinpoint _which_ router is actually dropping packets?

Long story: I am on Sonic fiber, which has been a dream (I moved over from
Concast, which was a nightmare for network performance at the beginning
of the pandemic). Sonic has been high bandwidth and low latency. Very
nice, no problems.

As of two weeks ago or so I started experiencing dropouts in our
Quarantine Sessions (using jacktrip, of course). I finally got time to
do some debugging yesterday, and I can see packet loss on my link in only
one direction (coming from the server to me; going from home to the
server there is no packet loss).

I still keep a Concast connection (for legacy reasons, I was hoping to
be done with it by now) and if I switch to that I see no packet loss.

Sonic is still investigating... sigh...

I can see which route things take with a traceroute; is there any way to
pinpoint which router is dropping packets?

Thanks for any help...
-- Fernando


PS: a run of iperf3 showing the problem; this simulates a stereo
jacktrip connection (see the second, "reverse mode" listing):

$ ~/software/jacktrip/tpf-netperf -n 2 -s 256
Configured options:
BITRES: 16 (-b)
SAMPLERATE: 48000 (-r)
BLOCKSIZE: 256 (-s)
CHANNELS: 2 (-n)
DURATION: 30s (-d)

Calculated parameters:
PACKETSIZE: 1040 bytes
PACKETRATE: 187.50/s
BANDWIDTH: 1.560Mb/s
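
For reference, those numbers fall out of the configured options like this
(assuming a 16-byte application header on top of the raw audio payload,
which is what reproduces the 1040-byte figure; UDP/IP overhead is not
counted):

awk 'BEGIN {
  bitres = 16; rate = 48000; block = 256; chans = 2; hdr = 16  # header size is an assumption
  pktsize = block * chans * bitres / 8 + hdr                   # 256*2*2 + 16 = 1040 bytes
  pktrate = rate / block                                       # 48000/256 = 187.5 packets/s
  printf "PACKETSIZE: %d bytes\nPACKETRATE: %.2f/s\nBANDWIDTH: %.3fMb/s\n",
         pktsize, pktrate, pktsize * 8 * pktrate / 1e6
}'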

-------------------------------------------------------------------
Now running test:
LOCAL >>>>>>>>>>>>>>> TELEMATIC server
Executing command:
iperf3 -u -c cm-toast.stanford.edu -p 4494 -l 1040 -b 1.560M -i 2 -t 30 -P 1
Connecting to host cm-toast.stanford.edu, port 4494
[ 5] local 192.168.42.7 port 55252 connected to 171.64.197.122 port 4494
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-2.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 2.00-4.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 4.00-6.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 6.00-8.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 8.00-10.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 10.00-12.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 12.00-14.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 14.00-16.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 16.00-18.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 18.00-20.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 20.00-22.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 22.00-24.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 24.00-26.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 26.00-28.00 sec 381 KBytes 1.56 Mbits/sec 375
[ 5] 28.00-30.00 sec 381 KBytes 1.56 Mbits/sec 375
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval          Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5] 0.00-30.00  sec   5.58 MBytes  1.56 Mbits/sec  0.000 ms  0/5625 (0%)  sender
[  5] 0.00-30.01  sec   5.58 MBytes  1.56 Mbits/sec  0.226 ms  0/5625 (0%)  receiver

iperf Done.
LOCAL <<<<<<<<<<<<<<< TELEMATIC server
Executing command:
iperf3 -u -c cm-toast.stanford.edu -R -p 4494 -l 1040 -b 1.560M -i 2 -t 30 -P 1
Connecting to host cm-toast.stanford.edu, port 4494
Reverse mode, remote host cm-toast.stanford.edu is sending
[ 5] local 192.168.42.7 port 33185 connected to 171.64.197.122 port 4494
[ ID] Interval          Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5] 0.00-2.00   sec   382 KBytes   1.56 Mbits/sec  0.053 ms  0/376 (0%)
[  5] 2.00-4.00   sec   380 KBytes   1.56 Mbits/sec  0.029 ms  1/375 (0.27%)
[  5] 4.00-6.00   sec   380 KBytes   1.56 Mbits/sec  0.043 ms  1/375 (0.27%)
[  5] 6.00-8.00   sec   380 KBytes   1.56 Mbits/sec  0.056 ms  1/375 (0.27%)
[  5] 8.00-10.00  sec   376 KBytes   1.54 Mbits/sec  0.047 ms  5/375 (1.3%)
[  5] 10.00-12.00 sec   381 KBytes   1.56 Mbits/sec  0.043 ms  0/375 (0%)
[  5] 12.00-14.00 sec   381 KBytes   1.56 Mbits/sec  0.029 ms  0/375 (0%)
[  5] 14.00-16.00 sec   380 KBytes   1.56 Mbits/sec  0.016 ms  1/375 (0.27%)
[  5] 16.00-18.00 sec   380 KBytes   1.56 Mbits/sec  0.046 ms  1/375 (0.27%)
[  5] 18.00-20.00 sec   381 KBytes   1.56 Mbits/sec  0.067 ms  0/375 (0%)
[  5] 20.00-22.00 sec   380 KBytes   1.56 Mbits/sec  0.009 ms  1/375 (0.27%)
[  5] 22.00-24.00 sec   379 KBytes   1.55 Mbits/sec  0.019 ms  2/375 (0.53%)
[  5] 24.00-26.00 sec   380 KBytes   1.56 Mbits/sec  0.010 ms  1/375 (0.27%)
[  5] 26.00-28.00 sec   381 KBytes   1.56 Mbits/sec  0.049 ms  0/375 (0%)
[  5] 28.00-30.00 sec   378 KBytes   1.55 Mbits/sec  0.049 ms  3/375 (0.8%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval          Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5] 0.00-30.01  sec   5.58 MBytes  1.56 Mbits/sec  0.000 ms  0/5626 (0%)     sender
[  5] 0.00-30.00  sec   5.56 MBytes  1.56 Mbits/sec  0.049 ms  17/5626 (0.3%)  receiver

iperf Done.

Mike O'Connor

Feb 28, 2022, 3:04:10 PM
to Jacktrip-users
hi Fernando,

i like MTR for that kind of thing.  it's a one-sided view though -- you'd have to try it from each end...

this is on Debian 11 -- installed with "apt install mtr"

the Loss column is not very useful, because lots of hops will rate-limit the pings MTR is sending.  the "Last Avg Best Wrst" columns are round-trip ping times (and show how close Google is cached to the Linode i'm using).  StDev is a good surrogate for jitter...


root@localhost:/usr/local/bin#  mtr -rwzc10 google.com
Start: 2022-02-28T19:54:45+0000
HOST: localhost                               Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS63949  2600:3c01::8678:acff:fe0d:a641   0.0%    10    1.1   1.0   1.0   1.1   0.1
  2. AS63949  2600:3c01:3333:2::1              0.0%    10    0.7   0.6   0.5   0.7   0.1
  3. AS63949  2600:3c01:3333:5::2              0.0%    10    0.8   3.3   0.5  26.7   8.2
  4. AS???    2001:678:34c:5d::2               0.0%    10    1.4   1.5   1.4   1.5   0.0
  5. AS15169  2607:f8b0:8319::1                0.0%    10    1.4   1.6   1.4   1.9   0.1
  6. AS15169  2001:4860:0:1::594e              0.0%    10    1.6   1.6   1.5   1.7   0.1
  7. AS15169  2001:4860:0:1004::f              0.0%    10    1.6   1.9   1.6   4.3   0.8
  8. AS15169  2001:4860::c:4002:292d           0.0%    10    2.8  25.1   2.5  67.1  29.3
  9. AS15169  2001:4860::1:0:c1af              0.0%    10    1.8   1.8   1.6   2.0   0.1
 10. AS???    ???                             100.0    10    0.0   0.0   0.0   0.0   0.0
 11. AS15169  2001:4860:0:1::1c19              0.0%    10    1.7   1.6   1.5   1.7   0.1
 12. AS15169  sfo03s07-in-x0e.1e100.net        0.0%    10    1.6   1.4   1.4   1.6   0.1
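
(a possible refinement, assuming a reasonably recent mtr with the --udp and
--port options: since jacktrip traffic is UDP, mtr can probe with UDP
datagrams instead of the default ICMP, so the probes are more likely to get
the same treatment as the audio packets -- e.g. something like this against
the iperf3 port used earlier in the thread:

mtr --udp --port 4494 -rwzc 100 cm-toast.stanford.edu
)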

mike


Fernando Lopez-Lezcano

Mar 1, 2022, 1:16:16 AM
to jacktri...@googlegroups.com, Mike O'Connor, na...@ccrma.stanford.edu
On 2/28/22 12:04 PM, Mike O'Connor wrote:
> hi Fernando,

Hey Mike,

> i like MTR for that kind of thing.  it's a one sided view though --
> you'd have to try it from each end...
>
> this is on Debian 11 -- installed with " apt install mtr "
>
> the Loss column is not very useful, because lots of hops will rate-limit
> the pings MTR is sending.  the "last avg best wrst" are round-trip ping
> times (shows how close Google is cached to the Linode i'm using).  StDev
> is a good surrogate for jitter...

Thanks for the incantation ... after posting I did find mtr (I already
had it installed; I think I tried to use it for the same purpose two years
ago). No conclusive evidence so far: as you wrote, routers don't answer all
requests, so it is impossible to know exactly where the packets are being lost.

It is weird; the packet loss does not seem to depend much on traffic.
That is, I start my jacktrip session and then send/receive a lot of UDP
traffic - many times what jacktrip uses (using iperf between the same
two hosts) - and the rate of jacktrip glitches does not really change.
So it does not look like a traffic shaper. Maybe some internal buffer in
a router overflows periodically? This is going to be difficult.
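
(One thing I may still try, just as a sketch: a longer iperf3 run with
1-second reporting, to see whether the drops cluster periodically or are
spread evenly - e.g. something like

iperf3 -u -c cm-toast.stanford.edu -R -p 4494 -l 1040 -b 1.560M -i 1 -t 600

which is the same reverse-mode test as before, just 10 minutes long with
finer-grained intervals.)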

Sigh... (they said they did not understand why losing a few packets would
be such a problem ... so I explained; we'll see if they read or care)

Thanks for the tip!
-- Fernando



Fernando Lopez-Lezcano

Mar 1, 2022, 2:07:52 AM
to jacktri...@googlegroups.com, Mike O'Connor, na...@ccrma.stanford.edu
On 2/28/22 10:16 PM, Fernando Lopez-Lezcano wrote:
...
> It is weird; the packet loss does not seem to depend much on traffic.
> That is, I start my jacktrip session and then send/receive a lot of UDP
> traffic - many times what jacktrip uses (using iperf between the same
> two hosts) - and the rate of jacktrip glitches does not really change.

I think my reasoning above was wrong.

The extra traffic is going through a different set of ports, so there could
still be some traffic shaping or filtering taking place on the jacktrip
flow itself. I need some new measurements that take that into account...
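
(If it is port-based shaping, one way to test for that - assuming an iperf3
recent enough to have the --cport option - might be to pin the iperf3
client to the same local port the jacktrip session uses, e.g. something
like

iperf3 -u -c cm-toast.stanford.edu -R -p 4494 --cport 4464 -l 1040 -b 1.560M -t 30

where 4464 is just jacktrip's default port, used here as a placeholder.)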

...

_and_ it looks like increasing the packet size does not much influence the
average number of packets lost per second. See below for 30-second tests
that span 64 fpp to 1024 fpp. On _average_, all of the tests drop
around 30 packets in 30 seconds, pretty much independently of how many
packets are sent! (in a longer test it comes out to about 0.943 packets
lost per second)

-- Fernando


== 64 fpp

[ ID] Interval          Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5] 0.00-30.01  sec   5.84 MBytes  1.63 Mbits/sec  0.000 ms  0/22504 (0%)      sender
[  5] 0.00-30.00  sec   5.83 MBytes  1.63 Mbits/sec  0.026 ms  29/22501 (0.13%)  receiver

== 128 fpp

[ ID] Interval          Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5] 0.00-30.01  sec   5.67 MBytes  1.58 Mbits/sec  0.000 ms  0/11252 (0%)      sender
[  5] 0.00-30.00  sec   5.65 MBytes  1.58 Mbits/sec  0.021 ms  32/11251 (0.28%)  receiver

== 256 fpp

[ ID] Interval          Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5] 0.00-30.01  sec   5.58 MBytes  1.56 Mbits/sec  0.000 ms  0/5626 (0%)       sender
[  5] 0.00-30.00  sec   5.54 MBytes  1.55 Mbits/sec  0.080 ms  38/5626 (0.68%)   receiver

== 512 fpp

[ ID] Interval          Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5] 0.00-30.01  sec   5.54 MBytes  1.55 Mbits/sec  0.000 ms  0/2813 (0%)       sender
[  5] 0.00-30.00  sec   5.48 MBytes  1.53 Mbits/sec  0.071 ms  29/2813 (1%)      receiver

== 1024 fpp

[ ID] Interval          Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5] 0.00-30.01  sec   5.51 MBytes  1.54 Mbits/sec  0.000 ms  0/1406 (0%)       sender
[  5] 0.00-30.00  sec   5.38 MBytes  1.51 Mbits/sec  0.078 ms  33/1406 (2.3%)    receiver
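
(A back-of-the-envelope check: at a constant ~1.5 Mb/s the packet rate
scales inversely with the fpp, so a roughly constant ~30 lost packets per
30 seconds should show up as a loss percentage that grows with packet
size, which is what the numbers above do:

awk 'BEGIN { for (fpp = 64; fpp <= 1024; fpp *= 2) {
  pkts = 30 * 48000 / fpp                           # datagrams sent in 30 s at 48 kHz
  printf "%5d fpp: %6d datagrams, 30 lost = %.2f%%\n", fpp, pkts, 3000 / pkts } }'

giving roughly 0.13%, 0.27%, 0.53%, 1.1%, and 2.1% - in the same ballpark
as the measured 0.13%, 0.28%, 0.68%, 1%, and 2.3%.)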

Mike O'Connor

Mar 1, 2022, 9:55:18 AM
to Jacktrip-users
hi Fernando

i'm doing a lot of Jacktrip testing these days and those periods of many dropouts drive me crazy too. i don't have any good theories, but i'm watching your progress with great interest so keep posting your results. i'm happy to join you if you ever need a hand with the tests.

one approach i've taken is to stand up a bunch of Linodes, looking for a really *bad* one. i had one that i treasured -- it was super fragile, really inefficient, etc. i used it as a development/test machine because i knew that it would instantly highlight problems with my code. unfortunately, i broke it and had to give it up. but maybe that "look for a weak link" approach would highlight your problem too.

oh. here's a thought. remember the old "log in with lots of sessions on offset ports" trick? i didn't realize that i could still do that with Hub server/clients. i stood up 10 Hub server sessions on a pretty-small (2 cpu) Linode and then started logging into them with port-offset Hub client sessions from here. the interesting thing is that the clients all connected and sent audio from the server's perspective. from my end, the first client thought it connected and the remaining ones didn't. they sat in the "waiting for peer" state, even though the server was seeing their audio. i was offsetting the Remote Client Names to keep track of them in the mixer. it was magnificent! the server started getting *really* overloaded and flaky as the Linode ran out of memory. maybe a stress test like that would flush out the trouble spot? i'd be happy to help you set something like that up.

mike

Fernando Lopez-Lezcano

Mar 1, 2022, 2:48:55 PM
to jacktri...@googlegroups.com, Mike O'Connor, na...@ccrma.stanford.edu
On 3/1/22 6:55 AM, Mike O'Connor wrote:
> hi Fernando
>
> i'm doing a lot of Jacktrip testing these days and those periods of many dropouts drive me crazy too. i don't have any good theories, but i'm watching your progress with great interest so keep posting your results. i'm happy to join you if you ever need a hand with the tests.

Thanks for the encouragement...

In this case the problem is apparently happening all the time (i.e., it is
not what I call "internet weather"), and it started cropping up not that
long ago, after roughly two years of great performance. So something has
broken, or there is some sort of throttling going on.

Right now I have two test (routing) scenarios:

a) Sonic -> Cenic -> Stanford
b) Concast -> He.net -> Stanford

a -> dropouts
b -> no dropouts (but higher latency of course)

Same equipment and software on both ends for both cases. Dropouts are
not "local" (I tried bypassing the Sonic router and connecting directly
to the OMT).

So, it looks like either Sonic or Cenic are introducing the problem (not
Stanford or my equipment as both are a constant in both tests).

I wonder if you have any host on which you could temporarily run iperf3
so I can run some tests? (a simulation of jacktrip stereo one-way traffic
to home, 1.5 Mb/s for a few minutes). Maybe we can find a host that is
routed differently (i.e., not going through Cenic) to try to see who/what
is causing this.
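
(On the far host all that is needed is an iperf3 server listening on some
agreed port, e.g. something like

iperf3 -s -p 4494

and then I can run the same reverse-mode client tests as above against it.)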

Best,
-- Fernando

Mike O'Connor

Mar 1, 2022, 3:05:32 PM
to Fernando Lopez-Lezcano, Jacktrip-users
sure. i've gotten really good at standing up Linodes. would a few of those work? we could sprinkle them around various Linode datacenters. their network seems pretty good. i can build them, or i can show you how -- it's just a public stackscript that throws all the needed stuff together.

Fernando Lopez-Lezcano

Mar 2, 2022, 1:02:08 AM
to jacktri...@googlegroups.com, Mike O'Connor, Fernando Lopez-Lezcano
On 3/1/22 12:05 PM, Mike O'Connor wrote:
> sure. i've gotten really good at standing up Linodes. would a few of those work? we could sprinkle them around various Linode datacenters. their network seems pretty good. i can build them, or i can show you how -- it's just a public stackscript that throws all the needed stuff together.

And pretty much without my asking, Mike just set one up, contacted me, and
let me use it! (not for free, of course - I owe him some beers when we
meet in person :-)

Kudos and many thanks for the help!

A summary of what this looks like below...
Best,
-- Fernando


--------
The third path, to the Linode instance, clarified things a bit more. I
think I know what is happening, but please correct me if I am making a
mistake...

< these are selected parts of the email I sent to Sonic (they are
silent, so far) >

This is what I have tested with (detailed traceroute below):

a) Sonic -> Cenic -> Stanford
b) Concast -> He.net -> Stanford
c) Sonic -> Cogentco -> Linode instance (in NJ!)

a -> dropouts (incoming udp traffic from the world only, outgoing is fine)
b -> no dropouts (0% packet loss)
c -> no dropouts (0% packet loss)

Comparing a and b we could say that Stanford is not at fault. Comparing
a and c we could say that the Sonic routers are (probably) not at fault,
and that it is Cenic that has a misconfiguration in a router, or is
actively throttling UDP traffic coming from the world into Sonic (or, more
accurately, into my OMT, which is my only test point).
This is the connection point through Cogentco (no dropped packets):

----
10  102.ae1.nrd1.pao1.sonic.net (70.36.205.6)  4.411 ms  100.ae1.nrd1.equinix-sj.sonic.net (75.101.33.185)  4.545 ms  102.ae1.nrd1.pao1.sonic.net (70.36.205.6)  4.748 ms
11  hu0-3-0-2.ccr31.sjc04.atlas.cogentco.com (38.104.141.81)  5.119 ms  4.778 ms  4.844 ms
----

and this is the connection point through Cenic (dropped packets):

----
10  100.ae1.nrd1.equinix-sj.sonic.net (75.101.33.185)  4.664 ms  4.520 ms  4.735 ms
11  dc-svl-agg8--svl-agg10-300g.cenic.net (137.164.11.81)  6.076 ms  eqix-sv5.cenic.com (206.223.117.118)  4.206 ms  4.599 ms
----

It does appear that 75.101.33.185 on Sonic's end connects to both
(I do not understand why three IP addresses appear in hop 10 of the
Cogentco route - while I am a sysadmin, I am not a network
specialist...), so it is very likely the problem is in Cenic (or in the
port on Sonic's router that connects to Cenic).

I would imagine Sonic will now contact Cenic and kindly tell them to
stop throttling traffic[*] coming into Sonic from the world :-)

< I can dream, can't I? >

....

[*] == A bit more about traffic:

The nature of the packet loss is very clear and repeatable. At low
aggregate bandwidth and small packet sizes (what you would find in voice
over IP traffic) there are dropped packets, but very few. This would
degrade the quality of a call, but probably not by much, or not in a way
that would be immediately noticed.

As the bandwidth goes up, the % packet loss goes up. With the bandwidth
used by the software I have been using (approx. 1.5 Mb/s, not much in
the context of a 1 Gb/s connection) I see what for me are substantial
drops, as detailed in previous emails. Furthermore, across many tests
the number of dropped packets is approximately constant, about 1 dropped
packet per second. The interval between dropped packets is randomized,
but overall I was getting between 29 and 33 dropped packets per 30-second
interval over many tests, at all times (i.e., not really depending on when
I ran the test). This would suggest this is not a misconfiguration.

The weird part is that if I keep the same bandwidth but change the
size of the packets (i.e., sending fewer or more packets while keeping the
Mb/s constant), the NUMBER of dropped packets does not change
significantly. In my tests yesterday evening I still kept seeing about
30 packets dropped per 30-second interval, all the while going from
64 to 1024+ frames per packet.

So, something in their router is, at the bandwidth of my connection
(1.5 Mb/s), picking one packet at random every second and dropping it,
regardless of the packet size. Not very effective for congestion
control, so perhaps this is just a misconfiguration? Or maybe it is
a simple way to cut corners and not route so much traffic.


== route to linode instance

$ traceroute 172.104.5.245
traceroute to 172.104.5.245 (172.104.5.245), 30 hops max, 60 byte packets
1 ControlPanel.Home (192.168.42.1) 1.077 ms 1.366 ms 1.645 ms
 2  lo0.bras1.sncrca11.sonic.net (157.131.132.81)  5.541 ms  5.648 ms  5.854 ms
 3  * 157-131-210-226.static.sonic.net (157.131.210.226)  21.433 ms *
 4  157-131-210-193.static.sonic.net (157.131.210.193)  19.307 ms  157-131-210-174.static.sonic.net (157.131.210.174)  17.942 ms  17.968 ms
 5  0.ae2.cr1.lsatca11.sonic.net (157.131.209.161)  8.933 ms  0.ae1.cr1.colaca01.sonic.net (157.131.209.65)  22.265 ms  22.285 ms
 6  0.ae1.cr1.snjsca11.sonic.net (157.131.209.149)  16.089 ms  12.142 ms  0.ae0.cr1.lsatca11.sonic.net (157.131.209.86)  4.626 ms
 7  * * *
 8  * * *
 9  100.ae1.nrd1.equinix-sj.sonic.net (75.101.33.185)  4.300 ms  4.524 ms *
10  102.ae1.nrd1.pao1.sonic.net (70.36.205.6)  4.411 ms  100.ae1.nrd1.equinix-sj.sonic.net (75.101.33.185)  4.545 ms  102.ae1.nrd1.pao1.sonic.net (70.36.205.6)  4.748 ms
11  hu0-3-0-2.ccr31.sjc04.atlas.cogentco.com (38.104.141.81)  5.119 ms  4.778 ms  4.844 ms
12  be2430.ccr22.sfo01.atlas.cogentco.com (154.54.88.185)  6.196 ms  hu0-3-0-2.ccr31.sjc04.atlas.cogentco.com (38.104.141.81)  4.763 ms  4.831 ms
13  be3109.ccr21.slc01.atlas.cogentco.com (154.54.44.138)  20.833 ms  be2379.ccr21.sfo01.atlas.cogentco.com (154.54.42.157)  5.943 ms  be3109.ccr21.slc01.atlas.cogentco.com (154.54.44.138)  20.228 ms
14  be3110.ccr32.slc01.atlas.cogentco.com (154.54.44.142)  20.895 ms  be3109.ccr21.slc01.atlas.cogentco.com (154.54.44.138)  20.875 ms  be3037.ccr21.den01.atlas.cogentco.com (154.54.41.146)  30.789 ms
15  be3035.ccr21.mci01.atlas.cogentco.com (154.54.5.90)  42.214 ms  42.268 ms  42.408 ms
16  be3036.ccr22.mci01.atlas.cogentco.com (154.54.31.90)  43.070 ms  be2831.ccr41.ord01.atlas.cogentco.com (154.54.42.166)  52.963 ms  53.155 ms
17  be2831.ccr41.ord01.atlas.cogentco.com (154.54.42.166)  53.629 ms  be2832.ccr42.ord01.atlas.cogentco.com (154.54.44.170)  53.835 ms  be2718.ccr22.cle04.atlas.cogentco.com (154.54.7.130)  60.409 ms
18  be2717.ccr21.cle04.atlas.cogentco.com (154.54.6.222)  60.493 ms  be2718.ccr22.cle04.atlas.cogentco.com (154.54.7.130)  60.548 ms  be2717.ccr21.cle04.atlas.cogentco.com (154.54.6.222)  60.825 ms
19  be2889.ccr41.jfk02.atlas.cogentco.com (154.54.47.50)  73.082 ms  be2890.ccr42.jfk02.atlas.cogentco.com (154.54.82.246)  73.231 ms  be3294.ccr31.jfk05.atlas.cogentco.com (154.54.47.218)  74.328 ms
20  38.104.75.138 (38.104.75.138)  77.339 ms  73.552 ms  be3295.ccr31.jfk05.atlas.cogentco.com (154.54.80.2)  72.333 ms
21  38.104.75.138 (38.104.75.138)  73.749 ms  74.011 ms *
22  * * *
23  * * *
24  * xxxx.ip.linodeusercontent.com (xx.xx.xx.xx)  74.000 ms *

== route to stanford server

$ traceroute cm-toast.stanford.edu
traceroute to cm-toast.stanford.edu (171.64.197.122), 30 hops max, 60 byte packets
 1  ControlPanel.Home (192.168.42.1)  1.181 ms  0.630 ms  0.847 ms
 2  lo0.bras1.sncrca11.sonic.net (157.131.132.81)  3.532 ms  3.534 ms  3.430 ms
 3  157-131-210-226.static.sonic.net (157.131.210.226)  21.318 ms  21.251 ms  21.249 ms
 4  157-131-210-193.static.sonic.net (157.131.210.193)  24.229 ms  157-131-210-174.static.sonic.net (157.131.210.174)  22.930 ms  23.037 ms
 5  0.ae2.cr1.lsatca11.sonic.net (157.131.209.161)  9.212 ms  0.ae1.cr1.colaca01.sonic.net (157.131.209.65)  9.279 ms  0.ae2.cr1.lsatca11.sonic.net (157.131.209.161)  9.089 ms
 6  * 0.ae0.cr1.lsatca11.sonic.net (157.131.209.86)  7.135 ms  0.ae1.cr1.snjsca11.sonic.net (157.131.209.149)  21.811 ms
 7  * * 0.ae1.cr1.snjsca11.sonic.net (157.131.209.149)  19.939 ms
 8  * * *
 9  * 100.ae1.nrd1.equinix-sj.sonic.net (75.101.33.185)  3.838 ms  4.456 ms
10  100.ae1.nrd1.equinix-sj.sonic.net (75.101.33.185)  4.664 ms  4.520 ms  4.735 ms
11  dc-svl-agg8--svl-agg10-300g.cenic.net (137.164.11.81)  6.076 ms  eqix-sv5.cenic.com (206.223.117.118)  4.206 ms  4.599 ms
12  dc-stanford--svl-agg4-100ge.cenic.net (137.164.23.145)  4.576 ms  dc-svl-agg8--svl-agg10-300g.cenic.net (137.164.11.81)  5.389 ms  5.397 ms
13  dc-stanford--svl-agg4-100ge.cenic.net (137.164.23.145)  5.220 ms  5.821 ms  5.751 ms
14 * noa-east-rtr-vl2.SUNet (171.64.255.134) 5.837 ms *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *

Fernando Lopez-Lezcano

Mar 8, 2022, 8:42:51 PM
to jacktri...@googlegroups.com, Fernando Lopez-Lezcano, Mike O'Connor
On 3/1/22 10:01 PM, Fernando Lopez-Lezcano wrote:
...
> A summary of what this looks like below...

And an update...

< BTW, this was never specifically UDP packet loss; it was packet loss,
period - when I switched to testing with TCP I could see retransmissions
that more or less matched in number the loss I was seeing with UDP >
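
(A TCP version of the same test is easy to run against the same iperf3
endpoint - just drop the -u, e.g. something like

iperf3 -c cm-toast.stanford.edu -p 4494 -t 30 -R

- and the "Retr" column iperf3 prints for the TCP sender gives a rough
count of retransmissions to compare against the UDP loss numbers.)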

Further debugging (Stanford networking suggested trying PingPlotter,
which I found useful) seemed to show problems inside Sonic's network in
the Stanford-to-home direction, and very occasional problems in Cenic in
the home-to-Stanford direction. It is hard to catch routers in the act
of dropping packets when all you can do is look from the outside.

Performance for me on Sunday (Quarantine Sessions #84!) was bad, with
constant dropouts. Luckily my contribution was going in the other
direction, so it was not affected. See the attached 24-hour plot of
performance in the bad direction, which I started Sunday evening.

More debugging followed yesterday, with the same conclusions. Right after
I sent an email to Sonic this morning asking for comments on the status
of the ticket - I had not heard from them since 2/28 - I tested again,
and magically, sometime between last night and this morning, packets
started flowing with more grace again. See the second graph, from earlier
today (I kept the same y-axis scale). There are still drops, but not the
storm I was experiencing before. I have yet to compare with Concast, and
I have not yet fired up my audio connection to test with real audio.

We'll see what happens this week, and especially over the weekend.

Support from Sonic has been disappointing; Stanford, on the other hand,
was really helpful and tried to find where this was happening.

Best, and keep the packets flowing!
-- Fernando
sonic_su_home_packets_lost_20220306_6pm.pdf
sonic_su_home_packets_lost_20220308_11am.pdf