Documentation how .pkt script works

141 views
Skip to first unread message

Ravi Kerur

unread,
Aug 21, 2023, 5:34:15 PM8/21/23
to packetdrill
Hello,

I am debugging tcp/slow_start/slow-start-ack-per-2pkt.pkt failure and want to understand how the script works. Any documentation on the script syntax and format.

Snippets of the script shown below. I understand basic socket/setsocketopt/bind/listen...
   0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

  +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
  +.1 < . 1:1(0) ack 1 win 257
   +0 accept(3, ..., ...) = 4
   +0 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0

   +0 write(4, ..., 30000) = 30000
   +0 > P. 1:10001(10000) ack 1
   +0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

However, I am not sure about the above 3 lines? 
Is write writing to sndbuf,
P -- TCP PSH?
assert reading via TCP_INFO on sender congestion window?
If I have to debug the failure what would be the best way?

Reason being, initially test started failing with following error messages
run_packet.c 709] map_outbound_live_packet
run_packet.c 1493] packet time_usecs: 1692653421581819
run.c 186] expected: 0.306 actual: 0.201  (secs)
stderr:
slow-start-ack-per-2pkt.pkt:27: error handling packet: timing error: expected outbound packet at 0.305893 sec but happened at 0.200515 sec; tolerance 0.004000 sec
script packet:  0.305893 P. 10001:14001(4000) ack 1
actual packet:  0.200515 P. 10001:14001(4000) ack 1 win 256
Ran    3 tests:    0 passing,    3 failing,    0 timed out (3.39 sec): ./tcp/slow_start/slow-start-ack-per-2pkt.pkt

I added --tolerance_usecs=130000 to the .pkt at the beginning, now the script fails at
run.c 390] gettimeofday: 1692653524.867610
stderr:
slow-start-ack-per-2pkt.pkt:25: error in Python code
Traceback (most recent call last):
  File "/tmp/code_64Afd5", line 98, in <module>
    assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd
AssertionError: 100
slow-start-ack-per-2pkt.pkt: error executing code: 'python3' returned non-zero status 1

Ran    3 tests:    0 passing,    3 failing,    0 timed out (3.54 sec): ./tcp/slow_start/slow-start-ack-per-2pkt.pkt

Thanks,
Ravi

David Morley

unread,
Aug 21, 2023, 5:58:23 PM8/21/23
to Ravi Kerur, packetdrill
Hi Ravi,
+0 write(4, ..., 30000) = 30000
Indeed this would write to the write buffer, which in this case has file descriptor 4, from the accept() syscall.
P -- TCP PSH
Again, yes this is correct, the referenced line is expecting a packet to be sent out as we have placed data in the write buffer with the PSH bit set.
 +0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
Packetdrill saves the state of different system parameters when lines of Python code are included in order to check these assertions later. The mapping between the packetdrill name and the referenced value can be found by looking for instances of the emit_var function. You can

I would not recommend adjusting the tolerance of tests so drastically, the rationale for tolerance_usecs is that there's some variance in exactly how long certain operations will take based on hardware, etc. A tolerance of 0.004000 sec seems to work
fairly reliably for the majority of tests so changing it by many factors of 10 probably isn't the right solution here.

For debugging tests, there are many different approaches but I find it useful to add additional prints to the tests such as:
+0 %{ print(f"Variable I care about: {variable_name}") }%
For particularly tricky cases, I tend to take a look at tcpdump/bpftrace the relevant kernel function calls.

I'm sure there are numerous other useful approaches that others might be able to chime in with, those are the ones I personally find the most useful.

Best,
David




 
 


--
You received this message because you are subscribed to the Google Groups "packetdrill" group.
To unsubscribe from this group and stop receiving emails from it, send an email to packetdrill...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/packetdrill/5076e829-ab0f-4808-ab64-19ac2df4389fn%40googlegroups.com.

Ravi Kerur

unread,
Aug 23, 2023, 1:41:37 PM8/23/23
to packetdrill
Hi David,

Thanks for your response, so there is no document to refer to esp. syntax of each statements in the .pkt file?

 Based on your debug experience what does the timing errors usually indicate? From the logs it looks like the outbound event did happen albeit earlier. Does this mean optimization in the kernel might be the area to look at?

In parallel, let me start with the debug prints as well to understand the behavior.

Thanks,
Ravi

David Morley

unread,
Aug 23, 2023, 2:31:07 PM8/23/23
to Ravi Kerur, packetdrill
so there is no document to refer to esp. syntax of each statements in the .pkt file
The packetdrill syntax is fully defined here: https://github.com/google/packetdrill/blob/master/syntax.md. Without reading the grammar the mental model to keep in mind is C system calls + embedded python code + roughly tcpdump syntax for asserting packets. The original paper is also a useful reference for understanding how packetdrill asserts different timing ranges: https://research.google/pubs/pub41316/. Keep in mind any python based assertions run after the system call + packet injection in packetdrill, and don't occur until after the tests have completed executing.

 Based on your debug experience what does the timing errors usually indicate?
There's no general reason for why a packetdrill test fails due to a timing error. Each test is making an implicit set of assumptions that in a failing case somehow are incorrect. Any one of those could be the cause of the failure.

Let's take a look at this specific test failure though, as that's much more concrete.

In English prose:
In this test, we have an initial 3-way handshake, followed by a write of 30,000 bytes to the send buffer. By default, tcpi_snd_cwnd is initialized to 10 (in my and the test's understanding of kernel behavior at least) and we send out a single (very large) 10,000 byte packet. This packet can be 100% of the congestion window as 'sysctl -q net.ipv4.tcp_tso_win_divisor=100'. As our cwnd of 10 is now full, we wait until we receive ACK packets, which (as we are in slow start) will both free 2 packets from our cwnd and add 2 packets to our cwnd. So our cwnd should go from 10 to 12 to 14 until eventually reaching 20. 

So, what could be going wrong here? I'd first check to make sure in tcpdump that we are actually seeing that 10,000 byte packet. The timing here is pretty much just controlled by the cwnd AFAICT, so if you're seeing this packet sent out earlier, than it would seem your cwnd is greater than the 10 that was asserted (though it's strange that assertion didn't fail). AFAICT you should be in cubic from the defaults.sh script, but if you're seeing different slow start behavior, the congestion control is the actually code generally controlling it.

Best,
David



Ravi Kerur

unread,
Aug 23, 2023, 3:47:19 PM8/23/23
to packetdrill
Hi David,

Thanks for the pointers. I had skimmed through syntax.md earlier, will go through it again to understand the scripts.

Will do some debugging and get back to you.

Thanks,
Ravi

Neal Cardwell

unread,
Aug 23, 2023, 4:52:56 PM8/23/23
to Ravi Kerur, packetdrill
In addition to the USENIX paper David pointed to (here) there is a more how-to/tutorial-style article we published in USENIX login here: "Drilling Network Stacks with packetdrill".

The general rule is that packetdrill tries to keep system call notation similar to strace syntax and packet notation similar to tcpdump syntax.

So, to quote the tcpdump main page: "Tcpflags are some combination of S (SYN), F (FIN), P (PSH), R (RST), U (URG), W (CWR), E (ECE) or `.' (ACK)"


Looking at the original error message you shared:

slow-start-ack-per-2pkt.pkt:27: error handling packet: timing error: expected outbound packet at 0.305893 sec but happened at 0.200515 sec; tolerance 0.004000 sec

script packet:  0.305893 P. 10001:14001(4000) ack 1
actual packet:  0.200515 P. 10001:14001(4000) ack 1 win 256

... as well as the assertion failure that shows the cwnd in your test is 100, that cwnd of 100 explains the failure. You seem to have some sort of customization to your machine that causes the cwnd to be 100 rather than the normal initial cwnd of 10. Because the cwnd is 100, the sender is able to send 10001:14001(4000) immediately rather than needing to wait for an ACK in line 26:

  +.105 < . 1:1(0) ack 2001 win 257

Thus the 10001:14001(4000) goes out 100ms earlier than expected. Thus the error message.

best regards,
neal



Ravi Kerur

unread,
Aug 23, 2023, 5:45:23 PM8/23/23
to packetdrill
Hi Neal,

Thanks for more pointers, I will look into it. We get the kernel from our infosec team so it's more blackbox testing for me right now.
You are absolutely right that initcwnd is set to 100. Without tolerance_usecs it went past that line and failed at later stage. I overlooked assert error value which shows 100 as it happened after adding tolerance_usecs.
I added following line
ip route change 192.0.2.1 via 192.168.0.2 dev tun0 initcwnd 10

and test passes v4 and v4mappedv6.

Thanks,
Ravi
Reply all
Reply to author
Forward
0 new messages