I have been looking at why netperf throughput is so low (<1Mbps) on my test machines. It was netperf TCP_STREAM from Akaros to Linux. Both machines have mlx4 NIC and I disabled offloads on Linux.
After some debugging with tcpdump and wireshark, it turns out to be
With block extra data disabled, I can get decent throughput.
Now I'm looking at the code...
—
Reply to this email directly or view it on GitHub.![]()
I was guessing it has something to do with odd, so I had the patch below. And I'm seeing outputs like this:
csum 2e58 tcpcksum 2e 58 extra_len 1460 nr_extra_bufs 186 lens 71 365 nodd 2 nzero 183 nnull 180 noff 4
csum 2e58 tcpcksum 2e 58 extra_len 1460 nr_extra_bufs 848 lens 71 365 nodd 2 nzero 845 nnull 842 noff 4
csum 2e58 tcpcksum 2e 58 extra_len 1460 nr_extra_bufs 848 lens 71 365 nodd 2 nzero 845 nnull 842 noff 4
csum 2e58 tcpcksum 2e 58 extra_len 1460 nr_extra_bufs 848 lens 71 365 nodd 2 nzero 845 nnull 842 noff 4
csum 2e58 tcpcksum 2e 58 extra_len 1460 nr_extra_bufs 848 lens 71 365 nodd 2 nzero 845 nnull 842 noff 4
csum 2e58 tcpcksum 2e 58 extra_len 1460 nr_extra_bufs 848 lens 71 365 nodd 2 nzero 845 nnull 842 noff 4
csum 2e58 tcpcksum 2e 58 extra_len 1460 nr_extra_bufs 848 lens 71 365 nodd 2 nzero 845 nnull 842 noff 4
This is interesting because we see most of the extra_data have len=0 and base=0. This is only a 1460 byte packet, where are these extra_data's coming from???
diff --git a/kern/src/net/tcp.c b/kern/src/net/tcp.c index fd80f87..29fe83a 100644 --- a/kern/src/net/tcp.c +++ b/kern/src/net/tcp.c @@ -1069,8 +1069,29 @@ struct block *htontcp4(Tcp * tcph, struct block *data, Tcp4hdr * ph, if (tcb != NULL && tcb->nochecksum) { h->tcpcksum[0] = h->tcpcksum[1] = 0; } else { + int seq_lt(uint32_t x, uint32_t y); csum = ~ptclcsum(data, TCP4_IPLEN, TCP4_PHDRSIZE); hnputs(h->tcpcksum, csum); + if (tcb && seq_lt(tcb->snd.ptr, tcb->snd.nxt)) { + int nodd = 0, nzero = 0, nnull = 0, noff = 0; + for (int i = 0; i < data->nr_extra_bufs; i++) { + if (data->extra_data[i].len % 2) + nodd++; + if (data->extra_data[i].len == 0) + nzero++; + if (!data->extra_data[i].base) + nnull++; + if (data->extra_data[i].off > 0) + noff++; + } + printk("csum %4x tcpcksum %2x %2x extra_len %d nr_extra_bufs %d lens", + csum, h->tcpcksum[0], h->tcpcksum[1], data->extra_len, data->nr_extra_bufs); + for (int i = 0; i < data->nr_extra_bufs; i++) { + if (data->extra_data[i].len % 2) + printk(" %d", data->extra_data[i].len); + } + printk(" nodd %d nzero %d nnull %d noff %d\n", nodd, nzero, nnull, noff); + } data->checksum_start = TCP4_IPLEN + TCP4_PHDRSIZE; data->checksum_offset = ph->tcpcksum - ph->tcpsport; data->flag |= Btcpck;
--
You received this message because you are subscribed to the Google Groups "Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akaros+un...@googlegroups.com.
To post to this group, send email to aka...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
It seems that we're missing an adjustment to odd here:
diff --git a/kern/src/net/ipaux.c b/kern/src/net/ipaux.c index 5c070b8..98a7bb9 100644 --- a/kern/src/net/ipaux.c +++ b/kern/src/net/ipaux.c @@ -252,8 +252,8 @@ uint16_t ptclcsum_one(struct block *bp, int offset, int len) hisum += ptclbsum(addr, x); else losum += ptclbsum(addr, x); + odd = (odd + x) & 1; len -= x; - } losum += hisum >> 8; losum += (hisum & 0xff) << 8;
but even with this patch there're still checksum errors, though much fewer.
--
You received this message because you are subscribed to the Google Groups "Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akaros+un...@googlegroups.com.
To post to this group, send email to aka...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
if (offset < BHLEN(bp)) {
x = MIN(len, BHLEN(bp) - offset);If len < BHLEN(bp) then MIN() can be lower than offset, and x a huge number.Maybe an assert() in there, during debugging (or always)?PS: No idea why Github did not get my reply. I included in my To:
Mind, I have absolutely no idea of that code, but this might be troubling (line 231):if (offset < BHLEN(bp)) { x = MIN(len, BHLEN(bp) - offset);If len < BHLEN(bp) then MIN() can be lower than offset, and x a huge number.
I tried to disable the tx checksum offload code path, but that didn't help. So I just implemented and use a separate checksum function instead of ptclcsum, and now I don't get checksum errors. So there is something wrong with ptclcsum. I'm taking a closer look at the code.
The one I'm currently using is
+static uint16_t my_csum(struct block *bp, int offset, int len) +{ + uint64_t hi = 0, lo = 0, sum; + int curpos_offset; + + if (bp->next) + return 0; + for (int i = 0; i < BHLEN(bp); i++) { + int curpos = i; + if (curpos < offset) + continue; + int relative_index = curpos - offset; + if (relative_index >= len) + goto scanned; + if (relative_index % 2 == 0) + hi += bp->rp[i]; + else + lo += bp->rp[i]; + } + curpos_offset = BHLEN(bp); + for (int i = 0; i < bp->nr_extra_bufs; i++) { + struct extra_bdata *ebd = &bp->extra_data[i]; + const uint8_t *buf = (const uint8_t *)(ebd->base) + ebd->off; + for (int j = 0; j < ebd->len; j++) { + int curpos = curpos_offset + j; + if (curpos < offset) + continue; + int relative_index = curpos - offset; + if (relative_index >= len) + goto scanned; + if (relative_index % 2 == 0) + hi += buf[j]; + else + lo += buf[j]; + } + curpos_offset += ebd->len; + } + assert(curpos_offset == BLEN(bp)); +scanned: + sum = (hi << 8) + lo; + while (sum >> 16) + sum = (sum >> 16) + (sum & 0xffff); + return (~sum) & 0xffff; +}
--
We must anticipate that x86_64 is not the only architecture we'll run on, though.
I wrote a unit test to compare the results from ptclbsum and a much simpler (slower) checksum function. After reading the code again and again, I think it can be fixed by simply
diff --git a/kern/src/net/ptclbsum.c b/kern/src/net/ptclbsum.c index eed828d..837d121 100644 --- a/kern/src/net/ptclbsum.c +++ b/kern/src/net/ptclbsum.c @@ -185,6 +185,8 @@ uint16_t ptclbsum(uint8_t * addr, int len) uint64_t sum = in_cksumdata(addr, len); union q_util q_util; union l_util l_util; + if ((uintptr_t)addr & 1) + sum <<= 8; REDUCE16; return cpu_to_be16(sum); }
At least this passes the unit test, and convinced myself.
Fixes and tests were merged at f922e80...1165c2b
Closed #23.