Hi, I have an application with a tcp client that occasionally gets into a strange state where it continually polls even with no data. I'm looking for suggestions on what may be going on.
I'm running on Ubuntu 14.04 with VMA 6.6.4 / MLNX_OFED_LINUX-2.2-1.0.1. VMA_RX_POLL is set to 100000.
Here is the output from vma_stats:
======================================================
Fd=[73]
- TCP, Blocked
- Connected = [x.x.x.x:9148]
Tx Offload: 1740 KB / 5560 / 0 / 0 [bytes/packets/drops/errors]
Rx Offload: 2088 KB / 11248 / 0 / 0 [bytes/packets/eagains/errors]
Rx byte: cur 0 / max 2256 / dropped 0 / limit 0
Rx pkt : cur 0 / max 3 / dropped 0
Rx poll: 336946390 / 2011 (0.00%) [miss/hit]
======================================================
Fd=[54]
- UDP, Blocked, MC Loop Enabled
- Bound IF = [y.y.y.y:10048]
- Member of = [y.y.y.y]
Rx Offload: 136662 KB / 2552648 / 0 / 0 [bytes/packets/eagains/errors]
Rx byte: cur 0 / max 178777 / dropped 0 / limit 4194304
Rx pkt : cur 0 / max 3230 / dropped 0
Rx poll: 136014 / 2313542 (94.45%) [miss/hit]
======================================================
Fd=[55]
- UDP, Blocked, MC Loop Enabled
- Bound IF = [z.z.z.z:10048]
- Member of = [z.z.z.z]
Rx Offload: 136662 KB / 2552648 / 0 / 0 [bytes/packets/eagains/errors]
Rx byte: cur 0 / max 207126 / dropped 0 / limit 4194304
Rx pkt : cur 0 / max 3779 / dropped 0
Rx poll: 138248 / 2312221 (94.36%) [miss/hit]
======================================================
poll
Polling CPU:0%
======================================================
CQ=[0]
Packets dropped: 0
Packets queue len: 0
Drained max: 0
Buffer pool size: 43
Buffer disorder: 49.74%
======================================================
======================================================
RING=[0]
Packets count: 5117097
Packets bytes: 497341835
Interrupt requests: 41291399870
Interrupt received: 140131
Moderation frame count: 0
Moderation usec period: 0
======================================================
-------------------------------------------------------------------------------
The Rx Poll misses are continually increasing for the tcp client FD 73 with a hit rate of near zero.
If I jack up the log level and trace level, I see the following:
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : si_tcp[fd=73]:2442:rx_wait_helper() 73: too many polls without data blocking=1
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:793:request_notification()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:820:request_notification() returning with 0
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC : si_tcp[fd=73]:2389:rx_wait_helper()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:630:poll_and_process_element_rx()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:570:poll_and_process_helper_rx()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:279:poll()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : si_tcp[fd=73]:2442:rx_wait_helper() 73: too many polls without data blocking=1
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:793:request_notification()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:820:request_notification() returning with 0
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC : si_tcp[fd=73]:2389:rx_wait_helper()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:630:poll_and_process_element_rx()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:570:poll_and_process_helper_rx()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:279:poll()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : si_tcp[fd=73]:2442:rx_wait_helper() 73: too many polls without data blocking=1
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:793:request_notification()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:820:request_notification() returning with 0
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC : si_tcp[fd=73]:2389:rx_wait_helper()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:630:poll_and_process_element_rx()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:570:poll_and_process_helper_rx()
Time: 2391708.000 Pid: 32219 Tid: 32241 VMA FUNC+ : cqm[0x7fbcb802da10]:279:poll()
This pattern seems to repeat 10 times every 250us. However, there is another pattern that is more like this:
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC : si_tcp[fd=73]:2389:rx_wait_helper()
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC+ : si_tcp[fd=73]:2442:rx_wait_helper() 73: too many polls without data blocking=1
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC : si_tcp[fd=73]:2389:rx_wait_helper()
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC+ : si_tcp[fd=73]:2442:rx_wait_helper() 73: too many polls without data blocking=1
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC : si_tcp[fd=73]:2389:rx_wait_helper()
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC+ : si_tcp[fd=73]:2442:rx_wait_helper() 73: too many polls without data blocking=1
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC : si_tcp[fd=73]:2389:rx_wait_helper()
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC+ : si_tcp[fd=73]:2442:rx_wait_helper() 73: too many polls without data blocking=1
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC : si_tcp[fd=73]:2389:rx_wait_helper()
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC+ : si_tcp[fd=73]:2442:rx_wait_helper() 73: too many polls without data blocking=1
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC : si_tcp[fd=73]:2389:rx_wait_helper()
Time: 2391852.250 Pid: 32219 Tid: 32241 VMA FUNC+ : si_tcp[fd=73]:2442:rx_wait_helper() 73: too many polls without data blocking=1
To me, it seems like it wants to block but for some reason it doesn't. Anyone have any thoughts on this?
-- Bud