Using PcapPlusPlus for Firewall development


vice...@gmail.com

Feb 5, 2018, 5:30:32 PM
to PcapPlusPlus support
Hello.

I've been playing with the examples (really nice tutorial, docs and examples btw, thanks), but I have one question: is it possible to use PcapPlusPlus not just to sniff but to control the packet flow? Let me explain myself: I'm looking to develop firewall-like software (it's more complicated than that) that will operate at all layers (it will check everything from L2 MAC addresses to L7 HTTP app recognition with DPI). I've done this before with Netfilter hooks and queues. The box running the software will act either as a switch (bridged interfaces) or as a router (iptables forwarding). So I'm wondering: is there a way for me to tell the OS what to do with the packets that need to be forwarded from one network to the other (drop, accept, or modify them before they are forwarded)? Or should I stick with Netfilter queues and use the Packet++ library on top of them?

Thanks

vice...@gmail.com

Feb 5, 2018, 5:34:56 PM
to PcapPlusPlus support
Actually, let me correct myself on "is there a way for me to tell the OS": I'm looking to use kernel bypass with PF_RING, so I guess the right question is: "Can a PcapPlusPlus app using PF_RING drop or modify packets before they are forwarded?"

Thanks,

vice...@gmail.com

Feb 5, 2018, 6:02:29 PM
to PcapPlusPlus support
Sorry for 3 consecutive posts, but after reading another topic (https://groups.google.com/forum/#!topic/pcapplusplus-support/7JA1WDfgTgM) I think I have a better understanding. I should do the packet forwarding inside the app instead of trying to use brctl to bridge the interfaces, right?

PcapPlusPlus Support

Feb 7, 2018, 11:12:54 AM
to PcapPlusPlus support
Yes, PcapPlusPlus provides wrappers for both PF_RING and DPDK; both do kernel bypass and will fit your needs.
You can read the documentation on how to use them and decide which one to choose.
Regarding your question: yes, you should do the packet forwarding inside the application. You can take a look at the PF_RING example and the DPDK example to understand how to do that.
Please reach out to me if you have more questions.

Thanks,

vice...@gmail.com

Feb 7, 2018, 11:25:26 AM
to PcapPlusPlus support
Thanks. I've done a successful packet forwarding test (inside the application) using libpcap, and will test with PF_RING next. Thanks for pointing me to the examples.

vice...@gmail.com

Feb 8, 2018, 9:14:13 AM
to PcapPlusPlus support
Hello again. I have two more questions about doing the packet forwarding:

1) Is there a way for me to differentiate when a packet has arrived at an interface from when it is being sent? Right now I'm sending what I receive on interface A to interface B and vice versa, but for testing purposes I have applied a filter on each interface to only capture traffic from a specific MAC address. If I don't do this, when I receive a packet on interface A and send it to B, the callback is called again, so the packet is sent back to A and so on (infinite loop).

2) I have the packet forwarding working and tested it with ICMP, DNS queries and so on. Everything worked OK until I tested HTTP. When I get the server's HTTP 200 OK on one interface and try to send it to the other one, I get this error: "Packet length [2962] is larger than device MTU [1500]". I made a traffic capture and saw that the server sent two separate packets, but I think I'm receiving a jumbo packet in the callback containing the complete reassembled payload. Still, I couldn't find anything about that in the PcapPlusPlus documentation. Is that right? Is the library buffering the packets and delivering the complete payload to the callback? If that's the case, can this behaviour be changed/configured? Or should I accept it this way and fragment the packet again before sending it to the other interface, to avoid exceeding its MTU? Is there anything in the library that can be reused for this purpose?

Thanks

PcapPlusPlus Support

Feb 8, 2018, 1:05:57 PM
to PcapPlusPlus support
Hi,

I'm happy that you managed to create the packet forwarding application.
Actually, it'd be great if you could share some of your code; maybe we can create a packet forwarding example application that other people can use. Questions about packet forwarding are quite common, so it'd be great to have an example application that shows how to do it. Please let me know what you think.

Regarding your questions:
1) I have to check, I'm not sure. Are you using libpcap, PF_RING or DPDK?
2) From what you described, it seems that your packet receiving interface is configured with a higher MTU than the packet sending interface. This should be configurable in the interface settings. You can check the MTU configuration using "ifconfig | grep MTU".

Please let me know if it works

Thanks,

vice...@gmail.com

Feb 8, 2018, 1:46:13 PM
to PcapPlusPlus support
Hello!

Sure, once I have the basic forwarding working, I would be glad to prepare an example application to contribute to the project. :)

1) In the current example I'm using libpcap. But we would like to switch to DPDK; that's what I'm working on right now.
2) Actually the MTU is 1500 on both interfaces (they are virtio interfaces on a KVM guest).

Regarding question #2: after the TCP handshake is completed and the client sends the HTTP request, I can see the packet from the server arriving at the bridge with tcpdump:

<Public Server IP>.80 > 10.0.2.200.48254: Flags [.], cksum 0x67ca (correct), seq 1:1449, ack 81, win 114, options [nop,nop,TS val 492233134 ecr 19212564], length 1448: HTTP, length: 1448
HTTP/1.1 200 OK
Date: Thu, 08 Feb 2018 18:14:29 GMT

I just changed the real IP address to <Public Server IP> in this post for security reasons. But in the PcapPlusPlus app, I see this:

Packet length: 2962 [Bytes], Arrival time: 2018-02-08 19:28:17.744806
Ethernet II Layer, Src: 52:54:00:00:01:64, Dst: 52:54:00:00:02:c8
IPv4 Layer, Src: <Public Server IP>, Dst: 10.0.2.200
TCP Layer, [ACK], Src port: 80, Dst port: 48262
Payload Layer, Data length: 2896 [Bytes]

Please notice that the packet in tcpdump shows the correct payload size (1448), but the app output displays a packet frame length of 2962 bytes. These numbers are shown just by printing the parsed packet:

std::cout << "Packet arrived from server device:\n" << parsedPacket.toString();

Then I checked the payload layer size, and it is exactly 2x the one shown by tcpdump. I also checked that PcapPlusPlus displays the correct MTU for both interfaces:

Packet frame length: 2962
Source interface MTU: 1500
Destination interface MTU: 1500

Finally, when the app tries to forward the packet to the other interface, I get this error:

Packet length [2962] is larger than device MTU [1500]

So what I can't understand is how a packet of 2962 bytes can be received on a 1500 MTU interface. The only logical explanation for me was the automatic buffering/reassembly I mentioned in the previous post, but I checked the tcpdump capture again and I did not receive two packets, just the one shown in this post. Do you have any clue how this can be happening?

Thanks

PcapPlusPlus Support

Feb 9, 2018, 3:42:06 AM
to pcappluspl...@googlegroups.com
Hi,

What you are describing is indeed very strange. It seems like the packet data length in PcapPlusPlus is exactly 2x the real packet data length.
The way it works is that PcapPlusPlus takes the packet length from libpcap: pkthdr->caplen. So as far as I understand, the only way to get a packet that long is from corrupted or very strange libpcap headers.
Can you please print the caplen? Go into the code in PcapLiveDevice.cpp, and at line 132 (in method onPacketArrives) you'll see pkthdr->caplen; please print its value.

Thanks,

vice...@gmail.com

Feb 9, 2018, 9:26:41 AM
to PcapPlusPlus support

Hello,

I printed the value, it shows 2962 bytes too.

pkthdr->caplen: 2962
Packet arrived from server device:
Packet length: 2962 [Bytes], Arrival time: 2018-02-09 15:21:23.818923
Ethernet II Layer, Src: 52:54:00:00:01:64, Dst: 52:54:00:00:02:c8
IPv4 Layer, Src: <Public Server IP>, Dst: 10.0.2.200
TCP Layer, [ACK], Src port: 80, Dst port: 48282
HTTP response, HTTP/1.1 200 OK
Payload Layer, Data length: 1807 [Bytes]
packet->getRawDataLen(): 2962
packet->getFrameLength(): 2962
Source interface MTU: 1500
Destination interface MTU: 1500
Packet length [2962] is larger than device MTU [1500]

Both interfaces have MTU set on 1500 bytes:

ens7      Link encap:Ethernet  HWaddr 52:54:00:00:01:04  
          inet addr:10.0.1.4  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe00:104/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15559 errors:0 dropped:9736 overruns:0 frame:0
          TX packets:2229 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:6083157 (6.0 MB)  TX bytes:162129 (162.1 KB)

ens8      Link encap:Ethernet  HWaddr 52:54:00:00:02:04  
          inet addr:10.0.2.4  Bcast:10.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe00:204/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:12432 errors:0 dropped:9934 overruns:0 frame:0
          TX packets:73 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:883271 (883.2 KB)  TX bytes:5890 (5.8 KB)

Could it be a problem with the KVM virtio interfaces and libpcap? Or some misconfiguration of my KVM interfaces?

Thanks

PcapPlusPlus Support

Feb 9, 2018, 11:35:00 AM
to PcapPlusPlus support
Yes, I guess it's some configuration issue.
I'm not sure the MTU value you get from ifconfig is reliable.
I did some googling and found a thread which explains how to find the MTU between 2 machines. Maybe you can try running this on the machine that sends the packets to your application:

traceroute --mtu <target>

Also, it's worth checking what data resides in the last 1448 bytes. Is the data really duplicated?

Maybe you can try using DPDK instead of libpcap and see if the problem goes away.

Please let me know if any of those work

Thanks,

vice...@gmail.com

Feb 9, 2018, 12:32:50 PM
to PcapPlusPlus support
I tested the traceroute and tracepath commands on both interfaces of the machine that sends the packets to the app. It works OK on the other NIC, but fails on the one connected to the machine where the app runs.

traceroute --mtu 10.0.1.4
traceroute to 10.0.1.4 (10.0.1.4), 30 hops max, 65000 byte packets
 1  10.0.1.100 (10.0.1.100)  2998.306 ms !H  2999.089 ms !H  2999.640 ms !H

tracepath 10.0.1.4
 1?: [LOCALHOST]                                         pmtu 1500
 1:  10.0.1.100                                          2996.365ms !H
     Resume: pmtu 1500 

traceroute --mtu 192.168.122.1
traceroute to 192.168.122.1 (192.168.122.1), 30 hops max, 65000 byte packets
 1  192.168.122.1 (192.168.122.1)  0.124 ms F=1500  0.114 ms  0.055 ms

tracepath 192.168.122.1
 1?: [LOCALHOST]                                         pmtu 1500
 1:  192.168.122.1                                         0.138ms reached
 1:  192.168.122.1                                         0.080ms reached
     Resume: pmtu 1500 hops 1 back 1 

So I guess we can now be sure that it is a configuration problem, unrelated to PcapPlusPlus and libpcap. I think it has something to do with running KVM VMs. I thought I had found the cause on those links:

But I disabled TCP offloading on every single interface and machine, and the problem is still present. I think the problem may go away using DPDK, because of the kernel bypass. I'm working on the DPDK version and will let you know the results. Later I will try to solve the misconfiguration problem, and post the cause and solution here (if I find one) for others that may face the same problems.

Regarding my other question (detecting whether a packet is being received or being forwarded), I was thinking of keeping a table of MAC addresses known to be on each side of the bridge (like a switch does). That way I can always know whether a packet is being received or forwarded, based on the source MAC address. But I was just wondering whether something already exists for this in PcapPlusPlus (or in libpcap/DPDK, so it could be incorporated into PcapPlusPlus)?

Thanks 

vice...@gmail.com

Feb 9, 2018, 12:54:09 PM
to PcapPlusPlus support
I guess I messed something up trying to fix the problem. After a clean boot of both machines, traceroute shows the correct MTU. But the problem in the app persists.

From sending machine to bridge machine:
traceroute to 10.0.1.4 (10.0.1.4), 30 hops max, 65000 byte packets
 1  10.0.1.4 (10.0.1.4)  0.219 ms F=1500  0.192 ms  0.108 ms

From bridge machine to sending machine:
traceroute to 10.0.1.100 (10.0.1.100), 30 hops max, 65000 byte packets
 1  10.0.1.100 (10.0.1.100)  0.336 ms F=1500  0.174 ms  0.186 ms

From bridge machine to bridged machine (http client):
traceroute to 10.0.2.200 (10.0.2.200), 30 hops max, 65000 byte packets
 1  10.0.2.200 (10.0.2.200)  0.397 ms F=1500  0.201 ms  0.225 ms

So I guess I'm back to square one. I'll check the payload as suggested and let you know the results.

Thanks

PcapPlusPlus Support

Feb 10, 2018, 5:34:04 PM
to PcapPlusPlus support
Thanks for providing this information.
The issue must be in the way libpcap calculates the packet length, but I'm not sure what it is. What OS are you using? Also, you said you're running it on KVM; can you try something else like VirtualBox or even bare metal?

Regarding your question about packet direction: I didn't find a way to know whether a packet is incoming or outgoing, but there is an option in libpcap to capture only incoming or only outgoing packets, using pcap_setdirection().
However, I don't have this option in PcapPlusPlus yet. If you're interested in adding it, either of us can do it (it shouldn't be too complicated). Please let me know what you think.

vice...@gmail.com

Feb 11, 2018, 6:55:41 AM
to PcapPlusPlus support
Hello. I've made some progress. I made a basic DPDK version, and neither of the two issues is present: packets forwarded to one interface don't invoke the callback again (no loop), and the MTU is correct. Also, when using libpcap, I had another problem: I didn't receive the correct checksum, so I had to recalculate it before forwarding, or the receiving end would drop the packet. This doesn't happen when using DPDK. I guess the source of all these problems is the KVM virtio driver used by the kernel. I found some posts suggesting to turn off TCP offloading on the host bridge, but didn't have any luck. Maybe I'll look deeper into it in the future, but since the original goal was to use DPDK anyway, I will continue down that path for now. I'll let you know when I try it with VirtualBox or bare metal. I'm using Ubuntu 14.04 on the host and Ubuntu 16.04 on the guests (x86_64). Also, I would be interested in adding the support for capturing only incoming packets, just to get a better understanding of the PcapPlusPlus library. I'll give it a try once I finish the main goal of my research, and let you know the results.

Now I have two additional questions on the DPDK version:

1) As I said, all the problems mentioned are gone. ARP, ICMP, UDP and TCP tests work OK. Then I decided to test its performance using iperf3. After the transmission of a certain amount of MB, the bridge stops working. The app continues running and shows no error messages or crashes, but the communication between both ends doesn't work anymore. The app is really simple: I'm just setting a callback on each interface and forwarding the received traffic from one to the other. What could be happening? It seems like there could be a memory leak, but since DPDK uses hugepages, I'm not sure how to diagnose this. Should I manually free any memory reserved by DPDK or Pcap++ in the callbacks? The code is something like:

clientDevice->startCaptureMultiThreads(onClientPacketArrives, NULL, clientCoreMask);

static void onClientPacketArrives(pcpp::MBufRawPacket* packets, unsigned int numOfPackets, unsigned char threadId, pcpp::DpdkDevice* dev, void* cookie)
{
    // forward every packet received on the client device to the server device
    for (unsigned int i = 0; i < numOfPackets; i++)
    {
        serverDevice->sendPacket(&packets[i], 0);
    }
}

2) I did the iperf3 test using bridge-utils instead of my app, and got a performance of about 2 Gbit/s. I've read several papers and posts on the Internet that show benchmark results of about 10 Gbit/s for DPDK. Do you have any idea of the performance I should expect? I think the performance hit of using Pcap++ and Packet++ shouldn't be that big, right?

3) Right now I'm bridging only two interfaces. My test VM had 2 cores, and I had to add a third one for the example to run: when calling startCaptureMultiThreads it won't allow me to use core 0, since it is reserved for the main app. I also tried to use the second core to process the packets of both interfaces, but it won't work. It throws this error:

Cannot create capture thread #1 for device 'DPDK_0': [Unknown error -16]

So my question is: can I use the same core to listen on more than one interface? I think it can be done; actually, I have been reviewing the DPDK filter example, and I think I should be able to work with fewer cores than interfaces (but at least two cores).

Thanks!

PcapPlusPlus Support

Feb 11, 2018, 5:38:52 PM
to PcapPlusPlus support
Hi,

Thanks for the update. If you find what's wrong with KVM virtio configuration please let me know. I never used this configuration so it's hard for me to help.

In any case, I think DPDK is probably a better solution for your use-case.

Regarding your questions:

1) You'll probably need to debug deeper and find out which part stops responding (is it the sender? the receiver? etc.). But my first guess would be that for some reason you run out of mbufs. The mbuf is the object used by DPDK to store packet data. To optimize performance and avoid frequent memory allocation, DPDK requires the user to allocate an mbuf pool at the beginning of the application; then, each time a packet arrives, the user gives DPDK an array of mbufs from this pool and DPDK places packet data in them. It's the user's responsibility to release these mbufs (so they go back to the pool).

However, PcapPlusPlus takes care of all this work for you: it allocates the mbuf pool at the beginning of the application, and each time a packet arrives it gives DPDK mbufs from the pool (so DPDK can store packet data in them), wraps these mbufs with MBufRawPacket objects (which inherit from RawPacket) and then gives you an array of MBufRawPacket to use. After the callback you're using is done, the array of MBufRawPacket is destroyed and the mbufs should be released and go back to the pool. But maybe something in this mechanism isn't working properly and mbufs aren't getting released. In order to check that, you can use 2 methods provided in DpdkDevice:

DpdkDevice::getAmountOfFreeMbufs()   ==> returns the amount of free mbufs in the pool
DpdkDevice::getAmountOfMbufsInUse() ==> returns the amount of used mbufs (which are currently not in the pool)

Please track the number of free mbufs over time and let me know if it's going down.

2) DPDK performance is indeed almost wire speed, which means almost 10Gbps on 10Gbps interfaces. However, you're using a VM, and you'll never get such performance because the host OS copies all packets before it sends them to the guest. There are ways to achieve better performance on VMs (please read the DPDK documentation), but you'll probably never achieve 10Gbps. Also, PcapPlusPlus has some overhead, so even in the best scenario you probably won't reach "bare metal" DPDK performance.

3) If you want 1 core to handle more than 1 interface, you should use DpdkWorkerThread and activate it through DpdkDeviceList::startDpdkWorkerThreads().

This is how DpdkExample-FilterTraffic does it; you can read more in the documentation.


I hope this answers your questions, please let me know if you need anything more

Thanks,

Vicente Robles Bango

Feb 12, 2018, 12:59:02 PM
to PcapPlusPlus support
Hi

Thanks for your answers; they were really helpful. I will try #2 and #3 and let you know if I have any doubts.

Regarding the main problem (#1), I think the issue is indeed caused by a leak in the mbuf usage. I tracked the numbers, and when the app has just started I can see these results:

Client device AmountOfFreeMbufs: 3966
Client device AmountOfMbufsInUse: 129
Server device AmountOfFreeMbufs: 3967
Server device AmountOfMbufsInUse: 128

Once the iperf3 test stops working, I can see that the amount of free mbufs has decreased to below 100 on the server device:

Client device AmountOfFreeMbufs: 3935
Client device AmountOfMbufsInUse: 160
Server device AmountOfFreeMbufs: 75
Server device AmountOfMbufsInUse: 4020

So I reversed the test (ran the iperf server on the client machine) and noticed that this time it is the client that runs out of free mbufs.

Then I tested using ping with a very low interval (1ms) to make sure the traffic in both directions would be the same, and confirmed that both devices run out of free mbufs:

Client device AmountOfFreeMbufs: 83
Client device AmountOfMbufsInUse: 4012
Server device AmountOfFreeMbufs: 75
Server device AmountOfMbufsInUse: 4020

After that, just to confirm the cause of the problem, I increased MBUF_POOL_SIZE from 4095 to 8191 and 16837. Some iperf3 tests that didn't complete before completed without issues after the change (because the buffers last a little longer, allowing iperf to transfer the data needed for the test). So the problem is clear: somehow the mbufs are not being released. Then I added some trace lines to both MBufRawPacket::~MBufRawPacket and rte_pktmbuf_free, and confirmed that they are being called. Actually, for each packet (ICMP request from client to server, and ICMP reply in the other direction) I can see that rte_pktmbuf_free is called twice (I think that would be the expected behaviour, since I'm forwarding each packet). Still, even if sometimes the number of mbufs in use decreases by one or two units, it keeps growing over time until the app stops working:

1 packets received on client device
Client device AmountOfFreeMbufs: 3769
Client device AmountOfMbufsInUse: 326
Server device AmountOfFreeMbufs: 3539
Server device AmountOfMbufsInUse: 556
rte_pktmbuf_free called!
rte_pktmbuf_free called!

1 packets received on server device
Client device AmountOfFreeMbufs: 3770
Client device AmountOfMbufsInUse: 325
Server device AmountOfFreeMbufs: 3537
Server device AmountOfMbufsInUse: 558
rte_pktmbuf_free called!
rte_pktmbuf_free called!

So, while I was writing this response, it bothered me that when running iperf only one of the devices ran out of mbufs. And it occurred to me that the cause might be something I was doing wrong when forwarding the packets. So I commented out the forwarding lines of code (clientDevice->sendPacket / serverDevice->sendPacket) and repeated the ping test (from client to server), trying to flood the app. I know that since there is no forwarding the server would never receive the packets and there would be no ICMP replies, but I wanted to check whether the clientDevice ran out of mbufs because of the ICMP requests received. It did not. The amount of used mbufs remained more or less the same (around 128), even testing ping in flood mode with a 10ms interval and a 500-byte payload for a whole minute (>5k packets sent).

After all these tests, I think the problem has something to do with the way I'm forwarding the packets in the callback. Maybe I'm doing something wrong? Maybe I'm supposed to clone the received MBufRawPacket before sending it to the other device, and using the same object is preventing rte_pktmbuf_free from doing its work on either of the devices?

I checked your code in DpdkExample-FilterTraffic. I know my code is different, since I'm just using two simple callbacks and startCaptureMultiThreads instead of custom workers as suggested in your third response. I will try this approach to test whether the behaviour is the same, but I wanted to ask about two things I noticed in your AppWorkerThread.h code:

1) When calling dev->receivePackets you use a Packet array instead of the MBufRawPacket array I receive in the callbacks. Does it make any difference?
2) After iterating the packets, you call delete on the array (delete [] packetArr). I tried to do the same with the array I receive in the callback, but got a segmentation fault.

I don't have a deep enough understanding of Pcap++ internals yet to answer those questions, but I'm almost sure I'm doing something the wrong way. Could you please confirm what would be the correct/better way to forward the packet to the other device inside each callback? I hope all these tests and results ring a bell on what the problem may be.

Thanks, your help with this issue is really appreciated.

Vicente Robles Bango

Feb 12, 2018, 4:27:20 PM
to PcapPlusPlus support
Hello again.

Just wanted to let you know that I tried the DpdkWorkerThread approach. I understand better now how the distribution of devices/queues/threads/cores is done, and everything works well (I could run the example with just 1 core for main and 1 core for a thread that manages the two devices). The bad news is, the behavior persists: it runs out of mbufs when I send the received packets to the second device. Just wanted to let you know this result, in case it gives you a hint on what the problem could be.

Thanks. 

Vicente Robles Bango

Feb 13, 2018, 11:36:51 AM
to PcapPlusPlus support
I did one more test: an app that receives ICMP requests on one interface and spoofs the ICMP replies. So it is like the previous tests, but the difference is that I'm creating the packet to be sent from scratch, and sending it on the same interface on which the request was received. Same result: the device runs out of mbufs. I think I'll need to dig deeper into DPDK internals to find out what's wrong. Do you have any ideas on what could be wrong? Thanks

PcapPlusPlus Support

Feb 13, 2018, 8:12:02 PM
to PcapPlusPlus support
Hi,

I had a look at the DpdkDevice::sendPacket() code and I think I have a bug there which results in mbufs being allocated but not released, which is exactly what you're experiencing.
I will try it too, but please add the following line: packetIndex++; after line #993 in DpdkDevice.cpp. It should look like this:

991        newMBuf->data_len = rawPacket->getRawDataLen();
992
993        mBufArr[packetsToSendInThisIteration] = newMBuf;
994        packetIndex++;
995        packetsToSendInThisIteration++;
996        numOfSendFailures = 0;

Please let me know if it solves the problem

Thanks,

Vicente Robles Bango

Feb 14, 2018, 5:14:30 AM
to PcapPlusPlus support
Hello. Thanks for the feedback.

I tested this change, but it actually makes things worse: the bridge does not work from the start.

But I think I can confirm that there is a bug in this class/method, since I tested the l2fwd example from DPDK, and it works OK, reaching 1Gb with iperf and almost no packets dropped.

I'll read the DpdkDevice::sendPacket() code and let you know if I find anything. Please let me know if you need me to do any other tests.

PcapPlusPlus Support

Feb 14, 2018, 8:18:42 AM
to PcapPlusPlus support
I'm sorry, this is probably not the right fix. I'll also try to go over the code and find a fix.
If you find anything yourself, please let me know.

PcapPlusPlus Support

Feb 17, 2018, 6:51:43 PM
to PcapPlusPlus support
Hi, I think I found the bug and fixed it. Please pull the latest code; the fix is in commit 66a7bf61782e3274554013ff26ff105b6ed74cbb.
I also added an mbuf leakage check to the packet-sending unit tests (in Pcap++Test), and it seems the leakage has stopped.
Please let me know if the problem is indeed resolved.

Vicente Robles Bango

Feb 18, 2018, 1:40:41 PM
to PcapPlusPlus support
Hello!

I can confirm I tested the fix, and it works perfectly now. No more mbuf leaks after transmitting several GB of data through the bridge. I just want to send a big THANK YOU. This fix is an important step in the roadmap for our project. Now I'm experiencing lower performance than expected (compared to a pure DPDK bridge), but I'm 99% sure it is caused by my dev environment and/or the simplistic approach of my test code. I will do further tests in a better environment, and if it does not improve, I will (once again) ask for your kind help (of course, if you have any general tips or suggestions on this, please don't hold back).

Last but not least: I have not forgotten about the bridge example contribution to the project. Once I finish the tests and have a better understanding of the library and DPDK, I will clean up my code and contact you about it.

Thanks again, this library is awesome; I'm really having a good time developing on top of it. Keep up the good work!

Vicente

PcapPlusPlus Support

Feb 18, 2018, 3:13:31 PM
to PcapPlusPlus support
Thank you very much for all the kind words, and thanks for using PcapPlusPlus!
I'm looking forward to your contribution to this project, and of course don't hesitate to reach out if you have more questions or suggestions.

Thanks!

Vicente Robles Bango

Feb 23, 2018, 1:51:16 PM
to PcapPlusPlus support
Hi, it's me again!

Now that the bug is gone, I have been doing some basic performance tests, not in absolute terms, but comparing a simple Pcap++ app against the l2fwd example from DPDK. I noticed several things:

- The PcapPlusPlus example consumes a lot more CPU than the DPDK example.
- The PcapPlusPlus example performs at a rate between 30% and 60% of the DPDK performance.
- The PcapPlusPlus example is less stable than the DPDK example, and more affected by other CPU load.

Initially, I thought that the cause was my code, because I was reading a burst of packets but sending them one by one inside a loop (like most of the PcapPlusPlus examples), so I changed my app to use sendPackets instead of sendPacket. But the performance and load didn't improve in a visible way. So I analyzed both the DpdkDevice class and the DPDK example code, and noticed that your class uses rte_eth_tx_burst, while the l2fwd example uses rte_eth_tx_buffer instead. I will try to modify the DpdkDevice class to use the buffer approach and compare the results, since it makes sense that it could be the cause of the high CPU usage during the performance test. Anyway, I wanted to ask your opinion on this, and, should the buffer version show better performance, whether you would be willing to include support for it in the library (maybe configurable at compilation time, or dynamically with a setting, or via different send methods). Any thoughts on this are really appreciated.

Thanks. 

Vicente Robles Bango

Feb 23, 2018, 5:53:45 PM
to PcapPlusPlus support
Hello!

Here's an update. I actually replaced the rte_eth_tx_burst call with rte_eth_tx_buffer. Everything works OK, but I have only tested with a buffer of size 1, because I need to implement a time-based buffer flush mechanism (otherwise, it won't send any packets until the buffer is full). But the good news is: I found the cause of the high CPU usage. The worker thread's infinite loop was calling receivePackets over and over. I added a very small delay using usleep, and the CPU usage dropped drastically. Also, the performance increased to levels close to the DPDK example, as expected. So, great news, the fix was in the worker. Anyway, I'm still curious about the performance gains that using the buffer could bring; I'll try to finish it when I have the time, but I would still like to ask whether you have considered using it, and if so, what your thoughts are on this.

Thanks, 

PcapPlusPlus Support

unread,
Feb 23, 2018, 6:50:30 PM2/23/18
to PcapPlusPlus support
Hi Vicente,

High CPU usage is actually normal, even required, in DPDK applications. Let me explain the reason: DPDK is meant for high-speed packet processing, as close as possible to wire speed. So in crowded networks, where the incoming packet rate is high, the application must pull packets from the rx queue as fast as possible in order not to miss any. The way to do that is to use all available CPU, run in an endless loop, and pull packets from the rx queue in each iteration. That's the reason for the high CPU consumption.
High CPU consumption shouldn't hurt performance; it should actually improve it, so it's weird to hear that you improved performance by slowing down the application.
Actually, if you take a look at the l2fwd example, it also has an endless loop (main.c, line 257).
The performance difference between l2fwd and your app may have various causes:
  • In the endless loop in your worker, are you trying to both send and receive packets in each iteration (same as done in l2fwd)?
  • As you can see in the l2fwd app, it first tries to send packets and only then reads incoming packets from the rx queue. Are you doing the same in your code?
  • The l2fwd app uses rte_eth_tx_buffer to buffer packets, and packets are actually sent on the wire only after a certain amount of time, which is more or less 100us. I currently don't have this capability in PcapPlusPlus, but if you buffer packets yourself and send them using sendPackets it should serve more or less the same purpose.

Please let me know what you think.

Thanks,

Vicente Robles Bango

unread,
Mar 1, 2018, 7:20:03 AM3/1/18
to PcapPlusPlus support
Hello.

Sorry for my late response. I finally got dedicated hardware to do more tests and switched focus to preparing that environment. Regarding the questions in my last post, here are my thoughts:
  • In the endless loop in your worker - are you trying to both send and receive packets in each iteration (same as done in l2fwd)?
Yes, I do both in the same iteration.

  • As you may see in l2fwd app, it first tries to send packets and only then read incoming packets from the rx queue. Are you doing the same in your code?
I actually noticed that in the code, and tested both ways (receiving and sending in the same iteration / sending in the next iteration). It did not produce significant differences in the results. But I would like to ask: why do they do it that way in the example? What's the reason for it?
  • In l2fwd app they use rte_eth_tx_buffer to buffer packets, and packets are actually sent on the wire only after a certain amount of time which is more or less 100us. I currently don't have this capability in PcapPlusPlus but if you buffer packets yourself and send them using sendPackets it should serve more or less the same purpose.
Yes, I could make my own buffer, but since DPDK has these functions, I thought it would be better to use them instead of reinventing the wheel. As I said, I actually changed the PcapPlusPlus code to use rte_eth_tx_buffer, but I haven't had time to implement the timer, so I only ran tests with a buffer size of 1 (where a performance comparison makes no sense). I will let you know the results when I get back to this.

Now, regarding the CPU usage and the performance increase I noticed: I fully understand your reasoning, and it makes sense. I actually made another change to the code (in addition to the small delay in the worker using usleep): I increased RX_BURST_SIZE in DpdkDevice.cpp from 64 to 256. Sorry, I forgot to mention that in my previous message. The fact is that making just this change (without the small delay) did not produce a big performance increase. I think this change makes more sense, though, and I have a theory on the sleep/CPU issue: I was testing all this on my development laptop using multiple KVM virtual machines. Maybe the high CPU usage on the bridge VM was preventing the other two VMs (the ones running the iperf server and client) from performing well. Anyway, as I said, I've now got dedicated hardware to continue testing, so I will try all this again and let you know the results.

Finally, I have one new question. Have you tested, or do you know of anybody who has tested, PcapPlusPlus running on a VMware VM? My new test server has ESXi installed on it, and I've set up the exact same environment that I had on KVM. The main difference is the NIC model (vmxnet3 instead of virtio). I tested the l2fwd example and everything works OK, but when I try to run my test apps, no packets are received when calling receivePackets (it always returns 0). I will recheck my code, but since it's the same code I was using on KVM, I wanted to ask whether you think the change to VMware requires any special treatment from the PcapPlusPlus point of view.

Thanks

Vicente Robles Bango

unread,
Mar 5, 2018, 1:32:34 PM3/5/18
to PcapPlusPlus support
Hello again.

I have some news. I solved the issue of the 0 received packets by increasing DEFAULT_MBUF_POOL_SIZE. Originally (in the KVM environment) I set this value to 2047. If I use this same value in the ESXi environment, no packets are bridged, and no error messages are displayed, neither at startup nor during the tests. If I increase this value to 4095, some packets are bridged but then it stalls (like when we had the mbuf leak before). Finally, after increasing it to 8191 the bridge started to work as expected. I checked that there is no mbuf leak, so I guess the problem is related to the number of queues on each NIC. In the KVM environment each NIC had 2 tx/rx queues. On ESXi, I noticed each NIC has 16 rx / 8 tx queues. Could you please give us your thoughts on this? Is this normal behavior, and is it in fact related to the number of queues on the NIC? If so, what would be a rule of thumb to calculate the needed MBUF_POOL_SIZE?

The good news is that after these changes, we tested iperf between two VMs through the bridge and got a throughput of 4Gb (both with l2fwd and our PcapPlusPlus bridge). This was using iperf3 with no parameters. Under the same conditions, the performance of a regular Linux bridge was around 2Gb. Now the weird part: we tested iperf3 with parallel connections (using the -P option). As we increased this value from 8 to 16 and up to 128, we noticed that the performance of both l2fwd and the PcapPlusPlus app dropped to around 3Gb, while the performance of the bridge-utils bridge increased up to 4Gb and even 5Gb. We have no explanation for this. Since I noticed on the PcapPlusPlus page that you have tested the library on VMware, I would like to know what your experience has been. Have you achieved better performance? Should it be better, at least closer to the 10Gb that we get in VM-to-VM tests? Do you have an explanation for the opposite behavior of DPDK and bridge-utils as the number of parallel connections increases?

Thanks,  

PcapPlusPlus Support

unread,
Mar 7, 2018, 5:16:58 AM3/7/18
to PcapPlusPlus support
Hi Vicente,

Thanks for the great analysis! I'm happy to see you managed to reach the same performance in PcapPlusPlus that you get in l2fwd.
Actually, if you pull the latest code, we added some more optimizations which should make PcapPlusPlus a little bit faster.

Regarding the mbuf pool size - I actually have no idea why the pool size should matter. When you ran your tests with less than 8191, did you notice that packets didn't arrive at the application, or that they weren't sent from it?

Regarding iperf3 with multiple streams - again, I'm not sure how to answer. The last time I ran PcapPlusPlus on a VMware VM was a long time ago and I don't remember the performance details.
The only thing I can think of is to increase the number of worker threads: if more and more packets are coming into the system at the same time, the only way to handle them is to add more worker threads. How many do you have now?

Thanks,

appco...@gmail.com

unread,
Mar 7, 2018, 9:15:31 AM3/7/18
to PcapPlusPlus support
Hello

Regarding the mbuf pool size issue, the behavior is that no packets are received unless I increase the value. Maybe the quantity is not enough for all the queues? I noticed the number of free mbufs is very low (around 30) right after starting the app. What is strange is that it just fails silently; no error messages are shown.

Regarding the performance, I tested with up to 3 threads for each NIC (6 worker threads + 1 main app thread). I can't test with more, because license limitations in VMware prevent me from assigning more than 8 cores to each VM. But compared to a test run with just one thread for each NIC, the results are pretty much the same. Not even small improvements were noticed.

Thanks for the information about the new PcapPlusPlus version. I'll try it and let you know how it goes. Do you think it would be worth trying it with DPDK 18 instead of DPDK 16? Since the performance is now similar to l2fwd, I think the performance issues are DPDK-related (maybe I'm missing some configuration/tuning).

Thanks!

PcapPlusPlus Support

unread,
Mar 7, 2018, 2:59:05 PM3/7/18
to PcapPlusPlus support
Hi Vicente,

Thanks for the info.

Regarding the mbuf pool size - when you set a low number, do you notice zero packets arriving at your app, or zero packets being sent from it? This may help uncover the mystery.

Regarding performance - I have another idea: maybe when you increase the number of parallel connections, all packets still arrive at the same rx queue? It is usually the NIC's responsibility to load-balance packets between rx queues. The hash function usually uses the 5-tuple, but I think it can be configured otherwise (I need to check how to do that). Maybe in your case the hash function redirects all packets to the same rx queue, and that's why you see performance degradation with a growing number of parallel connections. Could you please check that and let me know?

Regarding DPDK 18.02 vs 16.11 - I think it's worth upgrading; newer versions are usually better. Doesn't hurt to try anyway :)

Thanks,

appco...@gmail.com

unread,
Mar 9, 2018, 9:53:42 AM3/9/18
to PcapPlusPlus support
Hi

Regarding the mbuf pool size: I notice zero packets arriving to my app.

Regarding performance: I think you may be right, and it could be related to the distribution of packets inside the NIC. I ran a test and only received packets from queues #0, #5, #7 and #11; I never received a packet from any other one, even when running the iperf3 test. Actually, 99% of packets are being received on queue #11; just a few packets are received on the other queues. When I check the number of available queues using dev->getTotalNumOfRxQueues() / dev->getTotalNumOfTxQueues(), it shows 16 rx queues and 8 tx queues. The weird thing is that when I run "sudo ethtool -x eth1" (before taking the NIC away from the kernel) it shows just 4 queues. Somehow, the number of queues changes when the NIC is handed to DPDK and the driver changes from vmxnet3 to igb_uio.

I will try to give DPDK 18 a shot next week, and let you know how it goes.

Thank you!

PcapPlusPlus Support

unread,
Mar 9, 2018, 10:35:50 AM3/9/18
to PcapPlusPlus support
Hi,

Regarding mbuf pool size:

Regarding performance: it makes sense that the number of rx queues changes between the kernel and DPDK, because the kernel may use a different driver than vmxnet3. Since it's a virtual NIC and not a physical one, the driver can actually make a difference.
Regarding packet distribution: if you see that 99% of the packets arrive at one rx queue, it means your packet distribution doesn't fit the hash function of the (virtual) NIC. Now, it depends what your application is meant to do and what kind of traffic you're trying to simulate using iperf:
  • If you're trying to simulate real ISP traffic, you need to inject more diverse traffic, meaning traffic with different client/server IPs and client/server ports
  • If you're trying to simulate traffic from certain clients to certain servers, you need to look in the vmxnet3 driver (or the PMD, I'm not sure which) to see whether the hash function it uses can be changed, for example to check only client/server ports and not client/server IPs. I'm not sure if the driver supports it, but it's worth a try

Anyway, I'm not sure DPDK 18 can help you with that: if the hash function can be configured in the PMD, it may be supported in earlier versions as well. And if it can only be configured in the vmxnet3 driver, it's a VMware issue rather than a DPDK one.

I can help you check this out if you want


Thanks,

PcapPlusPlus Support

unread,
Mar 9, 2018, 10:37:01 AM3/9/18
to PcapPlusPlus support
Oops, forgot to write about the mbuf pool size: I'm not sure why this may be happening, let me think about it a little more...

Vicente Robles Bango

unread,
Mar 11, 2018, 10:40:52 AM3/11/18
to PcapPlusPlus support
Hi

I understand your point about the traffic and its distribution across the NIC queues. I think it makes a lot of sense, but since I get better performance with a regular Linux kernel bridge than with DPDK, I guess the problem is the hash function used with the uio_pci_generic driver (the one used with DPDK) and not the one of the vmxnet3 driver, right? I have been trying to find more information on DPDK + VMware configuration/tuning, but haven't found anything on controlling the number of queues or the hash algorithm.

I would really appreciate your help in researching this; making sure we can get better performance with DPDK than with a regular kernel bridge is really important for us. Otherwise it just doesn't make sense to use it, and we would rather turn our focus to netfilter hooks and queues.

Thanks!

PD: Sorry for the deleted posts.

PcapPlusPlus Support

unread,
Mar 11, 2018, 12:19:23 PM3/11/18
to pcappluspl...@googlegroups.com
Hi Vicente,

I started doing some research and then I remembered I already did that research before :)
Apparently this NIC load-balancing feature is called Receive Side Scaling (RSS), and DPDK supports changing the RSS hash function, as long as the NIC supports it.
You can read about RSS in this post:


Then I took a look in PcapPlusPlus code and apparently a basic type of RSS is already configured:


There are actually 3 lines related to RSS:

portConf.rxmode.mq_mode = DPDK_CONFIG_MQ_MODE;
portConf.rx_adv_conf.rss_conf.rss_key = DpdkDevice::m_RSSKey;
portConf.rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV6;

The first line instructs DPDK to use RSS.
The second line provides an RSS key.
The third line configures the type of hash function to use. You can find the full list of options here: http://dpdk.org/doc/api/rte__ethdev_8h_source.html (lines 403-480).

In your app, you should probably change the hash function to something else which match your traffic.

I did some research on vmxnet3 and seems it supports only 4 RSS types (http://bpmfdtnl.com/lxr/source/dpdk/drivers/net/vmxnet3/vmxnet3_ethdev.h)

#define VMXNET3_RSS_OFFLOAD_ALL ( \
    ETH_RSS_IPV4 | \
    ETH_RSS_NONFRAG_IPV4_TCP | \
    ETH_RSS_IPV6 | \
    ETH_RSS_NONFRAG_IPV6_TCP)

Load balance by IPv4 (I think that means client IP + server IP, but I'm not sure)
Load balance by IPv4 non-frag & TCP (I think that means the 5-tuple)
Load balance by IPv6 (I think that means client IP + server IP, but I'm not sure)
Load balance by IPv6 non-frag & TCP (I think that means the 5-tuple)

You can try changing it to ETH_RSS_NONFRAG_IPV4_TCP and see if it solves your issue.
If it works, I'd appreciate if you can add a feature in PcapPlusPlus to change RSS types.

Please let me know if it helps.

Thanks,

Vicente Robles Bango

unread,
Mar 12, 2018, 9:47:18 AM3/12/18
to PcapPlusPlus support
Hello!

Thanks a lot for all this detailed information. It actually makes a lot of sense and could explain the performance issue. I will give it a try and let you know the results as soon as possible. 

Of course I'll add the feature to PcapPlusPlus, sure!

Thanks!

PcapPlusPlus Support

unread,
Mar 12, 2018, 9:30:36 PM3/12/18
to Vicente Robles Bango, PcapPlusPlus support
Thanks Vicente,

I'll be waiting for your results

Thanks,



Vicente Robles Bango

unread,
Mar 16, 2018, 9:23:53 AM3/16/18
to PcapPlusPlus support
Hello.

I finally got time to test the RSS options. Good news is, it works as expected, and packets are now being received on different NIC queues. If I run iperf3 with 4 parallel connections, I can see those packets arriving on 4 different queues.

Bad news is, I think there is another mbuf bug. I downloaded the latest PcapPlusPlus version (still using DPDK 16), but found these problems:

- After running several tests, I start to see the output "Couldn't set new allocated mBuf size to 1514 bytes". Eventually, the bridge stops working. If I run iperf3 with a higher number of parallel connections (16 or more), the problem starts earlier. So I guess it has something to do with more queues using the mbuf pool?

- If I increase the number of parallel connections (32, 64, 128), it gets worse: the app crashes with the following segmentation fault:

PANIC in vmxnet3_unmap_pkt():
EOP desc does not point to a valid mbuf11: [/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2fc647541d]]

Notice this error is also mbuf-related. I tried tweaking the mbuf pool size and increasing the amount of hugepages, but it didn't help. Maybe there is another leak? What do you think? Please let me know if you need me to test anything else or provide any other information.

Thanks

PcapPlusPlus Support

unread,
Mar 16, 2018, 7:40:59 PM3/16/18
to PcapPlusPlus support
Hi Vicente,

It's nice to hear that RSS is working. Which RSS type did you use to make it work?
It'd be great if you can add a feature to change RSS types.

Regarding the mbuf error, I see 2 issues there:
  • First, I think there is a bug in DpdkDevice::sendPacketsInner(). If an error occurs after the mbuf has been created (as happens in your case), then the mbuf isn't freed. Please find this line:
LOG_ERROR("Couldn't set new allocated mBuf size to %d bytes", rawPacket->getRawDataLen());

and add a call to free the mbuf right after it:

rte_pktmbuf_free(newMBuf);

This should solve the leakage
  • Now to the real problem, which is why rte_pktmbuf_append() failed in the first place. You'll probably need to debug this in order to understand more, as I don't have access to your environment. The documentation for this method (here) says it will return NULL if there is not enough tailroom in the mbuf. So I'd suggest you first check the tailroom and headroom sizes using these calls:
printf("tailroom is: %d\n", rte_pktmbuf_tailroom(newMBuf));
printf("headroom is: %d\n", rte_pktmbuf_headroom(newMBuf));

Please let me know the results and we'll think together how to continue the investigation.

Thanks,

Vicente Robles Bango

unread,
Mar 22, 2018, 7:08:00 PM3/22/18
to PcapPlusPlus support
Hello.

After some days, I'm back with the results. Sorry for the delay.

Regarding DPDK 18.02: I installed it, tested it again with the l2fwd example, and got results very similar to the ones with DPDK 16.11. That was expected. Here comes the strange part. When testing PcapPlusPlus with DPDK 18, the performance drops to something around only 10 or 20 MBs. To make sure nothing else in the environment had changed, I switched back and forth between 16 and 18 (each time recompiling both PcapPlusPlus and my app). With DPDK 16 the results are more or less similar to l2fwd. With DPDK 18, it never reaches more than 20MB. So I was wondering, maybe I'm missing something? Have you tested this combination before?

Regarding the new mbuf problem/leak: nothing helped to increase the performance and/or prevent the app from crashing. Something I noticed is that the number of times the code reaches the "Couldn't set new allocated mBuf size to %d bytes" line is really small (about 4 or 6 times during a 10-second iperf3 test). So I just ran a lot of tests and gathered all the information I could. I will try to present the facts here, in no particular order; please let me know if something is missing or confusing:

- I monitored the tailroom and headroom. For every single test and point in time, the headroom value was 128. The tailroom was constant at 2048 when testing with 1 connection. When testing with several connections, it was always 534.

- I did a lot of iperf3 tests with different durations and numbers of parallel connections, and the behavior is as follows:

-parallel=1 -time=10: no mbuf errors, no crashes, no problems
-parallel=4 -time=10: between 2 and 10 mbuf errors. Some of the connections transmit 0 bytes (but only on some iterations, see below **)
-parallel=1 -time=40: no mbuf errors, no crashes, no problems
-parallel=8 -time=10: between 2 and 10 mbuf errors. More connections transmit 0 bytes, and this is more consistent between iterations. After a few runs, the app crashes with a PANIC (see below *)
-parallel=1 -time=80: no mbuf errors, no crashes, no problems
-parallel=16 -time=10: the app won't even start; it crashes with a PANIC immediately
-parallel=1 -time=160: no mbuf errors, no crashes, no problems

- The described behavior is the same with/without the rte_pktmbuf_free call (which makes sense, because it reaches that point in the code only a few times).

- All those tests used DEFAULT_MBUF_POOL_SIZE = 16*1024-1. If I increase it to 32K or 64K the behavior is the same. But if I decrease it, the problems arise earlier (for example, the PANIC error happens with only 4 parallel connections)

- The problem sometimes seems to be degenerative (a leak or something), because after some runs it hits the PANIC crash or more iperf3 connections transmit zero data. But it's confusing, because sometimes I see this behavior from the first run. Also, sometimes even when I see no communication, it works if I cancel the iperf run and wait a couple of seconds (without restarting the test app).

- When I see 0 bytes transmitted on some iperf connections, I don't see the mbuf error message; it just fails silently (as happened a couple of weeks ago)

- When I run the same tests without changing the RSS formula, the performance degrades, but it doesn't crash, doesn't show mbuf errors, and I don't see 0-byte connections. So the problem is related to the number of queues receiving packets.

I know this is a lot of information, maybe not consistent or complete enough, but I hope it will ring some bells for you.

Thanks!


* This is the PANIC error:

PANIC in vmxnet3_unmap_pkt():
EOP desc does not point to a valid mbuf11: [/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7effa3c8d41d]]
10: [/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7effa45796ba]]

** This is what I mean when I said some connections transmits 0 bytes:

[  4]   7.00-8.00   sec   258 MBytes  2.16 Gbits/sec   20    195 KBytes       
[  6]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0    202 KBytes       
[  8]   7.00-8.00   sec   107 MBytes   900 Mbits/sec   85    103 KBytes       
[ 10]   7.00-8.00   sec   181 MBytes  1.52 Gbits/sec   45    264 KBytes       
[SUM]   7.00-8.00   sec   546 MBytes  4.58 Gbits/sec  150   

PcapPlusPlus Support

unread,
Mar 23, 2018, 11:52:07 AM3/23/18
to PcapPlusPlus support
Hi Vicente,

Thank you very much for the detailed information. I've read it thoroughly, and it appears there are a couple of different issues:
  • Enabling RSS causes some kind of mbuf leakage or corruption
  • Using PcapPlusPlus with DPDK 18.02 gives poor results

I don't believe the 2 issues are related, so my suggestion is to focus on the first one first.


From the information you provided about the tailroom being constant 2048 with 1 connection and 534 with more than 1 connection, there's obviously something here. This is also the reason for getting the "Couldn't set new allocated mBuf size to *** bytes" error.

Let me elaborate on that:

mbuf structure is described in DPDK documentation:


http://dpdk.org/doc/guides/prog_guide/mbuf_lib.html


As you can read in this doc, tailroom is the amount of bytes left in the mbuf after packet data.

I'm not sure where you put the rte_pktmbuf_tailroom() call but from your results on 1 connection (value being 2048) it seems you put this call just after allocating the mbuf from the mbuf pool (when packet data is still empty).

Assuming you didn't change the place of the rte_pktmbuf_tailroom() call going from 1 connection to multiple connections, there is something strange here: it seems that when RSS is enabled and a new mbuf is allocated from the mbuf pool, the mbuf is not really empty, and a tailroom value of 534 indicates there is packet data with a length of 1514. Something here doesn't make sense.

To ensure this theory I'd add a call to rte_pktmbuf_data_len() to verify packet data length is indeed 1514.

Please add it to your code and let me know if it's indeed the value you're getting.


I'm not sure why that happens, but I have a few questions that may shed some light and help us investigate:

  • How many worker threads are you using? Assuming more than one - did you make sure each worker reads packets from different rx queue(s)?
  • Please make sure that you call rte_pktmbuf_tailroom() and rte_pktmbuf_data_len() just after allocating the mbuf, meaning just after the call to rte_pktmbuf_alloc()
  • According to rte_pktmbuf_alloc() documentation the allocated mbuf size should be 0. It's worth verifying if you're seeing something different
  • Assuming you're using more than one worker thread, I'm suspecting some kind of race condition, but I can't put my finger on where it might be because mbuf pool should be thread safe (as you can read here). Is there anything in your code that may point to a race condition?

I'd appreciate if you can investigate in those directions and let me know your finding.


Thanks,
