multiple instances of netsniff-ng with AF_PACKET hash fanout


Michał Purzyński

Apr 7, 2015, 3:24:01 AM
to netsn...@googlegroups.com
I'm trying to implement PACKET_FANOUT_HASH support. The idea is to
have multiple netsniff-ng processes running, each handling a share of
the flows. The fanout hash mode sends all packets from the same flow
to the same socket, and all processes join the same socket group.

I tried to do it like this (once the code works I will refactor it;
it's an ugly hack for now ;-))

https://gist.github.com/mpurzynski/5be9f5693e66dbe16947

Single process netsniff-ng works with this code, when I start another
one the first crashes.

What am I doing wrong here?

Michał Purzyński

Apr 7, 2015, 3:24:20 AM
to netsn...@googlegroups.com
The OOM killer told me I had to give the development VM more memory. I
did, and now the code works: different flows are hashed between the two
instances.

What do you think about the idea (the code is ugly)? If there's a
chance you would accept a pull request, let me know how you'd like the
code to look.

Also, I think that kernel-level defragmentation should be enabled, or
fragmented packets from flow A might end up in a different instance
than the rest of flow A, because they will have a different hash.

Tobias Klauser

Apr 7, 2015, 7:35:54 AM
to netsn...@googlegroups.com, Michał Purzyński
On 2015-04-04 at 17:09:33 +0200, Michał Purzyński <michalpu...@gmail.com> wrote:
> OOM killed told me I had to give the development VM more memory. I did
> and the code works - different flows are hashed among two instances.
>
> What do you think about the idea (the code is ugly)? If there's a
> chance you would accept a push request, let me know you'd like to code
> to look like.

Very nice, I like the idea a lot!

We prefer patches submitted by e-mail to this list rather than github
pull requests, as this makes it easier for us to review. Please see the
file SubmittingPatches in the netsniff-ng source directory for
additional information.

Daniel Borkmann

Apr 7, 2015, 10:07:05 AM
to netsn...@googlegroups.com, michalpu...@gmail.com, tkla...@distanz.ch
I'm delighted to see work in this direction! Awesome!

I guess we would need to pass cluster_id from the command line, so
that the user could configure this.

What's the exact error message/crash? Are you talking about a kernel
crash? If it's in userland, could you strace -f it?

Thanks,
Daniel

Daniel Borkmann

Apr 7, 2015, 10:19:46 AM
to netsn...@googlegroups.com, michalpu...@gmail.com, tkla...@distanz.ch
Btw, both cluster_mode and cluster_id should be 32 bit, but that doesn't
explain the crash of the other one. ;)


Michał Purzyński

Apr 7, 2015, 10:33:25 AM
to netsn...@googlegroups.com, tkla...@distanz.ch
So the crash was caused by a real OOM condition; I was testing it in a
VM with 512MB RAM. I changed it to 4GB and now I have two netsniff-ng
instances working, each getting a portion of the flows, hashed by the
kernel per flow :-)

I will work on the code and submit patches for review in a few days.

Daniel Borkmann

Apr 7, 2015, 10:59:17 AM
to netsn...@googlegroups.com, tkla...@distanz.ch, michalpu...@gmail.com
On 04/07/2015 04:28 PM, Michał Purzyński wrote:
> So the crash was caused by a real OOM condition, I was testing it in a
> VM with 512MB RAM. Changed to 4GB and now I have two netsniff-ng
> instances working and getting portion of the flows, hashed by the
> kernel, per flow :-)
>
> I will work on the code and submit patches for review in a few days.

Ok, great, looking forward!

Daniel Borkmann

Apr 8, 2015, 7:43:43 AM
to netsn...@googlegroups.com, tkla...@distanz.ch, michalpu...@gmail.com
On 04/07/2015 04:34 PM, Daniel Borkmann wrote:
> On 04/07/2015 04:28 PM, Michał Purzyński wrote:
>> So the crash was caused by a real OOM condition, I was testing it in a
>> VM with 512MB RAM. Changed to 4GB and now I have two netsniff-ng
>> instances working and getting portion of the flows, hashed by the
>> kernel, per flow :-)
>>
>> I will work on the code and submit patches for review in a few days.
>
> Ok, great, looking forward!

I think you might also want to let the user choose which fanout
discipline to use on the kernel side. There are several useful ones
available.

Jon Schipp

Apr 10, 2015, 4:31:57 PM
to netsn...@googlegroups.com, Tobias Klauser, michalpu...@gmail.com
Awesome, glad to see this coming up again :)



--
You received this message because you are subscribed to the Google Groups "netsniff-ng" group.
To unsubscribe from this group and stop receiving emails from it, send an email to netsniff-ng+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Michał Purzyński

Apr 11, 2015, 5:04:54 PM
to Jon Schipp, netsn...@googlegroups.com, Tobias Klauser
OK, here is try one. I'm ready to take heavy artillery fire ;-) Man, it
takes a while to find a free letter for getopt.


Two new parameters were added:
-C <cluster id> - an integer that represents the socket fanout group
identifier; it must be shared between all processes in the group
-K hash/lb/cpu/rnd - the type of fanout. The only really useful one
here is "hash", because it preserves flows. If it is chosen, kernel-side
defragmentation is enabled as well (fragments would otherwise have a
different hash).

Now, the kernel does not allow choosing what we hash on; it seems to
be the 4-tuple.

I tested it with the lb and hash cluster types and everything worked.
The lb cluster type is useless (as is anything that is not "hash"), but
given how advanced the netsniff-ng software is, someone might find it
useful, and it's just a few more lines.

Patch is below (inline, for comments).


diff -uprN netsniff-ng/netsniff-ng.c netsniff-ng-multiprocess-clean/netsniff-ng.c
--- netsniff-ng/netsniff-ng.c	2015-04-11 18:48:19.861108673 +0200
+++ netsniff-ng-multiprocess-clean/netsniff-ng.c	2015-04-11 18:51:01.286156858 +0200
@@ -60,12 +60,13 @@ struct ctx {
 	bool randomize, promiscuous, enforce, jumbo, dump_bpf, hwtimestamp, verbose;
 	enum pcap_ops_groups pcap; enum dump_mode dump_mode;
 	uid_t uid; gid_t gid; uint32_t link_type, magic;
+	uint32_t cluster_id, cluster_type;
 };
 
 static volatile sig_atomic_t sigint = 0;
 static volatile bool next_dump = false;
 
-static const char *short_options = "d:i:o:rf:MNJt:S:k:n:b:HQmcsqXlvhF:RGAP:Vu:g:T:DBU";
+static const char *short_options = "d:i:o:rf:MNJt:S:k:n:b:HQmcsqXlvhF:RGAP:Vu:g:T:DBUC:K:";
 static const struct option long_options[] = {
 	{"dev", required_argument, NULL, 'd'},
 	{"in", required_argument, NULL, 'i'},
@@ -81,6 +82,8 @@ static const struct option long_options[
 	{"user", required_argument, NULL, 'u'},
 	{"group", required_argument, NULL, 'g'},
 	{"magic", required_argument, NULL, 'T'},
+	{"cluster-id", required_argument, NULL, 'C'},
+	{"cluster-type", required_argument, NULL, 'K'},
 	{"rand", no_argument, NULL, 'r'},
 	{"rfraw", no_argument, NULL, 'R'},
 	{"mmap", no_argument, NULL, 'm'},
@@ -376,7 +379,7 @@ static void receive_to_xmit(struct ctx *
 	bpf_dump_all(&bpf_ops);
 	bpf_attach_to_sock(rx_sock, &bpf_ops);
 
-	ring_rx_setup(&rx_ring, rx_sock, size_in, ifindex_in, &rx_poll, false, ctx->jumbo, ctx->verbose);
+	ring_rx_setup(&rx_ring, rx_sock, size_in, ifindex_in, &rx_poll, false, ctx->jumbo, ctx->verbose, ctx->cluster_id, ctx->cluster_type);
 	ring_tx_setup(&tx_ring, tx_sock, size_out, ifindex_out, ctx->jumbo, ctx->verbose);
 
 	dissector_init_all(ctx->print_mode);
@@ -932,7 +935,7 @@ static void recv_only_or_dump(struct ctx
 		printf("HW timestamping enabled\n");
 	}
 
-	ring_rx_setup(&rx_ring, sock, size, ifindex, &rx_poll, is_defined(HAVE_TPACKET3), true, ctx->verbose);
+	ring_rx_setup(&rx_ring, sock, size, ifindex, &rx_poll, is_defined(HAVE_TPACKET3), true, ctx->verbose, ctx->cluster_id, ctx->cluster_type);
 
 	dissector_init_all(ctx->print_mode);
 
@@ -1366,6 +1369,23 @@ int main(int argc, char **argv)
 		case 'h':
 			help();
 			break;
+		case 'C':
+			ctx.cluster_id = (uint32_t) strtoul(optarg, NULL, 0);
+			break;
+		case 'K':
+			if (!strncmp(optarg, "hash", strlen("hash")))
+				ctx.cluster_type = PACKET_FANOUT_HASH;
+			else if (!strncmp(optarg, "lb", strlen("lb")))
+				ctx.cluster_type = PACKET_FANOUT_LB;
+			else if (!strncmp(optarg, "cpu", strlen("cpu")))
+				ctx.cluster_type = PACKET_FANOUT_CPU;
+			else if (!strncmp(optarg, "rnd", strlen("rnd")))
+				ctx.cluster_type = PACKET_FANOUT_RND;
+			else if (!strncmp(optarg, "rollover", strlen("rollover")))
+				ctx.cluster_type = PACKET_FANOUT_ROLLOVER;
+/*			else if (!strncmp(optarg, "qm", strlen("qm")))
+				ctx.cluster_type = PACKET_FANOUT_QM; */
+			break;
 		case '?':
 			switch (optopt) {
 			case 'd':
diff -uprN netsniff-ng/ring_rx.c netsniff-ng-multiprocess-clean/ring_rx.c
--- netsniff-ng/ring_rx.c	2015-04-11 18:48:19.877111409 +0200
+++ netsniff-ng-multiprocess-clean/ring_rx.c	2015-04-11 18:50:50.661402061 +0200
@@ -209,9 +209,23 @@ static void alloc_rx_ring_frames(int soc
 		rx_ring_get_size(ring, v3));
 }
 
+void create_cluster(int sock, uint32_t cluster_id, uint32_t cluster_mode)
+{
+	uint32_t cluster_option = 0;
+	int ret = 0;
+
+	if (cluster_mode == PACKET_FANOUT_HASH)
+		cluster_mode |= PACKET_FANOUT_FLAG_DEFRAG;
+
+	cluster_option = (cluster_id | (cluster_mode << 16));
+	ret = setsockopt(sock, SOL_PACKET, PACKET_FANOUT, (void *)&cluster_option, sizeof(cluster_option));
+	if (ret < 0)
+		panic("Cannot set fanout ring mode!\n");
+}
+
 void ring_rx_setup(struct ring *ring, int sock, size_t size, int ifindex,
 		   struct pollfd *poll, bool v3, bool jumbo_support,
-		   bool verbose)
+		   bool verbose, uint32_t cluster_id, uint32_t cluster_type)
 {
 	fmemset(ring, 0, sizeof(*ring));
 	setup_rx_ring_layout(sock, ring, size, jumbo_support, v3);
@@ -220,6 +234,7 @@ void ring_rx_setup(struct ring *ring, in
 	alloc_rx_ring_frames(sock, ring);
 	bind_ring_generic(sock, ring, ifindex, false);
 	prepare_polling(sock, poll);
+	create_cluster(sock, cluster_id, cluster_type);
 }
 
 void sock_rx_net_stats(int sock, unsigned long seen)
diff -uprN netsniff-ng/ring_rx.h netsniff-ng-multiprocess-clean/ring_rx.h
--- netsniff-ng/ring_rx.h	2015-04-11 18:48:19.877111409 +0200
+++ netsniff-ng-multiprocess-clean/ring_rx.h	2015-04-11 18:50:50.661402061 +0200
@@ -13,7 +13,7 @@
 
 extern void ring_rx_setup(struct ring *ring, int sock, size_t size, int ifindex,
 			  struct pollfd *poll, bool v3, bool jumbo_support,
-			  bool verbose);
+			  bool verbose, uint32_t cluster_id, uint32_t cluster_type);
 extern void destroy_rx_ring(int sock, struct ring *ring);
 extern void sock_rx_net_stats(int sock, unsigned long seen);


On Fri, Apr 10, 2015 at 10:31 PM, Jon Schipp <jons...@gmail.com> wrote:
> Awesome, glad to see this coming up again :)
>
> On Wed, Apr 8, 2015 at 6:43 AM, Daniel Borkmann <bork...@iogearbox.net>
> wrote:
>>
>> On 04/07/2015 04:34 PM, Daniel Borkmann wrote:
>>>
>>> On 04/07/2015 04:28 PM, Michał Purzyński wrote:
>>>>
>>>> So the crash was caused by a real OOM condition, I was testing it in a
>>>> VM with 512MB RAM. Changed to 4GB and now I have two netsniff-ng
>>>> instances working and getting portion of the flows, hashed by the
>>>> kernel, per flow :-)
>>>>
>>>> I will work on the code and submit patches for review in a few days.
>>>
>>>
>>> Ok, great, looking forward!
>>
>>
>> I think you might also want to let the user choose which fanout
>> discipline to use in kernel side. There are several useful ones
>> available.
>>
>>

Tobias Klauser

Apr 13, 2015, 3:23:49 AM
to Michał Purzyński, Jon Schipp, netsn...@googlegroups.com
On 2015-04-11 at 18:59:08 +0200, Michał Purzyński <michalpu...@gmail.com> wrote:
> OK, try one. I'm ready to accept heavy artillery fire ;-) Man, it
> takes a while to find a free letter for getopt.
>
>
> Two new parameters were added:
> -C <cluster id> with integer that represents the socket fanout group
> identifier and must be shared between all processes in the group
> -K hash/lb/cpu/rnd - the type of fanout. The only really useful here
> is "hash" because it preserves flows. If it is choosen, kernel side
> defragmentation is enabled as well (fragments would have different
> hash).
>
> Now, kernel does not allow to choose what we are hashing on, and it
> seems to be 4-tuple.
>
> I tested it with lb and hash cluster types and everything worked. The
> lb cluster type is useless (as is anything that is not "hash" but
> given how advanced the nesniff-ng software is, someone might find it
> useful and it's just a few lines more.
>
> Patch is below (inline, for comments).

The patch looks line-wrapped and has tabs converted to spaces. Could you
please resend it without these changes? Usually this is due to the mail
agent line-wrapping even pasted text. I'd suggest using `git
send-email', which will generate a proper patch e-mail that can be
directly reviewed and applied. Care to retry?

(Therefore I have not reviewed the patch content-wise yet.)

Thanks
Tobias

Daniel Borkmann

Apr 13, 2015, 4:02:10 AM
to netsn...@googlegroups.com, Jon Schipp, Tobias Klauser, michalpu...@gmail.com
On 04/11/2015 06:59 PM, Michał Purzyński wrote:
> OK, try one. I'm ready to accept heavy artillery fire ;-) Man, it
> takes a while to find a free letter for getopt.
>
>
> Two new parameters were added:
> -C <cluster id> with integer that represents the socket fanout group
> identifier and must be shared between all processes in the group
> -K hash/lb/cpu/rnd - the type of fanout. The only really useful here
> is "hash" because it preserves flows. If it is choosen, kernel side
> defragmentation is enabled as well (fragments would have different
> hash).

I'd prefer if we name them "fanout-type" and "fanout-group" perhaps;
that way it's clear that we mean the packet socket fanout mechanism.
What do you think?

Other than that, there are a couple more fanout disciplines:

http://lingrok.org/xref/linux-net-next/net/packet/af_packet.c#1371

So it would be: hash/lb/cpu/rnd/qm/roll

I think we might need to describe the exact meaning of each of them in
the --help output. But I'm totally fine if we do that as a follow-up.

> Now, kernel does not allow to choose what we are hashing on, and it
> seems to be 4-tuple.

With QM (queue mapping), you can just use the NIC's HW queue flow
steering for the fanout group distribution.

> I tested it with lb and hash cluster types and everything worked. The

That's great.

> lb cluster type is useless (as is anything that is not "hash" but
> given how advanced the nesniff-ng software is, someone might find it
> useful and it's just a few lines more.

Sure, if we add it, we might also want to give the user a choice. I
think that hash/cpu/qm and roll (== moves to the next fanout socket
after the first one's queue is full) might be useful.

Michał Purzyński

Apr 14, 2015, 2:57:12 AM
to Tobias Klauser, Jon Schipp, netsn...@googlegroups.com
Hopefully done! I have learned a few useful skills in the process.

Result: 250 2.0.0 OK <something> - gsmtp

Michał Purzyński

Apr 14, 2015, 2:57:27 AM
to Daniel Borkmann, netsn...@googlegroups.com, Jon Schipp, Tobias Klauser
Thanks a lot for the review. I have corrected the naming, set up the
whole git send-email infrastructure and (I think) sent the email here.

ars...@gmail.com

Apr 30, 2015, 11:33:38 AM
to netsn...@googlegroups.com, jons...@gmail.com, tkla...@distanz.ch, bork...@iogearbox.net
Hi all,

I have been using netsniff-ng for some time now and am very excited about the packet fanout feature.

I have one question about the AF_PACKET hash fanout functionality, if somebody has time to comment:

how can I get 3 or more netsniff-ng instances in one fanout group to output into a single PCAP file?

So far, I have tried to start 3 instances with:

sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --mmap --ring-size 256MiB --bind-cpu 18 --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth5." --interval 60sec &
sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --mmap --ring-size 256MiB --bind-cpu 20 --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth5." --interval 60sec &
sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --mmap --ring-size 256MiB --bind-cpu 22 --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth5." --interval 60sec &

However, since the interval isn't exactly 60 seconds, after 1-2 days I end up with separate output files, like:

-rw-r--r-- 1 root root 135M Apr 30 14:44 /mnt/sdb1/netcapture/eth5.1430405040.pcap
-rw-r--r-- 1 root root 125M Apr 30 14:44 /mnt/sdb1/netcapture/eth5.1430405041.pcap
$ tcpslice /mnt/sdb1/netcapture/eth5.1430404980.pcap -t
/mnt/sdb1/netcapture/eth5.1430404980.pcap 2015y04m30d14h43m00s733651u 2015y04m30d14h44m00s742344u
$ tcpslice /mnt/sdb1/netcapture/eth5.1430404981.pcap -t
/mnt/sdb1/netcapture/eth5.1430404981.pcap 2015y04m30d14h43m01s118241u 2015y04m30d14h44m01s138441u

Am I doing something wrong in the way I start the instances? Is there a different way to start 3 instances so they write into a single output pcap file?

Also, I was wondering if there are any plans to add a command-line ability to start multiple instances, e.g. a single command line with --bind-cpu 18,20,22 and one --out file, which would spawn 3 netsniff-ng instances while the output goes into a single pcap file? (SolarCapture from SolarFlare uses that approach, with multiple capture cores and one writeout core.)

Let me know if you need more details.

Best Regards

Ivan

Daniel Borkmann

Apr 30, 2015, 11:42:49 AM
to ars...@gmail.com, netsn...@googlegroups.com, jons...@gmail.com, tkla...@distanz.ch
Hi Ivan,

On 04/30/2015 05:28 PM, ars...@gmail.com wrote:
> Hi all,
>
> I have been using netsniff-ng for some time now and am very excited about packet fanout feature.

Cool, great to hear! :)

> Have one AF_PACKET hash fanout functionality related question if somebody has time to comment :
>
> how can I get 3 or more netsniff-ng instances in one fanout-group output into 1 single PCAP file ?

Your command-line invocation below looks good to me. Letting all processes
write into one single pcap file at once, I'm afraid, won't work. There are
various reasons, i.e. it would corrupt the pcap file, as there is no
synchronization between the processes to write a single packet atomically
into the pcap.

You also wouldn't want to do that. ;) Because if such a possibility
existed, the write to disc on that single file would easily become the
bottleneck.

You rather want to have parallelism all the way down to the hardware in
the best case. If you need to merge files, there could f.e. be a
background process grabbing the individual pcap files and merging them
based on their timestamps into a single one, e.g. with mergecap:

https://www.wireshark.org/docs/wsug_html_chunked/AppToolsmergecap.html

Hope that helps,

Thanks,
Daniel

Vadim Kochan

Apr 30, 2015, 12:09:59 PM
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch

Hi,

What about netsniff-ng forking children, so that each child uses a
separate output file in a specified directory, and at the end, after all
children are done, the main netsniff-ng process merges these files into
one and removes the files generated by the children...

Just thoughts ...

Regards,
Vadim Kochan

Daniel Borkmann

Apr 30, 2015, 12:29:54 PM
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch
Hi Vadim,

On 04/30/2015 05:58 PM, Vadim Kochan wrote:
> On Thu, Apr 30, 2015 at 05:42:41PM +0200, Daniel Borkmann wrote:
...
>>> sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --mmap --ring-size 256MiB --bind-cpu 18 --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth5." --interval 60sec &
>>> sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --mmap --ring-size 256MiB --bind-cpu 20 --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth5." --interval 60sec &
>>> sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --mmap --ring-size 256MiB --bind-cpu 22 --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth5." --interval 60sec &
>>>
...
> What about if netsniff-ng can fork children so each children will use
> separate output file in specified directory and at the end after all
> children done then the main netsniff-ng will merge these files into one, and
> remove the files which were generated by children...
>
> Just thoughts ...

I am not opposed to this idea, I think it would be nice, (but!) only
if it's done right. What I'm a bit concerned about is the rising
complexity of such a command-line configuration.

That makes me wonder whether a flex/bison facility should perhaps be
designed for netsniff-ng, which could parse a config file, run in daemon
mode and spawn such sub-processes.

Maybe like: netsniff-ng < config

Michał Purzyński

Apr 30, 2015, 12:40:14 PM
to netsn...@googlegroups.com, ars...@gmail.com, Jon Schipp, Tobias Klauser
Having a separate write-only thread could even help performance if
done right. This will raise the complexity of the netsniff-ng code a
LOT, though.

We could go from here to multiple writer processes to make better use
of modern storage subsystems. In fact, here we come back to multiple
independent processes, because you might want two writer processes that
save packets to two independent RAID groups. This should be better
than one giant group.

And we're back to square one ;-)

So far, Daniel's idea seems the best to take, at least for a start.

ars...@gmail.com

Apr 30, 2015, 12:42:55 PM
to netsn...@googlegroups.com, jons...@gmail.com, ars...@gmail.com, tkla...@distanz.ch
Hi Daniel, Vadim,

thanks for your suggestions!

Regarding:

> > You also wouldn't want to do that. ;) Because assume if such a possibility would exist, then the bottleneck becomes easily the write to disc on that single file. You rather want to have parallelism all the way to the hardware in the best case. If you need to merge file, there could f.e. be a background process grabbing individual pcap files and merge them based on the time-stamps into a single one, e.g. mergecap:

I suggested that approach based on SolarCapture, which SolarFlare made available for their cards. They have one solar_capture process where you define multiple capture cores and one writeout core. Not sure if anyone has used SolarFlare SFN5122/7122 cards; they used to offer the free solar_capture tool for near-lossless capture. The write-to-disk bottleneck should be
reduced by using SSDs. We use SSDs and lots of RAM.

The ultimate scenario for lossless capture suggested to me by SolarFlare is something like: 6 Rx queues on the NIC, 6 capture cores and 1 writeout core for solar_capture, all on the same physical CPU. Not sure how they did it or why they suggested this approach, but their tool doesn't start multiple solar_capture processes; it shows one process using 300% CPU if you use, for example, 3 capture cores.

Regarding:

> What about if netsniff-ng can fork children so each children will use
> separate output file in specified directory and at the end after all
> children done then the main netsniff-ng will merge these files into one, and remove the files which were generated by children...

this would also be possible, but I am afraid mergecap might cause an I/O bottleneck, because we would need to constantly write new files while merging the files generated by the children. The 1-minute files are 1-2GB and we have to capture 24/7. That's why I was hoping netsniff-ng could put everything into one output file, to avoid post-processing of the captured data.

Regards

Ivan

ars...@gmail.com

Apr 30, 2015, 2:06:41 PM
to netsn...@googlegroups.com, tkla...@distanz.ch, jons...@gmail.com, ars...@gmail.com
Hi Michal,

sorry, I didn't refresh the page and didn't see your comment when responding.

> Regarding "In fact here we go back to multiple independent
> processes because you might want to have two writer processes that
> save packets to two independent RAID groups. This should be better
> than one giant group. "

The issue is that I already write the pcap output files onto the two SSDs I have available, so I don't have any other fast storage for the children's temporary pcap files. And storing them onto the same SSD, then having to read them back to generate the final file while netsniff-ng is already creating new capture files, would likely double the I/O on it.

Is there any way to output the files to RAM, so mergecap can run on them while they are in RAM and produce the final output? Maybe a RAM disk?
We have lots of RAM and can add more if needed.

I was told that files in Linux usually stay cached to the extent of the available RAM, so mergecap would "likely" be fast if the temp files were just generated, because their content would still be in RAM, but I am not sure how to confirm that.

Regards

Ivan

Michał Purzyński

Apr 30, 2015, 2:06:52 PM
to netsn...@googlegroups.com, Jon Schipp, ars...@gmail.com, Tobias Klauser
> I suggested that approach based on SolarCapture which was available by SolarFlare for their cards. They have 1 solar_capture process where you define multiple capture cores and one writeout core. Not sure if anyone used SolarFlare SFN5122/7122 cards. They used to offer free solar_capture tool for near loss-less capture. The issue of bottleneck on write to HD should be
> reduced by using SSD. We use SSD and lots of RAM.

Some can do it, some can't ;-)

I would have to buy 50TB of SSDs that would fail frequently. I keep it
now on a bunch of SATA disks and it works; several disks (2-3) die per
year.

>
> The ultimate scenario for lossless capture I was suggested by SolarFlare is something like : 6 Rx queues for NIC and 6 capture cores and 1 writeout core for solar_capture, all on same physical CPU. Not sure how they did it and why did they suggest this approach, but their tool doesn't start multiple solar_capture processes, but shows one process using 300% of CPU if you use 3 capture cores for example.


I can only guess at the "how", but I might know the "why".

1. 6+1 processes? Because we have 8-core CPUs.

2. Why a single CPU? To keep NUMA affinity and avoid QPI cross-talk.

3. Why a single CPU again? Because the L3 cache is shared between the
cores on modern CPUs and is inclusive, so all data from L1 and L2 is
also in L3 (guaranteed on Intel). So instead of going through RAM (which
is slow), one thread can capture packets and the other one, which writes
them to disk, takes the data from either the other core's L1/L2 (if
lucky) or L3 (all the time). This is fast.
Also, they seem to use threads, not processes. That does not matter much
on Linux (performance-wise) but might make sharing data easier.

Now I would add DCA to the mix; I wonder if they use it.

Also they might use some kind of direct NIC access instead of pushing
packets through the kernel like we do.

Daniel Borkmann

Apr 30, 2015, 2:12:03 PM
to netsn...@googlegroups.com, tkla...@distanz.ch, jons...@gmail.com, ars...@gmail.com
On 04/30/2015 08:03 PM, ars...@gmail.com wrote:
...
> Is there any way to output files to RAM so mergecap can run on files while in RAM when it produces final output ? Maybe to RAM Disk ?
> We have lots of RAM and can add more if needed.

Yes, you should be able to mount tmpfs and the like to store the
pcaps in RAM.

ars...@gmail.com

Apr 30, 2015, 2:20:12 PM
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch
Hi Michal,

agree on 1-3. The number varies depending on how many cores are available.
DCA is also used. And they use kernel bypass too :)
But they no longer offer the free solar_capture tool either :(

I was wondering about Vadim's suggestion:

"What about if netsniff-ng can fork children so each children will use
separate output file in specified directory and at the end after all
children done then the main netsniff-ng will merge these files into one, and
remove the files which were generated by children... "

Would this be possible, but keeping the files in RAM rather than in a directory before the main netsniff-ng process merges them into the single final file?

An alternative could be creating a RAM disk for the temp files myself. I guess that should work too, although it adds the complexity of separate scripts I would have to run to merge the files into one main 1-minute capture file.

Regards

Ivan

ars...@gmail.com

Apr 30, 2015, 2:41:32 PM
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch
Hi Daniel,

I will likely try separate output files and mergecap for a start, without a specific RAM disk, because according to my search: "ramfs: This memory is generally used by Linux to cache recently accessed files so that the next time they are requested they can be fetched from RAM very quickly."

Considering I create 1-minute files and the machine doesn't do anything else but capture, I would assume the separate files will likely still be in RAM when I try to merge them into one file.

Vadim's approach would still be great, though, as it would all be one simple command line and one task, without separate merging.

Regards

Ivan

ars...@gmail.com

Apr 30, 2015, 2:56:17 PM
to netsn...@googlegroups.com, ars...@gmail.com, tkla...@distanz.ch, jons...@gmail.com
While working with separate files and a different prefix for each instance, pursuing the mergecap approach, I notice that only 1 file (from the last-prefixed instance) gets written.

I see 3 netsniff-ng processes when I start them, in top as well as in ps aux:

sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 18 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth50." --interval 60sec &
sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 20 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth51." --interval 60sec &
sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 22 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth52." --interval 60sec &

$ ps aux | grep netsniff-ng
root 2286 0.0 0.0 43312 1672 pts/0 S 18:47 0:00 sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 18 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix eth50. --interval 60sec
root 2287 0.0 0.0 43312 1676 pts/0 S 18:47 0:00 sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 20 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix eth51. --interval 60sec
root 2288 0.0 0.0 43312 1672 pts/0 S 18:47 0:00 sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 22 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix eth52. --interval 60sec
root 2289 2.9 0.8 226492 210060 pts/0 SL 18:47 0:07 /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 18 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix eth50. --interval 60sec
root 2290 3.3 0.8 226492 210060 pts/0 SL 18:47 0:08 /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 20 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix eth51. --interval 60sec
root 2291 2.8 0.8 226492 210060 pts/0 SL 18:47 0:06 /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 22 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix eth52. --interval 60sec

But I see only eth52.&lt;epoch&gt;.pcap files being written:

-rw-r--r-- 1 root root 251M Apr 30 18:49 /mnt/sdb1/netcapture/eth52.1430419680.pcap
-rw-r--r-- 1 root root 272M Apr 30 18:50 /mnt/sdb1/netcapture/eth52.1430419740.pcap
-rw-r--r-- 1 root root 245M Apr 30 18:51 /mnt/sdb1/netcapture/eth52.1430419800.pcap
-rw-r--r-- 1 root root 214M Apr 30 18:52 /mnt/sdb1/netcapture/eth52.1430419860.pcap
-rw-r--r-- 1 root root 57M Apr 30 18:52 /mnt/sdb1/netcapture/eth52.1430419920.pcap

Am I doing something wrong?

Regards

Ivan

Daniel Borkmann

unread,
May 1, 2015, 7:06:15 AM5/1/15
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch, michalpu...@gmail.com
On 04/30/2015 08:19 PM, ars...@gmail.com wrote:
...
> Would this be possible but by keeping files in RAM rather than in directory before main netsniff-ng would merge them into single final file ?

If you mean "files" in the sense of files that should be open(2)'ed etc, then
you need an underlying file system which does management for you. And that can
be something like tmpfs et al which is entirely in ram.

If you mean, to accumulate data by all instances in a huge, shared buffer,
then I'm not yet seeing it how it should scale. You will certainly need in
addition atomic operations to make sure buffer slots will not get corrupted,
and we should avoid such contention, plus you have to get that to disc at
some point.

Let's say each process would have its own buffer it operates on for each
single core, and there's one core dedicated to collecting from all buffers and
writing them to disc. That would require substantial changes in how netsniff-ng
operates, but currently, you could spread that load to various discs and not
a single one, arguably with the mentioned drawback that you need to merge
pcaps if you prefer one big one. But even for that you could dedicate a
single core and get a cronjob running that merges them and moves the data
out of ram. I think that the interval such a merged pcap covers would
usually not be that huge anyway, so it stays manageable.

> Alternative could be me creating RAM disk for temp files. I guess that should work too although it adds complexity of separate scripts I
> would have to run to merge files into one main 1-minute capture file.

Yep, that would require a bit of scripting, but it shouldn't be difficult.
Why not implement a generic script that we could merge upstream as an
example use case? That way, it would give a good starting point for you and others? ;)

Daniel Borkmann

unread,
May 1, 2015, 7:19:28 AM5/1/15
to netsn...@googlegroups.com, ars...@gmail.com, tkla...@distanz.ch, jons...@gmail.com
Hi Ivan,

On 04/30/2015 08:55 PM, ars...@gmail.com wrote:
> While trying to work with separate files and a changed prefix for each instance, pursuing the mergecap approach, I notice that only 1 file (the instance with the last prefix name) gets written.
>
> I see 3 netsniff-ng processes when I start it in top as well as in ps aux :
>
> sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 18 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth50." --interval 60sec &
> sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 20 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth51." --interval 60sec &
> sudo nohup /usr/local/sbin/netsniff-ng --fanout-group 1 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 22 --notouch-irq --silent --in eth5 --out /mnt/sdb1/netcapture/ --prefix "eth52." --interval 60sec &

Interesting; let's say you use --fanout-type lb, would that make a difference?

This should round robin each packet between these 3 processes, so you should
definitely see something unless we have a bug. ;) Let me know, so we can look
further ...

Thanks!
Daniel

ars...@gmail.com

unread,
May 1, 2015, 8:55:47 AM5/1/15
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch
Hi Daniel,

> Interesting, lets say if you use --fanout-type lb, would that make a difference?
>
> This should round robin each packet between these 3 processes, so you should
> definitely see something unless we have a bug. ;) Let me know, so we can look
> further ...

no change if using --fanout-type lb. I see all 3 instances run and show similar CPU % in top, but only the last file seems to be on my SSD.
BTW, --fanout-type qm returns "Cannot set fanout ring mode!", that's why I switched to type cpu. (Considering I use an IRQ affinity script to lock Rx queues to certain CPUs and I also lock netsniff-ng, I guess it doesn't matter whether I use cpu or qm.)

Let me know if you need more details about my system or if there is anything else I can help with to troubleshoot.

Regarding the merge script, the one I use isn't very generic and it has one flaw :)

Since I generate 1-minute files and try to make the pcap files start on 00.000, I used the "HiRes time under GNU-Linux" script to wait until the exact beginning of the minute before starting netsniff-ng.

After that, I just run a cronjob every minute to merge files from the previous minute using something like this:

# epoch of the previous minute, aligned down to :00
t2=$(date +%s --date="1 minute ago")
curminute=$(( t2 - (t2 % 60) ))
t2=$curminute

file1=/mnt/sdc1/netcapture/eth6.$t2.pcap

Obviously, in order to make it universal, we would need to know how many instances of netsniff-ng run, and some people might record into a rotating set of files, some record to the extent of available space and delete a percentage of the oldest files. Not sure how to tackle those subjects, but I will try to come up with something.

Also, the issue with my approach is that files with --interval 60 drift a little bit and the next file doesn't really start at 00.000 of the next minute, so after a certain time my formula doesn't work, because file eth6.1430482620.pcap will become 1430483341.pcap (on solar_capture, somehow it stays within 1 ms for months running 24/7).

Maybe I have to use incron IN_MOVED_TO to trigger tasks rather than predicting file names.
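An event-driven trigger along those lines could look roughly like this (epoch_of is a hypothetical helper; inotifywait comes from the inotify-tools package, and whether netsniff-ng's rotation actually produces a moved_to rather than a close_write event would need checking):

```shell
# Pull the epoch timestamp out of a finished capture filename,
# e.g. /mnt/sdc1/netcapture/eth6.1430482620.pcap -> 1430482620
epoch_of() {
    basename "$1" .pcap | awk -F. '{ print $NF }'
}

# Illustrative watch loop (commented out; needs inotify-tools):
# inotifywait -m -e close_write -e moved_to --format '%w%f' /mnt/sdc1/netcapture |
# while read -r f; do
#     slot=$(epoch_of "$f")
#     # once every instance has produced a file for $slot, run mergecap on it
# done
epoch_of /mnt/sdc1/netcapture/eth6.1430482620.pcap   # -> 1430482620
```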

Regards

Ivan

Vadim Kochan

unread,
May 1, 2015, 12:50:04 PM5/1/15
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch


Hi,

As I understand it, the main issue is that you want to sniff constantly
into files and then at some point glue them together into a
single one?

If that is correct, I was thinking about making netsniff-ng able
to output to another subdir after some 'time' or 'capture size' criterion
is reached. For example, you specify to netsniff-ng some output dir
'pcap_eth0' where it creates a subdir named by timestamp (for example) and
each instance of netsniff-ng starts capturing; after some capture size
or time interval is reached, netsniff-ng creates another subdir and
switches there. Then you can probably collect the captured files
from these subdirectories ... well, I hope my explanation is basically
clear ... sorry if you did not understand my poor English :)

Regards,
Vadim Kochan

ars...@gmail.com

unread,
May 1, 2015, 1:09:46 PM5/1/15
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch
Hi Vadim,

your understanding is correct.

I need to capture all the time into 1-minute files and need to merge the files from the previous minute (once recording completes) into one 1-minute file.

Your approach using sub-directories named by epoch is interesting, since it would make it easier for mergecap to just merge all files in that sub-directory, knowing they all belong to the same minute of interest.

I will try it that way, but currently I can't output into 3 different files, as I pointed out to Daniel earlier. Only the last instance I start writes a file when I start 3 of them.

Regards

Ivan

Daniel Borkmann

unread,
May 1, 2015, 1:11:50 PM5/1/15
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch
On 05/01/2015 07:07 PM, ars...@gmail.com wrote:
...
> I will try it that way, but currently I can't output into 3 different files as I pointed out to Daniel earlier. Only last instance I start writes file when I start 3 of them.

Yes, I'll try looking into that over the weekend.

Thanks,
Daniel

Daniel Borkmann

unread,
May 6, 2015, 5:52:53 PM5/6/15
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch
Sorry for the late answer.

On 05/01/2015 02:53 PM, ars...@gmail.com wrote:
...
>> Interesting, lets say if you use --fanout-type lb, would that make a difference?
>>
>> This should round robin each packet between these 3 processes, so you should
>> definitely see something unless we have a bug. ;) Let me know, so we can look
>> further ...
>
> no change if using --fanout-type lb. I see all 3 instances run and show similar CPU % in top, but only last file seems to be on my SSD.
> BTW, --fanout-type qm returns "$ Cannot set fanout ring mode!", that's why I switched to type cpu. ( considering I use irq affinity script to lock Rx queues to certain CPUs and also I lock netsniff-ng, I guess it doesn't matter which one I use cpu or qm )

The qm won't work in your case since your kernel is too old.

So I was just using the following:

netsniff-ng --fanout-group 1 --fanout-type rnd --fanout-opts defrag --ring-size 128MiB --bind-cpu 0 --notouch-irq --silent \
--in lo --out /tmp/netcapture/ --prefix "a." --interval 60sec &

netsniff-ng --fanout-group 1 --fanout-type rnd --fanout-opts defrag --ring-size 128MiB --bind-cpu 1 --notouch-irq --silent \
--in lo --out /tmp/netcapture/ --prefix "b." --interval 60sec &

netsniff-ng --fanout-group 1 --fanout-type rnd --fanout-opts defrag --ring-size 128MiB --bind-cpu 2 --notouch-irq --silent \
--in lo --out /tmp/netcapture/ --prefix "c." --interval 60sec &

And doing a ping 127.0.0.1, gives me pcap files in /tmp/netcapture/ with all three {a,b,c}.
prefixes; plus after 60sec they start a new one.

I've also tried two instances dumping to two different files with various fanout-types,
they seem to work as expected. That means that fanout in general seems to function.

The only issue (as I don't know what traffic you're seeing) could be that your rxhash
always falls into the fanout member with that prefix, which would be strange.

Does the above example work for you on loopback? If you have two instances in the same
group outputting to a normal pcap file with rnd, do both files get written?

Daniel Borkmann

unread,
May 6, 2015, 6:02:23 PM5/6/15
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch
Also, fanout-type hash on a physical interface looks correct in my case, e.g.:

netsniff-ng --fanout-group 2 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 0 --notouch-irq --silent \
--in wlp2s0b1 --out /tmp/netcapture/ --prefix "a." --interval 60sec &

netsniff-ng --fanout-group 2 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 1 --notouch-irq --silent \
--in wlp2s0b1 --out /tmp/netcapture/ --prefix "b." --interval 60sec &

netsniff-ng --fanout-group 2 --fanout-type hash --fanout-opts defrag --ring-size 128MiB --bind-cpu 2 --notouch-irq --silent \
--in wlp2s0b1 --out /tmp/netcapture/ --prefix "c." --interval 60sec &

E.g. if I'm running ping 8.8.8.8, then that /only/ lands with some other stuff in c's pcap,
which is as expected.

Cheers,
Daniel

Michał Purzyński

unread,
May 6, 2015, 6:15:36 PM5/6/15
to netsn...@googlegroups.com, ars...@gmail.com, Jon Schipp, Tobias Klauser
Those were my results as well.

It would be worthwhile to investigate how the hash is generated in the
kernel. I remember that somewhere in the original patch that introduced
this functionality, the developers decided to let the hardware generate the
rxhash and pass it to user space to avoid an L1 cache miss.

As we all know, 82599 hashing is not symmetric. But in my testing, both
a->b and b->a traffic goes correctly to the same cluster. Maybe the
hardware hash is not used at all?

I will dig deeper.

Daniel Borkmann

unread,
May 6, 2015, 6:32:48 PM5/6/15
to netsn...@googlegroups.com, ars...@gmail.com, Jon Schipp, Tobias Klauser, michalpu...@gmail.com
On 05/07/2015 12:14 AM, Michał Purzyński wrote:
> That were my results as well.
>
> It would be worthwhile to investigate how the hash is generated in the
> kernel. I remember, somewhere in the original patch that introduced
> this functionality, developers decided to let the hardware generate
> rxhash and pass it to user space to avoid L1 cache miss.
>
> As we all know, 82599 hashing is no symmetric. But in my testing both
> a->b and b->a traffic goes correctly to the same cluster. Maybe the
> hardware hash is not used at all?

Depends, for example, in ixgbe which I have a couple of, you can see in
the driver that in ixgbe_process_skb_fields() the rxhash can be copied
from hw ring descriptor into skb_set_hash() [1] (only l3). Quite a number
of drivers support offload for that.

At the latest, if your nic doesn't support an l4 hash, then in pf_packet when
doing the fanout, skb_get_hash() [2] gets called to build one with the
kernel's flow dissector in sw; l4 in order to get a bit more entropy.

That's why queue mapping (qm) can be less costly in some cases, for example.

In any case, it also doesn't really matter whether 'hash' is used; even 'rr' or
'rnd' should populate the other pcaps, so I find it strange that this apparently
was not the case for Ivan. It works from my side. It would be good if you could
clarify the questions from the other mail, Ivan.

> I will dig deeper.

[1] http://lingrok.org/xref/linux-net-next/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c#1593
[2] http://lingrok.org/xref/linux-net-next/net/packet/af_packet.c#1285
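For reference, the option word handed to setsockopt(PACKET_FANOUT) packs the group id into the low 16 bits and the fanout type OR'ed with flags into the high 16 bits. A sketch of that packing (constant values copied from linux/if_packet.h; this is illustrative arithmetic, not netsniff-ng's actual code):

```shell
# Constants as defined in linux/if_packet.h
PACKET_FANOUT_HASH=0
PACKET_FANOUT_LB=1
PACKET_FANOUT_CPU=2
PACKET_FANOUT_FLAG_DEFRAG=$(( 0x8000 ))

# setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg) expects:
#   arg = group_id | ((type | flags) << 16)
fanout_arg() {              # $1 = group id, $2 = type|flags
    printf '0x%08x\n' $(( ($1 & 0xffff) | (($2 & 0xffff) << 16) ))
}

# e.g. --fanout-group 1 --fanout-type hash --fanout-opts defrag:
fanout_arg 1 $(( PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG ))   # -> 0x80000001
```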

ars...@gmail.com

unread,
May 7, 2015, 2:53:51 AM5/7/15
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, tkla...@distanz.ch, michalpu...@gmail.com
Hi all,

> The qm won't work in your case since your kernel is too old

$ uname -a
Linux 3.2.0-74-generic #109-Ubuntu SMP Tue Dec 9 16:45:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Do I need newer kernel ? 3.2.0-82-generic is available.

> The only issue (as I don't know what traffic you're seeing) could be that your rxhash always falls into fanout member with that prefix, strange.

I would need to run a more isolated test because I capture gigabytes of data, but I can tell by the size of the capture files that I don't get all data, since my 1-minute files are a lot smaller than the single file I get when capturing w/o the fanout feature.

> Does the above example work for you on loopback? If you have two instances in the same group outputting to a normal pcap file with rnd, do both files get written?

For rnd and qm, I get "Cannot set fanout ring mode!" when trying to use lo or the physical NIC eth5 in my case.
Hash mode results in only the last instance's file being written. Tried lo and eth5.
Also tried groups 1 and 2 just in case.

Let me know if I can provide any additional details.

Regards

Ivan

Michał Purzyński

unread,
May 7, 2015, 2:54:06 AM5/7/15
to ars...@gmail.com, Jon Schipp, Tobias Klauser, netsn...@googlegroups.com

I used 3.13 for testing. It's available in Ubuntu as the HWE stack.

Daniel Borkmann

unread,
May 7, 2015, 3:11:54 AM5/7/15
to netsn...@googlegroups.com, ars...@gmail.com, Jon Schipp, Tobias Klauser, michalpu...@gmail.com
On 05/07/2015 02:32 AM, Michał Purzyński wrote:
> I used 3.13 for testing. It's in Ubuntu as HWE stack.

So, the following commit was added in v3.0-rc4-846-gdc99f60 ...

commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
Author: David S. Miller <da...@davemloft.net>
Date: Tue Jul 5 01:45:05 2011 -0700

packet: Add fanout support.

... but it would be great if you have the chance to try something more recent,
as Michal pointed out.

> On May 7, 2015 2:29 AM, <ars...@gmail.com> wrote:
>
>> Hi all,
>>
>>> The qm won't work in your case since your kernel is too old
>>
>> $ uname -a
>> Linux 3.2.0-74-generic #109-Ubuntu SMP Tue Dec 9 16:45:49 UTC 2014 x86_64
>> x86_64 x86_64 GNU/Linux
>>
>> Do I need newer kernel ? 3.2.0-82-generic is available.
>>
>>> The only issue (as I don't know what traffic you're seeing) could be that
>>> your rxhash always falls into fanout member with that prefix, strange.
>>
>> I would need to run more isolated test because I capture gigabytes of
>> data, but I can tell by size of capture files that I don't get all data
>> since my 1 minute files are lot smaller than single file I get when
>> capturing w/o fanout feature.
>>
>>> Does the above example work for you on loopback? If you have two instances
>>> in the same group outputting to a normal pcap file with rnd, do both
>>> files get written?
>>
>> For rnd and qm, I get "Cannot set fanout ring mode!" when trying to use lo
>> or physical NIC eth5 in my case.

Ok, I see. Since your kernel doesn't support that.

>> For hash mode results in only last instance file being written. Tried lo
>> and eth5.

And that does also happen for rr/lb mode (round robin), right? Even if you
remove the --out and --silent, etc?

One terminal: netsniff-ng --fanout-group 1 --fanout-type rr --in lo
Another: netsniff-ng --fanout-group 1 --fanout-type rr --in lo

And then ping 127.0.0.1 ? In any case, I'd recommend trying out a newer
kernel (yours is roughly 4 years old).

Cheers,
Daniel

ars...@gmail.com

unread,
May 7, 2015, 11:00:02 AM5/7/15
to netsn...@googlegroups.com, michalpu...@gmail.com, tkla...@distanz.ch, jons...@gmail.com, ars...@gmail.com
Hi all,

I am very sorry, but it looks like this was working all the time, at least in a couple of modes; I messed up my ls command and didn't see the other files!
I used "ls eth5*" rather than "ls eth5-0*", "ls eth5-1*", and there were so many files in the folder that I didn't see the other instances when sorted by name!

This works just fine with this older kernel, other than the qm and rnd modes, which I don't really need that much. Since I lock Rx queues to CPUs, I could use the cpu mode. Mode "lb" was also producing very even file sizes and evenly distributed CPU utilization. 3 instances for example:

-rw-r--r-- 1 root root 484M May 7 14:35 /mnt/sdb1/netcapture/eth5-0.1431009240.pcap
-rw-r--r-- 1 root root 431M May 7 14:36 /mnt/sdb1/netcapture/eth5-0.1431009300.pcap

-rw-r--r-- 1 root root 485M May 7 14:35 /mnt/sdb1/netcapture/eth5-1.1431009240.pcap
-rw-r--r-- 1 root root 431M May 7 14:36 /mnt/sdb1/netcapture/eth5-1.1431009300.pcap

-rw-r--r-- 1 root root 487M May 7 14:35 /mnt/sdb1/netcapture/eth5-2.1431009240.pcap
-rw-r--r-- 1 root root 430M May 7 14:36 /mnt/sdb1/netcapture/eth5-2.1431009300.pcap

For a start, I will use that approach of processing sub-folders based on names via mergecap, counting on the Linux page cache still holding the most recent 1-minute files for fast processing.

I will keep an eye out in case someone introduces that method of eventually directing multiple netsniff-ng child instances into one parent which would write into a single file. Considering fast I/O and SSDs used, that might still work ok for some users.

Regards

Ivan

Daniel Borkmann

unread,
May 7, 2015, 11:02:28 AM5/7/15
to netsn...@googlegroups.com, michalpu...@gmail.com, tkla...@distanz.ch, jons...@gmail.com, ars...@gmail.com
On 05/07/2015 04:47 PM, ars...@gmail.com wrote:
...
> I am very sorry but it looks like this was working all the time at least in couple of modes but I messed up ls command and didn't see other files !

Ok, no problem. It's good that we now have it verified from a couple of people
that it works fine. :))

Cheers & thanks,
Daniel

ars...@gmail.com

unread,
May 7, 2015, 12:23:40 PM5/7/15
to netsn...@googlegroups.com, ars...@gmail.com, tkla...@distanz.ch, jons...@gmail.com, michalpu...@gmail.com
Daniel,

is there any difference or advantage between lb/hash vs cpu mode?

If I use affinity to lock, let's say, my 4 Rx queues to CPUs 2,4,6,8, would the most efficient way of running netsniff-ng be to run fanout-type cpu and bind 4 instances to CPUs 10,12,14,16 (assuming these 8 cores are on 1 physical CPU), or does it not really matter?

I didn't see much difference between lb and hash mode when it comes to output file sizes and CPU utilization, but let me know if one has an advantage over the other in some form or fashion.

Regards

Ivan

Daniel Borkmann

unread,
May 7, 2015, 12:40:02 PM5/7/15
to netsn...@googlegroups.com, ars...@gmail.com, tkla...@distanz.ch, jons...@gmail.com, michalpu...@gmail.com
On 05/07/2015 06:22 PM, ars...@gmail.com wrote:
...
> I didn't see much difference in lb vs hash mode when it comes to output file sizes and CPU utilization,
> but let me know if one has advantage over the other in some form or fashion.

It probably depends on what you want. ;) So among the fanout demuxing
disciplines (lb vs hash vs cpu) themselves, cpu seems to be the most
lightweight (I haven't measured them yet), i.e. it's just cpu % fanout-group-size.
With lb, you obviously need synchronization with regard to demuxing
to a group member (that is, 2 atomic reads, 1 atomic cmpxchg), and
with fanout hash, in the worst case you have to go into the kernel flow dissector
to pick up flow keys to generate an l4 hash to demux over. If you can
distribute the load via RSS and then use cpu, or on newer kernels, qm,
that's probably most lightweight. hash, or a combination of RSS
and cpu/qm, I find useful since it allows flows to stay pcap-local,
unlike lb.
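A rough illustration of the cpu discipline and the RSS pinning it pairs with (the /proc/irq path is the standard kernel interface; the IRQ numbers and CPU ids are made-up examples):

```shell
# With --fanout-type cpu, the kernel picks the group member as
# (cpu that received the packet) % (number of sockets in the group).
fanout_member() {           # $1 = receiving cpu id, $2 = group size
    echo $(( $1 % $2 ))
}

# Bitmask for pinning an Rx queue's IRQ to a single CPU via smp_affinity.
cpu_mask() {                # $1 = cpu id
    printf '%x\n' $(( 1 << $1 ))
}

# Illustrative pinning (hypothetical IRQ numbers; needs root):
# echo "$(cpu_mask 2)" > /proc/irq/120/smp_affinity   # rx-0 -> CPU 2
# echo "$(cpu_mask 4)" > /proc/irq/121/smp_affinity   # rx-1 -> CPU 4
```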

Cheers,
Daniel

ars...@gmail.com

unread,
May 7, 2015, 3:41:59 PM5/7/15
to netsn...@googlegroups.com, ars...@gmail.com, jons...@gmail.com, michalpu...@gmail.com, tkla...@distanz.ch
Hi Daniel,

sounds good.

I will try distributing the load via RSS with fanout cpu, and on some machines which for some reason keep hitting just one IRQ (I have a Myricom 10G card configured with myri10ge_max_slices=8, but Rx keeps hitting just one IRQ all the time), I will try to go for hash rather than lb.

Regards

Ivan