I manage a large lab environment that currently has no NSM. Just to give an idea of what I'm up against: I'm currently TAPing (and failing to keep up with) one of two border switches, and during the day that TAP is a fully saturated 10G link. Eventually I'll need to monitor the second border switch as well, for a total of 20Gbps.
Hardware limitations:
- 2x Intel E5-2620 (2GHz, 6 cores each; 24 logical cores total with HT)
- 192GB of RAM
I can use several of these servers if needed, but no single server can take more than 192GB of RAM. I know this hardware is lighter than what's recommended for a 10G link, so I'll need recommendations on what to cut, most likely rules.
Q1- Will cutting down on rules allow me to monitor a 10G link with 192GB of RAM?
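(If it helps frame the question: I assume the rule trimming would happen through PulledPork's disablesid.conf, something along these lines, but please correct me if that's the wrong place.)

# /etc/nsm/pulledpork/disablesid.conf (path from memory; may differ by version)
# disable a single rule by GID:SID
1:2100498
# disable every rule whose text matches a regex
pcre:games|p2p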
I don't have experience with distributed server/sensor deployments. Having read the wiki, it's not clear to me how storage should be allocated if I use one "server" and two "sensors" (presuming one 192GB node can handle one 10G TAP).
When I initially deployed SO on one of the above servers as a standalone node, I had used at least 5% of a 2TB volume within 30 minutes. Looking at the NIC stats now, I can see the server can't keep up at all:
Packets: 8,857,581,801
Dropped: 56,371,058
...and hundreds of thousands of bad "SURICATA STREAM" events.
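(Those numbers are from the interface statistics; for anyone wanting to pull the equivalent counters on their own sensor, something like the following shows the kernel totals plus the NIC/driver-level counters:)

ifconfig ethX
ethtool -S ethX | grep -iE 'drop|miss|error'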
Q2- Should a "server" have the most storage space, or should the "sensors"?
Each node could have up to 14TB of space, but I presume the sensors don't need that much.
One of my biggest problems is that I don't grasp how all of the SO tools work with each other. I do not know how to troubleshoot my issues or where the bottlenecks are.
Q3- Is there a high-level overview of how Suricata works with Bro, or Squert with MySQL, etc.?
Q4- What tuning would need to be done above and beyond the default SO .iso advanced setup configuration?
Q5- How would you monitor two 10G TAPs with my limitations?
I appreciate any and all feedback. I manage a small SO standalone server for another company that I volunteer for, but that's only a 500Mbps TAP, which the standard SO configuration handles fine. This deployment is going to teach me a lot, and I hope to contribute back to the community when possible.
Cheers
I've also created an /nsm XFS volume which is about 11TB.
NOTE: the directions here are slightly wrong: https://code.google.com/p/security-onion/wiki/NewDisk
The command is "umount", not "unmount".
With the Ubuntu 12.04 SO .iso, this line was added to /etc/fstab:
UUID=234524-3452-45674-3456-456732455674567 /nsm xfs rw,user,auto 0 1
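For anyone else following that wiki page, the sequence I ended up with was roughly the following (device name is a placeholder, and the existing /nsm data has to be stopped and copied over first, per the wiki):

sudo mkfs.xfs -f /dev/sdXY        # format the new partition as XFS
sudo blkid /dev/sdXY              # grab the UUID for the /etc/fstab entry above
sudo mkdir -p /nsm
sudo mount /nsm                   # and later it's "umount /nsm", not "unmount", to detach it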
I configured the sensor to use 20 cores for Suricata and 20 for Bro. The documentation does not specify whether hyper-threaded (logical) cores count as cores.
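For reference (and so someone can tell me if I changed the wrong knobs), the core counts were set in roughly these two places; the paths and variable names are from memory, so they may differ by SO version:

# /etc/nsm/<sensor>/sensor.conf -- number of load-balanced Suricata processes
IDS_LB_PROCS=20

# /opt/bro/etc/node.cfg -- Bro workers load-balanced across the same interface
[worker-1]
type=worker
host=localhost
interface=ethX
lb_method=pf_ring
lb_procs=20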
I'm still losing a lot of packets. I've attached a preliminary, redacted sostat output; something still can't keep up.
admin@securityonion:~$ ethtool -a ethX
Pause parameters for ethX:
Autonegotiate: off
RX: on
TX: on
This works great:
sudo ethtool -A ethX tx off rx off autoneg off
But it is not a permanent solution (the setting does not survive a reboot).
Neither of the approaches on this page works for me:
https://help.ubuntu.com/community/UbuntuLTSP/FlowControl
"Disabling flow control with module parameters"
"Disabling flow control with ethtool"
Just to comment: changing the MTU from 1500 to 9000 also did not affect the packet drops. The problem definitely appears to be flow control.
Modifying this line in /etc/network/interfaces doesn't appear to work either:
post-up ethtool -G $IFACE rx 4096; for i in rx tx sg tso ufo gso gro lro; do ethtool -K $IFACE $i off; do ethtool -A $IFACE tx off rx off; done
We're simply mirroring a single 10G upstream port on an Arista 7050 switch.
> Looks like you have an extra "do" in there (there should only be one
> to start the for-loop). Try something like this:
>
> post-up ethtool -G $IFACE rx 4096; for i in rx tx sg tso ufo gso gro
> lro; do ethtool -K $IFACE $i off; ethtool -A $IFACE tx off rx off;
> done
>
> > We're simply mirroring a single 10G upstream port on an Arista 7050 switch.
>
> What kind of NIC in your sensor?
Your modification works great (removing the second "do"), thank you Doug.
$ ethtool -a ethX
Pause parameters for ethX:
Autonegotiate: off
RX: off
TX: off
We're using an Intel X520-SR2 NIC, Intel SFP+ optics, and new OM4 fiber.
Some dmesg output:
ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 3.15.1-k
Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63
PCI Express bandwidth of 32GT/s available
Speed:5.0GT/s, Width: x8, Encoding Loss:20%
NIC Link is Up 10 Gbps, Flow Control: None
I see that the driver version is older; 3.22.3 (http://downloadmirror.intel.com/14687/eng/ixgbe-3.22.3.tar.gz) is the newest, but I don't know how to update it in Linux or whether it's even needed.
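From the README in Intel's tarball it looks like the update would be roughly the following (untested on my end, so treat it as a sketch; reloading the module drops the sniffing interface briefly):

tar zxf ixgbe-3.22.3.tar.gz
cd ixgbe-3.22.3/src
sudo make install                        # build and install the module for the running kernel
sudo rmmod ixgbe && sudo modprobe ixgbe  # reload the driver
ethtool -i ethX                          # confirm the running driver version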