Andreas Vöst
Mar 1, 2019, 5:22:36 AM
to trex...@googlegroups.com
As network engineers we are involved in performance tests on a regular
basis. Sizing decisions, preparation of software rollouts on network
devices, or the installation of new IPS signatures on IPS systems or
“next generation firewalls” (NGFWs) are examples that require
performance testing.
A couple of questions arise when dealing with performance tests: How
much time do you have at hand? What is your budget? Do you simply accept
the datasheet performance figures? How important is it to get exact
performance measures? Are you required to perform these tests regularly?
Let’s get started by describing the situation where we came in touch
with TRex for the first time and by stating the (performance) questions
we intended to answer using TRex. Thereafter, we will discuss our findings.
As network consultants we supported a customer in their NGFW tender/proof
of concept (PoC). As usual, we came across performance questions which
directly impacted sizing decisions. Below are typical questions related
to NGFW designs or designs involving other network devices:
- What is the realistic throughput we can expect with a traffic mix
typical for the customer? Note: We used NetFlow traffic reports as a
baseline to model the typical protocol ratios for the test traffic.
- What is the maximum TCP throughput (depending on packet sizes) for a
single session? Note: This was of interest since NGFWs usually handle
one session by a single CPU. So, the goal was to measure the “single CPU
performance”.
- What is the maximum TCP throughput (depending on packet sizes) for
multiple concurrent sessions?
- What is the maximum UDP throughput (depending on packet sizes) for
multiple concurrent sessions?
- What is the latency introduced by the network device?
- What is the latency introduced by the network device under heavy load
(e.g. 90% of the maximum measured throughput)?
- What is the connection setup rate (number of new TCP connections per
second)?
- What is the maximum number of parallel TCP sessions?
- How many TLS sessions can the device set up per second with
SSL decryption enabled?
- How many TLS sessions can the device handle with SSL-decryption enabled?
- What is the throughput for SSL-decrypted traffic?
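To illustrate the NetFlow-based baselining mentioned above, here is a minimal sketch of the kind of helper we mean (the function name and input shape are our own illustration, not part of TRex or any NetFlow tool): it derives per-protocol byte ratios from aggregated flow records, which can then be used to weight the templates of a test-traffic profile.

```python
from collections import Counter

def protocol_mix(flows):
    """Derive per-protocol traffic ratios from (protocol, byte_count)
    records, e.g. aggregated from a NetFlow export."""
    total = sum(byte_count for _, byte_count in flows)
    byte_counts = Counter()
    for proto, byte_count in flows:
        byte_counts[proto] += byte_count
    # The resulting ratios serve as weights for the traffic templates
    # in the generator profile.
    return {proto: round(b / total, 3) for proto, b in byte_counts.items()}
```

For example, `protocol_mix([("http", 600), ("tls", 300), ("dns", 100)])` yields a 60/30/10 split.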
NGFWs can offload specific traffic. This traffic is then no longer fully
inspected (by IPS, anti-virus and other inspection engines). Thus, two
different throughput figures could be expected, one for offloaded and
one for fully inspected traffic. In addition to that, the performance
difference between fully inspected and offloaded traffic was of interest.
Tests with malware samples were not part of the TRex performance tests
since malware detection was tested separately. TRex was solely used for
traffic generation to answer all the questions stated above.
Coming back to the initial motivation:
We were dealing with a NGFW PoC. The vendor that wins the PoC will be
used in the customer’s network for the foreseeable future. We wanted to
know the characteristics of these machines, since we must reflect these
in sizing and operational decisions. If you consider the sizing aspect,
it is obvious what financial impact this has and how this directly
relates to the budget: Can I go with the smaller (cheaper) appliance, or
do I have to buy the big one?
In addition, new software versions must be tested before they are used
in production. Thus, these tests will be performed again and again on a
regular basis. The same applies to other network devices (e.g. routers,
switches and load balancers): performance must be verified before a
new software version is deployed in production to ensure that it does
not have a negative performance impact.
Due to all these points, it was decided that a traffic generator for
performance testing is a good investment for the test lab.
So, why was TRex chosen as the traffic generator?
1. It was the least expensive solution of those that we considered.
We deployed TRex on a UCS server with 4x40GE, 8x10GE and 8x1GE ports.
2. Internally at IsarNet AG we already had some experience with TRex. It
was successfully used for convergence measurements during other PoCs in
Cisco labs.
As described earlier, excessive performance testing for IPv4 and IPv6
was done. Below are our findings during the test procedures:
As with any product, TRex had to be integrated into the lab
environment. This meant customizing the TRex deployment according
to various requirements, including a web GUI for performance plots and
several Makefiles and scripts to set up the TRex test topology. Due to
the full API programmability of TRex, it was very easy to combine TRex
with an InfluxDB/Grafana setup for visualization.
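To give an idea of the glue involved, here is a minimal sketch of one piece of such a setup: formatting counters polled via the TRex API into the InfluxDB line protocol before posting them to the database. The function name and the choice of fields are our own illustration, not part of TRex or the InfluxDB client libraries.

```python
def to_influx_line(measurement, tags, fields, timestamp_ns):
    """Render one InfluxDB line-protocol record, e.g.
    'trex,port=0 rx_bps=42.0,tx_bps=100.0 1551434556000000000'.
    Tags and fields are sorted for a deterministic output."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"
```

Lines like these can be batched and sent to the InfluxDB HTTP write endpoint, from which Grafana reads the plots.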
The TRex version we used for this PoC (version 2.41) did not support
IPv6 neighbor discovery. Thus, we had to configure static IPv6 neighbor
entries on the tested devices. Apparently, this is now supported by TRex
(as of now, we have not conducted any further performance tests with a
newer TRex version). However, this underlines an observation we have made so
far: The TRex developers are eager to improve their product. If you run
into issues and your feedback is relevant, it usually only takes a
couple of days until you are able to download a new TRex version which
includes a fix or implements your feature request.
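For a Linux-based device under test, the static-neighbor workaround looks roughly like the following (addresses, MAC and interface name are placeholders; NGFW appliances use vendor-specific equivalents):

```shell
# Pin the MAC address of the TRex port in the DUT's neighbor table so the
# DUT never has to perform IPv6 neighbor discovery towards the generator.
ip -6 neigh replace 2001:db8::1 lladdr 00:11:22:33:44:55 dev eth0 nud permanent
```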
After successfully integrating TRex into the lab environment, we
were able to perform all tests that did not require TRex to do any
handshaking or keep state information. This was an issue even though
nearly all tests were performed using the ASTF mode of TRex. Keep in
mind that we were testing NGFWs, which are expected to verify a correct
session setup. Clearly, TRex is not able to handle several million TCP
sessions if it has to keep the state for each single session. To
generate the high traffic loads, TRex therefore cannot react to e.g.
SYN cookies, which had to be disabled on the tested devices. This,
however, appears to be an issue not only for TRex but also for other
commercial products, as we learned from a vendor referencing NSS Labs
tests while we conducted our performance test.
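On a plain Linux box this SYN-cookie knob is a one-liner, shown here for illustration (the NGFWs under test each have their own vendor-specific setting for this):

```shell
# Disable SYN cookies so that high-rate, stateless TCP generation
# is not answered with cookie-based handshakes the generator cannot track.
sysctl -w net.ipv4.tcp_syncookies=0
```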
As for the TLS tests, we were not able to perform them since TRex
does not yet support stateful SSL/TLS session setup. SSL decryption is
widely deployed in enterprise networks, and we are currently not able to
test the performance impact of SSL decryption. We really hope that the
developers find some time to implement stateful TLS setup support at
some point.
Given these findings, we can summarize: The maximum throughput we
achieved with our setup for unencrypted traffic was 80Gbit/s. To get
these performance figures, two 40 Gbit/s QSFP+ port pairs were used.
Each port of a port pair had to be on a different network card to
achieve the best performance. Further, the test traffic was tuned to
produce similar traffic volumes in both directions of the communication
(between server and client). Obviously, the single-session and “small
packet” performance was somewhat lower. We were able to perform nearly
all the relevant performance tests using TRex. The generated traffic
load was more than enough to identify throughput limitations of the
tested products.
Thanks a lot for your great tool, the next PoC is on the radar and TRex
will be part of it!