Hi all,
I have been working on getting Seastar HTTPD (using DPDK) running on AWS so that I can do some comparative performance analysis with and without DPDK/kernel-bypass.
After a number of false starts I finally have it running. Below I share some highlights of what it took, and I have a few follow-up questions at the end. FYI, I am new to both Seastar and DPDK.
- I am using the Fedora 34 Linux AMI (Fedora-Cloud-Base-34-1.2.x86_64-hvm-us-east-2-gp2-0/ami-04d6c97822332a0a6 in us-east-2)
- The VFIO patch to enable CONFIG_VFIO_NOIOMMU by default (https://bugzilla.redhat.com/show_bug.cgi?id=2030856) has now made its way into the stable updates channel, so you only need to run sudo dnf update kernel to update to 5.15.8 or newer.
- I modified net/dpdk.cc to set the default_ring_size to 1024 based on this issue/comment: https://github.com/scylladb/seastar/issues/654#issuecomment-504794262.
- I applied Amazon's DPDK-specific ENA patches (https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk) before building Seastar/DPDK. Without the patches, the Seastar DHCP client times out. Note that the patches didn't apply cleanly, so I filed an issue (https://github.com/amzn/amzn-drivers/issues/199).
- I am using the primary ENI (eth0) for SSH access and a secondary ENI (eth1) for DPDK
- On the client side I used twrk (wrk with a few small modifications https://github.com/talawahtech/wrk/commits/twrk) to run some performance tests using the following command:
Linux/POSIX networking stack results
------------------------------------------------------------
Running 5s test @ http://172.31.6.58:8080/
  16 threads and 256 connections
  Thread Stats   Avg       Stdev      Max       Min       +/- Stdev
    Latency      1.12ms    152.94us   2.64ms    462.00us    76.92%
    Req/Sec     14.33k     146.61    14.72k    13.87k       69.13%
  Latency Distribution
     50.00%    1.12ms
     90.00%    1.29ms
     99.00%    1.56ms
     99.99%    2.02ms
  1141043 requests in 5.00s, 149.08MB read
Requests/sec: 228205.82

DPDK networking stack results
------------------------------------------------------------
Running 5s test @ http://172.31.4.185:8080/
  16 threads and 256 connections
  Thread Stats   Avg       Stdev      Max       Min       +/- Stdev
    Latency    550.00us    127.19us   1.40ms    71.00us     77.92%
    Req/Sec     28.97k     270.33    29.62k    28.17k       65.82%
  Latency Distribution
     50.00%  538.00us
     90.00%  754.00us
     99.00%    0.90ms
     99.99%    1.09ms
  2306400 requests in 5.00s, 301.34MB read
Requests/sec: 461272.90
FYI, I am seeing some random variance in performance between 390k req/s and 460k req/s across instance stop/starts. My first hunch is that it has to do with the network queues/RSS hashing.
The metrics will provide a wealth of information. Start with reactor utilization (make sure all shards are fully loaded) and tasks/sec. Also check connections/shard to see if you have good distribution.
Questions
----------------
1. Should I file an issue about making default_ring_size configurable?
Patches are better than issues, and having the system decide by
itself (if it can) is better than configuration.
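As an illustration of "the system deciding by itself", one could ask the PMD for its preferred ring size instead of hard-coding one. The sketch below is hypothetical, not current Seastar code: the clamping policy and the 1024 fallback are my own choices, though the rte_eth_dev_info fields it reads are standard DPDK.

    #include <rte_ethdev.h>
    #include <algorithm>

    // Hypothetical sketch: query the PMD for its preferred RX ring size and
    // clamp it to the advertised limits, instead of hard-coding
    // default_ring_size in net/dpdk.cc.
    static uint16_t pick_rx_ring_size(uint16_t port_id, uint16_t fallback = 1024) {
        rte_eth_dev_info info = {};
        rte_eth_dev_info_get(port_id, &info);               // error handling omitted
        uint16_t size = info.default_rxportconf.ring_size;  // 0 means "no preference"
        if (size == 0) {
            size = fallback;                                // illustrative default
        }
        // Keep the value within what the driver says it can handle.
        return std::clamp(size, info.rx_desc_lim.nb_min, info.rx_desc_lim.nb_max);
    }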
2. I am having trouble finding an equivalent replacement for ethtool for things like checking per-queue packet counts or modifying the RSS indirection table. I tried sending some commands via testpmd, but most come back as "not supported" or throw an error. Any suggestions?
I'm not knowledgeable enough. Won't testpmd fail because the device is attached to the Seastar process? Note I have no idea what testpmd is.
For statistics, we can export the important ones via the Seastar
metrics interface.
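For example, here is a rough sketch (hypothetical code, not something Seastar ships today; the group/metric names and the statistics source are made up) of how a per-queue counter could be registered through the Seastar metrics interface so it shows up next to the built-in metrics:

    #include <seastar/core/metrics.hh>

    namespace sm = seastar::metrics;

    // Hypothetical sketch: expose a per-RX-queue packet counter through the
    // Seastar metrics interface so Prometheus can scrape it alongside the
    // built-in reactor/httpd metrics. Names here are illustrative only.
    class rx_queue_stats {
        uint64_t _rx_packets = 0;
        sm::metric_groups _metrics;
    public:
        explicit rx_queue_stats(unsigned queue_id) {
            _metrics.add_group("net", {
                sm::make_counter("rx_packets", _rx_packets,
                    sm::description("Packets received on this RX queue"),
                    {sm::label_instance("queue", queue_id)}),
            });
        }
        void on_packet() { ++_rx_packets; }
    };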
3. Given that the DPDK PMD does busy polling, CPU usage is always 100%. Is there another quick, recommended way to assess workload distribution and core utilization?
As mentioned above, the metrics. httpd even launches a metrics
provider, so you can set up Prometheus on the loader machine and
scrape it. It provides a huge number of metrics, both
reactor-level metrics and httpd-level metrics, all per shard.
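For reference, this is roughly how apps/httpd wires up its Prometheus endpoint (reconstructed from memory, so exact member and function names may differ between Seastar versions; the demo listens on a separate port, 9180 by default):

    #include <seastar/core/prometheus.hh>
    #include <seastar/http/httpd.hh>
    #include <seastar/net/socket_defs.hh>

    using namespace seastar;

    // Sketch of the metrics provider that apps/httpd already starts for you.
    // Must run inside a seastar::thread context, since it blocks on .get().
    void start_prometheus(httpd::http_server_control& prometheus_server, uint16_t port = 9180) {
        prometheus::config pctx;
        pctx.metric_help = "seastar httpd statistics";  // free-form help text
        pctx.prefix = "seastar";                        // metric name prefix
        prometheus_server.start("prometheus").get();
        prometheus::start(prometheus_server, pctx).get();
        prometheus_server.listen(ipv4_addr{port}).get();
    }

Once that is up, point a Prometheus server on the loader machine at the /metrics path on that port (or just curl it) to get the per-shard metrics mentioned above.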
Cheers for managing to set all of it up; it's not simple at all.
Long ago, when we tested Seastar with DPDK, we discovered that wrk itself is a wreck (ok, I'm just kidding): it didn't scale and couldn't saturate Seastar, so we developed seawreck - https://github.com/scylladb/seastar/tree/master/apps/seawreck
You should transition to it once you max out with twrk.
This doesn't smell like 100%. I encourage you to use Prometheus
to see the fuller picture, and use irate(reactor_runtime_ms)
instead of utilization (which is an instantaneous sample).
httpd_connections_total{service="http-0",shard="0"} 3
httpd_connections_total{service="http-0",shard="1"} 5
httpd_connections_total{service="http-0",shard="2"} 4
httpd_connections_total{service="http-0",shard="3"} 5
Seastar likes more connections; with so few per shard, the load is uneven, as you can see above. Add more connections (at least 10X).
It's advancing the 10ms lowres_clock.
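For anyone not familiar with it: lowres_clock is Seastar's cheap clock. Instead of reading the hardware clock on every call, the reactor refreshes a cached value on a coarse periodic tick (about every 10ms), which is the activity being referred to. A minimal usage sketch:

    #include <seastar/core/lowres_clock.hh>
    #include <chrono>

    // seastar::lowres_clock trades precision for speed: now() just returns a
    // value that the reactor keeps updating roughly every 10ms.
    void lowres_clock_example() {
        auto start = seastar::lowres_clock::now();              // cheap cached read
        auto deadline = start + std::chrono::milliseconds(50);  // normal chrono arithmetic
        bool expired = seastar::lowres_clock::now() >= deadline;
        (void)expired;
    }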
Hi,
> I did some more digging and found the bug in the ENA driver. There is some erroneous logic that resets the ring size to 8192 (the max size for the ENA rx queue) if the requested size happens to be 512
> (RTE_ETH_DEV_FALLBACK_RX_RINGSIZE). This issue was fixed in the DPDK codebase last year (https://github.com/DPDK/dpdk/commit/30a6c7ef4054); as you can see, it is just a matter of removing the relevant if
> conditions. Backporting that change to the seastar/dpdk fork should be straightforward. I can open a PR.
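To make the bug concrete, here is a paraphrase of the logic being described (illustrative C++, not the literal DPDK driver source; the constant names are made up, and only the 512/8192 values come from the report above):

    #include <cstdint>

    // Paraphrase of the faulty ENA ring-size handling described above. A request
    // for exactly 512 descriptors, i.e. RTE_ETH_DEV_FALLBACK_RX_RINGSIZE, is
    // treated as "no preference" and silently bumped to the ENA maximum of 8192.
    // The upstream fix (DPDK commit 30a6c7ef4054) simply removes this special case.
    constexpr uint16_t fallback_rx_ringsize = 512;   // RTE_ETH_DEV_FALLBACK_RX_RINGSIZE
    constexpr uint16_t ena_max_ring_desc = 8192;     // ENA RX queue maximum

    uint16_t effective_ring_size(uint16_t requested) {
        if (requested == fallback_rx_ringsize) {
            return ena_max_ring_desc;   // the surprising reset hit here
        }
        return requested;
    }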
There was a recent effort by Kefu Chai to integrate a much more recent DPDK, with the most recent attempt being
https://github.com/scylladb/seastar/commit/1ec9063549bf95f4dddd800ab0066576e805475f
It was reverted (if I am not mistaken) over a small compile-time issue.
Basically, from DPDK's pkg-config (when using a recent version compiled with Meson), a big .o is built that merges all the .a files and is then linked statically when building libseastar.a. The missing part is that, depending on the build machine, DPDK may also link dynamically to libmlx5, libbsd, libpcap, etc., and the above commit does not "parse" that part of the pkg-config output to forward it to CMake and Seastar's pkg-config, causing build issues for users of libseastar. DPDK 21.08 was working without issue, except for that cmake/pkg-config problem of failing to automatically configure/pick up the proper libraries.
More details in the discussion here: https://groups.google.com/g/seastar-dev/c/OnNKAoQoId0/m/jhXJBb81AgAJ
If this patch supporting Meson-built DPDK is reapplied, then upgrading to a recent DPDK should be more straightforward.
Best regards