Seastar HTTPD (using DPDK) on AWS


Marc Richards

<marc@talawah.net>
Dec 21, 2021, 9:16:23 PM
to seastar-dev
Hi all,

I have been working on getting Seastar HTTPD (using DPDK) running on AWS so that I can do some comparative performance analysis with and without DPDK/kernel-bypass. 

After a number of false starts I finally have it running. Below I share some highlights of what it took, and I have a few follow up questions at the end. FYI I am new to both Seastar and DPDK.

- I am using the Fedora 34 Linux AMI (Fedora-Cloud-Base-34-1.2.x86_64-hvm-us-east-2-gp2-0/ami-04d6c97822332a0a6 in us-east-2)

- The VFIO patch to enable CONFIG_VFIO_NOIOMMU by default (https://bugzilla.redhat.com/show_bug.cgi?id=2030856) has now made its way into the stable updates channel, so you only need to run sudo dnf update kernel to update to 5.15.8 or newer.

- I modified net/dpdk.cc to set the default_ring_size to 1024 based on this issue/comment: https://github.com/scylladb/seastar/issues/654#issuecomment-504794262.

- I applied Amazon's DPDK-specific ENA patches (https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk) before building Seastar/DPDK. Without the patches, the Seastar DHCP client times out. Note that the patches didn't apply cleanly, so I filed an issue (https://github.com/amzn/amzn-drivers/issues/199).

- I am using the primary ENI (eth0) for SSH access and a secondary ENI (eth1) for DPDK; a rough sketch of the device binding is included after the test command below.

- On the client side I used twrk (wrk with a few small modifications https://github.com/talawahtech/wrk/commits/twrk) to run some performance tests using the following command: 

twrk --latency --pin-cpus "http://172.31.4.185:8080/" -t 16 -c 256 -D 1 -d 5
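
For completeness, binding the secondary ENI (eth1) to vfio-pci in no-IOMMU mode went roughly like this (a sketch; the dpdk-devbind.py location and the PCI address are from my setup and will differ per instance):

# after updating to a kernel with CONFIG_VFIO_NOIOMMU and rebooting
sudo modprobe vfio-pci
echo 1 | sudo tee /sys/module/vfio/parameters/enable_unsafe_noiommu_mode

# take the secondary ENI down and bind its PCI device to vfio-pci
sudo ip link set eth1 down
sudo ./usertools/dpdk-devbind.py --status
sudo ./usertools/dpdk-devbind.py --bind=vfio-pci 0000:00:06.0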

Linux/POSIX networking stack results
------------------------------------------------------------
Running 5s test @ http://172.31.6.58:8080/
  16 threads and 256 connections
  Thread Stats   Avg     Stdev       Max       Min   +/- Stdev
    Latency     1.12ms  152.94us    2.64ms  462.00us   76.92%
    Req/Sec     14.33k  146.61     14.72k    13.87k    69.13%
  Latency Distribution
  50.00%    1.12ms
  90.00%    1.29ms
  99.00%    1.56ms
  99.99%    2.02ms
  1141043 requests in 5.00s, 149.08MB read
Requests/sec: 228205.82

DPDK networking stack results
------------------------------------------------------------
Running 5s test @ http://172.31.4.185:8080/
  16 threads and 256 connections
  Thread Stats   Avg     Stdev       Max       Min   +/- Stdev
    Latency   550.00us  127.19us    1.40ms   71.00us   77.92%
    Req/Sec     28.97k  270.33     29.62k    28.17k    65.82%
  Latency Distribution
  50.00%  538.00us
  90.00%  754.00us
  99.00%    0.90ms
  99.99%    1.09ms
  2306400 requests in 5.00s, 301.34MB read
Requests/sec: 461272.90

FYI, I am seeing some random variance in performance between 390k req/s and 460k req/s across instance stop/starts. My first hunch is that it has to do with the network queues/RSS hashing.


Questions
----------------
1. Should I file an issue about making default_ring_size configurable?

2. I am having trouble finding an equivalent replacement for ethtool for tasks like checking per-queue packet counts or modifying the RSS indirection table. I tried sending some commands via testpmd, but most come back as "not supported" or throw an error. Any suggestions?

3. Given that the DPDK PMD does busy polling, CPU usage is always 100%. Is there another quick, recommended way to assess workload distribution and core utilization?

Avi Kivity

<avi@scylladb.com>
Dec 22, 2021, 5:14:00 AM
to Marc Richards, seastar-dev
On 12/22/21 04:16, Marc Richards wrote:
FYI, I am seeing some random variance in performance between 390k req/s and 460k req/s across instance stop/starts. My first hunch is that it has to do with the network queues/RSS hashing.



The metrics will provide a wealth of information. Start with reactor utilization (make sure all shards are fully loaded) and tasks/sec. Also check connections/shard to see if you have good distribution.


Questions
----------------
1. Should I file an issue about making default_ring_size configurable?


Patches are better than issues, and having the system decide by itself (if it can) is better than configuration.


2. I am having trouble finding an equivalent replacement for ethtool for tasks like checking per-queue packet counts or modifying the RSS indirection table. I tried sending some commands via testpmd, but most come back as "not supported" or throw an error. Any suggestions?


I'm not knowledgeable enough. Won't testpmd fail because the device is attached to the Seastar process? Note I have no idea what testpmd is.


For statistics, we can export the important ones via the Seastar metrics interface.


3. Given that the DPDK PMD does busy polling, CPU usage is always 100%. Is there another quick, recommended way to assess workload distribution and core utilization?


As mentioned above, the metrics. httpd even launches a metrics provider, so you can set up Prometheus on the loader machine and scrape it. It provides a huge number of metrics, both reactor-level metrics and httpd-level metrics, all per shard.
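
A minimal scrape config is enough to get going; something like the following (the target address is just an example; Seastar's Prometheus endpoint defaults to port 9180):

cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: seastar-httpd
    static_configs:
      - targets: ['172.31.4.185:9180']
EOF
./prometheus --config.file=prometheus.yml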




Dor Laor

<dor@scylladb.com>
Dec 22, 2021, 5:20:25 AM
to Avi Kivity, Marc Richards, seastar-dev
On Wed, Dec 22, 2021 at 12:14 PM Avi Kivity <a...@scylladb.com> wrote:
On 12/22/21 04:16, Marc Richards wrote:
- On the client side I used twrk (wrk with a few small modifications https://github.com/talawahtech/wrk/commits/twrk) to run some performance tests using the following command: 

Cheers for managing to set all of it up, it's not simple at all. 
twrk --latency --pin-cpus "http://172.31.4.185:8080/" -t 16 -c 256 -D 1 -d 5
Long ago, when we tested Seastar with DPDK, we discovered that wrk itself is a wreck (ok, I'm just kidding): it didn't scale and couldn't saturate Seastar. You should transition to Seastar's own load generator (seawreck) once you max out with twrk.
 


Marc Richards

<marc@talawah.net>
Dec 22, 2021, 7:43:05 PM
to seastar-dev
On Wednesday, December 22, 2021 at 5:14:00 AM UTC-5 Avi Kivity wrote:

Questions
----------------
1. Should I file an issue about making default_ring_size configurable?


Patches are better than issues, and having the system decide by itself (if it can) is better than configuration.


Ok, I will do some more investigation and try to figure out the best approach. It is also possible that this is really just an ENA bug.
 
2. I am having trouble finding an equivalent replacement for ethtool for tasks like checking per-queue packet counts or modifying the RSS indirection table. I tried sending some commands via testpmd, but most come back as "not supported" or throw an error. Any suggestions?


I'm not knowledgeable enough. Won't testpmd fail because the device is attached to the Seastar process? Note I have no idea what testpmd is.


Testpmd is a DPDK app that includes an interactive CLI for inspecting and modifying properties of the device. You are correct that it cannot run at the same time as Seastar, but the changes should persist. So far I have been able to use it to change the MTU, but nothing else. I will follow up with the ENA team to get some more clarity on what is supported.
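
For reference, the MTU change went roughly like this at the interactive prompt (the binary is dpdk-testpmd in recent releases and plain testpmd in older ones; the MTU value is just an example, and whether other commands are honored depends on the ENA PMD):

sudo dpdk-testpmd -l 0-1 -n 4 -- -i
testpmd> port stop 0
testpmd> port config mtu 0 9000
testpmd> port start 0
testpmd> quit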
 

For statistics, we can export the important ones via the Seastar metrics interface.


3. Given that the DPDK PMD does busy polling, CPU usage is always 100%. Is there another quick, recommended way to assess workload distribution and core utilization?


As mentioned above, the metrics. httpd even launches a metrics provider, so you can set up Prometheus on the loader machine and scrape it. It provides a huge number of metrics, both reactor-level metrics and httpd-level metrics, all per shard.


I ran some benchmarks (1 sec warmup + 10 sec test) to evaluate the server running fully loaded, partially loaded, and idle. Based on the results below, the workload seems to be distributed pretty evenly and utilization is near 100% when fully loaded. I just ran `curl -s http://172.31.10.95:9180/metrics > metrics` to grab the data.

256 connections, 10s run, right at startup
---------------------------------------------
twrk --latency --pin-cpus "http://172.31.10.95:8080/" -t 16 -c 256 -D 1 -d 10

Running 10s test @ http://172.31.10.95:8080/

  16 threads and 256 connections
  Thread Stats   Avg     Stdev       Max       Min   +/- Stdev
    Latency   575.70us  151.29us    2.83ms   58.00us   73.46%
    Req/Sec    27.70k   810.19     29.17k    25.59k    58.46%
  Latency Distribution
  50.00%  557.00us
  90.00%  813.00us
  99.00%    0.94ms
  99.99%    1.16ms
  4410352 requests in 10.00s, 576.23MB read
Requests/sec: 441032.47

reactor_utilization{shard="0"} 99.809030
reactor_utilization{shard="1"} 98.438314
reactor_utilization{shard="2"} 98.344491
reactor_utilization{shard="3"} 98.472967

httpd_connections_total{service="http-0",shard="0"} 62
httpd_connections_total{service="http-0",shard="1"} 65
httpd_connections_total{service="http-0",shard="2"} 65
httpd_connections_total{service="http-0",shard="3"} 65

httpd_requests_served{service="http-0",shard="0"} 1209776
httpd_requests_served{service="http-0",shard="1"} 1193325
httpd_requests_served{service="http-0",shard="2"} 1205591
httpd_requests_served{service="http-0",shard="3"} 1187394

network_port0_queue0_rx_packets{shard="0"} 1210043
network_port0_queue1_rx_packets{shard="1"} 1193602
network_port0_queue2_rx_packets{shard="2"} 1205872
network_port0_queue3_rx_packets{shard="3"} 1187683

network_port0_queue0_tx_packets{shard="0"} 1209847
network_port0_queue1_tx_packets{shard="1"} 1193393
network_port0_queue2_tx_packets{shard="2"} 1205665
network_port0_queue3_tx_packets{shard="3"} 1187477

reactor_polls{shard="0"} 534243
reactor_polls{shard="1"} 647806
reactor_polls{shard="2"} 717108
reactor_polls{shard="3"} 622031


16 connections, 10s run, right at startup
-----------------------------------------
 twrk --latency --pin-cpus "http://172.31.10.95:8080/" -t 16 -c 16 -D 1 -d 10

Running 10s test @ http://172.31.10.95:8080/
  16 threads and 16 connections

  Thread Stats   Avg     Stdev       Max       Min   +/- Stdev
    Latency    72.21us   12.05us    2.28ms   49.00us   82.96%
    Req/Sec    13.68k   554.05     15.61k    12.96k    81.25%
  Latency Distribution
  50.00%   70.00us
  90.00%   85.00us
  99.00%  103.00us
  99.99%  318.00us
  2177391 requests in 10.00s, 284.48MB read
Requests/sec: 217737.73

reactor_utilization{shard="0"} 50.408446
reactor_utilization{shard="1"} 77.313893
reactor_utilization{shard="2"} 50.812668
reactor_utilization{shard="3"} 76.445980

httpd_connections_total{service="http-0",shard="0"} 3
httpd_connections_total{service="http-0",shard="1"} 5
httpd_connections_total{service="http-0",shard="2"} 4
httpd_connections_total{service="http-0",shard="3"} 5

httpd_requests_served{service="http-0",shard="0"} 456425
httpd_requests_served{service="http-0",shard="1"} 743589
httpd_requests_served{service="http-0",shard="2"} 478118
httpd_requests_served{service="http-0",shard="3"} 739368

network_port0_queue0_rx_packets{shard="0"} 456443
network_port0_queue1_rx_packets{shard="1"} 743617
network_port0_queue2_rx_packets{shard="2"} 478137
network_port0_queue3_rx_packets{shard="3"} 739393

network_port0_queue0_tx_packets{shard="0"} 456434
network_port0_queue1_tx_packets{shard="1"} 743600
network_port0_queue2_tx_packets{shard="2"} 478126
network_port0_queue3_tx_packets{shard="3"} 739378

reactor_polls{shard="0"} 11887575
reactor_polls{shard="1"} 5691838
reactor_polls{shard="2"} 11369826
reactor_polls{shard="3"} 5897196


Idle for 10s at startup
-----------------------
reactor_utilization{shard="0"} 0.025414
reactor_utilization{shard="1"} 0.000000
reactor_utilization{shard="2"} 0.000000
reactor_utilization{shard="3"} 0.000000

httpd_connections_total{service="http-0",shard="0"} 0
httpd_connections_total{service="http-0",shard="1"} 0
httpd_connections_total{service="http-0",shard="2"} 0
httpd_connections_total{service="http-0",shard="3"} 0

httpd_requests_served{service="http-0",shard="0"} 0
httpd_requests_served{service="http-0",shard="1"} 0
httpd_requests_served{service="http-0",shard="2"} 0
httpd_requests_served{service="http-0",shard="3"} 0

network_port0_queue0_rx_packets{shard="0"} 2
network_port0_queue1_rx_packets{shard="1"} 0
network_port0_queue2_rx_packets{shard="2"} 0
network_port0_queue3_rx_packets{shard="3"} 3

network_port0_queue0_tx_packets{shard="0"} 2
network_port0_queue1_tx_packets{shard="1"} 0
network_port0_queue2_tx_packets{shard="2"} 0
network_port0_queue3_tx_packets{shard="3"} 1

reactor_polls{shard="0"} 2090164
reactor_polls{shard="1"} 793098
reactor_polls{shard="2"} 792175
reactor_polls{shard="3"} 792418


The only real anomaly that I noticed is the one above where shard 0 polls a lot more than the others when the system is idle.


 

Marc Richards

<marc@talawah.net>
Dec 22, 2021, 7:46:50 PM
to seastar-dev
On Wednesday, December 22, 2021 at 5:20:25 AM UTC-5 Dor Laor wrote:
Cheers for managing to set all of it up, it's not simple at all. 

Yea, it was definitely trickier than I anticipated lol. 
twrk --latency --pin-cpus "http://172.31.4.185:8080/" -t 16 -c 256 -D 1 -d 5
Long ago, when we tested Seastar with DPDK, we discovered that wrk itself is a wreck (ok, I'm just kidding): it didn't scale and couldn't saturate Seastar. You should transition to Seastar's own load generator (seawreck) once you max out with twrk.

Yes, it is on my to-do list to test out seawreck; I just wanted to keep things simple on the client side to start. I am also using a larger instance on the client to make sure that twrk isn't the bottleneck.

 

Dor Laor

<dor@scylladb.com>
Dec 23, 2021, 4:36:23 AM
to Marc Richards, seastar-dev
It's good to see the reactor gets near 100% utilization. Another mechanism we use is perf with per-core (per-thread) analysis (the -C flag) on the server. This way we can see what dominates the CPU and check whether it's aligned with our expectations.

There should be a huge difference when you compare the core activity of the Linux TCP stack vs. Seastar.
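
For example, something along these lines (a rough sketch; adjust the CPU list to the instance's vCPUs):

# sample all cores for 10 seconds, then break the report down per CPU
sudo perf record -C 0-3 -g -- sleep 10
sudo perf report --sort cpu,symbol

# or watch a single core live
sudo perf top -C 2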


Avi Kivity

<avi@scylladb.com>
Dec 23, 2021, 5:04:00 AM
to Marc Richards, seastar-dev

This doesn't smell like 100%. I encourage you to use Prometheus to see the fuller picture, and use irate(reactor_runtime_ms) instead of utilization (which is an instantaneous sample).
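
If you don't want to set up Prometheus just for a quick check, you can approximate the rate by diffing two samples of the counter yourself. A rough sketch (the exact metric name in the exported output may differ from how I wrote it above):

curl -s http://172.31.10.95:9180/metrics | grep reactor_runtime_ms > a
sleep 10
curl -s http://172.31.10.95:9180/metrics | grep reactor_runtime_ms > b
# busy milliseconds per shard over a 10000 ms window, as a percentage
paste a b | awk '{printf "%s %.1f%%\n", $1, ($4 - $2) / 10000 * 100}'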


httpd_connections_total{service="http-0",shard="0"} 3
httpd_connections_total{service="http-0",shard="1"} 5
httpd_connections_total{service="http-0",shard="2"} 4
httpd_connections_total{service="http-0",shard="3"} 5


Seastar likes more connections; too few causes uneven load, as you can see above. Add more connections (at least 10X).

Regarding shard 0 polling more than the others when idle: it's advancing the 10ms lowres_clock.


 

Marc Richards

<marc@talawah.net>
Dec 23, 2021, 10:28:25 AM
to seastar-dev
This was the partially loaded example (sorry if that wasn't clear); I deliberately reduced the number of connections to 16 to demonstrate under-utilization.
 


httpd_connections_total{service="http-0",shard="0"} 3
httpd_connections_total{service="http-0",shard="1"} 5
httpd_connections_total{service="http-0",shard="2"} 4
httpd_connections_total{service="http-0",shard="3"} 5


Seastar likes more connections; too few causes uneven load, as you can see above. Add more connections (at least 10X).


Yes, this was deliberate. The full-utilization test case uses 256 connections.

Ok cool, good to confirm that this is normal.

Marc Richards

<marc@talawah.net>
Jan 5, 2022, 6:50:49 PM
to seastar-dev
I did some more digging and found the bug in the ENA driver. There is some erroneous logic that resets the ring size to 8192 (the max size for the ENA rx queue) if the requested size happens to be 512 (RTE_ETH_DEV_FALLBACK_RX_RINGSIZE). This issue was fixed in the DPDK codebase last year (https://github.com/DPDK/dpdk/commit/30a6c7ef4054); as you can see, it is just a matter of removing the relevant if conditions. Backporting that change to the seastar/dpdk fork should be straightforward. I can open a PR.

However, my tests also indicate that in practice the default ring size of 512 that is currently hard-coded into net/dpdk.cc is suboptimal compared to the ENA driver default of 1024. When the value is set to 512, throughput drops by around 4% and p99 latency jumps from just under 1ms to over 200ms for my HTTPD test.

The same ENA driver patch that fixed the previous issue also exposes the ENA device's default ring size via dev_info->default_rxportconf.ring_size, which means that it should be possible to query the value (maybe in the init_port_start function) and use the result to set default_ring_size and mbufs_per_queue_rx dynamically instead of declaring them statically.

Of course, this is a more significant change that may present backwards-compatibility issues, unless we also check dev_info.driver_name and restrict the new logic to the ENA driver.

Nicolas Le Scouarnec

<Nicolas.LeScouarnec@broadpeak.tv>
Jan 6, 2022, 11:27:07 AM
to Marc Richards, seastar-dev

Hi,

 

> I did some more digging and found the bug in the ENA driver. There is some erroneous logic that resets the ring size to 8192 (the max size for the ENA rx queue) if the requested size happens to be 512 (RTE_ETH_DEV_FALLBACK_RX_RINGSIZE). This issue was fixed in the DPDK codebase last year (https://github.com/DPDK/dpdk/commit/30a6c7ef4054); as you can see, it is just a matter of removing the relevant if conditions. Backporting that change to the seastar/dpdk fork should be straightforward. I can open a PR.

 

There was a recent effort by Kefu Chai to integrate a much more recent DPDK, with the most recent attempt being

https://github.com/scylladb/seastar/commit/1ec9063549bf95f4dddd800ab0066576e805475f

It was reverted (if I am not wrong) for a small compile-time issue.

Basically, from DPDK's pkg-config (when using a recent version compiled with Meson), a big .o is built that merges all the .a files and is then linked statically when building libseastar.a. The missing part is that, depending on the build machine, DPDK may also link dynamically against libmlx5, libbsd, libpcap, … and the above commit does not "parse" this part of the pkg-config to forward it to CMake and Seastar's pkg-config, causing build issues for users of libseastar. DPDK 21.08 was working without issue, except for that cmake/pkg-config problem of failing to automatically configure/pick up the proper libraries.

 

Some details of the discussion are here: https://groups.google.com/g/seastar-dev/c/OnNKAoQoId0/m/jhXJBb81AgAJ

 

If this patch supporting Meson-built DPDK is reapplied, upgrading to a recent DPDK should be more straightforward.

 

Best regards

 

 


David Guo

<guo@taniustech.com>
Aug 31, 2022, 11:15:16 AM
to seastar-dev
Marc, are you using the VFIO driver?
I experimented with F-Stack and did some other research, and they all seem to recommend using igb_uio.ko.
My performance test results on F-Stack: it's pretty good on TCP, but horrible on UDP (even worse than a raw socket; maybe some parameters were set up wrong in the config file).
Does Seastar have instructions on how to set it up in an AWS EC2 environment?
By the way, Seastar's install-dependencies.sh failed because the OS reports ID=amzn, and it's not clear which packages are missing. I manually exported ID=centos and ID=rhel; both failed. I didn't try fedora yet.

 /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"

Marc R

<email.marc@gmail.com>
Sep 4, 2022, 3:04:19 PM
to seastar-dev
Hi David,

I ended up switching from Amazon Linux 2 to Fedora 34 to Amazon Linux 2022 (preview) to try to strike a balance between available dependencies and kernel update frequency. 
FYI the ENA DPDK driver documentation[1] has gotten a lot better and includes instructions for getting started with VFIO or IGB_UIO. IGB_UIO is probably easier right now. 
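
If you go the IGB_UIO route, the rough shape of it is something like this (see the docs for the authoritative steps; the kmods repo URL and the PCI address are examples):

git clone https://dpdk.org/git/dpdk-kmods
cd dpdk-kmods/linux/igb_uio && make
sudo modprobe uio
sudo insmod igb_uio.ko wc_activate=1   # wc_activate enables write-combining on ENA
sudo dpdk-devbind.py --bind=igb_uio 0000:00:06.0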

I also wrote up a blog post[2] that you might find helpful, the sections titled "Building Seastar", "DPDK on AWS", and "DPDK Optimization". But the docs should stay more up to date than my blog post. 
Note that Seastar uses an older version of DPDK and requires some patches to work on AWS: https://github.com/scylladb/dpdk/pulls/talawahtech

FYI, below are the commands I used to install the dependencies needed to build Seastar on Amazon Linux 2022:

sudo dnf install gcc gcc-c++ make python3 systemtap-sdt-devel libtool cmake c-ares-devel diffutils doxygen openssl ninja-build ragel \
boost-devel libubsan libasan libatomic valgrind-devel libtool-ltdl-devel trousers-devel libidn2-devel libunistring-devel hwloc-devel \
numactl-devel libpciaccess-devel libxml2-devel xfsprogs-devel gnutls-devel lksctp-tools-devel lz4-devel

# fmt-devel yaml-cpp-devel stow cryptopp-devel are missing in AL2022 so I installed them from the Fedora 34 repo

sudo dnf install -y https://download.fedoraproject.org/pub/fedora/linux/releases/34/Everything/x86_64/os/Packages/s/stow-2.3.1-4.fc34.noarch.rpm

sudo dnf install -y https://download.fedoraproject.org/pub/fedora/linux/updates/34/Everything/x86_64/Packages/c/cryptopp-8.6.0-1.fc34.x86_64.rpm
sudo dnf install -y https://download.fedoraproject.org/pub/fedora/linux/updates/34/Everything/x86_64/Packages/c/cryptopp-devel-8.6.0-1.fc34.x86_64.rpm

sudo dnf install -y https://download.fedoraproject.org/pub/fedora/linux/updates/34/Everything/x86_64/Packages/f/fmt-7.1.3-3.fc34.x86_64.rpm
sudo dnf install -y https://download.fedoraproject.org/pub/fedora/linux/updates/34/Everything/x86_64/Packages/f/fmt-devel-7.1.3-3.fc34.x86_64.rpm

sudo dnf install -y https://download.fedoraproject.org/pub/fedora/linux/releases/34/Everything/x86_64/os/Packages/y/yaml-cpp-0.6.3-4.fc34.x86_64.rpm
sudo dnf install -y https://download.fedoraproject.org/pub/fedora/linux/releases/34/Everything/x86_64/os/Packages/y/yaml-cpp-devel-0.6.3-4.fc34.x86_64.rpm




David Guo

<guo@taniustech.com>
Sep 5, 2022, 1:51:28 PM
to Marc R, seastar-dev
Thanks, Marc.
Unfortunately, I don't have the freedom to choose which Amazon Linux to use, as there is a lot of existing code built on it.
I successfully tested F-Stack using the latest DPDK with IGB_UIO, but couldn't get VFIO working. Its TCP performance is good, but UDP is poor. I couldn't get UDPDK to work with IGB_UIO, and I'm not sure whether VFIO would work, as its creator didn't test it on an AWS box.
I'm surprised there are no step-by-step instructions for setting up a working Seastar environment on an AWS box; for starters with limited time and resources doing proof-of-concept work, it's hard to figure out why it isn't working when trying to show that it improves both TCP and UDP over Linux raw sockets.
Your work is appreciated, and I will take a look when our company wants to try it again in the future (hopefully not too far off).
BTW, have you ever tested UDP performance? If you did and it proves to be much better (at least 100% better), that would give me some persuasive power to kick-start the evaluation process again.

Best,
David


Marc Richards

<marc@talawah.net>
Sep 5, 2022, 3:41:14 PM
to David Guo, Marc R, seastar-dev
Hey David,

I didn't test UDP, only TCP. FYI I tested IGB_UIO and the performance was the same as VFIO with the write-combining patch.
