[PATCH] reactor: adjust max_networking_aio_io_control_blocks to lower size when fs.aio-max-nr is small

Takuya ASADA

<syuu@scylladb.com>
Aug 3, 2021, 2:29:16 PM
to seastar-dev@googlegroups.com, Takuya ASADA
When fs.aio-max-nr does not have enough size, try to adjust
max_networking_aio_io_control_blocks size to fit fs.aio-max-nr.

See scylladb/scylla#9096

Signed-off-by: Takuya ASADA <sy...@scylladb.com>
---
include/seastar/core/smp.hh | 1 +
src/core/reactor.cc | 27 ++++++++++++++++++++++++++-
2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/seastar/core/smp.hh b/include/seastar/core/smp.hh
index 74898627..1f58a08d 100644
--- a/include/seastar/core/smp.hh
+++ b/include/seastar/core/smp.hh
@@ -445,6 +445,7 @@ class smp : public std::enable_shared_from_this<smp> {
void pin(unsigned cpu_id);
void allocate_reactor(unsigned id, reactor_backend_selector rbs, reactor_config cfg);
void create_thread(std::function<void ()> thread_loop);
+ unsigned adjust_max_networking_aio_io_control_blocks(unsigned network_iocbs);
public:
static unsigned count;
};
diff --git a/src/core/reactor.cc b/src/core/reactor.cc
index 2b6bd206..382733ae 100644
--- a/src/core/reactor.cc
+++ b/src/core/reactor.cc
@@ -54,6 +54,7 @@
#include <seastar/core/internal/buffer_allocator.hh>
#include <seastar/core/scheduling_specific.hh>
#include <seastar/util/log.hh>
+#include <seastar/util/read_first_line.hh>
#include "core/file-impl.hh"
#include "core/reactor_backend.hh"
#include "core/syscall_result.hh"
@@ -3713,6 +3714,30 @@ void smp::register_network_stacks() {
register_native_stack();
}

+unsigned smp::adjust_max_networking_aio_io_control_blocks(unsigned network_iocbs)
+{
+ static unsigned constexpr storage_iocbs = reactor::max_aio;
+ static unsigned constexpr preempt_iocbs = 2;
+
+ auto aio_max_nr = read_first_line_as<unsigned>("/proc/sys/fs/aio-max-nr");
+ auto aio_nr = read_first_line_as<unsigned>("/proc/sys/fs/aio-nr");
+ auto available_aio = aio_max_nr - aio_nr;
+ auto requested_aio_network = network_iocbs * smp::count;
+ auto requested_aio_other = (storage_iocbs + preempt_iocbs) * smp::count;
+ auto requested_aio = requested_aio_network + requested_aio_other;
+
+ if (available_aio < requested_aio) {
+ if (available_aio >= requested_aio_other + smp::count) { // at least one queue for each shard
+ network_iocbs = (available_aio - requested_aio_other) / smp::count;
+ seastar_logger.warn("max-networking-io-control-blocks adjusted to {} since requested size is too large. Please increase request capacity in /proc/sys/fs/aio-max-nr", network_iocbs);
+ } else {
+ seastar_logger.error("Failed to adjust max-networking-io-control-blocks, request capacity in /proc/sys/fs/aio-max-nr too small");
+ }
+ }
+
+ return network_iocbs;
+}
+
void smp::configure(boost::program_options::variables_map configuration, reactor_config reactor_cfg)
{
#ifndef SEASTAR_NO_EXCEPTION_HACK
@@ -3885,7 +3910,7 @@ void smp::configure(boost::program_options::variables_map configuration, reactor
memory::set_dump_memory_diagnostics_on_alloc_failure_kind(configuration["dump-memory-diagnostics-on-alloc-failure-kind"].as<std::string>());
}

- reactor_cfg.max_networking_aio_io_control_blocks = configuration["max-networking-io-control-blocks"].as<unsigned>();
+ reactor_cfg.max_networking_aio_io_control_blocks = adjust_max_networking_aio_io_control_blocks(configuration["max-networking-io-control-blocks"].as<unsigned>());

bool heapprof_enabled = configuration.count("heapprof");
if (heapprof_enabled) {
--
2.31.1

Benny Halevy

<bhalevy@scylladb.com>
Aug 4, 2021, 4:54:55 AM
to Takuya ASADA, seastar-dev@googlegroups.com
Please also print the recommended value so the admin wouldn't have to guess / read the source code.

> +        } else {
> +            seastar_logger.error("Failed to adjust max-networking-io-control-blocks, request capacity in /proc/sys/fs/aio-max-nr too small");

Please also print the actual numbers (aio_max_nr, aio_nr, and the minimum required) so the admin wouldn't have to guess.
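
As an illustration of the request above, here is a minimal sketch of a message builder that reports the raw numbers. The name `describe_aio_shortfall` and its output format are hypothetical and not part of Seastar or of this patch. The "recommended" figure assumes the current fs.aio-nr usage stays as it is.

```cpp
#include <string>

// Hypothetical helper (not Seastar API): build a diagnostic line with the
// actual numbers, so the admin does not have to read the source.
std::string describe_aio_shortfall(unsigned long long aio_max_nr,
                                   unsigned long long aio_nr,
                                   unsigned long long required) {
    // Slots still free system-wide; clamp at zero instead of wrapping around.
    unsigned long long available = aio_max_nr > aio_nr ? aio_max_nr - aio_nr : 0ULL;
    // Assumption: slots already in use stay in use, so the new limit must
    // cover them plus everything this process wants to allocate.
    unsigned long long recommended = aio_nr + required;
    return "aio-max-nr=" + std::to_string(aio_max_nr) +
           " aio-nr=" + std::to_string(aio_nr) +
           " available=" + std::to_string(available) +
           " required=" + std::to_string(required) +
           " recommended-aio-max-nr=" + std::to_string(recommended);
}
```

The returned string could be passed to whatever logger the application uses.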

Takuya ASADA

<syuu@scylladb.com>
Aug 4, 2021, 10:04:44 PM
to seastar-dev@googlegroups.com, Takuya ASADA
When fs.aio-max-nr does not have enough size, try to adjust
max_networking_aio_io_control_blocks size to fit fs.aio-max-nr.

See scylladb/scylla#9096

Signed-off-by: Takuya ASADA <sy...@scylladb.com>
---
include/seastar/core/smp.hh | 1 +
src/core/reactor.cc | 29 ++++++++++++++++++++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/seastar/core/smp.hh b/include/seastar/core/smp.hh
index 74898627..1f58a08d 100644
--- a/include/seastar/core/smp.hh
+++ b/include/seastar/core/smp.hh
@@ -445,6 +445,7 @@ class smp : public std::enable_shared_from_this<smp> {
void pin(unsigned cpu_id);
void allocate_reactor(unsigned id, reactor_backend_selector rbs, reactor_config cfg);
void create_thread(std::function<void ()> thread_loop);
+ unsigned adjust_max_networking_aio_io_control_blocks(unsigned network_iocbs);
public:
static unsigned count;
};
diff --git a/src/core/reactor.cc b/src/core/reactor.cc
index 2b6bd206..f5cbdd24 100644
--- a/src/core/reactor.cc
+++ b/src/core/reactor.cc
@@ -54,6 +54,7 @@
#include <seastar/core/internal/buffer_allocator.hh>
#include <seastar/core/scheduling_specific.hh>
#include <seastar/util/log.hh>
+#include <seastar/util/read_first_line.hh>
#include "core/file-impl.hh"
#include "core/reactor_backend.hh"
#include "core/syscall_result.hh"
@@ -3713,6 +3714,32 @@ void smp::register_network_stacks() {
register_native_stack();
}

+unsigned smp::adjust_max_networking_aio_io_control_blocks(unsigned network_iocbs)
+{
+ static unsigned constexpr storage_iocbs = reactor::max_aio;
+ static unsigned constexpr preempt_iocbs = 2;
+
+ auto aio_max_nr = read_first_line_as<unsigned>("/proc/sys/fs/aio-max-nr");
+ auto aio_nr = read_first_line_as<unsigned>("/proc/sys/fs/aio-nr");
+ auto available_aio = aio_max_nr - aio_nr;
+ auto requested_aio_network = network_iocbs * smp::count;
+ auto requested_aio_other = (storage_iocbs + preempt_iocbs) * smp::count;
+ auto requested_aio = requested_aio_network + requested_aio_other;
+ auto network_iocbs_old = network_iocbs;
+
+ if (available_aio < requested_aio) {
+ seastar_logger.warn("Requested AIO slots too large, please increase request capacity in /proc/sys/fs/aio-max-nr. available:{} requested:{}", available_aio, requested_aio);
+ if (available_aio >= requested_aio_other + smp::count) { // at least one queue for each shard
+ network_iocbs = (available_aio - requested_aio_other) / smp::count;
+ seastar_logger.warn("max-networking-io-control-blocks adjusted from {} to {}, since AIO slots are unavailable", network_iocbs_old, network_iocbs);
+ }
+ // don't print error on else condition here, since the program will
+ // terminate with AIO error message anyway.
+ }
+
+ return network_iocbs;
+}
+
void smp::configure(boost::program_options::variables_map configuration, reactor_config reactor_cfg)
{
#ifndef SEASTAR_NO_EXCEPTION_HACK
@@ -3885,7 +3912,7 @@ void smp::configure(boost::program_options::variables_map configuration, reactor

Benny Halevy

<bhalevy@scylladb.com>
Aug 5, 2021, 12:31:00 AM
to Takuya ASADA, seastar-dev
LGTM

Pekka Enberg

<penberg@scylladb.com>
Aug 9, 2021, 3:15:58 AM
to Takuya ASADA, Avi Kivity, Benny Halevy, seastar-dev
> + // don't print error on else condition here, since the program will
> + // terminate with AIO error message anyway.

I don't understand this comment. If we adjusted "network_iocbs", why
would the program terminate with AIO error?

> + }
> +
> + return network_iocbs;
> +}
> +
> void smp::configure(boost::program_options::variables_map configuration, reactor_config reactor_cfg)
> {
> #ifndef SEASTAR_NO_EXCEPTION_HACK
> @@ -3885,7 +3912,7 @@ void smp::configure(boost::program_options::variables_map configuration, reactor
> memory::set_dump_memory_diagnostics_on_alloc_failure_kind(configuration["dump-memory-diagnostics-on-alloc-failure-kind"].as<std::string>());
> }
>
> - reactor_cfg.max_networking_aio_io_control_blocks = configuration["max-networking-io-control-blocks"].as<unsigned>();
> + reactor_cfg.max_networking_aio_io_control_blocks = adjust_max_networking_aio_io_control_blocks(configuration["max-networking-io-control-blocks"].as<unsigned>());
>
> bool heapprof_enabled = configuration.count("heapprof");
> if (heapprof_enabled) {
> --
> 2.31.1
>

Benny Halevy

<bhalevy@scylladb.com>
Aug 9, 2021, 4:36:26 AM
to Pekka Enberg, Takuya ASADA, Avi Kivity, seastar-dev
IIUC, the else case is when we didn't adjust them,
so we expect to fail to allocate them.

Also, note that there is an unconditional warning above.

Takuya ASADA

<syuu@scylladb.com>
Aug 9, 2021, 7:02:51 AM
to Pekka Enberg, Avi Kivity, Benny Halevy, seastar-dev
My comment was probably unclear. If (available_aio >= requested_aio_other + smp::count), we can still adjust network_iocbs and continue running.
Otherwise, there are not enough AIO slots to adjust. That is the case the comment describes; the program will then soon terminate with:
"std::runtime_error(fmt::format("Could not setup Async I/O: {}. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application", msg))"

So I thought we didn't need to add an error message like "not enough request capacity in /proc/sys/fs/aio-max-nr", because it will show up anyway.

Pekka Enberg

<penberg@scylladb.com>
Aug 10, 2021, 6:15:08 AM
to Takuya ASADA, Avi Kivity, Benny Halevy, seastar-dev
Hi Takuya,
Right.

I think failing here explicitly is better because then users will see
exactly _one_ error/warning in the log that explicitly says what went
wrong. AIO setup can -- in theory -- fail for other reasons, which is
why the error message is so vague.

So what I propose is that the
adjust_max_networking_aio_io_control_blocks() function (1) adjusts
"network_iocbs" if it can (but warn()s about it) and (2) throws with
the "Requested AIO slots too large" error message if it cannot
adjust.

Benny, Avi, agree/disagree?

Regards,

- Pekka
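
The adjust-or-throw policy proposed above can be sketched as a pure function, separate from the code that reads /proc. This is only a sketch under stated assumptions: `fit_network_iocbs` and its parameters are hypothetical names, the warning is left to the caller, and the arithmetic is widened to 64 bits so the products cannot wrap. It is not the code from the patch.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-alone version of the policy discussed in this thread
// (not Seastar API). Returns the per-shard network iocb count to use, or
// throws when even one network iocb per shard does not fit.
unsigned fit_network_iocbs(unsigned requested_per_shard,
                           unsigned other_per_shard,   // storage + preempt iocbs
                           unsigned shards,
                           unsigned long long available) {
    // Widen before multiplying so large shard counts cannot overflow.
    unsigned long long other =
        static_cast<unsigned long long>(other_per_shard) * shards;
    unsigned long long wanted =
        other + static_cast<unsigned long long>(requested_per_shard) * shards;

    if (available >= wanted) {
        return requested_per_shard;          // everything fits, no adjustment
    }
    if (available >= other + shards) {       // at least one network iocb per shard
        // Caller is expected to warn that the value was lowered.
        return static_cast<unsigned>((available - other) / shards);
    }
    throw std::runtime_error(
        "Requested AIO slots too large: available " + std::to_string(available) +
        ", requested " + std::to_string(wanted));
}
```

With illustrative numbers (8 shards, 1026 non-network iocbs per shard, 10000 requested network iocbs per shard, 65536 slots available), the function returns 7166. With 64 shards and the same limit it throws, since the non-network iocbs alone exceed the available slots.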

Benny Halevy

<bhalevy@scylladb.com>
Aug 10, 2021, 7:00:08 AM
to Pekka Enberg, Takuya ASADA, Avi Kivity, seastar-dev
I'm OK with that (though we're probably splitting hairs here).

>
> Regards,
>
> - Pekka


Pekka Enberg

<penberg@scylladb.com>
Aug 10, 2021, 7:37:42 AM
to Benny Halevy, Takuya ASADA, Avi Kivity, seastar-dev
On Tue, Aug 10, 2021 at 2:00 PM Benny Halevy <bha...@scylladb.com> wrote:
> > Benny, Avi, agree/disagree?
>
> I'm OK with that (though we're probably splitting hairs here).

That's not my intention here.

Letting a program run if you know it will fail later is problematic
because the code could trip over for some other reason, obfuscating
the root cause. Having multiple log entries for the same issue is
also problematic, because you might miss some of the logs while
debugging.

- Pekka

Benny Halevy

<bhalevy@scylladb.com>
Aug 10, 2021, 9:30:16 AM
to Pekka Enberg, Takuya ASADA, Avi Kivity, seastar-dev
On Tue, 2021-08-10 at 14:37 +0300, Pekka Enberg wrote:
> On Tue, Aug 10, 2021 at 2:00 PM Benny Halevy <bha...@scylladb.com> wrote:
> > > Benny, Avi, agree/disagree?
> >
> > I'm OK with that (though we're probably splitting hairs here).
>
> That's not my intention here.
>
> Letting a program run if you know it will fail later is problematic
> because the code could trip over for some other reason, obfuscating
> the root cause.

That's right. Throwing if the function couldn't adjust network iocb's
is the right thing to do.

Takuya ASADA

<syuu@scylladb.com>
Aug 10, 2021, 2:51:15 PM
to Pekka Enberg, Avi Kivity, Benny Halevy, seastar-dev
Ah, makes sense.
Will send v3 to implement that.

Takuya ASADA

<syuu@scylladb.com>
Aug 12, 2021, 4:21:56 PM
to seastar-dev@googlegroups.com, Takuya ASADA
When fs.aio-max-nr does not have enough size, try to adjust
max_networking_aio_io_control_blocks size to fit fs.aio-max-nr.

See scylladb/scylla#9096

Signed-off-by: Takuya ASADA <sy...@scylladb.com>
---
include/seastar/core/smp.hh | 1 +
src/core/reactor.cc | 29 ++++++++++++++++++++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/seastar/core/smp.hh b/include/seastar/core/smp.hh
index 74898627..1f58a08d 100644
--- a/include/seastar/core/smp.hh
+++ b/include/seastar/core/smp.hh
@@ -445,6 +445,7 @@ class smp : public std::enable_shared_from_this<smp> {
void pin(unsigned cpu_id);
void allocate_reactor(unsigned id, reactor_backend_selector rbs, reactor_config cfg);
void create_thread(std::function<void ()> thread_loop);
+ unsigned adjust_max_networking_aio_io_control_blocks(unsigned network_iocbs);
public:
static unsigned count;
};
diff --git a/src/core/reactor.cc b/src/core/reactor.cc
index 2b6bd206..38bdb050 100644
+ } else {
+ throw std::runtime_error("Could not setup Async I/O: Not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application");
+ }

Pekka Enberg

<penberg@scylladb.com>
Aug 13, 2021, 6:20:53 AM
to Takuya ASADA, seastar-dev, Benny Halevy, Avi Kivity
On Thu, Aug 12, 2021 at 11:21 PM Takuya ASADA <sy...@scylladb.com> wrote:
> When fs.aio-max-nr does not have enough size, try to adjust
> max_networking_aio_io_control_blocks size to fit fs.aio-max-nr.
>
> See scylladb/scylla#9096
>
> Signed-off-by: Takuya ASADA <sy...@scylladb.com>

Looks good to me.

Reviewed-by: Pekka Enberg <pen...@scylladb.com>

Takuya ASADA

<syuu@scylladb.com>
Aug 18, 2021, 4:48:14 AM
to seastar-dev, Benny Halevy, Avi Kivity, Pekka Enberg
ping

Takuya ASADA

<syuu@scylladb.com>
Sep 12, 2021, 5:49:05 PM
to seastar-dev, Benny Halevy, Avi Kivity, Pekka Enberg
Ping. We need this to run Seastar apps (Scylla in particular) without the privilege to change fs.aio-max-nr.
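
For readers who want to check the limit on their own machine without changing it, here is a small sketch that reads the two /proc files named in the patch. The helpers `first_line_as_number` and `available_aio_slots` are hypothetical stand-ins, not Seastar's `read_first_line_as`. Reading these files normally needs no special privilege, though that is an assumption about the local setup.

```cpp
#include <fstream>
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse the first line of a stream as an unsigned
// number. Returns std::nullopt if the stream is empty or not numeric.
std::optional<unsigned long long> first_line_as_number(std::istream& in) {
    std::string line;
    if (!std::getline(in, line)) {
        return std::nullopt;
    }
    try {
        return std::stoull(line);
    } catch (const std::exception&) {
        return std::nullopt;
    }
}

// Hypothetical helper: system-wide AIO slots still free, computed as
// fs.aio-max-nr minus fs.aio-nr and clamped at zero.
std::optional<unsigned long long> available_aio_slots() {
    std::ifstream max_file("/proc/sys/fs/aio-max-nr");
    std::ifstream cur_file("/proc/sys/fs/aio-nr");
    auto max_nr = first_line_as_number(max_file);
    auto cur_nr = first_line_as_number(cur_file);
    if (!max_nr || !cur_nr) {
        return std::nullopt;   // file missing or unreadable (e.g. not Linux)
    }
    return *max_nr > *cur_nr ? *max_nr - *cur_nr : 0ULL;
}
```

The parser takes a `std::istream&` so it can be exercised with a `std::istringstream` instead of the real files.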

Commit Bot

<bot@cloudius-systems.com>
Sep 13, 2021, 5:28:37 AM
to seastar-dev@googlegroups.com, Takuya ASADA
From: Takuya ASADA <sy...@scylladb.com>
Committer: Nadav Har'El <n...@scylladb.com>
Branch: master

reactor: adjust max_networking_aio_io_control_blocks to lower size when fs.aio-max-nr is small

When fs.aio-max-nr does not have enough size, try to adjust
max_networking_aio_io_control_blocks size to fit fs.aio-max-nr.

See scylladb/scylla#9096

Signed-off-by: Takuya ASADA <sy...@scylladb.com>
Message-Id: <20210812202150...@scylladb.com>

---
diff --git a/include/seastar/core/smp.hh b/include/seastar/core/smp.hh
--- a/include/seastar/core/smp.hh
+++ b/include/seastar/core/smp.hh
@@ -445,6 +445,7 @@ private:
void pin(unsigned cpu_id);
void allocate_reactor(unsigned id, reactor_backend_selector rbs, reactor_config cfg);
void create_thread(std::function<void ()> thread_loop);
+ unsigned adjust_max_networking_aio_io_control_blocks(unsigned network_iocbs);
public:
static unsigned count;
};
diff --git a/src/core/reactor.cc b/src/core/reactor.cc
@@ -3886,7 +3913,7 @@ void smp::configure(boost::program_options::variables_map configuration, reactor