Thank you.
I answered below in-line.
I will do the remaining changes described below and also re-work it into
a series of smaller commits, so that it is easier to follow what each
step and change does.
Let me take a few days to do these updates and I will then send it as an
initial pull request for review.
All the async/task functions in nfs/iscsi are guaranteed to be non-blocking.
The socket processing in {iscsi|nfs}_service() is also guaranteed to
be non-blocking.
I will add comments to the ublksrv_epoll_add_fd() function stating that
it is very important that the registered callbacks are non-blocking.
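As a rough illustration of what such a non-blocking callback looks like
(the exact ublksrv_epoll_add_fd() signature and the nfs_sock_cb()/
nfs_register_sock() names below are just my sketch for the prototype;
only nfs_get_fd()/nfs_which_events()/nfs_service() are the real libnfs
calls):

#include <nfsc/libnfs.h>
#include "ublksrv.h"    /* struct ublksrv_queue */

/* Registered callback: must never block.  nfs_service() only performs
 * the socket I/O that is possible right now and then returns. */
static void nfs_sock_cb(int fd, int revents, void *priv)
{
        struct nfs_context *nfs = priv;

        nfs_service(nfs, revents);
}

/* Registration at target init time, using the new (prototype) helper. */
static void nfs_register_sock(struct ublksrv_queue *q, struct nfs_context *nfs)
{
        ublksrv_epoll_add_fd(q, nfs_get_fd(nfs), nfs_which_events(nfs),
                             nfs_sock_cb, nfs);
}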
>
> 2) same concern with ublkdrv_process_epollfd() which waits for 8 nfs
> socket events, during the period, no any ublk io command can be handled,
> so this extra wait has to be removed
I call epoll_wait() with a timeout of zero so it should return immediately.
In the epollfd callback I call {nfs|iscsi}_service() and it processes
all incoming/outgoing data for the socket. So if there are, for example,
multiple NFS responses to complete, the socket becomes readable once,
the registered callback is invoked once, nfs_service() is called once,
and from within that one call all pending responses are completed before
{nfs|iscsi}_service() returns.
I.e. a single call to _service() will process all pending completions.
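For reference, the epollfd processing boils down to something like this
(a simplified sketch of what ublkdrv_process_epollfd() does in the
prototype; struct ublk_epoll_cb is an illustrative per-fd record, not
the actual code):

#include <sys/epoll.h>

#define UBLK_EPOLL_MAX_EVENTS 8   /* 8 today; as noted below, 2 is probably enough */

/* Illustrative per-fd record stored in epoll_event.data.ptr. */
struct ublk_epoll_cb {
        int fd;
        void (*fn)(int fd, int revents, void *priv);
        void *priv;
};

/* Called from the queue's io_uring event loop; must never block. */
static void process_epollfd(int epollfd)
{
        struct epoll_event evs[UBLK_EPOLL_MAX_EVENTS];
        int n = epoll_wait(epollfd, evs, UBLK_EPOLL_MAX_EVENTS, 0); /* timeout 0 */

        for (int i = 0; i < n; i++) {
                struct ublk_epoll_cb *cb = evs[i].data.ptr;

                /* For the nfs target this ends up in nfs_service(),
                 * which completes every response already pending on
                 * the socket in this one call. */
                cb->fn(cb->fd, evs[i].events, cb->priv);
        }
}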
I set the number of events to 8 in preparation for having multiple file
descriptors to wait on, so that a single call to epoll_wait() can
retrieve all the file descriptors that have triggered.
8 is probably too high. Perhaps setting it to 2 is better.
If eventfd is migrated to use this new epollfd api, that would allow the
nfs socket and the eventfd to trigger simultaneously and both events to
be retrieved with a single call to epoll_wait().
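Concretely (still assuming the prototype's ublksrv_epoll_add_fd()
helper; evtfd and eventfd_cb are illustrative names for the queue's
eventfd and its callback), it would just be a second registration on
the same epollfd:

/* Sketch: the nfs socket and the eventfd share one epollfd, so a single
 * epoll_wait(epollfd, evs, 2, 0) can return both when both have
 * triggered at the same time. */
ublksrv_epoll_add_fd(q, nfs_get_fd(nfs), nfs_which_events(nfs),
                     nfs_sock_cb, nfs);
ublksrv_epoll_add_fd(q, evtfd, POLLIN, eventfd_cb, q);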
>
> 3) one extra epoll_ctl() is added in fast io path, will this way slow down
> the whole event loop?
In this prototype it always calls epoll_ctl() each time the callback has
been invoked.
I will change this in the final patch series to only call it when the
set of events we wait for changes.
Most of the time we write directly to the socket (non-blocking) and only
wait for POLLIN to process responses, but occasionally I need to change
it to POLLIN|POLLOUT when we have data to send to the socket but it is
not writable.
This should be very rare unless there are stalls in the tcp connection.
I suspect many other applications will only ever need POLLIN, so they
will never need to call epoll_ctl() from the hot path.
I will change it to keep track of what we were waiting on and only call
epoll_ctl() when we need to change the set of events.
I will also add comments to explain why it is important to minimize the
number of calls to epoll_ctl().
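A minimal sketch of that tracking (names are illustrative, not the
final code; nfs_which_events() is the real libnfs call):

#include <stdint.h>
#include <poll.h>
#include <sys/epoll.h>
#include <nfsc/libnfs.h>

/* Remember the event mask we last armed for the nfs socket and only
 * issue EPOLL_CTL_MOD when nfs_which_events() asks for something else.
 * It stays POLLIN almost all of the time, so the hot path usually makes
 * no epoll_ctl() syscall at all. */
static void update_nfs_events(int epollfd, int fd, struct nfs_context *nfs,
                              void *cb_ptr, uint32_t *armed)
{
        int want = nfs_which_events(nfs);   /* POLLIN or POLLIN|POLLOUT */
        uint32_t mask = (want & POLLIN ? EPOLLIN : 0) |
                        (want & POLLOUT ? EPOLLOUT : 0);

        if (mask == *armed)
                return;                     /* common case: nothing to do */

        struct epoll_event ev = { .events = mask, .data.ptr = cb_ptr };

        epoll_ctl(epollfd, EPOLL_CTL_MOD, fd, &ev);
        *armed = mask;
}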
>
> 4) will nfs/iscsi io submission/completion take much CPU? If yes, the
> single pthread may be saturated. If not, this approach looks pretty good.
No, it should take minimal CPU. It is mostly bound by memory speed and
the amount of data you can read/write between memory and the socket.