Lack of batch PDU read in iscsi_tcp_service causes QEMU performance degradation (and more questions on TCP performance)


Chilledheart

Dec 6, 2016, 1:21:27 PM
to libiscsi
hi all,

The QEMU libiscsi driver is designed to call iscsi_tcp_service whenever a POLLIN event arrives. This works well most of the time, but when IOPS rise, the resulting storm of POLLIN events triggers vmexits very frequently. I looked into the libiscsi source code and, to my surprise, iscsi_tcp_service does not read all of the PDUs available in the socket, even though it writes out all of the PDUs in the outqueue.

In short, the VM's performance is limited to about 29K IOPS (4k randread, queue depth 128), while iscsi-perf shows the target is capable of 100K IOPS (whether run in the VM or on the host). When I hack the libiscsi driver to read as much as possible in the POLLIN event callback, performance rises to 80K, but that is still not ideal.
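For reference, here is a rough sketch of what I mean by "batch read": on a POLLIN event, keep reading from the non-blocking socket until the kernel reports EAGAIN, so a single callback drains every PDU that has already arrived. This only illustrates the idea and is not libiscsi's actual internals; drain_socket() and handle_pdu_bytes() are made-up names.

/* Sketch only: drain a non-blocking socket on POLLIN instead of reading
 * a single chunk. handle_pdu_bytes() stands in for feeding the received
 * bytes into the PDU state machine. */
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

static int drain_socket(int fd,
                        void (*handle_pdu_bytes)(const char *buf, ssize_t len))
{
        char buf[64 * 1024];

        for (;;) {
                ssize_t n = recv(fd, buf, sizeof(buf), 0);

                if (n > 0) {
                        /* More PDUs may already sit in the socket buffer,
                         * so parse what we got and keep reading. */
                        handle_pdu_bytes(buf, n);
                        continue;
                }
                if (n == 0)
                        return -1;      /* peer closed the connection */
                if (errno == EINTR)
                        continue;
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                        return 0;       /* drained; wait for the next POLLIN */
                return -1;              /* real socket error */
        }
}

With this pattern a single event callback services however many PDUs have already arrived, instead of one.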

PS: the QEMU libiscsi driver lacks the kind of queue handling the Linux AIO driver has; any suggestions or comments on that?
PS2: it seems QEMU's dataplane does nothing for network block drivers, doesn't it?
PS3: why use send() instead of writev() for sending the PDU header (BHS)? That means additional syscalls compared to writev() (sendto accounts for 16% of CPU in my case); a rough sketch of the difference follows below.
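To spell out PS3, here is a sketch (with placeholder buffer names, not libiscsi structures) of how a single writev() call could push the PDU header and its immediate data together, instead of a separate send() per segment:

/* Sketch: send an iSCSI PDU header plus immediate data with one writev()
 * instead of two send() calls. hdr/data are placeholders. */
#include <sys/uio.h>
#include <unistd.h>

static ssize_t send_pdu_once(int fd,
                             const void *hdr, size_t hdr_len,
                             const void *data, size_t data_len)
{
        struct iovec iov[2] = {
                { .iov_base = (void *)hdr,  .iov_len = hdr_len  },
                { .iov_base = (void *)data, .iov_len = data_len },
        };

        /* One syscall covers both segments; the caller still has to
         * handle short writes by resubmitting the remaining bytes. */
        return writev(fd, iov, data_len ? 2 : 1);
}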

Chilledheart

Dec 7, 2016, 3:48:05 AM
to libiscsi
Forgot to attach the patch (you can also find it on GitHub if you prefer).

And one more correction: the sendto cost is more than 60% of CPU usage, not 16%.
0001-Batch-pdu-read-in-function-iscsi_tcp_service.patch

ronnie sahlberg

Dec 9, 2016, 6:04:15 PM
to libiscsi


On Tuesday, 6 December 2016 10:21:27 UTC-8, Chilledheart wrote:
> hi all,
>
> The QEMU libiscsi driver is designed to call iscsi_tcp_service whenever a POLLIN event arrives. This works well most of the time, but when IOPS rise, the resulting storm of POLLIN events triggers vmexits very frequently. I looked into the libiscsi source code and, to my surprise, iscsi_tcp_service does not read all of the PDUs available in the socket, even though it writes out all of the PDUs in the outqueue.
>
> In short, the VM's performance is limited to about 29K IOPS (4k randread, queue depth 128), while iscsi-perf shows the target is capable of 100K IOPS (whether run in the VM or on the host). When I hack the libiscsi driver to read as much as possible in the POLLIN event callback, performance rises to 80K, but that is still not ideal.

Please send the patch as a pull request on github and I will have a look.

 

> PS: the QEMU libiscsi driver lacks the kind of queue handling the Linux AIO driver has; any suggestions or comments on that?
> PS2: it seems QEMU's dataplane does nothing for network block drivers, doesn't it?

These two questions are better asked on the qemu list.
 
> PS3: why use send() instead of writev() for sending the PDU header (BHS)? That means additional syscalls compared to writev() (sendto accounts for 16% of CPU in my case).
The library is already using writev() for sending.
However, for QEMU + libiscsi, if you need high performance, consider using iSER instead, as you get significantly better latency and throughput with RDMA offload.

Chilledheart

Dec 11, 2016, 10:01:34 AM
to libiscsi
Sure, the PR is available at https://github.com/sahlberg/libiscsi/pull/224.
>> PS: the QEMU libiscsi driver lacks the kind of queue handling the Linux AIO driver has; any suggestions or comments on that?
>> PS2: it seems QEMU's dataplane does nothing for network block drivers, doesn't it?

> These two questions are better asked on the qemu list.
I'll send more complete questions to the qemu list later.

> The library is already using writev() for sending.

Not quite; at least it is not true for the PDU header and immediate data.

> However, for QEMU + libiscsi, if you need high performance, consider using iSER instead, as you get significantly better latency and throughput with RDMA offload.
RDMA is another story: it requires additional hardware, and I hit various bugs between different vendors' cards when I was working on it. Anyway, I think we can get libiscsi over TCP up to 400K IOPS, and of course iSER would do double or triple that on a single core.

