tcm: reduce allocations and frees per I/O

1 view
Skip to first unread message

Joe Eykholt

unread,
Sep 2, 2010, 8:29:08 PM9/2/10
to LIO-Target devel
Suggested improvement for TCM 4.1 or so:

For each command we receive from the network, we currently allocate a command structure
in the fabric driver (e.g. tcm_fc allocates ft_cmd), and TCM allocates struct se_cmd.
Then there's an allocation for struct se_transport_task, one for a sense_buffer, and
perhaps one for iov_data. That's at least four allocs and frees per I/O.

It would be better if all of these structures were allocated together inside the
fabric driver's structure so that we have less allocation and freeing, so that's
faster, and fewer error paths to deal with.

It also means less pointer dereferencing, since taking the address of a sub-structure
is just an add instead of a load, and finding the fabric-specific structure
from se_cmd can use container_of() which eliminates the need for an
se_fabric_cmd_ptr, and turns into a subtract instead of a load.
These are the kinds of tiny details that I like to think about.

Co-allocation does take away a tiny bit of module independence but as long
as we're including the header files anyway, I think it's an OK sacrifice
for performance.

Joe

Nicholas A. Bellinger

unread,
Sep 2, 2010, 8:52:33 PM9/2/10
to linux-iscsi...@googlegroups.com
On Thu, 2010-09-02 at 17:29 -0700, Joe Eykholt wrote:
> Suggested improvement for TCM 4.1 or so:
>
> For each command we receive from the network, we currently allocate a command structure
> in the fabric driver (e.g. tcm_fc allocates ft_cmd), and TCM allocates struct se_cmd.
> Then there's an allocation for struct se_transport_task, one for a sense_buffer, and
> perhaps one for iov_data. That's at least four allocs and frees per I/O.
>
> It would be better if all of these structures were allocated together inside the
> fabric driver's structure so that we have less allocation and freeing, so that's
> faster, and fewer error paths to deal with.
>

Indeed, I think this cleans things up very nicely for the per I/O
allocation/free codepath, that unfortuately is showing a bit of it's age
as you have rightly pointed out..

> It also means less pointer dereferencing, since taking the address of a sub-structure
> is just an add instead of a load, and finding the fabric-specific structure
> from se_cmd can use container_of() which eliminates the need for an
> se_fabric_cmd_ptr, and turns into a subtract instead of a load.
> These are the kinds of tiny details that I like to think about.
>

I have to admit that this did cross my mind (and the thought of you
mentioning the point one day) while doing target_core_fabric_configfs.c.

> Co-allocation does take away a tiny bit of module independence but as long
> as we're including the header files anyway, I think it's an OK sacrifice
> for performance.
>

I completely agree on this point. Soo, there are a handful of cleanup
patches for tomorrow and some changes for TCM Core that will be going in
over the weekend related to HW target mode drivers for the v4.0.0-rc4
round. However I think this is item is important enough to have a look
at getting down to a single kmem_cache_zalloc() and kmem_cache_free()
per struct se_cmd I/O descriptor for the mainline merge.

I will have a look in the upcoming days and see how this could be
resolved in a manner that is acceptable with existing v4 stable+WIP
fabric_mods.

Really, many thanks for your invaluable input on this item 8-)

Best,

--nab

Reply all
Reply to author
Forward
0 new messages