iSCSI-SCST Storage Server Usermode Adaptation

dab2...@gmail.com

Apr 20, 2017, 3:05:06 PM
to esos-users
This may be of interest to ESOS developers.
-------------------------------------------

iSCSI-SCST Storage Server Usermode Adaptation
An adaptation of the iSCSI-SCST storage server software to run entirely in usermode on an unmodified kernel
David A. Butterfield

This paper describes an adaptation of the iSCSI-SCST storage server software to
run entirely in usermode on an unmodified Linux kernel; performance
measurements and model; and an experimental algorithm to improve performance
for small Read operations.

In a standard installation of SCST the iscsi-scstd daemon runs as a
single-threaded Linux usermode process that cooperates with the kernel-resident
SCST datapath implementation using ioctl(2) and netlink(7) for communication.

In the iSCSI-SCST Usermode Adaptation the iscsi-scstd daemon runs on the main
thread in a multi-threaded process in which other usermode threads are
concurrently providing the services and executing the SCST code that would be
running inside the kernel in a standard installation of SCST.

The iSCSI server executable program can run as a regular (non-super) user, as
long as it has permission to access the backing storage (file or block device).
Administration is done in the usual SCST way using scstadmin, which accesses
the running server program through a fuse-mounted filesystem implemented using
a shim to connect the SCST procfs calls with the fuse(8) filesystem API.

The subset of SCST used supports the iSCSI transport type and SCSI Block
Commands (vdisk).  It includes the SCST Core, the iSCSI daemon and kernel
logic, the vdisk device, and the /proc interface; comprising about 80,000 lines
of SCST source code.  To support running in usermode, around 55 (fifty-five)
lines of executable C code had to be added or changed in SCST source files.

For a single session over 1 Gb Ethernet being serviced by a single 2.4 GHz CPU:
the described Adaptive Nagle optimization improves peak throughput performance
for 512-Byte Random Read of /dev/zero from around 63,000 IOPS to more than
100,000 IOPS, with no adverse impact below Queue Depth 17.

Paper:  https://davidbutterfield.github.io/SCST-Usermode-Adaptation/docs/SCST_Usermode.html
Code:   https://github.com/DavidButterfield/SCST-Usermode-Adaptation

Marc Smith

Apr 20, 2017, 3:25:18 PM
to esos-...@googlegroups.com
Hi David,

I did see your posts on scst-devel and read through some of it. Yes,
we would like to include it in ESOS, but a couple of questions first
(forgive me, I haven't read through all of your project yet):
- Is there interest by Vlad/Bart in merging this into SCST?
- And I guess, can it co-exist with vanilla SCST?
- Or is it meant to fully replace the current SCST iSCSI target stack?
- You likely already know that ESOS is based around SCST, so what is
the simplest way for us to integrate this into ESOS without disrupting
what we already have?

Nice work!


--Marc

David Butterfield

Apr 21, 2017, 12:43:19 AM
to esos-...@googlegroups.com
> - Is there interest by Vlad/Bart in merging this into SCST?

I've been focusing first on getting the SCST bugfixes integrated, since those
would be expected to be less controversial than a port to usermode, and it
gives them a chance to evaluate whether I know what I'm talking about.  I'm
working my way up to the few "generic changes" that are needed to run in
usermode but that there is no reason to keep under #ifdef (e.g. using
sigaction(2) rather than signal(2) in the iscsi-scstd daemon, because under
usermode SCST the daemon runs in a multithreaded process).  After that can come
the conversation about the #ifdef changes, which are fairly non-invasive; but I
wanted to give them a little time to get comfortable with the reliability of my
simpler changes before starting that discussion.
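
The sigaction(2) point can be illustrated with a minimal sketch.  This is not
the code from the iscsi-scstd daemon; the function and variable names
(install_terminate_handler, stop_requested, and so on) are hypothetical.  It
only shows a handler installed with sigaction(2), whose behaviour in a
multithreaded process is specified, where signal(2) is documented as
unspecified in that case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag, set asynchronously when the signal arrives. */
static volatile sig_atomic_t stop_requested = 0;

/* Handler: do only async-signal-safe work, here just setting a flag. */
static void on_terminate(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Install on_terminate for signo using sigaction(2).
 * Returns 0 on success, -1 on error (errno set by sigaction). */
int install_terminate_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);       /* block no extra signals in the handler */
    sa.sa_flags = SA_RESTART;       /* restart interrupted system calls */
    return sigaction(signo, &sa, NULL);
}

/* Polled by the main loop to decide whether to shut down. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}
```

A main loop would call install_terminate_handler(SIGTERM) once at startup and
then check stop_was_requested() between units of work.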

In the meantime I've been tracking changes to the SCST master in some
branches of my repository (for now leaving the master branch with the original
stable version the performance measurements were done against).  Vlad has
already integrated over half of my bugfixes and I've rebased onto that.  (I just
saw a bunch more come in so I get to go do that again later.)  I'm also writing
code to make it work using an SCST sysfs compilation rather than the procfs
(obsolete in SCST) API it currently supports.
 
> - And I guess, can it co-exist with vanilla SCST?

If you mean the source code:  yes.  Only a tiny percentage of the C code is
changed for usermode under #ifdef, so the one source can be compiled either
way.  I have added a "usermode" subdirectory under the SCST top-level source
directory.  I've added some conditionals to Makefiles in addition to the C code.
At the top level I can either type the usual "make all", which gives *.ko, or
use "make usermode" and get usermode/scst.out (an ELF executable).  (The
master branch in my repository pre-dates this organization, though.)

If you mean both of them serving sessions concurrently on the same machine:
I've never actually run a kernel-resident SCST, but I don't see any reason why
not.  They would be two independent servers and one of them would have to be
on some other port (they can't both listen on TCP 3260), or just configure the
kernel-resident instance with iSCSI disabled.  But other than that it's just a
regular userspace server that listens on its assigned port number and provides
iSCSI service via regular socket calls.
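
The port constraint can be shown with a small sketch using ordinary socket
calls.  The helper names (listen_on_port, bound_port) are hypothetical and this
is not code from SCST; it only demonstrates that a second listener on a TCP
port that is already bound fails with EADDRINUSE, which is why one of the two
servers would need a different port.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on all interfaces at the given port
 * (port 0 lets the kernel pick a free one).
 * Returns the listening fd, or -1 with errno set. */
int listen_on_port(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        int saved = errno;          /* keep the bind/listen error */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Report the port a listening socket is actually bound to; 0 on error. */
unsigned short bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

With 3260 as the argument, the first server to call listen_on_port() gets the
port and the second call returns -1 with errno set to EADDRINUSE.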
 
> - Or is it meant to fully replace the current SCST iSCSI target stack?

It's really the same code (except for the #ifdef places).  The *.ko files from
a regular build for a kernel-resident SCST have undefined symbols such as
filp_open() and kthread_create() and mempool_alloc().  I've reimplemented
those symbols in userspace code so the original SCST source code can
compile and link against it.  (It isn't really quite that way because I did a lot
of it with the preprocessor, but that's the idea -- just build the SCST source
code with one set of headers and libraries for the kernel or the other set for
a usermode build.)  With just a few #ifdefs this should be able to track
ongoing improvements to SCST.
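
As a rough illustration of that idea, here is a minimal sketch of usermode
stand-ins for two of the kernel symbols named above.  It is not the actual
compatibility code: the struct layouts, the simplified signatures, and
open_and_alloc() are assumptions made for the example, and unlike the kernel's
filp_open() this version returns NULL rather than an ERR_PTR on failure.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical usermode stand-ins for a few kernel types. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u

struct file { int fd; };            /* usermode "struct file" wraps an fd */

typedef struct mempool {
    size_t elem_size;               /* size of each element handed out */
} mempool_t;

/* filp_open(): kernel-like call shape, implemented with open(2). */
static struct file *filp_open(const char *path, int flags, unsigned int mode)
{
    struct file *f = malloc(sizeof(*f));

    if (!f)
        return NULL;
    f->fd = open(path, flags, (mode_t)mode);
    if (f->fd < 0) {
        free(f);
        return NULL;
    }
    return f;
}

/* filp_close(): close the descriptor and release the wrapper. */
static int filp_close(struct file *f, void *owner)
{
    int rc = close(f->fd);

    (void)owner;
    free(f);
    return rc;
}

/* mempool calls folded into malloc/free by the preprocessor. */
#define mempool_alloc(pool, gfp)  malloc((pool)->elem_size)
#define mempool_free(elem, pool)  free(elem)

/* "Kernel-style" caller: written against the kernel names, but compiled
 * here against the usermode definitions above.  Returns 0 on success. */
int open_and_alloc(const char *path, mempool_t *pool)
{
    struct file *f;
    void *elem;

    assert(pool != NULL && pool->elem_size > 0);
    f = filp_open(path, O_RDONLY, 0);
    if (!f)
        return -1;
    elem = mempool_alloc(pool, GFP_KERNEL);
    if (elem)
        mempool_free(elem, pool);
    filp_close(f, NULL);
    return elem ? 0 : -1;
}
```

The caller open_and_alloc() contains no usermode-specific code; only the
definitions it is compiled against differ.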

> - You likely already know that ESOS is based around SCST, so what is
> the simplest way for us to integrate this into ESOS without disrupting
> what we already have?

I don't think I know enough about ESOS to answer that, but hopefully what I
mentioned above can help you determine that.  I would start by downloading
and building it and starting it up to make sure it works (I've only ever built it
on Ubuntu -- I have no reason to think it's broken elsewhere, but it's untested.)

Once it runs you can then evaluate its suitability for deployment in an ESOS
environment, and estimate any work to bring it to sufficient maturity for that.
Given the usermode SCST code is the same as the kernel-resident (modulo
a few #ifdefs), the quality of the SCSI/iSCSI usermode implementation should
be the same. You probably don't need to pay much attention there, because
you already run using SCST.  I think I'd be concerned mainly in two areas:

(1) Performance -- I have done all my testing on 1 Gb Ethernet; and under all
but the smallest couple of I/O sizes it can keep the network saturated with
a single 2.4 GHz CPU thread.  So for iSCSI over 1 Gb Ethernet I am confident
in the performance.  But I have not tested at higher network speeds.  That
would probably need to be tested for an enterprise product, and any discovered
performance issues investigated and addressed.

Keep in mind that this (at present, anyway) only supports plain iSCSI via
socket calls, and the vdisk (SBC) device type.  NO iSER, InfiniBand, or special
code for qla, Mellanox, etc.; NO "pass through" of SCSI commands to real SCSI
devices.  Also, it does not make use of the latest kernel performance features;
for example kthread_create_on_node() does exactly the same thing as
kthread_create(), both ending up at pthread_create(3).  Everything ends up
folded into calls to functions in manpage sections 2 and 3.  But again, at the
1 Gb speed it saturates the network anyway.
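
The kthread point can be sketched as follows.  This is an illustration under
stated assumptions, not the real compatibility layer: the task_struct layout,
the fixed (non-printf) name argument, kthread_join(), and demo_worker() are all
hypothetical, and the thread here starts running immediately, whereas a
faithful shim would hold it until wake_up_process().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical usermode task_struct: just enough to run and join a thread. */
struct task_struct {
    pthread_t tid;
    int (*threadfn)(void *);
    void *data;
    int exit_code;
};

/* Adapts the kernel-style int(*)(void *) to the pthread start signature. */
static void *kthread_trampoline(void *arg)
{
    struct task_struct *t = arg;

    t->exit_code = t->threadfn(t->data);
    return NULL;
}

/* kthread_create(): ends up at pthread_create(3).  Returns NULL on failure. */
struct task_struct *kthread_create(int (*threadfn)(void *), void *data,
                                   const char *name)
{
    struct task_struct *t;

    (void)name;                     /* thread naming omitted in this sketch */
    assert(threadfn != NULL);
    t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->threadfn = threadfn;
    t->data = data;
    if (pthread_create(&t->tid, NULL, kthread_trampoline, t) != 0) {
        free(t);
        return NULL;
    }
    return t;
}

/* The NUMA node hint is dropped: same behaviour as kthread_create(). */
#define kthread_create_on_node(fn, data, node, name) \
    kthread_create((fn), (data), (name))

/* Hypothetical helper: wait for the thread and return its exit code. */
int kthread_join(struct task_struct *t)
{
    int rc;

    pthread_join(t->tid, NULL);
    rc = t->exit_code;
    free(t);
    return rc;
}

/* Demo thread function: increments the int it is given. */
int demo_worker(void *data)
{
    *(int *)data += 1;
    return 0;
}
```

Because the node argument is discarded by the macro, both entry points produce
an ordinary pthread with no placement preference.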

On the other hand with the iSCSI server out in userspace it becomes easier
to try things like implementing its backing storage with direct calls to librados.
So I don't think the dust has settled on the kernel vs. usermode performance,
but with usermode SCST it is now possible to get very comparable datapoints
between the two.

(2) Review of the code that replaces the kernel functions depended on by the
SCST implementation.  I'm the only one who has ever looked at it, so it would
be prudent for other engineers to review it.  The added code is in three
very decoupled parts (SCST Usermode Adaptation, UMC, MTE), each in a separate
repository; and the best reviewer for each part may be different:

  - SCST Usermode Adaptation has the original SCST base, my #ifdef
    changes and Makefile changes, and two additional source files providing
    interfaces between SCST code and the rest of the usermode environment.
    This code knows how to initialize both SCST and UMC and connect them
    together.  This code would best be reviewed by someone familiar with SCST.

  - Usermode Compatibility (UMC) is mostly a .h file #defining a whole bunch
    of symbols found in the kernel; this .h file gets included (gcc -include) at
    the start of each SCST kernel .c file when compiling to run in usermode.
    This code does not know about SCST -- it just emulates kernel functions,
    using calls to glibc and other libraries, including MTE.  This code would best
    be reviewed by someone familiar with the semantics and behavior of Linux
    kernel internal interfaces.  I consider this UMC part of the code to be the
    part most in need of review.

  - Multithreaded Engine (MTE) is an event-driven infrastructure providing
    threading and event-loop services, memory allocation (to outperform
    malloc when multithreaded), decent stacktraces, etc.  It does not know
    about SCST or UMC, it just provides infrastructure services for event-
    driven programs which UMC consumes.  For example UMC implements
    simulated "SIRQ" threads by using the MTE event loop.  This part of the
    code would best be reviewed by someone familiar with epoll_wait(2) and
    event-driven programming.
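
For reviewers less familiar with the event-driven style mentioned for MTE, here
is a minimal sketch of an epoll-based dispatch loop.  It is not MTE code: the
ev_handler record and the ev_loop_* and drain_eventfd names are hypothetical,
and a real engine would add timers, handler removal, and per-thread loops.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical handler record: callback run when fd becomes readable. */
struct ev_handler {
    int fd;
    void (*fn)(int fd, void *arg);
    void *arg;
};

/* Create the epoll instance backing the loop; returns its fd or -1. */
int ev_loop_create(void)
{
    return epoll_create1(0);
}

/* Register a handler; it must stay valid while it is registered. */
int ev_loop_add(int epfd, struct ev_handler *h)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = h };

    assert(h != NULL && h->fn != NULL);
    return epoll_ctl(epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* Wait up to timeout_ms and dispatch every ready handler once.
 * Returns the number of handlers dispatched, or -1 on error. */
int ev_loop_run_once(int epfd, int timeout_ms)
{
    struct epoll_event ready[16];
    int n = epoll_wait(epfd, ready, 16, timeout_ms);

    for (int i = 0; i < n; i++) {
        struct ev_handler *h = ready[i].data.ptr;
        h->fn(h->fd, h->arg);
    }
    return n;
}

/* Demo callback: drain an eventfd and add its counter to *arg. */
void drain_eventfd(int fd, void *arg)
{
    uint64_t count = 0;

    if (read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
        *(uint64_t *)arg += count;
}
```

A worker thread would call ev_loop_run_once() in a loop; other threads post
work by writing to a registered eventfd(2) descriptor.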

This diagram shows the relationship between the three components just mentioned:
https://davidbutterfield.github.io/SCST-Usermode-Adaptation/docs/SCST_usermode_service_map.pdf

Finally, in the non-SCST source code I have written "XXX" comments in some
places for future attention.  The more X's, the more concern.  Comments with
three X's can largely be ignored; more than three indicate possible real issues.
I have tried to mark anywhere I noticed something that might cause trouble in
a port, or a future enhancement elsewhere in the code, or whatever.  There are
probably some where I was unsure exactly what something was supposed to do,
and left a question to be checked again later.  These should all be reviewed,
at least to the point of checking they have the right severity levels assigned.

Regards,
David Butterfield

David Butterfield

May 13, 2017, 12:31:14 AM
to esos-...@googlegroups.com
Since my earlier message I've written a little interface module to drive
tcmu-runner backstore handlers from the Usermode-SCST block I/O layer.

So far I've run it with the tcmu-runner/rbd.c Ceph handler and with a little
"ramdisk" handler I wrote to have a fast device at a strategic point for
performance testing (the point of crossover to backstore client modules).

I have also built it with QEMU/qcow and Gluster/glfs, but I haven't tried
running those (not installed). I did manage to get myself running a little
1-node Ceph "cluster" so I could test using rbd.c.  There's a nice
diagram of all this on this PDF page:

https://github.com/DavidButterfield/SCST-Usermode-Adaptation/blob/scstu_tcmu/usermode/scstu_tcmur.pdf