
[ANNOUNCE]: Generic SCSI Target Mid-level For Linux (SCST), target drivers for iSCSI and QLogic Fibre Channel cards released


Vladislav Bolkhovitin

Jul 8, 2008, 3:41:00 PM
to linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel
I'm glad to announce that version 1.0.0 of the Generic SCSI Target
Middle Level for Linux (SCST) has been released and is available for
download from http://scst.sourceforge.net/downloads.html

SCST is a subsystem of the Linux kernel that provides a standard
framework for SCSI target driver development. It is designed to provide
a unified, consistent interface between SCSI target drivers and the
Linux kernel and to simplify target driver development as much as
possible. It has the following main features:

* A simple, easy-to-use interface for target drivers (see the sketch
after this list). In particular, the SCST core performs the required
pre- and post-processing of incoming requests as well as the necessary
error recovery.

* Handles most problems related to execution contexts, practically
eliminating one of the most complicated problems in kernel driver
development. For example, a target driver for QLogic 22xx/23xx cards,
with all the necessary features, is only about 2000 lines of code.

* Very low overhead, fine-grained locking, and the simplest possible
command processing path, which allow maximum performance and
scalability. In particular, incoming requests can be processed in the
caller's context or in one of the SCST core's internal tasklets, so no
extra context switches are required.

* A device handler (i.e., plugin) architecture provides extra
flexibility by allowing various I/O modes in backstorage handling. For
example, the pass-through device handlers allow the use of real SCSI
hardware, and the vdisk device handler allows files to be used as
virtual disks.

* Advanced per-initiator device visibility management (LUN masking),
which allows different initiators to see different sets of devices with
different access permissions. For instance, initiator A could see
devices X and Y exported from target T as read-writable, while
initiator B could see device Y from the same target T as read-only and
device Z as read-writable.

* Emulates the necessary functionality of a SCSI host adapter, because
from a remote initiator's point of view SCST acts as a SCSI host with
its own devices.
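
To give a feel for the target driver interface, here is a rough sketch
of the glue a minimal driver registers with the core. The callback and
constant names follow the public SCST header (scst.h) as I understand
it; treat this as illustrative rather than compile-ready, and check the
header for the exact signatures:

/* Illustrative sketch of a minimal SCST target driver; names follow
 * the scst.h API but exact signatures may differ between releases. */
#include <linux/module.h>
#include <scst.h>

static int my_detect(struct scst_tgt_template *templ)
{
        /* Probe the hardware and call scst_register() once per port. */
        return 0;       /* number of targets detected */
}

static int my_release(struct scst_tgt *tgt)
{
        return 0;
}

static int my_xmit_response(struct scst_cmd *cmd)
{
        /* Send the data and/or status for cmd to the initiator; when
         * the transfer completes (here, immediately, as a placeholder),
         * notify the core: */
        scst_tgt_cmd_done(cmd);
        return SCST_TGT_RES_SUCCESS;
}

static int my_rdy_to_xfer(struct scst_cmd *cmd)
{
        /* For WRITEs: the core has allocated the buffers; fetch the
         * data from the initiator, then hand it back: */
        scst_rx_data(cmd, SCST_RX_STATUS_SUCCESS, SCST_CONTEXT_DIRECT);
        return SCST_TGT_RES_SUCCESS;
}

static struct scst_tgt_template my_template = {
        .name           = "my_tgt",
        .sg_tablesize   = 128,
        .detect         = my_detect,
        .release        = my_release,
        .xmit_response  = my_xmit_response,
        .rdy_to_xfer    = my_rdy_to_xfer,
};

static int __init my_init(void)
{
        return scst_register_target_template(&my_template);
}

static void __exit my_exit(void)
{
        scst_unregister_target_template(&my_template);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");

Everything else (command parsing, buffer allocation, error recovery,
task management) is done by the core.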

The following I/O modes are supported by SCST:

* Pass-through mode with a one-to-many relationship, i.e. multiple
initiators can connect to the exported pass-through devices, for
virtually all SCSI device types: disks (type 0), tapes (type 1),
processors (type 3), CD-ROMs (type 5), MO disks (type 7), medium
changers (type 8) and RAID controllers (type 0xC)

* FILEIO mode, which allows files on file systems or block devices to
be used as virtual, remotely available SCSI disks or CD-ROMs with the
benefits of the Linux page cache

* BLOCKIO mode, which performs direct block I/O with a block device,
bypassing the page cache for all operations. This mode works ideally
with high-end storage HBAs and for applications that either do not need
caching between the application and disk or need large-block
throughput. (The sketch after this list contrasts FILEIO and BLOCKIO.)

* User-space mode using the scst_user device handler, which allows
virtual SCSI devices to be implemented in user space within the SCST
environment.
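
To illustrate the difference between FILEIO and BLOCKIO, here is a
stand-alone user-space analogy (not SCST code): FILEIO corresponds to
ordinary buffered I/O through the page cache, BLOCKIO to O_DIRECT
against the device:

/* User-space analogy of the two vdisk I/O modes (not SCST code). */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096

int main(int argc, char **argv)
{
        void *buf;
        int fd_fileio, fd_blockio;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <scratch file or device>\n", argv[0]);
                return 1;
        }

        /* "FILEIO": buffered writes; the page cache absorbs and
         * coalesces the I/O. */
        fd_fileio = open(argv[1], O_RDWR);

        /* "BLOCKIO": O_DIRECT skips the page cache; buffers must be
         * aligned and I/O sized in multiples of the block size. */
        fd_blockio = open(argv[1], O_RDWR | O_DIRECT);

        if (fd_fileio < 0 || fd_blockio < 0 ||
            posix_memalign(&buf, BLK, BLK) != 0)
                return 1;

        memset(buf, 0, BLK);
        pwrite(fd_fileio, buf, BLK, 0);  /* may return before media write */
        fsync(fd_fileio);                /* force it out */
        pwrite(fd_blockio, buf, BLK, 0); /* goes straight to the device */

        close(fd_fileio);
        close(fd_blockio);
        free(buf);
        return 0;
}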

A detailed description of SCST, its drivers and utilities can be found
on the SCST home page: http://scst.sourceforge.net

A comparison with the mainstream target middle level, STGT, can be
found on the SCST vs. STGT page http://scst.sourceforge.net/scstvsstgt.html.
In short, SCST has the following main advantages over STGT:

- Better performance (in many cases by tens of percent or more), with
potential for further improvement, for example by implementing
zero-copy cache I/O.

- A monolithic in-kernel architecture, which follows the standard Linux
kernel paradigm and eliminates distributed processing. It is simpler,
hence more reliable and maintainable.

SCST is being prepared in the form of a patch set for review and
inclusion in the mainline kernel.

Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Nicholas A. Bellinger

Jul 8, 2008, 5:09:40 PM
to Vladislav Bolkhovitin, linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel, n...@kernel.org
Hi Vlad,

On Tue, 2008-07-08 at 23:14 +0400, Vladislav Bolkhovitin wrote:
> I'm glad to announce that version 1.0.0 of Generic SCSI Target Middle
> Level for Linux (SCST) was released and available for download from
> http://scst.sourceforge.net/downloads.html
>

Congratulations on reaching your v1.0.0 release!

I noticed that you included a reference to my presentation at LSF '08
on your SCST vs. STGT page linked above, and took my description of
your work (you are more than welcome to come and present your own case
at LSF '09) very much out of context.

If you wish to reference my presentation, please at least make the
comparison between LIO-Core+LIO-Target vs. SCST vs. STGT, and NOT JUST
SCST vs. STGT, so that the community at large can understand the
differences and technical challenges. The more infighting between the
leaders in our community, the less the community benefits.

Many thanks for your most valuable time,

--nab

> - Better performance (in many cases tens of %% and more) with
> potential for further improvement, for example, by implementing
> zero-copy cache I/O.
>
> - Monolithic in-kernel architecture, which follows standard Linux
> kernel paradigm to eliminate distributed processing. It is simpler,
> hence more reliable and maintainable.
>
> SCST is being prepared in form of patch for review and inclusion to the
> kernel.
>
> Vlad
> --


Vladislav Bolkhovitin

Jul 9, 2008, 7:18:43 AM
to Nicholas A. Bellinger, linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel, n...@kernel.org
Hi Nicholas,

Nicholas A. Bellinger wrote:
> Hi Vlad,
>
> On Tue, 2008-07-08 at 23:14 +0400, Vladislav Bolkhovitin wrote:
>> I'm glad to announce that version 1.0.0 of Generic SCSI Target Middle
>> Level for Linux (SCST) was released and available for download from
>> http://scst.sourceforge.net/downloads.html
>
> Congratulations on reaching your v1.0.0 release!

Thanks!

>> Comparison with the mainstream target middle level STGT you can find on
>> the SCST vs STGT page http://scst.sourceforge.net/scstvsstgt.html. In
>> short, SCST has the following main advantages over STGT:
>
> I noticed that you included a reference to my presentation at LSF 08' on
> your SCST vs. STGT page liked above, and took my description of your
> work (you are more than welcome to come and present your own case at LSF
> '09) very much out of context.

I wasn't at the presentation, so there it might have looked out of
context. I have only the documents I referenced, and in them,
especially in the "2008 Linux Storage & Filesystem Workshop" summary,
it doesn't look as if I took it out of context. You put the emphasis on
"older" vs. "current"/"new", didn't you ;)? Plus, Mike Christie is also
listed as an author.

BTW, there are other inaccuracies on your slides:

- STGT doesn't support "hardware accelerated traditional iSCSI
(Qlogic)", at least I have not found any signs of it.

- For SCST you wrote "Only project to support PSCSI, FC and SAS Target
mode (with out of tree hardware drivers)". This is an ambiguous
statement, but it looks like you meant that SCST is intended and
limited to supporting only the listed transports. This is incorrect.
SCST is intended to support ALL possible transports and types of
backstorage.

> If you wish to reference my presentation, please at least make the
> comparision between LIO-Core+LIO-Target vs. SCST vs. STGT, and NOT JUST
> SCST vs. STGT so that the community are large can understand the
> differences and technical challenges.

The SCST vs. STGT page was written long ago, before I ever looked at
LIO. I wasn't actually going to refer to your presentation; I just
added a small note about your, from my POV, funny "older" vs. "new"
architecture comparison ;)

But when I have time for a careful look, I'm going to write some LIO
criticism. So far, at first glance:

- It is too iSCSI-centric. iSCSI is a very special transport, so it
looks like when you decide to add LIO drivers for other transports,
especially for parallel SCSI and SAS, you are going to run into big
trouble and a major redesign. And this is a real showstopper for making
LIO-Core the default and only SCSI target framework. SCST is
SCSI-centric simply because there's no way to make a *SCSI* target
framework that isn't SCSI-centric. Nobody blames the Linux SCSI
(initiator) mid-layer for being SCSI-centric, correct?

- It seems a bit overcomplicated, because it has too many abstract
interfaces where there's not much need for them. Having too many
abstract interfaces makes code analysis a lot more complicated. For
comparison, SCST has only 2 such interfaces: one for target drivers and
one for backstorage dev handlers. Plus, there is a half-abstract
interface for the memory allocator (sgv_pool_set_allocator()) to allow
scst_user to use user-space-supplied pages. And they cover all needs.

- Pass-through mode (PSCSI) also provides only a non-enforced 1-to-1
relationship, as it used to be in STGT (support for pass-through mode
now seems to have been removed from STGT), which isn't mentioned
anywhere.

- There is some confusion in the code between persistent and SAM-2
reservations in the function and variable names.

- There is at least one SCSI standard violation: target and LUN resets
don't clear the reservation (a sketch of the required behavior follows
below).
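
For reference, the rule in question is easy to state in code. A minimal
sketch of what the reset path should do, with hypothetical names (this
is neither LIO nor SCST code):

/* Illustrative sketch of the SPC-2 rule in question: a LUN reset or
 * target reset must release an active RESERVE(6)/RESERVE(10)
 * reservation. */
#include <stddef.h>

struct vlun {
        /* ...device state... */
        const void *reserved_by; /* I_T nexus holding the reservation, or NULL */
};

static void handle_reset(struct vlun *lun)
{
        /* (take the LUN's reservation lock here) */

        /* SPC-2: hard resets release the non-persistent reservation,
         * no matter which initiator holds it. */
        lun->reserved_by = NULL;

        /* (drop the lock, abort outstanding tasks, post a unit
         * attention for the other initiators, etc.) */
}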

Again, this is a first impression, without deep analysis, so I might be
wrong somewhere.

> The more in fighting between the
> leaders in our community, the less the community benefits.

Sure. If my note hurts you, I can remove it. But you should also remove
those psychological arguments from your presentation and the summary
paper, so as not to confuse people.

> Many thanks for your most valuable of time,
>
> --nab

--

Nicholas A. Bellinger

Jul 9, 2008, 3:34:31 PM
to Vladislav Bolkhovitin, linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel, Linux-iSCSI.org Target Dev, n...@kernel.org
On Wed, 2008-07-09 at 15:19 +0400, Vladislav Bolkhovitin wrote:
> Hi Nicholas,
>
> Nicholas A. Bellinger wrote:
> > Hi Vlad,
> >
> > On Tue, 2008-07-08 at 23:14 +0400, Vladislav Bolkhovitin wrote:
> >> I'm glad to announce that version 1.0.0 of Generic SCSI Target Middle
> >> Level for Linux (SCST) was released and available for download from
> >> http://scst.sourceforge.net/downloads.html
> >
> > Congratulations on reaching your v1.0.0 release!
>
> Thanks!
>
> >> Comparison with the mainstream target middle level STGT you can find on
> >> the SCST vs STGT page http://scst.sourceforge.net/scstvsstgt.html. In
> >> short, SCST has the following main advantages over STGT:
> >
> > I noticed that you included a reference to my presentation at LSF 08' on
> > your SCST vs. STGT page liked above, and took my description of your
> > work (you are more than welcome to come and present your own case at LSF
> > '09) very much out of context.
>
> I wasn't on the presentation, so on it might have looked as out of
> context.

Neither was I, until I dropped the program chair (Chris Mason was our
gracious host this year) a note about some of the topics that I thought
would be useful to discuss in the storage transport track. As LSF is a
USENIX-sponsored event, this basically consisted of a bunch of notes
wrt Linux/iSCSI, LIO-Core+LIO-Target and what has now become (a lot of
things change in 5 months in the Linux/iSCSI world) the VHACS cloud.
Please have a look at Linux-iSCSI.org to see what VHACS and VHACS-VM
are. :-)


> I have only documents, which I referenced. In them, especially
> in "2008 Linux Storage & Filesystem Workshop" summary, it doesn't look
> as I took it out of context. You put emphasis on "older" vs
> "current"/"new", didn't you ;)?

Well, my job was to catch everyone up to speed on the status of the 4
(four) different (insert your favorite SAM-capable transport name here)
Linux v2.6 based target projects, with all of the acronyms for the
standards+implementations+linux-kernel being extremely confusing to
anyone who doesn't know them all by heart. Even for those people in the
room who were familiar with storage, but not necessarily with target
mode engine design, it's hard to follow. You will notice that in the
actual slide discussing the status of SCST, nothing about old vs. new
is mentioned.

> Plus, Mike Christie is also listed as an
> author.
>

Yes, the Honorable Mr. Christie was discussing SYSFS infrastructure that
we are going to need for a generic target mode engine.

> BTW, there are another inaccuracies on your slides:
>
> - STGT doesn't support "hardware accelerated traditional iSCSI
> (Qlogic)", at least I have not found any signs of it.
>

<nod>, that is correct. It does its hardware acceleration generically,
using OFA VERBS for hardware whose wire protocol implements
fabric-dependent direct data placement. iSER does this with RFCs
504[0-4], and I don't recall exactly how IB does it. Anyway, the point
is that they use a single interface so that hardware vendors do not
have to implement their own APIs, which are very complex, and usually
very buggy when coming from a company that is trying to get a design
into an ASIC.

> - For SCST you wrote "Only project to support PSCSI, FC and SAS Target
> mode (with out of tree hardware drivers)". This is ambiguous statement,
> but looks like you meant that SCST is intended and limited to support
> only the listed transport. This is incorrect. SCST is intended to
> support ALL possible transports and types backstorage.
>

Sure, my intention was to make the point about kernel-level drivers for
specialty engines being pushed out of tree because of the lack of a
kernel.org storage engine to which they can generically push packets.

> > If you wish to reference my presentation, please at least make the
> > comparision between LIO-Core+LIO-Target vs. SCST vs. STGT, and NOT JUST
> > SCST vs. STGT so that the community are large can understand the
> > differences and technical challenges.
>
> The SCST vs STGT page was written a long ago, before I ever looked at
> LIO. I wasn't actually going to refer to your presentation, just added a
> small note to your funny, from my POV, "older" vs "new" architecture
> comparison ;)
>

Heh. :-)

> But, when I have time for careful look, I'm going to write some LIO
> critics. So far, at the first glance:
>
> - It is too iSCSI-centric. ISCSI is a very special transport, so looks
> like when you decide to add in LIO drivers for other transports,
> especially for parallel SCSI and SAS, you are going to have big troubles
> and major redesign.

Not true. The LIO-Core subsystem API is battle-hardened (you could say
it is the 2nd oldest, behind UNH's :). It allocates LIO-Core SE tasks
(which then get issued to LIO-Core subsystem plugins) from a SCSI CDB
with sectors+offset for ICF_SCSI_DATA_SG_IO_CDB, from a generically
emulated SCSI control CDB or logic in LIO-Core, or by using
LIO-Core/PSCSI to let the underlying hardware do its thing, while still
filling in the holes so that *ANY* SCSI subsystem, including ones from
different OSes running in initiator mode, can talk with storage objects
behind LIO-Core across the possible fabrics. Some of the classic
examples here are:

*) The Solaris 10 SCSI subsystem requires all iSCSI devices to have
EVPD information, otherwise LUN registration fails. This means that
suddenly struct block_device and struct file need to have WWN
information, which may be DIFFERENT depending on whether said object is
a Linux/MD or LVM block device, for example. (See the EVPD sketch after
this list.)

*) Every cluster design that requires block-level shared storage needs
at least SAM-2 Reservations.

*) Exporting Hardware RAID adapters via LIO-Core on OSes where
max_sectors cannot be easily changed. This is because some Hardware
RAID requires a smaller struct scsi_device->max_sectors to handle the
smaller stripe sizes of their arrays.

*) Some adapters in drivers/scsi which are not REAL SCSI devices
emulate none or only some of the WWN or control logic mentioned above.
I have had to do a couple of hacks over the years in LIO-Core/PSCSI to
make everything play nice going to the client side of the cloud; check
out iscsi_target_pscsi.c:pscsi_transport_complete() to see what I mean.
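
As an illustration of the kind of hole-filling meant above, here is a
hypothetical helper (not LIO code) that builds an INQUIRY EVPD page
0x83 (Device Identification) response carrying an NAA WWN for a virtual
block device, following the standard SPC layout:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Build a minimal EVPD page 0x83 with one NAA designation descriptor.
 * Illustrative only; a real target fills in more descriptors. */
static int build_evpd_0x83(uint8_t *buf, size_t len, const uint8_t wwn[8])
{
        if (len < 16)
                return -1;
        buf[0] = 0x00;          /* peripheral qualifier/type: disk */
        buf[1] = 0x83;          /* page code */
        buf[2] = 0x00;
        buf[3] = 12;            /* page length: one descriptor */
        buf[4] = 0x01;          /* protocol id / code set: binary */
        buf[5] = 0x03;          /* association: LUN, designator: NAA */
        buf[6] = 0x00;          /* reserved */
        buf[7] = 8;             /* designator length */
        memcpy(&buf[8], wwn, 8);
        return 16;              /* bytes used */
}

The interesting part is not the encoding but where the WWN comes from:
for a struct block_device or struct file there is no hardware to ask,
so the engine has to synthesize and persist one.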

> And this is a real showstopper for making LIO-Core
> the default and the only SCSI target framework. SCST is SCSI-centric,

Well, one needs to understand that the LIO-Core subsystem API is more
than a SCSI target framework. It's a generic method of accessing any
possible storage object in the storage stack, with the engine handling
the hardware restrictions (be they physical or virtual) for the
underlying storage object. It can run as a SCSI engine against real (or
emulated) SCSI hardware from linux/drivers/scsi, but the real strength
is that it sits above the SCSI/BLOCK/FILE layers and uses a single
codepath for all underlying storage objects. For example, in the
lio-core-2.6.git tree I chose the location linux/drivers/lio-core,
because LIO-Core uses 'struct file' from fs/, 'struct block_device'
from block/ and 'struct scsi_device' from drivers/scsi.

It's worth noting that I am still doing the re-org of LIO-Core and
LIO-Target v3.0.0, but this will be coming soon, along with the first
non-traditional-iSCSI packets to run across LIO-Core.

> just because there's no way to make *SCSI* target framework not being
> SCSI-centric. Nobody blames Linux SCSI (initiator) mid-layer for being
> SCSI-centric, correct?

Well, as we have discussed before, the emulation of the SCSI control
path is really a whole different monster, and I am certainly not
interested in having to emulate all of the t10.org standards
myself. :-)

>
> - Seems, it's a bit overcomplicated, because it has too many abstract
> interfaces where there's not much need it them. Having too many abstract
> interfaces makes code analyze a lot more complicated. For comparison,
> SCST has only 2 such interfaces: for target drivers and for backstorage
> dev handlers. Plus, there is half-abstract interface for memory
> allocator (sgv_pool_set_allocator()) to allow scst_user to allocate user
> space supplied pages. And they cover all needs.
>

Well, I have discussed why I think the LIO-Core design (which was more
of a necessity at the start) has been able to work with all kernel
subsystems/storage objects on all architectures for v2.2, v2.4 and v2.6
kernels. I also mention these at the 10,000 ft level in my LSF '08
presentation.

> - Pass-through mode (PSCSI) also provides non-enforced 1-to-1
> relationship, as it used to be in STGT (now in STGT support for
> pass-through mode seems to be removed), which isn't mentioned anywhere.
>

Please be more specific about what you mean here. Also, note that
because PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the
limitations of the storage object through the LIO-Core subsystem API.
This means that things like (received initiator CDB sectors > LIO-Core
storage object max_sectors) are handled generically by LIO-Core, using
a single set of algorithms for all I/O interaction with Linux storage
systems. These algorithms are also the same for DIFFERENT types of
transport fabrics: both those that expect LIO-Core to allocate memory,
and those where the hardware has preallocated memory and possible
restrictions from the CPU/bus architecture (take non-cache-coherent
MIPS, for example) on how the memory gets DMA'ed or PIO'ed down to the
packet's intended storage object.

> - There is some confusion in the code in the function and variable
> names between persistent and SAM-2 reservations.
>

Well, that would be because persistent reservations are not yet
emulated generically for all of the subsystem plugins. Obviously, with
LIO-Core/PSCSI, if the underlying hardware supports them, they will
work.

> - There is at least one SCSI standard violation: target and LUN resets
> don't clear the reservation.
>

Noted. This needs to be fixed in v3.0.0 and then backported to
v2.9-STABLE.

> Again, it is the first impression, without deep analyze, so I might be
> wrong somewhere.
>
> > The more in fighting between the
> > leaders in our community, the less the community benefits.
>
> Sure. If my note hurts you, I can remove it. But you should also remove
> from your presentation and the summary paper those psychological
> arguments to not confuse people.
>

It's not about removing it; it's about updating the page to better
reflect the bigger picture, so that folks coming to the site can get
the latest information.

Vladislav Bolkhovitin

Jul 10, 2008, 2:25:34 PM
to Nicholas A. Bellinger, linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel, Linux-iSCSI.org Target Dev, n...@kernel.org
Nicholas A. Bellinger wrote:
>> I have only documents, which I referenced. In them, especially
>> in "2008 Linux Storage & Filesystem Workshop" summary, it doesn't look
>> as I took it out of context. You put emphasis on "older" vs
>> "current"/"new", didn't you ;)?
>
> Well, my job was to catch everyone up to speed on the status of the 4
> (four) different (insert your favorite SAM capable transport name here)
> Linux v2.6 based target projects. With all of the acroynms for the
> standards+implementations+linux-kernel being extremly confusing to
> anyone who does know all of them by heart. Even those people in the
> room, who where fimilar with storage, but not necessarly with target
> mode engine design, its hard to follow.

Yes, this is a problem. Even storage experts are not too familiar with
SCSI internals, and are not very willing to become more familiar with
them. Hence, almost nobody really understands what all that SCSI
processing in SCST is for.

>> BTW, there are another inaccuracies on your slides:
>>
>> - STGT doesn't support "hardware accelerated traditional iSCSI
>> (Qlogic)", at least I have not found any signs of it.
>>
>
> <nod>, that is correct. It does it's hardware acceleration generically
> using OFA VERBS for hardware that do wire protocol that implements
> fabric dependent direct data placement. iSER does this with 504[0-4],
> and I don't recall exactly how IB does it. Anyways, the point is that
> they use a single interface so that hardware vendors do not have to
> implement their own APIs, which are very complex, and usually very buggy
> when coming from a company who is trying to get a design into ASIC.

ISER is "iSCSI Extensions for RDMA", while usually under "hardware
accelerated traditional iSCSI" people mean regular hardware iSCSI cards,
like QLogic 4xxx. Hence, your sentence for most people, including
myself, was incorrect and confusing.

I meant something different: the interface between target drivers and
the SCSI target core. Here, it seems, you are going to have big trouble
when you try to add a non-iSCSI transport, like FC, for instance.

>> And this is a real showstopper for making LIO-Core
>> the default and the only SCSI target framework. SCST is SCSI-centric,
>
> Well, one needs to understand that LIO-Core subsystem API is more than a
> SCSI target framework. Its a generic method of accessing any possible
> storage object of the storage stack, and having said engine handle the
> hardware restrictions (be they physical or virtual) for the underlying
> storage object. It can run as a SCSI engine to real (or emualted) SCSI
> hardware from linux/drivers/scsi, but the real strength is that it sits
> above the SCSI/BLOCK/FILE layers and uses a single codepath for all
> underlying storage objects. For example in the lio-core-2.6.git tree, I
> chose the location linux/drivers/lio-core, because LIO-Core uses 'struct
> file' from fs/, 'struct block_device' from block/ and struct scsi_device
> from drivers/scsi.

SCST and iSCSI-SCST basically do the same things, except iSCSI MC/S and
related features, plus something more, like 1-to-many pass-through and
scst_user, which need big chunks of code, correct? And together they
are about 2 times smaller:

$ find core-iscsi/svn/trunk/target/target -type f -name "*.[ch]"|xargs wc
59764 163202 1625877 total
+
$ find core-iscsi/svn/trunk/target/include -type f -name "*.[ch]"|xargs wc
2981 9316 91930 total
=
62745 1717807 total

vs

$ find svn/trunk/scst -type f -name "*.[ch]"|xargs wc
28327 77878 734625 total
+
$ find svn/trunk/iscsi-scst/kernel -type f -name "*.[ch]"|xargs wc
7857 20394 194693 total
=
36184 929318 total

Or did I count incorrectly?

> Its worth to note that I am still doing the re-org of LIO-Core and
> LIO-Target v3.0.0, but this will be coming soon along with the first non
> traditional iSCSI packets to run across LIO-Core.
>
>> just because there's no way to make *SCSI* target framework not being
>> SCSI-centric. Nobody blames Linux SCSI (initiator) mid-layer for being
>> SCSI-centric, correct?
>
> Well, as we have discussed before, the emulation of the SCSI control
> path is really a whole different monster, and I am certainly not
> interested in having to emulate all of the t10.org standards
> myself. :-)

Sure, there are optional things. But there are also requirements, which
must be followed. So this isn't about being interested or not; it's
about either doing it properly or not doing it at all.

>> - Seems, it's a bit overcomplicated, because it has too many abstract
>> interfaces where there's not much need it them. Having too many abstract
>> interfaces makes code analyze a lot more complicated. For comparison,
>> SCST has only 2 such interfaces: for target drivers and for backstorage
>> dev handlers. Plus, there is half-abstract interface for memory
>> allocator (sgv_pool_set_allocator()) to allow scst_user to allocate user
>> space supplied pages. And they cover all needs.
>
> Well, I have discussed why I think the LIO-Core design (which was more
> neccessity at the start) has been able to work with for all kernel
> subsystems/storage objects on all architectures for v2.2, v2.4 and v2.6
> kernels. I also mention these at the 10,000 ft level in my LSF 08'
> pres.

Nobody in the Linux kernel community is interested in having code in
the kernel that is obsolete or unneeded for the current kernel version,
so if you want the LIO core to be in the kernel, you will have to do a
major cleanup.

Also, see the LIO vs. SCST size comparison above. Is the additional
code all about the obsolete/currently unneeded features?

>> - Pass-through mode (PSCSI) also provides non-enforced 1-to-1
>> relationship, as it used to be in STGT (now in STGT support for
>> pass-through mode seems to be removed), which isn't mentioned anywhere.
>>
>
> Please be more specific by what you mean here. Also, note that because
> PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the limitations
> of the storage object through the LIO-Core subsystem API. This means
> that things like (received initiator CDB sectors > LIO-Core storage
> object max_sectors) are handled generically by LIO-Core, using a single
> set of algoritims for all I/O interaction with Linux storage systems.
> These algoritims are also the same for DIFFERENT types of transport
> fabrics, both those that expect LIO-Core to allocate memory, OR that
> hardware will have preallocated memory and possible restrictions from
> the CPU/BUS architecture (take non-cache coherent MIPS for example) of
> how the memory gets DMA'ed or PIO'ed down to the packet's intended
> storage object.

See here:
http://www.mail-archive.com/linux...@vger.kernel.org/msg06911.html

>> - There is some confusion in the code in the function and variable
>> names between persistent and SAM-2 reservations.
>
> Well, that would be because persistent reservations are not emulated
> generally for all of the subsystem plugins just yet. Obviously with
> LIO-Core/PSCSI if the underlying hardware supports it, it will work.

What you did (passing reservation commands directly to devices and
nothing more) will work only with a single initiator per device, where
in the majority of cases reservations are not needed at all. With
multiple initiators, as in clusters, where reservations are really
needed, it will sooner or later lead to data corruption. See the
message referenced above as well as the whole thread. (A sketch of the
missing check follows.)
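
For the record, the missing piece is the per-I_T-nexus gate below (an
illustrative sketch with hypothetical names, not a quote from any of
the projects). Simply forwarding RESERVE to the device cannot provide
it, because the device sees only a single initiator: the target itself.

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define INQUIRY         0x12
#define REPORT_LUNS     0xa0

/* The gate a target must apply, per command, for SPC-2 reservations
 * to actually protect data with multiple initiators. */
static bool cmd_allowed(const void *reserved_by, const void *it_nexus,
                        uint8_t opcode)
{
        if (reserved_by == NULL || reserved_by == it_nexus)
                return true;
        /* A few opcodes are allowed through a conflicting reservation: */
        return opcode == INQUIRY || opcode == REPORT_LUNS;
        /* false => fail the command with RESERVATION CONFLICT status */
}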

>>> The more in fighting between the
>>> leaders in our community, the less the community benefits.
>> Sure. If my note hurts you, I can remove it. But you should also remove
>> from your presentation and the summary paper those psychological
>> arguments to not confuse people.
>>
>
> Its not about removing, it is about updating the page to better reflect
> the bigger picture so folks coming to the sight can get the latest
> information from last update.

Your suggestions?

Nicholas A. Bellinger

Jul 10, 2008, 5:26:44 PM
to Vladislav Bolkhovitin, linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel, Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman, H. Peter Anvin, Pete Wyckoff, Ming Zhang, Ross S. W. Walker, Rafiu Fakunle, Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig, Ted Ts'o, Jerome Martin
On Thu, 2008-07-10 at 22:25 +0400, Vladislav Bolkhovitin wrote:
> Nicholas A. Bellinger wrote:
> >> I have only documents, which I referenced. In them, especially
> >> in "2008 Linux Storage & Filesystem Workshop" summary, it doesn't look
> >> as I took it out of context. You put emphasis on "older" vs
> >> "current"/"new", didn't you ;)?
> >
> > Well, my job was to catch everyone up to speed on the status of the 4
> > (four) different (insert your favorite SAM capable transport name here)
> > Linux v2.6 based target projects. With all of the acroynms for the
> > standards+implementations+linux-kernel being extremly confusing to
> > anyone who does know all of them by heart. Even those people in the
> > room, who where fimilar with storage, but not necessarly with target
> > mode engine design, its hard to follow.
>
> Yes, this is a problem. Even storage experts are not too familiar with
> SCSI internals and not willing much to get better familiarity. Hence,
> almost nobody really understands for what is all those SCSI processing
> in SCST..
>

Which is why being specific when we talk about these many varied
subjects (see below), which all fit into the bigger picture we all want
to get to (see VHACS), is of utmost importance.

> >> BTW, there are another inaccuracies on your slides:
> >>
> >> - STGT doesn't support "hardware accelerated traditional iSCSI
> >> (Qlogic)", at least I have not found any signs of it.
> >>
> >
> > <nod>, that is correct. It does it's hardware acceleration generically
> > using OFA VERBS for hardware that do wire protocol that implements
> > fabric dependent direct data placement. iSER does this with 504[0-4],
> > and I don't recall exactly how IB does it. Anyways, the point is that
> > they use a single interface so that hardware vendors do not have to
> > implement their own APIs, which are very complex, and usually very buggy
> > when coming from a company who is trying to get a design into ASIC.
>
> ISER is "iSCSI Extensions for RDMA", while usually under "hardware
> accelerated traditional iSCSI" people mean regular hardware iSCSI cards,
> like QLogic 4xxx. Hence, your sentence for most people, including
> myself, was incorrect and confusing.


Yes, I know the difference between traditional TCP and Direct Data
Placement (DDP) on multiple fabric interconnects. (I own multiple
QLogic, Intel and Alacritech traditional iSCSI cards myself, and have
gotten them all to work with LIO at some point.) The point that I was
making is that OFA VERBS does it INDEPENDENT of the vendor actually
PRODUCING the card/chip/whatever. That means:

I) It makes the vendor's job easier producing silicon, because they
don't need to spend lots of extra engineering resources on producing the
types of APIs (VERBS, DAPL, MPI) that (some) cluster guys need for their
apps.

II) It allows other vendors who are also making hardware for the same
fabric to benefit from others using/building/changing the code.

III) It allows storage engine architects (like ourselves) to use a
single API (and, with OFA, a single codebase) to push DDP packets for
iSER (RFC-5045) to the engine.

Anyway, the point is that with traditional iSCSI hardware acceleration
there was never anything like that, because those implementations, most
notably TOE (yes, I also worked on TOE hardware at one point too :-),
were always considered a 'point in time' solution.

I know what you mean. The point that I am making is that LIO-Core <->
Subsystem and LIO-Target <-> LIO-Core are separated, for all intents
and purposes, in the lio-core-2.6.git tree.

Once the SCST Fabric <-> Engine interface can be hooked up to the
v3.0.0 LIO-Core (Engine) <-> Subsystem (Linux storage stack) interface,
we will be good to go to port ALL fabric plugins: from SCST, iSER from
STGT, and eventually _NON_-SCSI fabrics as well (think AoE and target
mode SATA).

> >> And this is a real showstopper for making LIO-Core
> >> the default and the only SCSI target framework. SCST is SCSI-centric,
> >
> > Well, one needs to understand that LIO-Core subsystem API is more than a
> > SCSI target framework. Its a generic method of accessing any possible
> > storage object of the storage stack, and having said engine handle the
> > hardware restrictions (be they physical or virtual) for the underlying
> > storage object. It can run as a SCSI engine to real (or emualted) SCSI
> > hardware from linux/drivers/scsi, but the real strength is that it sits
> > above the SCSI/BLOCK/FILE layers and uses a single codepath for all
> > underlying storage objects. For example in the lio-core-2.6.git tree, I
> > chose the location linux/drivers/lio-core, because LIO-Core uses 'struct
> > file' from fs/, 'struct block_device' from block/ and struct scsi_device
> > from drivers/scsi.
>
> SCST and iSCSI-SCST, basically, do the same things, except iSCSI MC/S
> and related, + something more, like 1-to-many pass-through and
> scst_user, which need a big chunks of code, correct? And they are
> together about 2 times smaller:
>

Yes, something much more. A complete implementation of traditional
iSCSI/TCP (known as RFC-3720), iSCSI/SCTP (which will be important in
the future), and IPv6 (also important) is a significant amount of logic.
When I say a 'complete implementation' I mean:

I) Active-Active connection layer recovery (known as
ErrorRecoveryLevel=2). (We are going to use the same code for iSER, for
inter-nexus, OS-independent (e.g. below the SCSI initiator level)
recovery.) Again, the important part here is that recovery and
outstanding-task migration happen transparently to the host OS SCSI
subsystem. This means (at least with iSCSI and iSER) not having to
register multiple LUNs and depend (at least completely) on SCSI WWN
information and OS-dependent SCSI-level multipath.

II) MC/S for multiplexing (same as I), as well as being able to
multiplex across multiple cards and subnets (using TCP; SCTP has
multi-homing). Also, being able to bring iSCSI connections up/down on
the fly, until we all have iSCSI/SCTP, is very important too.

III) Every possible combination of RFC-3720 defined parameter keys (and
providing the apparatus to prove it). And yes, anyone can do this today
against their own target. I created core-iscsi-dv specifically for
testing LIO-Target <-> LIO-Core back in 2005. Core-iSCSI-DV is the
_ONLY_ _PUBLIC_ RFC-3720 domain validation tool that will actually
demonstrate, using ANY data integrity tool, complete domain validation
of user-defined keys. Please have a look at:

http://linux-iscsi.org/index.php/Core-iscsi-dv

http://www.linux-iscsi.org/files/core-iscsi-dv/README

Any traditional iSCSI target mode implementation + storage engine +
subsystem plugin that thinks it's ready to go into the kernel will have
to pass at LEAST the 8k test loop iterations, the simplest being:

HeaderDigest, DataDigest, MaxRecvDataSegmentLength (512 -> 262144, in
512-byte increments)
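
(Back-of-the-envelope arithmetic, mine rather than a number from the
core-iscsi-dv documentation: that simplest key set alone already yields
a couple of thousand runs, as the toy program below counts, and the
remaining keys multiply from there.)

#include <stdio.h>

int main(void)
{
        unsigned long runs = 0;

        for (int hdr_digest = 0; hdr_digest < 2; hdr_digest++)     /* None, CRC32C */
                for (int data_digest = 0; data_digest < 2; data_digest++)
                        for (unsigned mrdsl = 512; mrdsl <= 262144; mrdsl += 512)
                                runs++;

        printf("%lu runs\n", runs);     /* 2 * 2 * 512 = 2048 */
        return 0;
}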

Core-iSCSI-DV is also a great indication of the stability and data
integrity of the hardware/software of an iSCSI target + engine,
especially when you have multiple core-iscsi-dv nodes hitting multiple
VHACS clouds on physical machines within the cluster. I have never run
IET against core-iscsi-dv personally, and I don't think Ming or Ross
has either. So until SOMEONE actually does this first, I think that
iSCSI-SCST is more of an experiment for your own devel than a strong
contender for Linux/iSCSI target mode.

<nod>

> >> - Seems, it's a bit overcomplicated, because it has too many abstract
> >> interfaces where there's not much need it them. Having too many abstract
> >> interfaces makes code analyze a lot more complicated. For comparison,
> >> SCST has only 2 such interfaces: for target drivers and for backstorage
> >> dev handlers. Plus, there is half-abstract interface for memory
> >> allocator (sgv_pool_set_allocator()) to allow scst_user to allocate user
> >> space supplied pages. And they cover all needs.
> >
> > Well, I have discussed why I think the LIO-Core design (which was more
> > neccessity at the start) has been able to work with for all kernel
> > subsystems/storage objects on all architectures for v2.2, v2.4 and v2.6
> > kernels. I also mention these at the 10,000 ft level in my LSF 08'
> > pres.
>
> Nobody in the Linux kernel community is interested to have obsolete or
> unneeded for the current kernel version code in the kernel, so if you
> want LIO core be in the kernel, you will have to make a major cleanup.
>

Obviously not. Also, what I was talking about there was the strength
and flexibility of the LIO-Core design (it even ran on the Playstation
2 at one point, http://linux-iscsi.org/index.php/Playstation2/iSCSI;
when the MIPS r5900 boots a modern v2.6, we will do it again with LIO
:-)

Anyway, just so everyone is clear:

v2.9-STABLE LIO-Target from Linux-iSCSI.org SVN: works on all modern
v2.6 kernels up to v2.6.26.

v3.0.0 LIO-Core in the lio-core-2.6.git tree on kernel.org: all legacy
code removed, currently at v2.6.26-rc9, tested on powerpc and x86.

Please look at my code before making such blanket statements.

> Also, see the above LIO vs SCST size comparison. Is the additional code
> all about the obsolete/currently unneeded features?
>
> >> - Pass-through mode (PSCSI) also provides non-enforced 1-to-1
> >> relationship, as it used to be in STGT (now in STGT support for
> >> pass-through mode seems to be removed), which isn't mentioned anywhere.
> >>
> >
> > Please be more specific by what you mean here. Also, note that because
> > PSCSI is an LIO-Core subsystem plugin, LIO-Core handles the limitations
> > of the storage object through the LIO-Core subsystem API. This means
> > that things like (received initiator CDB sectors > LIO-Core storage
> > object max_sectors) are handled generically by LIO-Core, using a single
> > set of algoritims for all I/O interaction with Linux storage systems.
> > These algoritims are also the same for DIFFERENT types of transport
> > fabrics, both those that expect LIO-Core to allocate memory, OR that
> > hardware will have preallocated memory and possible restrictions from
> > the CPU/BUS architecture (take non-cache coherent MIPS for example) of
> > how the memory gets DMA'ed or PIO'ed down to the packet's intended
> > storage object.
>
> See here:
> http://www.mail-archive.com/linux...@vger.kernel.org/msg06911.html
>

<nod>

> >> - There is some confusion in the code in the function and variable
> >> names between persistent and SAM-2 reservations.
> >
> > Well, that would be because persistent reservations are not emulated
> > generally for all of the subsystem plugins just yet. Obviously with
> > LIO-Core/PSCSI if the underlying hardware supports it, it will work.
>
> What you did (passing reservation commands directly to devices and
> nothing more) will work only with a single initiator per device, where
> reservations in the majority of cases are not needed at all.

I know, like I said, implementing Persistent Reservations for stuff
besides real SCSI hardware with LIO-Core/PSCSI is a TODO item. Note
that the VHACS cloud (see below) will need this for DRBD objects at some
point.

> With
> multiple initiators, as it is in clusters and where reservations are
> really needed, it will sooner or later lead to data corruption. See the
> referenced above message as well as the whole thread.
>

Obviously with any target, if a non-shared resource is accessed by
multiple initiator/client nodes and there is no data coherency layer,
reservations, ACLs, or whatever, there is going to be a problem. That
is a no-brainer.

Now, shared resources, such as a quorum disk for a traditional cluster
design or a cluster filesystem (such as OCFS2, GFS, Lustre, etc.),
handle data coherency just fine with SPC-2 Reserve today, with all
LIO-Core v2.9 and v3.0.0 storage objects from all subsystems.

> >>> The more in fighting between the
> >>> leaders in our community, the less the community benefits.
> >> Sure. If my note hurts you, I can remove it. But you should also remove
> >> from your presentation and the summary paper those psychological
> >> arguments to not confuse people.
> >>
> >
> > Its not about removing, it is about updating the page to better reflect
> > the bigger picture so folks coming to the sight can get the latest
> > information from last update.
>
> Your suggestions?
>

I would consider helping with this at some point, but as you can see, I
am extremely busy ATM. I have looked at SCST quite a bit over the
years, but I am not the one making a public comparison page, at least
not yet. :-) So until then, at least explain how there are 3 projects
on your page, with the updated 10,000 ft overviews, and maybe even add
some links to LIO-Target and a bit about the VHACS cloud. I would be
willing to include info about SCST in the Linux-iSCSI.org wiki. Also,
please feel free to open an account and start adding stuff about SCST
yourself to the site.

For Linux-iSCSI.org and VHACS (which is really where everything is going
now), please have a look at:

http://linux-iscsi.org/index.php/VHACS-VM
http://linux-iscsi.org/index.php/VHACS

Btw, the VHACS and LIO-Core design will allow other fabrics to be used
inside our cloud, and between other virtualized client setups that
speak the wire protocol presented by the server side of the VHACS
cloud.

Vladislav Bolkhovitin

Jul 11, 2008, 2:40:50 PM
to Nicholas A. Bellinger, linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel, Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman, H. Peter Anvin, Pete Wyckoff, Ming Zhang, Ross S. W. Walker, Rafiu Fakunle, Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig, Ted Ts'o, Jerome Martin

There are big doubts among storage experts as to whether features I and
II are needed at all; see, e.g., http://lkml.org/lkml/2008/2/5/331. I
also tend to agree that, for block storage in practice, MC/S is not
needed or, at least, definitely isn't worth the effort, because:

1. It is useless for sync untagged operation (regular reads, in most
cases, over a single stream), where there is always only one command
being executed at any time, because of command connection allegiance,
which forbids transferring data for a command over multiple
connections.

2. The only advantage it has over traditional OS multipathing is that
it keeps the command execution order, but in practice at the moment
there is no demand for this feature, because no OS I know of relies on
command order to protect data integrity. They use other techniques,
like queue draining. A good target should itself be able to schedule
incoming commands for execution in the order that is best from a
performance POV, and not rely on the order in which commands arrived
from initiators.

On the other side, device bonding also preserves command execution
order, but doesn't suffer from the connection allegiance limitation of
MC/S, so it can boost performance even for sync untagged operations.
Plus, it's pretty simple, easy to use and doesn't need any additional
code. I don't have exact numbers for an MC/S vs. bonding performance
comparison (mostly because open-iscsi doesn't support MC/S, but I am
very curious to see them), but I have a very strong suspicion that on
modern OSes, which reorder TCP frames in a zero-copy manner, there
shouldn't be much difference between MC/S and bonding in the maximum
possible throughput, but bonding should outperform MC/S by a lot for
sync untagged operations.

Anyway, I think features I and II, if added, would increase the
iSCSI-SCST kernel-side code by no more than 5K lines, because most of
the code is already there; the most important missing part is fixes for
locking problems, which almost never add a lot of code. Regarding
Core-iSCSI-DV, I'm sure iSCSI-SCST will pass it without problems for
the required set of iSCSI features, although there are still some
limitations derived from IET; for instance, support for multi-PDU
commands in discovery sessions isn't implemented. But for adding
optional iSCSI features to iSCSI-SCST there should be good *practical*
reasons, which at the moment don't exist. And unused features are bad
features, because they overcomplicate the code and make its maintenance
harder for no gain.

So, the current SCST+iSCSI-SCST 36K lines + 5K new lines = 41K lines,
which is still a lot less than LIO's 63K lines. I downloaded the
cleaned-up lio-core-2.6.git tree and:

$ find lio-core-2.6/drivers/lio-core -type f -name "*.[ch]"|xargs wc
57064 156617 1548344 total

Still much bigger.

> Obviously not. Also, what I was talking about there was the strength
> and flexibility of the LIO-Core design (it even ran on the Playstation 2
> at one point, http://linux-iscsi.org/index.php/Playstation2/iSCSI, when
> MIPS r5900 boots modern v2.6, then we will do it again with LIO :-)

SCST and the target drivers have successfully run on PPC and Sparc64,
so I don't see any reason why they can't run on the Playstation 2 as
well.

The problem is that persistent reservations don't work for multiple
initiators even for real SCSI hardware with LIO-Core/PSCSI, and I
clearly described why in the referenced e-mail. Nicholas, why don't you
want to see that?

Nicholas A. Bellinger

Jul 11, 2008, 11:28:35 PM
to Vladislav Bolkhovitin, linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel, Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman, H. Peter Anvin, Pete Wyckoff, Ming Zhang, Ross S. W. Walker, Rafiu Fakunle, Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig, Ted Ts'o, Jerome Martin

Ming or Ross, would you like to comment on this, considering that,
after all, it is your work..?

> So
> > until SOMEONE actually does this first, I think that iSCSI-SCST is more
> > of an experiment for your our devel that a strong contender for
> > Linux/iSCSI Target Mode.
>
> There are big doubts among storage experts if features I and II are
> needed at all, see, e.g. http://lkml.org/lkml/2008/2/5/331.

Well, jgarzik is both a NETWORKING and STORAGE (he was a networking guy
first, mind you) expert!

> I also tend
> to agree, that for block storage on practice MC/S is not needed or, at
> least, definitely doesn't worth the effort, because:
>

Trying to argue against MC/S (or against any other major part of
RFC-3720, including ERL=2) is saying that Linux/iSCSI should be BEHIND
what the greatest minds in the IETF have produced (and learned) from
iSCSI. Considering how many people are interested in seeing Linux/iSCSI
be the best and most complete implementation possible, surely one would
not be foolish enough to try to debate that Linux should be BEHIND what
others have figured out, be it with RFCs or running code.

Also, you should understand that MC/S is about more than just moving
data I/O across multiple TCP connections; it's about being able to
bring those paths up/down on the fly without having to actually
STOP/PAUSE anything. Then you add the ERL=2 pixie dust, which, you
should understand, is the result of over a decade of work creating
RFC-3720 within the IETF IPS TWG. What you have is a fabric that does
not STOP/PAUSE from an OS-INDEPENDENT (below the OS-dependent SCSI
subsystem layer) perspective, on every possible T/I node, big and
small, open or closed platform. Even as we move towards more logic in
the network layer (a la the Stream Control Transmission Protocol), we
will still benefit from RFC-3720 as the years roll on. Quite a powerful
thing.

> 1. It is useless for sync. untagged operation (regular reads in most
> cases over a single stream), when always there is only one command being
> executed at any time, because of the commands connection allegiance,
> which forbids transferring data for a command over multiple connections.
>

This is a very parallel-SCSI-centric way of looking at the design of
SAM, since SAM allows the transport fabric to enforce its own ordering
rules (it does offer some of its own SCSI-level ones, of course).
Obviously each fabric (PSCSI, FC, SAS, iSCSI) is very different from
the bus phase perspective. But if you look back into the history of
iSCSI, you will see that an asymmetric design with separate
CONTROL/DATA TCP connections was considered originally, BEFORE the
Command Sequence Number (CmdSN) ordering algorithm was adopted that
allows both SINGLE and MULTIPLE TCP connections to move both CONTROL
and DATA packets across an iSCSI nexus.

Using MC/S with a modern iSCSI implementation to take advantage of lots
of cores and hardware threads allows one to multiplex across multiple
vendors' NIC ports, with the least possible overhead, in an OS
INDEPENDENT manner. Keep in mind that you can do the allocation and RX
of WRITE data out of order, but the actual *EXECUTION* down via the
subsystem API (which is what LIO-Target <-> LIO-Core does, in a generic
way) MUST BE in the same order as the CDBs came from the iSCSI
initiator port. This is the only requirement of the iSCSI CmdSN
ordering rules wrt the SCSI Architecture Model. (A sketch of the rule
follows.)
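
The rule itself is compact in code; here is an illustrative sketch
(hypothetical names, not LIO's implementation) of out-of-order RX
feeding strictly in-order execution against a nexus:

#include <stdint.h>
#include <stddef.h>

#define CMDSN_WINDOW 256

struct cmd { uint32_t cmdsn; /* ... */ };

struct nexus {
        uint32_t exp_cmdsn;                /* next CmdSN to execute */
        struct cmd *pending[CMDSN_WINDOW]; /* out-of-order arrivals */
};

static void execute(struct cmd *c)
{
        (void)c;        /* hand the command to the storage engine */
}

/* Called for every command PDU, from whichever connection of the
 * session it arrived on. */
static void rx_cmd(struct nexus *n, struct cmd *c)
{
        n->pending[c->cmdsn % CMDSN_WINDOW] = c;

        /* Drain in order: execute exp_cmdsn, exp_cmdsn+1, ... as long
         * as they have arrived; later CmdSNs wait in the window. */
        for (;;) {
                struct cmd **slot = &n->pending[n->exp_cmdsn % CMDSN_WINDOW];
                if (*slot == NULL || (*slot)->cmdsn != n->exp_cmdsn)
                        break;
                execute(*slot);
                *slot = NULL;
                n->exp_cmdsn++;
        }
}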

> 2. The only advantage it has over traditional OS multi-pathing is
> keeping commands execution order, but on practice at the moment there is
> no demand for this feature, because all OS'es I know don't rely on
> commands order to protect data integrity. They use other techniques,
> like queue draining. A good target should be able itself to scheduler
> coming commands for execution in the correct from performance POV order
> and not rely for that on the commands order as they came from initiators.
>

OK, you are completely missing the point of MC/S and ERL=2. Notice how
it works in both iSCSI *AND* iSER (even across DDP fabrics!). I
discussed the significant benefit of ERL=2 in numerous previous
threads. But they can all be neatly summarized in:

http://linux-iscsi.org/builds/user/nab/Inter.vs.OuterNexus.Multiplexing.pdf

Inter-nexus multiplexing is DESIGNED to work transparently with
OS-dependent multipath, and as a matter of fact it complements it quite
well, in an OS-independent manner. It's completely up to the admin to
determine the benefit and configure the knobs.

So, the bit: "We should not implement this important part of the RFC
just because I want some code in the kernel" is not going to get your
design very far.

> From other side, devices bonding also preserves commands execution
> order, but doesn't suffer from the connection allegiance limitation of
> MC/S, so can boost performance ever for sync untagged operations. Plus,
> it's pretty simple, easy to use and doesn't need any additional code. I
> don't have the exact numbers of MC/S vs bonding performance comparison
> (mostly, because open-iscsi doesn't support MC/S, but very curious to
> see them), but have very strong suspicious that on modern OS'es, which
> do TCP frames reorder in zero-copy manner, there shouldn't be much
> performance difference between MC/S vs bonding in the maximum possible
> throughput, but bonding should outperform MC/S a lot in case of sync
> untagged operations.
>

Here is a simple case for you to get your feet wet with MC/S. Try doing
bonding across 4x 1 Gb/sec ports on a 2-socket, 2-core x86_64 box and
compare MC/S vs. OS-dependent network bonding and see what you find.
There are about two iSCSI initiators, across two OSes, that implement
MC/S against LIO-Target. Anyone interested in the CPU overhead of this
setup between MC/S and link-layer bonding across 2x dual-port 1 Gb/sec
chips on a 4-core x86_64..?

> Anyway, I think features I and II, if added, would increase iSCSI-SCST
> kernel side code not more than on 5K lines, because most of the code is
> already there, the most important part which missed is fixes of locking
> problems, which almost never add a lot of code.

You can think whatever you want. Why don't you have a look at
lio-core-2.6.git and see how big they are for yourself.

> Relating Core-iSCSI-DV,
> I'm sure iSCSI-SCST will pass it without problems among the required set
> of iSCSI features, although still there are some limitations, derived
> from IET, for instance, support for multu-PDU commands in discovery
> sessions, which isn't implemented. But for adding to iSCSI-SCST optional
> iSCSI features there should be good *practical* reasons, which at the
> moment don't exist. And unused features are bad features, because they
> overcomplicate the code and make its maintainance harder for no gain.
>

Again, you can think whatever you want. But since you did not implement
the majority of the iSCSI-SCST code yourself, (or implement your own
iSCSI Initiator in parallel with your own iSCSI Target), I do not
believe you are in a position to say. Any IET devs want to comment on
this..?

> So, current SCST+iSCSI-SCST 36K lines + 5K new lines = 41K lines, which
> still a lot less than LIO's 63K lines. I downloaded the cleanuped
> lio-core-2.6.git tree and:
>

Blindly comparing lines of code with no context is usually dumb. But
since that is what you seem to be stuck on, how about this:

LIO 63k +
SCST (minus iSCSI) ??k +
iSER from STGT ??k ==

for the complete LIO-Core engine on those fabrics, which includes what
Rafiu from Openfiler has been so kind as to call LIO-Target, "arguably
the most feature complete and mature implementation out there (on any
platform)".

> $ find lio-core-2.6/drivers/lio-core -type f -name "*.[ch]"|xargs wc
> 57064 156617 1548344 total
>
> Still much bigger.
>
> > Obviously not. Also, what I was talking about there was the strength
> > and flexibility of the LIO-Core design (it even ran on the Playstation 2
> > at one point, http://linux-iscsi.org/index.php/Playstation2/iSCSI, when
> > MIPS r5900 boots modern v2.6, then we will do it again with LIO :-)
>
> SCST and the target drivers have been successfully ran on PPC and
> Sparc64, so I don't see any reasons, why it can't be ran on Playstation
> 2 as well.
>

Oh it can, can it..? Does your engine's memory allocation algorithm
provide a SINGLE method for allocating linked-list scatterlists
containing page links of ANY size (not just PAGE_SIZE), handled
generically across both internal and preregistered memory allocation
cases, or coming from, say, a software RNIC moving DDP packets for
iSCSI, in a single code path..?

And then it needs to be able to go down to the PS2-Linux PATA driver,
which does not show up under the SCSI subsystem, mind you. Surely you
understand that because the MIPS r5900 is a non-cache-coherent
architecture, you simply cannot allocate multiple-page contiguous
scatterlists for your I/Os and expect it to work when we are sending
blocks down to the 32-bit MIPS r3000 IOP..?

Why don't you provide a reference in the code to where you think the
problem is, and/or a problem case using Linux iSCSI initiator VMs to
demonstrate the bug..?

New v0.8.15 VHACS-VM images online btw. Keep checking the site for more details.

Bart Van Assche

Jul 12, 2008, 3:53:08 AM
to Nicholas A. Bellinger, Vladislav Bolkhovitin, Andrew Morton, Rafiu Fakunle, linux...@vger.kernel.org, Jeff Garzik, David Miller, Mike Mazarick, Leonid Grossman, linux-...@vger.kernel.org, Pete Wyckoff, Ross S. W. Walker, scst-devel, Ted Ts'o, H. Peter Anvin, Jerome Martin, Christoph Hellwig, Linux-iSCSI.org Target Dev
On Sat, Jul 12, 2008 at 5:28 AM, Nicholas A. Bellinger
<n...@linux-iscsi.org> wrote:
> Using MC/S with a modern iSCSI implementation to take advantage of lots
> of cores and hardware threads is something that allows one to multiplex
> across multiple vendor's NIC ports, with the least possible overhead, in
> the OS INDEPENDENT manner.

Why do you emphasize MC/S so much? Any datacenter operator that has the
choice between MC/S and a faster network technology will choose the
latter, IMHO. It's much more convenient to use e.g. 10 GbE instead of
multiple 1 GbE connections.

Bart.

Ming Zhang

unread,
Jul 13, 2008, 2:47:31 PM7/13/08
to Nicholas A. Bellinger, Vladislav Bolkhovitin, linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel, Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman, H. Peter Anvin, Pete Wyckoff, Ross S. W. Walker, Rafiu Fakunle, Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig, Ted Ts'o, Jerome Martin

hot water here ;)

i never ran that test on iet, and probably nobody has. if someone
actually ran the test and found a failing case, i believe there are
people who would want to fix it.

why don't both of you write/reuse some test scripts to test the most
advanced/fastest target and let the numbers talk?

Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881
--------------------------------------------

Vladislav Bolkhovitin

unread,
Jul 14, 2008, 2:17:25 PM7/14/08
to Nicholas A. Bellinger, linux-...@vger.kernel.org, linux...@vger.kernel.org, scst-devel, Linux-iSCSI.org Target Dev, Jeff Garzik, Leonid Grossman, H. Peter Anvin, Pete Wyckoff, Ming Zhang, Ross S. W. Walker, Rafiu Fakunle, Mike Mazarick, Andrew Morton, David Miller, Christoph Hellwig, Ted Ts'o, Jerome Martin
Nicholas A. Bellinger wrote:
>> So
>>> until SOMEONE actually does this first, I think that iSCSI-SCST is more
>>> of an experiment for your own devel than a strong contender for
>>> Linux/iSCSI Target Mode.
>> There are big doubts among storage experts whether features I and II
>> are needed at all; see, e.g., http://lkml.org/lkml/2008/2/5/331.
>
> Well, jgarzik is both a NETWORKING and STORAGE (he was a networking guy
> first, mind you) expert!

Well, you can question Jeff Garzik's knowledge, but just look around. How
many OSes are there supporting MC/S at the initiator level? I know only
one: Windows. Neither Linux's mainline open-iscsi, nor xBSD, nor Solaris
supports MC/S as an initiator. Only your core-iscsi supports it, but you
abandoned its development in favor of open-iscsi, and I've heard there
are big problems running it on recent kernels.

Then, how many open source iSCSI targets are there supporting MC/S?
Neither xBSD nor Solaris has it. People simply prefer developing MPIO,
because there are other SCSI transports and they all need multipath as
well. Then, finally, if that multipath works well for, e.g., FC, why
wouldn't it work just as well for iSCSI?

>> I also tend
>> to agree that, in practice, MC/S is not needed for block storage or, at
>> least, definitely isn't worth the effort, because:
>
> Trying to argue against MC/S (or against any other major part of
> RFC-3720, including ERL=2) is saying that Linux/iSCSI should be BEHIND
> what the greatest minds in the IETF have produced (and learned) from
> iSCSI. Considering so many people are interested in seeing Linux/iSCSI
> be the best and most complete implementation possible, surely one would
> not be foolish enough to try to debate that Linux should be BEHIND what
> others have figured out, be it with RFCs or running code.

A rather psychological argument again. One more "older" vs "newer"? ;)

> Also, you should understand that MC/S is about more than just moving
> data I/O across multiple TCP connections; it's about being able to bring
> those paths up/down on the fly without having to actually STOP/PAUSE
> anything. Then you add the ERL=2 pixie dust, which, you should
> understand, is the result of over a decade of work creating RFC-3720
> within the IETF IPS TWG. What you have is a fabric that does not
> STOP/PAUSE, from an OS INDEPENDENT (below the OS dependent SCSI
> subsystem layer) perspective, on every possible T/I node, big and small,
> open or closed platform. Even as we move towards more logic in the
> network layer (a la Stream Control Transmission Protocol), we will still
> benefit from RFC-3720 as the years roll on. Quite a powerful thing..

Still not convinced that those are worth the effort, considering that
the OS has an MPIO implementation anyway.

To make your statements clearer, can you describe what *real life* tasks
the above is going to solve that can't be solved by MPIO?

>> 1. It is useless for sync untagged operation (regular reads in most
>> cases over a single stream), when there is always only one command
>> being executed at any time, because of command connection allegiance,
>> which forbids transferring data for one command over multiple connections.
>
> This is a very Parallel SCSI centric way of looking at the design of
> SAM, since SAM allows the transport fabric to enforce its own ordering
> rules (it does offer some of its own SCSI level ones, of course).
> Obviously each fabric (PSCSI, FC, SAS, iSCSI) is very different from the
> bus phase perspective. But, if you look back into the history of iSCSI,
> you will see that an asymmetric design with separate CONTROL/DATA TCP
> connections was considered originally BEFORE the Command Sequence Number
> (CmdSN) ordering algorithm was adopted that allows both SINGLE and
> MULTIPLE TCP connections to move both CONTROL/DATA packets across an
> iSCSI Nexus.

No, the above isn't a Parallel SCSI centric way of looking at it; it's a
practical one. All attempts to distribute commands between several cores
to get better performance are hopeless if there is always only one
command being executed at a time. In this case MC/S is useless and
brings nothing (if it doesn't make things worse because of the possible
overhead). Only bonding can improve throughput in this case, because it
can distribute the data transfers of those single commands over several
links, which MC/S can't do by design. And this scenario isn't rare. In
fact, it's the most common one. Just count the commands coming to your
target during single stream reads. This is why WRITEs almost always
greatly outperform READs.
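
To put illustrative numbers on this (hypothetical figures, just to show
the shape of the problem): at queue depth 1, with 64 KB read per command
and 0.5 ms of latency per round trip, throughput is capped at
64 KB / 0.5 ms = 128 MB/s no matter how many connections the session
has, because connection allegiance keeps each command's data on a single
connection. Bonding can stripe even that one 64 KB transfer across
links; MC/S by design cannot.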

> Using MC/S with a modern iSCSI implementation to take advantage of lots
> of cores and hardware threads is something that allows one to multiplex
> across multiple vendors' NIC ports, with the least possible overhead, in
> an OS INDEPENDENT manner. Keep in mind that you can do the allocation
> and RX of WRITE data OOO, but the actual *EXECUTION* down via the
> subsystem API (which is what LIO-Target <-> LIO-Core does, in a generic
> way) MUST BE in the same order as the CDBs came from the iSCSI Initiator
> port. This is the only requirement of the iSCSI CmdSN ordering rules wrt
> the SCSI Architecture Model.

Yes, I've already written that keeping command order across several
links is the only real advantage of MC/S. But can you name *practical*
uses of it in block storage?
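
For readers following along, the ordering rule under discussion can be
sketched roughly as follows -- a simplified illustration of RFC-3720
CmdSN ordering, not the actual LIO or iSCSI-SCST code; all names are
hypothetical, and real code must use RFC 1982 serial arithmetic to
handle CmdSN wraparound:

#include <linux/list.h>
#include <linux/types.h>

struct iscsi_sess {
        u32 exp_cmdsn;            /* next CmdSN allowed to execute */
        struct list_head pending; /* commands that arrived early */
};

struct iscsi_cmd {
        u32 cmdsn;
        struct list_head entry;
};

static void execute_cmd(struct iscsi_cmd *cmd)
{
        /* hand-off to the backend subsystem; stubbed for this sketch */
}

/* Called when a command has been fully received on ANY connection of
 * the session: PDUs may arrive out of order across connections, but
 * execution is released strictly in CmdSN order. */
static void cmdsn_receive(struct iscsi_sess *sess, struct iscsi_cmd *cmd)
{
        struct iscsi_cmd *cur, *tmp;

        if (cmd->cmdsn != sess->exp_cmdsn) {
                /* Early arrival: park it until its turn comes. */
                list_add_tail(&cmd->entry, &sess->pending);
                return;
        }

        execute_cmd(cmd);
        sess->exp_cmdsn++;
restart:
        /* Drain any parked commands that are now in order. */
        list_for_each_entry_safe(cur, tmp, &sess->pending, entry) {
                if (cur->cmdsn == sess->exp_cmdsn) {
                        list_del(&cur->entry);
                        execute_cmd(cur);
                        sess->exp_cmdsn++;
                        goto restart;
                }
        }
}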

>> 2. The only advantage it has over traditional OS multi-pathing is
>> keeping commands' execution order, but in practice there is currently
>> no demand for this feature, because no OS I know of relies on command
>> order to protect data integrity. They use other techniques, like queue
>> draining. A good target should itself be able to schedule incoming
>> commands for execution in the order that is correct from a performance
>> POV, and not rely on the order in which commands came from initiators.
>
> Ok, you are completely missing the point of MC/S and ERL=2. Notice how
> it works in both iSCSI *AND* iSER (even across DDP fabrics!). I
> discussed the significant benefit of ERL=2 in numerous previous
> threads. But they can all be neatly summarized in:
>
> http://linux-iscsi.org/builds/user/nab/Inter.vs.OuterNexus.Multiplexing.pdf
>
> Internexus Multiplexing is DESIGNED to work with OS dependent multipath
> transparently, and as a matter of fact, it complements it quite well, in
> an OS independent manner. It's completely up to the admin to determine
> the benefit and configure the knobs.

Nicholas, it seems you're missing the important point: Linux has
multipath *anyway*, and MC/S can't change that.

>> On the other hand, device bonding also preserves commands' execution
>> order, but doesn't suffer from the connection allegiance limitation of
>> MC/S, so it can boost performance even for sync untagged operations.
>> Plus, it's pretty simple, easy to use and doesn't need any additional
>> code. I don't have exact numbers for an MC/S vs bonding performance
>> comparison (mostly because open-iscsi doesn't support MC/S, but I'm
>> very curious to see them), but I have a very strong suspicion that on
>> modern OSes, which reorder TCP frames in a zero-copy manner, there
>> shouldn't be much difference between MC/S and bonding in the maximum
>> possible throughput, but bonding should outperform MC/S a lot in the
>> case of sync untagged operations.
>
> Simple case here for you to get your feet wet with MC/S. Try doing
> bonding across 4x GB/sec ports on a 2x socket, 2x core x86_64 and
> compare MC/S vs. OS dependent network bonding and see what you find.
> There are about two iSCSI initiators for two OSes that implement MC/S,
> and LIO-Target <-> LIO-Target. Anyone interested in the CPU overhead on
> this setup between MC/S and Link Layer bonding across 2x 2x 1 Gb/sec
> port chips on 4 core x86_64..?

I think everybody is interested to see those numbers. Do you have any?

>> Anyway, I think features I and II, if added, would increase the
>> iSCSI-SCST kernel side code by no more than 5K lines, because most of
>> the code is already there; the most important missing part is fixes for
>> locking problems, which almost never add a lot of code.
>
> You can think whatever you want. Why don't you have a look at
> lio-core-2.6.git and see how big they are for yourself.

By that estimate I would almost double the iSCSI-SCST in-kernel size
(currently it's 7.8K lines long).

>> Regarding Core-iSCSI-DV,
>> I'm sure iSCSI-SCST will pass it without problems among the required set
>> of iSCSI features, although there are still some limitations inherited
>> from IET; for instance, support for multi-PDU commands in discovery
>> sessions isn't implemented. But for adding optional iSCSI features to
>> iSCSI-SCST there should be good *practical* reasons, which at the
>> moment don't exist. And unused features are bad features, because they
>> overcomplicate the code and make its maintenance harder for no gain.
>
> Again, you can think whatever you want. But since you did not implement
> the majority of the iSCSI-SCST code yourself (or implement your own
> iSCSI Initiator in parallel with your own iSCSI Target), I do not
> believe you are in a position to say. Any IET devs want to comment on
> this..?

You already asked me not to make blanket statements. Can you not make
them yourself, please? I very much appreciate the work the IET
developers have done, but, in fact, I had to rewrite at least 70% of the
in-kernel part of IET because of many problems, starting from:

- Simple code quality issues, which made code auditing practically
impossible. For instance, struct iscsi_cmnd has a field pdu_list, which
is used in different parts of the code both as a list head and as a list
entry (see the sketch after this list). Now, how much time would you
need, at a random place in the code, to find out how it should be used
there, as a list entry or as a list head? And how big is the probability
of guessing wrongly? I suspect such issues are the main reason why
development of IET was frozen at some point. Looking at a patch touching
the corresponding code, it's simply impossible to tell whether it's
correct or not.

to more sophisticated problems like:

- a game of Russian roulette with VMware, mentioned here:
http://communities.vmware.com/thread/53797?tstart=0&start=15. BTW, the
LIO target isn't affected by that simply by accident, because of the
SCSI reset violation I already mentioned.
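
Here is a minimal sketch of that pdu_list ambiguity and one way to make
it unambiguous -- these are not IET's actual definitions, just an
illustration with hypothetical names for everything except pdu_list:

#include <linux/list.h>

struct iscsi_cmnd_ambiguous {
        struct list_head pdu_list;  /* head of this command's PDU list in
                                     * some code paths, an entry on another
                                     * command's list in others -- the
                                     * declaration doesn't say which. */
};

struct iscsi_cmnd_clear {
        struct list_head pdu_list;  /* head: PDUs belonging to this command */
        struct list_head pdu_entry; /* entry: our link on a parent's pdu_list */
};

One clearly named field per role makes every list_add() and
list_for_each_entry() call site self-documenting, which is exactly what
auditing a patch needs.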

I also had to considerably change the user space part, particularly the
iSCSI negotiation, because IET's interpretation of the iSCSI RFC forces
it to use very suboptimal values by default.

Now guess: was I able to do that without a sufficient understanding of
iSCSI, or not?

Actually, if I had known about the open source LIO iSCSI target
implementation, I would have chosen it, not IET, as the base. And then
we wouldn't have this point to discuss ;)

I described the problem in the referenced e-mail pretty well. Do you
have problems with reading and understanding it?
