Q: "- PDU header Digest" feature

Ulrich Windl

Feb 25, 2009, 5:39:42 AM

to open-...@googlegroups.com

Hello,

when browsing the open-iscsi feature list, I found:
- PDU header Digest;

Does this mean that data digests are not supported? A bugzilla at Red Hat from around
mid-2007 seems to confirm this.

I see the performance impact, but is there another reason against implementing it?
Can I safely activate it on the target, or will it cause problems?

Regards,
Ulrich

Konrad Rzeszutek

Feb 25, 2009, 8:55:32 AM

to open-...@googlegroups.com

On Wed, Feb 25, 2009 at 11:39:42AM +0100, Ulrich Windl wrote:
>
> Hello,
>
> when browsing the open-iscsi feature list, I found:
> - PDU header Digest;
>
> Does this mean that data digests are not supported? A bugzilla at Red Hat near mid

I am quite sure it is supported.

> of 2007 seems to confirm this.

Could you be more specific about the bugzilla number? Is the bugzilla in question
accessible to you?

>
> I see the performance impact, but is there another reason against implementing it?

None. It should be implemented.

> Can I safely activate it on the target, or will it cause problems?

You can activate it on the target. If you see problems, please do report them.

Ulrich Windl

Feb 25, 2009, 10:02:07 AM

to open-...@googlegroups.com

On 25 Feb 2009 at 8:55, Konrad Rzeszutek wrote:

>
> On Wed, Feb 25, 2009 at 11:39:42AM +0100, Ulrich Windl wrote:
> >
> > Hello,
> >
> > when browsing the open-iscsi feature list, I found:
> > - PDU header Digest;
> >
> > Does this mean that data digests are not supported? A bugzilla at Red Hat near mid
>
> I am quite sure it is supported.
>
> > of 2007 seems to confirm this.
>
> Could you be more specific about the bugzilla number? Is the bugzilla in question
> accessible to you?

Hi,

I found it with Google: bugzilla.redhat.com, bug 245792, comment #6

Regards,
Ulrich

Mike Christie

Feb 25, 2009, 12:38:35 PM

to open-...@googlegroups.com

Ulrich Windl wrote:
> Hello,
>
> when browsing the open-iscsi feature list, I found:
> - PDU header Digest;
>

Is that from SUSE's docs or open-iscsi.org's?

> Does this mean that data digests are not supported? A bugzilla at Red Hat near mid
> of 2007 seems to confirm this.

Data digests were working, but when upstream did the scatterlist changes
to the kernel it broke data digests. We have not found the cause yet.

For Red Hat, they do not support them for different reasons depending on
the version and arch. For example in RHEL5, the big-endian crypto digest
code is busted. It needs a fix from upstream, and I think in general
there are still some other bugs in the digest code.

>
> I see the performance impact, but is there another reason against implementing it?
> Can I safely activate it on the target, or will it cause problems?
>

Another reason a lot of distros do not support it is a common
problem we always hit: users will write out some data, then start
modifying it again. But the kernel will normally not do a sync write
when you do a write. So once the write() returns, the kernel is still
sending it through the caches, block, scsi, and iscsi layers. If you are
writing to the data while it is working its way through the iscsi
layers, the iscsi layer could have done the digest calculation, then you
could modify it, and now when the target checks it the digest check will
fail. And so this happens over and over and you get digest errors all
over the place and the iscsi layers fire their error handling and retry
and retry, and in the end they just say forget it and do not support
data digests.
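
A minimal user-space sketch of the race described above, in Python. It only imitates the ordering of events (digest computed, buffer modified, digest checked); it is not kernel or open-iscsi code. The names page, digest_at_send and digest_at_target are made up for the illustration. iSCSI digests are CRC32C, so a small bitwise CRC32C is included.

```python
def crc32c(data) -> int:
    """Bitwise CRC32C (Castagnoli), the checksum iSCSI uses for its digests."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# One 4 KiB "page" that the application and the simulated iSCSI layer share
# by reference (no copy is made).
page = bytearray(b"A" * 4096)

# 1. The iSCSI layer computes the data digest for the queued write.
digest_at_send = crc32c(page)

# 2. write() has long returned; the application changes one byte of the
#    same buffer before the bytes have actually left the host.
page[0] = ord("B")

# 3. The target recomputes the digest over what really arrived.
digest_at_target = crc32c(page)

digest_ok = digest_at_send == digest_at_target
print("data digest check passed:", digest_ok)
```

Nothing was corrupted on the wire in this sketch; the mismatch comes purely from the buffer changing between the digest calculation and the transmission.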

Ulrich Windl

Feb 26, 2009, 2:13:38 AM

to open-...@googlegroups.com

On 25 Feb 2009 at 11:38, Mike Christie wrote:

>
> Ulrich Windl wrote:
> > Hello,
> >
> > when browsing the open-iscsi feature list, I found:
> > - PDU header Digest;
> >
>
> Is that from SUSE's docs or open-iscsi.org's?

Hi Mike,

it was from the README you had sent me recently. Inside it's dated "Mar 14, 2008".

>
>
> > Does this mean that data digests are not supported? A bugzilla at Red Hat near mid
> > of 2007 seems to confirm this.
>
> Data digests were working but when upstream did the scatterlist changes
> to the kernel it broke data digests. We have not found the cause yet.

Hmm: I thought it's all about network packets: You put SCSI commands into network
packets, and then checksum the data in the packet. When receiving, you check the
packet, then extract the SCSI commands. I don't see where the SCSI layer could
invalidate network packets here. But I really don't have deep insight.

>
> For Red Hat, they do not support them for different reasons depending on
> the version and arch. For example in RHEL5, the big-endian crypto digest
> code is busted. It needs a fix from upstream, and I think in general
> there are still some other bugs in the digest code.

OK, naively I'd assume that if digests are enabled, but broken, nothing will work
(all packets rejected), and the syslog should complain a lot. Am I right about
this?

>
>
>
> >
> > I see the performance impact, but is there another reason against implementing it?
> > Can I safely activate it on the target, or will it cause problems?
> >
>
> Another reason a lot of distros do not support it is a common
> problem we always hit: users will write out some data, then start
> modifying it again. But the kernel will normally not do a sync write
> when you do a write. So once the write() returns, the kernel is still
> sending it through the caches, block, scsi, and iscsi layers. If you are
> writing to the data while it is working its way through the iscsi
> layers, the iscsi layer could have done the digest calculation, then you
> could modify it, and now when the target checks it the digest check will
> fail. And so this happens over and over and you get digest errors all
> over the place and the iscsi layers fire their error handling and retry
> and retry, and in the end they just say forget it and do not support
> data digests.

So the information flow seems to be a bit different from what I was expecting. I
thought the "physical write" is when the kernel issues the SCSI write command to
the HBA (which should be iSCSI here). iSCSI would then build a network packet from
the SCSI command and transfer it, optionally with digests attached.
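
The flow described above can be sketched as a toy framing in Python: a header followed by its digest, then the data followed by its digest, with the receiver checking both. This is a sketch under stated assumptions, not the real iSCSI wire format: the 48-byte header length mirrors the iSCSI basic header segment and the digests are CRC32C, but the byte order, the absence of padding and the helper names build_pdu and check_pdu are invented for the illustration.

```python
import struct

HEADER_LEN = 48  # size of the fixed header in this toy format


def crc32c(data) -> int:
    """Bitwise CRC32C (Castagnoli)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_pdu(header: bytes, data: bytes) -> bytes:
    """Hypothetical helper: header + header digest + data + data digest."""
    if len(header) != HEADER_LEN:
        raise ValueError("header must be exactly 48 bytes")
    return (header + struct.pack("<I", crc32c(header))
            + data + struct.pack("<I", crc32c(data)))


def check_pdu(pdu: bytes):
    """Hypothetical helper: return (header_digest_ok, data_digest_ok)."""
    header = pdu[:HEADER_LEN]
    (header_digest,) = struct.unpack_from("<I", pdu, HEADER_LEN)
    data = pdu[HEADER_LEN + 4:-4]
    (data_digest,) = struct.unpack_from("<I", pdu, len(pdu) - 4)
    return crc32c(header) == header_digest, crc32c(data) == data_digest


pdu = build_pdu(bytes(HEADER_LEN), b"SCSI write payload")
intact_result = check_pdu(pdu)

# Flip one bit in the data part only: the header digest still verifies,
# the data digest does not.
damaged = bytearray(pdu)
damaged[HEADER_LEN + 4] ^= 0x01
damaged_result = check_pdu(bytes(damaged))
```

The two digests are independent, which is why a feature list can mention header digests without saying anything about data digests.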

Regards,
Ulrich

Konrad Rzeszutek

Feb 26, 2009, 9:18:05 AM

to open-...@googlegroups.com

> > Another reason a lot of distros do not support it is a common
> > problem we always hit: users will write out some data, then start
> > modifying it again. But the kernel will normally not do a sync write
> > when you do a write. So once the write() returns, the kernel is still
> > sending it through the caches, block, scsi, and iscsi layers. If you are
> > writing to the data while it is working its way through the iscsi
> > layers, the iscsi layer could have done the digest calculation, then you
> > could modify it, and now when the target checks it the digest check will
> > fail. And so this happens over and over and you get digest errors all
> > over the place and the iscsi layers fire their error handling and retry
> > and retry, and in the end they just say forget it and do not support
> > data digests.
>
> So the information flow seems to be a bit different from what I was expecting. I
> thought the "physical write" is when the kernel issues the SCSI write command to
> the HBA (which should be iSCSI here). iSCSI would then build a network packet from

It does build it, but the page that holds the data is not copied: it is passed
along as-is to the TCP layer, which then passes it on to the NIC. This is the zero-copy
path, which "assembles" the network packet from its components on the fly
(i.e., the TCP header is on this page, the data is on the next page) instead of
copying the data from iSCSI and then assembling everything on one page.
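
As a rough user-space analogy for the "assemble on the fly" idea, the Python sketch below hands a header and a payload to the socket layer as two separate buffers with gather I/O (socket.sendmsg, POSIX only) instead of first concatenating them into one new buffer. It assumes a local socketpair as the transport and does not reproduce the kernel's zero-copy path; the buffer names are made up.

```python
import socket

header = b"H" * 48                  # stands in for the protocol header
payload = bytearray(b"D" * 512)     # stands in for the page holding the data

left, right = socket.socketpair()
with left, right:
    # Gather I/O: both buffers are passed by reference; no combined
    # header+payload buffer is built in user space.
    sent = left.sendmsg([header, memoryview(payload)])

    received = b""
    while len(received) < sent:
        received += right.recv(4096)
```

Because the payload is only referenced until it is consumed, anything that changes it before that point changes what goes out on the wire.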

Mike, please correct me if I am wrong.

> the SCSI command and transfer it, optionally with digests attached.

Your original e-mail was asking about two different types of digests: the PDU header
and the data. The PDU header digest I am pretty sure is working. I will double-check
today, though.

Vladislav Bolkhovitin

Feb 26, 2009, 2:54:57 PM

to open-...@googlegroups.com, scst-devel

Mike Christie, on 02/25/2009 08:38 PM wrote:
> Another reason a lot of distros do not support it is a common
> problem we always hit: users will write out some data, then start
> modifying it again. But the kernel will normally not do a sync write
> when you do a write. So once the write() returns, the kernel is still
> sending it through the caches, block, scsi, and iscsi layers. If you are
> writing to the data while it is working its way through the iscsi
> layers, the iscsi layer could have done the digest calculation, then you
> could modify it, and now when the target checks it the digest check will
> fail. And so this happens over and over and you get digest errors all
> over the place and the iscsi layers fire their error handling and retry
> and retry, and in the end they just say forget it and do not support
> data digests.

During testing of iSCSI-SCST with data digests enabled and the open-iscsi
initiator, I've regularly, about once every several hours, seen data digest errors.
I was going to investigate it, but had no time. Now I know the reason.

Thanks for the explanation!
Vlad

Boaz Harrosh

Mar 5, 2009, 4:17:11 AM

to open-...@googlegroups.com, linux-fsdevel, linux-scsi, Mike Christie

Hi Mike, list.

Mike Christie has pointed out a serious problem for us, which we need
the list's help with.

It started with a question by Ulrich Windl of why data-digests are
not supported/recommended by open-iscsi installations and distros.

[iSCSI data digests: the data segment of each iSCSI PDU, for both reads and
writes, is protected by a CRC32C checksum]

Mike Christie wrote:
> Ulrich Windl wrote:
>>
>

> Data digests were working, but when upstream did the scatterlist changes
> to the kernel it broke data digests. We have not found the cause yet.
>
> For Red Hat, they do not support them for different reasons depending on
> the version and arch. For example in RHEL5, the big-endian crypto digest
> code is busted. It needs a fix from upstream, and I think in general
> there are still some other bugs in the digest code.
>
>> I see the performance impact, but is there another reason against implementing it?
>> Can I safely activate it on the target, or will it cause problems?
>>
>
> Another reason a lot of distros do not support it is a common
> problem we always hit: users will write out some data, then start
> modifying it again. But the kernel will normally not do a sync write
> when you do a write. So once the write() returns, the kernel is still
> sending it through the caches, block, scsi, and iscsi layers. If you are
> writing to the data while it is working its way through the iscsi
> layers, the iscsi layer could have done the digest calculation, then you
> could modify it, and now when the target checks it the digest check will
> fail. And so this happens over and over and you get digest errors all
> over the place and the iscsi layers fire their error handling and retry
> and retry, and in the end they just say forget it and do not support
> data digests.
>

Mike, if what you said in the last paragraph is true, about the FS modifying the data
while the request is in flight, then it does not explain your statement above
about things getting worse around the scatterlist changes.

The way I see it there can be three possible problems:
1. The FS is permitted to modify (or sinfully modifies) pages of memory while a request to
write these pages is already in flight. fsdevel guys might want to comment on that?
Mike, have you observed these problems with a particular file system?
I can anticipate such a problem arising with memory-mapped IO while a page-cache
write-back is in progress. Is that so? Is Linux not safe in this regard? If so,
how do DM & MD do their RAID parity calculations? Do they copy the data?

2. iSCSI releases the request too soon, before all the data was actually used up by the
network stack, and is allowing the FS to continue modifying these pages.
This is a serious problem, which means that there can be crashes and data corruption even
if data digests are not used.
Actually, not long ago we moved from copying network data to being completely copy-less.
Could that be the point in time things stopped working?

3. Plain coding bug, but I could not find any.

I know from the past that data digests are a great tool for finding bugs that otherwise can
go undetected; it happened to me several times. All of these cases revealed a flaw
in the code, due to rebasing, things changing, or plain programmer bugs.

Mike, I'm running a plain iscsi initiator-target setup and the regression tests here, and it
runs. What setup and tests did you run to trigger these digest retries? I would like to
reproduce this here and investigate.

Thanks for any help
Boaz

Mike Christie

Mar 5, 2009, 12:42:48 PM

to open-...@googlegroups.com, linux-fsdevel, linux-scsi

They are two separate issues.

Around the time of the scatterlist changes I would get an oops in the
digest calculation code (when we call into the crypto callouts); in
newer kernels the oops went away and now I get data digest errors.
I am still trying to narrow down the commit and line, and to work out
whether the oops was really fixed, whether it merely turned into a
digest error, or whether I am hitting a real digest error.

The second issue is that we normally do zero copy for writes. I do not
think it is an FS bug or a net bug or a bug in the iscsi layer. Maybe more of
a bug in what the user expects (who reads the man page for write() to
check if the data is committed to disk when write() returns?). We
discussed this a couple of times. For open-iscsi we tried to close the gap
by not doing zero copy writes when data digests are used. And a long,
long time ago this was discussed for linux-iscsi, and I think that is
one of the reasons we added DID_IMM_RETRY to the scsi layer (we can then
avoid the 5 retry limit in this case and retry until it is resolved).
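
Why giving up zero copy helps can be sketched in user-space Python: if the digest is computed over a private snapshot and the snapshot is what gets sent, a later change to the original buffer can no longer invalidate the digest. This only illustrates the idea; it is not how open-iscsi implements it, and the variable names are invented.

```python
def crc32c(data) -> int:
    """Bitwise CRC32C (Castagnoli)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


page = bytearray(b"A" * 4096)

# Zero-copy path: digest computed over a view of the live page.
zero_copy_view = memoryview(page)
zero_copy_digest = crc32c(zero_copy_view)

# Copy path: take a private snapshot, then digest (and later send) the snapshot.
snapshot = bytes(page)
copy_digest = crc32c(snapshot)

# The application rewrites the page while the I/O is still in flight.
page[0] = ord("B")

# What each path would put on the wire, checked against its digest.
zero_copy_ok = crc32c(zero_copy_view) == zero_copy_digest
copy_ok = crc32c(snapshot) == copy_digest
print("zero copy:", zero_copy_ok, "copy:", copy_ok)
```

The price is one extra copy of every written page, on top of the digest calculation itself.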

>
> The way I see it there can be three possible problems:
> 1. The FS is permitted to modify (or sinfully modifies) pages of memory while a request to
> write these pages is already in flight. fsdevel guys might want to comment on that?
> Mike, have you observed these problems with a particular file system?
> I can anticipate such a problem arising with memory-mapped IO while a page-cache
> write-back is in progress. Is that so? Is Linux not safe in this regard? If so,
> how do DM & MD do their RAID parity calculations? Do they copy the data?
>
> 2. iSCSI releases the request too soon, before all the data was actually used up by the
> network stack, and is allowing the FS to continue modifying these pages.
> This is a serious problem, which means that there can be crashes and data corruption even
> if data digests are not used.
> Actually, not long ago we moved from copying network data to being completely copy-less.
> Could that be the point in time things stopped working?
>
> 3. Plain coding bug, but I could not find any.
>
> I know from the past that data digests are a great tool for finding bugs that otherwise can
> go undetected; it happened to me several times. All of these cases revealed a flaw
> in the code, due to rebasing, things changing, or plain programmer bugs.
>
> Mike, I'm running a plain iscsi initiator-target setup and the regression tests here, and it
> runs. What setup and tests did you run to trigger these digest retries? I would like to
> reproduce this here and investigate.
>

The open-iscsi/test regression script and dat file. Once the section
with data digests runs I hit the oops/digest error. I am not sure if I
ever hit the second, zero copy write issue. I might be hitting that now.
Like I said, I have not had time to check if the oops turned into a
digest error when it should not or if I am hitting the zero copy issue.

Ulrich Windl

Mar 6, 2009, 3:17:32 AM

to open-...@googlegroups.com

(Trimmed the list of reply-to:)
On 5 Mar 2009 at 11:42, Mike Christie wrote:

[...]

> The second issue is that we normally do zero copy for writes. I do not
> think it is an FS bug or a net bug or a bug in the iscsi layer. Maybe more of
> a bug in what the user expects (who reads the man page for write() to
> check if the data is committed to disk when write() returns?). We
> discussed this a couple of times. For open-iscsi we tried to close the gap
> by not doing zero copy writes when data digests are used. And a long,
> long time ago this was discussed for linux-iscsi, and I think that is
> one of the reasons we added DID_IMM_RETRY to the scsi layer (we can then
> avoid the 5 retry limit in this case and retry until it is resolved).

Hi Mike,

I have no idea about the buffer handling in networking, etc., but as far as I
understand, you are telling the iSCSI/networking code to transmit some data blocks
in memory via networking without making a copy of the data blocks. Now I wonder if
there are some flags to tell the kernel not to update those data blocks because a
transfer is "in progress" (when actually the transfer is just queued for
progress). In buffer_head.h there are some flags, but I don't know exactly what they
mean. The other solution that comes to my mind would be a type of callback
when a buffer was changed after being queued for transport; the callback
would then have to re-calculate the data digest.

Am I completely wrong here, or is it more or less the problem we are talking
about?

Regards,
Ulrich

Mike Christie

Mar 9, 2009, 12:17:59 PM

to open-...@googlegroups.com

I think we are talking about the same problem.

I am not sure about some of the details in what you propose. The iscsi
list is probably not the appropriate forum, since we do not have deep
expertise in the vm or networking code. But I think if you ask on the
other lists, they might tell you to just write your app so you do a sync
after the write.
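
On the application side, "do a sync after the write" could look roughly like the Python sketch below: the buffer is only reused after fsync() has returned. The helper name write_and_sync and the temporary file are assumptions made for the illustration; a real application would also have to decide how to handle errors from fsync().

```python
import os
import tempfile


def write_and_sync(path: str, data) -> None:
    """Write data to path and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # wait until the kernel has written the data out
    finally:
        os.close(fd)


buffer = bytearray(b"first version\n")
with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "example.dat")
    write_and_sync(target, buffer)

    # Only now is the buffer modified: the earlier write is no longer in flight.
    buffer[0:5] = b"FIRST"

    with open(target, "rb") as handle:
        on_disk = handle.read()
```

The cost is latency: every such write now waits for the storage round trip instead of returning as soon as the data is in the page cache.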
