> Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8) > flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0 > Backtrace:
> Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a965>{free_hot_cold_page+101} > <ffffffff80162007>{__page_cache_release+151} <ffffffff802b8fe8>{sgl_unmap_user_pages+120} > <ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697} > <ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83} > <ffffffff8010db7a>{system_call+126} > Trying to fix it up, but a reboot is needed > Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8) > flags:0x010000000000081c mapping:ffff81005c0fc310 mapcount:0 count:0 > Backtrace:
> Bad page state at prep_new_page (in process 'find', page ffff81000260b6f8) > flags:0x0100000000000064 mapping:ffff8100f3be9be9 mapcount:1 count:1 > Backtrace:
On Tue, Nov 29, 2005 at 10:04:39PM +0200, Kai Makisara wrote: > I looked at the driver and it seems that there is a bug: st_write calls > release_buffering at the end even when it has started an asynchronous > write. This means that it releases the mapping while it is being used! > (I wonder why this has not been noticed earlier.)
> The patch below (against 2.6.15-rc2) should fix this bug and some others > related to buffering. It is based on the patch "[PATCH] SCSI tape direct > i/o fixes" I sent to linux-scsi on Nov 21. The patch restores setting > pages dirty after reading and clears number of s/g segments when the > pointers are not valid any more.
> The patch has been lightly tested with AMD64.
This applies cleanly to 2.6.14.2, do you forsee any problems using it with that kernel? I'd like to not change too many things at once.
If it should be OK, I'll boot this tonight or tomorrow - the backups run every other night, so it won't get any testing until tomorrow night.
On Tue, 29 Nov 2005, Ryan Richter wrote: > On Tue, Nov 29, 2005 at 10:04:39PM +0200, Kai Makisara wrote: > > I looked at the driver and it seems that there is a bug: st_write calls > > release_buffering at the end even when it has started an asynchronous > > write. This means that it releases the mapping while it is being used! > > (I wonder why this has not been noticed earlier.)
> > The patch below (against 2.6.15-rc2) should fix this bug and some others > > related to buffering. It is based on the patch "[PATCH] SCSI tape direct > > i/o fixes" I sent to linux-scsi on Nov 21. The patch restores setting > > pages dirty after reading and clears number of s/g segments when the > > pointers are not valid any more.
> > The patch has been lightly tested with AMD64.
> This applies cleanly to 2.6.14.2, do you forsee any problems using it > with that kernel? I'd like to not change too many things at once.
No, I don't see any potential problems applying this patch to 2.6.14.2. There is nothing specific to 2.6.15-rc2.
If someone sees that there is something wrong, please yell. The main purpose of the patch is not to call release_buffering() at the end of st_write() when starting asynchronous write and call it in write_behind_check() instead.
> If it should be OK, I'll boot this tonight or tomorrow - the backups run > every other night, so it won't get any testing until tomorrow night.
> Thanks a lot, > -ryan
Thanks for reporting the problem and thanks in advance for testing.
On Tue, Nov 29, 2005 at 10:48:22PM +0200, Kai Makisara wrote: > On Tue, 29 Nov 2005, Ryan Richter wrote: > > This applies cleanly to 2.6.14.2, do you forsee any problems using it > > with that kernel? I'd like to not change too many things at once.
> No, I don't see any potential problems applying this patch to 2.6.14.2. > There is nothing specific to 2.6.15-rc2.
> If someone sees that there is something wrong, please yell. The > main purpose of the patch is not to call release_buffering() at the end of > st_write() when starting asynchronous write and call it in > write_behind_check() instead.
OK, thanks. I think I'll go ahead and advance to 2.6.14.3 since that should theoretically not cause any problems.
One question: do you think the oopses that happened later that actually crashed the box were from damage caused by this bug or is that a different problem?
> > If it should be OK, I'll boot this tonight or tomorrow - the backups run > > every other night, so it won't get any testing until tomorrow night.
> > Thanks a lot, > > -ryan
> Thanks for reporting the problem and thanks in advance for testing.
On Tue, 29 Nov 2005, Ryan Richter wrote: > On Tue, Nov 29, 2005 at 10:48:22PM +0200, Kai Makisara wrote: > > On Tue, 29 Nov 2005, Ryan Richter wrote: > > > This applies cleanly to 2.6.14.2, do you forsee any problems using it > > > with that kernel? I'd like to not change too many things at once.
> > No, I don't see any potential problems applying this patch to 2.6.14.2. > > There is nothing specific to 2.6.15-rc2.
> > If someone sees that there is something wrong, please yell. The > > main purpose of the patch is not to call release_buffering() at the end of > > st_write() when starting asynchronous write and call it in > > write_behind_check() instead.
> OK, thanks. I think I'll go ahead and advance to 2.6.14.3 since that > should theoretically not cause any problems.
> One question: do you think the oopses that happened later that actually > crashed the box were from damage caused by this bug or is that a > different problem?
I looked at the oopses but, not knowing enough about what is happening inside the kernel, I can only hope that they are caused by the st bug(s). We will see after testing with the patch.
On Tue, 29 Nov 2005, Kai Makisara wrote: > On Tue, 29 Nov 2005, Ryan Richter wrote:
> > On Tue, Nov 29, 2005 at 10:04:39PM +0200, Kai Makisara wrote: > > > I looked at the driver and it seems that there is a bug: st_write calls > > > release_buffering at the end even when it has started an asynchronous > > > write. This means that it releases the mapping while it is being used! > > > (I wonder why this has not been noticed earlier.)
> > > The patch below (against 2.6.15-rc2) should fix this bug and some others > > > related to buffering. It is based on the patch "[PATCH] SCSI tape direct > > > i/o fixes" I sent to linux-scsi on Nov 21. The patch restores setting > > > pages dirty after reading and clears number of s/g segments when the > > > pointers are not valid any more.
> > > The patch has been lightly tested with AMD64.
> > This applies cleanly to 2.6.14.2, do you forsee any problems using it > > with that kernel? I'd like to not change too many things at once.
> No, I don't see any potential problems applying this patch to 2.6.14.2. > There is nothing specific to 2.6.15-rc2.
> If someone sees that there is something wrong, please yell. The > main purpose of the patch is not to call release_buffering() at the end of > st_write() when starting asynchronous write and call it in > write_behind_check() instead.
Yelling!
The patch does not cause any problems but my theory about calling release_buffering() too early is wrong: asynchronous writes are not done with direct i/o. (The reason for this choice is that the API allows the user to do anything with the buffer between writes and so the SCSI write has to be finished before returning from write().)
The other fixes in the patch are valid but I don't see how they should fix this problem.
> Date: Tue, 29 Nov 2005 10:44:09 -0500 > From: Ryan Richter <r...@tau.solarneutrino.net> > To: linux-ker...@vger.kernel.org > Cc: r...@tau.solarneutrino.net > Subject: crash on x86_64 - mm related?
> Hi, I booted 2.6.14.2 with the MPT fusion performance fix patch about a > week ago on my file server. The machine crashed lat night while it was > doing backups. You can see the voluminous kernel output below.
> I will reply later today with the kernel .config, right now I have to > wait for someone to reboot the machine first.
> Any help would be appreciated, > -ryan
> Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8) > flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0 > Backtrace:
> Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a965>{free_hot_cold_page+101} > <ffffffff80162007>{__page_cache_release+151} <ffffffff802b8fe8>{sgl_unmap_user_pages+120} > <ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697} > <ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83} > <ffffffff8010db7a>{system_call+126} > Trying to fix it up, but a reboot is needed > Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8) > flags:0x010000000000081c mapping:ffff81005c0fc310 mapcount:0 count:0 > Backtrace:
I have installed amanda and learned to use it enough to do experiments with my main system. Unfortunately I have not been able to see any oopses.
My system is somewhat similar to yours but not completely. I have a single processor system with 1 GB memory whereas your system is a dual processor system with 5 GB memory. We both use the sym53c8xx driver to control the tape drive.
I have tried 2.6.14.2 and 2.6.15-rc3 kernels with and without the patch I sent earlier to the list. The first kernels did not have preemption and NUMA support enabled but later I configured the 2.6.14.2 kernel with both enabled. This is the nearest thing to your NUMA dual processor system but it does not seem to be near enough.
Since I can't reproduce the problem, I have to look at the oopses more carefully. Both yout oopses and those from http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=338335 are quite similar at the beginning. First come one or more reports about "Bad page state at free_hot_cold_page". The mapping_count is always two and count is zero. This condition triggers the message.
The next thing is "Kernel BUG at include/linux/mm.h:341". This is in put_page(struct page *page) and points to page pointer being NULL.
The third event is "Kernel BUG at mm/rmap.c:487" which results from "BUG_ON(page_mapcount(page) < 0)". The page pointer has been used used earlier in page_remove_rmap().
I am not an mm expert and have no idea what could cause this sequence of events. Any ideas?
If someone has any ideas for my debugging, they are welcome. I will continue thinking about this but now I am out of useful ideas.
On Thu, 1 Dec 2005, Kai Makisara wrote: > On Tue, 29 Nov 2005, Andrew Morton wrote:
> > Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8) > > flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0 > > Backtrace:
Ryan, can you test 2.6.15-rc4 and report what it does?
The "Bad page state" messages may (should) remain, but the crashes should be gone and the machine should hopefully continue functioning fine. And, perhaps more importantly, you should hopefully have a _new_ message about incomplete pfn mappings that should help pinpoint which driver causes this..
On Thu, Dec 01, 2005 at 09:18:33PM +0200, Kai Makisara wrote: > I have installed amanda and learned to use it enough to do experiments > with my main system. Unfortunately I have not been able to see any oopses.
Thanks a lot for doing this! Our backups run every other night and usually write 15-20GB to tape, and it was 2 weeks before we hit this. So it's not very easy to reproduce. They ran again last night (with the patch) without incident.
There's one more thing that might be interesting: the morning before this crash happened, this machine mysteriously rebooted for no apparent reason. There was nothing in the logs. Unfortunately, the BIOS screen wipes out the last ~24 lines of serial console output on reboot, so all I can say is that if anything was printed it was much less than 24 lines. There are other machines on the same UPS so it wasn't a power glitch. And it had just had an uptime of 170-some days with 2.6.11.3 (which had some annoying bugs but was stable), so it's not exactly prone to random reboots...
I didn't think it was worth mentioning since I knew nothing about what happened, but I noticed that it happened exactly one hour after the 6:37am cron job (updatedb etc.) - the same one that caused it to crash the next day after the oopses during the backups. Might it be related?
Let me know if there's any more info I can provide.
On Thu, Dec 01, 2005 at 11:38:20AM -0800, Linus Torvalds wrote:
> On Thu, 1 Dec 2005, Kai Makisara wrote:
> > On Tue, 29 Nov 2005, Andrew Morton wrote:
> > > Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8) > > > flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0 > > > Backtrace:
> Ryan, can you test 2.6.15-rc4 and report what it does?
> The "Bad page state" messages may (should) remain, but the crashes should > be gone and the machine should hopefully continue functioning fine. And, > perhaps more importantly, you should hopefully have a _new_ message about > incomplete pfn mappings that should help pinpoint which driver causes > this..
Will do, I plan to take this machine down Saturday to run memtest86 for a while (just to be sure - 2/3 of the RAM is new, but I should be seeing machine checks if that were the problem, no?) so I'll boot this after that.
On Thu, 1 Dec 2005, Ryan Richter wrote: > On Thu, Dec 01, 2005 at 11:38:20AM -0800, Linus Torvalds wrote:
> > > > Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8) > > > > flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0 > > > > Backtrace:
> > Ryan, can you test 2.6.15-rc4 and report what it does?
> > The "Bad page state" messages may (should) remain, but the crashes should > > be gone and the machine should hopefully continue functioning fine. And, > > perhaps more importantly, you should hopefully have a _new_ message about > > incomplete pfn mappings that should help pinpoint which driver causes > > this..
But we already know which driver does this: drivers/scsi/st.c.
> Will do, I plan to take this machine down Saturday to run memtest86 for > a while (just to be sure - 2/3 of the RAM is new, but I should be seeing > machine checks if that were the problem, no?) so I'll boot this after that.
Nick and I had already been looking at drivers/scsi/{sg.c,st.c}, brought there by __put_page in sg.c's peculiar sg_rb_correct4mmap, which we'd like to remove. But that's irrelevant to your pain, except...
One extract from the patches I'd like to send Doug and Kai for 2.6.15 or 2.6.16 is this below: since the incomplete get_user_pages path omits to reset res, but has already released all the pages, it will result in premature freeing of user pages, and behaviour just like you've seen.
Though I'd have thought incomplete get_user_pages was an exceptional case, and a bit surprised you'd encounter it. Perhaps there's some other premature freeing in the driver, and this instance has nothing whatever to do with it.
If the problem were easily reproducible, it'd be great if you could try this patch; but I think you've said it's not :-(
Hugh
--- 2.6.14/drivers/scsi/st.c 2005-10-28 01:02:08.000000000 +0100 +++ linux/drivers/scsi/st.c 2005-12-01 20:06:02.000000000 +0000 @@ -4511,6 +4511,7 @@ static int sgl_map_user_pages(struct sca if (res > 0) { for (j=0; j < res; j++) page_cache_release(pages[j]); + res = 0; } kfree(pages); return res; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thu, 2005-12-01 at 11:38 -0800, Linus Torvalds wrote: > Ryan, can you test 2.6.15-rc4 and report what it does?
> The "Bad page state" messages may (should) remain, but the crashes should > be gone and the machine should hopefully continue functioning fine. And, > perhaps more importantly, you should hopefully have a _new_ message about > incomplete pfn mappings that should help pinpoint which driver causes > this..
On a side note, I have Kai's patch in the scsi-rc-fixes tree which I'm getting ready to push. Can we get a consensus on whether it should be removed before I merge upwards?
On Thu, 1 Dec 2005, James Bottomley wrote: > On Thu, 2005-12-01 at 11:38 -0800, Linus Torvalds wrote: > > Ryan, can you test 2.6.15-rc4 and report what it does?
> > The "Bad page state" messages may (should) remain, but the crashes should > > be gone and the machine should hopefully continue functioning fine. And, > > perhaps more importantly, you should hopefully have a _new_ message about > > incomplete pfn mappings that should help pinpoint which driver causes > > this..
> On a side note, I have Kai's patch in the scsi-rc-fixes tree which I'm > getting ready to push. Can we get a consensus on whether it should be > removed before I merge upwards?
I think it should be removed because it is based partly on a wrong assumption: asynchronous writes are _not_ done together with direct i/o. (I have also experimentally verified that this does not happen.)
The patch includes the patch I sent sent to linux-scsi on Nov 21. Nobody has commented it and I don't know if the user pages have to be explicitly marked dirty after the HBA has read data there. If they have to, then this earlier patch is valid. If not, I will send a patch for 2.6.16 to remove the latent code.
> Nick and I had already been looking at drivers/scsi/{sg.c,st.c}, > brought there by __put_page in sg.c's peculiar sg_rb_correct4mmap, > which we'd like to remove. But that's irrelevant to your pain, except...
> One extract from the patches I'd like to send Doug and Kai for 2.6.15 > or 2.6.16 is this below: since the incomplete get_user_pages path omits > to reset res, but has already released all the pages, it will result in > premature freeing of user pages, and behaviour just like you've seen.
> Though I'd have thought incomplete get_user_pages was an exceptional > case, and a bit surprised you'd encounter it. Perhaps there's some > other premature freeing in the driver, and this instance has nothing > whatever to do with it.
> If the problem were easily reproducible, it'd be great if you could > try this patch; but I think you've said it's not :-(
Whether this fix solves the current problem or not, it clearly fixes a bug. If someone wants to include this into a patch series, you can add Signed-off-by: Kai Makisara <kai.makis...@kolumbus.fi> If not, I will include this when I send patches next time.
On Thu, 1 Dec 2005, Kai Makisara wrote: > On Thu, 1 Dec 2005, James Bottomley wrote:
> > On a side note, I have Kai's patch in the scsi-rc-fixes tree which I'm > > getting ready to push. Can we get a consensus on whether it should be > > removed before I merge upwards?
I too need to decide whether to base my little sg.c,st.c patchset on top of Kai's (as I see in 2.6.15-rc3-mm1, I presume) or on top of 2.6.15-rc4.
> I think it should be removed because it is based partly on a wrong > assumption: asynchronous writes are _not_ done together with direct i/o. > (I have also experimentally verified that this does not happen.)
I'm assuming from this that I'd best base on 2.6.15-rc4; but by all means overrule me if you've changed your mind.
> The patch includes the patch I sent sent to linux-scsi on Nov 21. Nobody > has commented it and I don't know if the user pages have to be explicitly > marked dirty after the HBA has read data there. If they have to, then this > earlier patch is valid.
What I see in 2.6.15-rc3-mm1 looks like three patches.
One to do with resetting sg_segs to 0 at various points: I've no appreciation of that patch at all. If it would help you for me to add that into my little set, please send me a comment for it.
One to add an "is_read" argument to release_buffering. Yes, that's a part of my set too, though in my case called "dirtied" (and I believe that the call at the end of st_read can say 0, because that's just for an error path: when it's really dirtied user memory, it'll be read_tape that does the release_buffering). sg.c was always saying dirtied, even when writing from memory; st.c was always saying not dirtied, even when reading into memory. Usually the latter is okay, get_user_pages has said dirty in advance; but under pressure there's a window whereby it's not good enough. And SetPageDirty can be counter-productive these days, so your patch is incomplete in that regard: I'll explain more in mine.
One to move around where release_buffering is called from: that's the part you've decided was wrong, or at least unnecessary.
> If not, I will send a patch for 2.6.16 to remove the latent code.
I didn't understand that bit, but I probably don't need to.
On Fri, 2 Dec 2005, Hugh Dickins wrote: > On Thu, 1 Dec 2005, Kai Makisara wrote: > > On Thu, 1 Dec 2005, James Bottomley wrote:
> > > On a side note, I have Kai's patch in the scsi-rc-fixes tree which I'm > > > getting ready to push. Can we get a consensus on whether it should be > > > removed before I merge upwards?
> I too need to decide whether to base my little sg.c,st.c patchset > on top of Kai's (as I see in 2.6.15-rc3-mm1, I presume) or on top > of 2.6.15-rc4.
> > I think it should be removed because it is based partly on a wrong > > assumption: asynchronous writes are _not_ done together with direct i/o. > > (I have also experimentally verified that this does not happen.)
> I'm assuming from this that I'd best base on 2.6.15-rc4; > but by all means overrule me if you've changed your mind.
> > The patch includes the patch I sent sent to linux-scsi on Nov 21. Nobody > > has commented it and I don't know if the user pages have to be explicitly > > marked dirty after the HBA has read data there. If they have to, then this > > earlier patch is valid.
> What I see in 2.6.15-rc3-mm1 looks like three patches.
> One to do with resetting sg_segs to 0 at various points: > I've no appreciation of that patch at all. If it would help you for > me to add that into my little set, please send me a comment for it.
I include at the end of this message the patch I sent to linux-scsi earlier. It should clarify what are the useful parts of the later patch.
I think the release_buffering() call at the end of st_read must say 1. All returns use the same path (except the one returning -ERESTARTSYS).
This is the comment related to this part: - the number of s/g segments has not always been zeroed when the page pointers become invalid
The changes setting sg_segs to 0 don't fix any known bugs. They will be necessary if/when Mike Christie's patches will be merged later. You can omit these things now if you want.
> One to add an "is_read" argument to release_buffering. Yes, that's > a part of my set too, though in my case called "dirtied" (and I believe > that the call at the end of st_read can say 0, because that's just for > an error path: when it's really dirtied user memory, it'll be read_tape > that does the release_buffering). sg.c was always saying dirtied, even > when writing from memory; st.c was always saying not dirtied, even when > reading into memory. Usually the latter is okay, get_user_pages has > said dirty in advance; but under pressure there's a window whereby it's > not good enough. And SetPageDirty can be counter-productive these days, > so your patch is incomplete in that regard: I'll explain more in mine.
Good, I will leave sorting out these things to to you. Any naming is OK with me.
I think the release_buffering() call at the end of st_read must say 1. All returns use the same path (except the one returning -ERESTARTSYS).
st.c did set pages dirty after reading before 2.6.0-test4. It disappeared when code was rearranged and I don't have any notes about why.
> One to move around where release_buffering is called from: > that's the part you've decided was wrong, or at least unnecessary.
It is unnecessary but does not cause any problems. It is wrong because it hints that asynchronous writes could be done with direct i/o.
> > If not, I will send a patch for 2.6.16 to remove the latent code.
> I didn't understand that bit, but I probably don't need to.
After the previous comments, you don't need to :-) The meaning was that I would remove the extra parameter and code setting pages dirty from sgl_unmap_pages() if that code would never be used.
> Hugh
Thanks for looking at the code.
Kai
---------- This patch is against 2.6.15-rc2 and fixes the following two bugs in the SCSI tape driver: - the pages dirtied by reading data to user space have not been marked dirty - the number of s/g segments has not always been zeroed when the page pointers become invalid
Signed-off-by: Kai Makisara <kai.makis...@kolumbus.fi> --- --- linux-2.6.15-rc2/drivers/scsi/st.c 2005-11-20 22:10:00.000000000 +0200 +++ linux-2.6.15-rc2-k1/drivers/scsi/st.c 2005-11-20 22:33:25.000000000 +0200 @@ -17,7 +17,7 @@ Last modified: 18-JAN-1998 Richard Gooch <rgo...@atnf.csiro.au> Devfs support */
@@ -1449,14 +1449,15 @@ static int setup_buffering(struct scsi_t
/* Can be called more than once after each setup_buffer() */ -static void release_buffering(struct scsi_tape *STp) +static void release_buffering(struct scsi_tape *STp, int is_read) { struct st_buffer *STbp;
On Thu, Dec 01, 2005 at 08:21:57PM +0000, Hugh Dickins wrote: > If the problem were easily reproducible, it'd be great if you could > try this patch; but I think you've said it's not :-(
I can throw this in and test it for sure. I'll run the backups every day for a while to speed up the testing also.
Could someone please tell me exactly which patches I should include in the kernel I will boot tomorrow? I haven't played with -rc for ages, so I'm no longer sure which kernel I should start with (2.6.14 or 2.6.14.3?). Are the MPT-fusion performance fix patches in -rc4, or if not will they still apply?
On 12/2/05, Ryan Richter <r...@tau.solarneutrino.net> wrote: [snip]
> Could someone please tell me exactly which patches I should include in > the kernel I will boot tomorrow? I haven't played with -rc for ages, so > I'm no longer sure which kernel I should start with (2.6.14 or > 2.6.14.3?). Are the MPT-fusion performance fix patches in -rc4, or if > not will they still apply?
On Fri, 2 Dec 2005, Kai Makisara wrote: > On Fri, 2 Dec 2005, Hugh Dickins wrote: > I include at the end of this message the patch I sent to linux-scsi > earlier. It should clarify what are the useful parts of the later patch.
Thanks, yes. I'll leave out updating the verstr[], I think that should be sent by you alone.
> I think the release_buffering() call at the end of st_read must say 1. All > returns use the same path (except the one returning -ERESTARTSYS).
Okay, if you insist. Yes, all those returns pass that way, but if it actually did some reading into the memory, it called read_tape, which did the effective release_buffering immediately after st_do_scsi.
But perhaps I'm misreading it, and even if not, someone will come along and "correct" it later, or change things around and make my not-dirty assumption wrong.
It's just that after seeing how sg.c is claiming to dirty even readonly memory, I'm excessively averse to saying we've dirtied memory we haven't. My hangup, I'll get over it!
> st.c did set pages dirty after reading before 2.6.0-test4. It disappeared > when code was rearranged and I don't have any notes about why.
Possibly because of issues with hugetlb compound pages: David Gibson raised that issue recently with respect to access_process_vm (page[1].mapping is reused and crashes set_page_dirty), I'm thinking we don't want to add PageCompound tests all over, and have mailed Andrew separately for guidance on that.
On Fri, 2 Dec 2005, Ryan Richter wrote: > On Thu, Dec 01, 2005 at 08:21:57PM +0000, Hugh Dickins wrote: > > If the problem were easily reproducible, it'd be great if you could > > try this patch; but I think you've said it's not :-(
> I can throw this in and test it for sure. I'll run the backups every > day for a while to speed up the testing also.
Thanks - though everyone seems to have agreed that the patch is needed whatever: so although it would be interesting to know if it has fixed your problem, we're not waiting on you to go forward with it (James already invited Linus to pull).
> Could someone please tell me exactly which patches I should include in > the kernel I will boot tomorrow?
You hit the problem with 2.6.14.2. My own opinion would be to apply just that patch to that release, or to 2.6.14.3 if you prefer.
Linus was suggesting 2.6.15-rc4 because there's some debug there which might have helped identify the offending driver: but your backtrace had already showed us the offending driver. Well, not proven: if a page is doubly freed, the one that suffers is not necessarily the one that's guilty; but it's a reasonable expectation. I don't think his debug would tell us anything more.
> I haven't played with -rc for ages, so > I'm no longer sure which kernel I should start with (2.6.14 or > 2.6.14.3?).
If you do want 2.6.15-rc4, then it's 2.6.14 + patch-2.6.15-rc4.
> Are the MPT-fusion performance fix patches in -rc4, or if > not will they still apply?
I don't know it myself. Is that the one about the module performing much better than the builtin? I'd expect that to be in. If you have it, suggest you look at your 2.6.15-rc4 source to see if it's there.
On Fri, Dec 02, 2005 at 07:12:19PM +0000, Hugh Dickins wrote: > On Fri, 2 Dec 2005, Ryan Richter wrote: > > On Thu, Dec 01, 2005 at 08:21:57PM +0000, Hugh Dickins wrote: > > > If the problem were easily reproducible, it'd be great if you could > > > try this patch; but I think you've said it's not :-(
> > I can throw this in and test it for sure. I'll run the backups every > > day for a while to speed up the testing also.
> Thanks - though everyone seems to have agreed that the patch is needed > whatever: so although it would be interesting to know if it has fixed > your problem, we're not waiting on you to go forward with it (James > already invited Linus to pull).
> > Could someone please tell me exactly which patches I should include in > > the kernel I will boot tomorrow?
> You hit the problem with 2.6.14.2. My own opinion would be to apply > just that patch to that release, or to 2.6.14.3 if you prefer.
> Linus was suggesting 2.6.15-rc4 because there's some debug there which > might have helped identify the offending driver: but your backtrace > had already showed us the offending driver. Well, not proven: if a > page is doubly freed, the one that suffers is not necessarily the one > that's guilty; but it's a reasonable expectation. I don't think his > debug would tell us anything more.
OK, I guess I'll stick with 2.6.14.3 for now, plus your patch. Should I keep Kai's st.c patch? There was some mention of other patches, are those relevant? Most of that discussion went over my head...
On Fri, 2 Dec 2005, Hugh Dickins wrote: > On Fri, 2 Dec 2005, Kai Makisara wrote: > > On Fri, 2 Dec 2005, Hugh Dickins wrote: > > I include at the end of this message the patch I sent to linux-scsi > > earlier. It should clarify what are the useful parts of the later patch.
> Thanks, yes. I'll leave out updating the verstr[], > I think that should be sent by you alone.
> > I think the release_buffering() call at the end of st_read must say 1. All > > returns use the same path (except the one returning -ERESTARTSYS).
> Okay, if you insist. Yes, all those returns pass that way, but if it > actually did some reading into the memory, it called read_tape, which > did the effective release_buffering immediately after st_do_scsi.
> But perhaps I'm misreading it, and even if not, someone will come > along and "correct" it later, or change things around and make my > not-dirty assumption wrong.
Your analysis is correct. It is not necessary or useful to use dirtied = 1 at the end of st_read(). It has been a long time since I introduced release_buffering() into st.c and I did not read all of the code now.
> It's just that after seeing how sg.c is claiming to dirty even readonly > memory, I'm excessively averse to saying we've dirtied memory we haven't. > My hangup, I'll get over it!
Please don't. I have a very conservative attitude to these things: my priority is to make sure that the data is correct even if it is not the fastest code. I will happily let others point out when I am too conservative.
> OK, I guess I'll stick with 2.6.14.3 for now, plus your patch. Should I > keep Kai's st.c patch? There was some mention of other patches, are > those relevant? Most of that discussion went over my head...
For the "Bad page state" premature freeing you were seeing, only my patch should be relevant. There are other patches in the works, yes, and we have good reasons for them; but don't worry about them for this.
On Fri, 2 Dec 2005, Kai Makisara wrote: > On Fri, 2 Dec 2005, Hugh Dickins wrote:
> > It's just that after seeing how sg.c is claiming to dirty even readonly > > memory, I'm excessively averse to saying we've dirtied memory we haven't. > > My hangup, I'll get over it!
> Please don't. I have a very conservative attitude to these things: my > priority is to make sure that the data is correct even if it is not the > fastest code. I will happily let others point out when I am too > conservative.
I'm not certain which way you're directing me now: a conservative attitude suggests we play safe at the end of st_read, by saying we might somehow have dirtied memory there, perhaps if someone changes sequence.
As I presently, incompletely have it, to maintain more similarity with sg.c, I've actually moved away from "dirtied" to "rw" READ or WRITE, and it will look odd to put WRITE at the end of st_read.
I'm giving up for the evening, and probably won't have a chance to do more until Sunday - the PageCompound issue still under discussion too.
On Fri, Dec 02, 2005 at 08:40:56PM +0000, Hugh Dickins wrote: > On Fri, 2 Dec 2005, Ryan Richter wrote:
> > OK, I guess I'll stick with 2.6.14.3 for now, plus your patch. Should I > > keep Kai's st.c patch? There was some mention of other patches, are > > those relevant? Most of that discussion went over my head...
> For the "Bad page state" premature freeing you were seeing, only my > patch should be relevant. There are other patches in the works, yes, > and we have good reasons for them; but don't worry about them for this.
OK, I've booted 2.6.14.3+mpt fusion patch+your patch. Memtest86 found nothing overnight, so it's not that easy. Also the backups will run daily for at least the next week or two, so I'll pipe up if anything happens.