Gmail Calendar Documents Reader Web more »
Recently Visited Groups | Help | Sign in
Google Groups Home
Fw: crash on x86_64 - mm related?
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 66 - Collapse all  -  Translate all to Translated (View all originals)   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Ryan Richter  
View profile  
 More options Nov 29 2005, 1:34 pm
Newsgroups: fa.linux.kernel
From: Ryan Richter <r...@tau.solarneutrino.net>
Date: Tue, 29 Nov 2005 18:34:51 UTC
Local: Tues, Nov 29 2005 1:34 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Tue, Nov 29, 2005 at 09:24:32AM -0800, Ryan Richter wrote:

Not sure if this matters, but this apparently happened in two stages.
This first part happened during the backups, as I said earlier:

Everything from here on happened several hours later while updatedb was
running.

...

read more »


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ryan Richter  
View profile  
 More options Nov 29 2005, 3:31 pm
Newsgroups: fa.linux.kernel
From: Ryan Richter <r...@tau.solarneutrino.net>
Date: Tue, 29 Nov 2005 20:31:42 UTC
Local: Tues, Nov 29 2005 3:31 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Tue, Nov 29, 2005 at 10:04:39PM +0200, Kai Makisara wrote:
> I looked at the driver and it seems that there is a bug: st_write calls
> release_buffering at the end even when it has started an asynchronous
> write. This means that it releases the mapping while it is being used!
> (I wonder why this has not been noticed earlier.)

> The patch below (against 2.6.15-rc2) should fix this bug and some others
> related to buffering. It is based on the patch "[PATCH] SCSI tape direct
> i/o fixes" I sent to linux-scsi on Nov 21. The patch restores setting
> pages dirty after reading and clears number of s/g segments when the
> pointers are not valid any more.

> The patch has been lightly tested with AMD64.

This applies cleanly to 2.6.14.2, do you forsee any problems using it
with that kernel?  I'd like to not change too many things at once.

If it should be OK, I'll boot this tonight or tomorrow - the backups run
every other night, so it won't get any testing until tomorrow night.

Thanks a lot,
-ryan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kai Makisara  
View profile  
 More options Nov 29 2005, 3:48 pm
Newsgroups: fa.linux.kernel
From: Kai Makisara <Kai.Makis...@kolumbus.fi>
Date: Tue, 29 Nov 2005 20:48:32 UTC
Local: Tues, Nov 29 2005 3:48 pm
Subject: Re: Fw: crash on x86_64 - mm related?

No, I don't see any potential problems applying this patch to 2.6.14.2.
There is nothing specific to 2.6.15-rc2.

If someone sees that there is something wrong, please yell. The
main purpose of the patch is not to call release_buffering() at the end of
st_write() when starting asynchronous write and call it in
write_behind_check() instead.

> If it should be OK, I'll boot this tonight or tomorrow - the backups run
> every other night, so it won't get any testing until tomorrow night.

> Thanks a lot,
> -ryan

Thanks for reporting the problem and thanks in advance for testing.

--
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ryan Richter  
View profile  
 More options Nov 29 2005, 3:59 pm
Newsgroups: fa.linux.kernel
From: Ryan Richter <r...@tau.solarneutrino.net>
Date: Tue, 29 Nov 2005 20:59:34 UTC
Local: Tues, Nov 29 2005 3:59 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Tue, Nov 29, 2005 at 10:48:22PM +0200, Kai Makisara wrote:
> On Tue, 29 Nov 2005, Ryan Richter wrote:
> > This applies cleanly to 2.6.14.2, do you forsee any problems using it
> > with that kernel?  I'd like to not change too many things at once.

> No, I don't see any potential problems applying this patch to 2.6.14.2.
> There is nothing specific to 2.6.15-rc2.

> If someone sees that there is something wrong, please yell. The
> main purpose of the patch is not to call release_buffering() at the end of
> st_write() when starting asynchronous write and call it in
> write_behind_check() instead.

OK, thanks.  I think I'll go ahead and advance to 2.6.14.3 since that
should theoretically not cause any problems.

One question: do you think the oopses that happened later that actually
crashed the box were from damage caused by this bug or is that a
different problem?

> > If it should be OK, I'll boot this tonight or tomorrow - the backups run
> > every other night, so it won't get any testing until tomorrow night.

> > Thanks a lot,
> > -ryan

> Thanks for reporting the problem and thanks in advance for testing.

Sure thing,
-ryan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kai Makisara  
View profile  
 More options Nov 29 2005, 4:36 pm
Newsgroups: fa.linux.kernel
From: Kai Makisara <Kai.Makis...@kolumbus.fi>
Date: Tue, 29 Nov 2005 21:36:50 UTC
Local: Tues, Nov 29 2005 4:36 pm
Subject: Re: Fw: crash on x86_64 - mm related?

I looked at the oopses but, not knowing enough about what is happening
inside the kernel, I can only hope that they are caused by the st bug(s).
We will see after testing with the patch.

--
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kai Makisara  
View profile  
 More options Nov 30 2005, 12:12 am
Newsgroups: fa.linux.kernel
From: Kai Makisara <Kai.Makis...@kolumbus.fi>
Date: Wed, 30 Nov 2005 05:12:37 UTC
Local: Wed, Nov 30 2005 12:12 am
Subject: Re: Fw: crash on x86_64 - mm related?

Yelling!

The patch does not cause any problems but my theory about calling
release_buffering() too early is wrong: asynchronous writes are not done
with direct i/o. (The reason for this choice is that the API allows the
user to do anything with the buffer between writes and so the SCSI write
has to be finished before returning from write().)

The other fixes in the patch are valid but I don't see how they should fix
this problem.

--
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kai Makisara  
View profile  
 More options Dec 1 2005, 2:19 pm
Newsgroups: fa.linux.kernel
From: Kai Makisara <Kai.Makis...@kolumbus.fi>
Date: Thu, 01 Dec 2005 19:19:03 UTC
Local: Thurs, Dec 1 2005 2:19 pm
Subject: Re: Fw: crash on x86_64 - mm related?

[ Rest of the oopses cut ]

I have installed amanda and learned to use it enough to do experiments
with my main system. Unfortunately I have not been able to see any oopses.

My system is somewhat similar to yours but not completely. I have a single
processor system with 1 GB memory whereas your system is a dual processor
system with 5 GB memory. We both use the sym53c8xx driver to control the
tape drive.

I have tried 2.6.14.2 and 2.6.15-rc3 kernels with and without the patch I
sent earlier to the list. The first kernels did not have preemption and
NUMA support enabled but later I configured the 2.6.14.2 kernel with both
enabled. This is the nearest thing to your NUMA dual processor system but
it does not seem to be near enough.

Since I can't reproduce the problem, I have to look at the oopses more
carefully. Both yout oopses and those from
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=338335 are quite similar
at the beginning. First come one or more reports about "Bad page state at
free_hot_cold_page". The mapping_count is always two and count is zero.
This condition triggers the message.

The next thing is "Kernel BUG at include/linux/mm.h:341". This is in
put_page(struct page *page) and points to page pointer being NULL.

The third event is "Kernel BUG at mm/rmap.c:487" which results from
"BUG_ON(page_mapcount(page) < 0)". The page pointer has been used used
earlier in page_remove_rmap().

I am not an mm expert and have no idea what could cause this sequence of
events. Any ideas?

If someone has any ideas for my debugging, they are welcome. I will
continue thinking about this but now I am out of useful ideas.

--
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Linus Torvalds  
View profile  
 More options Dec 1 2005, 2:39 pm
Newsgroups: fa.linux.kernel
From: Linus Torvalds <torva...@osdl.org>
Date: Thu, 01 Dec 2005 19:39:25 UTC
Local: Thurs, Dec 1 2005 2:39 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Thu, 1 Dec 2005, Kai Makisara wrote:
> On Tue, 29 Nov 2005, Andrew Morton wrote:

> >  Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8)
> > flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0
> > Backtrace:

Ryan, can you test 2.6.15-rc4 and report what it does?

The "Bad page state" messages may (should) remain, but the crashes should
be gone and the machine should hopefully continue functioning fine. And,
perhaps more importantly, you should hopefully have a _new_ message about
incomplete pfn mappings that should help pinpoint which driver causes
this..

                Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ryan Richter  
View profile  
 More options Dec 1 2005, 2:53 pm
Newsgroups: fa.linux.kernel
From: Ryan Richter <r...@tau.solarneutrino.net>
Date: Thu, 01 Dec 2005 19:53:49 UTC
Local: Thurs, Dec 1 2005 2:53 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Thu, Dec 01, 2005 at 09:18:33PM +0200, Kai Makisara wrote:
> I have installed amanda and learned to use it enough to do experiments
> with my main system. Unfortunately I have not been able to see any oopses.

Thanks a lot for doing this!  Our backups run every other night and
usually write 15-20GB to tape, and it was 2 weeks before we hit this.
So it's not very easy to reproduce.  They ran again last night (with the
patch) without incident.

There's one more thing that might be interesting: the morning before
this crash happened, this machine mysteriously rebooted for no apparent
reason.  There was nothing in the logs.  Unfortunately, the BIOS screen
wipes out the last ~24 lines of serial console output on reboot, so all
I can say is that if anything was printed it was much less than 24
lines.  There are other machines on the same UPS so it wasn't a power
glitch.  And it had just had an uptime of 170-some days with 2.6.11.3
(which had some annoying bugs but was stable), so it's not exactly prone
to random reboots...

I didn't think it was worth mentioning since I knew nothing about what
happened, but I noticed that it happened exactly one hour after the
6:37am cron job (updatedb etc.) - the same one that caused it to crash
the next day after the oopses during the backups.  Might it be related?

Let me know if there's any more info I can provide.

-ryan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ryan Richter  
View profile  
 More options Dec 1 2005, 2:57 pm
Newsgroups: fa.linux.kernel
From: Ryan Richter <r...@tau.solarneutrino.net>
Date: Thu, 01 Dec 2005 19:57:28 UTC
Local: Thurs, Dec 1 2005 2:57 pm
Subject: Re: Fw: crash on x86_64 - mm related?

Will do, I plan to take this machine down Saturday to run memtest86 for
a while (just to be sure - 2/3 of the RAM is new, but I should be seeing
machine checks if that were the problem, no?) so I'll boot this after that.

Thanks,
-ryan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hugh Dickins  
View profile  
 More options Dec 1 2005, 3:22 pm
Newsgroups: fa.linux.kernel
From: Hugh Dickins <h...@veritas.com>
Date: Thu, 01 Dec 2005 20:22:57 UTC
Local: Thurs, Dec 1 2005 3:22 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Thu, 1 Dec 2005, Ryan Richter wrote:
> On Thu, Dec 01, 2005 at 11:38:20AM -0800, Linus Torvalds wrote:

> > > >  Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8)
> > > > flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0
> > > > Backtrace:

> > Ryan, can you test 2.6.15-rc4 and report what it does?

> > The "Bad page state" messages may (should) remain, but the crashes should
> > be gone and the machine should hopefully continue functioning fine. And,
> > perhaps more importantly, you should hopefully have a _new_ message about
> > incomplete pfn mappings that should help pinpoint which driver causes
> > this..

But we already know which driver does this: drivers/scsi/st.c.

> Will do, I plan to take this machine down Saturday to run memtest86 for
> a while (just to be sure - 2/3 of the RAM is new, but I should be seeing
> machine checks if that were the problem, no?) so I'll boot this after that.

Nick and I had already been looking at drivers/scsi/{sg.c,st.c},
brought there by __put_page in sg.c's peculiar sg_rb_correct4mmap,
which we'd like to remove.  But that's irrelevant to your pain, except...

One extract from the patches I'd like to send Doug and Kai for 2.6.15
or 2.6.16 is this below: since the incomplete get_user_pages path omits
to reset res, but has already released all the pages, it will result in
premature freeing of user pages, and behaviour just like you've seen.

Though I'd have thought incomplete get_user_pages was an exceptional
case, and a bit surprised you'd encounter it.  Perhaps there's some
other premature freeing in the driver, and this instance has nothing
whatever to do with it.

If the problem were easily reproducible, it'd be great if you could
try this patch; but I think you've said it's not :-(

Hugh

--- 2.6.14/drivers/scsi/st.c    2005-10-28 01:02:08.000000000 +0100
+++ linux/drivers/scsi/st.c     2005-12-01 20:06:02.000000000 +0000
@@ -4511,6 +4511,7 @@ static int sgl_map_user_pages(struct sca
        if (res > 0) {
                for (j=0; j < res; j++)
                        page_cache_release(pages[j]);
+               res = 0;
        }
        kfree(pages);
        return res;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
James Bottomley  
View profile  
 More options Dec 1 2005, 3:29 pm
Newsgroups: fa.linux.kernel
From: James Bottomley <James.Bottom...@SteelEye.com>
Date: Thu, 01 Dec 2005 20:29:16 UTC
Local: Thurs, Dec 1 2005 3:29 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Thu, 2005-12-01 at 11:38 -0800, Linus Torvalds wrote:
> Ryan, can you test 2.6.15-rc4 and report what it does?

> The "Bad page state" messages may (should) remain, but the crashes should
> be gone and the machine should hopefully continue functioning fine. And,
> perhaps more importantly, you should hopefully have a _new_ message about
> incomplete pfn mappings that should help pinpoint which driver causes
> this..

On a side note, I have Kai's patch in the scsi-rc-fixes tree which I'm
getting ready to push.  Can we get a consensus on whether it should be
removed before I merge upwards?

Thanks,

James

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kai Makisara  
View profile  
 More options Dec 1 2005, 4:17 pm
Newsgroups: fa.linux.kernel
From: Kai Makisara <Kai.Makis...@kolumbus.fi>
Date: Thu, 01 Dec 2005 21:17:34 UTC
Local: Thurs, Dec 1 2005 4:17 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Thu, 1 Dec 2005, James Bottomley wrote:
> On Thu, 2005-12-01 at 11:38 -0800, Linus Torvalds wrote:
> > Ryan, can you test 2.6.15-rc4 and report what it does?

> > The "Bad page state" messages may (should) remain, but the crashes should
> > be gone and the machine should hopefully continue functioning fine. And,
> > perhaps more importantly, you should hopefully have a _new_ message about
> > incomplete pfn mappings that should help pinpoint which driver causes
> > this..

> On a side note, I have Kai's patch in the scsi-rc-fixes tree which I'm
> getting ready to push.  Can we get a consensus on whether it should be
> removed before I merge upwards?

I think it should be removed because it is based partly on a wrong
assumption: asynchronous writes are _not_ done together with direct i/o.
(I have also experimentally verified that this does not happen.)

The patch includes the patch I sent sent to linux-scsi on Nov 21. Nobody
has commented it and I don't know if the user pages have to be explicitly
marked dirty after the HBA has read data there. If they have to, then this
earlier patch is valid. If not, I will send a patch for 2.6.16 to remove
the latent code.

--
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kai Makisara  
View profile  
 More options Dec 1 2005, 4:44 pm
Newsgroups: fa.linux.kernel
From: Kai Makisara <Kai.Makis...@kolumbus.fi>
Date: Thu, 01 Dec 2005 21:44:46 UTC
Local: Thurs, Dec 1 2005 4:44 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Thu, 1 Dec 2005, Hugh Dickins wrote:

..

Whether this fix solves the current problem or not, it clearly fixes a
bug. If someone wants to include this into a patch series, you can add
Signed-off-by: Kai Makisara <kai.makis...@kolumbus.fi>
If not, I will include this when I send patches next time.

Thanks for noticing this problem.

Kai

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hugh Dickins  
View profile  
 More options Dec 2 2005, 8:46 am
Newsgroups: fa.linux.kernel
From: Hugh Dickins <h...@veritas.com>
Date: Fri, 02 Dec 2005 13:46:58 UTC
Local: Fri, Dec 2 2005 8:46 am
Subject: Re: Fw: crash on x86_64 - mm related?

On Thu, 1 Dec 2005, Kai Makisara wrote:
> On Thu, 1 Dec 2005, James Bottomley wrote:

> > On a side note, I have Kai's patch in the scsi-rc-fixes tree which I'm
> > getting ready to push.  Can we get a consensus on whether it should be
> > removed before I merge upwards?

I too need to decide whether to base my little sg.c,st.c patchset
on top of Kai's (as I see in 2.6.15-rc3-mm1, I presume) or on top
of 2.6.15-rc4.

> I think it should be removed because it is based partly on a wrong
> assumption: asynchronous writes are _not_ done together with direct i/o.
> (I have also experimentally verified that this does not happen.)

I'm assuming from this that I'd best base on 2.6.15-rc4;
but by all means overrule me if you've changed your mind.

> The patch includes the patch I sent sent to linux-scsi on Nov 21. Nobody
> has commented it and I don't know if the user pages have to be explicitly
> marked dirty after the HBA has read data there. If they have to, then this
> earlier patch is valid.

What I see in 2.6.15-rc3-mm1 looks like three patches.

One to do with resetting sg_segs to 0 at various points:
I've no appreciation of that patch at all.  If it would help you for
me to add that into my little set, please send me a comment for it.

One to add an "is_read" argument to release_buffering.  Yes, that's
a part of my set too, though in my case called "dirtied" (and I believe
that the call at the end of st_read can say 0, because that's just for
an error path: when it's really dirtied user memory, it'll be read_tape
that does the release_buffering).  sg.c was always saying dirtied, even
when writing from memory; st.c was always saying not dirtied, even when
reading into memory.  Usually the latter is okay, get_user_pages has
said dirty in advance; but under pressure there's a window whereby it's
not good enough.  And SetPageDirty can be counter-productive these days,
so your patch is incomplete in that regard: I'll explain more in mine.

One to move around where release_buffering is called from:
that's the part you've decided was wrong, or at least unnecessary.

> If not, I will send a patch for 2.6.16 to remove the latent code.

I didn't understand that bit, but I probably don't need to.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kai Makisara  
View profile  
 More options Dec 2 2005, 12:59 pm
Newsgroups: fa.linux.kernel
From: Kai Makisara <Kai.Makis...@kolumbus.fi>
Date: Fri, 02 Dec 2005 17:59:57 UTC
Local: Fri, Dec 2 2005 12:59 pm
Subject: Re: Fw: crash on x86_64 - mm related?

I include at the end of this message the patch I sent to linux-scsi
earlier. It should clarify what are the useful parts of the later patch.

I think the release_buffering() call at the end of st_read must say 1. All
returns use the same path (except the one returning -ERESTARTSYS).

This is the comment related to this part:
- the number of s/g segments has not always been zeroed when the page
  pointers become invalid

The changes setting sg_segs to 0 don't fix any known bugs. They will be
necessary if/when Mike Christie's patches will be merged later. You can
omit these things now if you want.

> One to add an "is_read" argument to release_buffering.  Yes, that's
> a part of my set too, though in my case called "dirtied" (and I believe
> that the call at the end of st_read can say 0, because that's just for
> an error path: when it's really dirtied user memory, it'll be read_tape
> that does the release_buffering).  sg.c was always saying dirtied, even
> when writing from memory; st.c was always saying not dirtied, even when
> reading into memory.  Usually the latter is okay, get_user_pages has
> said dirty in advance; but under pressure there's a window whereby it's
> not good enough.  And SetPageDirty can be counter-productive these days,
> so your patch is incomplete in that regard: I'll explain more in mine.

Good, I will leave sorting out these things to to you. Any naming is OK
with me.

I think the release_buffering() call at the end of st_read must say 1. All
returns use the same path (except the one returning -ERESTARTSYS).

st.c did set pages dirty after reading before 2.6.0-test4. It disappeared
when code was rearranged and I don't have any notes about why.

> One to move around where release_buffering is called from:
> that's the part you've decided was wrong, or at least unnecessary.

It is unnecessary but does not cause any problems. It is wrong because it
hints that asynchronous writes could be done with direct i/o.

> > If not, I will send a patch for 2.6.16 to remove the latent code.

> I didn't understand that bit, but I probably don't need to.

After the previous comments, you don't need to :-) The meaning was that I
would remove the extra parameter and code setting pages dirty from
sgl_unmap_pages() if that code would never be used.

> Hugh

Thanks for looking at the code.

Kai

----------
This patch is against 2.6.15-rc2 and fixes the following two bugs in the SCSI
tape driver:
- the pages dirtied by reading data to user space have not been marked dirty
- the number of s/g segments has not always been zeroed when the page
  pointers become invalid

Signed-off-by: Kai Makisara <kai.makis...@kolumbus.fi>
---
--- linux-2.6.15-rc2/drivers/scsi/st.c  2005-11-20 22:10:00.000000000 +0200
+++ linux-2.6.15-rc2-k1/drivers/scsi/st.c       2005-11-20 22:33:25.000000000 +0200
@@ -17,7 +17,7 @@
    Last modified: 18-JAN-1998 Richard Gooch <rgo...@atnf.csiro.au> Devfs support
  */

-static char *verstr = "20050830";
+static char *verstr = "20051120";

 #include <linux/module.h>

@@ -1449,14 +1449,15 @@ static int setup_buffering(struct scsi_t

 /* Can be called more than once after each setup_buffer() */
-static void release_buffering(struct scsi_tape *STp)
+static void release_buffering(struct scsi_tape *STp, int is_read)
 {
        struct st_buffer *STbp;

        STbp = STp->buffer;
        if (STbp->do_dio) {
-               sgl_unmap_user_pages(&(STbp->sg[0]), STbp->do_dio, 0);
+               sgl_unmap_user_pages(&(STbp->sg[0]), STbp->do_dio, is_read);
                STbp->do_dio = 0;
+               STbp->sg_segs = 0;
        }
 }

@@ -1729,7 +1730,7 @@ st_write(struct file *filp, const char _
  out:
        if (SRpnt != NULL)
                scsi_release_request(SRpnt);
-       release_buffering(STp);
+       release_buffering(STp, 0);
        up(&STp->lock);

        return retval;
@@ -1787,7 +1788,7 @@ static long read_tape(struct scsi_tape *
        SRpnt = *aSRpnt;
        SRpnt = st_do_scsi(SRpnt, STp, cmd, bytes, DMA_FROM_DEVICE,
                           STp->device->timeout, MAX_RETRIES, 1);
-       release_buffering(STp);
+       release_buffering(STp, 1);
        *aSRpnt = SRpnt;
        if (!SRpnt)
                return STbp->syscall_result;
@@ -2058,7 +2059,7 @@ st_read(struct file *filp, char __user *
                SRpnt = NULL;
        }
        if (do_dio) {
-               release_buffering(STp);
+               release_buffering(STp, 1);
                STbp->buffer_bytes = 0;
        }
        up(&STp->lock);
@@ -3670,6 +3671,7 @@ static void normalize_buffer(struct st_b
        }
        STbuffer->frp_segs = STbuffer->orig_frp_segs;
        STbuffer->frp_sg_current = 0;
+       STbuffer->sg_segs = 0;
 }

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ryan Richter  
View profile  
 More options Dec 2 2005, 1:04 pm
Newsgroups: fa.linux.kernel
From: Ryan Richter <r...@tau.solarneutrino.net>
Date: Fri, 02 Dec 2005 18:04:15 UTC
Local: Fri, Dec 2 2005 1:04 pm
Subject: Re: Fw: crash on x86_64 - mm related?

I can throw this in and test it for sure.  I'll run the backups every
day for a while to speed up the testing also.

Could someone please tell me exactly which patches I should include in
the kernel I will boot tomorrow?  I haven't played with -rc for ages, so
I'm no longer sure which kernel I should start with (2.6.14 or
2.6.14.3?).  Are the MPT-fusion performance fix patches in -rc4, or if
not will they still apply?

Thanks,
-ryan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jesper Juhl  
View profile  
 More options Dec 2 2005, 1:44 pm
Newsgroups: fa.linux.kernel
From: Jesper Juhl <jesper.j...@gmail.com>
Date: Fri, 02 Dec 2005 18:44:07 UTC
Local: Fri, Dec 2 2005 1:44 pm
Subject: Re: Fw: crash on x86_64 - mm related?
On 12/2/05, Ryan Richter <r...@tau.solarneutrino.net> wrote:
[snip]

> Could someone please tell me exactly which patches I should include in
> the kernel I will boot tomorrow?  I haven't played with -rc for ages, so
> I'm no longer sure which kernel I should start with (2.6.14 or
> 2.6.14.3?).  Are the MPT-fusion performance fix patches in -rc4, or if
> not will they still apply?

The -rc patches apply to the base x.y.z kernel - in this case 2.6.14
For more info, read Documentation/applying-patches.txt in a recent
kernel source (or use this link:
http://sosdg.org/~coywolf/lxr/source/Documentation/applying-patches.txt
)

--
Jesper Juhl <jesper.j...@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hugh Dickins  
View profile  
 More options Dec 2 2005, 1:56 pm
Newsgroups: fa.linux.kernel
From: Hugh Dickins <h...@veritas.com>
Date: Fri, 02 Dec 2005 18:56:44 UTC
Local: Fri, Dec 2 2005 1:56 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Fri, 2 Dec 2005, Kai Makisara wrote:
> On Fri, 2 Dec 2005, Hugh Dickins wrote:
> I include at the end of this message the patch I sent to linux-scsi
> earlier. It should clarify what are the useful parts of the later patch.

Thanks, yes.  I'll leave out updating the verstr[],
I think that should be sent by you alone.

> I think the release_buffering() call at the end of st_read must say 1. All
> returns use the same path (except the one returning -ERESTARTSYS).

Okay, if you insist.  Yes, all those returns pass that way, but if it
actually did some reading into the memory, it called read_tape, which
did the effective release_buffering immediately after st_do_scsi.

But perhaps I'm misreading it, and even if not, someone will come
along and "correct" it later, or change things around and make my
not-dirty assumption wrong.

It's just that after seeing how sg.c is claiming to dirty even readonly
memory, I'm excessively averse to saying we've dirtied memory we haven't.
My hangup, I'll get over it!

> st.c did set pages dirty after reading before 2.6.0-test4. It disappeared
> when code was rearranged and I don't have any notes about why.

Possibly because of issues with hugetlb compound pages: David Gibson
raised that issue recently with respect to access_process_vm
(page[1].mapping is reused and crashes set_page_dirty), I'm thinking
we don't want to add PageCompound tests all over, and have mailed
Andrew separately for guidance on that.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hugh Dickins  
View profile  
 More options Dec 2 2005, 2:14 pm
Newsgroups: fa.linux.kernel
From: Hugh Dickins <h...@veritas.com>
Date: Fri, 02 Dec 2005 19:14:48 UTC
Local: Fri, Dec 2 2005 2:14 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Fri, 2 Dec 2005, Ryan Richter wrote:
> On Thu, Dec 01, 2005 at 08:21:57PM +0000, Hugh Dickins wrote:
> > If the problem were easily reproducible, it'd be great if you could
> > try this patch; but I think you've said it's not :-(

> I can throw this in and test it for sure.  I'll run the backups every
> day for a while to speed up the testing also.

Thanks - though everyone seems to have agreed that the patch is needed
whatever: so although it would be interesting to know if it has fixed
your problem, we're not waiting on you to go forward with it (James
already invited Linus to pull).

> Could someone please tell me exactly which patches I should include in
> the kernel I will boot tomorrow?

You hit the problem with 2.6.14.2.  My own opinion would be to apply
just that patch to that release, or to 2.6.14.3 if you prefer.

Linus was suggesting 2.6.15-rc4 because there's some debug there which
might have helped identify the offending driver: but your backtrace
had already showed us the offending driver.  Well, not proven: if a
page is doubly freed, the one that suffers is not necessarily the one
that's guilty; but it's a reasonable expectation.  I don't think his
debug would tell us anything more.

> I haven't played with -rc for ages, so
> I'm no longer sure which kernel I should start with (2.6.14 or
> 2.6.14.3?).

If you do want 2.6.15-rc4, then it's 2.6.14 + patch-2.6.15-rc4.

> Are the MPT-fusion performance fix patches in -rc4, or if
> not will they still apply?

I don't know it myself.  Is that the one about the module performing
much better than the builtin?  I'd expect that to be in.  If you have
it, suggest you look at your 2.6.15-rc4 source to see if it's there.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ryan Richter  
View profile  
 More options Dec 2 2005, 2:45 pm
Newsgroups: fa.linux.kernel
From: Ryan Richter <r...@tau.solarneutrino.net>
Date: Fri, 02 Dec 2005 19:45:47 UTC
Local: Fri, Dec 2 2005 2:45 pm
Subject: Re: Fw: crash on x86_64 - mm related?

OK, I guess I'll stick with 2.6.14.3 for now, plus your patch.  Should I
keep Kai's st.c patch?  There was some mention of other patches, are
those relevant?  Most of that discussion went over my head...

Thanks,
-ryan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Kai Makisara  
View profile  
 More options Dec 2 2005, 2:47 pm
Newsgroups: fa.linux.kernel
From: Kai Makisara <Kai.Makis...@kolumbus.fi>
Date: Fri, 02 Dec 2005 19:47:09 UTC
Local: Fri, Dec 2 2005 2:47 pm
Subject: Re: Fw: crash on x86_64 - mm related?

Your analysis is correct. It is not necessary or useful to use dirtied = 1
at the end of st_read(). It has been a long time since I introduced
release_buffering() into st.c and I did not read all of the code now.

> It's just that after seeing how sg.c is claiming to dirty even readonly
> memory, I'm excessively averse to saying we've dirtied memory we haven't.
> My hangup, I'll get over it!

Please don't. I have a very conservative attitude to these things: my
priority is to make sure that the data is correct even if it is not the
fastest code. I will happily let others point out when I am too
conservative.

--
Kai
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hugh Dickins  
View profile  
 More options Dec 2 2005, 3:41 pm
Newsgroups: fa.linux.kernel
From: Hugh Dickins <h...@veritas.com>
Date: Fri, 02 Dec 2005 20:41:34 UTC
Local: Fri, Dec 2 2005 3:41 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Fri, 2 Dec 2005, Ryan Richter wrote:

> OK, I guess I'll stick with 2.6.14.3 for now, plus your patch.  Should I
> keep Kai's st.c patch?  There was some mention of other patches, are
> those relevant?  Most of that discussion went over my head...

For the "Bad page state" premature freeing you were seeing, only my
patch should be relevant.  There are other patches in the works, yes,
and we have good reasons for them; but don't worry about them for this.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hugh Dickins  
View profile  
 More options Dec 2 2005, 3:48 pm
Newsgroups: fa.linux.kernel
From: Hugh Dickins <h...@veritas.com>
Date: Fri, 02 Dec 2005 20:48:26 UTC
Local: Fri, Dec 2 2005 3:48 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Fri, 2 Dec 2005, Kai Makisara wrote:
> On Fri, 2 Dec 2005, Hugh Dickins wrote:

> > It's just that after seeing how sg.c is claiming to dirty even readonly
> > memory, I'm excessively averse to saying we've dirtied memory we haven't.
> > My hangup, I'll get over it!

> Please don't. I have a very conservative attitude to these things: my
> priority is to make sure that the data is correct even if it is not the
> fastest code. I will happily let others point out when I am too
> conservative.

I'm not certain which way you're directing me now: a conservative
attitude suggests we play safe at the end of st_read, by saying we might
somehow have dirtied memory there, perhaps if someone changes sequence.

As I presently, incompletely have it, to maintain more similarity with
sg.c, I've actually moved away from "dirtied" to "rw" READ or WRITE,
and it will look odd to put WRITE at the end of st_read.

I'm giving up for the evening, and probably won't have a chance to do
more until Sunday - the PageCompound issue still under discussion too.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ryan Richter  
View profile  
 More options Dec 3 2005, 12:30 pm
Newsgroups: fa.linux.kernel
From: Ryan Richter <r...@tau.solarneutrino.net>
Date: Sat, 03 Dec 2005 17:30:40 UTC
Local: Sat, Dec 3 2005 12:30 pm
Subject: Re: Fw: crash on x86_64 - mm related?

On Fri, Dec 02, 2005 at 08:40:56PM +0000, Hugh Dickins wrote:
> On Fri, 2 Dec 2005, Ryan Richter wrote:

> > OK, I guess I'll stick with 2.6.14.3 for now, plus your patch.  Should I
> > keep Kai's st.c patch?  There was some mention of other patches, are
> > those relevant?  Most of that discussion went over my head...

> For the "Bad page state" premature freeing you were seeing, only my
> patch should be relevant.  There are other patches in the works, yes,
> and we have good reasons for them; but don't worry about them for this.

OK, I've booted 2.6.14.3+mpt fusion patch+your patch.  Memtest86 found
nothing overnight, so it's not that easy.  Also the backups will run
daily for at least the next week or two, so I'll pipe up if anything
happens.

Thanks,
-ryan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Messages 1 - 25 of 66   Newer >
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2009 Google