
[PATCH] mm: do not use page_count without a page pin

Minchan Kim

Jun 10, 2012, 8:18:56 PM
to Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Minchan Kim, Andrea Arcangeli, Mel Gorman, Michal Hocko, KAMEZAWA Hiroyuki
Commit d179e84ba fixed the problem[1] in vmscan.c, but the same problem exists here.
Let's fix it.

[1] http://comments.gmane.org/gmane.linux.kernel.mm/65844

I quote d179e84ba's description below.

"It is unsafe to run page_count during the physical pfn scan because
compound_head could trip on a dangling pointer when reading
page->first_page if the compound page is being freed by another CPU."

Cc: Andrea Arcangeli <aarc...@redhat.com>
Cc: Mel Gorman <mgo...@suse.de>
Cc: Michal Hocko <mho...@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
Signed-off-by: Minchan Kim <min...@kernel.org>
---
mm/page_alloc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 266f267..019c4fe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5496,7 +5496,11 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 			continue;

 		page = pfn_to_page(check);
-		if (!page_count(page)) {
+		/*
+		 * We can't use page_count withou pin a page
+		 * because another CPU can free compound page.
+		 */
+		if (!atomic_read(&page->_count)) {
 			if (PageBuddy(page))
 				iter += (1 << page_order(page)) - 1;
 			continue;
--
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
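
Note: the race described in the quoted commit text can be illustrated with a minimal userspace sketch. It is a simplified model, not kernel code: the struct page below keeps only three fields, the tail flag is a plain bool standing in for PageTail(), and raw_page_count() is a hypothetical name for the direct atomic_read(&page->_count) that the patch introduces. The point is that page_count() follows page->first_page on a tail page, while the raw read touches only the scanned page itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the 2012-era struct page. */
struct page {
	bool tail;               /* models PageTail() */
	struct page *first_page; /* head pointer, meaningful only while tail is set */
	atomic_int _count;       /* reference count; zero on tail pages */
};

/* Models compound_head(): follows first_page for tail pages. */
static struct page *compound_head(struct page *page)
{
	if (page->tail)
		return page->first_page; /* may dangle if another CPU frees the compound page */
	return page;
}

/* Models the old check: only safe when the caller holds a pin on the page. */
static int page_count(struct page *page)
{
	struct page *head = compound_head(page);

	assert(head != NULL); /* the model can detect NULL, not a dangling pointer */
	return atomic_load(&head->_count);
}

/* Hypothetical name for the patched check: reads only the scanned page. */
static int raw_page_count(struct page *page)
{
	return atomic_load(&page->_count);
}
```

For a tail page whose head holds two references, page_count() in this model returns 2 by dereferencing first_page, while raw_page_count() returns 0 without leaving the page, which is why the pfn scan can use it without a pin.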

Wanpeng Li

Jun 10, 2012, 8:24:17 PM
to Minchan Kim, Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Minchan Kim, Andrea Arcangeli, Mel Gorman, Michal Hocko, KAMEZAWA Hiroyuki
>+ * We can't use page_count withou pin a page
                             ^
                             without
>+ * because another CPU can free compound page.
>+ */
>+ if (!atomic_read(&page->_count)) {
> if (PageBuddy(page))
> iter += (1 << page_order(page)) - 1;
> continue;
>--
>1.7.9.5

Minchan Kim

Jun 10, 2012, 10:09:17 PM
to Wanpeng Li, Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Andrea Arcangeli, Mel Gorman, Michal Hocko, KAMEZAWA Hiroyuki
Hi Wanpeng,
I will resend a fixed version once the reviewers have commented.
Thanks!

--
Kind regards,
Minchan Kim

Kamezawa Hiroyuki

Jun 11, 2012, 3:22:35 AM
to Minchan Kim, Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Andrea Arcangeli, Mel Gorman, Michal Hocko
Nice Catch.

Other than the comment fix already pointed out..
Hmm...BTW, it seems this __count_xxx doesn't have any code for THP/Hugepage..
so, we need more fixes for better code, I think.
Hmm, Don't we need !PageTail() check and 'skip thp' code ?

Thanks,
-Kame

Andrea Arcangeli

Jun 11, 2012, 3:44:57 AM
to Kamezawa Hiroyuki, Minchan Kim, Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Mel Gorman, Michal Hocko
Hi,
Agreed!

> Other than the comment fix already pointed out..
> Hmm...BTW, it seems this __count_xxx doesn't have any code for THP/Hugepage..
> so, we need more fixes for better code, I think.
> Hmm, Don't we need !PageTail() check and 'skip thp' code ?

So the page->_count for tail pages is guaranteed zero at all times
(tail page refcounting is done on _mapcount).

We could add a comment that "this check already skips compound tails
of THP because their page->_count is zero at all times".

Instead of a comment we could consider defining an inline function
with a special name that does atomic_read(&page->_count) and use it
when we intend to read the regular or compound head count and return 0 on
tails. It would make it easier to identify these places later if we
ever want to change the refcounting mechanism, but it may be overkill,
it's up to you.

Tail pages also can't be PageLRU.

The code after the patch should already skip thp tails fine (it won't
skip heads but I believe that's intentional, but one problem that
remains is that the heads should increase found by more than 1...).

Thanks,
Andrea
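
Note: the inline helper suggested above could look roughly like the following userspace sketch. The name page_count_nopin() is hypothetical (the thread does not pick one) and struct page is reduced to the one field that matters here. Under the thread's stated invariant that a tail page's _count is always zero, a scan that skips pages with a zero raw count skips THP tails automatically and keeps only the head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Reduced stand-in for struct page: only the reference count is modelled. */
struct page {
	atomic_int _count;
};

/*
 * Hypothetical helper name, not a kernel API: read the count of this exact
 * page without following any compound-head pointer. Returns 0 on tail pages
 * because their _count is never raised.
 */
static inline int page_count_nopin(struct page *page)
{
	return atomic_load(&page->_count);
}

/* Counts the pages a scan would examine further, i.e. those with count != 0. */
static size_t pages_not_skipped(struct page *pages, size_t n)
{
	size_t kept = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (page_count_nopin(&pages[i]) != 0)
			kept++;
	}
	return kept;
}
```

Giving the raw read its own name would make such call sites easy to find if the refcounting scheme ever changes, which is the motivation given in the message above.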

Kamezawa Hiroyuki

Jun 11, 2012, 4:51:07 AM
to Andrea Arcangeli, Minchan Kim, Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Mel Gorman, Michal Hocko
Thank you for clarification.

I'll look into this later. Fortunately, our group has a memory-hotplug
team again for our next server, and they should revisit this :)
I'll pass this on to them.

Thanks,
-Kame

Minchan Kim

Jun 11, 2012, 9:30:59 AM
to Andrea Arcangeli, Kamezawa Hiroyuki, Minchan Kim, Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Mel Gorman, Michal Hocko
Hi Andrea,
Sure.

>
> We could add a comment that "this check already skips compound tails
> of THP because their page->_count is zero at all times".

No problem.

>
> Instead of a comment we could consider defining an inline function
> with a special name that does atomic_read(&page->_count) and use it
> when we intend to read the regular or compound head count and return 0 on
> tails. It would make it easier to identify these places later if we
> ever want to change the refcounting mechanism, but it may be overkill,
> it's up to you.

That's a good idea, but now isn't the right time: I don't have much time
for it, and another patch[1] is pending on this one.

I hope it can be a nice cleanup patch later. :)

[1] https://lkml.org/lkml/2012/6/11/169

>
> Tail pages also can't be PageLRU.
>
> The code after the patch should already skip thp tails fine (it won't
> skip heads but I believe that's intentional, but one problem that
> remains is that the heads should increase found by more than 1...).

I fail to parse your last sentence.
Could you elaborate?

AFAIUC, you mean we have to increase reference count of head page?
If so, it's not in __count_immobile_pages because it is already race-likely function
so it shouldn't be critical although race happens.

If I'm missing something, please let me know.

Andrea Arcangeli

Jun 11, 2012, 10:41:51 AM
to Minchan Kim, Kamezawa Hiroyuki, Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Mel Gorman, Michal Hocko
Hi Minchan,

On Mon, Jun 11, 2012 at 10:30:43PM +0900, Minchan Kim wrote:
> AFAIUC, you mean we have to increase reference count of head page?
> If so, it's not in __count_immobile_pages because it is already race-likely function
> so it shouldn't be critical although race happens.

I meant, shouldn't we take into account the full size? If it's in the
lru the whole thing can be moved away.

	if (!PageLRU(page)) {
		nr_pages = hpage_nr_pages(page);
		barrier();
		found += nr_pages;
		iter += nr_pages-1;
	}
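
Note: the accounting question raised here can be tried out with a small userspace model of the scan loop. Everything in it is an assumption made for illustration: struct page_model, its fields, and the account_huge switch do not exist in the kernel, and the real __count_immobile_pages() has further checks. With account_huge false the loop counts a non-LRU huge page head as one page, as in the patched code; with it true the loop follows the suggestion above and adds the full size while skipping the tails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-pfn state; a stand-in for the kernel's struct page. */
struct page_model {
	int count;          /* models page->_count, read without a pin */
	bool buddy;         /* models PageBuddy(): first page of a free buddy block */
	bool lru;           /* models PageLRU() */
	unsigned int order; /* buddy order, valid only when buddy is set */
	size_t nr_pages;    /* models hpage_nr_pages(): 1, or the THP size on a head */
};

/* Returns how many pages in the range are accounted as immobile. */
static size_t count_immobile(const struct page_model *pages, size_t n,
			     bool account_huge)
{
	size_t found = 0;

	for (size_t iter = 0; iter < n; iter++) {
		const struct page_model *page = &pages[iter];

		assert(page->nr_pages >= 1);
		if (page->count == 0) {
			/* Free page or THP tail: skip it, jumping over a whole buddy block. */
			if (page->buddy)
				iter += ((size_t)1 << page->order) - 1;
			continue;
		}
		if (!page->lru) {
			if (account_huge) {
				/* Suggested variant: account the full huge page and skip its tails. */
				found += page->nr_pages;
				iter += page->nr_pages - 1;
			} else {
				found++;
			}
		}
	}
	return found;
}
```

For a range holding one order-2 free buddy block followed by a four-page non-LRU huge page, the model reports 1 immobile page with account_huge false and 4 with it true.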

Minchan Kim

Jun 11, 2012, 6:49:40 PM
to Andrea Arcangeli, Kamezawa Hiroyuki, Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Mel Gorman, Michal Hocko
On 06/11/2012 11:41 PM, Andrea Arcangeli wrote:

> Hi Minchan,
>
> On Mon, Jun 11, 2012 at 10:30:43PM +0900, Minchan Kim wrote:
>> AFAIUC, you mean we have to increase reference count of head page?
>> If so, it's not in __count_immobile_pages because it is already race-likely function
>> so it shouldn't be critical although race happens.
>
> I meant, shouldn't we take into account the full size? If it's in the
> lru the whole thing can be moved away.
>
> if (!PageLRU(page)) {
> nr_pages = hpage_nr_pages(page);
> barrier();


Could you explain why we need barrier?

> found += nr_pages;
> iter += nr_pages-1;
> }
>


Thanks for the explanation.

For normal pages, the logic accounts them as "non-movable pages", so for consistency
it seems you're right. But let's think about it a bit.

If THP page isn't LRU and it's still PageTransHuge, I think it's rather rare and
although it happens, it means migration/reclaimer is about to split or isolate/putback
so it ends up making THP page movable pages.

IMHO, it would be better to account it by movable pages.
What do you think about it?

Thanks.
--
Kind regards,
Minchan Kim

Minchan Kim

Jun 13, 2012, 9:49:55 PM
to Andrea Arcangeli, Kamezawa Hiroyuki, Andrew Morton, linux-...@vger.kernel.org, linu...@kvack.org, Mel Gorman, Michal Hocko
On 06/14/2012 10:21 AM, Andrea Arcangeli wrote:

> On Tue, Jun 12, 2012 at 07:49:34AM +0900, Minchan Kim wrote:
>> If THP page isn't LRU and it's still PageTransHuge, I think it's rather rare and
>> although it happens, it means migration/reclaimer is about to split or isolate/putback
>> so it ends up making THP page movable pages.
>>
>> IMHO, it would be better to account it by movable pages.
>> What do you think about it?
>
> Agreed. Besides THP don't fragment pageblocks. It was just about
> speeding up the scanning the same way it happens with the pagebuddy
> check, but probably not worth it because we're in a racy area here not
> holding locks. pagebuddy is safe because the zone lock is hold, or
> it'd run in the same problem.


Yep. The zone lock is already held, so the pagebuddy check is safe, but THP is still racy, so let's leave it as it is.
If you don't have any more concerns about this patch, could you add your Acked-by to my latest patch so that Andrew
can pick it up? Even if you do have a concern, let's handle it in a separate patch, because it's an optimization and
another patch is pending on this one.

Thanks, Andrea.