
[RFC PATCH 00/10] Reduce stack usage used by page reclaim V1


Mel Gorman

Apr 15, 2010, 1:30:02 PM
This is just an RFC to reduce some of the more obvious stack usage in page
reclaim. It's a bit rushed and I haven't tested it yet, but I'm sending it
out as there may be others working on similar material and I would rather
avoid overlap. I built on some of Kosaki Motohiro's work.

On X86 bit, stack usage figures (generated using a modified bloat-o-meter
that uses checkstack.pl as its input) change in the following ways after
the series of patches.

add/remove: 2/0 grow/shrink: 0/4 up/down: 804/-1688 (-884)
function old new delta
putback_lru_pages - 676 +676
update_isolated_counts - 128 +128
do_try_to_free_pages 172 128 -44
kswapd 1324 1168 -156
shrink_page_list 1616 1224 -392
shrink_zone 2320 1224 -1096

There are some growths there but, critically, they are no longer in the path
that would call writepages. In the main path, about 1K of stack is lopped
off, giving a small amount of breathing room.
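
As a rough illustration (a sketch, not taken from the patches themselves), the
two larger on-stack objects being moved out of the writepage path are a pagevec
and a per-LRU count array. The stand-alone C sketch below approximates their
footprint; PAGEVEC_SIZE == 14 and NR_LRU_LISTS == 5 are assumptions matching
kernels of this era, and fake_pagevec is a hypothetical stand-in for the
kernel's struct pagevec.

#include <stdio.h>

#define PAGEVEC_SIZE 14		/* assumed: kernel value at the time */
#define NR_LRU_LISTS 5		/* assumed: anon/file x active/inactive + unevictable */

/* Approximation of struct pagevec: nr, cold and an array of page pointers */
struct fake_pagevec {
	unsigned long nr;
	unsigned long cold;
	void *pages[PAGEVEC_SIZE];
};

int main(void)
{
	/* shrink_inactive_list() used to keep both of these in its stack frame */
	struct fake_pagevec pvec;
	unsigned int count[NR_LRU_LISTS];

	printf("pagevec:     %zu bytes\n", sizeof(pvec));	/* ~128 on 64-bit */
	printf("LRU count[]: %zu bytes\n", sizeof(count));	/* 20 bytes */
	return 0;
}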

KOSAKI Motohiro (3):
vmscan: kill prev_priority completely
vmscan: move priority variable into scan_control
vmscan: simplify shrink_inactive_list()

Mel Gorman (7):
vmscan: Remove useless loop at end of do_try_to_free_pages
vmscan: Remove unnecessary temporary vars in do_try_to_free_pages
vmscan: Split shrink_zone to reduce stack usage
vmscan: Remove unnecessary temporary variables in shrink_zone()
vmscan: Setup pagevec as late as possible in shrink_inactive_list()
vmscan: Setup pagevec as late as possible in shrink_page_list()
vmscan: Update isolated page counters outside of main path in
shrink_inactive_list()

include/linux/mmzone.h | 15 --
mm/page_alloc.c | 2 -
mm/vmscan.c | 447 +++++++++++++++++++++++-------------------------
mm/vmstat.c | 2 -
4 files changed, 210 insertions(+), 256 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Mel Gorman

Apr 15, 2010, 1:30:02 PM
With the patch "vmscan: kill prev_priority completely", the loop at the
end of do_try_to_free_pages() is now doing nothing. Delete it.

Signed-off-by: Mel Gorman <m...@csn.ul.ie>
---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 76c2b03..838ac8b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1806,11 +1806,6 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
ret = sc->nr_reclaimed;

out:
- if (scanning_global_lru(sc))
- for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
- if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
- continue;
-
delayacct_freepages_end();

return ret;
--
1.6.5

Mel Gorman

Apr 15, 2010, 1:30:03 PM
When shrink_inactive_list() isolates pages, it updates a number of
counters using temporary variables to gather them. These consume stack
and it's in the main path that calls ->writepage(). This patch moves the
accounting updates outside of the main path to reduce stack usage.

Signed-off-by: Mel Gorman <m...@csn.ul.ie>
---

mm/vmscan.c | 63 +++++++++++++++++++++++++++++++++++-----------------------
1 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2c22c83..4225319 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1061,7 +1061,8 @@ static unsigned long clear_active_flags(struct list_head *page_list,
ClearPageActive(page);
nr_active++;
}
- count[lru]++;
+ if (count)
+ count[lru]++;
}

return nr_active;
@@ -1141,12 +1142,13 @@ static int too_many_isolated(struct zone *zone, int file,
* TODO: Try merging with migrations version of putback_lru_pages
*/
static noinline void putback_lru_pages(struct zone *zone,
- struct zone_reclaim_stat *reclaim_stat,
+ struct scan_control *sc,
unsigned long nr_anon, unsigned long nr_file,
struct list_head *page_list)
{
struct page *page;
struct pagevec pvec;
+ struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);

pagevec_init(&pvec, 1);

@@ -1185,6 +1187,37 @@ static noinline void putback_lru_pages(struct zone *zone,
pagevec_release(&pvec);
}

+static noinline void update_isolated_counts(struct zone *zone,
+ struct scan_control *sc,
+ unsigned long *nr_anon,
+ unsigned long *nr_file,
+ struct list_head *isolated_list)
+{
+ unsigned long nr_active;
+ unsigned int count[NR_LRU_LISTS] = { 0, };
+ struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
+
+ nr_active = clear_active_flags(isolated_list, count);
+ __count_vm_events(PGDEACTIVATE, nr_active);
+
+ __mod_zone_page_state(zone, NR_ACTIVE_FILE,
+ -count[LRU_ACTIVE_FILE]);
+ __mod_zone_page_state(zone, NR_INACTIVE_FILE,
+ -count[LRU_INACTIVE_FILE]);
+ __mod_zone_page_state(zone, NR_ACTIVE_ANON,
+ -count[LRU_ACTIVE_ANON]);
+ __mod_zone_page_state(zone, NR_INACTIVE_ANON,
+ -count[LRU_INACTIVE_ANON]);
+
+ *nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
+ *nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, *nr_anon);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, *nr_file);
+
+ reclaim_stat->recent_scanned[0] += *nr_anon;
+ reclaim_stat->recent_scanned[1] += *nr_file;
+}
+
/*
* shrink_inactive_list() is a helper for shrink_zone(). It returns the number
* of reclaimed pages
@@ -1196,11 +1229,9 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
LIST_HEAD(page_list);
unsigned long nr_scanned;
unsigned long nr_reclaimed = 0;
- struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
int lumpy_reclaim = 0;
unsigned long nr_taken;
unsigned long nr_active;
- unsigned int count[NR_LRU_LISTS] = { 0, };
unsigned long nr_anon;
unsigned long nr_file;

@@ -1244,25 +1275,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
return 0;
}

- nr_active = clear_active_flags(&page_list, count);
- __count_vm_events(PGDEACTIVATE, nr_active);
-
- __mod_zone_page_state(zone, NR_ACTIVE_FILE,
- -count[LRU_ACTIVE_FILE]);
- __mod_zone_page_state(zone, NR_INACTIVE_FILE,
- -count[LRU_INACTIVE_FILE]);
- __mod_zone_page_state(zone, NR_ACTIVE_ANON,
- -count[LRU_ACTIVE_ANON]);
- __mod_zone_page_state(zone, NR_INACTIVE_ANON,
- -count[LRU_INACTIVE_ANON]);
-
- nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
- nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
- __mod_zone_page_state(zone, NR_ISOLATED_ANON, nr_anon);
- __mod_zone_page_state(zone, NR_ISOLATED_FILE, nr_file);
-
- reclaim_stat->recent_scanned[0] += nr_anon;
- reclaim_stat->recent_scanned[1] += nr_file;
+ update_isolated_counts(zone, sc, &nr_anon, &nr_file, &page_list);

spin_unlock_irq(&zone->lru_lock);

@@ -1281,7 +1294,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
* The attempt at page out may have made some
* of the pages active, mark them inactive again.
*/
- nr_active = clear_active_flags(&page_list, count);
+ nr_active = clear_active_flags(&page_list, NULL);
count_vm_events(PGDEACTIVATE, nr_active);

nr_reclaimed += shrink_page_list(&page_list, sc,
@@ -1293,7 +1306,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
__count_zone_vm_events(PGSTEAL, zone, nr_reclaimed);

- putback_lru_pages(zone, reclaim_stat, nr_anon, nr_file, &page_list);
+ putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
return nr_reclaimed;
}

--
1.6.5

Mel Gorman

Apr 15, 2010, 1:30:03 PM
shrink_inactive_list() sets up a pagevec to release unfreeable pages. It
uses significant amounts of stack doing this. This patch splits
shrink_inactive_list() to take the stack usage out of the main path so
that callers to writepage() do not contain an unused pagevec on the
stack.

Signed-off-by: Mel Gorman <m...@csn.ul.ie>
---

mm/vmscan.c | 93 +++++++++++++++++++++++++++++++++-------------------------
1 files changed, 53 insertions(+), 40 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a232ad6..9bc1ede 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1120,6 +1120,54 @@ static int too_many_isolated(struct zone *zone, int file,
}

/*
+ * TODO: Try merging with migrations version of putback_lru_pages
+ */
+static noinline void putback_lru_pages(struct zone *zone,
+ struct zone_reclaim_stat *reclaim_stat,
+ unsigned long nr_anon, unsigned long nr_file,
+ struct list_head *page_list)
+{
+ struct page *page;
+ struct pagevec pvec;
+
+ pagevec_init(&pvec, 1);
+
+ /*
+ * Put back any unfreeable pages.
+ */
+ spin_lock(&zone->lru_lock);
+ while (!list_empty(page_list)) {
+ int lru;
+ page = lru_to_page(page_list);
+ VM_BUG_ON(PageLRU(page));
+ list_del(&page->lru);
+ if (unlikely(!page_evictable(page, NULL))) {
+ spin_unlock_irq(&zone->lru_lock);
+ putback_lru_page(page);
+ spin_lock_irq(&zone->lru_lock);
+ continue;
+ }
+ SetPageLRU(page);
+ lru = page_lru(page);
+ add_page_to_lru_list(zone, page, lru);
+ if (is_active_lru(lru)) {
+ int file = is_file_lru(lru);
+ reclaim_stat->recent_rotated[file]++;
+ }
+ if (!pagevec_add(&pvec, page)) {
+ spin_unlock_irq(&zone->lru_lock);
+ __pagevec_release(&pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+ }
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
+
+ spin_unlock_irq(&zone->lru_lock);
+ pagevec_release(&pvec);
+}
+
+/*
 * shrink_inactive_list() is a helper for shrink_zone(). It returns the number
 * of reclaimed pages
 */
@@ -1128,12 +1176,10 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
int file)
{
LIST_HEAD(page_list);
- struct pagevec pvec;
unsigned long nr_scanned;
unsigned long nr_reclaimed = 0;
struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
int lumpy_reclaim = 0;
- struct page *page;
unsigned long nr_taken;
unsigned long nr_active;
unsigned int count[NR_LRU_LISTS] = { 0, };

@@ -1160,8 +1206,6 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
else if (sc->order && sc->priority < DEF_PRIORITY - 2)
lumpy_reclaim = 1;

- pagevec_init(&pvec, 1);
-
lru_add_drain();
spin_lock_irq(&zone->lru_lock);
nr_taken = sc->isolate_pages(nr_to_scan,
@@ -1177,8 +1221,10 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
__count_zone_vm_events(PGSCAN_DIRECT, zone, nr_scanned);
}

- if (nr_taken == 0)
- goto done;
+ if (nr_taken == 0) {
+ spin_unlock_irq(&zone->lru_lock);
+ return 0;
+ }

nr_active = clear_active_flags(&page_list, count);
__count_vm_events(PGDEACTIVATE, nr_active);
@@ -1229,40 +1275,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,


__count_vm_events(KSWAPD_STEAL, nr_reclaimed);
__count_zone_vm_events(PGSTEAL, zone, nr_reclaimed);

- spin_lock(&zone->lru_lock);
- /*
- * Put back any unfreeable pages.
- */
- while (!list_empty(&page_list)) {
- int lru;
- page = lru_to_page(&page_list);
- VM_BUG_ON(PageLRU(page));
- list_del(&page->lru);
- if (unlikely(!page_evictable(page, NULL))) {
- spin_unlock_irq(&zone->lru_lock);
- putback_lru_page(page);
- spin_lock_irq(&zone->lru_lock);
- continue;
- }
- SetPageLRU(page);
- lru = page_lru(page);
- add_page_to_lru_list(zone, page, lru);
- if (is_active_lru(lru)) {
- int file = is_file_lru(lru);
- reclaim_stat->recent_rotated[file]++;
- }
- if (!pagevec_add(&pvec, page)) {
- spin_unlock_irq(&zone->lru_lock);
- __pagevec_release(&pvec);
- spin_lock_irq(&zone->lru_lock);
- }
- }
- __mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
- __mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
-
-done:
- spin_unlock_irq(&zone->lru_lock);
- pagevec_release(&pvec);
+ putback_lru_pages(zone, reclaim_stat, nr_anon, nr_file, &page_list);
return nr_reclaimed;
}

--
1.6.5

Mel Gorman

Apr 15, 2010, 1:30:01 PM
shrink_page_list() sets up a pagevec to release pages as they are freed,
and that pagevec consumes a significant amount of stack. This patch instead
collects the pages to be freed on a linked list which is then freed en masse
at the end. This avoids using stack in the main path that potentially calls
writepage().

Signed-off-by: Mel Gorman <m...@csn.ul.ie>
---

mm/vmscan.c | 34 ++++++++++++++++++++++++++--------
1 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9bc1ede..2c22c83 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -619,6 +619,22 @@ static enum page_references page_check_references(struct page *page,
return PAGEREF_RECLAIM;
}

+static void free_page_list(struct list_head *free_list)
+{
+ struct pagevec freed_pvec;
+ struct page *page, *tmp;
+
+ pagevec_init(&freed_pvec, 1);
+
+ list_for_each_entry_safe(page, tmp, free_list, lru) {
+ list_del(&page->lru);
+ if (!pagevec_add(&freed_pvec, page)) {
+ __pagevec_free(&freed_pvec);
+ pagevec_reinit(&freed_pvec);
+ }
+ }
+}
+
/*
* shrink_page_list() returns the number of reclaimed pages
*/
@@ -627,13 +643,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
enum pageout_io sync_writeback)
{
LIST_HEAD(ret_pages);
- struct pagevec freed_pvec;
+ LIST_HEAD(free_list);
int pgactivate = 0;


unsigned long nr_reclaimed = 0;

cond_resched();

- pagevec_init(&freed_pvec, 1);
while (!list_empty(page_list)) {
enum page_references references;
struct address_space *mapping;
@@ -808,10 +823,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
__clear_page_locked(page);
free_it:
nr_reclaimed++;
- if (!pagevec_add(&freed_pvec, page)) {
- __pagevec_free(&freed_pvec);
- pagevec_reinit(&freed_pvec);
- }
+
+ /*
+ * Is there need to periodically free_page_list? It would
+ * appear not as the counts should be low
+ */
+ list_add(&page->lru, &free_list);
continue;

cull_mlocked:
@@ -834,9 +851,10 @@ keep:
list_add(&page->lru, &ret_pages);
VM_BUG_ON(PageLRU(page) || PageUnevictable(page));
}
+
+ free_page_list(&free_list);
+
list_splice(&ret_pages, page_list);
- if (pagevec_count(&freed_pvec))
- __pagevec_free(&freed_pvec);
count_vm_events(PGACTIVATE, pgactivate);
return nr_reclaimed;
}
--
1.6.5

Mel Gorman

Apr 15, 2010, 1:30:03 PM
From: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>

max_scan in shrink_inactive_list() is now always no more than
SWAP_CLUSTER_MAX, so the loop that scans pages in batches can be removed.
This patch also helps trim the stack.

Details:
- remove the "while (nr_scanned < max_scan)" loop
- remove nr_freed (nr_reclaimed is now used directly)
- remove nr_scan (nr_scanned is now used directly)
- rename max_scan to nr_to_scan
- pass nr_to_scan into isolate_pages() directly instead of
  using SWAP_CLUSTER_MAX

Signed-off-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
---
mm/vmscan.c | 190 ++++++++++++++++++++++++++++-------------------------------
1 files changed, 89 insertions(+), 101 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5c276f0..76c2b03 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1123,16 +1123,22 @@ static int too_many_isolated(struct zone *zone, int file,


* shrink_inactive_list() is a helper for shrink_zone(). It returns the number
* of reclaimed pages
*/

-static unsigned long shrink_inactive_list(unsigned long max_scan,
+static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
struct zone *zone, struct scan_control *sc,
int file)
{
LIST_HEAD(page_list);
struct pagevec pvec;
- unsigned long nr_scanned = 0;
+ unsigned long nr_scanned;


unsigned long nr_reclaimed = 0;

struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
int lumpy_reclaim = 0;

+ struct page *page;
+ unsigned long nr_taken;
+ unsigned long nr_active;
+ unsigned int count[NR_LRU_LISTS] = { 0, };
+ unsigned long nr_anon;
+ unsigned long nr_file;

while (unlikely(too_many_isolated(zone, file, sc))) {
congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -1158,119 +1164,101 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,

lru_add_drain();
spin_lock_irq(&zone->lru_lock);
- do {
- struct page *page;
- unsigned long nr_taken;
- unsigned long nr_scan;
- unsigned long nr_freed;
- unsigned long nr_active;
- unsigned int count[NR_LRU_LISTS] = { 0, };
- int mode = lumpy_reclaim ? ISOLATE_BOTH : ISOLATE_INACTIVE;
- unsigned long nr_anon;
- unsigned long nr_file;
-
- nr_taken = sc->isolate_pages(SWAP_CLUSTER_MAX,
- &page_list, &nr_scan, sc->order, mode,
- zone, sc->mem_cgroup, 0, file);
+ nr_taken = sc->isolate_pages(nr_to_scan,
+ &page_list, &nr_scanned, sc->order,
+ lumpy_reclaim ? ISOLATE_BOTH : ISOLATE_INACTIVE,
+ zone, sc->mem_cgroup, 0, file);

- if (scanning_global_lru(sc)) {
- zone->pages_scanned += nr_scan;
- if (current_is_kswapd())
- __count_zone_vm_events(PGSCAN_KSWAPD, zone,
- nr_scan);
- else
- __count_zone_vm_events(PGSCAN_DIRECT, zone,
- nr_scan);
- }
+ if (scanning_global_lru(sc)) {
+ zone->pages_scanned += nr_scanned;
+ if (current_is_kswapd())
+ __count_zone_vm_events(PGSCAN_KSWAPD, zone, nr_scanned);
+ else
+ __count_zone_vm_events(PGSCAN_DIRECT, zone, nr_scanned);
+ }



- if (nr_taken == 0)
- goto done;
+ if (nr_taken == 0)

+ goto done;



- nr_active = clear_active_flags(&page_list, count);
- __count_vm_events(PGDEACTIVATE, nr_active);

+ nr_active = clear_active_flags(&page_list, count);

+ __count_vm_events(PGDEACTIVATE, nr_active);



- __mod_zone_page_state(zone, NR_ACTIVE_FILE,
- -count[LRU_ACTIVE_FILE]);
- __mod_zone_page_state(zone, NR_INACTIVE_FILE,
- -count[LRU_INACTIVE_FILE]);
- __mod_zone_page_state(zone, NR_ACTIVE_ANON,
- -count[LRU_ACTIVE_ANON]);
- __mod_zone_page_state(zone, NR_INACTIVE_ANON,
- -count[LRU_INACTIVE_ANON]);

+ __mod_zone_page_state(zone, NR_ACTIVE_FILE,
+ -count[LRU_ACTIVE_FILE]);
+ __mod_zone_page_state(zone, NR_INACTIVE_FILE,
+ -count[LRU_INACTIVE_FILE]);
+ __mod_zone_page_state(zone, NR_ACTIVE_ANON,
+ -count[LRU_ACTIVE_ANON]);
+ __mod_zone_page_state(zone, NR_INACTIVE_ANON,
+ -count[LRU_INACTIVE_ANON]);

- nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
- nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
- __mod_zone_page_state(zone, NR_ISOLATED_ANON, nr_anon);
- __mod_zone_page_state(zone, NR_ISOLATED_FILE, nr_file);

+ nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
+ nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, nr_anon);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, nr_file);



- reclaim_stat->recent_scanned[0] += nr_anon;
- reclaim_stat->recent_scanned[1] += nr_file;

+ reclaim_stat->recent_scanned[0] += nr_anon;
+ reclaim_stat->recent_scanned[1] += nr_file;

- spin_unlock_irq(&zone->lru_lock);
+ spin_unlock_irq(&zone->lru_lock);

- nr_scanned += nr_scan;
- nr_freed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
+ nr_reclaimed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
+
+ /*
+ * If we are direct reclaiming for contiguous pages and we do
+ * not reclaim everything in the list, try again and wait
+ * for IO to complete. This will stall high-order allocations
+ * but that should be acceptable to the caller
+ */
+ if (nr_reclaimed < nr_taken && !current_is_kswapd() && lumpy_reclaim) {
+ congestion_wait(BLK_RW_ASYNC, HZ/10);

/*
- * If we are direct reclaiming for contiguous pages and we do
- * not reclaim everything in the list, try again and wait
- * for IO to complete. This will stall high-order allocations
- * but that should be acceptable to the caller
+ * The attempt at page out may have made some
+ * of the pages active, mark them inactive again.
*/
- if (nr_freed < nr_taken && !current_is_kswapd() &&
- lumpy_reclaim) {
- congestion_wait(BLK_RW_ASYNC, HZ/10);
-
- /*
- * The attempt at page out may have made some
- * of the pages active, mark them inactive again.
- */


- nr_active = clear_active_flags(&page_list, count);

- count_vm_events(PGDEACTIVATE, nr_active);
-
- nr_freed += shrink_page_list(&page_list, sc,
- PAGEOUT_IO_SYNC);
- }


+ nr_active = clear_active_flags(&page_list, count);

+ count_vm_events(PGDEACTIVATE, nr_active);

- nr_reclaimed += nr_freed;
+ nr_reclaimed += shrink_page_list(&page_list, sc,
+ PAGEOUT_IO_SYNC);
+ }

- local_irq_disable();
- if (current_is_kswapd())
- __count_vm_events(KSWAPD_STEAL, nr_freed);
- __count_zone_vm_events(PGSTEAL, zone, nr_freed);
+ local_irq_disable();
+ if (current_is_kswapd())
+ __count_vm_events(KSWAPD_STEAL, nr_reclaimed);
+ __count_zone_vm_events(PGSTEAL, zone, nr_reclaimed);

+ spin_lock(&zone->lru_lock);


+ /*
+ * Put back any unfreeable pages.
+ */

+ while (!list_empty(&page_list)) {
+ int lru;
+ page = lru_to_page(&page_list);
+ VM_BUG_ON(PageLRU(page));
+ list_del(&page->lru);


+ if (unlikely(!page_evictable(page, NULL))) {
+ spin_unlock_irq(&zone->lru_lock);
+ putback_lru_page(page);
+ spin_lock_irq(&zone->lru_lock);
+ continue;
}

- __mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
- __mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);
-

- } while (nr_scanned < max_scan);


+ SetPageLRU(page);
+ lru = page_lru(page);
+ add_page_to_lru_list(zone, page, lru);
+ if (is_active_lru(lru)) {
+ int file = is_file_lru(lru);
+ reclaim_stat->recent_rotated[file]++;
+ }
+ if (!pagevec_add(&pvec, page)) {
+ spin_unlock_irq(&zone->lru_lock);
+ __pagevec_release(&pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+ }
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, -nr_anon);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, -nr_file);

done:
spin_unlock_irq(&zone->lru_lock);
--
1.6.5

Mel Gorman

Apr 15, 2010, 1:30:02 PM
From: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>

Since 2.6.28, zone->prev_priority has been unused, so it can be removed
safely. This reduces stack usage slightly.

Now I have to apologise: two years ago I thought prev_priority could be
reintegrated and made useful, but four (or more) attempts have not produced
good performance numbers, so I am giving up on that approach.

Signed-off-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
---

include/linux/mmzone.h | 15 -------------
mm/page_alloc.c | 2 -
mm/vmscan.c | 54 ++---------------------------------------------
mm/vmstat.c | 2 -
4 files changed, 3 insertions(+), 70 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cf9e458..ad76962 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -339,21 +339,6 @@ struct zone {
atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];

/*
- * prev_priority holds the scanning priority for this zone. It is
- * defined as the scanning priority at which we achieved our reclaim
- * target at the previous try_to_free_pages() or balance_pgdat()
- * invocation.
- *
- * We use prev_priority as a measure of how much stress page reclaim is
- * under - it drives the swappiness decision: whether to unmap mapped
- * pages.
- *
- * Access to both this field is quite racy even on uniprocessor. But
- * it is expected to average out OK.
- */
- int prev_priority;
-
- /*
* The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
* this zone's LRU. Maintained by the pageout code.
*/
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d03c946..88513c0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3862,8 +3862,6 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
zone_seqlock_init(zone);
zone->zone_pgdat = pgdat;

- zone->prev_priority = DEF_PRIORITY;
-
zone_pcp_init(zone);
for_each_lru(l) {
INIT_LIST_HEAD(&zone->lru[l].list);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3ff3311..1db19f8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1277,20 +1277,6 @@ done:
}

/*
- * We are about to scan this zone at a certain priority level. If that priority
- * level is smaller (ie: more urgent) than the previous priority, then note
- * that priority level within the zone. This is done so that when the next
- * process comes in to scan this zone, it will immediately start out at this
- * priority level rather than having to build up its own scanning priority.
- * Here, this priority affects only the reclaim-mapped threshold.
- */
-static inline void note_zone_scanning_priority(struct zone *zone, int priority)
-{
- if (priority < zone->prev_priority)
- zone->prev_priority = priority;
-}
-
-/*
* This moves pages from the active list to the inactive list.
*
* We move them the other way if the page is referenced by one or more
@@ -1726,20 +1712,15 @@ static void shrink_zones(int priority, struct zonelist *zonelist,
if (scanning_global_lru(sc)) {
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
continue;
- note_zone_scanning_priority(zone, priority);
-
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue; /* Let kswapd poll it */
sc->all_unreclaimable = 0;
- } else {
+ } else
/*
* Ignore cpuset limitation here. We just want to reduce
* # of used pages by us regardless of memory shortage.
*/
sc->all_unreclaimable = 0;
- mem_cgroup_note_reclaim_priority(sc->mem_cgroup,
- priority);
- }

shrink_zone(priority, zone, sc);
}
@@ -1845,17 +1826,11 @@ out:
if (priority < 0)
priority = 0;

- if (scanning_global_lru(sc)) {
- for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
-
+ if (scanning_global_lru(sc))
+ for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
continue;

- zone->prev_priority = priority;
- }
- } else
- mem_cgroup_record_reclaim_priority(sc->mem_cgroup, priority);
-
delayacct_freepages_end();

return ret;
@@ -2008,22 +1983,12 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
.mem_cgroup = NULL,
.isolate_pages = isolate_pages_global,
};
- /*
- * temp_priority is used to remember the scanning priority at which
- * this zone was successfully refilled to
- * free_pages == high_wmark_pages(zone).
- */
- int temp_priority[MAX_NR_ZONES];
-
loop_again:
total_scanned = 0;
sc.nr_reclaimed = 0;
sc.may_writepage = !laptop_mode;
count_vm_event(PAGEOUTRUN);

- for (i = 0; i < pgdat->nr_zones; i++)
- temp_priority[i] = DEF_PRIORITY;
-
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
unsigned long lru_pages = 0;
@@ -2091,9 +2056,7 @@ loop_again:
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue;

- temp_priority[i] = priority;
sc.nr_scanned = 0;
- note_zone_scanning_priority(zone, priority);

nid = pgdat->node_id;
zid = zone_idx(zone);
@@ -2166,16 +2129,6 @@ loop_again:
break;
}
out:
- /*
- * Note within each zone the priority level at which this zone was
- * brought into a happy state. So that the next thread which scans this
- * zone will start out at that priority level.
- */
- for (i = 0; i < pgdat->nr_zones; i++) {
- struct zone *zone = pgdat->node_zones + i;
-
- zone->prev_priority = temp_priority[i];
- }
if (!all_zones_ok) {
cond_resched();

@@ -2593,7 +2546,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
*/
priority = ZONE_RECLAIM_PRIORITY;
do {
- note_zone_scanning_priority(zone, priority);
shrink_zone(priority, zone, &sc);
priority--;
} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index fa12ea3..2db0a0f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -761,11 +761,9 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
}
seq_printf(m,
"\n all_unreclaimable: %u"
- "\n prev_priority: %i"
"\n start_pfn: %lu"
"\n inactive_ratio: %u",
zone->all_unreclaimable,
- zone->prev_priority,
zone->zone_start_pfn,
zone->inactive_ratio);
seq_putc(m, '\n');
--
1.6.5

Mel Gorman

Apr 15, 2010, 1:30:03 PM
Remove temporary variable that is only used once and does not help
clarify code.

Signed-off-by: Mel Gorman <m...@csn.ul.ie>
---

mm/vmscan.c | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 838ac8b..1ace7c6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1685,13 +1685,12 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
*/
static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
{
- enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
struct zoneref *z;
struct zone *zone;

sc->all_unreclaimable = 1;
- for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
- sc->nodemask) {
+ for_each_zone_zonelist_nodemask(zone, z, zonelist,
+ gfp_zone(sc->gfp_mask), sc->nodemask) {
if (!populated_zone(zone))
continue;
/*
@@ -1741,7 +1740,6 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,


unsigned long lru_pages = 0;

struct zoneref *z;
struct zone *zone;
- enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
unsigned long writeback_threshold;

delayacct_freepages_start();
@@ -1752,7 +1750,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
* mem_cgroup will not do shrink_slab.
*/


if (scanning_global_lru(sc)) {
- for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+ for_each_zone_zonelist(zone, z, zonelist, gfp_zone(sc->gfp_mask)) {
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
continue;
--
1.6.5

Mel Gorman

Apr 15, 2010, 1:30:03 PM
From: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>

Many functions in vmscan now take a `priority' argument, which consumes
a little stack. Moving it into struct scan_control reduces stack usage.

Signed-off-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
---

mm/vmscan.c | 83 ++++++++++++++++++++++++++--------------------------------
1 files changed, 37 insertions(+), 46 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1db19f8..5c276f0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -77,6 +77,8 @@ struct scan_control {

int order;

+ int priority;
+
/* Which cgroup do we reclaim from */
struct mem_cgroup *mem_cgroup;

@@ -1123,7 +1125,7 @@ static int too_many_isolated(struct zone *zone, int file,
*/


static unsigned long shrink_inactive_list(unsigned long max_scan,
struct zone *zone, struct scan_control *sc,
- int priority, int file)
+ int file)
{
LIST_HEAD(page_list);
struct pagevec pvec;
@@ -1149,7 +1151,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
*/
if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
lumpy_reclaim = 1;
- else if (sc->order && priority < DEF_PRIORITY - 2)
+ else if (sc->order && sc->priority < DEF_PRIORITY - 2)
lumpy_reclaim = 1;

pagevec_init(&pvec, 1);
@@ -1328,7 +1330,7 @@ static void move_active_pages_to_lru(struct zone *zone,
}

static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
- struct scan_control *sc, int priority, int file)
+ struct scan_control *sc, int file)
{
unsigned long nr_taken;
unsigned long pgscanned;
@@ -1491,17 +1493,17 @@ static int inactive_list_is_low(struct zone *zone, struct scan_control *sc,
}

static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
- struct zone *zone, struct scan_control *sc, int priority)
+ struct zone *zone, struct scan_control *sc)
{
int file = is_file_lru(lru);

if (is_active_lru(lru)) {
if (inactive_list_is_low(zone, sc, file))
- shrink_active_list(nr_to_scan, zone, sc, priority, file);
+ shrink_active_list(nr_to_scan, zone, sc, file);
return 0;
}

- return shrink_inactive_list(nr_to_scan, zone, sc, priority, file);
+ return shrink_inactive_list(nr_to_scan, zone, sc, file);
}

/*
@@ -1608,8 +1610,7 @@ static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
/*
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
*/
-static void shrink_zone(int priority, struct zone *zone,
- struct scan_control *sc)
+static void shrink_zone(struct zone *zone, struct scan_control *sc)
{
unsigned long nr[NR_LRU_LISTS];
unsigned long nr_to_scan;
@@ -1633,8 +1634,8 @@ static void shrink_zone(int priority, struct zone *zone,
unsigned long scan;

scan = zone_nr_lru_pages(zone, sc, l);
- if (priority || noswap) {
- scan >>= priority;
+ if (sc->priority || noswap) {
+ scan >>= sc->priority;
scan = (scan * percent[file]) / 100;
}
nr[l] = nr_scan_try_batch(scan,
@@ -1650,7 +1651,7 @@ static void shrink_zone(int priority, struct zone *zone,
nr[l] -= nr_to_scan;

nr_reclaimed += shrink_list(l, nr_to_scan,
- zone, sc, priority);
+ zone, sc);
}
}
/*
@@ -1661,7 +1662,8 @@ static void shrink_zone(int priority, struct zone *zone,
* with multiple processes reclaiming pages, the total
* freeing target can get unreasonably large.
*/
- if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
+ if (nr_reclaimed >= nr_to_reclaim &&
+ sc->priority < DEF_PRIORITY)
break;
}

@@ -1672,7 +1674,7 @@ static void shrink_zone(int priority, struct zone *zone,
* rebalance the anon lru active/inactive ratio.
*/
if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
- shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
+ shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, 0);

throttle_vm_writeout(sc->gfp_mask);
}
@@ -1693,8 +1695,7 @@ static void shrink_zone(int priority, struct zone *zone,
* If a zone is deemed to be full of pinned pages then just give it a light
* scan then give up on it.
*/
-static void shrink_zones(int priority, struct zonelist *zonelist,
- struct scan_control *sc)
+static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
{


enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
struct zoneref *z;

@@ -1712,7 +1713,8 @@ static void shrink_zones(int priority, struct zonelist *zonelist,


if (scanning_global_lru(sc)) {
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
continue;
- if (zone->all_unreclaimable && priority != DEF_PRIORITY)
+ if (zone->all_unreclaimable &&
+ sc->priority != DEF_PRIORITY)
continue; /* Let kswapd poll it */
sc->all_unreclaimable = 0;
} else
@@ -1722,7 +1724,7 @@ static void shrink_zones(int priority, struct zonelist *zonelist,


*/
sc->all_unreclaimable = 0;

- shrink_zone(priority, zone, sc);
+ shrink_zone(zone, sc);
}
}

@@ -1745,7 +1747,6 @@ static void shrink_zones(int priority, struct zonelist *zonelist,


static unsigned long do_try_to_free_pages(struct zonelist *zonelist,

struct scan_control *sc)
{
- int priority;
unsigned long ret = 0;
unsigned long total_scanned = 0;
struct reclaim_state *reclaim_state = current->reclaim_state;
@@ -1772,11 +1773,11 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,


}
}

- for (priority = DEF_PRIORITY; priority >= 0; priority--) {

+ for (sc->priority = DEF_PRIORITY; sc->priority >= 0; sc->priority--) {
sc->nr_scanned = 0;
- if (!priority)
+ if (!sc->priority)
disable_swap_token();
- shrink_zones(priority, zonelist, sc);
+ shrink_zones(zonelist, sc);
/*
* Don't shrink slabs when reclaiming memory from
* over limit cgroups
@@ -1809,23 +1810,14 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,

/* Take a nap, wait for some writeback to complete */
if (!sc->hibernation_mode && sc->nr_scanned &&
- priority < DEF_PRIORITY - 2)
+ sc->priority < DEF_PRIORITY - 2)
congestion_wait(BLK_RW_ASYNC, HZ/10);
}
/* top priority shrink_zones still had more to do? don't OOM, then */
if (!sc->all_unreclaimable && scanning_global_lru(sc))
ret = sc->nr_reclaimed;
-out:
- /*
- * Now that we've scanned all the zones at this priority level, note
- * that level within the zone so that the next thread which performs
- * scanning of this zone will immediately start out at this priority
- * level. This affects only the decision whether or not to bring
- * mapped pages onto the inactive list.
- */
- if (priority < 0)
- priority = 0;

+out:
if (scanning_global_lru(sc))


for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))

@@ -1885,7 +1877,8 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
* will pick up pages from other mem cgroup's as well. We hack
* the priority and make it zero.
*/
- shrink_zone(0, zone, &sc);
+ sc.priority = 0;
+ shrink_zone(zone, &sc);
return sc.nr_reclaimed;
}

@@ -1965,7 +1958,6 @@ static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)


static unsigned long balance_pgdat(pg_data_t *pgdat, int order)

{
int all_zones_ok;
- int priority;
int i;
unsigned long total_scanned;
struct reclaim_state *reclaim_state = current->reclaim_state;
@@ -1989,13 +1981,13 @@ loop_again:
sc.may_writepage = !laptop_mode;
count_vm_event(PAGEOUTRUN);

- for (priority = DEF_PRIORITY; priority >= 0; priority--) {
+ for (sc.priority = DEF_PRIORITY; sc.priority >= 0; sc.priority--) {


int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */

unsigned long lru_pages = 0;

int has_under_min_watermark_zone = 0;

/* The swap token gets in the way of swapout... */
- if (!priority)
+ if (!sc.priority)
disable_swap_token();

all_zones_ok = 1;
@@ -2010,7 +2002,7 @@ loop_again:
if (!populated_zone(zone))
continue;



- if (zone->all_unreclaimable && priority != DEF_PRIORITY)

+ if (zone->all_unreclaimable && sc.priority != DEF_PRIORITY)
continue;

/*
@@ -2019,7 +2011,7 @@ loop_again:
*/
if (inactive_anon_is_low(zone, &sc))
shrink_active_list(SWAP_CLUSTER_MAX, zone,
- &sc, priority, 0);
+ &sc, 0);

if (!zone_watermark_ok(zone, order,
high_wmark_pages(zone), 0, 0)) {
@@ -2053,7 +2045,7 @@ loop_again:
if (!populated_zone(zone))
continue;



- if (zone->all_unreclaimable && priority != DEF_PRIORITY)

+ if (zone->all_unreclaimable && sc.priority != DEF_PRIORITY)
continue;

sc.nr_scanned = 0;
@@ -2072,7 +2064,7 @@ loop_again:
*/
if (!zone_watermark_ok(zone, order,
8*high_wmark_pages(zone), end_zone, 0))
- shrink_zone(priority, zone, &sc);
+ shrink_zone(zone, &sc);
reclaim_state->reclaimed_slab = 0;
nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
lru_pages);
@@ -2112,7 +2104,7 @@ loop_again:
* OK, kswapd is getting into trouble. Take a nap, then take
* another pass across the zones.
*/
- if (total_scanned && (priority < DEF_PRIORITY - 2)) {
+ if (total_scanned && (sc.priority < DEF_PRIORITY - 2)) {
if (has_under_min_watermark_zone)
count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
else
@@ -2513,7 +2505,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
const unsigned long nr_pages = 1 << order;
struct task_struct *p = current;
struct reclaim_state reclaim_state;
- int priority;
struct scan_control sc = {
.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2544,11 +2535,11 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
* Free memory by calling shrink zone with increasing
* priorities until we have enough memory freed.
*/
- priority = ZONE_RECLAIM_PRIORITY;
+ sc.priority = ZONE_RECLAIM_PRIORITY;
do {
- shrink_zone(priority, zone, &sc);
- priority--;
- } while (priority >= 0 && sc.nr_reclaimed < nr_pages);
+ shrink_zone(zone, &sc);
+ sc.priority--;
+ } while (sc.priority >= 0 && sc.nr_reclaimed < nr_pages);
}

slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
--
1.6.5

KOSAKI Motohiro

Apr 15, 2010, 10:50:01 PM
> With the patch "vmscan: kill prev_priority completely", the loop at the
> end of do_try_to_free_pages() is now doing nothing. Delete it.
>
> Signed-off-by: Mel Gorman <m...@csn.ul.ie>

Obviously. Thanks for correcting me.
Reviewed-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>

KOSAKI Motohiro

Apr 15, 2010, 10:50:01 PM
> Remove temporary variable that is only used once and does not help
> clarify code.
>
> Signed-off-by: Mel Gorman <m...@csn.ul.ie>

Reviewed-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>

Dave Chinner

Apr 16, 2010, 12:30:02 AM
On Thu, Apr 15, 2010 at 06:21:41PM +0100, Mel Gorman wrote:
> shrink_inactive_list() sets up a pagevec to release unfreeable pages. It
> uses significant amounts of stack doing this. This patch splits
> shrink_inactive_list() to take the stack usage out of the main path so
> that callers to writepage() do not contain an unused pagevec on the
> stack.
>
> Signed-off-by: Mel Gorman <m...@csn.ul.ie>
> ---
> mm/vmscan.c | 93 +++++++++++++++++++++++++++++++++-------------------------
> 1 files changed, 53 insertions(+), 40 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a232ad6..9bc1ede 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1120,6 +1120,54 @@ static int too_many_isolated(struct zone *zone, int file,
> }
>
> /*
> + * TODO: Try merging with migrations version of putback_lru_pages
> + */
> +static noinline void putback_lru_pages(struct zone *zone,

noinline_for_stack

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
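
For context: in the kernel, include/linux/compiler.h defines noinline_for_stack
as a plain alias for noinline; the longer name only documents that inlining is
being suppressed to keep a large stack frame out of the caller. A minimal
stand-alone sketch of the idea follows, with the macro expanded to the GCC
attribute so it compiles outside the kernel tree; helper() and caller() are
made-up names for illustration.

#include <string.h>

/* In the kernel this is simply "#define noinline_for_stack noinline";
 * expanded here so the sketch builds as ordinary userspace C. */
#define noinline_for_stack __attribute__((noinline))

/* Toy example: the 512-byte buffer lives in helper()'s frame, not caller()'s */
static noinline_for_stack size_t helper(void)
{
	char big_buffer[512];

	memset(big_buffer, 0, sizeof(big_buffer));	/* touch it so it is really allocated */
	return sizeof(big_buffer) + (size_t)big_buffer[0];
}

size_t caller(void)
{
	return helper();
}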

KOSAKI Motohiro

Apr 16, 2010, 2:40:01 AM
> shrink_inactive_list() sets up a pagevec to release unfreeable pages. It
> uses significant amounts of stack doing this. This patch splits
> shrink_inactive_list() to take the stack usage out of the main path so
> that callers to writepage() do not contain an unused pagevec on the
> stack.
>
> Signed-off-by: Mel Gorman <m...@csn.ul.ie>
> ---
> mm/vmscan.c | 93 +++++++++++++++++++++++++++++++++-------------------------
> 1 files changed, 53 insertions(+), 40 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a232ad6..9bc1ede 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1120,6 +1120,54 @@ static int too_many_isolated(struct zone *zone, int file,
> }
>
> /*
> + * TODO: Try merging with migrations version of putback_lru_pages
> + */

I also think this stuff needs more cleanup, but this patch works and has
no downside, so let's merge it first.
Reviewed-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>


but please fix Dave's pointed one.

KOSAKI Motohiro

Apr 16, 2010, 4:00:02 AM

Aren't these two lines needed here? We have to consider that the number of
list elements may not be a multiple of 14.

if (pagevec_count(&freed_pvec))
__pagevec_free(&freed_pvec);
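
For clarity, this is what free_page_list() from the earlier patch looks like
with the two suggested lines folded in; a sketch assembled from the hunks in
this thread, not necessarily the final merged code.

static void free_page_list(struct list_head *free_list)
{
	struct pagevec freed_pvec;
	struct page *page, *tmp;

	pagevec_init(&freed_pvec, 1);

	list_for_each_entry_safe(page, tmp, free_list, lru) {
		list_del(&page->lru);
		if (!pagevec_add(&freed_pvec, page)) {
			/* the pagevec is full: free this batch and start another */
			__pagevec_free(&freed_pvec);
			pagevec_reinit(&freed_pvec);
		}
	}

	/*
	 * Flush the remainder. Without this, up to 13 pages are leaked
	 * whenever the list length is not a multiple of PAGEVEC_SIZE (14).
	 */
	if (pagevec_count(&freed_pvec))
		__pagevec_free(&freed_pvec);
}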

KOSAKI Motohiro

Apr 16, 2010, 7:20:02 AM

Seems unrelated change here.
Otherwise looks good.

- kosaki

Mel Gorman

Apr 16, 2010, 10:40:02 AM
On Fri, Apr 16, 2010 at 03:30:00PM +0900, KOSAKI Motohiro wrote:
> > shrink_inactive_list() sets up a pagevec to release unfreeable pages. It
> > uses significant amounts of stack doing this. This patch splits
> > shrink_inactive_list() to take the stack usage out of the main path so
> > that callers to writepage() do not contain an unused pagevec on the
> > stack.
> >
> > Signed-off-by: Mel Gorman <m...@csn.ul.ie>
> > ---
> > mm/vmscan.c | 93 +++++++++++++++++++++++++++++++++-------------------------
> > 1 files changed, 53 insertions(+), 40 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index a232ad6..9bc1ede 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1120,6 +1120,54 @@ static int too_many_isolated(struct zone *zone, int file,
> > }
> >
> > /*
> > + * TODO: Try merging with migrations version of putback_lru_pages
> > + */
>
> I also think this stuff needs more cleanup, but this patch works and has
> no downside, so let's merge it first.

Agreed. I was keeping the first pass straightforward and didn't want to do
too much in one patch, to ease review.

> Reviewed-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
>
>
> but please fix Dave's pointed one.
>

Done. Thanks Dave, I hadn't spotted noinline_for_stack but it's exactly
what was needed here; otherwise it's not obvious why noinline was used.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Mel Gorman

Apr 16, 2010, 10:40:03 AM

Whoops, yes indeed. Otherwise this potentially leaks pages and, as
SWAP_CLUSTER_MAX is 32, the count is often not going to be a multiple of 14.

--

Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Mel Gorman

Apr 16, 2010, 10:40:02 AM

It was somewhat needed, as otherwise the split in accounting looked odd. It
could have been done as two patches but that felt trickier to review.

--

Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Mel Gorman

Apr 16, 2010, 11:00:01 AM
On Thu, Apr 15, 2010 at 06:21:33PM +0100, Mel Gorman wrote:
> This is just an RFC to reduce some of the more obvious stack usage in page
> reclaim. It's a bit rushed and I haven't tested this yet but am sending
> it out as there may be others working on similar material and would rather
> avoid overlap. I built on some of Kosaki Motohiro's work.
>

So the first pass seems to have been reasonably well received. Kosaki,
Rik and Johannes, how do you typically test reclaim-related patches for
regressions? My initial sniff-tests look ok with the page leak sorted out
but I typically am not searching for vmscan regressions other than lumpy
reclaim.

> On X86 bit, stack usage figures (generated using a modified bloat-o-meter

This should have been X86-64. The stack shrinkage is smaller on X86,
obviously, because of the different size of pointers and the like.

--

Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Johannes Weiner

Apr 16, 2010, 6:40:02 PM
On Thu, Apr 15, 2010 at 06:21:34PM +0100, Mel Gorman wrote:
> From: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
>
> Since 2.6.28, zone->prev_priority has been unused, so it can be removed
> safely. This reduces stack usage slightly.
>
> Now I have to apologise: two years ago I thought prev_priority could be
> reintegrated and made useful, but four (or more) attempts have not produced
> good performance numbers, so I am giving up on that approach.
>
> Signed-off-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>

Reviewed-by: Johannes Weiner <han...@cmpxchg.org>

Johannes Weiner

Apr 16, 2010, 6:50:01 PM
On Thu, Apr 15, 2010 at 06:21:35PM +0100, Mel Gorman wrote:
> From: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
>
> Many functions in vmscan now take a `priority' argument, which consumes
> a little stack. Moving it into struct scan_control reduces stack usage.

I don't like this much because it obfuscates value communication.

Functions no longer have obvious arguments and return values, as it's all
passed hidden in that struct.

Do you think it's worth it? I would much rather see that thing die than
expand on it...
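
The trade-off being debated, reduced to a sketch using the calls from the
patches in this thread (the surrounding code is elided):

	/* Before the patch: the priority in use is visible at the call site */
	shrink_zone(priority, zone, sc);

	/* After the patch: one argument fewer on every call, but the reader
	 * must know that the caller stored the value in scan_control first */
	sc->priority = priority;
	shrink_zone(zone, sc);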

Johannes Weiner

Apr 16, 2010, 7:00:02 PM
On Thu, Apr 15, 2010 at 06:21:37PM +0100, Mel Gorman wrote:
> With the patch "vmscan: kill prev_priority completely", the loop at the
> end of do_try_to_free_pages() is now doing nothing. Delete it.
>
> Signed-off-by: Mel Gorman <m...@csn.ul.ie>

Reviewed-by: Johannes Weiner <han...@cmpxchg.org>

Maybe fold that into the prev_priority patch? :)

Johannes Weiner

Apr 16, 2010, 7:00:02 PM
On Thu, Apr 15, 2010 at 06:21:36PM +0100, Mel Gorman wrote:
> From: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
>
> max_scan in shrink_inactive_list() is now always no more than
> SWAP_CLUSTER_MAX, so the loop that scans pages in batches can be removed.
> This patch also helps trim the stack.
>
> Details:
> - remove the "while (nr_scanned < max_scan)" loop
> - remove nr_freed (nr_reclaimed is now used directly)
> - remove nr_scan (nr_scanned is now used directly)
> - rename max_scan to nr_to_scan
> - pass nr_to_scan into isolate_pages() directly instead of
>   using SWAP_CLUSTER_MAX
>
> Signed-off-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>

Reviewed-by: Johannes Weiner <han...@cmpxchg.org>

Johannes Weiner

Apr 16, 2010, 7:30:02 PM
On Thu, Apr 15, 2010 at 06:21:41PM +0100, Mel Gorman wrote:
> shrink_inactive_list() sets up a pagevec to release unfreeable pages. It
> uses significant amounts of stack doing this. This patch splits
> shrink_inactive_list() to take the stack usage out of the main path so
> that callers to writepage() do not contain an unused pagevec on the
> stack.
>
> Signed-off-by: Mel Gorman <m...@csn.ul.ie>

Reviewed-by: Johannes Weiner <han...@cmpxchg.org>

Johannes Weiner

Apr 16, 2010, 7:40:02 PM
On Thu, Apr 15, 2010 at 06:21:43PM +0100, Mel Gorman wrote:
> When shrink_inactive_list() isolates pages, it updates a number of
> counters using temporary variables to gather them. These consume stack
> and it's in the main path that calls ->writepage(). This patch moves the
> accounting updates outside of the main path to reduce stack usage.
>
> Signed-off-by: Mel Gorman <m...@csn.ul.ie>

Reviewed-by: Johannes Weiner <han...@cmpxchg.org>

Mel Gorman

May 26, 2010, 6:30:02 AM
Sorry for the long delay on this. I got distracted by the anon_vma and
page migration stuff.

On Sat, Apr 17, 2010 at 12:48:20AM +0200, Johannes Weiner wrote:
> On Thu, Apr 15, 2010 at 06:21:35PM +0100, Mel Gorman wrote:
> > From: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
> >
> > Many functions in vmscan now take a `priority' argument, which consumes
> > a little stack. Moving it into struct scan_control reduces stack usage.
>
> I don't like this much because it obfuscates value communication.
>
> Functions no longer have obvious arguments and return values, as it's all
> passed hidden in that struct.
>
> Do you think it's worth it? I would much rather see that thing die than
> expand on it...

I don't feel strongly enough to fight about it and reducing stack usage here
isn't the "fix" anyway. I'll drop this patch for now.

That aside, the page reclaim algorithm maintains a lot of state and the
"priority" is part of that state. While the struct means that functions might
not have obvious arguments, passing the state around as arguments gets very
unwieldy very quickly. I don't think killing scan_control would be as
nice as you imagine although I do think it should be as small as
possible.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

KOSAKI Motohiro

May 27, 2010, 10:50:01 PM
Hi

> Sorry for the long delay on this. I got distracted by the anon_vma and
> page migration stuff.

Sorry for the delay too. I haven't had much development time recently ;)
I tested this patch series a while ago, but now it needs to be rebased and
retested. That's sad.

> On Sat, Apr 17, 2010 at 12:48:20AM +0200, Johannes Weiner wrote:
> > On Thu, Apr 15, 2010 at 06:21:35PM +0100, Mel Gorman wrote:
> > > From: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
> > >
> > > Many functions in vmscan now take a `priority' argument, which consumes
> > > a little stack. Moving it into struct scan_control reduces stack usage.
> >
> > I don't like this much because it obfuscates value communication.
> >
> > Functions no longer have obvious arguments and return values, as it's all
> > passed hidden in that struct.
> >
> > Do you think it's worth it? I would much rather see that thing die than
> > expand on it...
>
> I don't feel strongly enough to fight about it and reducing stack usage here
> isn't the "fix" anyway. I'll drop this patch for now.

I'm OK either way.


> That aside, the page reclaim algorithm maintains a lot of state and the
> "priority" is part of that state. While the struct means that functions might
> not have obvious arguments, passing the state around as arguments gets very
> unwieldly very quickly. I don't think killing scan_control would be as
> nice as you imagine although I do think it should be as small as
> possible.

I don't have a strong opinion. I think both you and Hannes are making valid
points, but Hannes seems to feel more strongly about it, so I'm inclined to
drop this one.

Thanks.
