
[PATCH 00/15] 512K readahead size with thrashing safe readahead v2


Wu Fengguang

Feb 23, 2010, 10:13:21 PM
to Andrew Morton, Jens Axboe
Andrew,

This enlarges the default readahead size from 128K to 512K.
To avoid possible regressions, also do
- scale down readahead size on small device and small memory
- thrashing safe context readahead
- add readahead tracing/stats support to help expose possible problems

Besides, the patchset also includes several algorithm updates:
- no start-of-file readahead after lseek
- faster radix_tree_next_hole()/radix_tree_prev_hole()
- pagecache context based mmap read-around


Changes since v1:
- update mmap read-around heuristics (Thanks to Nick Piggin)
- radix_tree_lookup_leaf_node() for the pagecache based mmap read-around
- use __print_symbolic() to show readahead pattern names
(Thanks to Steven Rostedt)
- scale down readahead size proportional to system memory
(Thanks to Matt Mackall)
- add readahead size kernel parameter (by Nikanth Karthikesan)
- add comments from Christian Ehrhardt

Changes since RFC:
- move the lengthy intro text to individual patch changelogs
- treat get_capacity()==0 as uninitialized value (Thanks to Vivek Goyal)
- increase readahead size limit for small devices (Thanks to Jens Axboe)
- add fio test results by Vivek Goyal


[PATCH 01/15] readahead: limit readahead size for small devices
[PATCH 02/15] readahead: retain inactive lru pages to be accessed soon
[PATCH 03/15] readahead: bump up the default readahead size
[PATCH 04/15] readahead: make default readahead size a kernel parameter
[PATCH 05/15] readahead: limit readahead size for small memory systems
[PATCH 06/15] readahead: replace ra->mmap_miss with ra->ra_flags
[PATCH 07/15] readahead: thrashing safe context readahead
[PATCH 08/15] readahead: record readahead patterns
[PATCH 09/15] readahead: add tracing event
[PATCH 10/15] readahead: add /debug/readahead/stats
[PATCH 11/15] readahead: dont do start-of-file readahead after lseek()
[PATCH 12/15] radixtree: introduce radix_tree_lookup_leaf_node()
[PATCH 13/15] radixtree: speed up the search for hole
[PATCH 14/15] readahead: reduce MMAP_LOTSAMISS for mmap read-around
[PATCH 15/15] readahead: pagecache context based mmap read-around

 Documentation/kernel-parameters.txt |    4
 block/blk-core.c                    |    3
 block/genhd.c                       |   24 +
 fs/fuse/inode.c                     |    2
 fs/read_write.c                     |    3
 include/linux/fs.h                  |   64 +++
 include/linux/mm.h                  |    8
 include/linux/radix-tree.h          |    2
 include/trace/events/readahead.h    |   78 ++++
 lib/radix-tree.c                    |   94 ++++-
 mm/Kconfig                          |   13
 mm/filemap.c                        |   30 +
 mm/readahead.c                      |  459 ++++++++++++++++++++++----
 13 files changed, 680 insertions(+), 104 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Wu Fengguang

Feb 23, 2010, 10:13:32 PM
to Andrew Morton, Jens Axboe, Linus Torvalds, Wu Fengguang
readahead-lseek.patch

Wu Fengguang

Feb 23, 2010, 10:13:35 PM
to Andrew Morton, Jens Axboe, Chris Frost, Steve VanDeBogart, KAMEZAWA Hiroyuki, Wu Fengguang
readahead-retain-pages-find_get_page.patch

Wu Fengguang

Feb 23, 2010, 10:14:05 PM
to Andrew Morton, Jens Axboe, Nick Piggin, Wu Fengguang
readahead-mmap-around.patch

Wu Fengguang

Feb 23, 2010, 10:14:31 PM
to Andrew Morton, Jens Axboe, Matt Mackall, Wu Fengguang
readahead-small-memory-limit.patch

Wu Fengguang

Feb 23, 2010, 10:14:35 PM
to Andrew Morton, Jens Axboe, Ingo Molnar, Steven Rostedt, Peter Zijlstra, Wu Fengguang
readahead-tracer.patch

Wu Fengguang

Feb 23, 2010, 10:14:49 PM
to Andrew Morton, Jens Axboe, Ankit Jain, Dave Chinner, Christian Ehrhardt, Nikanth Karthikesan, Wu Fengguang
readahead-kernel-parameter.patch

Wu Fengguang

Feb 23, 2010, 10:15:04 PM
to Andrew Morton, Jens Axboe, Nick Piggin, Wu Fengguang
radixtree-radix_tree_lookup_leaf_node.patch

Wu Fengguang

Feb 23, 2010, 10:15:52 PM
to Andrew Morton, Jens Axboe, Nick Piggin, Wu Fengguang
readahead-mmap-around-context.patch

Wu Fengguang

Feb 23, 2010, 10:16:10 PM
to Andrew Morton, Jens Axboe, Chris Mason, Peter Zijlstra, Martin Schwidefsky, Paul Gortmaker, Matt Mackall, David Woodhouse, Christian Ehrhardt, Wu Fengguang
readahead-enlarge-default-size.patch

Rik van Riel

Feb 24, 2010, 10:19:13 PM
to Wu Fengguang, Andrew Morton, Jens Axboe, Chris Frost, Steve VanDeBogart, KAMEZAWA Hiroyuki, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Christian Ehrhardt, Matt Mackall, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML
On 02/23/2010 10:10 PM, Wu Fengguang wrote:
> From: Chris Frost<fr...@cs.ucla.edu>
>
> Ensure that cached pages in the inactive list are not prematurely evicted;
> move such pages to lru head when they are covered by
> - in-kernel heuristic readahead
> - a posix_fadvise(POSIX_FADV_WILLNEED) hint from an application

> Signed-off-by: Chris Frost<fr...@cs.ucla.edu>
> Signed-off-by: Steve VanDeBogart<van...@cs.ucla.edu>
> Signed-off-by: KAMEZAWA Hiroyuki<kamezaw...@jp.fujitsu.com>
> Signed-off-by: Wu Fengguang<fenggu...@intel.com>

Acked-by: Rik van Riel <ri...@redhat.com>

When we get into the situation where readahead thrashing
would occur, we will end up evicting other stuff more
quickly from the inactive file list. However, that will
be the case either with or without this code...

Rik van Riel

Feb 24, 2010, 11:03:32 PM
to Wu Fengguang, Andrew Morton, Jens Axboe, Chris Mason, Peter Zijlstra, Martin Schwidefsky, Paul Gortmaker, Matt Mackall, David Woodhouse, Christian Ehrhardt, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML
On 02/23/2010 10:10 PM, Wu Fengguang wrote:
> Use 512kb max readahead size, and 32kb min readahead size.
>
> The former helps io performance for common workloads.
> The latter will be used in the thrashing safe context readahead.

> CC: Jens Axboe<jens....@oracle.com>
> CC: Chris Mason<chris...@oracle.com>
> CC: Peter Zijlstra<a.p.zi...@chello.nl>
> CC: Martin Schwidefsky<schwi...@de.ibm.com>
> CC: Paul Gortmaker<paul.go...@windriver.com>
> CC: Matt Mackall<m...@selenic.com>
> CC: David Woodhouse<dw...@infradead.org>
> Tested-by: Vivek Goyal<vgo...@redhat.com>
> Tested-by: Christian Ehrhardt<ehrh...@linux.vnet.ibm.com>
> Acked-by: Christian Ehrhardt<ehrh...@linux.vnet.ibm.com>
> Signed-off-by: Wu Fengguang<fenggu...@intel.com>

Acked-by: Rik van Riel <ri...@redhat.com>

Wu Fengguang

Feb 25, 2010, 7:27:50 AM
to Rik van Riel, Andrew Morton, Jens Axboe, Chris Frost, Steve VanDeBogart, KAMEZAWA Hiroyuki, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Christian Ehrhardt, Matt Mackall, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML
On Thu, Feb 25, 2010 at 11:17:41AM +0800, Rik van Riel wrote:
> On 02/23/2010 10:10 PM, Wu Fengguang wrote:
> > From: Chris Frost<fr...@cs.ucla.edu>
> >
> > Ensure that cached pages in the inactive list are not prematurely evicted;
> > move such pages to lru head when they are covered by
> > - in-kernel heuristic readahead
> > - a posix_fadvise(POSIX_FADV_WILLNEED) hint from an application
>
> > Signed-off-by: Chris Frost<fr...@cs.ucla.edu>
> > Signed-off-by: Steve VanDeBogart<van...@cs.ucla.edu>
> > Signed-off-by: KAMEZAWA Hiroyuki<kamezaw...@jp.fujitsu.com>
> > Signed-off-by: Wu Fengguang<fenggu...@intel.com>
>
> Acked-by: Rik van Riel <ri...@redhat.com>
>
> When we get into the situation where readahead thrashing
> would occur, we will end up evicting other stuff more
> quickly from the inactive file list. However, that will
> be the case either with or without this code...

Thanks. I'm actually not afraid of it adding memory pressure to the
readahead thrashing case. The context readahead (patch 07) can
adaptively control the memory pressure with or without this patch.

It does add memory pressure to mmap read-around. A typical read-around
request would cover some cached pages (whether or not they are
memory-mapped), and all those pages would be moved to LRU head by
this patch.

This somehow implicitly adds LRU lifetime to executable/lib pages.

Hopefully this won't behave too badly, and it will be limited by the
smaller readahead size on small memory systems (patch 05).

Thanks,
Fengguang

Rik van Riel

Feb 25, 2010, 10:02:00 AM
to Wu Fengguang, Andrew Morton, Jens Axboe, Matt Mackall, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Christian Ehrhardt, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML
On 02/23/2010 10:10 PM, Wu Fengguang wrote:
> When lifting the default readahead size from 128KB to 512KB,
> make sure it won't add memory pressure to small memory systems.
>
> For read-ahead, the memory pressure is mainly readahead buffers consumed
> by too many concurrent streams. The context readahead can adapt
> readahead size to thrashing threshold well. So in principle we don't
> need to adapt the default _max_ read-ahead size to memory pressure.
>
> For read-around, the memory pressure is mainly read-around misses on
> executables/libraries. Which could be reduced by scaling down
> read-around size on fast "reclaim passes".
>
> This patch presents a straightforward solution: to limit default
> readahead size proportional to available system memory, ie.
> 512MB mem => 512KB readahead size
> 128MB mem => 128KB readahead size
> 32MB mem => 32KB readahead size (minimal)
>
> Strictly speaking, only read-around size has to be limited. However we
> don't bother to separate read-around size from read-ahead size for now.
>
> CC: Matt Mackall<m...@selenic.com>
> Signed-off-by: Wu Fengguang<fenggu...@intel.com>

Acked-by: Rik van Riel <ri...@redhat.com>


Christian Ehrhardt

Feb 25, 2010, 10:26:33 AM
to Wu Fengguang, Andrew Morton, Jens Axboe, Matt Mackall, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML

Wu Fengguang wrote:
> When lifting the default readahead size from 128KB to 512KB,
> make sure it won't add memory pressure to small memory systems.
>
> For read-ahead, the memory pressure is mainly readahead buffers consumed
> by too many concurrent streams. The context readahead can adapt
> readahead size to thrashing threshold well. So in principle we don't
> need to adapt the default _max_ read-ahead size to memory pressure.
>
> For read-around, the memory pressure is mainly read-around misses on
> executables/libraries. Which could be reduced by scaling down
> read-around size on fast "reclaim passes".
>
> This patch presents a straightforward solution: to limit default
> readahead size proportional to available system memory, ie.
> 512MB mem => 512KB readahead size
> 128MB mem => 128KB readahead size
> 32MB mem => 32KB readahead size (minimal)
>
> Strictly speaking, only read-around size has to be limited. However we
> don't bother to separate read-around size from read-ahead size for now.
>
> CC: Matt Mackall <m...@selenic.com>
> Signed-off-by: Wu Fengguang <fenggu...@intel.com>

What I state here is for read-ahead in a "multi iozone sequential"
setup; I can't speak for real "read around" workloads.
So probably your table is fine to cover read-around+read-ahead in one
number.

I have tested 256MB mem systems with 512kb readahead quite a lot.
On those, 512kb is still by far superior to smaller readaheads, and I
didn't see major thrashing or memory pressure impact.

Therefore I would recommend a table like:
>=256MB mem => 512KB readahead size
128MB mem => 128KB readahead size
32MB mem => 32KB readahead size (minimal)

--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance

Rik van Riel

Feb 25, 2010, 5:39:57 PM
to Wu Fengguang, Andrew Morton, Jens Axboe, Ingo Molnar, Steven Rostedt, Peter Zijlstra, Chris Mason, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Christian Ehrhardt, Matt Mackall, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML
On 02/23/2010 10:10 PM, Wu Fengguang wrote:
> Example output:
>
> # echo 1 > /debug/tracing/events/readahead/enable
> # cp test-file /dev/null
> # cat /debug/tracing/trace # trimmed output
> readahead-initial(dev=0:15, ino=100177, req=0+2, ra=0+4-2, async=0) = 4
> readahead-subsequent(dev=0:15, ino=100177, req=2+2, ra=4+8-8, async=1) = 8
> readahead-subsequent(dev=0:15, ino=100177, req=4+2, ra=12+16-16, async=1) = 16
> readahead-subsequent(dev=0:15, ino=100177, req=12+2, ra=28+32-32, async=1) = 32
> readahead-subsequent(dev=0:15, ino=100177, req=28+2, ra=60+60-60, async=1) = 24
> readahead-subsequent(dev=0:15, ino=100177, req=60+2, ra=120+60-60, async=1) = 0
>
> CC: Ingo Molnar<mi...@elte.hu>
> CC: Jens Axboe<jens....@oracle.com>
> CC: Steven Rostedt<ros...@goodmis.org>
> CC: Peter Zijlstra<a.p.zi...@chello.nl>
> Signed-off-by: Wu Fengguang<fenggu...@intel.com>

Acked-by: Rik van Riel <ri...@redhat.com>
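
For anyone who wants to post-process trace lines like the ones quoted
above, here is a minimal user-space sketch in C. It assumes the trimmed
line format shown in the example output. Reading "ra=A+B-C" as window
start, window size and async size is an assumption that this thread does
not spell out, and struct ra_event and parse_ra_event() are hypothetical
names, not part of the patch.

```c
#include <stdio.h>

/* Hypothetical record for one trimmed readahead trace line. */
struct ra_event {
    char pattern[32];                 /* text after "readahead-", e.g. "initial" */
    unsigned int major, minor;        /* dev=MAJOR:MINOR */
    unsigned long ino;
    unsigned long req_start, req_size;            /* req=START+SIZE */
    unsigned long ra_start, ra_size, ra_async;    /* assumed meaning of ra=A+B-C */
    int async;
    unsigned long actual;             /* number printed after " = " */
};

/* Returns 0 on success, -1 if the line does not match the expected format. */
int parse_ra_event(const char *line, struct ra_event *ev)
{
    int n = sscanf(line,
        " readahead-%31[^(](dev=%u:%u, ino=%lu, req=%lu+%lu,"
        " ra=%lu+%lu-%lu, async=%d) = %lu",
        ev->pattern, &ev->major, &ev->minor, &ev->ino,
        &ev->req_start, &ev->req_size,
        &ev->ra_start, &ev->ra_size, &ev->ra_async,
        &ev->async, &ev->actual);

    /* All 11 fields must be converted for the line to count as valid. */
    return n == 11 ? 0 : -1;
}
```

Feeding it the second line of the example output fills in pattern
"subsequent", req 2+2 and ra 4+8-8; lines in any other format are
rejected rather than partially parsed.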


Rik van Riel

Feb 25, 2010, 5:44:00 PM
to Wu Fengguang, Andrew Morton, Jens Axboe, Linus Torvalds, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Christian Ehrhardt, Matt Mackall, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML
On 02/23/2010 10:10 PM, Wu Fengguang wrote:
> Some applications (eg. blkid, id3tool etc.) seek around the file
> to get information. For example, blkid does
> seek to 0
> read 1024
> seek to 1536
> read 16384
>
> The start-of-file readahead heuristic is wrong for them, whose
> access pattern can be identified by lseek() calls.
>
> So test-and-set a READAHEAD_LSEEK flag on lseek() and don't
> do start-of-file readahead on seeing it. Proposed by Linus.
>
> Acked-by: Linus Torvalds<torv...@linux-foundation.org>
> Signed-off-by: Wu Fengguang<fenggu...@intel.com>

Acked-by: Rik van Riel <ri...@redhat.com>
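
The idea in the quoted changelog can be sketched in plain user-space C
as follows. This is only an illustration under stated assumptions: the
flag value, struct ra_state and both function names are hypothetical,
and the real patch works on the kernel's per-file readahead state rather
than on this stand-in.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the thread only gives the name READAHEAD_LSEEK. */
#define READAHEAD_LSEEK 0x1u

/* Hypothetical stand-in for the per-file readahead state. */
struct ra_state {
    unsigned int ra_flags;
};

/* Called on lseek(): test first, and set the flag only if it is not set yet. */
void ra_note_lseek(struct ra_state *ra)
{
    if (!(ra->ra_flags & READAHEAD_LSEEK))
        ra->ra_flags |= READAHEAD_LSEEK;
}

/*
 * Decide whether a read at page index `offset` may trigger start-of-file
 * readahead. Only a read at offset 0 qualifies, and not when an lseek()
 * was seen on this file earlier.
 */
bool ra_start_of_file_allowed(const struct ra_state *ra, unsigned long offset)
{
    if (offset != 0)
        return false;
    return !(ra->ra_flags & READAHEAD_LSEEK);
}
```

With this shape, a blkid-like "seek to 0, read 1024" no longer looks
like the start of a sequential scan once the seek has been recorded.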

Rik van Riel

Feb 25, 2010, 6:15:29 PM
to Wu Fengguang, Andrew Morton, Jens Axboe, Nick Piggin, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Christian Ehrhardt, Matt Mackall, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML
On 02/23/2010 10:10 PM, Wu Fengguang wrote:
> This will be used by the pagecache context based read-ahead/read-around
> heuristic to quickly check one pagecache range:
> - if there is any hole
> - if there are any pages
>
> Cc: Nick Piggin<nickp...@yahoo.com.au>
> Signed-off-by: Wu Fengguang<fenggu...@intel.com>

Acked-by: Rik van Riel <ri...@redhat.com>


Rik van Riel

Feb 25, 2010, 6:44:12 PM
to Wu Fengguang, Andrew Morton, Jens Axboe, Nick Piggin, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Christian Ehrhardt, Matt Mackall, Linux Memory Management List, linux-...@vger.kernel.org, LKML
On 02/23/2010 10:10 PM, Wu Fengguang wrote:
> Now that we lift the readahead size from 128KB to 512KB,
> MMAP_LOTSAMISS shall be shrunk accordingly.
>
> We shrink it a bit more, so that for sparse random access patterns,
> only 10*512KB or ~5MB memory will be wasted, instead of the previous
> 100*128KB or ~12MB. The new threshold "10" is still big enough to avoid
> turning off read-around for typical executable/lib page faults.
>
> CC: Nick Piggin<npi...@suse.de>
> Signed-off-by: Wu Fengguang<fenggu...@intel.com>

Acked-by: Rik van Riel <ri...@redhat.com>
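
The worst-case figures in the quoted changelog are simply the miss
threshold multiplied by the read-around window. A tiny C sketch of that
arithmetic follows; readaround_waste_kb() is a hypothetical helper, not
kernel code.

```c
/*
 * Upper bound on memory read in by read-around before it is turned off:
 * one window per miss, up to the miss threshold. Result is in KB.
 */
unsigned long readaround_waste_kb(unsigned long miss_threshold,
                                  unsigned long window_kb)
{
    return miss_threshold * window_kb;
}
```

readaround_waste_kb(10, 512) gives 5120KB (5MB) for the new threshold,
while readaround_waste_kb(100, 128) gives 12800KB (12.5MB, the "~12MB"
above) for the old one.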

Wu Fengguang

Feb 25, 2010, 9:29:20 PM
to Christian Ehrhardt, Andrew Morton, Jens Axboe, Matt Mackall, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML, Rik van Riel
On Thu, Feb 25, 2010 at 11:25:54PM +0800, Christian Ehrhardt wrote:
>
>
> Wu Fengguang wrote:
> > When lifting the default readahead size from 128KB to 512KB,
> > make sure it won't add memory pressure to small memory systems.
> >
> > For read-ahead, the memory pressure is mainly readahead buffers consumed
> > by too many concurrent streams. The context readahead can adapt
> > readahead size to thrashing threshold well. So in principle we don't
> > need to adapt the default _max_ read-ahead size to memory pressure.
> >
> > For read-around, the memory pressure is mainly read-around misses on
> > executables/libraries. Which could be reduced by scaling down
> > read-around size on fast "reclaim passes".
> >
> > This patch presents a straightforward solution: to limit default
> > readahead size proportional to available system memory, ie.
> > 512MB mem => 512KB readahead size
> > 128MB mem => 128KB readahead size
> > 32MB mem => 32KB readahead size (minimal)
> >
> > Strictly speaking, only read-around size has to be limited. However we
> > don't bother to separate read-around size from read-ahead size for now.
> >
> > CC: Matt Mackall <m...@selenic.com>
> > Signed-off-by: Wu Fengguang <fenggu...@intel.com>
>
> What I state here is for read ahead in a "multi iozone sequential"
> setup, I can't speak for real "read around" workloads.
> So probably your table is fine to cover read-around+read-ahead in one
> number.

OK.

> I have tested 256MB mem systems with 512kb readahead quite a lot.
> On those 512kb is still by far superior to smaller readaheads and I
> didn't see major thrashing or memory pressure impact.

In fact I'd expect a 64MB box to also benefit from 512kb readahead :)

> Therefore I would recommend a table like:
> >=256MB mem => 512KB readahead size
> 128MB mem => 128KB readahead size
> 32MB mem => 32KB readahead size (minimal)

So, I'm fed up with compromising the read-ahead size for the sake of
the read-around size.

There is no point in introducing a separate read-around size that would
confuse the user, though. Instead, I'll introduce a read-around size
limit _on top of_ the readahead size. This will allow power users to
adjust the read-ahead/read-around size at the same time, while saving
the low end from unnecessary memory pressure :) I made the assumption
that low end users have no need to request a large read-around size.

Thanks,
Fengguang
---
readahead: limit read-ahead size for small memory systems

When lifting the default readahead size from 128KB to 512KB,
make sure it won't add memory pressure to small memory systems.

For read-ahead, the memory pressure is mainly readahead buffers consumed
by too many concurrent streams. The context readahead can adapt
readahead size to thrashing threshold well. So in principle we don't
need to adapt the default _max_ read-ahead size to memory pressure.

For read-around, the memory pressure is mainly read-around misses on
executables/libraries. Which could be reduced by scaling down
read-around size on fast "reclaim passes".

This patch presents a straightforward solution: to limit default
read-ahead size proportional to available system memory, ie.

512MB mem => 512KB readahead size
128MB mem => 128KB readahead size
32MB mem => 32KB readahead size

CC: Matt Mackall <m...@selenic.com>
CC: Christian Ehrhardt <ehrh...@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fenggu...@intel.com>
---
mm/filemap.c | 2 +-
mm/readahead.c | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 1 deletion(-)

--- linux.orig/mm/filemap.c 2010-02-26 10:04:28.000000000 +0800
+++ linux/mm/filemap.c 2010-02-26 10:08:33.000000000 +0800
@@ -1431,7 +1431,7 @@ static void do_sync_mmap_readahead(struc
 	/*
 	 * mmap read-around
 	 */
-	ra_pages = max_sane_readahead(ra->ra_pages);
+	ra_pages = min(ra->ra_pages, roundup_pow_of_two(totalram_pages / 1024));
 	if (ra_pages) {
 		ra->start = max_t(long, 0, offset - ra_pages/2);
 		ra->size = ra_pages;
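
To see what the new expression in the hunk above yields for a given
amount of RAM, here is a user-space sketch in C. roundup_pow2() is a
simple stand-in written for this example, not the kernel's
roundup_pow_of_two(), and readaround_pages() is a hypothetical name.

```c
/*
 * Round n up to the next power of two. Returns 1 for n <= 1.
 * Assumes n is small enough that the result fits in unsigned long.
 */
unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Read-around window in pages, mirroring the expression in the hunk:
 * min(ra_pages, roundup_pow_of_two(totalram_pages / 1024)).
 */
unsigned long readaround_pages(unsigned long ra_pages,
                               unsigned long totalram_pages)
{
    unsigned long limit = roundup_pow2(totalram_pages / 1024);

    return ra_pages < limit ? ra_pages : limit;
}
```

Assuming 4KB pages, a 512KB readahead size is 128 pages. A 64MB box has
16384 pages, so the limit is 16 pages (64KB); a 512MB box has 131072
pages, so the limit is 128 pages and the full 512KB is used.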

Wu Fengguang

Feb 25, 2010, 9:48:48 PM
to Christian Ehrhardt, Andrew Morton, Jens Axboe, Matt Mackall, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML, Rik van Riel
> readahead: limit read-ahead size for small memory systems
>
> When lifting the default readahead size from 128KB to 512KB,
> make sure it won't add memory pressure to small memory systems.

btw, I wrote some comments to summarize the now complex readahead size
rules..

==
readahead: add notes on readahead size

Basically, currently the default max readahead size
- is 512k
- is boot time configurable with "readahead="
and is auto scaled down:
- for small devices
- for small memory systems (read-around size alone)

CC: Matt Mackall <m...@selenic.com>
CC: Christian Ehrhardt <ehrh...@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fenggu...@intel.com>
---

mm/readahead.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

--- linux.orig/mm/readahead.c 2010-02-26 10:11:41.000000000 +0800
+++ linux/mm/readahead.c 2010-02-26 10:11:55.000000000 +0800
@@ -7,6 +7,28 @@
* Initial version.
*/

+/*
+ * Notes on readahead size.
+ *
+ * The default max readahead size is VM_MAX_READAHEAD=512k,
+ * which can be changed by user with boot time parameter "readahead="
+ * or runtime interface "/sys/devices/virtual/bdi/default/read_ahead_kb".
+ * The latter normally only takes effect in future for hot added devices.
+ *
+ * The effective max readahead size for each block device can be accessed with
+ * 1) the `blockdev` command
+ * 2) /sys/block/sda/queue/read_ahead_kb
+ * 3) /sys/devices/virtual/bdi/$(env stat -c '%t:%T' /dev/sda)/read_ahead_kb
+ *
+ * They are typically initialized with the global default size, however may be
+ * auto scaled down for small devices in add_disk(). NFS, software RAID, btrfs
+ * etc. have special rules to setup their default readahead size.
+ *
+ * The mmap read-around size typically equals the readahead size, with an
+ * extra limit proportional to system memory size. For example, a 64MB box
+ * will have a 64KB read-around size limit, 128MB mem => 128KB limit, etc.
+ */
+
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/memcontrol.h>
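
The per-device files named in the comment above each hold a single
decimal number, so they can be read with a few lines of C. A minimal
sketch follows; read_ahead_kb() is a hypothetical helper name, and the
code assumes nothing about the file beyond it starting with an integer.

```c
#include <stdio.h>

/*
 * Read a read_ahead_kb value from a sysfs-style file.
 * Returns the value in KB, or -1 if the file cannot be opened or parsed.
 */
long read_ahead_kb(const char *path)
{
    FILE *f = fopen(path, "r");
    long kb;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &kb) != 1)
        kb = -1;
    fclose(f);
    return kb;
}
```

For example, read_ahead_kb("/sys/block/sda/queue/read_ahead_kb") would
return the effective max readahead size for sda on a system where that
device exists.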

Christian Ehrhardt

Feb 26, 2010, 2:23:59 AM
to Wu Fengguang, Andrew Morton, Jens Axboe, Matt Mackall, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML, Rik van Riel
Unfortunately without a chance to measure this atm, this patch now looks
really good to me.
Thanks for adapting it to a read-ahead only per mem limit.
Acked-by: Christian Ehrhardt <ehrh...@linux.vnet.ibm.com>

--

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance

Wu Fengguang

Feb 26, 2010, 2:38:21 AM
to Christian Ehrhardt, Andrew Morton, Jens Axboe, Matt Mackall, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Vivek Goyal, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML, Rik van Riel
Christian,

On Fri, Feb 26, 2010 at 03:23:40PM +0800, Christian Ehrhardt wrote:
> Unfortunately without a chance to measure this atm, this patch now looks
> really good to me.
> Thanks for adapting it to a read-ahead only per mem limit.
> Acked-by: Christian Ehrhardt <ehrh...@linux.vnet.ibm.com>

Thank you. Effective measurement is hard because it really depends on
how the user wants to stress his small memory system ;) So I think
a simple-to-understand yet reasonable limit scheme would be OK.

Thanks,
Fengguang
---
readahead: limit read-ahead size for small memory systems

When lifting the default readahead size from 128KB to 512KB,
make sure it won't add memory pressure to small memory systems.

For read-ahead, the memory pressure is mainly readahead buffers consumed
by too many concurrent streams. The context readahead can adapt
readahead size to thrashing threshold well. So in principle we don't
need to adapt the default _max_ read-ahead size to memory pressure.

For read-around, the memory pressure is mainly read-around misses on
executables/libraries. Which could be reduced by scaling down
read-around size on fast "reclaim passes".

This patch presents a straightforward solution: to limit default
read-ahead size proportional to available system memory, ie.

512MB mem => 512KB read-around size
128MB mem => 128KB read-around size
32MB mem => 32KB read-around size

This will allow power users to adjust read-ahead/read-around size at
once, while saving the low end from unnecessary memory pressure, under
the assumption that low end users have no need to request a large
read-around size.

CC: Matt Mackall <m...@selenic.com>
Acked-by: Christian Ehrhardt <ehrh...@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fenggu...@intel.com>
---
mm/filemap.c | 2 +-
mm/readahead.c | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 1 deletion(-)

--- linux.orig/mm/filemap.c 2010-02-26 10:04:28.000000000 +0800
+++ linux/mm/filemap.c 2010-02-26 10:08:33.000000000 +0800
@@ -1431,7 +1431,7 @@ static void do_sync_mmap_readahead(struc
 	/*
 	 * mmap read-around
 	 */
-	ra_pages = max_sane_readahead(ra->ra_pages);
+	ra_pages = min(ra->ra_pages, roundup_pow_of_two(totalram_pages / 1024));
 	if (ra_pages) {
 		ra->start = max_t(long, 0, offset - ra_pages/2);
 		ra->size = ra_pages;
--

Vivek Goyal

Feb 26, 2010, 9:18:57 AM
to Wu Fengguang, Christian Ehrhardt, Andrew Morton, Jens Axboe, Matt Mackall, Chris Mason, Peter Zijlstra, Clemens Ladisch, Olivier Galibert, Nick Piggin, Linux Memory Management List, linux-...@vger.kernel.org, LKML, Rik van Riel

Great. I was confused by the many ways to control the readahead size.
This documentation helps a lot.

Vivek
