This enlarges the default readahead size from 128K to 512K.
To avoid possible regressions, it also does the following:
- scale down the readahead size on small devices and small memory systems
- add thrashing-safe context readahead
- add readahead tracing/stats support to help expose possible problems
In addition, the patchset includes several algorithm updates:
- no start-of-file readahead after lseek
- faster radix_tree_next_hole()/radix_tree_prev_hole()
- pagecache context based mmap read-around
Changes since v1:
- update mmap read-around heuristics (Thanks to Nick Piggin)
- radix_tree_lookup_leaf_node() for the pagecache based mmap read-around
- use __print_symbolic() to show readahead pattern names
(Thanks to Steven Rostedt)
- scale down readahead size proportional to system memory
(Thanks to Matt Mackall)
- add readahead size kernel parameter (by Nikanth Karthikesan)
- add comments from Christian Ehrhardt
Changes since RFC:
- move the lengthy intro text to individual patch changelogs
- treat get_capacity()==0 as an uninitialized value (Thanks to Vivek Goyal)
- increase readahead size limit for small devices (Thanks to Jens Axboe)
- add fio test results by Vivek Goyal
[PATCH 01/15] readahead: limit readahead size for small devices
[PATCH 02/15] readahead: retain inactive lru pages to be accessed soon
[PATCH 03/15] readahead: bump up the default readahead size
[PATCH 04/15] readahead: make default readahead size a kernel parameter
[PATCH 05/15] readahead: limit readahead size for small memory systems
[PATCH 06/15] readahead: replace ra->mmap_miss with ra->ra_flags
[PATCH 07/15] readahead: thrashing safe context readahead
[PATCH 08/15] readahead: record readahead patterns
[PATCH 09/15] readahead: add tracing event
[PATCH 10/15] readahead: add /debug/readahead/stats
[PATCH 11/15] readahead: don't do start-of-file readahead after lseek()
[PATCH 12/15] radixtree: introduce radix_tree_lookup_leaf_node()
[PATCH 13/15] radixtree: speed up the search for hole
[PATCH 14/15] readahead: reduce MMAP_LOTSAMISS for mmap read-around
[PATCH 15/15] readahead: pagecache context based mmap read-around
Documentation/kernel-parameters.txt | 4
block/blk-core.c | 3
block/genhd.c | 24 +
fs/fuse/inode.c | 2
fs/read_write.c | 3
include/linux/fs.h | 64 +++
include/linux/mm.h | 8
include/linux/radix-tree.h | 2
include/trace/events/readahead.h | 78 ++++
lib/radix-tree.c | 94 ++++-
mm/Kconfig | 13
mm/filemap.c | 30 +
mm/readahead.c | 459 ++++++++++++++++++++++----
13 files changed, 680 insertions(+), 104 deletions(-)
> Signed-off-by: Chris Frost<fr...@cs.ucla.edu>
> Signed-off-by: Steve VanDeBogart<van...@cs.ucla.edu>
> Signed-off-by: KAMEZAWA Hiroyuki<kamezaw...@jp.fujitsu.com>
> Signed-off-by: Wu Fengguang<fenggu...@intel.com>
Acked-by: Rik van Riel <ri...@redhat.com>
When we get into the situation where readahead thrashing
would occur, we will end up evicting other stuff more
quickly from the inactive file list. However, that will
be the case either with or without this code...
> CC: Jens Axboe<jens....@oracle.com>
> CC: Chris Mason<chris...@oracle.com>
> CC: Peter Zijlstra<a.p.zi...@chello.nl>
> CC: Martin Schwidefsky<schwi...@de.ibm.com>
> CC: Paul Gortmaker<paul.go...@windriver.com>
> CC: Matt Mackall<m...@selenic.com>
> CC: David Woodhouse<dw...@infradead.org>
> Tested-by: Vivek Goyal<vgo...@redhat.com>
> Tested-by: Christian Ehrhardt<ehrh...@linux.vnet.ibm.com>
> Acked-by: Christian Ehrhardt<ehrh...@linux.vnet.ibm.com>
> Signed-off-by: Wu Fengguang<fenggu...@intel.com>
Acked-by: Rik van Riel <ri...@redhat.com>
Thanks. I'm actually not afraid of it adding memory pressure to the
readahead thrashing case. The context readahead (patch 07) can
adaptively control the memory pressure with or without this patch.
It does add memory pressure to mmap read-around. A typical read-around
request would cover some cached pages (whether or not they are
memory-mapped), and all those pages would be moved to LRU head by
this patch.
This somehow implicitly adds LRU lifetime to executable/lib pages.
Hopefully this won't behave too badly, and it will be limited by the
smaller readahead size on small memory systems (patch 05).
Thanks,
Fengguang
Acked-by: Rik van Riel <ri...@redhat.com>
--
Wu Fengguang wrote:
> When lifting the default readahead size from 128KB to 512KB,
> make sure it won't add memory pressure to small memory systems.
>
> For read-ahead, the memory pressure is mainly readahead buffers consumed
> by too many concurrent streams. The context readahead can adapt
> readahead size to thrashing threshold well. So in principle we don't
> need to adapt the default _max_ read-ahead size to memory pressure.
>
> For read-around, the memory pressure is mainly read-around misses on
> executables/libraries, which could be reduced by scaling down the
> read-around size on fast "reclaim passes".
>
> This patch presents a straightforward solution: limit the default
> readahead size in proportion to available system memory, i.e.
> 512MB mem => 512KB readahead size
> 128MB mem => 128KB readahead size
> 32MB mem => 32KB readahead size (minimal)
>
> Strictly speaking, only the read-around size has to be limited. However we
> don't bother to separate the read-around size from the read-ahead size for now.
>
> CC: Matt Mackall <m...@selenic.com>
> Signed-off-by: Wu Fengguang <fenggu...@intel.com>
What I state here is for read-ahead in a "multi iozone sequential"
setup; I can't speak for real "read-around" workloads.
So your table is probably fine to cover read-around+read-ahead in one
number.
I have tested 256MB mem systems with 512kb readahead quite a lot.
On those, 512kb is still by far superior to smaller readahead sizes, and
I didn't see major thrashing or memory pressure impact.
Therefore I would recommend a table like:
>=256MB mem => 512KB readahead size
128MB mem => 128KB readahead size
32MB mem => 32KB readahead size (minimal)
--
Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance
Acked-by: Rik van Riel <ri...@redhat.com>
--
Acked-by: Rik van Riel <ri...@redhat.com>
Acked-by: Rik van Riel <ri...@redhat.com>
--
Acked-by: Rik van Riel <ri...@redhat.com>
OK.
> I have tested 256MB mem systems with 512kb readahead quite a lot.
> On those 512kb is still by far superior to smaller readaheads and I
> didn't see major thrashing or memory pressure impact.
In fact I'd expect a 64MB box to also benefit from 512kb readahead :)
> Therefore I would recommend a table like:
> >=256MB mem => 512KB readahead size
> 128MB mem => 128KB readahead size
> 32MB mem => 32KB readahead size (minimal)
So, I'm fed up with compromising the read-ahead size with the
read-around size.
However, there is no good reason to introduce a separate read-around
size that would only confuse the user. Instead, I'll introduce a
read-around size limit _on top of_ the readahead size. This will allow
power users to adjust the read-ahead/read-around size at the same time,
while saving the low end from unnecessary memory pressure :) I made the
assumption that low-end users have no need to request a large
read-around size.
Thanks,
Fengguang
---
readahead: limit read-ahead size for small memory systems
When lifting the default readahead size from 128KB to 512KB,
make sure it won't add memory pressure to small memory systems.
For read-ahead, the memory pressure is mainly readahead buffers consumed
by too many concurrent streams. The context readahead can adapt
readahead size to thrashing threshold well. So in principle we don't
need to adapt the default _max_ read-ahead size to memory pressure.
For read-around, the memory pressure is mainly read-around misses on
executables/libraries, which could be reduced by scaling down the
read-around size on fast "reclaim passes".
This patch presents a straightforward solution: limit the default
read-ahead size in proportion to available system memory, i.e.
512MB mem => 512KB readahead size
128MB mem => 128KB readahead size
32MB mem => 32KB readahead size
CC: Matt Mackall <m...@selenic.com>
CC: Christian Ehrhardt <ehrh...@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fenggu...@intel.com>
---
mm/filemap.c | 2 +-
mm/readahead.c | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 1 deletion(-)
--- linux.orig/mm/filemap.c 2010-02-26 10:04:28.000000000 +0800
+++ linux/mm/filemap.c 2010-02-26 10:08:33.000000000 +0800
@@ -1431,7 +1431,7 @@ static void do_sync_mmap_readahead(struc
/*
* mmap read-around
*/
- ra_pages = max_sane_readahead(ra->ra_pages);
+ ra_pages = min(ra->ra_pages, roundup_pow_of_two(totalram_pages / 1024));
if (ra_pages) {
ra->start = max_t(long, 0, offset - ra_pages/2);
ra->size = ra_pages;
btw, I wrote some comments to summarize the now rather complex readahead
size rules...
==
readahead: add notes on readahead size
Basically, currently the default max readahead size
- is 512k
- is boot time configurable with "readahead="
and is auto scaled down:
- for small devices
- for small memory systems (read-around size alone)
CC: Matt Mackall <m...@selenic.com>
CC: Christian Ehrhardt <ehrh...@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fenggu...@intel.com>
---
mm/readahead.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
--- linux.orig/mm/readahead.c 2010-02-26 10:11:41.000000000 +0800
+++ linux/mm/readahead.c 2010-02-26 10:11:55.000000000 +0800
@@ -7,6 +7,28 @@
* Initial version.
*/
+/*
+ * Notes on readahead size.
+ *
+ * The default max readahead size is VM_MAX_READAHEAD=512k,
+ * which can be changed by the user with the boot-time parameter "readahead="
+ * or the runtime interface "/sys/devices/virtual/bdi/default/read_ahead_kb".
+ * The latter normally only takes effect for hot-added devices in the future.
+ *
+ * The effective max readahead size for each block device can be accessed with
+ * 1) the `blockdev` command
+ * 2) /sys/block/sda/queue/read_ahead_kb
+ * 3) /sys/devices/virtual/bdi/$(env stat -c '%t:%T' /dev/sda)/read_ahead_kb
+ *
+ * These are typically initialized with the global default size, but may be
+ * auto-scaled down for small devices in add_disk(). NFS, software RAID,
+ * btrfs, etc. have special rules to set up their default readahead sizes.
+ *
+ * The mmap read-around size typically equals the readahead size, with an
+ * extra limit proportional to system memory size. For example, a 64MB box
+ * will have a 64KB read-around size limit, 128MB mem => 128KB limit, etc.
+ */
+
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/memcontrol.h>
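The three query interfaces listed in the comment can be exercised from a
shell. A sketch, assuming /dev/sda is the disk of interest (substitute
your device; the script only prints a note when the device is absent):

```shell
#!/bin/sh
# Query the effective max readahead size of one disk via the three
# interfaces named in the comment above. /dev/sda is an assumption.
dev=/dev/sda
disk=${dev##*/}

if [ -b "$dev" ]; then
	blockdev --getra "$dev"                     # unit: 512-byte sectors
	cat "/sys/block/$disk/queue/read_ahead_kb"  # unit: KB
	cat "/sys/devices/virtual/bdi/$(stat -c '%t:%T' "$dev")/read_ahead_kb"
else
	echo "no block device $dev on this system"
fi
```

Note the unit mismatch: `blockdev --getra` reports 512-byte sectors, so
a 512KB readahead shows up as 1024 there but as 512 in the sysfs files.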
On Fri, Feb 26, 2010 at 03:23:40PM +0800, Christian Ehrhardt wrote:
> Unfortunately without a chance to measure this atm, this patch now looks
> really good to me.
> Thanks for adapting it to a read-ahead only per mem limit.
> Acked-by: Christian Ehrhardt <ehrh...@linux.vnet.ibm.com>
Thank you. Effective measurement is hard because it really depends on
how the user wants to stress his small memory system ;) So I think a
simple-to-understand and yet reasonable limit scheme would be OK.
Thanks,
Fengguang
---
readahead: limit read-ahead size for small memory systems
When lifting the default readahead size from 128KB to 512KB,
make sure it won't add memory pressure to small memory systems.
For read-ahead, the memory pressure is mainly readahead buffers consumed
by too many concurrent streams. The context readahead can adapt
readahead size to thrashing threshold well. So in principle we don't
need to adapt the default _max_ read-ahead size to memory pressure.
For read-around, the memory pressure is mainly read-around misses on
executables/libraries, which could be reduced by scaling down the
read-around size on fast "reclaim passes".
This patch presents a straightforward solution: limit the default
read-around size in proportion to available system memory, i.e.
512MB mem => 512KB read-around size
128MB mem => 128KB read-around size
32MB mem => 32KB read-around size
This will allow power users to adjust the read-ahead/read-around size at
once, while saving the low end from unnecessary memory pressure, under
the assumption that low-end users have no need to request a large
read-around size.
CC: Matt Mackall <m...@selenic.com>
Acked-by: Christian Ehrhardt <ehrh...@linux.vnet.ibm.com>
Signed-off-by: Wu Fengguang <fenggu...@intel.com>
---
mm/filemap.c | 2 +-
mm/readahead.c | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 1 deletion(-)
--- linux.orig/mm/filemap.c 2010-02-26 10:04:28.000000000 +0800
+++ linux/mm/filemap.c 2010-02-26 10:08:33.000000000 +0800
@@ -1431,7 +1431,7 @@ static void do_sync_mmap_readahead(struc
/*
* mmap read-around
*/
- ra_pages = max_sane_readahead(ra->ra_pages);
+ ra_pages = min(ra->ra_pages, roundup_pow_of_two(totalram_pages / 1024));
if (ra_pages) {
ra->start = max_t(long, 0, offset - ra_pages/2);
ra->size = ra_pages;
--
Great. I was confused by the many ways to control the readahead size.
This documentation helps a lot.
Vivek