I've been looking into this issue with Anthony, and I wanted to follow
up with some of our findings and some additional questions.
The really peculiar thing about the /proc/meminfo output when the device
becomes unresponsive is the delta between the page cache size and the
number of pages on the file LRUs. One example:
MemTotal:         177036 kB
MemFree:            2096 kB
Buffers:             104 kB
Cached:            34356 kB
Active:            70756 kB
Inactive:          72688 kB
Active(anon):      70136 kB
Inactive(anon):    70368 kB
Active(file):        620 kB
Inactive(file):     2320 kB
Cached is 34356 kB, but the file LRUs (Active(file) + Inactive(file))
hold only 2940 kB, which leaves roughly 31 MB unaccounted for. So a very
large portion of the page cache is occupied by pages containing
something other than cached data from media-backed filesystems, and it's
not buffer cache data either. If we run 'stop;
echo 1 > /proc/sys/vm/drop_caches', this delta is reduced to reasonable
levels.
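For anyone who wants to watch this delta on their own device, here is a
minimal userspace sketch of the arithmetic. It assumes meminfo-style text
has already been read into a string. The function names and the
sample_meminfo buffer are made up for illustration and are not part of any
kernel or Android interface.

```c
#include <stdio.h>
#include <string.h>

/* Sample taken from the meminfo excerpt quoted above (illustrative buffer). */
const char sample_meminfo[] =
	"MemTotal: 177036 kB\n"
	"MemFree: 2096 kB\n"
	"Buffers: 104 kB\n"
	"Cached: 34356 kB\n"
	"Active(file): 620 kB\n"
	"Inactive(file): 2320 kB\n";

/*
 * Return the kB value for `key` (e.g. "Cached") from meminfo-style text,
 * or -1 if the key is not found. Hypothetical helper.
 */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		long val;

		/* Match "key:" at the start of a line, then parse the number. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &val) == 1)
			return val;

		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Page cache (Cached) that is not on the file LRUs, in kB.
 * Returns -1 if any of the three fields is missing.
 */
long unaccounted_cache_kb(const char *text)
{
	long cached = meminfo_kb(text, "Cached");
	long active = meminfo_kb(text, "Active(file)");
	long inactive = meminfo_kb(text, "Inactive(file)");

	if (cached < 0 || active < 0 || inactive < 0)
		return -1;
	return cached - (active + inactive);
}
```

Calling unaccounted_cache_kb(sample_meminfo) on the excerpt above gives
31416 kB, which is the delta we are trying to explain.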
One thing we'd really like to do is track down where all these anonymous
pages in the page cache are coming from. The only other data I'm aware
of that is accounted to the page cache is from tmpfs. But the delta
here is awfully big. Does anyone know of any other data that resides in
the page cache? Does anyone know of any way to analyze the page cache
to find out the sources of the pages it contains? I did write some code
to get statistics about the pages used by ashmem via a proc node, and
that only accounted for about 30 kB in the case quoted above. We also
know that the initramfs and userland tmpfs mounts only account for a
couple of megabytes.
The other odd thing we've noticed is that when we're in this state, the
lowmemorykiller isn't killing any processes. This is because
global_page_state(NR_FILE_PAGES) is still above the highest value we
have defined in lowmem_minfree[], but in reality we are extremely low on
free and reclaimable pages due to most of the pages represented in
NR_FILE_PAGES being pinned into the page cache. It seems that maybe
NR_FILE_PAGES isn't the best value to use in systems without swap. I'm
not an expert on the page cache, but it seems like the method below
might be a better way to go. It certainly helps our situation a lot.
Any comments?
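To make the effect concrete, here is a small userspace model of the
threshold check as I understand it: a level is selected only when both the
free-page count and the file-page count are below that level's minfree
value. This is a simplified sketch, not the driver code. The threshold
values are invented for illustration (ours differ), the page counts assume
4 kB pages, and returning -1 for "nothing selected" is my own convention.

```c
#include <stddef.h>

/* Invented example thresholds in pages; real values are tunable. */
const int example_adj[]     = { 0, 1, 6, 12 };
const int example_minfree[] = { 1024, 2048, 4096, 8192 };

/*
 * Return the oom_adj level of the first threshold where both counters
 * are below minfree, or -1 if no threshold is crossed (so nothing
 * would be killed). Hypothetical helper modelling the check.
 */
int pick_min_adj(const int *adj, const int *minfree, size_t n,
		 int other_free, int other_file)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (other_free < minfree[i] && other_file < minfree[i])
			return adj[i];
	}
	return -1;
}
```

With the meminfo numbers above, MemFree is about 524 pages. Feeding in
NR_FILE_PAGES (roughly 8615 pages, Cached plus Buffers) selects nothing
with these example thresholds, while feeding in the file LRU total (735
pages) selects the first level.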
Thanks,
Seth
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 39d5e65..360d3a0 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -78,6 +78,16 @@ task_notify_func(struct notifier_block *self, unsigned long val, void *data)
 	return NOTIFY_OK;
 }
 
+static inline int other_file_pages(void)
+{
+#ifdef CONFIG_SWAP
+	return global_page_state(NR_FILE_PAGES);
+#else
+	return global_page_state(NR_ACTIVE_FILE) +
+		global_page_state(NR_INACTIVE_FILE);
+#endif
+}
+
 static int lowmem_shrink(int nr_to_scan, gfp_t gfp_mask)
 {
 	struct task_struct *p;
@@ -90,7 +100,7 @@ static int lowmem_shrink(int nr_to_scan, gfp_t gfp_mask)
 	int selected_oom_adj;
 	int array_size = ARRAY_SIZE(lowmem_adj);
 	int other_free = global_page_state(NR_FREE_PAGES);
-	int other_file = global_page_state(NR_FILE_PAGES);
+	int other_file = other_file_pages();
 
 	/*
 	 * If we already have a death outstanding, then