
Swappiness vs. mmap() and interactive response


Elladan (Apr 28, 2009, 1:00:12 AM)

Hi,

So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
and then I did the following (with XFS over LVM):

mv /500gig/of/data/on/disk/one /disk/two

This quickly caused the system to. grind.. to... a.... complete..... halt.
Basically every UI operation, including the mouse in Xorg, started experiencing
multiple second lag and delays. This made the system essentially unusable --
for example, just flipping to the window where the "mv" command was running
took 10 seconds on more than one occasion. Basically a "click and get coffee"
interface.

There was no particular kernel CPU load -- the SATA DMA seemed fine.

If I actively used the GUI, then the pieces I was using would work better, but
they'd start experiencing astonishing latency again if I just let the UI sit
for a little while. From this, I diagnosed that the problem was probably
related to the VM paging out my GUI.

Next, I set the following:

echo 0 > /proc/sys/vm/swappiness

... hoping it would prevent paging out of the UI in favor of file data that's
only used once. It did appear to help to a small degree, but not much. The
system is still effectively unusable while a file copy is going on.
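For context, the knob can be checked like this (a sketch; swappiness only biases reclaim between anonymous and file-backed pages — it does not pin mapped file pages such as executables in memory, which is consistent with the small improvement seen here):

```shell
# Show the current value (the distro default of the era was 60)
sysctl -n vm.swappiness 2>/dev/null || cat /proc/sys/vm/swappiness

# A persistent change would normally go through /etc/sysctl.conf:
#   vm.swappiness = 0
```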

From this, I diagnosed that most likely, the kernel was paging out all my
application file mmap() data (such as my executables and shared libraries) in
favor of total garbage VM load from the file copy.

I don't know how to verify that this is true definitively. Are there some
magic numbers in /proc I can look at? However, I did run latencytop, and it
showed massive 2000+ msec latency in the page fault handler, as well as in
various operations such as XFS read.

Could this be something else? There were some long delays in latencytop from
various apps doing fsync as well, but it seems unlikely that this would destroy
latency in Xorg, and again, latency improved whenever I touched an app, for
that app.

Is there any way to fix this, short of rewriting the VM myself? For example,
is there some way I could convince this VM that pages with active mappings are
valuable?

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

KOSAKI Motohiro (Apr 28, 2009, 1:40:07 AM)

(cc to linux-mm and Rik)


> Hi,
>
> So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> and then I did the following (with XFS over LVM):
>
> mv /500gig/of/data/on/disk/one /disk/two
>
> This quickly caused the system to. grind.. to... a.... complete..... halt.
> Basically every UI operation, including the mouse in Xorg, started experiencing
> multiple second lag and delays. This made the system essentially unusable --
> for example, just flipping to the window where the "mv" command was running
> took 10 seconds on more than one occasion. Basically a "click and get coffee"
> interface.

I have some question and request.

1. please post your /proc/meminfo
2. Do above copy make tons swap-out? IOW your disk read much faster than write?
3. cache limitation of memcgroup solve this problem?
4. Which disk have your /bin and /usr/bin?

Elladan (Apr 28, 2009, 2:40:11 AM)

On Tue, Apr 28, 2009 at 02:35:29PM +0900, KOSAKI Motohiro wrote:
> (cc to linux-mm and Rik)
>
> > Hi,
> >
> > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> > and then I did the following (with XFS over LVM):
> >
> > mv /500gig/of/data/on/disk/one /disk/two
> >
> > This quickly caused the system to. grind.. to... a.... complete..... halt.
> > Basically every UI operation, including the mouse in Xorg, started experiencing
> > multiple second lag and delays. This made the system essentially unusable --
> > for example, just flipping to the window where the "mv" command was running
> > took 10 seconds on more than one occasion. Basically a "click and get coffee"
> > interface.
>
> I have some question and request.
>
> 1. please post your /proc/meminfo
> 2. Do above copy make tons swap-out? IOW your disk read much faster than write?
> 3. cache limitation of memcgroup solve this problem?
> 4. Which disk have your /bin and /usr/bin?

I'll answer these out of order if you don't mind.

2. Do above copy make tons swap-out? IOW your disk read much faster than write?

The disks should be roughly similar. However:

sda is the read disk, sdb is the write. Here's a few snippets from iostat -xm 10

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 67.70 0.00 373.10 0.20 48.47 0.00 265.90 1.94 5.21 2.10 78.32
sdb 0.00 1889.60 0.00 139.80 0.00 52.52 769.34 35.01 250.45 5.17 72.28
---
sda 5.30 0.00 483.80 0.30 60.65 0.00 256.59 1.59 3.28 1.65 79.72
sdb 0.00 3632.70 0.00 171.10 0.00 61.10 731.39 117.09 709.66 5.84 100.00
---
sda 51.20 0.00 478.10 1.00 65.79 0.01 281.27 2.48 5.18 1.96 93.72
sdb 0.00 2104.60 0.00 174.80 0.00 62.84 736.28 108.50 613.64 5.72 100.00
--
sda 153.20 0.00 349.40 0.20 60.99 0.00 357.30 4.47 13.19 2.85 99.80
sdb 0.00 1766.50 0.00 158.60 0.00 59.89 773.34 110.07 672.25 6.30 99.96

This data seems to indicate that the IO performance varies, but the reader is usually faster.

4. Which disk have your /bin and /usr/bin?

sda, the reader.

3. cache limitation of memcgroup solve this problem?

I was unable to get this to work -- do you have some documentation handy?

1. please post your /proc/meminfo

$ cat /proc/meminfo
MemTotal: 3467668 kB
MemFree: 20164 kB
Buffers: 204 kB
Cached: 2295232 kB
SwapCached: 4012 kB
Active: 639608 kB
Inactive: 2620880 kB
Active(anon): 608104 kB
Inactive(anon): 360812 kB
Active(file): 31504 kB
Inactive(file): 2260068 kB
Unevictable: 8 kB
Mlocked: 8 kB
SwapTotal: 4194296 kB
SwapFree: 4186968 kB
Dirty: 147280 kB
Writeback: 8424 kB
AnonPages: 961280 kB
Mapped: 39016 kB
Slab: 81904 kB
SReclaimable: 59044 kB
SUnreclaim: 22860 kB
PageTables: 20548 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5928128 kB
Committed_AS: 1770348 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 281908 kB
VmallocChunk: 34359449059 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 44928 kB
DirectMap2M: 3622912 kB

KOSAKI Motohiro (Apr 28, 2009, 3:00:20 AM)

Hi

> 3. cache limitation of memcgroup solve this problem?
>
> I was unable to get this to work -- do you have some documentation handy?

Do you have a kernel source tarball?
Documentation/cgroups/memory.txt explains the usage kindly.

Elladan (Apr 28, 2009, 3:30:14 AM)

On Tue, Apr 28, 2009 at 03:52:29PM +0900, KOSAKI Motohiro wrote:
> Hi
>
> > 3. cache limitation of memcgroup solve this problem?
> >
> > I was unable to get this to work -- do you have some documentation handy?
>
> Do you have kernel source tarball?
> Documentation/cgroups/memory.txt explain usage kindly.

Thank you. My documentation was out of date.

I created a cgroup with limited memory and placed a copy command in it, and the
latency problem seems to essentially go away. However, I'm also a bit
suspicious that my test might have become invalid, since my IO performance
seems to have dropped somewhat too.
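For reference, the setup described in Documentation/cgroups/memory.txt boils down to something like the following (a sketch using the cgroup v1 interface of that era; the mount point and the 256M limit are illustrative, and root is required):

```shell
# Mount the memory controller (cgroup v1 layout, as on 2.6.28)
mkdir -p /cgroups/memory
mount -t cgroup -o memory none /cgroups/memory

# Create a group with a small page-cache budget for the bulk copy
mkdir /cgroups/memory/bulkcopy
echo 256M > /cgroups/memory/bulkcopy/memory.limit_in_bytes

# Move the current shell into the group, then start the copy from it
echo $$ > /cgroups/memory/bulkcopy/tasks
mv /500gig/of/data/on/disk/one /disk/two
```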

So, am I right in concluding that this more or less implicates bad page
replacement as the culprit? After I dropped vm caches and let my working set
re-form, the memory cgroup seems to be effective at keeping a large pool of
memory free from file pressure.

KOSAKI Motohiro (Apr 28, 2009, 3:50:04 AM)

> On Tue, Apr 28, 2009 at 03:52:29PM +0900, KOSAKI Motohiro wrote:
> > Hi
> >
> > > 3. cache limitation of memcgroup solve this problem?
> > >
> > > I was unable to get this to work -- do you have some documentation handy?
> >
> > Do you have kernel source tarball?
> > Documentation/cgroups/memory.txt explain usage kindly.
>
> Thank you. My documentation was out of date.
>
> I created a cgroup with limited memory and placed a copy command in it, and the
> latency problem seems to essentially go away. However, I'm also a bit
> suspicious that my test might have become invalid, since my IO performance
> seems to have dropped somewhat too.
>
> So, am I right in concluding that this more or less implicates bad page
> replacement as the culprit? After I dropped vm caches and let my working set
> re-form, the memory cgroup seems to be effective at keeping a large pool of
> memory free from file pressure.

Hmm..
It seems your result means bad page replacement does occur, but actually
I haven't seen such a result in my environment.

Hmm, I think I need to build an environment that reproduces your trouble.

Thanks.

Peter Zijlstra (Apr 28, 2009, 3:50:07 AM)

On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
> (cc to linux-mm and Rik)
>
>
> > Hi,
> >
> > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> > and then I did the following (with XFS over LVM):
> >
> > mv /500gig/of/data/on/disk/one /disk/two
> >
> > This quickly caused the system to. grind.. to... a.... complete..... halt.
> > Basically every UI operation, including the mouse in Xorg, started experiencing
> > multiple second lag and delays. This made the system essentially unusable --
> > for example, just flipping to the window where the "mv" command was running
> > took 10 seconds on more than one occasion. Basically a "click and get coffee"
> > interface.
>
> I have some question and request.
>
> 1. please post your /proc/meminfo
> 2. Do above copy make tons swap-out? IOW your disk read much faster than write?
> 3. cache limitation of memcgroup solve this problem?
> 4. Which disk have your /bin and /usr/bin?
>

FWIW I fundamentally object to 3 as being a solution.

I still think the idea of read-ahead driven drop-behind is a good one,
alas last time we brought that up people thought differently.

Balbir Singh (Apr 28, 2009, 4:00:19 AM)

On Tue, Apr 28, 2009 at 1:18 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
>> (cc to linux-mm and Rik)
>>
>>
>> > Hi,
>> >
>> > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
>> > and then I did the following (with XFS over LVM):
>> >
>> > mv /500gig/of/data/on/disk/one /disk/two
>> >
>> > This quickly caused the system to. grind.. to... a.... complete..... halt.
>> > Basically every UI operation, including the mouse in Xorg, started experiencing
>> > multiple second lag and delays.  This made the system essentially unusable --
>> > for example, just flipping to the window where the "mv" command was running
>> > took 10 seconds on more than one occasion.  Basically a "click and get coffee"
>> > interface.
>>
>> I have some question and request.
>>
>> 1. please post your /proc/meminfo
>> 2. Do above copy make tons swap-out? IOW your disk read much faster than write?
>> 3. cache limitation of memcgroup solve this problem?
>> 4. Which disk have your /bin and /usr/bin?
>>
>
> FWIW I fundamentally object to 3 as being a solution.
>

memcgroups were not created to solve latency problems, but they do
isolate memory, and if that helps latency, I don't see why that is a
problem. I don't think isolating applications that we think are not
important and that interfere with or consume more resources than desired
is a bad solution.

> I still think the idea of read-ahead driven drop-behind is a good one,
> alas last time we brought that up people thought differently.

I vaguely remember the patches, but can't recollect the details.

Balbir

KOSAKI Motohiro (Apr 28, 2009, 4:10:12 AM)

> > 1. please post your /proc/meminfo
> > 2. Do above copy make tons swap-out? IOW your disk read much faster than write?
> > 3. cache limitation of memcgroup solve this problem?
> > 4. Which disk have your /bin and /usr/bin?
> >
>
> FWIW I fundamentally object to 3 as being a solution.

Yes, I also think so.


> I still think the idea of read-ahead driven drop-behind is a good one,
> alas last time we brought that up people thought differently.

Hmm.
Sorry, I can't recall this patch. Do you have a pointer or URL?

Peter Zijlstra (Apr 28, 2009, 4:20:08 AM)


So being able to isolate is a good excuse for poor replacement these
days?

Also, exactly because it's isolated/limited, it's sub-optimal.


> > I still think the idea of read-ahead driven drop-behind is a good one,
> > alas last time we brought that up people thought differently.
>
> I vaguely remember the patches, but can't recollect the details.

A quick google gave me this:

http://lkml.org/lkml/2007/7/21/219

Balbir Singh (Apr 28, 2009, 4:30:14 AM)


Nope, I am not saying that. Poor replacement needs to be fixed, but
unfortunately that is very dependent on the nature of the workload:
poor for one might be good for another; of course there is always a
middle ground based on our understanding of desired behaviour. Having
said that, isolating unimportant tasks might be a trade-off that
works. It *does not* replace the good algorithms we need to have as a
default, but it provides manual control of an otherwise auto-piloted
system. With virtualization, mixed workloads are becoming more common
on the system.

Providing the swappiness knob for example is needed because sometimes
the user does know what he/she needs.

> Also, exactly because its isolated/limited its sub-optimal.
>
>
>> > I still think the idea of read-ahead driven drop-behind is a good one,
>> > alas last time we brought that up people thought differently.
>>
>> I vaguely remember the patches, but can't recollect the details.
>
> A quick google gave me this:
>
> http://lkml.org/lkml/2007/7/21/219

Thanks! That was quick

KAMEZAWA Hiroyuki (Apr 28, 2009, 4:30:18 AM)

On Tue, 28 Apr 2009 10:11:32 +0200
Peter Zijlstra <pet...@infradead.org> wrote:

While the kernel can't catch what's going on and what's wanted.

Thanks,
-Kame

Wu Fengguang (Apr 28, 2009, 5:20:07 AM)

On Tue, Apr 28, 2009 at 09:48:39AM +0200, Peter Zijlstra wrote:
> On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
> > (cc to linux-mm and Rik)
> >
> >
> > > Hi,
> > >
> > > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> > > and then I did the following (with XFS over LVM):
> > >
> > > mv /500gig/of/data/on/disk/one /disk/two
> > >
> > > This quickly caused the system to. grind.. to... a.... complete..... halt.
> > > Basically every UI operation, including the mouse in Xorg, started experiencing
> > > multiple second lag and delays. This made the system essentially unusable --
> > > for example, just flipping to the window where the "mv" command was running
> > > took 10 seconds on more than one occasion. Basically a "click and get coffee"
> > > interface.
> >
> > I have some question and request.
> >
> > 1. please post your /proc/meminfo
> > 2. Do above copy make tons swap-out? IOW your disk read much faster than write?
> > 3. cache limitation of memcgroup solve this problem?
> > 4. Which disk have your /bin and /usr/bin?
> >
>
> FWIW I fundamentally object to 3 as being a solution.
>
> I still think the idea of read-ahead driven drop-behind is a good one,
> alas last time we brought that up people thought differently.

The semi-drop-behind is a great idea for the desktop - to put just
accessed pages at the end of the LRU. However I'm still afraid it vastly
changes the caching behavior and won't work as expected in server
workloads - shall we verify this?
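A user-space approximation of drop-behind already exists: issuing posix_fadvise(POSIX_FADV_DONTNEED) on data as it is copied. A hedged sketch with dd, whose iflag=/oflag=nocache map to that advice (these flags require a newer coreutils than Jaunty shipped with):

```shell
# Demo on scratch files; in the real case if= and of= would point at
# the two disks. The nocache flags ask the kernel to drop the copied
# ranges from the page cache via posix_fadvise(POSIX_FADV_DONTNEED).
src=$(mktemp); dst=$(mktemp)
head -c 1048576 /dev/urandom > "$src"
dd if="$src" of="$dst" bs=64k iflag=nocache oflag=nocache 2>/dev/null
cmp -s "$src" "$dst" && echo "copied without polluting the cache"
rm -f "$src" "$dst"
```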

Back to this big-cp-hurts-responsiveness issue. Background write
requests can easily pass the io scheduler's obstacles and fill up
the disk queue. Now every read request will have to wait for 10+ writes
- leading to a 10x slowdown of major page faults.

I reached this conclusion based on recent CFQ code reviews. Will bring up
a queue depth limiting patch for more exercises..

Thanks,
Fengguang

Wu Fengguang (Apr 28, 2009, 5:30:14 AM)


Sorry - just realized that Elladan's root fs lies on sda - the read side.

Then why would a single read stream cause 2000ms major fault delays?
The 'await' value for sda is <10ms, not even close to 2000ms:

> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
> sda 67.70 0.00 373.10 0.20 48.47 0.00 265.90 1.94 5.21 2.10 78.32
> sdb 0.00 1889.60 0.00 139.80 0.00 52.52 769.34 35.01 250.45 5.17 72.28
> ---
> sda 5.30 0.00 483.80 0.30 60.65 0.00 256.59 1.59 3.28 1.65 79.72
> sdb 0.00 3632.70 0.00 171.10 0.00 61.10 731.39 117.09 709.66 5.84 100.00
> ---
> sda 51.20 0.00 478.10 1.00 65.79 0.01 281.27 2.48 5.18 1.96 93.72
> sdb 0.00 2104.60 0.00 174.80 0.00 62.84 736.28 108.50 613.64 5.72 100.00
> --
> sda 153.20 0.00 349.40 0.20 60.99 0.00 357.30 4.47 13.19 2.85 99.80
> sdb 0.00 1766.50 0.00 158.60 0.00 59.89 773.34 110.07 672.25 6.30 99.96

Theodore Tso (Apr 28, 2009, 8:10:12 AM)

On Tue, Apr 28, 2009 at 05:09:16PM +0800, Wu Fengguang wrote:
> The semi-drop-behind is a great idea for the desktop - to put just
> accessed pages to end of LRU. However I'm still afraid it vastly
> changes the caching behavior and wont work well as expected in server
> workloads - shall we verify this?
>
> Back to this big-cp-hurts-responsibility issue. Background write
> requests can easily pass the io scheduler's obstacles and fill up
> the disk queue. Now every read request will have to wait 10+ writes
> - leading to 10x slow down of major page faults.
>
> I reach this conclusion based on recent CFQ code reviews. Will bring up
> a queue depth limiting patch for more exercises..

We can muck with the I/O scheduler, but another thing to consider is
whether the VM should be more aggressively throttling writes in this
case; it sounds like the big cp in this case may be dirtying pages so
aggressively that it's driving other (more useful) pages out of the
page cache --- if the target disk is slower than the source disk (for
example, backing up a SATA primary disk to a USB-attached backup disk)
no amount of drop-behind is going to help the situation.

So that leaves three areas for exploration:

* Write-throttling
* Drop-behind
* background writes pushing aside foreground reads
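The write-throttling knobs in question are the VM dirty thresholds; a sketch of inspecting them and of what more aggressive settings might look like (the numbers are illustrative, not tuned, and writing them needs root):

```shell
# Percent of memory that may be dirty before background writeback
# starts, and before dirtying processes are made to block:
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio

# Lowering both throttles a bulk copy sooner (root required):
#   sysctl -w vm.dirty_background_ratio=2
#   sysctl -w vm.dirty_ratio=10
```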

Hmm, note that although the original bug reporter is running Ubuntu
Jaunty, and hence 2.6.28, this problem is going to get *worse* with
2.6.30, since we have the ext3 data=ordered latency fixes which will
write out any journal activity and, worse, any synchronous commits
(i.e., caused by fsync) will force out all of the dirty pages with
WRITE_SYNC priority. So with a heavy load, I suspect this is going to
be more of a VM issue, and especially figuring out how to tune more
aggressive write-throttling may be key here.

- Ted

Rik van Riel (Apr 28, 2009, 11:40:06 AM)

KOSAKI Motohiro wrote:

>> Next, I set the following:
>>
>> echo 0 > /proc/sys/vm/swappiness
>>
>> ... hoping it would prevent paging out of the UI in favor of file data that's
>> only used once. It did appear to help to a small degree, but not much. The
>> system is still effectively unusable while a file copy is going on.
>>
>> From this, I diagnosed that most likely, the kernel was paging out all my
>> application file mmap() data (such as my executables and shared libraries) in
>> favor of total garbage VM load from the file copy.

I believe your analysis is correct.

When merging the split LRU code upstream, some code was changed
(for scalability reasons) that results in active file pages being
moved to the inactive list any time we evict inactive file pages.

Even if the active file pages are referenced, they are not
protected from the streaming IO.

However, the use-once policy in the VM depends on the active
pages being protected from streaming IO.

A little before the decision to no longer honor the referenced
bit on active file pages was made, we dropped an ugly patch (by
me) after deciding it was just too much of a hack. However, now
that we have _no_ protection for active file pages against large
amounts of streaming IO, we may want to reinstate something like
it. Hopefully in a prettier way...

The old patch is attached for inspiration, discussion and maybe
testing :)

--
All rights reversed.

Attachment: evict-cache-first.patch

Rik van Riel (Apr 28, 2009, 7:30:07 PM)

When the file LRU lists are dominated by streaming IO pages,
evict those pages first, before considering evicting other
pages.

This should be safe from deadlocks or performance problems
because only three things can happen to an inactive file page:
1) referenced twice and promoted to the active list
2) evicted by the pageout code
3) under IO, after which it will get evicted or promoted

The pages freed in this way can either be reused for streaming
IO, or allocated for something else. If the pages are used for
streaming IO, this pageout pattern continues. Otherwise, we will
fall back to the normal pageout pattern.

Signed-off-by: Rik van Riel <ri...@redhat.com>

---
Elladan, does this patch fix the issue you are seeing?

Peter, Kosaki, Ted, does this patch look good to you?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eac9577..4c0304e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1489,6 +1489,21 @@ static void shrink_zone(int priority, struct zone *zone,
                nr[l] = scan;
        }
 
+       /*
+        * When the system is doing streaming IO, memory pressure here
+        * ensures that active file pages get deactivated, until more
+        * than half of the file pages are on the inactive list.
+        *
+        * Once we get to that situation, protect the system's working
+        * set from being evicted by disabling active file page aging
+        * and swapping of swap backed pages. We still do background
+        * aging of anonymous pages.
+        */
+       if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE]) {
+               nr[LRU_ACTIVE_FILE] = 0;
+               nr[LRU_INACTIVE_ANON] = 0;
+       }
+
        while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
                                        nr[LRU_INACTIVE_FILE]) {
                for_each_evictable_lru(l) {

Elladan (Apr 28, 2009, 11:40:07 PM)

Rik,

This patch appears to significantly improve application latency while a large
file copy runs. I'm not seeing behavior that implies continuous bad page
replacement.

I'm still seeing some general lag, which I attribute to general filesystem
slowness. For example, latencytop sees many events like these:

down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags 1475.8 msec 5.9 %

xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags 1740.9 msec 2.6 %

Writing a page to disk 1042.9 msec 43.7 %

It also occasionally sees long page faults:

Page fault 2068.3 msec 21.3 %

I guess XFS (and the elevator) is just doing a poor job managing latency
(particularly poor since all the IO on /usr/bin is on the reader disk).
Notable:

Creating block layer request 451.4 msec 14.4 %

Thank you,
Elladan

KOSAKI Motohiro (Apr 29, 2009, 2:00:19 AM)

Hi

First, I'd like to report my reproduction test result.

test environment: no LVM, copy from ext3 to ext3 (not mv), swappiness unchanged,
CFQ is used, userland is Fedora 10, mmotm (2.6.30-rc1 + mm patches),
CPU: quad-core Opteron, mem: 4G

mouse move lag:               did not happen
window move lag:              did not happen
Mapped page decrease rapidly: did not happen (I guess these pages stay in
                                              the active list on my system)
page fault large latency:     happened (latencytop displays >200ms)


So I don't doubt the VM replacement logic now,
but I need to investigate more.
I plan to try the following things today and tomorrow:

- XFS
- LVM
- another io scheduler (thanks Ted, good view point)
- Rik's new patch

Andrew Morton (Apr 29, 2009, 2:50:07 AM)


hm. The last two observations appear to be inconsistent.

Elladan, have you checked to see whether the Mapped: number in
/proc/meminfo is decreasing?
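One low-tech way to check that during the copy (a sketch; run it in a loop or under watch(1) while the mv is going):

```shell
# A steady fall in Mapped: while the copy runs would confirm that
# mapped executable/library pages are being evicted.
awk '/^(Mapped|Active\(file\)|Inactive\(file\)):/ { printf "%s %s %s  ", $1, $2, $3 } END { print "" }' /proc/meminfo
```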

>
> Then, I don't doubt vm replacement logic now.
> but I need more investigate.
> I plan to try following thing today and tommorow.
>
> - XFS
> - LVM
> - another io scheduler (thanks Ted, good view point)
> - Rik's new patch

It's not clear that we know what's happening yet, is it? It's such a
gross problem that you'd think that even our testing would have found
it by now :(

Elladan, do you know if earlier kernels (2.6.26 or thereabouts) had
this severe a problem?

(notes that we _still_ haven't unbusted prev_priority)

Peter Zijlstra (Apr 29, 2009, 2:50:10 AM)

On Tue, 2009-04-28 at 19:29 -0400, Rik van Riel wrote:

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index eac9577..4c0304e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1489,6 +1489,21 @@ static void shrink_zone(int priority, struct zone *zone,
>                 nr[l] = scan;
>         }
> 
> +       /*
> +        * When the system is doing streaming IO, memory pressure here
> +        * ensures that active file pages get deactivated, until more
> +        * than half of the file pages are on the inactive list.
> +        *
> +        * Once we get to that situation, protect the system's working
> +        * set from being evicted by disabling active file page aging
> +        * and swapping of swap backed pages. We still do background
> +        * aging of anonymous pages.
> +        */
> +       if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE]) {
> +               nr[LRU_ACTIVE_FILE] = 0;
> +               nr[LRU_INACTIVE_ANON] = 0;
> +       }
> +

Isn't there a hole where LRU_*_FILE << LRU_*_ANON and we now stop
shrinking INACTIVE_ANON even though it makes sense to.

KOSAKI Motohiro (Apr 29, 2009, 4:00:16 AM)

>> Mapped page decrease rapidly: did not happen (I guess these pages stay in
>>                                               the active list on my system)
>> page fault large latency:     happened (latencytop displays >200ms)
>
> hm.  The last two observations appear to be inconsistent.

It means existing processes don't slow down, but new process creation is very slow.


> Elladan, have you checked to see whether the Mapped: number in
> /proc/meminfo is decreasing?
>
>>
>> Then, I don't doubt vm replacement logic now.
>> but I need more investigate.
>> I plan to try following thing today and tommorow.
>>
>> - XFS
>> - LVM
>> - another io scheduler (thanks Ted, good view point)
>> - Rik's new patch
>
> It's not clear that we know what's happening yet, is it?  It's such a
> gross problem that you'd think that even our testing would have found
> it by now :(

Yes, unclear. But various testing can drill down to the reason, I think.

KOSAKI Motohiro (Apr 29, 2009, 4:00:19 AM)

one mistake

> mouse move lag:               did not happen
> window move lag:              did not happen
> Mapped page decrease rapidly: did not happen (I guess these pages stay in
>                                               the active list on my system)
> page fault large latency:     happened (latencytop displays >200ms)

^^^^^^^^^

>1200ms

sorry.

Rik van Riel (Apr 29, 2009, 9:40:13 AM)


Only temporarily, until the number of active file pages
is larger than the number of inactive ones.

Think of it as reducing the frequency of shrinking anonymous
pages while the system is near the threshold.

--
All rights reversed.

Rik van Riel (Apr 29, 2009, 11:50:11 AM)

When the file LRU lists are dominated by streaming IO pages,
evict those pages first, before considering evicting other
pages.

This should be safe from deadlocks or performance problems
because only three things can happen to an inactive file page:
1) referenced twice and promoted to the active list
2) evicted by the pageout code
3) under IO, after which it will get evicted or promoted

The pages freed in this way can either be reused for streaming
IO, or allocated for something else. If the pages are used for
streaming IO, this pageout pattern continues. Otherwise, we will
fall back to the normal pageout pattern.

Signed-off-by: Rik van Riel <ri...@redhat.com>
---

On Wed, 29 Apr 2009 08:42:29 +0200
Peter Zijlstra <pet...@infradead.org> wrote:

> Isn't there a hole where LRU_*_FILE << LRU_*_ANON and we now stop
> shrinking INACTIVE_ANON even though it makes sense to.

Peter, after looking at this again, I believe that the get_scan_ratio
logic should take care of protecting the anonymous pages, so we can
get away with this following, less intrusive patch.

Elladan, does this smaller patch still work as expected?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eac9577..4471dcb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1489,6 +1489,18 @@ static void shrink_zone(int priority, struct zone *zone,
                nr[l] = scan;
        }
 
+       /*
+        * When the system is doing streaming IO, memory pressure here
+        * ensures that active file pages get deactivated, until more
+        * than half of the file pages are on the inactive list.
+        *
+        * Once we get to that situation, protect the system's working
+        * set from being evicted by disabling active file page aging.
+        * The logic in get_scan_ratio protects anonymous pages.
+        */
+       if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE])
+               nr[LRU_ACTIVE_FILE] = 0;
+
        while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
                                        nr[LRU_INACTIVE_FILE]) {
                for_each_evictable_lru(l) {

--

KOSAKI Motohiro (Apr 29, 2009, 12:10:09 PM)

Hi

Looks better than the previous version, but I have one question.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index eac9577..4471dcb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1489,6 +1489,18 @@ static void shrink_zone(int priority, struct zone *zone,
>                 nr[l] = scan;
>         }
>
> +       /*
> +        * When the system is doing streaming IO, memory pressure here
> +        * ensures that active file pages get deactivated, until more
> +        * than half of the file pages are on the inactive list.
> +        *
> +        * Once we get to that situation, protect the system's working
> +        * set from being evicted by disabling active file page aging.
> +        * The logic in get_scan_ratio protects anonymous pages.
> +        */
> +       if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE])
> +               nr[LRU_ACTIVE_FILE] = 0;
> +
>        while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>                                        nr[LRU_INACTIVE_FILE]) {
>                for_each_evictable_lru(l) {

We handle the active_anon vs inactive_anon ratio in shrink_list().
Why do you insert this logic into shrink_zone()?

Peter Zijlstra (Apr 29, 2009, 12:20:09 PM)

On Wed, 2009-04-29 at 11:47 -0400, Rik van Riel wrote:
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
>
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
>
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
>
> Signed-off-by: Rik van Riel <ri...@redhat.com>
> ---
> On Wed, 29 Apr 2009 08:42:29 +0200
> Peter Zijlstra <pet...@infradead.org> wrote:
>
> > Isn't there a hole where LRU_*_FILE << LRU_*_ANON and we now stop
> > shrinking INACTIVE_ANON even though it makes sense to.
>
> Peter, after looking at this again, I believe that the get_scan_ratio
> logic should take care of protecting the anonymous pages, so we can
> get away with this following, less intrusive patch.
>
> Elladan, does this smaller patch still work as expected?

Provided of course that it actually fixes Elladan's issue, this looks
good to me.

Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>

Rik van Riel (Apr 29, 2009, 12:20:12 PM)

Good question. I guess that at lower priority levels, we get to scan
a lot more pages and we could go from having too many inactive
file pages to not having enough in one invocation of shrink_zone().

That makes shrink_list() the better place to implement this, even if
it means doing this comparison more often.

I'll send a new patch this afternoon.

Christoph Hellwig

Apr 29, 2009, 1:10:12 PM
On Tue, Apr 28, 2009 at 08:36:51PM -0700, Elladan wrote:
> Rik,
>
> This patch appears to significantly improve application latency while a large
> file copy runs. I'm not seeing behavior that implies continuous bad page
> replacement.
>
> I'm still seeing some general lag, which I attribute to general filesystem
> slowness. For example, latencytop sees many events like these:
>
> down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags 1475.8 msec 5.9 %

This actually is contention on the buffer lock, and it most likely
happens because something is trying to access a buffer that is
currently being read in.

>
> xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags 1740.9 msec 2.6 %

That's an actual metadata read.

> Writing a page to disk 1042.9 msec 43.7 %
>
> It also occasionally sees long page faults:
>
> Page fault 2068.3 msec 21.3 %
>
> I guess XFS (and the elevator) is just doing a poor job managing latency
> (particularly poor since all the IO on /usr/bin is on the reader disk).

The filesystem doesn't really decide which priorities to use, except
for some use of WRITE_SYNC, which is used rather minimally in XFS in
2.6.28.

> Creating block layer request 451.4 msec 14.4 %

I guess that's a wait in get_request because we're above nr_requests.

Rik van Riel

Apr 29, 2009, 1:20:11 PM
When the file LRU lists are dominated by streaming IO pages,
evict those pages first, before considering evicting other
pages.

This should be safe from deadlocks or performance problems
because only three things can happen to an inactive file page:
1) referenced twice and promoted to the active list
2) evicted by the pageout code
3) under IO, after which it will get evicted or promoted

The pages freed in this way can either be reused for streaming
IO, or allocated for something else. If the pages are used for
streaming IO, this pageout pattern continues. Otherwise, we will
fall back to the normal pageout pattern.

Signed-off-by: Rik van Riel <ri...@redhat.com>

---


On Thu, 30 Apr 2009 01:07:51 +0900
KOSAKI Motohiro <kosaki....@jp.fujitsu.com> wrote:

> We handle the active_anon vs inactive_anon ratio in shrink_list().
> Why did you insert this logic into shrink_zone() instead?

Kosaki, this implementation mirrors the anon side of things precisely.
Does this look good?

Elladan, this patch should work just like the second version. Please
let me know how it works for you.

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a9e3b76..dbfe7ba 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -94,6 +94,7 @@ extern void mem_cgroup_note_reclaim_priority(struct mem_cgroup *mem,
 extern void mem_cgroup_record_reclaim_priority(struct mem_cgroup *mem,
 						int priority);
 int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg);
+int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg);
 unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
 					struct zone *zone,
 					enum lru_list lru);
@@ -239,6 +240,12 @@ mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg)
 	return 1;
 }

+static inline int
+mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
+{
+	return 1;
+}
+
 static inline unsigned long
 mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg, struct zone *zone,
 			 enum lru_list lru)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e44fb0f..026cb5a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -578,6 +578,17 @@ int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg)
 	return 0;
 }

+int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
+{
+	unsigned long active;
+	unsigned long inactive;
+
+	inactive = mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
+	active = mem_cgroup_get_local_zonestat(memcg, LRU_ACTIVE_FILE);
+
+	return (active > inactive);
+}
+
 unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
 				       struct zone *zone,
 				       enum lru_list lru)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eac9577..a73f675 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1348,12 +1348,48 @@ static int inactive_anon_is_low(struct zone *zone, struct scan_control *sc)
 	return low;
 }

+static int inactive_file_is_low_global(struct zone *zone)
+{
+	unsigned long active, inactive;
+
+	active = zone_page_state(zone, NR_ACTIVE_FILE);
+	inactive = zone_page_state(zone, NR_INACTIVE_FILE);
+
+	return (active > inactive);
+}
+
+/**
+ * inactive_file_is_low - check if file pages need to be deactivated
+ * @zone: zone to check
+ * @sc: scan control of this context
+ *
+ * When the system is doing streaming IO, memory pressure here
+ * ensures that active file pages get deactivated, until more
+ * than half of the file pages are on the inactive list.
+ *
+ * Once we get to that situation, protect the system's working
+ * set from being evicted by disabling active file page aging.
+ *
+ * This uses a different ratio than the anonymous pages, because
+ * the page cache uses a use-once replacement algorithm.
+ */
+static int inactive_file_is_low(struct zone *zone, struct scan_control *sc)
+{
+	int low;
+
+	if (scanning_global_lru(sc))
+		low = inactive_file_is_low_global(zone);
+	else
+		low = mem_cgroup_inactive_file_is_low(sc->mem_cgroup);
+	return low;
+}
+
 static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 	struct zone *zone, struct scan_control *sc, int priority)
 {
 	int file = is_file_lru(lru);
 
-	if (lru == LRU_ACTIVE_FILE) {
+	if (lru == LRU_ACTIVE_FILE && inactive_file_is_low(zone, sc)) {
 		shrink_active_list(nr_to_scan, zone, sc, priority, file);
 		return 0;

KOSAKI Motohiro

Apr 29, 2009, 8:50:05 PM
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
>
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
>
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
>
> Signed-off-by: Rik van Riel <ri...@redhat.com>
>
> ---
> On Thu, 30 Apr 2009 01:07:51 +0900
> KOSAKI Motohiro <kosaki....@jp.fujitsu.com> wrote:
>
> > We handle the active_anon vs inactive_anon ratio in shrink_list().
> > Why did you insert this logic into shrink_zone() instead?
>
> Kosaki, this implementation mirrors the anon side of things precisely.
> Does this look good?
>
> Elladan, this patch should work just like the second version. Please
> let me know how it works for you.

Looks good to me, thanks.
But I don't hit the issue Rik described; I hope Elladan will report his test results.

Elladan

Apr 30, 2009, 12:20:07 AM

Yes, Mapped decreases while a large file copy is ongoing. It increases again
if I use the GUI.

> > Then, I don't doubt vm replacement logic now.
> > but I need more investigate.
> > I plan to try following thing today and tommorow.
> >
> > - XFS
> > - LVM
> > - another io scheduler (thanks Ted, good view point)
> > - Rik's new patch
>
> It's not clear that we know what's happening yet, is it? It's such a
> gross problem that you'd think that even our testing would have found
> it by now :(
>
> Elladan, do you know if earlier kernels (2.6.26 or thereabouts) had
> this severe a problem?

No, I don't know about older kernels.

Also, just to add a bit: I'm having some difficulty reproducing the extremely
severe latency I was seeing right off. It's not difficult for me to reproduce
latencies that are painful, but not on the order of 10 second response. Maybe
3 or 4 seconds at most. I didn't have a stopwatch handy originally though, so
it's somewhat subjective, but I wonder if there's some element of the load that
I'm missing.

I had a theory about why this might be: my original repro was copying data
which I believe had been written once, but never read. Plus, I was using
relatime. However, on second thought this doesn't work -- there's only 8000
files, and a re-test with atime turned on isn't much different than with
relatime.

The other possibility is that there was some other background IO load spike,
which I didn't notice at the time. I don't know what that would be though,
unless it was one of gnome's indexing jobs (I didn't see one, though).

-Elladan

Andrew Morton

Apr 30, 2009, 12:50:08 AM
On Wed, 29 Apr 2009 21:14:39 -0700 Elladan <ell...@eskimo.com> wrote:

> > Elladan, have you checked to see whether the Mapped: number in
> > /proc/meminfo is decreasing?
>
> Yes, Mapped decreases while a large file copy is ongoing. It increases again
> if I use the GUI.

OK. If that's still happening to an appreciable extent after you've
increased /proc/sys/vm/swappiness then I'd wager that we have a
bug/regression in that area.

Local variable `scan' in shrink_zone() is vulnerable to multiplicative
overflows on large zones, but I doubt if you have enough memory to
trigger that bug.


From: Andrew Morton <ak...@linux-foundation.org>

Local variable `scan' can overflow on zones which are larger than

(2G * 4k) / 100 = 80GB.

Making it 64-bit on 64-bit will fix that up.

Cc: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
Cc: Wu Fengguang <fenggu...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Rik van Riel <ri...@redhat.com>
Cc: Lee Schermerhorn <lee.sche...@hp.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
---

mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN mm/vmscan.c~vmscan-avoid-multiplication-overflow-in-shrink_zone mm/vmscan.c
--- a/mm/vmscan.c~vmscan-avoid-multiplication-overflow-in-shrink_zone
+++ a/mm/vmscan.c
@@ -1479,7 +1479,7 @@ static void shrink_zone(int priority, st
 
 	for_each_evictable_lru(l) {
 		int file = is_file_lru(l);
-		int scan;
+		unsigned long scan;
 
 		scan = zone_nr_pages(zone, sc, l);
 		if (priority) {
_

Elladan

Apr 30, 2009, 1:00:14 AM
On Wed, Apr 29, 2009 at 09:43:32PM -0700, Andrew Morton wrote:
> On Wed, 29 Apr 2009 21:14:39 -0700 Elladan <ell...@eskimo.com> wrote:
>
> > > Elladan, have you checked to see whether the Mapped: number in
> > > /proc/meminfo is decreasing?
> >
> > Yes, Mapped decreases while a large file copy is ongoing. It increases again
> > if I use the GUI.
>
> OK. If that's still happening to an appreciable extent after you've
> increased /proc/sys/vm/swappiness then I'd wager that we have a
> bug/regression in that area.
>
> Local variable `scan' in shrink_zone() is vulnerable to multiplicative
> overflows on large zones, but I doubt if you have enough memory to
> trigger that bug.

No, I only have 4GB.

This appears to happen with swappiness set to 0 or 60.

-Elladan

KOSAKI Motohiro

Apr 30, 2009, 1:00:10 AM
> On Wed, 29 Apr 2009 21:14:39 -0700 Elladan <ell...@eskimo.com> wrote:
>
> > > Elladan, have you checked to see whether the Mapped: number in
> > > /proc/meminfo is decreasing?
> >
> > Yes, Mapped decreases while a large file copy is ongoing. It increases again
> > if I use the GUI.
>
> OK. If that's still happening to an appreciable extent after you've
> increased /proc/sys/vm/swappiness then I'd wager that we have a
> bug/regression in that area.
>
> Local variable `scan' in shrink_zone() is vulnerable to multiplicative
> overflows on large zones, but I doubt if you have enough memory to
> trigger that bug.
>
>
> From: Andrew Morton <ak...@linux-foundation.org>
>
> Local variable `scan' can overflow on zones which are larger than
>
> (2G * 4k) / 100 = 80GB.
>
> Making it 64-bit on 64-bit will fix that up.

Agghh, thanks for the bugfix.

Note: his meminfo indicates his machine has 3.5GB of RAM, so this
patch doesn't fix his problem.

Elladan

Apr 30, 2009, 3:30:14 AM
On Wed, Apr 29, 2009 at 11:47:08AM -0400, Rik van Riel wrote:
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
>
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
>
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
>
> Signed-off-by: Rik van Riel <ri...@redhat.com>
> ---
> On Wed, 29 Apr 2009 08:42:29 +0200
> Peter Zijlstra <pet...@infradead.org> wrote:
>
> > Isn't there a hole where LRU_*_FILE << LRU_*_ANON and we now stop
> > shrinking INACTIVE_ANON even though it makes sense to?
>
> Peter, after looking at this again, I believe that the get_scan_ratio
> logic should take care of protecting the anonymous pages, so we can
> get away with this following, less intrusive patch.
>
> Elladan, does this smaller patch still work as expected?

Rik, since the third patch doesn't work on 2.6.28 (without disabling a lot of
code), I went ahead and tested this patch.

The system does seem relatively responsive with this patch for the most part,
with occasional lag. I don't see much evidence at least over the course of a
few minutes that it pages out applications significantly. It seems about
equivalent to the first patch.

Given Andrew Morton's request that I track the Mapped: field in /proc/meminfo,
I went ahead and did that with this patch built into a kernel. Compared to the
standard Ubuntu kernel, this patch keeps significantly more Mapped memory
around, and it shrinks at a slower rate after the test runs for a while.
Eventually, it seems to reach a steady state.

For example, with your patch, Mapped will often go for 30 seconds without
changing significantly. Without your patch, it continuously lost about
500-1000K every 5 seconds, and then jumped up again significantly when I
touched Firefox or other applications. I do see some of that behavior with
your patch too, but it's much less significant.

When I first initiated the background load, Mapped did rapidly decrease from
about 85000K to 47000K. It seems to have reached a fairly steady state since
then. I would guess this implies that the VM paged out parts of my executable
set that aren't touched very often, but isn't applying further pressure to my
active pages? Also for example, after letting the test run for a while, I
scrolled around some tabs in firefox I hadn't used since the test began, and
experienced significant lag.

This seems ok (not disastrous, anyway). I suspect desktop users would
generally prefer the VM were extremely aggressive about keeping their
executables paged in though, much more so than this patch provides (and note how
popular swappiness=0 seems to be). Paging applications back in seems to
introduce a large amount of UI latency, even if the VM keeps it to a sane level
as with this patch. Also, I don't see many desktop workloads where paging out
applications to grow the data cache is ever helpful -- practically all desktop
workloads where you get a lot of IO involve streaming, not data that might
possibly fit in ram. If I'm just copying a bunch of files around, I'd prefer
that even "worthless" pages such as e.g. parts of Firefox that are only used
during load time or during rare config requests (and would thus not appear to
be part of my working set short-term) stay in cache, so I can get the maximum
interactive performance from my application.

Thank you,
Elladan

Johannes Weiner

Apr 30, 2009, 4:20:10 AM
On Wed, Apr 29, 2009 at 01:14:36PM -0400, Rik van Riel wrote:
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
>
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
>
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
>
> Signed-off-by: Rik van Riel <ri...@redhat.com>

Although Elladan didn't test this exact patch, he reported on v2 that
the general idea of scanning active files only when they exceed the
inactive set works.

Acked-by: Johannes Weiner <han...@cmpxchg.org>

KOSAKI Motohiro

Apr 30, 2009, 8:10:16 AM
> test environment: no lvm, copy ext3 to ext3 (not mv), no change swappiness,
>                   CFQ is used, userland is Fedora10, mmotm(2.6.30-rc1 + mm patch),
>                   CPU opteronx4, mem 4G
>
> mouse move lag:               not happened
> window move lag:              not happened
> Mapped page decrease rapidly: not happened (I guess these pages stay in
>                                            the active list on my system)
> page fault large latency:     happened (latencytop displays >1200ms)

>
>
> Then, I don't doubt the VM replacement logic now,
> but I need to investigate more.
> I plan to try the following things today and tomorrow.
>
> - XFS
> - LVM
> - another io scheduler (thanks Ted, good view point)
> - Rik's new patch

Hm, the AS io-scheduler doesn't produce such large latency in my environment.
Elladan, can you try the AS scheduler? (add the boot option "elevator=as")

Rik van Riel

Apr 30, 2009, 9:10:10 AM
Elladan wrote:

>> Elladan, does this smaller patch still work as expected?

> The system does seem relatively responsive with this patch for the most part,


> with occasional lag. I don't see much evidence at least over the course of a
> few minutes that it pages out applications significantly. It seems about
> equivalent to the first patch.

OK, good to hear that.

> This seems ok (not disastrous, anyway). I suspect desktop users would
> generally prefer the VM were extremely aggressive about keeping their
> executables paged in though,

I agree that desktop users would probably prefer something even
more aggressive. However, we do need to balance this against
other workloads, where inactive file pages need to be given a
fair chance to be referenced twice and promoted to the active
file list.

Because of that, I have chosen a patch with a minimal risk of
regressions on any workload.

--
All rights reversed.

Elladan

Apr 30, 2009, 9:50:11 AM
On Thu, Apr 30, 2009 at 08:59:59PM +0900, KOSAKI Motohiro wrote:
> > test environment: no lvm, copy ext3 to ext3 (not mv), no change swappiness,
> >                   CFQ is used, userland is Fedora10, mmotm(2.6.30-rc1 + mm patch),
> >                   CPU opteronx4, mem 4G
> >
> > mouse move lag:               not happened
> > window move lag:              not happened
> > Mapped page decrease rapidly: not happened (I guess these pages stay in
> >                                            the active list on my system)
> > page fault large latency:     happened (latencytop displays >1200ms)
> >
> >
> > Then, I don't doubt the VM replacement logic now,
> > but I need to investigate more.
> > I plan to try the following things today and tomorrow.
> >
> > - XFS
> > - LVM
> > - another io scheduler (thanks Ted, good view point)
> > - Rik's new patch
>
> Hm, the AS io-scheduler doesn't produce such large latency in my environment.
> Elladan, can you try the AS scheduler? (add the boot option "elevator=as")

I switched at runtime with /sys/block/sd[ab]/queue/scheduler, using Rik's
second patch for page replacement. It was hard to tell if this made much
difference in latency, as reported by latencytop. Both schedulers sometimes
show outliers up to 1400msec or so, and the average latency looks like it may
be similar.

Thanks,
Elladan

Elladan

Apr 30, 2009, 10:10:18 AM
On Thu, Apr 30, 2009 at 09:08:06AM -0400, Rik van Riel wrote:
> Elladan wrote:
>
>>> Elladan, does this smaller patch still work as expected?
>
>> The system does seem relatively responsive with this patch for the most part,
>> with occasional lag. I don't see much evidence at least over the course of a
>> few minutes that it pages out applications significantly. It seems about
>> equivalent to the first patch.
>
> OK, good to hear that.
>
>> This seems ok (not disastrous, anyway). I suspect desktop users would
>> generally prefer the VM were extremely aggressive about keeping their
>> executables paged in though,
>
> I agree that desktop users would probably prefer something even
> more aggressive. However, we do need to balance this against
> other workloads, where inactive file pages need to be given a
> fair chance to be referenced twice and promoted to the active
> file list.
>
> Because of that, I have chosen a patch with a minimal risk of
> regressions on any workload.

I agree, this seems to work well as a bugfix, for a general purpose system.

I'm just not sure that a general-purpose page replacement algorithm actually
serves most desktop users well. I remember using some kludges back in the
2.2/2.4 days to try to force eviction of application pages when my system was
low on ram on occasion, but for desktop use that naive VM actually seemed
to generally have fewer latency problems.

Plus, since hard disks haven't been improving in speed (except for the surge in
SSDs), but RAM and CPU have been increasing dramatically, any paging or
swapping activity just becomes more and more noticeable.

Thanks,
Elladan

Andrew Morton

Apr 30, 2009, 9:00:13 PM
On Thu, 30 Apr 2009 00:20:58 -0700
Elladan <ell...@eskimo.com> wrote:

> > Elladan, does this smaller patch still work as expected?
>
> Rik, since the third patch doesn't work on 2.6.28 (without disabling a lot of
> code), I went ahead and tested this patch.
>
> The system does seem relatively responsive with this patch for the most part,
> with occasional lag. I don't see much evidence at least over the course of a
> few minutes that it pages out applications significantly. It seems about
> equivalent to the first patch.
>
> Given Andrew Morton's request that I track the Mapped: field in /proc/meminfo,
> I went ahead and did that with this patch built into a kernel. Compared to the
> standard Ubuntu kernel, this patch keeps significantly more Mapped memory
> around, and it shrinks at a slower rate after the test runs for a while.
> Eventually, it seems to reach a steady state.
>
> For example, with your patch, Mapped will often go for 30 seconds without
> changing significantly. Without your patch, it continuously lost about
> 500-1000K every 5 seconds, and then jumped up again significantly when I
> touched Firefox or other applications. I do see some of that behavior with
> your patch too, but it's much less significant.

Were you able to tell whether altering /proc/sys/vm/swappiness appropriately
regulated the rate at which the mapped page count decreased?

Thanks.

Rik van Riel

Apr 30, 2009, 9:10:07 PM
On Thu, 30 Apr 2009 17:45:36 -0700
Andrew Morton <ak...@linux-foundation.org> wrote:

> Were you able to tell whether altering /proc/sys/vm/swappiness
> appropriately regulated the rate at which the mapped page count
> decreased?

That should not make a difference at all for mapped file
pages, after the change was merged that makes the VM ignore
the referenced bit of mapped active file pages.

Ever since the split LRU code was merged, all that the
swappiness controls is the aggressiveness of file vs
anonymous LRU scanning.

Currently the kernel has no effective code to protect the
page cache working set from streaming IO. Elladan's bug
report shows that we do need some kind of protection...

--
All rights reversed.

Andrew Morton

Apr 30, 2009, 9:20:07 PM
On Thu, 30 Apr 2009 20:59:36 -0400
Rik van Riel <ri...@redhat.com> wrote:

> On Thu, 30 Apr 2009 17:45:36 -0700
> Andrew Morton <ak...@linux-foundation.org> wrote:
>
> > Were you able to tell whether altering /proc/sys/vm/swappiness
> > appropriately regulated the rate at which the mapped page count
> > decreased?
>
> That should not make a difference at all for mapped file
> pages, after the change was merged that makes the VM ignore
> the referenced bit of mapped active file pages.
>
> Ever since the split LRU code was merged, all that the
> swappiness controls is the aggressiveness of file vs
> anonymous LRU scanning.

Which would cause exactly the problem Elladan saw?

> Currently the kernel has no effective code to protect the
> page cache working set from streaming IO. Elladan's bug
> report shows that we do need some kind of protection...

Seems to me that reclaim should treat swapcache-backed mapped pages in
a similar fashion to file-backed mapped pages?

Rik van Riel

Apr 30, 2009, 10:00:15 PM
On Thu, 30 Apr 2009 18:13:40 -0700
Andrew Morton <ak...@linux-foundation.org> wrote:

> On Thu, 30 Apr 2009 20:59:36 -0400
> Rik van Riel <ri...@redhat.com> wrote:
>
> > On Thu, 30 Apr 2009 17:45:36 -0700
> > Andrew Morton <ak...@linux-foundation.org> wrote:
> >
> > > Were you able to tell whether altering /proc/sys/vm/swappiness
> > > appropriately regulated the rate at which the mapped page count
> > > decreased?
> >
> > That should not make a difference at all for mapped file
> > pages, after the change was merged that makes the VM ignore
> > the referenced bit of mapped active file pages.
> >
> > Ever since the split LRU code was merged, all that the
> > swappiness controls is the aggressiveness of file vs
> > anonymous LRU scanning.
>
> Which would cause exactly the problem Elladan saw?

Yes. It was not noticeable in the initial split LRU code,
but after we decided to ignore the referenced bit on active
file pages and deactivate pages regardless, it has gotten
exacerbated.

That change was very good for scalability, so we should not
undo it. However, we do need to put something in place to
protect the working set from streaming IO.

> > Currently the kernel has no effective code to protect the
> > page cache working set from streaming IO. Elladan's bug
> > report shows that we do need some kind of protection...
>
> Seems to me that reclaim should treat swapcache-backed mapped pages in
> a similar fashion to file-backed mapped pages?

Swapcache-backed pages are not on the same set of LRUs as
file-backed mapped pages.

Furthermore, there is no streaming IO on the anon LRUs like
there is on the file LRUs. Only the file LRUs need (and want)
use-once replacement, which means that we only need special
protection of the working set for file-backed pages.

When we implement working set protection, we might as well
do it for frequently accessed unmapped pages too. There is
no reason to restrict this protection to mapped pages.

--
All rights reversed.

Andrew Morton

Apr 30, 2009, 11:00:26 PM
On Thu, 30 Apr 2009 21:50:34 -0400 Rik van Riel <ri...@redhat.com> wrote:

> > Which would cause exactly the problem Elladan saw?
>
> > Yes. It was not noticeable in the initial split LRU code,
> but after we decided to ignore the referenced bit on active
> file pages and deactivate pages regardless, it has gotten
> exacerbated.
>
> That change was very good for scalability, so we should not
> undo it. However, we do need to put something in place to
> protect the working set from streaming IO.
>
> > > Currently the kernel has no effective code to protect the
> > > page cache working set from streaming IO. Elladan's bug
> > > report shows that we do need some kind of protection...
> >
> > Seems to me that reclaim should treat swapcache-backed mapped pages in
> > a similar fashion to file-backed mapped pages?
>
> Swapcache-backed pages are not on the same set of LRUs as
> file-backed mapped pages.

yup.

> Furthermore, there is no streaming IO on the anon LRUs like
> there is on the file LRUs. Only the file LRUs need (and want)
> use-once replacement, which means that we only need special
> protection of the working set for file-backed pages.

OK.

> When we implement working set protection, we might as well
> do it for frequently accessed unmapped pages too. There is
> no reason to restrict this protection to mapped pages.

Well. Except for empirical observation, which tells us that biasing
reclaim to prefer to retain mapped memory produces a better result.

Elladan

Apr 30, 2009, 11:20:13 PM
On Thu, Apr 30, 2009 at 05:45:36PM -0700, Andrew Morton wrote:
> On Thu, 30 Apr 2009 00:20:58 -0700
> Elladan <ell...@eskimo.com> wrote:
>
> > > Elladan, does this smaller patch still work as expected?
> >
> > Rik, since the third patch doesn't work on 2.6.28 (without disabling a lot of
> > code), I went ahead and tested this patch.
> >
> > The system does seem relatively responsive with this patch for the most part,
> > with occasional lag. I don't see much evidence at least over the course of a
> > few minutes that it pages out applications significantly. It seems about
> > equivalent to the first patch.
> >
> > Given Andrew Morton's request that I track the Mapped: field in /proc/meminfo,
> > I went ahead and did that with this patch built into a kernel. Compared to the
> > standard Ubuntu kernel, this patch keeps significantly more Mapped memory
> > around, and it shrinks at a slower rate after the test runs for a while.
> > Eventually, it seems to reach a steady state.
> >
> > For example, with your patch, Mapped will often go for 30 seconds without
> > changing significantly. Without your patch, it continuously lost about
> > 500-1000K every 5 seconds, and then jumped up again significantly when I
> > touched Firefox or other applications. I do see some of that behavior with
> > your patch too, but it's much less significant.
>
> Were you able to tell whether altering /proc/sys/vm/swappiness appropriately
> regulated the rate at which the mapped page count decreased?

I don't believe so. I tested with swappiness=0 and =60, and in each case the
mapped pages continued to decrease. I don't know at what rate though. If
you'd like more precise data, I can rerun the test with appropriate logging. I
admit my "Hey, latency is terrible and mapped pages is decreasing" testing is
somewhat unscientific.

I get the impression that VM regressions happen fairly regularly. Does anyone
have good unit tests for this? It seems like a difficult problem, since it's
partly based on pattern and partly timing.

-J

Rik van Riel

May 1, 2009, 10:10:10 AM
Andrew Morton wrote:

>> When we implement working set protection, we might as well
>> do it for frequently accessed unmapped pages too. There is
>> no reason to restrict this protection to mapped pages.
>
> Well. Except for empirical observation, which tells us that biasing
> reclaim to prefer to retain mapped memory produces a better result.

That used to be the case because file-backed and
swap-backed pages shared the same set of LRUs,
while each followed a different page reclaim
heuristic!

Today:
1) file-backed and swap-backed pages are separated,
2) the majority of mapped pages are on the swap-backed LRUs
3) the accessed bit on active pages no longer means much,
for good scalability reasons, and
4) because of (3), we cannot really provide special treatment
to any individual page any more.

This means we need to provide our working set protection
on a per-list basis, by tweaking the scan rate or avoiding
scanning of the active file list altogether under certain
conditions.

As a side effect, this will help protect frequently accessed
file pages (good for ftp and nfs servers), indirect blocks,
inode buffers and other frequently used metadata.

--
All rights reversed.

Ray Lee
May 1, 2009, 2:10:10 PM

Just an honest question: Who does #3 help? All normal linux users, or
large systems for some definition of large? (Helping large systems is
good; historically it eventually helps everyone. But the point I'm
driving at is that the minority of systems which tend to use one
kernel for a while and stick with it -- ie, embedded or large iron --
can and are tuned for specific workloads. The majority of systems that
upgrade the kernel frequently, such as desktop systems needing support
for new hardware, tend to rely more upon the kernel defaults.)

Also, not all the above items are equal from a latency point of view.
The latency impact of an inode needing to be fetched from disk is
budgeted for already in most userspace design. Opening a file can be
slow, news at 11. Try not to open as many files, solution at 11:01.

The latency impact of jumping to a different part of your own
executable, however, is something most userspace programmers likely
never think of. This hurts even more in this modern age of web
browsers, where firefox has to act as a layout engine, video player,
parser and compiler, etc. Not every web page uses every feature, which
means clicking a random URL can suddenly stop the whole shebang while
a previously-unreferenced page is swapped back in. With executables,
past usage doesn't presage future need.

Said a different way, executables are not equivalent to a random
collection of mapped pages. A collection of inodes may or may not have
any causal links between them. A collection of pages for an executable
are linked via function calls, and the compiler and linker already
took a first pass at evicting unnecessary baggage.

Said way #3: We desktop users really want a way to say "Please don't
page my executables out when I'm running a system with 3gig of RAM." I
hate knobs, but I'm willing to beg for one in this case. 'cause
mlock()ing my entire working set into RAM seems pretty silly.

Does any of that make sense, or am I talking out of an inappropriate orifice?

Rik van Riel
May 1, 2009, 3:40:11 PM
Ray Lee wrote:

> Said way #3: We desktop users really want a way to say "Please don't
> page my executables out when I'm running a system with 3gig of RAM." I
> hate knobs, but I'm willing to beg for one in this case. 'cause
> mlock()ing my entire working set into RAM seems pretty silly.
>
> Does any of that make sense, or am I talking out of an inappropriate orifice?

The "don't page my executables out" part makes sense.

However, I believe that kind of behaviour should be the
default. Desktops and servers alike have a few different
kinds of data in the page cache:
1) pages that have been frequently accessed at some point
in the past and got promoted to the active list
2) streaming IO

I believe that we want to give (1) absolute protection from
(2), provided there are not too many pages on the active file
list. That way we will protect executables, cached indirect
and inode blocks, etc. from streaming IO.

Pages that are new to the page cache start on the inactive
list. Only if they get accessed twice while on that list,
they get promoted to the active list.

Streaming IO should normally be evicted from memory before
it can get accessed again. This means those pages do not
get promoted to the active list and the working set is
protected.

Does this make sense?

--
All rights reversed.

Andrew Morton
May 1, 2009, 3:50:06 PM
On Fri, 01 May 2009 10:05:53 -0400

Rik van Riel <ri...@redhat.com> wrote:

> Andrew Morton wrote:
>
> >> When we implement working set protection, we might as well
> >> do it for frequently accessed unmapped pages too. There is
> >> no reason to restrict this protection to mapped pages.
> >
> > Well. Except for empirical observation, which tells us that biasing
> > reclaim to prefer to retain mapped memory produces a better result.
>
> That used to be the case because file-backed and
> swap-backed pages shared the same set of LRUs,
> while each followed a different page reclaim
> heuristic!

No, I think it still _is_ the case. When reclaim is treating mapped
and non-mapped pages equally, the end result sucks. Applications get
all laggy and humans get irritated. It may be that the system was
optimised from an overall throughput POV, but the result was
*irritating*.

Which led us to prefer to retain mapped pages. This had nothing at all
to do with internal implementation details - it was a design objective
based upon empirical observation of system behaviour.

> Today:
> 1) file-backed and swap-backed pages are separated,
> 2) the majority of mapped pages are on the swap-backed LRUs
> 3) the accessed bit on active pages no longer means much,
> for good scalability reasons, and
> 4) because of (3), we cannot really provide special treatment
> to any individual page any more, however
>
> This means we need to provide our working set protection
> on a per-list basis, by tweaking the scan rate or avoiding
> scanning of the active file list altogether under certain
> conditions.
>
> As a side effect, this will help protect frequently accessed
> file pages (good for ftp and nfs servers), indirect blocks,
> inode buffers and other frequently used metadata.

Yeah, but that's all internal-implementation-of-the-day details. It
just doesn't matter how the sausages are made. What we have learned is
that the policy of retaining mapped pages over unmapped pages, *all
other things being equal* leads to a more pleasing system.

Ray Lee
May 1, 2009, 3:50:08 PM

Streaming IO should always be at the bottom of the list as it's nearly
always use-once. That's not the interesting case. (I'm glad you're
protecting everything from streaming IO, it's a good thing. And if it's
a media server and serving the same stream to many clients, if I
understood you correctly those streams will no longer be use-once, and
therefore be a normal citizen with the rest of the cache. That's great
too.)

The interesting case is an updatedb running in the background, paging
out firefox, or worse, parts of X. That sucks.

Rik van Riel
May 1, 2009, 4:10:13 PM
Andrew Morton wrote:
> On Fri, 01 May 2009 10:05:53 -0400
> Rik van Riel <ri...@redhat.com> wrote:

>> This means we need to provide our working set protection
>> on a per-list basis, by tweaking the scan rate or avoiding
>> scanning of the active file list altogether under certain
>> conditions.
>>
>> As a side effect, this will help protect frequently accessed
>> file pages (good for ftp and nfs servers), indirect blocks,
>> inode buffers and other frequently used metadata.
>
> Yeah, but that's all internal-implementation-of-the-day details. It
> just doesn't matter how the sausages are made. What we have learned is
> that the policy of retaining mapped pages over unmapped pages, *all
> other things being equal* leads to a more pleasing system.

Well, retaining mapped pages is one of the implementations
that lead to a more pleasing system.

I suspect that a fully scan resistant active file list will
show the same behaviour, as well as a few other desired
behaviours that come in very handy in various server loads.

Are you open to evaluating other methods that could lead, on
desktop systems, to a behaviour similar to the one achieved
by the preserve-mapped-pages mechanism?

--
All rights reversed.

Rik van Riel
May 1, 2009, 4:20:09 PM
Ray Lee wrote:

> Streaming IO should always be at the bottom of the list as it's nearly
> always use-once. That's not the interesting case.

Unfortunately, on current 2.6.28 through 2.6.30 that is broken.

Streaming IO will eventually eat away all of the pages on the
active file list, causing the binaries and libraries that programs
use to be kicked out of memory.

Not interesting?

> The interesting case is an updatedb running in the background, paging
> out firefox, or worse, parts of X. That sucks.

This is a combination of use-once IO and VFS metadata.

The used-once pages can be reclaimed fairly easily.

The growing metadata needs to be addressed by putting pressure
on it via the slab/slub/slob shrinker functions.

--
All rights reversed.

Elladan
May 1, 2009, 4:30:11 PM

I think this is a simplistic view of things.

Keep in mind that the goal of a VM is: "load each page before it's needed."
LRU, use-once heuristics, and the like are ways of trying to guess when a page
is needed and when it isn't, because you don't know the future.

For high throughput, treating all pages equally (or with some simple weighting)
is often appropriate, because it allows you to balance various sorts of working
sets dynamically.

But user interfaces are a realtime problem. When the user presses a button,
you have a deadline to respond before it's annoying, and another deadline
before the user will hit the power button. With this in mind, the user's
application UI has essentially infinite priority for memory -- it's either
paged into ram before the user presses a button, or you fail.

Very often, this is just a case of streaming IO vs. everything else, in which
case detecting streaming IO (because of the usage pattern) will help. That's a
pretty simple case. But imagine I start up a big compute job in the background
-- for example, I run a video encoder or something similar, and this program
touches the source data many times, such that it does not appear to be
"streaming" by a simple heuristic.

Particularly if I walk away from the computer, from any algorithm just based on
recent usage, this will appear to be the only thing worth doing at that time,
so the UI will be paged out. And of course, when I walk back to the computer
and press a button, the UI will not respond, and will have shocking latency
until I've touched every bit of it that I use again.

That's a bad outcome. User interactivity is a real-time problem, and your
deadline is less than 30 disk seeks.

Of course, if the bulk job completes dramatically faster with some extra
memory, then the alternative (pinning the entire UI ram) is also a bad outcome.
There's no perfect solution here, and I suspect a really functional system
ultimately needs all sorts of weird hints from the UI. Or alternatively, a
naive VM (which pins the UI), and enough RAM to keep the user and any bulk jobs
happy.

-Elladan

Andrew Morton
May 1, 2009, 5:00:17 PM
On Fri, 01 May 2009 16:05:55 -0400
Rik van Riel <ri...@redhat.com> wrote:

> Are you open to evaluating other methods that could lead, on
> desktop systems, to a behaviour similar to the one achieved
> by the preserve-mapped-pages mechanism?

Well.. it's more a matter of retaining what we've learnt (unless we
feel it's wrong, or technology change broke it) and carefully listening
to and responding to what's happening in out-there land.

The number of problem reports we're seeing from the LRU changes is
pretty low. Hopefully that's because the number of problems _is_
low.

Given the low level of problem reports, the relative immaturity of the
code and our difficulty with determining what effect our changes will
have upon everyone, I'd have thought that sit-tight-and-wait-and-see
would be the prudent approach for the next few months.

otoh if you have a change and it proves good in your testing then sure,
sooner rather than later.

There, that was nice and waffly.

I still haven't forgotten prev_priority tho!

Rik van Riel
May 1, 2009, 5:50:15 PM
Andrew Morton wrote:
> On Fri, 01 May 2009 16:05:55 -0400
> Rik van Riel <ri...@redhat.com> wrote:
>
>> Are you open to evaluating other methods that could lead, on
>> desktop systems, to a behaviour similar to the one achieved
>> by the preserve-mapped-pages mechanism?
>
> Well.. it's more a matter of retaining what we've learnt (unless we
> feel it's wrong, or technology change broke it) and carefully listening
> to and responding to what's happening in out-there land.

Treating mapped pages specially is a bad implementation,
because it does not scale. The reason is the same reason
we dropped "treat referenced active file pages special"
right before the split LRU code was merged by Linus.

Also, it does not help workloads that have a large number
of unmapped pages, where we want to protect the frequently
used ones from a giant stream of used-once pages. NFS and
FTP servers would be a typical example of this, but so
would a database server with postgres or mysql in a default
setup.

> The number of problem reports we're seeing from the LRU changes is
> pretty low. Hopefully that's because the number of problems _is_
> low.

I believe the number of problems is low. However, the
severity of this particular problem means that we'll
probably want to do something about it.

> Given the low level of problem reports, the relative immaturity of the
> code and our difficulty with determining what effect our changes will
> have upon everyone, I'd have thought that sit-tight-and-wait-and-see
> would be the prudent approach for the next few months.
>
> otoh if you have a change and it proves good in your testing then sure,
> sooner rather than later.

I believe the patch I submitted in this thread should fix
the problem. I have experimented with the patch before
and Elladan's results show that the situation is resolved
for him.

Furthermore, Peter and I believe the patch has a minimal
risk of side effects.

Of course, there may be better ideas yet. It would be
nice if people could try to shoot holes in the concept
of the patch - if anybody can even think of a way in
which it could break, we can try to come up with a way
of fixing it.

> I still haven't forgotten prev_priority tho!

The whole priority thing could be(come) a problem too,
with us scanning WAY too many pages at once in a gigantic
memory zone. Scanning a million pages at once will
probably lead to unacceptable latencies somewhere :)

--
All rights reversed.

Andrew Morton
May 1, 2009, 6:40:12 PM
On Wed, 29 Apr 2009 13:14:36 -0400

Rik van Riel <ri...@redhat.com> wrote:

> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
>
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
>
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
>

> ..
>
> +int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
> +{
> + unsigned long active;
> + unsigned long inactive;
> +
> + inactive = mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
> + active = mem_cgroup_get_local_zonestat(memcg, LRU_ACTIVE_FILE);
> +
> + return (active > inactive);
> +}

This function could trivially be made significantly more efficient by
changing it to do a single pass over all the zones of all the nodes,
rather than two passes.

> static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
> struct zone *zone, struct scan_control *sc, int priority)
> {
> int file = is_file_lru(lru);
>
> - if (lru == LRU_ACTIVE_FILE) {
> + if (lru == LRU_ACTIVE_FILE && inactive_file_is_low(zone, sc)) {
> shrink_active_list(nr_to_scan, zone, sc, priority, file);
> return 0;
> }

And it does get called rather often.

Rik van Riel
May 1, 2009, 7:10:10 PM

How would I do that in a clean way?

The function mem_cgroup_inactive_anon_is_low and
the global versions all do the same. It would be
nice to make all four of them go fast :)

If there is no standardized infrastructure for
getting multiple statistics yet, I can probably
whip something up.

>> static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
>> struct zone *zone, struct scan_control *sc, int priority)
>> {
>> int file = is_file_lru(lru);
>>
>> - if (lru == LRU_ACTIVE_FILE) {
>> + if (lru == LRU_ACTIVE_FILE && inactive_file_is_low(zone, sc)) {
>> shrink_active_list(nr_to_scan, zone, sc, priority, file);
>> return 0;
>> }
>
> And it does get called rather often.

Same as inactive_anon_is_low.

Optimizing them might make sense if it turns out to
use a significant amount of CPU.

--
All rights reversed.

Andrew Morton
May 1, 2009, 7:30:14 PM
On Fri, 01 May 2009 19:05:21 -0400

copy-n-paste :(

static unsigned long foo(struct mem_cgroup *mem,
enum lru_list idx1, enum lru_list idx2)
{
int nid, zid;
struct mem_cgroup_per_zone *mz;
u64 total = 0;

for_each_online_node(nid)
for (zid = 0; zid < MAX_NR_ZONES; zid++) {
mz = mem_cgroup_zoneinfo(mem, nid, zid);
total += MEM_CGROUP_ZSTAT(mz, idx1);
total += MEM_CGROUP_ZSTAT(mz, idx2);
}
return total;
}

dunno if that's justifiable.

> The function mem_cgroup_inactive_anon_is_low and
> the global versions all do the same. It would be
> nice to make all four of them go fast :)
>
> If there is no standardized infrastructure for
> getting multiple statistics yet, I can probably
> whip something up.

It depends how often it would be called for, I guess.

One approach would be pass in a variable-length array of `enum
lru_list's, get returned a same-lengthed array of totals.

Or perhaps all we need to return is the sum of those totals.

I'd let the memcg guys worry about this if I were you ;)

> Optimizing them might make sense if it turns out to
> use a significant amount of CPU.

Yeah. By then it's often too late though. The sort of people for whom
(num_online_nodes*MAX_NR_ZONES) is nuttily large tend not to run
kernel.org kernels.

Wu Fengguang
May 2, 2009, 9:20:06 PM
On Wed, Apr 29, 2009 at 01:14:36PM -0400, Rik van Riel wrote:
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
>
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
>
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
>
> Signed-off-by: Rik van Riel <ri...@redhat.com>
>
[snip]
> +static int inactive_file_is_low_global(struct zone *zone)
> +{
> + unsigned long active, inactive;
> +
> + active = zone_page_state(zone, NR_ACTIVE_FILE);
> + inactive = zone_page_state(zone, NR_INACTIVE_FILE);

> +
> + return (active > inactive);
> +}
[snip]

> static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
> struct zone *zone, struct scan_control *sc, int priority)
> {
> int file = is_file_lru(lru);
>
> - if (lru == LRU_ACTIVE_FILE) {
> + if (lru == LRU_ACTIVE_FILE && inactive_file_is_low(zone, sc)) {
> shrink_active_list(nr_to_scan, zone, sc, priority, file);
> return 0;
> }

Acked-by: Wu Fengguang <fenggu...@intel.com>

I like this idea - it's simple and sound, and is expected to work well
for the majority workloads. Sure the arbitrary 1:1 active:inactive ratio
may be suboptimal for many workloads, but it is mostly safe.

In the worst scenario, it could waste half the memory that could
otherwise be used for readahead buffer and to prevent thrashing, in a
server serving large datasets that are hardly reused, but still slowly
builds up its active list during the long uptime (think about a slow
performance downgrade that can be fixed by a crude dropcache action).

That said, the actual performance degradation could be much smaller -
say 15% - all memories are not equal.

Thanks,
Fengguang

Wu Fengguang
May 2, 2009, 9:30:13 PM

Good point. We could add a flag that is tested frequently in shrink_list()
and updated less frequently in shrink_zone() (or whatever).

Rik van Riel
May 2, 2009, 9:40:11 PM
On Sun, 3 May 2009 09:15:40 +0800
Wu Fengguang <fenggu...@intel.com> wrote:

> In the worst scenario, it could waste half the memory that could
> otherwise be used for readahead buffer and to prevent thrashing, in a
> server serving large datasets that are hardly reused, but still slowly
> builds up its active list during the long uptime (think about a slow
> performance downgrade that can be fixed by a crude dropcache action).

In the best case, the active list ends up containing all the
indirect blocks for the files that are occasionally reused,
and the system ends up being able to serve its clients with
less disk IO.

For systems like ftp.kernel.org, the files that are most
popular will end up on the active list, without being kicked
out by the files that are less popular.

--
All rights reversed.

Wu Fengguang
May 2, 2009, 9:50:09 PM
On Sun, May 03, 2009 at 09:33:56AM +0800, Rik van Riel wrote:
> On Sun, 3 May 2009 09:15:40 +0800
> Wu Fengguang <fenggu...@intel.com> wrote:
>
> > In the worst scenario, it could waste half the memory that could
> > otherwise be used for readahead buffer and to prevent thrashing, in a
> > server serving large datasets that are hardly reused, but still slowly
> > builds up its active list during the long uptime (think about a slow
> > performance downgrade that can be fixed by a crude dropcache action).
>
> In the best case, the active list ends up containing all the
> indirect blocks for the files that are occasionally reused,
> and the system ends up being able to serve its clients with
> less disk IO.
>
> For systems like ftp.kernel.org, the files that are most
> popular will end up on the active list, without being kicked
> out by the files that are less popular.

Sure, such good cases tend to be prevalent - so obvious that I didn't
bother to mention them ;-)

Wu Fengguang
May 2, 2009, 11:20:06 PM
On Fri, May 01, 2009 at 12:35:41PM -0700, Andrew Morton wrote:
> On Fri, 01 May 2009 10:05:53 -0400
> Rik van Riel <ri...@redhat.com> wrote:
>
> > Andrew Morton wrote:
> >
> > >> When we implement working set protection, we might as well
> > >> do it for frequently accessed unmapped pages too. There is
> > >> no reason to restrict this protection to mapped pages.
> > >
> > > Well. Except for empirical observation, which tells us that biasing
> > > reclaim to prefer to retain mapped memory produces a better result.
> >
> > That used to be the case because file-backed and
> > swap-backed pages shared the same set of LRUs,
> > while each followed a different page reclaim
> > heuristic!
>
> No, I think it still _is_ the case. When reclaim is treating mapped
> and non-mapped pages equally, the end result sucks. Applications get
> all laggy and humans get irritated. It may be that the system was
> optimised from an overall throughput POV, but the result was
> *irritating*.
>
> Which led us to prefer to retain mapped pages. This had nothing at all
> to do with internal implementation details - it was a design objective
> based upon empirical observation of system behaviour.

Heartily Agreed. We shall try hard to protect the running applications.

Commit 7e9cd484204f ("vmscan: fix pagecache reclaim referenced bit check")
tries to address the scalability problem when every page gets mapped and
referenced, so that the logic (which lowered the priority of mapped pages)
could be enabled only under conditions like (priority < DEF_PRIORITY).

Or preferably we can explicitly protect the mapped executables,
as illustrated by this patch (a quick prototype).

Thanks,
Fengguang
---
include/linux/pagemap.h | 1 +
mm/mmap.c | 2 ++
mm/nommu.c | 2 ++
mm/vmscan.c | 37 +++++++++++++++++++++++++++++++++++--
4 files changed, 40 insertions(+), 2 deletions(-)

--- linux.orig/include/linux/pagemap.h
+++ linux/include/linux/pagemap.h
@@ -25,6 +25,7 @@ enum mapping_flags {
#ifdef CONFIG_UNEVICTABLE_LRU
AS_UNEVICTABLE = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
#endif
+ AS_EXEC = __GFP_BITS_SHIFT + 4, /* mapped PROT_EXEC somewhere */
};

static inline void mapping_set_error(struct address_space *mapping, int error)
--- linux.orig/mm/mmap.c
+++ linux/mm/mmap.c
@@ -1198,6 +1198,8 @@ munmap_back:
goto unmap_and_free_vma;
if (vm_flags & VM_EXECUTABLE)
added_exe_file_vma(mm);
+ if (vm_flags & VM_EXEC)
+ set_bit(AS_EXEC, &file->f_mapping->flags);
} else if (vm_flags & VM_SHARED) {
error = shmem_zero_setup(vma);
if (error)
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1220,6 +1220,7 @@ static void shrink_active_list(unsigned
int pgdeactivate = 0;
unsigned long pgscanned;
LIST_HEAD(l_hold); /* The pages which were snipped off */
+ LIST_HEAD(l_active);
LIST_HEAD(l_inactive);
struct page *page;
struct pagevec pvec;
@@ -1259,8 +1260,15 @@ static void shrink_active_list(unsigned

/* page_referenced clears PageReferenced */
if (page_mapping_inuse(page) &&
- page_referenced(page, 0, sc->mem_cgroup))
+ page_referenced(page, 0, sc->mem_cgroup)) {
+ struct address_space *mapping = page_mapping(page);
+
pgmoved++;
+ if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
+ list_add(&page->lru, &l_active);
+ continue;
+ }
+ }

list_add(&page->lru, &l_inactive);
}
@@ -1269,7 +1277,6 @@ static void shrink_active_list(unsigned
* Move the pages to the [file or anon] inactive list.
*/
pagevec_init(&pvec, 1);
- lru = LRU_BASE + file * LRU_FILE;

spin_lock_irq(&zone->lru_lock);
/*
@@ -1281,6 +1288,7 @@ static void shrink_active_list(unsigned
reclaim_stat->recent_rotated[!!file] += pgmoved;

pgmoved = 0;
+ lru = LRU_BASE + file * LRU_FILE;
while (!list_empty(&l_inactive)) {
page = lru_to_page(&l_inactive);
prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1305,6 +1313,31 @@ static void shrink_active_list(unsigned
}
__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
pgdeactivate += pgmoved;
+
+ pgmoved = 0;
+ lru = LRU_ACTIVE + file * LRU_FILE;
+ while (!list_empty(&l_active)) {
+ page = lru_to_page(&l_active);
+ prefetchw_prev_lru_page(page, &l_active, flags);
+ VM_BUG_ON(PageLRU(page));
+ SetPageLRU(page);
+ VM_BUG_ON(!PageActive(page));
+
+ list_move(&page->lru, &zone->lru[lru].list);
+ mem_cgroup_add_lru_list(page, lru);
+ pgmoved++;
+ if (!pagevec_add(&pvec, page)) {
+ __mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+ pgmoved = 0;
+ spin_unlock_irq(&zone->lru_lock);
+ if (buffer_heads_over_limit)
+ pagevec_strip(&pvec);
+ __pagevec_release(&pvec);
+ spin_lock_irq(&zone->lru_lock);
+ }
+ }
+ __mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
__count_zone_vm_events(PGREFILL, zone, pgscanned);
__count_vm_events(PGDEACTIVATE, pgdeactivate);
spin_unlock_irq(&zone->lru_lock);
--- linux.orig/mm/nommu.c
+++ linux/mm/nommu.c
@@ -1220,6 +1220,8 @@ unsigned long do_mmap_pgoff(struct file
added_exe_file_vma(current->mm);
vma->vm_mm = current->mm;
}
+ if (vm_flags & VM_EXEC)
+ set_bit(AS_EXEC, &file->f_mapping->flags);
}

down_write(&nommu_region_sem);

Rik van Riel
May 2, 2009, 11:30:11 PM
On Sun, 3 May 2009 11:15:39 +0800
Wu Fengguang <fenggu...@intel.com> wrote:

> Commit 7e9cd484204f ("vmscan: fix pagecache reclaim referenced bit
> check") tries to address the scalability problem when every page gets
> mapped and referenced, so that the logic (which lowered the priority
> of mapped pages) could be enabled only under conditions like
> (priority < DEF_PRIORITY).
>
> Or preferably we can explicitly protect the mapped executables,
> as illustrated by this patch (a quick prototype).

Over time, given enough streaming IO and idle applications,
executables will still be evicted with just this patch.

However, a combination of your patch and mine might do the
trick. I suspect that executables are never a very big
part of memory, except on small memory systems, so protecting
just the mapped executables should not be a scalability
problem.

My patch in combination with your patch should make sure
that if something gets evicted from the active list, it's
not executables - meanwhile, lots of the time streaming
IO will completely leave the active file list alone.

--
All rights reversed.

Wu Fengguang
May 2, 2009, 11:50:05 PM
On Sun, May 03, 2009 at 11:24:03AM +0800, Rik van Riel wrote:
> On Sun, 3 May 2009 11:15:39 +0800
> Wu Fengguang <fenggu...@intel.com> wrote:
>
> > Commit 7e9cd484204f ("vmscan: fix pagecache reclaim referenced bit
> > check") tries to address the scalability problem when every page gets
> > mapped and referenced, so that the logic (which lowered the priority
> > of mapped pages) could be enabled only under conditions like
> > (priority < DEF_PRIORITY).
> >
> > Or preferably we can explicitly protect the mapped executables,
> > as illustrated by this patch (a quick prototype).
>
> Over time, given enough streaming IO and idle applications,
> executables will still be evicted with just this patch.
>
> However, a combination of your patch and mine might do the
> trick. I suspect that executables are never a very big
> part of memory, except on small memory systems, so protecting
> just the mapped executables should not be a scalability
> problem.

Yes, it's my intent to take advantage of your patch :-)

There may be programs that embed a large amount of static data with
them - think about self-decompressing data - but that's fine: this
patch does not behave in an overly persistent way. Plus we can apply
a size limit (say 100M) if necessary.

> My patch in combination with your patch should make sure
> that if something gets evicted from the active list, it's
> not executables - meanwhile, lots of the time streaming
> IO will completely leave the active file list alone.

They together make
- mapped executable pages the first class citizen;
- streaming IO least intrusive.

I think that would make most desktop users and server administrators
contented and comfortable :-)

Thanks,
Fengguang

Peter Zijlstra
May 4, 2009, 4:10:09 AM
On Fri, 2009-05-01 at 12:35 -0700, Andrew Morton wrote:

> No, I think it still _is_ the case. When reclaim is treating mapped
> and non-mapped pages equally, the end result sucks. Applications get
> all laggy and humans get irritated. It may be that the system was
> optimised from an overall throughput POV, but the result was
> *irritating*.
>
> Which led us to prefer to retain mapped pages. This had nothing at all
> to do with internal impementation details - it was a design objective
> based upon empirical observation of system behaviour.

Shouldn't we make a distinction between PROT_EXEC and other mappings in
this? Because as soon as you're running an application that uses gobs
and gobs of mmap'ed memory, the mapped vs non-mapped thing breaks down.

Peter Zijlstra
May 4, 2009, 6:30:17 AM

Ah, nice, this re-instates the young bit for PROT_EXEC pages.
I very much like this.

KOSAKI Motohiro
May 6, 2009, 7:10:13 AM
> > test environment: no lvm, copy ext3 to ext3 (not mv), no change swappiness,
> > CFQ is used, userland is Fedora10, mmotm(2.6.30-rc1 + mm patch),
> > CPU opteronx4, mem 4G
> >
> > mouse move lag: not happened
> > window move lag: not happened
> > Mapped pages decrease rapidly: not happened (I guess these pages stay
> > on the active list on my system)
> > page fault large latency: happened (latencytop displays >1200ms)
> >
> >
> > Then, I don't doubt the vm replacement logic now,
> > but I need to investigate more.
> > I plan to try the following things today and tomorrow.
> >
> > - XFS
> > - LVM
> > - another io scheduler (thanks Ted, good view point)
> > - Rik's new patch
>
> hm, the AS io-scheduler doesn't cause such large latency in my environment.
> Elladan, can you try the AS scheduler? (add the boot option "elevator=as")

second test result:
read dev(sda): SSD, lvm+XFS
write dev(sdb): HDD, lvm+XFS

the result is the same as with ext3 without lvm. Thus I think
XFS isn't to blame.

Wu Fengguang

May 7, 2009, 8:20:25 AM
Introduce AS_EXEC to mark executables and their linked libraries, and to
protect their referenced active pages from being deactivated.

CC: Elladan <ell...@eskimo.com>
CC: Nick Piggin <npi...@suse.de>
CC: Johannes Weiner <han...@cmpxchg.org>
CC: Christoph Lameter <c...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
Acked-by: Peter Zijlstra <pet...@infradead.org>
Acked-by: Rik van Riel <ri...@redhat.com>
Signed-off-by: Wu Fengguang <fenggu...@intel.com>


---
 include/linux/pagemap.h |    1 +
 mm/mmap.c               |    2 ++
 mm/nommu.c              |    2 ++
 mm/vmscan.c             |   35 +++++++++++++++++++++++++++++++++--
 4 files changed, 38 insertions(+), 2 deletions(-)

--- linux.orig/include/linux/pagemap.h
+++ linux/include/linux/pagemap.h
@@ -25,6 +25,7 @@ enum mapping_flags {
 #ifdef CONFIG_UNEVICTABLE_LRU
         AS_UNEVICTABLE = __GFP_BITS_SHIFT + 3,  /* e.g., ramdisk, SHM_LOCK */
 #endif
+        AS_EXEC        = __GFP_BITS_SHIFT + 4,  /* mapped PROT_EXEC somewhere */
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
--- linux.orig/mm/mmap.c
+++ linux/mm/mmap.c
@@ -1194,6 +1194,8 @@ munmap_back:
                         goto unmap_and_free_vma;
                 if (vm_flags & VM_EXECUTABLE)
                         added_exe_file_vma(mm);
+                if (vm_flags & VM_EXEC)
+                        set_bit(AS_EXEC, &file->f_mapping->flags);
         } else if (vm_flags & VM_SHARED) {
                 error = shmem_zero_setup(vma);
                 if (error)
--- linux.orig/mm/nommu.c
+++ linux/mm/nommu.c
@@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file
                         added_exe_file_vma(current->mm);
                 vma->vm_mm = current->mm;
         }
+        if (vm_flags & VM_EXEC)
+                set_bit(AS_EXEC, &file->f_mapping->flags);
 }
 
 down_write(&nommu_region_sem);

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1230,6 +1230,7 @@ static void shrink_active_list(unsigned
         unsigned long pgmoved;
         unsigned long pgscanned;
         LIST_HEAD(l_hold);      /* The pages which were snipped off */
+        LIST_HEAD(l_active);
         LIST_HEAD(l_inactive);
         struct page *page;
         struct pagevec pvec;
@@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
 
                 /* page_referenced clears PageReferenced */
                 if (page_mapping_inuse(page) &&
-                    page_referenced(page, 0, sc->mem_cgroup))
+                    page_referenced(page, 0, sc->mem_cgroup)) {
+                        struct address_space *mapping = page_mapping(page);
+
                         pgmoved++;
+                        if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
+                                list_add(&page->lru, &l_active);
+                                continue;
+                        }
+                }
 
                 list_add(&page->lru, &l_inactive);
         }
@@ -1279,7 +1287,6 @@ static void shrink_active_list(unsigned
          * Move the pages to the [file or anon] inactive list.
          */
         pagevec_init(&pvec, 1);
-        lru = LRU_BASE + file * LRU_FILE;
 
         spin_lock_irq(&zone->lru_lock);
         /*
@@ -1291,6 +1298,7 @@ static void shrink_active_list(unsigned
         reclaim_stat->recent_rotated[!!file] += pgmoved;
 
         pgmoved = 0;    /* count pages moved to inactive list */
+        lru = LRU_BASE + file * LRU_FILE;
         while (!list_empty(&l_inactive)) {
                 page = lru_to_page(&l_inactive);
                 prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1313,6 +1321,29 @@ static void shrink_active_list(unsigned
         __mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
         __count_zone_vm_events(PGREFILL, zone, pgscanned);
         __count_vm_events(PGDEACTIVATE, pgmoved);
+
+        pgmoved = 0;    /* count pages moved back to active list */
+        lru = LRU_ACTIVE + file * LRU_FILE;
+        while (!list_empty(&l_active)) {
+                page = lru_to_page(&l_active);
+                prefetchw_prev_lru_page(page, &l_active, flags);
+                VM_BUG_ON(PageLRU(page));
+                SetPageLRU(page);
+                VM_BUG_ON(!PageActive(page));
+
+                list_move(&page->lru, &zone->lru[lru].list);
+                mem_cgroup_add_lru_list(page, lru);
+                pgmoved++;
+                if (!pagevec_add(&pvec, page)) {
+                        spin_unlock_irq(&zone->lru_lock);
+                        if (buffer_heads_over_limit)
+                                pagevec_strip(&pvec);
+                        __pagevec_release(&pvec);
+                        spin_lock_irq(&zone->lru_lock);
+                }
+        }
+        __mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
         spin_unlock_irq(&zone->lru_lock);
         if (buffer_heads_over_limit)
                 pagevec_strip(&pvec);

Christoph Lameter

May 7, 2009, 10:10:19 AM
On Thu, 7 May 2009, Wu Fengguang wrote:

> Introduce AS_EXEC to mark executables and their linked libraries, and to
> protect their referenced active pages from being deactivated.


We already have support for mlock(). How is this an improvement? This is
worse since the AS_EXEC pages stay on the active list and are continually
rescanned.

Peter Zijlstra

May 7, 2009, 10:20:17 AM
On Thu, 2009-05-07 at 09:39 -0400, Christoph Lameter wrote:
> On Thu, 7 May 2009, Wu Fengguang wrote:
>
> > Introduce AS_EXEC to mark executables and their linked libraries, and to
> > protect their referenced active pages from being deactivated.
>
>
> We already have support for mlock(). How is this an improvement? This is
> worse since the AS_EXEC pages stay on the active list and are continually
> rescanned.

It re-instates the young bit for PROT_EXEC pages, so that they will only
be paged when they are really cold, or there is severe pressure.

This simply gives them an edge over regular data. I don't think the
extra scanning is a problem, since you rarely have huge amounts of
executable pages around.

mlock()'ing all code just doesn't sound like a good alternative.

Peter Zijlstra

May 7, 2009, 10:40:26 AM
On Thu, 2009-05-07 at 10:18 -0400, Christoph Lameter wrote:

> On Thu, 7 May 2009, Peter Zijlstra wrote:
>
> > It re-instates the young bit for PROT_EXEC pages, so that they will only
> > be paged when they are really cold, or there is severe pressure.
>
> But they are rescanned until then. Really cold means what exactly? I do a
> back up of a few hundred gigabytes and do not use firefox while the backup
> is ongoing. Will the firefox pages still be in memory or not?

Likely not.

What this patch does is check the young bit on active_file scan, if its
found to be set and the page is PROT_EXEC, put the page back on the
active_file list, otherwise drop it to the inactive_file list.

So if you haven't run any firefox code, it should be gone from the
active list after 2 full cycles, and from the inactive list on the first
full inactive cycle after that.

If you don't understand the patch, what are you complaining about? What's
your point?

Christoph Lameter

May 7, 2009, 10:40:24 AM
On Thu, 7 May 2009, Peter Zijlstra wrote:

> It re-instates the young bit for PROT_EXEC pages, so that they will only
> be paged when they are really cold, or there is severe pressure.

But they are rescanned until then. Really cold means what exactly? I do a


back up of a few hundred gigabytes and do not use firefox while the backup
is ongoing. Will the firefox pages still be in memory or not?

> This simply gives them an edge over regular data. I don't think the


> extra scanning is a problem, since you rarely have huge amounts of
> executable pages around.
>
> mlock()'ing all code just doesn't sound like a good alternative.

Another possibility may be to put the exec pages on the mlock list
and scan the list if under extreme duress?

Rik van Riel

May 7, 2009, 11:10:20 AM
Christoph Lameter wrote:
> On Thu, 7 May 2009, Peter Zijlstra wrote:
>
>
>> It re-instates the young bit for PROT_EXEC pages, so that they will only
>> be paged when they are really cold, or there is severe pressure.
>>
>
> But they are rescanned until then. Really cold means what exactly? I do a
> back up of a few hundred gigabytes and do not use firefox while the backup
> is ongoing. Will the firefox pages still be in memory or not?
>
The patch with the subject "[PATCH] vmscan: evict use-once pages first (v3)"
together with this patch should make sure that it stays in memory.

Johannes Weiner

May 7, 2009, 11:20:14 AM

I find it a bit ugly that it applies an attribute of the memory area
(per mm) to the page cache mapping (shared). Because this in turn
means that the reference through a non-executable vma might get the
pages rotated just because there is/was an executable mmap around.

> down_write(&nommu_region_sem);
> --- linux.orig/mm/vmscan.c
> +++ linux/mm/vmscan.c
> @@ -1230,6 +1230,7 @@ static void shrink_active_list(unsigned
> unsigned long pgmoved;
> unsigned long pgscanned;
> LIST_HEAD(l_hold); /* The pages which were snipped off */
> + LIST_HEAD(l_active);
> LIST_HEAD(l_inactive);
> struct page *page;
> struct pagevec pvec;
> @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
>
> /* page_referenced clears PageReferenced */
> if (page_mapping_inuse(page) &&
> - page_referenced(page, 0, sc->mem_cgroup))
> + page_referenced(page, 0, sc->mem_cgroup)) {
> + struct address_space *mapping = page_mapping(page);
> +
> pgmoved++;
> + if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> + list_add(&page->lru, &l_active);
> + continue;
> + }
> + }

Since we walk the VMAs in page_referenced anyway, wouldn't it be
better to check if one of them is executable? This would even work
for executable anon pages. After all, there are applications that cow
executable mappings (sbcl and other language environments that use an
executable, run-time modified core image come to mind).

Hannes

Peter Zijlstra

May 7, 2009, 11:20:22 AM
On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:

> > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
> >
> > /* page_referenced clears PageReferenced */
> > if (page_mapping_inuse(page) &&
> > - page_referenced(page, 0, sc->mem_cgroup))
> > + page_referenced(page, 0, sc->mem_cgroup)) {
> > + struct address_space *mapping = page_mapping(page);
> > +
> > pgmoved++;
> > + if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > + list_add(&page->lru, &l_active);
> > + continue;
> > + }
> > + }
>
> Since we walk the VMAs in page_referenced anyway, wouldn't it be
> better to check if one of them is executable? This would even work
> for executable anon pages. After all, there are applications that cow
> executable mappings (sbcl and other language environments that use an
> executable, run-time modified core image come to mind).

Hmm, like provide a vm_flags mask along to page_referenced() to only
account matching vmas... seems like a sensible idea.

Rik van Riel

May 7, 2009, 11:30:20 AM
Peter Zijlstra wrote:
> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
>
>>> @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
>>>
>>> /* page_referenced clears PageReferenced */
>>> if (page_mapping_inuse(page) &&
>>> - page_referenced(page, 0, sc->mem_cgroup))
>>> + page_referenced(page, 0, sc->mem_cgroup)) {
>>> + struct address_space *mapping = page_mapping(page);
>>> +
>>> pgmoved++;
>>> + if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
>>> + list_add(&page->lru, &l_active);
>>> + continue;
>>> + }
>>> + }
>> Since we walk the VMAs in page_referenced anyway, wouldn't it be
>> better to check if one of them is executable? This would even work
>> for executable anon pages. After all, there are applications that cow
>> executable mappings (sbcl and other language environments that use an
>> executable, run-time modified core image come to mind).
>
> Hmm, like provide a vm_flags mask along to page_referenced() to only
> account matching vmas... seems like a sensible idea.

Not for anon pages, though, because JVMs could have way too many
executable anonymous segments, which would make us run into the
scalability problems again.

Lets leave this just to the file side of the LRUs, because that
is where we have the streaming IO problem.

Christoph Lameter

May 7, 2009, 11:50:14 AM
On Thu, 7 May 2009, Peter Zijlstra wrote:

> So if you haven't run any firefox code, it should be gone from the
> active list after 2 full cycles, and from the inactive list on the first
> full inactive cycle after that.

So some incremental changes. I still want to use firefox after my backup
without having to wait 5 minutes while it's paging exec pages back in.

Lee Schermerhorn

May 7, 2009, 12:10:18 PM
On Thu, 2009-05-07 at 10:18 -0400, Christoph Lameter wrote:
> On Thu, 7 May 2009, Peter Zijlstra wrote:
>
> > It re-instates the young bit for PROT_EXEC pages, so that they will only
> > be paged when they are really cold, or there is severe pressure.
>
> But they are rescanned until then. Really cold means what exactly? I do a
> back up of a few hundred gigabytes and do not use firefox while the backup
> is ongoing. Will the firefox pages still be in memory or not?
>
> > This simply gives them an edge over regular data. I don't think the
> > extra scanning is a problem, since you rarely have huge amounts of
> > executable pages around.
> >
> > mlock()'ing all code just doesn't sound like a good alternative.
>
> Another possibility may be to put the exec pages on the mlock list
> and scan the list if under extreme duress?

Actually, you don't need to go through the overhead of mucking with the
PG_mlocked flag, which incurs the rmap walk on unlock, etc. If one sets
the AS_UNEVICTABLE flag, the pages will be shuffled off to the
unevictable LRU iff we ever try to reclaim them. And, we do have the
function to scan the unevictable lru to "rescue" pages in a given
mapping should we want to bring them back under extreme load. We'd need
to remove the AS_UNEVICTABLE flag first. This is how
SHM_LOCK/SHM_UNLOCK works.

Lee

Rik van Riel

May 7, 2009, 12:10:23 PM
Christoph Lameter wrote:
> On Thu, 7 May 2009, Peter Zijlstra wrote:
>
>> So if you haven't run any firefox code, it should be gone from the
>> active list after 2 full cycles, and from the inactive list on the first
>> full inactive cycle after that.
>
> So some incremental changes. I still want to use firefox after my backup
> without having to wait 5 minutes while it's paging exec pages back in.

Please try to read and understand the patches, before
imagining that they might not be enough.

The active file list is kept at least as large as
the inactive file list. Your backup is one large
streaming IO. This means the files touched by
your backup should go onto the inactive file list
and get reclaimed, without putting pressure on
the active file list.

If you are still not convinced that these (small)
changes are enough, please test the patches and
show us the results, so we can tweak things further.

Christoph Lameter

May 7, 2009, 12:50:11 PM
On Thu, 7 May 2009, Lee Schermerhorn wrote:

> > Another possibility may be to put the exec pages on the mlock list
> > and scan the list if under extreme duress?
>
> Actually, you don't need to go thru the overhead of mucking with the
> PG_mlocked flag which incurs the rmap walk on unlock, etc. If one sets
> the the AS_UNEVICTABLE flag, the pages will be shuffled off the the
> unevictable LRU iff we ever try to reclaim them. And, we do have the
> function to scan the unevictable lru to "rescue" pages in a given
> mapping should we want to bring them back under extreme load. We'd need
> to remove the AS_UNEVICTABLE flag, first. This is how
> SHM_LOCK/SHM_UNLOCK works.

We need some way to control this. If there were a way to simply switch
off eviction of exec pages (via /proc/sys/vm/never_reclaim_exec_pages or
so), I'd use it.

Rik van Riel

May 7, 2009, 1:20:05 PM
Christoph Lameter wrote:

> We need some way to control this. If there would be a way to simply switch
> off eviction of exec pages (via /proc/sys/vm/never_reclaim_exec_pages or
> so) I'd use it.

Nobody (except you) is proposing that we completely disable
the eviction of executable pages. I believe that your idea
could easily lead to a denial of service attack, with a user
creating a very large executable file and mmaping it.

Giving executable pages some priority over other file cache
pages is nowhere near as dangerous wrt. unexpected side effects
and should work just as well.

Andrew Morton

May 7, 2009, 5:00:28 PM
On Thu, 7 May 2009 17:10:39 +0200
Johannes Weiner <han...@cmpxchg.org> wrote:

> > +++ linux/mm/nommu.c
> > @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file
> > added_exe_file_vma(current->mm);
> > vma->vm_mm = current->mm;
> > }
> > + if (vm_flags & VM_EXEC)
> > + set_bit(AS_EXEC, &file->f_mapping->flags);
> > }
>
> I find it a bit ugly that it applies an attribute of the memory area
> (per mm) to the page cache mapping (shared). Because this in turn
> means that the reference through a non-executable vma might get the
> pages rotated just because there is/was an executable mmap around.

Yes, it's not good. That AS_EXEC bit will hang around for arbitrarily
long periods in the inode cache. So we'll have AS_EXEC set on an
entire file because someone mapped some of it with PROT_EXEC half an
hour ago. Where's the sense in that?

Wu Fengguang

May 7, 2009, 11:10:17 PM

Right, the intention was to identify a whole executable/library file,
eg. /bin/bash or /lib/libc-2.9.so, covering both _text_ and _data_
sections.

The page_referenced() path will only cover the _text_ section. But
yeah, the _data_ section is more likely to grow huge in some rare cases.

Thanks,
Fengguang

Wu Fengguang

May 7, 2009, 11:40:12 PM
On Thu, May 07, 2009 at 11:17:46PM +0800, Peter Zijlstra wrote:
> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
>
> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
> > >
> > > /* page_referenced clears PageReferenced */
> > > if (page_mapping_inuse(page) &&
> > > - page_referenced(page, 0, sc->mem_cgroup))
> > > + page_referenced(page, 0, sc->mem_cgroup)) {
> > > + struct address_space *mapping = page_mapping(page);
> > > +
> > > pgmoved++;
> > > + if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > > + list_add(&page->lru, &l_active);
> > > + continue;
> > > + }
> > > + }
> >
> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> > better to check if one of them is executable? This would even work
> > for executable anon pages. After all, there are applications that cow
> > executable mappings (sbcl and other language environments that use an
> > executable, run-time modified core image come to mind).
>
> Hmm, like provide a vm_flags mask along to page_referenced() to only
> account matching vmas... seems like a sensible idea.

I'd prefer to make vm_flags an out-param, like this:

- int page_referenced(struct page *page, int is_locked,
+ int page_referenced(struct page *page, int is_locked, unsigned long *vm_flags,
struct mem_cgroup *mem_cont)

which allows reporting more versatile flags and status bits :)

Thanks,
Fengguang

Elladan

May 7, 2009, 11:50:10 PM
On Thu, May 07, 2009 at 01:11:41PM -0400, Rik van Riel wrote:
> Christoph Lameter wrote:
>
>> We need some way to control this. If there would be a way to simply switch
>> off eviction of exec pages (via /proc/sys/vm/never_reclaim_exec_pages or
>> so) I'd use it.
>
> Nobody (except you) is proposing that we completely disable
> the eviction of executable pages. I believe that your idea
> could easily lead to a denial of service attack, with a user
> creating a very large executable file and mmaping it.
>
> Giving executable pages some priority over other file cache
> pages is nowhere near as dangerous wrt. unexpected side effects
> and should work just as well.

I don't think this sort of DOS is relevant for a single user or trusted user
system.

I don't know of any distro that applies default ulimits, so desktops are
already susceptible to the far more trivial "call malloc a lot" or "fork bomb"
attacks. Plus, ulimits don't help, since they only apply per process - you'd
need a default mem cgroup before this mattered, I think.

Thanks,
Elladan

Wu Fengguang

May 8, 2009, 12:20:13 AM
On Thu, May 07, 2009 at 11:17:46PM +0800, Peter Zijlstra wrote:
> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
>
> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
> > >
> > > /* page_referenced clears PageReferenced */
> > > if (page_mapping_inuse(page) &&
> > > - page_referenced(page, 0, sc->mem_cgroup))
> > > + page_referenced(page, 0, sc->mem_cgroup)) {
> > > + struct address_space *mapping = page_mapping(page);
> > > +
> > > pgmoved++;
> > > + if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > > + list_add(&page->lru, &l_active);
> > > + continue;
> > > + }
> > > + }
> >
> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> > better to check if one of them is executable? This would even work
> > for executable anon pages. After all, there are applications that cow
> > executable mappings (sbcl and other language environments that use an
> > executable, run-time modified core image come to mind).
>
> Hmm, like provide a vm_flags mask along to page_referenced() to only
> account matching vmas... seems like a sensible idea.

Here is a quick patch for your review. Compile-tested.

With the added vm_flags reporting, the mlock=>unevictable logic can
possibly be made more straightforward.

Thanks,
Fengguang
---
vmscan: report vm_flags in page_referenced()

This enables more informed reclaim heuristics, eg. to protect executable
file pages more aggressively.

Signed-off-by: Wu Fengguang <fenggu...@intel.com>
---

include/linux/rmap.h | 5 +++--
mm/rmap.c | 30 +++++++++++++++++++++---------
mm/vmscan.c | 7 +++++--
3 files changed, 29 insertions(+), 13 deletions(-)

--- linux.orig/include/linux/rmap.h
+++ linux/include/linux/rmap.h
@@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct
/*
* Called from mm/vmscan.c to handle paging out
*/
-int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt);
+int page_referenced(struct page *, int is_locked,
+ struct mem_cgroup *cnt, unsigned long *vm_flags);
int try_to_unmap(struct page *, int ignore_refs);

/*
@@ -128,7 +129,7 @@ int page_wrprotect(struct page *page, in
#define anon_vma_prepare(vma) (0)
#define anon_vma_link(vma) do {} while (0)

-#define page_referenced(page,l,cnt) TestClearPageReferenced(page)
+#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page)
#define try_to_unmap(page, refs) SWAP_FAIL

static inline int page_mkclean(struct page *page)
--- linux.orig/mm/rmap.c
+++ linux/mm/rmap.c
@@ -333,7 +333,8 @@ static int page_mapped_in_vma(struct pag
* repeatedly from either page_referenced_anon or page_referenced_file.
*/
static int page_referenced_one(struct page *page,
- struct vm_area_struct *vma, unsigned int *mapcount)
+ struct vm_area_struct *vma,
+ unsigned int *mapcount)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
@@ -385,7 +386,8 @@ out:
}

static int page_referenced_anon(struct page *page,
- struct mem_cgroup *mem_cont)
+ struct mem_cgroup *mem_cont,
+ unsigned long *vm_flags)
{
unsigned int mapcount;
struct anon_vma *anon_vma;
@@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
continue;
referenced += page_referenced_one(page, vma, &mapcount);
+ *vm_flags |= vma->vm_flags;
if (!mapcount)
break;
}
@@ -418,6 +421,7 @@ static int page_referenced_anon(struct p
* page_referenced_file - referenced check for object-based rmap
* @page: the page we're checking references on.
* @mem_cont: target memory controller
+ * @vm_flags: collect the encountered vma->vm_flags
*
* For an object-based mapped page, find all the places it is mapped and
* check/clear the referenced flag. This is done by following the page->mapping
@@ -427,7 +431,8 @@ static int page_referenced_anon(struct p
* This function is only called from page_referenced for object-based pages.
*/
static int page_referenced_file(struct page *page,
- struct mem_cgroup *mem_cont)
+ struct mem_cgroup *mem_cont,
+ unsigned long *vm_flags)
{
unsigned int mapcount;
struct address_space *mapping = page->mapping;
@@ -468,6 +473,7 @@ static int page_referenced_file(struct p
if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
continue;
referenced += page_referenced_one(page, vma, &mapcount);
+ *vm_flags |= vma->vm_flags;
if (!mapcount)
break;
}
@@ -481,29 +487,35 @@ static int page_referenced_file(struct p
* @page: the page to test
* @is_locked: caller holds lock on the page
* @mem_cont: target memory controller
+ * @vm_flags: collect the encountered vma->vm_flags
*
* Quick test_and_clear_referenced for all mappings to a page,
* returns the number of ptes which referenced the page.
*/
-int page_referenced(struct page *page, int is_locked,
- struct mem_cgroup *mem_cont)
+int page_referenced(struct page *page,
+ int is_locked,
+ struct mem_cgroup *mem_cont,
+ unsigned long *vm_flags)
{
int referenced = 0;

if (TestClearPageReferenced(page))
referenced++;

+ *vm_flags = 0;
if (page_mapped(page) && page->mapping) {
if (PageAnon(page))
- referenced += page_referenced_anon(page, mem_cont);
+ referenced += page_referenced_anon(page, mem_cont,
+ vm_flags);
else if (is_locked)
- referenced += page_referenced_file(page, mem_cont);
+ referenced += page_referenced_file(page, mem_cont,
+ vm_flags);
else if (!trylock_page(page))
referenced++;
else {
if (page->mapping)
- referenced +=
- page_referenced_file(page, mem_cont);
+ referenced += page_referenced_file(page,
+ mem_cont, vm_flags);
unlock_page(page);
}
}
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -598,6 +598,7 @@ static unsigned long shrink_page_list(st
struct pagevec freed_pvec;
int pgactivate = 0;
unsigned long nr_reclaimed = 0;
+ unsigned long vm_flags;

cond_resched();

@@ -648,7 +649,8 @@ static unsigned long shrink_page_list(st
goto keep_locked;
}

- referenced = page_referenced(page, 1, sc->mem_cgroup);
+ referenced = page_referenced(page, 1,
+ sc->mem_cgroup, &vm_flags);
/* In active use or really unfreeable? Activate it. */
if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
referenced && page_mapping_inuse(page))
@@ -1229,6 +1231,7 @@ static void shrink_active_list(unsigned
 {
         unsigned long pgmoved;
         unsigned long pgscanned;
+        unsigned long vm_flags;
         LIST_HEAD(l_hold);      /* The pages which were snipped off */
         LIST_HEAD(l_inactive);
         struct page *page;
@@ -1269,7 +1272,7 @@ static void shrink_active_list(unsigned
 
                 /* page_referenced clears PageReferenced */
                 if (page_mapping_inuse(page) &&
-                    page_referenced(page, 0, sc->mem_cgroup))
+                    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
                         pgmoved++;
 
                 list_add(&page->lru, &l_inactive);

Minchan Kim

May 8, 2009, 3:40:12 AM
Hi, let me ask a question.

But your patch cares about just the text section, doesn't it?
Am I missing something?

Why did you say that "The page_referenced() path will only cover the _text_ section"?
Could you elaborate, please?

> yeah, the _data_ section is more likely to grow huge in some rare cases.
>
> Thanks,
> Fengguang
>


--
Kind Regards
Minchan Kim

Wu Fengguang

May 8, 2009, 4:20:10 AM

This patch actually protects the mapped pages of the whole executable
file. Sorry, the title was a bit misleading.

I was under the wild assumption that only the _text_ section would be
PROT_EXEC mapped. No?

Thanks,
Fengguang

Wu Fengguang

May 8, 2009, 4:20:16 AM
On Fri, May 08, 2009 at 04:44:10AM +0800, Andrew Morton wrote:
> On Thu, 7 May 2009 17:10:39 +0200
> Johannes Weiner <han...@cmpxchg.org> wrote:
>
> > > +++ linux/mm/nommu.c
> > > @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file
> > > added_exe_file_vma(current->mm);
> > > vma->vm_mm = current->mm;
> > > }
> > > + if (vm_flags & VM_EXEC)
> > > + set_bit(AS_EXEC, &file->f_mapping->flags);
> > > }
> >
> > I find it a bit ugly that it applies an attribute of the memory area
> > (per mm) to the page cache mapping (shared). Because this in turn
> > means that the reference through a non-executable vma might get the
> > pages rotated just because there is/was an executable mmap around.
>
> Yes, it's not good. That AS_EXEC bit will hang around for arbitrarily
> long periods in the inode cache. So we'll have AS_EXEC set on an
> entire file because someone mapped some of it with PROT_EXEC half an
> hour ago. Where's the sense in that?

Yes, that nonsense case is possible, but it should be rare.

AS_EXEC means "this is (likely) an executable file".
It has broader coverage in both space and time:

- it protects the whole file instead of only the text section
- it further protects the many executables/libraries that typically
  run briefly but perhaps frequently, e.g. ls, cat, git, gcc, perl,
  python, ...

But none of the above cases are as important in user experience as the
currently running code, so here goes the new patch (which applies after
vmscan: report vm_flags in page_referenced()).

Thanks,
Fengguang
---
vmscan: make mapped executable pages the first class citizen

Protect referenced PROT_EXEC mapped pages from being deactivated.

PROT_EXEC (or its internal representation, VM_EXEC) pages normally belong to
currently running executables and their linked libraries; they should really be
cached aggressively to provide a good user experience.

CC: Elladan <ell...@eskimo.com>
CC: Nick Piggin <npi...@suse.de>
CC: Johannes Weiner <han...@cmpxchg.org>
CC: Christoph Lameter <c...@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
Acked-by: Peter Zijlstra <pet...@infradead.org>
Acked-by: Rik van Riel <ri...@redhat.com>
Signed-off-by: Wu Fengguang <fenggu...@intel.com>
---

mm/vmscan.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1233,6 +1233,7 @@ static void shrink_active_list(unsigned
         unsigned long pgscanned;
         unsigned long vm_flags;
         LIST_HEAD(l_hold);      /* The pages which were snipped off */
+        LIST_HEAD(l_active);
         LIST_HEAD(l_inactive);
         struct page *page;
         struct pagevec pvec;
@@ -1272,8 +1273,13 @@ static void shrink_active_list(unsigned
 
                 /* page_referenced clears PageReferenced */
                 if (page_mapping_inuse(page) &&
-                    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
+                    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
                         pgmoved++;
+                        if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
+                                list_add(&page->lru, &l_active);
+                                continue;
+                        }
+                }
 
                 list_add(&page->lru, &l_inactive);
         }
@@ -1282,7 +1288,6 @@ static void shrink_active_list(unsigned
          * Move the pages to the [file or anon] inactive list.
          */
         pagevec_init(&pvec, 1);
-        lru = LRU_BASE + file * LRU_FILE;
 
         spin_lock_irq(&zone->lru_lock);
         /*
@@ -1294,6 +1299,7 @@ static void shrink_active_list(unsigned
         reclaim_stat->recent_rotated[!!file] += pgmoved;
 
         pgmoved = 0;    /* count pages moved to inactive list */
+        lru = LRU_BASE + file * LRU_FILE;
         while (!list_empty(&l_inactive)) {
                 page = lru_to_page(&l_inactive);
                 prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1316,6 +1322,29 @@ static void shrink_active_list(unsigned

Wu Fengguang

May 8, 2009, 4:30:21 AM
On Fri, May 08, 2009 at 04:16:08PM +0800, Wu Fengguang wrote:
> ---
> vmscan: make mapped executable pages the first class citizen
>
> Protect referenced PROT_EXEC mapped pages from being deactivated.
>
> PROT_EXEC(or its internal presentation VM_EXEC) pages normally belong to some
> currently running executables and their linked libraries, they shall really be
> cached aggressively to provide good user experiences.

I can verify that it actually works :)

Thanks,
Fengguang
---
printk("rescued %s 0x%lx\n", dname, page->index);

[ 929.047700] rescued ld-2.9.so 0x0
[ 929.051295] rescued libc-2.9.so 0x0
[ 929.054984] rescued init 0x0
[ 929.058086] rescued libc-2.9.so 0x1
[ 929.061810] rescued libc-2.9.so 0x2
[ 929.065557] rescued libc-2.9.so 0x3
[ 929.069279] rescued libc-2.9.so 0x7
[ 929.072978] rescued libc-2.9.so 0x8
[ 929.076697] rescued libc-2.9.so 0x9
[ 929.080413] rescued libc-2.9.so 0xb
[ 929.084127] rescued libc-2.9.so 0xf
[ 929.087849] rescued libc-2.9.so 0x10
[ 929.091667] rescued libc-2.9.so 0x12
[ 929.095426] rescued libc-2.9.so 0x13
[ 929.099235] rescued libc-2.9.so 0x14
[ 929.103055] rescued libc-2.9.so 0x15
[ 929.106868] rescued libc-2.9.so 0x16
[ 929.110661] rescued libc-2.9.so 0x1e
[ 929.114468] rescued libc-2.9.so 0x42
[ 929.118259] rescued libc-2.9.so 0x43
[ 929.122063] rescued libc-2.9.so 0x44
[ 929.125863] rescued libc-2.9.so 0x45
[ 929.129666] rescued libc-2.9.so 0x46
[ 929.133469] rescued libc-2.9.so 0x4c
[ 929.137258] rescued libc-2.9.so 0x6b
[ 929.141050] rescued libc-2.9.so 0x6f
[ 929.144916] rescued libc-2.9.so 0x70
[ 929.148695] rescued libc-2.9.so 0x71
[ 929.152495] rescued libc-2.9.so 0x74
[ 929.156272] rescued libc-2.9.so 0x76
[ 929.160095] rescued libc-2.9.so 0x79
[ 929.163904] rescued libc-2.9.so 0x7b
[ 929.168007] rescued libc-2.9.so 0x7c
[ 929.171800] rescued libc-2.9.so 0x7d
[ 929.176518] rescued libnss_compat-2.9.so 0x0
[ 929.180362] rescued libc-2.9.so 0x105
[ 929.184617] rescued libc-2.9.so 0x4
[ 929.188173] rescued libc-2.9.so 0x106
[ 929.188191] rescued libc-2.9.so 0x5
[ 929.195487] rescued libc-2.9.so 0x6
[ 929.199042] rescued libc-2.9.so 0x116
[ 929.202805] rescued libc-2.9.so 0x7f
[ 929.202818] rescued libc-2.9.so 0x118
[ 929.202838] rescued libc-2.9.so 0x11a
[ 929.202863] rescued ld-2.9.so 0x1
[ 929.202878] rescued ld-2.9.so 0x2
[ 929.202892] rescued ld-2.9.so 0x3
[ 929.202909] rescued ld-2.9.so 0x4
[ 929.202925] rescued ld-2.9.so 0x5
[ 929.202940] rescued ld-2.9.so 0x6
[ 929.202956] rescued ld-2.9.so 0x7
[ 929.202973] rescued ld-2.9.so 0xa
[ 929.202989] rescued ld-2.9.so 0xb
[ 929.203005] rescued ld-2.9.so 0xc
[ 929.203021] rescued ld-2.9.so 0xf
[ 929.203037] rescued ld-2.9.so 0x10
[ 929.203052] rescued ld-2.9.so 0x14
[ 929.203068] rescued ld-2.9.so 0x16
[ 929.203084] rescued ld-2.9.so 0x18
[ 929.203100] rescued ld-2.9.so 0x1a
[ 929.203381] rescued libc-2.9.so 0x28
[ 929.203392] rescued libc-2.9.so 0x29
[ 929.203405] rescued libc-2.9.so 0x2a
[ 929.203423] rescued libc-2.9.so 0x2b
[ 929.203434] rescued libc-2.9.so 0x2f
[ 929.203457] rescued libc-2.9.so 0x72
[ 929.203477] rescued libc-2.9.so 0x73
[ 929.203497] rescued libc-2.9.so 0x75
[ 929.203516] rescued libc-2.9.so 0x77
[ 929.203528] rescued libc-2.9.so 0xb7
[ 929.203541] rescued libc-2.9.so 0xb8
[ 929.203560] rescued libc-2.9.so 0xd9
[ 929.203577] rescued libc-2.9.so 0x119
[ 929.203590] rescued libc-2.9.so 0x11f
[ 929.319976] rescued libc-2.9.so 0xa
[ 929.323657] rescued libc-2.9.so 0xc
[ 929.327374] rescued libc-2.9.so 0xd
[ 929.331028] rescued libc-2.9.so 0xe
[ 929.331575] rescued libc-2.9.so 0x24
[ 929.331591] rescued libc-2.9.so 0x6a
[ 929.331599] rescued libc-2.9.so 0x84
[ 929.331609] rescued libc-2.9.so 0x6c
[ 929.331627] rescued libc-2.9.so 0x6d
[ 929.331642] rescued libc-2.9.so 0x64
[ 929.331645] rescued libc-2.9.so 0xa6
[ 929.331648] rescued libc-2.9.so 0xa7
[ 929.331651] rescued libc-2.9.so 0xa8
[ 929.331654] rescued libc-2.9.so 0xa9
[ 929.331657] rescued libc-2.9.so 0xaa
[ 929.331660] rescued libc-2.9.so 0xab
[ 929.331663] rescued libc-2.9.so 0xae
[ 929.331666] rescued libc-2.9.so 0xaf
[ 929.331669] rescued libc-2.9.so 0xb0
[ 929.331672] rescued libc-2.9.so 0xb1
[ 929.331674] rescued libc-2.9.so 0xb2
[ 929.331677] rescued libc-2.9.so 0xb3
[ 929.331680] rescued libc-2.9.so 0xb4
[ 929.331683] rescued libc-2.9.so 0xb5
[ 929.331686] rescued libc-2.9.so 0xb6
[ 929.331707] rescued libnss_files-2.9.so 0x1
[ 929.331716] rescued libnss_files-2.9.so 0x2
[ 929.331724] rescued libnss_files-2.9.so 0x8
[ 929.424155] rescued libc-2.9.so 0x9e
[ 929.426448] rescued libnss_nis-2.9.so 0x1
[ 929.426457] rescued libnss_nis-2.9.so 0x2
[ 929.426467] rescued libnss_nis-2.9.so 0x5
[ 929.426475] rescued libnss_nis-2.9.so 0x4
[ 929.426484] rescued libnss_nis-2.9.so 0x7
[ 929.426500] rescued libnsl-2.9.so 0x1
[ 929.426511] rescued libnsl-2.9.so 0x2
[ 929.426520] rescued libnsl-2.9.so 0x3
[ 929.426530] rescued libnsl-2.9.so 0x4
[ 929.426540] rescued libnsl-2.9.so 0xf
[ 929.426556] rescued libnss_compat-2.9.so 0x1
[ 929.426565] rescued libnss_compat-2.9.so 0x3
[ 929.426574] rescued libnss_compat-2.9.so 0x2
[ 929.426584] rescued libnss_compat-2.9.so 0x5
[ 929.480343] rescued libc-2.9.so 0x35
[ 929.487805] rescued libc-2.9.so 0xc1
[ 929.488119] rescued libc-2.9.so 0x111
[ 929.488136] rescued libc-2.9.so 0x60
[ 929.488157] rescued libc-2.9.so 0x8f
[ 929.488170] rescued libc-2.9.so 0x9d
[ 929.488185] rescued libc-2.9.so 0xa0
[ 929.488193] rescued libc-2.9.so 0xcd
[ 929.488209] rescued libc-2.9.so 0xcf
[ 929.488221] rescued libc-2.9.so 0xde
[ 929.488506] rescued libc-2.9.so 0xfd
[ 929.488517] rescued libc-2.9.so 0xff
[ 929.488532] rescued libc-2.9.so 0x100
[ 929.488554] rescued libc-2.9.so 0x112
[ 929.488567] rescued libc-2.9.so 0x11b
[ 929.488584] rescued ld-2.9.so 0x11
[ 929.488588] rescued libblkid.so.1.0 0x0
[ 929.488599] rescued libpthread-2.9.so 0x0
[ 929.488607] rescued librt-2.9.so 0x0
[ 929.488616] rescued libselinux.so.1 0x1
[ 929.488621] rescued libselinux.so.1 0x2
[ 929.488626] rescued libselinux.so.1 0x3
[ 929.488632] rescued libselinux.so.1 0x4
[ 929.488637] rescued libselinux.so.1 0x5
[ 929.488642] rescued libselinux.so.1 0xa
[ 929.488647] rescued libselinux.so.1 0xc
[ 929.488651] rescued libselinux.so.1 0x11
[ 929.488656] rescued libselinux.so.1 0x12
[ 929.488660] rescued libselinux.so.1 0x13
[ 929.488665] rescued libselinux.so.1 0x14
[ 929.488675] rescued libc-2.9.so 0xc3
[ 929.488688] rescued libc-2.9.so 0xd1
[ 929.488692] rescued libc-2.9.so 0x107
[ 929.488703] rescued libc-2.9.so 0x9a
[ 929.488716] rescued libc-2.9.so 0x9b
[ 929.488720] rescued udevd 0x0
[ 929.488729] rescued libc-2.9.so 0xca
[ 929.489302] rescued libc-2.9.so 0x36
[ 929.489314] rescued libdl-2.9.so 0x1
[ 929.489324] rescued libc-2.9.so 0xda
[ 929.489335] rescued libc-2.9.so 0xdd
[ 929.489346] rescued libc-2.9.so 0xdc
[ 929.489350] rescued libc-2.9.so 0xdb
[ 929.489641] rescued udevd 0x2
[ 929.489644] rescued udevd 0xa
[ 929.489652] rescued libpthread-2.9.so 0x9
[ 929.489659] rescued libpthread-2.9.so 0x8
[ 929.489673] rescued udevd 0x1
[ 929.661734] rescued libc-2.9.so 0xc2
[ 929.665491] rescued libc-2.9.so 0xc6
[ 929.669228] rescued libc-2.9.so 0xc7
[ 929.673267] rescued libc-2.9.so 0x113
[ 929.677241] rescued libc-2.9.so 0x114
[ 929.681087] rescued libc-2.9.so 0x115
[ 929.685510] rescued libselinux.so.1 0x0
[ 929.689720] rescued libsepol.so.1 0x0
[ 929.693586] rescued ld-2.9.so 0x8
[ 929.697072] rescued ld-2.9.so 0x9
[ 929.700527] rescued ld-2.9.so 0xd
[ 929.703982] rescued ld-2.9.so 0xe
[ 929.707459] rescued ld-2.9.so 0x13
[ 929.711017] rescued ld-2.9.so 0x15
[ 929.714551] rescued ld-2.9.so 0x17
[ 929.718089] rescued libdl-2.9.so 0x0
[ 929.720395] rescued libc-2.9.so 0x11
[ 929.720412] rescued libc-2.9.so 0x17
[ 929.720428] rescued libc-2.9.so 0x18
[ 929.720443] rescued libc-2.9.so 0x19
[ 929.720457] rescued libc-2.9.so 0x1a
[ 929.720473] rescued libc-2.9.so 0x1b
[ 929.720488] rescued libc-2.9.so 0x1c
[ 929.720502] rescued libc-2.9.so 0x1d
[ 929.720521] rescued libc-2.9.so 0x31
[ 929.720541] rescued libc-2.9.so 0x32
[ 929.720558] rescued libc-2.9.so 0x33
[ 929.720578] rescued libc-2.9.so 0x34
[ 929.720598] rescued libc-2.9.so 0x62
[ 929.720614] rescued libc-2.9.so 0x63
[ 929.720628] rescued libc-2.9.so 0x65
[ 929.720646] rescued libc-2.9.so 0x6e
[ 929.720668] rescued libc-2.9.so 0x7a
[ 929.720685] rescued libc-2.9.so 0x82
[ 929.720700] rescued libc-2.9.so 0x83
[ 929.720721] rescued libc-2.9.so 0x9f
[ 929.720742] rescued libc-2.9.so 0xcb
[ 929.720754] rescued libc-2.9.so 0xcc
[ 929.720775] rescued libc-2.9.so 0xce
[ 929.720798] rescued libc-2.9.so 0x103
[ 929.721080] rescued udevd 0x15
[ 929.721083] rescued udevd 0x5
[ 929.721085] rescued udevd 0x7
[ 929.721088] rescued udevd 0xb
[ 929.721091] rescued udevd 0xc
[ 929.721093] rescued udevd 0xf
[ 929.721096] rescued udevd 0x11
[ 929.721098] rescued udevd 0x12
[ 929.721101] rescued udevd 0x14
[ 929.721104] rescued udevd 0x16
[ 929.721106] rescued udevd 0x3
[ 929.721109] rescued udevd 0xd
[ 929.721111] rescued udevd 0xe
[ 929.721125] rescued libc-2.9.so 0x52
[ 929.721134] rescued libc-2.9.so 0x53
[ 929.721145] rescued libc-2.9.so 0x54
[ 929.721148] rescued udevd 0x4
[ 929.721153] rescued libc-2.9.so 0x3d
[ 929.721162] rescued libc-2.9.so 0x2c
[ 929.721169] rescued libc-2.9.so 0x2d
[ 929.721175] rescued libc-2.9.so 0x80
[ 929.721181] rescued libc-2.9.so 0x81
[ 929.721185] rescued libc-2.9.so 0x38
[ 929.721190] rescued libc-2.9.so 0x39
[ 929.736569] rescued libc-2.9.so 0x3a
[ 929.736576] rescued libc-2.9.so 0x40
[ 929.736581] rescued libc-2.9.so 0x41
[ 929.736597] rescued libc-2.9.so 0x55
[ 929.736601] rescued libc-2.9.so 0x56
[ 929.736615] rescued libpthread-2.9.so 0x1
[ 929.736622] rescued libpthread-2.9.so 0x2
[ 929.736629] rescued libpthread-2.9.so 0x3
[ 929.736636] rescued libpthread-2.9.so 0x4
[ 929.736646] rescued libpthread-2.9.so 0x5
[ 929.736654] rescued libpthread-2.9.so 0xa
[ 929.736662] rescued libpthread-2.9.so 0xc
[ 929.736669] rescued libpthread-2.9.so 0xe
[ 929.736677] rescued libpthread-2.9.so 0x10
[ 929.736685] rescued librt-2.9.so 0x1
[ 929.736691] rescued librt-2.9.so 0x2
[ 929.736696] rescued librt-2.9.so 0x5
[ 929.736971] rescued libc-2.9.so 0x57
[ 929.736984] rescued libc-2.9.so 0x8e
[ 929.736992] rescued libc-2.9.so 0x90
[ 929.736999] rescued libc-2.9.so 0x91
[ 929.737007] rescued libc-2.9.so 0x95
[ 929.737014] rescued libc-2.9.so 0x96
[ 929.737021] rescued libc-2.9.so 0x97
[ 929.737027] rescued librt-2.9.so 0x3
[ 929.737042] rescued libc-2.9.so 0x47
[ 929.737047] rescued libc-2.9.so 0x59
[ 929.737055] rescued libc-2.9.so 0xc8
[ 929.737059] rescued libc-2.9.so 0xc9
[ 929.737067] rescued libm-2.9.so 0x0
[ 929.737073] rescued libm-2.9.so 0x1
[ 930.005563] rescued libnss_files-2.9.so 0x0
[ 930.005830] rescued libm-2.9.so 0x2
[ 930.005836] rescued libm-2.9.so 0x3
[ 930.005841] rescued libm-2.9.so 0x44
[ 930.005845] rescued libm-2.9.so 0x28
[ 930.005849] rescued portmap 0x0
[ 930.005861] rescued libwrap.so.0.7.6 0x0
[ 930.005865] rescued rpc.statd 0x0
[ 930.005868] rescued rpc.statd 0x1
[ 930.005870] rescued rpc.statd 0x2
[ 930.005873] rescued rpc.statd 0x4
[ 930.005876] rescued rpc.statd 0x8
[ 930.005881] rescued libwrap.so.0.7.6 0x1
[ 930.005885] rescued libwrap.so.0.7.6 0x2
[ 930.005889] rescued libwrap.so.0.7.6 0x3
[ 930.005893] rescued libwrap.so.0.7.6 0x6
[ 930.005897] rescued rpc.idmapd 0x0
[ 930.006158] rescued libresolv-2.9.so 0x0
[ 930.006171] rescued libc-2.9.so 0x9c
[ 930.006182] rescued libc-2.9.so 0xfe
[ 930.006186] rescued libnfsidmap.so.0.3.0 0x0
[ 930.006189] rescued libevent-1.3e.so.1.0.3 0x0
[ 930.006196] rescued libpthread-2.9.so 0xb
[ 930.006202] rescued libattr.so.1.1.0 0x0
[ 930.006208] rescued libc-2.9.so 0x61
[ 930.006211] rescued libattr.so.1.1.0 0x1
[ 930.006215] rescued libattr.so.1.1.0 0x3
[ 930.006219] rescued init 0x1
[ 930.006221] rescued init 0x2
[ 930.006224] rescued init 0x5
[ 930.006226] rescued init 0x7
[ 930.006229] rescued init 0x6
[ 930.006233] rescued libsepol.so.1 0x2
[ 930.006236] rescued libsepol.so.1 0x3
[ 930.006238] rescued libsepol.so.1 0x4
[ 930.006241] rescued libsepol.so.1 0x2d
[ 930.006244] rescued init 0x3
[ 930.006246] rescued init 0x4
[ 930.006250] rescued acpid 0x0
[ 930.006259] rescued libdbus-1.so.3.4.0 0x0
[ 930.006262] rescued dbus-daemon 0x0
[ 930.006275] rescued libpthread-2.9.so 0xd
[ 930.006279] rescued libexpat.so.1.5.2 0x0
[ 930.006287] rescued sshd 0x0
[ 930.006292] rescued libkeyutils-1.2.so 0x0
[ 930.006297] rescued libkrb5support.so.0.1 0x0
[ 930.006304] rescued libcom_err.so.2.1 0x0
[ 930.006311] rescued libk5crypto.so.3.1 0x0
[ 930.006318] rescued libgssapi_krb5.so.2.2 0x0
[ 930.006322] rescued libresolv-2.9.so 0x1
[ 930.006325] rescued libresolv-2.9.so 0x2
[ 930.006329] rescued libresolv-2.9.so 0x3
[ 930.006333] rescued libresolv-2.9.so 0xf
[ 930.006340] rescued libcrypt-2.9.so 0x0
[ 930.207054] rescued libnss_nis-2.9.so 0x0
[ 930.211539] rescued libnsl-2.9.so 0x0
[ 930.215805] rescued libz.so.1.2.3.3 0x0
[ 930.215968] rescued libkrb5.so.3.3 0x8
[ 930.215972] rescued libkrb5.so.3.3 0x9
[ 930.215976] rescued libkrb5.so.3.3 0xa
[ 930.215979] rescued libkrb5.so.3.3 0xb
[ 930.215983] rescued libkrb5.so.3.3 0xc
[ 930.215986] rescued libkrb5.so.3.3 0xd
[ 930.215989] rescued libkrb5.so.3.3 0xe
[ 930.215993] rescued libkrb5.so.3.3 0xf
[ 930.215996] rescued libkrb5.so.3.3 0x10
[ 930.216000] rescued libkrb5.so.3.3 0x11
[ 930.216003] rescued libkrb5.so.3.3 0x12
[ 930.216006] rescued libkrb5.so.3.3 0x13
[ 930.216009] rescued libkrb5.so.3.3 0x14
[ 930.216013] rescued libkrb5.so.3.3 0x15
[ 930.216016] rescued libkrb5.so.3.3 0x16
[ 930.216019] rescued libkrb5.so.3.3 0x19
[ 930.216023] rescued libkrb5.so.3.3 0x88
[ 930.216028] rescued libgssapi_krb5.so.2.2 0x1
[ 930.216031] rescued libgssapi_krb5.so.2.2 0x2
[ 930.216035] rescued libgssapi_krb5.so.2.2 0x3
[ 930.216038] rescued libgssapi_krb5.so.2.2 0x4
[ 930.216041] rescued libgssapi_krb5.so.2.2 0x5
[ 930.216045] rescued libgssapi_krb5.so.2.2 0x6
[ 930.216048] rescued libgssapi_krb5.so.2.2 0x25
[ 930.216053] rescued libcrypt-2.9.so 0x6
[ 930.216057] rescued libz.so.1.2.3.3 0x1
[ 930.216060] rescued libz.so.1.2.3.3 0x2
[ 930.216064] rescued libz.so.1.2.3.3 0xe
[ 930.216068] rescued libutil-2.9.so 0x1
[ 930.216075] rescued libcrypto.so.0.9.8 0x5
[ 930.216080] rescued libcrypto.so.0.9.8 0x6
[ 930.216085] rescued libcrypto.so.0.9.8 0x7
[ 930.216368] rescued libcrypto.so.0.9.8 0x8
[ 930.216373] rescued libcrypto.so.0.9.8 0x9
[ 930.216378] rescued libcrypto.so.0.9.8 0xa
[ 930.216383] rescued libcrypto.so.0.9.8 0xb
[ 930.216386] rescued libcrypto.so.0.9.8 0xc
[ 930.216391] rescued libcrypto.so.0.9.8 0xd
[ 930.216395] rescued libcrypto.so.0.9.8 0xe
[ 930.216399] rescued libcrypto.so.0.9.8 0xf
[ 930.216404] rescued libcrypto.so.0.9.8 0x10
[ 930.216408] rescued libcrypto.so.0.9.8 0x11
[ 930.216412] rescued libcrypto.so.0.9.8 0x12
[ 930.216415] rescued libcrypto.so.0.9.8 0x13
[ 930.216418] rescued libcrypto.so.0.9.8 0x14
[ 930.216423] rescued libcrypto.so.0.9.8 0x15
[ 930.216426] rescued libcrypto.so.0.9.8 0x16
[ 930.216431] rescued libcrypto.so.0.9.8 0x17
[ 930.216434] rescued libcrypto.so.0.9.8 0x18
[ 930.216439] rescued libcrypto.so.0.9.8 0x19
[ 930.216443] rescued libcrypto.so.0.9.8 0x1a
[ 930.216447] rescued libcrypto.so.0.9.8 0x1b
[ 930.216450] rescued libcrypto.so.0.9.8 0x1c
[ 930.216455] rescued libcrypto.so.0.9.8 0x1d
[ 930.216460] rescued libcrypto.so.0.9.8 0x1e
[ 930.216464] rescued libcrypto.so.0.9.8 0x1f
[ 930.216469] rescued libcrypto.so.0.9.8 0x20
[ 930.216473] rescued libcrypto.so.0.9.8 0x21
[ 930.216476] rescued libcrypto.so.0.9.8 0x22
[ 930.216480] rescued libcrypto.so.0.9.8 0x23
[ 930.216483] rescued libcrypto.so.0.9.8 0x24
[ 930.216487] rescued libcrypto.so.0.9.8 0x25
[ 930.216491] rescued libcrypto.so.0.9.8 0x26
[ 930.216494] rescued libcrypto.so.0.9.8 0x27
[ 930.216503] rescued libcrypto.so.0.9.8 0x28
[ 930.216506] rescued libcrypto.so.0.9.8 0x29
[ 930.216510] rescued libcrypto.so.0.9.8 0x2a
[ 930.216513] rescued libcrypto.so.0.9.8 0x2b
[ 930.216516] rescued libcrypto.so.0.9.8 0x2c
[ 930.216520] rescued libcrypto.so.0.9.8 0x2d
[ 930.216525] rescued libcrypto.so.0.9.8 0x2e
[ 930.216530] rescued libcrypto.so.0.9.8 0x2f
[ 930.216535] rescued libcrypto.so.0.9.8 0x30
[ 930.216538] rescued libcrypto.so.0.9.8 0x31
[ 930.216541] rescued libcrypto.so.0.9.8 0x32
[ 930.216544] rescued libcrypto.so.0.9.8 0x33
[ 930.216548] rescued libcrypto.so.0.9.8 0x34
[ 930.216551] rescued libcrypto.so.0.9.8 0x35
[ 930.217015] rescued libcrypto.so.0.9.8 0x36
[ 930.217019] rescued libcrypto.so.0.9.8 0x37
[ 930.217022] rescued libcrypto.so.0.9.8 0x38
[ 930.217025] rescued libcrypto.so.0.9.8 0x39
[ 930.217029] rescued libcrypto.so.0.9.8 0x3a
[ 930.217032] rescued libcrypto.so.0.9.8 0x3b
[ 930.217036] rescued libcrypto.so.0.9.8 0x3c
[ 930.217039] rescued libcrypto.so.0.9.8 0x3d
[ 930.217043] rescued libcrypto.so.0.9.8 0x3e
[ 930.217046] rescued libcrypto.so.0.9.8 0x3f
[ 930.217049] rescued libcrypto.so.0.9.8 0x40
[ 930.217053] rescued libcrypto.so.0.9.8 0x41
[ 930.217056] rescued libcrypto.so.0.9.8 0x42
[ 930.217060] rescued libcrypto.so.0.9.8 0x43
[ 930.217063] rescued libcrypto.so.0.9.8 0x44
[ 930.217066] rescued libcrypto.so.0.9.8 0x45
[ 930.217070] rescued libcrypto.so.0.9.8 0x46
[ 930.217073] rescued libcrypto.so.0.9.8 0x47
[ 930.217076] rescued libcrypto.so.0.9.8 0x48
[ 930.217080] rescued libcrypto.so.0.9.8 0x49
[ 930.217083] rescued libcrypto.so.0.9.8 0x4a
[ 930.217086] rescued libcrypto.so.0.9.8 0x4b
[ 930.217090] rescued libcrypto.so.0.9.8 0x4c
[ 930.217093] rescued libcrypto.so.0.9.8 0x4d
[ 930.217096] rescued libcrypto.so.0.9.8 0x4e
[ 930.217099] rescued libcrypto.so.0.9.8 0x4f
[ 930.217103] rescued libcrypto.so.0.9.8 0x50
[ 930.217106] rescued libcrypto.so.0.9.8 0x51
[ 930.217109] rescued libcrypto.so.0.9.8 0x52
[ 930.217113] rescued libcrypto.so.0.9.8 0x53
[ 930.217116] rescued libcrypto.so.0.9.8 0x54
[ 930.217119] rescued libcrypto.so.0.9.8 0x55
[ 930.217384] rescued libcrypto.so.0.9.8 0x56
[ 930.217387] rescued libcrypto.so.0.9.8 0x57
[ 930.217391] rescued libcrypto.so.0.9.8 0x58
[ 930.217394] rescued libcrypto.so.0.9.8 0x59
[ 930.217397] rescued libcrypto.so.0.9.8 0x5a
[ 930.217401] rescued libcrypto.so.0.9.8 0x5b
[ 930.217404] rescued libcrypto.so.0.9.8 0x5c
[ 930.217408] rescued libcrypto.so.0.9.8 0x5d
[ 930.217411] rescued libcrypto.so.0.9.8 0x5e
[ 930.217414] rescued libcrypto.so.0.9.8 0x5f
[ 930.217418] rescued libcrypto.so.0.9.8 0x60
[ 930.217421] rescued libcrypto.so.0.9.8 0x61
[ 930.217425] rescued libcrypto.so.0.9.8 0x62
[ 930.217428] rescued libcrypto.so.0.9.8 0x63
[ 930.217431] rescued libcrypto.so.0.9.8 0x64
[ 930.217436] rescued libcrypto.so.0.9.8 0x65
[ 930.217440] rescued libcrypto.so.0.9.8 0x66
[ 930.217443] rescued libcrypto.so.0.9.8 0x67
[ 930.217448] rescued libcrypto.so.0.9.8 0x68
[ 930.217452] rescued libcrypto.so.0.9.8 0x69
[ 930.217456] rescued libcrypto.so.0.9.8 0x6a
[ 930.217460] rescued libcrypto.so.0.9.8 0x6b
[ 930.217464] rescued libcrypto.so.0.9.8 0x6c
[ 930.217468] rescued libcrypto.so.0.9.8 0x6d
[ 930.217472] rescued libcrypto.so.0.9.8 0x6e
[ 930.217477] rescued libcrypto.so.0.9.8 0x70
[ 930.217482] rescued libcrypto.so.0.9.8 0x71
[ 930.217487] rescued libcrypto.so.0.9.8 0x73
[ 930.217492] rescued libcrypto.so.0.9.8 0x74
[ 930.217497] rescued libcrypto.so.0.9.8 0x75
[ 930.217501] rescued libcrypto.so.0.9.8 0x77
[ 930.217505] rescued libcrypto.so.0.9.8 0x78
[ 930.217515] rescued libcrypto.so.0.9.8 0x79
[ 930.217519] rescued libcrypto.so.0.9.8 0x12e
[ 930.217523] rescued libcrypto.so.0.9.8 0x12f
[ 930.217528] rescued libpam.so.0.81.12 0x1
[ 930.217533] rescued libpam.so.0.81.12 0x2
[ 930.217537] rescued libpam.so.0.81.12 0x8
[ 930.217544] rescued sshd 0x1
[ 930.217549] rescued sshd 0x2
[ 930.217554] rescued sshd 0x3
[ 930.217559] rescued sshd 0x4
[ 930.217564] rescued sshd 0x5
[ 930.217567] rescued sshd 0x6
[ 930.217571] rescued sshd 0x7
[ 930.217575] rescued sshd 0x8
[ 930.865111] rescued libutil-2.9.so 0x0
[ 930.865493] rescued sshd 0x9
[ 930.865499] rescued sshd 0xa
[ 930.865503] rescued sshd 0xb
[ 930.865508] rescued sshd 0xc
[ 930.865512] rescued sshd 0x12
[ 930.865517] rescued sshd 0x48
[ 930.865523] rescued sshd 0x4d
[ 930.865527] rescued sshd 0x51
[ 930.865532] rescued sshd 0x53
[ 930.865535] rescued sshd 0x54
[ 930.865540] rescued sshd 0x55
[ 930.865545] rescued sshd 0x56
[ 930.865550] rescued sshd 0x57
[ 930.865554] rescued sshd 0x58
[ 930.865558] rescued sshd 0x5e
[ 930.865562] rescued sshd 0x67
[ 930.865571] rescued libnss_files-2.9.so 0x3
[ 930.865598] rescued libc-2.9.so 0xc5
[ 930.865607] rescued libc-2.9.so 0xe6
[ 930.865618] rescued ld-2.9.so 0x19
[ 930.865622] rescued rpc.mountd 0x0
[ 930.865625] rescued libnss_files-2.9.so 0x6
[ 930.865628] rescued libnss_files-2.9.so 0x9
[ 930.865631] rescued libc-2.9.so 0xe7
[ 930.865635] rescued libc-2.9.so 0xef
[ 930.865638] rescued libc-2.9.so 0xf0
[ 930.865641] rescued libc-2.9.so 0xf1
[ 930.865644] rescued libc-2.9.so 0xf2
[ 930.865648] rescued libc-2.9.so 0xf3
[ 930.865921] rescued libc-2.9.so 0xf4
[ 930.865925] rescued libc-2.9.so 0xf6
[ 930.865928] rescued libc-2.9.so 0xf7
[ 930.865932] rescued hald 0x0
[ 930.865936] rescued hald-runner 0x0
[ 930.865942] rescued libdbus-glib-1.so.2.1.0 0x0
[ 930.865952] rescued libpcre.so.3.12.1 0x0
[ 930.865961] rescued libglib-2.0.so.0.2000.1 0x2
[ 930.865967] rescued libglib-2.0.so.0.2000.1 0x4
[ 930.865973] rescued libglib-2.0.so.0.2000.1 0x5
[ 930.865978] rescued libglib-2.0.so.0.2000.1 0xc
[ 930.865984] rescued libglib-2.0.so.0.2000.1 0xd
[ 930.865990] rescued libglib-2.0.so.0.2000.1 0x10
[ 930.865995] rescued libglib-2.0.so.0.2000.1 0x11
[ 930.866001] rescued libglib-2.0.so.0.2000.1 0x12
[ 930.866006] rescued libglib-2.0.so.0.2000.1 0x13
[ 930.866012] rescued libglib-2.0.so.0.2000.1 0x14
[ 930.866018] rescued libglib-2.0.so.0.2000.1 0x6f
[ 930.866022] rescued libglib-2.0.so.0.2000.1 0x70
[ 930.866027] rescued libglib-2.0.so.0.2000.1 0xb3
[ 930.866033] rescued libhal.so.1.0.0 0x0
[ 930.866036] rescued libglib-2.0.so.0.2000.1 0x7f
[ 930.866042] rescued libglib-2.0.so.0.2000.1 0xe
[ 930.866057] rescued libdbus-1.so.3.4.0 0x1
[ 930.866063] rescued libdbus-1.so.3.4.0 0x2
[ 930.866069] rescued libdbus-1.so.3.4.0 0x3
[ 930.866074] rescued libdbus-1.so.3.4.0 0x4
[ 930.866080] rescued libdbus-1.so.3.4.0 0x5
[ 930.866085] rescued libdbus-1.so.3.4.0 0x6
[ 930.866091] rescued libdbus-1.so.3.4.0 0x7
[ 930.866097] rescued libdbus-1.so.3.4.0 0x11
[ 930.866103] rescued libdbus-1.so.3.4.0 0x24
[ 930.866108] rescued libdbus-1.so.3.4.0 0x25
[ 930.866114] rescued libdbus-1.so.3.4.0 0x27
[ 930.866120] rescued libdbus-1.so.3.4.0 0x28
[ 930.866125] rescued libdbus-1.so.3.4.0 0x29
[ 930.866131] rescued libdbus-1.so.3.4.0 0x2a
[ 931.125223] rescued libpam.so.0.81.12 0x0
[ 931.129347] rescued libkeyutils-1.2.so 0x1
[ 931.132618] rescued libdbus-1.so.3.4.0 0x2c
[ 931.132624] rescued libdbus-1.so.3.4.0 0x2d
[ 931.132630] rescued libdbus-1.so.3.4.0 0x32
[ 931.132636] rescued libglib-2.0.so.0.2000.1 0x42
[ 931.132644] rescued libgobject-2.0.so.0.2000.1 0x1
[ 931.132650] rescued libgobject-2.0.so.0.2000.1 0x2
[ 931.132653] rescued hald 0x41
[ 931.132655] rescued hald 0x42
[ 931.132661] rescued libc-2.9.so 0x66
[ 931.132665] rescued hald-addon-input 0x0
[ 931.132671] rescued libdbus-1.so.3.4.0 0x8
[ 931.132676] rescued libdbus-1.so.3.4.0 0x9
[ 931.132682] rescued libdbus-1.so.3.4.0 0xa
[ 931.132687] rescued libdbus-1.so.3.4.0 0xb
[ 931.132693] rescued libdbus-1.so.3.4.0 0xc
[ 931.132698] rescued libdbus-1.so.3.4.0 0xd
[ 931.132704] rescued libdbus-1.so.3.4.0 0xe
[ 931.132709] rescued libdbus-1.so.3.4.0 0xf
[ 931.132715] rescued libdbus-1.so.3.4.0 0x10
[ 931.132720] rescued libdbus-1.so.3.4.0 0x12
[ 931.132726] rescued libdbus-1.so.3.4.0 0x13
[ 931.132731] rescued libdbus-1.so.3.4.0 0x14
[ 931.132737] rescued libdbus-1.so.3.4.0 0x15
[ 931.132742] rescued libdbus-1.so.3.4.0 0x16
[ 931.132748] rescued libdbus-1.so.3.4.0 0x17
[ 931.132753] rescued libdbus-1.so.3.4.0 0x18
[ 931.132759] rescued libdbus-1.so.3.4.0 0x19
[ 931.132764] rescued libdbus-1.so.3.4.0 0x1a
[ 931.132770] rescued libdbus-1.so.3.4.0 0x1b
[ 931.132775] rescued libdbus-1.so.3.4.0 0x1c
[ 931.132781] rescued libdbus-1.so.3.4.0 0x1d
[ 931.132786] rescued libdbus-1.so.3.4.0 0x20
[ 931.133058] rescued libdbus-1.so.3.4.0 0x21
[ 931.133064] rescued libdbus-1.so.3.4.0 0x22
[ 931.133070] rescued libdbus-1.so.3.4.0 0x23
[ 931.133075] rescued libdbus-1.so.3.4.0 0x26
[ 931.133081] rescued libdbus-1.so.3.4.0 0x2b
[ 931.133086] rescued libdbus-1.so.3.4.0 0x2e
[ 931.133092] rescued libdbus-1.so.3.4.0 0x2f
[ 931.133097] rescued libdbus-1.so.3.4.0 0x31
[ 931.133102] rescued libhal.so.1.0.0 0x1
[ 931.133106] rescued libhal.so.1.0.0 0x2
[ 931.133111] rescued libhal.so.1.0.0 0x3
[ 931.133114] rescued libhal.so.1.0.0 0x4
[ 931.133119] rescued libhal.so.1.0.0 0xb
[ 931.133123] rescued libhal.so.1.0.0 0xc
[ 931.133127] rescued libhal.so.1.0.0 0xd
[ 931.133133] rescued libpcre.so.3.12.1 0x1
[ 931.133138] rescued libpcre.so.3.12.1 0x1d
[ 931.133144] rescued libglib-2.0.so.0.2000.1 0x3
[ 931.133150] rescued libglib-2.0.so.0.2000.1 0x6
[ 931.133155] rescued libglib-2.0.so.0.2000.1 0x7
[ 931.133161] rescued libglib-2.0.so.0.2000.1 0x9
[ 931.133167] rescued libglib-2.0.so.0.2000.1 0xa
[ 931.133172] rescued libglib-2.0.so.0.2000.1 0xb
[ 931.133178] rescued libglib-2.0.so.0.2000.1 0xf
[ 931.133183] rescued libglib-2.0.so.0.2000.1 0x15
[ 931.133189] rescued libglib-2.0.so.0.2000.1 0x71
[ 931.133193] rescued hald-addon-cpufreq 0x0
[ 931.133198] rescued libglib-2.0.so.0.2000.1 0x8
[ 931.133204] rescued libglib-2.0.so.0.2000.1 0x3a
[ 931.133210] rescued libglib-2.0.so.0.2000.1 0x56
[ 931.133215] rescued libglib-2.0.so.0.2000.1 0x57
[ 931.133221] rescued libglib-2.0.so.0.2000.1 0x58
[ 931.133234] rescued libglib-2.0.so.0.2000.1 0x59
[ 931.133239] rescued libglib-2.0.so.0.2000.1 0x6e
[ 931.133245] rescued libglib-2.0.so.0.2000.1 0x7a
[ 931.133251] rescued libglib-2.0.so.0.2000.1 0x7e
[ 931.133254] rescued libhal.so.1.0.0 0xa
[ 931.133260] rescued libglib-2.0.so.0.2000.1 0x5a
[ 931.133266] rescued libglib-2.0.so.0.2000.1 0x41
[ 931.133271] rescued libglib-2.0.so.0.2000.1 0x5b
[ 931.133727] rescued hald 0x34
[ 931.133732] rescued libglib-2.0.so.0.2000.1 0x5c
[ 931.133737] rescued libglib-2.0.so.0.2000.1 0x5d
[ 931.133742] rescued libglib-2.0.so.0.2000.1 0x6c
[ 931.133746] rescued pulseaudio 0x0
[ 931.133750] rescued libogg.so.0.5.3 0x0
[ 931.133755] rescued libFLAC.so.8.2.0 0x0
[ 931.133759] rescued libsndfile.so.1.0.17 0x0
[ 931.133764] rescued libsamplerate.so.0.1.4 0x0
[ 931.133768] rescued libltdl.so.3.1.6 0x0
[ 931.133773] rescued libcap.so.1.10 0x0
[ 931.134047] rescued gconf-helper 0x0
[ 931.134051] rescued libpulsecore.so.5.0.1 0x12
[ 931.134053] rescued libpulsecore.so.5.0.1 0x13
[ 931.134057] rescued libpulsecore.so.5.0.1 0x14
[ 931.134063] rescued libpthread-2.9.so 0xf
[ 931.134067] rescued gconfd-2 0x0
[ 931.134072] rescued libgmodule-2.0.so.0.2000.1 0x0
[ 931.134076] rescued libgthread-2.0.so.0.2000.1 0x0
[ 931.134080] rescued pulseaudio 0x1
[ 931.134082] rescued pulseaudio 0x2
[ 931.134085] rescued pulseaudio 0x3
[ 931.134087] rescued pulseaudio 0x7
[ 931.134090] rescued pulseaudio 0x8
[ 931.134092] rescued pulseaudio 0x9
[ 931.134095] rescued pulseaudio 0xc
[ 931.134099] rescued liboil-0.3.so.0.3.0 0x1
[ 931.134102] rescued liboil-0.3.so.0.3.0 0x2
[ 931.134105] rescued liboil-0.3.so.0.3.0 0x3
[ 931.134107] rescued liboil-0.3.so.0.3.0 0x4
[ 931.134110] rescued liboil-0.3.so.0.3.0 0x5
[ 931.134113] rescued liboil-0.3.so.0.3.0 0x6
[ 931.134115] rescued liboil-0.3.so.0.3.0 0x7
[ 931.134118] rescued liboil-0.3.so.0.3.0 0x8
[ 931.134121] rescued liboil-0.3.so.0.3.0 0x9
[ 931.134131] rescued liboil-0.3.so.0.3.0 0xa
[ 931.134134] rescued liboil-0.3.so.0.3.0 0xb
[ 931.134137] rescued liboil-0.3.so.0.3.0 0xc
[ 931.134139] rescued liboil-0.3.so.0.3.0 0xd
[ 931.134142] rescued liboil-0.3.so.0.3.0 0xe
[ 931.134145] rescued liboil-0.3.so.0.3.0 0xf
[ 931.134147] rescued liboil-0.3.so.0.3.0 0x10
[ 931.134150] rescued liboil-0.3.so.0.3.0 0x11
[ 931.134153] rescued liboil-0.3.so.0.3.0 0x12
[ 931.134156] rescued liboil-0.3.so.0.3.0 0x13
[ 931.134158] rescued liboil-0.3.so.0.3.0 0x14
[ 931.134161] rescued liboil-0.3.so.0.3.0 0x15
[ 931.134164] rescued liboil-0.3.so.0.3.0 0x16
[ 931.134166] rescued liboil-0.3.so.0.3.0 0x17
[ 931.644692] rescued libkrb5support.so.0.1 0x1
[ 931.645060] rescued liboil-0.3.so.0.3.0 0x18
[ 931.645063] rescued liboil-0.3.so.0.3.0 0x19
[ 931.645066] rescued liboil-0.3.so.0.3.0 0x1a
[ 931.645069] rescued liboil-0.3.so.0.3.0 0x1b
[ 931.645072] rescued liboil-0.3.so.0.3.0 0x1c
[ 931.645075] rescued liboil-0.3.so.0.3.0 0x1d
[ 931.645078] rescued liboil-0.3.so.0.3.0 0x1e
[ 931.645080] rescued liboil-0.3.so.0.3.0 0x1f
[ 931.645083] rescued liboil-0.3.so.0.3.0 0x20
[ 931.645086] rescued liboil-0.3.so.0.3.0 0x21
[ 931.645089] rescued liboil-0.3.so.0.3.0 0x22
[ 931.645091] rescued liboil-0.3.so.0.3.0 0x23
[ 931.645094] rescued liboil-0.3.so.0.3.0 0x24
[ 931.645097] rescued liboil-0.3.so.0.3.0 0x25
[ 931.645100] rescued liboil-0.3.so.0.3.0 0x26
[ 931.645102] rescued liboil-0.3.so.0.3.0 0x27
[ 931.645105] rescued liboil-0.3.so.0.3.0 0x28
[ 931.645108] rescued liboil-0.3.so.0.3.0 0x29
[ 931.645111] rescued liboil-0.3.so.0.3.0 0x2a
[ 931.645113] rescued liboil-0.3.so.0.3.0 0x2b
[ 931.645116] rescued liboil-0.3.so.0.3.0 0x2c
[ 931.645119] rescued liboil-0.3.so.0.3.0 0x2d
[ 931.645123] rescued liboil-0.3.so.0.3.0 0x5a
[ 931.645126] rescued libogg.so.0.5.3 0x1
[ 931.645129] rescued libogg.so.0.5.3 0x3
[ 931.645133] rescued libFLAC.so.8.2.0 0x1
[ 931.645135] rescued libFLAC.so.8.2.0 0x2
[ 931.645138] rescued libFLAC.so.8.2.0 0x3
[ 931.645141] rescued libFLAC.so.8.2.0 0x4
[ 931.645143] rescued libFLAC.so.8.2.0 0x5
[ 931.645146] rescued libFLAC.so.8.2.0 0x6
[ 931.645149] rescued libFLAC.so.8.2.0 0x7
[ 931.645421] rescued libFLAC.so.8.2.0 0x8
[ 931.645424] rescued libFLAC.so.8.2.0 0x9
[ 931.645426] rescued libFLAC.so.8.2.0 0xa
[ 931.645429] rescued libFLAC.so.8.2.0 0xb
[ 931.645432] rescued libFLAC.so.8.2.0 0xc
[ 931.645435] rescued libFLAC.so.8.2.0 0x43
[ 931.645439] rescued libsndfile.so.1.0.17 0x1
[ 931.645441] rescued libsndfile.so.1.0.17 0x2
[ 931.645444] rescued libsndfile.so.1.0.17 0x3
[ 931.645447] rescued libsndfile.so.1.0.17 0x4
[ 931.645450] rescued libsndfile.so.1.0.17 0x3e
[ 931.645453] rescued libsamplerate.so.0.1.4 0x2
[ 931.645457] rescued libltdl.so.3.1.6 0x1
[ 931.645460] rescued libltdl.so.3.1.6 0x5
[ 931.645464] rescued libpulsecore.so.5.0.1 0x1
[ 931.645468] rescued libpulsecore.so.5.0.1 0x2
[ 931.645471] rescued libpulsecore.so.5.0.1 0x3
[ 931.645474] rescued libpulsecore.so.5.0.1 0x4
[ 931.645478] rescued libpulsecore.so.5.0.1 0x5
[ 931.645481] rescued libpulsecore.so.5.0.1 0x6
[ 931.645484] rescued libpulsecore.so.5.0.1 0x7
[ 931.645487] rescued libpulsecore.so.5.0.1 0x8
[ 931.645490] rescued libpulsecore.so.5.0.1 0x9
[ 931.645494] rescued libpulsecore.so.5.0.1 0xa
[ 931.645497] rescued libpulsecore.so.5.0.1 0xb
[ 931.645500] rescued libpulsecore.so.5.0.1 0xc
[ 931.645502] rescued libpulsecore.so.5.0.1 0xd
[ 931.645505] rescued libpulsecore.so.5.0.1 0xe
[ 931.645508] rescued libpulsecore.so.5.0.1 0xf
[ 931.645511] rescued libpulsecore.so.5.0.1 0x10
[ 931.645514] rescued libpulsecore.so.5.0.1 0x11
[ 931.645517] rescued libpulsecore.so.5.0.1 0x1a
[ 931.645527] rescued libpulsecore.so.5.0.1 0x29
[ 931.645530] rescued libpulsecore.so.5.0.1 0x5f
[ 931.645534] rescued libcap.so.1.10 0x2
[ 931.645548] rescued libc-2.9.so 0xb9
[ 931.645554] rescued libc-2.9.so 0xba
[ 931.645560] rescued libc-2.9.so 0xbb
[ 931.645568] rescued libc-2.9.so 0xec
[ 931.645574] rescued libc-2.9.so 0xee
[ 931.645580] rescued libc-2.9.so 0xeb
[ 931.645591] rescued getty 0x0
[ 931.645596] rescued sshd 0xd
[ 931.645601] rescued sshd 0xe
[ 931.967139] rescued libkrb5support.so.0.1 0x5
[ 931.971614] rescued libc-2.9.so 0xbc
[ 931.972606] rescued sshd 0x40
[ 931.972611] rescued sshd 0x41
[ 931.972615] rescued sshd 0x63
[ 931.972624] rescued libcrypto.so.0.9.8 0x6f
[ 931.972629] rescued libcrypto.so.0.9.8 0x72
[ 931.972634] rescued libcrypto.so.0.9.8 0x76
[ 931.972639] rescued libcrypto.so.0.9.8 0x7f
[ 931.972644] rescued libcrypto.so.0.9.8 0x80
[ 931.972649] rescued libcrypto.so.0.9.8 0x82
[ 931.972654] rescued libcrypto.so.0.9.8 0x83
[ 931.972659] rescued libcrypto.so.0.9.8 0x84
[ 931.972924] rescued libcrypto.so.0.9.8 0x85
[ 931.972928] rescued libcrypto.so.0.9.8 0x9e
[ 931.972931] rescued libcrypto.so.0.9.8 0x9f
[ 931.972934] rescued libcrypto.so.0.9.8 0xa0
[ 931.972939] rescued libcrypto.so.0.9.8 0xa1
[ 931.972943] rescued libcrypto.so.0.9.8 0xa2
[ 931.972947] rescued libcrypto.so.0.9.8 0xa3
[ 931.972950] rescued libcrypto.so.0.9.8 0xa4
[ 931.972954] rescued libcrypto.so.0.9.8 0xa6
[ 931.972958] rescued libcrypto.so.0.9.8 0xa7
[ 931.972962] rescued libcrypto.so.0.9.8 0xa8
[ 931.972965] rescued libcrypto.so.0.9.8 0xa9
[ 931.972969] rescued libcrypto.so.0.9.8 0xaa
[ 931.972973] rescued libcrypto.so.0.9.8 0xab
[ 931.972976] rescued libcrypto.so.0.9.8 0xac
[ 931.972981] rescued libcrypto.so.0.9.8 0xbc
[ 931.972986] rescued libcrypto.so.0.9.8 0xbf
[ 931.972990] rescued libcrypto.so.0.9.8 0xc3
[ 931.972995] rescued libcrypto.so.0.9.8 0xc9
[ 931.972999] rescued libcrypto.so.0.9.8 0xca
[ 931.973004] rescued libcrypto.so.0.9.8 0xcb
[ 931.973009] rescued libcrypto.so.0.9.8 0xd6
[ 931.973013] rescued libcrypto.so.0.9.8 0xd7
[ 931.973018] rescued libcrypto.so.0.9.8 0xd8
[ 931.973023] rescued libcrypto.so.0.9.8 0xd9
[ 931.973028] rescued libcrypto.so.0.9.8 0xdd
[ 931.973033] rescued libcrypto.so.0.9.8 0xde
[ 931.973037] rescued libcrypto.so.0.9.8 0xdf
[ 931.973042] rescued libcrypto.so.0.9.8 0xe0
[ 931.973047] rescued libcrypto.so.0.9.8 0xe3
[ 931.973052] rescued libcrypto.so.0.9.8 0x133
[ 931.973056] rescued libcrypto.so.0.9.8 0x13c
[ 931.973068] rescued libcrypto.so.0.9.8 0x147
[ 931.973072] rescued libcrypto.so.0.9.8 0x148
[ 931.973077] rescued sshd 0xf
[ 931.973080] rescued sshd 0x11
[ 931.973084] rescued sshd 0x17
[ 931.973088] rescued sshd 0x2f
[ 931.973093] rescued sshd 0x34
[ 931.973097] rescued sshd 0x35
[ 931.973102] rescued sshd 0x36
[ 931.973105] rescued sshd 0x3f
[ 931.973110] rescued sshd 0x49
[ 931.973113] rescued sshd 0x4a
[ 931.973118] rescued sshd 0x5c
[ 931.973121] rescued sshd 0x60
[ 931.973570] rescued sshd 0x61
[ 931.973575] rescued sshd 0x65
[ 931.973581] rescued pam_env.so 0x0
[ 931.973586] rescued pam_unix.so 0x0
[ 931.973591] rescued pam_nologin.so 0x0
[ 931.973597] rescued pam_motd.so 0x0
[ 931.973601] rescued pam_mail.so 0x0
[ 931.973606] rescued pam_limits.so 0x0
[ 931.973612] rescued libcrypto.so.0.9.8 0x9a
[ 931.973615] rescued libcrypto.so.0.9.8 0xc2
[ 931.973618] rescued libcrypto.so.0.9.8 0xc5
[ 931.973622] rescued libcrypto.so.0.9.8 0xc6
[ 931.973625] rescued libcrypto.so.0.9.8 0xc8
[ 931.973628] rescued libcrypto.so.0.9.8 0xcc
[ 931.973631] rescued libcrypto.so.0.9.8 0xcd
[ 931.973635] rescued libcrypto.so.0.9.8 0xce
[ 931.973638] rescued libcrypto.so.0.9.8 0xda
[ 931.973641] rescued libcrypto.so.0.9.8 0xdb
[ 931.973644] rescued libcrypto.so.0.9.8 0xdc
[ 931.973648] rescued libcrypto.so.0.9.8 0xe1
[ 931.973651] rescued libcrypto.so.0.9.8 0xe5
[ 931.973654] rescued libcrypto.so.0.9.8 0xe6
[ 931.973657] rescued libcrypto.so.0.9.8 0xee
[ 931.973660] rescued libcrypto.so.0.9.8 0xef
[ 931.973664] rescued libcrypto.so.0.9.8 0xf3
[ 931.973667] rescued libcrypto.so.0.9.8 0xf5
[ 931.973670] rescued libcrypto.so.0.9.8 0xf6
[ 931.973673] rescued libcrypto.so.0.9.8 0xf7
[ 931.973677] rescued libcrypto.so.0.9.8 0xfb
[ 931.973680] rescued libcrypto.so.0.9.8 0xfe
[ 931.973683] rescued libcrypto.so.0.9.8 0xff
[ 931.973687] rescued libcrypto.so.0.9.8 0x100
[ 931.973948] rescued libcrypto.so.0.9.8 0x102
[ 931.973952] rescued libcrypto.so.0.9.8 0x121
[ 931.973955] rescued libcrypto.so.0.9.8 0x130
[ 931.973959] rescued libcrypto.so.0.9.8 0x131
[ 931.973962] rescued libcrypto.so.0.9.8 0x132
[ 931.973967] rescued libcrypto.so.0.9.8 0x137
[ 931.973970] rescued libcrypto.so.0.9.8 0x149
[ 931.973974] rescued libcrypto.so.0.9.8 0x14a
[ 931.973985] rescued ld-2.9.so 0x12
[ 931.973988] rescued sshd 0x13
[ 931.973993] rescued sshd 0x14
[ 931.973996] rescued sshd 0x18
[ 931.974000] rescued sshd 0x19
[ 931.974004] rescued sshd 0x1b
[ 931.974008] rescued sshd 0x20
[ 931.974012] rescued sshd 0x26
[ 931.974016] rescued sshd 0x27
[ 931.974020] rescued sshd 0x33
[ 931.974024] rescued sshd 0x3e
[ 931.974028] rescued sshd 0x43
[ 931.974032] rescued sshd 0x45
[ 931.974036] rescued sshd 0x4c
[ 931.974041] rescued sshd 0x4e
[ 931.974045] rescued sshd 0x4f
[ 931.974048] rescued sshd 0x50
[ 931.974052] rescued sshd 0x59
[ 931.974056] rescued sshd 0x5a
[ 931.974059] rescued sshd 0x5d
[ 931.974064] rescued sshd 0x5f
[ 931.974069] rescued sshd 0x66
[ 931.974080] rescued libcrypto.so.0.9.8 0x88
[ 931.974084] rescued libcrypto.so.0.9.8 0x89
[ 931.974087] rescued libcrypto.so.0.9.8 0x97
[ 931.974091] rescued libcrypto.so.0.9.8 0x98
[ 932.448253] rescued libc-2.9.so 0x117
[ 932.448616] rescued libcrypto.so.0.9.8 0x99
[ 932.448620] rescued libcrypto.so.0.9.8 0x135
[ 932.448623] rescued libcrypto.so.0.9.8 0x136
[ 932.448627] rescued sshd 0x1e
[ 932.448630] rescued sshd 0x28
[ 932.448634] rescued sshd 0x44
[ 932.448637] rescued sshd 0x4b
[ 932.448642] rescued sshd 0x5b
[ 932.448651] rescued libc-2.9.so 0x20
[ 932.448657] rescued libc-2.9.so 0x21
[ 932.448663] rescued libc-2.9.so 0x22
[ 932.448670] rescued libc-2.9.so 0x27
[ 932.448677] rescued libc-2.9.so 0x30
[ 932.448690] rescued libc-2.9.so 0x11c
[ 932.448711] rescued zsh4 0x0
[ 932.448715] rescued libpam.so.0.81.12 0x3
[ 932.448719] rescued sshd 0x2d
[ 932.448723] rescued sshd 0x46
[ 932.448995] rescued libcap.so.2.11 0x0
[ 932.449032] rescued zsh4 0x2
[ 932.449035] rescued zsh4 0x4
[ 932.449039] rescued zsh4 0xa
[ 932.449042] rescued zsh4 0xd
[ 932.449045] rescued zsh4 0xe
[ 932.449048] rescued zsh4 0xf
[ 932.449058] rescued zsh4 0x25
[ 932.449062] rescued zsh4 0x27
[ 932.449065] rescued zsh4 0x28
[ 932.449068] rescued zsh4 0x29
[ 932.449071] rescued zsh4 0x2a
[ 932.449074] rescued zsh4 0x2b
[ 932.449077] rescued zsh4 0x2c
[ 932.449080] rescued zsh4 0x2d
[ 932.449084] rescued zsh4 0x2f
[ 932.449087] rescued zsh4 0x35
[ 932.449090] rescued zsh4 0x41
[ 932.449093] rescued zsh4 0x43
[ 932.449096] rescued zsh4 0x44
[ 932.449099] rescued zsh4 0x4c
[ 932.580004] rescued libcom_err.so.2.1 0x1
[ 932.584145] rescued libk5crypto.so.3.1 0x1
[ 932.584561] rescued zsh4 0x50
[ 932.584565] rescued zsh4 0x51
[ 932.584568] rescued zsh4 0x52
[ 932.584571] rescued zsh4 0x5b
[ 932.584575] rescued zsh4 0x5d
[ 932.584578] rescued zsh4 0x60
[ 932.584581] rescued zsh4 0x61
[ 932.584584] rescued zsh4 0x64
[ 932.584587] rescued zsh4 0x65
[ 932.584590] rescued zsh4 0x66
[ 932.584593] rescued zsh4 0x70
[ 932.584596] rescued zsh4 0x74
[ 932.584599] rescued zsh4 0x75
[ 932.584602] rescued zsh4 0x76
[ 932.584605] rescued zsh4 0x77
[ 932.584608] rescued zsh4 0x81
[ 932.584611] rescued zsh4 0x82
[ 932.584614] rescued zsh4 0x83
[ 932.584617] rescued zsh4 0x86
[ 932.584620] rescued zsh4 0x8d
[ 932.584623] rescued zsh4 0x91
[ 932.584627] rescued zsh4 0x92
[ 932.584632] rescued libncursesw.so.5.7 0x1
[ 932.584644] rescued libc-2.9.so 0x8d
[ 932.584647] rescued zsh4 0x3
[ 932.584650] rescued zsh4 0x88
[ 932.585372] rescued zsh4 0x33
[ 932.585376] rescued zsh4 0x78
[ 932.585382] rescued zsh4 0xc
[ 932.585386] rescued terminfo.so 0x0
[ 932.585389] rescued zsh4 0x26
[ 932.585393] rescued zsh4 0x31
[ 932.585396] rescued zsh4 0x4b
[ 932.585399] rescued zsh4 0x79
[ 932.585402] rescued zsh4 0x7e
[ 932.585405] rescued zsh4 0x85
[ 932.585408] rescued zsh4 0x8c
[ 932.585412] rescued zsh4 0x23
[ 932.585415] rescued zsh4 0x46
[ 932.585418] rescued zsh4 0x14
[ 932.585422] rescued zsh4 0x15
[ 932.585425] rescued zsh4 0x16
[ 932.585428] rescued zsh4 0x17
[ 932.585431] rescued zsh4 0x5e
[ 932.585434] rescued zsh4 0x7a
[ 932.585698] rescued zsh4 0x7b
[ 932.585708] rescued libc-2.9.so 0x8b
[ 932.585717] rescued libc-2.9.so 0x120
[ 932.585723] rescued libpthread-2.9.so 0x7
[ 932.585736] rescued zsh4 0x37
[ 932.585740] rescued zsh4 0x5c
[ 932.585743] rescued zsh4 0x7c
[ 932.585747] rescued libc-2.9.so 0x3f
[ 932.754456] rescued libk5crypto.so.3.1 0x2
[ 932.754832] rescued libc-2.9.so 0x23
[ 932.754848] rescued libc-2.9.so 0x101
[ 932.755122] rescued libc-2.9.so 0x102
[ 932.755135] rescued libc-2.9.so 0xe5
[ 932.755146] rescued libc-2.9.so 0xa1
[ 932.755154] rescued udevd 0x8
[ 932.755157] rescued udevd 0x9
[ 932.755160] rescued udevd 0x13
[ 932.755212] rescued libwrap.so.0.7.6 0x7
[ 932.755215] rescued libc-2.9.so 0xed
[ 932.794167] rescued libk5crypto.so.3.1 0x3
[ 932.798372] rescued libk5crypto.so.3.1 0x4
[ 932.802589] rescued libk5crypto.so.3.1 0x5
[ 932.806787] rescued libk5crypto.so.3.1 0x19
[ 932.811122] rescued libkrb5.so.3.3 0x1
[ 932.815223] rescued libkrb5.so.3.3 0x2
[ 932.819126] rescued libkrb5.so.3.3 0x3
[ 932.819740] rescued sshd 0x64
[ 932.819746] rescued libpam.so.0.81.12 0x4
[ 932.819752] rescued libpam.so.0.81.12 0x5
[ 932.819763] rescued libpam.so.0.81.12 0x6
[ 932.819766] rescued libpam.so.0.81.12 0x7
[ 932.819771] rescued libpam.so.0.81.12 0x9
[ 932.819776] rescued sshd 0x10
[ 932.819780] rescued sshd 0x15
[ 932.819783] rescued sshd 0x16
[ 932.819787] rescued sshd 0x1a
[ 932.819790] rescued sshd 0x1c
[ 932.819793] rescued sshd 0x1d
[ 932.819796] rescued sshd 0x21
[ 932.819800] rescued sshd 0x22
[ 932.819804] rescued sshd 0x23
[ 932.819807] rescued sshd 0x24
[ 932.819810] rescued sshd 0x25
[ 932.880415] rescued libkrb5.so.3.3 0x4
[ 932.884264] rescued libkrb5.so.3.3 0x5
[ 932.888127] rescued libkrb5.so.3.3 0x6
[ 932.892060] rescued libkrb5.so.3.3 0x7
[ 932.892504] rescued sshd 0x2c
[ 932.892510] rescued sshd 0x2e
[ 932.892514] rescued sshd 0x37
[ 932.892517] rescued sshd 0x38
[ 932.892521] rescued sshd 0x39
[ 932.892524] rescued sshd 0x3a
[ 932.892528] rescued sshd 0x3b
[ 932.892531] rescued sshd 0x3c
[ 932.892534] rescued sshd 0x3d
[ 932.892538] rescued sshd 0x42
[ 932.892541] rescued sshd 0x47
[ 932.892546] rescued libcrypto.so.0.9.8 0xa5
[ 932.892550] rescued libcrypto.so.0.9.8 0xbd
[ 932.892553] rescued libcrypto.so.0.9.8 0xbe
[ 932.892557] rescued libcrypto.so.0.9.8 0xc0
[ 932.892560] rescued libcrypto.so.0.9.8 0xed
[ 932.892564] rescued libcrypto.so.0.9.8 0xf4
[ 932.892850] rescued hald 0x30
[ 932.892853] rescued hald 0x31
[ 932.892857] rescued hald 0x35
[ 932.892859] rescued hald 0x36
[ 932.892871] rescued hald 0x43
[ 932.892875] rescued hald 0x45
[ 932.892877] rescued hald 0x46
[ 932.892904] rescued libdbus-glib-1.so.2.1.0 0x8
[ 932.892909] rescued libdbus-glib-1.so.2.1.0 0x9
[ 932.892920] rescued libdbus-glib-1.so.2.1.0 0xa
[ 932.892938] rescued libgobject-2.0.so.0.2000.1 0x27
[ 932.892949] rescued libgobject-2.0.so.0.2000.1 0x2b
[ 932.893495] rescued libgobject-2.0.so.0.2000.1 0x9
[ 932.893510] rescued libgobject-2.0.so.0.2000.1 0xe
[ 932.893514] rescued libgobject-2.0.so.0.2000.1 0xf
[ 932.893837] rescued libglib-2.0.so.0.2000.1 0x16
[ 932.893848] rescued libglib-2.0.so.0.2000.1 0x24
[ 932.893861] rescued libglib-2.0.so.0.2000.1 0x2b
[ 932.893866] rescued libglib-2.0.so.0.2000.1 0x2c
[ 932.893872] rescued libglib-2.0.so.0.2000.1 0x2e
[ 932.893884] rescued libglib-2.0.so.0.2000.1 0x37
[ 932.893889] rescued libglib-2.0.so.0.2000.1 0x38
[ 932.893901] rescued libglib-2.0.so.0.2000.1 0x39
[ 932.893907] rescued libglib-2.0.so.0.2000.1 0x3b
[ 932.893912] rescued hald 0x6
[ 932.893915] rescued hald 0x7
[ 932.893918] rescued hald 0x8
[ 932.893920] rescued hald 0x9
[ 932.893924] rescued hald 0xb
[ 932.893926] rescued hald 0xc
[ 932.893929] rescued hald 0xd
[ 933.077526] rescued hald 0x10
[ 933.077579] rescued libglib-2.0.so.0.2000.1 0x62
[ 933.077625] rescued libglib-2.0.so.0.2000.1 0x3c
[ 933.077631] rescued libglib-2.0.so.0.2000.1 0x3d
[ 933.077641] rescued libglib-2.0.so.0.2000.1 0x44
[ 933.077645] rescued libglib-2.0.so.0.2000.1 0x45
[ 933.077652] rescued libglib-2.0.so.0.2000.1 0x4a
[ 933.109022] rescued hald 0x11
[ 933.112194] rescued hald 0x12
[ 933.115274] rescued hald 0x13
[ 933.118380] rescued hald 0x18
[ 933.121476] rescued hald 0x23
[ 933.124563] rescued hald 0x24
[ 933.128902] rescued libpulsecore.so.5.0.1 0x61
[ 933.133460] rescued libpulsecore.so.5.0.1 0x64
[ 933.136587] rescued libpulsecore.so.5.0.1 0x34
[ 933.137423] rescued libpulsecore.so.5.0.1 0x55
[ 933.137426] rescued libpulsecore.so.5.0.1 0x56
[ 933.151551] rescued libpulsecore.so.5.0.1 0x68
[ 933.152274] rescued libgconf-2.so.4.1.5 0x10
[ 933.152278] rescued libgconf-2.so.4.1.5 0x11
[ 933.152283] rescued libgconf-2.so.4.1.5 0x14
[ 933.152289] rescued libgconf-2.so.4.1.5 0x18
[ 933.152293] rescued libgconf-2.so.4.1.5 0x19
[ 933.178388] rescued libpulsecore.so.5.0.1 0x15
[ 933.182934] rescued libpulsecore.so.5.0.1 0x16
[ 933.184590] rescued libORBit-2.so.0.1.0 0x53
[ 933.184930] rescued libORBit-2.so.0.1.0 0x27
[ 933.184934] rescued libORBit-2.so.0.1.0 0x28
[ 933.184938] rescued libORBit-2.so.0.1.0 0x29
[ 933.184949] rescued libORBit-2.so.0.1.0 0x2f
[ 933.184957] rescued libORBit-2.so.0.1.0 0x33
[ 933.184961] rescued libORBit-2.so.0.1.0 0x34
[ 933.184964] rescued libORBit-2.so.0.1.0 0x35
[ 933.184975] rescued libgthread-2.0.so.0.2000.1 0x1
[ 933.184979] rescued libgthread-2.0.so.0.2000.1 0x2
[ 933.185004] rescued libORBit-2.so.0.1.0 0x49
[ 933.185007] rescued libORBit-2.so.0.1.0 0x4a
[ 933.185011] rescued libORBit-2.so.0.1.0 0x4b
[ 933.185015] rescued libORBit-2.so.0.1.0 0x4d
[ 933.185471] rescued gconfd-2 0x4
[ 933.185475] rescued gconfd-2 0x6
[ 933.185479] rescued gconfd-2 0x8
[ 933.185482] rescued gconfd-2 0x9
[ 933.185484] rescued gconfd-2 0xa
[ 933.185494] rescued libgconfbackend-xml.so 0x4
[ 933.185810] rescued pam_env.so 0x1
[ 933.185815] rescued pam_env.so 0x2
[ 933.185820] rescued pam_unix.so 0x1
[ 933.185825] rescued pam_unix.so 0x2
[ 933.185830] rescued pam_unix.so 0x3
[ 933.185833] rescued pam_unix.so 0x4
[ 933.185837] rescued pam_unix.so 0x5
[ 933.185840] rescued pam_unix.so 0x6
[ 933.185843] rescued pam_unix.so 0x7
[ 933.185848] rescued pam_unix.so 0xa
[ 933.185851] rescued pam_unix.so 0xb
[ 933.185856] rescued pam_mail.so 0x1
[ 933.185860] rescued pam_limits.so 0x1
[ 933.185864] rescued pam_limits.so 0x2
[ 933.185876] rescued zsh4 0x87
[ 933.185879] rescued zsh4 0x89
[ 933.185882] rescued zsh4 0x8a
[ 933.185885] rescued zsh4 0x8b
[ 933.185888] rescued zsh4 0x8e
[ 933.185891] rescued zsh4 0x8f
[ 933.185895] rescued zsh4 0x90
[ 933.185898] rescued zsh4 0x93
[ 933.185901] rescued zsh4 0x94
[ 933.185904] rescued zsh4 0x95
[ 933.185914] rescued zsh4 0x5
[ 933.185917] rescued zsh4 0x6
[ 933.185920] rescued zsh4 0x7
[ 933.185923] rescued zsh4 0x8
[ 933.185926] rescued zsh4 0x9
[ 933.185929] rescued zsh4 0xb
[ 933.185932] rescued zsh4 0x10
[ 933.185935] rescued zsh4 0x11
[ 933.185938] rescued zsh4 0x12
[ 933.185941] rescued zsh4 0x13
[ 933.185944] rescued zsh4 0x18
[ 933.185947] rescued zsh4 0x1a
[ 933.185950] rescued zsh4 0x1b
[ 933.185954] rescued zsh4 0x1c
[ 933.392565] rescued libpulsecore.so.5.0.1 0x18
[ 933.392918] rescued zsh4 0x1d
[ 933.392922] rescued zsh4 0x1e
[ 933.392927] rescued zsh4 0x22
[ 933.392930] rescued zsh4 0x24
[ 933.392933] rescued zsh4 0x2e
[ 933.392936] rescued zsh4 0x30
[ 933.392939] rescued zsh4 0x32
[ 933.392942] rescued zsh4 0x34
[ 933.392945] rescued zsh4 0x36
[ 933.392948] rescued zsh4 0x38
[ 933.392951] rescued zsh4 0x39
[ 933.392954] rescued zsh4 0x3a
[ 933.392957] rescued zsh4 0x3b
[ 933.392961] rescued libcap.so.2.11 0x1
[ 933.392965] rescued libcap.so.2.11 0x2
[ 933.392970] rescued libncursesw.so.5.7 0x2
[ 933.392973] rescued libncursesw.so.5.7 0x3
[ 933.392976] rescued libncursesw.so.5.7 0x32
[ 933.392980] rescued libncursesw.so.5.7 0x33
[ 933.392983] rescued libncursesw.so.5.7 0x34
[ 933.392986] rescued libncursesw.so.5.7 0x35
[ 933.392990] rescued libncursesw.so.5.7 0x36
[ 933.392993] rescued libncursesw.so.5.7 0x37
[ 933.392996] rescued libncursesw.so.5.7 0x38
[ 933.392999] rescued libncursesw.so.5.7 0x39
[ 933.393002] rescued libncursesw.so.5.7 0x3a
[ 933.393006] rescued libncursesw.so.5.7 0x3b
[ 933.393009] rescued libncursesw.so.5.7 0x3c
[ 933.393012] rescued libncursesw.so.5.7 0x3d
[ 933.393015] rescued libncursesw.so.5.7 0x3e
[ 933.393286] rescued libncursesw.so.5.7 0x3f
[ 933.393289] rescued libncursesw.so.5.7 0x4
[ 933.393293] rescued libncursesw.so.5.7 0x5
[ 933.393296] rescued libncursesw.so.5.7 0x6
[ 933.393299] rescued libncursesw.so.5.7 0x7
[ 933.393303] rescued libncursesw.so.5.7 0x8
[ 933.393306] rescued libncursesw.so.5.7 0x9
[ 933.393309] rescued libncursesw.so.5.7 0xa
[ 933.393312] rescued libncursesw.so.5.7 0xb
[ 933.393315] rescued libncursesw.so.5.7 0xc
[ 933.393319] rescued libncursesw.so.5.7 0xd
[ 933.393322] rescued libncursesw.so.5.7 0xe
[ 933.393325] rescued libncursesw.so.5.7 0xf
[ 933.393329] rescued libncursesw.so.5.7 0x10
[ 933.393332] rescued libncursesw.so.5.7 0x11
[ 933.393335] rescued libncursesw.so.5.7 0x12
[ 933.393338] rescued libncursesw.so.5.7 0x13
[ 933.393341] rescued libncursesw.so.5.7 0x14
[ 933.393345] rescued zsh4 0x3c
[ 933.393348] rescued zsh4 0x3e
[ 933.393351] rescued zsh4 0x3f
[ 933.393354] rescued zsh4 0x40
[ 933.393357] rescued zsh4 0x42
[ 933.393360] rescued zsh4 0x45
[ 933.393363] rescued zsh4 0x48
[ 933.393366] rescued zsh4 0x49
[ 933.393369] rescued zsh4 0x4a
[ 933.393372] rescued zsh4 0x4d
[ 933.393375] rescued zsh4 0x4e
[ 933.393378] rescued zsh4 0x4f
[ 933.393381] rescued zsh4 0x53
[ 933.393384] rescued zsh4 0x54
[ 933.393394] rescued zsh4 0x55
[ 933.393397] rescued zsh4 0x56
[ 933.393400] rescued zsh4 0x57
[ 933.393403] rescued zsh4 0x58
[ 933.393406] rescued zsh4 0x59
[ 933.393409] rescued zsh4 0x5a
[ 933.393412] rescued zsh4 0x67
[ 933.393415] rescued zsh4 0x68
[ 933.393418] rescued zsh4 0x69
[ 933.393421] rescued zsh4 0x6a
[ 933.393424] rescued zsh4 0x6b
[ 933.393427] rescued zsh4 0x6c
[ 933.393430] rescued zsh4 0x6d
[ 933.393433] rescued zsh4 0x6e
[ 933.668221] rescued libpulsecore.so.5.0.1 0x19
[ 933.672781] rescued libpulsecore.so.5.0.1 0x24
[ 933.676581] rescued zsh4 0x6f
[ 933.676585] rescued zsh4 0x71
[ 933.676588] rescued zsh4 0x72
[ 933.676591] rescued zsh4 0x73
[ 933.676594] rescued zsh4 0x7d
[ 933.676597] rescued zsh4 0x7f
[ 933.676600] rescued zsh4 0x80
[ 933.676603] rescued zsh4 0x84
[ 933.676606] rescued zsh4 0x5f
[ 933.676609] rescued zsh4 0x62
[ 933.676612] rescued zsh4 0x63
[ 933.676618] rescued terminfo.so 0x1
[ 933.676622] rescued zle.so 0x1
[ 933.676625] rescued zle.so 0x2
[ 933.676628] rescued zle.so 0x3
[ 933.676632] rescued zle.so 0x27
[ 933.676635] rescued zle.so 0x28
[ 933.676638] rescued zle.so 0x29
[ 933.676641] rescued zle.so 0x2a
[ 933.676644] rescued zle.so 0x2b
[ 933.676647] rescued zle.so 0x2c
[ 933.676650] rescued zle.so 0x2d
[ 933.676653] rescued zle.so 0x2e
[ 933.676656] rescued zle.so 0x2f
[ 933.676659] rescued zle.so 0x30
[ 933.676662] rescued zle.so 0x31
[ 933.676665] rescued zle.so 0x32
[ 933.676668] rescued zle.so 0x33
[ 933.676671] rescued zle.so 0x34
[ 933.676674] rescued zle.so 0x36
[ 933.676677] rescued zle.so 0x38
[ 933.676950] rescued zle.so 0x39
[ 933.676953] rescued zle.so 0x4
[ 933.676956] rescued zle.so 0x5
[ 933.676960] rescued zle.so 0x6
[ 933.676963] rescued zle.so 0x7
[ 933.676966] rescued zle.so 0x8
[ 933.676969] rescued zle.so 0x9
[ 933.676972] rescued zle.so 0xa
[ 933.676975] rescued zle.so 0xb
[ 933.676978] rescued zle.so 0xc
[ 933.676981] rescued zle.so 0xd
[ 933.676983] rescued zle.so 0xe
[ 933.676987] rescued zle.so 0xf
[ 933.676990] rescued zle.so 0x10
[ 933.676993] rescued zle.so 0x11
[ 933.676996] rescued zle.so 0x12
[ 933.676999] rescued zle.so 0x13
[ 933.677002] rescued zle.so 0x14
[ 933.677005] rescued zle.so 0x15
[ 933.677008] rescued zle.so 0x16
[ 933.677011] rescued zle.so 0x17
[ 933.677014] rescued zle.so 0x1a
[ 933.677017] rescued zle.so 0x1b
[ 933.677021] rescued zle.so 0x1c
[ 933.677024] rescued zle.so 0x1d
[ 933.677027] rescued zle.so 0x1e
[ 933.677030] rescued zle.so 0x20
[ 933.677033] rescued zle.so 0x21
[ 933.677036] rescued zle.so 0x23
[ 933.677039] rescued zle.so 0x24
[ 933.677042] rescued zle.so 0x25
[ 933.677046] rescued zle.so 0x26
[ 933.677056] rescued complete.so 0x0
[ 933.677060] rescued complete.so 0x1
[ 933.677063] rescued complete.so 0x2
[ 933.677066] rescued complete.so 0x3
[ 933.677069] rescued complete.so 0x5
[ 933.677072] rescued complete.so 0x6
[ 933.677075] rescued complete.so 0x7
[ 933.677078] rescued complete.so 0x8
[ 933.677081] rescued complete.so 0x9
[ 933.677084] rescued complete.so 0xa
[ 933.677087] rescued complete.so 0xb
[ 933.677090] rescued complete.so 0xc
[ 933.677093] rescued complete.so 0xd
[ 933.677096] rescued complete.so 0xe
[ 933.692562] rescued complete.so 0xf
[ 933.692567] rescued complete.so 0x10
[ 933.692570] rescued complete.so 0x11
[ 933.692574] rescued complete.so 0x12
[ 933.692577] rescued complete.so 0x13
[ 933.692580] rescued complete.so 0x14
[ 933.692583] rescued complete.so 0x15
[ 933.692586] rescued complete.so 0x16
[ 933.692589] rescued complete.so 0x17
[ 933.692593] rescued complete.so 0x18
[ 933.692596] rescued complete.so 0x19
[ 933.692599] rescued complete.so 0x1a
[ 933.692602] rescued complete.so 0x1b
[ 933.692605] rescued complete.so 0x1c
[ 933.692608] rescued complete.so 0x1d
[ 933.692612] rescued complete.so 0x1e
[ 933.692615] rescued complete.so 0x1f
[ 933.692618] rescued complete.so 0x20
[ 933.692621] rescued complete.so 0x4
[ 933.692625] rescued zutil.so 0x0
[ 933.692628] rescued zutil.so 0x1
[ 933.692632] rescued zutil.so 0x2
[ 933.692635] rescued zutil.so 0x3
[ 933.692638] rescued zutil.so 0x4
[ 933.692641] rescued zutil.so 0x5
[ 933.692645] rescued rlimits.so 0x0
[ 933.692649] rescued rlimits.so 0x1
[ 933.692652] rescued rlimits.so 0x2
[ 933.692656] rescued complist.so 0x0
[ 933.692659] rescued complist.so 0x1
[ 933.692662] rescued complist.so 0x2
[ 933.692666] rescued complist.so 0x3
[ 933.692942] rescued complist.so 0xc
[ 933.692945] rescued complist.so 0xd
[ 933.692950] rescued parameter.so 0x0
[ 933.692954] rescued parameter.so 0x1
[ 933.692957] rescued parameter.so 0x2
[ 933.692960] rescued parameter.so 0x3
[ 933.692963] rescued parameter.so 0x4
[ 933.692966] rescued parameter.so 0x6
[ 934.070241] rescued libpulsecore.so.5.0.1 0x25
[ 934.074982] rescued libpulsecore.so.5.0.1 0x2b
[ 934.079663] rescued libpulsecore.so.5.0.1 0x2c
[ 934.084835] rescued computil.so 0x1
[ 934.084864] rescued computil.so 0x0
[ 934.092002] rescued computil.so 0x2
[ 934.095601] rescued computil.so 0x4
[ 934.099196] rescued computil.so 0x5
[ 934.102795] rescued computil.so 0x6
[ 934.106413] rescued computil.so 0x7
[ 934.110022] rescued computil.so 0x8
[ 934.113623] rescued computil.so 0xe
[ 934.397565] rescued zle.so 0x35
[ 934.553110] rescued parameter.so 0x5
[ 934.557012] rescued zsh4 0x47

Minchan Kim

May 8, 2009, 5:40:15 AM
On Fri, 8 May 2009 16:09:21 +0800
Wu Fengguang <fenggu...@intel.com> wrote:

Yeah, I was confused by the title.
Thanks for the quick reply. :)

Yes. I support your idea.

> Thanks,
> Fengguang

Minchan Kim

May 8, 2009, 8:10:19 AM

Sometimes this vma doesn't contain the anon page.
That's why we need page_check_address.
In such a case, a wrong *vm_flags value could be harmful to reclaim.
It can happen in your first-class-citizen patch, I think.
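A toy model of the concern, assuming simplified structures (the names `VMA`, `check_mapping`, and `referenced_flags` are illustrative, not the real kernel API; the real check is done by `page_check_address()` in mm/rmap.c): a VMA can cover a page's address range while no pte is actually installed for that page, so `vm_flags` from that VMA should only be reported after the mapping is confirmed.

```python
# Toy model, NOT kernel code: a VMA may cover an address range even
# though no pte is installed for a given page there, so flags from
# that VMA should only be trusted after the mapping is confirmed --
# the role page_check_address() plays in the real kernel.

VM_EXEC = 0x4  # illustrative flag value


class VMA:
    def __init__(self, start, end, flags, present_pages):
        self.start, self.end, self.flags = start, end, flags
        self.present = set(present_pages)  # pages with a pte installed

    def covers(self, addr):
        return self.start <= addr < self.end


def check_mapping(vma, addr):
    """Analogue of page_check_address: is a pte really installed?"""
    return vma.covers(addr) and addr in vma.present


def referenced_flags(vmas, addr):
    """OR together vm_flags only from VMAs that truly map the page."""
    flags = 0
    for vma in vmas:
        if check_mapping(vma, addr):
            flags |= vma.flags
    return flags


# An executable VMA covers page 0x10 but has no pte for it (e.g. never
# faulted in); a plain data VMA really maps it. Trusting coverage alone
# would wrongly report the page as executable.
vmas = [VMA(0x00, 0x20, VM_EXEC, present_pages=[]),
        VMA(0x00, 0x20, 0, present_pages=[0x10])]

print(referenced_flags(vmas, 0x10) & VM_EXEC)  # → 0
```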

--
Kind regards,
Minchan Kim

Wu Fengguang

May 8, 2009, 8:20:11 AM

Yes, I'm aware of that - the VMA covers that page but has no pte
actually installed for it. That should be OK - the presence of such
a VMA is a good indication that the page is executable text.
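A minimal sketch of the heuristic being discussed, under simplified assumptions (this is not the actual mm/vmscan.c code; `Page`, `shrink_inactive`, and the field names are invented for illustration): while scanning the inactive list, a referenced page backed by an executable mapping is rotated back to the active list ("rescued") instead of being reclaimed, so a streaming copy cannot easily push program text out of memory.

```python
from collections import deque

# Minimal sketch of the "rescue executable pages" heuristic, NOT the
# real mm/vmscan.c logic: during an inactive-list scan, a referenced
# page that belongs to a VM_EXEC mapping gets rotated back to the
# active list instead of being reclaimed.


class Page:
    def __init__(self, name, exec_mapped, referenced):
        self.name = name
        self.exec_mapped = exec_mapped   # mapped by a VM_EXEC vma?
        self.referenced = referenced     # touched since the last scan?


def shrink_inactive(inactive, active, nr_to_scan):
    reclaimed, rescued = [], []
    for _ in range(min(nr_to_scan, len(inactive))):
        page = inactive.popleft()
        if page.referenced and page.exec_mapped:
            page.referenced = False
            active.append(page)          # rescued: one more round
            rescued.append(page.name)
        else:
            reclaimed.append(page.name)  # evicted (real code may rotate
                                         # other referenced pages too)
    return reclaimed, rescued


inactive = deque([Page("libc-2.9.so", True, True),
                  Page("copy-cache-1", False, True),
                  Page("copy-cache-2", False, False)])
active = deque()
reclaimed, rescued = shrink_inactive(inactive, active, 3)
print(rescued)    # → ['libc-2.9.so']
print(reclaimed)  # → ['copy-cache-1', 'copy-cache-2']
```

In this toy run the library text survives the scan while the one-shot copy cache is reclaimed, which is exactly the "rescued libc-2.9.so ..." pattern visible in the log above.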

Minchan Kim

May 8, 2009, 10:10:27 AM

Sorry, but I can't understand your point.

This is a general interface, not only for executable text.
Sometimes the information of a vma which doesn't really contain the
page can be passed to the caller - e.g. this can happen with COW,
mremap, non-linear mappings and so on - but I am not sure.
I doubt the vm_flags information is useful.


--
Kind regards,
Minchan Kim

Christoph Lameter

May 8, 2009, 10:30:24 AM
On Fri, 8 May 2009, Minchan Kim wrote:

> > > Why did you say that "the page_referenced() path will only cover the '_text_' section"?
> > > Could you elaborate, please?
> >
> > I was under the wild assumption that only the _text_ section will be
> > PROT_EXEC mapped. No?
>
> Yes. I support your idea.

Why do PROT_EXEC mapped segments deserve special treatment? What about the
other memory segments of the process? Essentials like stack, heap and
data segments of the libraries?

Rik van Riel

May 8, 2009, 10:40:14 AM
Christoph Lameter wrote:
> On Fri, 8 May 2009, Minchan Kim wrote:
>
>>>> Why did you say that "the page_referenced() path will only cover the '_text_' section"?
>>>> Could you elaborate, please?
>>> I was under the wild assumption that only the _text_ section will be
>>> PROT_EXEC mapped. No?
>> Yes. I support your idea.
>
> Why do PROT_EXEC mapped segments deserve special treatment? What about the
> other memory segments of the process? Essentials like stack, heap and
> data segments of the libraries?

Christoph, please look at what changed in the VM
since 2.6.29 and you will understand how the stack,
heap and data segments already get special treatment.

Please stop pretending you're an idiot.

--
All rights reversed.

Rik van Riel

May 8, 2009, 12:10:12 PM
Elladan wrote:

>> Nobody (except you) is proposing that we completely disable
>> the eviction of executable pages. I believe that your idea
>> could easily lead to a denial of service attack, with a user
>> creating a very large executable file and mmaping it.
>>
>> Giving executable pages some priority over other file cache
>> pages is nowhere near as dangerous wrt. unexpected side effects
>> and should work just as well.
>
> I don't think this sort of DoS is relevant on a single-user or trusted-user
> system.

Which not all systems are, meaning that the mechanism
Christoph proposes can never be enabled by default and
would have to be tweaked by the user.

I prefer code that should work just as well 99% of the
time, but can be enabled by default for everybody.
That way people automatically get the benefit.

--
All rights reversed.
