I get reports that people find this useful, so resending.
v1->v2
- adding new flag to all archs
- fixing typo
v2->v3
- one more typo fix
v3->v4
- return error if MAP_LOCKED | MAP_UNLOCKED is specified
v4->v5
- rebase to latest head
diff --git a/arch/alpha/include/asm/mman.h b/arch/alpha/include/asm/mman.h
index 99c56d4..cfc51ac 100644
--- a/arch/alpha/include/asm/mman.h
+++ b/arch/alpha/include/asm/mman.h
@@ -30,6 +30,7 @@
#define MAP_NONBLOCK 0x40000 /* do not block on IO */
#define MAP_STACK 0x80000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x100000 /* create a huge page mapping */
+#define MAP_UNLOCKED 0x200000 /* force page unlocking */
#define MS_ASYNC 1 /* sync memory asynchronously */
#define MS_SYNC 2 /* synchronous memory sync */
diff --git a/arch/mips/include/asm/mman.h b/arch/mips/include/asm/mman.h
index c892bfb..3e4d108 100644
--- a/arch/mips/include/asm/mman.h
+++ b/arch/mips/include/asm/mman.h
@@ -48,6 +48,7 @@
#define MAP_NONBLOCK 0x20000 /* do not block on IO */
#define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x80000 /* create a huge page mapping */
+#define MAP_UNLOCKED 0x100000 /* force page unlocking */
/*
* Flags for msync
diff --git a/arch/parisc/include/asm/mman.h b/arch/parisc/include/asm/mman.h
index 9749c8a..4e8b9bf 100644
--- a/arch/parisc/include/asm/mman.h
+++ b/arch/parisc/include/asm/mman.h
@@ -24,6 +24,7 @@
#define MAP_NONBLOCK 0x20000 /* do not block on IO */
#define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x80000 /* create a huge page mapping */
+#define MAP_UNLOCKED 0x100000 /* force page unlocking */
#define MS_SYNC 1 /* synchronous memory sync */
#define MS_ASYNC 2 /* sync memory asynchronously */
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index d4a7f64..7d33f01 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -27,6 +27,7 @@
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+#define MAP_UNLOCKED 0x80000 /* force page unlocking */
#ifdef __KERNEL__
#ifdef CONFIG_PPC64
diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
index c3029ad..f80d203 100644
--- a/arch/sparc/include/asm/mman.h
+++ b/arch/sparc/include/asm/mman.h
@@ -22,6 +22,7 @@
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+#define MAP_UNLOCKED 0x80000 /* force page unlocking */
#ifdef __KERNEL__
#ifndef __ASSEMBLY__
diff --git a/arch/xtensa/include/asm/mman.h b/arch/xtensa/include/asm/mman.h
index fca4db4..c62bcd8 100644
--- a/arch/xtensa/include/asm/mman.h
+++ b/arch/xtensa/include/asm/mman.h
@@ -55,6 +55,7 @@
#define MAP_NONBLOCK 0x20000 /* do not block on IO */
#define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x80000 /* create a huge page mapping */
+#define MAP_UNLOCKED 0x100000 /* force page unlocking */
/*
* Flags for msync
diff --git a/include/asm-generic/mman.h b/include/asm-generic/mman.h
index 32c8bd6..59e0f29 100644
--- a/include/asm-generic/mman.h
+++ b/include/asm-generic/mman.h
@@ -12,6 +12,7 @@
#define MAP_NONBLOCK 0x10000 /* do not block on IO */
#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+#define MAP_UNLOCKED 0x80000 /* force page unlocking */
#define MCL_CURRENT 1 /* lock all current mappings */
#define MCL_FUTURE 2 /* lock all future mappings */
diff --git a/mm/mmap.c b/mm/mmap.c
index ee22989..4bda220 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -962,6 +962,9 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
if (!can_do_mlock())
return -EPERM;
+ if (flags & MAP_UNLOCKED)
+ vm_flags &= ~VM_LOCKED;
+
/* mlock MCL_FUTURE? */
if (vm_flags & VM_LOCKED) {
unsigned long locked, lock_limit;
@@ -1050,7 +1053,10 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
struct file *file = NULL;
unsigned long retval = -EBADF;
- if (!(flags & MAP_ANONYMOUS)) {
+ if (unlikely((flags & (MAP_LOCKED | MAP_UNLOCKED)) ==
+ (MAP_LOCKED | MAP_UNLOCKED))) {
+ return -EINVAL;
+ } else if (!(flags & MAP_ANONYMOUS)) {
if (unlikely(flags & MAP_HUGETLB))
return -EINVAL;
file = fget(fd);
--
Gleb.
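
For readers who prefer to see the intended semantics outside the diff, here is a small user-space model of the two mm/mmap.c hunks. It is only a sketch: the SK_ constants and resolve_lock() are hypothetical names made up for illustration, the flag values are arbitrary, and the model ignores the can_do_mlock() and lock-limit checks that the real code performs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real flag bits; the values are arbitrary. */
#define SK_MAP_LOCKED   0x1u
#define SK_MAP_UNLOCKED 0x2u

/*
 * Model of the patched mmap path.
 * mcl_future: true if mlockall(MCL_FUTURE) is in force.
 * Returns 1 if the new mapping would be locked, 0 if it would not,
 * or -EINVAL when both flags are passed together.
 */
int resolve_lock(bool mcl_future, unsigned int flags)
{
    const unsigned int both = SK_MAP_LOCKED | SK_MAP_UNLOCKED;
    bool locked;

    /* Second hunk: reject the contradictory combination up front. */
    if ((flags & both) == both)
        return -EINVAL;

    /* Existing behaviour: MAP_LOCKED or the MCL_FUTURE default locks. */
    locked = mcl_future || (flags & SK_MAP_LOCKED) != 0;

    /* First hunk: MAP_UNLOCKED clears the lock bit again. */
    if (flags & SK_MAP_UNLOCKED)
        locked = false;

    return locked ? 1 : 0;
}
```

The case the patch adds is resolve_lock(true, SK_MAP_UNLOCKED), which yields an unlocked mapping even though MCL_FUTURE is in force.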
Thanks for continuing to work on it.
This version looks fine to me.
Acked-by: WANG Cong <xiyou.w...@gmail.com>
--
Live like a child, think like the god.
Looks good to me.
Acked-by: Chris Wright <chr...@sous-sol.org>
thanks,
-chris
This description is still wrong. It doesn't describe why this patch is useful.
--
Gleb.
My point is that introducing new mmap flags needs a strong and clear use case.
Every patch should have a good benefit/cost balance. The code can describe the cost,
but the benefit can only be explained by the patch description.
I don't think this poor description explains the benefit, as opposed to the cost.
You should explain why this patch is useful and not just a pretty toy.
--
Gleb.
Hmm..
Your answer didn't match what I wanted.
A few additional questions.
- Why don't you change your application? It seems a more natural way than a kernel change.
- Why do you want your virtual machine to use mlockall? AFAIK, the majority of current
virtual machines don't.
- If this feature is added, can the average distro user get any benefit?
I mean, many application developers want to add their specific feature
into the kernel, but if we allow it without limit, the major syscalls will soon become
the trashbox of pretty toy features.
> few additional questions.
>
> - Why don't you change your application? It seems natural way than kernel change.
There is no way to change my application and achieve what I've described
in a multithreaded app.
> - Why do you want your virtual machine have mlockall? AFAIK, current majority
> virtual machine doesn't.
It is absolutely irrelevant for that patch, but just because you ask I
want to measure the cost of swapping out of a guest memory.
> - If this feature added, average distro user can get any benefit?
>
?! Is this some kind of new measure? There are plenty of much more
invasive features that don't bring benefits to an average distro user.
This feature can bring benefit to embedded/RT developers.
> I mean, many application developrs want to add their specific feature
> into kernel. but if we allow it unlimitedly, major syscall become
> the trushbox of pretty toy feature soon.
>
And if an application developer wants to extend the kernel in a way that
makes it possible to do something that was not possible before, why is
this a bad thing? I would agree with you if there were a userspace
solution for my problem, but there is none. The mmap interface is currently
asymmetric with regard to mlock. There is MAP_LOCKED, but no MAP_UNLOCKED. Why
is MAP_LOCKED useful then?
--
Gleb.
I want to know the benefit of the patch for patch reviewing.
> > few additional questions.
> >
> > - Why don't you change your application? It seems natural way than kernel change.
> There is no way to change my application and achieve what I've described
> in a multithreaded app.
Then we don't recommend using mlockall(). I am not asking to hear your conclusion;
that is not an objective argument. I want to hear why you reached that conclusion.
> > - Why do you want your virtual machine have mlockall? AFAIK, current majority
> > virtual machine doesn't.
> It is absolutely irrelevant for that patch, but just because you ask I
> want to measure the cost of swapping out of a guest memory.
No. If you stop using mlockall, the issue vanishes.
> > - If this feature added, average distro user can get any benefit?
> >
> ?! Is this some kind of new measure? There are plenty of much more
> invasive features that don't bring benefits to an average distro user.
> This feature can bring benefit to embedded/RT developers.
I mean, who gets the benefit?
> > I mean, many application developrs want to add their specific feature
> > into kernel. but if we allow it unlimitedly, major syscall become
> > the trushbox of pretty toy feature soon.
> >
> And if application developer wants to extend kernel in a way that it
> will be possible to do something that was not possible before why is
> this a bad thing? I would agree with you if for my problem was userspace
> solution, but there is none. The mmap interface is asymmetric in regards
> to mlock currently. There is MAP_LOCKED, but no MAP_UNLOCKED. Why
> MAP_LOCKED is useful then?
Why? Because this is formal LKML reviewing process. I'm reviewing your
patch for YOU.
If there is no objective reason, I don't want to continue reviewing.
>
>
> > > - Why do you want your virtual machine have mlockall? AFAIK, current majority
> > > virtual machine doesn't.
> > It is absolutely irrelevant for that patch, but just because you ask I
> > want to measure the cost of swapping out of a guest memory.
>
> No. if you stop to use mlockall, the issue is vanished.
>
And emulator parts will be swapped out too which is not what I want.
>
> > > - If this feature added, average distro user can get any benefit?
> > >
> > ?! Is this some kind of new measure? There are plenty of much more
> > invasive features that don't bring benefits to an average distro user.
> > This feature can bring benefit to embedded/RT developers.
>
> I mean who get benifit?
Someone who wants to mlock all application memory, but also wants to be able
to mmap a big file for reading and understands that access to that file can
cause a major fault.
>
>
> > > I mean, many application developrs want to add their specific feature
> > > into kernel. but if we allow it unlimitedly, major syscall become
> > > the trushbox of pretty toy feature soon.
> > >
> > And if application developer wants to extend kernel in a way that it
> > will be possible to do something that was not possible before why is
> > this a bad thing? I would agree with you if for my problem was userspace
> > solution, but there is none. The mmap interface is asymmetric in regards
> > to mlock currently. There is MAP_LOCKED, but no MAP_UNLOCKED. Why
> > MAP_LOCKED is useful then?
>
> Why? Because this is formal LKML reviewing process. I'm reviewing your
> patch for YOU.
>
I appreciate that, but unfortunately it seems that you are trying to dismiss
my arguments on the basis that _you_ don't find that useful.
> If there is no objective reason, I don't want to continue reviewing.
>
--
Gleb.
The benefit of the patch is that it makes it possible for an
application which has previously called mlockall(MCL_FUTURE) to
selectively exempt new memory mappings from memory locking, on a
per-mmap-call basis. As was pointed out earlier, there is currently no
thread-safe way for an application to do this. The earlier proposed
workaround of toggling MCL_FUTURE around calls to mmap is racy in a
multi-threaded context. Other threads may manipulate the address space
during the window where MCL_FUTURE is off, subverting the programmer's
intended memory locking semantics.
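
To make the race concrete, here is a deterministic user-space model of one unlucky interleaving. It performs no real system calls; struct addr_space, map_region() and replay_race() are hypothetical names used only for this illustration, and the single boolean stands in for the process-wide MCL_FUTURE state.

```c
#include <stdbool.h>

/* Process-wide state: true while MCL_FUTURE is in force. */
struct addr_space {
    bool mcl_future;
};

/* Every new mapping inherits the process-wide default when it is created. */
bool map_region(const struct addr_space *as)
{
    return as->mcl_future;   /* true means the mapping ends up locked */
}

/*
 * Replays the interleaving step by step.
 * Thread A wants exactly one unlocked mapping.
 * Thread B knows nothing about A and expects its mapping to be locked.
 */
void replay_race(bool *a_locked, bool *b_locked)
{
    struct addr_space as = { .mcl_future = true };

    as.mcl_future = false;        /* A: switch MCL_FUTURE off      */
    *b_locked = map_region(&as);  /* B: maps inside the window     */
    *a_locked = map_region(&as);  /* A: maps its large region      */
    as.mcl_future = true;         /* A: switch MCL_FUTURE back on  */
}
```

After replay_race() both mappings are unlocked, so thread B has silently lost the locking it relied on. A per-call flag avoids the shared window entirely.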
The ability to exempt specific memory mappings from memory locking is
necessary when the region to be mapped is larger than physical memory.
In such cases a call to mmap the region cannot succeed, unless
MAP_UNLOCKED is available.
>
>> > few additional questions.
>> >
>> > - Why don't you change your application? It seems natural way than kernel change.
>> There is no way to change my application and achieve what I've described
>> in a multithreaded app.
>
> Then, we don't recommend to use mlockall(). I don't hope to hear your conclusion,
> it is not objectivization. I hope to hear why you reached such conclusion.
>
I agree that mlockall is a big hammer and should be avoided in most
cases, but there are situations where it is exactly what is needed. In
Gleb's instance, it sounds like he is doing some finicky performance
measurement and major page faults skew his results. In my case, I have
a realtime process where the measured latency impact of major page
faults is unacceptable. In both of these cases, mlockall is a
reasonable approach to eliminating major faults.
However, Gleb and I have independently found ourselves unable to use
mlockall because we also need to create a very large memory mapping
(for which we don't care about major faults). The proposed
MAP_UNLOCKED flag would allow us to override MCL_FUTURE for that one
mapping.
>
>> > - Why do you want your virtual machine have mlockall? AFAIK, current majority
>> >   virtual machine doesn't.
>> It is absolutely irrelevant for that patch, but just because you ask I
>> want to measure the cost of swapping out of a guest memory.
>
> No. if you stop to use mlockall, the issue is vanished.
>
And other issues arise. Gleb described a situation where the use of
mlockall is justified, identified an issue which prevents its use, and
provided a patch which resolves that issue. Why are you focusing on
the validity of using mlockall?
>
>> > - If this feature added, average distro user can get any benefit?
>> >
>> ?! Is this some kind of new measure? There are plenty of much more
>> invasive features that don't bring benefits to an average distro user.
>> This feature can bring benefit to embedded/RT developers.
>
> I mean who get benifit?
>
>
>> > I mean, many application developrs want to add their specific feature
>> > into kernel. but if we allow it unlimitedly, major syscall become
>> > the trushbox of pretty toy feature soon.
>> >
>> And if application developer wants to extend kernel in a way that it
>> will be possible to do something that was not possible before why is
>> this a bad thing? I would agree with you if for my problem was userspace
>> solution, but there is none. The mmap interface is asymmetric in regards
>> to mlock currently. There is MAP_LOCKED, but no MAP_UNLOCKED. Why
>> MAP_LOCKED is useful then?
>
> Why? Because this is formal LKML reviewing process. I'm reviewing your
> patch for YOU.
>
> If there is no objective reason, I don't want to continue reviewing.
>
There is an objective reason: the current interaction between
mlockall(MCL_FUTURE) and mmap has a deficiency. In 'normal' mode,
without MCL_FUTURE in force, the default is that new memory mappings
are not locked, but mmap provides MAP_LOCKED specifically to override
that default. However, with MCL_FUTURE toggled to on, there is no
analogous way to tell mmap to override the default. The proposed
MAP_UNLOCKED flag would resolve this deficiency.
Andrew
Thank you very much, Andrew!
Your explanation helps me a lot more than the original patch description. OK, at least
MAP_UNLOCKED has two users (you and Gleb) and your explanation seems
to make sense.
So, if Gleb resends this patch with a rewritten description, I might add my Reviewed-by tag to it, probably.
Thanks.
The benefit of the patch is that it makes it possible for an application
which has previously called mlockall(MCL_FUTURE) to selectively exempt
new memory mappings from memory locking, on a per-mmap-call basis. There
is currently no thread-safe way for an application to do this as
toggling MCL_FUTURE around calls to mmap is racy in a multi-threaded
context. Other threads may manipulate the address space during the
window where MCL_FUTURE is off, subverting the programmer's intended
memory locking semantics.
The ability to exempt specific memory mappings from memory locking is
necessary when the region to be mapped is larger than physical memory.
In such cases a call to mmap the region cannot succeed, unless
MAP_UNLOCKED is available.
Acked-by: WANG Cong <xiyou.w...@gmail.com>
Acked-by: Chris Wright <chr...@sous-sol.org>
Signed-off-by: Gleb Natapov <gl...@redhat.com>
---
I keep the acks since the patch is exactly the same; only the commit message
has changed.
The commit message is mostly copied from Andrew C. Morrow's email. I hope it
is OK now. Thank you Andrew :)
v1->v2
- adding new flag to all archs
- fixing typo
v2->v3
- one more typo fix
v3->v4
- return error if MAP_LOCKED | MAP_UNLOCKED is specified
v4->v5
- rebase to latest head
v5->v6
- commit message rewritten
--
Gleb.
On Mon, Jan 18, 2010 at 3:37 PM, Gleb Natapov <gl...@redhat.com> wrote:
> The current interaction between mlockall(MCL_FUTURE) and mmap has a
> deficiency. In 'normal' mode, without MCL_FUTURE in force, the default
> is that new memory mappings are not locked, but mmap provides MAP_LOCKED
> specifically to override that default. However, with MCL_FUTURE toggled
> to on, there is no analogous way to tell mmap to override the default. The
> proposed MAP_UNLOCKED flag would resolve this deficiency.
>
> The benefit of the patch is that it makes it possible for an application
> which has previously called mlockall(MCL_FUTURE) to selectively exempt
> new memory mappings from memory locking, on a per-mmap-call basis. There
> is currently no thread-safe way for an application to do this as
> toggling MCL_FUTURE around calls to mmap is racy in a multi-threaded
> context. Other threads may manipulate the address space during the
> window where MCL_FUTURE is off, subverting the programmers intended
> memory locking semantics.
>
> The ability to exempt specific memory mappings from memory locking is
> necessary when the region to be mapped is larger than physical memory.
> In such cases a call to mmap the region cannot succeed, unless
> MAP_UNLOCKED is available.
The changelog doesn't mention what kind of applications would want to
use this. Are there some? Using mlockall(MCL_FUTURE) but then having
some memory regions MAP_UNLOCKED sounds like a strange combination to
me.
--
Gleb.
If you add this flag you can't do that anyway - some library will
helpfully start up using it and then you are completely stuffed or will
be back in two or three years adding MLOCKALL_ALWAYS.
Alan
--
Gleb.
Agreed, mlockall() is a very bad interface and should not be used for a
plethora of reasons, this being one of them.
The thing is, if you can't trust your library to do sane things, then
don't use it.
That's debatable.
> The thing is, if you cant trust your library to do sane things, then
> don't use it.
>
Agreed, there are things that a sane library should never do: exit(), output
debug info to stdio, or meddle with memory mlock/munlock behind the application's
back.
--
Gleb.
I would not advise that, just mlock() the text and data you need for the
real-time thread. mlockall() is a really blunt instrument.
Real-time?
--
error compiling committee.c: too many arguments to function
Esp for the real-time case I could advise not to use those libraries
then, since they're clearly not designed for that use case.
May not be feasible due to libraries.
--
error compiling committee.c: too many arguments to function
--
> On Mon, 2010-01-18 at 17:19 +0200, Avi Kivity wrote:
> > > I would not advice that, just mlock() the text and data you need for the
> > > real-time thread. mlockall() is a really blunt instrument.
> > >
> >
> > May not be feasible due to libraries.
>
> Esp for the real-time case I could advise not to use those libraries
> then, since they're clearly not designed for that use case.
In "hard" real time cases an awful lot of libraries have things like
memory allocations in them and don't care about stack growth which can
cause faults and sleeps. The memory allocator if you are running threaded
was not real time priority aware either last time I checked so the
standard libraries are not going to give the behaviour you want unless
you have a proper RT environment, and even then it may be a bit iffy here
and there.
I'm quite aware of that, which is why we recommend people to
pre-allocate, mlock() and pre-fault everything in advance and make sure
the RT thread doesn't touch any data/text outside of that and uses a
limited set of system calls.
You can also do that for stacks using pthread_attr_setstack().
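
As a rough illustration of the pre-allocate, mlock() and pre-fault recipe described above, here is a minimal sketch. rt_buffer_create() is a hypothetical helper name, not something from this thread; the code assumes a Linux-style mmap() with MAP_ANONYMOUS, and it treats an mlock() failure (for example under a small RLIMIT_MEMLOCK) as non-fatal and merely reports it.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate `len` bytes, try to pin them, and touch every page so that
 * no first-touch fault is left for the time-critical path.
 * Returns NULL if the allocation fails. *pinned is set to 1 if mlock()
 * succeeded and 0 otherwise.
 */
void *rt_buffer_create(size_t len, int *pinned)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p;
    void *buf;
    size_t off;

    if (page <= 0 || len == 0)
        return NULL;

    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Pin only this range instead of the whole address space. */
    *pinned = (mlock(buf, len) == 0);

    /* Pre-fault: write one byte per page. */
    p = (volatile unsigned char *)buf;
    for (off = 0; off < len; off += (size_t)page)
        p[off] = 0;

    return buf;
}
```

The real-time thread would then restrict itself to buffers created this way before it starts, which is the discipline being recommended here as an alternative to mlockall().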
Again, why do you want to MCL_FUTURE but then go and use MAP_UNLOCKED?
"Greater control" is not an argument for adding a new API that needs
to be maintained forever, a real world use case is.
And yes, this stuff needs to be in the changelog. Whether you want to
spell it out or post a URL to some previous discussion is up to you.
> "Greater control" is not an argument for adding a new API that needs
> to be maintained forever, a real world use case is.
>
If there is a real world use case for mlockall(), there is a real use case for
this too. People seem to be trying to convince me that I don't need
mlockall() without proposing alternatives. The only alternative I see is to
lock everything from userspace.
> And yes, this stuff needs to be in the changelog. Whether you want to
> spell it out or post an URL to some previous discussion is up to you.
The discussion was here just a couple of days ago. Here is the link
where I describe my use case: http://marc.info/?l=linux-mm&m=126345374125942&w=2
If you think it needs to be spelled out in commit log I'll do it.
--
Gleb.
On Mon, Jan 18, 2010 at 7:08 PM, Gleb Natapov <gl...@redhat.com> wrote:
>> "Greater control" is not an argument for adding a new API that needs
>> to be maintained forever, a real world use case is.
>>
> If there is real world use case for mlockall() there is real use case for
> this too. People seems to be trying to convince me that I don't need
> mlockall() without proposing alternatives. The only alternative I see
> lock everything from userspace.
>
>> And yes, this stuff needs to be in the changelog. Whether you want to
>> spell it out or post an URL to some previous discussion is up to you.
> The discussion was here just a couple of days ago. Here is the link
> were I describe my use case: http://marc.info/?l=linux-mm&m=126345374125942&w=2
> If you think it needs to be spelled out in commit log I'll do it.
So this is a performance thing? Btw, is there a reason you can't use
plain mlock() for it as suggested by Peter earlier?
Pekka
Which keeps all the special cases in your app rather than in every single
user's kernel. That seems to be the right way up, especially as you can
make a library of it!
Alan
Please stop suck.
This is a review. Reviewers shouldn't need to read the whole
previous thread. It means your description isn't sufficient.
The thread took a direction of bashing mlockall(). This is especially
strange since proposed patch actually makes mlockall() more fine
grained and thus more useful.
--
Gleb.
On Tue, Jan 19, 2010 at 9:17 AM, Gleb Natapov <gl...@redhat.com> wrote:
> The thread took a direction of bashing mlockall(). This is especially
> strange since proposed patch actually makes mlockall() more fine
> grained and thus more useful.
No, the thread took a direction of you not being able to properly
explain why we want MAP_UNLOCKED in the kernel. It seems useless for
real-time and I've yet to figure out why you need _mlockall()_ if it's
a performance thing.
It would be probably useful if you could point us to the application
source code that actually wants this feature.
Pekka
> real-time and I've yet to figure out why you need _mlockall()_ if it's
> a performance thing.
I don't do real-time so will not argue how useful it is for that,
but it seems to me that people who argue that it is not useful for real
time don't do it either and the only person in this thread who does real
time uses mlockall(). Hmm strange.
In my case (virtualization) I want to test/profile a guest under heavy swapping
of the guest's memory, so I intentionally create a memory shortage by creating
a guest much larger than host memory, but I want the system to swap out only
the guest's memory.
>
> It would be probably useful if you could point us to the application
> source code that actually wants this feature.
>
This is a two-line patch to qemu that calls mlockall(MCL_CURRENT|MCL_FUTURE)
at the beginning of main() and changes guest memory allocation to
use the MAP_UNLOCKED flag. All alternative solutions in this thread suggest
that I should rewrite qemu plus all the libraries it uses. You see why I can't
take them seriously?
--
Gleb.
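
The shape of such a change can be sketched as follows. This is not the actual qemu patch, which is not shown in this thread; lock_emulator_memory(), alloc_guest_ram() and GUEST_MAP_EXTRA are hypothetical names. MAP_UNLOCKED is the flag proposed here, so the sketch assumes it may be missing from the installed headers and in that case falls back to a plain mapping.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * MAP_UNLOCKED is the flag proposed in this thread. If the installed
 * headers do not define it, fall back to 0 so the sketch still builds;
 * the mapping then simply follows the MCL_FUTURE default.
 */
#ifdef MAP_UNLOCKED
#define GUEST_MAP_EXTRA MAP_UNLOCKED
#else
#define GUEST_MAP_EXTRA 0
#endif

/* Step 1: meant to be called once, early in main(). Returns 0 on success. */
int lock_emulator_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Step 2: guest RAM is mapped with the exemption flag. NULL on failure. */
void *alloc_guest_ram(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | GUEST_MAP_EXTRA, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

With the proposed flag available, everything the emulator and its libraries map would be locked by step 1, while the guest RAM region from step 2 would remain swappable.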
On Tue, Jan 19, 2010 at 9:52 AM, Gleb Natapov <gl...@redhat.com> wrote:
>> It would be probably useful if you could point us to the application
>> source code that actually wants this feature.
>>
> This is two line patch to qemu that calls mlockall(MCL_CURRENT|MCL_FUTURE)
> at the beginning of the main() and changes guest memory allocation to
> use MAP_UNLOCKED flag. All alternative solutions in this thread suggest
> that I should rewrite qemu + all library it uses. You see why I can't
> take them seriously?
Well, that's not going to be portable, is it, so the application
design would still be broken, no? Did you try using (or extending)
posix_madvise(MADV_DONTNEED) for the guest address space? It seems to
me that you're trying to use a big hammer (mlock) when a polite hint
for the VM would probably be sufficient for it to do its job.
Pekka
> design would still be broken, no? Did you try using (or extending)
> posix_madvise(MADV_DONTNEED) for the guest address space? It seems to
After mlockall() I can't even allocate guest address space. Or do you mean
instead of mlockall()? Then how will MADV_DONTNEED help? It just drops the
page table entries for the address range (which is not what I need) and does not
have any long-term effect.
> me that you're trying to use a big hammer (mlock) when a polite hint
> for the VM would probably be sufficient for it do its job.
>
I want to tell the VM "swap this, don't swap that" and as far as I can see
there is currently no other way to do it.
--
Gleb.
On Tue, Jan 19, 2010 at 10:26 AM, Gleb Natapov <gl...@redhat.com> wrote:
>> design would still be broken, no? Did you try using (or extending)
>> posix_madvise(MADV_DONTNEED) for the guest address space? It seems to
> After mlockall() I can't even allocate guest address space. Or do you mean
> instead of mlockall()? Then how MADV_DONTNEED will help? It just drops
> page table for the address range (which is not what I need) and does not
> have any long time effect.
Oh right, MADV_DONTNEED is no good.
On Tue, Jan 19, 2010 at 10:26 AM, Gleb Natapov <gl...@redhat.com> wrote:
>> me that you're trying to use a big hammer (mlock) when a polite hint
>> for the VM would probably be sufficient for it do its job.
>>
> I what to tell to VM "swap this, don't swap that" and as far as I see
> there is no other way to do it currently.
Yeah, which is why I was suggesting that maybe posix_madvise() needs
to be extended to have a MADV_NEED_BUT_LESS_IMPORTANT flag that can be
used as a hint by mm/vmscan.c to first swap the guest address spaces.
Pekka
--
Gleb.
So this isn't an API question this is an obscure corner case testing
question.
>
> >
> > It would be probably useful if you could point us to the application
> > source code that actually wants this feature.
> >
> This is two line patch to qemu that calls mlockall(MCL_CURRENT|MCL_FUTURE)
> at the beginning of the main() and changes guest memory allocation to
> use MAP_UNLOCKED flag. All alternative solutions in this thread suggest
> that I should rewrite qemu + all library it uses. You see why I can't
> take them seriously?
And you want millions of users to have kernels with weird extra functions
whose sole value is one test environment you wish to run
See why we can't take you seriously either ?
> > >
> > > It would be probably useful if you could point us to the application
> > > source code that actually wants this feature.
> > >
> > This is two line patch to qemu that calls mlockall(MCL_CURRENT|MCL_FUTURE)
> > at the beginning of the main() and changes guest memory allocation to
> > use MAP_UNLOCKED flag. All alternative solutions in this thread suggest
> > that I should rewrite qemu + all library it uses. You see why I can't
> > take them seriously?
>
> And you want millions of users to have kernels with weird extra functions
> whole sole value is one test environment you wish to run
>
We are talking about 4 lines of code that other people find useful too
and they commented in this thread. This wouldn't be the first kernel
feature not used by millions of people.
> See why we can't take you seriously either ?
>
I was talking about solutions. Thank you.
--
Gleb.
On Tue, 2010-01-19 at 10:44 +0200, Pekka Enberg wrote:
> Hi Gleb,
>
> On Tue, Jan 19, 2010 at 10:26 AM, Gleb Natapov <gl...@redhat.com> wrote:
> >> design would still be broken, no? Did you try using (or extending)
> >> posix_madvise(MADV_DONTNEED) for the guest address space? It seems to
> > After mlockall() I can't even allocate guest address space. Or do you mean
> > instead of mlockall()? Then how MADV_DONTNEED will help? It just drops
> > page table for the address range (which is not what I need) and does not
> > have any long time effect.
>
> Oh right, MADV_DONTNEED is no good.
>
> On Tue, Jan 19, 2010 at 10:26 AM, Gleb Natapov <gl...@redhat.com> wrote:
> >> me that you're trying to use a big hammer (mlock) when a polite hint
> >> for the VM would probably be sufficient for it do its job.
> >>
> > I what to tell to VM "swap this, don't swap that" and as far as I see
> > there is no other way to do it currently.
>
> Yeah, which is why I was suggesting that maybe posix_madvise() needs
> to be extended to have a MADV_NEED_BUT_LESS_IMPORTANT flag that can be
> used as a hint by mm/vmscan.c to first swap the guest address spaces.
>
> Pekka
Gleb, how about using MADV_SEQUENTIAL on guest memory?
It makes the guest's pages move to the inactive reclaim list
faster. That means they are likely to be swapped out sooner than other pages,
unless they are referenced again while on the inactive list.
--
Kind regards,
Minchan Kim
Yeah, something like that but we don't want the readahead. OTOH, it's
not clear what Gleb's real problem is. Are the guest address spaces
anonymous or file backed? Which parts of the emulator are swapped out
that are causing the problem? Maybe it's a VM balancing issue that
mlock papers over?
Pekka
It wouldn't be the first completely dumb mistake in the kernel either,
but one dumb mistake doesn't argue for including others
> Which parts of the emulator are swapped out
> that are causing the problem?
I don't want anything that can be used during guest runtime to be
swapped out. And I run 2G guest in 512M container, so eventually
everything is swapped out :)
> Maybe it's a VM balancing issue that
> mlock papers over?
>
There is no problem. I do measurements on how host swapping affects
guest and I don't want qemu code to be swapped out.
--
Gleb.
> In my case (virtualization) I want to test/profile guest under heavy swapping
> of a guests memory, so I intentionally create memory shortage by creating
By "guest memory" you mean the area that emulates DRAM in qemu?
That is an anonymous vma.
> guest much large then host memory, but I want system to swap out only
> guest's memory.
Couldn't you use MADV_SEQUENTIAL on the guest memory area only?
It has no readahead side effect since it's an anon area.
And it would make a best effort to swap out the guest's memory.
--
Kind regards,
Minchan Kim
--
Gleb.
Off topic:
posix_madvise(MADV_DONTNEED) is a no-op. glibc's posix_madvise(MADV_DONTNEED)
doesn't call Linux's madvise(MADV_DONTNEED).
That is because madvise(MADV_DONTNEED) is not POSIX compliant.
The behavior of Linux madvise(MADV_DONTNEED) is similar to Solaris (or *BSD)
madvise(MADV_FREE).