Demand paging for malloc


pusnow...@gmail.com

Sep 9, 2019, 7:41:31 AM
to OSv Development
Hi, 
I found that malloc returns a physical address in the mempool area and does not perform demand paging (only mmap does).
Is there a reason for this design choice?
OSv fails even if the program uses only a small portion of the allocated memory.


#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main()
{
    size_t size = 512 * 1024 * 1024;
    printf("Hello from main\n");
    printf("allocation %zx start\n", size);
    //int *p = (int *)malloc(size); // FAIL
    int *p = (int *)mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); // OK
    printf("allocation %zx = %p\n", size, (void *)p);
    *(p) = 512;
    printf("access done\n");

    return 0;
}


Thanks.

Waldek Kozaczuk

Sep 9, 2019, 11:35:07 AM
to OSv Development
Interesting. I cannot reproduce the malloc() problem. I have no issues running your example with the malloc() line uncommented:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main()
{
    size_t size = 512 * 1024 * 1024;
    printf("Hello from main\n");
    printf("allocation %zx start\n", size);
    int *p = (int *)malloc(size); // FAIL
    //int *p = (int *)mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); // OK
    printf("allocation %zx = %p\n", size, (void *)p);
    *(p) = 512;
    printf("access done\n");

    return 0;
}

OSv v0.53.0-107-g4f59f284
eth0: 192.168.122.15
Booted up in 135.90 ms
Cmdline: /test_large --help
Hello from main
allocation 20000000 start
allocation 20000000 = 0xffff800001b0d040
access done


As you can see, the address returned is a virtual one. Can you email the stack trace of the thread that crashes in your case?

As for why we commit the memory for every malloc(), I cannot answer that; others might be able to explain why it was designed like this.

This might be a relevant issue: https://github.com/cloudius-systems/osv/issues/854. I guess we might change the implementation of malloc_large() so that, for large sizes, it creates a VMA as mmap() does. But malloc() is used all over the place in kernel code, so I wonder whether we would run into issues like nested faults. See https://github.com/cloudius-systems/osv/issues/143, which has some background on malloc() and mmap() handling in OSv.

Waldek

pusnow...@gmail.com

Sep 9, 2019, 11:18:58 PM
to OSv Development
It only fails on Firecracker, whose default memory size is 128M.

With QEMU, it also fails if I set the VM's memory to 128M.

./scripts/build -j24 image=native-example && ./scripts/run.py -m 128M 

trace:

OSv v0.53.0-87-gf7b6bee5
eth0: 192.168.122.15
Booted up in 345.76 ms
Hello from main
allocation 20000000 start
Unreasonable allocation attempt, larger than memory. Aborting.
[backtrace]
0x00000000403e29b4 <memory::reclaimer::wait_for_memory(unsigned long)+132>
0x00000000403e5da1 <???+1077829025>
0x00000000403e60b7 <???+1077829815>
0x00000000403e63f6 <malloc+70>
0x000010000140094f <???+20973903>
0x000000004042d39c <osv::application::run_main()+60>
0x000000004020dfe3 <osv::application::main()+147>
0x000000004042d568 <???+1078121832>
0x0000000040461be5 <???+1078336485>
0x00000000403f9fb6 <thread_main_c+38>
0x0000000040399e52 <???+1077517906>
0x9f01e98d66991fff <???+1721311231>
0x00000000403f997f <???+1077909887>
0x4156415741e58947 <???+1105561927>



Wonsup


On Tuesday, September 10, 2019 at 12:35:07 AM UTC+9, Waldek Kozaczuk wrote:

Nadav Har'El

Sep 10, 2019, 2:52:52 AM
to pusnow...@gmail.com, OSv Development
On Mon, Sep 9, 2019 at 2:41 PM <pusnow...@gmail.com> wrote:
Hi, 
I found malloc returns physical address in mempool area and does not perform demand paging (only mmap does).
Is there any reason for the design choice?

I guess you're not really asking about demand paging ("swapping"), because that feature is usually an unnecessary complication in single-application kernels. If I understand correctly, your question is more about why malloc() allocates physically contiguous memory, unlike mmap().

The answer is that we originally did this because of huge pages. Modern CPUs have another level above the regular 4 KB pages: 2 MB pages called "huge pages". Applications get a performance boost from huge pages because the CPU's page-table cache (the TLB) can only hold a fixed number of entries, so an application using a few huge pages instead of a large number of small pages will have a higher hit rate in this cache, and better performance. So it is inefficient to serve an 8 KB allocation from small pages (two pages that are separate in physical memory but contiguous in virtual memory); it is more efficient to set up huge pages and return the 8 KB allocation as a contiguous part of such a huge page. We measured this to noticeably improve (by a few percent) the performance of applications that use memory allocated in small- and medium-sized allocations.
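
To put rough numbers on the TLB argument, here is a back-of-the-envelope sketch in C. The helper names are hypothetical, and the entry count used in the note below (1536) is only an illustrative figure, not a measurement of any particular CPU.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE ((uint64_t)4 * 1024)         /* regular 4 KB page */
#define HUGE_PAGE  ((uint64_t)2 * 1024 * 1024)  /* 2 MB huge page */

/* Number of pages (and therefore mappings) needed to cover `bytes`. */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;  /* round up */
}

/* Amount of memory a TLB with `entries` slots can map at once. */
uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}
```

For a 512 MB region, pages_needed() gives 131072 small pages but only 256 huge pages. With the assumed 1536 entries, tlb_reach() is 6 MB for 4 KB pages and 3 GB for 2 MB pages, which is why fewer, larger pages tend to miss less often.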

That being said, for really large allocations, significantly over 2 MB (the huge-page size), there's no real reason why they need to be contiguous in physical memory. We can build them from 2 MB huge pages, each contiguous in physical memory even though the object as a whole is not. In fact, this is exactly what our mmap() does. So it would be nice if malloc() could fall back to calling mmap() for allocations larger than some threshold (2 MB, 4 MB, or whatever). This is definitely doable. We have an open issue about this, https://github.com/cloudius-systems/osv/issues/854, and it explains how it can be done.
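
As a user-space illustration of the fallback idea, and not OSv's implementation or the design described in issue 854, a wrapper could route large requests to mmap() and small ones to malloc(). Everything here is an assumption made for the sketch: the names big_malloc(), big_free() and big_is_mmapped(), the 2 MB cutoff, and the 16-byte header that records where a block came from.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MMAP_THRESHOLD ((size_t)2 * 1024 * 1024)  /* assumed cutoff */
#define HDR_SIZE 16                 /* keeps user data 16-byte aligned */

/* Hypothetical header stored in front of every block. */
struct hdr {
    size_t mapped;  /* bytes given to mmap(), or 0 if from malloc() */
};

void *big_malloc(size_t size)
{
    assert(sizeof(struct hdr) <= HDR_SIZE);
    if (size > SIZE_MAX - HDR_SIZE)
        return NULL;                       /* size + header would overflow */
    size_t total = size + HDR_SIZE;
    struct hdr *h;
    if (size >= MMAP_THRESHOLD) {
        /* Large request: reserve address space, pages are filled on touch. */
        h = mmap(NULL, total, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (h == MAP_FAILED)
            return NULL;
        h->mapped = total;
    } else {
        /* Small request: use the regular allocator. */
        h = malloc(total);
        if (h == NULL)
            return NULL;
        h->mapped = 0;
    }
    return (char *)h + HDR_SIZE;
}

/* Returns nonzero if the block was obtained with mmap(). */
int big_is_mmapped(const void *p)
{
    const struct hdr *h = (const struct hdr *)((const char *)p - HDR_SIZE);
    return h->mapped != 0;
}

void big_free(void *p)
{
    if (p == NULL)
        return;
    struct hdr *h = (struct hdr *)((char *)p - HDR_SIZE);
    if (h->mapped != 0)
        munmap(h, h->mapped);
    else
        free(h);
}
```

With this sketch, big_malloc(64) is served by malloc(), while big_malloc(64 * 1024 * 1024) is served by mmap() and only consumes physical memory for the pages that are actually touched. A real allocator would also need realloc() and alignment variants, which are left out here.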
 
OSv fails, even if it only uses small portion of allocated memory.

In your example, if I understand correctly, you tried to allocate 512 MB on a VM with 128 MB of memory, so it's not "a small portion" of memory; it's more than the memory you have :-)

But the issue still has merit. If you had tried to allocate 50 MB, it might still have failed because of memory fragmentation (i.e., we have 50 MB of free memory, but it is not contiguous in physical memory).
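
The fragmentation point can be shown with a toy model: a map of physical pages where enough pages are free in total, yet no single run is long enough. This is a simplified sketch with hypothetical helper names, not how OSv tracks free memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Total number of free pages in the map. */
size_t count_free(const bool *is_free, size_t n)
{
    assert(is_free != NULL);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += is_free[i] ? 1 : 0;
    return total;
}

/* Length of the longest run of consecutive free pages. */
size_t longest_free_run(const bool *is_free, size_t n)
{
    assert(is_free != NULL);
    size_t best = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        run = is_free[i] ? run + 1 : 0;  /* a used page resets the run */
        if (run > best)
            best = run;
    }
    return best;
}

/* A physically contiguous allocation needs one run of at least `want` pages. */
bool can_alloc_contiguous(const bool *is_free, size_t n, size_t want)
{
    return longest_free_run(is_free, n) >= want;
}
```

In a 100-page map where every other page is free, count_free() reports 50 free pages, but longest_free_run() is 1, so a contiguous request for even 2 pages fails. An allocation assembled page by page, as mmap() does, would not have this problem.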




--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/5378fb86-73b9-4987-aa0a-70a573d1921b%40googlegroups.com.

pusnow...@gmail.com

Sep 10, 2019, 8:09:08 AM
to OSv Development
I mean the "allocated but not used" case.

In the example, it only uses the first 4KB of the 512MB. 4KB is a small portion of 128MB.

I also totally agree with mmap-backed malloc for large memory allocations.


Wonsup Yoon


On Tuesday, September 10, 2019 at 3:52:52 PM UTC+9, Nadav Har'El wrote:


Waldek Kozaczuk

Sep 10, 2019, 10:15:49 AM
to OSv Development


On Tuesday, September 10, 2019 at 2:52:52 AM UTC-4, Nadav Har'El wrote:

That being said, for really large allocations, significantly over 2 MB (the huge-page size), there's no real reason why they need to be contiguous in physical memory. We can build them from 2 MB huge pages, each contiguous in physical memory even though the object as a whole is not. In fact, this is exactly what our mmap() does. So it would be nice if malloc() could fall back to calling mmap() for allocations larger than some threshold (2 MB, 4 MB, or whatever). This is definitely doable. We have an open issue about this, https://github.com/cloudius-systems/osv/issues/854, and it explains how it can be done.
Wouldn't we also have to employ the trick you suggested in issue https://github.com/cloudius-systems/osv/issues/143 and pre-fault the memory, to make sure that kernel code does not access non-committed memory while preemption is disabled? Or does that requirement only apply to memory mmapped for stacks?
 

Waldek Kozaczuk

Sep 10, 2019, 10:17:37 AM
to OSv Development


On Tuesday, September 10, 2019 at 8:09:08 AM UTC-4, pusno...@gmail.com wrote:
I mean the "allocated but not used" case.

In the example, it only uses the first 4KB of the 512MB. 4KB is a small portion of 128MB.

I also totally agree with mmap-backed malloc for large memory allocations.
We are always on the lookout for volunteers, so we would welcome a patch implementing it :-)

Nadav Har'El

Sep 10, 2019, 10:52:03 AM
to Waldek Kozaczuk, OSv Development
On Tue, Sep 10, 2019 at 5:15 PM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:


Wouldn't we also have to employ the trick you suggested in issue https://github.com/cloudius-systems/osv/issues/143 and pre-fault the memory, to make sure that kernel code does not access non-committed memory while preemption is disabled? Or does that requirement only apply to memory mmapped for stacks?

Most OSv kernel code runs with preemption enabled. Only a small amount of kernel code runs with preemption disabled, and it doesn't normally access user-allocated objects. One notable exception is the stack, which even preemption-disabled code uses.

But you're right that there may be *kernel* code which uses malloc() with the implicit assumption that it always returns mapped and/or physically-contiguous memory. Such code should really call alloc_phys_contiguous_aligned() but perhaps doesn't (and in any case that function calls malloc() today :-)).

I'm hoping that if we only use mmap() for very large malloc() calls, we'll never notice any of these problems, because the kernel is unlikely to work with very large allocations.
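
For readers unfamiliar with the pre-faulting trick mentioned above, here is a minimal user-space sketch of the idea: touch every page of a fresh mapping once, so that later accesses cannot page-fault. The names prefault() and mmap_committed() are hypothetical, the code assumes a page-aligned start address, and it is not the code from issue 143.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Touch one byte in every page of [addr, addr + len) so each page is
 * faulted in now. Assumes addr is page-aligned.
 * Returns the number of pages touched.
 */
size_t prefault(void *addr, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    volatile char *p = addr;
    size_t touched = 0;
    for (size_t off = 0; off < len; off += (size_t)ps) {
        p[off] = p[off];  /* read and write back: forces a write fault, keeps content */
        touched++;
    }
    return touched;
}

/* Map anonymous memory and commit all of it immediately. */
void *mmap_committed(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    prefault(p, len);
    return p;
}
```

The trade-off is the one discussed in this thread: a pre-faulted mapping is safe to use where a page fault cannot be tolerated, but it gives up the "allocated but not used" saving that lazily populated mmap() memory provides.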


 