Re: OSv Development Question

Waldek Kozaczuk

Aug 21, 2025, 12:31:23 AM

to Alex Wollman, OSv Development

Hi,

I'm sorry for not getting back to you sooner.

Please take a look at my questions and answers below. You may also find the comments on this issue helpful: https://github.com/cloudius-systems/osv/issues/651.

On Thu, Aug 7, 2025 at 1:36 PM Alex Wollman <alex.w...@gmail.com> wrote:

Greetings Mr. Kozaczuk,

Bottom Line Up Front
My name is Alex Wollman, a PhD student at Dakota State University, and I am contacting you to inquire about the memory management and page allocation code for OSv. If you have time to respond to my questions I would greatly appreciate it, but if not I also understand you are a very busy individual. I also understand my first email to you is unreasonably long, for which I apologize. My personal bias is to provide more information up-front than a lengthy back-and-forth. I hope this is not too much of an inconvenience, as I value your opinion.

Context
My name is Alex Wollman and I am a Cyber Operations PhD student at Dakota State University in Madison, SD. My dissertation topic is focused on the security of unikernels, and I am currently using OSv as a case study to implement Address Space Layout Randomization (ASLR). I have 2 papers published indicating the need for the research: https://ieeexplore.ieee.org/document/10778787, and a preprint of the 2nd one at https://arxiv.org/abs/2412.10904. I've been reading about unikernels for approximately 2 years now, and they are an extremely fascinating topic in the field of computing, especially now with the drive to make smaller and smaller compute environments viable.

I have seen all the work you've done in the Cloudius-osv repository and found your email address in the Google Groups for OSv. First, thank you for all the work you've done on this project! As I read through the code it's very impressive work.

If you have time to answer a few questions I have about the memory management and page allocation code, I would truly appreciate it. However, I also understand that you are a very busy individual and may not have time.

Implementation
I have implemented a rudimentary ASLR for the stack in custom x64 applications by modifying the init_stack function within the threading codebase (the init_stack function begins at line 160 of osv/arch/x64/arch-switch.hh; I can provide more details upon request). However, my implementation just chooses a "random" location within the allocated memory (0x100000 in size) and sets the "stack.top" value appropriately. This will result in "0x100000 - stack.top" bytes of memory being allocated but always unused, which is not optimal.

How exactly do you allocate stack memory to make it "random"? I think you will need to differentiate between kernel and application thread stacks.

Using realloc for the stack has resulted in page_fault errors, and malloc has not seemed appropriate because the memory has already been allocated by the point where I begin my work (at least according to the program flow). I can provide more details upon request.

Do you have a branch I can look at?

Attempting to implement heap based ASLR is a completely different story. In the Linux kernel there is a brk structure to track the starting address so when the program executes memory can be assigned appropriately. Obviously unikernels operate totally differently, and this forms the basis of my questions.

Questions
How do I identify where the heap is assigned in the code, and is there "a heap" for applications?
From what I can tell given the documentation and functionality, there is no 'brk' structure to track the heap. In fact I have not been able to identify a specific page or code functionality that identifies a heap at all (though through developmental testing I know there is one.) My research has taken me down the MMU code path and when I get to physical memory discovery and page assignment I'm afraid I've gone too far. Is there any guidance you can provide to help me navigate page creation and management?

You guessed it right. There is no concept of a "heap" in OSv. Applications executed on OSv run within the same memory space and normally integrate using the standard C library interface (malloc, free, open, etc.), not at the system call level, so they do not need to know about a heap.

Having said that, there is a fairly new way of executing apps - statically linked or executed via the Linux dynamic linker ("ld.so") - which DO integrate at the system call level, and for those OSv implements brk.

Given there is no brk style heap management, would randomizing the heap starting address require serious alterations to page allocations?
As I understand it, physical memory is discovered and then memory pages are created, and the heap is "just" one of those pages. If I wanted to implement an ASLR solution for the heap (and possibly a better one for the stack) I think functionality would need to be introduced at this level (memory page assignment) to identify the heap and enable downstream functionality. Is this an achievable approach?

Given there is no heap, I am not sure if this question is applicable.

Conclusion
First if you are reading this, thank you very much for your generous donation of time and effort. I truly appreciate it!

I check this email consistently so it is the best way to contact me. If you are concerned I am not a real person (in this day and age, I wouldn't blame you) I also work at DSU, so you can find me in the directory at dsu.edu/directory

I look forward to your response.

Sincerely,
Alex Wollman

PS. I took the liberty of replying to your email and to the OSv mailing list so maybe others can chime in.

Nadav Har'El

Aug 24, 2025, 9:44:13 AM

to Waldek Kozaczuk, Alex Wollman, OSv Development

On Thu, Aug 21, 2025 at 7:31 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:

On Thu, Aug 7, 2025 at 1:36 PM Alex Wollman <alex.w...@gmail.com> wrote:
Implementation
I have implemented a rudimentary ASLR for the stack in custom x64 applications by modifying the init_stack function within the threading codebase (the init_stack function begins at line 160 of osv/arch/x64/arch-switch.hh; I can provide more details upon request). However, my implementation just chooses a "random" location within the allocated memory (0x100000 in size) and sets the "stack.top" value appropriately. This will result in "0x100000 - stack.top" bytes of memory being allocated but always unused, which is not optimal.
How exactly do you allocate stack memory to make it "random"? I think you will need to differentiate between kernel and application thread stacks.

Hi,

As I explained a decade ago (wow...) in https://github.com/cloudius-systems/osv/issues/651#issuecomment-136970958, init_stack() is used mostly by internal threads (e.g., in the implementation of the filesystem). Application threads, which use pthread, do something else: they allocate their stacks using mmu::map_anon(), which I think should be easier to randomize, since you just need to pick a random place in the address space instead of the next available one.

Using realloc for the stack has resulted in page_fault errors, and malloc has not seemed appropriate because the memory has already been allocated by the point where I begin my work (at least according to the program flow). I can provide more details upon request.

Do you have a branch I can look at?

Attempting to implement heap based ASLR is a completely different story. In the Linux kernel there is a brk structure to track the starting address so when the program executes memory can be assigned appropriately. Obviously unikernels operate totally differently, and this forms the basis of my questions.

Questions
How do I identify where the heap is assigned in the code, and is there "a heap" for applications?
From what I can tell given the documentation and functionality, there is no 'brk' structure to track the heap. In fact I have not been able to identify a specific page or code functionality that identifies a heap at all (though through developmental testing I know there is one.) My research has taken me down the MMU code path and when I get to physical memory discovery and page assignment I'm afraid I've gone too far. Is there any guidance you can provide to help me navigate page creation and management?

You guessed it right. There is no concept of a "heap" in OSv. Applications executed on OSv run within the same memory space and normally integrate using the standard C library interface (malloc, free, open, etc.), not at the system call level, so they do not need to know about a heap.
Having said that, there is a fairly new way of executing apps - statically linked or executed via the Linux dynamic linker ("ld.so") - which DO integrate at the system call level, and for those OSv implements brk.

Given there is no brk style heap management, would randomizing the heap starting address require serious alterations to page allocations?
As I understand it, physical memory is discovered and then memory pages are created, and the heap is "just" one of those pages. If I wanted to implement an ASLR solution for the heap (and possibly a better one for the stack) I think functionality would need to be introduced at this level (memory page assignment) to identify the heap and enable downstream functionality. Is this an achievable approach?

Given there is no heap, I am not sure if this question is applicable.

I'm not sure exactly what you guys mean by "there is no heap".

Although the physical memory is not segregated between different uses, there is definitely a separation of virtual memory addresses: malloc() returns virtual addresses from a different range than mmap() does.

What is true is that the kernel and userspace use the same heap - both the kernel and the user space code use malloc(), and it allocates memory from the same place.
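One way to observe the separation described above is to print an address returned by malloc() next to one returned by mmap(). The following is only a minimal sketch using the standard POSIX calls; `sample` and `sample_addresses` are hypothetical names introduced for illustration, and the actual ranges will differ between Linux and OSv, so no particular layout is claimed here.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <sys/mman.h>

// Hypothetical helper type: one address from each allocator.
struct sample {
    std::uintptr_t from_malloc;
    std::uintptr_t from_mmap;
};

// Allocate once with malloc() and once with an anonymous mmap(), record
// both addresses, print them, and release the memory again.
inline sample sample_addresses()
{
    const std::size_t len = 4096; // assumption: one 4 KiB page is enough
    void* m = std::malloc(64);
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(m != nullptr && p != MAP_FAILED);

    sample s{reinterpret_cast<std::uintptr_t>(m),
             reinterpret_cast<std::uintptr_t>(p)};
    std::printf("malloc: 0x%" PRIxPTR "\n", s.from_malloc);
    std::printf("mmap:   0x%" PRIxPTR "\n", s.from_mmap);

    munmap(p, len);
    std::free(m);
    return s;
}
```

Running this inside an application on OSv and comparing the high-order bits of the two values should show whether the two allocators draw from different parts of the virtual address space on a given build.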

If you have more concrete questions I'll be happy to dig in and recollect how things work, but you should note that these days I don't work full-time (or even part-time) on OSv, so my memory isn't as fresh as it used to be. But I can still try to explain what I remember (or look at the code again, and remember :-)).

Alex Wollman

Aug 26, 2025, 9:58:30 PM

to Nadav Har'El, Waldek Kozaczuk, OSv Development

Thanks for your responses!

I've created a repository with my current code: https://github.com/Instructor123/osv/tree/appASLR_f515d191. It is pretty rudimentary at the moment, with the significant changes residing in core/app.cc::start and arch/x64/arch-switch.hh::init_stack. This is my first time replying to a mailing list, so hopefully I didn't mess up the formatting.

On Sun, Aug 24, 2025 at 8:44 AM Nadav Har'El <n...@scylladb.com> wrote:

On Thu, Aug 21, 2025 at 7:31 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
On Thu, Aug 7, 2025 at 1:36 PM Alex Wollman <alex.w...@gmail.com> wrote:
Implementation
I have implemented a rudimentary ASLR for the stack in custom x64 applications by modifying the init_stack function within the threading codebase (the init_stack function begins at line 160 of osv/arch/x64/arch-switch.hh; I can provide more details upon request). However, my implementation just chooses a "random" location within the allocated memory (0x100000 in size) and sets the "stack.top" value appropriately. This will result in "0x100000 - stack.top" bytes of memory being allocated but always unused, which is not optimal.
How exactly do you allocate stack memory to make it "random"? I think you will need to differentiate between kernel and application thread stacks.

Strictly speaking, no memory is allocated: a new address is chosen from within the already-allocated region as the top and simply assigned to the stacktop variable. This is not optimal, but it was a starting place.
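The idea of choosing a new top inside an existing stack region can be sketched roughly as follows. This is not the code from the branch: `randomized_stack_top` and its parameters are hypothetical, `begin` and `min_usable` are assumed to be multiples of 16, and `std::mt19937_64` stands in for whatever entropy source a real implementation would use (it is not cryptographically secure).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper, not OSv code: pick a randomized, 16-byte-aligned
// stack top inside an already-allocated region [begin, begin + size).
// At least `min_usable` bytes remain below the returned top; everything
// between the returned top and begin + size stays allocated but unused.
inline std::uintptr_t randomized_stack_top(std::uintptr_t begin,
                                           std::size_t size,
                                           std::size_t min_usable,
                                           std::mt19937_64& rng)
{
    assert(begin % 16 == 0 && min_usable % 16 == 0);
    assert(min_usable <= size);

    // Number of bytes we are allowed to skip at the top of the region.
    const std::size_t max_slack = size - min_usable;
    std::uniform_int_distribution<std::size_t> dist(0, max_slack);
    const std::size_t slack = dist(rng);

    // Round down to 16 bytes, the stack alignment the x86-64 ABI expects.
    const std::uintptr_t top = begin + size - slack;
    return top & ~static_cast<std::uintptr_t>(0xf);
}
```

With a 0x100000-byte region, the skipped part above the chosen top is exactly the memory that remains allocated but never used, which is the cost mentioned above.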

Hi,

As I explained a decade ago (wow...) in https://github.com/cloudius-systems/osv/issues/651#issuecomment-136970958, init_stack() is used mostly by internal threads (e.g., in the implementation of the filesystem). Application threads, which use pthread, do something else: they allocate their stacks using mmu::map_anon(), which I think should be easier to randomize, since you just need to pick a random place in the address space instead of the next available one.

This is interesting, thanks! I had initially looked at this location but hadn't figured out how to propagate the change. Looking back with fresh eyes: in libc/pthread.cc::allocate_stack, would passing a random value instead of nullptr suffice? I use the term "random value" loosely to convey the meaning, as I'd likely need to ensure the address is page-aligned. That would seem to alter the starting address in map_anon, as you suggest?
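Generating such a page-aligned "random value" could look roughly like the sketch below. The helper name, the 4 KiB page size, and the candidate range passed in are all assumptions made for illustration; the code does not call mmu::map_anon(), and how that function treats a non-null address (as a hint or as a fixed address) would need to be checked in the OSv source.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

constexpr std::uintptr_t page_size = 4096; // assumption: 4 KiB pages

// Hypothetical helper: choose a page-aligned start address such that a
// mapping of `len` bytes fits entirely inside [lo, hi).
inline std::uintptr_t random_page_aligned_hint(std::uintptr_t lo,
                                               std::uintptr_t hi,
                                               std::uintptr_t len,
                                               std::mt19937_64& rng)
{
    assert(lo % page_size == 0 && hi % page_size == 0 && len % page_size == 0);
    assert(len > 0 && hi > lo && hi - lo >= len);

    // Index of the last page at which the mapping can still start.
    const std::uintptr_t last_slot = (hi - lo - len) / page_size;
    std::uniform_int_distribution<std::uintptr_t> dist(0, last_slot);
    return lo + dist(rng) * page_size;
}
```

The returned value would then be passed in place of nullptr. As with any ASLR scheme, the usable entropy is bounded by the number of candidate pages in the chosen range, and a real implementation would want a stronger random source than `std::mt19937_64`.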

Using realloc for the stack has resulted in page_fault errors, and malloc has not seemed appropriate because the memory has already been allocated by the point where I begin my work (at least according to the program flow). I can provide more details upon request.

Do you have a branch I can look at?

https://github.com/Instructor123/osv/tree/appASLR_f515d191

Attempting to implement heap based ASLR is a completely different story. In the Linux kernel there is a brk structure to track the starting address so when the program executes memory can be assigned appropriately. Obviously unikernels operate totally differently, and this forms the basis of my questions.

Questions
How do I identify where the heap is assigned in the code, and is there "a heap" for applications?
From what I can tell given the documentation and functionality, there is no 'brk' structure to track the heap. In fact I have not been able to identify a specific page or code functionality that identifies a heap at all (though through developmental testing I know there is one.) My research has taken me down the MMU code path and when I get to physical memory discovery and page assignment I'm afraid I've gone too far. Is there any guidance you can provide to help me navigate page creation and management?

You guessed it right. There is no concept of a "heap" in OSv. Applications executed on OSv run within the same memory space and normally integrate using the standard C library interface (malloc, free, open, etc.), not at the system call level, so they do not need to know about a heap.
Having said that, there is a fairly new way of executing apps - statically linked or executed via the Linux dynamic linker ("ld.so") - which DO integrate at the system call level, and for those OSv implements brk.

I'm working on an old commit due to self-imposed research restrictions. Where was this feature introduced?

Given there is no brk style heap management, would randomizing the heap starting address require serious alterations to page allocations?
As I understand it, physical memory is discovered and then memory pages are created, and the heap is "just" one of those pages. If I wanted to implement an ASLR solution for the heap (and possibly a better one for the stack) I think functionality would need to be introduced at this level (memory page assignment) to identify the heap and enable downstream functionality. Is this an achievable approach?

Given there is no heap, I am not sure if this question is applicable.

I'm not sure exactly what you guys mean by "there is no heap".
Although the physical memory is not segregated between different uses, there is definitely a separation of virtual memory addresses: malloc() returns virtual addresses from a different range than mmap() does.

What is true is that the kernel and userspace use the same heap - both the kernel and the user space code use malloc(), and it allocates memory from the same place.

In Linux there's a special structure that tracks the heap per application, in part so that it can be randomized. From what I can find (or can't find) in OSv, there is no code that specifically identifies a region and treats it as "the heap." In the core/mempool.cc file (which seems to be the home of the vma-to-physical relationships), if I'm reading the code correctly, the physical addresses are identified and managed so that virtual addressing can take over, and the heap is simply assigned from a block of virtual memory. What I would want, or expect, to see is a region extracted and treated specially as "the heap" in order to give that blob special treatment. Is there such a structure or assignment that I simply missed? You mention the difference between mmap and malloc return values: how do they know where to return memory from?

If you have more concrete questions I'll be happy to dig in and recollect how things work, but you should note that these days I don't work full-time (or even part-time) on OSv, so my memory isn't as fresh as it used to be. But I can still try to explain what I remember (or look at the code again, and remember :-)).

I'm pretty far out of my depth at this level of system and memory management, so any and all help is greatly appreciated! I feel fortunate if I can get through the code with any understanding. Thanks very much for your reply!

Thanks for replying!

PS. I took the liberty of replying to your email and to the OSv mailing list so maybe others can chime in.

Thank you very much for your time and help, and for forwarding my email along!

Cheers,

Alex
