Windows does overcommit stacks!

Bonita Montero

Nov 9, 2022, 10:16:24 PM

This is a little test program that proves that Windows does overcommit
stacks:

#include <Windows.h>
#include <iostream>
#include <atomic>
#include <vector>

using namespace std;

int main()
{
    static SYSTEM_INFO si;
    GetSystemInfo( &si );
    HANDLE hThread = CreateThread( nullptr, 0,
        []( LPVOID lpvThreadParam ) -> DWORD
        {
            ULONG_PTR lowLimit, highLimit;
            GetCurrentThreadStackLimits( &lowLimit, &highLimit );
            bool first = true;
            // walk down the stack one page at a time, touching each page
            for( atomic_char *p = (atomic_char *)highLimit; ; )
                __try
                {
                    p -= si.dwPageSize;
                    (void)p->load( memory_order_relaxed );
                    // sum up the committed bytes inside the stack's allocation
                    MEMORY_BASIC_INFORMATION mbi;
                    char const *scn = (char *)lowLimit;
                    size_t allocated = 0;
                    for( ; scn < (void *)highLimit; )
                        if( VirtualQuery( scn, &mbi, sizeof mbi ) == sizeof mbi )
                            if( mbi.AllocationBase == (void *)lowLimit )
                                scn = (char *)mbi.BaseAddress + mbi.RegionSize,
                                allocated += mbi.State == MEM_COMMIT ? mbi.RegionSize : 0;
                            else
                                break;
                        else
                            return EXIT_FAILURE;
                    // print the number of committed pages, comma-separated
                    cout << ", " + first * 2 << allocated / si.dwPageSize;
                    first = false;
                }
                __except( EXCEPTION_EXECUTE_HANDLER )
                {
                    break;
                }
            cout << endl;
            return 0;
        }, nullptr, 0, nullptr );
    (void)WaitForSingleObject( hThread, INFINITE );
}

yx ma

Nov 10, 2022, 2:32:52 AM

?

Michael S

Nov 10, 2022, 5:59:37 AM

On Thursday, November 10, 2022 at 5:16:24 AM UTC+2, Bonita Montero wrote:
> This is a little test program that proves that Windows does overcommit
> stacks:

You obviously don't know the meaning of the word 'overcommit' as
applied to virtual memory.
FYI, unlike another popular OS, Windows *never* overcommits.
Neither the stack nor the heap.

Bonita Montero

Nov 10, 2022, 6:42:24 AM

On 10.11.2022 at 11:59, Michael S wrote:

> FYI, unlike another popular OS, Windows *never* overcommits.

Try it out yourself:

#include <Windows.h>
#include <iostream>
#include <thread>
#include <vector>
#include <latch>
#include <memory>

using namespace std;

// owning handle; CloseHandle is skipped for INVALID_HANDLE_VALUE
using XHANDLE = unique_ptr<void, decltype([]( void *h ) { h && h != INVALID_HANDLE_VALUE && CloseHandle( h ); })>;

int main()
{
    constexpr unsigned N_THREADS = 0x10000;
    vector<XHANDLE> threads;
    threads.reserve( N_THREADS );
    static latch latSync( N_THREADS );
    for( unsigned t = N_THREADS; t--; )
    {
        auto threadFn = []( LPVOID ) -> DWORD { latSync.arrive_and_wait(); return 0; };
        threads.emplace_back( CreateThread( nullptr, 0x1000000, threadFn, nullptr, 0, nullptr ) );
        if( !threads.back().get() )
            cout << "out of resources" << endl;
    }
    threads.resize( 0 );
}

This demo creates 2^16 threads with a stack size of 0x1000000 bytes
(16 MB) each, i.e. one terabyte of stack in total. But as the stack on
Windows is overcommitted, i.e. only actually committed when it is
touched, this program won't crash!
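The one-terabyte figure can be checked at compile time. This is only a sketch of the arithmetic; the names kThreads, kStackBytes and kTotalBytes are illustrative and do not appear in the program above.

```cpp
#include <cstdint>

// Values taken from the demo: 0x10000 threads, 0x1000000 bytes of stack each.
constexpr std::uint64_t kThreads    = 0x10000;    // 2^16 threads
constexpr std::uint64_t kStackBytes = 0x1000000;  // 2^24 bytes = 16 MiB per thread

// Total stack size requested across all threads.
constexpr std::uint64_t kTotalBytes = kThreads * kStackBytes;

static_assert( kStackBytes == 16ull << 20, "16 MiB per thread" );
static_assert( kTotalBytes == 1ull << 40, "2^16 * 2^24 = 2^40 bytes = 1 TiB" );
```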

Michael S

Nov 10, 2022, 7:34:57 AM

Once again, you don't know the meaning of 'overcommit'.
As long as an area that was successfully committed, either by
VirtualAlloc(..., MEM_COMMIT, ...) or by other means, is
guaranteed to be legal to access from user mode, it's not called
'overcommitment', even when the access begins with a
page fault.

Also, it sounds like you don't know the meaning of the word
'commit' as applied to virtual memory. It seems you think
that 'commit' means 'make the page resident in physical memory',
but that's not so, at least not in the language used by the Windows
documentation.
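The distinction drawn here can be illustrated with a toy accounting model. This is a hedged sketch, not Windows code: CommitLedger, its members and the page counts are invented for illustration. It assumes the rule described in this post: reserving is not charged against the commit limit, while committing is charged up front and refused if the limit would be exceeded, so a successful commit can always be honoured later, even if the page only becomes resident on first access.

```cpp
#include <cstddef>

// Toy model of commit accounting (illustrative only, not a Windows API).
class CommitLedger
{
public:
    explicit CommitLedger( std::size_t commitLimitPages ) : limit_( commitLimitPages ) {}

    // Reserving only claims address space; it is never charged against the limit.
    void reserve( std::size_t pages ) { reserved_ += pages; }

    // Committing is charged immediately. It fails up front instead of
    // succeeding now and failing later on first touch.
    bool commit( std::size_t pages )
    {
        if( pages > reserved_ - committed_ )   // cannot commit more than is reserved
            return false;
        if( pages > limit_ - committed_ )      // would exceed the commit limit
            return false;
        committed_ += pages;
        return true;
    }

    std::size_t reserved() const { return reserved_; }
    std::size_t committed() const { return committed_; }

private:
    std::size_t limit_;
    std::size_t reserved_ = 0;
    std::size_t committed_ = 0;
};
```

In this model a ledger with a limit of 100 pages accepts reserve( 1000 ), accepts commit( 100 ), and then refuses commit( 1 ). An overcommitting system would instead accept every commit and fail only when the pages are touched.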

Bonita Montero

Nov 10, 2022, 8:08:47 AM

On 10.11.2022 at 13:34, Michael S wrote:

> Once again, you don't know the meaning of 'overcommit'.
> As long as area that was successfully committed either by
> VirtualAlloc(..., MEM_COMMIT, ...) or by other means is
> guaranteed to be legal to access from user mode, it's not
> called 'overcommitment' even when the process of access
> begins by page fault.

With stacks, things are different. The commit is done by the kernel when
you hit the guard page. If you touch the stack's address range beyond
the guard page, the application crashes. If the kernel can't dynamically
commit the memory for the guard page you hit, you get an SEH guard-page
exception, earlier than you would at the last valid position of the
guard page when the stack "successfully" extends to its maximum range.
So if you have a default stack size of one MB, you might get one MB
minus the size of the (last) guard page, but this might not actually
happen if the system runs out of memory in the meantime.

Öö Tiib

Nov 10, 2022, 8:19:03 AM

So if the commit is done lazily on demand, then there is no overcommit.
The process gets an SEH exception that can even be caught and handled.
That is way better than what C++ provides ... exceeding the automatic
storage limit is simply undefined.

Bonita Montero

Nov 10, 2022, 8:36:15 AM

On 10.11.2022 at 14:18, Öö Tiib wrote:

> So if the commit is done lazily on demand, then there is no overcommit.

A lazy commit which may fail is overcommitting.

> Process gets SEH exception that can be even caught and handled.

Of course, but the corresponding guard page might then not become a
committed page.

Try it out yourself:

#include <Windows.h>
#include <iostream>
#include <thread>
#include <vector>
#include <barrier>
#include <memory>
#include <atomic>
#include <cmath>

using namespace std;

// owning handle; CloseHandle is skipped for INVALID_HANDLE_VALUE
using XHANDLE = unique_ptr<void, decltype([]( void *h ) { h && h != INVALID_HANDLE_VALUE && CloseHandle( h ); })>;

int main( int argc, char ** )
{
    constexpr unsigned N_THREADS = 1 << 10;
    constexpr size_t STACK_SIZE = 1 << 30;

    static SYSTEM_INFO si;
    GetSystemInfo( &si );

    static barrier barSync( N_THREADS );
    static atomic<size_t> sumComitted( 0 );
    static bool touch = argc >= 2;
    size_t sumReserved = 0;

    vector<XHANDLE> threads;
    threads.reserve( N_THREADS );

    for( unsigned t = 0; t != N_THREADS; ++t, sumReserved += STACK_SIZE )
    {
        auto threadFn = []( LPVOID ) -> DWORD
        {
            barSync.arrive_and_wait();
            if( !touch )
                return 0;

            ULONG_PTR lowLimit, highLimit;
            GetCurrentThreadStackLimits( &lowLimit, &highLimit );

            // touch the stack page by page until an access faults
            atomic_char *pScn = (atomic_char *)highLimit;
            for( ; ; )
                __try
                {
                    pScn -= si.dwPageSize;
                    (void)pScn->load( memory_order::relaxed );
                }
                __except( EXCEPTION_EXECUTE_HANDLER )
                {
                    sumComitted += highLimit - (size_t)pScn;
                    break;
                }
            barSync.arrive_and_wait();
            return 0;
        };
        threads.emplace_back( CreateThread( nullptr, STACK_SIZE, threadFn, nullptr, 0, nullptr ) );

        if( !threads.back() )
        {
            cout << "out of resources: " << t - 1 << " threads" << endl;
            break;
        }
    }
    threads.resize( 0 );
    if( touch )
        cout << trunc( 100.0 * (ptrdiff_t)sumReserved / (ptrdiff_t)sumComitted + 0.5 ) << "%" << endl;
}

Bonita Montero

Nov 14, 2022, 4:03:58 PM

To dispel any doubt that Windows stacks are overcommitted, I wrote a
small program that creates threads recursively and prints every second
how many threads have been created so far. Unless you set something else
in the linker, Windows reserves one megabyte of address space for each
new stack. I can easily create 250,000 threads on my machine with this
program, which then consumes 250 gigabytes of address space. If all of
this were committed without being physically assigned, then I would at
least need a lot of swap, held available in case the committed pages
were actually written.

Here's the code:

#include <iostream>
#include <vector>
#include <thread>
#include <functional>
#include <semaphore>
#include <chrono>
#include <syncstream>
#include <atomic>
#include <system_error>

using namespace std;
using namespace chrono;

int main()
{
    vector<jthread> threads;
    threads.reserve( 1'000'000 );
    function<void ()> threadFn;
    atomic_uint32_t n;
    counting_semaphore semFinish( 0 );
    steady_clock::time_point start = steady_clock::now();
    atomic_uint lastElapsed = 0;
    // each thread spawns the next one, then blocks until a creation fails
    auto create = [&]()
    {
        try
        {
            threads.emplace_back( threadFn );
            ++n;
            unsigned elapsed = (unsigned)duration_cast<seconds>( steady_clock::now() - start ).count();
            if( elapsed > lastElapsed )
                osyncstream( cout ) << n << endl,
                lastElapsed = elapsed;
            semFinish.acquire();
        }
        catch( system_error const & )
        {
            // no more threads can be created: wake up all waiting threads
            semFinish.release( n );
        }
    };
    (threadFn = create)();
    threads.resize( 0 );
    cout << n << endl;
}

Öö Tiib

Nov 15, 2022, 7:09:57 AM

On Monday, 14 November 2022 at 23:03:58 UTC+2, Bonita Montero wrote:
> To dispel any doubt that Windows stacks are overcommitted, I wrote a
> small program that creates threads recursively and prints every second
> how many threads have been created so far. Unless you set something else
> in the linker, Windows reserves one megabyte of address space for each
> new stack. I can easily create 250,000 threads on my machine with this
> program, which then consumes 250 gigabytes of address space. If all of
> this were committed without being physically assigned, then I would at
> least need a lot of swap, held available in case the committed pages
> were actually written.

Overcommit involves committing. Reserving is not committing. Letting
software reserve more than is physically available is allowing
over-reserve, not doing overcommit. The outcomes of over-reserve and
overcommit are different: overcommit results in the OOM killer killing
processes, while over-reserve results in processes that attempt to
commit getting exceptions.

Bonita Montero

Nov 15, 2022, 8:23:22 AM

You didn't understand what I wrote.
And I guess you aren't qualified to discuss the issue.

Michael S

Nov 15, 2022, 9:23:37 AM

It seems to me that Öö Tiib is mostly correct.
Except that I don't expect processes to get exceptions when
attempting to commit. I expect that the software entity that attempts
to commit gets an error code from the system call and then, in turn,
raises an exception.
But if said entity does not run in the context of the process, then the
result is the same.
I don't know the fine details and don't feel that they matter all that
much. What *does* matter is that over-reserve is not the same as
over-commit and that the observed behavior does not prove over-commit.

Bonita Montero

Nov 15, 2022, 10:39:45 AM

On 15.11.2022 at 15:23, Michael S wrote:

> It seems to me that Öö Tiib is mostly correct.
> Except that I don't expect that processes are getting exceptions when

> attempting to commit. ...

For Linux this is always true with overcommitting enabled; for Windows
it is true when one of the guard pages of the stack is hit and the
system can't assign a physical page for that guard page.

> I expect that software entity that attempts to commit gets error
> code from the system call and then, in turn, raises an exception.

When you touch a guard page there's no system call.
You could simply walk down the whole stack until the last guard
page on thread creation if you need reliable stack allocation.

> What *does* matter is that over-reserve is not the same as over-commit

> ...

There's no such thing as over-reserving, it's really overcommitting,
since the actual commit may fail if you touch the region of guard pages
of the stack. There are usually two or three guard pages on the stack.
I don't know why Microsoft decided on such a small number of guard
pages. I would have made the whole stack region committable on access,
except for the page at the bottom of the stack region. That would e.g.
make an alloca() just a subtraction from the current stack pointer.
Actually a special function is called which compares the final stack
pointer after the allocation with the lower stack limit, which can be
found in the thread information block (gs:[0x10] on x64), and walks
the stack pages if appropriate.
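The page walking described above can be sketched in portable C++. This is an illustration only: probeAddresses is a hypothetical helper, not the runtime's actual routine (in MSVC the stack-probe helper is commonly known as __chkstk, but its real implementation is written in assembly and differs in detail). The sketch assumes a stack that grows downwards and returns the addresses that would have to be touched, one per page and in descending order, so that each guard page is hit in turn when the stack pointer is lowered from sp to sp - size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns one address for every page below the page containing `sp` that
// overlaps the new allocation reaching down to `sp - size`, in descending
// order. Assumes size <= sp and that pageSize is a power of two.
std::vector<std::uintptr_t> probeAddresses( std::uintptr_t sp, std::size_t size, std::size_t pageSize )
{
    std::vector<std::uintptr_t> probes;
    std::uintptr_t const newSp = sp - size;
    // start of the page that currently holds the stack pointer
    std::uintptr_t page = sp & ~(std::uintptr_t)(pageSize - 1);
    // step down one page at a time until the page containing newSp is covered
    while( page > newSp )
    {
        page -= pageSize;
        probes.push_back( page );
    }
    return probes;
}
```

An allocation that stays within the current page yields no probes at all, which is the case where a plain subtraction from the stack pointer suffices.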

Bonita Montero

Mar 20, 2023, 1:43:56 PM

Here's another proof that Windows does overcommit stacks:

#include <Windows.h>
#include <iostream>
#include <sstream>

#pragma warning(disable: 6387) // parameter ... could be null

using namespace std;

int main( int argc, char **argv )
{
    auto stackThread = []( LPVOID lpvThreadParam ) -> DWORD
    {
        char const *stackTop, *stackBottom;
        GetCurrentThreadStackLimits( &(ULONG_PTR &)stackBottom, &(ULONG_PTR &)stackTop );

        SYSTEM_INFO si;
        GetSystemInfo( &si );

        MEMORY_BASIC_INFORMATION mbi;
        // fills mbi for the region containing p and returns its allocation base
        auto query = [&]( char const *p ) -> char const *
        {
            if( VirtualQuery( p, &mbi, sizeof mbi ) != sizeof mbi )
                ExitProcess( EXIT_FAILURE );
            return (char *)mbi.AllocationBase;
        };
        char const *base = query( stackBottom ), *p;
        // print one line per region: size in pages, state and protection
        do
        {
            cout << mbi.RegionSize / si.dwPageSize << ": ";
            unsigned n = 0;
            auto append = [&]<typename ... T>( T &&... values ) { if( n++ ) cout << ", "; ((cout << values), ...); };
            if( mbi.State != MEM_FREE )
                if( mbi.State == MEM_COMMIT )
                    append( "committed" );
                else if( mbi.State == MEM_RESERVE )
                    append( "reserved" );
                else
                    append( "S: 0x", hex, mbi.State, dec );
            if( !mbi.Protect )
                append( "inaccessible" );
            else if( mbi.Protect & PAGE_GUARD )
                append( "guard page" );
            else if( mbi.Protect == PAGE_READWRITE )
                append( "read-write" );
            else
                append( "P: 0x", hex, mbi.Protect, dec );
            cout << endl;
            p = (char *)mbi.BaseAddress + mbi.RegionSize;
        } while( query( p ) == base );
        return 123;
    };
    HANDLE hThread = CreateThread( nullptr, 0x100000, stackThread, (void *)(ptrdiff_t)(argc >= 2), STACK_SIZE_PARAM_IS_A_RESERVATION, nullptr );
    WaitForSingleObject( hThread, INFINITE );
    CloseHandle( hThread );
    return 0;
}

This prints the attributes of the pages in the range of the stack.