Alternative memory managers

Zilsch

Jul 3, 2004, 2:34:57 PM

We've noticed that the default heap implementation in MSVC6 performs really
badly on Win2000 on a dual-CPU machine in a multi-threaded application. Are
there any alternative memory managers that can be used in this environment?
Is there an API to replace the default memory manager in MSVCRT? Any
tricks?

Thanks

William DePalo [MVP VC++]

Jul 3, 2004, 4:33:27 PM

"Zilsch" <z...@ztop.not> wrote in message
news:c%CFc.197954$207.2...@news20.bellglobal.com...

> We've noticed that default heap implementation in MSVC6 performs really
> badly in Win2000 on dual CPU machine in a multi-threaded application. Are
> there any alternative memory managers that can be used in this
> environment?

This one, with which I have no experience, is often advertised in the
developer trade rags:

http://www.microquill.com/

> Does there exist an API to replace the default memory manager in MSVCRT?

Well, C++ allows you to override new operators and Win32 allows for a few
ways of allocating memory. So, if you knew an awful lot about how your
application uses memory you might be able to do better by coming up with
your own sub-allocation scheme assuming you are willing to handle the thread
synchronization issues on your own.

> Any tricks?

Take a look at HeapCreate(). It creates heaps in addition to the standard
process heap. HeapAlloc() has a flag, HEAP_NO_SERIALIZE, which can be used
to turn off its synchronization. Of course, if you do that it is up to you
to protect any heap which is shared among threads. Note that the docs
specifically warn against doing that on the process heap.
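
A minimal sketch of the private-heap idea described above, assuming each heap is only ever touched by the thread that owns it. `ThreadPrivateHeap` is a hypothetical wrapper name, not a Win32 API; the calls inside it (`HeapCreate`, `HeapAlloc`, `HeapFree`, `HeapDestroy`) and the `HEAP_NO_SERIALIZE` flag are the documented Win32 ones. It is written in current C++ rather than the MSVC6 dialect, and the `_WIN32` guard is only there so the snippet is skipped on other platforms.

```cpp
#ifdef _WIN32   // Win32-only sketch
#include <windows.h>

// Hypothetical wrapper: one growable heap per owner thread, created without
// the heap's internal lock.
class ThreadPrivateHeap {
public:
    // HEAP_NO_SERIALIZE at creation applies to every later call on this heap.
    ThreadPrivateHeap() : heap_(HeapCreate(HEAP_NO_SERIALIZE, 0, 0)) {}
    ~ThreadPrivateHeap() { if (heap_) HeapDestroy(heap_); }

    ThreadPrivateHeap(const ThreadPrivateHeap&) = delete;
    ThreadPrivateHeap& operator=(const ThreadPrivateHeap&) = delete;

    bool valid() const { return heap_ != nullptr; }

    // Returns nullptr on failure, like HeapAlloc without HEAP_GENERATE_EXCEPTIONS.
    void* allocate(SIZE_T bytes) {
        return heap_ ? HeapAlloc(heap_, 0, bytes) : nullptr;
    }
    void deallocate(void* p) {
        if (heap_ && p) HeapFree(heap_, 0, p);
    }

private:
    HANDLE heap_;
};
#endif
```

Because serialization is off, a block has to be freed by the thread that owns the heap, or you have to put your own lock around every call.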

Regards,
Will

Zilsch

Jul 3, 2004, 5:06:57 PM

> Well, C++ allows you to override new operators and Win32 allows for a few
> ways of allocating memory. So, if you knew an awful lot about how your
> application uses memory you might be able to do better by coming up with
> your own sub-allocation scheme assuming you are willing to handle the thread
> synchronization issues on your own.

Taking into account that I use 3rd party libraries, and they allocate memory
on their own, it looks impossible or at least very difficult.
Some of these 3rd party libs such as STLport are available in source code
(not the most intuitive C++ code I must say) so in principle I can modify
them to use an alternative heap manager. But for other libraries I don't even
have source code - however all of them seem to be based on the MSVCRT runtime
library.

> http://www.microquill.com/

> > Any tricks?
> Take a look at HeapCreate(). It creates heaps in addition to the standard
> process heap.

thanks for the info

Ivan Brugiolo [MSFT]

Jul 5, 2004, 9:59:50 PM

If you use operator new/operator delete,
just implement your own operator new in global scope,
and the linker will do the rest for you.
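
As a rough sketch of that suggestion: defining the global allocation functions in one translation unit of the executable is enough for the linker to pick them up. The `std::malloc`/`std::free` calls below are placeholders for whichever allocator you want to route to, and `g_new_calls` is a hypothetical counter added only to make the replacement observable. Whether code inside separately built DLLs ends up using these definitions depends on how those DLLs were built, so treat this as covering your own module only.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, present only to show that the replacement is in use.
static std::atomic<std::size_t> g_new_calls{0};

// Replacement for the global scalar operator new.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;            // operator new must return a unique pointer
    void* p = std::malloc(size);        // placeholder: call your allocator here
    if (!p) throw std::bad_alloc();
    ++g_new_calls;
    return p;
}

void operator delete(void* p) noexcept { std::free(p); }

// Array and sized forms forward to the scalar forms above.
void* operator new[](std::size_t size) { return operator new(size); }
void operator delete[](void* p) noexcept { operator delete(p); }
void operator delete(void* p, std::size_t) noexcept { operator delete(p); }
void operator delete[](void* p, std::size_t) noexcept { operator delete(p); }
```

The over-aligned overloads added in C++17 are not replaced here and would need the same treatment if your code uses over-aligned types.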

If you rely on malloc()/free(), then there is no general replacement
facility.
[Actually, I've seen commercial libraries replace the first 5 bytes
of msvcrt!malloc with a jump to a replacement allocator,
but that is not a recommended or supportable approach.]

Depending on the problems your application is suffering from the most
(not described in your post), you might consider the LowFragmentation heap,
a pool of private heaps,
or writing a custom allocator (in place of std::allocator<>) for your SC++L containers.

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

"Zilsch" <z...@ztop.not> wrote in message
news:c%CFc.197954$207.2...@news20.bellglobal.com...

Zilsch

Jul 6, 2004, 2:55:05 AM

> If you use operator new/operator delete,
> just implement your own operator new in global scope,
> and the linker will do the rest for you.

Like I said in a reply to William DePalo, we use 3rd party libraries, and
they allocate memory on their own.
Just replacing the global operators new/new[]/delete/delete[] will not solve
the problem for the DLLs.

> If you rely on malloc()/free(), then there is no general replacement
> facility.
> [actually, I've seen commercial libraries to replace the first 5 bytes
> of msvcrt!malloc to jump to a replacement allocator,
> but that is not a recommended and supportable approach.]

Looks like there is no other easy way.
I am not going to replace the system MSVCRT,
just ship a patched DLL with my application.

> Depending on the problems your application is suffering the most
> (not expressed below), you might consider LowFragmentation heap,
> a pool of private heaps,
> or writing custom std::Allocator<> for your SC++L containers.

This is all good, but to me it looks like too much unnecessary development
effort to compensate for a lousy heap implementation in MSVCRT. I am just
wondering how it is possible that we still have a heap manager that is
basically designed for single-threaded apps in an operating system that
claims to natively support both multithreading and SMP (Win2k).

Is there a heap implementation from Microsoft that scales?
Should I try Advanced Server or Win2K Enterprise version?
Do all of them have this problem?

Zilsch

Jul 6, 2004, 2:58:02 AM

> http://www.microquill.com/

FYI: they ask 5 to 25K USD depending on license terms.

Way too expensive and doesn't make any sense.

Keith MacDonald

Jul 6, 2004, 9:43:59 AM

I think small object allocation was improved in VC 7.1, so you could try
that. The Loki (http://sourceforge.net/projects/loki-lib/) small object
allocator is free, but it's probably better to read about it first in
"Modern C++ Design", by Andrei Alexandrescu. Another free one is Hoard
(http://sourceforge.net/projects/libhoard/).

- Keith MacDonald

"Zilsch" <z...@ztop.not> wrote in message

news:61sGc.17761$WM5.8...@news20.bellglobal.com...

Zilsch

Jul 6, 2004, 11:04:45 AM

> I think small object allocation was improved in VC 7.1, so you could try
> that. The Loki (http://sourceforge.net/projects/loki-lib/) small object
> allocator is free, but it's probably better to read about it first in
> "Modern C++ Design", by Andrei Alexandrescu.

I've read the book, but I guess it will not solve a memory contention issue
when multiple threads simultaneously access the heap/allocator. Another issue
is that such small object allocators tend to never release the memory they
once allocated, which is not desirable in my case. Their internal storage can
only grow.

In addition I need to allocate variable-length memory blocks (dynamic
arrays), for which the linked-list trick for constant-time allocation
doesn't work.
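
For readers who have not met the linked-list trick mentioned above, here is a minimal sketch of a fixed-size pool. `FixedPool` is a hypothetical name and this is not Loki's implementation; it only shows why allocation and release are constant time, and why the approach breaks down for variable-length blocks: every block must have the same size so that any free block can satisfy any request. The sketch is not thread-safe and holds on to its storage until it is destroyed, which matches the two drawbacks raised above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-size pool: free blocks are chained through their own storage.
class FixedPool {
    // A block is at least one Node wide and aligned for any object type.
    union Node {
        Node* next;
        std::max_align_t align;
    };

public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : nodes_per_block_(block_size == 0 ? 1
                           : (block_size + sizeof(Node) - 1) / sizeof(Node)),
          storage_(nodes_per_block_ * block_count),
          free_(nullptr) {
        // Thread every block onto the free list once, up front.
        for (std::size_t i = 0; i < block_count; ++i) {
            Node* n = &storage_[i * nodes_per_block_];
            n->next = free_;
            free_ = n;
        }
    }

    // O(1): pop the head of the free list; nullptr when the pool is exhausted.
    void* allocate() {
        if (!free_) return nullptr;
        Node* n = free_;
        free_ = n->next;
        return n;
    }

    // O(1): push the block back onto the free list.
    void deallocate(void* p) {
        Node* n = static_cast<Node*>(p);
        n->next = free_;
        free_ = n;
    }

private:
    std::size_t nodes_per_block_;
    std::vector<Node> storage_;
    Node* free_;
};
```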

> Another free one is Hoard
> (http://sourceforge.net/projects/libhoard/).

Good stuff. I will check it out.

Ivan Brugiolo [MSFT]

Jul 6, 2004, 11:35:24 AM

The heap manager in msvcrt.dll is the same one that is in the operating system.
[You can use a set of environment variables to get the VC5-compatible heap.]
The claim that the heap is single-threaded is quite untrue.
LowFragHeap is the latest "flavor" of the heap that was designed
with contention avoidance in mind. The same goes for the LookAside front-end
(which at this point is quite old, but at the time was a good
optimization).
Most of the heap complaints come from savvy users of C++ paradigms
in large applications. Certain C++ programs tend to employ deep copies
of structures way more than necessary, and certain designs tend to have
a very high degree of constellation for allocations.
If you have a "main" class with 5-10 dependent allocations,
then no matter how good the heap can be made, you will always have problems.
Basically, the allocation pattern can kill the advances in the heap code
way quicker than you might expect.

"Zilsch" <z...@ztop.not> wrote in message

news:61sGc.17761$WM5.8...@news20.bellglobal.com...

Zilsch

Jul 6, 2004, 12:18:04 PM

> Most of the heap complaints come from savvy users of C++ paradigms
> in large applications. Certain C++ programs tend to employ deep copies
> of structures way more than necessary, and certain designs tend to have
> a very high degree of constellation for allocations.

We have tried the WinHeap library (http://www.winheap.com/) and our application
started to run about 50% faster. Other libraries such as SmartHeap SMP and
Hoard can also deliver great speedups. There is nothing especially esoteric
in the application - just ordinary C++ code in a multithreaded
environment on a dual-CPU machine (yes, we use STL, but we don't use any
home-grown allocators of our own). You shouldn't blame us, poor victims of
Microsoft's less-than-perfect heap implementation.

Ivan Brugiolo [MSFT]

Jul 6, 2004, 12:28:41 PM

I did not claim any implementation being perfect.
For any given allocation pattern you can write a perfectly matched allocator
given a set of requirements.
It's way harder to write a generic allocator for a generic set of
requirements.
My point was that changing the allocation pattern to match the allocator
is somewhat equivalent to changing the allocator to match the allocation
pattern.
Each solution has a cost, in terms of licensing/writing a
different allocator and/or re-working the code.

"Zilsch" <z...@ztop.not> wrote in message

news:10891306...@news.pubnix.net...

Zilsch

Jul 6, 2004, 12:38:19 PM

> The claim that the heap is single-threaded is quite untrue.

Well, if heap access is serial, then, apparently, the single-threaded code
has been "ported" to the MT environment by simply putting a global lock on top
of it, instead of implementing a true MT heap, which must allow parallel
allocations.
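
To make the distinction concrete, here is a small sketch that contrasts the two designs. It is an illustration only and makes no claim about how MSVCRT or the NT heap is implemented. All names (`locked_alloc`, `cached_alloc`, `ThreadCache`, `kBlockSize`) are hypothetical, a single block size is assumed to keep it short, and `std::malloc` stands in for the underlying serial heap.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

const std::size_t kBlockSize = 64;   // one size class keeps the sketch short
static std::mutex g_heap_lock;       // the "global lock on top" of a serial heap

// Serial design: every thread queues on the same lock for every call.
void* locked_alloc() {
    std::lock_guard<std::mutex> guard(g_heap_lock);
    return std::malloc(kBlockSize);
}

void locked_free(void* p) {
    std::lock_guard<std::mutex> guard(g_heap_lock);
    std::free(p);
}

// Parallel design: freed blocks are parked on a per-thread list, so the
// common path touches no shared state and takes no lock.
struct ThreadCache {
    struct Node { Node* next; };
    Node* head = nullptr;

    // At thread exit, hand any cached blocks back to the shared heap.
    ~ThreadCache() {
        while (head) {
            Node* n = head;
            head = n->next;
            locked_free(n);
        }
    }
};

static thread_local ThreadCache t_cache;

void* cached_alloc() {
    if (ThreadCache::Node* n = t_cache.head) {   // fast path: reuse a cached block
        t_cache.head = n->next;
        return n;
    }
    return locked_alloc();                       // slow path: cache is empty
}

void cached_free(void* p) {
    if (!p) return;
    ThreadCache::Node* n = static_cast<ThreadCache::Node*>(p);
    n->next = t_cache.head;
    t_cache.head = n;
}
```

A per-thread cache like this trades memory for speed: blocks freed by one thread are not visible to others until that thread exits, which is one reason real allocators add limits and rebalancing.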

Ivan Brugiolo [MSFT]

Jul 6, 2004, 1:21:18 PM

Both the Lookaside and the LowFrag front-ends
do not take the heap lock; they use Interlocked operations instead
(which employ bus synchronization instead of OS-level synchronization).
The LowFrag heap can also be thread-affine.

"Zilsch" <z...@ztop.not> wrote in message

news:10891319...@news.pubnix.net...

Zilsch

Jul 6, 2004, 1:43:02 PM

> do not take the heap lock, but they use Interlocked operations
> (that employs bus synchronization instead of OS-level synchronization).
> LowFrag heap can also be thread affine.

Can I configure Visual C++ runtime library to use "Lookaside" and/or "low
fragmentation heap"?

Ivan Brugiolo [MSFT]

Jul 6, 2004, 2:00:53 PM

The LookAside is on by default, unless disabled for AppCompat reasons.
If your C-Runtime exports the _get_heap_handle function,
then you can call HeapSetInformation on it, and get the LowFrag front-end.
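
A sketch of what that call sequence might look like, assuming a CRT that provides `_get_heap_handle` (declared in `<malloc.h>`) and a Windows version that provides `HeapSetInformation`. `enable_lfh_on_crt_heap` is a hypothetical helper name. The value 2 for `HeapCompatibilityInformation` is the documented way to request the low-fragmentation heap; the call can fail, for example when the process runs under a debugger with the debug heap enabled, so check the return value.

```cpp
#ifdef _WIN32   // Win32-only sketch
#include <windows.h>
#include <malloc.h>   // _get_heap_handle

// Hypothetical helper: asks the OS to put the LowFrag front-end on the heap
// that malloc()/free() use. Returns true if the request was accepted.
bool enable_lfh_on_crt_heap() {
    HANDLE crt_heap = reinterpret_cast<HANDLE>(_get_heap_handle());
    ULONG mode = 2;   // 2 = low-fragmentation heap
    return HeapSetInformation(crt_heap,
                              HeapCompatibilityInformation,
                              &mode,
                              sizeof(mode)) != FALSE;
}
#endif
```

Call it once, early in `main()`, before the allocation-heavy work starts.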

"Zilsch" <z...@ztop.not> wrote in message

news:10891357...@news.pubnix.net...

Zilsch

Jul 6, 2004, 3:13:00 PM

> The LookAside is on by default, unless disabled for AppCompat reasons.
> If your C-Runtime exports the _get_heap_handle function,
> then you can call HeapSetInformation on it, and get the LowFrag front-end.

This means we were already using the LookAside and in our performance tests
it didn't do well. We have not run into serious memory fragmentation
troubles yet so I don't see how LowFrag can be better.

Ivan Brugiolo [MSFT]

Jul 6, 2004, 3:23:17 PM

Without any data from your current heap usage
(or any !heap dump), it's not possible to diagnose anything in a meaningful
manner.
I was just pointing out that the current implementation supports
lock-free operations in certain code paths.
Whether that is relevant to your application
cannot be told from the information provided.
Sorry.

"Zilsch" <z...@ztop.not> wrote in message

news:10891411...@news.pubnix.net...

Zilsch

Jul 6, 2004, 5:11:26 PM

Obviously I cannot post my company's source code. But if you want to
convince yourself take a look at this simple benchmark:
http://www.microquill.com/smartheap/shbench/bench.zip
and its results:
http://www.microquill.com/smartheapsmp/index.html

Check this one too:
http://www.winheap.com/winheap_info/benchmarks.php
You can get a trial version of WinHeap for free.

I haven't checked all the results myself but the charts are a bit alarming,
aren't they?

Ivan Brugiolo [MSFT]

Jul 6, 2004, 5:41:54 PM

Similarly, we cannot look at others' implementations,
for contamination avoidance and obvious legal reasons.

The output of `!heap -p -all` and/or `!heap -s` and/or `!heap -a`
does not contain any sensitive information (or at least it's not supposed to;
in any case it would be plain-text information, which is pretty much
a gigantic list of hex numbers).

It might simply confirm that your case would not get any benefit from
the allocators available in the OS, or it might not.

So far, nothing that is actionable has been provided
(besides catalog-listing the options in the OS,
in the runtime and in the ISV implementations).

"Zilsch" <z...@ztop.not> wrote in message

news:10891482...@news.pubnix.net...

Russell Hind

Jul 7, 2004, 3:48:16 AM

Have a look at http://gee.cs.oswego.edu/dl/html/malloc.html. It has the
source code available and some documentation on the features implemented
in it.

It automatically creates per-thread heaps if the main heap is locked
when another thread requires memory, so it should perform well in
multi-threaded/multi-processor environments. I haven't tried it myself in
VC++ though.

Cheers

Russell

Brand Hunt

Jul 7, 2004, 7:42:22 PM

When I worked for Rogue Wave, we used (and exclusively licensed) an
alternative memory manager from NewCode called MTS. It was extremely
fast, scalable, easy-to-use and would even benefit single-threaded
applications. You can check 'em out at http://www.newcodeinc.com

hth,
Brand

"Zilsch" <z...@ztop.not> wrote in message news:<c%CFc.197954$207.2...@news20.bellglobal.com>...

Chris Noonan

Jul 8, 2004, 3:13:19 PM

"Zilsch" <z...@ztop.not> wrote in message news:<61sGc.17761$WM5.8...@news20.bellglobal.com>...

> Like I said in a reply to William DePalo, we use 3rd party libraries, and
> they allocate memory on their own.
> Just replacing custom operators new/new[]/delete/delete[] will not solve the
> problem
> for the DLLs.

malloc() and free() in Microsoft Visual C++ version 6 are
generally delegated to the WinAPI routines HeapAlloc() and
HeapFree(). Therefore try LeapHeap (http://www.leapheap.com)
which intercepts these calls from the executable and (just
about) all loaded DLLs, and vectors them to a lock-free memory
allocator.

Chris

(do not reply to the email address given; it is spammed out)

Martin Aupperle

Jul 12, 2004, 8:27:05 AM

On Sat, 3 Jul 2004 17:06:57 -0400, "Zilsch" <z...@ztop.not> wrote:

>Taking into account that I use 3rd party libraries, and they allocate memory
>on their own, it looks impossible or at least very difficult.
>Some of these 3rd party libs such as STLport are available in source code
>(not the most intuitive C++ code I must say) so in principle I can modify
>them to use alternative heap manager.

First, what is called "STL" is now an official part of the language (more
precisely: part of the C++ Standard Library). Unless you have a very
specific situation, you do not need STLport or other STL libraries.

Second, you do not need to tweak the source code to use a different
memory manager. All functionality that needs to allocate/free memory
provides a template argument that specifies a so-called Allocator.
Write your own Allocator class and instantiate the templates with
that, and you're done. (Yes: the difficult part is to write an
allocator that is substantially better for your algorithm and operating
system and whatever than the standard allocator.)
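
A minimal allocator in the C++11 style might look like the sketch below. `MallocAllocator` and `MallocVector` are hypothetical names, and `std::malloc`/`std::free` are stand-ins for whatever heap you actually want the containers to use. It is stateless, so all instances compare equal.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator that forwards to malloc/free; swap those two calls
// for the memory manager you want the container to use.
template <class T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() noexcept = default;

    // Lets containers rebind the allocator to their internal node types.
    template <class U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        std::size_t bytes = n * sizeof(T);
        if (bytes == 0) bytes = 1;               // malloc(0) may return null
        void* p = std::malloc(bytes);
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocators are interchangeable.
template <class T, class U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept {
    return false;
}

// Alias so call sites do not have to spell out the allocator argument.
template <class T>
using MallocVector = std::vector<T, MallocAllocator<T>>;
```

With the alias, a declaration reads `MallocVector<int> v;`. Keep in mind that a container with a non-default allocator is a different type from `std::vector<int>`, so interfaces that pass containers around have to agree on it.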

BTW, Stroustrup reports dramatic performance gains when using special
allocators for special situations. For the
standard Windows application this gain will be less dramatic, but for
compute-intensive work it may well be significant.
------------------------------------------------
Martin Aupperle
------------------------------------------------

Zilsch

Jul 13, 2004, 1:53:15 PM

> First, what is called "STL" is now official part of the language (more
> precisely: part of the C++ Standard Library)

It has been an official part of the language for about 5 years or more.
So what?

> Unless you have very specific situations, you do not need STLPort or other
> STL-LIbraries.

STLport provides some facilities not available in other implementations such
as "debug mode".
http://www.stlport.org/doc/debug_mode.html

In reality it can be simpler to deal with a single STL implementation, which
is portable, than with a bunch of slightly differing STL libs from various
vendors.

> Second, you do not need to tweak the source code to use a different
> memory manager. All functionality that needs to allocate/free memory

> [....]

Aha, and I have to modify tons of existing and working code (BTW, other
people's code) replacing std::vector<int> with unreadable std::vector<int,
MyAllocator> and so forth....

> Yes: the difficult part is to write an allocator that is substantialy
> better for your algorithm and operating
> system and whatever than the standard-allocator

We are talking about multi-threaded apps on an SMP platform (SMP/MT). In an
SMP/MT environment, outperforming MSVCRT is not difficult at all. The Visual
C++ runtime library is not really designed for SMP/MT - it has been tweaked
(in a simplistic way) to run on SMP/MT - and as a result we have pathetic
performance.

You can check MSVCRT source code to convince yourself.
Search for
_mlock( _HEAP_LOCK );

Ivan Brugiolo [MSFT]

Jul 13, 2004, 4:13:48 PM

_mlock( _HEAP_LOCK )
is used for the now obsolete and non-default SBH allocator,
and in the debug version of the C-Runtime.

As much as you can have issues with
the multi-threaded performance of the SC++L usage in your application,
it's not a relevant argument for the case.

For example, if you set a breakpoint in ntdll!RtlpLowFragHeapAlloc,
there will be no locks whatsoever taken in the heap manager code path.

"Zilsch" <z...@ztop.not> wrote in message

news:10897411...@news.pubnix.net...

Zilsch

Jul 13, 2004, 6:40:07 PM

> _mlock( _HEAP_LOCK )
> is used for the now obsolete and not default SBH allocator,
> and in the debug version of the C-Runtime.

I don't know if it is outdated or not, but the thing can still be found in
Microsoft Visual Studio .NET 2003\Vc7\crt\src\
and benchmark results are really bad for both Release and Debug.

Ivan Brugiolo [MSFT]

Jul 13, 2004, 8:33:06 PM

From what I can read in the code and cross-check in the debugger,
the default release code path does not take a lock at the C-Runtime level.
A lock MAY be taken at the RtlAllocateHeap level, indeed.

Benchmarking the debug version of the C-Runtime is not that interesting.
If you benchmark an application with FullPageHeap enabled and not enabled,
the results can be radically different.

Again, I second the point that some benchmarks
on certain SC++L usages can give bad results.

"Zilsch" <z...@ztop.not> wrote in message

news:10897584...@news.pubnix.net...

Steve Dubak

Jul 19, 2004, 8:00:23 PM

"Zilsch" <z...@ztop.not> wrote in message news:<10891306...@news.pubnix.net>...

We tried all of those as well but found that the fastest memory
manager came from Cherrystone Software Labs in the form of a product
called ESA. It was faster than all of them: 2x faster than smartheap
on multi-processor systems. I'd be hard pressed to believe that 50%
speedup with winheap; in all of the benchmarking that we've
done, we found winheap to be the slowest of the 4 (ESA, smartheap,
hoard, winheap ... in that order).

But this problem isn't just with Microsoft, it's with all of
the OS vendors. Linux too. Linux is poor in the memory management area
in terms of application memory management via malloc/free. And STL
certainly doesn't make things faster either.

Steve
st...@barragesystems.com

Zilsch

Jul 20, 2004, 11:22:50 AM

> But this problem isn't just with Microsoft, it's with all of the OS
> vendors.

That's a sorry state of affairs. The big OS vendors such as Sun and
Microsoft are selling products that have proud names like "Advanced Server",
"Enterprise", "SMP", "N-way" etc and they still didn't bother themselves to
implement a decent malloc()/free()... I guess they were busy doing more
important things - crafting cute GUIs, bloating application suites, etc.

Tobias Güntner

Jul 20, 2004, 3:05:11 PM

Zilsch wrote:
> and benchmark results are really bad for both Release and Debug.
>

This can also happen if the benchmark is started with a debugger
attached (yes, that's also possible in release builds). In this case,
HeapAlloc() automatically fills all allocated memory with 0xbaadf00d,
which obviously takes some time...
So I hope all benchmarks have been run without a debugger ;)

Ivan Brugiolo [MSFT]

Jul 20, 2004, 3:16:26 PM

You can disable this behavior with the `-hd` option in cdb/ntsd/windbg.

"Tobias Güntner" <fat...@web.de> wrote in message
news:cdjqbk$k30$06$1...@news.t-online.com...

Jack Dao

Jul 21, 2004, 7:51:47 AM

"Zilsch" <z...@ztop.not> wrote in message news:<R3sGc.17820$WM5.8...@news20.bellglobal.com>...
> > http://www.microquill.com/
>
> FYI: they ask 5 to 25K USD depending on license terms.
>
> Way too expensive and doesn't make any sense.

It only doesn't make sense if one of the following is true:

1) You don't care about the performance of your application.
2) You're developing freeware.

If you do care about the performance of your application, or perhaps
you don't but your customers do, then it makes sense to do whatever it
takes to increase the performance of said application. Whether it
be ESA from cherrystone, smartheap from microquill, winheap from
whoever, or something else, by all means do it. Not doing it is not
serving the best interests of your company and your customers.

We settled on ESA and increased our transactions from 600 transactions
per second to almost 2000 per second (we're a financial house). So who
gains? Ultimately the customer does, and by extension, the company
developing the software, and by further extension, you, the engineer,
for choosing to do something about performance.

Look into the following memory managers:

ESA http://www.cherrystonesoftware.com (Cherrystone Software Labs)
smartheap http://www.microquill.com (Microquill)
winheap http://winheap.com (bevan tech?)

We found ESA to be faster and more scalable than the others (by a long
shot, and cheaper than smartheap as well), but YMMV. Do your own
benchmarks and see which is best for your application.
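
If you want a quick first measurement before evaluating any product, a micro-benchmark along these lines is a reasonable starting point. It is a sketch, not a substitute for profiling your real workload: `alloc_worker` and `time_allocations` are hypothetical names, the block sizes and the window of 64 live blocks are arbitrary assumptions, and it exercises whatever `malloc`/`free` the program is linked against.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Each worker keeps a small window of live blocks and keeps replacing them
// with new blocks of pseudo-random size, so frees and allocations interleave.
void alloc_worker(std::size_t iterations) {
    const std::size_t kSlots = 64;                    // assumed window size
    std::vector<void*> live(kSlots, nullptr);
    unsigned state = 12345u;
    for (std::size_t i = 0; i < iterations; ++i) {
        state = state * 1664525u + 1013904223u;       // simple LCG
        std::size_t slot = (state >> 8) % kSlots;
        std::size_t size = 16 + (state >> 16) % 1024; // 16..1039 bytes
        std::free(live[slot]);
        live[slot] = std::malloc(size);
        // Touch the block so the allocation cannot be optimized away.
        if (live[slot]) static_cast<volatile char*>(live[slot])[0] = 1;
    }
    for (void* p : live) std::free(p);
}

// Runs the same workload on thread_count threads and returns elapsed seconds.
double time_allocations(unsigned thread_count, std::size_t iterations_per_thread) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t)
        workers.emplace_back(alloc_worker, iterations_per_thread);
    for (auto& w : workers) w.join();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Run it with the same per-thread iteration count for 1, 2, and more threads. With an allocator that scales, the elapsed time should stay roughly flat as threads are added, up to the number of CPUs. Use a release build and run it outside a debugger.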

But you have to care about the performance of your application to have
the motivation to do it.

Jack Dao

Joe Greer

Jul 27, 2004, 10:42:14 AM

One problem that some of these allocators have is the inability to ever
return memory back to the OS. If you are writing a server that runs 24/7,
there are times when you might hit a peak of usage and allocate a huge
amount of memory for the processing. It would be nice if there were a way
to return to a more reasonable level when the demand is low.

joe

PS You can also look at www.hoard.org for an efficient free multi-threaded
memory allocator. It gives a nice scalable improvement, but also suffers
from the inability to easily return memory to the OS.

"Jack Dao" <d...@snakebrook.com> wrote in message
news:cff1ea7c.04072...@posting.google.com...

Russell Hind

Jul 27, 2004, 12:11:16 PM

I'm pretty sure Doug Lea's malloc, which I linked earlier, is
customisable so that it will return memory to the system when 'top' gets above
a certain size.

May be worth a look if that is an issue for you.

http://gee.cs.oswego.edu/dl/html/malloc.html

Cheers

Russell

Dave Stanley

Jul 28, 2004, 5:53:45 PM

"Joe Greer" <remove.th...@remove.this.nsisoftware.com> wrote in message news:<Ominrf#cEHA...@TK2MSFTNGP09.phx.gbl>...

> One problem that some of these allocators have is the inability to ever
> return memory back to the OS. If you are writing a server that runs 24/7,
> there are times when you might hit a peak of usage and allocate a huge
> amount of memory for the processing. It would be nice if there were a way
> to return to a more reasonable level when the demand is low.
>

ESA from Cherrystone Software does not suffer from this
problem. I can sit there and watch the memory usage of the process go
up and down as memory is mapped in and out.

> joe
>
> PS You can also look at www.hoard.org for an efficient free multi-threaded
> memory allocator. It gives a nice scalable improvement, but also suffers
> from the inability to easily return memory to the OS.

Hoard also suffers from a speed problem. In the ESA vs.
Hoard benchmarks we've run, Hoard was just plain not fast
enough.

John