
Static memory allocation


Bob Masta

Jul 2, 2004, 8:23:55 AM

I am wondering about the differences between
allocating uninitialized memory in a .data? (BSS)
section versus Windows GlobalAlloc. The former
seems better in every way I can think of, but my
understanding of protected mode memory management
is lacking. In particular, allocating several large
blocks via BSS makes it appear they require
contiguous memory, whereas GlobalAlloc can
use any memory it can find. If that were true,
there might be some (unusual) fragmented memory
circumstance where the BSS program couldn't load,
but the GlobalAlloc program could. However,
this wouldn't be an issue if the BSS memory
could be pieced together from separate chunks
via the memory manager at load time. Does it
work that way?
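
For concreteness, here is roughly what I mean by the two approaches
(a C-style sketch with made-up names; in my case the static buffer
actually lives in a .data? section):

#include <windows.h>

#define BUF_BYTES (8 * 1024 * 1024)

/* Approach 1: uninitialized static storage. This ends up in BSS
   (a .data? section in MASM terms), so no bytes are stored in the
   EXE, but the whole buffer is part of the load image. */
static char static_buf[BUF_BYTES];

/* Approach 2: ask the OS for the same amount at startup. */
static char *dyn_buf;

void init_buffers(void)
{
    dyn_buf = GlobalAlloc(GMEM_FIXED, BUF_BYTES);
    /* ... both buffers then live for the life of the program ... */
}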

Or is there *any* reason to prefer GlobalAlloc
for memory that is used for the life of the program?

Thanks!

Bob Masta
dqatechATdaqartaDOTcom

D A Q A R T A
Data AcQuisition And Real-Time Analysis
www.daqarta.com

Robert Redelmeier

Jul 2, 2004, 10:46:10 AM

Bob Masta <NoS...@daqarta.com> wrote:
> I am wondering about the differences between
> allocating uninitialized memory in a .data? (BSS)
> section versus Windows GlobalAlloc. The former

Sorry, I'm not an MS-Windows expert.

> seems better in every way I can think of, but my
> understanding of protected mode memory management is lacking.
> In particular, allocating several large blocks via BSS
> makes it appear they require contiguous memory, whereas
> GlobalAlloc can use any memory it can find.

The OS will scrape up all the VM it needs, and the page table
mechanism will make it appear contiguous. On a good OS, every such
page is initially mapped to the shared zero page and faults on first
write, at which point a page of real RAM is allocated.
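
A toy illustration of the demand-zero behaviour (Linux-flavoured C,
untested sketch):

/* Large uninitialized array -- the linker puts it in .bss.  The
   kernel maps the whole thing to the shared zero page; real RAM is
   committed one page at a time, on first write. */
char big[64 * 1024 * 1024];

int main(void)
{
    big[0]    = 1;     /* faults in one page of real RAM */
    big[4096] = 1;     /* faults in a second page        */
    return big[0];     /* the remaining ~64 MB never need RAM */
}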

malloc() memory can be free()d, but so can .data/BSS memory, via
brk(). Linux _does not_ have a malloc() syscall; it is a creation
of libc.

-- Robert

Matt Taylor

Jul 3, 2004, 3:34:26 AM

"Bob Masta" <NoS...@daqarta.com> wrote in message
news:40e54d1...@news.itd.umich.edu...

> I am wondering about the differences between
> allocating uninitialized memory in a .data? (BSS)
> section versus Windows GlobalAlloc. The former
> seems better in every way I can think of, but my
> understanding of protected mode memory management
> is lacking. In particular, allocating several large
> blocks via BSS makes it appear they require
> contiguous memory, whereas GlobalAlloc can
> use any memory it can find. If that were true,
> there might be some (unusual) fragmented memory
> circumstance where the BSS program couldn't load,

This seems unlikely unless you are dealing with a very large program. By
very large I am referring to something larger than 1.5 GB when mapped into
virtual memory.

When a process is created, at least on NT systems, a file mapping object
is allocated, backed by the executable file. Initially the process is
just an empty PEB structure with roughly 2 GB of free user address space
and that associated file mapping object. The executable is, I believe,
the first thing that gets mapped into the newly created process; if not,
it is mapped right after the critical system DLLs. At this point the
address space is unfragmented, so if a .exe with a large BSS cannot
load, switching to GlobalAlloc would simply turn the load failure into a
runtime failure.

> but the GlobalAlloc program could. However,
> this wouldn't be an issue if the BSS memory
> could be pieced together from separate chunks
> via the memory manager at load time. Does it
> work that way?

PE sections must be contiguous. PE files themselves must be contiguous in
virtual memory, though under certain circumstances you can create gaps
between sections.

> Or is there *any* reason to prefer GlobalAlloc
> for memory that is used for the life of the program?

None whatsoever. It is already highly unlikely that a PE file would be
unable to load in 2 GB of address space. Collisions are solved by using
relocation data (/FIXED:NO switch to the MS linker).

For that matter, I wouldn't use GlobalAlloc at all. GlobalAlloc(x) is nearly
synonymous with HeapAlloc(GetProcessHeap(), 0, x), the difference being that
the former has extra overhead and headache. It is purely a carry-over from
the Win16 days, and the moveable/discardable flags are completely ignored by
the virtual memory system.
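
In other words, something along these lines covers the same ground
with less baggage (sketch; the wrapper names are made up):

#include <windows.h>

/* Near-equivalent of GlobalAlloc(GMEM_FIXED | GMEM_ZEROINIT, size),
   minus the Win16 baggage. */
void *alloc_buffer(SIZE_T size)
{
    return HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, size);
}

void free_buffer(void *p)
{
    HeapFree(GetProcessHeap(), 0, p);
}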

-Matt

Bob Masta

Jul 4, 2004, 2:15:21 PM

Matt, thanks for your reply. But I'm still unclear on this.
I understand that PE sections are contiguous in virtual
memory, but I am wondering about physical memory.
If physical memory is fragmented (for whatever reason),
can the OS piece together multiple fragments and make
them appear as contiguous virtual memory at load time?

My thought was that if the OS required contiguous physical
memory at load time, then there might be a benefit to
making the load image smaller and later allocating
memory that had no restrictions on its physical location
with GlobalAlloc.

If the OS can use fragmented physical memory at load
time to make a contiguous image, then I understand
that there would be no point to later GlobalAllocs.

Matt Taylor

Jul 4, 2004, 4:39:20 PM

"Bob Masta" <NoS...@daqarta.com> wrote in message
news:40e7f41...@news.itd.umich.edu...
<snip>

> Matt, thanks for your reply. But I'm still unclear on this.
> I understand that PE sections are contiguous in virtual
> memory, but I am wondering about physical memory.
> If physical memory is fragmented (for whatever reason),
> can the OS piece together multiple fragments and make
> them appear as contiguous virtual memory at load time?

The whole point of virtual memory is to make physical fragmentation a
non-issue.

> My thought was that if the OS required contiguous physical
> memory at load time, then there might be a benefit to
> making the load image smaller and later allocating
> memory that had no restrictions on its physical location
> with GlobalAlloc.

There used to be no concept of allocating physical memory in Win32. That
changed when Intel introduced PSE-36, and Windows responded by adding some
extensions that would allow applications to allocate physical memory and map
it whenever desired. The core Win32 model is still utterly divorced
from physical memory itself.
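
For what it's worth, those extensions are what's now exposed as the
AWE calls; roughly this shape (sketch only -- the process needs
SeLockMemoryPrivilege, and the privilege setup and all error handling
are omitted):

#include <windows.h>

void awe_sketch(void)
{
    SYSTEM_INFO si;
    ULONG_PTR npages;
    ULONG_PTR *pfns;
    void *region;

    GetSystemInfo(&si);
    npages = (4 * 1024 * 1024) / si.dwPageSize;     /* 4 MB worth */
    pfns   = HeapAlloc(GetProcessHeap(), 0, npages * sizeof(*pfns));

    /* Grab physical page frames... */
    AllocateUserPhysicalPages(GetCurrentProcess(), &npages, pfns);

    /* ...reserve a window of address space for them... */
    region = VirtualAlloc(NULL, npages * si.dwPageSize,
                          MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);

    /* ...and map the frames into it; they can be remapped at will. */
    MapUserPhysicalPages(region, npages, pfns);
}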

> If the OS can use fragmented physical memory at load
> time to make a contiguous image, then I understand
> that there would be no point to later GlobalAllocs.

The .bss section is not stored in the PE file. It should be allocated
through NtAllocateVirtualMemory. This would impose no requirement of
physical contiguity.
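
(From user mode that is just a VirtualAlloc; a reserve-then-commit
sketch, with no claim about the loader's exact internal path:)

#include <windows.h>

#define BSS_BYTES (16 * 1024 * 1024)

char *reserve_and_commit(void)
{
    /* Reserve 16 MB of address space -- contiguous virtually,
       no physical frames involved yet. */
    char *p = VirtualAlloc(NULL, BSS_BYTES, MEM_RESERVE, PAGE_NOACCESS);
    if (p == NULL)
        return NULL;

    /* Commit it; the pager backs the pages with whatever physical
       frames it has, no physical contiguity required. */
    return VirtualAlloc(p, BSS_BYTES, MEM_COMMIT, PAGE_READWRITE);
}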

For sections that are stored in the PE file, I don't believe there is any
requirement of physical contiguity either. File-mapping objects used for
executables are backed by the file system. Since all of the physical memory
management is done by the file system, the requirement of physical
contiguity depends solely on the file system's capabilities. Given the
implications, it seems unlikely that the file system would be incapable of
managing non-contiguous files.

-Matt

Randall Hyde

Jul 4, 2004, 6:20:41 PM

"Matt Taylor" <spam...@crayne.org> wrote in message
news:AKZFc.9246$Bv.9...@twister.tampabay.rr.com...

> "Bob Masta" <NoS...@daqarta.com> wrote in message
> news:40e7f41...@news.itd.umich.edu...
>
> The .bss section is not stored in the PE file. It should be allocated
> through NtAllocateVirtualMemory. This would impose no requirement of
> physical contiguity.
>

Well, technically the .bss section need not appear in a PE file, but
the truth is that many compilers (particularly Microsoft's) tend to
allocate .bss variables in the .data section for performance reasons
(mapping sections to VM pages in memory, allowing the disk space to be
used as VM space). IIRC, there are linker/compiler options available
on most compilers to do away with this translation, but the default is
to merge the .bss and .data sections (treating the .bss objects as
though you'd initialized them with zero). As this is a compiler issue,
YMMV. But definitely don't count on .bss variables *not* appearing in
the PE file.
Cheers,
Randy Hyde

David J. Craig

Jul 4, 2004, 6:49:42 PM

I cannot believe the amount of misinformation being imparted. Physical
memory is something that doesn't exist to any program running under the
Win32 subsystem of Windows NT/2000/XP/2003. Forget about it and don't even
use the words "physical memory" again. If you are writing drivers to
support hardware that does DMA, you might need to use those words, but not
in this case.

Have you read the book "Inside Windows NT" by Helen Custer? Read about
"virtual address space" and related topics. The answer to the only
question asked in the last response is "YES". That, of course, means
the remainder of your post rests on an incorrect premise. Multi-tasking
and multi-processing add needs for memory that cannot be allocated as
part of the program image. How can you realistically determine how many
buffers you will need to process requests if you can't be sure how many
processors are going to be in the system you are executing upon? You
also can't statically determine how many requests will need to be
processed in a fixed amount of time.

From your signature it appears that real-time data acquisition is your
area of expertise. Just as you can't process incoming data with hardware
that can only handle one percent of the data being received, the same is
true of Windows applications. Unlike your area, you cannot just size the
'job' to be done; you have to be far more flexible. Consider the old
USAF SAGE air defense system that IBM built in the 1950s. Each part of
the US had a fairly fixed number of radars feeding information to one
computer built from first-generation tube hardware. If the radars had
been upgraded to cover more airspace without corresponding hardware and
software upgrades, there would have been choke points that caused data
to be lost. Since the radar units had their data recorded onto the
computer's mass storage without any CPU involvement, there was a fixed
amount of data per unit of time that could be handled.

The larger point I was making is that each of us, by specializing in
one area or another, has ideas that don't carry over to other areas. As
I mentioned above, some DMA devices require physical memory to be
contiguous, but referencing memory from an application doesn't even
require that the data being referenced be located in physical memory
when the reference is attempted. The OS will page in the appropriate
page and then restart the instruction. This also applies to code pages,
so your executable program may have only one page allocated for code,
especially if memory is over-committed. Try running Windows XP Pro and
Microsoft Word on a 64 MB system and see how slow it runs. I recommend
that XP Pro have 1 GB to keep paging to a minimum, but Windows does like
to page out some stuff if it gets bored from lack of use. Paging out
code is relatively inexpensive compared to data, because the OS only has
to mark the page as not present, while with data it must copy the page
to the page file on disk before it can be marked as paged out.

Also read the Intel manuals on page tables, which are where linear
(virtual) addresses get converted to physical addresses. The translation
is done by the hardware, as enabled by the OS. The other option is to
just not worry about it and ignore the world of 'real-time'. Windows is
NOT real-time and cannot be made real-time by a programmer writing
drivers or applications. If you want to slow your system down, just
force your hard and optical drives on the IDE bus to use PIO instead of
Ultra DMA. You too can have an XT (8088)-class system if you also
install 128 MB or less of physical memory and use PIO for mass storage.

"Bob Masta" <NoS...@daqarta.com> wrote in message

news:40e7f41...@news.itd.umich.edu...

arargh4...@now.at.arargh.com

Jul 4, 2004, 7:15:45 PM

What I've usually seen is that the .bss section appears in the PE file
with a virtual start & length, and with a file start & length of zero.

So the only part of it in the file is the section header, without any
contents.
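
If you want to check for yourself, a quick & dirty dump along these
lines works (C sketch, assumes a well-formed 32-bit PE, minimal error
checking):

#include <stdio.h>
#include <windows.h>

int main(int argc, char **argv)
{
    static unsigned char buf[4096];  /* headers fit here for most files */
    IMAGE_DOS_HEADER *dos;
    IMAGE_NT_HEADERS *nt;
    IMAGE_SECTION_HEADER *sec;
    unsigned i;
    FILE *f;

    if (argc < 2 || (f = fopen(argv[1], "rb")) == NULL)
        return 1;
    fread(buf, 1, sizeof(buf), f);
    fclose(f);

    dos = (IMAGE_DOS_HEADER *)buf;
    nt  = (IMAGE_NT_HEADERS *)(buf + dos->e_lfanew);
    sec = IMAGE_FIRST_SECTION(nt);

    for (i = 0; i < nt->FileHeader.NumberOfSections; i++)
        printf("%-8.8s  VirtSize=%08lX  RawSize=%08lX  RawPtr=%08lX\n",
               (char *)sec[i].Name, sec[i].Misc.VirtualSize,
               sec[i].SizeOfRawData, sec[i].PointerToRawData);
    return 0;
}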

--
Arargh407 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html

To reply by email, remove the garbage from the reply address.

Matt Taylor

Jul 5, 2004, 2:35:07 AM

"Randall Hyde" <spam...@crayne.org> wrote in message
news:Wk%Fc.5477$oD3...@newsread1.news.pas.earthlink.net...

>
> "Matt Taylor" <spam...@crayne.org> wrote in message
> news:AKZFc.9246$Bv.9...@twister.tampabay.rr.com...
> > "Bob Masta" <NoS...@daqarta.com> wrote in message
> > news:40e7f41...@news.itd.umich.edu...
> >
> > The .bss section is not stored in the PE file. It should be allocated
> > through NtAllocateVirtualMemory. This would impose no requirement of
> > physical contiguity.
> >
>
> Well, technically the .bss section need not appear in a PE file, but
> the truth is that many compilers (particularly Microsoft's) tend to
> allocate .bss variables in the .data section for performance reasons
> (mapping sections to VM pages in memory, allowing the disk space to
> be used as VM space). IIRC, there are linker/compiler options
> available on most compilers to do away with this translation, but the
> default is to merge the .bss and .data sections (treating the .bss
> objects as though you'd initialized them with zero).
<snip>

Disk space can't completely substitute for the page file. That only
works while a page has not been written to, so that it can be discarded
from memory without having to flush its contents to disk. The pages are
marked copy-on-write, so as soon as the variables are initialized, the
pages can no longer be discarded. Pages that are never modified could
be discarded anyway, since the backing store is already up to date, so
merging .bss into .data really doesn't buy any performance. Instead it
will actually degrade performance -- lots of 0's will unnecessarily get
loaded from disk.

-Matt

f0dder

Jul 5, 2004, 5:04:40 AM

<snip>

> Disk space can't completely substitute for the page file. It only
> works when the page has not been written to, and then it can be
> discarded from memory without having to flush the contents to disk.
> The pages are marked copy-on-write, so as soon as variables are
> initialized, the pages can no longer be discarded. Pages that are not
> modified can be discarded since the pagefile is already up-to-date,
> so really this wouldn't have any boost in performance. Instead it
> will actually degrade performance -- lots of 0's will unnecessarily
> get loaded from disk.
>

BSS generally works by having a .bss section with a rawsize of 0
(ie, no data stored in the file), or by having the virtual size of
your .data section much larger than the rawsize - so there are no 0's
being loaded from disk unless you're using bad
compilers/assemblers/linkers.

I can't think of any reason why using BSS storage instead of
dynamically allocated data would be A Bad Thing, at least not on
win32. Of course this implies that your storage demands are static,
but the OP probably knows his demands better than I do so I'm not
going to do any speculation on that part :)


hutch--

Jul 5, 2004, 8:16:00 AM

Bob,

Just a comment on the old GlobalAlloc() API call: much of its
documented capability is out of date, except for the GMEM_FIXED
version that allocates fixed memory, and that form performs well on
every system I have tested it with. I have never been able to get a
timing on its allocation speed, and it can comfortably allocate over
1 GB of memory if the box has enough installed.

You are probably safer using VirtualAlloc() on later systems, but it
is a bit more complicated to use. With an assembler like MASM,
uninitialised data does not directly increase the executable file
size, and it is useful for many things, but dynamically allocated
memory is a lot more flexible: you can dump it when it's not needed,
whereas uninitialised memory is with you for the life of the app.

Regards,

hutch at movsd dot com

Matt Taylor

Jul 5, 2004, 1:30:06 PM

"hutch--" <spam...@crayne.org> wrote in message
news:af910ce4.04070...@posting.google.com...

> Bob,
>
> Just a comment on the old GlobalAlloc() API call, much of its
> documented capacity is out of date except for the GMEM_FIXED version
> that allocates fixed memory and here it performs well on every system
> I have tested it with. I have never been able to get a timing on its
> allocation speed and it can comfortably allocate over 1 gig of memory
> if the box has enough installed.
<snip>

That's because it calls HeapAlloc AKA RtlAllocateHeap and allocates from the
process heap. As far as I can tell, most of the old GMEM flags are ignored.
There really is no sense in calling GlobalAlloc when RtlAllocateHeap is more
flexible, easier to work with, and minutely faster.

-Matt

Relvinian

Jul 5, 2004, 4:40:59 PM

The only time I ever use GlobalAlloc() is when I need GMEM_MOVEABLE so
that I have a HANDLE to the memory to pass to API calls that require
the handle.

Other than that, if I'm allocating less than 10 MB of memory I use
HeapAlloc(); if more, I use VirtualAlloc().
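
The classic example of needing the handle is the clipboard (sketch,
most error handling trimmed):

#include <windows.h>
#include <string.h>

/* SetClipboardData wants a moveable global handle; the clipboard
   takes ownership of it on success. */
int put_text_on_clipboard(HWND owner, const char *text)
{
    SIZE_T len = strlen(text) + 1;
    HGLOBAL h = GlobalAlloc(GMEM_MOVEABLE, len);
    void *p;

    if (h == NULL || (p = GlobalLock(h)) == NULL)
        return 0;
    memcpy(p, text, len);
    GlobalUnlock(h);

    if (!OpenClipboard(owner))
        return 0;
    EmptyClipboard();
    SetClipboardData(CF_TEXT, h);
    CloseClipboard();
    return 1;
}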

Relvinian

"Matt Taylor" <spam...@crayne.org> wrote in message
news:L3gGc.9684$Bv.11...@twister.tampabay.rr.com...

hutch--

Jul 5, 2004, 8:29:37 PM

Matt,

I agree that the GMEM_MOVEABLE range is old tech from 16-bit Windows,
but apart from the fixed-memory version being fast enough, it's
simpler to use, which is a factor for people who are new to API-style
memory allocation. For the price of a very simple macro,

alloc MACRO bcount
invoke GlobalAlloc,GMEM_FIXED or GMEM_ZEROINIT,bcount
EXITM <eax>
ENDM

You allocate memory with code as simple as,

mov hMem, alloc(4096)

Use it, then ditch it afterwards with GlobalFree(). Windows has a
number of memory allocation strategies that suit a range of purposes,
but this method has simplicity going for it and it performs well for a
fixed memory requirement.

Randall Hyde

Jul 5, 2004, 9:59:45 PM

"f0dder" <spam...@crayne.org> wrote in message
news:40e914df$0$181$edfa...@dtext01.news.tele.dk...
> <snip>

>
>
> I can't think of any reason why using BSS storage instead of
> dynamically allocated data would be A Bad Thing, at least not on
> win32. Of course this implies that your storage demands are static,
> but the OP probably knows his demands better than I do so I'm not
> going to do any speculation on that part :)

Well, maybe you should explain that to Microsoft.
Cause their compilers (and linkers) seem to do exactly that by default.
There are command-line options to change this, but by default...
Cheers,
Randy Hyde

Randall Hyde

Jul 5, 2004, 10:00:44 PM

<arargh4...@NOW.AT.arargh.com> wrote in message
news:6l1he0tk1l5vp5801...@4ax.com...

> On Sun, 4 Jul 2004 22:20:41 +0000 (UTC), "Randall Hyde"
> <spam...@crayne.org> wrote:
>
>
> What I've usually seen is that the .bss section appears in the PE file
> with a virtual start & length, and with a file start & length of Zero.

The PE format allows this, but compilers don't have to use it.

>
> So the only part in the file is the section header, without any
> contents.
>

Only if your assembler/compiler does it that way. By default, most
Microsoft compilers/assemblers (the ones I've used, anyway) do not.
Cheers,
Randy Hyde

f0dder

Jul 11, 2004, 2:58:37 PM

>> I can't think of any reason why using BSS storage instead of
>> dynamically allocated data would be A Bad Thing, at least not on
>> win32. Of course this implies that your storage demands are static,
>> but the OP probably knows his demands better than I do so I'm not
>> going to do any speculation on that part :)
>
> Well, maybe you should explain that to Microsoft.
> Cause their compilers (and linkers) seem to do exactly that by
> default. There are command-line options to change this, but by
> default... Cheers,
> Randy Hyde
>
Umm, sorry, but I don't really understand what you're saying :(

Anyway, the Microsoft C++ compiler and linker implement static
storage (ie, a thing like "int bifbuff[1024*1024*50]" ) by
having .data VirtualSize larger than the PhysicalSize, and of
course having a correct SizeOfImage as well. In fact if you
compile a (somewhat silly :-) C snippet with a volatile buffer
(volatile to avoid the compiler removing it even though it
isn't used), and don't link the C runtimes in, you will get a
PE file with a .data section that has no physical content
(ie, PhysSize and RawDataOffset = 0). Tested with CL v13 and
link 7.10, but I'm pretty sure the old toolset did it the
same way.
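
Roughly this sort of thing (reconstructed from memory, so the exact
build flags may be off):

/* cl /c bss.c
   link bss.obj /entry:start /subsystem:console /nodefaultlib

   The ~200 MB buffer never hits the file: .data gets a large
   VirtualSize but a raw size of 0. */
volatile int bifbuff[1024 * 1024 * 50];

int __stdcall start(void)
{
    bifbuff[0] = 1;      /* touch it so it isn't thrown away */
    return 0;
}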

I've seen other compilers produce a .BSS section, can't remember
if it was Borland or Watcom or both. There isn't much reason to
do this, and there isn't much reason not to, either - having the
static data as part of .data is probably a little more efficient
than reserving another section, but the savings (if any - I haven't
looked much into the PE loader) can probably not even be measured.

If you were referring to static vs. dynamic memory allocation,
well, I'd say use whatever fits you. If I needed, say, a
320x200x32bpp offscreen buffer for the lifetime of my application,
static storage wouldn't be bad. Most of the time I personally
prefer/need dynamic solutions, though... except sometimes when
doing quick hacks :)

Cheers

