[Boost-users] shared memory overhead


Jason Sachs

May 29, 2008, 3:42:16 PM
to boost...@lists.boost.org
I have a bunch of data I am trying to store in shared memory which is basically a vector of bytes + a vector of some metadata; both vectors, in general, grow as time goes on at an unpredictable rate, until someone stops using them. The vector lengths are extremely variable; total amount of shared memory could be as small as 10K and as large as several hundred megabytes. I will not know beforehand the amount needed.

From what I understand about Boost shared memory segments, they are not infinitely growable, so I think I have to organize my data into chunks of a more reasonable size, and allocate a series of separate shared memory segments with a few chunks in each. (there are other application-specific reasons for me to do this anyway) So I am trying to figure out what a reasonable size of shared memory is to allocate. (probably in the 64K - 1MB range but I'm not sure)

Does anyone have any information about the overhead of a shared memory segment? Just an order of magnitude estimate, e.g. M bytes fixed + N bytes per object allocated + total size of names. Are M,N on the order of 10 bytes or 10Kbytes or what?

Zeljko Vrba

May 29, 2008, 4:21:11 PM
to boost...@lists.boost.org
On Thu, May 29, 2008 at 03:42:16PM -0400, Jason Sachs wrote:
>
> reasonable size is of shared memory to allocate. (probably in the 64K - 1MB
> range but I'm not sure)
>
From the low-level POV:

Modern systems use on demand allocation. I.e. you can allocate a (f.ex.) 32 MB
SHM chunk, but the actual resource usage (RAM) will correspond to what you
actually use. For example:

0   1M                 32M
[****|..................]
  |           |
  |           +- unused part
  |
  +- used part of the SHM segment

As long as your program does not touch (neither reads nor writes) the unused part,
the actual physical memory usage will be 1M + small amount for page tables (worst
case: 4kB of page tables for 4MB of virtual address space). This is at least how
SYSV SHM works on Solaris 10 (look up DISM - dynamic intimate shared memory); I
would expect it to work in the same way on new linux kernels too. I'm not
sufficiently acquainted with NT kernel to be able to comment on it.

Bottom line: allocate as few, and as large, chunks as possible; modern VM systems
should be able to handle it gracefully.

===

If you don't know how much memory you will need in total, how do you handle out of
memory situations?

Alternatively, why not use files instead?

_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Jason Sachs

May 30, 2008, 9:56:45 AM
to boost...@lists.boost.org
> Modern systems use on demand allocation.  I.e. you can allocate a (f.ex.) 32 MB
> SHM chunk, but the actual resource usage (RAM) will correspond to what you
> actually use.  For example:
>
> 0   1M                 32M
> [****|..................]
>   |           |
>   |           +- unused part
>   |
>   +- used part of the SHM segment
>
> As long as your program does not touch (neither reads nor writes) the unused part,
> the actual physical memory usage will be 1M + small amount for page tables (worst
> case: 4kB of page tables for 4MB of virtual address space).  This is at least how
> SYSV SHM works on Solaris 10 (look up DISM - dynamic intimate shared memory); I
> would expect it to work in the same way on new linux kernels too.  I'm not
> sufficiently acquainted with NT kernel to be able to comment on it.

Interesting. I hadn't thought about that.

I tried a test program (running on Windows XP), and had a number of separate processes each allocate a new managed_windows_shared_memory object (with a different name for each process) with size 2^29 bytes (=512MB). I'm not exactly sure what resources it allocates; using TaskInfo to view resource usage, each process's "virtual KB" usage goes up by 512MB, but its "working set KB" usage doesn't increase until I actually allocate memory within the shared memory segment.

Sounds good, but the 6th one of these failed and I got a warning saying my system was low on virtual memory. So it sounds like there is a 4GB total system limit for WinXP even for just reserving virtual address space -- which seems silly since each process should have its own address space and therefore as long as I don't actually allocate the memory, and each process's reserved address space doesn't exceed 2^32 (or 2^31 or whatever the per-process limit is), I should be able to reserve an unlimited amount of total address space. No can do. :(

So strategy #1 of being profligate in choosing shared memory segment size fails on WinXP; there's a significant resource cost even if you don't actually allocate any memory. Drat.

Jason Sachs

May 30, 2008, 10:02:51 AM
to boost...@lists.boost.org
>Alternatively, why not use files instead?
I am using an HDF5 file to log data as a physical store, but it doesn't handle concurrent access + I am looking for a fast way (in addition to the log file) to share some acquired data between processes.

Ion Gaztañaga

May 30, 2008, 10:10:38 AM
to Boost User List
Jason Sachs wrote:
> From what I understand about Boost shared memory segments, they are not
> infinitely growable, so I think I have to organize my data into chunks
> of a more reasonable size, and allocate a series of separate shared
> memory segments with a few chunks in each. (there are other
> application-specific reasons for me to do this anyway) So I am trying to
> figure out what is a reasonable size is of shared memory to allocate.
> (probably in the 64K - 1MB range but I'm not sure)
>
> Does anyone have any information about the overhead of a shared memory
> segment? Just an order of magnitude estimate, e.g. M bytes fixed + N
> bytes per object allocated + total size of names. Are M,N on the order
> of 10 bytes or 10Kbytes or what?

If you mean the overhead added by named allocations and the indexes, use
the managed_shared_memory::get_free_memory() function to see how many bytes
you have left after creating the managed segment, and again after creating
an empty vector.

Bear in mind that if you fill those vectors alternately, the reallocations
the vectors need might not be able to use all of the free memory (one vector
can end up right in the middle of the segment, and the other vector then
can't use the memory before and after it). If you need to minimize the
shared memory needed, pre-calculate all the data and then dump it into
shared memory in one go.

Regards,

Ion

Zeljko Vrba

May 30, 2008, 11:01:55 AM
to boost...@lists.boost.org
On Fri, May 30, 2008 at 09:56:45AM -0400, Jason Sachs wrote:
>
> usage, each process's "virtual KB" usage goes up by 512MB, but its "working
> set KB" usage doesn't increase until I actually allocate memory within the
> shared memory segment.
>
Excellent :-) Working set is the actual amount of RAM used, while virtual
memory is just the size of the virtual address space which might not have
yet entered into the working set, or might not have been "committed" at all.
(I believe that "commit" is the NT's technical term for first-time faulting
in a page and thus also reserving physical RAM.)

Is there a separate column for "committed memory"? Virtual is, well, just
reserved; committed is actually allocated; working set is what is currently
in RAM (usually less than committed).

Again, I'm not an NT expert -- please cross-check the above paragraph(s) with
other sources.

>
> Sounds good, but the 6th one of these failed and I got a warning saying my
> system was low on virtual memory. So it sounds like there is a 4GB total
>

I'd rather say that you're low on swap. Each SHM segment needs a corresponding
amount of swap space which can be used as backing store, should you decide to
really use all of the reserved memory. I.e. when the total working set size
of all programs exceeds the total amount of physical memory (minus kernel
memory), some of the pages need to be swapped out to backing store -- in this
case swap.

Also note that the swap space is also just _reserved_ -- the kernel needs to
ensure that it's there before it hands you out the SHM segment, but it will
not be used unless you become short on physical memory. I.e. a mere swap
space _reservation_ will not slow down your system or program.

Try increasing the amount of swap space (so that it's [64MB * # of programs]
larger[*] than the [SHM segment size * # of programs]), repeat the experiment
and see what happens. 6 programs x 512MB, so you should be safe at 3GB +
amount of physical RAM + extra ~1GB for everything else on the system.

[*] Rule of thumb. Every process needs additional VM for stack, data, code,
etc.

>
> which seems silly since each process should have its own address space and
>

it does.

>
> the per-process limit is), I should be able to reserve an unlimited amount
> of total address space. No can do. :(
>

what do you mean by "total address space"? total address space == RAM + swap
(and that is, I guess, what NT calls "virtual memory"), so it is not unlimited.
it is very reasonable that the kernel refuses to overcommit memory (i.e. does
not allow you to reserve more than the "total address space"); simulation of
truly unlimited memory quickly leads to nasty situations (read about linux's
out-of-memory killer).

>
> So strategy #1 of being profligate in choosing shared memory segment size
> fails on WinXP; there's a significant resource cost even if you don't
> actually allocate any memory. Drat.
>

Well, the only resource cost that I can see is disk space reserved for swap.
Given today's disks, I don't see that being a problem if it buys you a simpler
programming model. (And to make it clear, just in case: this is my comment on
your particular application; I do *not* recommend this approach as a general
programming practice!)

Zeljko Vrba

May 30, 2008, 11:06:20 AM
to boost...@lists.boost.org
Could you maybe use a raw memory-mapped file instead, and convert it to
HDF5 off-line? You would use memory mapping to share data, and ordinary
file write calls to append data to the file. (I understood that you're
working on 32 bits... then you'd have to do some "windowing" over a file
that is possibly > 2GB.)

This would at least solve your swap problem because the file itself *is*
the backing store for its own mapping (well, if the mapping is shared so
that modifications don't create anonymous COW pages which in turn need
swap) - no additional swap needed.

Ray Burkholder

May 30, 2008, 11:28:49 AM
to boost...@lists.boost.org
>
> On Fri, May 30, 2008 at 10:02:51AM -0400, Jason Sachs wrote:
> > >Alternatively, why not use files instead?
> >
> > I am using an HDF5 file to log data as a physical store, but it
> doesn't
> > handle concurrent access + I am looking for a fast way (in addition
> to the
> > log file) to share some acquired data between processes.
> >

I haven't followed this whole thread, but I seem to recall that HDF5 supports
MPI with Parallel HDF5.
http://www.hdfgroup.org/HDF5/PHDF5/

Or does that not solve your requirements?


--
Scanned for viruses and dangerous content at
http://www.oneunified.net and is believed to be clean.

Jason Sachs

May 30, 2008, 12:25:25 PM
to boost...@lists.boost.org
>I havn't followed this whole thread, but I seem to recall that HDF5
>supports MPI with Parallel HDF5.
>http://www.hdfgroup.org/HDF5/PHDF5/
>Or does that not solve your requirements?

Alas, Parallel HDF5 != concurrent file access. As I understand it,
parallel HDF5 = cooperating threads within a process writing in
parallel, and I need one process to write & others to monitor/display
the data.

>Could you maybe use a raw memory-mapped file instead,
>and convert it to HDF5 off-line?

well, technically yes, but for robustness reasons I want to decouple
the HDF5 logging from the shared memory logging. I'm very happy with
the file format's storage efficiency and robustness + have not had to
worry about file corruption (though oddly enough, the "official" HDF5
editor from the HDF5 maintainers has caused corruption in a few logs
when I added some attributes after the fact), so would like to
maintain independent paths: the HDF5 file as a (possibly) permanent
record, and my shared memory structure, which could possibly become
corrupt if I have one of those impossible-to-reproduce bugs -- but I
don't care since I have the log file.

I'm also dealing with a very wide range of storage situations; most
are going to be consecutive packets of data that are written to the
file + left there, but in some cases I may actually delete portions of
previously-written data that has been deemed discardable, in order to
make room for a long test run... more complicated than a vector that
grows with time, or a circular buffer. I've defined structures within
the HDF5 file which handle this fine; in the shared memory I was going
to do essentially the same thing & have a boost::interprocess::list<>
or map<> of moderately-sized data chunks (64K-256K) that I can
keep/discard.

But back to the topic at hand -- let me restate my problem:

Suppose you have N processes where each process i=0,1,...N-1 is going
to need a pool of related memory with a maximum usage of sz[i] bytes.
This size sz[i] is not known beforehand but is guaranteed less than
some maximum M; it has a mean expected value of m where m is much
smaller than M. From a programmer's standpoint, the best way to handle
this would be to reserve a single shared memory segment and ask
Boost::interprocess to make the segment size equal to M. If I do this
then my resource usage in the page file (or on disk if I use a
memory-mapped file) is N*M which is much higher than I need. (I
figured out the source of this: windows_shared_memory pre-commits
space in the page file equal to the requested size)

So what's a reasonable way to architect shared memory use to support
this kind of demand? I guess maybe I could use a vector of shared
memory segments, starting with something like 256KB and increasing
this number as I need to add additional segments. It just seems like a
pain to have to maintain separate memory segments and have to remember
which items live where.

Just for numbers, I may have an occasional log going on that needs to
be in the 512MB range (though most of the time it will be in the
50-500K range, occasionally several megabytes), and I can have 4-6
of these going on at once (though usually just one or two). On my own
computer I have increased my max swap file size from 3GB to 7GB (so
the hard limit is somewhat adjustable), though it didn't take effect
until I restarted my PC. I'm going to be using my programs on several
computers + it seems silly to have to go to this extent.

Zeljko Vrba

May 30, 2008, 1:53:34 PM
to boost...@lists.boost.org
On Fri, May 30, 2008 at 12:25:25PM -0400, Jason Sachs wrote:
>
> >Could you maybe use a raw memory-mapped file instead,
> >and convert it to HDF5 off-line?
>
> well, technically yes, but for robustness reasons I want to decouple
>
ok, i just wanted to know more about your problem.

>
> Boost::interprocess to make the segment size equal to M. If I do this
> then my resource usage in the page file (or on disk if I use a
> memory-mapped file) is N*M which is much higher than I need. (I
>

Which is much higher than you need on the average. I fail to see why
having a 10GB, or even a 20GB, swap-file is a problem for you. Too much
of a hassle to configure it on all workstations?

>
> of these going on at once (though usually just one or two). On my own
> computer I have increased my max swap file size from 3GB to 7GB (so
> the hard limit is somewhat adjustable), though it didn't take effect
> until I restarted my PC. I'm going to be using my programs on several
> computers + it seems silly to have to go to this extent.
>

Ok, and you'll run your job on a machine with e.g. 1GB of swap[*], and this
particular instance will need 4GB of swap. What will happen when the
allocation fails? Note that growing the SHM segment in small chunks will
not help you with insufficient virtual memory, so you might as well allocate
M*N at once and exit immediately if the memory is not available.

[*] I'm using "swap" somewhat imprecisely to refer to total virtual memory
(RAM + swap).

Next-best solution: use binary search to find the maximum size you can
allocate and use that instead of M*N.

Neither way is particularly friendly towards other processes on the machine
(I assumed that you were running the jobs on dedicated machines), but is
least painful. Which is least expensive: your time spent developing multi-
chunk SHM management or just allocating a big chunk and reconfiguring all
computers *once*?

(I'm sorry, I'm very pragmatic, and I don't seem to have enough info to really
understand why you're making such a fuss over the swap size issue. I'm afraid
I can't offer you any further suggestions, since I consider this a non-problem
unless you have further constraints.)

Jason Sachs

May 30, 2008, 2:25:10 PM
to boost...@lists.boost.org
>> of these going on at once (though usually just one or two). On my own
>> computer I have increased my max swap file size from 3GB to 7GB (so
>> the hard limit is somewhat adjustable), though it didn't take effect
>> until I restarted my PC. I'm going to be using my programs on several
>> computers + it seems silly to have to go to this extent.
>
>Ok, and you'll run your job on a machine with e.g. 1GB of swap[*], and this
>particular instance will need 4GB of swap. What will happen when the
>allocation fails? Note that growing the SHM segment in small chunks will
>not help you with insufficient virtual memory, so you might as well allocate
>M*N at once and exit immediately if the memory is not available.

I wouldn't allocate M*N at once. Each process could start/stop at
random times (this is triggered by users other than me who would start
multiple logs as necessary) so N changes as a function of time.

What I will probably do is just use one memory segment of size M[i]
memory per process #i, where M[i] has a default value M0, say 64MB,
that I can preset to a larger value if I know I'm going to have a long
duration log.

It's not a huge deal to increase the swap file (even in an old
computer, which most of our lab pc's are, I could add a 2nd hard drive
if I needed), & is almost certainly the most expedient solution for
the time being.

>(I'm sorry, I'm very pragmatic, and I don't seem to have enough info to really
>understand why you're making such a fuss over the swap size issue. I'm afraid
>I can't offer you any further suggestions, since I consider this a non-problem
>unless you have further constraints.)

Not a fuss, just trying to be aware of all the problems. This
discussion has been helpful. I have a career where my resources are
spread thinly among a wide range of things, & it's much more expensive
for me to design quickly for 90% success + then refactor 1-2yrs later
when absolutely necessary, than it is to spend the extra effort up
front to design for 99% success, understand where the 1% failure lies,
and move on to other things knowing I'm far less likely to have to
revisit. Especially when 90% success rates have a tendency to be
overestimated as there are customers who forget to mention certain
design requirements ;)

Zeljko Vrba

May 30, 2008, 2:35:41 PM
to boost...@lists.boost.org
On Fri, May 30, 2008 at 02:25:10PM -0400, Jason Sachs wrote:
>
> Not a fuss, just trying to be aware of all the problems. This
>
Another thing that just occurred to me: allocate huge SHM segment but
don't commit it immediately; let the os allocate the swap on-demand.
Now, when you run out of swap, the OS should signal the condition
somehow (structured exception handling? something else?). You can
catch the condition and take appropriate action, whatever that might be.

Jason Sachs

May 30, 2008, 3:43:21 PM
to boost...@lists.boost.org
>Another thing that just occurred to me: allocate huge SHM segment but
>don't commit it immediately; let the os allocate the swap on-demand.
>Now, when you run out of swap, the OS should signal the condition
>somehow (structured exception handling? something else?). You can
>catch the condition and take appropriate action, whatever that might be.

See ticket #1975: http://svn.boost.org/trac/boost/ticket/1975

Ray Burkholder

May 30, 2008, 6:36:01 PM
to boost...@lists.boost.org

>
> >I havn't followed this whole thread, but I seem to recall that HDF5
> >supports MPI with Parallel HDF5.
> >http://www.hdfgroup.org/HDF5/PHDF5/
> >Or does that not solve your requirements?
>
> Alas, Parallel HDF5 != concurrent file access. As I
> understand it, parallel HDF5 = cooperating threads within a
> process writing in parallel, and I need one process to write
> & others to monitor/display the data.
>

Coincidentally, on the HDF5 mailing list today there were a few possibly
related comments (one user indicates that they do simultaneous writes, which
may or may not be similar to what is needed here):

================
Hello;
i've got an implementation which uses HL API and i run multiple writers and
possibly one reader. The writers go to the same os file but different hdf
files.

in use case scenario, the reader and writer are operational on same hdf
asset at the same time. this reader is also written in a manner that if it
reaches EOF, then it'll wait sometime and then proceed reading.

all this is for win32/vc++... not sure if the same applies to *nix. and it
works fine.

the only thing i needed to do was to enable multi-threading building of
HDF5 and HL. i think there is a link on how to do that... i believe one
need only define the symbol "H5_HAVE_THREADSAFE" and uncomment some
commented out lines in H5pubconf.h.

not sure that answers your questions... and... hope it helps.

regards,
Sheshadri

Jason Sachs wrote:
> I was wondering where I could find some more technical details about
> concurrent reading/writing.
>
> The FAQ discusses it briefly
> (http://www.hdfgroup.org/hdf5-quest.html#grdwt):
>
> <excerpt>
> It is possible for multiple processes to read an HDF5 file when it is
> being written to, and still read correct data. (The following steps
> should be followed, EVEN IF the dataset that is being written to is
> different than the datasets that are read.)
>
> Here's what needs to be done:
>
> * Call H5Fflush() from the writing process.
>
> * The writing process _must_ wait until either a copy of the file
> is made for the reading process, or the reading process is done
> accessing the file (so that more data isn't written to the file,
> giving the reader an inconsistent view of the file's state).
>
> * The reading process _must_ open the file (it cannot have the
> file open before the writing process flushes its information, or it
> runs the risk of having its data cached in memory being incorrect with
> respect to the state of the file) and read whatever information it wants.
>
> * The reading process must close the file.
>
> * The writing process may now proceed to write more data to the
> file.
>
> There must also be some mechanism for the writing process to signal
> the reading process that the file is ready for reading and some way
> for the reading process to signal the writing process that the file
> may be written to again.
> </excerpt>
>
> Could someone elaborate in a more technical manner? e.g. SWMR
> (single-writer multiple-reader) can occur if the following is true
> (not sure if I have this correct; I use "process" rather than
> "threads" here & am not sure if HDF5 in-memory caches have thread
affinity):
>
> 1. At all times the file is in one of the following states:
> (a) unmodified
> (b) modified (written to, but not flushed)
>
> 2. In the unmodified state, zero or more processes may have the file
> open. No process may write to the data.
>
> 3. In the modified state, exactly one process may have the file open.
> This is the process that can write to it.
>
> 4. A successful transition from the unmodified state -> modified state
> takes place when exactly one process has the file open and begins
> writing to it.
>
> 5. A successful transition from the modified state -> unmodified state
> takes place when the process that has written to the file completes a
> successful call to H5Fflush().
>
> The facilities to ensure that only one process has the file open for
> (4) above are not provided by the HDF5 library and must be provided by
> OS-specific facilities e.g. mutexes/semaphores/messaging/etc.
>
==================

