[Boost-users] boost::interprocess shared memory performance

Andy Wiese

Dec 27, 2008, 1:30:55 PM

to boost...@lists.boost.org

This is my first experience with using shared memory for anything more
than trivial IPC. Thanks again to Ion Gaztañaga for getting the
library working on FreeBSD. I'm developing initially on OS-X 10.5.6
with boost_trunk, but the eventual target platform is FreeBSD--I'm
just an old mac programmer used to cushy development tools.

I originally used managed_mapped_file, and got everything working, but
I was disappointed by the performance. Profiling with Shark, I found that
the application was spending a lot of time in msync, which according to
the msync(2) man page synchronizes mapped memory with the filesystem. It
makes sense to me that the interprocess containers I was creating
could have a lot of file I/O overhead, and managed_mapped_file was
probably not a good choice.

So I rewrote the code to use managed_shared_memory instead of
managed_mapped_file, thinking that it would eliminate the file I/O and
therefore be faster. However, I am surprised that it is not much
faster, and when I profile with Shark on os-x I see that it is still
spending a lot of time in msync, specifically whenever a
managed_shared_memory object is destroyed.
(in boost::interprocess::mapped_region::flush(unsigned long, unsigned
long), which is called within basic_managed_shared_memory's destructor)

Does managed_shared_memory really need to call msync?

I see that I should optimize my code to cache the
managed_shared_memory objects so that fewer create/deletes are
necessary, but this is still going to happen fairly frequently and I
wonder if this expensive msync call is necessary.

In the tradition of coder forums everywhere, someone will probably ask
what I'm trying to accomplish and whether there may be a better way.
Suggestions welcome.
I'm writing a little cgi driven database utility that queries data
stored in a filesystem directory using a simple query language. I
would like to keep indexes of the data to speed query resolution. The
utility is old-school cgi, so all its resources (such as indexes) have
to be instantiated into memory each time the cgi process is started. I
could write indexes to files, but then I incur a de/serialization
overhead that is expensive. My intention was to keep the indexes as
ready-to-use interprocess::maps in shared memory, to be used by all
invocations of the cgi. It works, but the performance of the shared
memory is poor enough that I'm not getting much increase over just
doing a brute force search through the datafiles.

All suggestions appreciated!

Andy
_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Ion Gaztañaga

Dec 28, 2008, 7:02:22 AM

to boost...@lists.boost.org

Andy Wiese wrote:
> Does managed_shared_memory really need to call msync?

I don't know. Maybe even managed_mapped_file shouldn't call flush() in
the destructor, because the OS should handle the changes made to the
memory segment, perhaps keeping the data in memory. Perhaps the question
is whether closing a file should call fflush(), and whether Interprocess
should do the same.

Anyway, it is possible that unmapping implicitly triggers an msync. Could
you try commenting out the call to flush() in mapped_region's destructor
and measure again?

And just a question: if your bottleneck is msync, does this mean that you
are creating and destroying a lot of managed_shared_memory /
managed_mapped_file instances? That does not seem very
performance-friendly, since you will be mapping and unmapping pages,
which is not a lightweight operation.

> I'm writing a little cgi driven database utility that queries data
> stored in a filesystem directory using a simple query language. I would
> like to keep indexes of the data to speed query resolution. The utility
> is old-school cgi, so all its resources (such as indexes) have to be
> instantiated into memory each time the cgi process is started. I could
> write indexes to files, but then I incur a de/serialization overhead
> that is expensive. My intention was to keep the indexes as ready-to-use
> interprocess::maps in shared memory, to be used by all invocations of
> the cgi. It works, but the performance of the shared memory is poor
> enough that I'm not getting much increase over just doing a brute force
> search through the datafiles.

OK, try commenting out the flush() call and tell me whether the
difference is appreciable.

Regards,

Ion

Zeljko Vrba

Dec 28, 2008, 8:18:19 AM

to boost...@lists.boost.org

On Sun, Dec 28, 2008 at 01:02:22PM +0100, Ion Gaztañaga wrote:
> Andy Wiese wrote:
> >Does managed_shared_memory really need to call msync?
>
> I don't know. Maybe even managed_mapped_file shouldn't call flush() in
>

Sorry to jump into the discussion. Here's a quote from the manual:

    msync() flushes changes made to the in-core copy of a file that was
    mapped into memory using mmap(2) back to disk. Without use of this
    call there is no guarantee that changes are written back before
    munmap(2) is called.
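
A minimal sketch of the sequence that manual excerpt describes, assuming a POSIX system. The helper name write_through_mapping is hypothetical and is not part of Boost.Interprocess; it only shows where an explicit msync() sits between modifying a file-backed mapping and unmapping it.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: writes `text` into a file through a shared mapping,
// flushes it explicitly with msync(MS_SYNC), then unmaps.
// Returns true only if every step succeeded.
bool write_through_mapping(const char* path, const char* text) {
    assert(path != nullptr && text != nullptr);
    const std::size_t len = std::strlen(text);
    if (len == 0) return false;  // mmap rejects zero-length mappings

    int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    if (::ftruncate(fd, static_cast<off_t>(len)) != 0) {
        ::close(fd);
        return false;
    }

    void* addr = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ::close(fd);  // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) return false;

    std::memcpy(addr, text, len);  // modify the in-core copy of the file

    // Synchronous flush: blocks until the dirty pages have been written back.
    // This is the kind of call that can dominate a profile when mappings
    // are created and destroyed frequently.
    bool ok = (::msync(addr, len, MS_SYNC) == 0);
    ok = (::munmap(addr, len) == 0) && ok;
    return ok;
}
```

Calling write_through_mapping("/tmp/demo.bin", "hello") leaves a five-byte file containing "hello". Whether the msync() could be dropped depends on what durability guarantee is wanted, which is the question being discussed here.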

>
> >like to keep indexes of the data to speed query resolution. The utility
> >is old-school cgi, so all its resources (such as indexes) have to be
> >instantiated into memory each time the cgi process is started. I could
>

Have you (Andy) considered using FastCGI? It's still regular CGI, except
that it's a long-running process instead of a one-shot process, so the
overheads are amortized over several requests.

Andy Wiese

Dec 28, 2008, 1:57:13 PM

to boost...@lists.boost.org

On Dec 28, 2008, at 7:18 AM, Zeljko Vrba wrote:

> On Sun, Dec 28, 2008 at 01:02:22PM +0100, Ion Gaztañaga wrote:
>> Andy Wiese wrote:
>>> Does managed_shared_memory really need to call msync?
>>
>> I don't know. Maybe even managed_mapped_file shouldn't call flush() in
>>
> Sorry to jump into the discussion. Here's a quote from the manual:
>
>     msync() flushes changes made to the in-core copy of a file that was
>     mapped into memory using mmap(2) back to disk. Without use of this
>     call there is no guarantee that changes are written back before
>     munmap(2) is called.
>
>>

Poking around under the hood, I discover that I may have been naive
about shared_memory_object. It appears that on OS X and FreeBSD,
shared_memory_object is implemented as a file in the filesystem. I
noticed this because shared_memory_object::remove was returning an
error condition on FreeBSD, so I looked a little deeper at ::remove on
both platforms and saw that on OS X it simply removes a file in a tmp
directory, and on FreeBSD it calls shm_unlink, whose man page says
that POSIX shared memory objects are implemented as files.

So iiuc, on my two target platforms at least, there is no fundamental
difference between managed_mapped_file and managed_shared_memory. I
should not expect to see any fundamental performance difference
between them, and the msync call in question is probably correct.
Someone please correct me if I'm mistaken.

My previous experience with shared memory IPC has been with shmget and
its family. If those are also implemented as files, it has never
mattered to me and I haven't noticed.

>>> like to keep indexes of the data to speed query resolution. The utility
>>> is old-school cgi, so all its resources (such as indexes) have to be
>>> instantiated into memory each time the cgi process is started. I could
>>
> Have you (Andy) considered to use FastCGI? It's still a regular CGI, just
> that it's a long-running process instead of a one-shot process. So the
> overheads will be amortized over several requests.
>

Yep. Eventually FastCGI will be the way to go for the CGI
implementation. In the current case, one target platform is a small
embedded webserver that isn't FastCGI enabled, but I may be able to
upgrade to Lighty or something in the future. However, I would also
like to use the same library in other processes, to access the same
data, and these processes are short lived similar to old-school CGI.
So, my hope is to make a good-enough implementation for the one-shot
scenario, and then use something like FastCGI where that is possible.

Zeljko Vrba

Dec 28, 2008, 3:39:03 PM

to boost...@lists.boost.org

On Sun, Dec 28, 2008 at 12:57:13PM -0600, Andy Wiese wrote:
>
> So iiuc, on my two target platforms at least, there is no fundamental
> difference between managed_mapped_file and managed_shared_memory. I
> should not expect to see any fundamental performance difference
> between them, and the msync call in question is probably correct.
> Someone please correct me if I'm mistaken.
>

Yes, calling msync() on file-backed storage is correct.

>
> My previous experience with shared memory IPC has been with shmget and
> its family. If those are also implemented as files, it has never
> mattered to me and I haven't noticed.
>

They are not. SYSV shared memory segments are kernel objects. They
_might_ be implemented via a special filesystem (UNIX likes to map
memory pages to "vnodes" internally), but their operations are not
forwarded to disk. In the old days, SYSV SHM was not even pageable
(though this has since changed).

>
> So, my hope is to make a good-enough implementation for the one-shot
> scenario, and then use something like FastCGI where that is possible.
>

My advice is to find out how to persuade the interprocess library to use SYSV
SHM, if at all possible.
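
For reference, a minimal sketch of what SYSV SHM looks like when used directly through shmget/shmat rather than through the Interprocess library. The function name sysv_round_trip is hypothetical, and IPC_PRIVATE is used only to keep the example self-contained; separate processes would instead agree on a key (for example one obtained from ftok) so that each can attach to the same segment.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical demo: creates a System V segment, writes `text` into it,
// detaches, attaches again and checks that the data is still there.
// No file mapping is involved, so nothing here calls msync().
bool sysv_round_trip(const char* text) {
    assert(text != nullptr);
    const std::size_t len = std::strlen(text) + 1;  // include the terminator

    // IPC_PRIVATE always creates a new segment; a shared key would be used
    // when unrelated processes need to find the same segment.
    int id = ::shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
    if (id < 0) return false;

    void* first = ::shmat(id, nullptr, 0);
    if (first == reinterpret_cast<void*>(-1)) {
        ::shmctl(id, IPC_RMID, nullptr);
        return false;
    }
    std::memcpy(first, text, len);
    ::shmdt(first);  // detaching does not destroy the segment

    bool same = false;
    void* second = ::shmat(id, nullptr, 0);
    if (second != reinterpret_cast<void*>(-1)) {
        same = std::strcmp(static_cast<const char*>(second), text) == 0;
        ::shmdt(second);
    }

    // Segments outlive processes until removed explicitly.
    ::shmctl(id, IPC_RMID, nullptr);
    return same;
}
```

The segment persists between shmdt and the second shmat, which is the property a one-shot CGI process would rely on to find an index left behind by an earlier invocation.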

Lothar Werzinger

Jan 22, 2009, 4:31:58 PM

to boost...@lists.boost.org

On Sunday 28 December 2008, Andy Wiese wrote:
> Yep. Eventually FastCGI will be the way to go for the CGI
> implementation. In the current case, one target platform is a small
> embedded webserver that isn't FastCGI enabled, but I may be able to
> upgrade to Lighty or something in the future. However, I would also
> like to use the same library in other processes, to access the same
> data, and these processes are short lived similar to old-school CGI.
> So, my hope is to make a good-enough implementation for the one-shot
> scenario, and then use something like FastCGI where that is possible.

Did you consider creating a server process that holds the indexes, and
just opening a pipe or socket to the server from your CGI processes to
execute the query?
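
A rough sketch of that arrangement, assuming POSIX sockets. All names here (Index, serve, query, demo_lookup) and the sample keys are hypothetical. To stay self-contained it uses socketpair() and fork(); a real deployment would more likely have the server listen on a named Unix-domain socket that each CGI process connects to.

```cpp
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical index: lookup key -> location of the matching record.
using Index = std::map<std::string, std::string>;

// Server side: answers lookups until the client closes its end.
// Assumes each request is short and arrives in a single read().
void serve(int fd, const Index& index) {
    char buf[256];
    for (;;) {
        ssize_t n = ::read(fd, buf, sizeof buf);
        if (n <= 0) break;  // 0 means the client closed the socket
        auto it = index.find(std::string(buf, static_cast<std::size_t>(n)));
        const std::string reply =
            (it != index.end()) ? it->second : std::string("(not found)");
        if (::write(fd, reply.data(), reply.size()) < 0) break;
    }
}

// Client side: sends one key and waits for the reply.
std::string query(int fd, const std::string& key) {
    if (key.empty()) return std::string();  // nothing to send
    if (::write(fd, key.data(), key.size()) < 0) return std::string();
    char buf[256];
    ssize_t n = ::read(fd, buf, sizeof buf);
    return n > 0 ? std::string(buf, static_cast<std::size_t>(n)) : std::string();
}

// Demo: the child plays the long-lived server that owns the index,
// the parent plays a short-lived CGI process issuing one query.
std::string demo_lookup(const std::string& key) {
    int sv[2];
    if (::socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return std::string();

    pid_t pid = ::fork();
    if (pid < 0) {
        ::close(sv[0]);
        ::close(sv[1]);
        return std::string();
    }
    if (pid == 0) {
        ::close(sv[0]);
        Index index;  // built once, kept in the server's memory
        index["alpha"] = "data/alpha.rec";
        index["beta"] = "data/beta.rec";
        serve(sv[1], index);
        ::_exit(0);
    }

    ::close(sv[1]);
    std::string reply = query(sv[0], key);
    ::close(sv[0]);  // lets the server's read() return 0 and exit
    ::waitpid(pid, nullptr, 0);
    return reply;
}
```

With this layout the index lives in ordinary process memory, so no shared-memory mapping or flushing is needed; the cost moves to one small round trip per query.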

Lothar
--
Lothar Werzinger Dipl.-Ing. Univ.
framework & platform architect
Tradescape Inc. - Enabling Efficient Digital Marketplaces
1754 Technology Drive, Suite 128
San Jose, CA 95110
web: http://www.tradescape.biz
