VxWorks memory partitions

Jim Caprio

unread,

Feb 18, 1999, 3:00:00 AM2/18/99

to

BSP: mv2300 (604e PPC)

I'm interested in some general information on how to optimize memory
usage in VxWorks. We're running into a memory fragmentation problem,
which is caused in part to multiple threads concurrently allocating
memory from the same memory partition (the default heap). Due to
VxWorks "first fit" (vs. best fit) allocation algorithm, the result is
inefficient placement of objects in memory, and resultant fragmentation.

Several candidate solutions have been proposed:
(a) Assign (pre-allocated?) memory partitions on a per task basis
(b) Create memory buffers for homogenous object size
(c) Modify the VxWorks memory allocation algorithm to make better use of
available space

To your knowledge, are any of these options possible in VxWorks? A
brief explanation of how to implement the required modifications, and
some motivation behind the "best" approach would be greatly
appreciated. Thanks for the help.

--
=================================
Jim Caprio
Moore Research Center, Inc.
(716)773-0333
FAX : (716)773-0462
email: cap...@research.moore.com
=================================

David Laight

unread,

Feb 19, 1999, 3:00:00 AM2/19/99

to

IMHO the memory allocater in vxworks (memPartAlignedAlloc does the work)
seems designed to fragment memory as much as humanly possible.
As well as using a 'first fit' algorythm it compounds things by using a LIFO
free list.
The effect is that if you free a large piece of memory the next allocate will be
from (the end of) that piece.
(If the free list were in address order, then memory would not get as fragmented
- but would still get fragmented.)

I think this behaviour could be considerd a bug...

David

> Subject: VxWorks memory partitions

> Submitted-by: Jim Caprio <cap...@research.moore.com>

>
> BSP: mv2300 (604e PPC)
>
> I'm interested in some general information on how to optimize memory
> usage in VxWorks. We're running into a memory fragmentation problem,
> which is caused in part to multiple threads concurrently allocating
> memory from the same memory partition (the default heap). Due to
> VxWorks "first fit" (vs. best fit) allocation algorithm, the result is
> inefficient placement of objects in memory, and resultant fragmentation.
>
> Several candidate solutions have been proposed:
> (a) Assign (pre-allocated?) memory partitions on a per task basis
> (b) Create memory buffers for homogenous object size
> (c) Modify the VxWorks memory allocation algorithm to make better use of
> available space
>
> To your knowledge, are any of these options possible in VxWorks? A
> brief explanation of how to implement the required modifications, and
> some motivation behind the "best" approach would be greatly
> appreciated. Thanks for the help.
>

> Jim Caprio

----------------------------------------------------------------
David Laight email: d...@tadpole.co.uk
Tadpole Technology plc phone: +44 1223 278 256
Cambridge, UK fax: +44 1223 278 201

Leonid Rosenboim

unread,

Feb 19, 1999, 3:00:00 AM2/19/99

to David Laight

This behaviour of VxWorks memory allocator is know for a long time,
increasing number of customers find this not acceptable:
In the past, mot hard real-time systems where designed such that all
object allocations are done during system initialization, and very
little or no allocations of memory at all happen during the critical operations
time.

This is changing with the advent of OO techniques, hence the allocation
mechanism which used to be satisfactory, is not so anymore.

WRS seems to ignore this trend, rightly claiming the the primary competitor
(i.e. pSOS) is no better...

There is a solution from SeaWeed systems, who sell a replacement library for
memPartLib which does much more stuff.
Some customers have devised some smart techniques which create sub-partitions
for specific block sizes and add a layer on top of memAlloc() and malloc()
which chooses a partition according to block size, and that in many cases solves
the issue too.

HTH,
Leonid

Jim Caprio

unread,

Feb 19, 1999, 3:00:00 AM2/19/99

to

From the number and tone of the responses my question has generated, it seems that
there's a strong current of opinion (if not an outright consensus) that VxWorks
does not implement memory allocation efficiently. This is forcing my project to
evaluate a $6000 third party product for a feature which should be optimized for a
so-called real-time operating system.

I agree with David's comments below: this behavior should be considered a bug.
Does anyone know why Wind River doesn't just redesign their implementation?

> Submitted-by owner-vxwexplo-process Fri Feb 19 01:26:02 1999
> Submitted-by: David Laight <d...@tadpole.co.uk>

Hwa Jin Bae

unread,

Feb 19, 1999, 3:00:00 AM2/19/99

to Leonid Rosenboim

As Leonid said this memory fragmentation problem has been with VxWorks
for a long time, and it is in general prevalent in most non VM oriented
systems.

The strategy for avoiding problems in this area has always been: know
what you are allocating and freeing. By careful coding you can avoid
memory fragmentation. This is the view that was expressed to me by the
original author of VxWorks memLib when I complained about memLib
fragmentation issue over 10 years ago. In fact, malloc/free, etc. are
considered a useful add-on, not as a core facility to be used in hard
realtime systems. My views obviously differ on this.

The problem is that this issue can be eased with simple modifications to
the memLib itself without adding costly penalties. And another issue is
that this is a *real* problem. I have seen numerous projects that
duplicate needless hours of coding to avoid this problem, or fix this
problem in their own way.

One of the solutions I have come up with over the years is to replace
the memLib with another implementation. A good alogorithm that seems to
behave much better than VxWorks memLib is described in
http://g.oswego.edu/dl/html/malloc.html which is a public domain
implementation. This has been ported in VxWorks and has served well
for at least one project that did massive malloc/free's (due to ported
Unix code behavior that could not be prevented).

Unfortunately, the way the port is done requires VxWorks source code,
this is due to many problems that has to do with link time errors --
they do not make it easy to replace the memLib! So this port is not
availble PD. Perhaps someday.

The partitioning scheme, as Leonid suggests, solves some problems and is
quite useful as well. Unfortunately, VxWorks memory partitions, once
created cannot be destroyed. This makes it less desirable. Plus it
requires a lot of changes in ported Unix code base and is not as easy to
do.

--
Hwa Jin Bae
PSO Systems Inc
mailto:h...@pso.com
http://www.pso.com

Keith Buchanan

unread,

Feb 19, 1999, 3:00:00 AM2/19/99

to

> >From the number and tone of the responses my question has
> generated, it seems that there's a strong current of opinion
> (if not an outright consensus) that VxWorks
> does not implement memory allocation efficiently. This is
> forcing my project to evaluate a $6000 third party product for
> a feature which should be optimized for a
> so-called real-time operating system.

I vehemently disagree, it is a tried and true, hard-core, deadline-driven,
real-time OS. I have delivered many life and death types of systems where
VxWorks has performed in a deterministic manner w/o fail. If you can't
solve your allocation problems with existing tools, the design is suspect
not the operating system.

> I agree with David's comments below: this behavior should be
> considered a bug.
> Does anyone know why Wind River doesn't just redesign their
> implementation?

Geez, this is laying it on a little thick don't you think? I tend to
believe any designer who allocates and frees memory willy-nilly in a
real-time application is a bug ;o).

You cannot get something for nothing. The more elaborate the memory
allocation algorithms, the slower the functions that allocate and free
memory and the larger the footprint of the OS. Allocate your resources
when your application starts up. Use memory partitions for applications
that must often allocate and free data structures. Don't dynamically
create and kill tasks for no good purpose. Memory managment routines are
slow in any API. Why on earth would someone call them so often that
you fragment memory? If you have that kind of requirement, the buffers
should be managed more efficiently. If you want garbage collection,
use SmallTalk running under Windoze.

Wind River had to compromise between complex alogrithms with a
large footprint and the current implementation. In addition, many of
the choices for algorithms are not deterministic. The functionality
provided is reasonable and my opinion of it more typical.

adios

------------------------------------
/\ Thomas Keith Buchanan
/**\ Principal Engineer
/****\ SPARTA, Inc.
/****/.. 205 East IH 30
/****/.... Rockwall, TX 75087
.\****\__...
...\**\ /.... Voice: 972 772 4487
....\**\/\.... FAX: 972 722 3201
....\****\.. PCS: 972 672 9657
....\****/ WWW: www.sparta.com
.../***/
./***/ SPARTA
\**/ ======
\/ Pride In Performance

Fred Roeber

unread,

Feb 19, 1999, 3:00:00 AM2/19/99

to

On Fri, 19 Feb 1999, Jim Caprio <cap...@research.moore.com> wrote:

> From the number and tone of the responses my question has generated,
> it seems that there's a strong current of opinion (if not an outright
> consensus) that VxWorks does not implement memory allocation efficiently.
> This is forcing my project to evaluate a $6000 third party product for
> a feature which should be optimized for a so-called real-time operating
> system.
>

> I agree with David's comments below: this behavior should be considered a bug.
> Does anyone know why Wind River doesn't just redesign their implementation?

Jim was commenting about a "bug" in the VxWorks memory allocation algoritm.
Having analyzed memory allocation algorithms in some depth and having gone
through the VxWorks memory allocation code, I feel qualified to make a
comment.

There are numerous memory allocation algorithms floating around with
significantly different execution and memory efficiencies. The Wind River
one is not elegant but it is well implemented and fairly widely used.
It isn't optimized for use in a time critical system, however. The normal
use of memory allocation in real-time systems is to allocate all dynamic
memory areas during initialization or as part of a background task. If
you have an application that allocates and frees a lot of memory during
operation then that isn't the type of application that the VxWorks
allocator is designed for. There are some alternatives you might consider.

If you don't care about efficient memory use then one of the most time
efficient allocators is a fixed block size allocator. There is code for
such an algorithm in the VxWorks archive (called poolLib if I remember).
If you need to handle different size memory blocks then an algorithm
we have had good success with is the "binary buddy algorithm". A variant
of it is used in various GNU programs so is easily available. We found
that it could be written in such a fashion that it was usable by multiple
tasks concurrently with a deterministic maximum run time (the VxWorks
allocator is single threaded and non deterministic).

Getting an allocation algorithm to work correctly and robustly is not
trivial, however. You may be better off buying one unless you have time
to check out algorithms in the literature (some good sources are "Data
Structures & Their Algorithms" by Lewis and Denenberg and one of the Knuth
algorithm books; can't remember which volume as someone has borrowed it).
I have heard good things about allocation code for VxWorks from Seaweed
Systems but have never used it (query in...@seaweed.com).

I guess my main point is that the VxWorks algorithm may not be optimal but,
then, I don't think there is any one algorithm that is optimal for all
applications. Real time systems that do significant dynamic memory allocation
during operation need careful design to avoid fragmentation problems. Most
applications are well advised to avoid the problem by preallocating memory.

Well, that's my $.02. Fred

| Fred J Roeber, BBN Systems & Technologies |
| 4 John Clarke Road Middletown, RI 02842-5202 |
| fro...@bbn.com 401-848-3548 |
| TraceMaker product manager -> www.tracemaker.bbn.com |

Hwa Jin Bae

unread,

Feb 19, 1999, 3:00:00 AM2/19/99

to Fred Roeber

One thing that needs to be considered, but often forgotten, is the
fact that a vast majority of VxWorks applications and products
out there really are *not* hard real-time required applications
at all. There are a lot of people that use VxWorks simply as
an embedded OS. And many applications use ported Unix software
as basis in lots of products. As VxWorks is used more in information
appliance type of devices, a good malloc/free is essential.
Besides, having a good malloc/free that does not fragment as badly
is not compromising realtime performance at all because if your
realtime requirement is strict, you wouldn't be malloc/free'ing
all the time anyway. The point is that you should be able to
choose which way to go: realtime preallocated memory usage or
generic embedded systems that have legacy code that expects
decent malloc/free that won't chop up memory into useless little
bits after a day of running.

I should add that in all my years of doing VxWorks consulting
for the past 10 years, I have seen less than 10% of the projects
that really do use VxWorks as hard realtime system. The rest
simply use VxWorks because it is a good embeddable OS, and it
runs on a wide variety of platforms.

--
Hwa Jin Bae
PSO Systems Inc

h...@pso.com
http://www.pso.com

Hwa Jin Bae

unread,

Feb 19, 1999, 3:00:00 AM2/19/99

to Fred Roeber

Georg Feil

unread,

Feb 23, 1999, 3:00:00 AM2/23/99

to

Jim Caprio <cap...@research.moore.com> writes:
>I'm interested in some general information on how to optimize memory
>usage in VxWorks. We're running into a memory fragmentation problem,
>which is caused in part to multiple threads concurrently allocating
>memory from the same memory partition (the default heap). Due to
>VxWorks "first fit" (vs. best fit) allocation algorithm, the result is
>inefficient placement of objects in memory, and resultant fragmentation.

The inefficiencies in the usual VxWorks mem allocator are well known...
Another thing which compounds fragmentation is how dosFs works, so if
you use dosFs in certain ways (e.g. create many files) this may be a factor.
(The replacement dosFs on the WRS web site may help, but can also make
things worse).

>Several candidate solutions have been proposed:
>(a) Assign (pre-allocated?) memory partitions on a per task basis
>(b) Create memory buffers for homogenous object size
>(c) Modify the VxWorks memory allocation algorithm to make better use of
>available space

All of these sound like they could help specific problems. The approach
I've taken is to replace the stock malloc() and free() routines with
modified routines... e.g. to do (b) the revised malloc() would use a
different partition for certain allocation sizes. These are "quick and
dirty" ways to reduce fragmentation, rewriting or buying a replacement
allocator would of course be a more complete solution.

Georg
--
Georg Feil
| http://www.sgl.crestech.ca/
Space Geodynamics Laboratory | Email: ge...@sgl.crestech.ca
CRESTech | Phone: (416) 665-5458
4850 Keele St./North York/Ont/Canada/M3J 3K1 | Fax: (416) 665-1815

Fred Roeber

unread,

Feb 24, 1999, 3:00:00 AM2/24/99

to

David Laight wrote:

> Fred Roeber and Keith Buchanan are not incorrect in saying that a true 'real
> time' application will not use malloc/free during its normal operation. However
> it is also true that many of use are using VxWorks as a (small) O/S for a single
> application system. We don't necessarily care exactly how long malloc/free take
> (within reason) but do wish that the system will not fail to allocate memory
> just because it has fragmented the free list.

I have to apologize. I had wondered what all the flax about
fragmentation of the
WRS algorithm was since I had done a lot of work studying/measuring the
algorithm
for a major project I worked on a number of years back. I had found that
it had
reasonable fragmentation results compared to other good algorithms. I
went back
and checked my notes. As is so true in software, the devil is in the
details. On
this project we had VxWorks source. I had found that memPartAlloc used a
first fit
algorithm with a LIFO free list. That was known to be far worse in
performance than
using a FIFO free list so I had changed one line of code so that a LIFO
free list
was used. Of course this pretty much invalidates my comments about how
good the
normal VxWorks algorithm is. I had suggested this change to WRS but
didn't have
strong enough backup.

Since then, some good research on different memory algorithms has
backed up this result showing that a first fit with a LIFO free list
behaves
about the same as using a best fit algorithm but has better performance
and
is less susceptible to real bad performance under certain deviant but
possible
allocation scenarios. The LIFO algorithm that the real VxWorks malloc
uses does
show up as being significantly worse. One of the best comparisons of
allocators
I have seen is available from
ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps.
Its about 80 pages but has a very comprehensive review of the work on
memory
allocation done prior to 1995. It seems to be widely refered to these
days. The
review focuses on fragmentation rather than performance and totally
avoids parallel
algorithms however.

> Fred said:
> > The Wind River one is not elegant but it is well implemented and fairly widely
> > used.
>

> My observation is that the code paths through malloc/free could be halved in
> size without much effort.

The VxWorks code is, in general, extremelly well written I have found.
Most of
the code could be shortened (a lot) by removing things like error
checking and
performance tracking capabilities. I have always felt that the extra
code was
good from a robustness and portability perspective. Of course, if speed
is of
the essence then one could always get a more optimized algorithm (like
the
good one Hwa Jin Bae suggested or the one from Seaweed).

<< cut parts to get to discussion about cases where performance is a
real issue >>

> > If you don't care about efficient memory use then one of the most time
> > efficient allocators is a fixed block size allocator. There is code for
> > such an algorithm in the VxWorks archive (called poolLib if I remember).
>

> Allocators that issue fixed size blocks can be quite efficient in cpu time and
> even memory use. A quite good example of this is the allocater in UnixWare 7.

A lot of the systems we develop do have one aspect that results in a lot
of
dynamic allocation. We tend to use message passing based architectures.
Messages
are allocated and freed frequently in these designs. Where performance
is
important, we have used special allocators for these message pools. This
is an
area where many people use fixed block allocators to get good
performance. I
suggested:

> > If you need to handle different size memory blocks then an algorithm
> > we have had good success with is the "binary buddy algorithm". A variant
> > of it is used in various GNU programs so is easily available. We found
> > that it could be written in such a fashion that it was usable by multiple
> > tasks concurrently with a deterministic maximum run time (the VxWorks
> > allocator is single threaded and non deterministic).
>

> The 'binary buddy' algorithm is horrid. While fine in principle, the difficulty
> in determining whether the 'buddy' is free requires additional data structures
> which can easily get out of step, attempts to speed up the code just make it
> more complicated (and take longer and be buggy). Getting it efficient for
> multi-cpu systems is almost impossible (all of the control structures must be
> locked for every access).

This is the only response where I really disagree with David. From a
fragmentation perspective, the binary buddy algorithm is much worse than
things like first fit with LIFO free list or best fit. However, we use
it in the specialized case where we need very high performance involving
lots of message allocation and freeing across concurrent tasks
(sometimes in multiprocessors). As I have said, most people use fixed
block lists in such cases. These are hard to configure and tune,
however, to make sure you have the right size lists for the messages
your system uses. Binary
buddy gives you much more flexibility from this perspective. As for the
last comment about making it efficient, yes, it is hard. The thing is,
while the code is tricky to get right, it isn't very long when complete.
The big win with binary buddy is that it can be highly parallelized. The
version we have uses very little locking and gets great performance.
This is a specialized area; most studies (like the one I referred to
earlier) don't go into parellel algorithms. We should probably write
ours up.

I'd like to wrap up by repeating my apology. Unlike my original claim,
I'm now agreeing that the Wind River malloc has poor fragmentation
performance. However, with a one line change to how freed blocks are
linked onto the free block list, it can achieve very decent
fragmentation performance (changing the second parameter of the
dllInsert call that updates the free list from a NULL value to the
list's tail pointer). Sorry for the mistake. Fred

--

Fred Roeber

unread,

Feb 24, 1999, 3:00:00 AM2/24/99

to

Boy, Some days are just bad days. This morning I posted the message
that is quoted at below. It has a *** BIG *** mistake in it. I totally
switched references to FIFO and LIFO. I wish I could claim dyslexia
but... The correct version of the message should have the word FIFO
everywhere it says LIFO and vice versa.

Message from this morning with LIFO and FIFO reversed:

> I had wondered what all the flax about
> fragmentation of the WRS algorithm was since I had done a lot of work
> studying/measuring the algorithm for a major project I worked on a
> number of years back. I had found that it had reasonable fragmentation
> results compared to other good algorithms. I went back and checked my
> notes. As is so true in software, the devil is in the details. On this
> project we had VxWorks source. I had found that memPartAlloc used a
> first fit algorithm with a LIFO free list. That was known to be far
> worse in performance than using a FIFO free list so I had changed one
> line of code so that a LIFO free list was used. Of course this pretty
> much invalidates my comments about how good the normal VxWorks
> algorithm is. I had suggested this change to WRS but didn't have strong
> enough backup.
>
> Since then, some good research on different memory algorithms has
> backed up this result showing that a first fit with a LIFO free list
> behaves about the same as using a best fit algorithm but has better
> performance and is less susceptible to real bad performance under
> certain deviant but possible allocation scenarios. The LIFO algorithm
> that the real VxWorks malloc uses does show up as being significantly
> worse. One of the best comparisons of allocators I have seen is
> available from ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps. Its
> about 80 pages but has a very comprehensive review of the work on
> memory allocation done prior to 1995. It seems to be widely refered to
> these days.

From page 68 of the paper I give the FTP address for above the real
quote is:
To our great surprise, we found that best fit, address-ordered
first fit, and FIFO-ordered first fit all performed extremely
well -- and nearly identically as well.
...
For first fit and next fit, we note that the LIFO free list
order performed far worse than the FIFO free list order.

VxWorks does by default use the "wrong" type of list ordering (LIFO).
I had changed it to FIFO for my testing. VxWorks should be fixed.

From my notes, there is code in the VxWorks memPartFree routine
that adds blocks to the free list that does:

pHdr->free = TRUE; /* add new free block */
dllInsert (&partId->freeList, (DL_NODE *) NULL, HDR_TO_NODE (pHdr));

I have disassembled the 5.3 VxWorks code and it still makes the same
call for free list updates. The change I had made to this line of
code when I did performance testing back in 1992 was:

pHdr->free = TRUE; /* add new free block */
dllAdd (&partId->freeList, HDR_TO_NODE (pHdr));

Given the current interest in efficiency, this could be done slightly
faster as:

dllInsert (&partId->freeList, partId->freeList.tail, HDR_TO_NODE (pHdr));

I REALLY WISH I hadn't caused whatever confusion results by my
getting things backwards in my original post. It comes from having
my reference papers and books at home and my old (sort of sparse)
project notes at work. A poor excuse to be sure. I'll be more careful
in the future.

Fred

David Laight

unread,

Feb 25, 1999, 3:00:00 AM2/25/99

to

> Boy, Some days are just bad days. This morning I posted the message
> that is quoted at below. It has a *** BIG *** mistake in it. I totally
> switched references to FIFO and LIFO. I wish I could claim dyslexia
> but... The correct version of the message should have the word FIFO
> everywhere it says LIFO and vice versa.

Actually I think the first reference was right!

As Fred said:

A LIFO free list (as used by VxWorks) is horrid
A FIFO free list, and an address ordered free list are much better.

I'm surprised that the 'best fit' algorithm wasn't found to be significantly
better that the FIFO free list. Obviously it benefits from an ordered (eg
binary tree) free list.

Simply changing the vxworks code to be FIFO will have the side effect of
continuously cutting bits of the largest free fragment (as the list cycles).
(Fixable by making the allocate code always remove the item it will use from the
free list.)

I've just done some measurements on a real VxWorks Java app comparing best fit v
LIFO. I've sorted the free list into size order so the amount of fragmentation
is clear. With the 'best fit' the code ran on a different board with only 16Mb
of memory.

I'm not sure, but I suspect the difference in bytes allocated is because the
'best fit' code is more likely to give the caller a slightly oversize area
(because the part that would be freed is too small).

David

Best fit test
-------------

FREE LIST:
num addr size
--- ---------- ----------
6 0x10b41bc 16
2 0x10df9c0 20
3 0x10abe18 20
4 0x1081c64 20
5 0x10abeb0 20
20 0x10b4434 24
1 0x111fdb0 32
18 0x10df728 40
21 0x10b4534 40
19 0x771f18 48
23 0x111a274 204
7 0x10b45bc 908
8 0x110974c 920
9 0x10b9598 928
22 0x11193f8 1292
11 0x10b2d64 1352
16 0x1107604 3776
14 0x1127d78 5056
10 0x111a958 8072
13 0x1121df8 8392
12 0x11109cc 17708
15 0x10f5cbc 36168
17 0x1136740 49043640

SUMMARY:
status bytes blocks avg block max block
------ --------- -------- ---------- ----------
current
free 49128696 23 2136030 49043640
alloc 12340316 11403 1082 -
cumulative
alloc 22283572 45495 489 -

The 'average block' figures are skewed by the 8Mb Java heap. Excluding it
gives 346 for current and 305 for cumulative average size. The average size
of freed items is only 291.
The implication is that there are a lot of small allocates.

---------------------------------------------------------------------------

standard vxworks
----------------

FREE LIST:
num addr size
--- ---------- ----------
41 0x34a6670 16
42 0x348e144 16
43 0x34777b4 16
45 0x3475dac 16
48 0x348f390 16
56 0x3504e4c 16
69 0x34b329c 16
75 0x34d7c70 16
24 0x34a7448 20
25 0x347786c 20
26 0x3f60e80 20
44 0x3f60dfc 20
54 0x3505044 20
55 0x349aeac 20
71 0x34adfd0 20
74 0x3f60f94 20
97 0x3f68da4 20
93 0x3f75214 28
1 0x343a194 32
30 0x348df44 32
58 0x349c4d4 32
68 0x3f79340 32
70 0x34b3310 32
73 0x34e17a0 32
77 0x34e2e54 36
60 0x34a6188 40
65 0x34ae358 40
91 0x3f88ed4 52
40 0x348f1a0 64
7 0x34894f8 68
47 0x34a61f8 72
35 0x34921f8 76
4 0x345f9c4 80
9 0x3492504 80
31 0x3491e2c 80
59 0x3497654 80
53 0x349aed8 84
84 0x3f88f88 84
46 0x349234c 88
96 0x3f68b8c 96
18 0x34402b0 112
67 0x34e1698 112
57 0x3498d28 116
6 0x345f748 128
95 0x3f68c68 136
49 0x348db04 148
99 0x3f66c60 152
39 0x3492408 156
63 0x34a0258 192
36 0x34a793c 208
92 0x3f60c9c 240
50 0x34920d4 272
81 0x34fef70 284
52 0x3504e74 328
80 0x34f25e0 336
2 0x348e374 340
76 0x34d7af0 344
32 0x3491eac 392
98 0x3f66d84 392
101 0x3f66a20 400
8 0x34a7188 440
82 0x3529618 440
34 0x34a7ff4 532
16 0x34318b0 544
20 0x344b474 544
79 0x34f4860 544
85 0x3511648 556
103 0x3fa2a00 652
14 0x3429968 672
83 0x35149a8 728
100 0x3f653ec 860
37 0x34a753c 864
38 0x343b93c 916
94 0x3f5f7b4 924
12 0x341fcf8 1040
61 0x349a80c 1112
62 0x34a172c 1140
22 0x344f21c 1152
88 0x352e0c0 1152
78 0x34e11d0 1156
27 0x3435ce0 1220
51 0x34918b4 1288
104 0x3db6f64 1464
33 0x34a6a60 1668
90 0x3de9f3c 1680
10 0x345f040 1764
89 0x3ddbc1c 1812
11 0x342c6b8 1848
28 0x343af3c 1848
102 0x3ebf2b4 2308
72 0x34b2280 2412
5 0x342ddf4 2444
21 0x343e698 2512
86 0x351cd98 2868
3 0x3438534 3196
66 0x34af3f0 3224
17 0x342fa08 3232
13 0x341d900 3312
29 0x344b6e4 3312
87 0x35259e8 3360
23 0x3447168 3420
64 0x34a5130 4132
15 0x3427440 4384
19 0x3450e64 55856
105 0x560e80 48992184

SUMMARY:
status bytes blocks avg block max block
------ --------- -------- ---------- ----------
current
free 49129152 105 467896 48992184
alloc 12339880 11403 1082 -
cumulative
alloc 22053824 43012 512 -

Fred Roeber

unread,

Feb 25, 1999, 3:00:00 AM2/25/99

to

David Laight wrote:

> > Boy, Some days are just bad days. This morning I posted the message
> > that is quoted at below. It has a *** BIG *** mistake in it. I totally
> > switched references to FIFO and LIFO. I wish I could claim dyslexia
> > but... The correct version of the message should have the word FIFO
> > everywhere it says LIFO and vice versa.
>
> Actually I think the first reference was right!

I'm hoping to put this disastrous thread to bed. Dave is right. In my
description the very first reference to LIFO and the first reference to
FIFO were correct; all the rest were wrong.

> A LIFO free list (as used by VxWorks) is horrid
> A FIFO free list, and an address ordered free list are much better.

This is the main point I wanted to make.

> I'm surprised that the 'best fit' algorithm wasn't found to be significantly

> better than the FIFO free list. Obviously it benefits from an ordered (eg

> binary tree) free list.
>
> Simply changing the vxworks code to be FIFO will have the side effect of
> continuously cutting bits of the largest free fragment (as the list cycles).
> (Fixable by making the allocate code always remove the item it will use from the
> free list.)

I just wanted to suggest a simple fix (one that people could even patch
into the code to try). The study of memory allocators has been a big
subject since computers were first used. Some people spend a lot of time
on the issue and have done a lot of research. The end result of the
research is that the things that seem "intuitively obvious" don't often
hold up in real situations.

There has been a whole study of the issue of handling the largest free
block (which is cutely referred to as the "wilderness area" in the
research). Without good study, I wouldn't want to do anything more than
changing the free list from straight LIFO to straight FIFO through the
one line change I had suggested. Small changes in implementation (as
David is suggesting) can have big changes in behaviour of the allocator.
He very well could be right but...

People interested in better behaviour in a memory allocator really
should check out the allocator reference that Hwa Jin Bae provided the
other day (http://gee.cs.oswego.edu/dl/html/malloc.html). This is an
allocator that has been widely used and carefully tuned for over 10
years. It has been widely measured by researchers.

I think this has been a useful thread; especially if Wind River could be
influenced to make a simple fix to their code that would help out
everyone. Due to the blunders I have made in this thread, however, I
want to put it behind me. Later. Fred

Keith Buchanan

unread,

Feb 26, 1999, 3:00:00 AM2/26/99

to

> I'm hoping to put this disastrous thread to bed. Dave is right. In my

Before it gets put to bed, I'd like to back off from my earlier comments.
Some said 90% of their work was not hard real-time (porting Unix apps and
such) and thus they were interested in the performance of the malloc/free.
My experience is in the opposite direction, way over 90% of my stuff is hard
real-time. Watching this thread, I can see how other users would want the
heap code upgraded. I hope WRS does it though I don't need it.

Hwa-Jin Bae

unread,

Feb 27, 1999, 3:00:00 AM2/27/99

to

Fred Roeber <fro...@bbn.com> wrote:

> People interested in better behaviour in a memory allocator really
> should check out the allocator reference that Hwa Jin Bae provided the
> other day (http://gee.cs.oswego.edu/dl/html/malloc.html). This is an
> allocator that has been widely used and carefully tuned for over 10
> years. It has been widely measured by researchers.

When I get a chance I will dig up some old code that essentially is
a port of this Doug Lea's malloc onto VxWorks and make it available
in my VxHacks repository. It really does wonders in terms of
avoiding fragmentation in an embedded system, and is pretty fast.
Stay tuned.

--
Hwa-Jin Bae
PSO Systems, Inc.
http://www.pso.com
mailto:h...@pso.com

Hwa-Jin Bae

unread,

Feb 28, 1999, 3:00:00 AM2/28/99

to

Leonid Rosenboim <leo...@bitband.com> wrote:
> Just a brief follow-up:

[snip... something about memPartAlloc hack to overcome memory problem]

Yes, it is a well known hack to avoid the memory fragmentation problem.
Many people use this hack. However, it is hardly a clean solution.
You can still fragment the partition that you have created, and
the threshold of bad fragmentation behavior will persist.
There are better solutions.

Leonid Rosenboim

unread,

Feb 28, 1999, 3:00:00 AM2/28/99

to Hwa Jin Bae

Just a brief follow-up:

Hwa Jin Bae wrote:

> [snip]

>
> The partitioning scheme, as Leonid suggests, solves some problems and is
> quite useful as well. Unfortunately, VxWorks memory partitions, once
> created cannot be destroyed. This makes it less desirable. Plus it
> requires a lot of changes in ported Unix code base and is not as easy to
> do.

Indeed being unable to destroy or even reduce the size of a partition is a
drawback.
However, there is a way to avoid modifying a large code base (e.g. when
porting
from UNIX) by means of creating a replacement for the ANSi functions:
malloc(),
calloc(), realloc() and free(), such that these function call mamPartAlloc()
with a
particular partition ID deduced from the SIZE of the block requested, and
can also
automatically increase the size of the selected partition with
memPartAddToPool(),
borrowing more memory from the main partition. The replacement free() has to

peek into the blocks overhead header to see its size and return it to the
appropriate
partition.

This has been already done by at least one customer without access to WRS
sources.

Leonid

Leonid Rosenboim

unread,

Feb 28, 1999, 3:00:00 AM2/28/99

to Keith Buchanan

Keith Buchanan wrote:

> [...snip...]

> Geez, this is laying it on a little thick don't you think? I tend to
> believe any designer who allocates and frees memory willy-nilly in a
> real-time application is a bug ;o).
>

Used to be true, but now with Object Oriented design techniques (either C++ or

good old C) this is a different world out there now. Besides, applications
grew
much more complex, and requires a different way of thinking. I used to share
Keith's view some years ago, but I changed my mind.

>
> You cannot get something for nothing. The more elaborate the memory
> allocation algorithms, the slower the functions that allocate and free
> memory and the larger the footprint of the OS

If you believe this, you should be still using bubble-sort ... ;-)
In other words, I don't agree with this either - the OS is scalable,
and WRS should have offered a more sophisticated memory allocation
library AS AN OPTION. Much better algorithm was available from GNU
emacs for over 10 years now, which is much faster indeed, and only is slightly

larger in footprint. I am sure that there are many applications which would
gladly
pay for the extra footprint to get better allocation efficiency and less
fragmentation.

> [snip]

>
> Wind River had to compromise between complex alogrithms with a
> large footprint and the current implementation. In addition, many of
> the choices for algorithms are not deterministic. The functionality
> provided is reasonable and my opinion of it more typical.
>

No, they did not have to compromise - they should offer both (or more) of
them.
For example, for the management of the Ready Queue, there is an option (which
is
turned on by default) to add an extra hash table to make Ready Q operations
require constant time (its actually constant insertion time), so those
applications
which want to save memory and have slower kernel operation can turn this
option
off. So why not doing the same with memory allocation ?

By the way, there is a fixed-block memory allocation library in VxWorks which
is not even documented - bufLib. It was done for internal needs, but there is
plenty
of uses for it in applications too, but they didn't bother to include its
manual
pages in the product. This only confirms the feeling that many of you have
that
such shortcomings are simply due to lack of attention to the customers needs,
or plain laziness.

Adios Amigos,

Leonid