Object pooling in HXCPP


Hugh

May 16, 2012, 11:55:46 AM
to haxe...@googlegroups.com
Hi,
Has anyone done any experiments that show that object pooling has any performance advantage with hxcpp?
I have just done a few experiments, and it seems that cpp spends not much more than 7% of its time in the allocator/collector, which does not leave a lot of room for optimisation, even if this time were completely eliminated.
The other effect of "memory coherency", where objects that are used together stay together, could be more significant, especially with the small caches on mobile devices.
So I'm looking for hard data/examples of things that are improved by object pooling, so that I can improve the GC along these lines.

Hugh

Raoul Duke

May 16, 2012, 1:33:41 PM
to haxe...@googlegroups.com
hi,

i have mucked around with reusing objects via pools, across all the
targets i use (hxcpp, flash, js, neko). empirically / subjectively, it
seemed to help reduce pauses (presumed to be gc pauses) when i had a
bunch of particles in one game, but in another game i ended up with a
system that was even slower. (but that could have been because i
ended up with a lot of anonymous functions coming and going, which
probably generated enough garbage themselves to cause trouble.)

i can't really say anything about what the underlying behaviour was
since i never figured out how to get good profiling going. at best i
vaguely got dtrace to show things on mac os, but even then it wasn't
clear to me what was going on.

i would dearly like to be set up to profile things. i generally want
to be able to "just" draw lines and filled triangles, and to just do
my own full frame draw, rather than use sprites etc. and it seems to
go down the toilet performance-wise on android 2.2 droid 2.

Max

May 16, 2012, 2:40:16 PM
to Haxe
7% can be

- performance difference between the Intel and AMD
- 7% less energy needed
- 7% less pollution around the world :)

I'm more interested in what you said earlier: "The main differences
come with object-access patterns. C++ code can avoid "allocs" by
putting stuff on the stack, but flash (and hxcpp) can't. So it is the
object creation stuff that is slower, not really the numerical stuff."

Why can't hxcpp put stuff on the stack?

Philippe Elsass

May 16, 2012, 3:37:16 PM
to haxe...@googlegroups.com
Isn't the problem, as you said, about the memory cache? ie. if the pool is kept under the cache limit it would be faster, but if it grows too much it's going to be slower.

In my RunnerMark, each "test" creates hundreds of objects (for the drawList wrapper), then everything is thrown away to be GC'd. All that obviously doesn't represent a lot of data - a better test would play music & sound effects at the same time. That said, successive tests have given identical "scores" on all test platforms (iOS, Android) and memory seemed to behave cleanly (profiled on iPad1 it would stay at around 14 MB of RAM).

On the other side, I asked Andreas Ronning to share his test which he said ran great on Android but complete crap on iOS.

Hugh

May 16, 2012, 8:00:23 PM
to haxe...@googlegroups.com
Things can't be put on the stack because it is a garbage collected language.
If you pass, say, a Point to a function and that function decides to hold on to it (eg, store it in a static variable) and you leave the calling function,  the stack will get wiped, invalidating the stored point.
Haxe would need some kind of "pass by value" variable (like a "struct") so that copies are taken on the stack, rather than references taken when variables are passed.

Obviously I would like to drop the 7% to 0% or even 5% - but you should not optimise anything without profiling.  I have a feeling pooling will make things worse due to the extra bookkeeping - hence my question.
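
To make the "extra bookkeeping" concrete, here is a minimal C++ sketch of a free-list pool. `PointPool`, `acquire`, `release` and `allocations` are hypothetical names for illustration, not part of hxcpp. The sketch only shows the work a pool does on every acquire and release; whether that work costs less than the allocator it replaces is exactly what would have to be profiled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Point {
    double x = 0, y = 0;
};

// Hypothetical free-list pool: it owns every Point it ever created and
// hands out recycled instances before allocating new ones.
class PointPool {
public:
    Point* acquire(double x, double y) {
        Point* p;
        if (!free_.empty()) {
            p = free_.back();  // reuse a recycled object
            free_.pop_back();
        } else {
            owned_.push_back(std::unique_ptr<Point>(new Point()));  // real allocation
            p = owned_.back().get();
        }
        p->x = x;  // pooled objects have to be re-initialised by hand
        p->y = y;
        return p;
    }

    // The caller must guarantee that p came from this pool and is no longer used.
    void release(Point* p) { free_.push_back(p); }

    // Number of real allocations performed so far.
    std::size_t allocations() const { return owned_.size(); }

private:
    std::vector<std::unique_ptr<Point>> owned_;  // keeps the objects alive
    std::vector<Point*> free_;                   // objects ready for reuse
};
```

Every `acquire` pays for a branch, a vector pop and two field writes, and every `release` for a vector push, so the pool only wins if that is cheaper than the allocation plus the collection work it avoids.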

Hugh

Max

May 16, 2012, 8:41:41 PM
to Haxe
> Things can't be put on the stack because it is a garbage collected language.
> If you pass, say, a Point to a function and that function decides to hold
> on to it (eg, store it in a static variable) and you leave the calling
> function,  the stack will get wiped, invalidating the stored point.

Yes, but if Point is only a local object, not passed outside of the
scope, then there is no reason to avoid stack, e.g.

CPoint point;

against

CPoint* point = new CPoint();

I hope hxcpp is already doing that.

> Haxe would need some kind of "pass by value" variable (like a "struct") so
> that copies are taken on the stack, rather than references taken when
> variables are passed.
>

If I remember correctly, you can "pass by value" by doing

someFunc(CPoint point)

against

someFunc(CPoint& point) or someFunc(CPoint* point)

However, I think "pass by value" in the above example will use the stack again.

Hugh

May 17, 2012, 1:12:24 AM
to haxe...@googlegroups.com
Hi,
You can pretty much tell exactly what hxcpp is doing by the "new" call.  If you write "new" in the haxe code, then there is a new in the cpp code.
I think Nicolas has done some optimisations to eliminate the new if possible via some tricky inlining, but this is no trivial task, since you have to track the side-effects of what goes on in the constructor and all subsequent function calls etc.  I'm sure he would welcome a patch if you have one :)

c++:
CPoint point(1,2);
someFunc(point); // ...  sets point.x = 10
trace(point.x); //= 1 (pass by value)

haxe:
var point = new CPoint(1,2);
someFunc(point); // ...  sets point.x = 10
trace(point.x); //= 10 (pass by reference)
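
The two behaviours can be reproduced in plain C++ as a small sketch. The function names below are made up for illustration; `setByReference` plays the role of the Haxe version, where the callee sees the caller's object.

```cpp
#include <cassert>

struct CPoint {
    double x, y;
    CPoint(double x_, double y_) : x(x_), y(y_) {}
};

// Receives its own copy; the caller's object is untouched.
void setByValue(CPoint point) { point.x = 10; }

// Receives the caller's object; the write is visible after the call.
// This matches what happens to a Haxe class instance.
void setByReference(CPoint& point) { point.x = 10; }

// Returns the x the caller observes after a by-value call.
double xAfterByValue() {
    CPoint point(1, 2);
    setByValue(point);
    return point.x;
}

// Returns the x the caller observes after a by-reference call.
double xAfterByReference() {
    CPoint point(1, 2);
    setByReference(point);
    return point.x;
}
```

Here `xAfterByValue()` yields 1 and `xAfterByReference()` yields 10, which is the difference a compiler would have to preserve before it could move a Haxe object onto the stack.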

What you want:
var _tmp_point_x = 1; // stack
var _tmp_point_y = 2; // stack
 someFunc(_tmp_point_x, _tmp_point_y); // stack
trace(_tmp_point_x) ?? 

It is theoretically possible to do some of this, but what would you expect  _tmp_point_x to be?
I think some kind of language extension for "struct" could allow the _tmp_ version to be created (and leave point_x at 1).
It could possibly even be done with macros?

class Point implements lib.CopyByValue
{
   ...
}

I leave this as an exercise to the reader.

Hugh


 

Max

May 17, 2012, 4:12:45 AM
to Haxe
> What you want:
> var _tmp_point_x = 1; // stack
> var _tmp_point_y = 2; // stack
>  someFunc(_tmp_point_x, _tmp_point_y); // stack
> trace(_tmp_point_x) ??
>

This is not what I want. I'm definitely missing how the inner haxe->backend
conversion really works, but the above example is suitable as an
implementation at the highest level: Haxe itself (to force the backend
to use the stack where available).

In case of cpp, situation like this

{
var point = new Point();
point.blabla;
}

can be converted into CPoint point, since point is never passed out of
the current scope (while in your example it is).

Furthermore, my example only needs to check the current scope, while
your example must check all the inner scopes to see if reference to
the point will be stored somewhere, making "stacked" version not
applicable.

And finally, cpp compiler might optimize "CPoint point" example by not
pushing variables again onto the stack (by passing stack pointer and
using peek instead of pop) if code is using "pass by value".

Cauê Waneck

May 17, 2012, 8:14:16 AM
to haxe...@googlegroups.com
Hey Hugh!

I was thinking about stack allocation on hxcpp, and here are some tricks we could do:

  1. private typedef StackAllocated<T> = T //this would be for internal compiler use:
    • Run an AST pass which does an escape analysis. This analysis could go as far as static functions or functions from @:final classes, analysing whether the value is stored.
    • Convert any allocated object which won't escape the context from MyType to StackAllocated<MyType>
    • On new(), see if the type is StackAllocated. If it is, allocate in stack, and either get its pointer reference, or change the field access to use "." instead of "->" for StackAllocated types
  2. Just like C#, create a struct type like you suggested with lib.CopyByValue. This one would be a little more tricky since IMO it would need a better generics support
    • Array<StructType> -> struct array;
    • C# allows you to make references to stack locations, with a simple rule that you can't return these kind of references from a function, and you can't also store these references in an object. This way it's always safe

Cheers!
Cauê

Hugh

May 17, 2012, 9:21:06 AM
to haxe...@googlegroups.com
Hi,
Implementation-wise, I think it could be done with the existing pointer class, but with a stack buffer, like

Point point = new Point_obj(1,2)

becomes

StackBuffer<Point> buffer;
Point point = new Point_obj(buffer, 1,2)

and nothing else need change much.
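
As a rough illustration of the idea in standalone C++ (not what hxcpp emits), a stack buffer can be combined with placement new. `StackBuffer::construct` and `sumOnStack` are hypothetical helpers, and the sketch uses the standard placement-new form rather than a constructor that takes the buffer.

```cpp
#include <cassert>
#include <new>
#include <utility>

struct Point_obj {
    double x, y;
    Point_obj(double x_, double y_) : x(x_), y(y_) {}
};

// Hypothetical helper: raw, correctly aligned storage for one T that
// lives in the enclosing stack frame.
template <typename T>
struct StackBuffer {
    alignas(T) unsigned char bytes[sizeof(T)];

    // Constructs a T inside the buffer with placement new; no heap allocation.
    template <typename... Args>
    T* construct(Args&&... args) {
        return ::new (static_cast<void*>(bytes)) T(std::forward<Args>(args)...);
    }
};

// The pointer is only valid while `buffer` is in scope, so it must not be
// returned or stored anywhere that outlives this function.
double sumOnStack() {
    StackBuffer<Point_obj> buffer;
    Point_obj* point = buffer.construct(1.0, 2.0);
    double sum = point->x + point->y;
    point->~Point_obj();  // trivial here, but required in general
    return sum;
}
```

The rest of the code keeps using a pointer, which is why little else would need to change; the hard part remains proving that the pointer never escapes.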

But I don't really see many cases where this will work, since it is the returning of the "new" variable that is interesting.
Here is where I think the stack would be useful:

var point = p1.add(p2).add(p3);

The "add" will "new" off new points.  These could theoretically be eliminated but I'm not sure how.  I guess if "add" were sufficiently inlined, it may reduce to the StackBuffer case.
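
For comparison, this is what the chained add looks like in hand-written C++ with a value type. `PointValue` and `sum3` are illustrative names. The intermediate results are returned by value, so no heap allocation is involved. This is the behaviour a backend would be trying to recover, not something hxcpp is known to generate.

```cpp
#include <cassert>

struct PointValue {
    double x, y;

    // Returns by value: the temporary lives in the caller's frame or in registers.
    PointValue add(const PointValue& o) const {
        return PointValue{x + o.x, y + o.y};
    }
};

// Equivalent of `p1.add(p2).add(p3)` with no `new` anywhere.
PointValue sum3(const PointValue& p1, const PointValue& p2, const PointValue& p3) {
    return p1.add(p2).add(p3);
}
```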

Hugh

Cauê Waneck

May 17, 2012, 10:04:19 AM
to haxe...@googlegroups.com
yes, returning them would be a more tricky case.
What could be done is to turn add into a void function and pass the location for the result as an argument, which may refer to the calling function's stack.

e.g.

var p1 = new Point(a,b);
var p2 = new Point(c,d);
var p3 = new Point(e,f);
var point = p1.add(p2).add(p3);

becomes (c code, sorry):

Point _p1 = (Point) { a, b };
Point *p1 = &_p1;
Point _p2 = (Point) { c, d };
Point *p2 = &_p2;
Point _p3 = (Point) { e, f };
Point *p3 = &_p3;

Point tmp;
Point *tmp_addr = &tmp;
point_add(p1, p2, tmp_addr); // tmp = p1 + p2
Point _point;
Point *point = &_point; //possible optimization: reuse tmp
point_add(tmp_addr, p3, point); // point = tmp + p3


Stack pointers are safe to use if we never return them or store them in any fields. That's how C# handles them.


When I was thinking about a possible C target, I was thinking about a very exciting way to deal with generics that had the same underlying logic. This way would allow us to safely use stack pointers while not needing to provide a custom implementation for each memory layout.




Hugh

May 18, 2012, 1:29:22 AM
to haxe...@googlegroups.com
Yes, I think we are on the same wave-length.

As for add's that take a slot for the result, I think this is probably a better way of doing "object pooling", ie have some tmps lying around that accumulate the partial results. This could be done today without any backend changes. However, not as nice to write :(

Maybe you could have an optional buffer add:

var tmp1 = new Point();
var tmp2 = new Point();

for(i in 0...1000000)
{
    setResult(  p1.addBuf(p2,tmp1).addBuf(p3,tmp2).clone() ); 
}

but this requires significant changes to the algorithm. Or perhaps you are thinking of using some magic to do this automatically?
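
A C++ sketch of the same pattern, assuming a hypothetical `addBuf` that writes into a caller-supplied object and returns it so that calls can be chained. Instead of cloning each result, the sketch copies the values out into an accumulator, so the loop body performs no allocation at all.

```cpp
#include <cassert>

struct Point {
    double x, y;
    Point(double x_ = 0, double y_ = 0) : x(x_), y(y_) {}

    // Writes this + other into `out` and returns `out` so calls can be chained.
    Point& addBuf(const Point& other, Point& out) const {
        out.x = x + other.x;
        out.y = y + other.y;
        return out;
    }
};

// Adds p1 + p2 + p3 `iterations` times, reusing two scratch objects,
// and returns the sum of all the per-iteration results.
Point accumulate(const Point& p1, const Point& p2, const Point& p3, int iterations) {
    Point tmp1, tmp2;  // created once, outside the loop
    Point total;
    for (int i = 0; i < iterations; ++i) {
        const Point& r = p1.addBuf(p2, tmp1).addBuf(p3, tmp2);
        total.x += r.x;
        total.y += r.y;
    }
    return total;
}
```

The cost is visible in the call site: the caller has to know how many temporaries a chain needs and must not keep a reference to `r` past the next iteration.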

Hugh


Raoul Duke

May 18, 2012, 12:38:51 PM
to haxe...@googlegroups.com
On Thu, May 17, 2012 at 10:29 PM, Hugh <game...@gmail.com> wrote:
>     setResult(  p1.addBuf(p2,tmp1).addBuf(p3,tmp2).clone() );
> but this requires significant changes to the algorithm. Or perhaps are are
> thinking of using some magic to do this automatically?

ah, reminds me of back when i was asking on the list how i could
implicitly pass in the pool :-)

($0.02 i mucked around with a few small variations on the themes and
in the end the best thing for me and my code was to have explicitly 2
versions of a given type, 1 for gc, 1 for pooling, and the pooling one
could be constructed from the gc one, but also given a pool in the
constructor, and then the add / subtract / normalize / etc. calls
would return values out of that pool. then somebody would have to know
when to recycle things. one pool was a frame pool, just recycleAll().
in other situations i had to explicitly track when to pool.recycle(
instance ) which of course means we're back in malloc/free land which
is no fun.) (having the 2 different variants lets me experiment with
code to see where i f'd up with the use of pooling.)
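
The frame-pool variant described above might look roughly like this in C++. `FramePool`, `next`, `recycleAll` and the free function `add` are hypothetical names chosen for the sketch, and the fixed capacity is an assumption made to keep pointers stable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 {
    double x = 0, y = 0;
};

// Hypothetical per-frame pool: objects are handed out in order and all of
// them become reusable at once when the frame ends.
class FramePool {
public:
    explicit FramePool(std::size_t capacity) : items_(capacity), used_(0) {}

    // Returns the next free slot, or nullptr when the pool is exhausted.
    Vec2* next(double x, double y) {
        if (used_ == items_.size()) return nullptr;
        Vec2* v = &items_[used_++];
        v->x = x;
        v->y = y;
        return v;
    }

    // Call once per frame; every pointer handed out earlier is now stale.
    void recycleAll() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::vector<Vec2> items_;  // never resized, so pointers stay valid
    std::size_t used_;
};

// Pool-backed add: the result comes out of the pool instead of the heap.
Vec2* add(FramePool& pool, const Vec2& a, const Vec2& b) {
    return pool.next(a.x + b.x, a.y + b.y);
}
```

With `recycleAll()` nobody has to track individual lifetimes, which avoids the malloc/free feeling; the trade-off is that nothing obtained from the pool may be kept across frames.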

laurens...@gmail.com

Jan 2, 2014, 10:54:29 AM
to haxe...@googlegroups.com
Sorry to resurrect this thread,
but I can't find a better post about Haxe GC and possible solutions.
We have a game that is affected by the small pauses that the GC causes.
We delayed the GC pauses a bit by reusing the matrix Array used for rendering (using Tilesheet),
and I have been looking for ways to improve my game further.

AFAIK the only "safe" variables are ints and floats, Is this correct?

I still miss a way to return objects, I can use callbacks but that seems a bit off.
Also I was reading the generated C++ code and found that a lot of overhead is introduced to support Reflection,
Static constants are not static at all, inline functions are in some cases not respected.
Any tips or additional pointers to similar threads would help me a lot.

Thanks!

Raoul Duke

Jan 2, 2014, 1:43:25 PM
to haxe...@googlegroups.com
> AFAIK the only "safe" variables are ints and floats, Is this correct?
>
> I still miss a way to return objects, I can use callbacks but that seems a
> bit off.
> Also I was reading the generated C++ code and found that a lot of overhead
> is introduced to support Reflection,
> Static constants are not static at all, inline functions are in some cases
> not respected.

it seemed to me that there's always boxing/unboxing of Double so using
NME/OpenFL showed a lot of Double collection in Apple Instruments, if
I understand what I was looking at at all, which I might not. :-) The
overall time, I think, wasn't a lot really but it was one of those
things that just felt frustrating :-}

i wish i had the time + brains + money to be able to learn how hxcpp
works and try to help document and/or contribute to the code. but for
the most part i have none of those things in sufficient quantity.