hxScout is coming to hxcpp! (and some questions)


Jeff Ward

Jan 8, 2015, 10:19:49 AM
to haxe...@googlegroups.com
Hi all,

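In case you haven't seen it on Twitter, I'm working on support for profiling of hxcpp apps with hxScout. I just finished pulling the stack samples out of Hugh's Profiler class in Debug.cpp and pushing them out to hxScout:

https://twitter.com/Jeff__Ward/status/552362763834908672 (first hxcpp screenshot)
https://twitter.com/Jeff__Ward/status/553087870378844160 (samples GIF/video)
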
I'd appreciate any thoughts on where to find certain data (if it's even possible):

- Allocation data (types, sizes)?  GC.cpp?
- Total memory and CPU(s) usage stats?

Considering color coding the frame timing data:

- In Lime, could I put timing measurement code around (almost) anywhere user-code gets invoked? Such as a global event dispatcher for Timers, mouse/keyboard events, EnterFrame, etc?
- Is Lime's Renderer.hx:render() function the best place to measure rendering time?

Also, I plan to make support for other frameworks (besides Lime / OpenFL) pretty straightforward to implement.

Thanks,
-Jeff

Joshua Granick

Jan 8, 2015, 2:03:02 PM
Hey Jeff,

Congratulations on all the great work you've done! :)

I believe you can tell how much HXCPP has allocated using cpp.vm.Gc.memUsage, but this does not include any additional allocations done, say, by a native NDLL outside of HXCPP. I wonder if there's an OS-specific way to track this for the current process, or if we could track it manually, somehow.

I'm not sure about allocation types or sizes, particularly, but perhaps HXCPP has this data available in there, somewhere ;)
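
On the question of counting memory that the hxcpp GC doesn't see: one OS-specific option on POSIX systems is to ask the kernel for the process's resident set size, which includes native (NDLL) allocations. A minimal sketch, assuming Linux or macOS. Note that ru_maxrss is the *peak* RSS, reported in kilobytes on Linux and bytes on macOS; Windows would need a different call (such as GetProcessMemoryInfo).

```cpp
#include <sys/resource.h>

// Peak resident set size of the current process, as reported by the OS.
// Unlike a GC-only counter, this includes memory allocated by native code.
// Units differ by platform: kilobytes on Linux, bytes on macOS.
// Returns -1 if getrusage() fails.
long peakResidentSetSize() {
    struct rusage usage {};
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}
```

RSS also counts code and other mapped pages, so it is an upper bound on heap use rather than an exact figure.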

So long as it does not affect performance, I'd be happy to have frame timing code. Perhaps if we had this behind an "advanced telemetry" define (matching what is used for Flash) then we could make this consistent, and optional. Yeah, you would want to handle this around Application, Window, Renderer, MouseEventManager, TouchEventManager and KeyEventManager, where they dispatch events. Yep, "render" would be the right spot, similar to the JS stats timing code there.
--
To post to this group haxe...@googlegroups.com
http://groups.google.com/group/haxelang?hl=en
---
You received this message because you are subscribed to the Google Groups "Haxe" group.
For more options, visit https://groups.google.com/d/optout.



Lars Doucet

Jan 8, 2015, 2:09:58 PM
I'm not as smart as anyone else here, but I've used Sleepy before to profile my Haxe apps:

http://philippe.elsass.me/2012/06/nme-profiling-your-app-performance/

(That tutorial is old but you can update it mentally to work with the latest OpenFL.)

Does running through that give you any insights into information you can find?

Raoul Duke

Jan 8, 2015, 2:18:10 PM
so in my previous experience, Instruments was great for iOS. But
trying to get anything working with Android was so annoying that I
never did it. Which was bad because Android was where I was seeing
what appeared to be GC issues that I didn't see on iOS. Which I never
figured out or debugged. Which led me to sorta giving up on NME. The
problem is that it looked like an uphill climb to get Android NDK
stuff working on a Mac. Anybody have experience with it? Is it easier
to get some kind of profiling set up for Android on Linux, or Windows?
Have people tried to debug the hxcpp gc on Android? Thanks for any
pointers! I do miss Haxe a little bit ;-) [currently I'm mostly
looking at typed-lua and some engine (Corona, Cocos, SDL, etc.)]

Lars Doucet

Jan 8, 2015, 2:33:12 PM
I believe people have been able to use crashdumper (http://www.github.com/larsiusprime/crashdumper) on Android, but that just catches (most) crashes, not in-depth profiling information.

Jeff Ward

Jan 8, 2015, 5:57:57 PM
Thanks for the info, Joshua. Yes, I was planning on #if def'ing any changes to lime / openfl. Once I'm finished, what do I do to get these changes into the release cycle of hxcpp and lime / openfl?  I assume I'll submit pull requests on their respective github repos. I don't guess I'll know what version of hxcpp my new haxelib will be depending on until it's been released?

I noticed Hugh mentioned (on Philippe's post) that you may want to profile in release mode with -D debuglink (now HXCPP_DEBUG_LINK) so I'll look into that, though it seems like that parameter only affects the Windows toolchain.

Anyway, that's all a bit tactical. Thanks, and I'll let you know if I have any other questions.

Best,
-Jeff

Joshua Granick

Jan 8, 2015, 7:02:07 PM
Yep, this is probably a pull request to Lime; it should be able to roll out quickly once you're sure what code needs to go in. I believe HXCPP_DEBUG_LINK is fairly universal. There was a bug for Lime/iOS that prevented this from being pushed in the C++ build, but everything else should be fine (and this was resolved).

Hugh

Jan 8, 2015, 8:58:41 PM
Hi,
HXCPP_DEBUG_LINK is needed on windows for native (VisualStudio/very sleepy) debugging/profiling.
For built-in profiling (Debug.cpp) you will probably want HXCPP_STACK_TRACE, which will give you function-level profiling. You may also want HXCPP_STACK_LINE, which will give information about which line in which function is being used, but this will incur more performance penalties and may invalidate the profiling to some extent.

There are some more details here: http://gamehaxe.com/2012/09/14/hxcpp-built-in-debugging/

Hugh

Nicolas Cannasse

Jan 9, 2015, 4:01:26 AM
Le 08/01/2015 16:19, Jeff Ward a écrit :
> Hi all,
>
> In case you haven't seen it on Twitter, I'm working on support for
> profiling of hxcpp apps with hxScout <http://hxscout.com>. I just
> finished pulling the stack samples out of Hugh's Profiler class in
> Debug.cpp and pushing them out to hxScout:
>
> https://twitter.com/Jeff__Ward/status/552362763834908672 (first hxcpp
> screenshot)
> https://twitter.com/Jeff__Ward/status/553087870378844160 (samples GIF/video)

Great work !

I've been using the original Scout for our games and it's a great way to
profile and find performance bottlenecks.

If we can get the trace output, the memory allocations per-frame (with
stack) and the profiling stack that will be great.

BTW, it seems that Scout is using a statistical measurement for CPU,
which gives better results since it doesn't cause an overhead on a
per-function call, see:

http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29#Statistical_profilers

Looking forward to the 1.0 version :)

Best,
Nicolas

Jeff Ward

Jan 9, 2015, 5:46:51 PM
Joshua - I'm having a very hard time setting up a dev environment for Lime. With hxcpp, I check out, point haxelib at it, change code, and it all works. With Lime, it seems like there's an intermediate rebuild step (not to mention a legacy and current version) -- and no code I change (even with rebuild lime) seems to be making it into my project.  Any pointers?

Thanks for that list, Nicolas. A couple questions about how you use Scout:

1) I assume you use the top-down / bottom-up in both profiling and memory?
2) I assume you use sorting in those views?
3) I forget, is there an allocation snapshot/diff feature in Scout, or was that another tool?  Seems useful.

Now a general Haxe question regarding multiple threads - if I setup the profiler in various threads, I see that I get different threads from the hxcpp profiler and they each connect to hxScout separately -- perfect. But in a non-primary thread, if I try to setup an event loop callback, either with haxe.Timer or OpenFL stage ENTER_FRAME event, the callbacks for both those seem to happen on the main thread (or at least, they access the hxcpp Profilers / Callstack of the primary thread). Any idea why that is, or how I can setup an event loop in a secondary thread?

Best,
-Jeff

Jeff Ward

Jan 9, 2015, 5:56:53 PM
Oh, and Nicolas, are you using Lime, OpenFL, or your own framework typically? I'm planning on implementing generic hooks for measuring timing start/stop (for GC, user, render, etc), but then also some default implementations for Lime.

Best,
-Jeff

Hugh

Jan 9, 2015, 8:54:14 PM
If you are using the built-in hxcpp profiler, then you are also using statistical sampling, since the expensive timing operations are happening in a separate thread, and not every call is being counted.
The timer/onEnter events are on the main thread by design.  It may be easiest for you to simply set up a new thread and do either "while(true) { Sleep(1); ... }", or wait in a "Thread.readMessage", and fire messages from the onEnter function.

Depending on exactly what you are trying to do, you may instead want to use the existing profiling loop, "ProfileMainLoop" and add an optional callback to this routine that gets run every time the profile clock ticks over - although you should not run any haxe code from this thread, and any operation should be quick so as not to upset the timing.
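
As described here, the built-in profiler is statistical: a separate thread only advances a clock, and the profiled thread charges elapsed ticks to whatever is on its own stack the next time it pushes or pops a frame. The sketch below models that split under those assumptions; gProfileTicks and TickSampler are hypothetical names, and the real Debug.cpp code differs in detail.

```cpp
#include <atomic>
#include <map>
#include <string>
#include <vector>

// The shared "profile clock". A background thread (not shown) would
// increment it at a fixed rate and do nothing else; it never touches the
// profiled thread's data.
std::atomic<int> gProfileTicks{0};

// Per-thread sampler. The profiled thread checks the clock itself at frame
// push/pop time and charges elapsed ticks to the frame that was innermost
// while they elapsed.
class TickSampler {
public:
    void pushFrame(const std::string& name) {
        sample();  // ticks so far belong to the caller
        stack_.push_back(name);
    }
    void popFrame() {
        sample();  // ticks so far belong to the frame being popped
        if (!stack_.empty()) stack_.pop_back();
    }
    int selfTicks(const std::string& name) const {
        auto it = selfTicks_.find(name);
        return it == selfTicks_.end() ? 0 : it->second;
    }

private:
    void sample() {
        int now = gProfileTicks.load(std::memory_order_relaxed);
        int elapsed = now - lastTick_;
        lastTick_ = now;
        if (elapsed > 0 && !stack_.empty()) {
            selfTicks_[stack_.back()] += elapsed;
        }
    }
    std::vector<std::string> stack_;
    std::map<std::string, int> selfTicks_;
    int lastTick_ = 0;
};
```

A call that starts and finishes between two ticks costs almost nothing but also records nothing, which is why the overhead stays low and why the results are only meaningful in aggregate.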

Hugh

Joshua Granick

Jan 9, 2015, 10:26:45 PM
Hi Jeff,

Here are the instructions:


Once you have it cloned from GIT, and set it using "haxelib dev", you use "lime rebuild <target>" or "lime rebuild <target> -Dlegacy"

If you are testing on Windows, for example, you'll want to run both "lime rebuild windows" and "lime rebuild windows -Dlegacy", and then you should be ready to go

Nicolas Cannasse

Jan 10, 2015, 7:37:24 AM
> *Thanks for that list, Nicolas*. A couple questions about how you use Scout:
>
> 1) I assume you use the top-down / bottom-up in both profiling and memory?

Yes. That's very useful.

> 2) I assume you use sorting in those views?

Yes. Mostly by time spent / alloc count

> 3) I forget, is there an allocation snapshot/diff feature in Scout, or
> was that another tool? Seems useful.

Not that I know.

> Oh, and Nicolas, are you using Lime, OpenFL, or your own framework
> typically? I'm planning on implementing generic hooks for measuring
> timing start/stop (for GC, user, render, etc), but then also some
> default implementations for Lime.

ATM I'm using my own framework Heaps.io with SWF/AIR output. Heaps also
supports WebGL output and NME is in the works (not fully supported yet).

I'm sure I'll use hxscout if I want to get profiling for CPP output.

Best,
Nicolas

Jeff Ward

Jan 10, 2015, 5:37:17 PM

Hugh> if you are using the built-in hxcpp profiler, then you are also using statistical sampling ... you may instead want to use the existing profiling loop, "ProfileMainLoop"

Right, building into the existing Profiler is exactly what I did: __hxcpp_start_flm_profiler starts Profiler with a dump filename of "FLM", and then it collects in a different mode. And rather than a callback (maybe that'd be better?), I have Haxe collect these stats every frame by calling C++ dump functions, then send the data to hxScout. I'm happy with this if you're happy.

Hugh> The timer/onEnter events are on the main thread by design.  It may be easiest for you to simply setup a new thread and do either "while(true) { Sleep(1); ... }", or wait in a "Thread.readMessage", and fire messages from the onEnter function.

Well, I'm asking about multi-threaded event loops for hxScout users to be able to profile their multi-threaded Haxe applications. Each Haxe thread initially gets its own Profiler, but creating an event loop in Haxe to poll the Profiler caused them all to go back to polling the primordial Profiler. Anyway, this is a low-priority issue, and I'll ponder it.

After poring over the code last night, here are my thoughts on collecting data for the memory allocation object tracking feature; feel free to comment:
  • In Object.h, the operator new function seems to be called for every Haxe object (except Strings, which are elsewhere). In this function, I'll store the inSize and object reference in a global location.
  • In Profiler's PushStackFrame (gated by #ifdef of course) monitor the top of the stack for "*.new" - aka Haxe Object constructors. When you hit a Haxe Object Constructor, pick up the reference and size stored from Object.h
  • You now have the reference, the size, and the Haxe allocation stack. Rejoice. But not for long, 'cause you're not done yet...
  • Watch for re-allocation (size update, e.g. a growing Array) on your objects. I assume / hope GCInternal.InternalRealloc's void* inData is the Object reference.
  • Watch for delete() from Object.h.
  • I think String and maybe even Array did not come through Object.h new operator, and may need special handling -- I'll have to double check, and look for other outliers. And like Joshua said, anything in a DLL/SO that isn't an hx::Object wouldn't be counted.
Does it seem like this will work?
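
The plan above (stash the size in operator new, then let the next ".new" stack frame claim it) can be sketched as a small per-thread state machine. This is a hypothetical illustration, not hxcpp code; it assumes each constructor frame is entered right after its own allocation, which is exactly the assumption that overridden operator new or super calls could break.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One tracked allocation: object address, size, and the constructor frame
// that claimed it.
struct AllocRecord {
    const void* object;
    std::size_t size;
    std::string constructor;
};

// Hypothetical tracker: the allocator stashes the most recent allocation,
// and the next "<Class>.new" frame claims it. One instance per thread.
class PendingAllocTracker {
public:
    // Would be called from an operator new hook.
    void onAllocate(const void* object, std::size_t size) {
        pendingObject_ = object;
        pendingSize_ = size;
    }

    // Would be called from a PushStackFrame hook.
    void onPushFrame(const std::string& frameName) {
        static const std::string kSuffix = ".new";
        bool isConstructor =
            frameName.size() >= kSuffix.size() &&
            frameName.compare(frameName.size() - kSuffix.size(), kSuffix.size(), kSuffix) == 0;
        if (isConstructor && pendingObject_ != nullptr) {
            records_.push_back({pendingObject_, pendingSize_, frameName});
            pendingObject_ = nullptr;  // each allocation is claimed at most once
        }
    }

    const std::vector<AllocRecord>& records() const { return records_; }

private:
    const void* pendingObject_ = nullptr;
    std::size_t pendingSize_ = 0;
    std::vector<AllocRecord> records_;
};
```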

Oh, and Nicolas, I noticed that Scout has a "hide deallocated" toggle. I think this gives a good loitering object check. If you make a selection of multiple frames, then turn on "hide deallocated", I think you'd be left with only loitering objects (created within the selection, and not yet deleted).

Joshua - I was working from that page, but I'll try again and let you know how it goes.

Thanks, guys.

-Jeff

Nicolas Cannasse

Jan 10, 2015, 8:11:28 PM
> Oh, and *Nicolas*, I noticed that Scout has a "hide deallocated" toggle.
> I think this gives a good loitering object check. If you make a
> selection of multiple frames, then turn on "hide deallocated", I think
> you'd be left with only loitering objects (created within the selection,
> and not yet deleted).

Yes, but I'm not that much concerned in general with leaks, they are
actually quite rare with the proper architecture.

However, performance hits caused by too many allocations and GCs happen
more often in complex apps such as games.

Best,
Nicolas

Hugh

Jan 11, 2015, 3:09:22 AM
Hi,
You can monitor the size of all object allocations from the gc function 
void InternalNew(int inSize,bool inIsObject)
This does not however directly know the name of the class being constructed, just the size.  For arrays, InternalRealloc is also used.
These should already be showing up in the profiler log, with the size hacked into the "line" field for now.
The object delete() function will never get called - this is the nature of GC.

Strings are different - they are not actual objects, more like "structs".  The char * data to which they point may or may not need allocating.  For constant strings, the pointer is simply copied.  Something like String + String will allocate a buffer to store the result, but the "String" struct will be stored on the stack.

For multi-threaded profiling, I think you need to take a different approach to reading the result back.  Getting them to dump something in an "onFrame" does not make much sense, since it could be some kind of socket call that lasts for minutes, and you can't interrupt this.  Instead, you need the profiling thread to read the profile data from the other threads, with a suitable lock to prevent the other threads from updating the data while you are trying to read it.  The mProfiler is setup by the current thread, essentially "profiling itself"  - but this does not have to be the case.  One thread could control the mProfiler for all the threads, creating the object and gathering the results when they are needed.  Or, it could set a global flag "profiling wanted", and each of the threads could create their own mProfiler next time they are in a "PushStackFrame" call.  The mProfiler class could then maintain a list of active mProfilers and gather all the results from your onFrame callback.

Hugh

Jeff Ward

Jan 12, 2015, 11:11:42 AM
Hugh, can you tell me if the dump functions here copy the strings into result, or if they are by reference?

If they copy the strings, then I don't think there's a problem. The ProfilerMainLoop is in a thread of its own (let's call it the profiler thread), profiling the thread that called the start function (let's call it the application thread), correct? So the profiler thread is only blocked while this data is copied to the application thread's data structure, right? Then the socket transfer doesn't affect the profiler thread, it's on the haxe side (either on the application thread, or I could put it in a separate thread.) I'm not too worried about socket latency, as most profiling is on localhost (or LAN) anyway.

On the topic of object allocation tracking, simply having object size is not enough information to be useful. My "ah ha" moment was that the allocation always happens right before the stack enters a constructor, so by monitoring the stack for constructors, I can leverage this to collect all the information I need (size, type, and construction stack.) I'll see if my plan works, and I'll take care to handle multiple application threads and profiling threads.

Also, Sven alerted me to the fact that the acronym "FLM" could cause irritation at the hxcpp level (even though none of this data is FLM-specific, it's just callstacks). So I'm changing my library from hxflm to hxtelemetry (HXT). If you prefer, Hugh, I could leave your Profiler alone, add a new class in a separate cpp file, and call its Sample from PushStackFrame, like yours.

Honestly, my goal is minimal performance and code impact -- the less C++ code I write the better -- and I think this goal is very achievable.

I appreciate your time and feedback.

Cheers,
-Jeff

Jeff Ward

Jan 12, 2015, 2:16:05 PM
Oh, Hugh, and regarding object collection (and not deletion, as you said):

Ok, so I see the Mark / Collect / Reclaim code in GCInternal. Assuming I track objects by ID as returned by InternalNew() / Alloc(), at what point in the collect/reclaim process can I identify the individual object IDs being deleted?

If I'm lucky, perhaps header on line 530 is related to my desired object ID.  If I'm not lucky, perhaps deletion is tracked only by row/line/block and not individual objects? Would I need to traverse a row/line/block being deleted to find the objects being deleted?

Man, that garbage collector is some fancy machinery!  =)  Any pointers would be greatly appreciated!

Jeff Ward

Jan 22, 2015, 11:18:01 AM
Hey guys, quick update:

I'm making good progress on this, you can see in my integration branch of hxcpp here, or look at the comparison with hxcpp master.  Everything is tidily #if'd, and I decided, rather than modifying Hugh's Profiler class, to copy the pattern of Profiler.  Profiler is still a great text-based profiler that stores stats, collates them, and prints them, whereas I want to store more (and different) stats, but not manipulate them or display them from CPP.

So I've created a class called Telemetry mimicking the pattern of Profiler - the CallStack has an mTelemetry member and global accessors are marshaled through CallStack to the appropriate thread. Telemetry maintains raw statistics for its thread. The stats are not specific to hxscout, but they are informed by it (e.g. profiling stack samples, int->name map, object allocations size, stack, id, etc). So one could poll and print those stats, if they wanted to. My hxtelemetry project (will be a haxelib) polls the statistics and packages them up in the FLM protocol and sends them to HxScout.

Adobe Scout compatibility with hxcpp will be hard, plus I envision some custom features not found in Scout - so that's on the back burner for now.

There's a lot to do, and things are still too in flux to distribute early access yet, but soon! Please let me know if you have feedback!

Best,
-Jeff

Hugh

Jan 22, 2015, 11:56:31 PM
Hi,
I like your approach.
Not 100% sure about recording the object constructors.  The haxe generated code can override the operator new, and different versions of haxe can handle this differently.
Would it be easier if I added a "class name" parameter to InternalNew, probably only when HXCPP_STACK_TRACE is defined?  Then you can offload your code to InternalNew.
This way you could report the number and type (or reason) of each object allocated, as well as the where.

In answer to your earlier question about tracking objects, I think the best way would be to add some callback to happen at the end of "RunFinalizers", where you can iterate over all your objects and see if they are still alive (this is basically how the weakref/finalizer code works).
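
Checking every tracked allocation from a hook at the end of RunFinalizers works like sweeping a weak-reference table. In this sketch the isMarked callback is a hypothetical stand-in for whatever mark-bit query the GC exposes; TrackedObjects is likewise hypothetical and nothing here is hxcpp's API.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical allocation table: object address -> size, filled in at
// allocation time and swept once after each collection.
class TrackedObjects {
public:
    void onAllocate(const void* object, std::size_t size) { live_[object] = size; }

    // isMarked stands in for "did this object survive the last cycle?".
    // Returns the objects that were collected and drops them from the table.
    std::vector<const void*> sweep(const std::function<bool(const void*)>& isMarked) {
        std::vector<const void*> collected;
        for (auto it = live_.begin(); it != live_.end();) {
            if (isMarked(it->first)) {
                ++it;
            } else {
                collected.push_back(it->first);
                it = live_.erase(it);
            }
        }
        return collected;
    }

    std::size_t liveCount() const { return live_.size(); }

private:
    std::unordered_map<const void*, std::size_t> live_;
};
```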

Hugh

Jeff Ward

Jan 23, 2015, 1:05:56 AM
Yes, if I had the class name (like the String allocation tracking), this would be much more solid and I wouldn't rely on entering a classname::new StackFrame (where I've seen some weirdness -- it's actually currently crashing when DisplayObjects are instantiated.)  I just couldn't determine how to achieve that.  Yes, these are only available with HXCPP_STACK_TRACE.  Let me know if you look into this!

Regarding finalizers, did you see the code I put into Reclaim()? These appear to be the addresses returned by InternalNew - is tracking collections this way a bad idea?

Thanks again for your time!

Best,
-Jeff

Hugh

Jan 23, 2015, 3:48:07 AM
With the reclaiming, you can walk the "small" allocations (which includes all non-const objects) by walking the table.  You can also iterate the large allocs easily enough.
Be aware that this table is not normally updated every frame, since its main purpose is to tell if a stack object is really-really-really an hx::Object, which is not called all that often, so it is cheaper to check the mark-byte than to re-link the list.
If you already have a set of pointers, it may be more convenient to iterate these - or maybe not - your choice.  It would be more memory coherent to iterate the table, but you would need to map-lookup (?) so I could not say which would be faster.
I wonder if the "super" calls are messing with the obj names?

Another way of getting the name might be to call the virtual __CStr function on it.  This is only valid for true objects (created via Object::operator new), and only valid *after* the operator new call (since until then, the compiler has not filled in the vtable).  But you could call this at the profiling stage.

Hugh

Jeff Ward

Jan 28, 2015, 1:24:55 AM
Hugh,

I tried an alternate approach to allocation tracking, removing the HXObject.new hook (and corresponding StackFrame monitor) and hooking instead into HX_STACK_THIS (preceded by HX_STACK_FRAME with a 'new' functionName). I could remove the strcmp for const "new" with a change to gencpp.ml in the compiler, but that could come later. It also involves looking up the size of an object in GCInternal, which I suppose could use some validation.

What do you think of this approach? It's a little messier in the macros, but it fixed my DisplayObject crash issue (I think it was calling __CStr before the object construction was complete, as you allude to) and seems a bit more solid.

I'll investigate the reclaiming issue more later as I build out the hxScout display of deallocs.

Cheers,
-Jeff

Hugh

Jan 29, 2015, 12:26:02 AM
What do you think of the idea of delaying the object class name resolution until it is actually needed?
So record the allocs as "new objects, size=x, class='unknown'".  Then, in dump stats or whatever, fill in any "unknowns" using __CStr if you actually need the name of the object.
Object will only be partially initialized if you call "dump stats" while the thread is in "InternalNew".  You can tell if the object is partially initialized by checking for a valid vtable, which will be the first pointer in the data-structure.

Another random thought is to record the callstacks using thread-local maps from name->stack id, and then merge the results when (if) needed.  This avoids expensive locks.

Hugh

Jeff Ward

Jan 30, 2015, 1:16:57 PM
Thanks for the ideas, Hugh, I'll look into them.

You made me also realize -- we just map those names to ints anyway with a map<String,int> so the strings are temporary... perhaps I could instead (and much faster) map the Class object pointers to ints ala map<classPtr, int> and avoid the String entirely until we need to dump them.  There's only one class object for all instances, right?
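
Keying on the class object pointer instead of the class name might look like this: a pointer-to-id table on the allocation path, with names resolved only at dump time. A sketch under the assumption (raised above) that there is exactly one class object per class; ClassIdTable and its name callback are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical interning table keyed by the class object's address. Since
// all instances share one class object, the pointer is a cheap, stable key,
// and the expensive name lookup runs only once per class, at dump time.
class ClassIdTable {
public:
    using NameFn = std::string (*)(const void* classPtr);

    explicit ClassIdTable(NameFn nameOf) : nameOf_(nameOf) {}

    // Hot path, called on every allocation: no string work at all.
    int idFor(const void* classPtr) {
        auto it = ids_.find(classPtr);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(classes_.size());
        ids_.emplace(classPtr, id);
        classes_.push_back(classPtr);
        return id;
    }

    // Cold path: resolve names (index == id) only when stats are dumped.
    std::vector<std::string> dumpNames() const {
        std::vector<std::string> names;
        for (const void* cls : classes_) names.push_back(nameOf_(cls));
        return names;
    }

private:
    NameFn nameOf_;
    std::unordered_map<const void*, int> ids_;
    std::vector<const void*> classes_;
};
```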

You might have to show me exactly how to "check for a valid vtable", or point to an example.

The idea with the lock was to avoid collision between the dump and the sampler, not necessarily two different samplers. Perhaps this isn't necessary?  Do the samples and allocs (HX_STACK_FRAME) come from the same thread as the __hxcpp_hxt_dump_xxx calls?  The separate profileMainLoop thread only updates the clock? Perhaps I simply don't need the locks as long as the maps are thread-local.

Thanks as always for your time!

Best,
-Jeff

Hugh

Mar 18, 2015, 12:49:29 AM
Hey - not sure what the state of hxScout is at the moment.

In the latest hxcpp + 3.2 haxe, I've added the object name to the "operator new" for the objects.  Currently this is unused, but it is there if you want to integrate it into the profiler.

Other new calls are generally under the control of the hxcpp code directly, so additional information can be provided there.

Hugh

Jeff Ward

Mar 18, 2015, 6:31:27 PM
Thanks, Hugh, I'll check it out.

I had moved away from "operator new" in favor of the HX_STACK_FRAME / HX_STACK_THIS mechanism I mentioned above... I guess there are pros and cons... I'd still have to track Strings and Arrays (and other low-level types?) instantiated in hxcpp.  The String implementation is done. Arrays (and array resize via realloc) are still TBD.

The "operator new" mechanism always seemed a little questionable because it wasn't bullet-proof that the first "operator new" after an HX_STACK_FRAME was for the constructor pointed to by the stack.

HX_STACK_FRAME / HX_STACK_THIS seems pretty stable, by comparison.

Actually if you like this mechanism, it might be nice to get a change into Haxe's gencpp.ml, such that when it generates HX_STACK_THIS(), for a constructor (specifically, line 3477) it hints that it is a constructor with a second bool param:

output_cpp "HX_STACK_THIS(this, true)\n";

And elsewhere (non-constructors) with false. Then hxcpp's Debug.h would have to change to accept the bool parameter.
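
A second argument to HX_STACK_THIS would let the runtime know it is in a constructor without comparing the frame name against "new". The macros below are hypothetical stand-ins with different names (SKETCH_STACK_FRAME, SKETCH_STACK_THIS), not hxcpp's Debug.h, written only to show how the flag would flow from generated code to a tracking hook.

```cpp
#include <string>
#include <vector>

// Hypothetical record of a constructor entry.
struct CtorEvent {
    const void* object;
    std::string className;
};
std::vector<CtorEvent> gCtorEvents;

// Hypothetical hook: only constructor frames report an allocation.
inline void onStackThis(const void* self, const char* className, bool isConstructor) {
    if (isConstructor) gCtorEvents.push_back({self, className});
}

// With the extra bool, generated code states whether the frame is a
// constructor, so no runtime strcmp against "new" is needed.
#define SKETCH_STACK_FRAME(className) const char* sketchFrameClass_ = className
#define SKETCH_STACK_THIS(self, isCtor) onStackThis(self, sketchFrameClass_, isCtor)

// What generated code for a small class might look like under this scheme.
struct Player {
    int hp;
    Player() : hp(100) {
        SKETCH_STACK_FRAME("Player");
        SKETCH_STACK_THIS(this, true);
    }
    void update() {
        SKETCH_STACK_FRAME("Player");
        SKETCH_STACK_THIS(this, false);
        hp -= 1;
    }
};
```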

Anyway, I was busy refactoring my hxcpp changes in Feb, specifically for performance and per the above. Since then it's been working quite smoothly (on Linux). So I've turned my attention to refactoring UI code in HxScout itself so I can get to realloc (and display it in the UI).

However, some early testers (on Win / OSX) haven't had an entirely smooth time of things -- my own tests in Windows have been semi-stable. So perhaps I should update to Haxe 3.2 (merging the latest hxcpp branch into my fork), and try to determine if my hxcpp changes are the source of any crashing.

Anyway, I'm still working on it, slowly but surely.  =)

Cheers,
-Jeff

Jeff Ward

Mar 19, 2015, 12:00:55 PM
FYI, I instrumented Array and realloc tracking last night (not pushed yet), and a testcase to watch an Array<Int> grow.  Looks like Dynamic might be another base type to instrument.
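
For the Array case, realloc tracking mostly means treating a realloc as "forget the old entry, record the new one", since the buffer may move. A minimal sketch with a hypothetical SizeTracker; the allocation and realloc hooks that would call it are assumed, not shown.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical size tracker: one entry per live block, so a growing array
// shows up as its block getting bigger (possibly at a new address).
class SizeTracker {
public:
    void onAlloc(const void* id, std::size_t size) {
        forget(id);  // a reused address replaces any stale entry
        sizes_[id] = size;
        total_ += size;
    }

    // A realloc may move the data; the hook passes old and new addresses.
    void onRealloc(const void* oldId, const void* newId, std::size_t newSize) {
        forget(oldId);
        onAlloc(newId, newSize);
    }

    std::size_t sizeOf(const void* id) const {
        auto it = sizes_.find(id);
        return it == sizes_.end() ? 0 : it->second;
    }
    std::size_t total() const { return total_; }

private:
    void forget(const void* id) {
        auto it = sizes_.find(id);
        if (it != sizes_.end()) {
            total_ -= it->second;
            sizes_.erase(it);
        }
    }
    std::unordered_map<const void*, std::size_t> sizes_;
    std::size_t total_ = 0;
};
```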

Best,
-Jeff

Jeff Ward

Apr 10, 2015, 4:05:52 PM
Hugh, Joshua, Sven, or Nicolas,

Perhaps you could help me understand some of Haxe's build framework.

While investigating the performance of the OpenFL-based apps (including HxScout itself), I notice a lot of Dynamic allocations happening underneath extern::cffi - but there is no further stack information than that -- presumably because HX_STACK_FRAME's don't occur or weren't built into the external C library it's referencing? Though it is using the hxcpp GC / allocator, since I'm tracking the allocations it makes.

For example, in openfl/legacy/_v2/Lib's generated Lib.cpp:

   Dynamic __run(D a,D b,D c,D d,D e)
   {
      HX_STACK_FRAME("extern", "cffi",0,  "extern::cffi", __FILE__, __LINE__,0);
      if (mArgCount!=5) throw HX_INVALID_ARG_COUNT;
      if (mProc==0) hx::Throw( HX_NULL_FUNCTION_POINTER );
      return ((prim_5)mProc)(a.GetPtr(),b.GetPtr(),c.GetPtr(),d.GetPtr(),e.GetPtr());
   }

Whatever's inside mProc seems to have no HX_STACK_FRAME calls.

But perhaps for OpenFL the external library is Lime? Is there any way I could build Lime from source or get HX_STACK_FRAME calls from within the external library I'm building against?

Or is extern::cffi something else? SDL? It might not be possible to instrument with HX_STACK_FRAME?

Thanks for any info,
-Jeff

underscorediscovery

Apr 11, 2015, 4:35:16 PM
Usually when calling into cffi, the functions on that side call the CFFI API, specifically the ones starting with alloc_*, e.g. alloc_int, alloc_empty_object, etc.

From what I understand, you have to return something: even if the function has no return value, it usually does return alloc_null();

That's what I suspect you're seeing, but I could be wrong.

Hugh

Apr 22, 2015, 12:54:44 AM
Yes - there are two places it could be.
1. Explicitly, from the alloc_(int/float/string) functions called from cffi code.  These can be found in src/hx/CFFI.cpp.
   You could look at instrumenting these functions.
2. Implicitly, from boxing of args passed from haxe to cffi functions (or any function closure).  These are in src/Dynamic.cpp (DoubleData/BoolData/IntData) and StringData in hx/String.cpp.
Note that there is some overlap between these 2, since #1 will call into #2.

Hugh

Jeff Ward

Apr 22, 2015, 1:13:07 PM
Hugh / Sven,

I think you guys are right - seems like most are from arg boxing, which explains the allocations.

But would there be a way (or is it helpful?) to get more stack information from within lime? If Lime is mostly written in haxe, why don't I see more Haxe stack info from Lime? Or by the time OpenFL uses Lime, it's in a pre-compiled binary state?

Thanks for any thoughts.

-Jeff

Joshua Granick

Apr 22, 2015, 1:32:02 PM
Lime uses Haxe code on HTML5 and Flash, but uses a combination of CFFI and Haxe code on native targets. Anything above Lime (such as OpenFL), if not using -Dlegacy, should be Haxe code.

Hugh

Apr 22, 2015, 9:13:45 PM
The stack info is really the last haxe function you see.  I guess the exception would be the "main loop", which might allocate event object data.  But this could still be attributed to the last haxe function, which might be something like "main".  On Android this might be a bit different, since it would come from a different thread, and could maybe be associated with the "set top of stack" call.

Thinking about the external primitive code again, you could look at using the primitive name instead of "extern::cffi".
You would do this by changing the ExternalPrimitive in Lib.cpp, to use "mName.__s" for the name string, but before you do this, you should ensure that the String is a "constant" String, so the pointer remains fixed.  You can do this with "mName.dupConst()" in the constructor.
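
The dupConst caution is about pointer lifetime: the profiler stores the raw char pointer, so the name must outlive the call and never move. In plain C++ the same guarantee can be had by interning names in a node-based container. internFrameName is a hypothetical helper, not the hxcpp change itself, and as written it is not thread-safe.

```cpp
#include <set>
#include <string>

// Returns a pointer to a single, permanent copy of the name. std::set never
// moves its elements after insertion, so the returned c_str() stays valid
// for the life of the program and equal names share one pointer.
const char* internFrameName(const std::string& name) {
    static std::set<std::string> pool;
    return pool.insert(name).first->c_str();
}
```

An ExternalPrimitive-style wrapper could then compute its frame name once in its constructor (for example by prefixing "extern::cffi" to the primitive name) and pass the interned pointer to its stack-frame macro on every call.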

Hugh

Jeff Ward

Apr 23, 2015, 10:29:57 AM
Thanks Hugh, I'll look into that.

BTW, I've merged the latest hxcpp and openfl into my forks, and updated my haxe & haxelib, so I should be working fairly close to the latest stuff.

Best,
-Jeff

Jeff Ward

Apr 24, 2015, 9:52:41 PM
Oh, hey, that's certainly more informative (I prepended extern::cffi, then dupConst), thanks Hugh!

Note - the "create_main_frame" call comes from App.Main -> openfl.legacy.Lib.create, and it's just idle time.

Jeff Ward

Apr 25, 2015, 9:13:14 AM
Hugh, any idea where memory allocation occurs for adding Map keys?

When I run the test that pushes Ints into an Array, I get array allocations and reallocations (for expanding).  Great.

But when I do a test where I keep adding Int keys to an IntMap (giving them all the same pointer value so as to isolate the key storage), I don't see any allocations anywhere, though the GC-reported memory usage is obviously increasing.

Ideally I think I'd see the Map object updating its "size" as keys are added.  Any ideas where/how this memory is allocated?

Thanks,
-Jeff

Hugh

Apr 25, 2015, 10:56:40 AM
In  $HXCPP/src/hx/Hash.h 'allocElement', plus a bit for the bucket.
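
Per-element allocation plus bucket storage is the usual shape of a chained hash map. As a stand-alone illustration using std::unordered_map rather than hxcpp's Hash.h, a counting allocator shows both kinds of allocation appearing as keys are added. CountingAllocator and the gMapBytes counter are hypothetical names written for this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>

// One counter shared by every rebound copy of the allocator, so element
// nodes and the bucket array are counted together.
static std::size_t gMapBytes = 0;

// Minimal counting allocator meeting the standard Allocator requirements.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        gMapBytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        gMapBytes -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// An int -> int map whose node and bucket allocations are all counted.
using CountedIntMap = std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                                         CountingAllocator<std::pair<const int, int>>>;
```

Inserting keys makes gMapBytes grow by one node per key plus occasional jumps when the bucket array is rehashed, and it returns to zero when the map is destroyed.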

Hugh

Jeff Ward

Apr 30, 2015, 2:12:49 AM
Excellent, thanks, got Hash bucket size tracking!

I'm running into hxScout crashes when it starts a Thread to accept a 2nd or 3rd incoming socket (when a new app-under-profile is started). I don't think this is related to my modifications to hxcpp, but I could be wrong.

I've pasted a gdb backtrace of all threads here: http://pastebin.com/qGjY1BLJ

From my reading, it looks like threads 5,4,3 and 2 are actively Marking for the GC (marker threads spawned from GCInternal:2374 by master GC Thread 9.)  Meanwhile Thread 10 comes along from my FLMListener code and tries to accept the new socket connection, and perhaps does some unprotected GC / Alloc operation from sys/net/Socket.cpp:213 below -- what do you think?

#0  0x00007ffff7bcb6dd in accept () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff409388e in socket_accept(_value*) () from ./std.dso
#2  0x00000000007b712a in ExternalPrimitive::__run (this=0x7ffff7f191f4, a=...) at /home/jward/dev/hxcpp/src/hx/Lib.cpp:140
#3  0x00000000004248ab in Dynamic::operator() (this=0xb564f0 <sys::net::Socket_obj::socket_accept>, inArg0=...)
    at /home/jward/dev/hxcpp/include/Dynamic.h:187
#4  0x000000000042200c in sys::net::Socket_obj::accept (this=0x7fffd7b382e8) at ./src/sys/net/Socket.cpp:213
#5  0x000000000071bd7d in FLMListener_obj::start () at ./src/FLMListener.cpp:130

Any ideas for a remedy? Can I request GC protection while spawning that socket listener thread? I guess I'm potentially doing something weird where each child (telemetry listener) Thread spawns the next listener (instead of the main thread spawning the children.) Maybe try the master-creates-all-listeners approach?

Thanks,
-Jeff

Hugh

Apr 30, 2015, 3:36:59 AM
Have not studied the backtrace too closely, but what you say is "normal".
The thread that is waiting in the "accept" call should have called gc_enter_blocking, and so is allowed to continue blocking while the GC happens around it.
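
The rule described here (a thread blocked in accept() is fine as long as it announced itself with gc_enter_blocking) can be modeled as: a collection may start once every registered thread is either paused at a safe point or inside a blocking region. This is a toy model of that rule only; GcCoordinator and BlockingRegion are hypothetical, and a real exit from a blocking region would also have to wait for any in-progress collection to finish.

```cpp
#include <mutex>

// Toy model, not hxcpp's implementation: counts registered mutator threads,
// threads paused at a safe point, and threads inside a blocking region
// (where they promise not to touch the GC heap).
class GcCoordinator {
public:
    void registerThread()   { std::lock_guard<std::mutex> l(m_); ++threads_; }
    void enterBlocking()    { std::lock_guard<std::mutex> l(m_); ++blocking_; }
    void exitBlocking()     { std::lock_guard<std::mutex> l(m_); --blocking_; }
    void pauseAtSafePoint() { std::lock_guard<std::mutex> l(m_); ++paused_; }
    void resume()           { std::lock_guard<std::mutex> l(m_); --paused_; }

    // Marking may begin only when no thread can be mutating the heap.
    bool canCollect() {
        std::lock_guard<std::mutex> l(m_);
        return paused_ + blocking_ == threads_;
    }

private:
    std::mutex m_;
    int threads_ = 0;
    int blocking_ = 0;
    int paused_ = 0;
};

// RAII wrapper, in the spirit of bracketing a blocking call with
// gc_enter_blocking() / gc_exit_blocking().
class BlockingRegion {
public:
    explicit BlockingRegion(GcCoordinator& gc) : gc_(gc) { gc_.enterBlocking(); }
    ~BlockingRegion() { gc_.exitBlocking(); }
    BlockingRegion(const BlockingRegion&) = delete;
    BlockingRegion& operator=(const BlockingRegion&) = delete;

private:
    GcCoordinator& gc_;
};
```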

Looks like the issue is in hx::Hash<hx::TStringElement<Dynamic> >::HashMarker::operator() , where it is either marking a hash key that is dodgy for some reason (old value overwritten somehow?), or  maybe the whole hash is crap.  If you can follow it back and see what the value associated with the key is, or which hash is actually being marked (if only you had a way of going from pointer to allocation location!!) it might give you some clue as to which hash it is, what the key should be, and why it is not there.

Hugh