The code that's in there was my first-attempt at the need for stack
walking code. There's one optimization in place, but the algorithm behind
the optimization could use some work.
Basically, it finds the min and max values of all headers. It does a check
(for quick failure purposes) to see if the data on the stack is in that
range, and then proceeds to do the accurate check. The accurate check
consists of walking through each header pool in an attempt to see if this
pointer-sized data could be interpreted as being in that pool.
Currently, this is a linear walk over the header pools. I imagine there
are many better algorithms for determing a root set from a stack. The
boehm collector probably has decent code in this regard. However, given
that we have O(N) with size of stack, I'm not sure how we'll be able to
alleviate this in the long run.
Anyone feeling adventuresome and want to attempt to speed this up? It
should be an easy introduction to the GC code in general. Just start out
in trace_system_stack, and work your way down.
Mike Lambert
> As Peter has pointed out, our stackwalk code is rather slow.
> Anyone feeling adventuresome and want to attempt to speed this up?
If you want to get some improvement at the cost of some duplicated
code, you can remove the * direction logic, and write two copies of the
loop. (patch attached, but not fully tested)
On my machine, that changes 5000 lives from 168 seconds to 133
(tested for one run each only)
I'm sure there is more that can be done, but that may help for now.
--
Peter Gibbs
EmKel Systems
> If you want to get some improvement at the cost of some duplicated
> code, you can remove the * direction logic, and write two copies of the
> loop. (patch attached, but not fully tested)
> I'm sure there is more that can be done, but that may help for now.
The configure-time-stack-growth-direction patch makes these #define'd constants,
so you get the same speedup.
--
Jason
> The configure-time-stack-growth-direction patch makes these #define'd
constants,
> so you get the same speedup.
Do you also have a patch to determine the minimum stack increment?
The IA32 architecture, for example, has no restrictions on pointer
alignment (and hence PARROT_PTR_ALIGNMENT = 1), but IIRC a push
or pop will always change the SP by 2 or 4, therefore the stack walk code
can do the same. In a quick test, using a hardcoded value of 2 in the
stack walk code, 5000 lives dropped down to 112 seconds.