;)
Here is the first pseudo-code implementation I created for my new allocator:
Well, that's simple enough, right? Once you apply the fixes that I outlined,
the pseudo-code is 100% bug free.
However... the presented pseudo-code is not the best way to implement my design...
Here are some quick notes on the implementation I finally decided to use in
my vZOOM library, which will be available soon:
http://groups.google.com/group/comp.programming.threads/msg/bc019dc04362be41
http://groups.google.com/group/comp.programming.threads/msg/1d3aeaa6ebf5181f
There you have it... This invention is my version of a highly scalable
low-overhead lock-free slab-based memory allocator; very fast, IMHO... What
do you think of my design?
Any comments/critiques/suggestions/help are encouraged and will be
appreciated!
Thank you for your time...
BTW.... I included comp.benchmarks because I think my multi-threaded
allocator has the potential to get high marks across the board...
I should clarify that the fixes are relevant to the atomic operations... I
think there is a misspelled word on one of the local pointer accesses;
but it is not really relevant because the fix is so obvious...
If you have ANY trouble with the pseudo-code please let me know.
IMHO, it should be easily understood...
I've always preferred the circular linked list implementation with the list
handle pointing to the tail, so that the tail is available directly and its
next pointer is the head.
Would this be quite good for parallel CPUs with separate memory spaces?
Umm, I'm looking for something which has possible overlap of part of the
memory spaces, as this eases communication in certain memory bus topologies.
I do wonder if a pipelined parallel bit ring is the best local net for
wafer scale, i.e. serial shift in and out, but parallel shift around the
loop?
Or just a gigabit shift ring on each CPU corner, of length 4 bits
total. This would need some kind of variable address relocation
scheme for operation by more than one processor. A global exported-variable
handle process running on one CPU would be needed, unless each CPU had an IP
and the global variable was indexed by an IP/port-number pair.
2KB blocks would be efficient.
cheers.
>There you have it... This invention is my version of a highly scalable
>low-overhead lock-free slab-based memory allocator; very fast, IMHO... What
>do you think of my design?
You asked the same question a while ago, right? If so, how do you
respond to the comments which were made back then?
-- g
It's a good design. In fact, it's what I've been saying they should
have done in Hoard for a while now :-). I think Hans Boehm said
something to the effect in his paper on a "mostly parallel" garbage
collector as well. That said, I'd be interested in seeing a
performance comparison with Doug Lea's allocator plugged into this vs.
some of the other high-performance multithreaded allocators.
Sean
Thank you.
> In fact, it's what I've been saying they should
> have done in Hoard for a while now :-).
I think Hoard has per-thread allocators; however, my design radically
differs from theirs. Is that what you were referring to?
> I think Hans Boehm said
> something to the effect in his paper on a "mostly parallel" garbage
> collector as well.
I think I read a paper on that... FWIW, my vZOOM library has the allocator
hooked up to my proxy garbage collector design:
http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/65c9c2673682d4cf/bc1506fb6d0e3ba7?lnk=gst&q=vzoom&rnum=1#bc1506fb6d0e3ba7
This integrates a solution for the reader/writers problem in lock-free
data-structures directly into the lock-free allocator... Therefore, the
allocator supports lock-free reader patterns directly:
https://coolthreads.dev.java.net/servlets/ProjectForumMessageView?forumID=1797&messageID=11068
This is a nice feature for an allocator to support?
> That said, I'd be interested in seeing a
> performance comparison with Doug Lea's allocator plugged into this vs.
> some of the other high-performance multithreaded allocators.
I was thinking about comparing to stuff like:
http://portal.acm.org/citation.cfm?id=996893.996848
http://portal.acm.org/citation.cfm?id=773039.512451&coll=GUIDE&dl=ACM&idx=J706&part=periodical&WantType=periodical&title=ACM%20SIGPLAN%20Notices&CFID=11111111&CFTOKEN=2222222
http://citeseer.ist.psu.edu/johnson91concurrent.html
What do you think?
IMHO, my multithreaded allocator has the potential to perform as well as,
or better than, most existing multithreaded designs...
After I get my act together and publish the x86-32 and SPARC-32/64 versions
of my vZOOM library I can provide a non-commercial academic style license
that will allow people to benchmark and experiment... Of course, I will also
provide commercial license(s) for those who are interested...
;)
Thank you for your comments...
I did not get any responses.
;^(.....
Could you please provide a link? Perhaps I missed something...
Thank you.
[...]
> I've always preferred the circular linked list implementation with the list
> handle pointing to the tail, so that the tail is available directly and its
> next pointer is the head.
A circular linked list would be perfectly compatible with this design...
However, you should try to minimize the per-block storage you need to
maintain the list...
> Would this be quite good for parallel CPUs with separate memory spaces?
Indeed.
> Umm, I'm looking for something which has possible overlap of part of the
> memory spaces, as this eases communication in certain memory bus topologies.
>
> I do wonder if a pipelined parallel bit ring is the best local net for
> wafer scale, i.e. serial shift in and out, but parallel shift around the
> loop?
FWIW, this allocator is NUMA friendly...
I was referring to the lock-free LIFO queue for deletions. Hoard has
per-thread allocators, but all allocations and deletions are protected
by a mutex and deletions are processed by the deleting thread.
Appending 'foreign' deletions to a per-thread lock-free queue instead
seemed an obvious way to improve performance.
> > I think Hans Boehm said
> > something to the effect in his paper on a "mostly parallel" garbage
> > collector as well.
>
> I think I read a paper on that... FWIW, my vZOOM library has the allocator
> hooked up to my proxy garbage collector design:
> http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/65c9c2673682d4cf/bc1506fb6d0e3ba7?lnk=gst&q=vzoom&rnum=1#bc1506fb6d0e3ba7
I've been trying to find the time to do something like this in a
traditional garbage collector, but other projects have taken
precedence. Unfortunately, it's not really possible to implement smart
pointers in my current language of choice, so proxy GC isn't really
feasible.
> This integrates a solution for the reader/writers problem in lock-free
> data-structures directly into the lock-free allocator... Therefore, the
> allocator supports lock-free reader patterns directly:
> https://coolthreads.dev.java.net/servlets/ProjectForumMessageView?forumID=1797&messageID=11068
>
> This is a nice feature for an allocator to support?
Assuming a proxy GC design is an option, yes.
> > That said, I'd be interested in seeing a
> > performance comparison with Doug Lea's allocator plugged into this vs.
> > some of the other high-performance multithreaded allocators.
>
> I was thinking about comparing to stuff like:
>
> http://portal.acm.org/citation.cfm?id=996893.996848
> http://portal.acm.org/citation.cfm?id=773039.512451&coll=GUIDE&dl=ACM&idx=J706&part=periodical&WantType=periodical&title=ACM%20SIGPLAN%20Notices&CFID=11111111&CFTOKEN=2222222
>
> http://citeseer.ist.psu.edu/johnson91concurrent.html
>
> What do you think?
I've only read the first paper, but they all look like good candidates.
Sean
Ah...
> Appending 'foreign' deletions to a per-thread lock-free queue instead
> seemed an obvious way to improve performance.
Yeah... Well, you need some form of atomic reference counting. My allocator
code has a special form of atomic reference counting (check pseudo-code)...
Have you seen this type of reference counting before?
I think I may have invented a specialized form that avoids memory barriers
and atomic operations for the lifetime of the entity that is protected by
the count.... The pseudo-code example shows how to keep a per-thread
allocator object alive if there are any outstanding references to any
blocks in any of its slabs... I think I have a novel design...
Any thoughts?
>> > I think Hans Boehm said
>> > something to the effect in his paper on a "mostly parallel" garbage
>> > collector as well.
>>
>> I think I read a paper on that... FWIW, my vZOOM library has the
>> allocator
>> hooked up to my proxy garbage collector design:
>> http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/65c9c2673682d4cf/bc1506fb6d0e3ba7?lnk=gst&q=vzoom&rnum=1#bc1506fb6d0e3ba7
>
> I've been trying to find the time to do something like this in a
> traditional garbage collector, but other projects have taken
> precedence. Unfortunately, it's not really possible to implement smart
> pointers in my current language of choice, so proxy GC isn't really
> feasible.
You can design a proxy GC by tracking per-thread/cpu/whatever synchronization
epochs. You don't need smart pointers to implement proxy GC. You could do a
prototype in Java by using volatile... However, the Java memory model is so
strict that it would force your reader threads to use load-acquire barriers
(#LoadStore | #LoadLoad) during their tedious traversals throughout your
application's shared data-structures. Not worth the trouble in Java.
>> This integrates a solution for the reader/writers problem in lock-free
>> data-structures directly into the lock-free allocator... Therefore, the
>> allocator supports lock-free reader patterns directly:
>> https://coolthreads.dev.java.net/servlets/ProjectForumMessageView?forumID=1797&messageID=11068
>>
>> This is a nice feature for an allocator to support?
>
> Assuming a proxy GC design is an option, yes.
I thought so! I need some more feedback... This thread is not receiving the
traffic I thought it would...
;)
>> > That said, I'd be interested in seeing a
>> > performance comparison with Doug Lea's allocator plugged into this vs.
>> > some of the other high-performance multithreaded allocators.
>>
>> I was thinking about comparing to stuff like:
>>
>> http://portal.acm.org/citation.cfm?id=996893.996848
>> http://portal.acm.org/citation.cfm?id=773039.512451&coll=GUIDE&dl=ACM&idx=J706&part=periodical&WantType=periodical&title=ACM%20SIGPLAN%20Notices&CFID=11111111&CFTOKEN=2222222
>>
>> http://citeseer.ist.psu.edu/johnson91concurrent.html
>>
>> What do you think?
>
> I've only read the first paper, but they all look like good candidates.
Yup... Also... I need to bench against leapheap...