
lock-free node allocator for fifo-queues


Dmitriy Vyukov

May 8, 2007, 12:34:13 PM
I propose a special lock-free allocator for allocating nodes for
fifo-queues.
The key feature of this allocator is that allocation and deallocation
issue no atomic operations.
Here is the code:

template<typename type>
class node_allocator
{
private:
    struct header
    {
        header* next_;
        bool free_;
        // actually this struct could fit into one word:
        // the lsb of the next pointer could serve as the free flag
    };

    // head of the fifo queue of allocated items;
    // newly allocated items are placed here
    header* head_;

    // tail of the fifo queue of allocated items;
    // the oldest item is situated here
    header* tail_;

    // count of allocated items
    unsigned count_;

public:
    node_allocator()
        : head_()
        , tail_()
        , count_()
    {}

    type* alloc()
    {
        // this method must be called by only one thread -
        // the thread that owns this allocator (the "producer"),
        // which allocates the nodes and enqueues them into
        // the fifo-queue

        header* h;
        if (tail_ && tail_->free_)
        {
            // the oldest node has been released by the consumer - reuse it
            h = tail_;
            tail_ = tail_->next_;
        }
        else
        {
            // new char[] returns storage suitably aligned for any object
            h = reinterpret_cast<header*>(
                new char[sizeof(header) + sizeof(type)]);
            ++count_;
        }
        h->free_ = false;
        h->next_ = 0;
        if (head_)
            head_->next_ = h;
        if (!tail_)
            tail_ = h;
        head_ = h;
        return reinterpret_cast<type*>(h + 1);
    }

    static void free(type* node)
    {
        // this method may be called by any thread
        // that frees a node owned by this allocator -
        // the "consumer"

        header* h = reinterpret_cast<header*>(node) - 1;
        // release_membar
        h->free_ = true;
    }
};


This version doesn't release any nodes back to the system allocator and
only accumulates them. But it seems that freeing of nodes can be added to
the algorithm - especially since most of the data is touched by only one
thread - so the addition of freeing should not add any "CAS-overhead".
Nevertheless, if we want really high performance we must accumulate
_many_ nodes anyway.
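The deferred freeing mentioned above could be sketched as a trimming pass run by the owning producer. This is a hypothetical extension, not part of the posted code; the names (node_allocator_model, kMaxCached, trim) are invented for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Sketch only: trimming surplus consumer-freed nodes back to the system
// allocator. Mirrors the header/tail layout of the posted node_allocator.
struct header
{
    header* next_;
    bool free_;
};

struct node_allocator_model
{
    header* head_;   // newest allocated node
    header* tail_;   // oldest allocated node
    unsigned count_; // total nodes owned by this allocator

    static const unsigned kMaxCached = 4; // arbitrary cap for the sketch

    node_allocator_model() : head_(0), tail_(0), count_(0) {}

    // Called only by the owning producer thread: while the oldest nodes
    // have already been marked free by the consumer and we are over the
    // cap, return them to the system allocator. Only producer-private
    // state (tail_, count_) and consumer-released nodes are touched, so
    // no atomic read-modify-write operations are needed.
    void trim()
    {
        while (count_ > kMaxCached && tail_ && tail_->free_)
        {
            header* h = tail_;
            tail_ = tail_->next_;
            delete[] reinterpret_cast<char*>(h);
            --count_;
        }
    }
};
```

The producer could invoke trim() occasionally from alloc(), keeping the fast path free of atomics exactly as in the original design.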


Usage example:

template<typename type>
class mpsc_queue
{
public:
    struct node
    {
        node* next_;
        type* data_;
    };

    typedef node_allocator<node> allocator_t;

private:
    node* head_;
    node* tail_;

public:
    mpsc_queue()
        : head_()
        , tail_()
    {}

    void enqueue(allocator_t& a, type* v)
    {
        // here we allocate from the producer's thread-local allocator
        node* n = a.alloc();
        n->data_ = v;
        // atomic_cas(dest, exchange, comparand) is assumed to return the
        // previous value of *dest (InterlockedCompareExchange-style);
        // the (long) casts assume a pointer fits into a long
        do {n->next_ = head_;}
        while (n->next_ != (node*)atomic_cas((long volatile*)&head_,
            (long)n, (long)n->next_));
    }

    type* dequeue()
    {
        // must be called by the single consumer thread
        if (!tail_)
        {
            // grab the whole lifo-ordered chain at once...
            node* head = (node*)atomic_exchange((long volatile*)&head_, 0);
            if (!head)
                return 0;
            // ...and reverse it into fifo order
            node* prev = 0;
            node* curr = head;
            do
            {
                node* next = curr->next_;
                curr->next_ = prev;
                prev = curr;
                curr = next;
            }
            while (curr);
            tail_ = prev;
        }
        type* v = tail_->data_;
        node* n = tail_;
        tail_ = n->next_;
        // here we free the node back to its owner's allocator
        allocator_t::free(n);
        return v;
    }
};

int main()
{
    typedef mpsc_queue<int> queue_t;
    typedef queue_t::allocator_t allocator_t;

    queue_t q;
    allocator_t a;

    int i1 = 11;
    int i2 = 12;
    int i3 = 13;

    q.enqueue(a, &i1);
    q.enqueue(a, &i2);
    q.enqueue(a, &i3);

    int* ii1 = q.dequeue();
    int* ii2 = q.dequeue();

    q.enqueue(a, &i1);
    q.enqueue(a, &i2);
    q.enqueue(a, &i3);

    int* ii3 = q.dequeue();
    int* ii4 = q.dequeue();
}

I think that usage of such an allocator can be very beneficial in some
scenarios, particularly with various heavily loaded fifo queues.

Chris Thomasson's distributed lock-free slab-allocator:
http://groups.google.com/group/comp.programming.threads/msg/e3efa5628aad4a82?hl=en&
doesn't handle such situations very well - it requires one CAS on
deallocation and zero CASes on allocation, amortized.
It seems that with my node_allocator the mpsc_queue::dequeue() operation
doesn't issue any atomic operations at all (amortized :) ).


Dmitriy V'jukov

Chris Thomasson

May 8, 2007, 7:03:43 PM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1178642053.4...@y5g2000hsa.googlegroups.com...

>I propose the special lock-free allocator for allocating nodes for
> fifo-queues.
[...]

> I think that usage of such allocator can be very beneficial in some
> scenarios, particular with various heavy load fifo queues.
>
> Chris Thomasson distributed lock-free slab-allocator:
> http://groups.google.com/group/comp.programming.threads/msg/e3efa5628aad4a82?hl=en&
> doesn't handle such situations very well - it requires one CAS on
> deallocation and zero CASes on allocation, amortized.

You have to keep in mind that my allocator is for general purpose use only.

:^)


So far, it looks like the algorithm might be one of the fastest general
purpose multi-threaded memory allocators on the market...


Dmitriy Vyukov

May 9, 2007, 4:42:40 AM
On May 9, 03:03, "Chris Thomasson" <cris...@comcast.net> wrote:

> > Chris Thomasson distributed lock-free slab-allocator:

> >http://groups.google.com/group/comp.programming.threads/msg/e3efa5628...


> > doesn't handle such situations very well - it requires one CAS on
> > deallocation and zero CASes on allocation, amortized.
>
> You have to keep in mind that my allocator is for general purpose use only.

Yes, I know.
But what should one use for node allocation in a fifo-queue? I don't see
any other special allocators... so one must use a general purpose
allocator for this...


> :^)

;)

Dmitriy V'jukov

Dmitriy Vyukov

May 10, 2007, 7:08:49 PM
On May 8, 20:34, Dmitriy Vyukov <dvyu...@gmail.com> wrote:
> I propose the special lock-free allocator for allocating nodes for
> fifo-queues.
> The feature of this allocator is that allocation and deallocation
> issues no atomic operations.

The main feature:
the allocator tracks not the free nodes (like the freelists in other
allocators), but the currently allocated, in-use nodes.
It tracks the currently allocated nodes in some particular order
(fifo or lifo).
The allocator checks whether the tail node (wrt the fifo or lifo order)
has been freed by some other thread by inspecting the node's boolean
"free" field.
The free() procedure of the allocator is no more than setting the node's
boolean "free" field to "true".

I think it is new. And it is fast.
No single memory location is written by two threads, so no CASes at all.
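In modern C++ (C++11 and later, which postdates this thread) the core of the scheme can be rendered as a plain release-store/acquire-load pair on the flag, with no compare-and-swap anywhere. This is a modernized sketch of the idea, not the original code; the names are invented:

```cpp
#include <atomic>
#include <cassert>

// Sketch: the consumer releases a node with a plain atomic store, and the
// producer reclaims it with a plain atomic load - no CAS on either side.
struct Header
{
    Header* next;
    std::atomic<bool> free_flag;
};

// consumer side: "free" is just a release-store of the flag
inline void release_node(Header* h)
{
    h->free_flag.store(true, std::memory_order_release);
}

// producer side: check the oldest in-use node; the acquire-load pairs with
// the consumer's release-store, making the node's contents safe to reuse
inline bool try_reclaim(Header* h)
{
    return h->free_flag.load(std::memory_order_acquire);
}
```

The release/acquire pairing plays the role of the "release_membar" comment in the original free().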

Dmitriy V'jukov

Chris Thomasson

May 11, 2007, 12:17:21 AM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1178838529.7...@y80g2000hsf.googlegroups.com...

For the special usage case, it is a workable solution.


Chris Thomasson

May 11, 2007, 12:41:21 AM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1178642053.4...@y5g2000hsa.googlegroups.com...

>I propose the special lock-free allocator for allocating nodes for
> fifo-queues.

I think Joe Seigh posted something like the queue algorithm used in your
example: use LIFO for the producers, and have the consumer atomically xchg
the head, reverse the order, and process.

Please correct me if I am wrong, Joe. I use it for LIFO only and don't bother
with the reversing logic... Man, think if you dequeued 100,000 nodes? And
you have to perform a reversal operation on all of these nodes before you
can process them in FIFO order?

Dmitriy Vyukov

May 11, 2007, 1:39:05 AM
On May 11, 08:41, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message

>
> news:1178642053.4...@y5g2000hsa.googlegroups.com...
>
> >I propose the special lock-free allocator for allocating nodes for
> > fifo-queues.
>
> I think Joe Seigh posted something like the queue algorithm used in your
> example. Use LIFO for producers and atomic xchg for consumer and reverse
> order and process.

Do you mean the mpsc_queue and not the node_allocator?
The mpsc_queue algorithm is not my invention - I have seen it somewhere...


> Please correct me if I am wrong Joe. I use it for LIFO only and don't bother
> with the reversing logic... Man, think if you dequeued 100,000 nodes? And
> you have to perform a reversal operation on all of these nodes before you
> can processes them in FIFO order?

We just steal one reverse operation from every producer - and give
them all to the consumer... so it is not so expensive if we distribute the
work across all the nodes... except for the memory locality problem - the
consumer has to touch a great amount of potentially scattered memory...

Dmitriy V'jukov

Chris Thomasson

May 11, 2007, 4:37:45 AM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1178861945.3...@p77g2000hsh.googlegroups.com...

> On May 11, 08:41, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message
>>
>> news:1178642053.4...@y5g2000hsa.googlegroups.com...
>>
>> >I propose the special lock-free allocator for allocating nodes for
>> > fifo-queues.
>>
>> I think Joe Seigh posted something like the queue algorithm used in your
>> example.

[...]

> Do you mean mpsc_queue and not node_allocator?
> mpsc_queue algorithm is not my invention - I have seen it somewhere...
>
>
>> Please correct me if I am wrong Joe. I use it for LIFO only and don't
>> bother
>> with the reversing logic... Man, think if you dequeued 100,000 nodes?

>> [...]

> We just steal one reverse operation from every producer - and give
> them all to consumer...

[...]

Good points.


Chris Thomasson

May 11, 2007, 4:53:22 AM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1178861945.3...@p77g2000hsh.googlegroups.com...

> On May 11, 08:41, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message
>>
>> news:1178642053.4...@y5g2000hsa.googlegroups.com...
>>
>> >I propose the special lock-free allocator for allocating nodes for
>> > fifo-queues.
>>
>> I think Joe Seigh posted something like the queue algorithm used in your
>> example. Use LIFO for producers and atomic xchg for consumer and reverse
>> order and process.
>
> Do you mean mpsc_queue and not node_allocator?

No. I am referring to the queue algorithm that uses your allocator in your
example.


> mpsc_queue algorithm is not my invention - I have seen it somewhere...

Yup.


Chris Thomasson

May 11, 2007, 4:55:09 AM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1178861945.3...@p77g2000hsh.googlegroups.com...

> On May 11, 08:41, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message
>>
>> news:1178642053.4...@y5g2000hsa.googlegroups.com...
>>
>> >I propose the special lock-free allocator for allocating nodes for
>> > fifo-queues.
>>
>> I think Joe Seigh posted something like the queue algorithm used in your
>> example. Use LIFO for producers and atomic xchg for consumer and reverse
>> order and process.
>
> Do you mean mpsc_queue and not node_allocator?
> mpsc_queue algorithm is not my invention - I have seen it somewhere...
>
>
>> Please correct me if I am wrong Joe. I use it for LIFO only and don't
>> bother
>> with the reversing logic... Man, think if you dequeued 100,000 nodes? And
>> you have to perform a reversal operation on all of these nodes before you
>> can processes them in FIFO order?
>
> We just steal one reverse operation from every producer - and give
> them all to consumer... so it is not so expensive if we distribute
> work to all nodes...

Nothing wrong with distribution methods!

;^)


Dmitriy Vyukov

May 11, 2007, 10:50:12 AM
On May 11, 08:17, "Chris Thomasson" <cris...@comcast.net> wrote:

> > I think it is new. And it is fast.
> > No one memory location is written by two threads. So no CASes at all.
>
> For the special usage case, it is a workable solution.

Is it just workable? Or is it good for the special usage case?
And would you use it for your fifo-queue?

Now I am thinking about an analogous allocator for a lifo-stack - it must
track the allocated nodes in lifo order too...

Dmitriy V'jukov

Chris Thomasson

May 11, 2007, 6:15:04 PM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1178895012....@n59g2000hsh.googlegroups.com...

> On May 11, 08:17, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> > I think it is new. And it is fast.
>> > No one memory location is written by two threads. So no CASes at all.
>>
>> For the special usage case, it is a workable solution.
>

A workable solution 'is' a fairly good solution by default?


Chris Thomasson

May 11, 2007, 6:19:19 PM

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:YIydnXMlorHzs9nb...@comcast.com...

> "Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
> news:1178861945.3...@p77g2000hsh.googlegroups.com...
>> On May 11, 08:41, "Chris Thomasson" <cris...@comcast.net> wrote:
>>> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message
>>>
>>> news:1178642053.4...@y5g2000hsa.googlegroups.com...
>>>
>>> >I propose the special lock-free allocator for allocating nodes for
>>> > fifo-queues.
>>>
>>> I think Joe Seigh posted something like the queue algorithm used in your
>>> example. Use LIFO for producers and atomic xchg for consumer and reverse
>>> order and process.
>>
>> Do you mean mpsc_queue and not node_allocator?
>
> No. I am referring to the queue algorithm that uses your allocator in your
> example.

^^^^^^^^

Umm, that 'No' should be a 'Yes'...

:^)

Dmitriy Vyukov

May 11, 2007, 7:08:41 PM
On May 12, 02:15, "Chris Thomasson" <cris...@comcast.net> wrote:

> >> For the special usage case, it is a workable solution.
>
> A workable solution 'is' a fairly good solution by default?

Oh... Do you mean "workable" as in "profitable"?
I thought that you meant "workable" as in merely "working"...

Dmitriy V'jukov

Dmitriy Vyukov

May 11, 2007, 7:12:08 PM
On May 12, 02:19, "Chris Thomasson" <cris...@comcast.net> wrote:

> >> Do you mean mpsc_queue and not node_allocator?
>
> > No. I am referring to the queue algorithm that uses your allocator in your
> > example.
>
> ^^^^^^^^
>
> Umm, that 'No' should be a 'Yes'...
>
> :^)

My English is not so good anyway, and this 'No' confused me a little ;)
I was thinking that I didn't understand you... ;)

Dmitriy V'jukov

Dmitriy Vyukov

May 13, 2007, 9:12:49 AM
On May 9, 03:03, "Chris Thomasson" <cris...@comcast.net> wrote:

> > Chris Thomasson distributed lock-free slab-allocator:

> >http://groups.google.com/group/comp.programming.threads/msg/e3efa5628...


> > doesn't handle such situations very well - it requires one CAS on
> > deallocation and zero CASes on allocation, amortized.
>
> You have to keep in mind that my allocator is for general purpose use only.

IMHO your allocator is not for general purpose use in the sense that
it does not handle blocks of different sizes or the freeing of blocks to a
parent allocator. So it is suitable for allocating "nodes" or some
"messages" - things that are created/destroyed frequently and have a
fixed size.


Dmitriy V'jukov

Chris Thomasson

May 13, 2007, 11:03:16 AM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1179061969....@e65g2000hsc.googlegroups.com...

> On May 9, 03:03, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> > Chris Thomasson distributed lock-free slab-allocator:
>> >http://groups.google.com/group/comp.programming.threads/msg/e3efa5628...
>> > doesn't handle such situations very well - it requires one CAS on
>> > deallocation and zero CASes on allocation, amortized.
>>
>> You have to keep in mind that my allocator is for general purpose use
>> only.
>
> IMHO Your allocator is not for general purpose use in the sense that
> it is not handle blocks of different size and freeing of blocks to
> parent allocator.

This is not true: My allocator most certainly can handle many different
sized blocks; no problem at all.

I just stuck to single-size blocks in the pseudo-code because I wanted to
"keep it simple" and just show off the lock-free aspects of it... They just
have to be at least the size of a void*. Putting a more elaborate allocation
scheme in the pseudo-code would obfuscate the lock-free aspect, IMHO:

http://groups.google.com/group/comp.programming.threads/msg/bc019dc04362be41

http://groups.google.com/group/comp.programming.threads/msg/1d3aeaa6ebf5181f


The lock-free aspects of the allocator are compatible with virtually all of
the actual allocation schemes out there. For instance, instead of using a
single slab, a thread could have several slabs which contain blocks of
different sizes. I have implemented the allocation scheme of my algorithm in
many different ways over the years. So far, I have not needed to modify the
lock-free aspect in order to accommodate any particular technique. In fact,
I am planning on posting several allocation schemes here:

http://groups.google.com/group/comp.lang.c++/browse_frm/thread/beeee1f61fdbb52c

which are all compatible with the lock-free per-thread reference counting
algorithm I invented for the allocator. I am curious as to why you thought
that the allocator could not handle multiple sized blocks... Was it because
the pseudo-code only focused on single sized blocks? If it was, I am sorry
for any confusions. I guess I should of mentioned the fact that various
allocation schemes are compatible with my per-thread counting logic...


Dmitriy Vyukov

May 17, 2007, 1:08:37 PM
On May 13, 7:03 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> > IMHO Your allocator is not for general purpose use in the sense that
> > it is not handle blocks of different size and freeing of blocks to
> > parent allocator.
>
> This is not true: My allocator most certainly can handle many different
> sized blocks; no problem at all.

So you mean that your allocator is just a _part_ of a general purpose
allocator... and my allocator can be a part of the same general
purpose allocator :)

Dmitriy V'jukov

Chris Thomasson

May 17, 2007, 3:41:23 PM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1179421717....@p77g2000hsh.googlegroups.com...

> On May 13, 7:03 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> > IMHO Your allocator is not for general purpose use in the sense that
>> > it is not handle blocks of different size and freeing of blocks to
>> > parent allocator.
>>
>> This is not true: My allocator most certainly can handle many different
>> sized blocks; no problem at all.
>
> So you mean that your allocator is just a _part_ of general purpose
> allocator...

I did not want to post a full-blown allocator algorithm. That's why I just
posted a very simple slab-based allocator w/ fixed sized blocks.

I only wanted to show off my per-thread counting w/ lock-free stack stuff...


> and my allocator can be a part of the same general
> purpose allocator :)

All you have to do is create a single-threaded allocator, and my lock-free
allocator algorithm can make it multi-threaded.

Pretty cool?

:^)

Chris Thomasson

May 17, 2007, 3:59:29 PM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:peedneEUh_yiMtHb...@comcast.com...

> "Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
> news:1179421717....@p77g2000hsh.googlegroups.com...
>> On May 13, 7:03 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>>
>>> > IMHO Your allocator is not for general purpose use in the sense that
>>> > it is not handle blocks of different size and freeing of blocks to
>>> > parent allocator.
>>>
>>> This is not true: My allocator most certainly can handle many different
>>> sized blocks; no problem at all.
>>
>> So you mean that your allocator is just a _part_ of general purpose
>> allocator...
>
> I did not want to post a full-blown allocator algorithm. That's why I just
> posted a very simple slab-based allocator w/ fixed sized blocks.
>
> I only wanted to show off my per-thread counting w/ lock-free stack
> stuff...

The patent claims for this thing only make reference to the per-thread
lock-free algorithm.


Andy

May 19, 2007, 8:15:03 PM
>
> All you have to do is create a single-threaded allocator, and my lock-free
> allocator algorithm can make it multi-threaded.
>
> Pretty cool?
>
> :^)

Can you explain in brief how the lock-free allocator works in terms of
contention waits and contention fairness?

Obviously, the api is like free/malloc, in which from the api's
perspective the caller wants to malloc the block "now" or "free" the
block now. In my application, there is lots of contention when there
are many threads because I have a "superheap" from which I alloc and
free chunks, normally of the same fixed size of 68K.

If the contending threads are on the same cpu, I assume that there
still could be contention because of the way native threads work. So
please explain the contention side of things.

Also, as you know, I'm integrating with JVM native library. We've had
discussions that say that maybe on some or all JREs, there could be a
context switch of my Java thread inside a signal handler. Does your
contention solution handle this the same way?

If so, and if I find out that the JREs I intend to deploy on are not
pthread-friendly, I might be interested in a license of some kind.

Andy

Chris Thomasson

May 20, 2007, 3:16:25 AM
"Andy" <andre...@yahoo.com> wrote in message
news:1179620103.4...@p47g2000hsd.googlegroups.com...

> >
>> All you have to do is create a single-threaded allocator, and my
>> lock-free
>> allocator algorithm can make it multi-threaded.
>>
>> Pretty cool?
>>
>> :^)
>
> Can you explain in brief how the lock-free allocator works in terms of
> contention-waits and contention-fairness.

Well, the only thread-to-thread communication that occurs in the allocator
is when a thread frees a block of memory that it did not allocate itself,
and when a thread tries to alloc a block from an empty queue. The former
case is handled with a CAS, and the latter uses a single XCHG. Everything is
100% lock-free, so you're not dealing with contention in terms of a lock. In
other words, there are not going to be any lock convoys, or starvations, or
any of the other negative aspects of using locks under moderate to heavy
contention.
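The two interactions described here - a CAS to free a block owned by another thread, and a single XCHG for the owner to claim everything that was remotely freed - can be sketched in C++11 atomics roughly as follows. The names are invented for illustration; this is not the vZOOM source:

```cpp
#include <atomic>
#include <cassert>

// Sketch of a per-owner "remote free" stack: any thread pushes a freed
// block with a CAS loop; the owner claims the whole stack with one XCHG.
struct Block
{
    Block* next;
};

struct RemoteFreeList
{
    std::atomic<Block*> head{nullptr};

    // called by any thread: one CAS per remote deallocation
    void push(Block* b)
    {
        Block* old = head.load(std::memory_order_relaxed);
        do {
            b->next = old;
        } while (!head.compare_exchange_weak(old, b,
                     std::memory_order_release,
                     std::memory_order_relaxed));
    }

    // called by the owner when its private queue runs empty:
    // a single exchange (XCHG) claims every remotely freed block at once
    Block* pop_all()
    {
        return head.exchange(nullptr, std::memory_order_acquire);
    }
};
```

The owner then feeds the claimed chain back into its private, atomics-free allocation path, which is why the per-allocation cost stays at zero CASes amortized.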


> Obviously, the api is like free/malloc, in which from the api's
> perspective the caller wants to malloc the block "now" or "free" the
> block now. In my application, there is lots of contention when there
> are many threads because I have a "superheap" from which I alloc and
> free chunks, normally of the same fixed size of 68K.

You can create a simple 100% single-threaded allocator. No locks, no
nothing. Just assume that only 1 thread will ever be using your allocator at
any one time. I have an api that can allow you to transform that
single-threaded allocator into a 100% lock-free distributed memory
allocator. No kidding!

:^)


> If the contending threads are on the same cpu, I assume that there
> still could be contention because of the way native threads work. So
> please explain the contention side of things.

You could get the kind of contention that deadlocks you, because using a
"custom" mutex in the JVM operating environment could expose you to the same
problems associated with using locks in signal handlers. Contention on a
mutex can expose you to lock convoys and heavy FSB traffic wrt all of
the atomic operations that are going on under the covers of the mutex
implementation. Possible starvation is an issue in real-time systems...


> Also, as you know, I'm integrating with JVM native library. We've had
> discussions that say that maybe on some or all JREs, there could be a
> context switch of my Java thread inside a signal handler. Does your
> contention solution handle this the same way?

This "on-the-fly" preemptive context-switching will have no effect on the
lock-free version of the allocator. If you're going to use a lock-based
solution, then I would advise you to make use of Java monitors.


> If so, and if I find out that the JREs are not pthread friendly that I
> intend to deploy on, I might be interested in a license of some kind.

Just let me know, and we can begin sketching some initial solutions wrt to
getting vZOOM integrated into your software architecture. I can quickly
implement the lock-free allocator portion of the library in pure Java (e.g.,
using the Atomic package) if that is needed... It will most likely
outperform the Java Native allocator...

Andy

May 20, 2007, 7:23:11 AM
> Just let me know, and we can begin sketching some initial solutions wrt to
> getting vZOOM integrated into your software architecture. I can quickly
> implement the lock-free allocator portion of the library in pure Java (e.g.,
> using the Atomic package) if that is needed... It will most likely
> outperform the Java Native allocator...

The lock is used on the native side only (JNI library). So the Java
Atomic package doesn't help. You'll have to implement a C-library on
every popular platform that JREs are available on.

My allocator that needs a lock is a singleton class, so it encounters
heavy contention. It is fairly naive because almost all of the
allocations are the same (large) size. (Smaller allocations of
random sizes happen within the same thread and within this large block.)
What's the URL for your api and license?

Chris Thomasson

May 20, 2007, 6:33:02 PM
"Andy" <andre...@yahoo.com> wrote in message
news:1179660191.0...@h2g2000hsg.googlegroups.com...

>> Just let me know, and we can begin sketching some initial solutions wrt
>> to
>> getting vZOOM integrated into your software architecture. I can quickly
>> implement the lock-free allocator portion of the library in pure Java
>> (e.g.,
>> using the Atomic package) if that is needed... It will most likely
>> outperform the Java Native allocator...
>
> The lock is used on the native side only (JNI library). So the Java
> Atomic package doesn't help. You'll have to implement a C-library on
> every popular platform that JREs are available on.

I have got it working under pure POSIX Threads... however, as you said, some
JREs don't like pthreads too much... So, I think an initial strategy would be
to get at least the vzoom allocator running with JNI monitors and
thread-specific-storage (TSS). After that, we could start to drill down
on specific architectures, like the SPARC, x86, PPC, ARM9...

> My allocator that needs a lock is a singleton class so it encounters
> heavy contention. It is fairly naive because almost all of the
> allocations are the same (large) size. (Smaller allocations of
> random sizes are within the same thread and within this large block.)
> What's the URL for your api and license?

The public web-site will be up in about 30-40 days. I will announce it here
for sure. However, before we get into license, I want to be sure that this
is going to work!

:^)

No need to license anything if the product is not workable within your
existing software architecture, right?

--- Does the project you're working on have to be completed within a certain
timeframe? ---

Small-Business License Brief-Overview - $2750.00 sticker price.
__________________________

We recommend this license to be reviewed by counsel before signing!

You may not provide the binaries or source-codes to any third-party.

You may use the binaries to create products.

You may use the source-code to create products. Source-code usage requires
consultation.

You may sell these products to third-parties.

The license will expire in 5 years. Once the license expires you may not use
the binaries or source-codes in any new products.
__________________________

The vzoom apis that will be of interest to you are going to be:

__________________________
/* I am just quickly typing these out. A complete API overview will be on
the web-site. */


/*General Purpose Memory Allocator
- these are analogous to malloc/free
*/
extern "C" void* vzMalloc(size_t);
extern "C" void vzFree(void*);
__________________________

There is a lot more stuff in the library, but I think for now we should
focus on the memory allocation aspect. I don't want to confuse you with
discussions about PDR and eventcounts, etc... Let's keep things simple for
now. BTW, how would you rate your threading skills strength: very mild,
mild, moderate, good, really good, advanced, extreme or out-of-this-world?

;^)

Andy

May 20, 2007, 6:58:10 PM
On May 20, 6:33 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
> The public web-site will be up in about 30-40 days. I will announce it here
> for sure. However, before we get into license, I want to be sure that this
> is going to work!
>
I'll keep checking this newsgroup.

> :^)
>
> No need to license anything if the product is not workable within your
> existing software architecture right?
>
> --- Does the project you're working on have to be completed within a certain
> timeframe? ---
>

My beta version will be ready in 2 months.

> Small-Business License Brief-Overview - $2750.00 sticker price.
> __________________________
>
> We recommend this license to be reviewed by counsel before signing!
>
> You may not provide the binaries or source-codes to any third-party.
>
> You may use the binaries to create products.
>

I only intend to use binaries, assuming they are bug-free and allow
for a faster engine when there is heavy contention.

> You may use the source-code to create products. Source-code usage requires
> consultation.
>
> You may sell these products to third-parties.
>
> The license will expire in 5 years. Once the license expires you may not use
> the binaries or source-codes in any new products.
> __________________________
>
> The vzoom apis that will be of interest to you are going to be:
>
> __________________________
> /* I am just quickly typing these out. A complete API overview will be on
> the web-site. */
>
> /*General Purpose Memory Allocator
> - these are analogous to malloc/free
> */
> extern "C" void* vzMalloc(size_t);
> extern "C" void vzFree(void*);

This won't help me. I mustn't allow the JNI library to malloc or free
anything in the heap being shared with the JVM. I want only JVM memory
to be used. The trick is that I create very large java.nio.Buffers as
"direct" and hold them for the duration of the load time of the
library. Those buffers are used by my own custom and currently
threadsafe version of malloc/free, which is a contention bottleneck
because it's a global singleton. By the way, my heap is the disjoint
union of all of the direct buffers.

I'll need your ability to wrap my malloc/free singleton functions in a
lock-free threadsafe version. That's why the 2 apis listed above
won't work.
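For context, the kind of purely single-threaded core that such a lock-free wrapper would sit on top of can be sketched as a fixed-size-block allocator carved out of one preallocated buffer, in the spirit of allocating inside a held direct buffer. This is a hypothetical illustration; FixedBlockHeap and its layout are invented, not Andy's code and not the vZOOM API:

```cpp
#include <cassert>
#include <cstddef>

// Sketch: a single-threaded fixed-size-block allocator over one buffer.
// No locks and no atomics by design - thread safety would come from the
// wrapper that gives each thread its own instance.
class FixedBlockHeap
{
public:
    FixedBlockHeap(void* buf, size_t buf_size, size_t block_size)
        : free_list_(0)
    {
        // carve the buffer into equally sized blocks and thread
        // them onto a private free list
        char* p = static_cast<char*>(buf);
        size_t n = buf_size / block_size;
        for (size_t i = 0; i < n; ++i)
        {
            Block* b = reinterpret_cast<Block*>(p + i * block_size);
            b->next = free_list_;
            free_list_ = b;
        }
    }

    void* alloc()
    {
        // pop one block off the private free list, or fail
        if (!free_list_)
            return 0;
        Block* b = free_list_;
        free_list_ = b->next;
        return b;
    }

    void free(void* p)
    {
        // push the block back onto the private free list
        Block* b = static_cast<Block*>(p);
        b->next = free_list_;
        free_list_ = b;
    }

private:
    struct Block { Block* next; };
    Block* free_list_;
};
```

Wrapping the singleton's malloc/free would then amount to routing same-thread frees through this private list and cross-thread frees through a lock-free remote path.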


> __________________________
>
> There is a lot more stuff in the library, but I think for now we should
> focus on the memory allocation aspect. I don't want to confuse you with
> discussions about PDR and eventcounts, etc... Let's keep things simple for
> now. BTW, how would you rate your threading skills strength: very mild,
> mild, moderate, good, really good, advanced, extreme or out-of-this-world?
>
> ;^)

My threading skills in Java are advanced or better. My skills in C++
are coming up to speed, and are good enough to write fast code and
find all of the bugs. I'm likely only to use apis that behave fully
like a native lock mutex, other than the allocator. That's why it
would be good to explain how your "mutex" works in terms of (1)
contention, (2) allowing a context switch (esp. to a Java thread vs. a
posix thread), and (3) whether you fall back to a POSIX mutex when there is
contention. In my case, you may need to fall back to a Java mutex.

So I'll keep a lookout for your website.
