
Multi-cores and contention...


Amine

Jul 29, 2009, 2:35:11 PM

Hello all,

I was thinking about my lock-free Threadpool and lock-free_SPMC

http://www.colba.net/~aminer/

I am using a queue for every worker thread (and on the producer side
you can also push in parallel), and I have come to the conclusion that
this will scale very well on the software side...
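As a structural sketch of that design: one queue per worker, with producers dispatching to different queues in parallel. Python's `queue.Queue` is lock-based, so this illustrates only the topology of the post's design, not the lock-free implementation:

```python
# Sketch of the "one queue per worker" topology described above.
# NOTE: queue.Queue is lock-based; only the structure (per-worker
# queues, parallel producers) matches the lock-free design.
import itertools
import queue
import threading

NUM_WORKERS = 4
queues = [queue.Queue() for _ in range(NUM_WORKERS)]

def worker(q):
    # Each worker pops only from its own queue, so workers never
    # contend with each other on the consumer side.
    while True:
        job = q.get()
        if job is None:          # sentinel: shut this worker down
            return
        job()

threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
for t in threads:
    t.start()

counter = itertools.count()
def submit(job):
    # Producers dispatch round-robin; different producers can push
    # to different queues at the same time.
    queues[next(counter) % NUM_WORKERS].put(job)

results = []
for i in range(8):
    submit(lambda i=i: results.append(i * i))
for q in queues:
    q.put(None)                  # one sentinel per worker
for t in threads:
    t.join()
print(sorted(results))           # [0, 1, 4, 9, 16, 25, 36, 49]
```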

But I was thinking more about *scalability*, and I have come to the
conclusion that there is still a problem. Please follow with me:

As Amdahl's law states, Speedup = 1 / ((1 - P) + P/N), where N is the
number of cores and P the parallel fraction.
Now let's take as an example that we are working with a scalable
lock-free hashtable. If the transactions (find, insert, ...) are
coming to the lock-free hashtable and the jobs (transactions) are
processed in an M/M/m manner, with multiple threads and cores
servicing the jobs, that's very good...

Now suppose that the application, like a lock-free hashtable, is
working frequently with data and we are accessing big data directly
in memory, *AND* the data is randomly accessed, *AND* there is
contention between processors over the memory bus in an M/M/1 manner.
Then there will be an appreciable serial part (1 - P) in the Amdahl
equation, hence:

What would you expect from the scalable lock-free hashtable?

What would the performance be then? I think it will be very bad.
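The numbers behind that intuition can be sketched with the Amdahl formula above (the fractions below are made-up values for illustration):

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores when a fraction p of the
    work is parallelizable and (1 - p) is serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Perfectly parallel work scales linearly with the core count...
print(amdahl_speedup(1.0, 700))   # ~700
# ...but a 30% serial part caps 700 cores near 1/0.3 ~= 3.3x
print(amdahl_speedup(0.7, 700))   # ~3.32
```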

So, now imagine that we alleviate the problem by using an M/M/m on
the memory-bus side; this will lower contention. But imagine that
we are using some big in-memory data and accessing it *randomly*
with a lock-free hashtable: what will the performance be then?
Not so good, I think...
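A rough way to see the cost of random access is to walk the same data sequentially and in shuffled order. Pure-Python boxed objects blunt the effect compared to C, but once the data outgrows the caches, the random walk is usually visibly slower:

```python
# Rough illustration: sequential vs. random walk over the same data.
import random
import time

N = 1 << 21                      # ~2M elements, bigger than typical L2
data = list(range(N))
seq_order = list(range(N))
rand_order = seq_order[:]
random.shuffle(rand_order)

def walk(order):
    t0 = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return time.perf_counter() - t0, total

t_seq, s1 = walk(seq_order)
t_rand, s2 = walk(rand_order)
assert s1 == s2                  # same work, different access pattern
print(f"sequential: {t_seq:.3f}s   random: {t_rand:.3f}s")
```

The exact ratio depends on the machine; the point is only that the access pattern of the DATA, not the algorithm, sets the service rate.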

Also i have another question:

Read this:

http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf

How can the performance of this lock-free hashtable scale well on an
Azul system with more than 700 cores if we are using some very big
in-memory data and accessing it *RANDOMLY*? And what memory-bus
architecture are they using?...

What do you think about this problem ?

Regards,
Amine Moulay Ramdane.


Amine

Jul 29, 2009, 3:21:07 PM

I wrote:
> So, now imagine that we alleviate the problem by using an M/M/m on
> the memory-bus side; this will lower contention. But imagine that
> we are using some big in-memory data and accessing it *randomly*
> with a lock-free hashtable: what will the performance be then?
> Not so good, I think...


I mean, what is the benefit of a very FAST Ferrari car if the roads
are not so good and they lower its speed a lot?

What is the benefit of a very fast lock-free hashtable if the
in-memory data it accesses is big and *randomly* accessed (this will
lower the speed), *AND* there is also contention on the bus in an
M/M/1 manner (this will also lower the speed)?

Regards,
Amine. Moulay Ramdane.

Amine

Jul 29, 2009, 3:32:42 PM
On Jul 29, 3:21 pm, Amine <ami...@colba.net> wrote:
> I wrote:
> > So, now imagine that we elevate the problem by using an M/M/m in
> > the bus memory side , this will lower contention.  But imagine that
> > we are using some big in memory data  and accessing it *randomly*
> > with a lock-free Hash-table , what will be the performance then ?
> > not so good i think...
>
> I mean what is the benefit of a very FAST Ferrari car if the roads are
> not so good and they lower a lot it's speed ?


Typo:

I mean what is the benefit of a very FAST Ferrari car if the roads are
not so good and they lower a lot its speed ?

Regards,
Amine Moulay Ramdane.

> > Amine Moulay Ramdane.

Amine

Jul 29, 2009, 8:34:06 PM

Hello all,

I wrote:
"I mean what is the benefit of a very FAST Ferrari car if the roads
are not so good and they lower a lot its speed ?"

I was still *thinking* ...

So, let's say for example that we are running on an Azul system
with more than 700 cores: http://www.azulsystems.com/ and using
this lock-free hash table:
http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf

and millions of transactions (let's say find() transactions that
search and return some data) are coming at some arrival rate. Those
transactions are mostly memory-bound, and let's suppose for an
instant that they are mostly random accesses that *miss* the cache,
so that they are serviced not in an M/M/m manner but in M/M/1, with
the memory as the server and with its service rate lower than that
of the caches. That will be very bad. So in this situation: how can
even this Azul system with more than 700 cores, with, let's say, a
lock-free hash-table, scale in performance?


Regards,
Amine.


David Schwartz

Jul 29, 2009, 10:16:24 PM
On Jul 29, 12:21 pm, Amine <ami...@colba.net> wrote:

> What is the benefit of a very fast lock-free hashtable if the
> in-memory data it accesses is big and *randomly* accessed (this will
> lower the speed), *AND* there is also contention on the bus in an
> M/M/1 manner (this will also lower the speed)?

The benefit is that the lock-free hash table stays out of the way and
doesn't have any significant negative impact on performance, and
that's the best you can expect it to do.

> I mean, what is the benefit of a very FAST Ferrari car if the roads
> are not so good and they lower its speed a lot?

The benefit is that you only have one thing to work on rather than
two. The roads being slow is not the problem the fast car is intended
to solve.

DS

Dave Butenhof

Jul 30, 2009, 6:39:17 AM

And, let's face the facts: the Ferrari is pretty cool even standing
still in your driveway. Sometimes that alone is sufficient benefit. ;-)

But to build on David's final comment, having the Ferrari means that
when you hit the Autobahn you can floor it and really move. The value
isn't diminished by the fact that it inevitably goes slow on congested
residential streets. (Although, to take it even further, if you plan to
drive exclusively on those streets, the value of the Ferrari is reduced
to the fact that it looks cooler than those other cars going the same
speed.)

In other words, even if the way you generally need to use your lock-free
algorithms prevents them from doing their best, you may still be able to
use them in other ways that exploit their abilities. If you don't ever
use them the way they really were intended to be used, you still have
the satisfaction of knowing you're using cool state-of-the-art lock free
synchronization.

(And maybe you can think about improving the road...)

Amine

Jul 30, 2009, 1:29:31 PM

Hello again,


I was still *thinking*...

Now, if you have followed with me, I think there are three things.
You have to look at *scalability* from three perspectives:

ALGORITHM + HARDWARE + DATA

I wrote:

"I mean what is the benefit of a very FAST Ferrari car if the roads
are not so good and they lower a lot its speed ?"


Now please follow with me..

I wrote:

[1] "So, let's say for example that we are running on an Azul system
with more than 700 cores: http://www.azulsystems.com/ and using
this lock-free hash table:
http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf

and millions of transactions (let's say find() transactions that
search and return some data) are coming at some arrival rate. Those
transactions are mostly memory-bound, and let's suppose for an
instant that they are mostly random accesses that *miss* the cache,
so that they are serviced not in an M/M/m manner but in M/M/1, with
the memory as the server and with its service rate lower than that
of the caches. That will be very bad. So in this situation: how can
even this Azul system with more than 700 cores, with, let's say, a
lock-free hash-table, scale in performance?"

If you have noticed, this problem can still occur with my lock-free
Threadpool and lockfree_SPMC.

Now, this problem doesn't come from the ALGORITHM; it can come from
the HARDWARE (one bus servicing the jobs in an M/M/1 manner) and/or
from the essence of the DATA (random access to memory that *misses*
the cache).

Dave Butenhof wrote in a post:

"And maybe you can think about improving the road... "

But if you have noticed, the 'bad road' can be the essence of the
DATA itself.

So let's take the following situation as an example (it is not the
worst case):

Let's say that the access to the big in-memory data is not so random,
and 3 MB of data is frequently used, let's say 80% of the time. What
I want is to avoid *completely* the situation where the 3 MB of
cached data that is used 80% of the time gets erased from the cache:
a situation that can look like a livelock if we have a very high
transaction arrival rate...

So, what I want in this situation is also explicit control over the
cache, that means to *force* the cache to lock 3 MB of data in a
cache of, let's say, 4 MB (to be able to parallelize like an
M/M/m...)
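The payoff of keeping that hot 3 MB resident can be put into numbers with a simple average-memory-access-time model. The 4 ns and 100 ns latencies, and the hit ratios, are assumed round numbers for illustration:

```python
def avg_access_ns(hit_ratio, t_cache_ns=4.0, t_mem_ns=100.0):
    """Average memory access time for a given cache hit ratio.
    The 4 ns / 100 ns latencies are illustrative assumptions."""
    return hit_ratio * t_cache_ns + (1.0 - hit_ratio) * t_mem_ns

# 80% of accesses go to the hot 3 MB working set:
pinned  = avg_access_ns(0.80)   # hot set stays cached       -> 23.2 ns
evicted = avg_access_ns(0.10)   # hot set keeps being evicted -> 90.4 ns
print(pinned, evicted)
```

Under these assumptions, losing the hot set from the cache roughly quadruples the average access time, which is exactly the drop in service rate the post is worried about.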

Hence my question:

Which hardware can do this sort of thing in this sort of situation?

Regards,
Amine Moulay Ramdane.


On Jul 29, 8:34 pm, Amine <ami...@colba.net> wrote:
> Hello all,
>
> I wrote:
>
> "I mean what is the benefit of a very FAST Ferrari car if the roads
> are not so good and they lower a lot its speed ?"
>
> I was still *thinking* ...
>
> So, let's say for example that we are running on an Azul system
> with more than 700 cores: http://www.azulsystems.com/ and using

Amine

Jul 31, 2009, 12:06:44 PM

Hello all,


I am still *thinking*...

Please look with me carefully...

I have done, for example, a parallel program; please look at the
Parallel Matrix demo: http://www.colba.net/~aminer/

It works fine and it's scalable, because the parallel part P in the
Amdahl equation is appreciable: it's using the cache and doing some
calculations in *PARALLEL*.

But there is still a problem...

I wrote:

"Now, if you have followed with me, I think there are three things.
You have to look at *scalability* from three perspectives:

ALGORITHM + HARDWARE + DATA"


And I wrote:

"If you have noticed, this problem can still occur with my lock-free
Threadpool and lockfree_SPMC."


If you have noticed, I am using a queue for every worker thread, etc.


So now my question is:

How can we say that my lock-free Threadpool and lockfree_SPMC queue
are scalable in performance if, for every transaction on the pop()
side, we are missing the cache (because of *RANDOM* data that misses
the cache)? So we are not servicing in an M/M/m manner (that means
bringing the data from the caches in *PARALLEL*); we are just
servicing in an M/M/1 manner, with contention on a single memory bus
and with the lower service rate of the memory.

Do you understand?

And what do you think about this problem?


Regards,
Amine Moulay Ramdane.

Amine

Jul 31, 2009, 1:34:27 PM

Hello all,


So now let's make it clear...:

If someone says to you that an ALGORITHM is *SCALABLE* in
performance (let's say a lock-free hash, a lock-free list, a
lock-free queue, a lock-free stack...),

saying it is *NOT* sufficient to make it *REALLY* scalable in
performance.

You have to look at three things:

ALGORITHM + HARDWARE + DATA

And if, for example, the essence of the *DATA* accessed is random
and misses the cache completely or a lot, you can NOT parallelize by
using multiple caches in PARALLEL in an M/M/m manner.

=> How can you say that it's scalable in performance?

=> It's simply NOT scalable.

Regards,
Amine Moulay Ramdane.

Amine

Jul 31, 2009, 1:49:06 PM
On Jul 31, 1:34 pm, Amine <ami...@colba.net> wrote:
> Hello all,
>
> So now let's make it clear...:
>
> If someone say to you that an ALGORITHM is *SCALABLE*
> in performance: let say a lockfree hash, a lockfree list , a
> lockfree queue , a lockfree stack...
>
> to say it is *NOT* sufficient to make it *REALLY* scalable in
> performance.
>
> You have to look at three things:
>
>  ALGORITHM + HARDWARE + DATA
>
> and if, for example, the essence of the *DATA* accessed is
> random and misses the cache completely or a lot, you
> can NOT parallelize by using multiple caches in PARALLEL
> in an M/M/m manner

So they are serviced by the memory as the server in an M/M/1 manner,
with the low service rate of the memory bus....

>
> => how can you say that it's scalable in performance ?
>
> => It's simply NOT scalable.


I was speaking about those lock-free and wait-free algorithms,
like: stack, queue, list, hashtable...

I was not speaking about CPU-bound algorithms.

Do you understand now?


Regards,
Amine Moulay Ramdane.


Amine

Jul 31, 2009, 3:24:23 PM

Hello,

We have the abstraction and the reality.

If you are designing an abstract algorithm like a lock-free or
wait-free data structure (example: a list or a hashtable...) and you
are interested in *scalability*, do you take into account the
present reality?

Of course; as an example: false sharing.

Now the same thing:

If you are saying that your ALGORITHM is scalable in performance, I
think we have to take into account the present REALITY and look at
the present HARDWARE and the essence of the DATA...

As i wrote before:

"Now, if you have followed with me, I think there are three things.
You have to look at *scalability* from three perspectives:

ALGORITHM + HARDWARE + DATA"


Now, let's say for example that you are using those wait-free and
lock-free data structures in a DATA-intensive manner, and let's say
you are receiving a high rate of transactions like find() that
search and RETURN the data, and you are missing the cache completely
or a lot: so you are not working in an M/M/m manner with the caches
as the servers, but in an M/M/1 manner with the memory (bus) as the
server => low service rate.

What will be the benefit?

Can you still say those wait-free or lock-free data structures are
scalable in performance?

Do we take into account the REALITY?

I think yes.

So, if we take REALITY into account, we will say that they are NOT
scalable in performance.

Hence, I will say that they are NOT scalable in performance.

Regards,
Amine Moulay Ramdane.


Dmitriy V'jukov

Jul 31, 2009, 5:33:10 PM
On Jul 31, 23:24, Amine <ami...@colba.net> wrote:


What do you mean by all these "M/M/1" and "M/M/m"? I am totally
confused.


--
Dmitriy V'jukov

Amine

Jul 31, 2009, 6:33:23 PM
On Jul 31, 5:33 pm, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:
> On Jul 31, 23:24, Amine <ami...@colba.net> wrote:
>
> What do you mean by all these "M/M/1" and "M/M/m"? I am totally
> confused.

M means Markov: because of the memoryless nature of the exponential
distribution, it makes the arrival and departure processes into
continuous-time versions of a Markov chain.

M/M/m means that the arrival process (client side: jobs,
transactions) and the service process (on the server side) are
exponentially distributed; m means the number of servers.

Regards,
Amine.

>
> --
> Dmitriy V'jukov

Amine

Jul 31, 2009, 8:25:28 PM
On Jul 31, 5:33 pm, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:


Let's take for example: there is a high rate of transactions like
find('key', out obj) that are serviced by a wait-free or a lock-free
hash-table. After 'obj' is returned, we copy the data inside the obj
to somewhere else over some very fast network.

What is the benefit of those wait-free or lock-free data structures
if the memory data accessed inside 'obj' is random and we are
missing the cache completely or a lot, so that we will not be
servicing in an M/M/m manner with the caches as the servers but in
an M/M/1 manner with the memory (bus) as the server:

=> LOW SERVICE RATE + NO SCALABILITY.


Regards,
Amine.

Amine

Jul 31, 2009, 9:45:22 PM


And since there is no scalability benefit in those search+copy
transactions, how will you call that?

I have simply said:

"What will be the benefit?

Can you still say those wait-free or lock-free data structures are
scalable in performance?

Do we take into account the REALITY?

I think yes.

So, if we take REALITY into account, we will say that they are NOT
scalable in performance."


Regards,
Amine.


Amine

Jul 31, 2009, 11:19:26 PM

Dmitriy V'jukov wrote:
> What do you mean by all these "M/M/1" and "M/M/m"? I am totally
> confused.

> --
> Dmitriy V'jukov


I wrote:

"Let's take for example: there is a high rate of transactions like
find('key', out obj) that are serviced by a wait-free or a lock-free
hash-table. After 'obj' is returned, we copy the data inside the obj
to somewhere else over some very fast network.

What is the benefit of those wait-free or lock-free data structures
if the memory data accessed inside 'obj' is random and we are
missing the cache completely or a lot, so that we will not be
servicing in an M/M/m manner with the caches as the servers but in
an M/M/1 manner with the memory (bus) as the server:

=> LOW SERVICE RATE + NO SCALABILITY."


Is it not something like a database that doesn't scale in REALITY?

Even with those wait-free or lock-free data structures?

How will you call that?

And tell me, who is responsible for this situation, Dmitriy V'jukov?

Is it the ALGORITHM?

The HARDWARE?

Or the DATA?

Can I say the ALGORITHM?

Or is it forbidden?

So, I will say the DATA is responsible.

Am I correct to say that?

Or is it the HARDWARE?


Regards,
Amine Moulay Ramdane.


Amine

Aug 1, 2009, 12:17:27 AM

I wrote:

"Is it not something like a database that doesn't scale in REALITY?

Even with those wait-free or lock-free data structures?

How will you call that?

And tell me, who is responsible for this situation, Dmitriy V'jukov?

Is it the ALGORITHM?

The HARDWARE?

Or the DATA?

Can I say the ALGORITHM?

Or is it forbidden?

So, I will say the DATA is responsible.

Am I correct to say that?

Or is it the HARDWARE?"


And what is the solution to this catastrophe, Dmitriy V'jukov?

Think with me, my dear Dmitriy V'jukov...

Regards,
Amine.
