Hello,
We have to be smart, so please follow along with me.
As you have noticed, I have designed and implemented a parallel Conjugate
Gradient linear system solver library...
Here it is:
https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-cache-aware
My parallel algorithm is scalable on NUMA architectures...
But you have to understand my way of designing my NUMA-aware parallel
algorithms. The first way of implementing a NUMA-aware parallel algorithm
is to implement a threadpool that schedules a job on a given thread by
specifying the NUMA node explicitly, depending on which NUMA node's memory
you will be processing. This way buys you around 40% more throughput on a
NUMA architecture, as sketched just below.
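Here is a minimal sketch of that first approach, in C with libnuma on
Linux. It is only an illustration of the idea, not the code of my library:
the worker thread is explicitly bound to a chosen NUMA node, and the buffer
it processes is allocated on that same node, so all its memory accesses
stay local.

/* Minimal sketch of explicit NUMA-node scheduling (an illustration of the
   idea only, not the code of my library). Assumes Linux with libnuma;
   build with: gcc sketch1.c -lnuma -lpthread */
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

typedef struct { int node; double *data; double sum; } job_t;

static void *worker(void *arg) {
    job_t *job = (job_t *)arg;
    numa_run_on_node(job->node);   /* bind this thread to the node that owns the data */
    for (size_t i = 0; i < N; i++)
        job->sum += job->data[i];  /* all accesses are local to job->node */
    return NULL;
}

int main(void) {
    if (numa_available() < 0) { fprintf(stderr, "no NUMA support\n"); return 1; }
    int node = 0;                  /* the NUMA node chosen for this job */
    job_t job = { .node = node, .sum = 0.0 };
    /* allocate the buffer on the same node the worker will be scheduled on */
    job.data = numa_alloc_onnode(N * sizeof(double), node);
    if (job.data == NULL) return 1;
    for (size_t i = 0; i < N; i++) job.data[i] = 1.0;
    pthread_t t;
    pthread_create(&t, NULL, worker, &job);
    pthread_join(t, NULL);
    printf("sum = %f\n", job.sum);
    numa_free(job.data, N * sizeof(double));
    return 0;
}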
But there is another way of doing it: use a classical threadpool without
specifying the NUMA node explicitly, and instead divide your parallel
memory processing between the NUMA nodes. This is the way I have
implemented my NUMA-aware parallel algorithms. My way of doing it is
scalable on NUMA architectures, but you will get around 40% less
throughput than with explicit node placement. Even with that 40% loss,
I think my NUMA-aware parallel algorithms are scalable on NUMA
architectures and are still good enough. My next parallel sort library
will also be scalable on NUMA architectures. A sketch of this second
approach follows.
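Here is a minimal sketch of that second approach, again in C with libnuma
on Linux, and again only an illustration of the idea, not my library's
code: the data is divided into one chunk per NUMA node, but the worker
threads are ordinary pool threads with no node affinity, so some accesses
may be remote while the memory traffic is still spread over all the nodes'
memory controllers.

/* Minimal sketch of the second approach (an illustration of the idea only,
   not my library's code). Assumes Linux with libnuma and contiguous node
   numbering; build with: gcc sketch2.c -lnuma -lpthread */
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (1 << 20)

typedef struct { double *data; double sum; } chunk_t;

static void *worker(void *arg) {
    chunk_t *c = (chunk_t *)arg;   /* note: no numa_run_on_node() here */
    for (size_t i = 0; i < CHUNK; i++)
        c->sum += c->data[i];      /* accesses may be local or remote */
    return NULL;
}

int main(void) {
    if (numa_available() < 0) { fprintf(stderr, "no NUMA support\n"); return 1; }
    int nodes = numa_max_node() + 1;
    chunk_t *chunks = calloc(nodes, sizeof(chunk_t));
    pthread_t *tids = calloc(nodes, sizeof(pthread_t));
    /* divide the memory processing between the NUMA nodes: one chunk per node */
    for (int n = 0; n < nodes; n++) {
        chunks[n].data = numa_alloc_onnode(CHUNK * sizeof(double), n);
        for (size_t i = 0; i < CHUNK; i++) chunks[n].data[i] = 1.0;
    }
    /* classical pool of worker threads, not bound to any particular node */
    for (int n = 0; n < nodes; n++)
        pthread_create(&tids[n], NULL, worker, &chunks[n]);
    double total = 0.0;
    for (int n = 0; n < nodes; n++) {
        pthread_join(tids[n], NULL);
        total += chunks[n].sum;
        numa_free(chunks[n].data, CHUNK * sizeof(double));
    }
    printf("total = %f\n", total);
    free(chunks); free(tids);
    return 0;
}

Because each chunk lives on a different node, the bandwidth demand is
spread over all the memory controllers even though individual accesses can
be remote, which is why this way remains scalable while losing roughly the
40% that purely local access would give.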
Where did I get this 40% figure? Please read here:
"Performance impact: the cost of NUMA remote memory access
For instance, this Dell whitepaper has some test results on the Xeon
5500 processors, showing that local memory access can have 40% higher
bandwidth than remote memory access, and the latency of local memory
access is around 70 nanoseconds whereas remote memory access has a
latency of about 100 nanoseconds."
Read more here:
http://sqlblog.com/blogs/linchi_shea/archive/2012/01/30/performance-impact-the-cost-of-numa-remote-memory-access.aspx
Amine Moulay Ramdane.