
About the full paper on the Swarm chip..

Hello..


About the full paper on the Swarm chip..

I have just read the following paper from MIT about the new Swarm chip:

https://people.csail.mit.edu/sanchez/papers/2015.swarm.micro.pdf

I think there are disadvantages with this chip: first, it uses the same mechanisms as transactional memory, but those mechanisms of transactional memory are not so efficient (read below to know more), and we already have Intel hardware transactional memory. Also, I don't think it is globally faster on parallelism than current hardware and software; look at the benchmarks in the paper and you will understand more.

And about transactional memory and more, read my following thoughts:

About Hardware Transactional Memory and my invention that is my powerful Fast Mutex:

"As someone who has used TSX to optimize synchronization primitives, you can expect to see a ~15-20% performance increase, if (big if) your program is heavy on disjoint data access, i.e. a lock is needed for correctness, but conflicts are rare in practice. If you have a lot of threads frequently writing the same cache lines, you are probably going to see worse performance with TSX as opposed to traditional locking. It helps to think about TSX as transparently performing optimistic concurrency control, which is actually pretty much how it is implemented under the hood."

Read more here:

https://news.ycombinator.com/item?id=8169697
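
Since TSX is a CPU feature, I can not show it in Object Pascal, but the "optimistic concurrency control" idea from the quote above can be illustrated with the following minimal FreePascal sketch of a compare-and-swap retry loop; this is only my illustration of the optimistic idea, not TSX itself:

---

program OptimisticDemo;

var
  Counter: longint = 0;

// Optimistic update: read the current value, compute the new one,
// then publish it with one atomic compare-and-swap; retry if some
// other thread interfered in between (InterlockedCompareExchange
// is in the FreePascal system unit).
procedure OptimisticIncrement;
var
  Old: longint;
begin
  repeat
    Old := Counter;
  until InterlockedCompareExchange(Counter, Old + 1, Old) = Old;
end;

begin
  OptimisticIncrement;
  writeln(Counter); // prints 1
end.

---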

So as you are noticing, HTM (hardware transactional memory) and TM cannot replace locks when doing IO or for highly contended critical sections; this is why I have invented my following powerful Fast Mutex:

More about research and software development..

I have just looked at the following new video:

Why is coding so hard...

https://www.youtube.com/watch?v=TAAXwrgd1U8


I understand this video, but I have to explain my work:

I am not like the techlead in the video above, because I am also an "inventor" who has invented many scalable algorithms and their implementations, and I also invent effective abstractions. I will give you an example:

Read the following from the senior research scientist who is called Dave Dice:

Preemption tolerant MCS locks

https://blogs.oracle.com/dave/preemption-tolerant-mcs-locks

As you are noticing, he is trying to invent a new lock that is preemption tolerant, but his lock lacks some important characteristics. This is why I have just invented a new Fast Mutex that is adaptive and that is much better; I think mine is the "best", and I think you will not find it anywhere. My new Fast Mutex has the following characteristics (a generic sketch of the adaptive idea follows the list):

1- Starvation-free
2- Good fairness
3- It keeps the cache coherence traffic efficiently very low
4- Very good fast-path performance (it has the same performance as the scalable MCS lock when there is contention)
5- And it has decent preemption tolerance.
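
Since I am not publishing my Fast Mutex here, the following is only a minimal generic FreePascal sketch of the "adaptive" idea (spin a bounded number of times on the fast path, then fall back to blocking on an OS event); it is not my actual Fast Mutex algorithm and it does not have the five characteristics above:

---

uses
  {$ifdef unix}cthreads,{$endif} SyncObjs;

type
  // Generic adaptive lock: bounded spinning, then block on an event.
  TAdaptiveLock = class
  private
    FState: longint; // 0 = free, 1 = held
    FEvent: TEvent;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Acquire;
    procedure Release;
  end;

constructor TAdaptiveLock.Create;
begin
  inherited Create;
  FState := 0;
  FEvent := TEvent.Create(nil, false, false, ''); // auto-reset event
end;

destructor TAdaptiveLock.Destroy;
begin
  FEvent.Free;
  inherited Destroy;
end;

procedure TAdaptiveLock.Acquire;
var
  i: integer;
begin
  repeat
    // Fast path: bounded spinning on the atomic state.
    for i := 1 to 1000 do
      if InterlockedCompareExchange(FState, 1, 0) = 0 then
        exit;
    // Slow path: block; the short timeout avoids a lost-wakeup hang.
    FEvent.WaitFor(1);
  until false;
end;

procedure TAdaptiveLock.Release;
begin
  InterlockedExchange(FState, 0);
  FEvent.SetEvent; // wake one blocked waiter
end;

---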

This is how I am an "inventor", and I have also invented other scalable algorithms, such as a scalable reference counting with efficient support for weak references, a fully scalable threadpool, and a fully scalable FIFO queue, and I have also invented other scalable algorithms and their implementations, and I think I will sell some of them to Microsoft or Google or Embarcadero or such software companies.

And about composability of lock-based systems now:

Design your systems to be composable. Among the more galling claims of the detractors of lock-based systems is the notion that they are somehow uncomposable:

“Locks and condition variables do not support modular programming,” reads one typically brazen claim, “building large programs by gluing together smaller programs[:] locks make this impossible.” The claim, of course, is incorrect. For evidence one need only point at the composition of lock-based systems such as databases and operating systems into larger systems that remain entirely unaware of lower-level locking.

There are two ways to make lock-based systems completely composable, and each has its own place. First (and most obviously), one can make locking entirely internal to the subsystem. For example, in concurrent operating systems, control never returns to user level with in-kernel locks held; the locks used to implement the system itself are entirely behind the system call interface that constitutes the interface to the system. More generally, this model can work whenever a crisp interface exists between software components: as long as control flow is never returned to the caller with locks held, the subsystem will remain composable.

Second (and perhaps counterintuitively), one can achieve concurrency and
composability by having no locks whatsoever. In this case, there must be
no global subsystem state—subsystem state must be captured in per-instance state, and it must be up to consumers of the subsystem to assure that they do not access their instance in parallel. By leaving locking up to the client of the subsystem, the subsystem itself can be used concurrently by different subsystems and in different contexts. A concrete example of this is the AVL tree implementation used extensively in the Solaris kernel. As with any balanced binary tree, the implementation is sufficiently complex to merit componentization, but by not having any global state, the implementation may be used concurrently by disjoint subsystems—the only constraint is that manipulation of a single AVL tree instance must be serialized.
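
To make the first approach concrete, here is a minimal FreePascal sketch (my own illustration, not code from the article) of a subsystem that keeps its locking entirely internal, so that control never returns to the caller with the lock held and the subsystem stays composable:

---

uses
  {$ifdef unix}cthreads,{$endif} SyncObjs;

type
  // The critical section is private and is always released before
  // control returns to the caller.
  TSafeCounter = class
  private
    FLock: TCriticalSection;
    FValue: int64;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Increment;
    function Get: int64;
  end;

constructor TSafeCounter.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
end;

destructor TSafeCounter.Destroy;
begin
  FLock.Free;
  inherited Destroy;
end;

procedure TSafeCounter.Increment;
begin
  FLock.Acquire;
  try
    Inc(FValue);
  finally
    FLock.Release; // never return to the caller with the lock held
  end;
end;

function TSafeCounter.Get: int64;
begin
  FLock.Acquire;
  try
    Result := FValue;
  finally
    FLock.Release;
  end;
end;

---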

Read more here:

https://queue.acm.org/detail.cfm?id=1454462


About deadlocks and race conditions in parallel programming..

I have just read the following paper:

Deadlock Avoidance in Parallel Programs with Futures

https://cogumbreiro.github.io/assets/cogumbreiro-gorn.pdf

So as you are noticing, you can have deadlocks in parallel programming by introducing circular dependencies among tasks waiting on future values, or by introducing circular dependencies among tasks waiting on Windows event objects or other such synchronisation objects, so you have to have a general tool that detects deadlocks. But notice that the tool called Valgrind for C++ can only detect deadlocks that come from Pthread locks; read the following to notice it:

http://valgrind.org/docs/manual/hg-manual.html#hg-manual.lock-orders
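
To make the circular-dependency idea concrete, here is a minimal FreePascal sketch (my own hypothetical illustration) where two threads acquire two critical sections in opposite order; run it and it will usually hang on the circular wait:

---

program DeadlockDemo;

uses
  {$ifdef unix}cthreads,{$endif} Classes, SyncObjs, SysUtils;

type
  TWorker = class(TThread)
  private
    FFirst, FSecond: TCriticalSection;
  public
    constructor Create(First, Second: TCriticalSection);
    procedure Execute; override;
  end;

constructor TWorker.Create(First, Second: TCriticalSection);
begin
  FFirst := First;
  FSecond := Second;
  inherited Create(false);
end;

procedure TWorker.Execute;
begin
  FFirst.Acquire;
  Sleep(100);      // widen the window so the deadlock shows up reliably
  FSecond.Acquire; // circular wait: each thread holds what the other wants
  FSecond.Release;
  FFirst.Release;
end;

var
  LockA, LockB: TCriticalSection;
  W1, W2: TWorker;
begin
  LockA := TCriticalSection.Create;
  LockB := TCriticalSection.Create;
  W1 := TWorker.Create(LockA, LockB); // takes A then B
  W2 := TWorker.Create(LockB, LockA); // takes B then A: opposite order
  W1.WaitFor; // the program usually hangs here: that is the deadlock
  W2.WaitFor;
end.

---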

So this is not good; you have to have a general way that permits you to detect deadlocks on locks and mutexes, and also deadlocks that come from introducing circular dependencies among tasks waiting on future values or on Windows event objects and other such synchronisation objects. This is why I have talked before about this general way that detects deadlocks; here it is, read my following thoughts:

Yet more precision about the invariants of a system..

I was just thinking about Petri nets, and I have studied them more. They are useful for parallel programming, and what I have noticed by studying them is that there are two methods to prove that there is no deadlock in the system: structural analysis with place invariants that you have to find mathematically, or the reachability tree. But notice that the structural analysis of Petri nets teaches you more, because it permits you to prove that there is no deadlock in the system, and the place invariants are calculated mathematically from the following system of the given Petri net:

Transpose(vector) * Incidence matrix = 0

So you apply Gaussian elimination or the Farkas algorithm to the incidence matrix to find the place invariants. And as you will notice, those place invariant calculations of Petri nets look like Markov chains in mathematics, with their vector of probabilities and their transition matrix of probabilities; and using Markov chains you can mathematically calculate where the vector of probabilities will "stabilize", which gives you very important information. You can do it by solving the following mathematical system:

Unknown vector1 of probabilities * transition matrix of probabilities = Unknown vector1 of probabilities.

Solving this system of equations is very important in economics and other fields, and you can notice that it is like calculating invariants, because the invariant in the system above is the vector1 of probabilities that is obtained; and this invariant, like the invariants of the structural analysis of Petri nets, gives you very important information about the system, such as where market shares will stabilize, which is calculated this way in economics.
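
Here is a minimal FreePascal sketch of both calculations on made-up numbers: it verifies that Transpose(y) * C = 0 for a candidate place invariant of a tiny two-place Petri net, and it iterates a two-state transition matrix until the probability vector stabilizes, which is the same fixed point you would get by solving the system above:

---

program InvariantsDemo;

const
  // Incidence matrix of a tiny Petri net: 2 places x 2 transitions.
  // Transition 1 moves a token from place 1 to place 2, transition 2
  // moves it back, so y = (1, 1) is a place invariant (the token
  // count is conserved).
  C: array[1..2, 1..2] of integer = ((-1, 1), (1, -1));
  Y: array[1..2] of integer = (1, 1);

  // Made-up 2-state Markov transition matrix (each row sums to 1).
  P: array[1..2, 1..2] of double = ((0.9, 0.1), (0.5, 0.5));

var
  i, j, s, iter: integer;
  v, w: array[1..2] of double;
begin
  // Check Transpose(y) * C = 0, one column (transition) at a time.
  for j := 1 to 2 do
  begin
    s := 0;
    for i := 1 to 2 do
      s := s + Y[i] * C[i, j];
    writeln('column ', j, ': y*C = ', s); // prints 0: y is an invariant
  end;

  // Iterate v := v * P until the probability vector stabilizes.
  v[1] := 1.0;
  v[2] := 0.0;
  for iter := 1 to 100 do
  begin
    for j := 1 to 2 do
      w[j] := v[1] * P[1, j] + v[2] * P[2, j];
    v := w;
  end;
  writeln('stationary vector: ', v[1]:0:4, ' ', v[2]:0:4); // ~0.8333 0.1667
end.

---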

About reachability analysis of a Petri net..

As you have noticed in my Petri nets tutorial example (read below), I am analysing the liveness of the Petri net, because there is a rule that says:

If a Petri net is live, that means that it is deadlock-free.

Because reachability analysis of a Petri net with Tina gives you the necessary information about the boundedness and liveness of the Petri net. So if it tells you that the Petri net is "live", then there is no deadlock in it.

Tina and Partial order reduction techniques..

With the advancement of computer technology, highly concurrent systems
are being developed. The verification of such systems is a challenging
task, as their state space grows exponentially with the number of processes. Partial order reduction is an effective technique to address this problem. It relies on the observation that the effect of executing transitions concurrently is often independent of their ordering.

Tina uses “partial-order” reduction techniques aimed at preventing combinatorial explosion. Read more here to notice it:

http://projects.laas.fr/tina/papers/qest06.pdf

About modeling and detection of race conditions and deadlocks in parallel programming..

I have just taken a further look at the following project in Delphi called DelphiConcurrent, by an engineer called Moualek Adlene from France:

https://github.com/moualek-adlene/DelphiConcurrent/blob/master/DelphiConcurrent.pas

And i have just taken a look at the following webpage of Dr Dobb's journal:

Detecting Deadlocks in C++ Using a Locks Monitor

https://www.drdobbs.com/detecting-deadlocks-in-c-using-a-locks-m/184416644

And I think that both of them are using techniques that are not as good as analysing deadlocks with Petri nets in parallel applications; for example, the above two methods only address locks, mutexes, and reader-writer locks, but they do not address semaphores, event objects, and other synchronization objects, so they are not good. This is why I have written a tutorial that shows my methodology of analysing and detecting deadlocks in parallel applications with Petri nets. My methodology is more sophisticated because it is a generalization: it models with Petri nets a broader range of synchronization objects, and in my tutorial I will soon add other synchronization objects. You have to look at it, here it is:

https://sites.google.com/site/scalable68/how-to-analyse-parallel-applications-with-petri-nets

You have to get the powerful Tina software to run the Petri net examples inside my tutorial; here is the powerful Tina software:

http://projects.laas.fr/tina/

Also, to detect race conditions in parallel programming, you have to take a look at the following new tutorial that uses the powerful Spin tool:

https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

This is how you will get much more professional at detecting deadlocks
and race conditions in parallel programming.


About Java and Delphi and Freepascal..

I have just read the following webpage:

Java is not a safe language

https://lemire.me/blog/2019/03/28/java-is-not-a-safe-language/


But as you have noticed the webpage says:

- Java does not trap overflows

But Delphi and Freepascal do trap overflows.

And the webpage says:

- Java lacks null safety

But Delphi has null safety, since I have just posted about it by saying the following:

Here is MyNullable library for Delphi and FreePascal that brings null safety..

Java lacks null safety. When a function receives an object, this object might be null. That is, if you see ‘String s’ in your code, you often have no way of knowing whether ‘s’ contains an actual String unless you check at runtime. Can you guess whether programmers always check? They do not, of course. In practice, mission-critical software does crash without warning due to null values. We have two decades of examples. In Swift or Kotlin, you have safe calls or optionals as part of the language.

Here is MyNullable library for Delphi and FreePascal that brings null safety; you can read the html file inside the zip to know how it works, and you can download it from my website here:

https://sites.google.com/site/scalable68/null-safety-library-for-delphi-and-freepascal
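
Since the library itself is inside the zip file, here is only a minimal generic sketch (my illustration, not the MyNullable code) of how a nullable wrapper can force a check before the value is used:

---

{$mode delphi}

uses SysUtils;

type
  // A value plus a presence flag: reading Value when no value was
  // ever assigned raises an exception instead of yielding garbage.
  TNullable<T> = record
  private
    FValue: T;
    FHasValue: boolean;
    function GetValue: T;
  public
    class function From(const AValue: T): TNullable<T>; static;
    property HasValue: boolean read FHasValue;
    property Value: T read GetValue;
  end;

class function TNullable<T>.From(const AValue: T): TNullable<T>;
begin
  Result.FValue := AValue;
  Result.FHasValue := true;
end;

function TNullable<T>.GetValue: T;
begin
  if not FHasValue then
    raise Exception.Create('null value accessed');
  Result := FValue;
end;

var
  N: TNullable<integer>;
begin
  N := TNullable<integer>.From(5);
  if N.HasValue then
    writeln(N.Value); // prints 5
end.

---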


And the webpage says:

- Java allows data races

But for Delphi and Freepascal I have just written above about how to prevent data races and detect deadlocks; read again my thoughts above about Petri nets, invariants, reachability analysis with Tina, DelphiConcurrent, and the Spin tool.



And about memory safety of Delphi and Freepascal, here is what I said:

I have just read the following webpage about memory safety:

Microsoft: 70 percent of all security bugs are memory safety issues

https://www.zdnet.com/article/microsoft-70-percent-of-all-security-bugs-are-memory-safety-issues/


And it says:


"Users who often read vulnerability reports come across terms over and over again. Terms like buffer overflow, race condition, page fault, null pointer, stack exhaustion, heap exhaustion/corruption, use after free, or double free --all describe memory safety vulnerabilities."

So, as you will notice below, the following memory safety problems have been solved in Delphi:

And I have just read the following webpage about "Fearless Security: Memory safety":

https://hacks.mozilla.org/2019/01/fearless-security-memory-safety/

Here are the memory safety problems:

1- Misusing Free (use-after-free, double free)

I have solved this in Delphi and Freepascal by inventing a "Scalable" reference counting with efficient support for weak references. Read below about it.


2- Uninitialized variables

This can be detected by the compilers of Delphi and Freepascal.


3- Dereferencing Null pointers

I have solved this in Delphi and Freepascal by inventing a "Scalable" reference counting with efficient support for weak references. Read below about it.

4- Buffer overflow and underflow

This has been solved in Delphi by using madExcept, read here about it:

http://help.madshi.net/DebugMm.htm

You can buy it from here:

http://www.madshi.net/


There also remains the stack exhaustion memory safety problem, and here is how to detect it in Delphi:

Call the function "DoStackOverflow" below once from your code and you will get the EStackOverflow exception raised by Delphi with the message "stack overflow", and you can print the line of the source code where EStackOverflow is raised with JCLDebug and such:

---

function DoStackOverflow: integer;
begin
  // Unbounded recursion: each call consumes more stack space
  // until Delphi raises EStackOverflow.
  result := 1 + DoStackOverflow;
end;

---



About my scalable algorithms inventions..


I am a white Arab, and I am a gentleman type of person, and I think that you know me too by my poetry that I wrote in front of you and that I posted here, but I am also a more serious computer developer, and I am also an inventor who has invented many scalable algorithms; read about them in my writing below:


Here is my latest scalable algorithm invention; read what I have just responded in comp.programming.threads:

About my LRU scalable algorithm..

On 10/16/2019 7:48 AM, Bonita Montero on comp.programming.threads wrote:
> Amine, a quest for you:
> Database servers and operating-system kernels mostly use LRU as
> the scheme to evict old buffers from their cache. One issue with
> LRU is that an LRU structure can't be updated by multiple threads
> simultaneously, so you have to have a global lock.
> I managed to write an LRU caching class that can update the links
> in the LRU list to put the most recently fetched block at the head
> of the list without any lock in almost any access. Only when
> flushing an entry or inserting a new one do I have to lock the
> structure completely; but in contrast to cache hits this usually
> has a magnitude lower frequency because of the slowness of
> disk/SSD access, so this doesn't really hurt.
> The algorithm is partially locked, partially lock-free. Even
> the case of putting cache hits at the head has to be processed
> in locked mode in very rare cases. And as I said, inserting and
> flushing are conventional locked accesses.
> So the quest is for you: can you guess what I did?


And here is what i have just responded:


I think I am also smart, so I have just quickly found a solution that is scalable and that is not your solution: it needs my hashtable that is scalable and my fully scalable FIFO queue that I have invented. And I think I will not patent it. But my solution is not lock-free; it uses locks in a lock-striping manner and it is scalable.


And read about my other scalable algorithm inventions in my writing below:


About the buffer overflow problem..

I wrote yesterday about buffer overflow in Delphi and Freepascal..

I think there is a "higher" abstraction in Delphi and Freepascal that does the job of avoiding buffer overflow very well, and it is the TMemoryStream class, since it also behaves like a pointer and supports ReallocMem() and FreeMem() on the pointer but with a higher-level abstraction. Look for example at my following example in Delphi and Freepascal: you will notice that, contrary to raw pointers, the memory stream adapts with WriteBuffer() without the need to reserve the memory first, and this is why it avoids the buffer overflow problem. Read the following example to notice how I am using it with a PAnsiChar type:

========================================

program test;

uses
  system.classes, system.sysutils;

var
  P: PAnsiChar;
  mem: TMemoryStream; // this declaration was missing

begin

  P := 'Amine';

  mem := TMemoryStream.Create;

  try
    mem.Position := 0;

    // WriteBuffer() grows the stream's internal buffer as needed,
    // so there is no fixed-size buffer to overflow.
    mem.WriteBuffer(pointer(P)^, 6); // 5 characters plus the #0 terminator

    mem.Position := 0;

    writeln(PAnsiChar(mem.Memory));
  finally
    mem.Free;
  end;

end.

========================================


So, since Delphi and Freepascal also detect buffer overflows on dynamic arrays (with range checking enabled), I think that Delphi and Freepascal are powerful tools.


Read my previous thoughts above about the memory safety problems (misusing free, uninitialized variables, dereferencing null pointers, and buffer overflow and underflow) to understand more.


And about race condition and deadlock problems and more, read my following thoughts to understand:


I will reformulate more smartly my thoughts about race condition detection in Rust, so read it carefully:

You can think of the borrow checker of Rust as a validator for a locking system: immutable references are shared read locks and mutable references are exclusive write locks. Under this mental model, accessing data via two independent write locks is not a safe thing to do, and modifying data via a write lock while there are readers alive is not safe either.

So as you are noticing, the "mutable" references in Rust follow the read-write lock pattern, and this is not good, because it does not permit the more fine-grained parallelism that runs the writes in "parallel" and gains more performance from parallelizing the writes.
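
Here is a minimal FreePascal sketch of that mental model using the standard TMultiReadExclusiveWriteSynchronizer: BeginRead is like taking an immutable reference (many readers at once) and BeginWrite is like taking a mutable reference (one exclusive writer):

---

uses
  {$ifdef unix}cthreads,{$endif} SysUtils;

var
  Lock: TMultiReadExclusiveWriteSynchronizer;
  Shared: integer;

// Like an immutable borrow: many threads may hold it at once.
function ReadShared: integer;
begin
  Lock.BeginRead;
  try
    Result := Shared;
  finally
    Lock.EndRead;
  end;
end;

// Like a mutable borrow: exclusive, no readers and no other writers.
procedure WriteShared(AValue: integer);
begin
  Lock.BeginWrite;
  try
    Shared := AValue;
  finally
    Lock.EndWrite;
  end;
end;

begin
  Lock := TMultiReadExclusiveWriteSynchronizer.Create;
  WriteShared(42);
  writeln(ReadShared); // prints 42
  Lock.Free;
end.

---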


Read more about Rust and Delphi and my inventions..

I think the spirit of Rust is like the spirit of ADA: they are especially designed for very high standards of safety. "But" I don't think we have to fear the race conditions that Rust solves, because I think that race conditions are not so difficult to avoid when you are a decent, knowledgeable programmer in parallel programming, so you have to understand what I mean. Now we have to talk about the rest of the safety guarantees of Rust. There remains the problem of deadlocks, and I think that Rust does not solve this problem, but I have provided you with an enhanced DelphiConcurrent library for Delphi and Freepascal that detects deadlocks. And there are also the memory safety guarantees of Rust; here they are:

1- No Null Pointer Dereferences
2- No Dangling Pointers
3- No Buffer Overruns

But notice that I have solved number 1 and number 2 by inventing my scalable reference counting with efficient support for weak references for Delphi and Freepascal (read below to notice it), and for number 3, read my thoughts above about the buffer overflow problem and the TMemoryStream class.



Read my following writing to know me more:

More about computing and parallel computing..

The important guarantees of memory safety in Rust are:

1- No Null Pointer Dereferences
2- No Dangling Pointers
3- No Buffer Overruns

I think I have solved null pointer dereferences and also dangling pointers and also memory leaks for Delphi and Freepascal by inventing my "scalable" reference counting with efficient support for weak references, and I have implemented it in Delphi and Freepascal (read about it below), while reference counting in Rust and C++ is "not" scalable.

About the (3) above, that is buffer overruns, read here about Delphi and Freepascal:

What's a buffer overflow and how to avoid it in Delphi?

Read my above thoughts about it.


About Deadlock and Race conditions in Delphi and Freepascal:

I have ported DelphiConcurrent to Freepascal, and I have also extended both with support for my scalable RWLocks for Windows and Linux, with support for my scalable lock called MLock for Windows and Linux, and with support for a Mutex for Windows and Linux; please look inside the DelphiConcurrent.pas and FreepascalConcurrent.pas files inside the zip file to understand more.

You can download DelphiConcurrent and FreepascalConcurrent for Delphi and Freepascal from:

https://sites.google.com/site/scalable68/delphiconcurrent-and-freepascalconcurrent

DelphiConcurrent and FreepascalConcurrent by Moualek Adlene are a new way to build Delphi applications which involve parallel executed code based on threads, like application servers. DelphiConcurrent provides to the programmers the internal mechanisms to write safer multi-threaded code while taking special care of performance and genericity.

In concurrent applications, a DEADLOCK may occur when two threads or more try to lock two consecutive shared resources or more but in a different order. With DelphiConcurrent and FreepascalConcurrent, a DEADLOCK is detected and automatically skipped - before it occurs - and the programmer gets an explicit exception describing the multi-thread problem instead of a blocking DEADLOCK which freezes the application with no output log (and perhaps also the linked client sessions if we talk about an application server).

Amine Moulay Ramdane has extended them with support for his scalable RWLocks for Windows and Linux and his scalable lock called MLock for Windows and Linux, and he has also added support for a Mutex for Windows and Linux; please look inside the DelphiConcurrent.pas and FreepascalConcurrent.pas files to understand more.

And please read the html file inside to learn more about how to use it.


About race conditions now:

My scalable Adder is here..

As you have noticed, I have just posted previously my modified versions of DelphiConcurrent and FreepascalConcurrent to deal with deadlocks in parallel programs.

But I have just read the following about how to avoid race conditions in parallel programming in most cases..

Here it is:

https://vitaliburkov.wordpress.com/2011/10/28/parallel-programming-with-delphi-part-ii-resolving-race-conditions/

This is why I have invented my following powerful scalable Adder to help you do the same as the above; please take a look at its source code to understand more, here it is:

https://sites.google.com/site/scalable68/scalable-adder-for-delphi-and-freepascal

Other than that, about composability of lock-based systems: read again the passage that I have quoted above from the following ACM Queue article:

https://queue.acm.org/detail.cfm?id=1454462

And about Message Passing Process Communication Model and Shared Memory Process Communication Model:

An advantage of the shared memory model is that memory communication is faster than in the message passing model on the same machine.

Read the following to notice it:

Why did Windows NT move away from the microkernel?

"The main reason that Windows NT became a hybrid kernel is speed. A microkernel-based system puts only the bare minimum system components in the kernel and runs the rest of them as user mode processes, known as servers. A form of inter-process communication (IPC), usually message passing, is used for communication between servers and the kernel.

Microkernel-based systems are more stable than others; if a server crashes, it can be restarted without affecting the entire system, which couldn't be done if every system component was part of the kernel. However, because of the overhead incurred by IPC and context-switching, microkernels are slower than traditional kernels. Due to the performance costs of a microkernel, Microsoft decided to keep the structure of a microkernel, but run the system components in kernel space. Starting in Windows Vista, some drivers are also run in user mode."


More about message passing..

Read the following to notice the performance overhead of message passing:

"One problem that plagues microkernel implementations is relatively poor performance. The message-passing layer that connects
different operating system components introduces an extra layer of
machine instructions. The machine instruction overhead introduced
by the message-passing subsystem manifests itself as additional
execution time. In a monolithic system, if a kernel component needs
to talk to another component, it can make direct function calls
instead of going through a third party."

However, the shared memory model may create problems such as synchronization and memory protection that need to be addressed.

Message passing's major flaw is the inversion of control: it is the moral equivalent of gotos in unstructured programming (it's about time somebody said that message passing is considered harmful).

Also, some research shows that the total effort to write an MPI application is significantly higher than the effort required to write a shared-memory version of it.

And more about my scalable reference counting with efficient support for weak references:

My invention that is my scalable reference counting with efficient support for weak references version 1.37 is here..

Here I am again: I have just updated my scalable reference counting with efficient support for weak references to version 1.37. I have just added a TAMInterfacedPersistent that is a scalable reference counted version, and now I think I have made it complete and powerful.

Because I have just read the following web page:

https://www.codeproject.com/Articles/1252175/Fixing-Delphis-Interface-Limitations

But I don't agree with the writing of the author of the above web page, because I think you have to understand the "spirit" of Delphi; here is why:

A component is supposed to be owned and destroyed by something else, "typically" a form (and "typically" means in English: in "most" cases, and this is the most important thing to understand). In that scenario, reference counting is not used.

If you pass a component as an interface reference, it would be very unfortunate if it was destroyed when the method returns.

Therefore, reference counting in TComponent has been removed.

Also because i have just added TAMInterfacedPersistent to my invention.

To use scalable reference counting with Delphi and FreePascal, just replace TInterfacedObject with my TAMInterfacedObject, the scalable reference counted version, and replace TInterfacedPersistent with my TAMInterfacedPersistent, the scalable reference counted version. You will find both my TAMInterfacedObject and my TAMInterfacedPersistent inside the AMInterfacedObject.pas file. To know how to use weak references, please take a look at the included demo called example.dpr and at the tutorial about weak references inside my zip file; and to know how to use delegation, take a look at the included demo called test_delegation.pas and at the tutorial about delegation inside my zip file, which teaches you how to use delegation.
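
For example, here is a minimal usage sketch based only on the names above; IMyService is a hypothetical interface of mine, and I am assuming that TAMInterfacedObject is created like the standard TInterfacedObject:

---

uses AMInterfacedObject; // the unit from the zip file

type
  IMyService = interface
    ['{B3A3D3E0-0000-4000-8000-000000000001}']
    procedure DoWork;
  end;

  // Derive from TAMInterfacedObject instead of TInterfacedObject
  // to get the scalable reference counting.
  TMyService = class(TAMInterfacedObject, IMyService)
    procedure DoWork;
  end;

procedure TMyService.DoWork;
begin
  writeln('working...');
end;

var
  S: IMyService;
begin
  S := TMyService.Create; // reference counted through the interface
  S.DoWork;
end. // S is released here and the object is freed at count 0

---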

I think my scalable reference counting with efficient support for weak references is stable and fast, it works on both Windows and Linux, and it scales on multicore and NUMA systems. You will not find it in C++ or Rust, and I don't think you will find it anywhere else. And you have to know that this invention of mine solves the problem of dangling pointers and the problem of memory leaks, and my reference counting is "scalable".

And please read the readme file inside the zip file, which I have just extended to make you understand more.

You can download my new scalable reference counting with efficient support for weak references version 1.37 from:

https://sites.google.com/site/scalable68/scalable-reference-counting-with-efficient-support-for-weak-references

And now I will talk about data dependency and parallel loops..

For a loop to be parallelized, every iteration must be independent of the others. One way to be sure of it is to execute the loop in the direction of the incremented index of the loop and in the direction of the decremented index of the loop and verify that the results are the same. A data dependency happens if memory is modified: a loop has a data dependency if an iteration writes a variable that is read or written in another iteration of the loop. There is no data dependency if only one iteration reads or writes a variable, or if many iterations read the same variable without modifying it. So these are the "general rules".
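
For example, the following loop has a loop-carried data dependency and cannot be parallelized as written, because iteration I reads the A[I-1] that iteration I-1 has just written:

---

for I := 2 to N do
  A[I] := A[I-1] + B[I]; // reads the value written by the previous iteration

---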

Now there remains the question of how to construct the parallel for loop when there is an induction variable or a reduction operation; I will give an example of each:

If we have the following (the code looks like Algol or modern Object Pascal):


IND := 0;

for I := 1 to N do
begin
  IND := IND + I; // induction variable: after iteration I, IND = (I*(I+1))/2
  A[I] := B[IND];
end;


So as you are noticing, since IND is an induction variable, to parallelize the loop you have to compute IND directly from I, like this:


for I := 1 to N do
begin
  IND := (I * (I + 1)) div 2; // closed form: iterations are now independent
  A[I] := B[IND];
end;


Now for the reduction operation example: you will notice that my invention that is my threadpool with priorities that scales very well (read about it below) supports a ParallelFor() that scales very well and that supports a "grainsize", and you will notice that the grainsize can be used in the ParallelFor() with a reduction operation, and that my following powerful scalable Adder is also used in this scenario, here it is:

https://sites.google.com/site/scalable68/scalable-adder-for-delphi-and-freepascal


So here is the example with a reduction operation in modern Object Pascal:


TOTAL := 0.0;

for I := 1 to N do
begin
  TOTAL := TOTAL + A[I]; // reduction: every iteration updates TOTAL
end;


So with my powerful scalable Adder and with my powerful invention that is my ParallelFor() that scales very well, you will parallelize the above like this:

procedure test1(j: integer; ptr: pointer);
begin
  t.add(A[j]); // "t" is my scalable Adder object
end;

// Let's suppose that N is 100000.
// In the following, 10000 is the grainsize.

obj.ParallelFor(1, N, test1, 10000, pointer(0));

TOTAL := t.get();


And read the following to understand how to use the grainsize of my ParallelFor() that scales very well:


About my ParallelFor() that scales very well and that uses my efficient threadpool that scales very well:

With ParallelFor() you have to:

1- Ensure Sufficient Work

Each iteration of a loop involves a certain amount of work, so you have to ensure a sufficient amount of work; read below about the "grainsize" that I have implemented.

2- In OpenMP we have that:

Static and Dynamic Scheduling

One basic characteristic of a loop schedule is whether it is static or dynamic:

• In a static schedule, the choice of which thread performs a particular
iteration is purely a function of the iteration number and number of
threads. Each thread performs only the iterations assigned to it at the
beginning of the loop.

• In a dynamic schedule, the assignment of iterations to threads can
vary at runtime from one execution to another. Not all iterations are
assigned to threads at the start of the loop. Instead, each thread
requests more iterations after it has completed the work already
assigned to it.

But with my ParallelFor() that scales very well, since it is using my efficient threadpool that scales very well, it uses round-robin scheduling together with work stealing, so I think that this is sufficient.

Read the rest:

My threadpool engine with priorities that scales very well is really powerful because it scales very well on multicore and NUMA systems, and it also comes with a ParallelFor() that scales very well on multicore and NUMA systems.

You can download it from:

https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well


Here is the explanation of my ParallelFor() that scales very well:

I have also implemented a ParallelFor() that scales very well, here is the method:

procedure ParallelFor(nMin, nMax:integer;aProc: TParallelProc;GrainSize:integer=1;Ptr:pointer=nil;pmode:TParallelMode=pmBlocking;Priority:TPriorities=NORMAL_PRIORITY);

The nMin and nMax parameters of ParallelFor() are the minimum and maximum integer values of the loop variable, the aProc parameter of ParallelFor() is the procedure to call, and the GrainSize integer parameter of ParallelFor() is the following:

The grainsize sets a minimum threshold for parallelization.

A rule of thumb is that grainsize iterations should take at least 100,000 clock cycles to execute.

For example, if a single iteration takes 100 clocks, then the grainsize needs to be at least 1000 iterations. When in doubt, do the following experiment:

1- Set the grainsize parameter higher than necessary. The grainsize is specified in units of loop iterations.

If you have no idea how many clock cycles an iteration might take, start with grainsize=100,000.

The rationale is that each iteration normally requires at least one clock cycle. In most cases, step 3 will guide you to a much smaller value.

2- Run your algorithm.

3- Iteratively halve the grainsize parameter and see how much the algorithm slows down or speeds up as the value decreases.

A drawback of setting a grainsize too high is that it can reduce parallelism. For example, if the grainsize is 1000 and the loop has 2000 iterations, the ParallelFor() method distributes the loop across only two processors, even if more are available.

And you can pass a parameter in Ptr as a pointer to ParallelFor(), and you can set the pmode parameter to pmBlocking so that ParallelFor() is blocking, or to pmNonBlocking so that ParallelFor() is non-blocking, and the Priority parameter is the priority of ParallelFor(). Look inside the test.pas example to see how to use it.


Thank you,
Amine Moulay Ramdane.

