
More of my philosophy about non-linear regression and about technology and more of my thoughts..


Amine Moulay Ramdane

Nov 3, 2022, 2:52:17 PM
Hello,


More of my philosophy about non-linear regression and about technology and more of my thoughts..

I am a white arab, and I think I am smart since I have also
invented many scalable algorithms and other algorithms..


I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ. So note that R-squared is invalid for non-linear regression, so you have to use the standard error of the estimate (based on the Mean Square Error), and of course you also have to calculate the Relative Standard Error, which is the standard error of the mean of the sample divided by the estimate, that is the mean of the sample, and I think that the Relative Standard Error is an important thing that brings more quality to the statistical calculations. I will now talk to you more about my interesting software project for mathematics: my new software project uses artificial intelligence to implement a generalized way of solving non-linear "multiple" regression, and it is much more powerful than the Levenberg–Marquardt algorithm, since I am implementing a smart algorithm using artificial intelligence that avoids premature convergence, which is also one of the most important things, and it will also be much more scalable using multicores so that the search with artificial intelligence for the global optimum is much faster. I am doing it this way so as to be professional, and I will give you a tutorial that explains my algorithms that use artificial intelligence so that you can learn from them, and of course the software will automatically calculate the above standard error of the estimate and the Relative Standard Error.
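
As an illustration of the two error measures above, here is a minimal Delphi/FreePascal sketch with hypothetical sample data: it computes the standard error of the estimate as sqrt(SSE/(n - p)), where p is the number of fitted parameters, and the relative standard error as the standard error of the mean divided by the mean of the sample, following the definitions given above:

---
program ErrorMeasures;

var
  yobs, ypred: array[0..4] of double;   // observed values and model predictions
  i, n, p: integer;
  sse, see, mean, s, sem, rse: double;

begin
  // hypothetical sample data: 5 observations and the fitted values of some model
  yobs[0] := 2.1; yobs[1] := 3.9; yobs[2] := 6.2; yobs[3] := 7.8; yobs[4] := 10.1;
  ypred[0] := 2.0; ypred[1] := 4.0; ypred[2] := 6.0; ypred[3] := 8.0; ypred[4] := 10.0;

  n := 5;   // number of observations
  p := 2;   // number of fitted parameters of the non-linear model

  // standard error of the estimate: sqrt(SSE / (n - p)), the square root of the MSE
  sse := 0;
  for i := 0 to n - 1 do
    sse := sse + sqr(yobs[i] - ypred[i]);
  see := sqrt(sse / (n - p));

  // relative standard error: standard error of the mean divided by the mean of the sample
  mean := 0;
  for i := 0 to n - 1 do
    mean := mean + yobs[i];
  mean := mean / n;

  s := 0;
  for i := 0 to n - 1 do
    s := s + sqr(yobs[i] - mean);
  s := sqrt(s / (n - 1));      // sample standard deviation
  sem := s / sqrt(n);          // standard error of the mean
  rse := sem / mean;           // relative standard error (multiply by 100 for a percentage)

  writeln('Standard error of the estimate: ', see:0:4);
  writeln('Relative standard error: ', rse:0:4);
end.
---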

More of my philosophy about non-linear regression and more..

I think I am really smart, and I have also just quickly finished the software implementation of the Levenberg–Marquardt algorithm and of the Simplex algorithm for solving non-linear least squares problems, and I will soon implement a generalized way, with artificial intelligence, of solving non-linear "multiple" regression in the software. I have also noticed that in mathematics you have to take care of the variability of the y values when approximating in non-linear least squares problems. The Levenberg–Marquardt algorithm (LMA or just LM) that I have just implemented, also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least squares curve fitting. The Levenberg–Marquardt algorithm is used in many software applications for solving generic curve-fitting problems. It was found to be an efficient, fast and robust method which also has a good global convergence property. For these reasons, it has been incorporated into many good commercial packages performing non-linear regression. But my way of implementing non-linear "multiple" regression in the software will be much more powerful than the Levenberg–Marquardt algorithm, and of course I will share with you many parts of my software project, so stay tuned !
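
To give an idea of what one damped least-squares (Levenberg–Marquardt style) step looks like, here is a minimal Delphi/FreePascal sketch, and note that it is not my library's code: the model y = a*exp(b*x), the data points and the starting values are hypothetical, and it simply solves the textbook normal equations (JtJ + lambda*I)*delta = Jt*r and accepts a step only when the sum of squared errors decreases:

---
program LMSketch;

// Minimal damped least-squares sketch for the hypothetical model y = a*exp(b*x).
// Data and starting values are made up.

const
  n = 6;
  x: array[0..n-1] of double = (0, 1, 2, 3, 4, 5);
  y: array[0..n-1] of double = (1.1, 1.9, 3.8, 7.1, 13.8, 27.0);

var
  a, b, lambda, sse, newsse: double;
  j11, j12, j22, g1, g2, det, da, db: double;
  fi, r, na, nb: double;
  i, iter: integer;

function SSEOf(aa, bb: double): double;
var i: integer; e, s: double;
begin
  s := 0;
  for i := 0 to n-1 do
  begin
    e := y[i] - aa*exp(bb*x[i]);
    s := s + e*e;
  end;
  result := s;
end;

begin
  a := 1.0; b := 0.5;        // starting guess
  lambda := 0.001;           // damping parameter
  sse := SSEOf(a, b);

  for iter := 1 to 50 do
  begin
    // build the normal equations (JtJ + lambda*I) * delta = Jt * r for the 2 parameters
    j11 := 0; j12 := 0; j22 := 0; g1 := 0; g2 := 0;
    for i := 0 to n-1 do
    begin
      fi := exp(b*x[i]);
      r := y[i] - a*fi;                          // residual of the i-th point
      // Jacobian of the model: d/da = fi, d/db = a*x[i]*fi
      j11 := j11 + fi*fi;
      j12 := j12 + fi*a*x[i]*fi;
      j22 := j22 + sqr(a*x[i]*fi);
      g1 := g1 + fi*r;
      g2 := g2 + a*x[i]*fi*r;
    end;
    j11 := j11 + lambda; j22 := j22 + lambda;    // damping

    det := j11*j22 - j12*j12;
    if det = 0 then break;
    da := ( j22*g1 - j12*g2) / det;              // solve the 2x2 system
    db := (-j12*g1 + j11*g2) / det;

    na := a + da; nb := b + db;
    newsse := SSEOf(na, nb);
    if newsse < sse then
    begin
      a := na; b := nb; sse := newsse;           // accept the step, reduce damping
      lambda := lambda * 0.5;
    end
    else
      lambda := lambda * 2.0;                    // reject the step, increase damping
  end;

  writeln('a = ', a:0:4, '  b = ', b:0:4, '  SSE = ', sse:0:6);
end.
---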

More of my philosophy about C# and Delphi and about technology and more of my thoughts..


I invite you to read the following article:

Why C# coders should shut up about Delphi

Read more here:

https://jonlennartaasenden.wordpress.com/2016/10/18/why-c-coders-should-shut-up-about-delphi/


More of my philosophy about Delphi and Freepascal compilers and about technology and more of my thoughts..

According to the 20th edition of the State of the Developer Nation report, there were 26.8 million active software developers in the world at the end of 2021, and according to the following article on the Delphi programming language, about 3.25% of software developers use Delphi this year, so that is around 870,000 (roughly a million) Delphi software developers in 2022. Here are the articles, read them carefully:

https://blogs.embarcadero.com/considerations-on-delphi-and-the-stackoverflow-2022-developer-survey/

And read here:

https://www.future-processing.com/blog/how-many-developers-are-there-in-the-world-in-2019/


I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ, and of course I have already programmed in C++, for example you can download my open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well from my website here:

https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library

But why do you think I am also programming in Delphi and Freepascal ?

Of course Delphi and Freepascal compilers support modern Object Pascal, it is not only Pascal but modern Object Pascal, I mean that the modern Object Pascal of, for example, Delphi and Freepascal supports object oriented programming and supports anonymous methods, or typed lambdas, so I think that it is a decent programming language, even if I know that the new C++20 supports generic lambdas and templated lambdas, but I think that Delphi will also soon support generic lambdas. And in the Delphi and Freepascal compilers there is no big runtime like in C# and such languages, so you get small native executables in Delphi and Freepascal, and inline assembler is supported by both Delphi and Freepascal, and Lazarus (the IDE of Freepascal) and Delphi both come with one of the best GUI tools, and of course you can make .SO, .DLL, executables, etc. in both Delphi and Freepascal, and both the Delphi and Freepascal compilers are cross platform to Windows, Linux, Mac, Android etc. And I think that the modern Object Pascal of Delphi or Freepascal is more strongly typed than C++, but less strongly typed than the ADA programming language, and I think that the modern Object Pascal of Delphi and Freepascal is not as strict as the programming language ADA and not as strict as the programming language Rust or the pure functional programming languages, so it can also be flexible and advantageous not to have this kind of strictness, and the compilation times of Delphi are extremely fast, and of course Freepascal supports the Delphi mode so as to be compatible with Delphi, and I can go on and on, and it is why I am also programming in Delphi and Freepascal.
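
As a small illustration of the anonymous methods (typed lambdas) mentioned above, here is a tiny example in Delphi syntax (in FreePascal it needs the new function references and anonymous functions feature that I mention further below), and the names in it are of course just for the example:

---
program AnonDemo;

type
  // a typed "lambda": a reference to a function from integer to integer
  TIntFunc = reference to function(x: integer): integer;

var
  factor: integer;
  times: TIntFunc;

begin
  factor := 3;
  // the anonymous method captures the variable "factor" from its context
  times := function(x: integer): integer
           begin
             result := x * factor;
           end;
  writeln(times(14));   // prints 42
end.
---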

And you can read about the last version 11.2 of Delphi from here:

https://www.embarcadero.com/products/delphi

And you can read about Freepascal and Lazarus from here:

https://www.freepascal.org/

https://www.lazarus-ide.org/


More of my philosophy about Asynchronous programming and about the futures and about the ActiveObject and about technology and more of my thoughts..

I am a white arab, and I think I am smart since I have also
invented many scalable algorithms and other algorithms..


I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ. I think that from my new implementation of a future below, you can notice that asynchronous programming is not a simple task, since it can get very complicated: you can notice in my implementation below that if I moved the starting of the thread of the future out of the constructor, and if I moved the passing of the parameter (as a pointer) to the future out of the constructor, it would get more complex to get the automaton of how to use and call the methods right and safe. So I think that there is still a problem with asynchronous programming, which is that when you have many asynchronous tasks or threads it can get really complex, and I think that it is the weakness of asynchronous programming, and of course I am also speaking of the implementation of a sophisticated ActiveObject or a future or complex asynchronous programming.


More of my philosophy about my new updated implementation of a future and about the ActiveObject and about technology and more of my thoughts..



I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ. I have just updated my implementation of a future, and now both the starting of the thread of the future and the passing of the parameter (as a pointer) to the future are done from the constructor, so that the automaton of how to use and call the methods is safe. I have also just added support for exceptions: you have to know that programming with futures is asynchronous programming, but to be robust a future implementation has to deal correctly with "exceptions", so in my implementation of a future, when an exception is raised inside the future you will receive the exception. I have implemented two things: the HasException() method, so that the exception from inside the future can be detected, and the exception message and its address are returned as a string in the ExceptionStr property. My implementation of a future does of course support passing parameters as a pointer to the future, and it works on Windows and Linux, and of course you can also use my following more sophisticated Threadpool engine with priorities as a sophisticated ActiveObject or such, and pass the methods or functions and their parameters to it, here it is:

Threadpool engine with priorities

https://sites.google.com/site/scalable68/threadpool-engine-with-priorities

And stay tuned since i will enhance more my above Threadpool engine with priorities.

So you can download my new updated portable and efficient implementation of a future in Delphi and FreePascal version 1.32 from my website here:

https://sites.google.com/site/scalable68/a-portable-and-efficient-implementation-of-a-future-in-delphi-and-freepascal


And here is a new example program of how to use my implementation of a future in Delphi and Freepascal and notice that the interface has changed a little bit:


--

program TestFuture;

uses system.SysUtils, system.Classes, Futures;

type

  // a future whose Compute() raises an exception, to show exception support
  TTestFuture1 = class(TFuture)
  public
    function Compute(ptr: pointer): Variant; override;
  end;

  // a future whose Compute() returns a value
  TTestFuture2 = class(TFuture)
  public
    function Compute(ptr: pointer): Variant; override;
  end;

var
  obj1: TTestFuture1;
  obj2: TTestFuture2;
  a: variant;

function TTestFuture1.Compute(ptr: pointer): Variant;
begin

  raise Exception.Create('I raised an exception');

end;

function TTestFuture2.Compute(ptr: pointer): Variant;
begin

  writeln(nativeint(ptr));
  result := 'Hello world !';

end;

begin

  writeln;

  // the parameter is passed as a pointer and the thread of the future is
  // started from the constructor
  obj1 := TTestFuture1.create(pointer(12));

  if obj1.GetValue(a) then writeln(a)
  else if obj1.HasException then writeln(obj1.ExceptionStr);

  obj1.free;

  writeln;

  obj2 := TTestFuture2.create(pointer(12));

  if obj2.GetValue(a) then writeln(a);

  obj2.free;

end.

---



More of my philosophy about quantum computing and about matrix operations and about scalability and more of my thoughts..


I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ. I have just looked at the following video about the powerful parallel quantum computer of IBM from the USA that will soon be available in the cloud, and I invite you to look at it:

Quantum Computing: Now Widely Available!

https://www.youtube.com/watch?v=laqpfQ8-jFI


But I have just read the following paper and it says that powerful quantum algorithms for matrix operations and linear systems of equations are available; as you will notice in the following paper, many matrix operations, and also the solving of linear systems of equations, can be done on a quantum computer, so read about it here:

Quantum algorithms for matrix operations and linear systems of equations

Read more here:

https://arxiv.org/pdf/2202.04888.pdf


So I think that IBM will do the same for their powerful parallel quantum computer that will be available in the cloud, but I think that you will have to pay for it, of course, since I think it will be commercial. But I think that there is a weakness with this kind of configuration of the powerful parallel quantum computer from IBM: the cost of internet bandwidth is decreasing exponentially, but the latency of accessing the internet is not, so it is why I think that people will still use classical computers for many mathematical applications that use operations such as matrix operations and linear systems of equations etc. that need a much lower latency. Other than that, Moore's law will still be effective in classical computers, since it will permit us to have really powerful classical computers at a low cost, and they will be really practical, since the quantum computer is big in size and not so practical. So read about the two inventions below that will make logic gates thousands of times faster or a million times faster than those in existing computers so that you notice it. So I think that the business of classical computers will still be great in the future even with the coming of the powerful parallel quantum computer of IBM, so as you notice this kind of business is not only dependent on Moore's law and Bezos’ Law, but it is also dependent on the latency of accessing the internet, so read my following thoughts about Moore’s law and about Bezos’ Law:


More of my philosophy about Moore’s law and about Bezos’ Law..

For RAM chips and flash memory, Moore's Law means that in eighteen months you'll pay the same price as today for twice as much storage.
But other computing components are also seeing their price versus performance curves skyrocket exponentially. Data storage doubles every twelve months.

More about Moore’s law and about Bezos’ Law..

"Parallel code is the recipe for unlocking Moore’s Law"

And:

"BEZOS’ LAW

The Cost of Cloud Computing will be cut in half every 18 months - Bezos’ Law

Like Moore’s law, Bezos’ Law is about exponential improvement over time. If you look at AWS history, they drop prices constantly. In 2013 alone they’ve already had 9 price drops. The difference; however, between Bezos’ and Moore’s law is this: Bezos’ law is the first law that isn’t anchored in technical innovation. Rather, Bezos’ law is anchored in confidence and market dynamics, and will only hold true so long as Amazon is not the aggregate dominant force in Cloud Computing (50%+ market share). Monopolies don’t cut prices."

More of my philosophy about latency and contention and concurrency and parallelism and more of my thoughts..

I think I am highly smart and I have just posted, read it below, about the two new inventions that will make logic gates thousands of times faster or a million times faster than those in existing computers, and I think that there is still a problem with those new inventions, and it is about latency and concurrency, since you need concurrency and you need preemptive or non-preemptive scheduling of the coroutines. Since HBM is about 106.7 ns in latency and DDR4 is about 73.3 ns in latency, and the AMD 3D V-Cache has almost the same cost in latency, you notice that this kind of latency is still costly. There is also a latency that is the time slice that a coroutine takes to execute, and it is costly, since this kind of latency and time slice is a waiting time that looks like the time wasted in contention in parallelism, so by logical analogy this kind of latency and time slice creates something like the contention of parallelism that reduces scalability, so I think it is why those new inventions have this kind of limit or constraint in a "concurrency" environment.

And i invite you to read my following smart thoughts about preemptive and non-preemptive timesharing:

https://groups.google.com/g/alt.culture.morocco/c/JuC4jar661w


More of my philosophy about Fastest-ever logic gates and more of my thoughts..

"Logic gates are the fundamental building blocks of computers, and researchers at the University of Rochester have now developed the fastest ones ever created. By zapping graphene and gold with laser pulses, the new logic gates are a million times faster than those in existing computers, demonstrating the viability of “lightwave electronics.”. If these kinds of lightwave electronic devices ever do make it to market, they could be millions of times faster than today’s computers. Currently we measure processing speeds in Gigahertz (GHz), but these new logic gates function on the scale of Petahertz (PHz). Previous studies have set that as the absolute quantum limit of how fast light-based computer systems could possibly get."

Read more here:

https://newatlas.com/electronics/fastest-ever-logic-gates-computers-million-times-faster-petahertz/

Read my following news:

And with the following new discovery computers and phones could run thousands of times faster..

Prof Alan Dalton in the School of Mathematical and Physics Sciences at the University of Sussex, said:

"We're mechanically creating kinks in a layer of graphene. It's a bit like nano-origami.

"Using these nanomaterials will make our computer chips smaller and faster. It is absolutely critical that this happens as computer manufacturers are now at the limit of what they can do with traditional semiconducting technology. Ultimately, this will make our computers and phones thousands of times faster in the future.

"This kind of technology -- "straintronics" using nanomaterials as opposed to electronics -- allows space for more chips inside any device. Everything we want to do with computers -- to speed them up -- can be done by crinkling graphene like this."

Dr Manoj Tripathi, Research Fellow in Nano-structured Materials at the University of Sussex and lead author on the paper, said:

"Instead of having to add foreign materials into a device, we've shown we can create structures from graphene and other 2D materials simply by adding deliberate kinks into the structure. By making this sort of corrugation we can create a smart electronic component, like a transistor, or a logic gate."

The development is a greener, more sustainable technology. Because no additional materials need to be added, and because this process works at room temperature rather than high temperature, it uses less energy to create.

Read more here:

https://www.sciencedaily.com/releases/2021/02/210216100141.htm


But I think that mass production of graphene still hasn't quite begun, so I think that the above inventions, the fastest-ever logic gates that use graphene and the nanomaterials one that uses graphene, will not be fully commercialized until perhaps around the year 2035 or 2040 or so, so read the following so that you understand why:

"Because large-scale mass production of graphene still hasn't quite begun , the market is a bit limited. However, that leaves a lot of room open for investors to get in before it reaches commercialization.

The market was worth $78.7 million in 2019 and, according to Grand View Research, is expected to rise drastically to $1.08 billion by 2027.

North America currently has the bulk of market share, but the Asia-Pacific area is expected to have the quickest growth in adoption of graphene uses in coming years. North America and Europe are also expected to have above-market average growth.

The biggest driver of all this growth is expected to be the push for cleaner, more efficient energy sources and the global reduction of emissions in the air."

Read more here:

https://www.energyandcapital.com/report/the-worlds-next-rare-earth-metal/1600

And of course you can read my thoughts about technology in the following web link:

https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4


More of my philosophy about matrix-matrix multiplication and about scalability and more of my thoughts..


I think that the time complexity of the Strassen algorithm for matrix-matrix multiplication is around O(N^2.8074), and the time complexity of the naive algorithm is O(N^3), so it is not a significant difference, so I think I will soon implement the parallel blocked matrix-matrix multiplication, and I will implement it with a new algorithm that also uses Intel AVX-512 and fused multiply-add, and of course it will use the assembler instructions below for prefetching into the caches so as to gain about 22% in speed, so I think that overall it will have around the same speed as parallel BLAS. And I say that pipelining greatly increases throughput in modern CPUs such as x86 CPUs, and another common pipelining scenario is the FMA or fused multiply-add, which is a fundamental part of the instruction set for some processors. The basic load-operate-store sequence simply lengthens by one step to become load-multiply-add-store. The FMA is possible only if the hardware supports it, as it does in the case of the Intel Xeon Phi, for example, as well as in Skylake etc.
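
Before the AVX-512 version, here is a plain scalar Delphi/FreePascal sketch of the blocking (tiling) idea only, and note that it is not my library's code and it uses no AVX-512, no FMA and no threads; the matrix size and the block size of 64 are hypothetical and the block size should be tuned to the cache sizes:

---
program BlockedMatMul;

const
  N = 256;     // matrix dimension (a multiple of the block size here)
  BS = 64;     // block (tile) size, to be tuned to the L1/L2 cache

type
  TMatrix = array[0..N-1, 0..N-1] of double;

var
  A, B, C: ^TMatrix;
  ii, jj, kk, i, j, k: integer;
  sum: double;

begin
  New(A); New(B); New(C);

  // fill A and B with some values, clear C
  for i := 0 to N-1 do
    for j := 0 to N-1 do
    begin
      A^[i, j] := i + j;
      B^[i, j] := i - j;
      C^[i, j] := 0;
    end;

  // loop over blocks so that the working set of each block stays in cache
  ii := 0;
  while ii < N do
  begin
    jj := 0;
    while jj < N do
    begin
      kk := 0;
      while kk < N do
      begin
        // multiply one BS x BS tile of A by one BS x BS tile of B into C
        for i := ii to ii + BS - 1 do
          for j := jj to jj + BS - 1 do
          begin
            sum := C^[i, j];
            for k := kk to kk + BS - 1 do
              sum := sum + A^[i, k] * B^[k, j];
            C^[i, j] := sum;
          end;
        inc(kk, BS);
      end;
      inc(jj, BS);
    end;
    inc(ii, BS);
  end;

  writeln('C[0,0] = ', C^[0, 0]:0:2);
  Dispose(A); Dispose(B); Dispose(C);
end.
---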

More of my philosophy about matrix-vector multiplication of large matrices and about scalability and more of my thoughts..

The matrix-vector multiplication of large matrices is completely limited by the memory bandwidth, as I have just said (read it below), so vector extensions like SSE or AVX are usually not necessary for matrix-vector multiplication of large matrices. It is interesting that matrix-matrix multiplications don't have this kind of problem with memory bandwidth. Companies like Intel or AMD typically show benchmarks of matrix-matrix multiplications and they show how nicely they scale on many more cores, but they never show matrix-vector multiplications. And notice that my powerful open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well is also memory-bound and its matrices are usually big, but my new algorithm for it is efficiently cache-aware and efficiently NUMA-aware, and I have implemented it for dense and sparse matrices.
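
To see why this operation is memory-bound, here is a quick back-of-the-envelope calculation (a standard roofline-style argument with illustrative numbers, not a measurement of my library): a dense n x n matrix-vector product in double precision performs about 2*n^2 floating-point operations while it has to stream about 8*n^2 bytes of matrix data from memory, so its arithmetic intensity is only about 0.25 flop/byte, and a machine with for example 200 GB/s of memory bandwidth can therefore sustain only about 50 GFLOPS on it, far below the compute peak of a modern multicore CPU; matrix-matrix multiplication, by contrast, performs 2*n^3 flops on only 3*8*n^2 bytes of data, so its data can be reused from the caches and it is not limited by the memory bandwidth in the same way.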

More of my philosophy about the efficient Matrix-Vector multiplication algorithm in MPI and about scalability and more of my thoughts..

Matrix-vector multiplication is an absolutely fundamental operation, with countless applications in computer science and scientific computing. Efficient algorithms for matrix-vector multiplication are of paramount importance, and notice that for matrix-vector multiplication, n^2 time is certainly required for an n × n dense matrix, but you have to be smart, since in MPI computing, also for the exascale supercomputer systems, you don't only take this n^2 time into account: the algorithm also has to be efficiently cache-aware, and it also has to have a good complexity for how much memory is used by the parallel MPI processes. Notice carefully with me that you also must not send both a whole row of the matrix and the whole vector to the parallel MPI processes, but you have to know how to reduce this complexity efficiently, for example by dividing each row of the matrix and by dividing the vector and sending a part of the row of the matrix and a part of the vector to the parallel MPI processes, and I think that in an efficient algorithm for matrix-vector multiplication the time for addition is dominated by the communication time. And of course my implementation of my powerful open source Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well is also smart, since it is efficiently cache-aware and efficiently NUMA-aware, and it implements both the dense and the sparse cases, and of course, as I am showing below, it scales well on the memory channels, so it scales well on my 16 core dual Xeon with 8 memory channels, as I am showing below, and it will scale well on the 16 socket HPE NONSTOP X SYSTEMS or the 16 socket HPE Integrity Superdome X with above 512 cores and with 64 memory channels, so I invite you to read carefully and to download my open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well from my website here:

https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library

MPI will continue to be a viable programming model on exascale supercomputer systems, so I will soon implement many algorithms in MPI for Delphi and Freepascal and I will provide you with them. I am currently implementing an efficient matrix-vector multiplication algorithm in MPI, and you have to know that an efficient matrix-vector multiplication algorithm is really important for scientific applications, and of course I will also soon implement many other interesting algorithms in MPI for Delphi and Freepascal and I will provide you with them, so stay tuned !
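
As an illustration of the kind of partitioning described above, here is a small Delphi/FreePascal sketch of the index arithmetic only, with no MPI calls and not taken from my library: it shows how the n rows of the matrix, and the corresponding entries of the vector, could be split into nearly equal blocks among p parallel processes:

---
program PartitionSketch;

// Each rank of an MPI program would compute its own row range with this
// arithmetic; the sizes used below are hypothetical.

procedure BlockRange(n, p, rank: integer; out first, last: integer);
var base, rem: integer;
begin
  base := n div p;          // every process gets at least "base" rows
  rem := n mod p;           // the first "rem" processes get one extra row
  if rank < rem then
  begin
    first := rank * (base + 1);
    last := first + base;
  end
  else
  begin
    first := rem * (base + 1) + (rank - rem) * base;
    last := first + base - 1;
  end;
end;

var
  n, p, r, first, last: integer;
begin
  n := 10;                  // hypothetical matrix/vector size
  p := 4;                   // hypothetical number of parallel processes
  for r := 0 to p - 1 do
  begin
    BlockRange(n, p, r, first, last);
    writeln('process ', r, ' owns rows (and vector entries) ', first, ' .. ', last);
  end;
end.
---
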
More of my philosophy about the memory bottleneck and about scalability and more of my thoughts..

I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ, and I am also specialized in parallel computing, and I know that a large cache can reduce the Amdahl's Law bottleneck, which is main memory, but you have to understand what I am saying: my powerful open source software project below, the Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well, is also memory-bound and its matrices are usually big, and sparse linear system solvers are ubiquitous in high performance computing (HPC) and often are the most computationally intensive parts of scientific computing codes. A few of the many applications relying on sparse linear solvers include fusion energy simulation, space weather simulation, climate modeling, environmental modeling, the finite element method, and large-scale reservoir simulations to enhance oil recovery by the oil and gas industry. So it is why I am speaking about how many memory channels come in the 16 socket HPE NONSTOP X SYSTEMS or the 16 socket HPE Integrity Superdome X, and as you notice they can come with more than 512 cores and with 64 memory channels. Also, I have just benchmarked my Scalable Varfiler and it is scaling above 7x on my 16 core dual Xeon processor, and it is scaling well since I have 8 memory channels, and I invite you to look at my powerful Scalable Varfiler carefully in the following web link:

https://sites.google.com/site/scalable68/scalable-parallel-varfiler


More of my philosophy about the how many memory channels in the 16 sockets HPE NONSTOP X SYSTEMS and more of my thoughts..

I think I was right by saying that the 16 socket HPE NONSTOP X SYSTEMS or the 16 socket HPE Integrity Superdome X have around 2 to 4 memory channels per socket on x86 with Intel Xeons, and it means that they have 32 or 64 memory channels.

You can read here the FAQ from Hewlett Packard Enterprise from USA so that to notice it:

https://bugzilla.redhat.com/show_bug.cgi?id=1346327

And it says the following:

"How many memory channels per socket for specific CPU?

Each of the 8 blades has 2 CPU sockets.
Each CPU socket has 2 memory channels each connecting to 2 memory controllers that contain 6 Dimms each."

So i think that it can also support 4 memory channels per CPU socket with Intel Xeons.


More of my philosophy about the highest availability with HPE NONSTOP X SYSTEMS from Hewlett Packard Enterprise from USA and more of my thoughts..


I have just talked, read it below, about the 16 socket HPE Integrity Superdome X from Hewlett Packard Enterprise from the USA, but for the highest "availability" on the x86 architecture, I advise you to buy the 16 socket HPE NONSTOP X SYSTEMS from Hewlett Packard Enterprise from the USA, and read about it here:

https://www.hpe.com/hpe-external-resources/4aa4-2000-2999/enw/4aa4-2988?resourceTitle=Engineered+for+the+highest+availability+with+HPE+Integrity+NonStop+family+of+systems+brochure&download=true

And here is more of my thoughts about the history of HP NonStop on x86:

More of my philosophy about HP and about the Tandem team and more of my thoughts..

I invite you to read the following interesting article so that you notice how smart HP was by also acquiring Tandem Computers, Inc. with their "NonStop" systems and by learning from the Tandem team that has also extended HP NonStop to the x86 server platform; you can read about it in my writing below, and you can read about Tandem Computers here: https://en.wikipedia.org/wiki/Tandem_Computers . Notice that Tandem Computers, Inc. was the dominant manufacturer of fault-tolerant computer systems for ATM networks, banks, stock exchanges, telephone switching centers, and other similar commercial transaction processing applications requiring maximum uptime and zero data loss:

https://www.zdnet.com/article/tandem-returns-to-its-hp-roots/

More of my philosophy about HP "NonStop" to x86 Server Platform fault-tolerant computer systems and more..

Now HP to Extend HP NonStop to x86 Server Platform

HP announced in 2013 plans to extend its mission-critical HP NonStop technology to x86 server architecture, providing the 24/7 availability required in an always-on, globally connected world, and increasing customer choice.

Read the following to notice it:

https://www8.hp.com/us/en/hp-news/press-release.html?id=1519347#.YHSXT-hKiM8

And today HP provides HP NonStop to x86 Server Platform, and here is
an example, read here:

https://www.hpe.com/ca/en/pdfViewer.html?docId=4aa5-7443&parentPage=/ca/en/products/servers/mission-critical-servers/integrity-nonstop-systems&resourceTitle=HPE+NonStop+X+NS7+%E2%80%93+Redefining+continuous+availability+and+scalability+for+x86+data+sheet

So i think programming the HP NonStop for x86 is now compatible with x86 programming.


More of my philosophy about the 16 sockets HPE Integrity Superdome X from Hewlett Packard Enterprise from USA and more of my thoughts..

I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ, so I think that parallel programming with memory on Intel's CXL will be different from parallel programming with many memory channels on many sockets, so I think that, to scale the memory channels much more on many sockets and stay compatible, I advise you for example to buy the 16 socket HPE Integrity Superdome X from Hewlett Packard Enterprise from the USA here:

https://cdn.cnetcontent.com/3b/dc/3bdcd896-f2b4-48e4-bbf6-a75234db25da.pdf

And i am sure that my below Powerful Open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well will work correctly on the 16 sockets HPE Superdome X.

More of my philosophy about the future of system memory and more of my thoughts..

Here is the future of system memory of how to scale like with many more memory channels:

THE FUTURE OF SYSTEM MEMORY IS MOSTLY CXL

Read more here:

https://www.nextplatform.com/2022/07/05/the-future-of-system-memory-is-mostly-cxl/


So I think the way of parallel programming with the standard Intel CXL will look like parallel programming with many memory channels, as I am doing below with my powerful open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well.

More of my philosophy about x86 CPUs and about cache prefetching and more of my thoughts..


I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ, and today I will talk about how to prefetch data into the caches on x86 microprocessors:

So here are my following Delphi and Freepascal x86 inline assembler procedures that prefetch data into the caches:

So for 32 bit Delphi and Freepascal compilers, here is how to prefetch data into the caches (the T1 hint brings the data into the level 2 cache and higher), and notice that, in the Delphi and Freepascal compilers, when we pass the first parameter of the procedure with the register calling convention, it will be passed in the CPU register eax of the x86 microprocessor:

procedure Prefetch(p : pointer); register;
asm
prefetchT1 byte ptr [eax]
end;


For 64 bit Delphi and Freepascal compilers, here is how to prefetch data into the caches (again with the T1 hint), and notice that, in the Delphi and Freepascal compilers, when we pass the first parameter of the procedure with the register calling convention, it will be passed in the CPU register rcx of the x86 microprocessor (this is the Win64 calling convention):

procedure Prefetch(p : pointer); register;
asm
prefetchT1 byte ptr [rcx]
end;


And you can request the loading of the cache line that is located 256 bytes ahead into the caches, and it can be efficient, by doing this:

So for 32 bit Delphi and Freepascal compilers you do this:


procedure Prefetch(p : pointer); register;
asm
prefetchT1 byte ptr [eax+256]
end;


So for 64 bit Delphi and Freepascal compilers you do this:


procedure Prefetch(p : pointer); register;
asm
prefetchT1 byte ptr [rcx+256]
end;


So you can also prefetch with the prefetchT0 and prefetchT2 x86 assembler instructions (prefetchT0 brings the data into all cache levels, including the level 1 cache, and prefetchT2 into the level 3 cache and higher), so just replace, in the above inline assembler procedures, prefetchT1 with prefetchT0 or prefetchT2. But I think I am highly smart and I say that those prefetch x86 assembler instructions exist because the microprocessor can be faster than memory, and you have to understand that today the story is much nicer, since the powerful x86 processor cores can all sustain many memory requests, and we call this process "memory-level parallelism", and today x86 AMD or Intel processor cores can support more than 10 independent memory requests at a time, so for example the Graviton 3 ARM CPU appears to sustain about 19 simultaneous memory loads per core against about 25 for the Intel processor, so I think I can also say that this memory-level parallelism is a form of latency hiding that speeds things up so that the CPU doesn't wait too much for memory.
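
To put a number on this memory-level parallelism, here is a quick back-of-the-envelope calculation using Little's law (illustrative figures, not measurements): if one core keeps about 10 cache-line requests of 64 bytes each in flight and the memory latency is about 100 ns, then that core can sustain roughly 10 * 64 bytes / 100 ns = 6.4 GB/s of demand bandwidth, and with about 20 outstanding requests it is roughly 12.8 GB/s, which shows why sustaining many independent memory requests per core hides so much of the memory latency.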

And now i invite you to read more of my thoughts about stack memory allocations and about preemptive and non-preemptive timesharing in the following web link:

https://groups.google.com/g/alt.culture.morocco/c/JuC4jar661w


And more of my philosophy about Stacktrace and more of my thoughts..

I think I am highly smart, and I say that there are advantages and disadvantages to portability in software programming; for example you can make your application run just on the Windows operating system and it can be much more business friendly than making it run on multiple operating systems, since in business you have, for example, to develop and sell your application faster or much faster than the competition, so we cannot say that the tendency of C++ to require portability is always a good thing.

Other than that, I have just looked at Delphi and Freepascal and I have noticed that the stacktrace support in Freepascal is much more enhanced than in Delphi; look for example at the following Freepascal application that has made stacktraces portable to different operating systems and CPU architectures, and it is a much more enhanced stacktrace that is better than the Delphi one that runs just on Windows:

https://github.com/r3code/lazarus-exception-logger

But notice carefully that the Delphi one runs just on Windows:

https://docwiki.embarcadero.com/Libraries/Sydney/en/System.SysUtils.Exception.StackTrace


So, since a much more enhanced stacktrace is important, I think that Delphi needs to provide us with one that is portable to different operating systems and CPU architectures.

Also, the Free Pascal developer team is pleased to finally announce the addition of a long awaited feature, though to be precise it's two different, but very much related features: Function References and Anonymous Functions. These two features can be used independently of each other, but they unfold their greatest power when used together.

Read about it here:

https://forum.lazarus.freepascal.org/index.php/topic,59468.msg443370.html#msg443370

More of my philosophy about the AMD Epyc CPU and more of my thoughts..

I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ. If you want to be serious about buying a CPU and motherboard, I advise you to buy the following AMD Epyc 7313P Milan 16 core CPU that costs much less (around 1000 US dollars) and that is reliable and fast, since it is a 16 core CPU and it supports standard ECC memory and 8 memory channels, here it is:

https://en.wikichip.org/wiki/amd/epyc/7313p

And the good Supermicro motherboard for it that supports the Epyc Milan 7003 is the following:

https://www.newegg.com/supermicro-mbd-h12ssl-nt-o-supports-single-amd-epyc-7003-7002-series-processor/p/1B4-005W-00911?Description=amd%20epyc%20motherboard&cm_re=amd_epyc%20motherboard-_-1B4-005W-00911-_-Product


And the above AMD Epyc 7313P Milan 16 core CPU can be configured as NUMA using the good Supermicro motherboard above as follows:

This setting enables a trade-off between minimizing local memory latency for NUMA-aware or highly parallelizable workloads vs. maximizing per-core memory bandwidth for non-NUMA-friendly workloads. The default configuration (one NUMA domain per socket) is recommended for most workloads. NPS4 is recommended for HPC and other highly parallel workloads. Here is a detailed introduction of these options:

• NPS0: Interleave memory accesses across all channels in both sockets (not recommended)

• NPS1: Interleave memory accesses across all eight channels in each socket, report one NUMA node per socket (unless L3 Cache as NUMA is enabled)

• NPS2: Interleave memory accesses across groups of four channels (ABCD and EFGH) in each socket, report two NUMA nodes per socket (unless L3 Cache as NUMA is enabled)

• NPS4: Interleave memory accesses across pairs of two channels (AB, CD, EF and GH) in each socket, report four NUMA nodes per socket (unless L3 Cache as NUMA is enabled)


And of course you have to read my following writing about DDR5 memory that is not a fully ECC memory:

"On-die ECC: The presence of on-die ECC on DDR5 memory has been the subject of many discussions and a lot of confusion among consumers and the press alike. Unlike standard ECC, on-die ECC primarily aims to improve yields at advanced process nodes, thereby allowing for cheaper DRAM chips. On-die ECC only detects errors if they take place within a cell or row during refreshes. When the data is moved from the cell to the cache or the CPU, if there’s a bit-flip or data corruption, it won’t be corrected by on-die ECC. Standard ECC corrects data corruption within the cell and as it is moved to another device or an ECC-supported SoC."

Read more here to notice it:

https://www.hardwaretimes.com/ddr5-vs-ddr4-ram-quad-channel-and-on-die-ecc-explained/


So if you want to get serious and professional you can buy the above AMD Epyc 7313P Milan 16 core CPU with the Supermicro motherboard that supports it and that I am recommending, which supports fully ECC memory and 8 memory channels.

And of course you can read my thoughts about technology in the following web link:

https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4


And of course you have to read my following thoughts that also show how powerful it is to use 8 memory channels:



I have just said the following:

--

More of my philosophy about the new Zen 4 AMD Ryzen™ 9 7950X and more of my thoughts..


So i have just looked at the new Zen 4 AMD Ryzen™ 9 7950X CPU, and i invite you to look at it here:

https://www.amd.com/en/products/cpu/amd-ryzen-9-7950x

But notice carefully that the problem is with the number of supported memory channels, since it supports just two memory channels, so it is not good, since for example my following open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well is scaling around 8X on my 16 core Intel Xeon with 2 NUMA nodes and with 8 memory channels, but it will not scale correctly on the new Zen 4 AMD Ryzen™ 9 7950X CPU with just 2 memory channels, since it is also memory-bound, and here is my powerful open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well, and I invite you to take a careful look at it:

https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library

So I advise you to buy an AMD Epyc CPU or an Intel Xeon CPU that supports 8 memory channels.

---


And of course you can use the next twelve DDR5 memory channels of the Zen 4 AMD EPYC CPUs so as to scale my above algorithm even more, and read about it here:

https://www.tomshardware.com/news/amd-confirms-12-ddr5-memory-channels-on-genoa


And here is the simulation program that uses the probabilistic mechanism that I have talked about and that proves to you that the algorithm of my Parallel C++ Conjugate Gradient Linear System Solver Library is scalable:

If you look at my scalable parallel algorithm, it divides each array of the matrix into parts of 250 elements, and if you look carefully I am using two functions that consume the greater part of all the CPU time, atsub() and asub(), and inside those functions I am using a probabilistic mechanism so as to render my algorithm scalable on NUMA architecture, and it also makes it scale on the memory channels. What I am doing is scrambling the array parts using a probabilistic function, and what I have noticed is that this probabilistic mechanism is very efficient. To prove to you what I am saying, please look at the following simulation that I have done using a variable that contains the number of NUMA nodes, and what I have noticed is that my simulation gives almost perfect scalability on NUMA architecture. For example, let us give the "NUMA_nodes" variable a value of 4, and our array a size of 250; the simulation below will give a number of contention points of about a quarter of the array, so if I am using 16 cores, in the worst case it will scale to 4X throughput on NUMA architecture, because since I am using an array of 250 and there is a quarter of the array of contention points, Amdahl's law gives a scalability of almost 4X throughput on four NUMA nodes, and this will give almost perfect scalability on more and more NUMA nodes, so my parallel algorithm is scalable on NUMA architecture and it also scales well on the memory channels.

Here is the simulation that I have done, please run it and you will notice yourself that my parallel algorithm is scalable on NUMA architecture.

Here it is:

---
program test;

uses math;

var
  tab, tab1, tab2: array of integer;
  a, n1, k, i, n2, tmp, j, numa_nodes: integer;

begin

  a := 250;
  numa_nodes := 4;

  // tab2[i] holds the NUMA node that element i belongs to
  setlength(tab2, a);

  for i := 0 to a-1 do
    tab2[i] := i mod numa_nodes;

  // fill tab with 0..a-1 and scramble it with the probabilistic mechanism
  setlength(tab, a);

  randomize;

  for k := 0 to a-1 do
    tab[k] := k;

  n2 := a-1;

  for k := 0 to a-1 do
  begin
    n1 := random(n2);
    tmp := tab[k];
    tab[k] := tab[n1];
    tab[n1] := tmp;
  end;

  // fill tab1 with 0..a-1 and scramble it the same way
  setlength(tab1, a);

  randomize;

  for k := 0 to a-1 do
    tab1[k] := k;

  n2 := a-1;

  for k := 0 to a-1 do
  begin
    n1 := random(n2);
    tmp := tab1[k];
    tab1[k] := tab1[n1];
    tab1[n1] := tmp;
  end;

  // count the contention points, that is the positions where both scrambled
  // arrays fall on the same NUMA node
  j := 0;

  for i := 0 to a-1 do
    if tab2[tab[i]] = tab2[tab1[i]] then
    begin
      inc(j);
      writeln('A contention at: ', i);
    end;

  writeln('Number of contention points: ', j);
  setlength(tab, 0);
  setlength(tab1, 0);
  setlength(tab2, 0);
end.
---



And i invite you to read my thoughts about technology here:

https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4

More of my philosophy about the problem with capacity planning of a website and more of my thoughts..

I think I am highly smart since I have passed two certified IQ tests and I have scored above 115 IQ, and I have just invented a new methodology that greatly simplifies the capacity planning of a website that can be of a three-tier architecture, with the web servers, the application servers and the database servers, but I have to explain more so that you understand the big problem with capacity planning of a website: when you want, for example, to use web testing, the problem is how to choose the correct distribution of the read, write and delete transactions on the database of the website, since if it is not realistic you can go beyond the knee of the curve and get an unacceptable waiting time, and the Mean Value Analysis (MVA) algorithm has the same problem, so how do you solve the problem? As you are noticing, it is why I have come up with my new methodology that uses mathematics and that solves the problem. And read my previous thoughts:


More of my philosophy about website capacity planning and about Quality of service and more of my thoughts..

I think I am highly smart since I have passed two certified IQ tests and I have scored above 115 IQ, so I think that you have to lower the QoS (quality of service) of a website to a certain level, since you have to fix a limit on the number of connections that we allow to the website so as not to go beyond the knee of the curve, and of course I will soon show you the mathematical calculations of my new methodology of how to do capacity planning of a website, and of course you have to know that we have to do capacity planning using mathematics so as to know the average waiting time etc., and this permits us to calculate the number of connections that we allow to the website.
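
To make the "knee of the curve" concrete, here is a small Delphi/FreePascal sketch that uses the standard M/M/1 queueing formula R = S / (1 - U), where S is the service time and U the utilization; this is only a textbook illustration with made-up numbers of why the waiting time blows up near saturation, and it is not my methodology:

---
program KneeOfTheCurve;

var
  S, lambda, U, R: double;
  i: integer;
begin
  S := 0.010;                    // 10 ms of service time per request (hypothetical)
  for i := 1 to 9 do
  begin
    lambda := i * 10.0;          // arrival rate in requests per second
    U := lambda * S;             // utilization of the server
    R := S / (1 - U);            // average response time of an M/M/1 queue
    writeln('arrival rate = ', lambda:0:0, ' req/s  utilization = ', U:0:2,
            '  response time = ', R*1000:0:1, ' ms');
  end;
end.
---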

More of my philosophy about the Mean value analysis (MVA) algorithm and more of my thoughts..


I think I am highly smart since I have passed two certified IQ tests and I have scored above 115 IQ, and I have just read the following paper about the Mean Value Analysis (MVA) algorithm, and I invite you to read it carefully:

https://www.cs.ucr.edu/~mart/204/MVA.pdf


But I say that I easily understand the above paper on the Mean Value Analysis (MVA) algorithm, yet the above paper doesn't point out that you have to empirically collect the visit ratio and the average service demand of each class, so it is not so practical, since I say that you can, and have to, for example calculate the "tendency", for example by also rendering the not-memoryless service of, say, the database into a memoryless service, but don't worry, since I will soon make you understand my powerful methodology with all the mathematical calculations that ease the job for you and make it much more practical.
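
And to show what the Mean Value Analysis recursion itself looks like, here is a minimal Delphi/FreePascal sketch of the standard exact MVA algorithm for a closed queueing network; the number of stations, the service demands and the think time are made-up values, and this is only the textbook recursion, not my methodology:

---
program MVASketch;

const
  K = 2;                                     // e.g. application server and database server
  N = 20;                                    // number of concurrent users
  Z = 1.0;                                   // think time in seconds
  D: array[1..K] of double = (0.020, 0.035); // service demand of each station in seconds

var
  Q, R: array[1..K] of double;               // queue length and residence time per station
  Rtot, X: double;
  k, n: integer;
begin
  for k := 1 to K do
    Q[k] := 0;                               // empty network with 0 customers

  for n := 1 to N do
  begin
    Rtot := 0;
    for k := 1 to K do
    begin
      R[k] := D[k] * (1 + Q[k]);             // residence time seen by the n-th customer
      Rtot := Rtot + R[k];
    end;
    X := n / (Z + Rtot);                     // system throughput with n customers
    for k := 1 to K do
      Q[k] := X * R[k];                      // Little's law applied per station
  end;

  writeln('With ', N, ' users: throughput = ', X:0:2,
          ' req/s, response time = ', Rtot*1000:0:1, ' ms');
end.
---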

More of my philosophy about formal methods and about Leslie Lamport and more of my thoughts..

I think i am highly smart since I have passed two certified IQ tests and i have scored "above" 115 IQ, and I have just looked at the following video about the man who revolutionized computer science with math, and i invite you to look at it:

https://www.youtube.com/watch?v=rkZzg7Vowao

So I say that in mathematics a conjecture is a conclusion or a proposition that is proffered on a tentative basis without proof. And Leslie Lamport, the known scientist, is saying in the above video the following: "An algorithm without a proof is conjecture, and if you are proving things, that means using mathematics." So I think that Leslie Lamport, the known scientist, is not thinking correctly by saying so, since I think that you can also prove an algorithm by raising much more the probability of the success of the proof without using mathematics to prove the algorithm, and I say that a proof does not have to be just a conclusion in boolean logic of true or false, since I think that a proof can be a conclusion in fuzzy logic, and by logical analogy it looks like how race detectors in the very aggressive mode don't detect all the data races, so they miss a really small number of real races, so it is like a very high probability of really detecting real races, so read my thoughts below about it so that you understand my views. And I think that the second mistake of Leslie Lamport, the known scientist, is that he wants us to use formal methods, but read the following interesting article below about why people don't use formal methods:

And I invite you to read the following new article by the known computer expert in the above video, Leslie Lamport, which says that programmers need to use math by using formal methods, and in which Lamport discusses some of his work, such as the TLA+ specification language (developed by Lamport over the past few decades, the TLA+ [Temporal Logic of Actions] specification language allows engineers to describe the objectives of a program in a precise and mathematical way), and also cites some of the reasons why he gives a prominent place to mathematics in programming.

Read more in the following article, and you will have to translate it from French to English:

https://www.developpez.com/actu/333640/Un-expert-en-informatique-declare-que-les-programmeurs-ont-besoin-de-plus-de-mathematiques-ajoutant-que-les-ecoles-devraient-repenser-la-facon-dont-elles-enseignent-l-informatique/

But to answer the above expert called Leslie Lamport, I invite you to carefully read the following interesting web page about why people don't use formal methods:

WHY DON'T PEOPLE USE FORMAL METHODS?

https://www.hillelwayne.com/post/why-dont-people-use-formal-methods/


More of my philosophy of the polynomial-time complexity of race detection and more of my thoughts..

I think I am highly smart since I have passed two certified IQ tests and I have scored "above" 115 IQ, so I have quickly understood how Rust detects race conditions, but I think that a slew of "partial order"-based methods have been proposed whose goal is to predict data races in polynomial time, but at the cost of being incomplete and failing to detect data races in "some" traces. These include algorithms based on the classical happens-before partial order, and those based on newer partial orders that improve the prediction of data races over happens-before, so I think that we have to be optimistic, and read the following web page about the Sanitizers:

https://github.com/google/sanitizers

And notice carefully the ThreadSanitizer, so read carefully
the following paper about ThreadSanitizer:

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35604.pdf


And it says in the conclusion the following:

"ThreadSanitizer uses a new algorithm; it has several modes of operation, ranging from the most conservative mode (which has few false positives but also misses real races) to a very aggressive one (which
has more false positives but detects the largest number of
real races)."

So as you are noticing, since the very aggressive mode doesn't detect all the data races, it misses a really small number of real races, so it is like a very high probability of really detecting real races, and I think that you can also use my methodology below of incrementally using a model of the source code with the Spin model checker so as to raise the probability of detecting real races even more.
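
To make the happens-before partial order that these race detectors are based on a bit more concrete, here is a tiny Delphi/FreePascal sketch of a vector-clock comparison between two memory accesses; it is only a toy illustration with two hypothetical threads, not a race detector:

---
program VectorClockSketch;

const
  NTHREADS = 2;

type
  TVectorClock = array[0..NTHREADS-1] of integer;

// event a happens-before event b if a's clock is <= b's clock component-wise
// and strictly smaller in at least one component
function HappensBefore(const a, b: TVectorClock): boolean;
var i: integer; strictlySmaller: boolean;
begin
  strictlySmaller := false;
  for i := 0 to NTHREADS-1 do
  begin
    if a[i] > b[i] then
    begin
      HappensBefore := false;
      exit;
    end;
    if a[i] < b[i] then strictlySmaller := true;
  end;
  HappensBefore := strictlySmaller;
end;

var
  w, r: TVectorClock;
begin
  // thread 0 writes x at its local time 2, before any synchronization with thread 1
  w[0] := 2; w[1] := 0;

  // thread 1 reads x at its local time 3, after having received thread 0's clock (2,0)
  // through a synchronization (for example a mutex release/acquire)
  r[0] := 2; r[1] := 3;

  if HappensBefore(w, r) then
    writeln('the write happens-before the read: no race on this pair')
  else
    writeln('the two accesses are concurrent: potential data race');
end.
---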


Read my previous thoughts:

More of my philosophy about race conditions and about composability and more of my thoughts..

I say that a model is a representation of something. It captures not all attributes of the represented thing, but rather only those that seem relevant. So my way of doing software development in Delphi and Freepascal is also that I use a "model" of the source code that I execute in the Spin model checker so as to detect race conditions, so I invite you to take a look at the following new tutorial that uses the powerful Spin tool:

https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

So you can for example install the Spin model checker so as to detect race conditions; this is how you will get much more professional at detecting deadlocks and race conditions in parallel programming. And I invite you to look at the following video so that you know how to install the Spin model checker on Windows:

https://www.youtube.com/watch?v=MGzmtWi4Oq0

More of my philosophy about race detection and concurrency and more..

I have just looked quickly at different race detectors, and I think that the Intel Thread Checker from the Intel company from the "USA" is also very good, since the Intel Thread Checker needs to instrument either the C++ source code or the compiled binary to make every memory reference and every standard Win32 synchronization primitive observable, and this instrumentation of the source code is very good, since it also permits me to port my scalable algorithm inventions, for example by wrapping them in some native Windows synchronization APIs, and this instrumentation of the source code is also business friendly, so read about different race detectors and about the Intel Thread Checker here:

https://docs.microsoft.com/en-us/archive/msdn-magazine/2008/june/tools-and-techniques-to-identify-concurrency-issues

So I think that the race detectors of other programming languages have to provide this kind of source-code instrumentation, like the Intel Thread Checker from the Intel company from the "USA" does.

More of my philosophy about Rust and about memory models and about technology and more of my thoughts..


I think I am highly smart, and I say that the new programming language that we call Rust has an important problem, since read the following interesting article which says that atomic operations that don't have correct memory ordering can still cause race conditions in safe code; this is why the suggestion made by the researchers is:

"Race detection techniques are needed for Rust, and they should focus on unsafe code and atomic operations in safe code."


Read more here:

https://www.i-programmer.info/news/98-languages/12552-is-rust-really-safe.html


More of my philosophy about programming languages about lock-based systems and more..

I think we have to be optimistic about lock-based systems, since race condition detection can be done in polynomial time, and you can notice it by reading the following paper:

https://arxiv.org/pdf/1901.08857.pdf

Or by reading the following paper:

https://books.google.ca/books?id=f5BXl6nRgAkC&pg=PA421&lpg=PA421&dq=race+condition+detection+and+polynomial+complexity&source=bl&ots=IvxkORGkQ9&sig=ACfU3U2x0fDnNLHP1Cjk5bD_fdJkmjZQsQ&hl=en&sa=X&ved=2ahUKEwjKoNvg0MP0AhWioXIEHRQsDJc4ChDoAXoECAwQAw#v=onepage&q=race%20condition%20detection%20and%20polynomial%20complexity&f=false

So I think we can continue to program in lock-based systems, and about the composability of lock-based systems, read my following thoughts about it:

More of my philosophy about composability and about Haskell functional language and more..

I have just read quickly the following article about composability,
so i invite you to read it carefully:

https://bartoszmilewski.com/2014/06/09/the-functional-revolution-in-c/

I am not in accordance with the above article, and I think that the above scientist is programming in the Haskell functional language and for him it is the way to composability, since he says that functional programming, like Haskell functional programming, is the way that allows composability in the presence of concurrency, but for him lock-based systems don't allow it. But I don't agree with him, and I will give you the logical proof of it, and here it is: read what an article from ACM says, written by both Bryan M. Cantrill and Jeff Bonwick from Sun Microsystems:

You can read about Bryan M. Cantrill here:

https://en.wikipedia.org/wiki/Bryan_Cantrill

And you can read about Jeff Bonwick here:

https://en.wikipedia.org/wiki/Jeff_Bonwick

And here is what says the article about composability in the presence of concurrency of lock-based systems:

"Design your systems to be composable. Among the more galling claims of the detractors of lock-based systems is the notion that they are somehow uncomposable:

“Locks and condition variables do not support modular programming,” reads one typically brazen claim, “building large programs by gluing together smaller programs[:] locks make this impossible.”9 The claim, of course, is incorrect. For evidence one need only point at the composition of lock-based systems such as databases and operating systems into larger systems that remain entirely unaware of lower-level locking.

There are two ways to make lock-based systems completely composable, and each has its own place. First (and most obviously), one can make locking entirely internal to the subsystem. For example, in concurrent operating systems, control never returns to user level with in-kernel locks held; the locks used to implement the system itself are entirely behind the system call interface that constitutes the interface to the system. More generally, this model can work whenever a crisp interface exists between software components: as long as control flow is never returned to the caller with locks held, the subsystem will remain composable.

Second (and perhaps counterintuitively), one can achieve concurrency and
composability by having no locks whatsoever. In this case, there must be
no global subsystem state—subsystem state must be captured in per-instance state, and it must be up to consumers of the subsystem to assure that they do not access their instance in parallel. By leaving locking up to the client of the subsystem, the subsystem itself can be used concurrently by different subsystems and in different contexts. A concrete example of this is the AVL tree implementation used extensively in the Solaris kernel. As with any balanced binary tree, the implementation is sufficiently complex to merit componentization, but by not having any global state, the implementation may be used concurrently by disjoint subsystems—the only constraint is that manipulation of a single AVL tree instance must be serialized."

Read more here:

https://queue.acm.org/detail.cfm?id=1454462
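
And so that to illustrate the first way of the above quote, that is making the locking entirely internal to the subsystem so that control never returns to the caller with the lock held, here is a minimal sketch in Delphi/FreePascal; the unit and class names are just hypothetical illustrations and are not from the Solaris kernel or from my libraries:

unit InternalLockCounter; // hypothetical unit name, just for illustration

{$IFDEF FPC}{$mode delphi}{$ENDIF}

interface

uses
  SyncObjs;

type
  // the lock is entirely internal to the subsystem: callers never see it,
  // and control never returns to the caller with the lock held, so this
  // class composes with any other lock-based code
  TSafeCounter = class
  private
    FLock: TCriticalSection;
    FValue: Int64;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Increment;
    function Value: Int64;
  end;

implementation

constructor TSafeCounter.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
  FValue := 0;
end;

destructor TSafeCounter.Destroy;
begin
  FLock.Free;
  inherited Destroy;
end;

procedure TSafeCounter.Increment;
begin
  FLock.Acquire;
  try
    Inc(FValue);
  finally
    FLock.Release;  // the lock is always released before returning to the caller
  end;
end;

function TSafeCounter.Value: Int64;
begin
  FLock.Acquire;
  try
    Result := FValue;
  finally
    FLock.Release;
  end;
end;

end.

So as you are noticing, any caller can compose this class with its own locks, since the internal lock is never held across the interface boundary.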

More of my philosophy about HP and about the Tandem team and more of my thoughts..


I invite you to read the following interesting article so that
to notice how HP was smart by also acquiring Tandem Computers, Inc.
with their "NonStop" systems and by learning from the Tandem team
that has also extended HP NonStop to the x86 server platform, and you can read about it in my below writing and you can read about Tandem Computers here: https://en.wikipedia.org/wiki/Tandem_Computers , so notice that Tandem Computers, Inc. was the dominant manufacturer of fault-tolerant computer systems for ATM networks, banks, stock exchanges, telephone switching centers, and other similar commercial transaction processing applications requiring maximum uptime and zero data loss:

https://www.zdnet.com/article/tandem-returns-to-its-hp-roots/

More of my philosophy about HP "NonStop" to x86 Server Platform fault-tolerant computer systems and more..

Now HP to Extend HP NonStop to x86 Server Platform

HP announced in 2013 plans to extend its mission-critical HP NonStop technology to x86 server architecture, providing the 24/7 availability required in an always-on, globally connected world, and increasing customer choice.

Read the following to notice it:

https://www8.hp.com/us/en/hp-news/press-release.html?id=1519347#.YHSXT-hKiM8

And today HP provides HP NonStop on the x86 server platform, and here is
an example, read here:

https://www.hpe.com/ca/en/pdfViewer.html?docId=4aa5-7443&parentPage=/ca/en/products/servers/mission-critical-servers/integrity-nonstop-systems&resourceTitle=HPE+NonStop+X+NS7+%E2%80%93+Redefining+continuous+availability+and+scalability+for+x86+data+sheet

So i think that HP NonStop programming is now available on the x86 server platform.

And i invite you to read my thoughts about technology here:

https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4


More of my philosophy about stack allocation and more of my thoughts..


I think i am highly smart since I have passed two certified IQ tests and i have scored "above" 115 IQ, so i have just looked at the x64 assembler
of the C/C++ _alloca function that allocates size bytes of space from the stack, and it uses x64 assembler instructions to move the RSP register, and i think that it also aligns the address and ensures that it doesn't go beyond the stack limit etc., and i have quickly understood the x64 assembler of it, and i invite you to look at it here:

64-bit _alloca. How to use from FPC and Delphi?

https://www.atelierweb.com/64-bit-_alloca-how-to-use-from-delphi/


But i think i am smart and i say that the benefit of using a stack comes mostly from the "reusability" of the stack, i mean it is done this way
since you have for example, from a thread, to execute other functions or procedures and to exit from those functions or procedures, and this exiting from those functions or procedures makes the memory of the stack available again for "reusability", and it is why i think that using a dynamically allocated array as a stack is also useful since it also offers those benefits of reusability of the stack, and i think that the dynamic allocation of the array will not be expensive, so it is why i think i will implement the _alloca function using a dynamically allocated array, and i think it will also be good for my sophisticated coroutines library that you can read about from my following thoughts about preemptive and non-preemptive timesharing in the following web link:


https://groups.google.com/g/alt.culture.morocco/c/JuC4jar661w
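
And coming back to the dynamically allocated array used as a stack that i am talking about above, here is a minimal sketch in FreePascal/Delphi of the idea; the names TArrayStack, StackAlloc and so on are just hypothetical illustrations and it is not the code of my coroutines library:

program DynArrayStackDemo;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils;

type
  // a dynamically allocated array used as a stack, so that the memory
  // becomes reusable again when we restore the saved mark on function exit
  TArrayStack = record
    Buffer: array of Byte;  // the dynamic array that holds the stack memory
    Top: NativeUInt;        // current top offset inside the buffer
  end;

procedure StackInit(var S: TArrayStack; Capacity: NativeUInt);
begin
  SetLength(S.Buffer, Capacity);
  S.Top := 0;
end;

function StackMark(const S: TArrayStack): NativeUInt;
begin
  Result := S.Top;  // save the mark before calling a function or procedure
end;

function StackAlloc(var S: TArrayStack; Size: NativeUInt): Pointer;
begin
  // align the size to 16 bytes and check the stack limit, like _alloca does
  Size := (Size + 15) and not NativeUInt(15);
  if S.Top + Size > NativeUInt(Length(S.Buffer)) then
    raise Exception.Create('stack overflow in the array stack');
  Result := @S.Buffer[S.Top];
  Inc(S.Top, Size);
end;

procedure StackRelease(var S: TArrayStack; Mark: NativeUInt);
begin
  S.Top := Mark;  // exiting the function makes the memory reusable again
end;

var
  S: TArrayStack;
  Mark: NativeUInt;
  P: Pointer;
begin
  StackInit(S, 64 * 1024);
  Mark := StackMark(S);
  P := StackAlloc(S, 100);   // like _alloca(100) inside a function
  FillChar(P^, 100, 0);
  StackRelease(S, Mark);     // like returning from the function
  WriteLn('Top is back to ', S.Top);
end.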


And i invite you to read my thoughts about technology here:

https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4


More of my philosophy about the German model and about quality and more of my thoughts..

I think i am highly smart since I have passed two certified IQ tests and i have scored above 115 IQ, so i will ask the following philosophical question:


Why is Germany so successful in spite of least working hours?


So i think the most important factors are the following:


Of course the first factor is that Germany has good schools and vocational training - for everyone. This makes the average worker much more productive in terms of value add per hour.

And the second "really" important factor is the following:

It’s in the culture of Germany to focus on quality and being effective (all the way back to Martin Luther and his protestant work ethic)... Higher quality in every step of the chain leads to a massive reduction in defects and rework. This increases everyone’s productivity. But notice that i am also speaking in my below thoughts about the other ways to increase productivity, such as specialization etc., and the way of the German model to focus on quality and being effective, by also focusing on quality in every step of the chain so that to massively reduce defects and rework, is also supported by the following methodologies of quality control and Six Sigma etc., so read my following thoughts about them:

More of my philosophy about quality control and more of my thoughts..

I have just looked at and quickly understood the following paper about SPC(Statistical process control):

https://owic.oregonstate.edu/sites/default/files/pubs/EM8733.pdf


I think i am highly smart, but i think that the above paper doesn't speak about the fact that you can apply the central limit theorem as follows:

The central limit theorem states that the sampling distribution of the mean of any independent, random variable will be normal or nearly normal, if the sample size is large enough.
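
Stated a bit more precisely (this is the standard textbook formulation, not a quote from the above paper): if X_1, ..., X_n are independent and identically distributed random variables with mean \mu and finite variance \sigma^2, then for a large enough sample size n the sample mean \bar{X}_n = (X_1 + ... + X_n)/n is approximately normally distributed,

\bar{X}_n \approx N(\mu, \sigma^2 / n)

so the standard deviation of the sample mean is \sigma / \sqrt{n}, and it is exactly this quantity that is used to build the control limits of the control charts below.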

Also the above paper doesn't speak about the following very important things:

And I have quickly understood quality control with SPC(Statistical process control), and i have just discovered a smart pattern with my fluid intelligence, and it is that with SPC(Statistical process control) we can debug the process, like in software programming, by looking at its variability: if the variability doesn't follow a normal distribution, it means that there are defects in the process, and we say that there are special causes that cause those defects, and if the variability follows a normal distribution, we say that the process is stable and it has only common causes, and it means that we can control it much more easily by looking at the control charts, that permit to debug and control the variability by for example changing the machines or robots, and by measuring again with the control charts.
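
And here is a small simplified sketch in FreePascal/Delphi of how the control limits of an X-bar control chart can be computed as the grand mean plus or minus three standard deviations of the subgroup means, using the central limit theorem; note that it is only a simplification for illustration with made-up numbers, since real X-bar charts usually estimate the process sigma from the average range or the average subgroup standard deviation with the appropriate control chart constants:

program XBarChartDemo;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils, Math;

// compute the X-bar chart limits from the individual measurements,
// assuming the process sigma is estimated by the overall standard deviation
procedure XBarLimits(const Data: array of Double; SubgroupSize: Integer;
  out Center, LCL, UCL: Double);
var
  GrandMean, StdDevAll, SigmaXBar: Double;
begin
  GrandMean := Mean(Data);                      // center line of the chart
  StdDevAll := StdDev(Data);                    // estimate of the process sigma
  SigmaXBar := StdDevAll / Sqrt(SubgroupSize);  // sigma of the subgroup means (CLT)
  Center := GrandMean;
  LCL := GrandMean - 3.0 * SigmaXBar;           // lower control limit
  UCL := GrandMean + 3.0 * SigmaXBar;           // upper control limit
end;

var
  Center, LCL, UCL: Double;
  Sample: array[0..9] of Double =
    (10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 10.2);
begin
  XBarLimits(Sample, 5, Center, LCL, UCL);
  WriteLn(Format('CL = %.3f  LCL = %.3f  UCL = %.3f', [Center, LCL, UCL]));
end.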

More of my philosophy about the Post Graduate Program on lean Six Sigma and more..

More of my philosophy about Six Sigma and more..

I think i am smart, and now i will talk more about Six Sigma
since i have just talked about SPC(Statistical process control), so
you have to know that Six Sigma needs to fulfill the following steps:

1- Define the project goals and customer (external and internal)
deliverables.

2- Measure the process so that to determine current performance and
quantify the problem.

3- Analyze and determine the root cause(s) of the defects.

4- Improve the process by eliminating the defects.

5- Control future performance so the improved process doesn't degrade.


And you have to know that those steps are also important steps toward attaining ISO 9000 certification, and notice that you can use SPC(Statistical process control) and the control charts on step [3] and step [4] above.

Other than that, i have just read the following interesting and important paper about SPC(Statistical process control) that explains the whole process of SPC(Statistical process control), so i invite you to read it
carefully:

https://owic.oregonstate.edu/sites/default/files/pubs/EM8733.pdf

So as you notice in the above paper, the central limit theorem
in mathematics is so important, but notice carefully that the necessary and important condition for the central limit theorem to work is that you have to use independent and random variables, and notice in the above paper that you have to do two things: you have to reduce or eliminate the defects and you have to control the "variability" of the defects, and this is why the paper is talking about how to construct a control chart. Other than that, the central limit theorem is not only related to SPC(Statistical process control), but it is also related to PERT and my PERT++ software project below, and notice that in my software project below that is called PERT++, i have provided you with two ways of how to estimate the critical path: first, by the way of CPM(Critical Path Method) that shows all the arcs of the estimate of the critical path, and second, by the way of the central limit theorem by using the inverse normal distribution function, and you have to provide my software project that is called PERT++ with the three types of estimates that are the following (and see the small sketch just after these three estimates):

Optimistic time - generally the shortest time in which the activity
can be completed. It is common practice to specify optimistic times
to be three standard deviations from the mean so that there is
approximately a 1% chance that the activity will be completed within
the optimistic time.

Most likely time - the completion time having the highest
probability. Note that this time is different from the expected time.

Pessimistic time - the longest time that an activity might require. Three standard deviations from the mean is commonly used for the pessimistic time.
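
And here is the small sketch that i am talking about above: it computes, under the usual PERT beta-distribution assumption, the expected time (Optimistic + 4*MostLikely + Pessimistic)/6 and the standard deviation (Pessimistic - Optimistic)/6 of each activity, and then adds the expected times and the variances along a critical path, as the central limit theorem suggests; it is only an illustration in FreePascal/Delphi with made-up numbers and it is not the code of my PERT++:

program PertEstimateDemo;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils;

type
  TActivity = record
    Optimistic, MostLikely, Pessimistic: Double;
  end;

// expected duration of one activity under the PERT beta assumption
function ExpectedTime(const A: TActivity): Double;
begin
  Result := (A.Optimistic + 4.0 * A.MostLikely + A.Pessimistic) / 6.0;
end;

// standard deviation of one activity (optimistic and pessimistic are taken
// to be about three standard deviations from the mean)
function StdDevTime(const A: TActivity): Double;
begin
  Result := (A.Pessimistic - A.Optimistic) / 6.0;
end;

// expected duration and standard deviation of the whole critical path:
// the means add up and, the activities being independent, the variances add up too
procedure CriticalPathEstimate(const Path: array of TActivity;
  out MeanTime, SigmaTime: Double);
var
  i: Integer;
  Variance: Double;
begin
  MeanTime := 0.0;
  Variance := 0.0;
  for i := Low(Path) to High(Path) do
  begin
    MeanTime := MeanTime + ExpectedTime(Path[i]);
    Variance := Variance + Sqr(StdDevTime(Path[i]));
  end;
  SigmaTime := Sqrt(Variance);
end;

var
  Path: array[0..2] of TActivity = (
    (Optimistic: 2.0; MostLikely: 4.0; Pessimistic: 8.0),
    (Optimistic: 3.0; MostLikely: 5.0; Pessimistic: 9.0),
    (Optimistic: 1.0; MostLikely: 2.0; Pessimistic: 3.0));
  M, S: Double;
begin
  CriticalPathEstimate(Path, M, S);
  WriteLn(Format('Expected project duration = %.2f, standard deviation = %.2f', [M, S]));
end.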

And you can download my PERT++ from reading my following below thoughts:

More of my philosophy about the central limit theorem and about my PERT++ and more..

The central limit theorem states that the sampling distribution of the mean of any independent, random variable will be normal or nearly normal, if the sample size is large enough.

How large is "large enough"?

In practice, some statisticians say that a sample size of 30 is large enough when the population distribution is roughly bell-shaped. Others recommend a sample size of at least 40. But if the original population is distinctly not normal (e.g., is badly skewed, has multiple peaks, and/or has outliers), researchers like the sample size to be even larger. So i invite you to read my following thoughts about my software
project that is called PERT++, and notice that the PERT networks are referred to by some researchers as "probabilistic activity networks" (PAN) because the duration of some or all of the arcs are independent random variables with known probability distribution functions, and have finite ranges. So PERT uses the central limit theorem (CLT) to find the expected project duration.

And as you are noticing, this Central Limit Theorem is also so important
for quality control, so read the following to notice it (i have also understood Statistical Process Control (SPC)):

An Introduction to Statistical Process Control (SPC)

https://www.engineering.com/AdvancedManufacturing/ArticleID/19494/An-Introduction-to-Statistical-Process-Control-SPC.aspx

Also PERT networks are referred to by some researchers as "probabilistic activity networks" (PAN) because the duration of some or all of the arcs are independent random variables with known probability distribution functions, and have finite ranges. So PERT uses the central limit theorem (CLT) to find the expected project duration.

So, i have designed and implemented my PERT++ that is important for quality, so please read about it and download it from my website here:

https://sites.google.com/site/scalable68/pert-an-enhanced-edition-of-the-program-or-project-evaluation-and-review-technique-that-includes-statistical-pert-in-delphi-and-freepascal

---


So I have provided you in my PERT++ with the following functions:


function NormalDistA (const Mean, StdDev, AVal, BVal: Extended): Single;

function NormalDistP (const Mean, StdDev, AVal: Extended): Single;

function InvNormalDist(const Mean, StdDev, PVal: Extended; const Less: Boolean): Extended;

For NormalDistA() or NormalDistP(), you pass the best estimate of completion time to Mean, and you pass the critical path standard deviation to StdDev, and you will get the probability of the value AVal (with NormalDistP()) or the probability between the values AVal and BVal (with NormalDistA()).

For InvNormalDist(), you pass the best estimate of completion time to Mean, and you pass the critical path standard deviation to StdDev, and you will get the length of the critical path of the probability PVal, and when Less is TRUE, you will obtain a cumulative distribution.
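
And here is a small hedged usage sketch of how such functions can be called; the unit name PertMath is just a hypothetical placeholder for the real unit inside my PERT++ zip file, and the numbers 40, 3, 44 and 0.95 are just made-up values for the example, so please check the demo inside the zip file for the exact conventions:

program PertUsageDemo;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  PertMath; // hypothetical unit name: use the real unit from the PERT++ zip file

var
  ProbOnTime: Single;
  Duration95: Extended;
begin
  // probability associated with the value 44 (i am assuming here the cumulative
  // probability of finishing in 44 time units or less), for a critical path with
  // a best estimate of 40 and a critical path standard deviation of 3
  ProbOnTime := NormalDistP(40.0, 3.0, 44.0);

  // length of the critical path for the probability 0.95, with Less set to TRUE
  // so that to obtain a cumulative distribution, as described above
  Duration95 := InvNormalDist(40.0, 3.0, 0.95, True);

  WriteLn('P(duration <= 44) = ', ProbOnTime:0:4);
  WriteLn('Duration at the 95% level = ', Duration95:0:2);
end.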


So as you are noticing from my above thoughts, since PERT networks are referred to by some researchers as "probabilistic activity networks" (PAN), because the duration of some or all of the arcs are independent random variables with known probability distribution functions and have finite ranges, PERT uses the central limit theorem (CLT) to find the expected project duration, so you then have to use my above functions
that are the Normal distribution and inverse Normal distribution functions, and please look at my demo inside my zip file to understand better how i am doing it:

You can download and read about my PERT++ from my website here:

https://sites.google.com/site/scalable68/pert-an-enhanced-edition-of-the-program-or-project-evaluation-and-review-technique-that-includes-statistical-pert-in-delphi-and-freepascal

I think i am smart and i invite you to read carefully the following webpage of Alan Robinson, Professor of Operations Management at the University of Massachusetts, who is a full-time professor at the Isenberg School of Management of UMass and a consultant and book author specializing in managing ideas (idea-generation and idea-driven organization) and building high-performance organizations, creativity, innovation, quality, and lean management:

https://www.simplilearn.com/pgp-lean-six-sigma-certification-training-course?utm_source=google&utm_medium=cpc&utm_term=&utm_content=11174393172-108220153863-506962883161&utm_device=c&utm_campaign=Display-MQL-DigitalOperationsCluster-PG-QM-CLSS-UMass-YTVideoInstreamCustomIntent-US-Main-AllDevice-adgroup-QM-Desktop-CI&gclid=Cj0KCQiA3rKQBhCNARIsACUEW_ZGLHcUP2htLdQo46zP6Eo2-vX0MQYvc-o6GQP55638Up4tex85RBEaArn9EALw_wcB


And notice in the above webpage of the professor that he is giving a Post Graduate Program in Lean Six Sigma and on agile methodology, and i think that this Post Graduate Program is easy for me since i am really smart and i can easily understand lean Six Sigma or Six Sigma and i can easily understand agile methodology, and notice that in my below thoughts i am also explaining much more smartly what agile methodology is, and i think that the more difficult part of Six Sigma or lean Six Sigma is to understand the central limit theorem and to understand what SPC(Statistical process control) is and how to use the control charts so that to control the variability of the defects, and notice that i am talking about it in my below thoughts, but i think that the rest of lean Six Sigma and Six Sigma is easy for me.


More of my philosophy about IQ tests and more of my thoughts..


I think i am highly smart, and I have passed two certified IQ tests and i have scored above 115 IQ, but i have just passed more and more IQ tests, and i have just noticed that the manner in which we test with IQ tests is not correct, since in an IQ test you can be more specialized and above average in one subtest of intelligence than in another subtest of intelligence inside an IQ test, since IQ tests test for many kinds of intelligence such as the spatial and speed and calculations and logical intelligence etc., so i think that you can be really above average in logical intelligence, as i am really above average in logical intelligence, but at the same time you can be below average in calculations and/or spatial intelligence, so since an IQ test doesn't test for this kind of specialization of intelligence, i think it is not good, since testing for this kind of specialization in intelligence is really important so that to be efficient by knowing the strong advantages of this or that person in every type of intelligence. And about the importance of specialization, read carefully my following thought about it:


More of my philosophy about specialization and about efficiency and productivity..

The previous CEO of General Electric, Larry Culp, was the architect of a strategy that represented a new turning point in world corporate strategies: Larry Culp's strategy was to divide the company according to its activities. Something like: we are better off alone, separately and
focused on each one's own activity, than together in a large
conglomerate. And it is a move from integration to specialization.
You see, it is thought that a company always gains economies of scale
as it grows, but this is not necessarily the case, since as the company
gains in size - especially if it engages in many activities - it
also generates its own bureaucracy, with all that entails in terms
of cost and efficiency. And not only that, it is also often the case
that by bringing together very different activities, strategic focus is lost and decision-making is diluted, so that in the end no one ends up
taking responsibility, it doesn't always happen, but these reasons are
basically what is driving this increasing specialization. So i invite you to look at the following video so that to understand more about it:

The decline of industrial icon of the US - VisualPolitik EN

https://www.youtube.com/watch?v=-hqwYxFCY-k


And here are my previous thoughts about specialization and productivity so that to understand much more:

More about the Japanese Ikigai and about productivity and more of my thoughts..

Read the following interesting article about Japanese Ikigai:

The More People With Purpose, the Better the World Will Be

https://singularityhub.com/2018/06/15/the-more-people-with-purpose-the-better-the-world-will-be/

I think i am highly smart, so i say that the Japanese Ikigai is like a Japanese philosophy that is like the right combination or "balance" of passion, vocation, and mission, and Ikigai and MTP, as concepts, urge us to align our passions with a mission to better the world, but i think that the Japanese Ikigai is also smart since it gets the "passion" from the "mission", since the mission is also the engine, so you have to align the passion with the mission of the country or the global world so that to be efficient, and the Japanese Ikigai is also smart since, so that to increase productivity and be efficient, you have to "specialize" in doing a job, but so that to increase productivity even more and be more efficient you can also specialize in what you do "better", and it is what the Japanese Ikigai is doing, since i think that in the Japanese Ikigai, the passion permits to make you specialized in a job in what you do better, and here is what i have just smartly said about productivity:

I think i am highly smart, and i have passed two certified IQ tests and i have scored above 115 IQ, and i will now talk about another important idea of Adam Smith, the father of economic Liberalism, and it is about "specialization" in an economic system, since i say that in an economic system we have to be specialized in doing a job so that to be efficient and productive, but not only that, we also have to specialize in doing a job in what we do better so that to be even more efficient and productive, and we have to minimize at best the idle time or the wasted time when doing a job, since i can also say that this average idle time or wasted time of the workers working in parallel can be converted to a contention like in parallel programming, so you have to minimize it at best, and you have to minimize at best the coherency like in parallel programming so that to scale much better, and of course all this can create an economy of scale, and also i invite you to read my following smart and interesting thoughts about scalability of productivity:

I will talk about the following thoughts from the following PhD computer scientist:

https://lemire.me/blog/about-me/

Read more carefully here his thoughts about productivity:

https://lemire.me/blog/2012/10/15/you-cannot-scale-creativity/

And i think he is making a mistake in his above webpage about productivity:

Since we have that Productivity = Output/Input

But better human training and/or better tools and/or better human smartness and/or better human capacity can make the Parallel productivity part much bigger than the Serial productivity part, so it can scale much more (it is like Gustafson's Law, and see the small sketch below after the Gustafson's Law points).

And it looks like the following:

About parallelism and about Gustafson’s Law..

Gustafson’s Law:

• If you increase the amount of work done by each parallel
task then the serial component will not dominate
• Increase the problem size to maintain scaling
• Can do this by adding extra complexity or increasing the overall
problem size

Scaling is important, as the more a code scales the larger a machine it
can take advantage of:

• can consider weak and strong scaling
• in practice, overheads limit the scalability of real parallel programs
• Amdahl’s law models these in terms of serial and parallel fractions
• larger problems generally scale better: Gustafson’s law


Load balance is also a crucial factor.
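
And so that to make Gustafson's law a little more concrete, here is a tiny sketch in FreePascal/Delphi that computes the scaled speedup S(N) = N - Alpha*(N - 1), where Alpha is the serial fraction of the scaled workload; it is only an illustration of the formula and the 0.05 serial fraction is a made-up value:

program GustafsonDemo;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils;

// scaled speedup predicted by Gustafson's law for N processors, where
// SerialFraction is the fraction of the scaled workload that stays serial
function GustafsonSpeedup(N: Integer; SerialFraction: Double): Double;
begin
  Result := N - SerialFraction * (N - 1);
end;

const
  Procs: array[0..4] of Integer = (1, 4, 16, 64, 256);
var
  i: Integer;
begin
  for i := Low(Procs) to High(Procs) do
    WriteLn(Format('N = %3d  scaled speedup = %7.2f',
      [Procs[i], GustafsonSpeedup(Procs[i], 0.05)]));
end.

So as you are noticing, with a small serial fraction the scaled speedup keeps growing almost linearly when you increase the problem size with the number of processors.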


And read my following thoughts about the Evolutionary Design methodology that is so important so that to understand:

And I invite you to look at step 4 of my below thoughts of software Evolutionary Design methodology with agile, here it is:

4- When in agile a team breaks a project into phases, it’s called
incremental development. An incremental process is one in which
software is built and delivered in pieces. Each piece, or increment,
represents a complete subset of functionality. The increment may be
either small or large, perhaps ranging from just a system’s login
screen on the small end to a highly flexible set of data management
screens. Each increment is fully coded Sprints, Planning, and
Retrospectives.

And you will notice that it has to be done by "prioritizing" the pieces of the software to be delivered to the customers, and here again in agile you are noticing that we are also delivering prototypes of the software, since we often associate prototypes with nearly completed or just-before-launch versions of products. However, designers create prototypes at all phases of the design process at various resolutions. In engineering, students are taught to, and practitioners do, think deeply before setting out to build. However, as the product or system becomes increasingly complex, it becomes increasingly difficult to consider all factors while designing. Facing this reality, designers are no longer just "thinking to build" but also "building to think." By getting hands-on and trying to create prototypes, unforeseen issues are highlighted early, saving costs related to late-stage design changes. This rapid iterative cycle of thinking and building is what allows designers to learn rapidly from doing. Creating interfaces often benefits from the "build to think" approach. For example, in trying to lay out the automotive cockpit, one can simply list all the features, buttons, and knobs that must be incorporated. However, only by prototyping the cabin does one really start to think about how the layout should appear to the driver in order to avoid confusion while maximizing comfort. This then allows the designer to iterate on their initial concept to develop something that is more intuitive and refined. Also, prototypes and their demonstrations are designed to get potential customers interested and excited.

More of my philosophy about the Evolutionary Design methodology and more..

Here are some important steps of software Evolutionary Design methodology:

1- Taking a little extra time during the project to write solid code and
fix problems today, they create a codebase that’s easy to maintain
tomorrow.

2- And the most destructive thing you can do to your project is to build
new code, and then build more code that depends on it, and then still
more code that depends on that, leading to that painfully familiar
domino effect of cascading changes...and eventually leaving you with
an unmaintainable mess of spaghetti code. So when teams write code,
they can keep their software designs simple by creating software
designs based on small, self-contained units (like classes, modules,
services, etc.) that do only one thing; this helps avoid the domino
effect.

3- Instead of creating one big design at the beginning of the project
that covers all of the requirements, agile architects use incremental
design, which involves techniques that allow them to design a system
that is not just complete, but also easy for the team to modify as
the project changes.

4- When in agile a team breaks a project into phases, it’s called
incremental development. An incremental process is one in which
software is built and delivered in pieces. Each piece, or increment,
represents a complete subset of functionality. The increment may be
either small or large, perhaps ranging from just a system’s login
screen on the small end to a highly flexible set of data management
screens. Each increment is fully coded Sprints, Planning, and
Retrospectives.

5- And an iterative process in agile is one that makes progress through
successive refinement. A development team takes a first cut
at a system, knowing it is incomplete or weak in some (perhaps many)
areas. They then iteratively refine those areas until the product is
satisfactory. With each iteration the software is improved through
the addition of greater detail.

More of my philosophy about Democracy and the Evolutionary Design methodology..

I will make a logical analogy between software projects and Democracy:
first i will say that because of the today big complexity of software
projects, the "requirements" of those complex software projects are
not clear and a lot could change in them, so this is
why we are using an Evolutionary Design methodology with different tools
such as Unit Testing, Test Driven Development, Design Patterns,
Continuous Integration, Domain Driven Design, but we have to notice
carefully that an important thing in the Evolutionary Design methodology is
that when those complex software projects grow, we have first to
normalize their growth by ensuring that the complex software projects
grow "nicely" and "balanced" by using standards, and second we have to
optimize the growth of the complex software projects by balancing between
the criterion of the ease of changing the complex software projects and the
performance of the complex software projects, and third we have to
maximize the growth of the complex software projects by making the most
out of each optimization, and i think that by logical analogy we can
notice that in Democracy we have also to normalize the growth by not
allowing "extremism" or extremist ideologies that hurt Democracy, and we
have also to optimize Democracy by for example well balancing between the
"performance" of the society and in the Democracy and the "reliability"
of helping others, like the weakest members of the society among the
people that of course respect the laws.


More of my philosophy about the importance of randomness in
the genetic algorithm and in the evolutionary algorithms and more
of my thoughts..

More of my philosophy about the genetic algorithm and about artificial intelligence and more of my thoughts..

I think i am highly smart, so i will ask the following philosophical question about the genetic algorithm:

Is the genetic algorithm a brute-force search and if it is
not, how is it different than the brute-force search ?

So i have just quickly taken a look at some example of a minimization problem with a genetic algorithm, and i think that the genetic algorithm is not a brute-force search, since i think that when, in a minimization
problem with a genetic algorithm, you do a crossover, also called recombination, that is a genetic operator used to combine the genetic information of two parents to generate new offspring, the genetic algorithm has this tendency to also explore locally and we call it exploitation, and when the genetic algorithm does genetic mutations with a level of probability, the genetic algorithm has this tendency to explore globally and we call it exploration, so i think a good genetic algorithm is the one that balances efficiently exploration and exploitation so that to avoid premature convergence, and notice that when you explore locally and globally you can do it with a bigger population that makes the search faster, so it is why i think the genetic algorithm has this kind of patterns that make it a much better search than brute-force search (and see the small sketch below after the following web link). And so that to know more about this kind of artificial intelligence, i invite you to read my following thoughts in the following web link about evolutionary algorithms and artificial intelligence so that to understand more:

https://groups.google.com/g/alt.culture.morocco/c/joLVchvaCf0
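
And so that to illustrate what i mean above by crossover (exploitation), mutation (exploration) and elitism, here is a minimal toy sketch in FreePascal/Delphi of a genetic algorithm that minimizes a simple two-variable function; it is only a small illustration, it is not my software project, and the parameters (population size, mutation rate, number of generations) are arbitrary made-up values:

program GeneticDemo;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils;

const
  PopSize      = 60;
  Generations  = 200;
  MutationRate = 0.10;   // probability of mutating each gene (exploration)
  LowBound     = -5.0;
  HighBound    =  5.0;

type
  TIndividual = record
    X, Y: Double;   // the two genes
    Cost: Double;   // value of the objective function, lower is better
  end;

// the minimization problem: a simple bowl-shaped function with minimum at (1, -2)
function Objective(const X, Y: Double): Double;
begin
  Result := Sqr(X - 1.0) + Sqr(Y + 2.0);
end;

procedure Evaluate(var Ind: TIndividual);
begin
  Ind.Cost := Objective(Ind.X, Ind.Y);
end;

function RandomGene: Double;
begin
  Result := LowBound + Random * (HighBound - LowBound);
end;

// tournament selection: pick two random individuals and keep the better one
function Select(const Pop: array of TIndividual): TIndividual;
var
  A, B: Integer;
begin
  A := Random(PopSize);
  B := Random(PopSize);
  if Pop[A].Cost < Pop[B].Cost then Result := Pop[A] else Result := Pop[B];
end;

// arithmetic crossover: the child is a random blend of the two parents (exploitation)
function Crossover(const P1, P2: TIndividual): TIndividual;
var
  W: Double;
begin
  W := Random;
  Result.X := W * P1.X + (1.0 - W) * P2.X;
  Result.Y := W * P1.Y + (1.0 - W) * P2.Y;
end;

// mutation: with a small probability replace a gene by a random value (exploration)
procedure Mutate(var Ind: TIndividual);
begin
  if Random < MutationRate then Ind.X := RandomGene;
  if Random < MutationRate then Ind.Y := RandomGene;
end;

var
  Pop, NewPop: array[0..PopSize - 1] of TIndividual;
  Best: TIndividual;
  i, Gen: Integer;
begin
  Randomize;
  for i := 0 to PopSize - 1 do
  begin
    Pop[i].X := RandomGene;
    Pop[i].Y := RandomGene;
    Evaluate(Pop[i]);
  end;
  Best := Pop[0];
  for Gen := 1 to Generations do
  begin
    for i := 0 to PopSize - 1 do
    begin
      NewPop[i] := Crossover(Select(Pop), Select(Pop));
      Mutate(NewPop[i]);
      Evaluate(NewPop[i]);
      if NewPop[i].Cost < Best.Cost then Best := NewPop[i];
    end;
    NewPop[0] := Best;   // elitism: keep the best individual found so far
    Pop := NewPop;
  end;
  WriteLn(Format('Best found: x = %.4f  y = %.4f  cost = %.6f',
    [Best.X, Best.Y, Best.Cost]));
end.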

More of my philosophy about the other conditions of the genetic algorithm and about artificial intelligence and more of my thoughts..
I think i am highly smart, and i think that the genetic algorithm
is interesting too, but i have to speak about another most important thing about the genetic algorithm, so i will ask a philosophical question about it:

Since, as i have just said previously (read it below), a good genetic algorithm has to efficiently balance between global (exploration) and local (exploitation) search, how can you be sure that you have found a global optimum?

I think i am smart, and i will say that it also depends on the kind of problem, so if for example we have a minimization problem, you can
rerun the genetic algorithm a number of times so that to select the best minimum among all the results, and you can also give more time to
the exploration so that to find a better result, also you have to know that the genetic algorithm can be more elitist in the crossover steps, but i think that this kind of Elitism can have the tendency to not efficiently raise the average fitness of the members of the population, so then it depends on which problem you want to use the genetic algorithm for, also i think that the genetic algorithm is a model that explains where humans come from, since i also think
that the genetic mutations of humans, that happen with a probability, have not only come from inside the body, from the chromosomes and genes, but they were also the result of solar storms that, as NASA has said, may have been key to life on Earth, read here so that to notice it:

https://www.nasa.gov/feature/goddard/2016/nasa-solar-storms-may-have-been-key-to-life-on-earth

I think i am highly smart, and i will invite you to read my following
smart thoughts about evolutionary algorithms and artificial intelligence so that you notice how i am talking about the so important thing that we call "randomness":

https://groups.google.com/g/alt.culture.morocco/c/joLVchvaCf0


So i think i am highly smart, and notice that i am saying in the above web link the following about evolutionary algorithms:

"I think that Modern trends in solving tough optimization problems tend
to use evolutionary algorithms and nature-inspired metaheuristic
algorithms, especially those based on swarm intelligence (SI), two major
characteristics of modern metaheuristic methods are nature-inspired, and
a balance between randomness and regularity."

So i think that in the genetic algorithm there is a part that is hard coded, like selecting the best genes, and i think that it is what
we call regularity, since it is hard coded like that, but there is
a so important thing in the genetic algorithm that we call randomness,
and i think that it is the genetic mutations that happen with a
probability and that give a kind of diversity, so i think that these
genetic mutations are really important, since i can for example
say that if the best genes are the ones that use "reason", then reason too can make the people that have the tendency to use reason do
a thing that is against their survival, like going to war when we
feel that there is too much risk, and this going to war can make
the members or people that use reason so that to attack the other enemy
become extinct when they lose a war, and it is the basis of randomness in a genetic algorithm, since even when there is a war
between for example two Ant colonies, there are some members that do not make war and that can survive if the others are extinct from making war, and i say it also comes from the randomness of the genetics.

More of my philosophy about student performance and about artificial intelligence and more of my thoughts..


I have just read the following interesting article from McKinsey, and
i invite you to read it carefully:

Drivers of student performance: Asia insights

https://www.mckinsey.com/industries/education/our-insights/drivers-of-student-performance-asia-insights

And i think i am smart, and i think that the following factors in the above article that influence student performance are not so difficult to implement:

1- Students who receive a blend of inquiry-based and teacher-directed
instruction have the best outcomes

2- School-based technology yields the best results when placed in the
hands of teachers

3- Early childhood education has a positive impact on student scores,
but the quality and type of care is important

But i think that the factor that is tricky to implement (since it needs good smartness) is good motivation calibration, that permits students to score 8 to 14 percent higher on the science test than poorly calibrated ones, and high self-identified motivation, that permits students to score 6 to 8 percent higher.




Thank you,
Amine Moulay Ramdane.














