Hello,
More of my philosophy about humanity and about technology and more of my thoughts..
I am a white Arab, and I think I am smart since I have also
invented many scalable algorithms and other algorithms..
The global average ecological footprint is 2.84 gha per person while the average biocapacity is 1.68 gha per person, so it takes about 2.84 / 1.68 ≈ 1.69 Earths to cover the consumption of humanity; and this global average ecological footprint brings problems and is growing more and more, so economic growth has to be decoupled from unsustainable resource consumption and harmful pollution, and that is what our humanity has to do by also using science and technology. So I think that the problem of the global ecological footprint will be solved by technology and science; read the following to notice it:
The Limits of the Earth, Part 1: Problems
"In my own new book, The Infinite Resource: The Power of Ideas on a Finite Planet, I challenge this view. The problem isn’t economic growth, per se. Nor is the problem that our natural resources are too small. While finite, the natural resources the planet supplies are vast and far larger than humanity needs in order to continue to thrive and grow prosperity for centuries to come. The problem, rather, is the types of resources we access, and the manner and efficiency with which we use them.
And the ultimate solution to those problems is innovation – innovation in the science and technology that we use to tap into physical resources, and innovation in the economic system that steers our consumption.
The situation we’re in isn’t a looming wall that we’re doomed to crash into. It’s a race – a race between depletion and pollution of natural resources on one side, and our pace of innovation on the other."
Read more here:
https://blogs.scientificamerican.com/guest-blog/the-limits-of-the-earth-part-1-problems/
And I have just read the following article from the United Nations:
Growing at a slower pace, world population is expected to reach 9.7
billion in 2050 and could peak at nearly 11 billion around 2100
Read more here:
https://www.un.org/development/desa/en/news/population/world-population-prospects-2019.html
So notice that it says the following:
"Falling proportion of working-age population is putting pressure on
social protection systems
The potential support ratio, which compares numbers of persons at
working ages to those over age 65, is falling around the world. In Japan
this ratio is 1.8, the lowest in the world. An additional 29 countries,
mostly in Europe and the Caribbean, already have potential support
ratios below three. By 2050, 48 countries, mostly in Europe, Northern
America, and Eastern and South-Eastern Asia, are expected to have
potential support ratios below two. These low values underscore the
potential impact of population ageing on the labour market and economic
performance, as well as the fiscal pressures that many countries will
face in the coming decades as they seek to build and maintain public
systems of health care, pensions and social protection for older persons."
So this is why you have to read the following to understand more:
And I have just looked at this video of the French politician
Jean-Marie Le Pen, and he is saying in the video that, with those
flows of immigrants into Europe, "La 3ème Guerre mondiale est commencée"
(that is, "the Third World War has begun"); look at the following video
to notice it:
https://www.youtube.com/watch?v=Ene0hp7EAus
But I think that Jean-Marie Le Pen is "not" thinking correctly, because
if Western Europe wants to keep its social benefits, the countries of
the E.U. are going to need more workers. No place in the world has an
older and less baby-making population than Europe; read more here
on Forbes to notice it:
Here's Why Europe Really Needs More Immigrants
https://www.forbes.com/sites/kenrapoza/2017/08/15/heres-why-europe-really-needs-more-immigrants/#7319e2e24917
I have just read the following interesting article,
and I invite you to read it carefully:
Does Our Survival Depend on Relentless Exponential Growth?
https://singularityhub.com/2017/10/11/do-we-need-relentless-exponential-growth-to-survive/
As you can also notice, the article above says the following:
"There have concurrently been developments in agriculture and medicine
and, in the 20th century, the Green Revolution, in which Norman Borlaug
ensured that countries adopted high-yield varieties of crops—the first
precursors to modern ideas of genetically engineering food to produce
better crops and more growth. The world was able to produce an
astonishing amount of food—enough, in the modern era, for ten billion
people."
So I think that the world will be able to produce enough food for the
world population in the year 2100, since around 2100 the world population
will peak at nearly 11 billion; read the following article to notice it:
Growing at a slower pace, world population is expected to reach 9.7
billion in 2050 and could peak at nearly 11 billion around 2100
Read more here:
https://www.un.org/development/desa/en/news/population/world-population-prospects-2019.html
Also, you can read my new writing about interesting new medical treatments and drugs and about antibiotic resistance here:
https://groups.google.com/g/alt.culture.morocco/c/vChmXT_pXUI
More of my philosophy about the 12 memory channels of
the new AMD Epyc Genoa CPU and more of my thoughts..
So as I am saying below, I think that in order to use in parallel the 12 memory channels that the new AMD Genoa CPU supports, the GMI-Wide mode must connect each CCD to the I/O die with more GMI links, and I think that this is what AMD is doing in its new 4-CCD configuration, even with the cost-optimized Epyc Genoa 9124 with 16 cores, 64 MB of L3 cache and 4 Core Complex Dies (CCDs), that costs around $1000 (look at it here:
https://www.tomshardware.com/reviews/amd-4th-gen-epyc-genoa-9654-9554-and-9374f-review-96-cores-zen-4-and-5nm-disrupt-the-data-center ). As I am explaining more below, the Core Complex Dies (CCDs) connect to memory, I/O, and each other through the I/O Die (IOD); each CCD connects to the IOD via a dedicated high-speed Global Memory Interconnect (GMI) link; the IOD also contains the memory channels, PCIe Gen5 lanes, and Infinity Fabric links; and all dies, or chiplets, interconnect with each other via AMD's Infinity Fabric technology. Of course this will permit my new software project of a Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well to scale on the 12 memory channels; read my following thoughts to understand more about it:
More of my philosophy about the new Zen 4 AMD Ryzen™ 9 7950X and more of my thoughts..
So I have just looked at the new Zen 4 AMD Ryzen™ 9 7950X CPU, and I invite you to look at it here:
https://www.amd.com/en/products/cpu/amd-ryzen-9-7950x
But notice carefully that the problem is with the number of supported memory channels, since it supports just two memory channels, so it is not good: for example, my following open source software project of a Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well scales around 8X on my 16-core Intel Xeon with 2 NUMA nodes and 8 memory channels, but it will not scale correctly on the
new Zen 4 AMD Ryzen™ 9 7950X CPU with just 2 memory channels, since it is also memory-bound. Here is my powerful open source software project of a Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well, and I invite you to take a careful look at it:
https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library
So I advise you to buy an AMD Epyc CPU or an Intel Xeon CPU that supports 8 memory channels.
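And to check on your own machine whether a memory-bound code like my solver will scale, here is a minimal, hedged sketch of a STREAM-triad-like micro-benchmark in C++; it is only an illustrative probe of my own, not part of my library, and the array size is just an example. It measures the aggregate bandwidth reached by 1, 2, 4, ... threads, and the thread count at which the bandwidth stops growing is roughly where a memory-bound solver stops scaling.

---

// Illustrative STREAM-triad-like bandwidth probe (a sketch, not the author's
// library): each thread streams over its own slice of three large arrays;
// when adding threads no longer raises the aggregate GB/s, a memory-bound
// solver will stop scaling at about that thread count.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = std::size_t(1) << 25;   // 32M doubles per array (~256 MB each)
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    const unsigned max_threads = std::max(1u, std::thread::hardware_concurrency());

    for (unsigned t = 1; t <= max_threads; t *= 2) {
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> workers;
        for (unsigned id = 0; id < t; ++id) {
            workers.emplace_back([&, id] {
                const std::size_t chunk = n / t;
                const std::size_t lo = id * chunk;
                const std::size_t hi = (id == t - 1) ? n : lo + chunk;
                for (std::size_t i = lo; i < hi; ++i)
                    c[i] = a[i] + 3.0 * b[i];      // triad: streams three arrays
            });
        }
        for (auto& w : workers) w.join();
        const double secs = std::chrono::duration<double>(
                                std::chrono::steady_clock::now() - start).count();
        const double gbytes = 3.0 * double(n) * sizeof(double) / 1e9; // read a,b; write c
        std::printf("%2u threads: ~%.1f GB/s\n", t, gbytes / secs);
    }
    return 0;
}

---

On a 2-memory-channel desktop CPU this kind of probe typically stops improving after a few threads, while on an 8- or 12-channel Epyc or Xeon it keeps improving much further, which is the point I am making above.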
---
And of course you can use the next twelve DDR5 memory channels of the Zen 4 AMD EPYC CPUs to scale my above algorithm even more; read about it here:
https://www.tomshardware.com/news/amd-confirms-12-ddr5-memory-channels-on-genoa
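And here is a minimal back-of-the-envelope sketch, in C++, of the kind of estimate I mean by scaling on the memory channels: it models the attainable speedup of a memory-bound kernel as being capped both by the core count and by how many cores the memory channels can feed. The per-channel and per-core bandwidth numbers are hypothetical placeholders, not measured values for Genoa:

---

// Hedged back-of-the-envelope model: the scaling of a memory-bound kernel is
// limited both by the number of cores and by how many cores the memory
// channels can feed.  The bandwidth figures below are hypothetical placeholders.
#include <algorithm>
#include <cstdio>

int main() {
    const double per_channel_gbs = 38.0;  // assumed bandwidth of one DDR5 channel (GB/s)
    const double per_core_gbs    = 20.0;  // assumed bandwidth one core can consume (GB/s)
    const int    cores           = 16;    // e.g. a 16-core CPU

    for (int channels : {2, 8, 12}) {
        const double total_bw    = channels * per_channel_gbs;
        const double fed_cores   = total_bw / per_core_gbs;              // cores the memory system can feed
        const double est_scaling = std::min<double>(cores, fed_cores);   // memory-bound scaling estimate
        std::printf("%2d channels: ~%.0f GB/s total, estimated scaling ~%.1fX on %d cores\n",
                    channels, total_bw, est_scaling, cores);
    }
    return 0;
}

---

With these assumed numbers, 2 channels can only feed a few cores while 8 or 12 channels can feed most of the 16 cores, which is why more memory channels let my memory-bound solver scale further.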
And here is the simulation program that uses the probabilistic mechanism that I have talked about and that proves to you that my algorithm of my Parallel C++ Conjugate Gradient Linear System Solver Library is scalable:
If you look at my scalable parallel algorithm, it divides each array of the matrix into parts of 250 elements, and if you look carefully I am using two functions that consume the greater part of all the CPU time, atsub() and asub(), and inside those functions I am using a probabilistic mechanism to make my algorithm scalable on NUMA architecture, and it also makes it scale on the memory channels: what I am doing is scrambling the array parts using a probabilistic function, and I have noticed that this probabilistic mechanism is very efficient.

To prove to you what I am saying, please look at the following simulation that I have done using a variable that contains the number of NUMA nodes, and notice that it gives almost a perfect scalability on NUMA architecture. For example, let us give the "NUMA_nodes" variable a value of 4 and our array a size of 250: the simulation below will give a number of contention points of about a quarter of the array, since the probability that two randomly scrambled indices map to the same NUMA node is 1/4. So if I am using 16 cores, in the worst case it will scale to about 4X throughput on NUMA architecture, because with an array of 250 and about a quarter of the array being contention points, Amdahl's law limits the scalability to almost 4X throughput on four NUMA nodes (a contended fraction of about 25% caps the speedup at about 1/0.25 = 4X), and this will give almost a perfect scalability on more and more NUMA nodes, so my parallel algorithm is scalable on NUMA architecture and it also scales well on the memory channels.
Here is the simulation that I have done; please run it and you will notice for yourself that my parallel algorithm is scalable on NUMA architecture.
Here it is:
---
program test;

uses math;

var tab,tab1,tab2:array of integer;
    a,n1,k,i,n2,tmp,j,numa_nodes:integer;

begin
  a:=250;
  numa_nodes:=4;
  j:=0;

  { tab2[i] holds the NUMA node assigned to array element i }
  setlength(tab2,a);
  for i:=0 to a-1 do
    tab2[i]:=i mod numa_nodes;

  { first scrambled permutation of the indices }
  setlength(tab,a);
  randomize;
  for k:=0 to a-1 do
    tab[k]:=k;
  n2:=a-1;
  for k:=0 to a-1 do
  begin
    n1:=random(n2);
    tmp:=tab[k];
    tab[k]:=tab[n1];
    tab[n1]:=tmp;
  end;

  { second scrambled permutation of the indices }
  setlength(tab1,a);
  randomize;
  for k:=0 to a-1 do
    tab1[k]:=k;
  n2:=a-1;
  for k:=0 to a-1 do
  begin
    n1:=random(n2);
    tmp:=tab1[k];
    tab1[k]:=tab1[n1];
    tab1[n1]:=tmp;
  end;

  { count the contention points: positions where the two scrambled
    accesses fall on the same NUMA node }
  for i:=0 to a-1 do
    if tab2[tab[i]]=tab2[tab1[i]] then
    begin
      inc(j);
      writeln('A contention at: ',i);
    end;

  writeln('Number of contention points: ',j);

  setlength(tab,0);
  setlength(tab1,0);
  setlength(tab2,0);
end.
---
More of my philosophy about 4 CCDs configuration of AMD Epyc Genoa CPU and more of my thoughts..
I have just read the following new article about the AMD 4th Gen EPYC 9004 series, so I invite you to read it carefully:
https://hothardware.com/reviews/amd-genoa-data-center-cpu-launch
So read carefully about the 4-CCD configuration; here is what I
understand from it:
The I/O die (IOD) is what is connected to the memory channels externally, and the article says that SKUs north of 4 CCDs (e.g. 32 cores) use the GMI3-Narrow configuration with a single GMI link per CCD. With 4-CCD and lower SKUs, AMD can implement GMI-Wide mode, which joins each CCD to the IOD with two GMI links. In this case, one link of each CCD populates GMI0 to GMI3 while the other link of each CCD populates GMI8 to GMI11, as diagrammed in the article. This helps these parts better balance against I/O demands.
So I think that AMD has implemented in its new 4-CCD configuration the GMI-Wide mode, which joins each CCD to the IOD with two GMI links, so that the CCDs can be connected to the 12 memory channels externally and use them in parallel, so I think that the problem is solved, since I think that the cost-optimized Epyc Genoa 9124 with 16 cores, 64 MB of L3 cache and 4 Core Complex Dies (CCDs), that costs around $1000 (look at it here:
https://www.tomshardware.com/reviews/amd-4th-gen-epyc-genoa-9654-9554-and-9374f-review-96-cores-zen-4-and-5nm-disrupt-the-data-center )
can fully use the memory channels in parallel, so it is a good Epyc Genoa processor to buy.
And of course I invite you to read the following:
More of my philosophy about the new Epyc Genoa and about the Core Complex Die (CCD) and Core Complex (CCX) and more of my thoughts..
I have just looked at the following paper from AMD, and I invite
you to look at it:
https://developer.amd.com/wp-content/resources/56827-1-0.pdf
And as you notice above, you have to look at how many
Core Complex Dies (CCDs) you have, since it tells you more
about how many Infinity Fabric connections you have, and that is
important information; so look at the following article
about the new AMD Epyc Genoa:
https://wccftech.com/amd-epyc-genoa-cpu-lineup-specs-benchmarks-leak-up-to-2-6x-faster-than-intel-xeon/
More of my thoughts about technology and about Apple Silicon M1 Emulating x86 and more of my thoughts..
I have just looked at the following articles about Rosetta 2 and the benchmarks of Apple Silicon M1 Emulating x86:
https://www.computerworld.com/article/3597949/everything-you-need-to-know-about-rosetta-2-on-apple-silicon-macs.html
and read also here:
https://www.macrumors.com/2020/11/15/m1-chip-emulating-x86-benchmark/
But I think that the problem with the Apple Silicon M1 and the next Apple Silicon M2 is that Rosetta 2 only lets you run x86-64 macOS apps. That would be apps that were built for macOS (not Windows) and aren't 32-bit. The macOS restriction eliminates huge numbers of Windows apps, and the 64-bit restriction eliminates even more.
Also read the following:
Apple says new M2 chip won’t beat Intel’s finest
Read more here:
https://www.pcworld.com/article/782139/apple-m2-chip-wont-beat-intels-finest.html
And here is what I am saying in my following thoughts about technology and about ARM vs. x86:
More of my philosophy about the Apple Silicon and about Arm Vs. X86 and more of my thoughts..
I invite you to read carefully the following interesting article
to understand more:
Overhyped Apple Silicon: Arm Vs. X86 Is Irrelevant
https://seekingalpha.com/article/4447703-overhyped-apple-silicon-arm-vs-x86-is-irrelevant
More of my philosophy about code compression of RISC-V and ARM and more of my thoughts..
I think I am highly smart, and I have just read the following paper,
which says that RISC-V Compressed programs are 25% smaller than RISC-V programs, fetch 25% fewer instruction bits than RISC-V programs, and incur fewer instruction cache misses. Its code size is competitive with other compressed RISCs. RVC is expected to improve the performance and energy per operation of RISC-V.
Read more here to notice it:
https://people.eecs.berkeley.edu/~krste/papers/waterman-ms.pdf
So I think RVC has about the same compression as ARM Thumb-2, so I think
that I was correct in my previous thoughts (read them below),
and I think we now have to look at whether x86 or x64 is still more cache friendly even with Thumb-2 compression or RVC.
More of my philosophy of who will be the winner, x86 or x64 or ARM and more of my thoughts..
I think I am highly smart, and I think that since x86 or x64 has complex instructions and ARM has simple instructions, x86 or x64 is more cache friendly, but ARM has wanted to solve the problem by compressing the code using Thumb-2, which I think compresses the size of the code by around 25%, so I think
we have to look at whether x86 or x64 is still more cache friendly even with Thumb-2 compression, and I think that x86 or x64 will still optimize the power or energy efficiency further, and since x86 or x64 has other big advantages, like the advantage that I am talking about below, I think that x86 and x64 will still be successful big players in the future, so I think it will be the "tendency". So I think that x86 and x64 will be good for a long time to make money in business, and they will be good business for the USA that makes the AMD and Intel CPUs.
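And to make the code-density part of the tradeoff concrete, here is a tiny hedged sketch in C++ of the arithmetic, assuming the roughly 25% code-size reduction reported for RVC and Thumb-2; the hot-code sizes and the 32 KB L1 instruction cache are just illustrative assumptions, not measurements:

---

// Back-of-the-envelope arithmetic for instruction-cache footprint under an
// assumed 25% code-size reduction (as reported for RVC / Thumb-2).
// The hot-code sizes and the 32 KB L1 I-cache are illustrative assumptions.
#include <algorithm>
#include <cstdio>

int main() {
    const double icache_kb = 32.0;   // assumed L1 instruction cache size
    const double reduction = 0.25;   // assumed compressed-ISA code-size reduction

    for (double hot_kb : {24.0, 48.0, 96.0}) {
        const double compressed_kb = hot_kb * (1.0 - reduction);
        const double fit_plain = std::min(1.0, icache_kb / hot_kb);
        const double fit_compr = std::min(1.0, icache_kb / compressed_kb);
        std::printf("hot code %5.1f KB: %3.0f%% fits uncompressed, %3.0f%% fits compressed\n",
                    hot_kb, 100.0 * fit_plain, 100.0 * fit_compr);
    }
    return 0;
}

---

Such an estimate only covers code density; whether x86 or x64 stays more cache friendly overall also depends on the instruction mix and decoder behavior, which this sketch does not model.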
More of my philosophy about x86 or x64 and ARM architectures and more of my thoughts..
I think I am highly smart, and I think that the x86 and x64 architectures
have another big advantage over the ARM architecture, and it is the following:
"The Bright Parts of x86
Backward Compatibility
Compatibility is a two-edged sword. One reason that ARM does better in low-power contexts is that its simpler decoder doesn't have to be compatible with large accumulations of legacy cruft. The downside is that ARM operating systems need to be modified for every new chip version.
In contrast, the latest 64-bit chips from AMD and Intel are still able to boot PC DOS, the 16-bit operating system that came with the original IBM PC. Other hardware in the system might not be supported, but the CPUs have retained backward compatibility with every version since 1978.
Many of the bad things about x86 are due to this backward compatibility, but it's worth remembering the benefit that we've had as a result: New PCs have always been able to run old software."
Read more here at the following web link to notice it:
https://www.informit.com/articles/article.aspx?p=1676714&seqNum=6
So I think that you cannot compare x86 or x64 to ARM with just a power
efficiency comparison, like some are doing by comparing
the Apple M1 Pro ARM CPU to x86 or x64 CPUs; that is why I think that the x86 and x64 architectures will be here for a long time, so I think that they will be good for a long time to make money in business, and they are a good business for the USA that makes the AMD and Intel CPUs.
More of my philosophy about weak memory model and ARM and more of my thoughts..
I think the ARM hardware memory model is not good, since it is a
weak memory model, so ARM has to provide us with a TSO memory
model that is compatible with the x86 TSO memory model; read what Kent Dickey is saying about it in my following writing:
ProValid, LLC was formed in 2003 to provide hardware design and verification consulting services.
Kent Dickey, founder and President, has had 20 years experience in hardware design and verification. Kent worked at Hewlett-Packard and Intel Corporation, leading teams in ASIC chip design and pre-silicon and post-silicon hardware verification. He architected bus interface chips for high-end servers at both companies. Kent has received more than 10 patents for innovative work in both design and verification.
Read more here about him:
https://www.provalid.com/about/about.html
And read the following thoughts of Kent Dickey about weak memory models such as ARM's:
"First, the academic literature on ordering models is terrible. My eyes
glaze over and it's just so boring.
I'm going to guess "niev" means naive. I find that surprising since x86
is basically TSO. TSO is a good idea. I think weakly ordered CPUs are a
bad idea.
TSO is just a handy name for the Sparc and x86 effective ordering for
writeback cacheable memory: loads are ordered, and stores are buffered and will complete in order but drain separately from the main CPU pipeline. TSO can allow loads to hit stores in the buffer and see the new value, this doesn't really matter for general ordering purposes.
TSO lets you write basic producer/consumer code with no barriers. In fact, about the only type of code that doesn't just work with no barriers on TSO is Lamport's Bakery Algorithm since it relies on "if I write a location and read it back and it's still there, other CPUs must see that value as well", which isn't true for TSO.
Lock free programming "just works" with TSO or stronger ordering guarantees, and it's extremely difficult to automate putting in barriers for complex algorithms for weakly ordered systems. So code for weakly ordered systems tend to either toss in lots of barriers, or use explicit locks (with barriers). And extremely weakly ordered systems are very hard to reason about, and especially hard to program since many implementations are not as weakly ordered as the specification says they could be, so just running your code and having it work is insufficient. Alpha was terrible in this regard, and I'm glad it's silliness died with it.
HP PA-RISC was documented as weakly ordered, but all implementations
guaranteed full system sequential consistency (and it was tested in and
enforced, but not including things like cache flushing, which did need
barriers). No one wanted to risk breaking software from the original in-order fully sequential machines that might have relied on it. It wasn't really a performance issue, especially once OoO was added.
Weakly ordered CPUs are a bad idea in much the same way in-order VLIW is a bad idea. Certain niche applications might work out fine, but not for a general purpose CPU. It's better to throw some hardware at making TSO perform well, and keep the software simple and easy to get right.
Kent"
Read the rest on the following web link:
https://groups.google.com/g/comp.arch/c/fSIpGiBhUj0
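And to illustrate what Kent Dickey means by producer/consumer code and ordering, here is a minimal C++ sketch of my own (not his code): with a release store and an acquire load the pattern is portable; on x86, which is basically TSO, these map to ordinary stores and loads, while on a weakly ordered CPU such as ARM the compiler has to emit extra ordering (barriers or load-acquire/store-release instructions) so that the same guarantee holds.

---

// Minimal producer/consumer flag pattern (an illustrative sketch).
// The release store on 'ready' and the acquire load in the consumer guarantee
// that once 'ready' is seen as true, 'payload' is visible too.  On x86 (TSO)
// these map to plain stores and loads; on weakly ordered ARM the compiler must
// emit extra ordering instructions to give the same guarantee.
#include <atomic>
#include <cstdio>
#include <thread>

static int payload = 0;
static std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                  // write the data first
    ready.store(true, std::memory_order_release);  // then publish the flag
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) // wait until published
        std::this_thread::yield();
    std::printf("payload = %d\n", payload);        // guaranteed to print 42
}

int main() {
    std::thread c(consumer), p(producer);
    p.join();
    c.join();
    return 0;
}

---

This is also why lock-free code that "just works" on TSO needs explicit memory-order annotations or barriers to stay correct on weakly ordered systems.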
And you can read much more of my thoughts about technology in the following web links:
https://groups.google.com/g/alt.culture.morocco/c/MosH5fY4g_Y
And here:
https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4
Thank you,
Amine Moulay Ramdane.