Take a class on computer architecture! Narrowing the question to instruction set architecture misses the point.
Turn the question around and ask: how do I write software that runs optimally (NOT 'in the same time') on an 'x86-like' architecture and on an 'ARM-like' architecture? The answer: use native threads if necessary. Any real answer would also require thinking about the GPU.
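As a rough sketch of what "use native threads if necessary" can look like, the snippet below sizes a thread pool from what the machine reports rather than from the name of the instruction set. The `checksum` and `parallel_checksums` functions and the sample data are made up for illustration. One caveat: on standard CPython builds the GIL keeps pure-Python work like this from running in parallel, so treat it as showing the structure, not a speedup.

```python
import os
import platform
from concurrent.futures import ThreadPoolExecutor


def checksum(chunk):
    # Hypothetical stand-in for real per-chunk work.
    return sum(chunk) % 65521


def parallel_checksums(chunks, workers=None):
    # Ask the machine how many CPUs it has; do not hard-code per architecture.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(checksum, chunks))


# Made-up sample data: eight identical chunks of bytes.
chunks = [bytes(range(256)) * 100 for _ in range(8)]
results = parallel_checksums(chunks)
print(platform.machine(), os.cpu_count(), len(results))
```

The same source runs unchanged on x86 and ARM; what differs is what the hardware reports and how fast each worker goes.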
With respect to Python: if the question is 'how do I write software that runs as fast as C code?', generally you can't. Abstraction has a price. Python has a C interface, so you can prioritize performance over abstraction where it matters, as any number of Python packages do.
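A small way to see the price of abstraction for yourself: time an explicit Python loop against the built-in `sum`, which in CPython is implemented in C. The `python_sum` helper and the data size are arbitrary choices for this sketch, and the timings will vary by machine and interpreter, so measure rather than assume.

```python
import timeit


def python_sum(values):
    """Sum with an explicit loop; every iteration runs in the interpreter."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))

# Both approaches must agree before their timings are worth comparing.
assert python_sum(data) == sum(data)

loop_seconds = timeit.timeit(lambda: python_sum(data), number=20)
builtin_seconds = timeit.timeit(lambda: sum(data), number=20)
print(f"explicit loop: {loop_seconds:.4f} s")
print(f"built-in sum:  {builtin_seconds:.4f} s")
```

Same result, same language from the caller's point of view; the difference is which side of the C interface the loop runs on.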
If a program is slow, it is ALWAYS because of some choice the programmer made. Once a programmer understands this, they are free to review their choices (which does not mean jumping to conclusions).
Fundamental thing to remember:
Good, Fast, Cheap: pick any two. 🤣