In the diagram of the Intel Core microarchitecture, there are
several execution clusters behind the scheduler. It says that
several execution clusters can execute Integer ALU
or Integer SIMD ALU operations. Does this mean that each execution
unit can execute only some of the ALU instructions, or that each
execution unit can execute all of them?
Do the execution units share the same register file,
or does each have its own register file?
Does the Integer SIMD ALU cover both MMX integer SIMD
and SSE integer SIMD, or only SSE integer SIMD?
Thanks
Jogging
> I have no previous experience in x86 assembly coding,
> but may need to optimize code on a PC. So I have begun to read
> the manuals downloaded from Intel's website.
I don't have answers to the questions you posed available off the
top of my head, but I would like to point out that you might find
the materials available at:
http://www.agner.org/optimize/
quite helpful.
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
"jogging" <joggi...@MUNGED.microcosmotalk.com> wrote in message
news:4b3a05d5$0$5103$9a6e...@unlimited.newshosting.com...
>
for assembly coding and optimization, in general, you don't need to know any
of this.
even for something like writing an emulator, you don't really need to know
this.
this would be mostly stuff useful to know for HW engineers and similar...
most useful for learning assembly are the software-development manuals:
Intel® 64 and IA-32 Architectures
Software Developer's Manual
which come in several volumes, the most useful (for optimization) being the
instruction set reference and the optimization guide.
it is also recommended to know the contents of the others as well, 'Basic
Architecture', ...
now, it is worth noting:
trying to micro-optimize in ASM if you don't know assembly all that well is
generally a bad idea.
it is better to turn on compiler optimizations in a C compiler (or whatever
you are using) and live with the result (or optimize at the C level).
this is because, in general, many aspects of x86 performance are
counter-intuitive, and compilers generally do a good enough job that one
really needs to know what one is doing before having much hope of
doing much better (or of doing so without introducing bugs, ...).
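
as a rough illustration of "optimizing at the C level", here is a minimal
sketch of a loop written in plain C, with no ASM and no intrinsics. the
function name dot_s16 and the choice of 16-bit data are just assumptions made
for the example, not anything from the Intel manuals. with optimization
enabled (for example gcc -O2 or -O3), the compiler is free to pick the
instructions for it, possibly integer SIMD ones, though whether it actually
does so depends on the compiler, its version, and the flags used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: dot product of two arrays of 16-bit integers.
 * Written as a simple counted loop with no aliasing tricks, so the
 * compiler's optimizer has a straightforward pattern to work with. */
int32_t dot_s16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;

    for (size_t i = 0; i < n; i++) {
        /* Widen to 32 bits before multiplying so the product is
         * computed explicitly in 32-bit arithmetic. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

the code stays portable across CPUs and OSes, and the same source can be
rebuilt with different optimization settings and timed before deciding
whether anything lower-level is worth the effort.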
similarly, for simple performance issues, dropping to ASM may well introduce
otherwise unnecessary CPU and/or OS dependencies...
just beating something together is not really likely to turn out so well...
> Thanks
> Jogging