According to Wikipedia, x86-64 supports 16-bit, 32-bit and 64-bit programs. Does it support one better than the others, in the sense that it runs that code faster? Someone (who was probably blowing smoke up my ass) was telling me that Opteron CPUs were the first to run 64bit code while still being native in 32bit.
x86-64 can execute instructions with a wide variety of operand widths in essentially the same unit time, so instructions with 64bit operands aren't really faster or slower than other widths as a general rule. (Some of the smaller-width instructions may actually be slowed down by partial-register write stalls.)
But 64 bit instructions are more efficient in the sense that more data bits get processed per unit time. (For integer operands, I wonder how effective this is in practice: most just-plain-integer numbers are pretty small and processing the top 56 bits of all-zeros or all-ones doesn't really add value, instead it just adds heat).
x86-64 runs instructions with 64bit operands, which x86-32 simply cannot do, so there's a real qualitative difference. This allows larger individual values to be processed and, probably the major benefit of 64bit systems, much larger data sets without complex address-space management. Technically an x86-64 CPU can run much larger programs, but in practice nobody writes single programs big enough for this to matter.
I doubt the Opterons were first to do this. (Hardly anybody who claims to be first actually was; consider Christopher Columbus.) The mainframe guys (e.g., IBM) made the transition from 32 to 64 bits quite a while back, allowing both 32 and 64bit instruction forms to run "natively" on the same CPU with just a change to one bit in the PSW. The Opterons were likely the first to do this in the Intel instruction-set space.
There aren't any CPUs where it's not worth using 64bit mode, if the CPU supports it at all. Atom/Silvermont might be on the edge, since they can slow down when too many prefix bytes are needed on an instruction, and REX counts. (So do the required prefix-bytes that are really part of the opcodes for SSE instructions.) As I understand it, 64bit is still a net win for them, but possibly not as big a win.
Low-memory systems can sometimes do better with a 32bit OS than a 64bit OS. Part of the reason is that 64bit OSes still need copies of the 32bit libraries so they can run both 32 and 64bit programs. Windows especially will tend to have both 32 and 64bit processes running at all times, so both versions of many libraries will actually be in memory at once, not just on disk. I haven't measured whether Linux or Windows is worse about using more memory at a bare desktop when going from 32 to 64bit, but at least a Linux desktop won't have any 32bit processes that can't share the same 32bit libraries everything else is using. This paragraph is way off topic for SO, sorry.
Even an ideal ABI in x86-32 code that assumed AVX2 support would be hampered by register scarcity (7 general-purpose regs not including the stack pointer, and only 8 vector regs). 64bit mode has 15 GP and 16 vector regs, and a new RIP-relative addressing mode mostly removes the overhead of making position-independent (library) code. The extra regs and better ABI are usually quoted as being worth about 15% performance.

These factors apply specifically to x86-32 vs. x86-64, not to 32 vs. 64bit in general (like for PowerPC or SPARC: on those systems it's common for simple programs like ls to be 32bit). Only programs that might need more than 4GiB of address space benefit from being 64bit and being burdened with pointers that are twice as large. 64bit ARM has some design improvements over 32bit ARM, but AFAIK nothing like the leap that x86 got from AMD64.
To put it another way: what makes x86-64 good is mostly not the widening of each register to 64b; it's the other architectural improvements and the chance to make a partial break with many years of backwards compatibility (esp. in software standards). The insn set improvements could have been better, but AMD probably wanted to keep decoding as similar as possible to share transistors. They could have deprecated more of the useless instructions and added new ones. A setcc r/m32 would be really nice, and could have used two of the removed BCD opcodes. A cmovcc r, imm32 would be neat, too. Two opcodes each would do it, combined with a 3bit field in the mod/rm byte, to give the 4 bits needed to encode all 16 cc conditions. Redefining the shift instructions to always write flags, instead of conditionally leaving flags unmodified depending on shift count, would have made them cheaper, but again would have required more transistors because 32bit mode still has to be fast. So it's nowhere near a clean break with the x86 ISA's cruft, but that's not a major obstacle to high performance in modern chips.
Linux's x32 ABI is an attempt to provide the speedups of a modern ABI and 64bit-mode without the burden of 64bit pointers. It's a big win in code with pointer-heavy data structures. (Note that even though RAM is cheap, cache is not, so smaller data structures matter.)
64bit mode (including x32) allows much more efficient copying and computation with 64bit integers. Anything that works with file sizes needs 64bit math. A lot of stuff uses 64bit numbers these days, because they're the new "big enough and everyone supports them efficiently" size. Even back when 32bit systems were the norm, file sizes had to be 64bit, and now 64bit time values are replacing 32bit seconds-since-the-epoch, and stuff like that. (We have to finish doing that before 2038 to avoid 32bit wraparound.)
16bit mode isn't useful for anything in practice, but as I understand it modern CPUs still blaze along at full speed in 16bit mode. You're more likely to run into partial-register stalls in 16bit code, since it often uses byte registers. 16bit code for 386 also uses 32bit registers sometimes, producing more stalls (and probably length-changing prefixes for immediates bigger than 8b).
16-bit real mode running natively on a CPU can't use paging, so you never have TLB misses. (Running 16-bit code in virtual-8086 mode or 16-bit protected mode under a normal 32-bit OS will have paging enabled, though. Or even in real mode inside a VM.)
You can leave paging disabled in 32-bit protected mode, too, so this isn't really an advantage of 16-bit code. But 64-bit long mode requires paging to be enabled. You can map all of memory with a few 1GB hugepages so you'll have very few TLB misses, though.
32bit operand size is the fastest in 64bit code. There's a code-size advantage to using 32bit variables (except when an extra insn is needed to sign-extend array indices to 64bit so they can be used in addressing modes with pointers). 64bit is also cheap, but 16b and 8b can get ugly and be much worse than just the code-size difference.
The same opcode is used for 16, 32, and 64bit operand sizes, with either an operand-size 0x66 prefix, no prefix, or REX prefix with its W field set (aka REX.W). 8bit insns have separate opcodes, so they have the same code-size advantage.
The results of 1-operand mul go in [E/R]DX:[E/R]AX, so maybe the outputs of the multipliers are wired up in a way that requires an extra uop to split the halves of a 64bit output into two regs. Even the 2 and 3 operand forms of imul r16, r/m16, imm8 are an extra uop when 16bit.
Some early 64bit-capable CPUs had some limitations in 64bit mode. e.g. Core2 (Intel's 64bit P6-family design) can only macro-fuse compare-and-branch in 32bit mode. That applies regardless of operand-size though, and depends on the mode.
64bit mode was really "bolted on" in P4, where shl r32, imm is 1c latency but shl r64, imm is 7c latency: even some simple execution units weren't fully 64 bits wide. IIRC, that wasn't a problem for the K8 Opteron. 64bit CPUs run 32bit code natively as well, even when the OS is 64bit (unlike IA-64, which had either slow ia32 hardware or pure emulation). Probably what you heard was a garbled third-hand version of that. Although, as Paul Clayton points out, the slow x86 hardware on early Itanics sort of counts as "native".
8 and 16bit operand sizes tend to create partial-register stalls on Intel CPUs (pre-IvB). Writing an 8b or 16b register doesn't clear the upper bits, so there's a dependency on the previous contents of the full register. Some CPUs just make such insns wait for the full reg to be ready. Intel P6 was designed back when 16bit code was still relevant (PPro was released in Nov 1995, so design obviously started before that. Even Win95 still had significant amounts of 16bit code, I think.) This may be why Intel P6 (and later SnB-family) does register renaming on the 8 and 16b partial registers. A read of a wider reg after a write of a partial reg causes a stall (or just insertion of a merging uop on SnB-family). Or on Haswell and later, no penalty at all: all the benefit of no false dependencies, with no penalty even for writing a reg like ah and then reading eax. (IvB had no penalty for cases other than the high8 registers.)
This isn't a problem when mixing 32 and 64bit, because any write to a 32b register zeros the upper 32 bits of the full 64b register. This nicely avoids the false-dependency issue. When you do need to merge 64b regs, you can just AND/OR, or use shld.