Rosetta Emulation

Do Kieu

Aug 5, 2024, 10:11:31 AM

to quiproffitri

I keep having lots of issues with LR Classic. I am a heavy LR user, and the only real reason I went for the Ultra over the less expensive Mac Studio was that it was supposed to be faster. It is not so fast, and I am wondering why I have Rosetta emulation mode on (maybe Adobe support told me to use it a year ago?). It seems like they should be playing nicer together. Any help would be appreciated. LR actually quit today and the computer restarted as I was attempting to import images (the images are ingested with a card reader and I make a DNG copy).

However, there have been some specific code components that were not yet Apple Silicon native. Some examples have been the Creative Cloud installer, and also the code for tethered shooting for specific cameras (see the Adobe help article Tether support on Apple Silicon devices). In the past, Rosetta has been needed to get those working. Adobe has been converting more of those loose ends to be Apple Silicon native, so running through Rosetta has been much less necessary in recent versions. But that article says Rosetta is still needed for tethered shooting.

To use tethering within Lightroom (ver 13.2) I activated Rosetta on my Mac Studio (M1 version). My Nikon Z8 couldn't fire a shot, so I switched to the original Nikon software. Now I would like to disable Rosetta, but I was not able to by following the support article, which says to use Terminal. I could not find the mentioned Rosetta files in the Library and therefore could not disable them. I'm not familiar with Terminal and code, so I'm afraid of breaking my good working installation.

My (Dutch) Lightroom version says: Rosetta emulation mode on. Is there a simple switch to turn it off? I uploaded 388 photographs and it took more than 1.5 hours to import the whole folder. It wasn't that slow before Rosetta. Please give me some clues.

Use the last section of this article to turn off Rosetta for that ONE application, if possible. If that is not possible, you will need to obtain an updated version of the Lightroom software. Here is an example of what that looks like:

Check for an updated version of Lightroom that supports Apple-silicon Mac processors DIRECTLY, or is sold as "universal", which means it has both Intel binaries and Apple-silicon binaries inside, and the correct one is selected automatically at run-time.

If you're on an Apple Silicon Mac and you're talking about an Apple-Silicon-only application, Rosetta 2 would not come into play. It might be on the system, because of having been needed for another application, but the system would just run the Apple-Silicon-native application directly.
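If you want to confirm whether Rosetta 2 is actually in play for a given process, macOS exposes the sysctl key sysctl.proc_translated, which reports 1 for a translated process and 0 for a native one. A minimal Python sketch follows. Note that it reports on the Python interpreter that runs it, not on Lightroom, and the helper names are made up for illustration.

```python
import subprocess
from typing import Optional


def parse_proc_translated(output: str) -> Optional[bool]:
    """Interpret the text printed by `sysctl -n sysctl.proc_translated`."""
    text = output.strip()
    if text == "1":
        return True   # process is being translated by Rosetta 2
    if text == "0":
        return False  # process is running natively
    return None       # unexpected output


def running_under_rosetta() -> Optional[bool]:
    """Return True/False for the current process, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None  # no sysctl binary available (not macOS, for example)
    if result.returncode != 0:
        return None  # key does not exist on this system
    return parse_proc_translated(result.stdout)


if __name__ == "__main__":
    print(running_under_rosetta())
```

A result of None simply means the question could not be answered on that machine, for example on an Intel Mac or a non-macOS system.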

If you have an application with a Universal 2 binary that contains both Intel code and Apple Silicon code, you can go into the application's Get Info box and choose whether to run it natively, or to run the Intel version using Rosetta 2. No need to disable Rosetta 2 system-wide. Just set your preference for that application via the user interface that Apple has provided for that purpose.
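To see which architectures an application's executable actually contains, you can read the universal ("fat") Mach-O header at the start of the file. The sketch below is a rough Python illustration, assuming the classic 32-bit fat header (magic 0xCAFEBABE, big-endian); it ignores the 64-bit fat variant, and the function name is hypothetical.

```python
import struct

FAT_MAGIC = 0xCAFEBABE        # 32-bit universal (fat) header, stored big-endian
CPU_TYPE_X86_64 = 0x01000007  # Intel 64-bit
CPU_TYPE_ARM64 = 0x0100000C   # Apple Silicon
CPU_NAMES = {CPU_TYPE_X86_64: "x86_64", CPU_TYPE_ARM64: "arm64"}


def fat_architectures(data: bytes) -> list[str]:
    """Return the architecture names listed in a universal Mach-O header.

    Returns an empty list when the data does not start with a fat header,
    which is the case for a thin, single-architecture binary.
    """
    if len(data) < 8:
        return []
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        return []
    names = []
    for index in range(count):
        offset = 8 + index * 20  # each fat_arch entry is five 32-bit fields
        if offset + 20 > len(data):
            break  # truncated input, stop rather than guess
        cputype, _subtype, _file_offset, _size, _align = struct.unpack_from(
            ">5I", data, offset
        )
        names.append(CPU_NAMES.get(cputype, hex(cputype)))
    return names
```

To try it, read the first few kilobytes of the executable inside the app bundle's Contents/MacOS folder and pass them in. A result of ["x86_64", "arm64"] would indicate a universal binary, while an empty list would indicate a single-architecture file that needs a different check.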

I utilize Microsoft MSSQL (mcr.microsoft.com/mssql/server:2022-latest) on my M3 MacBook Pro. I have been running this Docker image with platform emulation set to linux/amd64 without issue for several months. After upgrading to Sonoma 14.5, I have been facing poor performance with Rosetta emulation on my Mac. Things that worked prior to the update no longer work. Cross post of: MSSQL Docker Github issue.

We're seeing our Docker PostGIS database get corrupted periodically under 14.5. There's no official ARM image for PostGIS yet, so we're using the amd64 version. I can't say we've noticed a performance change in 14.5, but we've started seeing the corruption across multiple developer computers and the only common thread is that it's happening on computers that have upgraded to 14.5, but not on ones that are still on 14.4.

I started to have issues with Docker Desktop recently too. MySQL or MSSQL Docker images run with Rosetta emulation have really poor performance now, and they restart from time to time under heavy load.

Rosetta 2 translates the entire text segment of the binary from x86 to ARM up-front. It also supports just-in-time (JIT) translation, but that is used relatively rarely, avoiding both the direct runtime cost of compilation, and any indirect instruction and data cache effects.

[Correction: an earlier version of this post said that every ahead-of-time translated instruction was a valid entry point. While I still believe it would be valid to jump to almost any ahead-of-time translated instruction, the lookup tables used do not allow for this. I believe this is an optimisation to keep the lookup size small. The prologue/epilogue optimisation was also discovered after the initial version of this post.]

Each x86 instruction is translated to one or more ARM instructions once within the ahead-of-time binary (with the exception of NOPs, which are ignored). When an indirect jump or call sets the instruction pointer to an arbitrary offset in the text segment, the runtime will look up the corresponding translated instruction, and branch there.

This uses an x86 to ARM lookup table that contains all function starts, and other basic blocks that are otherwise not referenced. If it misses this, for example while handling a switch-statement, it can fall back to the JIT.
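As a rough illustration of that lookup-with-fallback behaviour, here is a small Python sketch. It is not Rosetta 2's actual data layout: the class name, the sorted-list table, the dict cache, and the jit_translate callback are all assumptions made for the example.

```python
from bisect import bisect_left
from typing import Callable


class BranchTargetLookup:
    """Toy x86-to-ARM branch target lookup with a JIT fallback (illustrative only)."""

    def __init__(self, aot_entries: dict[int, int], jit_translate: Callable[[int], int]):
        # Sorted x86 addresses known to the ahead-of-time translation,
        # with the matching translated ARM addresses in a parallel list.
        self._x86_addrs = sorted(aot_entries)
        self._arm_addrs = [aot_entries[addr] for addr in self._x86_addrs]
        self._jit_translate = jit_translate
        self._cache: dict[int, int] = {}
        self.jit_calls = 0

    def resolve(self, x86_addr: int) -> int:
        if x86_addr in self._cache:
            return self._cache[x86_addr]
        index = bisect_left(self._x86_addrs, x86_addr)
        if index < len(self._x86_addrs) and self._x86_addrs[index] == x86_addr:
            target = self._arm_addrs[index]          # hit in the ahead-of-time table
        else:
            self.jit_calls += 1
            target = self._jit_translate(x86_addr)   # miss: fall back to the JIT
        self._cache[x86_addr] = target
        return target
```

In this toy version an address that is not a known entry point (for example a switch-statement target) goes through the fallback once, and later jumps to the same address are served from the cache.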

To allow for precise exception handling, sampling profiling, and attaching debuggers, Rosetta 2 maintains a mapping from translated ARM instructions to their original x86 address, and guarantees that the state will be canonical between each instruction.

(The two lookups (one from x86 to ARM and the other from ARM to x86) are found via the fragment list found in LC_AOT_METADATA. Branch target results are cached in a hash-map. Various structures can be used for these, but in one binary the performance-critical x86 to ARM mapping used a two-level binary search, and the much larger, less-performance-critical ARM to x86 mapping used a top-level binary search, followed by a linear scan through bit-packed data.)

Rosetta 2 takes advantage of this by rewriting x86 CALL and RET instructions to ARM BL and RET instructions (as well as the architectural loads/stores and stack-pointer adjustments). This also requires some extra book-keeping, saving the expected x86 return-address and the corresponding translated jump target on a special stack when calling, and validating them when returning, but it allows for correct return prediction.
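The book-keeping described here can be pictured with a small Python model. This is only a sketch of the idea: the class and method names are invented, and returning None on a mismatch stands in for whatever slower path the real runtime takes.

```python
from typing import Optional


class ReturnStack:
    """Toy model of the side stack used to validate translated returns."""

    def __init__(self) -> None:
        self._entries: list[tuple[int, int]] = []

    def on_call(self, x86_return_addr: int, translated_target: int) -> None:
        # Remember what the x86 code expects to return to, and where the
        # translated code for that return address lives.
        self._entries.append((x86_return_addr, translated_target))

    def on_return(self, actual_x86_return_addr: int) -> Optional[int]:
        """Return the translated target if the prediction holds, else None."""
        if not self._entries:
            return None
        expected, translated_target = self._entries.pop()
        if expected == actual_x86_return_addr:
            return translated_target  # fast path: the plain return was correct
        return None                   # mismatch: caller must look the target up
```

The point of the design is that the common case (a return to exactly where the call said it would) needs only a compare, while code that rewrites its own return address still behaves correctly.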

A lot of overhead comes from small differences in behaviour between x86 and ARM, like the semantics of flags. Rosetta 2 uses the ARM flag-manipulation extensions (FEAT_FlagM and FEAT_FlagM2) to handle these differences efficiently.

x86 shift instructions also require complicated flag handling, as they shift bits into the carry flag. The RMIF instruction (rotate-mask-insert-flags) is used within Rosetta to move an arbitrary bit from a register into an arbitrary flag, which makes emulating fixed-shifts (among other things) relatively efficient. Variable shifts remain relatively inefficient if flags escape, as the flags must not be modified when shifting by zero, requiring a conditional branch.
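To make the shift-by-zero problem concrete, here is a Python model of the x86 semantics for a 32-bit SHL, tracking only the carry flag. It models what the translated code has to reproduce, not the ARM instructions Rosetta emits, and the function name is made up.

```python
def shl32(value: int, count: int, carry_in: bool) -> tuple[int, bool]:
    """Model x86 32-bit SHL, returning (result, carry_flag)."""
    value &= 0xFFFFFFFF
    count &= 0x1F  # x86 masks the shift count to 5 bits for 32-bit operands
    if count == 0:
        # Shifting by zero leaves the flags untouched, which is the case
        # that forces a conditional branch for variable shifts.
        return value, carry_in
    carry_out = (value >> (32 - count)) & 1  # last bit shifted out of the top
    result = (value << count) & 0xFFFFFFFF
    return result, bool(carry_out)
```

With a fixed shift count the translator knows at translation time which bit ends up in the carry flag, so the branch disappears; with a variable count it cannot rule out the zero case.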

One non-standard ARM extension available on the Apple M1 that has been widely publicised is hardware support for TSO (total-store-ordering), which, when enabled, gives regular ARM load-and-store instructions the same ordering guarantees that loads and stores have on an x86 system.

Only a handful of different instructions account for 90% of all operations executed, and near the top of that list are addition and subtraction. On ARM these can optionally set the four-bit NZCV register, whereas on x86 these always set six flag bits: CF, ZF, SF and OF (which correspond well enough to NZCV), as well as PF (the parity flag) and AF (the adjust flag).

I believe there is room for performance improvement in Rosetta 2, by performing more inter-instruction optimisations. However, this would come at the cost of significantly increased complexity (especially for debugging and exception handling), and increased translation times.

After reading some comments I realised this was a significant omission from the original post. Rosetta 2 provides full emulation for the SSE2 SIMD instruction set. These instructions have been enabled in compilers by default for many years, so this would have been required for compatibility. However, all common operations are translated to a reasonably-optimised sequence of NEON operations. This is critical to the performance of software that has been optimised to use these instructions.

However, there is a full mapping in the other direction (ARM PC to x86 address). This is split into two levels, a top level binary search, and then second-level bit-packed delta-encoded entries for each instruction in the group.
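A two-level structure of that shape might look roughly like the Python sketch below. It is an illustration under assumptions: the group size, the class name, and the use of plain tuples instead of bit-packed data are all choices made for the example.

```python
from bisect import bisect_right
from typing import Optional


class ArmToX86Map:
    """Toy ARM PC to x86 address map: binary search over groups, then a delta scan."""

    def __init__(self, pairs: list[tuple[int, int]], group_size: int = 4) -> None:
        # pairs must be sorted by ARM PC; each pair is (arm_pc, x86_addr).
        self._group_starts: list[int] = []
        self._groups: list[tuple[int, int, list[tuple[int, int]]]] = []
        for start in range(0, len(pairs), group_size):
            chunk = pairs[start:start + group_size]
            base_arm, base_x86 = chunk[0]
            deltas = []
            prev_arm, prev_x86 = base_arm, base_x86
            for arm_pc, x86_addr in chunk[1:]:
                # Store only the difference from the previous entry.
                deltas.append((arm_pc - prev_arm, x86_addr - prev_x86))
                prev_arm, prev_x86 = arm_pc, x86_addr
            self._group_starts.append(base_arm)
            self._groups.append((base_arm, base_x86, deltas))

    def lookup(self, arm_pc: int) -> Optional[int]:
        """Return the x86 address of the instruction whose translation covers arm_pc."""
        group = bisect_right(self._group_starts, arm_pc) - 1  # top-level binary search
        if group < 0:
            return None  # before the first translated instruction
        arm, x86, deltas = self._groups[group]
        for arm_delta, x86_delta in deltas:  # second-level linear scan
            if arm + arm_delta > arm_pc:
                break
            arm += arm_delta
            x86 += x86_delta
        return x86
```

Delta encoding keeps the table small because consecutive instructions are close together, at the cost of a short linear scan, which is acceptable for a path that only runs for exceptions, profiling, and debugging. The sketch does no upper bounds check, so any PC past the last entry maps to the last instruction.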

Both of these flags (PF and AF) are computed on every ADD or SUB instruction (which occur extremely often), and 64-bit ARM has no such functionality. For example, to compute the parity flag on ARM, a subtraction turns into something like:
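The ARM sequence itself is not shown here. As a rough Python model of what those two flags mean for a 32-bit subtraction (this describes the x86 semantics to be reproduced, not the instructions Rosetta 2 emits, and the function name is made up):

```python
def sub32_pf_af(a: int, b: int) -> tuple[int, bool, bool]:
    """Model a 32-bit x86 SUB, returning (result, parity_flag, adjust_flag)."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    # PF is set when the low 8 bits of the result contain an even number of 1 bits.
    parity_flag = bin(result & 0xFF).count("1") % 2 == 0
    # AF is set when there is a borrow (or carry, for ADD) between bits 3 and 4.
    adjust_flag = ((a ^ b ^ result) & 0x10) != 0
    return result, parity_flag, adjust_flag
```

Neither value falls out of an ordinary ARM subtraction, which is why doing this work for every ADD and SUB would be expensive.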

When I switch on emulation it helps AMD containers run faster, but I have a problem with AMD containers running k8s. For example, pods do not respond to liveness probes. I have not investigated it yet, but maybe somebody can share some information.

No, that should suffice. Once your X64 Docker container is started, you can open a command line inside the container and type ps -ef. You should then see a list of processes running inside the container. Each X64 process should be preceded by /rosetta/rosetta. Hope this helps!
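If you would rather not eyeball the list, a few lines of Python can pick out the translated processes from the ps -ef output. This is a sketch that assumes the usual eight-column ps -ef layout with the command in the last column; the function name is hypothetical.

```python
def rosetta_processes(ps_output: str) -> list[str]:
    """Return the command lines from `ps -ef` output that start with /rosetta/rosetta."""
    commands = []
    lines = ps_output.splitlines()
    for line in lines[1:]:  # skip the header row
        # UID PID PPID C STIME TTY TIME CMD: the command is the eighth field
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        command = fields[7]
        if command.startswith("/rosetta/rosetta"):
            commands.append(command)
    return commands
```

Paste the text you get from ps -ef inside the container into the function. An empty result would suggest the processes are not going through Rosetta (for example because QEMU is being used instead).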

I am curious about the vastly different performance characteristics of running x86-64 binaries on the Apple M1 platform using Rosetta 2 vs. emulation, for example what Docker Desktop currently does using QEMU.

The gist of that explanation is that under usual circumstances, ARM and x86 have different (and incompatible) memory-ordering models, which would require significant emulation overhead, but the M1 chip addresses this with a hardware optimization that allows it to follow either ordering model. Effectively, when Rosetta 2-translated instructions are being run, a flag is set to let the processor know to use the x86-style ordering (total store ordering).