Pretty good. Let's go over some of the points (skipping the casting issues and other things that are style rather than actual semantic risk).
First, there was the GC thing:
People on this list seem to get the GC-can-happen-between-code-lines-and-move-my-objects-around risk, which can make a computed base address wrong. The getLong(long, long) signature (and all other get/put variants with direct address+offset parameters) should only be used for off-heap, known-to-remain-static addresses. They can NEVER be safely used to access content in the heap, as the input address is stale by definition.
In my experience, this GC-can-move-your-stuff thing is probably the biggest pitfall in the current unsafe API [So this means people posting on this list are pretty good, since you all noted it]. While most people would instinctively balk at the casting of a reference into a long for address computation purposes, their objections are usually on the grounds of "code cleanliness" and type safety. This usually means that if they think they know what they are doing, they would be willing to do some dangerous but well understood casting in a contained piece of code to achieve something. After all, that is exactly what good practice would be in C (make your cross-type casting in well contained inline functions or macros). The GC thing often does not come up as a reason unless you point to it and ask "what can go wrong *there*?", and even then most people I've tried this with can't quite articulate what could happen, just that it's against some rule and probably not a good idea. The problem is compounded by the fact that this one-line-race-with-GC is very hard to expose in testing, since GC events are extremely rare (dynamically, from an instruction count point of view), and the chance of a relocating GC hitting you right between the specific two lines in a race is very low. The fact that in low latency systems there are very few actually running threads at most times, and that GC is usually triggered at an allocation site, makes it virtually impossible to design a QA test that will actually hit this sort of thing in a full system without aid from the JVM (using its various non-product stress modes), and even then it's just "more likely" to be found. [BTW, you guys that do use unsafe should probably be baking your code with fastdebug builds of OpenJDK with all sorts of modes like ScavengeALot and GCALotAtAllSafepoints turned on].
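To make the off-heap side of this concrete, here is a minimal sketch of the one case where a raw address is legitimately stable: memory obtained from allocateMemory, which the GC never relocates. The class name OffHeapSketch and the helper names are made up for illustration, and grabbing the theUnsafe field reflectively is an unsupported trick that may warn or stop working on newer JDKs.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical illustration class; not part of any library.
final class OffHeapSketch {
    // Fetches the Unsafe singleton reflectively (unsupported, JDK-internal).
    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe is not accessible on this JVM", e);
        }
    }

    // Writes a long to off-heap memory and reads it back.
    // The base address comes from allocateMemory, so it stays valid until
    // freeMemory is called, no matter how many GC cycles happen in between.
    static long roundTripOffHeap(long value) {
        Unsafe u = unsafe();
        long base = u.allocateMemory(16);
        try {
            u.putLong(base + 8, value);
            return u.getLong(base + 8);
        } finally {
            u.freeMemory(base);
        }
    }

    // There is deliberately no on-heap counterpart here: any long derived
    // from an object reference can be stale by the time it is used.
}
```

Calling OffHeapSketch.roundTripOffHeap(42L) should hand back the same value it was given. The equivalent trick with an address derived from a heap object is exactly the race described above.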
Next, there is the reference size thing (could be 32 bit or 64 bit) that Nitsan noted. Its representation is also an issue (e.g. compressedOops). But that doesn't matter, because there is NEVER a safe way to read a reference field as anything other than an Object type. So while casting a reference field is possible, all scalar representations of the contents of a reference field are useless by definition, because they become stale immediately after the read. This includes all those "non-addressing" use-mode excuses I've seen attempted, like use for keying and comparison purposes.
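For the keying and comparison cases, plain Java already has identity-based tools that never expose an address. A small sketch (the class name IdentityKeyingSketch and its demo method are invented for this example):

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of any library.
final class IdentityKeyingSketch {
    // Keys two equal-but-distinct objects by identity and checks that the
    // lookups still resolve correctly after a GC has been requested.
    static boolean demo() {
        Map<Object, String> labels = new IdentityHashMap<>();
        String a = new String("same");
        String b = new String("same"); // equal contents, different object

        labels.put(a, "first");
        labels.put(b, "second");

        // Objects may be relocated here; identity-based keying does not care,
        // because it never depends on a scalar copy of the reference.
        System.gc();

        return a != b
                && labels.size() == 2
                && "first".equals(labels.get(a))
                && "second".equals(labels.get(b));
    }
}
```

Reference comparison with == and System.identityHashCode work on the same principle: the JVM keeps them valid across object moves, which is something a long read out of a reference field can never promise.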
Next, there were assumptions about constant behaviors over time:
Rudiger noted the assumptions about internalLongArrayOffset, arrayBaseOffset being constant over runtime, including the important "(which is true currently)" note. Both Nitsan and Rudiger noted the assumption that the long scale factor is 8, and Nitsan noted that a less dangerous way would be to use arrayIndexScale to derive the proper scale. But both forgot that there is an assumption that this scale is constant over time (which is also "currently true").
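As a sketch of what Nitsan's suggestion looks like, the following queries the base offset and scale instead of hard-coding 8, and passes the array reference itself rather than a computed address. The class and method names are made up for illustration. Note that this still bakes in the "currently true" assumption that the queried values stay valid between the query and the access.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical illustration class; not part of any library.
final class ArrayOffsetSketch {
    // Fetches the Unsafe singleton reflectively (unsupported, JDK-internal).
    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe is not accessible on this JVM", e);
        }
    }

    // Reads array[index] through the Object+offset form of getLong.
    // The reference is handed to the JVM as a reference, so a moving GC is
    // not a problem; the offset arithmetic is the part that relies on
    // layout staying constant over time.
    static long readElement(long[] array, int index) {
        if (index < 0 || index >= array.length) {
            throw new ArrayIndexOutOfBoundsException(index); // Unsafe does no bounds checks
        }
        Unsafe u = unsafe();
        long base = u.arrayBaseOffset(long[].class);   // queried, not assumed
        long scale = u.arrayIndexScale(long[].class);  // queried, not assumed to be 8
        return u.getLong(array, base + (long) index * scale);
    }
}
```

On current JVMs, readElement(a, i) should agree with a[i]. Nothing in the API promises that it will keep doing so.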
The "constant over time" assumption is one of the greatest weaknesses of the current Unsafe API, and one of the reasons I expect it to be deprecated at some future point in time. Those "currently true" statements can become non-true at any time in the future, and will actually have to do so if some of the perfectly valid under-the-hood optimization techniques JVM researchers are working on make it to production. JVMs have the power to safely change object layout over time through a combination of all-reference-fixups-traversal (usually folded into GC) and de-optimization. While none currently do this in production code, there is plenty of interesting work done by folks on layout optimization techniques that could change class layout at any safepoint in your code (read as: "between any two operations that you can state in Java"). These layout changes are usually motivated either by access pattern optimization work or by field and object compression work. I've even run into ones where arrays grow backwards in memory (making the scale factor negative). If any of those layouts-change-during-runtime things ever makes it to production, you can kiss all the current unsafe methods that access heap contents good-bye, as their APIs will be forced to change. The reason is that while the reference itself can probably be guaranteed to remain non-stale, any relative offset computation may become stale-by-definition any day now. Possible victims include field offset computation, array base offset computation, and array scale factor computation, which would make some or all of the direct computed-offset access to heap objects unavailable in future unsafe APIs.
Note that the pure (safe) Java means of accessing these things via reflection APIs will continue to work, and do so quite optimally, since the compilers will be able to de-optimize and re-optimize code that assumes these things are "temporarily constant" (a recorded assumption that can be used to trigger a de-optimization when they change).
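For comparison, here is roughly what the pure-Java route looks like with java.lang.reflect. The Sample class and the helper names are invented for this sketch. Neither call exposes an offset or a scale to user code, so there is nothing in it that can go stale.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;

// Hypothetical illustration class; not part of any library.
final class ReflectiveAccessSketch {
    // Made-up class with a long field to read reflectively.
    static final class Sample {
        long counter = 7L;
    }

    // Reads a long field by name; the JVM resolves the current layout itself.
    static long readField(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.getLong(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read field " + fieldName, e);
        }
    }

    // Reads one element of a long[] by index, with bounds and type checks.
    static long readElement(Object longArray, int index) {
        return Array.getLong(longArray, index);
    }
}
```

With these, readField(new ReflectiveAccessSketch.Sample(), "counter") gives 7 and readElement(new long[] {1, 2, 3}, 1) gives 2. In real code you would look the Field up once and reuse it rather than resolving it on every call.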
Another assumption that none of you stated was that the various layout information points are constant at a point in time, e.g. the assumption that a given Java class has only one current value for field offsets, and that arrays of longs have a single scale at a given point in time. Here again there are various techniques (usually motivated by compression or instrumentation) that are being researched where different instances of the same Java class may have varying heap layout representations at the same point in time. The easiest to comprehend example of this is the on-again-off-again discussion on holding String contents in memory in 8-bits-per-char form when the string contains only ASCII chars. But I've seen heap-storage-reduction work that suggests doing the same for any primitive array where immutability or low chance of out-of-expected-value-range modification can be inferred. And I've seen sound academic papers that do similar things to commonly used non-array classes. These things usually (not always) take the form of providing a Java class with multiple under-the-hood class representations that are proper classes from the JVM point of view (each with constant notions of field offsets and scales), but are folded into a single class from a Java semantics point of view. Since the class that your unsafe APIs are working on is a Java semantics class, there is not necessarily going to be a good constant-for-this-class answer to many of the offset and scale querying things in the current unsafe API.
That's probably enough for this one example.
Hopefully this helps explain some of the fragility of key parts of the unsafe API. While I think the unsafe APIs for off-heap access can and should be solidified and published as a safe-to-rely-on API (much like JNI), and so should things like fences in general and atomics for off-heap, I see the current on-heap object manipulation via unsafe (by anything outside of the JDK libraries) as a very flaky abstraction that is likely to destabilize and change over time. This makes it very risky from a code-longevity point of view to use them, even though they do work on most JVMs (Zing included). The JDK libraries have to use them, but as I noted before, their implementation can synchronously and safely be changed to match any changes to the unsafe APIs in the future.
As it stands, BTW, I've seen little performance-based justification for use (outside of JDK libs) of most of these heap-accessing unsafe APIs, which, as noted previously, are very different from the off-heap ones even though they differ only in signature. I'd love to see examples (with measurements) of actual speed improvement through use of on-heap unsafe access. If JIT optimizations do not already make most of these performance-neutral, we should keep working on those optimizations...