--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
setMemory isn't backed by an intrinsic, so you get a JNI penalty. copyMemory is, but what size are you copying? You mention small arrays, which, depending on what we mean by small, could be quicker to blast through with no memcpy setup. Try larger sizes. Finally, look at the generated assembly to see the difference.
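For the setMemory case specifically, the two strategies being contrasted look roughly like the sketch below (hypothetical class and method names; assumes JDK 8's sun.misc.Unsafe obtained via reflection). The bulk call goes through a native method each time, while the per-byte loop is ordinary JIT-compiled code with no call-transition cost:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch of the two zeroing strategies discussed above.
public class ZeroSketch {
    static final Unsafe UNSAFE = unsafe();

    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Bulk zero: one call per invocation; in JDK 8 setMemory is not
    // intrinsified by HotSpot, so each call pays native-call overhead.
    static void zeroBulk(long address, long len) {
        UNSAFE.setMemory(address, len, (byte) 0);
    }

    // Per-byte zero: plain JIT-compiled loop, no call setup;
    // can win for small buffers.
    static void zeroLoop(long address, long len) {
        for (long i = 0; i < len; i++) {
            UNSAFE.putByte(address + i, (byte) 0);
        }
    }

    // Fills a buffer with 0xFF, zeroes it both ways, verifies the result.
    public static boolean demo(int len) {
        long addr = UNSAFE.allocateMemory(len);
        try {
            UNSAFE.setMemory(addr, len, (byte) 0xFF);
            zeroLoop(addr, len);
            for (int i = 0; i < len; i++) {
                if (UNSAFE.getByte(addr + i) != 0) return false;
            }
            UNSAFE.setMemory(addr, len, (byte) 0xFF);
            zeroBulk(addr, len);
            for (int i = 0; i < len; i++) {
                if (UNSAFE.getByte(addr + i) != 0) return false;
            }
            return true;
        } finally {
            UNSAFE.freeMemory(addr);
        }
    }
}
```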
sent from my phone
I'm somewhat surprised the cutoff is 4 bytes - I'd have expected larger. Have you looked at the assembly by chance for both versions?
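One way to locate that cutoff empirically is to sweep sizes and time both variants. JMH is the right tool for trustworthy numbers; this crude System.nanoTime sweep (a hypothetical harness, not the benchmark from the thread) only illustrates the shape of the experiment:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Rough size sweep comparing per-byte stores with copyMemory.
// For real measurements use JMH; this is only an illustration.
public class CutoffSweep {
    static final Unsafe UNSAFE = unsafe();

    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    static long timeLoop(long dst, byte[] src, int reps) {
        long t0 = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            for (int i = 0; i < src.length; i++) {
                UNSAFE.putByte(dst + i, src[i]);
            }
        }
        return System.nanoTime() - t0;
    }

    static long timeCopy(long dst, byte[] src, int reps) {
        long t0 = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            UNSAFE.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET,
                              null, dst, src.length);
        }
        return System.nanoTime() - t0;
    }

    // Times both variants at several sizes and checks the copy is correct.
    public static boolean sweep() {
        boolean ok = true;
        for (int len : new int[] {4, 16, 64, 256, 1024, 4096}) {
            byte[] src = new byte[len];
            for (int i = 0; i < len; i++) src[i] = (byte) i;
            long dst = UNSAFE.allocateMemory(len);
            long loop = timeLoop(dst, src, 50_000);
            long copy = timeCopy(dst, src, 50_000);
            for (int i = 0; i < len; i++) {
                ok &= UNSAFE.getByte(dst + i) == (byte) i;
            }
            System.out.printf("%5d bytes: loop=%dns copy=%dns%n", len, loop, copy);
            UNSAFE.freeMemory(dst);
        }
        return ok;
    }
}
```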
sent from my phone
On Tuesday, August 25, 2015 at 6:20:38 PM UTC-4, Vitaly Davidovich wrote:
On Aug 25, 2015 5:58 PM, "Kyle Downey" <kyle....@gmail.com> wrote:
This is my first time posting to mechanical-sympathy.

I am seeing a consistent difference in a microbenchmark between calling Unsafe.putByte() for each byte in a byte array vs. either (a) setMemory() to zero out all bytes, or (b) copyMemory() to copy the data in the byte[] array into the native memory. I would have expected the bulk memory writes to be faster than making multiple calls to update memory byte by byte, but I am seeing the opposite, at least for the small byte[] arrays I'm testing.

This is on MacOS X 10.5, JDK 1.8.0_51, on a MacBook Pro with a 2.8 GHz Intel Core i7, benchmarked with JMH settings:

# Warmup: 5 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time

The difference seen from essentially a one-line change replacing my putBytes() call with copyMemory():

OffHeapFastStringPerfTest.appendFastStringNoPrealloc                 thrpt  20  1895.290 ± 15.210 ops/s
OffHeapFastStringPerfTest.appendFastStringNoPreallocUsingCopyMemory  thrpt  20  1415.249 ± 16.747 ops/s

Is this a well-known difference, and is there something about the way these operations have been implemented in Java 8 that slows them down?
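The benchmark source isn't shown in the thread, but the "one-line change" presumably swaps between two methods shaped something like the sketch below (class and method names are made up for illustration; assumes JDK 8's sun.misc.Unsafe):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch of the two variants being compared.
public class AppendSketch {
    static final Unsafe UNSAFE = unsafe();

    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Variant 1: per-byte stores (the faster one in the numbers above).
    static void putBytes(long address, byte[] src) {
        for (int i = 0; i < src.length; i++) {
            UNSAFE.putByte(address + i, src[i]);
        }
    }

    // Variant 2: bulk copy from the byte[] into native memory.
    static void copyBytes(long address, byte[] src) {
        UNSAFE.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET,
                          null, address, src.length);
    }

    // Both variants must leave identical bytes in native memory.
    public static boolean demo(int len) {
        byte[] src = new byte[len];
        for (int i = 0; i < len; i++) src[i] = (byte) (i * 7);
        long addr = UNSAFE.allocateMemory(len);
        try {
            putBytes(addr, src);
            for (int i = 0; i < len; i++) {
                if (UNSAFE.getByte(addr + i) != src[i]) return false;
            }
            UNSAFE.setMemory(addr, len, (byte) 0);
            copyBytes(addr, src);
            for (int i = 0; i < len; i++) {
                if (UNSAFE.getByte(addr + i) != src[i]) return false;
            }
            return true;
        } finally {
            UNSAFE.freeMemory(addr);
        }
    }
}
```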
Kyle,
A few quick suggestions:
1) since you appear to be using StringBuilder as the baseline, I'd size those instances appropriately up front. In particular, the 32 case will cause a resize.
2) Remove the asserts. It's just unnecessary code noise (it shouldn't impact perf in this case since the methods are still within the frequent-code inline threshold).
3) Don't branch based on input array length. Again, it'll get predicted well by the CPU, but it's noise and may cause the compiler to do something odd (unlikely, but without the assembly I cannot tell). Create an abstract class with 2 concrete impls instead.
4) Manually hoist loop-invariant calculations out of the putByte loop (i.e. address + startIndex). The compiler *should* pick that up, but without the assembly it's hard to say (plus you're not trying to test that aspect).
5) My hunch is the compiler is not unrolling the putByte loop as it doesn't know if the stores alias with the loads. Try manually unrolling, say, an 8-byte loop and see if anything changes.
6) immaterial to the perf, but I'd make the unsafe field final or just remove it entirely (assuming NativeBytes.UNSAFE is static final, it'll become a JIT constant).
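Suggestions 4 and 5 combined might look like the following sketch (hypothetical method name; assumes the same reflective sun.misc.Unsafe access as above):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch of suggestions 4 and 5: hoist the invariant base address,
// and manually unroll the per-byte store loop 8 ways.
public class UnrollSketch {
    static final Unsafe UNSAFE = unsafe();

    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    static void putBytesUnrolled(long address, int startIndex, byte[] src) {
        final long base = address + startIndex; // loop invariant, computed once
        final int limit = src.length & ~7;      // largest multiple of 8 <= length
        int i = 0;
        for (; i < limit; i += 8) {             // 8-way manual unroll
            UNSAFE.putByte(base + i,     src[i]);
            UNSAFE.putByte(base + i + 1, src[i + 1]);
            UNSAFE.putByte(base + i + 2, src[i + 2]);
            UNSAFE.putByte(base + i + 3, src[i + 3]);
            UNSAFE.putByte(base + i + 4, src[i + 4]);
            UNSAFE.putByte(base + i + 5, src[i + 5]);
            UNSAFE.putByte(base + i + 6, src[i + 6]);
            UNSAFE.putByte(base + i + 7, src[i + 7]);
        }
        for (; i < src.length; i++) {           // remainder tail
            UNSAFE.putByte(base + i, src[i]);
        }
    }

    // Round-trips a buffer through the unrolled writer and verifies every byte.
    public static boolean demo(int len) {
        byte[] src = new byte[len];
        for (int i = 0; i < len; i++) src[i] = (byte) i;
        long addr = UNSAFE.allocateMemory(len);
        try {
            putBytesUnrolled(addr, 0, src);
            for (int i = 0; i < len; i++) {
                if (UNSAFE.getByte(addr + i) != (byte) i) return false;
            }
            return true;
        } finally {
            UNSAFE.freeMemory(addr);
        }
    }
}
```

A length that is not a multiple of 8 (e.g. 29) exercises the tail loop as well as the unrolled body.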
sent from my phone
Argh - #6 should say make unsafe static final, not just final.
sent from my phone