Some background: As those of you who have read my various rants may have noticed, I spend a lot of my time thinking about the behavior of latency/response-time/reaction-time. In addition to trying to understand and teach about the behavior better (with monitoring and measurement tools like HdrHistogram, LatencyUtils, and jHiccup), I actually work on things that try to improve bad behavior (for some definitions of "improve" and "bad"). Eliminating pausing behavior in GC was the lowest hanging fruit, but more recent work has focused on eliminating pauses due to things like at-market-open deoptimizations, lock deflation, lock de-biasing, and all sorts of TTSP (time to safepoint) issues. I've also learned a lot about how to bring down Linux's contribution to latency spikes.

But the JVM and the OS are not the only things that cause latency spikes. Sometimes your code is just "spiky". In my day job, I keep running into actual, real-world low latency system code that is typically super-fast, but occasionally spikes in actual work latency due to some rare but huge piece of work that needs to be done, most often due to some state accumulation. After we eliminate GC pauses (which tend to dominate latency spikes, and which simply disappear immediately when Zing is applied), we often see this nice pattern of growing latency spikes at growing intervals, with a near-perfect doubling in both magnitude and interval between the spikes. This happens so often that we've studied the common causes, and (by far) the most common culprits are HashMaps used to accumulate something during the trading day. And it's all about resizing work.

I've had "build a Pauseless HashMap" on my weekend project list for over a year now, but finally got around to actually building it (at the request of someone on this list).
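The doubling pattern described above falls straight out of HashMap's doubling growth policy. Here is a small illustrative sketch (not taken from PauselessHashMap) that computes the sizes at which a map with HashMap's default parameters would rehash all of its entries; both the spike magnitude (entries rehashed) and the interval to the next spike double each time:

```java
import java.util.ArrayList;
import java.util.List;

public class ResizeSpikes {
    // Returns the map sizes at which a resize (a full rehash of all current
    // entries) is triggered, for a table that doubles its capacity each time.
    static List<Long> resizeThresholds(int initialCapacity, float loadFactor, long maxSize) {
        List<Long> thresholds = new ArrayList<>();
        long capacity = initialCapacity;
        long threshold = (long) (capacity * loadFactor);
        while (threshold <= maxSize) {
            thresholds.add(threshold);  // a put at this size rehashes `threshold` entries
            capacity *= 2;              // the table doubles...
            threshold = (long) (capacity * loadFactor);  // ...so the next spike is 2x away
        }
        return thresholds;
    }

    public static void main(String[] args) {
        // Defaults matching java.util.HashMap: capacity 16, load factor 0.75
        System.out.println(resizeThresholds(16, 0.75f, 1_000_000));
        // Spikes at sizes 12, 24, 48, 96, ... doubling in size and spacing
    }
}
```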
There are probably at least 17 ways to skin a HashMap so it won't stall puts and gets when it resizes, but this is my simple take on it:

Keep in mind that this is a "probably-working draft" that's gone through some bench testing, but is not yet battle hardened (scrutiny is welcome).
I intentionally based this version on Apache Harmony's version of HashMap, and not on OpenJDK's, in order to make it available without GPLv2 license restrictions (for those who may want to include it in non-GPL products). The background resizing concept itself is simple, and can be applied just as easily to the OpenJDK version (e.g. if some future Java SE version wants to use it). You can use (https://svn.apache.org/repos/asf/harmony/enhanced/java/trunk/classlib/modules/luni/src/test/api/common/org/apache/harmony/luni/tests/java/util/HashMapTest.java) as a baseline comparison for the code I started with.
Before starting to research PHM details and its source code, could you confirm:

1. "Pauseless" means less, not zero, stop-the-world GC pauses. STW can still happen, right?
2. Does it run on HotSpot with any specialized Collector/SurvivorRatio/etc. JVM boot parameters?
3. Does PHM stay entirely on-heap and benefit from long-term background GC?
Implementation Note: To maintain time-determinism, rehash/resize is performed only when the map's size is small (see chart). For large maps (size > 512), the map is divided recursively into (64) smaller sub-maps. The cost of the dispatching (based upon hashcode value) has been measured to be at most 20% of the access time (and most often way less).
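The dispatch described in that note can be sketched roughly as follows. This is a simplified, non-recursive illustration (not PauselessHashMap's actual code); the class name and the flat array of plain HashMaps are assumptions, with only the sub-map count (64) and hash-based routing taken from the note:

```java
import java.util.HashMap;
import java.util.Map;

public class SubMapDispatch<K, V> {
    private static final int NUM_SUBMAPS = 64;   // count taken from the note above

    @SuppressWarnings("unchecked")
    private final Map<K, V>[] subMaps = new Map[NUM_SUBMAPS];

    public SubMapDispatch() {
        for (int i = 0; i < NUM_SUBMAPS; i++) {
            subMaps[i] = new HashMap<>();        // each sub-map stays small, so resizes stay cheap
        }
    }

    // Spread the hash so high bits participate, then mask to pick a sub-map.
    private Map<K, V> subMapFor(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        h ^= (h >>> 16);
        return subMaps[h & (NUM_SUBMAPS - 1)];
    }

    public V put(K key, V value) { return subMapFor(key).put(key, value); }
    public V get(Object key)     { return subMapFor(key).get(key); }
}
```

The dispatch cost the note measures (at most ~20% of access time) corresponds to the extra hash-mix and array index in `subMapFor` before the normal map access.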
I'm not sure if this works as a direct drop-in replacement. Each instance kicks off 2 threads for background resizing. An application with a moderate to large number of HashMap instances could quickly end up spawning many more threads than otherwise anticipated, causing issues elsewhere - in ways you wouldn't expect of "just a map". I think any usage of a map like this should come with a full understanding of the resource-usage implications.
Also, I think it might be useful to be able to pass in a ScheduledExecutorService when constructing, so that it could potentially be shared among several such maps.
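A hypothetical shape for that constructor suggestion; the class name, constructor, and accessor here are made up for illustration and do not reflect PauselessHashMap's actual API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class SharedResizerMap<K, V> {
    private final ScheduledExecutorService resizer;

    // Callers hand in a pool that many map instances can share, instead of
    // each map spinning up its own background resizer threads.
    public SharedResizerMap(ScheduledExecutorService sharedResizer) {
        this.resizer = sharedResizer;
    }

    ScheduledExecutorService resizer() { return resizer; }

    public static void main(String[] args) {
        ScheduledExecutorService shared = Executors.newScheduledThreadPool(2);
        SharedResizerMap<String, Long> a = new SharedResizerMap<>(shared);
        SharedResizerMap<String, Long> b = new SharedResizerMap<>(shared);
        assert a.resizer() == b.resizer();  // both maps reuse the same pool
        shared.shutdown();
    }
}
```

This keeps the thread count bounded by the application rather than by the number of live map instances.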
3. Think of HashMaps with *lots* of entries, that grow all day. That's an example of what PauselessHashMap was built to deal with, without slowing down or pausing.
There are probably at least 17 ways to skin a HashMap [...]
static final ScheduledExecutorService backgroundResizers =
    Executors.newScheduledThreadPool(DEFAULT_NUMBER_OF_BACKGROUND_RESIZER_EXECUTOR_THREADS);
On the key thing, background resizing:

The comments in the code don't sufficiently describe the relations between all 7 involved fields.
- Why is a pending resize sometimes checked as if (!pendingResize), and sometimes as if (resizingIntoElementData != null)?
- observedResizingIntoTable is set to false only in finishResizing(); finishResizing() is called only if backgroundResizeComplete is true; backgroundResizeComplete is set to true at the end of the Resizer runnable; and the spin-wait for observedResizingIntoTable to become false is at the beginning of Resizer -- how doesn't this lead to a deadlock?
Misc:

1. Don't repeat the mistakes of HashMap: the class should be parameterized with expectedSize and loadFactor instead of capacity and loadFactor.
2. Increment modCount before structural modifications; compare your clearImpl() with http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java#l859 and other places where modCount is incremented.
3. hashCode is computed twice in putImpl
> Why is a pending resize sometimes checked as if (!pendingResize), and sometimes as if (resizingIntoElementData != null)?

Those two conditions are not equivalent. Operations that depend on a resize being in flight check pendingResize. Operations that depend on a resizingIntoElementData existing check for it. There is a (potentially long) period of time between pendingResize being set and resizingIntoElementData being populated.

> observedResizingIntoTable is set to false only in finishResizing(), finishResizing() is called only if backgroundResizeComplete is true, backgroundResizeComplete is set to true at the end of the Resizer runnable, and the spin-wait for observedResizingIntoTable to become false is at the beginning of Resizer -- how doesn't this lead to a deadlock?

It's not a deadlock, but a potentially never-done resize, and a potential memory leak as a result.

If a resize is triggered and no further put access into the hash map is done after that, the background resizer will never progress past the spin-wait loop that waits for the mutator to indicate observedResizingIntoTable. This doesn't stop anyone else from doing anything, but the poor resizer task will keep polling on the observed condition every millisecond. A back-off (to longer waits) in the wait loop is probably appropriate here.

The memory leak problem this raises would occur in the edge case where the very last put access to a hash map instance triggered a resize, and the instance then "dies" from the mutator's point of view. In this situation, the background resizer will keep the instance alive while waiting for an observed indication that will never come... Fixing this probably requires the background resizer to "give up" on resizing on some timeout condition. A bit tricky (it needs to avoid deadlocking with mutator operations that block waiting for the resize to complete), but I think it's doable...
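The back-off suggested above could look roughly like this. This is a sketch with an illustrative condition supplier and method name, not the actual resizer code:

```java
import java.util.function.BooleanSupplier;

public class BackoffWait {
    // Polls the condition, doubling the sleep between polls up to a cap, so an
    // abandoned map costs ever-fewer wakeups instead of one wakeup per millisecond.
    // Returns the sleep interval reached when the condition became true.
    static long waitWithBackoff(BooleanSupplier observed, long capMillis) {
        long sleepMillis = 1;
        while (!observed.getAsBoolean()) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // resizer was cancelled; give up
                break;
            }
            sleepMillis = Math.min(sleepMillis * 2, capMillis);  // exponential back-off, capped
        }
        return sleepMillis;
    }
}
```

A cap keeps the worst-case latency of noticing the condition bounded, while the doubling keeps the idle cost of a never-observed resize negligible.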
> Misc: 1. Don't repeat mistakes of HashMap, class should be parameterized with expectedSize and loadFactor instead of capacity and loadFactor.

PauselessHashMap is meant to be a drop-in replacement for HashMap, so I intentionally repeat all of its externally visible interface mistakes ;-)
> 2. Increment modCount before structural modifications, compare your clearImpl() with http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java#l859 and other places where modCount is incremented.

I started with Harmony's HashMap, and avoided trying to optimize anything in it that didn't have to do with the problem I was working on. I included a wrong link in my original posting above (now fixed). The baseline code is at:

and as you can see in clear() there, the modCount change in the loop comes from how Harmony chose to do this. The tradeoff here appears to be between how frequently you touch modCount in structural changes, and how lax your concurrent modification detection would be. I just wanted to avoid stepping on some assumption Harmony has in the code.
--
You received this message because you are subscribed to a topic in the Google Groups "mechanical-sympathy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mechanical-sympathy/DY8vysxdmj4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
java -server -Xmx20g -cp .:./PauselessHashMap-1.0-SNAPSHOT.jar IntTest
> In a real world scenario no one would be inserting millions of items in a batch; normally you would do a mix of inserts and removals.

That's not true if one intends to use the hash map to cache data on the server side.
Benedict Elliott Smith: spreading the cost of the migration over the inserts that follow its initiation (and possibly gets/removes)
This approach looks a lot easier to program: you do not need to synchronize concurrent reads/writes from multiple threads
The drawback of that "pauseless" approach is that you'll keep the old Object[] array alive longer (no problem for Azul, a promotion problem for OpenJDK)
In a world with more and more cores that are difficult to use efficiently, I think this off-load-to-thread approach is very interesting.
And I think this technique is more cache friendly, because fast-path threads do the minimum and background threads concentrate on the resize.
Still, the single-thread is quite nice.
I would happily enjoy a head-to-head PauselessHashMap comparison in this group to find the best possible implementation for a few scenarios/benchmarks :)
@Gil, would you be OK with accepting alternative implementations / benchmarks in your github project?
I would happily contribute!
Would be a nice playground for everybody I think :)
Cheers
GG
I've a workload to do and unless this offers me some advantage over HashMap to reduce the overall time it takes to crunch through that workload I'm wondering, what's the point? After all isn't jitter another form of dead time and shouldn't reducing dead time equate to faster processing of a workload?
That's my case.
> I've a workload to do and unless this offers me some advantage over HashMap to reduce the overall time it takes to crunch through that workload I'm wondering, what's the point? After all isn't jitter another form of dead time and shouldn't reducing dead time equate to faster processing of a workload?
This is a different problem. You're addressing total throughput; jitter is important for users with tight SLAs on latency for an operation. These users are usually quite willing to accept lower throughput in order to meet their SLAs.
- Your get is currently doing a lookup either in the source table or in the target table, which I think is a bug. At least for the >= migrated range, you need to look up in both, since your put implementation forces whatever bucket it hits to migrate to the target table.
- The incremental copy approach looks up in both source and target tables (see note below)
- while the cost of a get() during a resize is similar in both (at about 2x the cost of a get when no resize is happening). However, since the background copy operations will complete faster, the number of get and put operations impacted by the resize should be significantly lower in the background case
This can be mitigated by limiting the number of entries (rather than buckets) copied per put, without incurring much added cost (since the get needs to scan both source and target anyway to be correct).
Your code does not handle null keys right now, which I think is the main cause of code complexity difference.
However, since the background copy operations will complete faster, the number of get and put operations impacted by the resize should be significantly lower in the background case, and this relative benefit grows as the rate of access drops
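The two-table lookup discussed in the points above can be illustrated with a single-threaded toy sketch. The class and method names are made up, everything here is simplified, and real code would need the memory-ordering care discussed elsewhere in this thread:

```java
import java.util.HashMap;
import java.util.Map;

public class TwoTableGet<K, V> {
    private Map<K, V> source = new HashMap<>();
    private Map<K, V> target = null;   // non-null only while a "resize" is in flight

    // Simulates a resize starting: subsequent puts land in the target table.
    public void beginResize() { target = new HashMap<>(); }

    public V put(K key, V value) {
        return (target != null) ? target.put(key, value) : source.put(key, value);
    }

    // During a resize, a correct get must consult BOTH tables: an entry may
    // still sit in the old table, or may already have moved to the new one.
    public V get(Object key) {
        V v = source.get(key);
        if (v == null && target != null) {
            v = target.get(key);       // roughly doubles the cost of a get during resize
        }
        return v;
    }
}
```

Note this toy version inherits the null-value ambiguity raised later in the thread: it uses `v == null` as "absent in source", which is wrong if null values are legal.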
> Benedict Elliott Smith: spreading the cost of the migration over the inserts that follow its initiation (and possibly gets/removes)

This approach looks a lot easier to program: you do not need to synchronize concurrent reads/writes from multiple threads (the background resizer vs. the mutator). In Azul you do not even require that background allocator thread. The drawback of that "pauseless" approach is that you'll keep the old Object[] array alive longer (no problem for Azul, a promotion problem for OpenJDK).

Gil,

Here are some more questions:

1) Did you consider using fewer non-static fields? I understand you might want to leak "the first dose of pauseless", then suggest Azul when users realize all their PauselessMaps consume all the RAM :)
As far as I understand, reading volatile fields does not involve much overhead, thus there should not be much improvement from using a non-volatile + volatile pair. It is volatile updates that incur the overhead of false sharing, cache-line contention, etc.
2) Entry#isvalid is used inconsistently in findNullKeyEntryInChain vs findNonNullKeyEntryInChain. Is that intentional?
3) The reader thread has absolutely no synchronization with respect to values read out of newly created Entry objects in the background. There is no synchronization when accessing resizingIntoElementData either.
A typical JDK implementation allows one to reason about concurrency correctness in terms of the JLS, and thus it is correct for any valid JMM implementation. Your code is written in terms of memory barriers, so it is not clear whether it is bullet-proof.
4) It is very hard to trace whether the if (result == null) { result = resizingIntoEntry.value; } that overrides the previous result = entry.value; is fine. Off the top of my head I can't find a case where this would result in an invalid "result". However, if both those values are always the same, then why bother with the override at all?
What I especially do not like is that result == null is a valid value (i.e., null was stored in the Map); you might then rewrite that result = null with some unknown resizingIntoEntry.value.
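A quick demonstration of the null-value ambiguity being raised, using plain java.util.HashMap semantics: get() returning null cannot distinguish "absent" from "null was stored", so any merge logic keyed on result == null risks overwriting a legitimately stored null.

```java
import java.util.HashMap;
import java.util.Map;

public class NullValueAmbiguity {
    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("k", null);                 // null is a legal stored value
        assert m.get("k") == null;        // looks exactly like "absent"...
        assert m.containsKey("k");        // ...so presence needs containsKey
        assert m.get("other") == null;    // a truly absent key looks identical
    }
}
```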
Vladimir
Can you please use JMH as a benchmark harness?
Hint: JMH is much more likely to measure the right thing than other harnesses.
Hint2: when using JMH, Alexey validates the benchmark for free. That is what I see.
Hint3: when you hand-roll a benchmark harness God kills a kitten.
Vladimir