Java - Reuse String Instances


lior simon

May 3, 2015, 4:54:12 PM
to mechanica...@googlegroups.com
Hi,

Background
------------------
I have been developing an application that serves ~30k QPS with under 5 ms latency per request (for 99% of the requests).
I've done all kinds of optimizations, such as object pooling and representing objects within one big array, in order to avoid full GC cycles; indeed, a full GC cycle now happens only once a week or so.
The thing is, even minor GC cycles are a bit problematic because they cause ~80 ms hiccups every few seconds.

I'm using Netty as an embedded HTTP web server in my application.

My objective
-------------------
Eliminate minor gc cycles, or at least make them happen rarely.

The problems
--------------------
As in every application, every request instantiates a few strings that are created at runtime.
The number of distinct string values is huge (>400M), so I can't really create them in advance.

So how can I achieve my objective, given that Strings can't be pooled, and I use Netty as the http web server which already instantiates Strings when it forwards strings representing the incoming url?
Is there a way to achieve that using Unsafe or something like that?

Thanks a lot!



peter royal

May 3, 2015, 5:11:31 PM
to mechanica...@googlegroups.com
If your budget allows for it, use the Zing JVM from Azul. 

An easy way to try it is with the EC2 AMIs they have available. 

I highly suggest running the numbers comparing developer time spent (and cost) vs licensing costs.

-pete 

-- 
peter royal - (on the go)
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

lior simon

May 3, 2015, 10:31:28 PM
to mechanica...@googlegroups.com
What is the other alternative other than proprietary jvm?
How can this be achievable?


Kevin Burton

May 3, 2015, 11:00:37 PM
to mechanica...@googlegroups.com
As in every application, every request instantiates a few strings that are created at runtime.
The number of distinct string values is huge (>400M), so I can't really create them in advance.

So how can I achieve my objective, given that Strings can't be pooled, and I use Netty as the http web server which already instantiates Strings when it forwards strings representing the incoming url?
Is there a way to achieve that using Unsafe or something like that?


Not sure if this solves your problem, but Netty supports composite buffers.

So if you need the string: 

Content-Length: 100

Create two strings,

one as "Content-Length: " 

and the other of "100"

Then use a composite buffer.  Netty *should* be doing this internally, if not it's at least a performance optimization to reduce jitter.

I'm not sure what your string space looks like but if you could intern them this way you could save jitter at the cost of string lookup.
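
A minimal sketch of the split described above: the constant part of the header is encoded once and shared, and only the variable digits are written per response. This uses a plain java.nio.ByteBuffer rather than Netty's composite buffers, and HeaderWriter and writeContentLength are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class HeaderWriter {
    // Encoded once at class-load time and shared by every response.
    private static final byte[] CONTENT_LENGTH =
            "Content-Length: ".getBytes(StandardCharsets.US_ASCII);

    /** Appends "Content-Length: <value>\r\n" to out without creating a String. */
    static void writeContentLength(ByteBuffer out, int value) {
        if (value < 0) throw new IllegalArgumentException("negative length");
        out.put(CONTENT_LENGTH);
        // Find the largest power of ten that does not exceed value.
        int divisor = 1;
        while (value / divisor >= 10) divisor *= 10;
        // Emit the decimal digits, most significant first.
        for (; divisor > 0; divisor /= 10) {
            out.put((byte) ('0' + (value / divisor) % 10));
        }
        out.put((byte) '\r').put((byte) '\n');
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        writeContentLength(buf, 100);
        buf.flip();
        System.out.print(StandardCharsets.US_ASCII.decode(buf));
    }
}
```

The per-call work here touches only the caller's buffer, so whether it helps depends on the rest of the pipeline also consuming bytes rather than Strings.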

Tomasz Borek

May 3, 2015, 11:02:34 PM
to mechanica...@googlegroups.com
I have a feeling it might help to get specific answers if you provide more info on GC settings you have. Unless you're after general answers.

Regards,
LAFK

lior simon

May 3, 2015, 11:40:42 PM
to mechanica...@googlegroups.com
I'm running the java process with the following jvm flags:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSScavengeBeforeRemark -XX:SurvivorRatio=10 -XX:TargetSurvivorRatio=90 -Xms24000m -Xmx30000m -Xmn2000m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=95 -XX:+CMSClassUnloadingEnabled


Gil Tene

May 3, 2015, 11:48:33 PM
to mechanica...@googlegroups.com
:-) 

On Sunday, May 3, 2015 at 7:31:28 PM UTC-7, lior simon wrote:
What is the other alternative other than proprietary jvm?

By "other than proprietary", do you mean "OpenJDK"? Or are you really saying "other than Oracle's proprietary & closed-source HotSpot JDK"? ;-)
 
How can this be achievable?

The alternative to a "proprietary" JVM that makes normal Java allocation work without those annoying hiccups is to use a proprietary object management scheme, where you re-introduce your own variant of malloc/free to Java, and stop using any Java code that doesn't do the same... 

But if you are using idiomatic Java, and/or leveraging the multitude of useful 3rd party Java code out there, you are pretty much stuck facing the music of regular newgen collections occurring (and occasional oldgen ones as well). That music is much nicer to deal with when these newgen collections are not done with a monolithic, stop-the-world copying collector. Unfortunately, all the current collectors in those "non proprietary" JVMs (ParallelGC, CMS, G1) use a monolithic, stop-the-world copying collector for newgen (their different algorithm names only refer to their oldgen behavior; their newgen algorithms are practically identical).

AFAIK, Zing is the only shipping runtime (not just Java runtime) that doesn't use a monolithic, stop-the-world copying collector for newgen. It uses a concurrent Mark/Compact collector instead, which is what makes all those hiccups go away.

lior simon

May 4, 2015, 12:16:22 AM
to mechanica...@googlegroups.com
Hi Gil, 
I will look into Zing.

Can you please elaborate a little about "re-introduce your own variant of malloc/free to Java". 
Does it mean changing the internals of the JVM? Or does it mean creating my own implementations of classes (e.g. MyString) and using them in place of the standard ones (e.g. String)?
If the former, do you have any reference on how to do so?

Thanks

Gil Tene

May 4, 2015, 1:35:31 AM
to mechanica...@googlegroups.com


On Sunday, May 3, 2015 at 9:16:22 PM UTC-7, lior simon wrote:
Hi Gil, 
I will look into Zing.

Can you please elaborate a little about "re-introduce your own variant of malloc/free to Java".  
Does it mean changing the internals of the JVM? Or does it mean creating my own implementations of classes (e.g. MyString) and using them in place of the standard ones (e.g. String)?

There is usually no need for new basic types, but often you'll need new aggregate types (e.g. collections) if you are striving for zero-allocation. And you will usually find using pooled objects within other "regular" object graphs to be a highly dangerous practice.

When I say "re-introduce malloc/free" I'm referring to any situation where you manage your own object lifetimes and allocation. E.g. object pools (global or per-thread) invariably introduce malloc/free semantics. This works "ok" for trivially simple object graphs, but it gets very messy once you try to do anything as "complex" as having circular or doubly linked lists of your pool-managed objects. Not to speak of placing your objects in multiple collections (e.g. lists and hash maps). At that point you tend to "reinvent" reference counting and destructors, and bring in entire programming paradigms that are foreign to Java... Since Java doesn't have a proper notion of a destructor, this quickly gets into the realm of managing your own set of proprietary classes that include some specific lifecycle management APIs and use conventions. It also leads to avoiding passing references to pooled objects to any code you don't know the implementation of, and avoiding their participation in any normal collection or data structure (due to semantic conflicts between your "free" semantics and memory contracts and what normal Java code might be doing with these references). The normal bugs that go with the malloc/free realm also come back in (e.g. double-freeing, reuse of non-free objects, and various types of memory leaks that idiomatic Java code is immune to).
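
To make the malloc/free analogy concrete, here is a minimal sketch of a per-thread object pool. PooledBuffer, BufferPool, and the inPool flag are hypothetical names, and the pool is assumed to be used by a single thread only. The explicit flag is exactly the kind of hand-maintained lifecycle state described above, and it catches only one of the bug classes listed (double-freeing); use-after-release goes undetected.

```java
import java.util.ArrayDeque;

final class PooledBuffer {
    final byte[] data = new byte[256];
    int length;
    boolean inPool = true; // lifecycle flag the pool has to maintain by hand
}

final class BufferPool {
    private final ArrayDeque<PooledBuffer> free = new ArrayDeque<>();

    BufferPool(int size) {
        // All instances are allocated up front, before steady state.
        for (int i = 0; i < size; i++) free.push(new PooledBuffer());
    }

    /** The "malloc": hands out a pooled instance, or fails when exhausted. */
    PooledBuffer acquire() {
        PooledBuffer b = free.poll();
        if (b == null) throw new IllegalStateException("pool exhausted");
        b.inPool = false;
        b.length = 0;
        return b;
    }

    /** The "free": the caller must not touch b after this call. */
    void release(PooledBuffer b) {
        if (b.inPool) throw new IllegalStateException("double free");
        b.inPool = true;
        free.push(b);
    }
}
```

Nothing stops a caller from keeping a reference after release() and reading data that a later acquire() has already handed to someone else, which is why passing pooled objects to code you do not control is risky.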

With that said, there are quite a few people (some on this list) that are very good at achieving either Low-allocation or Zero-allocation Java. This is evidenced by the fact that Java use in low latency systems is trending up, even in places that don't yet use Zing. Low allocation usually amounts to "reducing the frequency of newgen pauses but not getting rid of them". But with true Zero-allocation in steady-state execution, you can basically avoid having even a newgen GC trigger after some initial startup.

Zero allocation at the application level is HARD work though. Zero-allocation libraries are very different from zero-allocation applications. Some libraries that wish to be used even in Zero-allocation environments will strive for zero-allocation for their hot code paths (e.g. I do this in HdrHistogram), but for an entire application to achieve Zero allocation it would need to avoid using any third party code (including most of the very useful JDK classes) that may allocate objects. This is certainly doable, but you are pretty much writing C in Java syntax at that point. And this gets harder (and more expensive) to do the more non-trivial your application becomes.
 
If the former, do you have any reference on how to do so?

N/A. I was not talking about changing the internals of the JVM...

Gil Tene

May 4, 2015, 1:49:11 AM
to mechanica...@googlegroups.com
On Sunday, May 3, 2015 at 8:40:42 PM UTC-7, lior simon wrote:
I'm running the java process with the following jvm flags:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSScavengeBeforeRemark -XX:SurvivorRatio=10 -XX:TargetSurvivorRatio=90 -Xms24000m -Xmx30000m -Xmn2000m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=95 -XX:+CMSClassUnloadingEnabled

Unfortunately, you are probably stuck between (Newgen pause length = Rock) and (Oldgen pause frequency = Hard Place). And the two are tied to each other...

That large oldgen (28GB in the above) is probably needed to delay the oldgen collections to ~1/week (per your original post). [for STW newgen collectors] Newgen pauses in the tens of msec are inherent to an oldgen that is tens of GB in size. Card scanning alone takes ~1msec/GB-of-oldgen even when no cards are dirty, which is rare (it's generally the case when oldgen is empty, but changes slowly after that). A more realistic estimate on modern hardware is 2-3msec/GB-of-oldgen as the oldgen fills up before eventually being collected. That's usually when the card table is the dirtiest.
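
As a back-of-the-envelope check of the figures above, the following sketch multiplies the oldgen size by the quoted per-GB card-scanning costs. The 28 GB figure is the 30000m heap minus the 2000m newgen from the flags; CardScanEstimate and pauseMs are hypothetical names, and the result is a rough estimate, not a measurement.

```java
final class CardScanEstimate {
    /** Estimated newgen pause contribution from card scanning, in msec. */
    static double pauseMs(double oldGenGb, double msPerGb) {
        return oldGenGb * msPerGb;
    }

    public static void main(String[] args) {
        double oldGenGb = 28.0; // -Xmx30000m minus -Xmn2000m
        System.out.printf("clean card table: ~%.0f ms%n", pauseMs(oldGenGb, 1.0));
        System.out.printf("filling oldgen:   ~%.0f-%.0f ms%n",
                pauseMs(oldGenGb, 2.0), pauseMs(oldGenGb, 3.0));
    }
}
```

At 2-3 msec per GB the estimate comes to roughly 56-84 msec, which brackets the ~80 msec hiccups reported in the original post.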

Michael Barker

May 4, 2015, 2:44:02 AM
to mechanica...@googlegroups.com
To add to what Gil has said, we've had experience trying to replace a library that was allocation heavy, but with short lived objects to one that was gc-free, with lots of pooled objects.  The result was that we had fewer GC pauses, but because the application as a whole was not gc-free (some of our own code would do allocation) those less frequent pauses were significantly worse.  This additional cost came as a mix of card scanning, longer marking (more objects) and more time tenuring (more data surviving eden and getting copied).  The upshot being that techniques like object pooling only work if you go the whole hog and your application can run for the required amount of time without incurring a new collection.  If you can't achieve this you are better off removing all of the pooling and making objects as short-lived as possible.  We did this in one of our apps and ran with a small eden (32MB) so that we had multiple GCs per second, but were able to keep them to around 2-3ms.  The code ends up being simpler and more idiomatic.

Then...well we switched to Zing and none of this bothers us any more.

Mike.



Richard Warburton

May 4, 2015, 4:49:01 AM
to mechanica...@googlegroups.com
Hi,
Why do you want to do this? Are your existing latency metrics not sufficient for the business' SLA?

I ask these questions for two reasons:

1. Some of the solutions that you can undertake, like pooling objects or having partially filled-in mutable strings that you edit in a ThreadLocal fashion, will fairly easily get you a lot of the way to a goal. To actually completely avoid GCs is a bit trickier. It's not really the hugely difficult problem that some make it out to be, but it can be the case that using something like Zing is better if you want to go the whole hog. NB: not making a recommendation, just offering motivating context for the questions which should be asked.

2. Avoiding premature optimisation. Much optimisation isn't premature, and this line is often used to avoid doing necessary work, but really the needs should drive the optimisation.
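
The "mutable strings edited in a ThreadLocal fashion" idea from point 1 might look roughly like the sketch below. ScratchBuilders, acquire, and userKey are hypothetical names; the assumption is that callers consume the result as a CharSequence on the same thread and before the next call, since the same builder is handed out again.

```java
final class ScratchBuilders {
    // One builder per thread; its backing array is reused across requests.
    private static final ThreadLocal<StringBuilder> SCRATCH =
            ThreadLocal.withInitial(() -> new StringBuilder(256));

    /** Returns this thread's builder, emptied but with its capacity retained. */
    static StringBuilder acquire() {
        StringBuilder sb = SCRATCH.get();
        sb.setLength(0);
        return sb;
    }

    /** Builds "user:<id>" in place; the result is valid only until the next acquire(). */
    static CharSequence userKey(long id) {
        return acquire().append("user:").append(id);
    }

    public static void main(String[] args) {
        System.out.println(userKey(42L));
    }
}
```

Calling toString() on the result allocates a String again, so the benefit only holds where downstream code accepts a CharSequence.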

regards,

  Richard Warburton

Wojciech Kudla

May 4, 2015, 5:33:29 AM
to mechanica...@googlegroups.com
One really dirty hack is to hijack a String object header and move it entirely off-heap, but that only works if you are willing to sacrifice the benefits of compressed oops; running with ~30GB heaps I suppose you might not. Mind you, it will work nicely only in extremely simple scenarios. In general, I'd follow Gil's advice pretty blindly...

--

Kirk Pepperdine

May 4, 2015, 5:46:43 AM
to mechanica...@googlegroups.com
On May 4, 2015, at 8:43 AM, Michael Barker <mik...@gmail.com> wrote:

To add to what Gil has said, we've had experience trying to replace a library that was allocation heavy, but with short lived objects to one that was gc-free, with lots of pooled objects.  The result was that we had fewer GC pauses, but because the application as a whole was not gc-free (some of our own code would do allocation) those less frequent pauses were significantly worse.  This additional cost came as a mix of card scanning, longer marking (more objects) and more time tenuring (more data surviving eden and getting copied).  The upshot being that techniques like object pooling only work if you go the whole hog and your application can run for the required amount of time without incurring a new collection.  If you can't achieve this you are better off removing all of the pooling and making objects as short-lived as possible.  We did this in one of our apps and ran with a small eden (32MB) so that we had multiple GCs per second, but were able to keep them to around 2-3ms.  The code ends up being simpler and more idiomatic.

+1, in cases where Zing wasn’t possible I’ve done the same with the same results. Works well with CMS and with G1, but only if you unlock options which allow you to shrink Eden. Zing would be/is simpler in this regard.

Regards,
Kirk

Vitaly Davidovich

May 4, 2015, 10:10:57 AM
to mechanica...@googlegroups.com

Does Netty always materialize request strings? There's no option to get a view on the raw bytes using CharSequence?

It's practically impossible to write zero allocation Java, particularly so with string based protocols.  The more you want to reduce allocations, the more custom code you end up writing, with all the downsides of that approach.
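
For illustration, a "view on the raw bytes using CharSequence" could be sketched as below. This is not a Netty API: ByteSliceView, wrap, and contentEquals are hypothetical names, and the bytes are assumed to be single-byte characters (ISO-8859-1), which is an assumption about the protocol data.

```java
import java.nio.charset.StandardCharsets;

final class ByteSliceView implements CharSequence {
    private byte[] bytes;
    private int offset;
    private int length;

    /** Re-points this view at a slice of buf; no copying, no allocation. */
    ByteSliceView wrap(byte[] buf, int offset, int length) {
        this.bytes = buf;
        this.offset = offset;
        this.length = length;
        return this;
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException();
        return (char) (bytes[offset + index] & 0xFF); // one byte per char
    }

    // Allocates a new view; avoid on the hot path.
    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
        return new ByteSliceView().wrap(bytes, offset + start, end - start);
    }

    /** Compares against another sequence without materializing a String. */
    boolean contentEquals(CharSequence other) {
        if (other.length() != length) return false;
        for (int i = 0; i < length; i++) {
            if (charAt(i) != other.charAt(i)) return false;
        }
        return true;
    }

    // Allocates; intended for logging and debugging only.
    @Override public String toString() {
        return new String(bytes, offset, length, StandardCharsets.ISO_8859_1);
    }
}
```

A single view instance can be re-wrapped for each request, but as with pooled objects it must not be retained past the lifetime of the underlying buffer.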

sent from my phone

Norman Maurer

May 4, 2015, 10:16:38 AM
to mechanica...@googlegroups.com
Our HttpRequestDecoder uses Strings. You could write your own HttpRequestDecoder though.


Tomasz Kowalczewski

May 4, 2015, 10:31:16 AM
to mechanica...@googlegroups.com
Disclaimer: we are not in a low latency or HFT business so our "good enough" might be terrible for your case, but still we had some success in decreasing object allocation rates by using CharSequences backed by classes from libraries like javolution.org to handle local mutable state (strings, parsing strings and UUIDs etc.). Together with holding some things in thread locals (e.g. see io.netty.util.internal.RecyclableArrayList), this turned out to reduce garbage creation substantially.

Netty on "raw" TCP sockets seems to be quite conservative in garbage creation. Maybe writing your own HttpRequestDecoder like Norman suggested is all it takes.

For us, after all these optimisations GC cycles were less frequent, but took longer - all objects in young gen survived and were actually meant for old gen.

Tomasz.
Tomasz Kowalczewski

Scott Carey

May 6, 2015, 4:24:01 PM
to mechanica...@googlegroups.com


On Sunday, May 3, 2015 at 11:44:02 PM UTC-7, mikeb01 wrote:
To add to what Gil has said, we've had experience trying to replace a library that was allocation heavy, but with short lived objects to one that was gc-free, with lots of pooled objects.  The result was that we had fewer GC pauses, but because the application as a whole was not gc-free (some of our own code would do allocation) those less frequent pauses were significantly worse.  This additional cost came as a mix of card scanning, longer marking (more objects) and more time tenuring (more data surviving eden and getting copied).  The upshot being that techniques like object pooling only work if you go the whole hog and your application can run for the required amount of time without incurring a new collection.  If you can't achieve this you are better off removing all of the pooling and making objects as short-lived as possible.  We did this in one of our apps and ran with a small eden (32MB) so that we had multiple GCs per second, but were able to keep them to around 2-3ms.  The code ends up being simpler and more idiomatic.


One trick to throw in here is leveraging Weak References.  You can do 'weak pooling' to reduce the frequency of allocation, without paying the cost of increased tenuring.   It is a half-way house between the two extremes you mention.  The main drawback is that your application latency will be higher just after a young GC when the weakly referenced content is lost.   For use cases where you want overall lower latency and higher throughput but some jitter is OK it works great.  It does require paying attention to your overall object allocation rate and making sure it is moderate.  The more allocations, the shorter the lifetime of a weakly referenced piece of data.
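
A minimal sketch of what such "weak pooling" might look like for a single reusable object. WeakSlot is a hypothetical name, the slot is assumed to be used from one thread only, and the caller is expected to hold the returned strong reference only while working with the object.

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

final class WeakSlot<T> {
    private final Supplier<T> factory;
    private WeakReference<T> ref = new WeakReference<>(null);

    WeakSlot(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the cached instance if it survived, else allocates a fresh one. */
    T get() {
        T value = ref.get();
        if (value == null) { // cleared by a GC, or never created
            value = factory.get();
            ref = new WeakReference<>(value);
        }
        return value;
    }
}
```

Between collections the same instance is reused; after a collection clears the reference, the next get() pays for a fresh allocation, which is the post-GC latency bump mentioned above.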
 


Jean-Philippe BEMPEL

May 7, 2015, 3:53:48 AM
to mechanica...@googlegroups.com
+1 on this technique. We have been using it successfully for several years now.
You just need to also check that the increased reference processing time (Weak, Soft, Phantom, ...) does not erase the time you save by not tenuring those objects during GC pauses.

Kirk Pepperdine

May 7, 2015, 4:55:48 AM
to mechanica...@googlegroups.com
+1 also…

— Kirk


Peter Booth

May 17, 2015, 8:04:26 PM
to mechanica...@googlegroups.com
I find that *sometimes* challenges like this can dissolve away once you dig into what the real business requirements are.

Some questions:

I guess from your mail that a 5ms latency is acceptable whilst an 80ms latency is not. 
Is that the case? What is an acceptable latency? What are your minimum and median latencies?
Do you have metrics for the amount of object creation per request?
Are you certain that the hiccups are due to GCs? 
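
One low-effort way to start answering the last question is to sample the JVM's own GC counters around a measurement window and compare them with the observed hiccups. The sketch below uses the standard java.lang.management API; GcSnapshot is a hypothetical helper name, and the reported times are per-collector totals that may include concurrent phases, so GC logs remain the more direct evidence.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcSnapshot {
    final long collections;
    final long totalMillis;

    private GcSnapshot(long collections, long totalMillis) {
        this.collections = collections;
        this.totalMillis = totalMillis;
    }

    /** Sums collection counts and accumulated collection time over all collectors. */
    static GcSnapshot take() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(count, millis);
    }

    public static void main(String[] args) throws InterruptedException {
        GcSnapshot before = take();
        Thread.sleep(1000); // stand-in for a measurement window under real load
        GcSnapshot after = take();
        System.out.printf("%d collections, %d ms in GC during the window%n",
                after.collections - before.collections,
                after.totalMillis - before.totalMillis);
    }
}
```

If the hiccups show up in windows where these counters do not move, the cause is likely somewhere other than GC.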

You say that your app uses netty as an embedded web server, as opposed to saying 
"my app is a web app". Does that mean that most of the work that the app does is not 
driven by HTTP requests?

Jacob Hansson

May 19, 2015, 3:10:15 PM
to mechanica...@googlegroups.com
We did a spike on this a few years ago, playing with trying to get a Netty stack allocation-free. Replacing the default HTTP decoder with one (hacked together to try it out) that used MutableString for header parsing and a few other high-allocation areas made a good dent in the frequency of collections.

While it's a spike, not production quality or feature complete by any means, perhaps our HTTP decoder could be useful as input:



