Hello ICU world,
We use icu4j through Apache Lucene to power a high-throughput indexing pipeline.
While analyzing our performance with JDK Mission Control, we noticed a particularly hot lock: the instance lock on LowercaseTransliterator.
It seems that Lucene instantiates a single Transliterator instance and shares it. That should be safe and performant, since the documentation notes that transliterators are stateless.
However, LowercaseTransliterator in particular does keep a small amount of state (a ReplaceableContextIterator and a StringBuilder result) and guards their reuse with synchronized.
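To make the pattern concrete, here is a simplified, hypothetical model of it. This is not the ICU4J source. The class and method names are illustrative. A plain StringBuilder stands in for both buffers, and char-by-char Character.toLowerCase stands in for ICU's context-sensitive case mapping.

```java
/**
 * Hypothetical model of the pattern described above (NOT ICU4J code):
 * one shared instance reuses a scratch buffer and serializes access to it
 * with a synchronized method.
 */
class SharedBufferLowercaser {
    // Reused across calls so that no buffer is allocated per invocation.
    private final StringBuilder result = new StringBuilder();

    // Every caller must acquire this instance's monitor, so threads that
    // share one instance execute this method one at a time.
    synchronized String lowercase(CharSequence text) {
        result.setLength(0);                        // reset the shared buffer
        for (int i = 0; i < text.length(); i++) {
            // Simplification: per-char mapping, not ICU's full case mapping.
            result.append(Character.toLowerCase(text.charAt(i)));
        }
        return result.toString();                   // copy out before the lock is released
    }

    public static void main(String[] args) {
        SharedBufferLowercaser shared = new SharedBufferLowercaser();
        System.out.println(shared.lowercase("Hello ICU World"));
    }
}
```

Reusing the buffer saves an allocation per call. But with one shared instance, every worker thread queues on the same monitor, which matches the hot lock we see in the profiler.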
This does reduce allocation pressure by reusing buffers. The flip side is that our 8 worker threads all trip over each other's feet contending for these shared buffers.
Would the project consider changing this to allocate temporary buffers per call, accepting that cost so that many threads can share one Transliterator efficiently? Failing that, keeping the iter and result buffers in a ThreadLocal could give the "best of both worlds" at a slight cost in complexity.
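Here is a rough sketch of the two alternatives, using the same hypothetical simplification as before. The names are illustrative, this is not ICU4J code, and a StringBuilder stands in for the iterator and result buffers the real class keeps.

```java
/**
 * Hypothetical sketches of the two proposed alternatives (NOT ICU4J code).
 * Neither needs a lock because no mutable buffer is shared between threads.
 */
class LowercaseAlternatives {

    // Option 1: allocate a fresh buffer on every call. No shared mutable
    // state, so no synchronization; the cost is one short-lived allocation.
    static final class PerCallLowercaser {
        String lowercase(CharSequence text) {
            StringBuilder result = new StringBuilder(text.length());
            for (int i = 0; i < text.length(); i++) {
                result.append(Character.toLowerCase(text.charAt(i)));
            }
            return result.toString();
        }
    }

    // Option 2: keep one reusable buffer per thread. Each thread only ever
    // touches its own buffer, so buffers are still reused but never contended.
    static final class ThreadLocalLowercaser {
        private final ThreadLocal<StringBuilder> result =
                ThreadLocal.withInitial(StringBuilder::new);

        String lowercase(CharSequence text) {
            StringBuilder buf = result.get();       // this thread's private buffer
            buf.setLength(0);                       // reset before reuse
            for (int i = 0; i < text.length(); i++) {
                buf.append(Character.toLowerCase(text.charAt(i)));
            }
            return buf.toString();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalLowercaser shared = new ThreadLocalLowercaser();
        Thread[] workers = new Thread[8];           // mirrors our 8 worker threads
        for (int t = 0; t < workers.length; t++) {
            final String input = "Worker " + t + " TEXT";
            workers[t] = new Thread(() -> System.out.println(shared.lowercase(input)));
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();                               // wait for all workers to finish
        }
        System.out.println(new PerCallLowercaser().lowercase("Per Call"));
    }
}
```

The ThreadLocal variant keeps one buffer alive per thread for as long as the thread and the transliterator live. That is the "slight complexity penalty": it is cheap for a fixed worker pool, but worth keeping in mind for long-lived pools that process occasional very large inputs.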
We'd be happy to submit a PR if either of these approaches seems acceptable.
Thank you for your consideration!
Steven