icu4j 76.1 LowercaseTransliterator synchronized bottleneck


Steven Schlansker

Feb 24, 2025, 6:03:05 PM
to icu-support
Hello ICU world,

We use icu4j through Apache Lucene to power a high-throughput indexing pipeline.
While attempting to analyze our performance using JDK Mission Control, we noticed a particularly hot lock - the instance lock on LowercaseTransliterator.

It seems that Lucene instantiates a single Transliterator instance, which should be safe and performant as the docs note that it is stateless.

However, the LowercaseTransliterator in particular seems to actually keep a small bit of state (a ReplaceableContextIterator and a StringBuilder result) and guard their reuse via synchronized.

This would indeed avoid some allocation pressure and reuse buffers, but the flip side is that our 8 worker threads all trip over each other's feet trying to mediate access to these shared buffers.
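
For readers who have not looked at the class, here is a minimal sketch of the pattern described above. It is not the ICU source: the class name SharedBufferLowercaser is hypothetical, and String.toLowerCase stands in for the real context iterator and case-mapping logic. It only shows why a single shared instance serializes its callers.

```java
import java.util.Locale;

// Hypothetical stand-in, NOT ICU code: one instance keeps a reusable buffer,
// so every call must hold the instance lock while it uses that buffer.
final class SharedBufferLowercaser {
    private final StringBuilder result = new StringBuilder();

    // All threads sharing this instance queue up on this monitor.
    public synchronized String transliterate(String input) {
        result.setLength(0); // reuse the buffer instead of allocating a new one
        result.append(input.toLowerCase(Locale.ROOT)); // placeholder for the real case mapping
        return result.toString();
    }
}
```

With several worker threads calling transliterate on the same instance, only one can be inside the method at a time, which matches the hot lock seen in the profiler.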

Would the project consider changing this, to accept the cost of allocating temporary buffers, in pursuit of allowing many threads to share a Transliterator efficiently? Failing that, a ThreadLocal iter + result buffer could be the "best of both worlds" with a slight complexity penalty.
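
The two options can be sketched roughly as follows. This is an illustration under assumptions, not a proposed patch: the class names PerCallLowercaser and ThreadLocalLowercaser are hypothetical, and String.toLowerCase again stands in for the actual transliteration work.

```java
import java.util.Locale;

// Option 1 (hypothetical sketch): allocate a fresh buffer on every call.
// There is no shared mutable state, so no lock is needed.
final class PerCallLowercaser {
    public String transliterate(String input) {
        StringBuilder result = new StringBuilder(input.length());
        result.append(input.toLowerCase(Locale.ROOT)); // placeholder for the real case mapping
        return result.toString();
    }
}

// Option 2 (hypothetical sketch): keep one reusable buffer per thread.
// There is no lock and no per-call buffer allocation, at the cost of a ThreadLocal lookup.
final class ThreadLocalLowercaser {
    private final ThreadLocal<StringBuilder> buffer =
            ThreadLocal.withInitial(StringBuilder::new);

    public String transliterate(String input) {
        StringBuilder result = buffer.get(); // this thread's private buffer
        result.setLength(0);
        result.append(input.toLowerCase(Locale.ROOT)); // placeholder for the real case mapping
        return result.toString();
    }
}
```

In the ThreadLocal variant each thread keeps its buffer alive for as long as the thread and the instance both live, which is part of the complexity penalty mentioned above.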

We'd be happy to submit a PR if either of these approaches seems acceptable.
Thank you for your consideration!

Steven

lowercase_monitor.png

Markus Scherer

Feb 24, 2025, 7:39:05 PM
to Steven Schlansker, icu-support, Andy Heninger, Alan
Hi Steven,

Sadly, we have not had real work done on the Transliterator code in a long time. We are also aware that we have a scalability issue there.

It would be great if you could experiment with one or another approach, and benchmark it, and create a PR based on what you find.

Thanks & best regards,
markus
ICU-TC chair

Steven Schlansker

Feb 25, 2025, 2:00:18 PM
to icu-support, Markus Scherer, Andy Heninger, Alan, Steven Schlansker
Thanks for the encouragement. I've opened a PR with my initial attempt:

I didn't seem to be able to create a JIRA issue to track due to lack of permissions.

Markus Scherer

Feb 25, 2025, 2:01:46 PM
to Steven Schlansker, icu-support, Andy Heninger, Alan
On Tue, Feb 25, 2025 at 11:00 AM Steven Schlansker <ste...@paywholesail.com> wrote:
I didn't seem to be able to create a JIRA issue to track due to lack of permissions.

You need to create a Jira account for creating a ticket.
It's an anti-spam measure by Jira...

tnx
markus

Steven Schlansker

Feb 25, 2025, 2:10:06 PM
to icu-support, Markus Scherer, Andy Heninger, Alan, Steven Schlansker
Ah, got it, that was my mistake while wandering through the sign-in flows :)