Strategies for dealing with large number of languages?

Alexander Bertram

unread,
Jan 3, 2024, 7:29:08 PM1/3/24
to GWT Users
Hi there,
We have been using GWT to build our product for a very long time. Recently, we've faced a new challenge as we've steadily been increasing the number of supported translations of the application to support a global audience. We're up to 24 languages, and could conceivably hit 40 in the coming year.

With all of these languages come more permutations! We've stripped away browser-specific permutations, but we do have a mobile version of the app, which means that we have 2 x 24 permutations = 48.

So far, we've addressed this problem by increasing the size of the VM that builds the app, but even with 16 vCPUs it takes 10-12 minutes to build the app. I'm experimenting with increasing to 32 vCPUs, but so far I can't get the build time to drop linearly.

Anyone else out there using alternate strategies? Is it worth trying to create some sort of distributed cache from the intermediate files the compiler writes out? Load translations dynamically at runtime instead? Or just throw more hardware at it :-)

Just curious to hear what others are doing.

Best,
Alex

Colin Alworth

unread,
Jan 3, 2024, 10:29:20 PM1/3/24
to GWT Users
First, there's no magic - the compiler is pretty memory/CPU intensive, but by the time permutations are being built, it is no longer singly threaded. Unless you've got a ton of cache and bandwidth to RAM, "not scaling linearly" makes a lot of sense no matter what your application is - you're probably able to saturate access to memory before you can keep your CPUs busy. Additionally, I'm not sure exactly what "vCPU" means these days, but the last time I checked, cloud vendors were using hyperthreading etc. to let a given "core" work on more than one thread at a time and appear as multiple cores to software. Ostensibly this lets the CPU handle more than one instruction at the same time... but you're still choking on memory access, loading more data into cache as needed to chase those pointers.

So, on a given set of hardware, you're probably able to find a limit where you are no longer scaling linearly, and another limit somewhat above that where you're no longer building any faster. 

A few questions, that either may help the discussion along, or might help you to weigh your options:
  • What happens if you profile the compiler like a normal Java application - is the heap too big (30GB is a bit of a magic number to stay below when it comes to compressed references) or too small (a bit more headroom might make it possible to get more work done)?
  • What does your CPU usage look like - is the process actually scaling to use the threads you have?
  • Any other oddities in your profiling report? When we look at long-lived GWT applications, it isn't uncommon for us to find far too many split points, which eat an amazing amount of build time to produce even if they have very little effect. The compiler can be "asked" to guess for you which split points are not worth having, but it is worth auditing this or other parts of the build to see what else could be going on.
Now some GWT specific points, rather than general JVM points:
  • What do you gain from those permutations? Taking an extreme example, what happens if you collapse the entire application, 48 permutations, into one single super-permutation - how much bigger is the app? How much slower is it? What if you just collapse mobile vs desktop (I'd guess that mobile is smaller than desktop, but smaller enough to matter?), or collapse languages in groups of, say 4-8 - do you add 20% to the total compiled size, or 1%?
  • Do you always need separate permutations? For acceptance testing you likely want the same build that would go into a production release, and maybe it is okay for those builds to take 15 minutes longer; but for "does this PR build?" or "post-merge, does main still pass tests?", you might be able to support a subset of values, or just a collapsed set, saving time but producing somewhat larger output.
  • Any other configuration you've experimented with? As you alluded to, you can split the process up when building permutations via the "gwt.jjs.permutationWorkerFactory" system property. In short, this is customizable to not just decide "all work stays in-process" or "fork another JVM per permutation (and tune memory usage carefully)", but also how many workers come from each source. The default (see PermutationWorkerFactory for specifics) is to run ThreadedPermutationWorkerFactory for the first permutations, then ExternalPermutationWorkerFactory for the next, etc. The -localWorkers option and "gwt.jjs.maxThreads" system property will further control how work is divided.
  • Javadoc for ExternalPermutationWorkerFactory indicates that it runs CompilePermsServer instances, but the isLocal method still returns true. A custom worker factory can also be written to not just write work to disk and handle it in a forked JVM, but even copy to another machine and communicate with it remotely.
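To make the collapsing idea above concrete, GWT module XML supports merging permutations directly. A sketch of what the experiments might look like in your .gwt.xml (the property names and locale values here are assumptions about your setup, not taken from your build):

```xml
<!-- Collapse every locale into a single permutation per form factor: -->
<collapse-property name="locale" values="*" />

<!-- Or collapse only a group of related locales together: -->
<collapse-property name="locale" values="es, pt, it" />

<!-- Or, for a fast "does this PR build?" CI configuration,
     collapse everything into one super-permutation: -->
<collapse-all-properties />
```

Comparing the compiled output size before and after each variant would answer the "do you add 20% or 1%?" question empirically.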
I _suspect_ you won't get too far into the weeds with this before finding a happy medium with small-enough compiled output and fast enough development/CI builds, but that at least covers where I would get started in considering this. Moving locales out of the compiled JS is definitely another option (and not too difficult to achieve, at least as long as you are focused on Constants rather than Messages), but it can be a bit harder to let the compiler be as aggressive about ensuring you keep unused output out of browsers.
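For the "moving locales out of the compiled JS" route, GWT's com.google.gwt.i18n.client.Dictionary can read strings from a per-locale JS object emitted into (or loaded by) the host page. A minimal sketch - the object name "AppStrings" and the keys are illustrative, and this only compiles inside a GWT client module, not as plain Java:

```java
import com.google.gwt.i18n.client.Dictionary;

// The host page (or a per-locale script fetched at startup) would define:
//   <script>var AppStrings = { "submit": "Enviar", "cancel": "Cancelar" };</script>
public class SideLoadedStrings {
    private static final Dictionary STRINGS = Dictionary.getDictionary("AppStrings");

    /** Returns the translated string for the given key. */
    public static String get(String key) {
        return STRINGS.get(key);
    }
}
```

Since the strings never pass through the compiler, adding a language adds no permutations - but, as noted, the compiler also can't prune unused entries for you.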

Colin Alworth

unread,
Jan 3, 2024, 10:31:27 PM1/3/24
to GWT Users
Apologies, dumb typo right in the first line: It is _no longer_ singly threaded by the time permutations are being built. I hope the rest is more accurate, please feel free to call me out on other dumb mistakes :).

Ralph Fiergolla

unread,
Jan 4, 2024, 3:55:26 AM1/4/24
to GWT Users
Hi! 
Since a big part of our string content comes from database records anyway, we decided to forgo static texts entirely and use dynamic labels instead. Initial concerns about performance and memory footprint have proven to be unfounded. That is, despite working in the context of the European Institutions, we go with a single static language and avoid the compile-time performance bottleneck of having a large number of permutations.
Cheers,
Ralph 

Frank Hossfeld

unread,
Jan 4, 2024, 8:15:59 AM1/4/24
to GWT Users
The strings of our application are located on the server. At application start, the client loads the constants from the server and loads them into a factory. The factory has a method that accepts a key and returns the value.

benefits:
- no new permutations
- change the language without reloading the application
- adding a language is just adding a new property file on the server

drawbacks:
- a slightly longer start-up time
- no support from the IDE
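A minimal plain-Java sketch of the factory described above (the class and method names are illustrative, not from the actual application):

```java
import java.util.HashMap;
import java.util.Map;

/** Holds the translations fetched from the server at application start. */
class TranslationFactory {
    private final Map<String, String> strings = new HashMap<>();

    /** Populate from the server response (e.g. a parsed property file). */
    public void load(Map<String, String> serverStrings) {
        strings.putAll(serverStrings);
    }

    /** Look up a key; fall back to the key itself so missing entries are visible. */
    public String get(String key) {
        return strings.getOrDefault(key, key);
    }
}
```

Switching languages is then just calling load with another language's map - no recompile, no new permutation.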

Leon Pennings

unread,
Jan 4, 2024, 8:16:24 AM1/4/24
to GWT Users
All,

Same as Ralph, we've always been using a custom Translator class (since 2009/2010 or so).
So, for instance: .setText(Translator.translate("Submit"))

The Translator loads all labels and keeps them in memory based on the language preference of the user, so there is only one set of language labels in memory.
Works like a charm and we've never had a problem.
The translations are on the server side in the db so that a superuser can manage translations.

rg,

Leon.

On Thursday, January 4, 2024 at 09:55:26 UTC+1, Ralph Fiergolla wrote:

Alexander Bertram

unread,
Jan 6, 2024, 3:35:39 PM1/6/24
to GWT Users
Thanks!
This is all very helpful. 

Following on what Colin suggested, I took a look at what the effect would be of collapsing all the permutations.

I found that collapsing all 23 languages into a single permutation would increase the gzipped initial download of our application from 443k to 633k, which is not a good tradeoff for us, as many of our users are working in areas with poor or limited connectivity. But there is probably room to group 2-3 languages together to reduce the number of permutations in the short term.

Also, I did some experimenting with virtual machines, and for what it's worth, moving from n1-highmem-32 to c3d-standard-8 cut our build time in half, and is much, much cheaper. The C3D, which is based on AMD's Genoa processor, seems to do better than the corresponding Intel-based C3 (and is cheaper).

However, long-term, it sounds like using side-loaded dictionaries is both feasible and much more practical. We'll start investigating how we can make the change this year.

Thanks again, and all the best for 2024.

Alex