While using the Prometheus JMX Exporter as a Java agent in our application, we observed frequent garbage collections. It turned out that the available heap space was being exhausted by the Prometheus JMX Exporter's caching.
For context: we have an exporter YAML file with about 40 rules defined, and there are many JMX MBeans that we process in order to get the required metrics. Without caching enabled for the rules, a request takes at least 1 minute to complete. In an attempt to reduce this time, we enabled caching and observed that the MatchedRulesCache object can take around 600 MB in one particular case. In our case this object could realistically grow above 1 GB, which is a huge amount of memory for this kind of process.
We identified that the main reason for the large size of MatchedRulesCache is that it contains duplicate String objects for the same MBean names. The object keeps all MBean names that the rules were previously matched against, in order to avoid expensive pattern matching later. Although each rule caches the same set of MBean names, these are in fact separate objects on the heap.
The origin of the problem was found in
https://github.com/prometheus/jmx_exporter/blob/a3dac9acee1464531cd87502579178a1fec1cc76/collector/src/main/java/io/prometheus/jmx/JmxCollector.java#L584
where, for each rule with caching enabled, a new String object is created, although an equal one could already exist in memory from a previous iteration.
The duplication factor equals the number of rules with caching enabled.
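To illustrate the duplication outside of the exporter, here is a minimal, self-contained sketch. The class `DuplicateNamesDemo`, the per-rule maps and the sample MBean values are made up for illustration and are not the exporter's actual data structures; only the concatenation expression mirrors the linked line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateNamesDemo {

    // Simulates `ruleCount` rules, each keeping its own cache keyed by match name,
    // and returns how many distinct String instances the caches hold in total.
    static int distinctKeyInstances(int ruleCount) {
        // Hypothetical sample values, not taken from a real MBean server.
        String beanName = "java.lang<type=Memory><>";
        String attributeName = "HeapMemoryUsage";
        String matchBeanValue = "1024";

        List<Map<String, Boolean>> perRuleCaches = new ArrayList<>();
        for (int rule = 0; rule < ruleCount; rule++) {
            // Runtime concatenation creates a new String object on every iteration.
            String matchName = beanName + attributeName + ": " + matchBeanValue;
            Map<String, Boolean> cache = new HashMap<>();
            cache.put(matchName, Boolean.TRUE);
            perRuleCaches.add(cache);
        }

        // IdentityHashMap compares keys with ==, so it counts objects, not contents.
        Map<String, Boolean> instances = new IdentityHashMap<>();
        for (Map<String, Boolean> cache : perRuleCaches) {
            for (String key : cache.keySet()) {
                instances.put(key, Boolean.TRUE);
            }
        }
        return instances.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct String instances: " + distinctKeyInstances(40));
    }
}
```

With 40 simulated rules the sketch counts one String instance per rule, even though all of them have identical contents.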
I experimented with interning the String,

String matchName = (beanName + attributeName + ": " + matchBeanValue).intern();

so that the JVM's string pool is utilized and the String objects are reused. The results in terms of speed and heap space used by the cache are described below:
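To make the effect of `intern()` concrete independently of the exporter, here is a small sketch. The class `InternDemo`, the helper `matchName` and the sample values are hypothetical; the sketch only shows that two equal match names become the same object once interned.

```java
public class InternDemo {

    // Builds a match name with the same concatenation as the proposed change.
    // The `intern` flag is only here to compare both variants side by side.
    static String matchName(String beanName, String attributeName,
                            String matchBeanValue, boolean intern) {
        String name = beanName + attributeName + ": " + matchBeanValue;
        return intern ? name.intern() : name;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not taken from a real MBean server.
        String bean = "java.lang<type=Memory><>";
        String attr = "HeapMemoryUsage";
        String value = "1024";

        String a = matchName(bean, attr, value, false);
        String b = matchName(bean, attr, value, false);
        System.out.println("equal contents:        " + a.equals(b));
        System.out.println("same object, plain:    " + (a == b));

        String c = matchName(bean, attr, value, true);
        String d = matchName(bean, attr, value, true);
        System.out.println("same object, interned: " + (c == d));
    }
}
```

Without interning, the two names are equal but distinct objects; with interning, both calls return the single pooled instance, so caches that store it share one copy.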

What do you think about this? Do you see any cases in which this suggestion could have negative consequences?
Kind regards,