On Thursday, April 21, 2022 at 11:57:19 AM UTC+2 Sven Selberg wrote:
Hi,
Before the upgrade we had a rather steady mean heap usage of ~25 GB. After upgrading to v3.5.1 (or once people started working the following Monday), heap usage is now consistently ~10 GB larger, a 40% increase.
On Thursday, April 21, 2022 at 12:06:25 PM UTC+2 Sven Selberg wrote:
Memory/heap profile for the past 30 days attached for reference; the upgrade was performed on April 9.
Is this expected of v3.5.1? I cannot find any reports of similar findings by others, so it might be something specific to our setup. If so, does anyone have tips regarding likely culprits?
To partially answer my own question with a heap histogram:
Feels odd that 8-9 GB of a 35 GB heap should consist of metrics-related objects...
num #instances #bytes class name (module)
-------------------------------------------------------
1: 80632689 14235174048 [B (java...@11.0.14.1)
2: 21190716 3262721856 [Ljava.util.concurrent.ConcurrentHashMap$Node; (java...@11.0.14.1)
3: 71315481 2282095392 java.lang.String (java...@11.0.14.1)
4: 2813057 2246859920 [I (java...@11.0.14.1)
5: 21309494 2216187376 java.util.concurrent.ConcurrentHashMap (java...@11.0.14.1)
6: 42338725 2032258800 com.google.gerrit.metrics.dropwizard.DropWizardMetricMaker$TimerImpl
7: 21169188 1524181536 com.google.gerrit.metrics.dropwizard.TimerImpl1
8: 21169309 1185481304 com.google.gerrit.metrics.AutoValue_Field
9: 23548142 1130310816 java.util.concurrent.ConcurrentHashMap$Node (java...@11.0.14.1)
10: 21839033 1048273584 java.util.HashMap$Node (java...@11.0.14.1)
11: 21169187 1016120976 com.google.gerrit.metrics.dropwizard.TimerImpl1$1
12: 21169283 677417984 [Lcom.google.gerrit.metrics.Field;
13: 1020330 613677080 [J (java...@11.0.14.1)
14: 23854059 572497416 java.util.Optional (java...@11.0.14.1)
15: 21287008 340592128 java.lang.Object (java...@11.0.14.1)
16: 20 319291872 [Lorg.h2.util.CacheObject;
17: 82437 308442936 [Ljava.util.HashMap$Node; (java...@11.0.14.1)
18: 899505 214747856 [Ljava.lang.Object; (java...@11.0.14.1)
19: 2260005 162328344 [Lorg.h2.value.Value;
20: 3818449 152737960 org.eclipse.jgit.lib.ObjectId
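For anyone wanting to reproduce this kind of analysis: a histogram like the one above can be captured with `jmap -histo:live <pid>`, and the share held by metrics classes can be summed with a short script. This is a sketch; the class-name filter and the sample rows are taken from the histogram above, and the column layout is assumed to match standard `jmap` output.

```python
# Sum the #bytes column of a "jmap -histo" histogram for classes whose
# fully-qualified name contains a filter string (here: Gerrit metrics classes).
import re

def metrics_bytes(histo_text, pattern="com.google.gerrit.metrics"):
    total = 0
    for line in histo_text.splitlines():
        # Histogram rows look like: "  6: 42338725 2032258800 com.google..."
        m = re.match(r"\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)", line)
        if m and pattern in m.group(3):
            total += int(m.group(2))
    return total

sample = """
 6: 42338725 2032258800 com.google.gerrit.metrics.dropwizard.DropWizardMetricMaker$TimerImpl
 7: 21169188 1524181536 com.google.gerrit.metrics.dropwizard.TimerImpl1
 9: 23548142 1130310816 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.14.1)
"""
print(metrics_bytes(sample))  # → 3556440336 (only the two metrics rows count)
```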
/Sven
--
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
More info at http://groups.google.com/group/repo-discuss?hl=en
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/373da857-9178-4e7f-8035-5b02a466d719n%40googlegroups.com.
On Thu, Apr 21, 2022 at 12:47 PM Sven Selberg <sven.s...@axis.com> wrote:
Maybe this is related to https://bugs.chromium.org/p/gerrit/issues/detail?id=15531 ?

On Thursday, April 21, 2022 at 2:11:42 PM UTC+2 Matthias Sohn wrote:
Some of the classnames in this histogram are truncated too much to be recognizable.
On Friday, April 22, 2022 at 10:04:18 AM UTC+2 Sven Selberg wrote:
It looks like the performance metrics are the culprit. metrics-reporter-prometheus reports that there are 3178 available metrics (of which 1941 are 0.0).
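Counts like "3178 available metrics, of which 1941 are 0.0" can be reproduced from the Prometheus text exposition format that metrics-reporter-prometheus serves. A sketch; the metric names in the sample payload are made up for illustration:

```python
# Count samples in Prometheus text-exposition output, and how many of them
# report 0.0 - a quick sanity check of what the metrics reporter exposes.
def count_samples(body):
    total = zero = 0
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        _, _, value = line.rpartition(" ")  # value is the last whitespace-separated field
        total += 1
        if float(value) == 0.0:
            zero += 1
    return total, zero

sample = """# HELP http_server_success_count Total successful requests
# TYPE http_server_success_count counter
http_server_success_count 1234.0
plugins_replication_latency{destination="mirror"} 0.0
caches_memory_cached{name="accounts"} 512.0
"""
print(count_samples(sample))  # → (3, 1)
```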
On Friday, April 22, 2022 at 10:41:19 AM UTC+2 Sven Selberg wrote:
For comparison, v3.4 has around 700 available metrics.
On Fri, Apr 22, 2022 at 10:41 AM Sven Selberg <sven.s...@axis.com> wrote:
[...]
Weird, I thought those metrics were disabled in 3.5.1:
On 22 Apr 2022, at 09:49, Sven Selberg <sven.s...@axis.com> wrote:
[...]
What is the configuration value for tracing.exportPerformanceMetrics in your case? Can you add some logging?
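For reference, the setting in question lives in gerrit.config. If the per-operation performance metrics turn out to be the cause, disabling them would look something like this (a sketch based on the option name used in this thread; check the documentation for your Gerrit version before relying on it):

```ini
[tracing]
  exportPerformanceMetrics = false
```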
On Wednesday, May 4, 2022 at 2:11:11 PM UTC+2 Patrick Hiesel (hie...@google.com) wrote:
Interesting. Your heap dump says you have many ConcurrentHashMap nodes, but also ConcurrentHashMaps themselves. You also have TimerImpl1 instances. This looks like it lines up with "too many metrics get registered with different names".
DropWizardMetricMaker contains two ConcurrentHashMaps (bucketed, descriptions). They map from name => TimerImpl. TimerImpl has hash maps of its own internally.
I checked Gerrit core to see if we set metric names dynamically, but we don't, so these maps should never grow out of bounds. The PluginMetricMaker allows plugins to register more metrics with a plugin prefix. Could you check if any of these set metric names dynamically (not just always the same hardcoded string)?
If you can use Eclipse MAT, I'd be interested in the keys of the hashmaps. I bet with Eclipse MAT you would find the problem quickly when looking at the object graph.
Also, can you use the API to list currently in use metrics (/config/metrics)? That would return what is in these buckets.
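The "dynamically named metrics" failure mode described above can be illustrated with a toy registry (a pure sketch, not Gerrit's actual classes): a registry keyed by metric name stays bounded when names are static, but gains one never-evicted entry per distinct name when a request-specific value is baked into the name.

```python
# Toy illustration of the leak pattern: a metric registry keyed by name
# (as in DropWizard-style registries) grows without bound if callers embed
# per-request values (user, change id, ...) in the metric name itself.
class Registry:
    def __init__(self):
        self.metrics = {}  # name -> timer object, never evicted

    def timer(self, name):
        # Returns the existing timer for this name, or registers a new one.
        return self.metrics.setdefault(name, object())

reg = Registry()
# Good: static name - 1000 calls, still a single registry entry.
for i in range(1000):
    reg.timer("http/server/rest_api_latency")
# Bad: dynamic name - one registry entry per distinct value, forever.
for i in range(1000):
    reg.timer(f"http/server/rest_api_latency/change_{i}")
print(len(reg.metrics))  # → 1001
```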
On Wed, May 4, 2022 at 2:32 PM Sven Selberg <sven.s...@axis.com> wrote:
Thanks for the analysis Patrick. What API are you referring to?
On Thursday, May 12, 2022 at 2:19:47 PM UTC+2 Patrick Hiesel (hie...@google.com) wrote:
Apparently there must be an undocumented API to look at live metrics. Maybe GET /config/server/metrics/. The code tells me there is a REST API by dropwizard metrics (see MetricCollection, ListMetrics, etc.).
On Thursday, May 12, 2022 at 4:25:39 PM UTC+2 Sven Selberg wrote:
/config/server/metrics does exist.
A lot of metrics, but nothing that suggests dynamic metric names (beyond metrics created for individual projects).
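For anyone else wanting to inspect that list: Gerrit prefixes its REST JSON responses with the `)]}'` XSSI guard, which has to be stripped before decoding. Counting the metrics returned by an (authenticated) GET of /config/server/metrics/ could then be done along these lines. A sketch; the endpoint comes from this thread, and the sample payload is made up.

```python
import json

def parse_gerrit_json(body):
    # Gerrit prefixes REST JSON responses with ")]}'" to defeat XSSI;
    # strip that first line before decoding.
    if body.startswith(")]}'"):
        body = body.split("\n", 1)[1]
    return json.loads(body)

# In practice body would come from e.g. an authenticated
# GET of https://<host>/a/config/server/metrics/ - faked here:
sample = ")]}'\n" + json.dumps({
    "http/server/rest_api_latency": {"type": "timer"},
    "caches/memory_cached": {"type": "gauge"},
})
metrics = parse_gerrit_json(sample)
print(len(metrics))  # → 2
```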
@Luca How did you count the metrics to get 420?
<TimerImpl-on-heap.png>