On 21 Dec 2017, at 18:25, Abhishek Agarwal <abhis...@gmail.com> wrote:
Hello everyone,

Recently, one of our JVMs went into a weird state in which minor GCs started kicking in almost every minute, and, most surprisingly, a single minor GC was taking as much as 100 seconds. The heap size is around 12 GB and the young gen is about 30% of that. As soon as a GC starts, CPU usage understandably rises as well. I cannot figure out how a single minor GC could take this much time. The machine does not have swapping enabled, and I also ran sar and confirmed there was no swapping during the issue. We are using the parallel collector for the young gen and CMS for the old gen. How can I go about troubleshooting this further? Any pointers will be appreciated.
On Thu, Dec 21, 2017 at 3:36 PM, Erik Svensson <er...@phlogiston.se> wrote:

If it's on a virtual host, then I've seen the same behaviour when the underlying hardware started swapping. I've also seen it when the other guests on the same hardware had high CPU demands (i.e. my host was CPU-starved). But, as Martin said: GC logging, always.

Cheers,
Erik Svensson
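A minimal version of that diagnostic setup might look as follows (HotSpot JDK 8-era flags, matching the ParNew/CMS configuration described in this thread; the log path is illustrative):

    # HotSpot (JDK 8-era) flags for detailed pause diagnostics; log path is illustrative
    -Xloggc:/var/log/myapp/gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime    # total stop-the-world time, not just GC work
    -XX:+PrintSafepointStatistics         # a long time-to-safepoint can masquerade as a long GC
    -XX:PrintSafepointStatisticsCount=1

    # Confirm there is no swap activity while a long pause is in progress
    vmstat 1        # the si/so columns should stay at 0
    sar -B 1 60     # paging statistics, one-second samples for a minute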
On Dec 21, 2017, at 9:40 PM, Josh Humphries <jh...@bluegosling.com> wrote:

Like Erik said, check that the host isn't swapping. An old-gen GC can end up scanning parts of the heap that the OS may have paged out if it encountered memory pressure.

Another issue I've seen cause this sort of thing is having GC logging turned on while there are I/O problems, because GC logging is not async. I've seen GC logging itself, to a drive that had a SCSI firmware issue, cause silly-long GC times. You can possibly corroborate this by checking the kernel logs for other signs of disk or driver issues.
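A quick way to check for that disk angle on a Linux host might be something like the following (the grep pattern is only illustrative, and logging to /dev/shm is just one way to take disk latency out of the picture while diagnosing):

    # Scan the kernel log for disk/driver trouble around the long pauses
    dmesg -T | egrep -i 'i/o error|scsi|timeout'

    # Logging to memory-backed storage takes disk latency out of the picture
    # (the log is lost on reboot, so this is a diagnostic step, not a fix)
    -Xloggc:/dev/shm/gc.log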
Hi,

If GC logging is giving you grief, you've got bigger problems, because GC logs only a very small amount of data. If this is the issue, there will be a hint of it in the GC logs themselves. At any rate, the problem is with GC, and turning off GC logging leaves you blind to any possible solutions.

Kind regards,
Kirk Pepperdine
On Dec 22, 2017, at 7:28 AM, Abhishek Agarwal <abhis...@gmail.com> wrote:

Thank you for the pointers. We have GC logs, but they didn't help much. A few corrections: the old gen size is ~20 GB and the young gen size is ~1.5 GB, and the skew between the young gen and old gen sizes has been fixed. By the way, the service is running on AWS. Here is a snippet of the GC logs from when the pause time increased to ~200 seconds:
59549.736: [GC (Allocation Failure) 59549.736: [ParNew: 1688278K->56721K(1887488K), 0.5518315 secs] 4474842K->3433110K(21480704K), 0.5526896 secs] [Times: user=1.92 sys=0.25, real=0.55 secs]
59581.163: [GC (Allocation Failure) 59581.164: [ParNew: 1734545K->43688K(1887488K), 0.4640033 secs] 5110934K->3941007K(21480704K), 0.4650333 secs] [Times: user=1.59 sys=0.26, real=0.46 secs]
59617.565: [GC (Allocation Failure) 59617.565: [ParNew: 1721512K->40500K(1887488K), 0.1154788 secs] 5618831K->3937818K(21480704K), 0.1166312 secs] [Times: user=0.44 sys=0.00, real=0.12 secs]
59645.910: [GC (Allocation Failure) 59645.911: [ParNew: 1718324K->46789K(1887488K), 0.1918905 secs] 5615642K->3944132K(21480704K), 0.1930935 secs] [Times: user=0.77 sys=0.00, real=0.20 secs]
59692.159: [GC (Allocation Failure) 59692.159: [ParNew: 1724613K->45533K(1887488K), 5.4467875 secs] 5621956K->3944000K(21480704K), 5.4480386 secs] [Times: user=21.86 sys=0.00, real=5.45 secs]
59716.348: [GC (Allocation Failure) 59716.348: [ParNew: 1723357K->49256K(1887488K), 8.2982698 secs] 5621824K->3948162K(21480704K), 8.2994285 secs] [Times: user=33.47 sys=0.00, real=8.30 secs]
59751.501: [GC (Allocation Failure) 59751.502: [ParNew: 1727080K->47156K(1887488K), 9.4546047 secs] 5625986K->3946152K(21480704K), 9.4558187 secs] [Times: user=37.88 sys=0.00, real=9.45 secs]
59779.293: [GC (Allocation Failure) 59779.293: [ParNew: 1724980K->31847K(1887488K), 98.3531267 secs] 5623976K->3949810K(21480704K), 98.3542271 secs] [Times: user=393.93 sys=0.00, real=98.35 secs]
59933.098: [GC (Allocation Failure) 59933.099: [ParNew: 1709671K->24851K(1887488K), 155.8798281 secs] 5627634K->3947185K(21480704K), 155.8810828 secs] [Times: user=629.60 sys=0.00, real=155.88 secs]
60138.148: [GC (Allocation Failure) 60138.148: [ParNew: 1702675K->22690K(1887488K), 162.4118957 secs] 5625009K->3945214K(21480704K), 162.4132521 secs] [Times: user=659.75 sys=0.00, real=162.42 secs]
60352.343: [GC (Allocation Failure) 60352.343: [ParNew: 1700514K->26504K(1887488K), 166.2566004 secs] 5623038K->3949247K(21480704K), 166.2577295 secs] [Times: user=672.08 sys=0.00, real=166.25 secs]
60576.273: [GC (Allocation Failure) 60576.274: [ParNew: 1704328K->49624K(1887488K), 168.2311204 secs] 5627071K->3972465K(21480704K), 168.2322492 secs] [Times: user=660.16 sys=0.00, real=168.23 secs]
60791.458: [GC (Allocation Failure) 60791.459: [ParNew: 1727448K->33138K(1887488K), 174.0129541 secs] 5650289K->3956199K(21480704K), 174.0139678 secs] [Times: user=680.44 sys=0.00, real=174.02 secs]
61022.223: [GC (Allocation Failure) 61022.223: [ParNew: 1710962K->29340K(1887488K), 176.6459303 secs] 5634023K->3952483K(21480704K), 176.6474561 secs] [Times: user=690.10 sys=0.00, real=176.64 secs]
61246.967: [GC (Allocation Failure) 61246.967: [ParNew: 1707164K->31914K(1887488K), 182.4714040 secs] 5630307K->3955486K(21480704K), 182.4724297 secs] [Times: user=721.68 sys=0.00, real=182.47 secs]
61483.620: [GC (Allocation Failure) 61483.621: [ParNew: 1709738K->27879K(1887488K), 187.2219554 secs] 5633310K->3951691K(21480704K), 187.2230896 secs] [Times: user=783.68 sys=0.00, real=187.23 secs]
61723.432: [GC (Allocation Failure) 61723.433: [ParNew: 1705703K->22446K(1887488K), 192.4606406 secs] 5629515K->3946498K(21480704K), 192.4618564 secs] [Times: user=721.75 sys=0.00, real=192.46 secs]
61965.560: [GC (Allocation Failure) 61965.561: [ParNew: 1700270K->30881K(1887488K), 197.1221819 secs] 5624322K->3955113K(21480704K), 197.1233286 secs] [Times: user=806.53 sys=0.00, real=197.13 secs]
62211.173: [GC (Allocation Failure) 62211.174: [ParNew: 1708705K->22526K(1887488K), 199.7864929 secs] 5632937K->3946917K(21480704K), 199.7875671 secs] [Times: user=811.19 sys=0.00, real=199.78 secs]
62464.936: [GC (Allocation Failure) 62464.937: [ParNew: 1700350K->47417K(1887488K), 203.1958095 secs] 5624741K->3972025K(21480704K), 203.1969591 secs] [Times: user=789.37 sys=0.00, real=203.20 secs]
62682.808: [GC (Allocation Failure) 62682.808: [ParNew: 1724605K->37951K(1887488K), 206.8942846 secs] 5649212K->3962723K(21480704K), 206.8948168 secs] [Times: user=821.45 sys=0.00, real=206.90 secs]
--
Regards,
Abhishek Agarwal
On Jan 2, 2018, at 9:08 AM, pboot...@gmail.com wrote:

Abhishek,

The fact that you are running on AWS is enormously significant.
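Following Erik's CPU-starvation point, one AWS-specific signal worth watching during a long pause is hypervisor steal time. A sketch, assuming the sysstat tools are installed (intervals and counts are arbitrary):

    # %steal is CPU time the hypervisor withheld from this guest
    mpstat 1 5
    sar -u 1 60     # watch the %steal column while a long ParNew pause occurs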