Java app and scheduler settings


Stefano Ciccarelli

Aug 7, 2015, 7:11:49 AM
to GAE Group
Hello to all!

Our application has always suffered from high latencies due to instances respawning continuously.
We always used custom settings in "automatic-scaling" to try to save money.

Following this post (http://googlecloudplatform.blogspot.it/2015/08/How-to-Troubleshoot-Latency-in-Your-App-Engine-Application.html) I decided to reset the "automatic-scaling" settings of our app to the default values.

So I've configured only min-idle-instances to 1 (otherwise the /_ah/warmup handler is never called). The instance class is F2.
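For reference, on the Java runtime this setup corresponds roughly to the following appengine-web.xml fragment (a sketch; the application id is a placeholder, and every other automatic-scaling value is left at its default):

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <application>your-app-id</application> <!-- hypothetical id -->
  <instance-class>F2</instance-class>
  <automatic-scaling>
    <!-- keep one idle (resident) instance so /_ah/warmup gets called -->
    <min-idle-instances>1</min-idle-instances>
  </automatic-scaling>
</appengine-web-app>
```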

After two days this is our state.

So:
- why do I have 4 two-day-old instances that aren't used? (The logs say one instance served ~20 requests today, but the other instances stopped serving yesterday.)
- why does the scheduler stop and spawn instances every 40-50 minutes if 4 instances are sitting there doing nothing?
- why is the resident instance restarted every 40-50 minutes if it is supposed to be... resident?
- is there something wrong with our code?

Thanks




nimbus-image-1438944970211.png
--
Stefano Ciccarelli 
GAE Application Division 
/ Director 
stefano.c...@mmbsoftware.it 

M.M.B. s.r.l. 
via Granarolo, 177/7 - 48018 Faenza (RA) - Italy 
tel. +39.0546.637711 - fax +39.0546.46077 
www.mmbsoftware.it - in...@mmbsoftware.it

The information contained in this communication is confidential and intended exclusively for the person(s) or entity named above. Any use, copying, or distribution of its contents by anyone other than the intended recipients is prohibited, pursuant to art. 616 of the Italian Criminal Code and Legislative Decree no. 196/03. If you have received this communication in error, please reply to this e-mail and then delete it from your system.

Nick (Cloud Platform Support)

Aug 7, 2015, 11:36:40 AM
to Google App Engine
Hi Stefano,

If I understand your post correctly, you're using automatic scaling and have only configured the min_idle_instances parameter to be 1. You're wondering why you've seen a certain request distribution pattern or dynamic instance lifetime on your app. I think that the answer to these questions lies in understanding the App Engine Autoscaler.

Leaving your scaling settings at the defaults means you give up control over max_idle_instances (that is, there is no maximum), so the Autoscaler can and will spawn idle instances to pre-empt traffic or respond to sudden load spikes, based on your app's past request load and projections from previous spikes.

You will probably want to determine your current and projected request load and set a sensible maximum for idle instances; that way you can be sure you won't be charged for more idle instances, if they are spawned, than your configuration specifies.

From the data you supplied, I can see that an instance which was active for only 34 minutes handled 1,108 requests. It's certainly possible that this load was handled well by that instance. But if the same load was observed on the other two machines at some point in the past, which the overall request graph seems to imply (especially given how spiky the load can be even within the past 6 hours, the timeframe of the graph), it makes sense that the Autoscaler would keep some instances in reserve: it routes requests to them once the load climbs, and doesn't spin them down in case a new spike arrives.

The Autoscaler scales aggressively in order to ensure that you won't be left overloaded, and it's up to your scaling configuration to make sure that this matches your expected load. I suggest you look into the max and min_pending_latency configuration options as well as min and max_idle_instances.
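As an illustration of that advice, bounding both the idle instances and the pending latencies in appengine-web.xml might look like this (a sketch only; the numbers are placeholder examples to tune against your own load, not recommendations):

```xml
<automatic-scaling>
  <min-idle-instances>1</min-idle-instances>
  <!-- cap how many idle instances you can be billed for -->
  <max-idle-instances>3</max-idle-instances>
  <!-- let requests queue briefly before new instances are spun up -->
  <min-pending-latency>300ms</min-pending-latency>
  <max-pending-latency>1.5s</max-pending-latency>
</automatic-scaling>
```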

Now, as to your question about resident instances: a resident instance is an instance which is always ready for requests, and won't spin down due to inactivity. You specified one min idle instance, so you have one resident instance. Idle instances ("resident" instances) are meant to allow your app to handle traffic while all dynamic instances (if there are any) are overloaded, and new dynamic instances need to be created, so generally they will only receive traffic during the front edge of a load spike, acting as a buffer as opposed to the main force of request-handling instances.

While resident instances won't spin down due to inactivity, they can shut down due to errors or memory limits being reached, in which case the instance will receive a request to /_ah/stop which allows it to attempt to save any delicate state it's holding. Instances can also (on a rare occasion) restart due to datacenter events and maintenance.

More commonly, though, a dynamic instance younger than the others appears either because a change was just made that deploys a new resident instance (such as a configuration change to specify 1 resident instance), or because a resident instance that is serving traffic and becomes quite busy with that task transitions into a regular serving dynamic instance as a small efficiency gain, and another resident instance is created to take its place as a buffer for even higher load spikes.

So, overall, my advice is to keep in mind that the Autoscaler will scale your instances to meet spiking loads (which it seems you have), and that your scaling configuration can help keep this within certain boundaries.

I hope this has helped explain what you're seeing, and feel free to ask any further questions you might have.

Regards,

Nick



Stefano Ciccarelli

Aug 10, 2015, 6:31:46 AM
to google-a...@googlegroups.com
Hi Nick,

your answer is really thorough, but there is still something here that doesn't add up.

Our application is a little unusual because it is used only during working days and working hours (Central European Summer Time).

For the whole weekend we had 7 instances live (almost unused), but this morning all 7 instances were killed *at the same time* and replaced by 3 fresh instances. Why? And then, every hour, the 3 instances were killed *at the same time* and replaced by 3 fresh instances. Again, why?

During the "kill and replace" the latency grows and our users get a bit upset.

Why does the scheduler keep killing *all* the instances and replacing them with new ones?

When I posted the first screenshot, last week, I was paying for 7-8 idle instances, but 4 of them were never used, while 3 were killed and replaced every hour, and the latency increased every hour, because during the "kill and replace" the first 4 instances were never used.

I'm not sure I'm expressing myself well, especially because English is not my first language, but from what I can see the behavior of the scheduler with our application is not as the documentation describes, nor as you describe: even though we pay for idle instances, every hour (or less) the latency increases, which is exactly what they are supposed to prevent.

Why do I have to pay for idle instances (most notably the resident ones) if they are restarted every hour or less and the users get high latency?





--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/b4e88571-6790-4fb7-b129-afb82d27821b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
nimbus-image-1439190255158.png
nimbus-image-1439192845701.png
nimbus-image-1439199992714.png
nimbus-image-1439197845896.png

Stefano Ciccarelli

Aug 10, 2015, 11:15:45 AM
to google-a...@googlegroups.com
Hi Nick,

the two screenshots attached to this post are perfect for explaining what happens to my app every day: one instance was started 3 hours ago and has served only 22 requests, and every hour (or so) 3 instances are killed and replaced by fresh ones, all at the same time, so all the requests go to the new instances (high latencies).

PS: I have no errors in my app.



nimbus-image-1439215777290.png
nimbus-image-1439219328802.png

Nick (Cloud Platform Support)

Aug 10, 2015, 7:05:36 PM
to Google App Engine
Hi Stefano,

This does indeed seem very odd; thank you for clarifying the behaviour. It's possible that some part of your code is catching the errors and not logging them, or that your logging level is not set appropriately, but these are small possibilities.

In this case, I think the best thing for you to do would be to open a public issue tracker issue with your app id, a description of the problem, and a time-frame to investigate; hopefully the issue can then be looked at by engineering to determine whether there's a problem in the platform or in your app. We monitor this forum, Stack Overflow, and the public issue tracker, so your issue should see action within a few days of being posted.

PS - your English is quite good, and you've done a very thorough job of reporting your issue. I wish you the best of luck with your application.

Nick



Mauricio Lumbreras

Aug 11, 2015, 5:43:59 PM
to Google App Engine

Hello,
we are suffering from mostly the same issue: we have resident instances, and without any reason or error the instance is killed and restarted. There is no /_ah/stop message in the logs, and we do not run out of memory. We run B8 backends and Java.
I opened a very long ticket with GAE support a couple of months ago, and they claim App Engine 1.9.25 will solve it in some manner, but they also explained that there is a scheduler refactoring in progress that won't be deployed for another 6 months....
We are fighting with this issue: instances with no traffic live for hours, but the moment there is sustained (yet reasonable) traffic, the instance is killed and restarted every 12 or 15 minutes, in a very regular fashion. These restarts are not linked to any maintenance or singular condition; this occurs every day, at least 20 times or so per day.

Regards
Mauricio

Nick (Cloud Platform Support)

Aug 12, 2015, 2:24:14 PM
to Google App Engine
Hi Mauricio,

I'd be happy to look further into this issue for you, especially since it seems related to the issue Stefano is experiencing. Could you provide the case reference number if you have it?

Thanks,

Nick

Stefano Ciccarelli

Aug 13, 2015, 6:51:50 AM
to Google App Engine

The scheduler is completely broken!

In the last hour the scheduler has kept respawning instances every minute or less! The latency is horrible and the costs are rising!



Stefano Ciccarelli

Aug 13, 2015, 6:56:43 AM
to Google App Engine
I filed a production issue because our app is really unusable!


Nick (Cloud Platform Support)

Aug 13, 2015, 7:40:51 PM
to Google App Engine
Hi Stefano,

I notice that you've opened a support case for this issue as well. We'll update this thread and the public issue tracker with any results or solutions.

Best wishes,

Nick



Nick (Cloud Platform Support)

Aug 13, 2015, 7:43:37 PM
to Google App Engine
Quick correction: I don't believe you have opened a support case. We'll continue to work with the production issue you've filed in the public issue tracker. Please note that a support case guarantees prompt responses and troubleshooting, while a public issue tracker issue, even one marked as production, is not meant for 1-on-1 support; it is meant for reporting an issue occurring in production.

This is a pretty serious issue, so we'll be investigating it shortly. However, it's also possible that symptoms like this come from a code error, such as instances using up too much private memory and getting restarted for it. The investigation will have to determine what solutions can be recommended or implemented on either side.



Nick (Cloud Platform Support)

Aug 17, 2015, 7:28:07 PM
to Google App Engine
For the record of any future readers: an investigation of the issue Stefano reported appears to have found that the instance shutdowns were caused by low instance specs combined with a memory-heavy CSV import job.

Mauricio, I've located the support ticket you have open, and it appears that you're currently receiving help there for a similar but unrelated issue. I encourage you to follow up there. 



Stefano Ciccarelli

Aug 18, 2015, 3:29:13 AM
to Google App Engine
Nick, I completely disagree.

The CSV import job is not that heavy: it streams a file from GCS and enqueues a task for every row. We designed that job with the GAE rules in mind. Moreover, those jobs are very few compared to the app traffic and are handled by a backend module, yet my instances restart on the default module.
When the app is in use, the instances restart *at the same time* every hour or less. I still have no errors, and I still have no evidence that our code is wrong. We are sure we have followed all the rules and all the docs. We tried setting the instance class to F4, but it was the same. If the app has very little traffic, the instances last for days; as traffic rises, the instances begin restarting.

I have not responded to the issue because I found it shallow and offensive: it seems someone looked at the logs, saw a URL like "importcsv", and thought they had found the gold mine. The issue mentions a warning in the log about a concurrency issue as evidence of our fault; well, that warning is logged when our 'entity group limiter' starts limiting the write rate on a specific entity group to consume fewer resources.

Now I'm on vacation, the instances keep restarting, I'm very upset, and in September I will decide what role GCP plays in our business.


Jeff Schnitzer

Aug 18, 2015, 10:39:33 PM
to Google App Engine
It sounds like the root problem is "OOM errors fail to produce good log messages".

What CSV parser are you using? If you're on Java, most of them are crap. Jackson has a pretty good CSV plugin, but you probably still need to be very careful about streaming - it's really easy to blow up the heap with large CSV imports. I've had this problem before.

Jeff

Stefano Ciccarelli

Aug 19, 2015, 3:09:44 AM
to Google App Engine
Jeff, I don't think so.

I use a simple BufferedReader, and for every readLine I do a simple String.split.
Besides, those jobs parse very small files and take from 5 to 10 seconds, sometimes 20 seconds, but that's very unusual; the files are very small (usually less than 10 KB). Those jobs run on a B1 instance in a dedicated module. My problem is that my front-end F2 instances restart every hour or less, all at the same time, even instances that aren't serving traffic. Sometimes, like now, I have instances many days old that aren't receiving traffic, while the instances that do receive traffic are restarted every hour or less.
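For context, the parsing loop described above is essentially the following (a self-contained sketch; note that a plain String.split cannot handle quoted fields containing commas, which is fine for simple files like these):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class CsvSplit {

    // Read the source line by line and split each line into fields.
    // Memory stays bounded by the longest line, not the file size.
    static List<String[]> parse(BufferedReader reader) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            rows.add(line.split(",", -1)); // -1 keeps trailing empty fields
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader =
                new BufferedReader(new StringReader("a,b,c\n1,2,3"));
        List<String[]> rows = parse(reader);
        System.out.println(rows.size() + " rows, first field: " + rows.get(0)[0]);
    }
}
```

In the real job each row would be enqueued as a task instead of collected in a list, so nothing accumulates on the heap.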

Look at the screenshot: the 2 instances that are 18 hours old served their last request some hours ago. They are like ghost instances; they aren't used, and all the traffic is handled by the 3 fresh instances. When the 3 instances were started I got a latency spike, because the other 2 instances weren't handling traffic.







Stefano Ciccarelli

Aug 19, 2015, 9:00:39 AM
to Google App Engine
This screenshot, taken a few minutes ago, is better than a thousand words.

And the issue was closed as "worksasintended"... I think it works badly, very badly!

Jeff Schnitzer

Aug 19, 2015, 2:18:24 PM
to Google App Engine
Are you getting billed for the "zombie" instances?

The real issue here is that you _can't tell_ whether your restarts are caused by OOM conditions. Google says they are; you don't believe them. There needs to be something in the logs to indicate the condition; servers should not die without explanation. I would file an issue against that - because then, if the logs don't indicate OOM, you have something to push back with.

Maybe relevant anecdote: A long time ago (years) I experimented with the "resident instances" setting and found that it did horrible things to my user experience (lots of user facing cold starts). I don't know if it was a gap in my understanding of this feature or genuine misbehavior, but I quickly abandoned it and haven't touched it since. Try the default scheduler behavior and see if that improves things. Of course, this advice is based on seriously stale information, so YMMV.

Jeff
