H12 errors, blocked dynos and Heroku website claims.

H12 errors, blocked dynos and Heroku website claims. Tim W 2/16/11 8:42 AM
The Heroku website claims:

http://heroku.com/how/dyno_grid_last#3
"If a dyno is unresponsive for any reason (user bugs, long requests,
or high load), other requests will be routed around it."

In my experience, this does not seem to be the case. We have several
admin features in our app that, when requested with certain params,
can take longer than 30s to run. (I am working on ways to get these in
check and move them into the background.) When a user trips one of
these long-running requests, Heroku appears to queue additional
requests to that dyno, and those requests time out even though there
are plenty of other dynos available to handle them.

Is the statement on the Heroku website true or false? It does not
appear that Heroku actively monitors the dynos to see if they are busy
with a long-running request. Is there a better way to handle this
situation?

Thanks..
-tim
Re: H12 errors, blocked dynos and Heroku website claims. Neil Middleton 2/16/11 8:45 AM
The dyno is still running the long request, successfully.  It's only the routing mesh that has returned the timeout error to the user.  Therefore, the dyno is still in your 'grid' and ready for new requests.

I blogged about something very similar a couple of weeks back:  http://neilmiddleton.com/avoiding-zombie-dynos-with-heroku


Neil Middleton

--
You received this message because you are subscribed to the Google Groups "Heroku" group.
To post to this group, send email to her...@googlegroups.com.
To unsubscribe from this group, send email to heroku+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/heroku?hl=en.

Re: H12 errors, blocked dynos and Heroku website claims. Tim W 2/16/11 8:55 AM
Thanks, I will give rack-timeout a try.

So it seems the routing mesh is not as sophisticated as Heroku lets
on?

Re: H12 errors, blocked dynos and Heroku website claims. Neil Middleton 2/16/11 8:57 AM
It is, but you have a healthy dyno.  If the dyno crashes, or hangs somehow, it gets removed.


Neil Middleton

Re: H12 errors, blocked dynos and Heroku website claims. Tim W 2/16/11 9:29 AM
I guess I am just used to using Passenger, which uses a global queue,
making a single long-running request a non-issue.



Re: H12 errors, blocked dynos and Heroku website claims. Neil Middleton 2/16/11 9:36 AM
AFAIK Passenger has a similar concept with running processes (a default of six running processes, which are comparable to six dynos).

The situation you describe should have the same results on Passenger as Heroku.  More info on Passenger here: http://www.modrails.com/documentation/Users%20guide%20Apache.html#_resource_control_and_optimization_options


Neil Middleton

Re: H12 errors, blocked dynos and Heroku website claims. Tim W 2/16/11 9:46 AM
Passenger... imho handles this better than Heroku

http://www.modrails.com/documentation/Users%20guide%20Apache.html#PassengerUseGlobalQueue

>>If global queuing is turned on, then Phusion Passenger will use a global queue that’s shared between all backend processes. If an HTTP request comes in, and all the backend processes are still busy, then Phusion Passenger will wait until at least one backend process is done, and will then forward the request to that process.<<

The default is on.


Re: H12 errors, blocked dynos and Heroku website claims. Neil Middleton 2/16/11 9:50 AM
Is this not identical to what Heroku provides, though?  Your global queue is your application's dynos, and the routing mesh will send requests to whichever dynos are idle, the wait being the backlog.

The only difference I can see is that Passenger won't, by default, spit back any requests that take longer than 30 seconds.


Neil Middleton

Re: H12 errors, blocked dynos and Heroku website claims. Tim W 2/16/11 10:01 AM
It is not identical to what Heroku is providing. The Heroku mesh
seems to blindly send a request to a dyno, no matter the current
status of that dyno: the queue is at the dyno level. Passenger holds
back the request until a process is available.

With Passenger you do not end up in the situation noted below, whereas
with Heroku you do.
(Request Y gets served OK with Passenger; with Heroku, request Y gets
the H12 error.)

Quoted from the Passenger docs (this is what happens if you have that
feature turned off in Passenger, and what always happens with Heroku):
----------------------------------------------------------------------
The situation looks like this:

Backend process A:  [*     ]  (1 request in queue)
Backend process B:  [***   ]  (3 requests in queue)
Backend process C:  [***   ]  (3 requests in queue)
Backend process D:  [***   ]  (3 requests in queue)
Each process is currently serving short-running requests.

Phusion Passenger will forward the next request to backend process A.
A will now have 2 items in its queue. We’ll mark this new request with
an X:

Backend process A:  [*X    ]  (2 requests in queue)
Backend process B:  [***   ]  (3 requests in queue)
Backend process C:  [***   ]  (3 requests in queue)
Backend process D:  [***   ]  (3 requests in queue)

Assuming that B, C and D still aren’t done with their current request,
the next HTTP request - let’s call this Y - will be forwarded to
backend process A as well, because it has the least number of items in
its queue:

Backend process A:  [*XY   ]  (3 requests in queue)
Backend process B:  [***   ]  (3 requests in queue)
Backend process C:  [***   ]  (3 requests in queue)
Backend process D:  [***   ]  (3 requests in queue)

But if request X happens to be a long-running request that needs 60
seconds to complete, then we'll have a problem. Y won't be processed
for at least 60 seconds. It would have been better if Y were forwarded
to process B, C or D instead, because they only have short-living
requests in their queues.
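The difference can be sketched with a toy simulation (illustrative only; this is not Heroku's or Passenger's actual scheduling code, just a model of the two queueing policies):

```ruby
# Toy, single-threaded model of the two policies. Each request has a
# duration in seconds; all requests arrive at once. Returns each
# request's completion time.
def completion_times(durations, dynos:, global_queue:)
  busy_until = Array.new(dynos, 0)
  durations.each_with_index.map do |duration, k|
    # Global queue: the least-loaded (soonest-idle) dyno takes the next
    # request. Per-dyno queues: requests are dealt out round-robin
    # regardless of whether a dyno is stuck on a long request.
    i = global_queue ? busy_until.index(busy_until.min) : k % dynos
    busy_until[i] += duration
  end
end

# One 60s request followed by three 1s requests, on 2 dynos:
completion_times([60, 1, 1, 1], dynos: 2, global_queue: true)
# => [60, 1, 2, 3]
completion_times([60, 1, 1, 1], dynos: 2, global_queue: false)
# => [60, 1, 61, 2]
```

With the shared queue every short request finishes within seconds; with per-dyno assignment, the third short request lands behind the 60-second one and would blow past a 30-second router timeout.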




Re: H12 errors, blocked dynos and Heroku website claims. Neil Middleton 2/16/11 11:12 AM
Although the symptoms that you are seeing may not indicate it, the two systems are the same.  There is a queue backlog, and the dynos pick up the next request from that backlog when they become idle, as described here:  http://devcenter.heroku.com/articles/key-concepts-performance  (esp. the part about backlog).

I've seen many instances on my applications which indicate this to be an accurate description.

Although I have no real detail about your application (number of dynos, etc.), it would appear that what you are seeing could be a different problem.

 
Neil Middleton
http://about.me/neilmiddleton
Re: H12 errors, blocked dynos and Heroku website claims. Tim W 2/16/11 12:13 PM
Wow.. that does say the mesh will hold a request until an app is
available, but that is not how Heroku is currently operating..

I'll write this up a bit better and send it off to Heroku support,
but here is how to duplicate it...

---
Created a new app: http://h12-test2.heroku.com/
A simple Sinatra app that responds to two paths, / and /wait (/wait
sleeps for 300 seconds).

=== h12-test2
Web URL:        http://h12-test2.heroku.com/
Git Repo:       git@heroku.com:h12-test2.git
Dynos:          2
Workers:        0
Repo size:      1M
Slug size:      1M
Stack:          bamboo-ree-1.8.7
Data size:      (empty)
Addons:         Expanded Logging, Shared Database 5MB


2 dynos available...
If I have one tab open with /wait then, according to Heroku, I should
not get a timeout on / if I just hit reload in my browser...
but I do...


2011-02-16T12:07:03-08:00 heroku[web.2]: State changed from starting to up
2011-02-16T12:07:04-08:00 heroku[web.1]: State changed from starting to up
2011-02-16T12:09:44-08:00 heroku[router]: Error H12 (Request timeout) -> GET h12-test2.heroku.com/wait dyno=web.2 queue=0 wait=0ms service=0ms bytes=0
2011-02-16T12:09:45-08:00 heroku[router]: GET h12-test2.heroku.com/favicon.ico dyno=web.1 queue=0 wait=0ms service=7ms bytes=221
2011-02-16T12:09:45-08:00 app[web.1]: xx.223.127.66, 10.110.34.145 - - [16/Feb/2011 12:09:45] "GET /favicon.ico HTTP/1.1" 404 18 0.0009
2011-02-16T12:09:45-08:00 heroku[nginx]: GET /favicon.ico HTTP/1.1 | xx.223.127.66 | 252 | http | 404
2011-02-16T12:09:50-08:00 heroku[router]: Error H12 (Request timeout) -> GET h12-test2.heroku.com/ dyno=web.2 queue=0 wait=0ms service=0ms bytes=0
2011-02-16T12:09:50-08:00 heroku[nginx]: GET / HTTP/1.1 | xx.223.127.66 | 3364 | http | 502
2011-02-16T12:10:01-08:00 heroku[router]: GET h12-test2.heroku.com/ dyno=web.1 queue=0 wait=0ms service=1ms bytes=238
2011-02-16T12:10:01-08:00 app[web.1]: xx.223.127.66, 10.101.29.42 - - [16/Feb/2011 12:10:01] "GET / HTTP/1.0" 200 59 0.0004
2011-02-16T12:10:01-08:00 heroku[nginx]: GET / HTTP/1.1 | xx.223.127.66 | 269 | http | 200
2011-02-16T12:10:20-08:00 heroku[router]: Error H12 (Request timeout) -> GET h12-test2.heroku.com/favicon.ico dyno=web.2 queue=0 wait=0ms service=0ms bytes=0
2011-02-16T12:10:20-08:00 heroku[router]: GET h12-test2.heroku.com/favicon.ico dyno=web.1 queue=0 wait=0ms service=8ms bytes=221
2011-02-16T12:10:20-08:00 app[web.1]: xx.223.127.66, 10.110.34.145 - - [16/Feb/2011 12:10:20] "GET /favicon.ico HTTP/1.0" 404 18 0.0005
2011-02-16T12:10:20-08:00 heroku[nginx]: GET /favicon.ico HTTP/1.1 | xx.223.127.66 | 3364 | http | 502
2011-02-16T12:10:21-08:00 heroku[nginx]: GET /favicon.ico HTTP/1.1 | xx.223.127.66 | 252 | http | 404
2011-02-16T12:10:33-08:00 heroku[router]: Error H12 (Request timeout) -> GET h12-test2.heroku.com/ dyno=web.2 queue=0 wait=0ms service=0ms bytes=0
2011-02-16T12:10:33-08:00 heroku[nginx]: GET / HTTP/1.1 | xx.223.127.66 | 3364 | http | 502
2011-02-16T12:10:44-08:00 app[web.1]: xx.223.127.66, 10.108.18.220 - - [16/Feb/2011 12:10:44] "GET / HTTP/1.0" 200 59 0.0004

The dyno web.2 is busy and web.1 is open... yet requests get sent to
web.2 and hit the H12 timeout. Why?


-tim


Re: H12 errors, blocked dynos and Heroku website claims. Oren Teich 2/16/11 4:56 PM
the timeout is only on the routing side.  

If you have 2 dynos, and 1 is running "forever", then 50% of your requests to your app will timeout.  

This is expected behavior on Heroku today.  

I strongly encourage you to use the rack-timeout gem to ensure that your dyno terminates after 30 seconds so you don't get this odd behavior.
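For anyone following along, the idea behind rack-timeout can be sketched as a small piece of Rack middleware (a hand-rolled illustration only; the real gem is more careful about cleanup and error reporting):

```ruby
require "timeout"

# Sketch of what a Rack timeout middleware does: abort any request
# that outlives the limit, so the dyno frees up before Heroku's 30s
# router timeout fires. Illustrative only, not the rack-timeout gem.
class RequestTimeout
  def initialize(app, seconds = 25)
    @app = app
    @seconds = seconds
  end

  def call(env)
    Timeout.timeout(@seconds) { @app.call(env) }
  rescue Timeout::Error
    # The slow request is cut off instead of silently blocking the dyno.
    [503, { "Content-Type" => "text/plain" }, ["Request timed out\n"]]
  end
end
```

With the actual gem, you would add rack-timeout to your Gemfile and `use Rack::Timeout` in config.ru rather than rolling your own.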


-- 
Oren Teich

Re: H12 errors, blocked dynos and Heroku website claims. Tim W 2/16/11 6:57 PM
I am quite surprised this is expected behavior.

The documentation on the site is now out of date and highly
misleading. The entire two paragraphs of the backlog section are
untrue. ( http://devcenter.heroku.com/articles/key-concepts-performance )



Re: H12 errors, blocked dynos and Heroku website claims. Adam Wiggins 2/16/11 7:40 PM
Tim,

You're correct, the routing mesh does not behave in quite the way
described by the docs.  We're working on evolving away from the global
backlog concept in order to provide better support for different
concurrency models, and the docs are no longer accurate.  The current
behavior is not ideal, but we're on our way to a new model which we'll
document fully once it's done.

In the meantime, you shouldn't have any difficulties as long as you
keep your web requests short (less than about 500ms), which is good
practice anyway.

Sorry for any difficulty or confusion, and thanks for digging in and
providing such a detailed analysis.

Adam

Re: H12 errors, blocked dynos and Heroku website claims. Christos Zisopoulos 2/17/11 1:26 AM
On 17 Feb 2011, at 04:40, Adam Wiggins wrote:

> In the meantime, you shouldn't have any difficulties as long as you
> keep your web requests short (less than about 500ms), which is good
> practice anyway.

Nice to know you are working on it. I was bitten by this badly[1], because I assumed the docs were correct, although my observation of the logs hinted otherwise.

One scenario where you can't keep your requests under 30s is file upload with post-processing.

If, for example, you have a 2MB image that you need to upload, post-process into 4 versions, and then upload to S3, you could be looking at 30-60s per request.

Interface-wise this is acceptable (you can fake a file upload progress bar), but obviously it will either fail, or leave a dyno blocked yet still available as far as the routing mesh is concerned.

If you have 5 dynos serving your app really fast (200ms), and 3 get blocked in a file upload operation for a long time (40s), then New Relic is going to have a field day sending you site downtime alerts...

The rack-timeout solution could mitigate the 'false' New Relic alerts, but obviously your file uploads will fail intermittently.

-christos

[1] http://bit.ly/eqK79C


Re: H12 errors, blocked dynos and Heroku website claims. Mike 2/17/11 7:15 AM
In this situation I would think you'd want to upload to S3 directly and then fire off a background worker to do the resizing for you. That would keep your web dynos available to serve requests.

Mike


Re: H12 errors, blocked dynos and Heroku website claims. Adam Wiggins 2/17/11 5:16 PM
On Thu, Feb 17, 2011 at 7:15 AM, Michael Abner <mike....@gmail.com> wrote:
> In this situation I would think you'd want to upload to S3 directly and then fire off a background worker to do the resizing for you. That would keep your web dynos available to serve requests.

Exactly right.  Uploads straight to S3 from the browser are fantastic;
you should use them if you're doing any kind of substantial upload
processing.
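The shape of that enqueue-and-return pattern, as a toy sketch (the names here are made up for illustration; on Heroku the queue would live in the database or Redis, drained by a worker dyno running something like delayed_job, not by an in-process loop):

```ruby
# In-process stand-in for a real job queue (hypothetical sketch).
JOBS = Queue.new

# Web handler: the browser has already uploaded the file straight to
# S3, so we only enqueue the slow post-processing and return in
# milliseconds instead of blocking the web dyno for 30-60s.
def handle_upload(s3_key)
  JOBS << s3_key
  "queued #{s3_key}"
end

# Worker: pulls jobs and does the slow resizing off the web dyno.
def drain_jobs(results)
  results << "resized #{JOBS.pop}" until JOBS.empty?
end
```

The web request never waits on image processing, so the routing mesh always sees a responsive dyno.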

Moral of the story: never, ever block the web process for long (> 1
second) periods of time - there's always a better way.  Follow this
practice and you and your users will always be happy campers.

I certainly extend my apologies to you, Christos, and anyone else
inconvenienced by the incorrect docs.  I've tweaked the performance
concepts doc to remove some of the more glaring inaccuracies, which
will hopefully tide us over until we can rewrite the whole thing to
cover the broader array of concurrency models we can handle.

Adam