Taskqueue Failing with Error Code 123 Before 10 Minute deadline


Michael Sander

Jan 19, 2016, 9:01:51 PM
to Google App Engine
Hi All,

Over the past few days I've been getting a lot of Error Code 123 in my Taskqueue. I understand that this error indicates a deadline, but task queue requests are supposed to live for 10 minutes, and I get this error well before that. It is difficult to debug because these requests have practically no logging information in them. My app normally emits a lot of logs, so either GAE is dropping my logs or this crash occurs outside of my code.

  1. 2016-01-19 20:36:45.991 /_ah/queue/deferred 500 190547ms 0kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=18
    0.1.0.2 - - [19/Jan/2016:17:36:45 -0800] "POST /_ah/queue/deferred HTTP/1.1" 500 0 "http://docketupdate.appspot.com/_ah/queue/deferred" "AppEngine-Google; (+http://code.google.com/appengine)" "docketupdate.appspot.com" ms=190547 cpu_ms=0 cpm_usd=0.000065 queue_name=default task_name=37491651647795200701 exit_code=123 app_engine_release=1.9.31 instance=00c61b117c4864f5a842a8ae7ca496d20dbaa9f8
  2. E2016-01-19 20:36:45.991
    Process terminated because the request deadline was exceeded. (Error code 123)

Here's another:


  1. 2016-01-19 19:48:58.581 /_ah/queue/deferred 500 84488ms 0kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=18
    0.1.0.2 - - [19/Jan/2016:16:48:58 -0800] "POST /_ah/queue/deferred HTTP/1.1" 500 0 "http://docketupdate.appspot.com/_ah/queue/deferred" "AppEngine-Google; (+http://code.google.com/appengine)" "docketupdate.appspot.com" ms=84489 cpu_ms=0 queue_name=indexsearch task_name=5718499252586827456 exit_code=123 app_engine_release=1.9.31 instance=00c61b117cbebad1360673a661a18cf0e69c7cb0
  2. E2016-01-19 19:48:58.580
    Process terminated because the request deadline was exceeded. (Error code 123)


And another:

  1. 2016-01-19 19:48:58.583 /_ah/queue/deferred 500 105886ms 0kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=18
    0.1.0.2 - - [19/Jan/2016:16:48:58 -0800] "POST /_ah/queue/deferred HTTP/1.1" 500 0 "http://docketupdate.appspot.com/_ah/queue/deferred" "AppEngine-Google; (+http://code.google.com/appengine)" "docketupdate.appspot.com" ms=105887 cpu_ms=0 cpm_usd=0.009467 queue_name=officialdocumentqueue task_name=8648504684653226301 exit_code=123 app_engine_release=1.9.31 instance=00c61b117cbebad1360673a661a18cf0e69c7cb0
  2. E2016-01-19 19:48:58.583
    Process terminated because the request deadline was exceeded. (Error code 123)

Nicholas (Google Cloud Support)

Jan 20, 2016, 10:27:01 AM
to Google App Engine
Thank you for posting your error logs. According to 'Google Cloud Status', there are no known outages for Task Queues or App Engine modules at the moment.

As you've mentioned, the default deadline for automatically scaled modules is 10 minutes. This is documented in 'Task deadlines'. Please note that the deadline can be as high as 24 hours for modules using manual or basic scaling. Also note that the DeadlineExceededException is raised when the task's execution time nears the deadline, allowing you to save your work or log something before the deadline is actually exceeded. That being said, more information is required to get a better idea of what might really be the culprit here:
  1. What is the .yaml configuration for the related task-handling modules? (feel free to censor any project-sensitive data)
  2. What is the time frame? (When did this start; is it ongoing; if not, when did it end)
  3. Have you encountered any other timeout or deadline exceeded errors from other modules, including those unrelated to task queues?
  4. How is your queue.yaml configured for the queues experiencing these issues?
  5. Do retries or increased module scalability help reduce the frequency of these errors?
Though the thread 'Suddenly App Engine not Serving Anymore: Error Code 123' seems to describe similar errors, the scope of those errors and which systems caused them is unclear in that thread, so I would advise continuing here going forward.
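The checkpoint-before-deadline pattern mentioned above can be sketched roughly as follows. In a real Python 2.7 handler the exception would be google.appengine.runtime.DeadlineExceededError; the local class and the raise_at parameter here are stand-ins so the sketch is self-contained and testable outside the SDK:

```python
class DeadlineExceededError(Exception):
    """Stand-in for google.appengine.runtime.DeadlineExceededError."""


def handle_task(items, do_work, raise_at=None):
    """Process items, checkpointing progress if the deadline fires.

    raise_at simulates the runtime raising the exception mid-task.
    Returns (processed, remaining) so the remainder can be re-enqueued.
    """
    done = []
    try:
        for i, item in enumerate(items):
            if raise_at is not None and i == raise_at:
                raise DeadlineExceededError()
            do_work(item)
            done.append(item)
    except DeadlineExceededError:
        # Save partial progress and hand back the rest for a new task
        # before the process is actually terminated.
        return done, items[len(done):]
    return done, []
```

In a real handler, "hand back the rest" would mean logging progress and re-enqueuing the remaining items as a fresh task.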

Christian F. Howes

Jan 20, 2016, 2:19:12 PM
to Google App Engine
Is there any chance that deferred tasks are somehow treated differently than regularly queued tasks? In theory I would think they are just tasks with a special handler, but could that be why you are hitting an incorrect deadline?

Michael Sander

Jan 20, 2016, 2:19:33 PM
to Google App Engine
Hi Nicholas,

Appreciate you jumping in. I purchased a Silver support package and that team has said they are on it (but no answer so far).

This is happening in my default module, here is the relevant section from app.yaml:
runtime: python27
threadsafe: yes
api_version: 1

instance_class: F4
automatic_scaling:
  min_idle_instances: 2
  max_idle_instances: 15
  max_pending_latency: 150ms

includes:
- libs/mapreduce/include.yaml

builtins:
- remote_api: on
- admin_redirect: on

inbound_services:
- warmup
- mail_bounce
- mail

2. I started noticing this on Jan. 18 at 7:30 PM ET. At that time it was a very big problem: I was getting thousands of these errors and my site was unresponsive. Since then, I have (1) lowered the rate of my task queues, (2) lowered the rate of cron jobs, and (3) increased min_idle_instances from 1 to 2. The problem has decreased, but has not gone away.

3. I'm not sure if this is specific to Taskqueues. Below are two examples of user requests failing with the same error message. Notice that the request times are 100 and 87 seconds respectively, so these would have hit the timeout. However, what is suspicious is that there are no logs printed other than the error message. Both of these requests should print out quite a bit of logging information.
  1. 2016-01-20 13:47:52.869 /get_creditcard.ajax?_=1453315570813 500 100831ms 0kb Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36 module=default version=18
    209.155.152.66 - - [20/Jan/2016:10:47:52 -0800] "GET /get_creditcard.ajax?_=1453315570813 HTTP/1.1" 500 0 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36" "www.docketalarm.com" ms=100832 cpu_ms=0 cpm_usd=0.000074 exit_code=123 app_engine_release=1.9.31 instance=00c61b117ca4e745b9e7703d6d0cad31c835274c
  2. E2016-01-20 13:47:52.869
    Process terminated because the request deadline was exceeded. (Error code 123)
  3. 2016-01-20 13:03:02.143 /cases/TTAB/91119978/KELLOGG_COMPANY_v._PACIFIC_GRAIN_PRODUCTS_INC./docs/16.pdf?download=true 500 87681ms 0kb Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2 module=default version=18
  4. E2016-01-20 13:03:02.143
    Process terminated because the request deadline was exceeded. (Error code 123)

4. The issues mostly arise on queues that are running relatively quickly (> 1.0/sec)

5. Yes, increasing min_idle_instances helped, but did not eliminate the issue. See #2 above.
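For reference, throttling one of these faster queues happens in queue.yaml; the entry below is a representative sketch with hypothetical values, not my exact configuration:

```yaml
queue:
- name: indexsearch
  rate: 1/s                     # slow the queue down while debugging
  bucket_size: 5
  max_concurrent_requests: 10   # cap simultaneous dispatches per queue
  retry_parameters:
    task_retry_limit: 5
    min_backoff_seconds: 10
```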

Thanks,

Michael Sander

Jan 20, 2016, 2:34:43 PM
to Google App Engine
Hi Christian - Good thought. I took a look at the code in C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\deferred and nothing jumped out at me as being different or special. The only major added complexity is the introduction of pickle in the deferred library. But this does not seem like a pickle issue.
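For anyone curious, the core of the deferred library is roughly this (a simplified sketch, not the actual SDK code): defer() pickles the callable plus its arguments into the task payload, and the /_ah/queue/deferred handler unpickles and invokes it:

```python
import pickle


def serialize_call(func, *args, **kwargs):
    # Roughly what deferred.defer() does: pickle (callable, args, kwargs)
    # into the POST body of the task.
    return pickle.dumps((func, args, kwargs))


def run_payload(payload):
    # Roughly what the /_ah/queue/deferred handler does on the other end.
    func, args, kwargs = pickle.loads(payload)
    return func(*args, **kwargs)


def add(a, b):
    # Example module-level function (pickle requires importable callables).
    return a + b
```

Nothing in that round trip should change how the request deadline is applied; the handler is an ordinary request handler from the scheduler's point of view.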

Kaan Soral

Jan 21, 2016, 10:27:21 AM
to Google App Engine
Apart from the main issue: deferred stores the payload in the datastore if it exceeds the task queue size limit, so this process adds a bit of overhead. On the task queue side that's a datastore pull plus pickle deciphering; I would guess it's no more than 10 seconds, but I haven't tested the overhead myself.

I personally limit myself to 8-9 minutes to be on the safe side
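That self-imposed limit can be enforced with a simple wall-clock guard; here is a minimal sketch (the 9-minute figure and the re-enqueue step are assumptions about the setup, not anything GAE provides):

```python
import time

SOFT_DEADLINE_SECS = 9 * 60  # stop well before the 10-minute hard limit


def process_with_soft_deadline(items, handle, deadline=SOFT_DEADLINE_SECS):
    """Work through items, stopping early once the soft deadline passes.

    Returns whatever is left so the caller can re-enqueue it
    (e.g. with deferred.defer) as a fresh task.
    """
    start = time.time()
    remaining = list(items)
    while remaining:
        if time.time() - start > deadline:
            break  # out of budget: hand the rest back for a new task
        handle(remaining.pop(0))
    return remaining
```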

Nicholas (Google Cloud Support)

Jan 22, 2016, 3:36:12 PM
to Google App Engine
Judging by the troubleshooting done on your support case, these errors seem to be the cost of timeouts in systems handling requests concurrently.

When an instance is unresponsive and a request is left unanswered, the instance is killed and the request is served a 500 deadline-exceeded response. When that instance had been assigned multiple requests, which is likely in concurrent systems, all of those requests are sent this same response. Effective ways to cut down on these responses might be to allow the module to scale more, to select higher-tier specs for the module (reducing operation time), or to optimize the task at hand.

If none of these can be implemented or proves effective, perhaps the computational task being performed should be broken down into smaller tasks.
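Breaking the work down might look like the sketch below; the chunk size and the idea of enqueuing each slice as its own task (e.g. via deferred.defer) are assumptions about the workload:

```python
def chunk(seq, size):
    """Split seq into consecutive slices of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

# Each slice would then be enqueued as its own short-lived task,
# so no single request comes anywhere near the 10-minute deadline.
```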