Cursor Related Errors in Task


Evan Ruff

Mar 11, 2014, 10:14:49 PM
to google-a...@googlegroups.com
Hey guys,

I've been working on a Task that exports a bunch of datastore entries out to GCS and am getting some wacky behavior. Sometimes, the thing fails in <30 seconds with a 202 error in the log that I can't catch in the code. Sometimes, I get an ApiDeadlineException when trying to get the cursor from my iterator after 30 seconds. Other times, I'll get a Java heap space error after three or four cursor pulls. There is nothing in my logs that is out of the ordinary and I'm not keeping a reference to anything anywhere.

What is causing this completely unpredictable behavior? Is something going on with Datastore?

Thanks,

E

Vinny P

Mar 12, 2014, 3:48:35 AM
to google-a...@googlegroups.com
When you say you're getting a 202 error, is that an HTTP status code or the App Engine internal error code that you're receiving?

Can you also post the stack trace and any exception message connected with the ApiDeadlineException?

  
-----------------
-Vinny P
Technology & Media Advisor
Chicago, IL

App Engine Code Samples: http://www.learntogoogleit.com
 
 

Evan Ruff

Mar 12, 2014, 7:14:54 AM
to google-a...@googlegroups.com
Hey Vinny,

The 202 is the "exit code" from my process. I've attached the three major exceptions that I've been getting.

1. Mystery202LogMessage - This one usually seems to happen on the first run after a new deploy, but can pop up at other times. The entire doPost method is wrapped in a try/catch block, and nothing related to this exception is ever caught. I'm really unclear as to what's happening. I do know that it makes it pretty far into the process, as it updates some datastore entries.

2. HeapSpaceOnLog - This happens occasionally when writing out a static log message. It seems unlikely that my heap was THIIIIIIIS close to 99.999999999% full that the log message would push it over the top, but hey, who knows.

3. ApiDeadlineExceededException - I'm trying to get a cursor from the iterator. I have a 27-second window set on all queries. Once it hits that, it gets the cursor (this error), reloads the query, and sets the iterator to that of the new query. This exception is caught and logged.
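For what it's worth, the checkpoint-and-resume loop in point 3 can be sketched generically like this. The integer index is a stand-in for Datastore's QueryResultIterator.getCursor(), and resuming would use FetchOptions with a start cursor on a fresh query; the 27-second window matches the description above, and everything else here is illustrative:

```java
import java.util.List;

public class CheckpointedExport {
    static final long WINDOW_MS = 27_000; // 27-second window per query, as in the post

    // Process records starting at `cursor` until the deadline passes. Returns the
    // cursor to resume from, or -1 when the data set is exhausted. In the real
    // code the int index would be QueryResultIterator.getCursor(), and resuming
    // would reload the query with FetchOptions using that start cursor.
    static int processWindow(List<String> records, int cursor, long deadlineMs) {
        while (cursor < records.size()) {
            if (System.currentTimeMillis() >= deadlineMs) {
                return cursor; // checkpoint: caller reloads the query from here
            }
            export(records.get(cursor));
            cursor++;
        }
        return -1; // exhausted
    }

    static void export(String record) { /* stand-in for the real per-entity work */ }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d");
        int cursor = 0;
        while (cursor != -1) {
            cursor = processWindow(data, cursor, System.currentTimeMillis() + WINDOW_MS);
        }
        System.out.println("exported " + data.size() + " records");
    }
}
```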

Thanks for any insight! (Oh, I'm using the Java SDK)

E
ApiDeadlineExceededException.txt
HeapSpaceOnLog.txt
Mystery202LogMessage.txt

Evan Ruff

Mar 12, 2014, 7:28:45 AM
to google-a...@googlegroups.com
Hey guys,

I also seem to have found a couple more.

1. HeapOnWrite: Java heap space while waiting for outstanding writes in the GcsOutputChannel. My GcsOutputChannel is wrapped by a PrintWriter. Every 100 records (~25 KB), I'm calling waitForOutstandingWrites and then printWriter.flush.

This does not appear to actually be clearing the buffer on subsequent calls. I also don't think I'm anywhere NEAR URLFetch's 32 MB limit when this happens.
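To isolate the flush-every-N pattern from the GCS client, here is a minimal sketch where a byte sink stands in for the GcsOutputChannel and flush() stands in for printWriter.flush() plus waitForOutstandingWrites(); the batch size of 100 is taken from the description above, the rest is illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class BatchedWriter {
    static final int BATCH = 100; // records per flush, as described above

    // The byte sink stands in for the GcsOutputChannel.
    final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    final PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8));
    int pending = 0;
    int flushes = 0; // counts how often the real code would call waitForOutstandingWrites

    void write(String record) {
        out.println(record);
        if (++pending == BATCH) {
            out.flush(); // real code: printWriter.flush(); then channel.waitForOutstandingWrites();
            flushes++;
            pending = 0;
        }
    }
}
```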

2. TimeoutException: I thought to myself, "Hey, let's try waiting for write every 10 records." This hit me with the IOException while writing almost instantly. 

The thing that kills me is that the errors aren't consistent operating over the same data set. Anyone have any recommendations?

Appreciate you guys looking! 

Thanks,

E
HeapOnWrite.txt
TimeoutException.txt

Vinny P

Mar 14, 2014, 3:49:04 AM
to google-a...@googlegroups.com
On Wed, Mar 12, 2014 at 6:14 AM, Evan Ruff <evan...@gmail.com> wrote:
> The 202 is the "exit code" from my process. I've attached the three major exceptions that I've been getting.
>
> 1. Mystery202LogMessage - This one usually seems to happen on the first run after a new deploy, but can pop up at other times. The entire doPost method is wrapped in a try/catch block, and nothing related to this exception is ever caught. I'm really unclear as to what's happening. I do know that it makes it pretty far into the process, as it updates some datastore entries.
>
> 2. HeapSpaceOnLog - This happens occasionally when writing out a static log message. It seems unlikely that my heap was THIIIIIIIS close to 99.999999999% full that the log message would push it over the top, but hey, who knows.
>
> 3. ApiDeadlineExceededException - I'm trying to get a cursor from the iterator. I have a 27-second window set on all queries. Once it hits that, it gets the cursor (this error), reloads the query, and sets the iterator to that of the new query. This exception is caught and logged.
>
> Thanks for any insight! (Oh, I'm using the Java SDK)



Whoa, that's quite a long list!

1. Mystery202LogMessage - Google has never quite documented what the 202 error code means, so it's unclear what's going on. For now, you'll have to treat it as an unavoidable error. The best way to handle it is via task queues - requests that encounter 202 will fail with an error HTTP status code, and you can take advantage of this by configuring the task to retry automatically. If you need to, split the updates over more tasks.
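For reference, the retry behavior can be declared on the queue itself in queue.xml; the queue name and limits below are illustrative, not taken from the thread:

```xml
<queue-entries>
  <queue>
    <name>export-queue</name>
    <rate>1/s</rate>
    <retry-parameters>
      <task-retry-limit>7</task-retry-limit>
      <min-backoff-seconds>10</min-backoff-seconds>
    </retry-parameters>
  </queue>
</queue-entries>
```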

2. HeapSpaceOnLog - It looks like Arrays is trying to acquire too much memory for the copy operation. Can you try batching up the log writing operations? For instance, instead of writing a new log line for each new piece of info (and forcing the underlying infrastructure to allocate memory to holding it, stamping it with the time, etc), add your logging information to a StringBuffer and then print it out occasionally.
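A minimal sketch of that buffered-logging idea; the batch size is arbitrary, and in App Engine the System.out call would be a single log.info (or similar) instead:

```java
public class BatchedLog {
    static final int BATCH = 50; // lines per log call; arbitrary

    final StringBuilder buf = new StringBuilder();
    int pending = 0;

    void log(String line) {
        buf.append(line).append('\n');
        if (++pending >= BATCH) flush();
    }

    void flush() {
        if (buf.length() == 0) return;
        System.out.print(buf); // one log call instead of BATCH separate calls
        buf.setLength(0);      // drop the accumulated characters
        pending = 0;
    }
}
```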

3. ApiDeadlineExceededException - How frequently does this occur? If it occurs rarely, then I wouldn't worry about it too much. If it's more frequent than that, you may need to reduce the number of entities retrieved per iteration. How large is each entity and how many properties does it hold?

4. Google Cloud Storage HeapOnWrite & TimeoutException - It's difficult to say what exactly the issue is with Cloud Storage. I often get odd and unexpected issues from Cloud Storage as well. What I would suggest trying is not to stream data to Cloud Storage - build up a few MB of data to write to Cloud Storage, open up a connection, write the data fully, flush the connection, close it out, rinse & repeat. 
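The accumulate-then-write-fully approach from point 4 might look like the sketch below, with a list of byte arrays standing in for the objects written to Cloud Storage; the 2 MB chunk size is illustrative, and the real flush would open a fresh channel (e.g. via the GCS client's createOrReplace), write the whole buffer, and close it:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class ChunkedUploader {
    static final int CHUNK_BYTES = 2 * 1024 * 1024; // ~2 MB per connection; illustrative

    final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    final List<byte[]> chunks = new ArrayList<>(); // stands in for objects written to GCS

    void add(byte[] record) {
        buf.write(record, 0, record.length);
        if (buf.size() >= CHUNK_BYTES) flushChunk();
    }

    // Real code would open a fresh output channel here, write buf fully,
    // then close the channel before buffering the next chunk.
    void flushChunk() {
        if (buf.size() == 0) return;
        chunks.add(buf.toByteArray());
        buf.reset(); // release the buffered bytes for the next chunk
    }
}
```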

Evan Ruff

Mar 14, 2014, 4:55:52 PM
to google-a...@googlegroups.com
Vinny,

Thank you so much for helping me understand these issues. A couple of points:
1. I am operating within a task, so it does get rerun; however, the first task fails silently without throwing an exception, so I never get a chance to mark the task as failed. This screws up my process.

2. The log message only happens when a new cursor is retrieved, approximately every 30 seconds. No way that operation itself is blowing out the heap.

3. The model object is tiny, a ~10 field POJO.

4. Glad to hear I'm not the only one finding weirdness with Cloud Storage. I'm telling it to waitForOutstandingWrites every 100 records. I would assume this would empty the buffer and release the space. Is that not the case?

Thanks for any advice!

E

Vinny P

Mar 16, 2014, 2:26:49 PM
to google-a...@googlegroups.com
1. The 202 error log should look similar to this: http://imgur.com/fvRV13P - in particular, it should be returning a 500 HTTP status code indicating failure. If yours is failing silently and not returning 500, there's clearly a bug somewhere along the line.

There's no need for a specific exception to spawn; the 500 error will indicate automatically to the task queue infrastructure to retry the task.

2. Yes, that shouldn't be an issue. It's unusual that it's failing like that - I've written much more data to logging than that, and it hasn't failed for me. Perhaps the logging exception isn't the real issue here, and it's masking some App Engine internal issue. Can you try removing the call to logging, or adding a short sleep timer between the cursor access and the log call?

I'm curious to see if another heap space exception pops even when the logging call is removed - this shouldn't be happening.

3. As I read more and more of this, I feel like the real issue is that you're hitting App Engine internal rate limits and throttling. On the surface, your application design seems very straightforward and correct, but you're hitting odd exceptions that App Engine should easily scale through. If you have a support contract with Google, you can call them up and ask them to relax the throttles - if not, then I would start adding sleep commands at high-throughput parts of your code, and check if that helps.

4. There are some other threads in this forum about GCS that go into more detail, but you've essentially hit the right point - internally, the urlfetch connection to GCS seems to be doing something odd (not clearing the buffers, etc). I have no idea what the actual issue is since it appears to be internal to App Engine. The problem only manifests itself if you're streaming large amounts of data to Cloud Storage though, so it doesn't affect many apps. 

The only way to "fix" it is to allocate new urlfetches periodically, especially if you run into strange errors like yours.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
For more options, visit https://groups.google.com/d/optout.

Evan Ruff

Apr 15, 2014, 9:58:01 AM
to google-a...@googlegroups.com
Thanks for the very detailed response, Vinny. I am attempting to alleviate the issue by clearing out the Objectify cache every few runs, with mixed results. While this has helped a little bit, I am still seeing all sorts of weirdness.

Could you go into more detail about allocating new URLFetches? While I understand I can use the builder to fine-tune the options, I don't see how that could be utilized with the GCS implementation.

Thanks,

E



Vinny P

Apr 22, 2014, 8:25:58 AM
to google-a...@googlegroups.com
Essentially you need to buy time for the data to fully move itself from App Engine servers to Cloud Storage servers. You also need to hint to App Engine to move the data in smaller chunks.

There are a couple of ways to accomplish this. In a previous email you said that you're using the waitForOutstandingWrites method every 100 records. Try using it more frequently and, in combination, decreasing the size of the ByteBuffer being written to GCS.

If you feel comfortable using the JSON API, what you can also try is writing multiple smaller files, then using compose to combine them into a single object (or just leave them as smaller files, there's nothing wrong with that).
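One detail worth noting if you go the compose route: a compose request accepts at most 32 source objects, so a large export has to be combined in rounds. The batching arithmetic can be sketched as below (part names and the helper are illustrative; the actual compose call is omitted):

```java
import java.util.ArrayList;
import java.util.List;

public class ComposeBatches {
    static final int MAX_COMPONENTS = 32; // GCS compose accepts at most 32 sources per request

    // Split part-object names into compose-sized batches; each inner list would
    // become one compose request, possibly repeated until a single object remains.
    static List<List<String>> batches(List<String> parts) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < parts.size(); i += MAX_COMPONENTS) {
            out.add(new ArrayList<>(parts.subList(i, Math.min(i + MAX_COMPONENTS, parts.size()))));
        }
        return out;
    }
}
```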