Put_Async Use It


Brandon Wirtz

Jan 18, 2012, 8:26:58 PM
to google-a...@googlegroups.com

Crap. I’ll have to let it bake to be sure, but there is no reason to believe results shouldn’t be as indicated.

 

Put_async just reduced my instance count by 40%.  (I’m still working to get to the point that I pay for storage and bandwidth + my $2 a week)

 

I had misunderstood what it did and was thinking it was like a multi-put that would take 20 puts and do them all at the same time.  What it actually does is let your app keep going instead of waiting for the write to happen.
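
A rough illustration of that difference, as a sketch only. It uses Python's standard concurrent.futures as a stand-in for the datastore; slow_put and its 0.1 s delay are made up for the example. On App Engine the real calls would be db.put_async() and then get_result() on the object it returns.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_put(entity, delay=0.1):
    """Hypothetical stand-in for a datastore write; sleeps to imitate RPC latency."""
    time.sleep(delay)
    return "key-for-%s" % entity


def handle_sync(entity):
    key = slow_put(entity)   # blocks until the write returns
    time.sleep(0.1)          # the rest of the request (rendering, etc.)
    return key


def handle_async(executor, entity):
    future = executor.submit(slow_put, entity)  # write starts, call returns at once
    time.sleep(0.1)                             # other work overlaps with the write
    return future.result()                      # check back at the end


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        _, sync_elapsed = timed(handle_sync, "greeting")
        _, async_elapsed = timed(handle_async, pool, "greeting")
    print("sync  %.2fs" % sync_elapsed)
    print("async %.2fs" % async_elapsed)
```

The async handler does the same work, but the time spent waiting on the write is hidden behind the rest of the request instead of being added to it.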

 

 

Brandon Wirtz
BlackWaterOps: President / Lead Mercenary

Work: 510-992-6548
Toll Free: 866-400-4536

IM: dra...@gmail.com (Google Talk)
Skype: drakegreene

BlackWater Ops

Brandon Wirtz

Jan 18, 2012, 11:35:47 PM
to google-a...@googlegroups.com

3 hours before async puts, and 3 hours after, at the same load.  That was the only change.  This may have lowered the baseline enough that I will have to up the load for testing serialized data operations.

 

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.


Richard Watson

Jan 19, 2012, 2:44:44 AM
to google-a...@googlegroups.com
Just a note - Brandon, thanks for posting your findings, in general. Very useful.

Brandon Wirtz

Jan 19, 2012, 3:28:36 AM
to google-a...@googlegroups.com

No problem.  Most of the time I am talking to myself :) If I put my notes in the forum, they are easy to find. But it is really helpful that people like Brian chime in with bits of how things work in the black box behind the code.

 

I try to share what I can, I know I do a lot more testing than most people, and I always hope that if I share my results it will inspire other people to share theirs.

 

All in, today we changed 8 lines of code in the final set of changes. We tested 28 ways of doing things, which generated 380 incremental saves (that doesn’t count the versions which wouldn’t run because of syntax errors).

 

We now have a new Database Schema…. So my next challenge is how to blow away the old data on the cheap.  So I’m working on some code that does spare cycle deletes over time.  You still have to pay for the transactions but you wouldn’t have to pay for the instance hours.

 

-Brandon

http://www.cdninabox.com

 


André Pankraz

Jan 19, 2012, 3:47:12 AM
to google-a...@googlegroups.com
Never used it but sounds interesting.
How does this async feature work in combination with the unreliable writes? I often (in comparison to an RDBMS) have failed writes and have to repeat them.
How does this work here? Have you already found some best practices?

Brandon Wirtz

Jan 19, 2012, 4:27:58 AM
to google-a...@googlegroups.com

My understanding is that with HRD you won’t have failed writes, or no more so than you would have otherwise.  And if you like, the last line of your code can be the check to see whether the write completed.


Rohan Chandiramani

Jan 19, 2012, 4:54:34 AM
to google-a...@googlegroups.com
How would you check to see if the write was completed? With async there is no guarantee when the write actually takes place.

André Pankraz

Jan 19, 2012, 5:13:20 AM
to google-a...@googlegroups.com
You always seem to be in a special GAE wonderland zone ;)

All operations can fail and, HRD or not, they fail sometimes. If an RDBMS failed as often as HRD on writes I would throw it away.
So my first question for such fire-and-forget async methods is: How do I handle errors: even in RDBMS environment.

Brandon Wirtz

Jan 19, 2012, 5:13:02 AM
to google-a...@googlegroups.com

http://code.google.com/appengine/docs/python/datastore/functions.html

 

Get Result will let you know if your Put has finished.  That way you can do other things while the put happens (or you can do what I do, and assume HRD won’t fail).


Brandon Wirtz

Jan 19, 2012, 5:47:13 AM
to google-a...@googlegroups.com

I should re-phrase.  What do you do if HRD fails now?    The best you can do is log an error and hope for the best.  Async doesn’t have to be fire and forget, it can be fire and check back as the last line of your code.

 

You can actually use this as a way to make your apps more reliable.   The last line of your code could be “If write failed, URL FETCH AS POST DataToWrite” so that you could try the write again.  Or you could add a task to fire later to add the data. Or you could log it.

 

All of these would be better than doing a write and having it fail synchronously (especially since, if you are like my devs, you don’t wrap your write in a try).
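
A minimal sketch of "fire and check back as the last line", assuming nothing about the real datastore: flaky_put, WriteFailed and the retry_queue list are hypothetical stand-ins, and the thread pool only imitates an async RPC. The fallback here is "queue it for later and log it"; a task or a URL fetch would slot into the same place.

```python
import logging
from concurrent.futures import ThreadPoolExecutor


class WriteFailed(Exception):
    """Hypothetical error type standing in for a datastore failure."""


def flaky_put(entity, fail=False):
    """Hypothetical write that can be told to fail, for demonstration."""
    if fail:
        raise WriteFailed("simulated datastore error for %r" % (entity,))
    return "key-for-%s" % entity


def handle_request(executor, entity, retry_queue, fail=False):
    future = executor.submit(flaky_put, entity, fail)  # fire
    response = "<html>page for %s</html>" % entity     # do the rest of the request
    try:
        future.result()                                # check back as the last step
    except WriteFailed:
        logging.exception("write failed, queueing for retry")
        retry_queue.append(entity)                     # stand-in for "add a task to fire later"
    return response
```

The user still gets the page either way; the only difference from a synchronous put is that the failure is noticed at the end of the request rather than in the middle of it.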

 

 

I don’t live in GAE Wonderland.  I just spend time with my head in the Clouds, and Gae’s tend to rain on me the least.

 

 


Jeff Schnitzer

Jan 19, 2012, 10:47:08 AM
to google-a...@googlegroups.com
On Thu, Jan 19, 2012 at 5:13 AM, André Pankraz
<andrep...@googlemail.com> wrote:
> So my first question for such fire-and-forget async methods is: How do I
> handle errors: even in RDBMS environment.

Keep in mind that async operations are not really "fire and forget".
They're "fire and forget until the end of your request". Pretend
there is a wrapper around your request method that completes any
pending async operations immediately after you return - there is no
disadvantage to completing them (checking for errors) yourself
at the end.

Jeff

Robert Kluin

Jan 19, 2012, 12:20:34 PM
to google-a...@googlegroups.com
On Thu, Jan 19, 2012 at 05:13, Brandon Wirtz <dra...@digerat.com> wrote:
> http://code.google.com/appengine/docs/python/datastore/functions.html
>
>
>
> Get Result will let you know if your Put has finished.  That way you can do
> other things while the put happens.  (or you can do what I do, and assume
> HRD won’t fail.

I see periodic runs of failures on HRD apps from time to time. This
can range from temporary over per-minute quota errors (if your app is
actually doing much traffic) to other types of timeouts (you're not
only waiting on the datastore to write).

The advantage to async is what you've said here, it lets you do other
stuff rather than waiting for the write. Note, however, as Jeff said,
the request will wait for all RPCs to return before it actually
completes. It doesn't just start processing the next request.

Brandon Wirtz

Jan 23, 2012, 1:52:58 AM
to google-a...@googlegroups.com
After running this in production, it turns out that on Python 2.7 with
thread safe enabled this small change cut our Instance hours by 64%. YMMV
but for 3 lines of code that is HUGE.

I think this comes from the fact that you can write the output to the user,
and the process does other things while waiting for the commits to the
datastore, rather than waiting for the commit, THEN writing to the user,
blocking other processes the whole time. So I think we gained concurrency
equal to the amount of time we were spending on waiting for the data store.

This was the single best 3 lines of code we ever changed. (well maybe other
than webapp2 so we could run thread safe).

Anand Mistry

Jan 23, 2012, 5:02:51 AM
to google-a...@googlegroups.com
Hm. This sounds a little counter intuitive. If you have threadsafe enabled, the instance is able to process another request while a synchronous datastore operation is in effect. The async ops should be most helpful when you have threadsafe disabled. It sounds like there might be something else going on. Maybe you're gaining by avoiding some badness with the GIL, or maybe you're taking advantage of temporal locality with respect to L1/L2 caches, although I doubt the effect would be that big. Or maybe I'm completely missing something.

Robert Kluin

Jan 26, 2012, 2:23:19 AM
to google-a...@googlegroups.com
Unless something has significantly changed with the 2.7 runtime, your
output isn't sent to the user until your request handling code has
fully executed and returned. Hopefully Anand can verify this, but I
believe all RPCs are flushed at the end of the request *before*
anything is returned.


Robert

Brandon Wirtz

Jan 26, 2012, 4:33:02 AM
to google-a...@googlegroups.com
Yes, you are right, but you aren't consuming cycles in the meantime. This
works well if your datastore write is anything other than the last line of
your code.

James X Nelson

Jan 31, 2012, 7:43:33 PM
to google-a...@googlegroups.com
Async everything will always save you instance hours.  It doesn't matter if you are threadsafe or not, the async operations allow you to perform a ds / memcache / url fetch in a background thread, which you can check on whenever you want.

Async is extremely useful if you use it to perform "datastore streaming"; if you have to operate on thousands of entities, you can cut your runtime from "all of the time to create and persist all of the entities" to "all of the time to create all of the entities, and little to no time waiting for ds operations to complete".  I use a helper object to stream all my writes, deletes and reads in async batches, with custom page sizing.  The helper buffers the entities, put()s them asynchronously in batches when the page size is reached, and holds them in memory until their put() succeeds, so it can retry anything that fails.  Another easy way to retry a future, at least in Java, is to create a subclass of Future<Entity> that takes the Entity as a param, wraps the appengine Future<Key>, and automatically retries on transient errors.  For the retry I use a synchronous call rather than async again, but you can do whatever you like.
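
A rough Python sketch of that kind of streaming helper, under stated assumptions: BatchWriter and all of its method names are hypothetical, put_batch is whatever callable actually persists a list of entities, and a thread pool stands in for the async datastore call. It is not the helper described above, only the same shape: buffer, write in async pages, keep each page until it succeeds, retry failures synchronously.

```python
from concurrent.futures import ThreadPoolExecutor


class BatchWriter(object):
    """Hypothetical helper: buffer entities, write pages asynchronously, retry failures."""

    def __init__(self, put_batch, page_size=3, workers=2):
        self._put_batch = put_batch   # callable taking a list of entities
        self._page_size = page_size
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._buffer = []
        self._in_flight = []          # (future, batch) pairs kept until the put succeeds

    def add(self, entity):
        self._buffer.append(entity)
        if len(self._buffer) >= self._page_size:
            self._flush_buffer()

    def _flush_buffer(self):
        if self._buffer:
            batch, self._buffer = self._buffer, []
            future = self._pool.submit(self._put_batch, batch)
            self._in_flight.append((future, batch))

    def finish(self, max_retries=2):
        """Flush the last partial page, then wait for every page and retry failures."""
        self._flush_buffer()
        for future, batch in self._in_flight:
            try:
                future.result()
            except Exception:
                self._retry(batch, max_retries)
        self._in_flight = []
        self._pool.shutdown()

    def _retry(self, batch, max_retries):
        # Synchronous retry; the last failure is re-raised to the caller.
        for attempt in range(max_retries):
            try:
                self._put_batch(batch)
                return
            except Exception:
                if attempt == max_retries - 1:
                    raise
```

Usage would be a loop of writer.add(entity) calls followed by one writer.finish(); the loop only pays for creating entities, and the waiting is pushed to the single finish() call.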

Using async everywhere, with threadsafe, got our app down from $30/day for frontend instance hours to ~$0/day for frontend instance hours.  We generally have 4-6 live instances, but only use approximately 1 instance hour / hour.
The trick is that all your api operations can happen in the background, so your total processing time is near or equal to your "userland" processing only.  

PS - Anyone that doesn't actually call .get() on their futures to finalize your async operations could lose data, especially if the last operation in your method is an async put.  Async operations started by your current processing thread die when the thread returns {in production}, so make sure you call .get() on the returned future. 

Jeff Schnitzer

Jan 31, 2012, 7:48:50 PM
to google-a...@googlegroups.com
On Tue, Jan 31, 2012 at 7:43 PM, James X Nelson
<jamie....@promevo.com> wrote:
>
> PS - Anyone that doesn't actually call .get() on their futures to finalize
> your async operations could lose data, especially if the last operation in
> your method is an async put.  Async operations started by your current
> processing thread die when the thread returns {in production}, so make sure
> you call .get() on the returned future.

This would shock me. It doesn't work like that in GAE/Java.

Jeff

Robert Kluin

Feb 1, 2012, 2:09:34 AM
to google-a...@googlegroups.com
There is a good chance the gains you saw are due to improved batching.
The other gains are probably from async, ie you being able to prepare
the data for your next put or fetch more data rather than waiting
around. If you're using Java, could also be multithreading gains.

As Jeff said, all RPCs are flushed when your request handler returns.


Brandon Wirtz

Feb 1, 2012, 1:25:35 PM
to google-a...@googlegroups.com
I got deferred working. Missed that you have to enable it as a built in,
which I chalk up as I shouldn't code when people are out of office and I'm
tired.

The result seems to be pretty awesome. Not waiting for writes speeds us up,
and pacing the tasks evens out the peaks and valleys of our instance use.

I don't think we will save actual CPU hours, but by filling the valleys and
delaying the peaks we should use fewer instance hours.

The biggest thing I learned: don't walk away when debugging. 5 QPS spawned
1 task per second, which, when it threw an error, resulted in more tasks
piling up and more instances. It would have been expensive if I had just
wandered off.


Andrin von Rechenberg

Feb 1, 2012, 2:48:15 PM
to google-a...@googlegroups.com
One thing I don't understand is: if one runs Python 2.7 with thread safe enabled,
why would this reduce instance hours? Wouldn't the instance just handle
other tasks until the non-async rpc is complete? Or is a sync rpc blocking
the instance from handling another request (would surprise me)?
I'm assuming one has a lot of requests.

(Left aside the RAM & Latency optimization one would get)

Cheers,
-Andrin

Brandon Wirtz

Feb 1, 2012, 7:20:14 PM
to google-a...@googlegroups.com

My solution works for non-Critical writes, it isn’t for everyone.

 

Concurrency is not infinite, and async is not perfectly non-blocking. Because tasks are throttled, you are pacing part of your total “work to be done”.

You would be surprised (I know I was) at how bursty usage is at a microscale (check the archive for the discussions about the scheduler).

 

Standard method:

    Do lots of stuff
    Write to data store
        Wait for write
        Poll for write
        Poll for write
        Poll for write
        Poll for write
        Poll for write
        Poll for write
        Received write confirmation
    Do lots of stuff
    Flush buffer / release request

Async way:

    Do lots of stuff
    Write to data store (async)
    Do lots of stuff
        Poll for write
        Poll for write
        Received write confirmation
    Flush buffer / release request

Deferred way:

    Thread 1:
        Do lots of stuff
        Spawn thread to write to data store
        Do lots of stuff
        Flush buffer / release request

    Thread 2 (runs when we get around to it):
        Write to data store (async)
            Poll for write
            Poll for write
            Poll for write
            Poll for write
            Poll for write
            Received write confirmation

 

While you have only delayed the polling, you have still delayed some of the process, and you have let things exit the scheduler.

By letting things exit the scheduler you never get to the Walmart “we open a new line anytime there are more than 4 shoppers in line” guarantee.  So you use fewer instance hours by smoothing your CPU hours out over time.

 

 

 

 

 


Brandon Wirtz

Feb 1, 2012, 7:35:36 PM
to google-a...@googlegroups.com

PS

I’d include graphs, but my DashBoard is flaky today. And not the flaky golden that a good pot pie is. Flaky in a bad way like when you have an allergic reaction to hotel shampoo and you have horrible scalp itch and dandruff during a big sales pitch.


Brandon Wirtz

Feb 1, 2012, 7:48:27 PM
to google-a...@googlegroups.com

Here is the request + the Defer.   You can see that the Defer used 94 ms of CPU, which would otherwise have had to be used immediately instead of being deferred. Also, we took off 71 ms that would have been attached to the end user’s request.   Again, you can only use this for non-critical writes.

 

We have further optimizations we can make because Defer Pickles and un-pickles. So the thread about should I serialize my datastore… The CPU cycles that pickling eats just became free-ish.

 

 

1. 2012-02-01 16:36:06.662 /_ah/queue/deferred 200 71ms 0kb AppEngine-Google; (+http://code.google.com/appengine)

   0.1.0.2 - - [01/Feb/2012:16:36:06 -0800] "POST /_ah/queue/deferred HTTP/1.1" 200 84 "http://www.xyhd.tv/2009/11/random-news/celebrity-news/peyton-manning-divorce/" "AppEngine-Google; (+http://code.google.com/appengine)" "www.xyhd.tv" ms=71 cpu_ms=94 api_cpu_ms=71 cpm_usd=0.001211 queue_name=default task_name=14572598243257364761 instance=00c61b117c9245c6ca10f9b5dc8ea5308c65e4

   I 2012-02-01 16:36:06.594

   X-Appengine-Taskretrycount:0, X-Appengine-Default-Namespace:xyhd.tv, X-Appengine-Queuename:default, X-Appengine-Taskname:14572598243257364761, X-Appengine-Current-Namespace:, X-Appengine-Tasketa:1328142966.581126, X-Appengine-Country:ZZ

2. 2012-02-01 16:36:06.611 /2009/11/random-news/celebrity-news/peyton-manning-divorce/ 200 513ms 157kb Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)

   76.240.16.120 - - [01/Feb/2012:16:36:06 -0800] "GET /2009/11/random-news/celebrity-news/peyton-manning-divorce/ HTTP/1.1" 200 157966 "http://www.bing.com/search?q=Peyton+Manning's+Wife+Divorce&FORM=R5FD3" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)" "www.xyhd.tv" ms=513 cpu_ms=340 api_cpu_ms=153 cpm_usd=0.057038 instance=00c61b117c9245c6ca10f9b5dc8ea5308c65e4


Robert Kluin

Feb 2, 2012, 12:46:36 AM
to google-a...@googlegroups.com
I'd suggest not explaining deferred as a thread -- it isn't.  It creates a task, which could well hit another instance.

Also, since deferred is a task, it will actually add a small amount of overhead.  However, it pays off since you get it out of the user-request.  I use this method all the time, including for important writes.  You just need to be sure and handle failures if the write is critical.

There is another very nice thing about this technique: it will make your app's performance feel more consistent.  If there is a datastore latency spike, your users might not even notice, since the write will be pushed off to a backend task.  On M/S apps I frequently used this technique to weather latency spikes.
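
To make the "it is a task, and you must handle failures" point concrete, here is a toy sketch. It is an in-process stand-in, not the App Engine task queue: TaskQueue, its max_attempts cap and the dead list are all invented for illustration. In a real app the deferral itself would be deferred.defer(func, *args) from google.appengine.ext.deferred, with retries handled by the queue.

```python
import collections


class TaskQueue(object):
    """Toy in-process stand-in for a push queue; not the App Engine API."""

    def __init__(self, max_attempts=3):
        self._tasks = collections.deque()
        self._max_attempts = max_attempts
        self.dead = []  # tasks that used up all their attempts

    def defer(self, func, *args):
        self._tasks.append((func, args, 0))

    def run_pending(self):
        """Run queued tasks; re-queue failures until the attempt cap is hit."""
        while self._tasks:
            func, args, attempts = self._tasks.popleft()
            try:
                func(*args)
            except Exception:
                if attempts + 1 < self._max_attempts:
                    self._tasks.append((func, args, attempts + 1))
                else:
                    self.dead.append((func, args))


def handle_request(queue, store, page_id):
    queue.defer(store.append, page_id)  # the write leaves the user request
    return "page %s" % page_id
```

The attempt cap is the part worth copying: a task that throws on every run otherwise keeps coming back, which is how a small bug turns into a pile of tasks and extra instances. For a critical write, whatever lands in dead needs to be logged or alerted on rather than dropped.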



Robert

Brandon Wirtz

Feb 2, 2012, 1:32:00 AM
to google-a...@googlegroups.com

Mostly agreed on the Thread, the explanation was really about how it can reduce instance count… I do better speaking than writing in text, and switching to analogies is hard. 
