dev_appserver hangs while running tests

115 views
Skip to first unread message

Shaunak Godbole

unread,
May 10, 2017, 1:07:39 PM5/10/17
to Google App Engine
Hello, we are using aetest to write some integration tests as part of our project.  The integration tests spin up an instance of a dev_appserver.py in a BeforeSuite function.  We use APIs to guarantee that the suite executes before we start the test:

func BeforeSuite() {
  params := aetest.InstanceParams{StrongConsistency: true}
  inst := aetest.NewInstance(params)
  req := aetest.NewRequest("Get", "/", nil)
  ctx := appengine.NewContext(req)
}

We have a similar AfterSuite that tears down the instance. 

func AfterSuite() {
  inst.Close()
}

Since we want the tests to be completely hermetic, we delete all the entities that we had created during the tests, usually in an AfterEach method.

func AfterEach() {
  deleteAllEntitiesFromDatastore("ENTITY_1")
  deleteAllEntitiesFromDatastore("ENTITY_2")
  deleteAllEntitiesFromDatastore("ENTITY_3")
  ...
}

And, the we have a lot of test cases that simply run against the created instance and use the context acquired through BeforeSuite.  We have a good mixture of Gets, Puts, Entity Queries, Filter Queries etc.  Since a few days, every-time we run our tests, the tests hang at some point during the execution.  This point changes as we add and remove some of the tests, but the tests always hang if the number of suites are above a certain threshold.  We usually see errors like:

    <string>: Call error 11: Deadline exceeded (timeout)


or like:

Expected success, but got an error:
    <*url.Error | 0xc8210570b0>: {
        Op: "Post",
        URL: "http://localhost:59072",
        Err: {s: "EOF"},
    }
    Post http://localhost:59072: EOF


While the tests are hung, if we try to access the datastore through the ip produced in the tests, the request just hangs similar to how it hangs during the tests.  This makes us suspicious that there is some kind of a deadlock that is getting introduced during our execution, but we are not able to figure out how to debug it.  

Also, the test suites have a timeout after which the tests simply crash.  At this point, the AfterSuite cannot execute so the test-server remains active in the background, and when we try to access datastore through the admin, we can see all the data that was added as part of the test.

For completeness sake, we also use gomocks to abstract out some external services as part of the tests.

Any help on debugging this would be very much appreciated.

Thanks,
Shaunak

Nick (Cloud Platform Support)

unread,
May 10, 2017, 4:13:04 PM5/10/17
to Google App Engine
Hey Shaunak,

You might want to run a set of tests across the parameter space of all the relevant variables:

Vary the total number of tests:

N = number of tests total

Vary the proportion of N taken up by each type of test:

N_a = number of tests of type a
N_b = number of tests of type b
. . . 
N_e = number of tests of type e
. . . 

Vary the parameters of the tests themselves:

a_x = critical parameter x involved in running a test of type a
a_y = critical parameter y involved in running a test of type a
. . .
e_x = critical parameter x involved in running a test of type e
. . . 

You would script a series of runs of test suites defined by their position in "test suite parameter space" (N, N_a, N_b, . . . , N_e, . . . , a_x, a_y, . . . ,  e_x, . . .), programmatically logging whether the test suite completed or crashed, how long it took before it crashed if so, and you could possibly observe correlations between crashing and certain kinds of tests and/or rule out which variables seem to have no impact on whether it crashes or not.

* * *

Beyond that, though, it seems you'd need to either go source-code diving or get the assistance of someone who actually works on the codebase of the go SDK and aetest. You might want to file an issue in the github issue tracker for aetest as soon as you can get more info from a more rigorous test of test suite parameters as described above.

Cheers,

Nick
Cloud Platform Community Support

Shaunak Godbole

unread,
May 11, 2017, 10:32:50 AM5/11/17
to Google App Engine
Thanks Nick, 

I was able to isolate the problem significantly.  It seems like there is a limit on the number of operations that can be read in a given amount of time.  If you run the following program:

package main

import (
"fmt"
"testing"

)

type Entity struct {
  Id int64
}

func TestLoad(t *testing.T) {
        opts := aetest.Options{StronglyConsistentDatastore: true}
        inst, _ := aetest.NewInstance(opts)
defer inst.Close()


for i := 0; i < 10000; i++ {
req, _ := inst.NewRequest("GET", "/", nil)
ctx := appengine.NewContext(req)

k := datastore.NewKey(ctx, "Entity", "", 12345, nil)
var entity Entity
datastore.Get(ctx, k, &entity)
fmt.Println("Iteration Count ", i+1)
ctx.Done()
}
}

This program gets stuck at Iteration count 240.  We suspect that aetest, or appengine go library releases its hold on some resources only after ctx.deadline expires.  This code seems to pass through if we change the context deadline to be 50 ms.  Increasing the deadline still makes the program pass through but after 240th iteration, we see each subsequent step taking roughly the same time as deadline.

We also added code to see the number of goroutines that get added as part of the each of the steps and can see that 2 goroutines are added at every step and the number keeps on increasing until step number 240.  Our guess is that these goroutines are holding up the resources.  Is there any way to determine which goroutines are these?

Thanks,
Shaunak

Nick (Cloud Platform Support)

unread,
May 11, 2017, 6:19:06 PM5/11/17
to Google App Engine
Hey Shaunak,

I've not been able to observe the crash as you saw. For me, it completes in a timely manner. I'm curious to know what version of the SDK you're using now and what are the specs of the machine you're running this on?


Cheers,

Nick
Cloud Platform Community Support

Shaunak Godbole

unread,
May 11, 2017, 6:45:20 PM5/11/17
to Google App Engine
That's interesting.

goapp version prints the following:

go version go1.6.3 (appengine-1.9.48) darwin/amd64



I'm using MacBookPro with the following configuration:
Processor: 2.2 GHz Intel Core i7
Memory 16G
Graphics Intel Iris Pro 1536MB

Thanks!
Shaunak

Nick (Cloud Platform Support)

unread,
May 11, 2017, 7:05:36 PM5/11/17
to Google App Engine
Hey Shaunak,

Interesting indeed! My SDK version is: 

$ goapp version
go version 1.6.3 (appengine-1.9.53) linux/amd64

Downgrading to 1.9.48 didn't cause the issue to appear, either. I'm running an Intel(R) Xeon(R) CPU @ 2.2 GHz, and I only have 7.3G memory!

I'm running the exact code you'd specified above, as well... Is it possible that your CPU and memory are being quite used-up by other processes?


Cheers,

Nick
Cloud Platform Community Support

Shaunak Godbole

unread,
May 11, 2017, 11:51:23 PM5/11/17
to Google App Engine
That is very weird indeed.  

To be fair, the tests suddenly stopped working one day and since then we've been seeing the tests hang every time.

I'm going to try and upgrade my SDK version and see if that helps.  I'm also going to try it on a different machine to see if it works there as well.

Thanks!
Shaunak
Reply all
Reply to author
Forward
0 new messages