Cloud endpoints are running slow in App Engine Standard Environment

1,042 views
Skip to first unread message

Azeem Haider

unread,
Jun 11, 2018, 6:45:50 AM6/11/18
to Google App Engine
I'm using App Engine Standard Environment with Google Cloud Endpoints v2 for java.
Using objectify for DataStore.

I just make a simple call but it takes a lot more time to get the results. There are very little users those are using my Endpoints, It's in the testing phase.

EndPoint Function:

public User getUser(@Named("userId") Long userId){
return ofy().load().key(Key.create(User.class, userId)).now();
}


It Just a very simple function no authentication call etc. Just Put the userId and get the result. There are only a few users almost 5.

When I call from API Explorer. Same Endpoint function running, again and again. Execute without OAuth

First Call: ---- time to execute: 8652 ms ---- May be delay due to start new instance because App Engine shut down instance when there is no traffic
Second Call: ---- time to execute: 7625 ms
Third Call ---- time to execute: 4665 ms
Fourth Call ----- time to execute: 2873 ms

After this rest of all call are between in 4000 ms - 2000 ms Only few time again it goes around to 7000 ms.


But When I call it from Android using Client Library. It takes too much time to get a result even more than API Explorer.

Activity Code: ( Using AsyncTask )

doInBackground(Long... params){

Zeem.Builder builder = new Zeem.Builder(AndroidHttp.newCompatibleTransport(), new JacksonFactory(), null);
Zeem zeem = builder.build();

zeem.userApi.getUser(5720147234914304).execute(); // 5720147234914304 is userId

}


First time when I run this is it takes more time I think some kind of SSL.

Here is the Log When I first time Run.


D/libc-netbsd: [getaddrinfo]: hostname=my-project-id.appspot.com; servname=(null); cache_mode=(null), netid=0; mark=0
[getaddrinfo]: ai_addrlen=0; ai_canonname=(null); ai_flags=4; ai_family=0
D/NativeCrypto: ssl=0xae85ba00 NativeCrypto_SSL_do_handshake fd=0xaef31480 shc=0xaef31484 timeout_millis=20000 client_mode=1 npn=0x0
doing handshake ++
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x10 ret=1
ssl=0xae85ba00 handshake start in UNKWN before/connect initialization
ssl=0xae85ba00 info_callback calling handshakeCompleted
ssl=0xae85ba00 info_callback completed
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:UNKWN before/connect initialization
ssl=0xae85ba00 info_callback ignored
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:23WCHA SSLv2/v3 write client hello A
ssl=0xae85ba00 info_callback ignored
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x1002 ret=-1
D/NativeCrypto: ssl=0xae85ba00 SSL_connect:error exit in 23RSHA SSLv2/v3 read server hello A
ssl=0xae85ba00 info_callback ignored
doing handshake -- ret=-1
ssl=0xae85ba00 NativeCrypto_SSL_do_handshake ret=-1 errno=11 sslError=2 timeout_millis=20000
D/NativeCrypto: doing handshake ++
ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:3RSH_A SSLv3 read server hello A
ssl=0xae85ba00 info_callback ignored
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x1002 ret=-1
ssl=0xae85ba00 SSL_connect:error exit in 3RSC_A SSLv3 read server certificate A
ssl=0xae85ba00 info_callback ignored
ssl=0xae85ba00 info_callback where=0x1002 ret=-1
ssl=0xae85ba00 SSL_connect:error exit in 3RSC_A SSLv3 read server certificate A
ssl=0xae85ba00 info_callback ignored
doing handshake -- ret=-1
ssl=0xae85ba00 NativeCrypto_SSL_do_handshake ret=-1 errno=11 sslError=2 timeout_millis=20000
D/NativeCrypto: doing handshake ++
E/NativeCrypto: ssl=0xae85ba00 cert_verify_callback x509_store_ctx=0xaef312c8 arg=0x0
ssl=0xae85ba00 cert_verify_callback calling verifyCertificateChain authMethod=ECDHE_RSA
D/NativeCrypto: ssl=0xae85ba00 cert_verify_callback => 1
ssl=0xae85ba00 info_callback where=0x1001 ret=1
D/NativeCrypto: ssl=0xae85ba00 SSL_connect:3RSC_A SSLv3 read server certificate A
ssl=0xae85ba00 info_callback ignored
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:3RSKEA SSLv3 read server key exchange A
ssl=0xae85ba00 info_callback ignored
ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:3RSD_A SSLv3 read server done A
ssl=0xae85ba00 info_callback ignored
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:3WCKEA SSLv3 write client key exchange A
ssl=0xae85ba00 info_callback ignored
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:3WCCSA SSLv3 write change cipher spec A
ssl=0xae85ba00 info_callback ignored
ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:3WFINA SSLv3 write finished A
ssl=0xae85ba00 info_callback ignored
ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:3FLUSH SSLv3 flush data
ssl=0xae85ba00 info_callback ignored
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x1002 ret=-1
ssl=0xae85ba00 SSL_connect:error exit in UNKWN SSLv3 read server session ticket A
ssl=0xae85ba00 info_callback ignored
D/NativeCrypto: doing handshake -- ret=-1
ssl=0xae85ba00 NativeCrypto_SSL_do_handshake ret=-1 errno=11 sslError=2 timeout_millis=20000
D/NativeCrypto: doing handshake ++
ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:UNKWN SSLv3 read server session ticket A
ssl=0xae85ba00 info_callback ignored
D/NativeCrypto: ssl=0xae85ba00 info_callback where=0x1001 ret=1
ssl=0xae85ba00 SSL_connect:3RFINA SSLv3 read finished A
ssl=0xae85ba00 info_callback ignored
ssl=0xae85ba00 info_callback where=0x20 ret=1
ssl=0xae85ba00 handshake done in SSLOK SSL negotiation finished successfully
ssl=0xae85ba00 info_callback calling handshakeCompleted
ssl=0xae85ba00 info_callback completed
ssl=0xae85ba00 info_callback where=0x1002 ret=1
ssl=0xae85ba00 SSL_connect:ok exit in SSLOK SSL negotiation finished successfully
ssl=0xae85ba00 info_callback ignored
doing handshake -- ret=1
ssl=0xae85ba00 NativeCrypto_SSL_get_certificate => NULL
I/System.out: gba_cipher_suite:TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
I/System.out: [OkHttp] sendRequest>>
I/System.out: [OkHttp] sendRequest<<
D/NativeCrypto: ssl=0xae85ba00 sslWrite buf=0xae8ce800 len=230 write_timeout_millis=0
D/NativeCrypto: ssl=0xae85ba00 sslRead buf=0xae8ce800 len=2048,timeo=20000
D/NativeCrypto: ssl=0xae85ba00 sslRead buf=0xae8ce800 len=2048,timeo=20000
D/NativeCrypto: ssl=0xae85ba00 sslRead buf=0xae8ce800 len=2048,timeo=20000


After this whenever I run it a little bit fast but not fast enough, it takes around 6-7 sec sometimes more and sometimes less, you can say average 5 sec. Which is not good.

D/OpenSSLLib: OpensslErr:Module:13(114:155); file:external/openssl/crypto/asn1/asn1_lib.c ;Line:145;Function:ASN1_get_object
D/OpenSSLLib: OpensslErr:Module:13(114:155); file:external/openssl/crypto/asn1/asn1_lib.c ;Line:145;Function:ASN1_get_object
V/ActivityThread: Performing resume of ActivityRecord{36eaf480 token=android.os.BinderProxy@13894ab9 {com.azeem.socialtest/com.azeem.socialtest.Comment.CommentActivity}}
D/ActivityThread: ACT-AM_ON_RESUME_CALLED ActivityRecord{36eaf480 token=android.os.BinderProxy@13894ab9 {com.azeem.socialtest/com.azeem.socialtest.Comment.CommentActivity}}
V/ActivityThread: Resume ActivityRecord{36eaf480 token=android.os.BinderProxy@13894ab9 {com.azeem.socialtest/com.azeem.socialtest.Comment.CommentActivity}} started activity: false, hideForNow: false, finished: false
V/PhoneWindow: DecorView setVisiblity: visibility = 4 ,Parent =null, this =com.android.internal.policy.impl.PhoneWindow$DecorView{1d09a97b I.E..... R.....I. 0,0-0,0}
D/Surface: Surface::setBuffersDimensions(this=0xb45af700,w=720,h=1184)
I/System.out: [CDS]rx timeout:20000
I/System.out: [CDS]rx timeout:20000
[OkHttp] sendRequest>>
[OkHttp] sendRequest<<
D/NativeCrypto: ssl=0xae85ba00 sslWrite buf=0xae8d5800 len=289 write_timeout_millis=0
D/NativeCrypto: ssl=0xae85ba00 sslRead buf=0xae8d5800 len=2048,timeo=20000


Here is the trace from Cloud Console.

Please let me know how can I make it fast. I seen youtube, snapchat are much faster, as I know they are also in Google Cloud Platform. And I notice instagram is much much faster
then these it get result within sec or two. I think Instagram is using Amazon web services. Is Amazon web service is faster then Google cloud Platform ?

OR

Is there any problem in my code ? or I'm doing something wrong ? Please let me know about these, how can I improve the speed.

THANKS.

Ani Hatzis

unread,
Jun 11, 2018, 8:27:04 AM6/11/18
to Google App Engine
Hi,

in your app's request_log (under Stackdriver Logging), do the requests reflect the high latency in the field protoPayload.latency? Or is the latency issue isolated to Endpoints itself (those are logged to request_latency_in_ms in Endpoints logs)? The latter includes the time taken by Endpoints and the backend (your app) together.

And what type of scaling is defined in app.yaml? Maybe you are affected by one of the known issues. From the Cloud Endpoints documentation for standard environment:

Automatic scaling instances do not support background threads, which are used by the Endpoints Management Framework. The Framework will still function, but with potentially reduced performance.

If you use automatic scaling, maybe you could try if the performance improves with basic or manual scaling. I have used Endpoints with automatic scaling before and haven't noticed such latency issues as in your case (Python standard environment though).

Best regards,
Ani

Azeem Haider

unread,
Jun 12, 2018, 3:37:18 AM6/12/18
to Google App Engine
Here are the logs.

103.255.4.81 - - [11/Jun/2018:14:49:06 +0500] "GET /_ah/api/zeem/v1/user/5720147234914304 HTTP/1.1" 200 258 - "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36" "my-test-project-186021.appspot.com" ms=NaN cpu_ms=98 cpm_usd=2.8833e-8 loading_request=0 instance=00c61b117c672dd70d807801e2761bedb84eefcdab2d85b02a4728040eb34228db9ab117119678 app_engine_release=1.9.54 trace_id=48ef199b96bcbd5bc6ef5ff657769b8f
{
httpRequest: {…} 
insertId: "5b1e4593000b51253a6b6efb" 
labels: {…} 
logName: "projects/my-test-project-186021/logs/appengine.googleapis.com%2Frequest_log" 
operation: {…} 
protoPayload: {
appEngineRelease: "1.9.54" 
appId: "j~my-test-project-186021" 
cost: 2.8833e-8 
endTime: "2018-06-11T09:49:07.680316Z" 
finished: true 
first: true 
httpVersion: "HTTP/1.1" 
instanceId: "00c61b117c672dd70d807801e2761bedb84eefcdab2d85b02a4728040eb34228db9ab117119678" 
instanceIndex: -1 
ip: "103.255.4.81" 
latency: "1.663176s" 
megaCycles: "98" 
method: "GET" 
requestId: "5b1e459200ff00ff42f46775c12a00016a7e6d792d746573742d70726f6a6563742d3138363032310001323031383036303674313630303530000100" 
resource: "/_ah/api/zeem/v1/user/5720147234914304" 
responseSize: "258" 
startTime: "2018-06-11T09:49:06.017140Z" 
status: 200 
traceId: "48ef199b96bcbd5bc6ef5ff657769b8f" 
urlMapEntry: "unused" 
userAgent: "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36" 
versionId: "20180606t160050" 
}
receiveTimestamp: "2018-06-11T09:49:07.796031561Z" 
resource: {…} 
timestamp: "2018-06-11T09:49:06.017140Z" 
trace: "projects/my-test-project-186021/traces/48ef199b96bcbd5bc6ef5ff657769b8f" 
}

Here is log of most used URI.

103.255.4.81 - - [11/Jun/2018:14:56:00 +0500] "GET /_ah/api/zeem/v1/postfeed/5720147234914304/PRIVATE HTTP/1.1" 200 843 - "Google-API-Java-Client Google-HTTP-Java-Client/1.23.0 (gzip)" "my-test-project-186021.appspot.com" ms=NaN cpu_ms=806 cpm_usd=9.4212e-8 loading_request=0 instance=00c61b117c672dd70d807801e2761bedb84eefcdab2d85b02a4728040eb34228db9ab117119678 app_engine_release=1.9.54 trace_id=a49da969570d4e65fb338513f6b816a3
{
httpRequest: {…} 
insertId: "5b1e473200095f9bfcc03441" 
labels: {…} 
logName: "projects/my-test-project-186021/logs/appengine.googleapis.com%2Frequest_log" 
operation: {…} 
protoPayload: {
appEngineRelease: "1.9.54" 
appId: "j~my-test-project-186021" 
cost: 9.4212e-8 
endTime: "2018-06-11T09:56:02.608047Z" 
finished: true 
first: true 
httpVersion: "HTTP/1.1" 
instanceId: "00c61b117c672dd70d807801e2761bedb84eefcdab2d85b02a4728040eb34228db9ab117119678" 
instanceIndex: -1 
ip: "103.255.4.81" 
latency: "1.943253s" 
megaCycles: "806" 
method: "GET" 
requestId: "5b1e473000ff0a24daf14f976e00016a7e6d792d746573742d70726f6a6563742d3138363032310001323031383036303674313630303530000100" 
resource: "/_ah/api/zeem/v1/postfeed/5720147234914304/PRIVATE" 
responseSize: "843" 
startTime: "2018-06-11T09:56:00.664794Z" 
status: 200 
traceId: "a49da969570d4e65fb338513f6b816a3" 
urlMapEntry: "unused" 
userAgent: "Google-API-Java-Client Google-HTTP-Java-Client/1.23.0 (gzip)" 
versionId: "20180606t160050" 
}
receiveTimestamp: "2018-06-11T09:56:02.669731369Z" 
resource: {…} 
timestamp: "2018-06-11T09:56:00.664794Z" 
trace: "projects/my-test-project-186021/traces/a49da969570d4e65fb338513f6b816a3" 
} 
Message has been deleted

Azeem Haider

unread,
Jun 12, 2018, 3:53:40 AM6/12/18
to Google App Engine
I'm using Automatic Scaling.

Sorry, in my first post I did not upload stack traces. Check these below



Here is RPCs. And can you please what RPCs and LATENCY time mean here ?



Expanding trace for most user URI.



Here is trace detail of user request that I mention in my question, if you see it fetch from memory but still function take around 2 sec




Ani Hatzis

unread,
Jun 13, 2018, 12:12:29 PM6/13/18
to Google App Engine
Since your app is not in production yet, you could test the latency by changing your scaling from auto scaling to basic or manual scaling in app.yaml (see Known Issues) and compare the latencies of both scaling settings.

I would also compare the traces with a request that was directly sent to your API's backend, without Cloud Endpoints. Perhaps the problem is in your app, not with Endpoints.

I'm not 100% sure when the trace starts, but as far as I understand the documentation, the timeline of a trace starts when the (App Engine) application received the incoming request, not when Cloud Endpoints received the request (mentioned here and here). If it's true that Cloud Endpoints latency is not shown in the trace, then I'd assume that you would see the same high latency if you send a request directly to the API's backend. However, by following the logs of a trace or by comparing the trace timestamps with those of the matching logs in App Engine and Cloud Endpoints, you should be able to verify whether Endpoints latency is shown in a trace or not. (the log in Endpoints would start before the log in App Engine and end after the log in App Engine ended)

If the trace is just for your App Engine app, I would try to figure out what actually happens in the long block of untraced time, i.e. before you see the first span, e.g. the first RPC to datastore. I don't know how it is with Java standard, but for Python standard, the first RPCs show up after maybe 10ms, 20ms - or if it's the first request of a new instance maybe after 400ms. You could use Cloud Logging and Cloud Debug to get a better understanding why it takes so much time until the first RPC call.

I hope this helps.

Best regards,
Ani

Azeem Haider

unread,
Jun 14, 2018, 3:19:18 AM6/14/18
to Google App Engine
Ok, I check by changing scaling.

After seeing my logs and APIs Explorer. In Logs Latency time is low than in API Explorer.
As you said it only start time when App Engine receiver request.
By sending direct request from URL latency time is nearly equal to the time in Logs. But not same less than one sec difference.
Sending request from direct URL is faster then sending request from API Explorer.

But my main problem is I'm using using Client Libs in Android. Which take more time than API Explorer. 
It seems problem is not in App Engine and Cloud Endpoint but in Client Libs. I think Client Libs also take some time to prepare Request. That's why these are slow.

What you think ?

But I'm worried about why Datastore RPC is not showing on top as you said. Is paython environement is faster than Java ?

Please let me know about your thinking. Looking forward to hearing your valuable response.

Thanks for your precious time and consideration.

Kind Regards,
Azeem Haider






Katayoon (Cloud Platform Support)

unread,
Jun 18, 2018, 10:38:26 AM6/18/18
to Google App Engine
It seems that the majority of the latency you are experiencing is at instance creation time, as your trace shows. Long Untraced Time can be caused by a new instance being created to handle the load as a result of high amount of dependencies and the libraries you may have in your code; and this will be higher for Java applications than Python.

Azeem Haider

unread,
Jun 22, 2018, 6:24:34 AM6/22/18
to Google App Engine
I'm just using all basic library that is required for cloud endpoints. 

Is there any solution how can I reduce latency time in my case ?

Jordan (Cloud Platform Support)

unread,
Jun 25, 2018, 11:24:33 AM6/25/18
to Google App Engine
The best way to investigate the latency of your own application is to directly apply logging throughout your code. Log the timestamps of when your application receives a request, before and after it performs any logic, before and after using the Google Cloud Client Libraries (and all other API requests), and before it returns. This way you can exactly pin-point what in your code is causing the latency. 

You can check your App Engine Dashboard to confirm that the latency is not mainly caused by instance startup time (as seen in the "Loading latency" graph), but is simply latency caused by your code (as seen in the "Latency" graph). One major cause could be your excessive memory usage. You are using F1 class instances which have 128 MB of virtual memory, but you are constantly exceeding that and averaging around 215 MB (as seen in the "Memory Usage" graph). This could lead to App Engine shutting down an instance after finishing a request, forcing new incoming requests to be loaded for the first time on new instances.

Of course this is only to help reduce the latency of App Engine. The latency you see from your Android device is caused by a much larger number of things like networking paths and connectivity. If you wish to investigate the latency of the Android Client library, you must instead eliminate any networking latency and run the library on a machine that has tested fast connection to Google (like the Cloud Shell). If you do end up finding that the latency is truly due to the Client Libraries (and not your network or App Engine application code), then you can directly report your findings to the appropriate library engineering team in their Github Issue Tracker. 

- Note that Google Groups is reserved for general product discussions and is not for technical support. If you require further support for reducing the latency of your application it is recommended to post your detailed questions to Stack Exchange using the supported Cloud tags. 

 
Reply all
Reply to author
Forward
0 new messages