Hey Colin,
First and foremost, many thanks for your first impressions and initial
notes on TyphoonAE! This is all very helpful. Let me try to give you
some answers inline.
On Oct 4, 5:31 pm, hawkett <hawk...@gmail.com> wrote:
> ...
>
> 1. If I am seeing new server starts (e.g. main.py is being
> (re)imported) I'm wondering if this is additional application
> instances, or stop/start of existing ones, or both? Your architecture
> diagram looks like it shows multiple application instances are used.
The number of appservers per app can be configured in the fcgi-program
section of the corresponding supervisor config file (e.g.
etc/1.latest.appid-supervisor.conf). The default is two appservers per
app:
numprocs = 2
For instance, if you experience sustained heavy load on the appservers,
you might want to increase the number of appserver processes manually.
However, it's on us to find a nifty algorithm for automatic scaling.
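For reference, the relevant part of such a section looks roughly like
this (the socket path and values are only illustrative, not necessarily
what apptool generates for your app):

[fcgi-program:appid]
command = ...                               ; generated appserver command
socket = unix:///tmp/appid.fcgi.sock        ; shared socket managed by supervisord
numprocs = 2                                ; number of appserver processes
process_name = %(program_name)s_%(process_num)02d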
All appserver processes are connected to the NGINX HTTP frontend
server via shared sockets managed by the supervisor daemon. Generally,
we don't restart appserver processes. That's different from GAE and the
main reason why apps stay "hot" on TyphoonAE. There are only three
possible reasons for a restart:
a) the appserver crashes due to an uncaught exception, which triggers an
automatic restart (see the supervisor docs for configuring exit codes; a
rough config sketch follows after this list)
b) manual restart
c) appserver memory consumption exceeds a configurable threshold
(monitored by a separate memmon process)
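Regarding (a), these are roughly the supervisor options involved; the
values below are only illustrative, so please check the supervisord
documentation for the exact semantics:

[fcgi-program:appid]
autorestart = unexpected   ; restart only if the exit code is not "expected"
exitcodes = 0,2            ; exit codes that count as expected
startretries = 3           ; give up after this many failed starts in a row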
> 2. I'm running into a problem that is probably fairly unique to my
> application. I initially 'boot' the database on dev/SDK with a long
> running request that also serves as a testing mechanism for my server
> code. This process doesn't occur on live - I just upload the raw data
> using remote_api. I *think* I am seeing this process terminated
> prematurely and new server instance being started, and I am guessing
> this could be due to this (terminating memory hogs)
http://groups.google.com/group/typhoonae/browse_thread/thread/33ff124...
> - I know you say you are looking at an option here to set the memory
> amount - but would you be able to point me in the direction of how to
> raise this limit (even if it requires a rebuild of the server), as I
> would like to verify that this is the problem with my boot process. I
> am wondering if there is a way for you to terminate a memory hog
> *after* it has finished processing all its requests, rather than just
> killing it? Perhaps set a flag on the server to accept no more
> requests, and poll it until the active request count drops to zero, or
> something like that? This might be why GAE uses timed termination -
> to ensure it doesn't kill servers actively running requests?
The memory limit can also be easily configured in the
1.latest.appid-supervisor.conf file. Here is an example:
[eventlistener:appid.1_monitor]
command=/Users/tobias/projects/appengine/typhoonae-dev/bin/memmon -g appid=200MB
events=TICK_60
Just raise the limit by changing the value in -g appid=<limit>. If you
want to disable the event listener completely, either delete this
section or run bin/supervisorctl stop appid.1_monitor.
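Note that supervisord has to pick up the changed configuration before
the new limit takes effect. Something like this should do it:

bin/supervisorctl reread
bin/supervisorctl update

(reread re-parses the config files, update applies the changes and
restarts the affected process groups.)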
Providing a flag to finish the current request before killing the
process would be a great enhancement. I have to ponder on this,
though. An appropriate signal handler in fcgiserver.py might work.
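Just to sketch the idea (this is not TyphoonAE code, and
accept_request/handle_request are placeholders for the real FastCGI
accept/dispatch calls in fcgiserver.py): a SIGTERM handler could simply
set a flag that the main loop checks between requests, so the process
only exits once the current request has been answered.

import signal
import sys

shutdown_requested = False

def _on_sigterm(signum, frame):
    # Only take note here; the main loop decides when it is safe to exit.
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGTERM, _on_sigterm)

def serve(accept_request, handle_request):
    while not shutdown_requested:
        request = accept_request()
        if request is not None:
            # A SIGTERM arriving here does not interrupt the handler;
            # the loop condition is only checked again afterwards.
            handle_request(request)
    sys.exit(0)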
> 3. Given that TyphoonAE can handle multiple instances (including stop/
> start), it would be great to have some unique identifier in the
> application logs to detect that a different server instance is running
> the request - I think this is why GAE live uses a 'request thread' log
> style - it is really hard to read aggregate logs from multiple threads
> - especially if there is no indication which thread a log message is
> from. If there is a unique identifier, we can grep to get a log for
> each server individually. It would be even better if every request
> thread had a unique identifier in the logs, so we could grep per
> server and per request :)
Great idea! More granular log levels might be helpful, too.
That's definitely next on my list after pulling in Joaquin's great work
on the Celery integration. Would you prefer to keep appserver logs
separated from the HTTP logs?
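In the meantime you can at least tell appserver processes apart by
their PID: the standard Python logging module can put %(process)d into
every record. A minimal sketch (not what TyphoonAE currently does):

import logging

logging.basicConfig(
    format='%(asctime)s pid=%(process)d %(levelname)s %(message)s',
    level=logging.INFO,
)
logging.info('request handled')

A per-request identifier would still have to be injected explicitly,
e.g. with a logging.Filter that adds a request id to each record.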
> 4. Is there a way to set the request timeout to something other than
> 30s? I know this number matches GAE live, but the SDK has no such
> limitation. To mimic the SDK behaviour it would be good to turn this
> off. This is not a big issue, as the server continues to finish the
> request regardless of the front end timeout, but just something I was
> wondering.
I'm a bit confused by this one, because the default server-side
keepalive timeout is currently configured to 65 seconds. Where do you
experience requests being cut off after 30 seconds?
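For reference, these are the NGINX directives involved on the frontend
side (illustrative values, not necessarily what the generated config
uses):

keepalive_timeout    65;   # server-side keepalive
fastcgi_read_timeout 60;   # how long NGINX waits for a response from the appserver

If the frontend really is cutting requests off, fastcgi_read_timeout is
the more likely candidate than the keepalive setting, which only
affects idle connections.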
> 5. Almost all server errors show up as Gateway errors - is there a way
> to print the python stack trace to the browser?
This needs to be fixed in fcgiserver.py as well, I guess. Would you
like to file an issue for that?
> 6. I am sometimes seeing caching behaviour that is not evident in the
> SDK - e.g. if I load a page and get the gateway error, look in the
> logs for the cause, modify my code and hit refresh - most of the time
> the refresh just shows the gateway error, and the server logs show no
> activity. I'm just wondering where this caching is occurring and if I
> can turn it off? i.e. every browser refresh of every url will always
> hit the server.
After modifying code you should manually restart the appserver
processes to make the changes take effect.
This can be done with the following command:
bin/supervisorctl restart appid.version: (restarts the whole 'process
group')
Alternatively, apptool takes the --develop option, which disables
module caching; see the example below.
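For example (the path is just a placeholder for your application
directory; please check the apptool documentation for the exact
invocation):

bin/apptool --develop path/to/your/app

With module caching disabled, code changes should show up without
restarting the appservers.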
> Anyway - enough for now :) Thanks for the great work,
Thank you again, and don't hesitate to post more questions and
suggestions. I hope the answers above help.
- Tobias