It's urgent: please give me some ideas about which other parameters or options I should compile with in order to find the reason for a completely frozen program!
Hardware and software environment
- Linux ubuntu 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
- 4 core Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
- 8 GB RAM, SCSI RAID
Application description and interactions with other services
The Go program acts as a middleware HTTP server (net/http) talking with more than 400 clients (browsers + jQuery), and it relies on:
- CouchDB as the database, accessed over plain HTTP through a pool (rrpool) of 50 connections (on the same machine)
- CouchDB-Lucene as a full text indexer (on the same machine)
- goroutines that spawn external programs (xelatex and PHP scripts) as PDF generators and wait for them to complete
The program ran fine for 2 weeks under a moderate load of ~4000 incoming documents per day!
Suddenly, under heavy load, the program started to freeze completely, with these symptoms:
- Not a single line (incoming requests or anything else) appears in the log!
- Killing the process with kill -9 moves it into a <defunct> (zombie) state for one or two minutes; only after it disappears can it be restarted!
- A simple "heartbeat" goroutine that writes a message to the log every 5 seconds stops writing
- The HTTP loop that dispatches request handlers is frozen; even a simple "echo, I'm alive" handler does not respond
At that time CouchDB logs nothing unusual and runs perfectly, PostgreSQL runs OK, and the Lucene logs are clean!
The main program is NOT eating memory (it stays at 480 MB of RAM) or CPU (it sits at 0% while frozen).
I tried compiling it to run on just a single core, building with the -race flag, everything I could think of, but I get the same result!
Under heavy load it simply freezes at random places in the program!
=========
Please tell me whether there are tools for debugging a program that is completely frozen, and what other compile options are available for detecting deadlocks or similar problems!
I've set the GOGCTRACE=2 environment variable and get garbage collection information in the log file, such as:
scvg41: inuse: 9, idle: 5, sys: 15, released: 2, consumed: 12 (MB)
gc318(1): 3+3+0 ms, 9 -> 4 MB 69792 -> 31708 (4289323-4257615) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc319(1): 4+0+0 ms, 4 -> 4 MB 31709 -> 31686 (4289324-4257638) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc320(1): 3+0+0 ms, 4 -> 4 MB 31684 -> 31684 (4289324-4257640) objects, 0(0) handoff, 0(0) steal, 0/0/0 yields
Other ideas?
Thanks in advance,
Teo