-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ok, as near as I can tell, here's what happened with call-id
213055064@customerlocalIP:
> 08:11:00: INVITE recv from customer
The customer is auth-by-IP and was allowed to progress by FreeSWITCH:
> IP customerExtIP.com Approved by acl "trusted[]". Access Granted
> 08:11:00.363454 [DEBUG] kazoo_fetch_agent.c:216 Sending dialplan
> XML
request (ae7e9310-431a-11e4-b389-ab14f54ca636) to
ecal...@ny2apps2.mydomain.us
So ecallmgr@ny2apps2 is tasked with processing the call, but
ultimately times out after 3099ms and FreeSWITCH hangs the call up
with NO_ROUTE_DESTINATION.
Sad story.
Let's look at ecallmgr and see what it tried:
At 8:11:00, ecallmgr_fs_authz publishes an authz.authorize message
> (<0.491.0>) published to
callmgr(amqp://
amusr:am...@ny2apps1.mydomain.us:5672) exchange (routing
key authz.authorize)
At 8:11:05, that request timed out:
> 213055064@customerlocalIP|wh_amqp_worker:545 (<0.491.0>) request
timeout exceeded for msg id: 011ec30e2228d97c
Ok, now the 5 second wait is definitely something to be fixed (if
FreeSWITCH waits 3s, ecallmgr requests should probably only wait 2.8s
or so. I've filed a ticket for this sprint:
https://2600hz.atlassian.net/browse/KAZOO-2989
Let's look at the processing of that message id(011ec30e2228d97c):
Nothing on ny2apps2. Ok, how about ny2apps1. Hmm, appears nothing as
well. This makes me suspect that jonny5 is not running, or if it is
running, it does not have a connection to the same broker and
ecallmgr. For reference, ecallmgr published the request to
(amqp://
amusr:am...@ny2apps1.mydomain.us:5672)
Now, because the authz request fails to get a response, and the
timeout waiting for that response is larger than FreeSWTICH waits, the
call fails. But, we see later that the ecallmgr_fs_authz process (pid
0.30749.176) actually would have authz'd the request and the call
would have processed appropriately.
You can disable ecallmgr's authz process so it bypasses the
authz.authorize message entirely, or you could try patching the
timeouts on the route/authz requests to be under 3 seconds and let us
know if that addresses the issue. Or you can start jonny5 if you are
wanting to authz.
Hope that helps you along the way of solving this issue.
James
Le 2014-09-30 16:50, Dave Singer a écrit :
> Still hoping someone can take a look at this and help. I'm digging
> through code but it is very slow figuring out where each thing
> jumps, what it's affect might be and if that is what is causing my
> issue. Please someone help. Thanks Dave
>
> On Thursday, September 25, 2014 9:55:14 AM UTC-7, Dave Singer
> wrote:
>>
>> In preparing the logs and looking at the pcap that I was going to
>> include I discovered that my evaluation of the timing of the
>> response packet was wrong. It was not quick. It was 5+ sec after
>> the initial request. So I did more digging in the app log. Logs
>> <
http://pastebin.com/qK3sR5TR>. There is a bunch of stuff that is
>> probably not pertinent to the call but I left them in just in
>> case. Note that the only servers in the same zone have names that
>> start with ny2. So amqp connection failures to ams2... and sg1...
>> I think should be irrelevant.
>>
>> Running kazoo git version 3.08 with some minor customizations.
>> Mainly adding extra logging to help debug call processing.
>>
>> The sample call's callid is: 213055064@customerlocalIP Here are
>> some highlights of what I found looking mainly at logs for
>> ny2apps2 to get you started.
>>
>> It finds a routing solution in the no_match call flow and logs
>> this line: Sep 23 08:11:01 ny2apps2 2600hz[24027]:
>> |213055064@customerlocalIP|cf_route_req:133 (<0.5439.222>)
>> callflows knows how to route the call! sent park response I
>> guessed it was sending the message at that time to FS containing
>> the park xml. I was wrong. :( And it made me look in the wrong
>> place for a long time.
>>
>> Three sec after FS sent the dialplan request, it times out
>> waiting for the dialplan response. It apparently sends a message
>> to the app server which it logs starting with: Sep 23 08:11:03
>> ny2apps2 2600hz[24084]:
>> |213055064@customerlocalIP|ecallmgr_call_events:514
>> (<0.30817.176>) publishing call event channel_destroy ..... Sep
>> 23 08:11:00 ny2apps2 2600hz[24084]:
>> |213055064@customerlocalIP|wh_amqp_channel:143 (<0.491.0>)
>> published to callmgr(amqp://
amusr:am...@ny2apps1.mydomain.us:5672)
>> "amqp://
amusr:am...@ny2apps1.mydomain.us:5672" amqp_uri =
>> "amqp://
amusr:am...@ny2apps2.mydomain.us:5672"
sip:ja...@2600hz.com
iQEcBAEBAgAGBQJULC+YAAoJENTKa+JPXCVgSkMIAJqLjfGy9NMN1zFGD+qWOkGh
gx68KEmlhJQnAe9pYA0Uh/HZiom7DdDgys2ItWeNKrgbE1GbQE1bfgSYc3HobAkc
jt0s8eWjYLPMiSelN9gk6O8wNYSQZ4d7bNfgt8WT/2BblhTj1d03Lpj1w8LymtJX
1h+9AORr34Aj0lukeShUgjq4dW60of558Hv2p5aW9FF834xKziMMaAHANCqx+VLO
PrjS3cwvzjeOddY2lq98zs1Ql35+KFh8BmuTPJlGGSmU6srHmGu/suc74KuEnq8P
koniritsI7FRNIDwZ18OMPBPZHO2Btgp++DJuJ4ExZtmnNg4AcovlZ/+UXuyPoA=
=nlbU
-----END PGP SIGNATURE-----