BLF getting stuck on busy way more in 3.16

Geoff Thomas

Sep 16, 2014, 12:22:30 PM
to 2600h...@googlegroups.com
We have had an occasional issue with BLF getting stuck on busy for random users; I assume the idle event that should reset it is not being handled. Since upgrading from 3.12 to 3.16 over the weekend, this is far more prevalent than ever before. It used to happen about once every couple of weeks; now it affects roughly 10 accounts an hour on average. Running kamctl fifo presence_flush all clears it up, but keys then randomly get stuck again. Is there anything that can be done to mitigate this?
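For anyone scripting the flush workaround described above, here is a minimal Python sketch. It only wraps the command quoted in this post; the function names, the interval and the number of rounds are hypothetical, and it assumes kamctl is on the PATH of the host where it runs. It does nothing about the underlying cause.

```python
import subprocess
import time

# Command quoted in the post above; everything else here is illustrative.
FLUSH_CMD = ["kamctl", "fifo", "presence_flush", "all"]


def flush_presence(run=subprocess.run):
    """Run the flush command once; return True if it exited with status 0."""
    result = run(FLUSH_CMD, capture_output=True, text=True)
    return result.returncode == 0


def flush_periodically(interval_s, rounds, run=subprocess.run, sleep=time.sleep):
    """Flush `rounds` times, sleeping `interval_s` seconds between attempts.

    Returns the number of flushes that succeeded.
    """
    succeeded = 0
    for attempt in range(rounds):
        if flush_presence(run):
            succeeded += 1
        if attempt < rounds - 1:
            sleep(interval_s)
    return succeeded
```

Calling flush_periodically(300, 12) would flush every five minutes for an hour. The run and sleep parameters exist only so the loop can be exercised without touching a live Kamailio.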

Darren Schreiber

Sep 16, 2014, 12:24:26 PM
to 2600h...@googlegroups.com
This sounds like something is misconfigured on your cluster. How many servers do you have, and did you upgrade Whapps, Kamailio and FreeSWITCH all using our RPMs?

We just rolled v3.16 to production and are seeing the opposite of what you’re describing – it’s way more reliable and almost all BLF related tickets have been eliminated…

--
You received this message because you are subscribed to the Google Groups "2600hz-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 2600hz-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Geoff Thomas

Sep 16, 2014, 1:12:35 PM
to 2600h...@googlegroups.com
We have 2 kazoo / kamailio servers and 2 freeswitch servers. Yes, I upgraded them all to the latest available in the stable repo: kazoo is 3.16-17, kamailio is 4.0-46, kazoo configs is 3.16-5 and our freeswitch boxes are 1.4.7-4.

One thing I noticed that I believe is new since the upgrade is an entry in the system_config/omnipresence doc: "expires_fudge_s": 20
I do not believe that was in there before. Is it supposed to be? The only other entry in omnipresence is "expire_check_ms": 1000

Does that look correct? Any ideas where else I can check for misconfigurations?

Geoff Thomas

Sep 17, 2014, 2:21:06 PM
to 2600h...@googlegroups.com
Darren, I know you asked for my ecallmgr system_config doc, but I just noticed that I posted it in the wrong thread, so I am adding it here in the correct one.

Also, I have noticed that the BLF keys that get stuck are all in the early state, if that helps at all...

"default": {
       "fs_nodes": [
           "freeswitch@fs1",
           "freeswitch@fs2"
       ],
       "syslog_log_level": "info",
       "fs_cmds": [
           {
               "load": "mod_sofia"
           },
           {
               "reloadacl": ""
           }
       ],
       "distribute_presence": false,
       "distribute_message_query": false,
       "publish_channel_state": true,
       "sofia_conf": null,
       "acls": {
           acls...
       },
       "default_realm": "nodomain.com",
       "authz_enabled": false,
       "default_ringback": "%(2000,4000,440,480)",
       "use_vlc": false,
       "use_http_cache": true,
       "record_waste_resources": false,
       "recording_file_path": "/tmp/",
       "node_down_grace_period": 10000,
       "capabilities": [
           {
               "module": "mod_conference",
               "is_loaded": false,
               "capability": "conference"
           },
           {
               "module": "mod_channel_move",
               "is_loaded": false,
               "capability": "channel_move"
           },
           {
               "module": "mod_http_cache",
               "is_loaded": false,
               "capability": "http_cache"
           },
           {
               "module": "mod_dptools",
               "is_loaded": false,
               "capability": "dialplan"
           },
           {
               "module": "mod_sofia",
               "is_loaded": false,
               "capability": "sip"
           },
           {
               "module": "mod_spandsp",
               "is_loaded": false,
               "capability": "fax"
           },
           {
               "module": "mod_flite",
               "is_loaded": false,
               "capability": "tts"
           },
           {
               "module": "mod_freetdm",
               "is_loaded": false,
               "capability": "freetdm"
           },
           {
               "module": "mod_skypopen",
               "is_loaded": false,
               "capability": "skype"
           },
           {
               "module": "mod_dingaling",
               "is_loaded": false,
               "capability": "xmpp"
           },
           {
               "module": "mod_skinny",
               "is_loaded": false,
               "capability": "skinny"
           }
       ],
       "restrict_cdr_publisher": false,
       "use_shout": false,
       "max_timeout_for_node_restart": 10000,
       "freeswitch_context": "context_2",
       "restrict_channel_state_publisher": true,
       "redirect_via_proxy": true,
       "default_fax_extension": ".tiff",
       "fax_file_path": "/tmp/",
       "user_cache_time_in_ms": 3600000,
       "expires_deviation_time": 180,
       "publish_channel_reconnect": false
   }

Arek Fryz

Sep 30, 2014, 3:25:50 PM
to 2600h...@googlegroups.com
Geoff,

Did you get that resolved? We are having the same issue.

Thank you,
Arek

Geoff Thomas

Oct 3, 2014, 1:24:57 PM
to 2600h...@googlegroups.com
For now, yes. After much trial and error we have stabilized BLF by shutting down the kazoo services on our secondary sbc. Running only one instance of whistle apps and ecallmgr, we have not had any more reports of stuck BLF from users in about a week. Obviously this is not ideal, but it's better than constantly getting support calls about this issue until we can figure out the underlying cause. I am still investigating, but I am still not sure why having kazoo running on both servers caused this for us. From everything I can tell, both servers have identical configurations on the kazoo side. Arek, if you are willing to give this a shot, I would be curious whether it clears up your issues.
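One way to double-check the "identical configurations" claim is a key-by-key comparison instead of reading the docs side by side. Below is a minimal Python sketch, assuming each server's settings have already been loaded into plain dictionaries (for example with json.load). The function name, the MISSING marker and the sample values are hypothetical and are not taken from either server.

```python
MISSING = "<missing>"  # marker for a key present on only one side


def diff_config(a, b, path=""):
    """Return (key_path, value_a, value_b) for every setting that differs.

    Nested dicts are walked recursively; any other value is compared as a
    whole, and a type mismatch (e.g. False vs "false") counts as a difference.
    """
    diffs = []
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}" if path else key
            if key not in a:
                diffs.append((sub, MISSING, b[key]))
            elif key not in b:
                diffs.append((sub, a[key], MISSING))
            else:
                diffs.extend(diff_config(a[key], b[key], sub))
    elif a != b or type(a) is not type(b):
        diffs.append((path, a, b))
    return diffs


# Made-up sample data, only to show the shape of the result.
node_a = {"default": {"distribute_presence": False,
                      "publish_channel_state": True,
                      "node_down_grace_period": 10000}}
node_b = {"default": {"distribute_presence": True,
                      "publish_channel_state": True}}

for key_path, left, right in diff_config(node_a, node_b):
    print(key_path, left, right)
```

An empty result means the two dictionaries match exactly, including value types.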

Arne van Balgoijen

Oct 8, 2014, 2:55:55 AM
to 2600h...@googlegroups.com
Darren,

In Geoff's ecallmgr system_config doc the value "distribute_presence" equals false. Is this setting related to the omnipresence module, or something completely different?

BR

Arne

Darren Schreiber

Oct 8, 2014, 9:43:23 AM
to 2600h...@googlegroups.com
It is related to omnipresence I believe.


Arne van Balgoijen | FirmTel

Oct 8, 2014, 9:56:07 AM
to 2600h...@googlegroups.com
Ok what will change if I set it to true?

Arne van Balgoijen  | Product Marketing
FirmTel
+31 (0)637 009 102  |  Office:  Nieuwe Gracht 53 - 2011 ND Haarlem - The Netherlands  |  T +31 (0)23 820 0235  |  www.firmtel.com
                                                 
This email (including any attachments to it) is confidential, subject to copyright and is sent for the personal attention of the intended recipient only. If you have received this email in error, please advise me immediately and delete it. Our general terms and conditions are applicable to all our services and contain an exclusion of liability. FirmTel Holdings B.V. is registered with the Dutch Chamber of Commerce under number 53675894.


Darren Schreiber

Oct 8, 2014, 9:57:12 AM
to 2600h...@googlegroups.com
It should propagate the BLF info from one whapps to another so that things stay in sync across the cluster.

I think :-)

I will need James or Karl or someone else to verify that.

Arne van Balgoijen | FirmTel

Oct 8, 2014, 10:15:35 AM
to 2600h...@googlegroups.com
If that is the case, then that might be the clue, as the default seems to be false.


Darren Schreiber

Oct 8, 2014, 10:16:23 AM
to 2600h...@googlegroups.com
I was pretty sure it was no longer required, but try it. You need to restart ecallmgr for that one too I believe.

James is teaching the training this week. I will try to see if Karl can answer.

Arne van Balgoijen | FirmTel

Oct 8, 2014, 10:57:37 AM
to 2600h...@googlegroups.com
Darren,

Could you check whether it is set in your production environment?


Darren Schreiber

Oct 8, 2014, 10:58:18 AM
to 2600h...@googlegroups.com
Just did.

It is off.

I was pretty sure we no longer required that.

Arek Fryz

Oct 8, 2014, 11:22:27 AM
to 2600h...@googlegroups.com
I set distribute_presence to false on my cluster and BLF stopped working altogether. I only restarted whapps and ecallmgr; maybe kamailio needs to be restarted too?


Regards,
Arek Fryz




Darren Schreiber

Oct 8, 2014, 11:23:12 AM
to 2600h...@googlegroups.com
Do you have publish_channel_state? If so, what is it set to?

Arek Fryz

Oct 8, 2014, 11:32:12 AM
to 2600h...@googlegroups.com
BLF started to work after a couple of minutes. For the first 10 minutes after the restart, kamctl fifo presence_list did not give any output, and now I see statuses again. Did the phones re-subscribe? Or was omnipresence talking to a different place in amqp?

I will let you know if that change helped with the stuck BLF. On polycom phones I see it a lot since the upgrade to 3.16.

This is my config:

"distribute_presence": "false",
       "node_down_grace_period": 10000,
       "distribute_message_query": "true",
       "default_realm": "nodomain.com",
       "sofia_conf": null,
       "publish_channel_state": true,
       "authz_enabled": false,
       "authz_dry_run": false,
       "default_ringback": "%(2000,4000,440,480)",
       "record_waste_resources": true,
       "recording_file_path": "/tmp/",
       "use_vlc": false,
       "use_http_cache": true,
       "use_shout": false,
       "default_fax_extension": ".tiff",
       "fax_file_path": "/tmp/",
       "authz_local_resources": "false",
       "max_timeout_for_node_restart": 10000,
       "authz_default_action": "allow",
"restrict_cdr_publisher": true,
       "redirect_via_proxy": true,
       "freeswitch_context": "context_2",
       "restrict_channel_state_publisher": true,
       "user_cache_time_in_ms": 3600000,
       "expires_deviation_time": 180,
       "publish_channel_reconnect": false

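One detail in the config above: distribute_presence, distribute_message_query and authz_local_resources are quoted strings ("false", "true"), while the ecallmgr doc posted earlier in this thread uses bare JSON booleans. Whether Kazoo treats the two forms the same way is not confirmed anywhere in this thread. A minimal Python sketch for spotting such values in a loaded doc, with a hypothetical function name:

```python
def quoted_booleans(doc, path=""):
    """Return the sorted key paths whose value is the string "true" or "false".

    Walks nested dicts and lists; real JSON booleans are ignored.
    """
    found = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            sub = f"{path}.{key}" if path else key
            found.extend(quoted_booleans(value, sub))
    elif isinstance(doc, list):
        for index, value in enumerate(doc):
            found.extend(quoted_booleans(value, f"{path}[{index}]"))
    elif isinstance(doc, str) and doc.strip().lower() in ("true", "false"):
        found.append(path)
    return sorted(found)


# A few values copied from the config posted above.
posted = {
    "distribute_presence": "false",
    "distribute_message_query": "true",
    "publish_channel_state": True,
    "authz_local_resources": "false",
    "restrict_cdr_publisher": True,
}

print(quoted_booleans(posted))
```

The check only reports where the quoted form appears; it does not say which form is correct.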



Darren Schreiber

Oct 8, 2014, 11:33:43 AM
to 2600h...@googlegroups.com
Are you stating that after enabling publish_channel_state and restarting, presence started working (after 10 minutes)?

My guess is the 10 minutes is related to your phone’s resubscribe settings.

Arek Fryz

Oct 8, 2014, 11:38:03 AM
to 2600h...@googlegroups.com
publish_channel_state was always on in my case, I changed distribute_presence to false and BLF was not working for about 10 minutes after restart and then started.

Darren Schreiber

Oct 8, 2014, 11:39:42 AM
to 2600h...@googlegroups.com
OK. I think what you have set now is correct.

Arne van Balgoijen | FirmTel

Oct 8, 2014, 12:11:47 PM
to 2600h...@googlegroups.com
Darren/Arek,

These settings correspond with what we have on our production environment running 3.16-25. Unfortunately, we are still experiencing BLF issues on Yealink devices and soft user agents, more specifically on Zoiper (paid version!).

The weird thing is that the BLF indications on the Yealink devices do not correspond with what you see on Zoiper (which might be a Zoiper issue), and they also differ from what you get when running "kamctl fifo presence_list". As stated earlier in this thread, on the Yealink it is often a false "ringing" state.

Currently we very frequently run "kamctl fifo presence_flush all", which more or less solves it temporarily for the Yealinks; the soft user agents react differently.


Br Arne

Darren Schreiber

Oct 8, 2014, 12:13:52 PM
to 2600h...@googlegroups.com
What is the actual issue? I feel like there are multiple issues in this thread.

Arek is claiming no BLF at all. You seem to be discussing something different?

Arek Fryz

Oct 8, 2014, 12:31:06 PM
to 2600h...@googlegroups.com

Well, the "no BLF at all" issue was related to the restart. I also have issues with BLF not updating state. In my case Yealink phones are the only ones without issues, and Polycoms have issues. I think it's related to parking calls on a park slot. It looks like every time user A parks a call using *31, then calls user B and hangs up, and user B picks the call up from *31, the BLF for phone A still shows the confirmed state in kamailio. Even after user B hangs up, it still shows the confirmed state for his phone. Also, *31 does not reset.


Arek


Nigel Johnson

Jan 8, 2015, 3:20:56 PM
to 2600h...@googlegroups.com
Has anyone found a solution to this? I'm having issues with BLF on kazoo-R15B-3.16-36, with 2 kazoo servers set up in federation. It just started happening when I upgraded to kazoo-R15B-3.16-36, though it has pretty much always been an issue with Grandstream phones.
My system_config/ecallmgr:
       "publish_channel_state": true,
       "distribute_presence": true,

James Aimonetti

Jan 8, 2015, 11:11:46 PM
to 2600h...@googlegroups.com
If you can, I'd recommend updating to the 3.18 RPMs as they've
received the lion's share of development effort in regards to
BLF/presence, and it would be awesome to see if the update handles
your issues.
--
James Aimonetti
Lead Systems Architect / Impressionable Scallywag
"I thought I fixed that"

2600Hz | http://2600hz.com
sip:ja...@2600hz.com
tel:415.886.7905
irc:mc_ @ freenode

Mick Burns

Jan 9, 2015, 6:11:11 PM
to 2600h...@googlegroups.com
James,

Wondering if the 3.18 RPMs will be going into stable anytime soon?
I believe I recently saw a note from Darren mentioning you guys were
getting very close to that.

Perhaps it's taking longer because you are attempting to roll Kamailio 4.2
along with it?

Finally, are there specific instructions already available about what
needs to be done to successfully migrate a 3.16 cluster over
to 3.18? Or would a simple "sup whapps_maintenance migrate" be sufficient?

ty

Darren Schreiber

Jan 9, 2015, 6:12:00 PM
to 2600h...@googlegroups.com
The last outstanding issue on v3.18 was BLF over TCP. We deployed a patch for that on Wed to the client who discovered the scenario that caused that. It seems to have resolved it.

We were going to promote v3.18 to stable today. Assuming we still have the time to do so, I would consider it stable as of today.

