Issue with watchdog script


Raphael Mazelier

May 14, 2015, 5:53:10 AM
to exabgp...@googlegroups.com
Hello,

I have been using ExaBGP for years with great success (anycasting VIPs or networks).
At home I recently upgraded one of my OpenBSD gateways, and since then I have had issues with the watchdog script hanging.

Version:
# exabgp -v
ExaBGP : 3.4.10
Python : 2.7.9 (default, Mar  6 2015, 16:37:51)  [GCC 4.2.1 20070719 ]
Uname  : GENERIC.MP#881

# OpenBSD gw2.tool 5.7 GENERIC.MP#881 amd64

Even with something as simple as:

group bird {
  neighbor 192.168.10.3 {
    router-id 192.168.0.3;
    local-address 192.168.0.3;
    local-as 65001;
    peer-as 65002;
  }

  process watcher {
      run /usr/local/etc/exabgp/dynamic-1.pl;
  }
}

The main ExaBGP process detects the watcher process as dead (which it is not), and trying to stop ExaBGP does not work, as the signal does not seem to be propagated to the watcher process...


snip
Thu, 14 May 2015 11:50:06 | INFO     | 21770  | processes     | The process died, trying to respawn it
Thu, 14 May 2015 11:50:06 | INFO     | 21770  | processes     | Terminating process watch-google
^D

^CThu, 14 May 2015 11:50:16 | ERROR    | 21770  | reactor       | ^C received
Thu, 14 May 2015 11:50:16 | INFO     | 21770  | reactor       | Performing shutdown
Thu, 14 May 2015 11:50:16 | DEBUG    | 21770  | wire          | session 1 outgoing 192.168.0.3 / 192.168.10.3       SENDING  (97) FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0027 0306 0350 6565 7220 4465 2D63 6F6E 6669 6775 7265 64
Thu, 14 May 2015 11:50:16 | INFO     | 21770  | processes     | The process died, trying to respawn it
Thu, 14 May 2015 11:50:16 | INFO     | 21770  | processes     | Terminating process watch-google
^CThu, 14 May 2015 11:50:17 | ERROR    | 21770  | reactor       | ^C received
Thu, 14 May 2015 11:50:17 | INFO     | 21770  | reactor       | Performing shutdown
Thu, 14 May 2015 11:50:17 | INFO     | 21770  | message       | Peer    192.168.10.3 ASN 65002   >> NOTIFICATION (6,3,"Peer De-configured")
Thu, 14 May 2015 11:50:17 | INFO     | 21770  | processes     | The process died, trying to respawn it
Thu, 14 May 2015 11:50:17 | INFO     | 21770  | processes     | Terminating process watch-google

etc...
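The symptom above, a helper reported as dead while it is still running, is easier to reason about with a concrete liveness check in mind. Below is a minimal sketch, not ExaBGP's code, of how a Python supervisor can check a helper started with the `subprocess` module: `Popen.poll()` returns `None` while the child runs and its exit code once it has exited. The names `spawn_helper` and `is_alive` are hypothetical, and the sleeping child is only a stand-in for a script such as dynamic-1.pl.

```python
import subprocess
import sys


def spawn_helper(argv):
    """Hypothetical helper: start a child whose stdout is a pipe we can read from."""
    return subprocess.Popen(argv, stdout=subprocess.PIPE)


def is_alive(proc):
    """poll() returns None while the child is running, else its exit code."""
    return proc.poll() is None


if __name__ == "__main__":
    # Stand-in helper that just sleeps; any long-running script would do.
    helper = spawn_helper([sys.executable, "-c", "import time; time.sleep(30)"])
    print("alive after start:", is_alive(helper))
    helper.terminate()  # send SIGTERM (TerminateProcess on Windows)
    helper.wait()       # reap the child so poll() reports its exit code
    print("alive after terminate:", is_alive(helper))
```

A supervisor that instead infers "died" from some other event, such as an error while reading or writing the helper's pipes, can wrongly conclude a live helper is gone; poll() is the direct check.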

Running 3.2.18 in the same environment works fine.
Has something changed in the process-watching code?

Thanks.

--
Raphael Mazelier



Thomas Mangin

May 14, 2015, 6:19:20 AM
to exabgp...@googlegroups.com
Hello,

This is most likely related to
https://github.com/Exa-Networks/exabgp/issues/252
And I will be working on fixing this very shortly.

Thomas

Raphael Mazelier

May 14, 2015, 3:20:13 PM
to exabgp...@googlegroups.com

First, thanks for the quick fix :)

I've just tried it. It is better, but the watcher process is still seen as dead (?). The good news is that it is now correctly respawned and signals are handled correctly.
The bad news is that, because it keeps respawning, the main ExaBGP process stops:

Thu, 14 May 2015 21:16:55 | INFO     | 23853  | processes     | The process died, trying to respawn it
Thu, 14 May 2015 21:16:55 | INFO     | 23853  | processes     | Terminating process watch-google
Thu, 14 May 2015 21:16:55 | INFO     | 23853  | processes     | Forked process watch-google
Thu, 14 May 2015 21:16:55 | CRITICAL | 23853  | processes     | Too many respawn for watch-google (5) terminating program
Thu, 14 May 2015 21:16:55 | ERROR    | 23853  | reactor       | Problem when sending message(s) to helper program, stopping
Thu, 14 May 2015 21:16:55 | INFO     | 23853  | reactor       | Performing shutdown
Thu, 14 May 2015 21:16:55 | DEBUG    | 23853  | wire          | session 1 outgoing 192.168.0.3 / 192.168.10.3       SENDING  (97) FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0027 0306 0350 6565 7220 4465 2D63 6F6E 6669 6775 7265 64
Thu, 14 May 2015 21:16:55 | INFO     | 23853  | message       | Peer    192.168.10.3 ASN 65002   >> NOTIFICATION (6,3,"Peer De-configured")
DEBUG[healthcheck] Checking command 'curl -sf http://google.fr'
DEBUG[healthcheck] Command was executed successfully
DEBUG[healthcheck] Transition to RISING
Thu, 14 May 2015 21:16:56 | INFO     | 23853  | processes     | Terminating process watch-google

What has changed since 3.2?
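The CRITICAL line shows ExaBGP giving up after five respawns of watch-google, all within the same second. The sketch below illustrates that kind of guard; `RespawnLimiter` is a hypothetical class, not ExaBGP's implementation, the default limit of 5 simply mirrors the count in the log, and the 60-second window is an assumption. It also shows why a false "process died" detection is fatal: each spurious death consumes one respawn, so a helper that is wrongly reported dead in a loop exhausts the budget almost immediately.

```python
import time


class RespawnLimiter:
    """Hypothetical guard: allow at most `limit` respawns of a process
    within a sliding window of `window` seconds (not ExaBGP's code)."""

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock   # injectable so the logic can be tested without sleeping
        self.history = {}    # process name -> timestamps of recent respawns

    def allow(self, name):
        """Return True and record the respawn if under the limit, else False."""
        now = self.clock()
        # Forget respawns that fell out of the window.
        recent = [t for t in self.history.get(name, []) if now - t < self.window]
        if len(recent) >= self.limit:
            self.history[name] = recent
            return False     # caller would log "Too many respawn" and shut down
        recent.append(now)
        self.history[name] = recent
        return True
```

Injecting the clock keeps the window logic deterministic under test; a real supervisor would use the default monotonic clock so wall-clock changes cannot reset the budget.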

Thomas Mangin

May 14, 2015, 3:38:15 PM
to exabgp...@googlegroups.com
Hello Raphael,

Thank you for the quick testing too! I am on holiday tomorrow, so I
should be able to give it another look.
On my Mac all the unit tests (which use this feature) pass
(running as myself or as root), so I am confused.

Could you tell me whether you are starting ExaBGP as root or not, and run the
process by hand as the ExaBGP user to see if it dies, please?

Could you also make sure your configuration was updated to the
latest format for your version, as the process section has changed
quite a bit.

Thomas

diff --git a/lib/exabgp/reactor/api/processes.py b/lib/exabgp/reactor/api/processes.py
index 847b3ce..1dd01a7 100644
--- a/lib/exabgp/reactor/api/processes.py
+++ b/lib/exabgp/reactor/api/processes.py
@@ -219,7 +219,9 @@ class Processes (object):
                     self.logger.processes("The process died, trying to respawn it")
                     self._terminate(process)
                     self._start(process)
-            except (subprocess.CalledProcessError,OSError,ValueError):
+            except (subprocess.CalledProcessError,OSError,ValueError),e:
+                print type(e)
+                print str(e)
                 self.logger.processes("Issue with the process, terminating it and restarting it")
                 self._terminate(process)
                 self._start(process)

Raphael Mazelier

May 14, 2015, 4:30:14 PM
to exabgp...@googlegroups.com


I've tested launching exabgp as root and as nobody (the default user): same issue.
Here is my conf:

group bird {
  neighbor 192.168.10.3 {
    router-id 192.168.0.3;
    local-address 192.168.0.3;
    local-as 65001;
    peer-as 65002;
  }

  process watcher {
      run /usr/local/etc/exabgp/plof.sh;
  }
}

and plof.sh:

#!/bin/sh

while true; do
  echo "announce route 192.0.2.1 next-hop 10.0.0.1"
  sleep 5
done

This is not the real script, but even this simple one fails the same way...
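For comparison, a Python equivalent of plof.sh might look like the sketch below; `announce_forever` and its parameters are hypothetical and not part of ExaBGP. ExaBGP reads API commands from the helper's standard output, so a Python helper should flush after every line: when stdout is a pipe rather than a terminal, Python buffers it, and ExaBGP could otherwise see nothing until the buffer fills.

```python
import sys
import time

# The command sent by plof.sh, copied verbatim.
ANNOUNCE = "announce route 192.0.2.1 next-hop 10.0.0.1"


def announce_forever(out=None, interval=5.0, count=None):
    """Write ANNOUNCE to `out` (default: stdout) every `interval` seconds.

    count=None loops forever, like plof.sh; a number stops after that many
    lines. Returns the number of lines written.
    """
    out = out if out is not None else sys.stdout
    sent = 0
    while count is None or sent < count:
        out.write(ANNOUNCE + "\n")
        # Flush each line: a pipe is block-buffered, so without this the
        # reading process may not receive the command for a long time.
        out.flush()
        sent += 1
        if count is None or sent < count:
            time.sleep(interval)
    return sent
```

Called as `announce_forever()` it loops forever like the shell version; `count` and `interval` exist only so the function can be exercised without waiting.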

Thanks.



 
 

Thomas Mangin

May 14, 2015, 5:08:27 PM
to exabgp...@googlegroups.com
I will try to install OpenBSD in a VM and give it a try.
Alternatively, if you can give me SSH access…

Thomas

Raphael Mazelier

May 15, 2015, 5:56:46 AM
to exabgp...@googlegroups.com
I'll send it to you privately :)

Thomas Mangin

May 16, 2015, 1:49:16 PM
to exabgp...@googlegroups.com, rmaz...@gmail.com
For the record, this is fixed and will be in 3.4.11.
The issue was a difference in Python's behaviour depending on the OS.

Thomas