Work Flow

Tom

Aug 8, 2012, 4:27:28 PM
to trigge...@googlegroups.com
Hi

I'm trying to understand the trigger workflow. I have a router defined in netdevices.xml and I'm able to connect to that device over ssh via trigger. Now what I need to do is update the ACLs on the device. 
  • I can see how the command line tool associates a named ACL with a device but how do I create the named ACL?
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl -a abc123 10.252.17.122
added acl abc123 to 10.252.17.122
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl abc123
10.252.17.122                           abc123
  •  Now that the association exists how does it find its way on to the router? Should I inject it into the load queue?
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl -i abc123 10.252.17.122
ACL abc123 injected into integrated load queue for 10
"10.252.17.122" injected into manual load queue
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl -m
10.252.17.122
added by tpurcell on 2012-08-08 16:19:58
  • Now that it's injected, what happens next?
  • If I try to list staged ACLs I get an error saying a file does not exist:
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl -s
Access-lists currently staged in /home/tftp (listed by date):

Traceback (most recent call last):
  File "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/bin/acl", line 90, in <module>
    os.chdir('/home/tftp')
OSError: [Errno 2] No such file or directory: '/home/tftp'
  • How does an ACL get staged and what does that mean? 
  •  Is there a way to control the location of the directory (/home/tftp) it is looking for?
  • Am I on the right path?  
Thanks
Tom
 

 

Jathan McCollum

Aug 8, 2012, 7:00:42 PM
to trigge...@googlegroups.com
Hello again, Tom!

I'll comment inline to clarify for you:

On Wed, Aug 8, 2012 at 1:27 PM, Tom <tpur...@chariotsolutions.com> wrote:
Hi

I'm trying to understand the trigger workflow. I have a router defined in netdevices.xml and I'm able to connect to that device over ssh via trigger. Now what I need to do is update the ACLs on the device. 
  • I can see how the command line tool associates a named ACL with a device but how do I create the named ACL?
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl -a abc123 10.252.17.122
added acl abc123 to 10.252.17.122
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl abc123
10.252.17.122                           abc123

The named ACL is expected to exist on disk in the location specified by FIREWALL_DIR in settings.py, which defaults to /data/firewalls. Policy files are expected to start with "acl.", so in this case your filename for your Cisco access-list would be "acl.abc123". 
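
To make the naming convention concrete, here is a minimal sketch of how the expected policy path could be derived from an ACL name. The helper name acl_policy_path is hypothetical (it is not part of Trigger), and the default directory is simply the value mentioned above.

```python
import os

# Default mentioned in the reply above; change it to match your settings.py.
FIREWALL_DIR = "/data/firewalls"


def acl_policy_path(acl_name, firewall_dir=FIREWALL_DIR):
    """Return the path where the policy file for acl_name is expected.

    Hypothetical helper for illustration only: it prefixes the ACL name
    with "acl." and places it under the firewall directory.
    """
    return os.path.join(firewall_dir, "acl." + acl_name)


if __name__ == "__main__":
    print(acl_policy_path("abc123"))
```

So for the association you created, the file to put in place would be named acl.abc123 inside whatever directory FIREWALL_DIR points at.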
  •  Now that the association exists how does it find its way on to the router? Should I inject it into the load queue?

Correct, you should inject it into the load queue. But because Trigger already knows the association, there is no need to specify the ip/hostname as an argument. For every argument to the -i option, the 'acl' command will try to evaluate it as an ACL name. If it can't find an association for that name, the value will instead be added to the manual queue. (This is not what you want.)

So if you execute "acl abc123" you get the result telling you that the acl "abc123" is associated to the device "10.252.17.122".

10.252.17.122                           abc123

If you execute "acl 10.252.17.122", it checks the reverse. It determines that "10.252.17.122" is not an ACL name, so then it checks if it's a device. It will then display all associated ACLs for that device. In this case, because you only have one ACL associated to one device, the result would be the same: 

10.252.17.122                           abc123

(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl -i abc123 10.252.17.122
ACL abc123 injected into integrated load queue for 10
"10.252.17.122" injected into manual load queue
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl -m
10.252.17.122
added by tpurcell on 2012-08-08 16:19:58
 
"acl -m" shows you the manual queue, which holds things that cannot be loaded automatically. The manual queue is intended to be a "todo list" for other engineers, such as when you need a manual change applied and want to let the on-call engineer know. This isn't what you want here. 

You want the integrated (automatic) queue instead. "acl -l" lists every ACL in that queue along with the devices on which each ACL will be loaded.
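
The way "-i" arguments end up in one queue or the other can be sketched roughly as follows. This is only an illustration of the behavior described above, not the actual code in the 'acl' command; the function name and the associations dictionary are hypothetical stand-ins.

```python
def split_inject_args(args, associations):
    """Split `acl -i` style arguments between the two load queues.

    associations maps an ACL name to the set of devices it is associated
    with (a hypothetical stand-in for Trigger's ACL database).
    Returns (integrated, manual): integrated is a list of (acl, device)
    pairs, manual is a list of arguments that matched no ACL name.
    """
    integrated, manual = [], []
    for arg in args:
        devices = associations.get(arg)
        if devices:
            # Known ACL name: queue it for every associated device.
            integrated.extend((arg, dev) for dev in sorted(devices))
        else:
            # Not an ACL name: it lands in the manual "todo" queue.
            manual.append(arg)
    return integrated, manual


if __name__ == "__main__":
    associations = {"abc123": {"10.252.17.122"}}
    print(split_inject_args(["abc123", "10.252.17.122"], associations))
```

With both arguments given, the ACL name goes to the integrated queue and the IP address falls through to the manual queue, which matches what you saw.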

  • Now that it's injected, what happens next?
Now that it's injected, you can use "load_acl -Q" to load from the queue. If the devices are within the bounce window for their site location (the bounce windows are currently hard-coded in trigger/changemgmt.py), then it will present something like this:

% load_acl -Q

You are about to perform the following loads:

10.252.17.122         abc123

Are you sure you want to proceed?  
  • If I try to list staged ACLs I get an error saying a file does not exist:
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ acl -s
Access-lists currently staged in /home/tftp (listed by date):

Traceback (most recent call last):
  File "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/bin/acl", line 90, in <module>
    os.chdir('/home/tftp')
OSError: [Errno 2] No such file or directory: '/home/tftp'
  • How does an ACL get staged and what does that mean? 
Once you kick off a load_acl job, the ACLs are then staged into the TFTP directory. Each file has a nonce appended to its filename to increase the security of the files sitting in the tftproot. Example:

% acl -s
Access-lists currently staged in /data/tftproot (listed by date):

-rw-r--r-- 1 fwtools netsec  9290 2012-08-06 02:15 acl.abc123.950f98fced38c8c7
-rw-r--r-- 1 fwtools netsec  9290 2012-08-06 02:15 acl.abc123.30709ac7076de4fd
-rw-r--r-- 1 fwtools netsec  9290 2012-08-06 02:16 acl.abc123.a8c3c47784332e18
-rw-r--r-- 1 fwtools netsec  9290 2012-08-06 02:18 acl.abc123.821a4297abb2b795

(These are not reaped by load_acl, so you'll have to make sure you clean up this directory from time to time.)
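
As a rough sketch of both ideas (the nonce suffix and the periodic cleanup), something like the following could work. How load_acl actually generates its nonce is not shown here, so the use of secrets.token_hex is an assumption that merely produces names of the same shape; both function names are hypothetical.

```python
import os
import secrets
import time


def staged_name(acl_name):
    """Build a staged filename shaped like acl.abc123.950f98fced38c8c7.

    Assumption: 8 random bytes rendered as 16 hex characters.
    """
    nonce = secrets.token_hex(8)
    return "acl.%s.%s" % (acl_name, nonce)


def stale_staged_files(tftproot, max_age_seconds, now=None):
    """Return paths of staged ACL files older than max_age_seconds.

    Only reports candidates; deleting them is left to the caller.
    """
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(tftproot)):
        path = os.path.join(tftproot, name)
        if name.startswith("acl.") and os.path.isfile(path):
            if now - os.path.getmtime(path) > max_age_seconds:
                stale.append(path)
    return stale


if __name__ == "__main__":
    print(staged_name("abc123"))
```

A cron job could call stale_staged_files on the tftproot and remove whatever it returns, after you have checked that no load is still in progress.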
  •  Is there a way to control the location of the directory (/home/tftp) it is looking for?
 The location can be customized in settings.py using TFTPROOT_DIR. I just realized we've discovered a bug in the "acl" command: the path of the TFTP directory is hard-coded. I am fixing that as we type!
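
For illustration, this is roughly what reading the directory from settings (instead of hard-coding it) and handling a missing directory could look like. It is a sketch, not the actual fix: the function name is hypothetical, the fallback of /data/tftproot is an assumption taken from the example listing above, and SimpleNamespace stands in for the real settings object.

```python
import os
from types import SimpleNamespace


def list_staged(settings, default="/data/tftproot"):
    """Return (tftproot, staged_files) using settings.TFTPROOT_DIR.

    staged_files is None when the directory does not exist, so the
    caller can print a friendly message instead of a traceback.
    """
    tftproot = getattr(settings, "TFTPROOT_DIR", default)
    if not os.path.isdir(tftproot):
        return tftproot, None
    staged = sorted(n for n in os.listdir(tftproot) if n.startswith("acl."))
    return tftproot, staged


if __name__ == "__main__":
    # Stand-in for the settings module; the path is only an example.
    settings = SimpleNamespace(TFTPROOT_DIR="/home/tftp")
    tftproot, staged = list_staged(settings)
    if staged is None:
        print("TFTP directory %s does not exist" % tftproot)
    else:
        print("\n".join(staged))
```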
  • Am I on the right path?  
YES!! You've already helped me to improve things including a bugfix, an enhancement, and some documentation! Thank you! :) 
Thanks
Tom
 

 



--
Jathan.
--

Tom

Aug 9, 2012, 10:25:00 AM
to trigge...@googlegroups.com
Jathan

This is really helpful. I'm going to take this and work through it. I'll get back and let you know how it went.

Thanks
Tom

Tom

Aug 9, 2012, 3:09:01 PM
to trigge...@googlegroups.com
Jathan

I've made some progress but I'm not quite there yet. Here's what I've run into:
  • I removed the ACL I had created before. It was just a dummy. 
  • I created a directory: /home/tftp
  • Changed trigger_settings.py to use FIREWALL_DIR = os.path.join(PREFIX, '/firewalls')
  • Put a file with a real ACL there and named it acl.openstack-green-acl
  • Added the new ACL and associated it with my device:
acl -a openstack-green-acl 10.252.17.122
  • Next I attempted the following:
(vtrigger)tpurcell@tpurcell-Latitude-E6500 ~/data/comcast-code/mytrigger $ load_acl -Q
Traceback (most recent call last):
  File "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/bin/load_acl", line 668, in <module>
    def email_users(addresses, subject, body, fromaddr=settings.EMAIL_SENDER):
AttributeError: 'Settings' object has no attribute 'EMAIL_SENDER'
  • Next I realized I needed to use the --bouncy argument to force the load. "load_acl -Q --bouncy" executed without displaying an error but the ASR was not updated. The load_acl indicated there was a log file. Its contents were as follows:
 2012-08-09 11:43:34-0400 [-] Log opened.
2012-08-09 11:43:34-0400 [-] User tpurcell (uid:1000) executed "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/bin/load_acl -Q --bouncy"
2012-08-09 11:43:34-0400 [-] Bouncy enabled, disabling multiple jobs.
2012-08-09 11:43:34-0400 [-] Loading openstack-green-acl OUT OF BOUNCE on 10.252.17.122
2012-08-09 11:43:38-0400 [-] 'Unable to get oncall information!'
    • Tried using the lambda argument for GET_CURRENT_ONCALL and got the same result.
    • Just set "ret" in get_current_oncall() to a dictionary similar to the one in the comments. That let me move on.
  • Next attempt at a load yielded the following:
...
Are you sure you want to proceed? y

Logging to /tmp/tmpQMJzEj_load_acl

Submitting CM ticket...
Traceback (most recent call last):
  File "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/bin/load_acl", line 822, in <module>
    main()
  File "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/bin/load_acl", line 762, in main
    cm_ticketnum = create_cm_ticket(work, oncall)
TypeError: _create_cm_ticket_stub() takes exactly 0 arguments (2 given) 
    • Tried using the lambda argument for CREATE_CM_TICKET and got the following:
...
  File "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/local/lib/python2.7/site-packages/trigger/conf/__init__.py", line 62, in import_path
    mymodule = __import__(module)
  File "/home/tpurcell/data/comcast-code/mytrigger/trigger_settings.py", line 357
    CREATE_CM_TICKET = lambda a=None o, s: None
  • I did the following in trigger_settings and moved on:
def _create_cm_ticket_stub(x,y):
    return 123456 
  • On the next attempt at the load I got:
...
  File "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/bin/load_acl", line 549, in select_next_device
    if group(dev) not in active_groups:
  File "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/bin/load_acl", line 531, in group
    x = trimmer.match(dev.nodeName).group()
AttributeError: 'NoneType' object has no attribute 'group'
    • It appears this is due to the fact that my "nodeName" is an IP address, not a DNS-like name. The regex in group() was not happy with that. I went into load_acl and commented out the for loop in select_next_device() so it returns None.
  • Next I got really close, I think, to success. It gets all the way up to the point where the screen displays "load_acl" in the upper-left corner, displays a timer and the dev/acl association. But that's it. I've let it run for over 5 minutes and nothing seems to happen. Using netstat, it's clear that I'm not connecting to the ASR over ssh. Here is what went to the log:
2012-08-09 13:49:43-0400 [-] Log opened.
2012-08-09 13:49:43-0400 [-] User tpurcell (uid:1000) executed "/home/tpurcell/data/comcast-code/mytrigger/vtrigger/bin/load_acl -Q --bouncy"
2012-08-09 13:49:43-0400 [-] Bouncy enabled, disabling multiple jobs.
2012-08-09 13:49:43-0400 [-] Loading openstack-green-acl OUT OF BOUNCE on 10.252.17.122
2012-08-09 13:49:45-0400 [-] Created CM ticket #123456
2012-08-09 13:55:06-0400 [-] Received SIGINT, shutting down.
2012-08-09 13:55:06-0400 [-] Main loop terminated.
2012-08-09 13:55:06-0400 [-] 0 failures
2012-08-09 13:55:06-0400 [-] Elapsed time: 5:20

I'm not sure if I'm missing something in the config to tell it to use ssh. Any thoughts are welcome.

Thanks
Tom 
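
For anyone following along, two of the settings changes described above can be sketched like this. It is a hedged illustration, not the project's recommended configuration: the PREFIX value is an assumption, the stub name is hypothetical, and the only thing taken from the traceback is that load_acl calls the ticket function with two arguments (work, oncall).

```python
import os

# Assumption: PREFIX is whatever base directory trigger_settings.py defines.
PREFIX = "/etc/trigger"

# os.path.join discards earlier components when a later one is absolute,
# so os.path.join(PREFIX, '/firewalls') evaluates to '/firewalls'.
# Dropping the leading slash keeps the directory under PREFIX.
FIREWALL_DIR = os.path.join(PREFIX, "firewalls")


def create_cm_ticket_stub(work, oncall):
    """Placeholder ticket creator that accepts the two arguments
    load_acl passes and returns a dummy ticket number."""
    return 123456


CREATE_CM_TICKET = create_cm_ticket_stub

# A syntactically valid lambda with the same two-argument shape:
CREATE_CM_TICKET_LAMBDA = lambda work, oncall: None

if __name__ == "__main__":
    print(FIREWALL_DIR)
    print(CREATE_CM_TICKET({}, {}))
```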

Tom

Aug 9, 2012, 5:20:48 PM
to trigge...@googlegroups.com
Okay

I made a mistake when "I went into load_acl and commented out the for loop in select_next_device() so it returns None". This of course caused it to never loop so it never processed even the first one. My bad. 

I changed the group() function to return the "site" from the netdevices.xml file. That causes it to actually attempt to process. 

What I get on the screen now is the load_acl and the timer plus a message that says: "1 failure, will report at end". It also shows my device and says "connecting". When I Ctrl-C (beautiful ASCII art) the screen displays: LOAD FAILED ON 10.252.17.122: Unable to stage TFTP File set(['openstack-green-acl']).

So it's attempting to use tftp. Is there a way to make it use ssh?

Thanks
Tom


Jathan McCollum

Aug 16, 2012, 7:22:19 PM
to trigge...@googlegroups.com
Sorry about the delayed response. I was on vacation!

So skipping to the end here. Currently for Cisco devices, `load_acl` is hard-coded to use TFTP. We should change that. The assumption is that your server will be running tftpd and that the device will be able to fetch the staged files from it. Integration docs for this would be a great idea, don't you think? :)

The grouping algorithm is also hard-coded to a pattern match on the hostname of the device, so that explains why it wasn't returning anything. That is why I factored it out into the `group()` function, as you found, so that we can make this something that can be tweaked in the settings.

So, it sounds like we have some enhancements to load_acl:

1. Add support for copying configs over SSH for Cisco devices (need the commands required to do this)
2. Make the device grouping algorithm read from a configurable parameter in settings.py. 
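
To give an idea of what the second item might look like, here is a minimal sketch of a grouping function that takes its pattern as a parameter and falls back to the site (or the raw name) when the pattern does not match, as happens with an IP address. The pattern shown is hypothetical and is not the regex currently used in load_acl.

```python
import re

# Hypothetical pattern: leading letters/hyphens followed by one digit,
# e.g. "edge-router1" out of "edge-router1.example.net".
DEFAULT_PATTERN = re.compile(r"^[A-Za-z\-]+\d")


def group(node_name, site=None, pattern=DEFAULT_PATTERN):
    """Return a grouping key for a device.

    Uses the pattern match when there is one; otherwise falls back to
    the device's site, and finally to the node name itself, instead of
    raising AttributeError on a failed match.
    """
    match = pattern.match(node_name)
    if match is not None:
        return match.group()
    return site if site is not None else node_name


if __name__ == "__main__":
    print(group("edge-router1.example.net"))
    print(group("10.252.17.122", site="lab"))
```

The pattern could then come from settings.py, so sites with different naming schemes would not need to patch load_acl.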

If you're willing to invest a little time and submit issues, I'd really appreciate it!! 


I did already create one for the bug in the 'acl' command where '/home/tftp' is hard-coded:
--
Jathan.
--