Debugging service module

Anand Buddhdev

May 7, 2014, 8:52:52 AM
to ansible...@googlegroups.com
I'm running ansible 1.5.5 against a CentOS 6 server, and trying to use the "service" module to manage an upstart job. However, I keep getting this error from ansible:

$ ansible bastion3.hadoop.ripe.net -i svn/gii/ansible_hosts -sK -m service -a 'name=hdfs-sync state=stopped'
sudo password:
bastion3.hadoop.ripe.net | FAILED => failed to parse:
SUDO-SUCCESS-mwcdqmpymyognbdmbrqkqbkccfdmzwqs
Traceback (most recent call last):
  File "/tmp/ansible-tmp-1399466909.49-7501828270786/service", line 2305, in <module>
    main()
  File "/tmp/ansible-tmp-1399466909.49-7501828270786/service", line 1170, in main
    service.get_service_status()
  File "/tmp/ansible-tmp-1399466909.49-7501828270786/service", line 480, in get_service_status
    rc, status_stdout, status_stderr = self.service_control()
  File "/tmp/ansible-tmp-1399466909.49-7501828270786/service", line 722, in service_control
    rc_state, stdout, stderr = self.execute_command("%s %s %s" % (self.action, self.name, arguments), daemonize=True)
  File "/tmp/ansible-tmp-1399466909.49-7501828270786/service", line 250, in execute_command
    return json.loads(data)
  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.6/json/decoder.py", line 338, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

If I log into the server and run "stop hdfs-sync", "status hdfs-sync" or "start hdfs-sync", it all works. My ansible setup also works fine against other CentOS 6 boxes, and can start, stop and restart other upstart jobs there. So this is a weird case. How can I debug this further and find out why ansible is failing with this specific upstart job on this server?
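The traceback above ends in json.loads() raising ValueError, which is what Python's json module does when the string it is given is empty or is not JSON at all. A minimal sketch of that behaviour follows; try_parse is a hypothetical helper written for illustration and is not taken from the service module.

```python
import json


def try_parse(data):
    """Return (parsed, error); error is None when data is valid JSON."""
    try:
        return json.loads(data), None
    except ValueError as exc:
        # On Python 3, json.JSONDecodeError is a subclass of ValueError.
        return None, exc


# Illustrative inputs only: valid JSON, an empty string, and plain text.
samples = ['{"changed": true}', '', 'Traceback (most recent call last):']
for sample in samples:
    parsed, error = try_parse(sample)
    outcome = parsed if error is None else "ValueError"
    print(repr(sample), "->", outcome)
```

So a useful first question when debugging is what string actually reached json.loads(): an empty string and a stray non-JSON line both fail the same way.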

Anand

Strahinja Kustudić

May 7, 2014, 6:57:36 PM
to ansible...@googlegroups.com
You could set an environment variable:

ANSIBLE_KEEP_REMOTE_FILES=1

and run ansible again. Once it fails, see what files were executed, log into the remote host and run the failed python script with:

python -m trace --trace script.py
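As a rough sketch of the second step, the same trace command can be driven from Python so that only the tail of the (often very long) trace is kept. The temporary script here is a stand-in for the module file that ANSIBLE_KEEP_REMOTE_FILES leaves on the remote host; trace_script and tail are hypothetical helper names.

```python
import os
import subprocess
import sys
import tempfile


def trace_script(path):
    """Run `python -m trace --trace <path>` and return stdout and stderr combined."""
    result = subprocess.run(
        [sys.executable, "-m", "trace", "--trace", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def tail(text, count=10):
    """Return the last `count` lines of text."""
    return "\n".join(text.splitlines()[-count:])


# Stand-in for the kept remote module file (assumption for illustration).
with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as handle:
    handle.write('x = 1\nprint("done", x)\n')
    script_path = handle.name

try:
    output = trace_script(script_path)
finally:
    os.remove(script_path)

print(tail(output))
```

The last lines of the trace are usually the interesting ones, since they show the final statements executed before the script exited or raised.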

Anand Buddhdev

May 8, 2014, 10:15:04 AM
to ansible...@googlegroups.com
On Thursday, 8 May 2014 00:57:36 UTC+2, Strahinja Kustudić wrote:

You could set an environment variable:

ANSIBLE_KEEP_REMOTE_FILES=1

and run ansible again. Once it fails, see what files were executed, log into the remote host and run the failed python script with:

python -m trace --trace script.py

Thanks for this tip. I did exactly as you describe, and ran the script on the host. To my surprise, it did what it was supposed to! The last few lines from the trace output show:

--- modulename: encoder, funcname: _iterencode_dict
encoder.py(281): if markers is not None:
encoder.py(282): del markers[markerid]
encoder.py(368): return ''.join(chunks)
{"state": "stopped", "changed": true, "name": "hdfs-sync"}
service(2146): sys.exit(0)

The service stopped running, so the script did its work. I don't understand why I then get an error from ansible on my laptop. Any ideas, developers?

Anand

Adam Morris

May 8, 2014, 4:50:26 PM
to ansible...@googlegroups.com


Actually, this looks like a bug that occurred with script and raw in some earlier versions of Ansible where the sudo success string was being leaked back.  It was fixed, I'm now wondering if it was reverted in this case...  

Can you upgrade to 1.6.1 and try it again?  

Adam

Anand Buddhdev

May 8, 2014, 4:56:35 PM
to ansible...@googlegroups.com
On Thursday, 8 May 2014 22:50:26 UTC+2, Adam Morris wrote:

Actually, this looks like a bug that occurred with script and raw in some earlier versions of Ansible where the sudo success string was being leaked back.  It was fixed, I'm now wondering if it was reverted in this case...  

Can you upgrade to 1.6.1 and try it again?

Hi Adam, it still fails:

$ ansible --version
ansible 1.6.1
$ ansible bastion3.hadoop.ripe.net -sK -m service -a 'name=hdfs-sync state=restarted'
sudo password:
bastion3.hadoop.ripe.net | FAILED => failed to parse:
SUDO-SUCCESS-dawyrikofupcjoyftolafyddywphopwj
Traceback (most recent call last):
  File "/tmp/ansible-tmp-1399582444.03-10583586416099/service", line 2411, in <module>
    main()
  File "/tmp/ansible-tmp-1399582444.03-10583586416099/service", line 1198, in main
    service.get_service_status()
  File "/tmp/ansible-tmp-1399582444.03-10583586416099/service", line 509, in get_service_status
    rc, status_stdout, status_stderr = self.service_control()
  File "/tmp/ansible-tmp-1399582444.03-10583586416099/service", line 750, in service_control
    rc_state, stdout, stderr = self.execute_command("%s %s %s" % (self.action, self.name, arguments), daemonize=True)
  File "/tmp/ansible-tmp-1399582444.03-10583586416099/service", line 250, in execute_command

Adam Morris

May 8, 2014, 5:09:56 PM
to ansible...@googlegroups.com


On Thursday, May 8, 2014 1:56:35 PM UTC-7, Anand Buddhdev wrote:
On Thursday, 8 May 2014 22:50:26 UTC+2, Adam Morris wrote:

Actually, this looks like a bug that occurred with script and raw in some earlier versions of Ansible where the sudo success string was being leaked back.  It was fixed, I'm now wondering if it was reverted in this case...  

Can you upgrade to 1.6.1 and try it again?

Hi Adam, it still fails:

$ ansible --version
ansible 1.6.1
$ ansible bastion3.hadoop.ripe.net -sK -m service -a 'name=hdfs-sync state=restarted'
sudo password:
bastion3.hadoop.ripe.net | FAILED => failed to parse:
SUDO-SUCCESS-dawyrikofupcjoyftolafyddywphopwj

There is an open bug report https://github.com/ansible/ansible/issues/7319 which sounds very similar...

Do you need to provide a password to use sudo on the remote host? Do you need to use sudo? I'm curious because that first line, SUDO-SUCCESS ..., should be consumed by ansible itself and never reach the output parser.
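To illustrate what consuming that line means, here is a minimal sketch that drops any line starting with the SUDO-SUCCESS- prefix seen in the output above before decoding the rest as JSON. It is not Ansible's actual implementation; parse_module_output and SUDO_MARKER_PREFIX are hypothetical names used only for this example.

```python
import json

# Prefix of the marker line seen in the failing output above.
SUDO_MARKER_PREFIX = "SUDO-SUCCESS-"


def parse_module_output(raw):
    """Drop sudo marker lines, then decode what remains as JSON."""
    kept = [
        line
        for line in raw.splitlines()
        if not line.startswith(SUDO_MARKER_PREFIX)
    ]
    return json.loads("\n".join(kept))


# Illustrative input: a marker line followed by a module's JSON result.
raw_output = (
    "SUDO-SUCCESS-mwcdqmpymyognbdmbrqkqbkccfdmzwqs\n"
    '{"state": "stopped", "changed": true, "name": "hdfs-sync"}'
)
print(parse_module_output(raw_output))
```

Note that filtering the marker would not help in the failing case from this thread: there the marker is followed by a Python traceback rather than JSON, so decoding would still raise ValueError.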

Adam

Anand Buddhdev

May 8, 2014, 5:28:36 PM
to ansible...@googlegroups.com
On Thursday, 8 May 2014 23:09:56 UTC+2, Adam Morris wrote:

Hi Adam,

There is an open bug report https://github.com/ansible/ansible/issues/7319 which sounds very similar...

Yes, this bug report sounds a lot like what I'm experiencing.
 
Do you need to provide a password to use sudo on the remote host? Do you need to use sudo? I'm curious because that first line, SUDO-SUCCESS ..., should be consumed by ansible itself and never reach the output parser.

On the remote host, I need to use sudo to run commands as root, and I need to provide a password. 

Dick Davies

May 9, 2014, 11:57:02 AM
to ansible list
This isn't an upstart script, is it?

I saw something very similar when I tried to set enabled=no on an upstart-managed service on CentOS a few weeks back. Removing that clause made it work.

Michael DeHaan

May 9, 2014, 7:44:36 PM
to ansible...@googlegroups.com
This is a traceback in the service module, certainly.

Please be sure there's a ticket open, and if you don't have the same issue, file a new ticket and we'll look into this promptly.

We take the position that a traceback is a bug in nearly all cases; modules should return reasonable errors if they ever have to fail.




Anand Buddhdev

May 10, 2014, 5:21:26 AM
to ansible...@googlegroups.com, di...@hellooperator.net
On Friday, 9 May 2014 17:57:02 UTC+2, Dick Davies wrote:

Hi Dick,

This isn't an upstart script, is it?

I saw something very similar when I tried to set enabled=no on an upstart-managed service on CentOS a few weeks back. Removing that clause made it work.

Yes, as my original message said, this is an upstart script.

Anand 

Anand Buddhdev

May 10, 2014, 5:22:51 AM
to ansible...@googlegroups.com
On Saturday, 10 May 2014 01:44:36 UTC+2, Michael DeHaan wrote:

Hi Michael,

This is a traceback in the service module, certainly.

Please be sure there's a ticket open, and if you don't have the same issue, file a new ticket and we'll look into this promptly.

I believe my issue is very similar to, and possibly the same as, issue #7319.