VirtualBox / Ansible Non-zero exit status 2300218


Chris Stevens

May 11, 2018, 9:34:23 AM
to Packer
I have an existing packer build that has been reliable for years, but just this week has become flaky with intermittent failures as I'll describe below.

I don't have enough clarity into the issue right now, but wanted to describe it here in case others have a similar sounding issue. I'll add to this post as I have time to debug the issue further.

The changes made since our last known stable build are updates to the latest versions in the build environment:
Packer (1.2.3)
VirtualBox (5.2.12)
Ansible (2.5.2)

We also moved up to the latest build target ( CentOS 7.5 (1804) from CentOS 7.4 (1708) ).

I see the same flakiness if I revert to the 1708 CentOS release.

I see the same flakiness if I revert Packer to 1.2.1.

I doubled both the memory and CPU allocation that I normally use and still see the same issues.

It is worth noting that we had 1 good build out of maybe 10 or 15 attempts.

The most common failure mode: the build runs properly for 10 to 20 minutes, then stalls on a random Ansible task within the provisioner for anywhere from 2 to 5 or more minutes, and eventually fails as shown in the log dump below.

2018/05/10 11:21:51 ui:     virtualbox-iso:
    virtualbox-iso:
2018/05/10 11:21:51 ui:     virtualbox-iso: TASK [mysql : Install tools for Mysql] *****************************************
    virtualbox-iso: TASK [mysql : Install tools for Mysql] *****************************************
2018/05/10 11:31:17 packer: 2018/05/10 11:31:17 [ERROR] Remote command exited without exit status or exit signal.
2018/05/10 11:31:17 packer: 2018/05/10 11:31:17 [INFO] RPC endpoint: Communicator ended with: 2300218
2018/05/10 11:31:17 [INFO] 32044 bytes written for 'stdout'
2018/05/10 11:31:17 [INFO] 885 bytes written for 'stderr'
2018/05/10 11:31:17 [INFO] RPC client: Communicator ended with: 2300218
2018/05/10 11:31:17 [INFO] RPC endpoint: Communicator ended with: 2300218
2018/05/10 11:31:17 packer: 2018/05/10 11:31:17 [INFO] 32044 bytes written for 'stdout'
2018/05/10 11:31:17 packer: 2018/05/10 11:31:17 [INFO] 885 bytes written for 'stderr'
2018/05/10 11:31:17 packer: 2018/05/10 11:31:17 [INFO] RPC client: Communicator ended with: 2300218
2018/05/10 11:31:17 [INFO] (telemetry) ending ansible-local
2018/05/10 11:31:18 packer: 2018/05/10 11:31:18 Executing VBoxManage: []string{"controlvm", "packer-centos-7-x86_64", "poweroff"}
2018/05/10 11:31:18 packer: 2018/05/10 11:31:18 stdout:
2018/05/10 11:31:18 packer: 2018/05/10 11:31:18 stderr: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
2018/05/10 11:31:18 packer: 2018/05/10 11:31:18 Executing VBoxManage: []string{"storageattach", "packer-centos-7-x86_64", "--storagectl", "IDE Controller", "--port", "0", "--device", "1", "--medium", "none"}
2018/05/10 11:31:18 packer: 2018/05/10 11:31:18 stdout:
2018/05/10 11:31:18 packer: 2018/05/10 11:31:18 stderr:
2018/05/10 11:31:18 ui: ==> virtualbox-iso: Deregistering and deleting VM...
==> virtualbox-iso: Deregistering and deleting VM...
2018/05/10 11:31:18 packer: 2018/05/10 11:31:18 Executing VBoxManage: []string{"unregistervm", "packer-centos-7-x86_64", "--delete"}
2018/05/10 11:31:18 packer: 2018/05/10 11:31:18 stdout:
2018/05/10 11:31:18 packer: 2018/05/10 11:31:18 stderr: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
2018/05/10 11:31:18 ui: ==> virtualbox-iso: Deleting output directory...
==> virtualbox-iso: Deleting output directory...
2018/05/10 11:31:18 [INFO] (telemetry) ending virtualbox-iso
2018/05/10 11:31:18 ui error: Build 'virtualbox-iso' errored: Error executing Ansible: Non-zero exit status: 2300218
2018/05/10 11:31:18 Builds completed. Waiting on interrupt barrier...
2018/05/10 11:31:18 machine readable: error-count []string{"1"}
2018/05/10 11:31:18 ui error:
==> Some builds didn't complete successfully and had errors:
2018/05/10 11:31:18 machine readable: virtualbox-iso,error []string{"Error executing Ansible: Non-zero exit status: 2300218"}
2018/05/10 11:31:18 ui error: --> virtualbox-iso: Error executing Ansible: Non-zero exit status: 2300218
2018/05/10 11:31:18 ui:
==> Builds finished but no artifacts were created.
2018/05/10 11:31:18 [INFO] (telemetry) Finalizing.
Build 'virtualbox-iso' errored: Error executing Ansible: Non-zero exit status: 2300218

==> Some builds didn't complete successfully and had errors:
--> virtualbox-iso: Error executing Ansible: Non-zero exit status: 2300218

==> Builds finished but no artifacts were created.
2018/05/10 11:31:19 waiting for all plugin processes to complete...
2018/05/10 11:31:19 /usr/local/bin/packer: plugin process exited
2018/05/10 11:31:19 /usr/local/bin/packer: plugin process exited
2018/05/10 11:31:19 /usr/local/bin/packer: plugin process exited
2018/05/10 11:31:19 /usr/local/bin/packer: plugin process exited
2018/05/10 11:31:19 /usr/local/bin/packer: plugin process exited
2018/05/10 11:31:19 /usr/local/bin/packer: plugin process exited
2018/05/10 11:31:19 /usr/local/bin/packer: plugin process exited
2018/05/10 11:31:19 /usr/local/bin/packer: plugin process exited
2018/05/10 11:31:19 /usr/local/bin/packer: plugin process exited



A less common failure mode occurs much earlier in the build run and looks like an Ansible module failure.

    virtualbox-iso: TASK [common : Update OS packages] *********************************************
    virtualbox-iso: changed: [127.0.0.1]
    virtualbox-iso:
    virtualbox-iso: TASK [common : Install common OS tools / services] *****************************
    virtualbox-iso: An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AttributeError: 'Task' object has no attribute 'async_val'
    virtualbox-iso: fatal: [127.0.0.1]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}
    virtualbox-iso: to retry, use: --limit @/tmp/packer-provisioner-ansible-local/development.retry
    virtualbox-iso:
    virtualbox-iso: PLAY RECAP *********************************************************************
    virtualbox-iso: 127.0.0.1                  : ok=6    changed=4    unreachable=0    failed=1
    virtualbox-iso:
==> virtualbox-iso: Deregistering and deleting VM...
==> virtualbox-iso: Deleting output directory...
Build 'virtualbox-iso' errored: Error executing Ansible: Non-zero exit status: 2

==> Some builds didn't complete successfully and had errors:
--> virtualbox-iso: Error executing Ansible: Non-zero exit status: 2


Another, less common, failure mode:

==> virtualbox-iso: Provisioning with shell script: scripts/finalize.sh
==> virtualbox-iso: Deregistering and deleting VM...
==> virtualbox-iso: Deleting output directory...
Build 'virtualbox-iso' errored: Retryable error: Error removing temporary script at /tmp/script_6974.sh: Timeout during SSH handshake

==> Some builds didn't complete successfully and had errors:
--> virtualbox-iso: Retryable error: Error removing temporary script at /tmp/script_6974.sh: Timeout during SSH handshake


Rickard von Essen

May 11, 2018, 11:35:07 AM
to packe...@googlegroups.com
This happens because the SSH session is closed (see https://github.com/hashicorp/packer/blob/68287566b873c317144235a202c5e9da9f88f4f7/packer/communicator.go#L15), most likely because sshd gets updated/restarted.
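
If that is the cause, one common mitigation (a sketch, not something from this thread; task name and the `yum` module usage are assumptions about the playbook) is to exclude the OpenSSH packages from the bulk update so sshd is never restarted while Packer's SSH session is live:

```yaml
# Hypothetical mitigation: update everything except OpenSSH so sshd
# is not restarted mid-build. The `exclude` option of the yum module
# passes package globs through to yum's --exclude.
- name: Update OS packages (excluding OpenSSH)
  yum:
    name: "*"
    state: latest
    exclude: openssh*
```

The excluded packages can then be updated in a final step, after which a disconnect no longer matters.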

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/mitchellh/packer/issues
IRC: #packer-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Packer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to packer-tool...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/packer-tool/f679de0a-06df-4667-b9b1-4919f70cc73a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chris Stevens

May 11, 2018, 11:54:08 AM
to Packer
Thanks. My Ansible playbook does a system update, so that could be doing it.

Is there something like "expect_disconnect" that will work with ansible-local?
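
For reference, the shell provisioner does expose an "expect_disconnect" option, but I don't see an equivalent for ansible-local. A possible workaround (untested sketch; the playbook filename here is made up) would be to run the disconnect-prone update in a shell step before the Ansible provisioner:

```json
{
  "provisioners": [
    {
      "type": "shell",
      "inline": ["sudo yum -y update"],
      "expect_disconnect": true
    },
    {
      "type": "ansible-local",
      "playbook_file": "development.yml"
    }
  ]
}
```

With "expect_disconnect": true, Packer tolerates the SSH connection dropping during that step and reconnects before moving on to the next provisioner.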

Chris Stevens

May 11, 2018, 12:13:59 PM
to Packer
Actually, now that I think about it, I haven't seen this issue even once while the system upgrade task was executing.

That makes me think it is something else. Maybe a panic with the VM or something, though I haven't noticed anything like that in the logs yet.

Rickard von Essen

May 16, 2018, 3:08:32 AM
to packe...@googlegroups.com
Whether that happens depends solely on which packages are available for update, i.e. on external state.


Chris Stevens

May 22, 2018, 1:32:51 PM
to Packer
I've been doing more testing for this issue. Various forms of it appear transiently regardless of Packer version, VirtualBox version, and CentOS release (1708 vs 1804). Most if not all of the failures present as a timeout or disconnect from the running VM.

Very recently I was able to capture the VM's final console output, and it shows a DHCP / NetworkManager failure (below).

I think it is very unlikely that this is a Packer issue, but I will continue to pursue it, since it affects our ability to update our developer environment and the findings might help somebody else.

May 22 17:12:23 localhost.localdomain NetworkManager[683]: <info>  [1527009143.5457] dhcp4 (enp0s3): activation: beginning transaction (timeout in 45 seconds)
May 22 17:12:23 localhost.localdomain NetworkManager[683]: <info>  [1527009143.5510] dhcp4 (enp0s3): dhclient started with pid 8751
May 22 17:12:23 localhost.localdomain dhclient[8752]: execve (/usr/libexec/nm-dhcp-helper, ...): No such file or directory
May 22 17:12:23 localhost.localdomain dhclient[8751]: DHCPREQUEST on enp0s3 to 255.255.255.255 port 67 (xid=0x3ecf9459)
May 22 17:12:23 localhost.localdomain dhclient[8751]: DHCPACK from 10.0.2.2 (xid=0x3ecf9459)
May 22 17:12:23 localhost.localdomain dhclient[8753]: execve (/usr/libexec/nm-dhcp-helper, ...): No such file or directory
May 22 17:12:23 localhost.localdomain dhclient[8751]: bound to 10.0.2.15 -- renewal in 39668 seconds.

....

May 22 17:13:08 localhost.localdomain NetworkManager[683]: <warn>  [1527009188.5642] dhcp4 (enp0s3): request timed out
May 22 17:13:08 localhost.localdomain NetworkManager[683]: <info>  [1527009188.5646] dhcp4 (enp0s3): state changed unknown -> timeout
May 22 17:13:08 localhost.localdomain NetworkManager[683]: <info>  [1527009188.5757] dhcp4 (enp0s3): canceled DHCP transaction, DHCP client pid 8751
May 22 17:13:08 localhost.localdomain NetworkManager[683]: <info>  [1527009188.5758] dhcp4 (enp0s3): state changed timeout -> done
May 22 17:13:08 localhost.localdomain NetworkManager[683]: <info>  [1527009188.5763] device (enp0s3): scheduling DHCPv4 restart in 120 seconds, 1 tries left (reason: lease expired)

....

May 22 17:15:53 localhost.localdomain NetworkManager[683]: <warn>  [1527009353.5411] dhcp4 (enp0s3): request timed out
May 22 17:15:53 localhost.localdomain NetworkManager[683]: <info>  [1527009353.5415] dhcp4 (enp0s3): state changed unknown -> timeout
May 22 17:15:53 localhost.localdomain NetworkManager[683]: <info>  [1527009353.5520] dhcp4 (enp0s3): canceled DHCP transaction, DHCP client pid 18343
May 22 17:15:53 localhost.localdomain NetworkManager[683]: <info>  [1527009353.5521] dhcp4 (enp0s3): state changed timeout -> done
May 22 17:15:53 localhost.localdomain NetworkManager[683]: <info>  [1527009353.5525] device (enp0s3): state change: activated -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
May 22 17:15:53 localhost.localdomain NetworkManager[683]: <warn>  [1527009353.5567] device (enp0s3): Activation: failed for connection 'enp0s3'
May 22 17:15:53 localhost.localdomain NetworkManager[683]: <info>  [1527009353.5571] manager: NetworkManager state is now CONNECTED_LOCAL
May 22 17:15:53 localhost.localdomain NetworkManager[683]: <info>  [1527009353.5579] device (enp0s3): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')

Timeout, server 127.0.0.1 not responding.
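
If the guest really is losing its lease like this, one mitigation (a sketch I have not verified for this build; the profile path and id are assumptions, and CentOS 7 uses the ifcfg-rh plugin by default rather than keyfiles) is to raise NetworkManager's DHCP timeout for the interface so a slow renewal under provisioning load doesn't tear the connection down:

```ini
; Hypothetical NetworkManager keyfile profile for the NAT interface.
; Path assumed: /etc/NetworkManager/system-connections/enp0s3
[connection]
id=enp0s3
interface-name=enp0s3
type=ethernet

[ipv4]
method=auto
; Default timeout is 45 seconds (matching the log above);
; 2147483647 effectively means "retry forever".
dhcp-timeout=2147483647
```

The same property can be set on a live system with `nmcli connection modify enp0s3 ipv4.dhcp-timeout 2147483647`.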

