[Stemcell] Stuck during init process (possibly due to NFS server).

Alexander Lomov

Mar 10, 2015, 3:52:31 PM
to bosh...@cloudfoundry.org
Hi, everyone. 

I am working on a stemcell for a new cloud provider. This cloud uses the xen-hvm hypervisor. To support it, I needed to redefine the steps that install grub (the image_create and image_install_grub stages). I took the base OS image of Ubuntu Trusty from S3 (http://s3.amazonaws.com/bosh-os-images/).

To build the image I used the following rake tasks:

bundle exec rake stemcell:build_os_image[ubuntu,trusty,$base_os_image_path]
CANDIDATE_BUILD_NUMBER=2826 bundle exec rake stemcell:build_with_local_os_image[cloud-name,hvm,ubuntu,trusty,go,$base_os_image_path]

After that I have a stemcell that can be used. The problem is that a VM created from this image gets stuck while running the init scripts. Here is the tail of /var/log/syslog (you can find details here):

Mar 9 09:26:54 localhost kernel: [ 3.878411] PM: Hibernation image not present or could not be loaded.
Mar 9 09:26:54 localhost kernel: [ 3.897767] Freeing unused kernel memory: 1336K (ffffffff81d20000 - ffffffff81e6e000)
Mar 9 09:26:54 localhost kernel: [ 3.938790] Write protecting the kernel read-only data: 12288k
Mar 9 09:26:54 localhost kernel: [ 3.958610] Freeing unused kernel memory: 796K (ffff880001739000 - ffff880001800000)
Mar 9 09:26:54 localhost kernel: [ 4.001406] Freeing unused kernel memory: 688K (ffff880001b54000 - ffff880001c00000)
Mar 9 09:26:54 localhost kernel: [ 4.130373] FDC 0 is a S82078B
Mar 9 09:26:54 localhost kernel: [ 4.355540] EXT4-fs (xvda1): mounting ext3 file system using the ext4 subsystem
Mar 9 09:26:54 localhost kernel: [ 4.428316] EXT4-fs (xvda1): mounted filesystem with ordered data mode. Opts: (null)
Mar 9 09:26:54 localhost kernel: [ 5.164564] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input4
Mar 9 09:26:54 localhost kernel: [ 5.822889] random: init urandom read with 63 bits of entropy available
Mar 9 09:26:54 localhost kernel: [ 8.901966] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar 9 09:26:54 localhost kernel: [ 9.251613] EXT4-fs (xvda1): re-mounted. Opts: (null)
Mar 9 09:26:54 localhost kernel: [ 9.282699] random: nonblocking pool is initialized
Mar 9 09:26:54 localhost kernel: [ 9.510202] FS-Cache: Loaded
Mar 9 09:26:54 localhost kernel: [ 9.566678] RPC: Registered named UNIX socket transport module.
Mar 9 09:26:54 localhost kernel: [ 9.584590] RPC: Registered udp transport module.
Mar 9 09:26:54 localhost kernel: [ 9.602315] RPC: Registered tcp transport module.
Mar 9 09:26:54 localhost kernel: [ 9.626114] RPC: Registered tcp NFSv4.1 backchannel transport module.
Mar 9 09:26:54 localhost kernel: [ 9.653703] FS-Cache: Netfs 'nfs' registered for caching
Mar 9 09:26:54 localhost kernel: [ 9.691566] Installing knfsd (copyright (C) 1996 ok...@monad.swb.de).
Mar 9 09:33:40 localhost rsyslogd: [origin software="rsyslogd" swVersion="7.4.6" x-pid="586" x-info="http://www.rsyslog.com"] exiting on signal 15.
Mar 10 12:32:41 localhost rsyslogd: [origin software="rsyslogd" swVersion="7.4.6" x-pid="586" x-info="http://www.rsyslog.com"] start
Mar 10 12:32:41 localhost rsyslogd-2184: action '*' treated as ':omusrmsg:*' - please change syntax, '*' will not be supported in the future [try http://www.rsyslog.com/e/2184 ]
Mar 10 12:32:41 localhost rsyslogd: rsyslogd's groupid changed to 104
Mar 10 12:32:41 localhost rsyslogd: rsyslogd's userid changed to 101
Mar 10 12:39:09 localhost rsyslogd: [origin software="rsyslogd" swVersion="7.4.6" x-pid="586" x-info="http://www.rsyslog.com"] exiting on signal 15.

I'm not sure that the problem is in rsyslogd, because its process may simply have been killed. My first guess is that the problem is in the NFS server (Installing knfsd (copyright (C) 1996 ok...@monad.swb.de)).
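
A rough way to see where the boot went quiet is to look for the longest silence between consecutive syslog entries. The sketch below is illustrative only: `parse_syslog_time` and `largest_gap` are hypothetical helper names, the year is assumed to be 2015 because syslog timestamps omit it, and the sample lines are copied from the log above.

```python
from datetime import datetime


def parse_syslog_time(line, year=2015):
    """Parse the leading 'Mon D HH:MM:SS' stamp of a syslog line.

    The year is an assumption: classic syslog stamps do not carry one.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence.

    Returns None when fewer than two non-empty lines are given.
    """
    best = None
    prev_time, prev_line = None, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        current = parse_syslog_time(line)
        if prev_time is not None:
            gap = current - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = current, line
    return best


# Three entries from the first boot shown in the log above.
sample_lines = [
    "Mar 9 09:26:54 localhost kernel: [ 9.653703] FS-Cache: Netfs 'nfs' registered for caching",
    "Mar 9 09:26:54 localhost kernel: [ 9.691566] Installing knfsd (copyright (C) 1996 ok...@monad.swb.de).",
    'Mar 9 09:33:40 localhost rsyslogd: [origin software="rsyslogd" swVersion="7.4.6" x-pid="586" x-info="http://www.rsyslog.com"] exiting on signal 15.',
]

gap, before, after = largest_gap(sample_lines)
print("Longest silence:", gap)
print("Last entry before it:", before)
print("First entry after it:", after)
```

On these lines the longest silence sits between the knfsd message and the rsyslogd shutdown. That only shows what was logged last before the stall, not that NFS caused it. Run over the full file, the gap between the two boots (Mar 9 to Mar 10) would dominate, so it makes sense to feed in one boot at a time.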

I found some solutions on the internet, but they didn't work for me. Currently I am trying to debug this issue by running the init scripts one by one, but it can take time.

Has anybody had this problem? Could you suggest a possible solution?

Thank you,
Alex L.

Dmitriy Kalinin

Mar 10, 2015, 5:40:04 PM
to bosh...@cloudfoundry.org
A few questions:

- Which kernel version are you using?
- How do you know that the VM is stuck: is it not pingable on the network?
- Is the VM configured with DHCP or a manual config?
- Is a serial console available?

Even though rsyslog seems to say it's exiting, I don't see any actual errors.
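
For the reachability question, a quick check from another machine on the same network can complement ping. This is only a sketch under assumptions: `tcp_reachable` is a hypothetical helper name, and it assumes the VM is expected to accept TCP connections on some known port (for example SSH on port 22), which the thread does not confirm.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Any socket-level failure (refused, unreachable, timed out, DNS error)
    is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_reachable("192.0.2.10", 22)` would report whether the SSH port answers, where 192.0.2.10 is a placeholder from the documentation address range and must be replaced with the VM's real address. A False result does not distinguish a hung boot from a misconfigured network, so it is best read together with the console output.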

Alexander Lomov

Mar 11, 2015, 2:14:28 AM
to bosh...@cloudfoundry.org
Here are some details:
  • I generated the stemcell using the following kernel version: 3.13.0-40-generic. The base OS image has version 3.13.0-46-generic.
  • I use the terminal from the web console. This terminal responds to the keyboard, but stops producing any output after the "Installing knfsd ..." string.
  • I tried to run with the auto DHCP option, but it didn't work.
  • Yes, I can get access to it from the web console.
I will check more networking options and post updates here.

Dmitriy Kalinin

Mar 11, 2015, 9:41:53 PM
to bosh...@cloudfoundry.org
Then I would recommend removing NFS to check whether the stemcell boots up properly.

Parthiban Annadurai

Mar 12, 2015, 11:13:33 AM
to bosh...@cloudfoundry.org
Hi All,
I need your help to solve the following issue.

When I upload the CF-202 release, it gives:

Started downloading remote release > Downloading remote release. Failed: No space left on device @ io_write - /var/vcap/data/tmp/director/release-ce1317ec-3f9b-4c0c-b22a-058688507e5f (02:41:27)

Error 100: No space left on device @ io_write - /var/vcap/data/tmp/director/release-ce1317ec-3f9b-4c0c-b22a-058688507e5f


How can I increase the Director's disk space? Thanks.

To unsubscribe from this group and stop receiving emails from it, send an email to bosh-dev+u...@cloudfoundry.org.

Dmitriy Kalinin

Mar 12, 2015, 12:54:56 PM
to bosh...@cloudfoundry.org
Please create a separate thread for your question.

Alexander Lomov

Mar 12, 2015, 4:19:27 PM
to bosh...@cloudfoundry.org
Dmitriy, you were completely right. The problem was a wrong network configuration.

Thank you for your help.

------------------------
Alex Lomov
Altoros — Cloud Foundry deployment, training and integration
Twitter: @code1n GitHub: @allomov

Parthiban Annadurai

Mar 15, 2015, 1:22:53 PM
to bosh...@cloudfoundry.org
Hi All,
With all of your kind help I successfully installed SPIFF. The problem is that I am not able to use the generated manifest file cf-deployment.yml, since it is empty. Could anyone please tell me how to generate the manifest file from the cf-stub.yml file? I am following docs.cloudfoundry.org for installing CF using BOSH. Thanks in advance.

Alexander Lomov

Mar 15, 2015, 4:54:30 PM
to bosh...@cloudfoundry.org
Hi, Parthiban

As Dmitriy K. said, it would be better to have a separate thread for the discussion of your issue. It will be easier to track updates that way. Keep in mind that there are bosh-user and vcap-dev groups for questions of this kind.

Regarding your issue, you will probably be interested in this script

It helps to generate manifests for particular infrastructures. 

Best of luck,
Alex L.

------------------------
Alex Lomov
Altoros — Cloud Foundry deployment, training and integration
Twitter: @code1n GitHub: @allomov
