Fix for ironfoundry.me error from Friday, 8/8/2014

brian.button

Aug 12, 2014, 6:54:09 PM

to ironf...@googlegroups.com

Hi, all,

There was an issue with ironfoundry.me during the day on Friday, 8/8 that prevented applications from being staged. Two users reported the incident early that morning, and we began investigating the cause shortly thereafter.

Summary:

* The issue was easily replicated: every attempt to push an application failed with an error of "FAILED". While this was true, it was not entirely helpful.

* The cause of this symptom was found to be a full hard drive on the cloud controller. This problem was solved by recreating the api node.

* The underlying cause of this outage was discovered to be that the NFS share that the cloud controller uses to store data was not successfully mounted during the bosh install process. This resulted in the cloud controller storing buildpacks and droplets locally, which is what led to the full drive partition.

* Additional investigation was performed on our QA instance, where we discovered a small number of issues in our deployment manifest. Fixing those issues allowed the NFS share to be mounted when the api node was recreated, which gives the cloud controller over a terabyte of storage for buildpacks and droplets.

* We updated our deployment manifest creation process so that the manifest errors would not recur.

* We will recreate the api node on beta.ironfoundry.me shortly, which should prevent this outage from occurring again. This change will be transparent to our users.
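
For anyone who wants to guard against the same failure mode, here is a minimal sketch of a check that would flag both conditions described above: a storage directory that is not actually a mount point, and a volume that is close to full. This is not part of our deployment tooling; the function name `check_shared_storage` and the 10% threshold are assumptions chosen for illustration.

```python
import os
import shutil


def check_shared_storage(path, min_free_fraction=0.10):
    """Return a list of human-readable problems found for `path`.

    Hypothetical helper: an empty list means the path exists, is a mount
    point, and has at least `min_free_fraction` of its space free.
    """
    if not os.path.exists(path):
        return [f"{path} does not exist"]

    problems = []

    # If the share failed to mount, the directory still exists but is just
    # a plain folder, so writes silently land on the local disk.
    if not os.path.ismount(path):
        problems.append(f"{path} is not a mount point")

    # Report low free space on whichever volume currently backs the path.
    usage = shutil.disk_usage(path)
    if usage.total > 0 and usage.free / usage.total < min_free_fraction:
        percent_free = 100.0 * usage.free / usage.total
        problems.append(f"{path} has only {percent_free:.1f}% free space")

    return problems
```

Calling `check_shared_storage("/path/to/nfs/share")` (a placeholder path, not our real one) after a deploy and alerting on a non-empty result would catch an unmounted share before the local drive fills up.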

Thank you to those who reported the issue. It should be completely resolved now. If you still see issues with pushing applications to ironfoundry.me, please contact us either through the Google group or on Twitter.