This is an important alert for all users of the Ubuntu 8.04 Hardy AMIs
listed on http://alestic.com and http://ec2hardy.notlong.com
OVERVIEW
If you are running an Ubuntu 8.04 Hardy AMI built before 2008-09-22 or
a re-bundled AMI based on one of these, you are likely to encounter
serious problems when you upgrade the standard Ubuntu packages. The
problems can take the form of processes hanging or spinning,
potentially causing your application to become unusable.
The recommended path is to upgrade to the latest AMI listed on
http://alestic.com, but there are a couple of other options listed
below if you need a short-term fix on a running instance or a
privately bundled AMI.
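If you are not sure when your AMI was built, the build date of the alestic.com AMIs appears in the manifest name (for example, alestic/ubuntu-8.04-hardy-base-20080905.manifest.xml). The following is a minimal sketch, not an official tool, that assumes this naming pattern; the function name hardy_ami_status is hypothetical. It cannot recognize re-bundled AMIs, which carry your own manifest names.

```shell
#!/bin/sh
# Sketch: classify an AMI manifest path as "affected" (alestic.com Hardy
# AMI built before 2008-09-22), "fixed", "not-hardy", or "unknown".
# Assumes the manifest name embeds the build date as YYYYMMDD, as in
# alestic/ubuntu-8.04-hardy-base-20080905.manifest.xml (hypothetical helper).
hardy_ami_status() {
    manifest=$1
    case $manifest in
        *ubuntu-8.04-hardy*) ;;
        *) echo "not-hardy"; return 0 ;;
    esac
    # Extract an 8-digit run delimited by non-digits (the build date).
    build_date=$(printf '%s\n' "$manifest" |
        sed -n 's/.*[^0-9]\([0-9]\{8\}\)[^0-9].*/\1/p')
    if [ -z "$build_date" ]; then
        echo "unknown"          # e.g. a privately re-bundled AMI
    elif [ "$build_date" -lt 20080922 ]; then
        echo "affected"
    else
        echo "fixed"
    fi
}
```

On a running instance, assuming the EC2 instance metadata service is reachable, the manifest path can be fed in like this:

hardy_ami_status "$(wget -q -O - http://169.254.169.254/latest/meta-data/ami-manifest-path)"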
PROBLEM
(The following is based on observed behavior. The exact mechanics are
not yet understood, and help in this regard is welcome.)
If libc6-i686 is installed and /lib/tls exists, then an adverse
interaction between the Ubuntu system and (presumably) the EC2 Xen
kernel can cause some functions to spin the CPU and never return.
The Ubuntu 8.04 Hardy AMIs from http://alestic.com built prior to
2008-09-22 have libc6-i686 installed but have had /lib/tls removed to
prevent the negative effects.
Ubuntu recently released an updated libc6 package to the standard
repositories. When an Ubuntu instance upgrades its packages after this
release (e.g., with "apt-get upgrade"), the libc6 upgrade re-creates
/lib/tls, which brings the problem back.
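The condition described above can be checked on a running instance. This is a hedged sketch: the function names are hypothetical, and the optional directory argument exists only so the check can be pointed somewhere other than /lib/tls for testing. It treats the instance as exposed only when dpkg reports libc6-i686 as installed and /lib/tls exists.

```shell
#!/bin/sh
# Sketch: report whether this instance is in the risky state
# (libc6-i686 installed AND /lib/tls present). Hypothetical helpers.

# Succeeds if dpkg reports the package as installed. If dpkg is missing
# or the package is unknown, grep finds nothing and this fails.
pkg_installed() {
    dpkg -s "$1" 2>/dev/null | grep -q '^Status: install ok installed'
}

# Optional $1 overrides the directory checked (defaults to /lib/tls).
# Returns 1 when exposed, 0 otherwise.
hardy_tls_exposure() {
    tls_dir=${1:-/lib/tls}
    if pkg_installed libc6-i686 && [ -d "$tls_dir" ]; then
        echo "EXPOSED: libc6-i686 is installed and $tls_dir exists"
        return 1
    fi
    echo "OK: libc6-i686 not installed or $tls_dir absent"
    return 0
}
```

For example, running hardy_tls_exposure after each "apt-get upgrade" would flag, through a non-zero exit status, the point at which one of the fixes in SOLUTION is needed.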
SOLUTION
There are three known solutions to this problem. Only *one* of them
needs to be applied.
(1) Upgrade to the latest Ubuntu AMI listed on http://alestic.com.
The latest AMIs install libc6-xen, remove libc6-i686, and leave
/lib/tls in place. Based on tests and user reports, this appears to
resolve most, if not all, of the problems that can be traced to this
defect.
Upgrading the AMI is the recommended path.
If you have a running instance which needs to be fixed or have bundled
your own AMIs and can't afford to start from scratch yet, then either
of the following two should improve the situation.
(2) OR, install libc6-xen and remove libc6-i686 using commands like:
apt-get install -y libc6-xen
apt-get remove -y libc6-i686
This can be done before or after upgrading Ubuntu packages. If you do
this after libc6 has already been upgraded, you may need to restart
processes or reboot.
This is the recommended method for running instances.
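Restarting only what is necessary requires knowing which processes still have the old libraries loaded. A sketch, assuming the standard Linux /proc/PID/maps interface and the procps ps command: after this fix (which deletes the libc6-i686 files) or fix (3) below (which deletes /lib/tls), processes started earlier keep the removed files mapped, and the kernel marks those mappings "(deleted)". The function name is hypothetical; run it as root so every maps file is readable.

```shell
#!/bin/sh
# Sketch: list processes that still map a deleted file from under /lib/tls,
# i.e. processes started before libc6-i686 or /lib/tls was removed.
# Hypothetical helper; relies on the " (deleted)" marker in /proc/PID/maps.
procs_using_deleted_lib_tls() {
    for maps in /proc/[0-9]*/maps; do
        # A process may exit while we loop, so ignore read errors.
        if grep -q ' /lib/tls/.* (deleted)$' "$maps" 2>/dev/null; then
            pid=${maps#/proc/}
            pid=${pid%/maps}
            # Print PID and command name (name is empty if it just exited).
            echo "$pid $(ps -p "$pid" -o comm= 2>/dev/null)"
        fi
    done
}

procs_using_deleted_lib_tls
```

Each listed process would need to be restarted (or the instance rebooted) so that it loads the libraries that are now installed.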
(3) OR, remove /lib/tls using a command like:
rm -rf /lib/tls
This must be done *after* upgrading Ubuntu packages, and again whenever
libc6 is upgraded in the future. Note that this leaves a window between
the upgrade of libc6 and the removal of /lib/tls during which newly
started processes may be affected.
You may need to restart processes or reboot after performing this
step.
This method returns your instance to the state closest to the one it
was in before the upgrade (useful if you have concerns about installing
libc6-xen). However, it leaves the instance vulnerable to the same
problem the next time libc6 is upgraded.
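Because fix (3) has to be repeated after every libc6 upgrade, it could be automated with apt's DPkg::Post-Invoke hook, which apt runs after it finishes calling dpkg. This is not part of the procedure above, only a sketch under these assumptions: the file name /etc/apt/apt.conf.d/99-remove-lib-tls is arbitrary, the hook fires only for packages installed through apt-get (not plain "dpkg -i"), and it does not close the window described above, since services restarted by package scripts during the upgrade still start before the hook runs. Do not combine it with fix (1) or (2), which deliberately leave /lib/tls in place.

```
// /etc/apt/apt.conf.d/99-remove-lib-tls (hypothetical file name)
// Re-apply fix (3): remove /lib/tls each time apt finishes running dpkg.
DPkg::Post-Invoke { "rm -rf /lib/tls"; };
```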
EXAMPLES
The following steps were taken to demonstrate that each of the listed
solutions fixes the bad behavior. A simple Perl command is used to
detect the failure case.
These tests were performed on the following (now outdated) AMI on an
m1.small instance:
ami-c0fa1ea9 alestic/ubuntu-8.04-hardy-base-20080905.manifest.xml
# Show that the problem does not exist in the original AMI
# dpkg -s libc6-xen
Package `libc6-xen' is not installed and no info is available.
[...]
# dpkg -s libc6-i686
Package: libc6-i686
Status: install ok installed
[...]
# ls -d /lib/tls
ls: cannot access /lib/tls: No such file or directory
# perl -e 'glob("xxx*")'
[Returns immediately which is good]
# Upgrading Ubuntu packages adds /lib/tls back in and causes the problem
# apt-get update && apt-get upgrade -y
[...]
Processing triggers for libc6 ...
ldconfig deferred processing now taking place
# ls -d /lib/tls
/lib/tls
# perl -e 'glob("xxx*")'
[Spins forever using up available CPU which is bad]
^C
# FIX (1) - Upgrade to the latest Ubuntu AMI listed on http://alestic.com
# perl -e 'glob("xxx*")'
[Returns immediately which is good]
# OR, FIX (2) - Install libc6-xen and remove libc6-i686
# apt-get install -y libc6-xen
# apt-get remove -y libc6-i686
# perl -e 'glob("xxx*")'
[Returns immediately which is good]
# OR, FIX (3) - Remove /lib/tls and restart programs
# rm -rf /lib/tls
# perl -e 'glob("xxx*")'
[Returns immediately which is good]
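The perl check above needs a ^C when the problem is present. For use in scripts, a bounded variant can rely on perl's alarm(), which schedules a SIGALRM; because no handler is installed, the signal terminates perl even while glob() is spinning. The function name glob_check and the 5-second limit are assumptions, not part of the original procedure.

```shell
#!/bin/sh
# Sketch: non-interactive version of the perl glob() test.
# alarm 5 makes the kernel send SIGALRM after 5 seconds (arbitrary limit);
# with no handler installed, SIGALRM kills perl, so a spin cannot hang us.
glob_check() {
    if perl -e 'alarm 5; glob("xxx*"); exit 0' 2>/dev/null; then
        echo "glob check: OK"
        return 0
    fi
    echo "glob check: FAILED (perl spun past the alarm or could not run)"
    return 1
}

glob_check
```

A non-zero exit status from glob_check after an upgrade would indicate that one of the fixes still needs to be applied.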
EXCLUSIONS
This problem does not seem to affect the older AMIs for Ubuntu 7.10
Gutsy or Ubuntu 6.06 Dapper. Upgrading to the latest AMI would
probably still be a good idea.
This problem does not seem to affect the Debian AMIs.
This problem does not seem to be related to the spinning of
mysqld_safe after mysql is stopped on Ubuntu. We are still looking for
a solution to that, though there is a workaround:
http://groups.google.com/group/ec2ubuntu/browse_thread/thread/4d9e54ade1f1d35b
This problem does not seem to be related to the hanging of EBS volumes
in certain circumstances, including Debian+EBS+XFS:
http://developer.amazonwebservices.com/connect/thread.jspa?messageID=99070
FEEDBACK
This problem is sufficiently severe that I am sending this notice out
before I fully understand its scope or the mechanism that causes it.
Feedback and clarification are welcome, as are additional problem
reports. These Ubuntu AMIs are a community effort and I sincerely
appreciate all the folks who pitch in to help.
http://groups.google.com/group/ec2ubuntu
--
Eric Hammond
http://www.anvilon.com