ALERT: Ubuntu upgrade causes problems with older 8.04 Hardy AMIs

Eric Hammond

Oct 8, 2008, 3:18:53 PM
to ec2ubuntu

This is an important alert for all users of the Ubuntu 8.04 Hardy AMIs
listed on http://alestic.com and http://ec2hardy.notlong.com


OVERVIEW

If you are running an Ubuntu 8.04 Hardy AMI built before 2008-09-22 or
a re-bundled AMI based on one of these, you are likely to encounter
serious problems when you upgrade the standard Ubuntu packages. The
problems can take the form of processes hanging or spinning,
potentially causing your application to become unusable.

The recommended path is to upgrade to the latest AMI listed on
http://alestic.com but there are a couple of other options listed below
if you need a short-term fix on a running instance or a privately
bundled AMI.


PROBLEM

(The following is based on observed behavior. The exact mechanics are
not yet understood but help in this regard is welcomed.)

If libc6-i686 is installed and /lib/tls exists then there is some
negative interaction between the Ubuntu system and (presumably) the
EC2 Xen kernel which can cause some functions to spin CPU and not
return.

The Ubuntu 8.04 Hardy AMIs from http://alestic.com built prior to
2008-09-22 have libc6-i686 installed but have had /lib/tls removed to
prevent the negative effects.

Ubuntu recently released an upgrade to libc6, available from the standard
repositories. When an Ubuntu instance upgrades its packages after that
release (e.g., "apt-get upgrade"), the libc6 package triggers the
re-creation of /lib/tls, which brings the problem back.
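
The two conditions described above can be checked together. The
following is only a minimal sketch: the function name at_risk is
hypothetical, and the commented usage assumes dpkg-query is available
(as it is on a standard Ubuntu install). It reports the observed
trigger conditions, not the underlying cause, which is not yet
understood.

```shell
#!/bin/sh
# at_risk PKG_STATUS TLS_DIR
#   PKG_STATUS - output of: dpkg-query -W -f='${Status}' libc6-i686
#   TLS_DIR    - directory to check, normally /lib/tls
# Succeeds (exit 0) only when libc6-i686 is installed AND TLS_DIR exists,
# which is the combination observed to cause the spinning.
at_risk() {
    pkg_status=$1
    tls_dir=$2
    case "$pkg_status" in
        "install ok installed") ;;   # package is fully installed
        *) return 1 ;;               # absent or removed
    esac
    [ -d "$tls_dir" ]
}

# Usage on a real instance:
# pkg_status=$(dpkg-query -W -f='${Status}' libc6-i686 2>/dev/null)
# if at_risk "$pkg_status" /lib/tls; then echo "at risk"; else echo "ok"; fi
```

An instance that reports "ok" today can still become "at risk" after
the next libc6 upgrade if libc6-i686 is installed.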


SOLUTION

There are three known solutions to this problem. Only *one* of them
needs to be applied.

(1) Upgrade to the latest Ubuntu AMI listed on http://alestic.com. The
latest AMIs install libc6-xen, remove libc6-i686, and leave /lib/tls
in place. Based on tests and user reports, this seems to relieve most
if not all of the problems which can be tracked down to this defect.

Upgrading the AMI is the recommended path.

If you have a running instance which needs to be fixed or have bundled
your own AMIs and can't afford to start from scratch yet, then either
of the following two should improve the situation.

(2) OR, Install libc6-xen and remove libc6-i686 using commands like:

apt-get install -y libc6-xen
apt-get remove -y libc6-i686

This can be done before or after upgrading Ubuntu packages. If you do
this after libc6 has already been upgraded, you may need to restart
processes or reboot.

This is the recommended method for running instances.

(3) OR, Remove /lib/tls using a command like:

rm -rf /lib/tls

This must be done *after* upgrading Ubuntu packages and whenever libc6
is upgraded in the future. Note that this leaves a window open
between the upgrade of libc6 and the removal of /lib/tls where started
processes may be defective.

You may need to restart processes or reboot after performing this
step.

This method returns your instance to the state closest to the one it
was in prior to the upgrade (in case you have any concerns about
installing libc6-xen). However, it leaves the instance vulnerable to
the same problem the next time libc6 is upgraded.
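
Because option (3) has to be repeated after every libc6 upgrade, it
may help to wrap it in a small helper that is safe to run more than
once. This is a sketch only: the function name remove_tls and its
messages are hypothetical, and it does nothing to close the window
between the upgrade and the removal.

```shell
#!/bin/sh
# remove_tls [DIR]
# Remove DIR (default /lib/tls) if it exists and report what was done.
# Safe to run repeatedly; does nothing when DIR is already gone.
remove_tls() {
    tls_dir=${1:-/lib/tls}
    if [ -d "$tls_dir" ]; then
        rm -rf "$tls_dir" && echo "removed $tls_dir; restart processes or reboot"
    else
        echo "$tls_dir not present; nothing to do"
    fi
}

# Usage as root, after each package upgrade:
# apt-get update && apt-get upgrade -y && remove_tls
```

Option (2) avoids the need for this helper entirely, which is one
reason it is the recommended method for running instances.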


EXAMPLES

The following steps were taken to demonstrate that the listed
solutions fix bad behavior. A simple Perl command is used to detect
the failure case.

These tests were performed on the following (now outdated) AMI on an
m1.small instance:

ami-c0fa1ea9 alestic/ubuntu-8.04-hardy-base-20080905.manifest.xml

# Show that the problem does not exist in the original AMI

# dpkg -s libc6-xen
Package `libc6-xen' is not installed and no info is available.
[...]

# dpkg -s libc6-i686
Package: libc6-i686
Status: install ok installed
[...]

# ls -d /lib/tls
ls: cannot access /lib/tls: No such file or directory

# perl -e 'glob("xxx*")'
[Returns immediately which is good]

# Upgrading Ubuntu packages adds /lib/tls back in and causes the problem

# apt-get update && apt-get upgrade -y
[...]
Processing triggers for libc6 ...
ldconfig deferred processing now taking place

# ls -d /lib/tls
/lib/tls

# perl -e 'glob("xxx*")'
[Spins forever using up available CPU which is bad]
^C

# FIX (1) - Upgrade to the latest Ubuntu AMI listed on http://alestic.com

# perl -e 'glob("xxx*")'
[Returns immediately which is good]

# OR, FIX (2) - Install libc6-xen and remove libc6-i686

# apt-get install -y libc6-xen
# apt-get remove -y libc6-i686

# perl -e 'glob("xxx*")'
[Returns immediately which is good]

# OR, FIX (3) - Remove /lib/tls and restart programs

# rm -rf /lib/tls

# perl -e 'glob("xxx*")'
[Returns immediately which is good]


EXCLUSIONS

This problem does not seem to affect the older AMIs for Ubuntu 7.10
Gutsy or Ubuntu 6.06 Dapper. Upgrading to the latest AMI would
probably still be a good idea.

This problem does not seem to affect the Debian AMIs.

This problem does not seem to be related to the spinning of
mysqld_safe after mysql is stopped on Ubuntu. At this point we are
still looking for a solution to that though there is a workaround:

http://groups.google.com/group/ec2ubuntu/browse_thread/thread/4d9e54ade1f1d35b

This problem does not seem to be related to the hanging of EBS volumes
in certain circumstances which include Debian+EBS+XFS.

http://developer.amazonwebservices.com/connect/thread.jspa?messageID=99070&#99070


FEEDBACK

This problem is sufficiently severe that I am sending this notice out
before I understand the entire scope of the problem or the mechanisms
behind what is causing the problem.

Feedback and clarification are welcomed as are additional problem
reports. These Ubuntu AMIs are a community effort and I sincerely
appreciate all the folks who pitch in to help.

http://groups.google.com/group/ec2ubuntu


--
Eric Hammond
http://www.anvilon.com

sghe...@hotmail.com

Oct 8, 2008, 4:11:50 PM
to ec2u...@googlegroups.com
Eric,

At the risk of contributing complete nonsense here, this might help you:
Did you catch my posts on Sept 25th quoting a script called
xen-divert-tls-libc? (I think what this script does is roughly equivalent
to the option 'rm -rf /lib/tls', but a little less destructive and easier
on the Debian package system.)

I remember I found the original script through the Ubuntu forums/HowTo
pages. Here's a wiki page from xensource.com that seems to document it:
http://wiki.xensource.com/xenwiki/DebianTlsLibcDiversion.

I think this script existed for the sole reason that Xen kernels would
not play nicely with libc-tls, the latter being the preferred choice of
Debian/Ubuntu. I know that I'm quoting these things loosely from memory,
and therefore they must have been related to older (pre-8.04) versions of
Ubuntu [which makes them inapplicable a priori, since the issue doesn't
seem to exist with these versions?], but they sound sufficiently similar
in nature to ring a bell with me.

Of course, knowing the high standard of your own posts :) I suppose you
are well-versed in this topic and knew the details I'm lacking all
along. In that case, please forgive my ignorance, and thank you again
for providing these splendid AMIs and excellent support.

Regards
Seth
> http://developer.amazonwebservices.com/connect/thread.jspa?messageID=99070&#99070

Eric Hammond

Oct 8, 2008, 7:22:25 PM
to ec2ubuntu
Seth:

Yes, thank you. That article helped me solve the original problem
with /lib/tls on the early Hardy AMIs. I didn't understand the entire
scope at the time (and probably still don't) but it looked like it
took a lot of manual effort to prevent /lib/tls from being re-created
in certain circumstances (which seems to be the trigger now). That
script may have alleviated the current situation, but it was old
enough and complex enough that I wasn't comfortable just running it at
the time without additional guidance or understanding.

Where we seem to have landed for the time being is to let /lib/tls
exist along with libc6-xen. This works in the cases where I and
others have run into problems and should let libc6 upgrade seamlessly
in the future without manual intervention.

If anybody has any indication this is an undesirable combination,
please let us know. The information I've been able to find is minimal
and quickly gets to a level of Linux internals where I'm not an
expert.

--
Eric Hammond
http://www.anvilon.com
