/dev/sda1 is quickly getting filled up in controller node with "johnsond-osp" profile


Srikanth Vavilapalli

Sep 5, 2015, 5:03:21 PM
to David M. Johnson, cloudlab-users
Hi

This is the second time I have noticed this behavior with my OpenStack cluster created with the "johnsond-osp" profile at the Clemson site.

Initially, the disk allocation looks as shown below after all the OpenStack services come up on the controller node:

svavilap@ctl:~$ df -h
Filesystem                             Size  Used Avail Use% Mounted on
udev                                   126G     0  126G   0% /dev
tmpfs                                   26G   42M   26G   1% /run
/dev/sda1                               16G  2.9G   13G  20% /
tmpfs                                  126G     0  126G   0% /dev/shm
tmpfs                                  5.0M     0  5.0M   0% /run/lock
tmpfs                                  126G     0  126G   0% /sys/fs/cgroup
ops.clemson.cloudlab.us:/proj/xos-PG0  100G  851M  100G   1% /proj/xos-PG0
ops.clemson.cloudlab.us:/share          97G     0   90G   0% /share
tmpfs                                   26G     0   26G   0% /run/user/0
tmpfs                                   26G     0   26G   0% /run/user/20001

However, after a few days of usage, /dev/sda1 is completely filled up and any operation on this node throws an error: "write failed: /tmp/sortko4fc2: No space left on device". I have not downloaded any large files onto this machine other than launching the XOS docker container in addition to the OpenStack services.

When I created a cluster using the same profile at "CloudLab Utah", the disk allocation on the controller node is much bigger, as shown below (/dev/sda1 is assigned 110G). However, I could not use the Utah site because I was not able to consistently bring up my experiment with the "johnsond-osp" profile there (I have seen the experiment boot process fail many times).

svavilap@ctl:~$ df -h
Filesystem                          Size  Used Avail Use% Mounted on
udev                                 32G     0   32G   0% /dev
tmpfs                               6.3G   89M  6.3G   2% /run
/dev/sda1                           110G  2.7G  102G   3% /
tmpfs                                32G     0   32G   0% /dev/shm
tmpfs                               5.0M     0  5.0M   0% /run/lock
tmpfs                                32G     0   32G   0% /sys/fs/cgroup
ops.utah.cloudlab.us:/share          97G  1.3G   88G   2% /share
ops.utah.cloudlab.us:/proj/xos-PG0  100G  624K  100G   1% /proj/xos-PG0
tmpfs                               6.3G     0  6.3G   0% /run/user/0
tmpfs                               6.3G     0  6.3G   0% /run/user/20001
 

Until this problem is root-caused, would it be possible to increase the primary disk space on the controller and compute nodes at "CloudLab Clemson" when using the "johnsond-osp" profile?

Thanks
Srikanth


On Fri, Aug 7, 2015 at 11:37 AM, David M. Johnson <john...@flux.utah.edu> wrote:
On 08/06/15 23:26, Srikanth Vavilapalli wrote:
> Hi
>
> I have created an OpenStack cluster with the "johnsond-osp" profile and I
> see the ERROR logs below in the "ceilometer" service (at least I have seen
> them in the ceilometer notification agent). I am not sure if it is because
> of this ERROR that I am not getting Neutron-related delta events (network.create,

The profile didn't install/configure the neutron L3 metering agent (it's
not in the default Ubuntu setup docs), so that stuff wasn't there.  Also,
neutron wasn't configured to send notifications; it is now.  That's the
neutron side.

The errors you see below are a ceilometer bug.  It's trying to open an
event pipeline config file, and it can't handle not finding one.  They
now distribute an event_pipeline.yaml file, but it's evidently not in
the Ubuntu Kilo packages, despite hitting ceilometer master back in
February.  So the profile handles this too: it drops in their current
version if it's not there.

Based on all these changes, I think it will work for you now.  I also
added your store_events and disable_non_metric_meters settings to our
ceilometer configuration; they are good defaults for now.

Thanks!  Try instantiating a new experiment based on johnsond-osp now.

> network.update, etc.) appearing in "ceilometer meter-list" or
> "ceilometer event-list". In the ceilometer config file, I have set
> "store_events=True" and "disable_non_metric_meters=False". Can anyone tell
> me whether this is a known bug or some configuration issue in my setup?
> Appreciate your help.
>
> Thanks
> Srikanth

David

> 2015-08-07 01:02:13.241 39924 DEBUG ceilometer.pipeline [-] Pipeline
> config file: None _setup_pipeline_manager
> /usr/lib/python2.7/dist-packages/ceilometer/pipeline.py:673
> 2015-08-07 01:02:13.243 39924 ERROR
> ceilometer.openstack.common.threadgroup [-] coercing to Unicode: need
> string or buffer, NoneType found
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup Traceback (most recent call last):
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/ceilometer/openstack/common/threadgroup.py",
> line 145, in wait
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     x.wait()
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/ceilometer/openstack/common/threadgroup.py",
> line 47, in wait
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     return self.thread.wait()
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 175, in
> wait
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     return self._exit_event.wait()
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 121, in wait
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     return hubs.get_hub().switch()
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 294, in switch
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     return self.greenlet.switch()
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in
> main
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     result = function(*args,
> **kwargs)
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/ceilometer/openstack/common/service.py",
> line 491, in run_service
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     service.start()
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/ceilometer/notification.py", line 107,
> in start
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     self.event_pipeline_manager
> = pipeline.setup_event_pipeline()
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/ceilometer/pipeline.py", line 691, in
> setup_event_pipeline
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     return
> _setup_pipeline_manager(cfg_file, transformer_manager, EVENT_TYPE)
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup   File
> "/usr/lib/python2.7/dist-packages/ceilometer/pipeline.py", line 675, in
> _setup_pipeline_manager
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup     with open(cfg_file) as fap:
> 2015-08-07 01:02:13.243 39924 TRACE
> ceilometer.openstack.common.threadgroup TypeError: coercing to Unicode:
> need string or buffer, NoneType found


David M. Johnson

Sep 8, 2015, 1:49:55 PM
to Srikanth Vavilapalli, cloudlab-users
On 09/05/15 15:03, Srikanth Vavilapalli wrote:
> Hi
>
> This is the second time I have noticed this behavior with my openstack
> cluster created with "johnsond-osp" profile at clemson site.
>
> Initially, the disk allocation looks as shown below after all the OpenStack
> services come up on the controller node:
>
> svavilap@ctl:~$ df -h
> Filesystem                             Size  Used Avail Use% Mounted on
> udev                                   126G     0  126G   0% /dev
> tmpfs                                   26G   42M   26G   1% /run
> /dev/sda1                               16G  2.9G   13G  20% /
> tmpfs                                  126G     0  126G   0% /dev/shm
> tmpfs                                  5.0M     0  5.0M   0% /run/lock
> tmpfs                                  126G     0  126G   0% /sys/fs/cgroup
> ops.clemson.cloudlab.us:/proj/xos-PG0  100G  851M  100G   1% /proj/xos-PG0
> ops.clemson.cloudlab.us:/share          97G     0   90G   0% /share
> tmpfs                                   26G     0   26G   0% /run/user/0
> tmpfs                                   26G     0   26G   0% /run/user/20001

Did you wait to check space usage until you'd gotten the second email
from the experiment saying it had finished setting up? I would have
thought usage after setup would be more like 4.5GB, maybe over 5GB. The
base image for the controller node probably uncompresses to at least
2-3GB when it's loaded on disk (the other images are slightly different
due to installed packages). Then we download cloud images for the guest
VMs to run (and a few other things -- consumes maybe half a GB or so)
into /root/setup, and then import them into glance. Then there's setup
stuff in /root/setup (where the setup scripts store their state and tmp
files), and openstack runtime stuff in /var/lib/glance.

Anyway, you should get up to 5.5GB pretty fast after instantiation, I
think. The setup scripts configure many of the openstack services with
verbose or debug mode logging, so if you're pounding on the services,
those log directories in /var/log are going to be filling up fast. One
thing I will do for you is make the verbose and debug logging modes
optional; this may help. But you'll have to start using the "OpenStack"
profile instead of "johnsond-osp" -- it's the same thing, but more
featureful, tested, and complete. "johnsond-osp" was really just a
temporary stand-in, and it won't work or will disappear sometime later
on. I'll let you know when I have this done. You could also get rid of
the large image files in /root/setup ; those aren't necessary anymore
after they've been imported into glance.
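As a concrete sketch of that cleanup (the file names and extensions under /root/setup are assumptions; list the directory first to see what is actually there):

```shell
# See what the setup scripts left behind, and how big each item is:
sudo du -sh /root/setup/* | sort -rh
# After confirming with `glance image-list` that the images were
# imported, remove the local copies interactively:
sudo rm -i /root/setup/*.img    # the .img extension is an assumption
```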

I can't really comment on your docker container's size or any of the XOS
software's size... you'll have to see what's eating up disk using the
'du' command yourself.
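For instance, a quick sketch of that hunt (the paths here are the usual suspects, not anything specific to this profile):

```shell
# Rank top-level directories under /var by size, largest first;
# -x stays on the root filesystem, so NFS mounts are not counted.
sudo du -xsh /var/* 2>/dev/null | sort -rh | head
# Then drill into whichever one dominates, e.g. the service logs:
sudo du -xsh /var/log/* 2>/dev/null | sort -rh | head
```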

> However after few days of usage, the /dev/sda is completely filled up
> and any operation on this node throws up an error: "*write failed:
> /tmp/sortko4fc2: No space left on device*". I have not downloaded any
> large files onto this machine other launching XOS docker container in
> addition to open stack services.

> When I created a cluster using the same profile on "CloudLab Utah", the
> disk allocation on the controller node is much bigger as shown below
> (The /dev/sda1 is getting assigned with 110G), However I could not use
> this Utah site because of not able to consistently bring up my
> experiment using this "johnsond-osp" profile there (I have seen many
> times my experiment booting process fails)
>
> svavilap@ctl:~$ df -h
> Filesystem                          Size  Used Avail Use% Mounted on
> udev                                 32G     0   32G   0% /dev
> tmpfs                               6.3G   89M  6.3G   2% /run
> /dev/sda1                           110G  2.7G  102G   3% /
> tmpfs                                32G     0   32G   0% /dev/shm
> tmpfs                               5.0M     0  5.0M   0% /run/lock
> tmpfs                                32G     0   32G   0% /sys/fs/cgroup
> ops.utah.cloudlab.us:/share          97G  1.3G   88G   2% /share
> ops.utah.cloudlab.us:/proj/xos-PG0  100G  624K  100G   1% /proj/xos-PG0
> tmpfs                               6.3G     0  6.3G   0% /run/user/0
> tmpfs                               6.3G     0  6.3G   0% /run/user/20001
>
>
> Until this problem is root caused, would it be possible to increase the
> primary disk space on the controller and compute nodes at "Cloudlab
> Clemson" while using this "johnsond-osp" profile?

Sorry, it's non-trivial to expand the root partitions on the existing
disk images that the profile uses, for a lot of reasons. The profile's
disk images are based on our standard disk image, which provides a 16GB
root partition, and allows you, the experimenter, to use the remainder
of the disk as you see fit. We *always* have users who want physical
disk space that is not already partitioned (especially not in the root),
to use for LVM or any kind of storage experiment. Also, these standard
images have to work on a wide variety of machines (small to large disks,
sometimes only a single physical disk), and 16GB basically represents a
(currently generous) guess of how much space a base Linux OS
installation, plus a kernel compilation, would consume in the root
partition. So, our standard partition layout is carefully designed to
satisfy lots of customers :).

My first advice to you is that you try to put all the stuff you install
on the node in an LVM logical volume. The openstack profile sets up an
LVM physical volume on the controller node's disk using unpartitioned
space... for instance, swift uses this space. You could certainly add a
logical volume for your XOS stuff.
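A minimal sketch of what adding such a volume could look like (the volume group name "osvg" and the 50G size below are assumptions, not the profile's actual values; run vgs to see what the profile really created):

```shell
# List the volume groups the profile created (the name varies):
sudo vgs
# Carve out a logical volume for the XOS state; "osvg" and 50G are
# placeholders -- substitute the real VG name and a size that fits:
sudo lvcreate -L 50G -n xos osvg
sudo mkfs.ext4 /dev/osvg/xos
sudo mkdir -p /mnt/xos
sudo mount /dev/osvg/xos /mnt/xos
```

These commands need root and a real volume group with free extents, so treat them as a template rather than something to paste verbatim.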

If you need extra space in the root partition, and it can't be solved by
decreasing openstack service logging, or moving your own stuff into an
LVM, you'll have to get really creative or make your own custom images.

> Srikanth

David

Leigh Stoller

Sep 9, 2015, 6:42:52 AM
to David M. Johnson, Srikanth Vavilapalli, cloudlab-users
> If you need extra space in the root partition, and it can't be solved by
> decreasing openstack service logging, or moving your own stuff into an
> LVM, you'll have to get really creative or make your own custom images.

Can’t the openstack profile use an ephemeral blockstore for temp space?
See https://www.cloudlab.us/dev/stoller/show-profile.php?uuid=bf7b9a9d-0e18-11e5-96c6-38eaa71273fa
for an example of how to do this in a geni-lib script.

Leigh

David M. Johnson

Sep 9, 2015, 2:23:10 PM
to Leigh Stoller, Srikanth Vavilapalli, cloudlab-users
The profile already basically makes its own local blockstore by creating
its own LVM PVs and LVs locally on the nodes that need extra storage,
so LVM is already there for this use case, I think -- Srikanth could
just add another LV and mount it wherever.  I'm not sure the actual
blockstore API buys us anything.  And if the goal is to get more space
in the root partition, that's harder to solve by mounting volumes into
the root anyway...

> Leigh

David

Srikanth Vavilapalli

Sep 11, 2015, 1:09:05 PM
to David M. Johnson, Leigh Stoller, cloudlab-users
Hi David

Thanks for your response. Yes, you are right: it is close to 5.5 GB by the end of the complete installation. I have been monitoring the disk space, and I see a ~2 GB increase in the size of one file, /var/lib/mysql/keystone/token.ibd, over a period of two days. Maybe this is one cause.
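If the keystone token table is what is growing, one common Kilo-era remedy (an assumption on my part, not something confirmed in this thread) is to flush expired tokens periodically, since keystone does not prune them on its own:

```shell
# One-off purge of expired tokens from the keystone database:
sudo keystone-manage token_flush
# Keep the table bounded by running it from cron, e.g. a line like:
#   @hourly root /usr/bin/keystone-manage token_flush
```

This needs a working keystone installation and database access, so it is an operational sketch rather than a tested command sequence.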

I am not familiar with creating and mounting LVM PVs and LVs. I need to read up on what you are suggesting here.
 
Meanwhile, please let me know once the "OpenStack" profile is ready.

Thanks
Srikanth

David M. Johnson

Sep 16, 2015, 3:38:56 PM
to Srikanth Vavilapalli, Leigh Stoller, cloudlab-users
Ok, I have disabled verbose and debug logging modes by default. In the
"Advanced Parameters" section, there are now two new toggle buttons to
enable them again. Thanks for waiting! Again, I did this in the
"OpenStack" profile, so if you haven't switched to that one, please do so.

David