How to replicate the experiment


Albert Vonpupp

Sep 12, 2013, 1:46:56 PM
to opensta...@googlegroups.com
Hello,

I would like to test OpenStack Neat, but I'm not sure how or where to start.

Is there any guide (or script) on the setup procedure?

Thanks a lot.

Anton Beloglazov

Sep 12, 2013, 10:00:28 PM
to opensta...@googlegroups.com
Hi Albert, 

Thanks for your interest in the project! There is no clear documentation or user guide yet (unfortunately, I haven't had time to write it), but you can find a substantial description of the framework in my thesis: http://beloglazov.info/thesis.pdf (Chapter 6). Section 6.3.10 lists the basic steps required to deploy the system. The following repository contains the scripts I used for running experiments and processing results: https://github.com/beloglazov/spe-2013-experiments

Basically, I did the following to run an experiment:

1. ./vms-boot-28-slow.sh # boot the VMs
2. ./disable-distributor.sh # temporarily disable workload distribution (see the note after this list)
3. ./workload-distributor.py # then check the log file (workload-distributor.log) to see whether all VMs are ready and sending requests for the workload
4. ./enable-distributor.sh # enable the workload distributor
5. ./workload-distributor.py # distribute the workload to the VMs (send a unique workload trace file to each VM)
6. Once the experiment is completed (24 hours), scp the log files from the compute and controller nodes and use the scripts from the results directory to compute the result statistics
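
For context, disabling/enabling the distribution simply comes down to writing 0 or 1 to the workload-distributor.conf file, which the workload distributor checks before sending out workload commands. Roughly the equivalent of the following sketch (not the actual scripts, and the path to the conf file may differ in your setup):

# Sketch only: 0 disables workload distribution, 1 enables it.
def set_distribution(enabled, conf_path='workload-distributor.conf'):
    with open(conf_path, 'w') as f:
        f.write('1' if enabled else '0')

set_distribution(False)  # what ./disable-distributor.sh amounts to
set_distribution(True)   # what ./enable-distributor.sh amounts to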

Sorry for not having a more detailed description, I hope I will be able to write one at some point.

Best regards,
Anton

Albert Vonpupp

Sep 15, 2013, 8:33:39 PM
to opensta...@googlegroups.com
Great, Anton!

Many thanks! I'm a newbie to OpenStack as well =/

So far it seems to be running with DevStack on Fedora 17. Can I still replicate your experiments under those conditions as you explained, or do I need a "production"-like installation?

I will be researching energy efficiency too during my MS in CS. I've read some of your work; very impressive, congratulations Anton!

Thanks again for your help.

Regards,
Albert.

Anton Beloglazov

Sep 15, 2013, 8:37:34 PM
to opensta...@googlegroups.com
Hi Albert,

Thanks! Having a "production" environment for replicating the experiments is not required. However, I haven't tested OpenStack Neat under Fedora. In general, it should work, but there may be minor deployment issues.

Good luck with your research!

Best regards,
Anton



Albert Vonpupp

Sep 15, 2013, 9:06:37 PM
to opensta...@googlegroups.com
Many thanks Anton,

Good luck to you too!

Regards.
Albert.

Albert Vonpupp

Sep 27, 2013, 9:33:27 AM
to opensta...@googlegroups.com
Hello Anton,

Sorry to bother you again. I switched back to this thread since I believe my questions are more related to it.

I started Neat last night and it has been running for about 9 hours. OpenStack has been running without any VMs, so the three nodes are idle. Here are the logs:

[root@marte ~]# cat /var/log/neat/db-cleaner-service.log
(empty)

[root@marte ~]# cat /var/log/neat/db-cleaner.log
2013-09-26 22:06:42,096 INFO     neat.globals.db_cleaner Starting the database cleaner, iterations every 7200 seconds
2013-09-26 22:06:44,546 DEBUG    neat.db Instantiated a Database object
2013-09-26 22:06:44,546 DEBUG    neat.db_utils Initialized a DB connection to mysql://root:badpa...@marte.eclipse.ime.usp.br/neat
2013-09-26 22:06:44,547 INFO     neat.globals.db_cleaner Cleaned up data older than 2013-09-26 20:06:44
2013-09-27 00:06:44,655 INFO     neat.globals.db_cleaner Cleaned up data older than 2013-09-26 22:06:44
2013-09-27 02:06:44,713 INFO     neat.globals.db_cleaner Cleaned up data older than 2013-09-27 00:06:44
2013-09-27 04:06:44,748 INFO     neat.globals.db_cleaner Cleaned up data older than 2013-09-27 02:06:44
2013-09-27 06:06:44,844 INFO     neat.globals.db_cleaner Cleaned up data older than 2013-09-27 04:06:44
2013-09-27 08:06:44,947 INFO     neat.globals.db_cleaner Cleaned up data older than 2013-09-27 06:06:44

[root@marte ~]# cat /var/log/neat/global-manager-service.log
Bottle v0.11.6 server starting up (using WSGIRefServer())...
Listening on http://marte.eclipse.ime.usp.br:60080/
Hit Ctrl-C to quit.

[root@marte ~]# cat /var/log/neat/global-manager.log
2013-09-26 22:06:44,726 DEBUG    neat.db Instantiated a Database object
2013-09-26 22:06:44,726 DEBUG    neat.db_utils Initialized a DB connection to mysql://root:badpa...@marte.eclipse.ime.usp.br/neat
2013-09-26 22:06:45,108 DEBUG    neat.globals.manager Calling: ether-wake -i em1 00:1c:c0:c3:f3:1f
2013-09-26 22:06:45,141 DEBUG    neat.globals.manager Calling: ether-wake -i em1 00:27:0e:23:06:e9
2013-09-26 22:06:45,157 DEBUG    neat.globals.manager Calling: ether-wake -i em1 70:71:bc:08:55:eb
2013-09-26 22:06:45,167 INFO     neat.globals.manager Switched on hosts: ['jupiter', 'saturno', 'venus']
2013-09-26 22:06:45,329 INFO     neat.globals.manager Starting the global manager listening to marte.eclipse.ime.usp.br:60080

I'm working remotely from the lab today, so I cannot physically check whether the compute nodes were suspended, but it seems they weren't (they respond to ssh very quickly). Here is the log of the local manager on one of the compute nodes:

[root@jupiter ~]# cat /var/log/neat/local-manager.log
2013-09-27 00:02:01,924 INFO     neat.locals.manager Started an iteration
2013-09-27 00:02:01,925 INFO     neat.locals.manager The host is idle
2013-09-27 00:02:01,925 INFO     neat.locals.manager Skipped an iteration
2013-09-27 00:07:02,020 INFO     neat.locals.manager Started an iteration
...
2013-09-27 09:07:09,477 INFO     neat.locals.manager Skipped an iteration
2013-09-27 09:12:09,547 INFO     neat.locals.manager Started an iteration
2013-09-27 09:12:09,547 INFO     neat.locals.manager The host is idle
2013-09-27 09:12:09,547 INFO     neat.locals.manager Skipped an iteration

Question 1: Shouldn't the global-manager.log show pm-suspend events, something like "neat.globals.manager Calling: pm-suspend"?

I also tried to reproduce the experiments with the following procedure: https://github.com/mscs-usp/2013-mac5910-neat-experiments/blob/master/400-start-experiments.sh

On the third step (cd /opt/stack/spe-2013-experiments && python workload-distributor.py full-utilization-02) I'm using two traces with 99% load, as you did. When I execute it, I get this:
Bottle v0.11.6 server starting up (using WSGIRefServer())...
Listening on http://marte:8081/
Hit Ctrl-C to quit.

Question 2: I'm wondering how the system is going to send that load to the VMs. My key is called "test", like yours, but what should the ssh key be called? Does it matter?

I noticed that in workload-distributor.py there are hardcoded paths and files, like cpu-load-generator.py (https://github.com/beloglazov/spe-2013-experiments/blob/master/workload-distributor.py#L40-L41). I changed the paths accordingly in my repo (https://github.com/mscs-usp/spe-2013-experiments/blob/master/workload-distributor.py#L40-L41).

Also, I guess lookbusy should be installed on the VMs.

Question 3: Is the installation of lookbusy on the VMs done automatically by your scripts, or should I do it manually? How will the controller send the load to the VMs: via Bottle's API or via ssh?

I merged your comments: "The workload-starter.py script should be deployed, configured with the server IP, and automatically started in the VM image on boot. This way you can create a single image, and then create multiple instances which will automatically request the server for the workload. Then, with the workload-distributor.py script you can distribute workload traces to all the VMs at the same time." into step 3.1 of this script (https://github.com/mscs-usp/2013-mac5910-neat-experiments/blob/master/400-start-experiments.sh#L16). I'm using the controller address (I guess the lookbusy server you are referring to is also the controller) and 7 minutes of time (200 traces), but when I try to start it I get the following error: requests.exceptions.MissingSchema: Invalid URL u'143.107.45.200': No schema supplied
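
Looking at the traceback, I suspect requests just needs a full URL including the http:// scheme rather than a bare IP. This is only my guess, but the following small test reproduces the same exception:

import requests

try:
    requests.get('143.107.45.200')  # bare IP: raises requests.exceptions.MissingSchema
except requests.exceptions.MissingSchema as e:
    print(e)  # Invalid URL u'143.107.45.200': No schema supplied

# With the scheme (and, I assume, the distributor's port) the URL is at least well formed;
# this will only succeed if the distributor is actually listening there:
requests.get('http://143.107.45.200:8081/')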

In general, I could start the workload distributor in step 3 (https://github.com/mscs-usp/2013-mac5910-neat-experiments/blob/master/400-start-experiments.sh#L12), but I cannot see whether all VMs are ready and sending requests for the workload, as you said; I think it might be an issue related to Questions 2 and 3.

I don't understand why I must enable the distributor twice, as described in your procedure. Is this correct?

I hope my message is not too long and confusing.

Thanks a lot Anton!

Best regards,
Albert.

Anton Beloglazov

Sep 30, 2013, 4:05:21 AM
to opensta...@googlegroups.com
Hi Albert,

1. I think you are right, and this problem was caused by the exceptions that you pointed out in the other thread.

2-3. Sorry for hard-coding the paths; I just didn't have time to do it properly. Here is a rough procedure for setting up workload generation:

1. The full set of workload traces should be uploaded to the VM image beforehand, and also made available for the workload distributor on the controller. This is basically done by cloning https://github.com/beloglazov/spe-2013-experiments on the VM image and the controller.
2. The https://github.com/beloglazov/cpu-load-generator repository needs to be cloned on the VM image - this is the script that performs the actual CPU load generation. This script uses the lookbusy tool, which can be installed using the install-lookbusy.sh script.
3. workload-distributor.conf is the configuration file of the workload distributor. The workload distribution is enabled if the file contains 1, and disabled if the file contains 0.
4. The workload-starter.py script should be set to start up automatically when a VM boots, and it should be passed the IP address of the controller running the workload distributor (there is a rough sketch of how this works at the end of this message).
5. Initially you need to disable the workload distribution by putting 0 in the workload-distributor.conf file on the controller. This is necessary to wait for all the VMs to boot up and then start distributing the workload to all of them at the same time.
6. While the VMs are booting, you can start workload-distributor.py on the controller to see which VMs are ready and sending requests for workload. When workload-distributor.conf contains 0, the workload distributor will receive and log all incoming requests to the workload-distributor.log file, but won't actually send the command for starting the workload generation. You can open that file or just do something like grep ... | sort -u to get the list of VMs that are already sending requests.
7. When the workload-distributor.log file shows that all the VMs are ready and sending requests, the workload distribution can be enabled. This is done by writing 1 to the workload-distributor.conf file and starting the workload-distributor.py script. The script will send a unique command to each VM of the following form:

python2 /home/ubuntu/cpu-load-generator/cpu-load-generator.py -n 1 300 /home/ubuntu/spe-2012-experiments/full-utilization-02/01

where /home/ubuntu/spe-2012-experiments/full-utilization-02/01 is the path to the workload trace file, which should be available on the VM (make sure that the path exists). If everything is done correctly and all the paths exist, workload generation should start on all the VMs at this point. Once the workload is being generated, the Neat services can be started.
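
To give you a rough idea of the VM side: workload-starter.py essentially keeps asking the workload distributor for a command and runs whatever it receives once the distribution is enabled. Very roughly, it behaves like the following sketch (this is not the real script; the URL format and the way the VM identifies itself are just assumptions for illustration):

import socket
import subprocess
import time

import requests

# Assumption: the controller address is a full URL including the scheme,
# e.g. http://marte:8081/ in your setup.
CONTROLLER_URL = 'http://marte:8081/'

while True:
    # Ask the workload distributor for a command; identifying the VM by its
    # hostname is an assumption about the request format.
    response = requests.get(CONTROLLER_URL, params={'host': socket.gethostname()})
    command = response.text.strip()
    if command:
        # Distribution is enabled: run the received command, e.g.
        # python2 .../cpu-load-generator/cpu-load-generator.py -n 1 300 <trace file>
        subprocess.call(command, shell=True)
        break
    # Distribution is still disabled (conf file contains 0): the request is only
    # logged by the distributor, so wait and try again.
    time.sleep(10)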

I'm sorry, all of this is a bit messy and not easy to set up, but at that time I just needed to have it running and didn't have time to make it easy to use.

Please let me know if you have any other questions.

Best regards,
Anton