Here's my perspective, grounded in solid test practices and
direct observations of how, why and where Virtualization and Performance test
tools collide, and continue to collide, in organizations:
(1) The goals that drive the use of load generators
on virtualized instances run counter to the goals of high integrity
tests. The goal of virtualization is to maximize the use of
available hardware by placing as many virtual instances of operating systems on
the hardware as possible. With performance testing, your goal is
to maximize the impact on your application and minimize the influence your test
bed has on the performance of the virtual users. You want maximal
performance out of these virtual users and this OS instance.
(2) I have yet to meet a VMware, Xen, Microsoft
Virtual Server or KVM environment where load generators were not located
alongside other non-test VM instances. This sets up several problems in
basic testing:
- Constantly changing initial conditions from test to test.
Each virtual machine's state is uncontrolled and running independently.
In software virtualized environments the hypervisor is constantly in control
of how resources are parceled out to the VMs based upon requests to access the
hardware. As the broker, the hypervisor will make decisions
to rob resources from one instance to give to another instance on the same VM
host.
- Uncontrolled influences while the test is ongoing. The
demands of the other VMs constantly shift the mix of resources available
to your VM while the test is running. This is before you add the
uncontrolled demands of the shared network interfaces, etc.
- Repeatability. The constantly shifting sands of initial
conditions and uncontrolled influences while the test is ongoing make it
almost impossible to meet the absolute demand that your test be repeatable
with substantially similar results. At the end of your testing, any
third party should be able to take your application definition, your test
definition, your data and your load test environment setup and reproduce your
results. Throw virtualization into the mix and you can pretty much
toss repeatability out the window.
(3) VM clock drift. The system clock is also
virtualized and occasionally has to resync with the physical system
clock. This system clock is tied directly to your virtual user timing,
and the amount of drift is unpredictable and ungovernable.
I refuse to accept a situation where known clock drift can occur in
the middle of a timing record (the effect is sketched below).
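As an illustration only, here is a minimal sketch, in Python rather than
anything shipped with a load tool, of how you might watch for this on a
generator. It assumes the process monotonic counter is a usable reference,
which a paravirtualized clock does not guarantee. Step changes in the
reported offset are the resync events described above, landing exactly
where a timing record would be.

    # Minimal drift watcher (illustrative sketch, standard library only).
    # A hypervisor resync of the virtualized wall clock shows up as a
    # step change in the wall-clock vs. monotonic-counter offset.
    import time

    SAMPLES = 60            # roughly one minute of observation
    INTERVAL_SECONDS = 1.0

    baseline = time.time() - time.perf_counter()
    for _ in range(SAMPLES):
        time.sleep(INTERVAL_SECONDS)
        offset = time.time() - time.perf_counter()
        drift_ms = (offset - baseline) * 1000.0
        # Any jump printed here would land inside a timing record.
        print(f"wall clock vs. monotonic reference: {drift_ms:+.3f} ms")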
(4) Try to monitor your load generators.
You are at the mercy of whatever your hypervisor tells you is in
use in the way of resources. I have had VMs
from multiple vendors report greater than 100% collective CPU
utilization, and the clock drift makes the "per second" stats highly
suspect (see the sketch below). Monitoring the host OS for the virtual
machines adds its own set of complexities associated with trying to decouple the
influences of your VM from the others.
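To make that concrete, a small sketch follows. It assumes the third-party
psutil package (not part of any load testing tool) and simply samples the
guest-reported CPU figure once per second. Everything it prints has already
been filtered through the hypervisor, which is exactly the problem.

    # Guest-side CPU sampler (illustrative sketch; requires psutil,
    # e.g. "pip install psutil"). Flags implausible readings of the
    # kind described above. The guest can only report what the
    # hypervisor lets it see.
    import psutil

    for _ in range(60):
        pct = psutil.cpu_percent(interval=1.0)  # blocks about 1 second
        flag = "  <-- suspect" if pct > 100.0 else ""
        print(f"guest-reported CPU: {pct:5.1f}%{flag}")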
(5) There is no fundamental benefit to placing more
than one Load Generator instance on the same physical host. Yes, I
have also observed this a lot in VM environments. All you are doing
is adding overhead.
I am not stating that virtualization is universally
bad. Heck, I use it every day with certain components of LoadRunner
such as VuGen, where I can set up clean environments for specific protocols for
recording and development. It even works out very well with the
analysis tool. But for anything that has a time basis that you are
going to make a decision upon...run, run, run very fast and very
far. The reasons to deploy virtualization in production
are many and sound, but the needs of basic testing principles, consistency in
documented initial conditions, and consistency and repeatability of the test trump
the reasons to fit into the standard VM model for an
organization. Performance test tools are primarily a risk
management tool from a business perspective. Tossing virtualization
into the mix adds risk back into what should be a risk averse set of
activities.
You're right, it's deployed quite widely in this fashion,
but mass does not imply a best practice. There are probably dozens
of behaviors organizations engage in every day from a performance test
perspective that dramatically impact their test results, and these practices are
engaged in far more often than the best practices associated with performance
testing. Frequency of use does not imply correctness in
action.
In the end, if you are compelled to use virtualized
generators, then as testers we are compelled to document the deviations from
testing best practices: documented and consistent initial conditions,
steps to reproduce and expected results, the absence or inclusion of a working
hypothesis, the inclusion of control factors in the test, and repeatability of
the results. It is this documentation of the known
deviations from best practices, distributed with your test results, which becomes
the aluminum baseball bat for the developer to go "Chicago" on
you. And they are not going to go after the CFO who mandated
the use of virtualization; they are going to go after the Performance Test team
for showing up with less than high integrity results.
As to the reasons for the bank of seven hosts:
1 host for the controller.
1 host for a "control" generator to execute a single
virtual user of each type during a performance test. You will be
able to observe load generator induced influence in
your test execution by comparing the behavior of the users on
your control generator against the running average of the remainder of your
population (a sketch of this comparison follows the list below). If
application influence is king, then both your control
group and your global group will decay in performance at the same
rate. Where you have a load generator influenced response time,
you will observe a decay in the response time for your global group, but your
control group will either be unaffected or perhaps get a little faster as
influences from the global group press less aggressively against the
app.
1 hot spare/parts cannibal. It is still true
that one machine in seven will fail within the next 18 months. And
yes, it will likely happen in the midst of a short turnaround testing
effort. By having matched hardware you can use this machine
as a parts cannibal, a hot host to press into a particular duty, or a hot host
to rebalance load to if you observe that your global and control groups are
becoming mismatched in performance.
4 primary load generators: 4GB of memory,
Windows XP SP3 32 bit, 250GB of HD space, a co-processed video card, and ideally
intelligent disk controllers and network cards with TCP Offload Engines
(TOE). Remember, this is a test bed. You want your
hardware to be boring and work all of the time. LoadRunner is still
a 32 bit app for load generation, and your application under test cannot tell
across the network whether the client connecting to it is a 16 bit, 32 bit,
64 bit or bit-sliced microprocessor. Don't
use Windows Server. There are no dependencies in LoadRunner which
mandate the use of Server, and all it does is steal resources that can be better
used for virtual users. Windows XP is still the most promiscuous of
generators, supporting all protocols.
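Coming back to the control generator above: the comparison it enables can be
reduced to a trend-line check. The sketch below is hypothetical, in Python,
with invented sample data and invented cutoffs; in practice the two series
would come from your tool's raw results, one for the control users and one
averaged across the remaining generators.

    # Control-vs-global trend comparison (illustrative sketch).
    # Data and thresholds are invented; substitute your own raw results.
    def slope(values):
        """Least-squares slope of a series against its sample index."""
        n = len(values)
        xbar = (n - 1) / 2.0
        ybar = sum(values) / n
        num = sum((i - xbar) * (v - ybar) for i, v in enumerate(values))
        den = sum((i - xbar) ** 2 for i in range(n))
        return num / den

    control_rt = [1.01, 1.02, 1.00, 1.03, 1.02, 1.01]  # control generator, seconds
    global_rt = [1.02, 1.10, 1.21, 1.33, 1.46, 1.60]   # mean of remaining generators

    if slope(global_rt) > 0.01 and slope(control_rt) < 0.005:  # invented cutoffs
        print("Global group decays while control holds steady:")
        print("suspect load generator influence, not the application.")
    else:
        print("Control and global decay together: the application dominates.")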
I will open up the offer. If you have a CIO and
CFO that mandate virtualization, I will be happy to get on the phone with anyone
here and their end client organization to discuss the tradeoffs involved in this
decision. I will do this at no cost, as this is a bad testing
practice and should be discouraged and stopped in our profession.
You may contact me at mailto:LRVirtua...@jamespulley.com to request
this service. I have all sorts of disaster stories to share:
prolonged/delayed testing, errors assigned to the app, budget dollars spent
chasing down an engineering ghost when in fact the influence of the load test
bed caused the issue, and high load tests impacting production instances on
other VMs located on the same VM hosts. The end goal of management with
virtualization is to save money, and I can point to all sorts of examples of
this practice in testing where that goal was turned on its
head.
James Pulley