Virtual Machines

Jason

Mar 3, 2011, 5:04:30 AM
to mpir-...@googlegroups.com
Hi

I've come to the conclusion that I will probably never get the time to write
any 32-bit code, and the more it gets delayed the less important it becomes.
At the moment all my machines have two OSes installed, Slackware 32-bit and
Slackware 64-bit, in two completely separate partitions. This is awkward, as I
have to reboot to switch between the two. So I thought I could have Slackware
64-bit as the host OS and Slackware 32-bit in a virtual machine. This is fine
for testing correctness, but how does it affect tuning and timings? I assume
that the assembler instructions are not virtualized, just I/O, and so tuning
and timings are not affected. On my main machine I also have Vista 32-bit and
64-bit in separate partitions, and if these could be virtualized under Linux
as well it would make my life easier. Again, testing correctness would be OK,
but how about tuning and timings?

Does anyone here have any experience with this, or any recommendations?

Thanks
Jason

Jeff Gilchrist

Mar 4, 2011, 5:56:08 AM
to mpir-...@googlegroups.com
On Thu, Mar 3, 2011 at 5:04 AM, Jason <ja...@njkfrudils.plus.com> wrote:

> OK , but how about tuning and timings?
> Does anyone here have any experience in this ? , or any recommendations

I don't think I have ever actually tested tuning or timings in virtual
machines. Since you still have your 32-bit "native" install, I would
suggest doing some tuning and benchmarking there and recording the
timings. Then set up a 32-bit VM in your 64-bit Linux environment and
re-run the tests on the same machine to see if there is a drastic
change.

Jeff.

Jason

Mar 17, 2011, 2:45:18 PM
to mpir-...@googlegroups.com

I tried VirtualBox on my trusty old K8, which has NO specific virtualization
enhancements. The host system is 64-bit Linux, the guest is 32-bit Linux, and
I did 10 tries.

For a native 32-bit Linux the mpn cycle count benchmark is the attached "real",
and for the virtual Linux we get "virtual".
You can see the real system has a very consistent set of values, whereas the
virtual system has timings which vary a lot, sometimes even coming out
"faster", which means we can't even take the smallest value :(
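
A minimal sketch of how the run-to-run spread described above could be
quantified. The cycle counts below are made up for illustration and are NOT
taken from the "real" and "virtual" attachments; the function names and the 2%
tolerance are assumptions too.

```python
from statistics import median


def spread_report(cycles):
    """Summarise repeated cycle counts for one benchmark entry."""
    lo, hi, mid = min(cycles), max(cycles), median(cycles)
    # Relative spread: how wide the range of runs is compared to the median.
    return {"min": lo, "median": mid, "max": hi, "rel_spread": (hi - lo) / mid}


def is_stable(cycles, tolerance=0.02):
    """True when all runs sit within `tolerance` (an arbitrary 2%) of each other."""
    return spread_report(cycles)["rel_spread"] <= tolerance


# Hypothetical data: ten runs of the same benchmark on each system.
native = [1000, 1002, 1001, 1000, 1003, 1001, 1000, 1002, 1001, 1000]
virtual = [1000, 1310, 940, 1205, 1001, 1490, 975, 1102, 1260, 1033]

if __name__ == "__main__":
    for name, runs in (("native", native), ("virtual", virtual)):
        print(name, spread_report(runs), "stable" if is_stable(runs) else "noisy")
```

When the spread is this wide, and some virtual runs come in under the native
minimum, neither the minimum nor the median is a trustworthy figure to tune
against.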

As for which params make tune gives us, for the real system see "tune_real"
and for the virtual Linux see "tune_virtual".
You can see again that the real system is fairly consistent (the ones that
aren't either have very similar slopes at the crossover or our tuning is
slightly wrong), and the virtual system's tuning is not very useful.

I don't plan to write any 32-bit code, so I don't care if I can't get reliable
timings. The only things my K8 is going to do are testing for 32/64-bit, and
timings and tuning for 64-bit on its native 64-bit OS. I suppose I could
install a native Linux distro which has both 32 and 64-bit in the FULL
toolchain (which ones have this?). I think I may try a virtual Solaris on the
K8 for testing purposes (I can't test the BSDs on it, as they require the
virtualization extensions).

The K8 is not "made" anymore, and the more modern CPUs have virtualization
extensions (so I can test the BSDs as well), so I'll give the Nehalem a go and
see if this makes the timings better.

I also want to change the way we do our fake CPU testing. At the moment the
fake CPU testing is exactly like the proper test except that
1) CPU detection is overridden
2) it doesn't work for fat builds
3) it doesn't test for instruction extensions from asm or compiler
4) a subtle autotools bug could hide other differences, as we override the
build mechanism of autotools

We can't trap on the cpuid instruction, but we can replace it with a macro
that simulates it. This would leave item 3 above as the only difference from
a real chip.
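
To illustrate the idea of swapping the cpuid query for a simulated one while
the detection logic stays untouched, here is a rough sketch. It is written in
Python only to show the shape of the technique; in MPIR the replacement would
be a C/assembler macro, and this is not MPIR's detection code. The names
make_fake_cpuid and detect are hypothetical, and the two signature values are
just example leaf-1 EAX values for a K8-class and a Nehalem-class part.

```python
import struct


def make_fake_cpuid(vendor, signature):
    """Build a stand-in for the cpuid instruction.

    vendor    -- 12-character vendor string, e.g. "AuthenticAMD"
    signature -- value to report in EAX for leaf 1 (family/model/stepping)
    """
    raw = vendor.encode("ascii")
    if len(raw) != 12:
        raise ValueError("vendor string must be exactly 12 characters")
    # Leaf 0 returns the vendor string split across EBX, EDX, ECX (in that order).
    ebx, edx, ecx = struct.unpack("<3I", raw)
    table = {
        0: (1, ebx, ecx, edx),       # EAX = highest supported leaf
        1: (signature, 0, 0, 0),     # feature bits left at zero in this sketch
    }

    def cpuid(leaf):
        return table[leaf]           # (eax, ebx, ecx, edx)

    return cpuid


def detect(cpuid):
    """Detection logic that only ever talks to the `cpuid` callable."""
    _, ebx, ecx, edx = cpuid(0)
    vendor = struct.pack("<3I", ebx, edx, ecx).decode("ascii")
    eax = cpuid(1)[0]
    base_family = (eax >> 8) & 0xF
    family = base_family
    model = (eax >> 4) & 0xF
    if base_family == 0xF:
        family += (eax >> 20) & 0xFF          # add extended family
    if base_family in (0x6, 0xF):
        model |= ((eax >> 16) & 0xF) << 4     # prepend extended model
    return vendor, family, model


if __name__ == "__main__":
    print(detect(make_fake_cpuid("AuthenticAMD", 0x00000F4A)))
    print(detect(make_fake_cpuid("GenuineIntel", 0x000106A5)))
```

Because detect only sees the callable it is given, the same code path runs
for a real and a faked chip. Instruction extensions that the fake chip
"reports" but the real hardware lacks would still fail at run time, which is
the remaining difference noted above.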

Jason

real
virtual
tune_real
tune_virtual

Jason

Mar 21, 2011, 9:01:58 PM
to mpir-...@googlegroups.com

Trying the same exercise on my Nehalem, which does have virtualization
enhancements, we get the native 32-bit Linux benchmark "bench_real_32bit" and
the virtual version "bench_vir_32bit". Virtualizing 64-bit Linux (on a 64-bit
Linux host, just to see what it's like), we get "bench_real_64bit" for the
native run and "bench_vir_64bit" for the virtual one.

I would say it's not as bad as the K8, but it's still pretty useless for good
timings. Perhaps later chips will get better at this, or address the issue in
a different way, as virtualization does seem to be becoming more common.
I have yet to test Windows, but I don't think the results will be any different.

For testing I want a large range of OSes, so I think I'll put Linux64
(Slackware) as the main OS and all others as virtual. These will include
Linux32 (Slackware), plus perhaps Fedora due to its security restrictions (I
don't know if any other distros are worth testing); FreeBSD, NetBSD and OpenBSD
on the machines that have hardware virtualization capabilities (a VirtualBox
restriction); Vista 64 and 32 (with cygwin and mingw*) on my main machine (a
license restriction); and Solaris 64 and 32 on all of them. Anything else
free? HPUX? AIX? Darwin? (DRM problems.) Also alternate compilers, e.g. icc
and others, although I don't know how they will interact with the system
compiler, or even how different GCCs interact. With virtual machines I could
isolate each one, and there is a good chance it may even be easier to do it
that way, because we need to build "out of the box", as people who have
multiple setups will be used to specifying paths etc.

Along with the new fake CPU code this should give me a nice range to test
against. The QEMU project looks useful, and I think I may be able to run a
virtual machine of an ARM CPU (for example) on my x64 CPU. I'll have to look
into this, as it could really expand our testing base.

Clearly you can only do timings and tunings for the CPU you are running on.
Considering the different ABIs for x86/64: all the Unix ABIs for 32 and
64-bit are the same, so we only have to consider Linux and Windows 32/64-bit.
Linux 32-bit and Windows 32-bit have the same ABI, and anyway I don't have the
time for 32-bit, so we are left with Linux 64-bit and Windows 64-bit.

For the assembly language loops and speed.exe I can easily fake one ABI with
the other, and this is necessary for development, as currently the function
call is done indirectly, whereas in any real code it is direct (the reason for
the "macro hell" of speed.h). So all low-level code for Windows 64 should be
optimized as if it were on a native system.
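
As a rough illustration of what faking one 64-bit ABI with the other involves
for integer arguments: the System V AMD64 ABI passes the first integer
arguments in rdi, rsi, rdx, rcx, r8, r9, while the Windows x64 convention uses
rcx, rdx, r8, r9. The sketch below works out the register moves a small shim
would need, in an order that never overwrites a value before it is read. The
name shim_moves is hypothetical and this is not how speed.h does it; stack
arguments, shadow space and callee-saved registers (rsi, rdi and xmm6-xmm15
are callee-saved on Windows x64) are deliberately not modelled.

```python
# Integer argument registers, in argument order, for the two 64-bit ABIs.
SYSV_INT_ARGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
WIN64_INT_ARGS = ["rcx", "rdx", "r8", "r9"]


def shim_moves(caller_regs, callee_regs, nargs):
    """Return (dst, src) register moves that re-home `nargs` integer arguments.

    The moves are ordered so that no register is overwritten while another
    pending move still needs to read it.
    """
    if nargs > min(len(caller_regs), len(callee_regs)):
        raise ValueError("stack-passed arguments are not modelled in this sketch")
    pending = [
        (callee_regs[i], caller_regs[i])
        for i in range(nargs)
        if callee_regs[i] != caller_regs[i]
    ]
    ordered = []
    while pending:
        sources = {src for _, src in pending}
        # A move is safe once its destination is no longer needed as a source.
        ready = [move for move in pending if move[0] not in sources]
        if not ready:
            raise ValueError("cyclic moves would need a scratch register")
        ordered.extend(ready)
        pending = [move for move in pending if move not in ready]
    return ordered


if __name__ == "__main__":
    # A four-argument function such as mpn_add_n(rp, up, vp, n):
    for dst, src in shim_moves(WIN64_INT_ARGS, SYSV_INT_ARGS, 4):
        print(f"mov {dst}, {src}")
```

For a four-argument call made with the Windows convention into code written
for the Linux one, this prints four mov instructions, so the shim is cheap but
not free, which matters when timing short assembly loops.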

Of course, with all these virtual machines floating around, I'll have to get a
new server, Sandy Bridge :)

Jason

bench_real_32bit
bench_vir_32bit
bench_real_64bit
bench_vir_64bit