Performance differences between physical and virtual hardware

For the first set of posts on debugging, I analyzed one cause of high system time during compilation of the Linux kernel.

For this next set of posts, I’m going to dig into a funny observation I made while doing the system time work: My physical machine appears to be slower than my VM for doing the compilation workload.

My physical machine is a brand-new machine based on a quad-core Intel i5 processor. My VM is running on a 3 1/2 year old iMac, also based on a quad-core i5 processor. The VM is allocated only 2 processors, so on the face of it, the physical machine should be faster.

Let’s dive in and see if we can understand what might be going on.

Machine setup

Physical:

  • Late 2014 Dell Optiplex 7010 mini-tower
  • Quad-core Intel i5 processor, with a nominal speed of 3.2GHz
    • SKU is an Ivy Bridge Core i5-3470
    • Model details from Intel here.
    • L1 cache size : 256KB (total across all four cores)
    • L2 cache size : 1MB (256KB per core)
    • L3 cache size : 6MB (shared)
  • 4GB system RAM
    • 3.8GB available after integrated graphics RAM
  • 7,200 RPM Western Digital 250GB drive, communicating over a 3Gbps SATA link
  • Ubuntu Linux 12.04 LTS

Virtual:

  • Mid 2011 27″ iMac
  • Quad-core Intel i5 processor, 2.7GHz nominal speed, single socket
    • According to this site, the precise SKU is a Sandy Bridge i5-2500S
    • You can get the definitive part information from Intel here.
    • L2 cache size (per core): 256KB
    • L3 cache size (per socket) : 6MB
  • 12 GB system RAM (4 banks of DDR3 at 1333MHz)
  • 7,200 RPM Western Digital 1TB drive, communicating over a 3Gbps SATA link
    • This is the only drive in the system
  • Mac OS X Yosemite (v10.10.2)
  • Hypervisor : VirtualBox 4.3.20

The virtual operating system is configured as follows (a rough VBoxManage equivalent is sketched after the list):

  • Red Hat Fedora 21.5, 64-bit
  • 4GB of RAM
  • 2 virtual processors
  • 80GB disk image maximum size (set to expand as necessary)
  • 64MB display RAM, 3D acceleration turned on
  • Virtualization extensions enabled
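
For reference, settings like these map directly onto VirtualBox’s VBoxManage command-line tool. The sketch below shows roughly equivalent flags; the VM name is a placeholder, and this is not a record of how the guest was actually configured.

VBoxManage modifyvm "Fedora21" --cpus 2 --memory 4096 --vram 64
VBoxManage modifyvm "Fedora21" --accelerate3d on --hwvirtex on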

Doing a clean comparison between a virtualized guest instance and physical hardware is not that easy, since there are a lot of major differences, but on the surface the physical Linux system is not too different from the virtualized one.

Given that VirtualBox is a type 2 hypervisor (and it doesn’t have a reputation for tremendous performance), I certainly wouldn’t expect it to outperform physical hardware. Still, as the data below shows, for this particular workload, it does.

Benchmarking methodology:

One of the cardinal rules of benchmarking is to always do the same measurement multiple times. Amongst other things, we need to take into account:

  • Cache warming effects at many levels in the system
  • Other housekeeping workloads running during the lifespan of the task
  • Unknown behavior caused by data arriving over the network interface
  • Random variations of unknown origin

Of course, the more isolated and predictable the benchmarking environment, the better. That’s not terribly easy with a type 2 hypervisor, because the host operating system could be doing all sorts of things that the guest has no visibility into.

To account for these issues, I run the compile job back-to-back three times. To address some of the caching issues, my controller script has the option to clear the Linux page cache between iterations, or to clear it only once at the beginning.
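
For illustration, here is a minimal sketch of what that controller loop looks like. The script itself isn’t reproduced here, so the variable names, the use of make clean, and the -j2 build flag are my assumptions; it needs to run as root so that the cache-clearing write works.

#!/bin/bash
# Minimal sketch of the benchmarking controller (names and flags are assumptions).

drop_caches() {
    sync                                 # write dirty pages out first
    echo 3 > /proc/sys/vm/drop_caches    # drop page cache, dentries and inodes
}

DROP_EACH=${DROP_EACH:-0}                # set DROP_EACH=1 to clear before every run
drop_caches                              # the first run always starts from a cold cache

for i in 1 2 3; do
    if [ "$i" -gt 1 ] && [ "$DROP_EACH" = "1" ]; then
        drop_caches
    fi
    make clean > /dev/null
    /usr/bin/time -f "run $i: %e seconds elapsed" make -j2 > /dev/null
done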

Three times isn’t really enough if I’m going to be formal about it. I’d need more samples in order to ensure that the results aren’t just coincidence. However, I’m not a statistician, so I’m not going to talk about something I don’t really know about.

Unscientific though this methodology may be, I’m seeing results that appear to be stable, so I think it’s good enough to be getting on with.

Restricting the number of CPU cores:

To make the comparison fairer, I turned off two of the CPU cores on the physical machine.

This can be done in two ways: Through the BIOS, or through super-user commands within the operating system. I used the latter, although I suspect that it may not have precisely the same effects as disabling CPUs in the BIOS.

Here’s how to quickly disable a CPU in Linux:

echo 0 > /sys/devices/system/cpu/cpu<n>/online

So, to turn off CPU cores 3 and 4, I issued the following commands (remember, the core numbering starts at zero):

echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online

This takes effect immediately and doesn’t require a reboot, which is handy.
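
To confirm the change, you can read back the kernel’s summary of which cores are online, and the cores can be re-enabled the same way; these are standard sysfs paths, nothing specific to my setup. After disabling cores 2 and 3, the first command should report 0-1.

cat /sys/devices/system/cpu/online
echo 1 > /sys/devices/system/cpu/cpu2/online
echo 1 > /sys/devices/system/cpu/cpu3/online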

Compilation code base:

For this benchmarking, I compiled v3.9 of the Linux kernel on both the physical and the virtual machine. The code is a vanilla git checkout of the linux-stable code base, not the Fedora or Ubuntu code base.

I’m using v3.9 rather than a newer release because one of the internal APIs has changed slightly in more modern kernels, which causes a compilation error with the VirtualBox kernel modules.
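
For anyone wanting to reproduce the setup, preparing the tree looks roughly like this. The clone URL is the standard linux-stable mirror on kernel.org, and the choice of defconfig is my assumption, since the exact .config isn’t recorded here; whatever config you pick, use the same one on both machines.

git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
git checkout v3.9
make defconfig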

Benchmarking results:

Physical: 1905 secs, 1863 secs, 1884 secs

Virtual: 1784 secs, 1727 secs, 1742 secs

So, the absolute differences are as follows: 121 secs, 136 secs, 142 secs

Expressed relative to the virtual times, those are improvements of roughly 7%, 8%, and 8%.
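
As a quick sanity check on those figures, here is the back-of-the-envelope arithmetic on the mean times (my own calculation, done in the shell, not taken from the benchmark logs):

phys=$(( (1905 + 1863 + 1884) / 3 ))    # mean physical time: 1884 seconds
virt=$(( (1784 + 1727 + 1742) / 3 ))    # mean virtual time: 1751 seconds
echo $(( phys - virt ))                 # 133 seconds difference
echo "scale=2; 100 * ($phys - $virt) / $virt" | bc    # 7.59, i.e. a little under 8%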

So, my virtualized system, running on 3 1/2 year old hardware, is about 8% faster than a brand-new machine.

If this were a bake-off between two similar physical systems, this would be enough to make a decision. As I mentioned earlier, a difference approaching 10% is worth having in a professional environment where you run the same jobs again and again.

That the faster system is a virtualized instance, running on 3 1/2 year old hardware, inside an (apparently not terribly optimal) type 2 hypervisor is really rather surprising.

The next series of posts will explore this issue and try to come to a root cause – we’ll follow where the data takes us.