In the last post, I discovered that compiling the Linux kernel in my VM seemed to make the machine spend a large share of its CPU cycles in the operating system, rather than on my compilation workload. This is known as ‘system time’, and is easily seen with the ‘top’ tool.
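As an aside, top derives those percentages from the counters in /proc/stat, so a short script can do the same sums. Here’s a rough Python sketch – the five-second window is arbitrary, and it lumps the minor counters together – that prints the user/system split:

```python
# Sample /proc/stat twice and report how CPU time was split between user
# space and the kernel over the interval (roughly top's %us and %sy).
import time

def cpu_counters():
    with open("/proc/stat") as f:
        fields = f.readline().split()          # aggregate "cpu" line
    return list(map(int, fields[1:]))          # user nice system idle iowait irq ...

before = cpu_counters()
time.sleep(5)                                  # arbitrary 5-second window
after = cpu_counters()

delta = [a - b for a, b in zip(after, before)]
total = sum(delta)
user = delta[0] + delta[1]                     # user + nice
system = delta[2]                              # system
print(f"user {100 * user / total:.1f}%  system {100 * system / total:.1f}%")
```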
Virtual vs Physical:
I was pretty sure that seeing about 35% system time was unusual. I’d normally only expect to see values like that in my torture-testing environments, where I deliberately issue system calls as fast as possible.
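Those torture tests are essentially tight loops around cheap system calls. Something like the following deliberately silly sketch (not my actual harness – just an illustration in Python) will drive system time through the roof:

```python
# Deliberately pathological: one read() system call per byte, so almost
# all of the CPU time goes on crossing into and out of the kernel.
# Stop it with Ctrl-C.
import os

fd = os.open("/dev/zero", os.O_RDONLY)
try:
    while True:
        os.read(fd, 1)        # one syscall per iteration
finally:
    os.close(fd)
```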
The finger of suspicion immediately fell on my VM environment – mainly because it was the biggest difference between the two environments.
VMs are amazing at making the guest operating system appear to be running on its own hardware. Of course, it’s really not the case: I’m actually running two operating systems on a single piece of hardware.
Operating systems really like to have full control of the hardware, so running two OS instances in parallel is not something that comes naturally to computer systems. There’s some clever sleight of hand going on to give the guest OS the illusion that it’s in control, and the instances aren’t actually as separate as they appear.
For most purposes, this almost-separation works extremely well – VMs are great technology for development and investigative work. However, sometimes the physical reality of the computing environment leaks through, resulting in some hard-to-debug problems.
Workload characteristics:
Compilation workloads tend to start lots of fairly short-lived user-space processes which access large numbers of relatively small files. There’s the code that’s being compiled, of course, but there are also all the headers and (if the system is linking) all the object files.
One of the things that VMs do is virtualize the file system. Instead of having an actual disk that accepts commands from the OS, the VM uses a large file in the host operating system and presents that to the guest as if it were a disk.
That’s a big difference, and VMs (especially type 2 hypervisors like VirtualBox, Parallels and VMware Workstation) have a reputation for poor I/O performance.
Note: I haven’t benchmarked the I/O performance for myself, so take this statement with a grain of salt – I could be wrong.
A workload with lots of small, random accesses would hit any file system hard – so perhaps the overhead of virtualization is what’s causing the high system time?
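To make that concrete, the access pattern I have in mind looks roughly like the sketch below: thousands of header-sized files written once and then read back in random order, which adds up to a storm of open/read/close calls against the (virtual) disk. The file count and sizes are arbitrary – this isn’t a proper benchmark.

```python
# Rough approximation of a compile-style I/O pattern: many small files,
# read back in random order. File count and size are arbitrary choices.
import os
import random
import tempfile

workdir = tempfile.mkdtemp(prefix="smallfiles-")

paths = []
for i in range(2000):
    path = os.path.join(workdir, f"hdr{i}.h")
    with open(path, "wb") as f:
        f.write(os.urandom(4096))             # ~4 KiB, header-sized
    paths.append(path)

random.shuffle(paths)
for path in paths:                            # lots of open/read/close syscalls
    with open(path, "rb") as f:
        f.read()
```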
Cross-checking:
There’s one easy, if not terribly scientific, test that I can do to get a quick read on the problem: Repeat the same workload on a native Linux machine, to see if the same thing shows up.
Now, this really is rough and ready. Just about everything about the two environments is different.
My native Linux machine has a quad-core Intel i5-3470 processor, while my VM is assigned two virtual CPUs from the host machine’s quad-core Core(TM) i5-2500S.
The two environments have different amounts of memory available, and they’re running different kernel versions. The host machine is a Mac, too, so that’s another factor.
Still, running the same compilation process on the same Ubuntu kernel release showed that I was only using ~8% system time on my physical hardware. That’s much more like what I’d expect to see for a workload that doesn’t go out of its way to issue system calls.
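If you want a number that doesn’t depend on eyeballing top, you can measure the build’s own user/system split instead. The sketch below wraps a placeholder build command – the make invocation and source path are stand-ins – and reports the CPU time its child processes racked up; the standard time command gives much the same user/sys breakdown.

```python
# Run the build and report the user/system CPU time accumulated by its
# child processes. The make command and source directory are placeholders.
import os
import subprocess

before = os.times()
subprocess.run(["make", "-j4"], cwd="/path/to/linux", check=True)
after = os.times()

user = after.children_user - before.children_user
system = after.children_system - before.children_system
print(f"user {user:.1f}s  system {system:.1f}s  "
      f"({100 * system / (user + system):.1f}% system time)")
```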
To add another data point (and, scientifically speaking, lots more confounding factors!), I decided to do the same test with a guest OS running under Linux’s KVM system.
As with the native machine, the KVM guest showed that it was spending only a reasonable amount of its time in system space.
This suggests there may be something about the VirtualBox implementation which is giving rise to the high system time in the guest OS.
The next post will show how I went about investigating this further.