Debugging high system time in kernel compile

In the last post, I discovered that compiling the Linux kernel in my VM seemed to cause the system to spend a lot of its cycles in the operating system, rather than processing my compilation workload. This is known as ‘system time’, and is easily seen with the ‘top’ tool.
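As a quick aside, here’s one way to see that split from the command line – this is just the stock ‘top’ in batch mode, plus ‘vmstat’ as a cross-check (the exact column layout varies a little between versions):

top -b -n 1 | grep -i '%cpu'   # the 'sy' figure is the system-time percentage
vmstat 5 3                     # the 'us' and 'sy' columns give the same split over 5-second samples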

Virtual vs Physical:

I was pretty sure that seeing about 35% system time was unusual. I’d normally only expect values like that in my torture-testing environments, where I deliberately issue system calls as fast as possible.

The finger of suspicion immediately fell on the VM environment – mainly because it’s the biggest difference between this setup and the environments I’m used to.

VMs are amazing at making the guest operating system appear to be running on its own hardware. Of course, that’s not really the case: I’m actually running two operating systems on a single piece of hardware.

Operating systems really like to have full control of the hardware, so running two OS instances in parallel is not something that comes naturally to computer systems. There’s some clever sleight of hand going on to give the guest OS the illusion that it’s in control, and the instances aren’t as separate as they appear.

For most purposes, this almost-separation works extremely well – VMs are great technology for development and investigative work. However, sometimes the physical reality of the computing environment leaks through, resulting in some hard-to-debug problems.

Workload characteristics:

Compilation workloads tend to start lots of fairly short-lived user-space processes which access large numbers of relatively small files. There’s the code that’s being compiled, of course, but there’s also all the headers and (when linking) all the object files.

One of the things a VM does is virtualize storage. Instead of an actual disk that accepts commands from the OS, the guest sees a virtual disk, which is typically backed by a large file sitting in the host operating system’s file system.
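For a VirtualBox guest you can see that backing file from the host side; paths will obviously differ on your system:

VBoxManage list hdds   # lists each registered virtual disk's UUID, storage format and host-side 'Location' (the backing file)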

There’s a big difference there, and VMs (especially type 2 hypervisors like VirtualBox, Parallels and VMware Workstation) have a reputation for poor I/O performance.

Note : I haven’t benchmarked I/O performance for myself, so take this statement with a grain of salt – I could be wrong.

A workload with lots of small random accesses would hit any file system hard, so perhaps the overhead of virtualization is what’s causing the high system time?

Cross-checking:

There’s one easy, if not terribly scientific, test that I can do to get a quick read on the problem: Repeat the same workload on a native Linux machine, to see if the same thing shows up.

Now, this really is rough and ready. Just about everything about the two environments is different.

My native Linux machine uses an Intel Core i5-3470 quad-core processor, while my VM uses two virtual CPUs backed by the four cores of the Intel Core i5-2500S in the host machine.

I have different amounts of memory available, and the guest OS kernel versions are different. The host machine is a Mac, too, so that’s another factor.

Still, running the same compilation process on the same Ubuntu kernel release showed that I was only using ~8% system time on my physical hardware. That’s much more like what I’d expect to see for a workload that doesn’t go out of its way to issue system calls.
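For a single build, an easy way to capture this split without watching ‘top’ is the shell’s ‘time’ builtin (the make target and -j value below are just placeholders for whatever you’re building):

time make -j4 bzImage
# 'time' reports three figures when the build finishes:
#   real – wall-clock time; user – CPU time in user space; sys – CPU time in the kernel
# sys / (user + sys) gives a rough system-time percentage for the build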

To add another data point (and, scientifically speaking, lots more confounding factors!), I decided to do the same test with a guest OS running under Linux’s KVM system.

Again, the guest OS showed that it was spending only a reasonable (i.e. low) amount of time in system space.

This suggests that there may be something about the VirtualBox implementation which gives rise to high system time in the guest OS.

The next post will show how I went about investigating this further.

Comparing Linux WM load to Mac OS X (Yosemite)

In the last post I used SystemTap to look at the I/O operations generated by the desktop environment.

I thought it would be interesting to do a similar analysis on my Mac, just to have a comparison.

It turns out that it’s more complicated to analyze the Mac’s behavior than it is for Linux. Based on this specific experiment, the Mac seems to live in a middle ground between IceWM and Ubuntu’s Unity environment.

However, general usage impressions and some other filesystem data suggest that its overall disk utilization may be substantially higher.

Mac OS X tools:

The Mac uses quite a different monitoring tool set from Linux.

Thanks to its Unix roots, Mac OS X ships with a port of Sun’s ‘dtrace’, which is arguably the gold standard of tracing systems.

Dtrace hasn’t been fully ported to Linux yet, although work is underway. I tried it about six months ago and found it wasn’t fully mature.

The tool to use on the Mac is ‘fs_usage’. I believe (although I could be wrong) that this is built on top of ‘dtrace’.

It’s a very handy tool and gives quite a bit of actionable info right out of the box.
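If you want to try it, a typical invocation looks like this (it needs root; -w gives the wider output format, and -f filesys restricts the output to file-system activity):

sudo fs_usage -w -f filesys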

Results:

Over a 60 second period my Mac issued ~315k I/O operations. Many of these were filesystem metadata calls like ‘getattrlist’ and ‘stat64’, which may well not result in many actual disk operations.

If I filter the ‘fs_usage’ output fairly strictly, to allow a fairer comparison with the SystemTap figures under Linux, then I get a count of about 875 reads and writes over 60s. This is actually better than under Linux, but only if we compare these specific operations.
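The exact filter isn’t shown here, but something along these lines does the job – capture a minute or so of output, then count just the read/write family of operations (the operation names and file name below are illustrative):

sudo fs_usage -w -f filesys > fs_trace.60s     # let it run for ~60 seconds, then stop it with Ctrl-C
grep -Ecw 'read|write|pread|pwrite' fs_trace.60s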

The majority of the load seemed to be from the Spotlight indexing system, which I’ve noticed being pretty intensive before.

Commentary:

I’m actually not that keen on Spotlight – I feel it uses a lot of resources for rather modest returns. I use its features very rarely, but I suspect that I pay a performance penalty for its indexing operations.

Even though the actual number of reads and writes doesn’t appear to be that high, you can hear the disk working – it’s clearly busy, and application startup is rather sluggish.

My guess is that the metadata operations (getattr, stat, open and so on) actually do contribute to quite a lot of disk activity, especially shortly after reboot when the filesystem caches may not be that warm.

Moving the head of a rotating disk is pretty time consuming. On my iMac (7,200 RPM conventional disk), the disk can only perform about 100 I/O operations per second. Even a fairly modest seek load can really eat into system performance.
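As a back-of-the-envelope check (the seek time here is an assumption – a typical figure for a desktop drive, not a measured one):

average rotational latency ≈ 0.5 × (60 / 7200) s ≈ 4.2 ms
assumed average seek time  ≈ 8–9 ms
time per random I/O        ≈ 12–13 ms, i.e. roughly 75–100 operations per second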

We could use another tool to probe lower down the storage stack and see how many seek operations are actually reaching the disk. ‘iostat -o’ is an OK tool for this, but it’s not as capable as its Linux counterpart and there isn’t a specific column for the number of seeks per second. You can infer what’s going on, but it’s a bit sparse.
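The invocation is along these lines (flags per the BSD-style iostat that ships with OS X – check the man page; the ‘old-style’ output shows sectors and transfers per second plus an average seek-time column, from which you can make a rough inference):

iostat -o -w 5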

It is possible to disable Spotlight, but I’ve found that other applications use it as a central search facility. For example, I could not search email in Microsoft Outlook while Spotlight was turned off.
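For reference, indexing can be turned off (and back on) with ‘mdutil’ – do this with care, for exactly the reason above:

sudo mdutil -a -i off   # disable Spotlight indexing on all volumes
sudo mdutil -a -i on    # re-enable it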

Running SystemTap to monitor I/O behaviour

The objective of this mini-project was to use advanced Linux debug tools to investigate the runtime impact of Window Managers (WMs) on the machine’s storage subsystem.

It’s not a very complex, or perhaps even a very useful project, but it does provide a vehicle for bringing up the tools and answering a question in an objective manner.

Experimental environment:

I’ve finally got to the point where I can run SystemTap and (hopefully) come to some sort of understanding of how the WM drives the IO subsystem.

Here’s my operating environment:

  • Host system: 27″ iMac, OS X Yosemite, quad-core Intel i5 processor, 12GB RAM
  • Hypervisor: Oracle VirtualBox 4.3.20
  • Guest OS: Ubuntu 14.04 LTS (32-bit)
  • Window managers: Ubuntu Unity, IceWM

The system is idle, with one terminal running. The OS is in full multi-user mode, so there’s possibly a range of background tasks going on.

This obviously isn’t terribly realistic, since most interesting problems occur when there’s actually something going on. Still, it’s sufficient to answer our current training question.

Running SystemTap under IceWM:

My normal development environment uses IceWM as its window manager, so I’ll use that to start off with.

I want to monitor the system for a fixed amount of time, so that errors don’t creep in by guessing the monitoring interval.

I’m going to run SystemTap as follows:

sudo timeout 60 ./watch_all_reads_writes.stap 8 1 2>&1 | tee trace.60s
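The ‘watch_all_reads_writes.stap’ script itself isn’t listed here. As a rough stand-in, a minimal one-liner with much the same output format might look like this (the real script also prints a flags field, and takes the two numeric arguments you can see in the command above):

sudo stap -e 'probe vfs.read, vfs.write { printf("%s(%d) %s\n", execname(), pid(), probefunc()) }'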

Note : The ‘timeout’ utility is extremely handy. It allows me to run a command for a specified interval and then deliver a signal (which I can choose if I want to). It’s not available by default on all Linux distributions and versions, but it is available in Ubuntu 14.04.
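For example, to have it send SIGINT instead of the default SIGTERM when the interval expires:

sudo timeout -s INT 60 ./watch_all_reads_writes.stap 8 1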

Here’s a snippet of the output:

rs:main Q:Reg(545) vfs_write 0x800001
rs:main Q:Reg(545) vfs_write 0x800001
tee(2417) vfs_write 0x800001
tee(2417) vfs_write 0x800001
tee(2417) vfs_write 0x800001
tee(2417) vfs_write 0x800001
tee(2417) vfs_write 0x800001

So, the first interesting thing is that the presence of ‘tee’ actually causes a lot of I/O write operations, even when there’s no output from the application it’s monitoring.

I need to strip out the impact of ‘tee’ and then count the read / write operations, as follows:

grep -v tee trace.60s | wc -l
>> 5

The total number of lines is 293 over 60 seconds, so ‘tee’ clearly dominates the I/O behavior.

Results with Ubuntu Unity:

Next I need to do a similar test when I’m running Ubuntu’s ‘Unity’ desktop environment.

The results are pretty different: There were 4424 read or write operations over 60s.

Stripping out the contribution from ‘tee’ gave 4112 operations, compared to 5 with IceWM.

I can extract the list of processes with this command line:

cat trace.60s.unity | gawk -F'(' '{print $1;}' | sort -u | wc -l

There were 29 separate processes causing the IO operations (I won’t list them all here).
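To see which of those processes were the busiest, the same field split can feed ‘uniq -c’ (illustrative – I haven’t reproduced the full list here):

gawk -F'(' '{print $1;}' trace.60s.unity | sort | uniq -c | sort -rn | head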

If I had to guess, I’d say that it’s mainly Unity doing background operations to keep packages updated, but there were some interesting ones, like an ‘http’ process and something called ‘Zeitgeist’.

There is more information about Zeitgeist on its Ubuntu man page.

Summary:

Based on the output of this script, we’ve discovered that:

  • IceWM causes minimal I/O subsystem load when the system is idle
  • Ubuntu Unity has fairly significant background activity which affects the I/O subsystem
  • The ‘tee’ application does about 3 write operations / sec, even when there’s no input that needs to be written to STDOUT / STDERR and a file. Are these zero-length writes? I don’t know – but we could do some more work to find out (see the sketch just after this list).
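A quick way to chase that last point would be a probe that prints the requested write size for ‘tee’ – a minimal sketch, assuming the standard vfs.write tapset alias exposes a ‘bytes_to_write’ variable (which I believe it does):

sudo stap -e 'probe vfs.write { if (execname() == "tee") printf("tee(%d) asked to write %d bytes\n", pid(), bytes_to_write) }'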

Given this, can I say that Ubuntu Unity would load my machine’s I/O subsystem too much? No – at least not under these operating conditions.

These I/O rates are pretty modest, even with the slow 7,200 rpm disks in iMacs and the overhead from the virtualization system.

Also, when I see calls to ‘vfs.read’ or ‘vfs.write’, that doesn’t necessarily translate to an operation that actually causes disk heads to move.

To understand why that is, I’ll have to take a look at how the Virtual Memory subsystem of modern operating systems works. This is well outside the scope of this post, but if you’re interested I recommend reading Mel Gorman’s document at this link.

https://www.kernel.org/doc/gorman/pdf/understand.pdf

There’s a LOT in this document. Even if it refers to older kernels, it’s still an excellent resource.

Caveats:

There are caveats to my mini-analysis:

  • The monitoring tool only tracks read and write operations. There are other I/O operations such as ‘stat’, ‘open’ and ‘close’ which are not logged.
  • This tool only monitors I/O operations that go through the Virtual File System (VFS). Operations that go over network sockets won’t be tracked.

Also, I’m not a real hard-core expert in these tools, so I may have made some mistakes or omissions. If you spot anything, let me know.

Now, for one last comment: These monitoring tools are very powerful, because they operate across the whole system and can reach right into the kernel – we’ve only just scratched the surface.

The side effect is that using them can raise more questions than they answer!  I think that’s a good thing, because I get to really understand what my machine’s doing.