In the last couple of posts I described a problem I encountered when compiling the Linux kernel on my VM.
In this post, I’m going to start to dig in and see what might be going on. I’m going to use the ‘perf’ toolkit, which is part of the Linux kernel.
Measure, measure, measure:
Effective performance optimization involves discovering the ‘hot’ part of the code, rather than guessing. Accurate measurement is key, because even informed opinion often turns out to be incorrect.
Graphing the data is important too. It’s possible to see patterns in scrolling data that aren’t there when the data is visualized.
It’s possible to profile a user-space application with tools like ‘gprof’ (a quick example follows), but what happens when the time seems to be spent in the operating system?
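For reference, a typical gprof session looks something like this (the program name here is made up):

gcc -pg -O2 -o myprog myprog.c    # build with profiling instrumentation
./myprog                          # run normally; this drops a gmon.out file
gprof ./myprog gmon.out | less    # inspect the flat profile and call graph

The catch is that gprof only accounts for time spent in the program’s own code; time spent inside the kernel on the program’s behalf is invisible to it.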
System performance and ‘perf’:
The first tool most people turn to when there’s a performance problem is ‘top’. It’s a great tool and can tell me a lot, especially if I know how to interpret its output accurately.
The problem is that it’ll tell me which processes are consuming the most CPU, but not where inside those processes the time is going. That’s something that’s hard to get without more advanced tooling.
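That said, top’s summary header does give one useful clue: the split between user (us) and system (sy) time on the %Cpu(s) line. Batch mode makes it easy to capture:

top -bn1 | head -3    # one snapshot; the third line shows the us/sy split

A high ‘sy’ figure is a hint that the kernel, rather than the application itself, is doing the heavy lifting.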
Installing perf tools:
Even though ‘perf’ is part of the Linux kernel, user-facing tools are needed to drive it. They can be installed easily as follows:
sudo apt-get install linux-tools-common
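On Ubuntu and its relatives, linux-tools-common on its own installs only a wrapper script; you’ll usually also want the package matching the running kernel (the exact package name varies by release, so treat this as the general pattern):

sudo apt-get install linux-tools-$(uname -r)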
Perf top:
Once the tools are installed, simply start the workload. In this case, it’s easy:
cd <top of kernel source code>
make clean
make -j3
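One caveat before firing up the profiler: ‘perf top’ samples the whole system, which generally needs elevated privileges. If it complains about permissions, the simplest fix is to run it with sudo:

sudo perf top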
Then, run ‘perf top’ (under sudo if needed, as above). This will bring up an interactive interface that looks somewhat like traditional top. It’s a bit hard to capture text off it, so here are the first few lines:
16.42% cc1 [.] 0x00116d1a
8.38% [kernel] [k] _raw_spin_unlock_irqrestore
8.28% [kernel] [k] read_tsc
6.45% [kernel] [k] finish_task_switch
4.37% libc-2.19.so [.] __memset_sse2
This shows that a symbol inside the ‘cc1’ binary is consuming the largest single share of CPU. That’s no surprise: we know the system is compiling C code, and ‘cc1’ is the C compiler proper in the GNU toolchain.
Now, there’s a problem here, because I’m seeing a hex address rather than a human-readable symbol name. It is possible to resolve the address into a symbol manually, but it’s easier to install the debug symbol packages.
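For the record, the manual route looks something like this; the path to cc1 below is a stand-in for wherever your toolchain actually keeps it, and it only yields a name if the binary carries symbols:

addr2line -f -e /usr/lib/gcc/x86_64-linux-gnu/4.8/cc1 0x00116d1a    # -f prints the enclosing function name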
What’s more interesting is the kernel symbols on lines 2 through 4. Taken together, these three symbols account for more samples than the user-space workload itself, which strongly suggests that this is where the system time is going.
Next I’ll install the debug symbols for the C toolchain (it’s not as straightforward as it may seem, unfortunately), and then we’ll take a look at what these kernel symbols actually mean.