Microkernels 101
This is the promised third installment of my dissection of a paper our competitors at VirtualLogix wrote and presented at IEEE CCNC last January. The paper compares what the authors call the “hypervisor” and “micro-kernel” approaches to virtualization. They present benchmark results using their VLX system and our OKL4 as representatives of each category.
In the first blog I explained how their benchmarking approach was flawed in several ways, and consequently the results are worthless. In the second blog I claimed (backed up by references to the scientific literature) that the paper presents an outdated 1980s view of microkernels. I promised a detailed rebuttal. Here we go…
Let’s first look at a pretty general statement:
“Micro-kernels provide higher-level abstractions, such as tasks, threads, memory contexts and IPC mechanisms, which are similar to those already implemented by a complete operating system kernel…”
This is an instance of the first-generation microkernel view (exemplified by Chorus) that microkernels are mini-OSes. It is totally at odds with Liedtke’s minimality principle (as explained in the previous blog), which requires that the microkernel puts a minimal wrapper around hardware mechanisms, just enough so they can be securely controlled by unprivileged software.
A microkernel must provide an abstraction of the security-relevant mechanisms of the hardware. Any kernel (or hypervisor, for that matter) must do that; what characterises a microkernel is that it doesn’t do more than the absolute minimum. This means that the microkernel must provide:
- an abstraction of hardware address spaces, because they are the fundamental mechanism for memory protection. Whether you call that “tasks” (as in early versions of L4) or “address spaces” (as in current versions) or virtualised physical memory is a matter of nomenclature, nothing more. The point is that it’s a minimal wrapper around hardware, enough to allow secure manipulation by user-level code. It isn’t a “task” in the Chorus sense, which is a heavyweight object;
- an abstraction of execution on the CPU, because that is required for time sharing. Whether you call that “thread”, “scheduler activation” or “virtual CPU” may be a matter of notation, or perhaps a small difference in approach, but not much more, as long as it’s minimal;
- a mechanism for communication across protection boundaries. This can be in the form of a (lightweight, unlike Mach or Chorus) IPC mechanism, a domain-crossing mechanism (as in “lightweight RPC”, Pebble or our Mungi system) or a virtual inter-VM interrupt. There are semantic differences between those options, but as long as the mechanism is really minimal, all are valid options. A virtual network interface (as offered as a primitive by some hypervisors) is not minimal, as it requires network-like protocols;
- memory contexts? At the level of a (proper) microkernel, that’s just the same as a “task”—an abstraction of memory protection. Hence, there should not be separate primitives for tasks and memory contexts.
In summary, the microkernel provides mechanisms corresponding to hardware features. It doesn’t provide services, just fundamental mechanisms. In particular, it does not duplicate OS services. This misunderstanding was one of the causes for the failure of first-generation microkernels. The OS community understands this. The authors of that paper don’t seem to.
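To make the minimality argument concrete, here is a sketch of what such an interface could look like. The names are hypothetical, not the actual L4 or OKL4 API; the point is what is there (thin wrappers around hardware mechanisms) and what is not (no files, no drivers, no protocols).

```c
/* Hypothetical sketch of a minimal microkernel interface. The names
 * are illustrative, not the actual L4/OKL4 API; the point is that
 * each operation wraps exactly one hardware mechanism, and that
 * nothing OS-like (files, drivers, protocols) appears. */

typedef unsigned long cap_t;   /* opaque reference to a kernel object */

/* Address spaces: a minimal wrapper around hardware memory protection. */
cap_t as_create(void);
int   as_map(cap_t as, unsigned long vaddr, cap_t frame, int rights);
int   as_unmap(cap_t as, unsigned long vaddr);

/* Execution: a minimal wrapper around the CPU. */
cap_t thread_create(cap_t as, unsigned long entry, unsigned long stack);
int   thread_resume(cap_t thread);

/* Communication across protection boundaries. */
int   ipc_call(cap_t endpoint, const void *msg, unsigned long len,
               void *reply, unsigned long reply_len);
```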
Now, for completeness (and because I promised), here is a list of (some of) the things the authors of that paper got wrong.
“Micro-kernels manage tasks and threads: they offer services to create, delete and list them”.
Tasks (or address spaces) and threads (or scheduler activations or virtual CPUs) must indeed be created and destroyed. A hypervisor creates and destroys virtual machines; it’s the same thing. What is relevant is whether these concepts are high-level (OS-like) or low-level (minimal hardware abstractions). In particular, no L4 version I know of has a service to list tasks or threads. That wouldn’t be minimal, and thus would violate microkernel principles. It’s a first-generation concept.
And a microkernel doesn’t “manage” these things. It provides the basic abstractions, as a hypervisor does (just using different terms). Management is left to user-level software. Another thing first-generation microkernels didn’t get right.
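To illustrate (using the hypothetical interface sketched above, not a real API): since the kernel offers no way to enumerate tasks, any bookkeeping lives in a user-level manager.

```c
/* Sketch of user-level management, assuming the hypothetical
 * interface sketched above. The kernel only creates and destroys
 * objects; keeping a list of them is bookkeeping that belongs in
 * user-level software, which is why L4 has no "list tasks" call. */

#define MAX_TASKS 64

static cap_t task_table[MAX_TASKS];    /* maintained by the manager */
static int   num_tasks;

cap_t manager_create_task(void)
{
    cap_t as = as_create();            /* minimal kernel operation */
    if (num_tasks < MAX_TASKS)
        task_table[num_tasks++] = as;  /* the manager keeps the list */
    return as;
}
```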
“… a typical way to run an OS kernel on top of a microkernel is to insure [sic] that all processes and threads managed by the guest OS are known to, and managed by the underlying micro-kernel.” And further: “Micro-kernels provide services to schedule threads according to micro-kernel scheduling policies that replace the existing guest OS processor scheduling policies.”
I’m not going to get into an argument about the semantics of “typical”, but the second claim states as factual something that is just plain wrong.
The fact is that L4Linux, the first port of Linux to L4, did map all Linux processes to L4 threads and scheduled them with the L4 scheduler. Keep in mind that the purpose of that work wasn’t to champion a particular approach to Linux para-virtualization on L4; it was to demonstrate that high-performance systems on top of (real) microkernels are possible (and it succeeded at that).
OK Linux, our para-virtualized Linux on OKL4, specifically doesn’t do what is claimed in the above quote. In fact, Wombat, our first para-virtualized Linux, scheduled virtual machines and let the guest OS manage (schedule etc.) its own processes, as you’d expect from a virtual machine. In other words, Wombat apps were scheduled according to Linux scheduling policies, not L4 policies. This has been clearly described in the literature.
However, I have also explained that in the embedded space, this is precisely a shortcoming of virtual machines. The reason is that embedded systems are highly-integrated systems and the two-level scheduling approach that is inherent in virtualization cannot appropriately deal with such global resources as the CPU. The VirtualLogix authors would be well advised to read that paper, as they don’t seem to understand this, given the following quote from their paper: “[In a CE device] there is usually little need for a tight integration between applications running in each environment.” I hate to tell you guys, but there are a lot of devices out there that are in fact tightly integrated. And some use OKL4’s high-performance IPC mechanisms to achieve that!
“Within a micro-kernel, there is no other way to communicate between threads running in different tasks of a guest OS than use microkernel IPC.”
This is just stunning nonsense! Where did they get that from? Did Chorus really not support shared memory? In any case, in one of the early L4 papers, Liedtke describes in detail L4’s approach to memory mappings. It’s an extremely elegant and powerful model that gives you everything you need. Check the literature, guys!
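As a rough illustration of the idea (again with hypothetical calls, not Liedtke’s actual mapping primitives): once a frame is mapped into two address spaces, the tasks communicate through plain loads and stores, with no IPC in sight.

```c
/* Sketch: shared memory between two guest tasks, set up purely via
 * mappings (hypothetical calls from the sketch above, not Liedtke's
 * actual interface). After the two maps, stores by one task are
 * visible to the other with no kernel involvement at all, so IPC is
 * clearly not the only way to communicate. */

void setup_shared_buffer(cap_t task_a, cap_t task_b, cap_t frame)
{
    as_map(task_a, 0x10000000UL, frame, /* rights: rw */ 3);
    as_map(task_b, 0x10000000UL, frame, /* rights: rw */ 3);
}
```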
Device drivers: The paper creates the impression that a microkernel forces you to run device drivers as native microkernel applications.
This is indeed the most secure model, as drivers are isolated and the rest of the system is protected from (at least some of) their bugs. In fact, high-performance user-level device drivers were pioneered by L4. But who says that you can’t use a guest OS’s native drivers, and leave them inside that guest if you wish? In fact, that is specifically supported by OKL4, but it implies that you trust that guest with the device. If you worry about security, you should minimise your trusted computing base, and that means not trusting a 200,000-or-more-LOC guest.
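For the curious, here is a sketch of what a user-level driver amounts to on an L4-style kernel, where interrupts are delivered as IPC messages; all names are hypothetical.

```c
/* Sketch of a user-level driver on an L4-style kernel, where
 * interrupt delivery looks like an IPC message. All names are
 * hypothetical; the point is that the driver is an ordinary
 * unprivileged task, isolated from the rest of the system. */

int  ipc_recv(cap_t endpoint, void *msg, unsigned long len); /* assumed */
void handle_device(void);        /* device logic, via mapped registers */

extern cap_t irq_endpoint;       /* assumed to be bound to the device IRQ */

void driver_loop(void)
{
    char msg[16];
    for (;;) {
        /* Block until the kernel turns the next hardware interrupt
         * into a message on this endpoint. */
        ipc_recv(irq_endpoint, msg, sizeof msg);
        handle_device();         /* all driver logic runs at user level */
    }
}
```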
There’s more, but going through every single erroneous statement would be way too boring, so I focussed on the main ones. Just don’t think that everything I didn’t address is correct…
The bottom line is that almost everything written about microkernels in that paper is wrong. It may apply to (thoroughly discredited) first-generation microkernels, but it doesn’t apply to OKL4. The authors of that paper violated a basic rule: don’t generalise from one instance to all. There are rotten apples, but that doesn’t mean all apples are rotten. Every kid over the age of five understands that.
There’s more entertainment value in that paper; stay tuned.
Dear all, I’m Diego.
1. I would like to ask about the (admittedly crazy) idea of implementing IPC in a kind of run-time support outside the kernel. Is that approach wrong? Do you know of any examples, and has anyone thought about it? Can someone tell me why such a solution is not possible?
IPC, as done in L4, is simply a context switch with a payload. It’s already the minimum of what has to be in the kernel. You can’t take that functionality out, unless you have no protection at all (i.e. no security).
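To make that concrete, here is a rough, hypothetical sketch of what such an IPC fast path boils down to inside the kernel (real L4 kernels hand-optimise this, but the shape is the same):

```c
/* Rough sketch of an L4-style IPC fast path inside the kernel:
 * a context switch with a payload. The structures are hypothetical;
 * real L4 kernels hand-tune this path, often in assembly. */

struct as;                                /* hardware MMU context */

struct tcb {                              /* thread control block */
    unsigned long msg[8];                 /* message registers (payload) */
    struct as    *space;                  /* the thread's address space */
};

void switch_space(struct as *to);         /* reload the MMU context */
void resume(struct tcb *to);              /* return to user mode */

void ipc_send(struct tcb *from, struct tcb *to)
{
    /* Copy the payload... */
    for (int i = 0; i < 8; i++)
        to->msg[i] = from->msg[i];

    /* ...then cross the protection boundary and resume the receiver.
     * The address-space switch is the privileged step that user code
     * cannot perform, which is why IPC cannot move out of the kernel
     * without giving up protection. */
    switch_space(to->space);
    resume(to);
}
```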