This is the promised third installment of my dissection of a paper our competitors at VirtualLogix had written and presented at IEEE CCNC last January. The paper compares what the authors call the “hypervisor” and “micro-kernel” approaches to virtualization. They present benchmark results using their VLX system and our OKL4 as representatives of each category.
In the first blog I explained how their benchmarking approach was flawed in several ways, and consequently the results are worthless. In the second blog I claimed (backed up by references to the scientific literature) that the paper presented an out-dated 1980s view of microkernels. I promised a detailed rebuttal. Here we go…
Let’s first look at a pretty general statement:
“Micro-kernels provide higher-level abstractions, such as tasks, threads, memory contexts and IPC mechanisms, which are similar to those already implemented by a complete operating system kernel…”
This is an instance of the first-generation microkernel view (exemplified by Chorus) that microkernels are mini-OSes. It is totally at odds with Liedtke’s minimality principle (as explained in the previous blog), which requires that the microkernel puts a minimal wrapper around hardware mechanisms, just enough so they can be securely controlled by unprivileged software.
A microkernel must provide an abstraction of the security-relevant mechanisms of the hardware. Any kernel (or hypervisor, for that matter) must do that; what characterises a microkernel is that it doesn’t do more than the absolute minimum. This means that the microkernel must provide:
- an abstraction of hardware address spaces, because they are the fundamental mechanism for memory protection. Whether you call that “tasks” (as in early versions of L4) or “address spaces” (as in current versions) or virtualised physical memory is a matter of nomenclature, nothing more. The point is that it’s a minimal wrapper around hardware, enough to allow secure manipulation by user-level code. It isn’t a “task” in the Chorus sense, which is a heavyweight object;
- an abstraction of execution on the CPU, because that is required for time sharing. Whether you call that “thread”, “scheduler activation” or “virtual CPU” may be notation or maybe a small difference in approach, but not much more, as long as it’s minimal;
- a mechanism for communication across protection boundaries. This can be in the form of a (light-weight, unlike Mach or Chorus) IPC mechanism, a domain-crossing mechanism (as in “lightweight RPC”, Pebble or our Mungi system) or a virtual inter-VM interrupt. There are semantic differences between those options, but as long as it’s really minimal, all are valid options. A virtual network interface (as offered as a primitive by some hypervisors) is not minimal, as it requires network-like protocols;
- memory contexts? At the level of a (proper) microkernel, that’s just the same as a “task”—an abstraction of memory protection. Hence, there should not be separate primitives for tasks and memory contexts.
In summary, the microkernel provides mechanisms corresponding to hardware features. It doesn’t provide services, just fundamental mechanisms. In particular, it does not duplicate OS services. This misunderstanding was one of the causes for the failure of first-generation microkernels. The OS community understands this. The authors of that paper don’t seem to.
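To make this concrete, here is a rough sketch of what such a minimal interface might look like. It is purely illustrative (the names and signatures are made up, not the actual L4 or OKL4 API); the point is that every primitive is a thin wrapper around a hardware mechanism, and nothing duplicates OS services.

```c
/* Illustrative sketch only -- not the actual L4 or OKL4 API. */

typedef unsigned long word_t;
typedef word_t space_id_t;   /* address space: wraps the MMU context */
typedef word_t thread_id_t;  /* thread: wraps execution on the CPU   */

/* Memory protection: create/delete an address space, map a frame into it. */
int space_create(space_id_t *out);
int space_delete(space_id_t space);
int space_map(space_id_t space, word_t vaddr, word_t frame, unsigned rights);

/* Execution: create a thread in a space, with an initial IP and SP. */
int thread_create(space_id_t space, word_t ip, word_t sp, thread_id_t *out);
int thread_delete(thread_id_t thread);

/* Communication across protection boundaries: synchronous message passing. */
int ipc_send(thread_id_t to,   const word_t *msg, unsigned words);
int ipc_recv(thread_id_t from, word_t *msg,       unsigned words);
```

Note what is missing: no file systems, no network stacks, no device drivers, no listing of tasks or threads, no “management” services. All of that lives in user-level software.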
Now, for completeness (and because I promised), here is the list of (some of) the things the authors of that paper got wrong.
“Micro-kernels manage tasks and threads: they offer services to create, delete and list them”.
Tasks (or address spaces) and threads (or scheduler activations or virtual CPUs) must indeed be created and destroyed. A hypervisor creates and destroys virtual machines, it’s the same thing. What is relevant is whether these concepts are high-level (OS-like) or low-level (minimal hardware abstractions). In particular, no L4 version I know of has a service to list tasks or threads. That wouldn’t be minimal, and thus in violation of microkernel principles. It’s a first-generation concept.
And a microkernel doesn’t “manage” these things. It provides the basic abstractions, as a hypervisor does (just using different terms). Management is left to user-level software. Another thing first-generation microkernels didn’t get right.
“… a typical way to run an OS kernel on top of a microkernel is to insure [sic] that all processes and threads managed by the guest OS are known to, and managed by the underlying micro-kernel.” And further: “Micro-kernels provide services to schedule threads according to micro-kernel scheduling policies that replace the existing guest OS processor scheduling policies.”
I’m not going to get into an argument about the semantics of “typical”, but the second claim states as factual something that is just plain wrong.
The fact is that L4Linux, the first port of Linux to L4, did map all Linux processes to L4 threads and scheduled them with the L4 scheduler. It must be kept in mind that the purpose of that work wasn’t to champion a particular approach to Linux para-virtualization on L4; it was to demonstrate that high-performance systems on top of (real) microkernels are possible (and it succeeded at that).
OK Linux, our para-virtualized Linux on OKL4, specifically doesn’t do what is claimed in the above quote. In fact, with Wombat, our first para-virtualized Linux, the microkernel scheduled virtual machines and let the guest OS manage (schedule etc.) its own processes, as you’d expect from a virtual machine. In other words, Wombat apps were scheduled according to Linux scheduling policies, not L4 policies. This has been clearly described in the literature.
However, I have also explained, in a recent paper, that in the embedded space this is precisely a shortcoming of virtual machines. The reason is that embedded systems are highly-integrated systems, and the two-level scheduling approach that is inherent in virtualization cannot appropriately deal with such global resources as the CPU. The VirtualLogix authors would be well advised to read that paper, as they don’t seem to understand this, given the following quote from their paper: “[In a CE device] there is usually little need for a tight integration between applications running in each environment.” I hate to tell you guys, but there are a lot of devices out there that are in fact tightly integrated. And some use OKL4’s high-performance IPC mechanisms to achieve that!
“Within a micro-kernel, there is no other way to communicate between threads running in different tasks of a guest OS than use microkernel IPC.”
This is just stunning nonsense! Where did they get that from? Did Chorus really not support shared memory? Whatever, in one of the early L4 papers, Liedtke describes in detail L4’s approach to memory mappings. It’s an extremely elegant and powerful model that gives you everything you need. Check the literature, guys!
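To illustrate the point, here is a minimal sketch of an IPC-free communication path. It reuses the made-up mapping primitive from the earlier sketch (so the names and addresses are hypothetical, not L4’s actual interface): a shared mapping is established once, and from then on the two tasks exchange data through plain loads and stores, with no kernel involvement per transfer.

```c
/* Illustrative only: shared-memory communication between two tasks,
 * set up via mappings rather than per-message IPC. */

#include <string.h>

typedef unsigned long word_t;
typedef word_t space_id_t;

/* Hypothetical mapping primitive (as sketched earlier): map a physical
 * frame into an address space at a given virtual address. */
int space_map(space_id_t space, word_t vaddr, word_t frame, unsigned rights);

#define SHARED_VADDR  0x40000000UL   /* illustrative addresses */
#define SHARED_FRAME  0x00800000UL
#define RIGHTS_RW     3u

/* Done once, e.g. by a manager/pager task: make the same frame visible
 * in both address spaces. */
void setup_shared_region(space_id_t producer, space_id_t consumer)
{
    space_map(producer, SHARED_VADDR, SHARED_FRAME, RIGHTS_RW);
    space_map(consumer, SHARED_VADDR, SHARED_FRAME, RIGHTS_RW);
}

/* From then on, data moves with ordinary memory operations -- no IPC. */
void produce(const char *data, size_t len)
{
    memcpy((void *)SHARED_VADDR, data, len);
}
```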
Device drivers: The paper creates the impression that a microkernel forces you to run device drivers as native microkernel applications.
This is indeed the most secure model, as drivers are isolated and the rest of the system is protected from (at least some of) their bugs. In fact, high-performance user-level device drivers were pioneered by L4. But who says that you can’t use a guest OS’s native drivers, and leave them inside that guest if you wish? In fact, that is specifically supported by OKL4, but it implies that you trust that guest with the device. If you worry about security, you should minimise your trusted computing base. And this means not trusting a 200,000-or-more LOC guest.
There’s more, but going through every single erroneous statement would be way too boring, so I focussed on the main ones. Just don’t think that everything I didn’t address is correct…
The bottom line is that almost everything written about microkernels in that paper is wrong. It may apply to (thoroughly discredited) first-generation microkernels, but it doesn’t apply to OKL4. The authors of that paper violated a basic rule: don’t generalise from one instance to all. There are rotten apples, but that doesn’t mean all apples are rotten. Every kid over the age of five understands that.
There’s more entertainment value in that paper, stay tuned.
In a recent blog I talked about a paper our competitors at VirtualLogix had written and presented at the IEEE Consumer Communications and Networking Conference last January. The paper compares what the authors call the “hypervisor” and “micro-kernel” approaches to virtualization. They present benchmark results using their VLX system and our OKL4 as representatives of each category.
In the previous blog I explained how their benchmarking approach was flawed in several ways, and nothing useful can be learned from the results. Moreover, I explained that they perform a classical apples-vs-oranges comparison (without alerting the reader to the fact). In a word: the results they publish are rubbish.
I promised a followup looking at some of the (many) other defects in that paper. So, today I’m looking at what they say about microkernels.
In short, their description of microkernels presents a 1980s view, and is completely out of touch with the last 15+ years of operating-systems research. This is like using a Ford Model T to represent cars, while the Prius is parked outside.
Who am I to say that, you may wonder? After all, I’ve only been active in the area for a mere 15 years, while Michel Gien, one of the authors of that paper (and a co-founder of VirtualLogix), was a founder of Chorus Systèmes, a company that started building microkernels almost 30 years ago. As the paper explains, the Chorus engineering team founded VirtualLogix in 2002. (In fact, they founded Jaluna, which renamed itself VirtualLogix in 2006.) So, having been doing microkernels for 30 years, shouldn’t they know them?
Maybe the 30 years are the problem. The view of microkernels presented in the paper corresponds very well to that of the “first-generation” microkernels from the 1980s, of which Chorus and CMU’s Mach are the best-known representatives. By the early 1990s, those microkernels were generally known to lead to poorly-performing systems. As a result, the whole microkernel concept got a bad name, as people believed that the performance problems were an inevitable result of microkernel-based designs.
However, in 1993 (“only” 15 years ago), Jochen Liedtke showed that this conclusion was utterly wrong. He demonstrated that the poor performance of first-generation microkernels was a result of poor design and implementation of those systems. He showed how a well-performing microkernel must be built. And he provided a constructive proof: his prototype (then called L3) of what is now known as the L4 microkernel family, which became the prime representative of second-generation microkernels, and the benchmark against which all others are still measured today.
L4 didn’t beat previous kernels by a mere 10%. Not even by a factor of two. It beat them by a staggering order of magnitude: Mach by a factor of 20, and Chorus by a factor of 10! And this was published in the toughest OS conference of all, the ACM Symposium on Operating Systems Principles (SOSP), whose reviewers are the world experts and not easily fooled. In fact, the performance demonstrated by various versions of L4 has never been beaten to this day, and is referred to as “the speed of light” by researchers building competing systems!
Given those facts, we can safely rule out Chorus as a representative of the state of the art in microkernels, and everyone who thinks of microkernels in Chorus terms is clearly out of touch. Yet, what the VirtualLogix paper says about microkernels is a perfect match to Chorus, while totally at odds with the well-published principles of L4.
Interestingly, the VirtualLogix folks actually cite one of Liedtke’s papers (their reference 16) where Liedtke very precisely and lucidly describes the principles behind second-generation microkernels. It is utterly baffling how they can provide a description of microkernels that is so at odds with the very paper they cite.
What Liedtke’s paper describes is what is nowadays considered the standard definition of a microkernel (check wikipedia if you don’t believe me), also known as the minimality principle of microkernels:
A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e., permitting competing implementations, would prevent the implementation of the system’s required functionality.
If you look at the claimed characteristics of microkernels in the VirtualLogix paper, you’ll find that almost all of them violate this definition. In other words, what is described there may be Chorus or Mach, but not what has been the accepted notion of a microkernel since the mid-’90s.
Btw, there’s another interesting claim in that paper: “… all projects to re-architect Unix along a micro-kernel approach failed so far…” If I were QNX, I’d be pretty upset about that statement. They have been successfully marketing a Posix-compliant (that’s the general definition of Unix-like) system that is implemented as a multi-server system on top of a microkernel. For over 20 years. Calling QNX, with tens of millions of systems deployed, a failed microkernel system seems a bit strong. But it ain’t my problem, of course.
Similar things could be said about Symbian OS, except that they haven’t been going for quite that long. But they are deployed in the hundreds of millions, almost as widely as OKL4! Do you consider that a failure?
This blog is getting long, so I’ll stop before I bore you. I’ll do a more detailed rebuttal of their claimed microkernel properties soon…
Our competitors VirtualLogix have published a paper titled “A practical look at micro-kernels and virtual machine monitors” at last January’s IEEE Consumer Communications and Networking Conference (CCNC) in Las Vegas. It paints a picture that, compared to their VLX system, microkernels (read “OKL4”) are an inferior basis for virtualization for CE devices.
This may come as a surprise to those who know that OKL4 is deployed in 250 million mobile phone handsets and VLX in zero! Do you really think our customers, which include leading chipset suppliers and handset OEMs, would deploy such a poor system?
We’ll have a look at how the facts stack up. It turns out that the paper is full of flawed methodology and incorrect conclusions.
Unfortunately, I cannot point to an on-line copy of the paper. As participants of CCNC’09 we have an electronic copy, but the copyright is owned by the IEEE and their rules don’t allow us to re-distribute it. Furthermore, the CCNC proceedings aren’t even up on IEEE Xplore yet. (But note that my CCNC’09 paper is available on-line.) However, VirtualLogix have been busy distributing copies to potential customers, so some of you will have seen it, others will have to wait until it’s up on IEEE Xplore (or ask VirtualLogix for a copy).
There is so much wrong with the paper that it’s hard to work out where to start picking it apart. But given that the most damaging assertions relate to performance, let’s have a look at the benchmarks presented. We can make a few interesting observations.
Evaluation Platform
For one, the paper claims that it is comparing the inherent suitability of what they call “hypervisors” (represented by their VLX system) with “microkernels” (represented by OKL4, as if OKL4 wasn’t a hypervisor, but that’s for a future blog). To end up with a fair comparison, what should you use: the best representative, i.e. a mature, highly-optimised version, or a recent port that hasn’t been optimised?
Well, their choice was to use OKL4 on an ARM11 platform. OKL4 has supported ARM11 for a while, but no serious effort had gone into tuning its performance. On the other hand, our ARM9 version has been out there for a long time; in fact, we published its performance years ago, and challenged everyone to match it. No-one has come forward, least of all VirtualLogix.
Clearly, to establish performance limits, OKL4 on ARM9 should be the starting point. Ignoring it and using ARM11 is an inadequate approach which I would not let any of my undergraduate students get away with.
Achieving the real performance
Everyone in the business knows that benchmarking is difficult. Even a seemingly easy case, such as measuring lmbench performance on a native Linux system using a binary distribution (you can’t get it easier than this), turns out to be far from trivial. We’ve seen way too many cases where this would lead to nonsensical or irreproducible results.
Things tend to be much more tricky in the embedded space. Companies tend to spend weeks and months benchmarking products. It ain’t easy.
Under such circumstances, what would you expect to see if a competitor benchmarks your product, without your involvement (or even knowledge)? Would you expect that competitor to be impartial and disinterested in the result? Do you think your customers should put any faith at all in the results?
I leave the answer to you.
Data completeness and reproducibility
One of the important rules of publishing experimental results is to present them in a way that makes independent validation possible, and to provide sufficient information for someone to actually do it. Not doing this is poor science that is not fit for publication.
Yet these rules have been thoroughly violated in that paper. All the data they present (on which they build their case) is the relative performance of OKL4 and VLX.
This means that it is impossible to say whether the measured OKL4 performance makes sense. I don’t know what the baseline is. The results could be a factor of ten off, and you couldn’t tell. This is appalling science!
You may wonder how this could pass through a scientific peer-review process. In my experience it clearly shouldn’t have. The (anonymous) reviewers of that paper were obviously out of their depth.
Apples vs oranges
Finally, the paper compares apples with oranges, without saying so (but in this case the reviewers were given no chance to find out, as the details were not revealed in the paper).
The story behind this is that VLX can be run in two ways. One is proper virtualization (this and only this is what they describe in the paper), where the hypervisor is in full control of hardware resources and the guest operating systems are de-privileged. The other, which is not mentioned in the paper at all, and thus not revealed to the reader, including the reviewers (but which the presenter made a lot of fuss about in the talk at CCNC), is what I call pseudo-virtualization: the guest OS is not de-privileged but co-located with the “hypervisor” in kernel mode. Obviously, this means that the hypervisor no longer has exclusive control over resources, which is why this completely fails to qualify as virtualization, even according to the definition they give in their own paper! (You don’t believe me? Check their description of “optional” isolation on their website. Isolation of guests is implicit in a virtual-machine environment, not optional. Why would they do this? Could it be because the performance of their “isolated” execution mode is unimpressive?)

[Note added 2012-01-08: VirtualLogix has since been acquired by Red Bend Software, and the original content is no longer available. However, the description of Red Bend’s “Mobile Virtualization” technology refers to an optional “Isolator” module, which provides “even stronger isolation”. Given that virtualization, by definition, provides full isolation, one can conclude that the above arguments still apply.]
What does this have to do with that paper? Well, in front of me I have our lmbench performance data for the 2.1 release measured on a Freescale iMX31 processor (the same used by VirtualLogix in their paper). The interesting observation is that for most measures, what they claim as the performance ratio between VLX- and OKL4-based virtualization is close to the ratio between the performance of native Linux and OK Linux (remember, this is an un-optimised OKL4 version on ARM11, much improved in the meantime). So, what are we to conclude? The virtualization overhead of VLX is essentially zero? If you believe that, then I’ve got a great deal for you in snake oil with really amazing healing properties!
What’s really behind this is apples and oranges. As became clear during the talk at CCNC, they compared pseudo-virtualized Linux on VLX (i.e., the Linux guest running privileged) with properly-virtualized (we don’t do anything else) OK Linux (running in user mode) on OKL4. I leave it to you to judge whether that is a fair comparison.
Summary
So, in summary, the benchmark data presented in the paper is worthless: it uses an unoptimised OKL4 platform where a well-optimised one is available; even then it may not show the real performance of the OKL4 system they used (because of the challenges involved in benchmarking); and it compares apples with oranges.
We aren’t afraid of comparing our performance with VLX. But it has to be a fair, transparent, apples with apples comparison.
I’ll discuss some of the other main faults of the paper in future blogs, stay tuned.
I was asked this question by a reader of my recent article Microkernels Rule! in embedded.com. The short answer is that hypervisors are becoming microkernels.
Yes, there’s a convergence, but it’s mostly the hypervisors that are changing. Microkernels have been successfully used as hypervisors for over ten years (L4Linux was done at Dresden and published in ’97). In x86 space, L4Linux has had roughly the same performance as XenoLinux. And, of course, OK Linux running on OKL4 represents the cutting edge of virtualization technology for embedded systems.

On the other hand, hypervisors are definitely (albeit slowly) becoming microkernel-like. People are becoming sensitive to the size of the trusted computing base, and the VM people are waking up to the fact that, as the use cases for virtualization increase, high-performance communication (the traditional strength of microkernels) is becoming a critical requirement. I heard two keynotes last year in which VMware founder Mendel Rosenblum described his vision for virtualization. If you listened to his arguments carefully, the implication was obvious: the hypervisor was turning into a microkernel.
The embedded space is likely driving this, as the traditional VM model (with its implied strong isolation) isn’t appropriate there.
So, yes, there’s a convergence. Hypervisors are turning into microkernels. OKL4 is already at the convergence point.
In a recent blog I compared Singularity with L4, explaining the fundamental difference between the two approaches and showing that L4 has a significant performance advantage. I promised a follow-up posting focussing on security, particularly as it relates to the embedded-systems context.
Singularity is written in a modern, type-safe language (Sing#), while OKL4 is written in type-unsafe C and assembler. This means that many classical implementation bugs are prevented in Singularity by language rules (enforced by the compiler) while there is no such protection for the OKL4 implementation (although static-analysis tools, such as Goanna, can make up for much of this difference). More importantly, code written in Sing# is readily amenable to formal verification, due to the well-defined semantics and coding restrictions enforced by the language.
So, this should mean that it is relatively easy to formally verify (prove) the implementation correctness of Singularity, much easier than in OKL4, and as such, Singularity should be a superior platform (trusted computing base) for safety- and security-critical systems. Right?
Wrong.
The above argument ignores one important fact: Singularity isn’t all written in Sing#. Why not?
For one, every OS kernel requires code that is inherently type-unsafe. You can’t avoid pointer arithmetic in the kernel, you can’t avoid raw byte copies (for message passing, saving context etc), you can’t avoid jump tables (for exception vectors). These are inherently unsafe operations that cannot be expressed in a type-safe language. Hence, Singularity is also forced to use low-level code written in C, assembler or something similar.
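To see what I mean, here are a few made-up but representative fragments of the kind of code every kernel contains, none of which a type system can bless without an “unsafe” escape hatch:

```c
/* Illustrative fragments only -- typical kernel code that is inherently
 * type-unsafe, regardless of which kernel it comes from. */

typedef unsigned long word_t;

/* 1. Raw address arithmetic and bit-fiddling, e.g. installing a
 *    page-table entry. */
void install_pte(word_t *page_table, unsigned index, word_t frame, word_t flags)
{
    page_table[index] = (frame & ~0xFFFUL) | flags;
}

/* 2. Raw word copies of register state on a context switch -- the saved
 *    area is just a blob of words, not a typed object. */
void save_context(word_t *dst, const word_t *regs, unsigned nregs)
{
    for (unsigned i = 0; i < nregs; i++)
        dst[i] = regs[i];
}

/* 3. A jump table of exception handlers, indexed by a hardware-supplied
 *    vector number. */
typedef void (*handler_t)(void);
extern handler_t vector_table[256];

void dispatch(unsigned vector)
{
    vector_table[vector]();
}
```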
But what they really forget to tell you is that Sing#, as a managed language, needs significant run-time support. This isn’t just all the usual libraries, some of which probably require type-unsafe implementations for performance. It also includes the garbage collector. A big, ugly piece of tricky, unsafe code. And you can be sure it’s big. Certainly compared to OKL4.
Galen Hunt, the leader of the Singularity project, in private conversation freely admitted to me that “the amount of dirty code in Singularity is probably larger than all of L4”. (Where “probably” is an understatement, my estimate is that the garbage collector itself is several times OKL4’s code size.)
So, while formally verifying the Sing# implementation might not be all that difficult, verifying the dirty bits certainly is a much bigger job than verifying the complete OKL4 kernel. This is why you won’t be seeing a fully formally-verified Singularity any time soon. But you will see a fully formally-verified L4 kernel this year, and the commercial OKL4 version soon after.
The implication for security, safety and reliability should be obvious. A fully-verified kernel (i.e. OKL4) will provide a truly trustworthy foundation on which to build highly safe and secure systems. Singularity can’t do this in the foreseeable future.
I get asked this question a lot: what is the difference between a hypervisor and a microkernel? Frequently the question is accompanied by competitor-planted bullshit such as: isn’t it better to use a hypervisor for virtualization, as it is specifically designed for that, while a microkernel isn’t? But the question also pops up at scientific meetings, such as this week’s IIES workshop.
The short answer is that a microkernel is a possible implementation of a hypervisor (the right implementation, IMHO), but can do much more than just provide virtual machines.
For the long answer we have to dig a bit deeper, as the two have different motivations:
- A hypervisor, also called a virtual-machine monitor, is the software that implements virtual machines. It is designed for the sole purpose of running de-privileged “guest” operating systems on top (except for the deceptive pseudo-virtualizers). As such it is (or contains) a kernel (defined as software running in the most privileged mode of the hardware).
- A microkernel is a minimal base for building arbitrary systems (including virtual machines). It is characterised as containing the minimal amount of code that must run in the most privileged mode of the hardware in order to build arbitrary (yet secure) systems.
So the primary difference between the two is purpose, and that has implications on structure and APIs.
By definition (the generality requirement), a microkernel can be used to implement a hypervisor. This is what we are doing with OKL4, and it has been done with various members of the L4 microkernel family for over ten years. In fact, the 1997 SOSP paper by Härtig et al was the first to demonstrate a high-performance para-virtualized main-stream OS (Linux), and there is no published data on a para-virtualized Linux on ARM processors that out-performs OK Linux. Obviously, (well-designed) microkernels are an excellent base for hypervisors.
How about the other way round? Can a hypervisor be used to implement a microkernel?
In general not. As said above, a hypervisor is designed for a single purpose, and that is to run guest OSes. It could be used to virtualize a microkernel, but that isn’t the same (and would certainly result in sucking performance).
The reason is that a hypervisor generally lacks the minimality of a microkernel. While less powerful (in the sense that it doesn’t have the generality of a microkernel) it typically has a much larger trusted computing base (TCB) than a microkernel. It contains all the virtualization logic, and all physical device drivers needed to support the virtual machines. For example, the Xen hypervisor itself is about 5–10 times the size (in LOC) of the OKL4 microkernel. In addition, it has the privileged special virtual machine “Dom0”, which contains a complete Linux system, all part of the TCB (which is therefore of the order of a MLOC). Compare this to OKL4 which lets you run security-critical code with a TCB as small as 15kLOC. A small TCB is important for safety, security and reliability (it’s a consequence of the security principle of least authority, POLA), and as such especially important in mission-critical embedded systems.
So, what about the people who claim that “virtual machine monitors are microkernels done right”, as (Xen co-inventor) Steven Hand did tongue-in-cheek at HotOS’05? Steven essentially claims that the microkernel folks have been focussing on the wrong things, such as fast IPC. I debunked his arguments in a follow-up paper. At that HotOS workshop I also predicted that the VMM folks only pretended that IPC didn’t matter, and that within two years they would be writing papers about fast communication between VMs (a problem microkernels solved 15 years ago). I was right, of course.
And the reality is that hypervisors are starting to become more like microkernels. People are starting to discover that virtualization by itself doesn’t solve many problems, particularly the security and reliability issues resulting from mushrooming complexity. (For an overview of those issues see my recent paper on the role of virtualization in embedded systems.) Anyone who has recently heard a talk by VMware founder Mendel Rosenblum knows what I mean. In fact, when he gave his keynote at last year’s Usenix conference, he was asked (not by a microkernel guy!) “aren’t you re-inventing microkernels?” Touché…
If even the “memory is cheap” server folks are discovering the importance of a small TCB, people who want to use virtualization in embedded systems should certainly take notice.
In summary, microkernels have demonstrated that they can do what hypervisors can. But hypervisors are far away from doing what microkernels can. And the most powerful of those capabilities is making your TCB truly trustworthy. For the foreseeable future, this is only possible with microkernels.
The conclusion seems obvious: microkernels are virtual-machine monitors done right, and more.
To learn more, see some of my recent white papers:
- Secure embedded systems need microkernels
- Your system is secure? Prove it!
- Towards trustworthy computing systems: taking microkernels to the next level
as well as my recent blogs.
Security and safety are important for all embedded systems, not just those which deal with money or national security.
This is why you’ll be seeing more stories like the one of the researchers who hacked an implantable defibrillator (ICD) and demonstrated that you could kill the wearer remotely (although only from a short distance).
What’s really behind this is that embedded systems are changing from encapsulated closed devices to networked (and frequently open) systems. And as soon as a system has any form of wireless connectivity, it is subject to new classes of attacks, like it or not. You’ll see the same thing happening with cars, aeroplanes, home entertainment systems, you name it. As soon as they get networked, crackers will get in there.
The problem is that most of these devices aren’t designed with security in mind. In many cases a networked system starts off as the new model of a non-networked one. And, as you can imagine, the internal software architecture doesn’t change much. After all, it’s just a maintenance interface (which over time becomes a convenience function etc). This is apparently what happened to the ICD. And the designers made it easy for the black hats by using a completely insecure communication protocol.
The reality is that most embedded systems these days hold assets that must be protected (just think about it: which of the devices you own does not contain data that would help an identity thief?) or can cause damage if they misbehave. This is just another way of saying what I said above: security and safety are relevant for all embedded systems.
And security and safety aren’t something to add later. If it isn’t designed into the system, it’s virtually impossible to achieve. This must be recognised in building embedded systems: they need to be designed for security and safety.
How do you do this? By a defensive structure, where faults in one component are prevented from propagating into other components. And by clearly identifying all security assets, and putting in place the means to protect them. First of all, this means applying the principle of least authority (POLA): a component that has no need to access certain data should not be able to. And components on which security or safety depends must be minimised and protected: the system needs a minimal trusted computing base (TCB). And a security policy must be in place that controls where information can and cannot get to (and which cannot be circumvented). And the TCB must actually be trustworthy.
This can only be done by structuring a system into small and mutually protected components, which communicate only via well-defined channels, subject to the system’s security policy. And that policy must be implemented by a minimal TCB.
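As a purely hypothetical illustration (the names and data structures are invented for the example, not an actual OKL4 interface), the core of such a policy can be as simple as a table of explicitly granted channels, checked on every send:

```c
/* Hypothetical sketch: components may only communicate via channels the
 * (small, trusted) policy component has explicitly granted. */

#include <stdbool.h>

typedef unsigned component_id_t;
typedef unsigned channel_id_t;

struct channel {
    component_id_t sender;
    component_id_t receiver;
    bool           granted;    /* set only by the policy component */
};

#define MAX_CHANNELS 64
static struct channel channels[MAX_CHANNELS];

/* Checked on every send: no grant, no communication.  Because the
 * components are isolated by the kernel and all traffic passes this
 * check, the policy cannot be circumvented. */
bool may_send(component_id_t from, channel_id_t ch, component_id_t to)
{
    return ch < MAX_CHANNELS
        && channels[ch].granted
        && channels[ch].sender   == from
        && channels[ch].receiver == to;
}
```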
The way to achieve this is by basing your system on a secure, high-performance microkernel. Secure embedded systems need microkernels. And there’s no better one than OKL4.
A: One protection is both different…
I’m of course referring to the latest twist in the microkernel story: Singularity vs L4. The recent public release of the Singularity source by Microsoft has generated some attention, and invited some comparisons with L4. It has also “enriched” the blogosphere with blatantly incorrect statements such as “the use of SIPs effectively eliminates the overhead traditionally incurred by context-switching in conventional microkernels.”
So, what’s going on?
L4 and Singularity can be considered as the leading microkernel representatives of two alternate approaches to protection: hardware or software mechanisms. The L4 approach is to use hardware mechanisms (processor modes and the MMU/MPU) to protect processes (and their data) from each other. This is the standard way for providing protection in operating systems, and the idea of microkernels based on this approach goes back to Brinch Hansen’s Nucleus (1970).
The alternative, used by Singularity, is to use software means, specifically language-based protection (using the type system) and static analysis. While most people think this is a revolutionary idea, it is actually quite old. The approach has been used by the Burroughs mainframes since 1961! (I used a B6700 a lot in a former life, and it was sloooooow!)
There is no such thing as a free lunch, and protection is no exception.
The cost of hardware-based protection comes in the form of system-call exceptions (to change the execution mode from user to kernel, aka “trapping into the kernel”) and the cost of setting up, maintaining and tearing down mappings (manipulating page tables and the MMU). Kernel entry costs anything from one cycle (Itanium) via a dozen or so cycles (decent RISC processors) to many hundreds of cycles (x86). Mapping-related costs come mostly in the form of handling TLB misses (typically a few dozen cycles). For most loads (exhibiting decent locality) the mapping overhead is low, except on processors without a tagged TLB (x86 is again the main offender), where the TLB is completely flushed on each context switch (leading to many TLB reloads just after the switch).
How about software, à la Singularity? Surely, there are no such overheads, as invoking the kernel is just a function call, and a process switch is just a thread switch without a change of addressing context? That’s true, but many overlook the fact that this is balanced by other overheads resulting from language-based protection: a run-time range check must be applied for each pointer-dereference or array-indexing operation, unless one can statically (i.e. at compile time) determine that it is safe (this is where static analysis comes in). All those run-time checks obviously cost. On top of this is the cost of garbage collection, as type-safe languages (like Sing#) have automatic memory management (this is also an aspect of language-based protection).
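As a rough illustration (not Sing# itself, just equivalent C), here is what such a compiler-inserted check amounts to for an array access that the compiler cannot prove safe:

```c
/* Illustration only: the check a type-safe language pays on every array
 * access that static analysis cannot prove in-bounds. */

#include <stddef.h>
#include <stdlib.h>

struct checked_array {
    int    *data;
    size_t  length;
};

/* The programmer writes a[i]; what executes is effectively this: */
int checked_load(const struct checked_array *a, size_t i)
{
    if (i >= a->length)   /* bounds check on every access */
        abort();          /* stand-in for raising an exception */
    return a->data[i];
}
```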
So, both schemes have a cost. Which is higher? That’s really impossible to say a priori, and in general depends on the nature of the computation. The main reason is that you compare a (relatively small) overhead that occurs all the time (most statements of the programming language) against a much higher overhead that occurs much less frequently (invoking system calls and performing context switches).
The only way to tell is to build the systems and then benchmark against each other. It is thus just a little bit curious that the Singularity folks (given that they routinely refer to Singularity as a microkernel) never benchmark it against the top-of-the-line microkernel, i.e. L4. Instead they publish benchmarks comparing with Windows, FreeBSD and Linux. What does this prove?
How should the two be compared? Well, there are two extreme cases we can look at, and, as it happens, the information is all there if you know where to look for it.
The first is to measure the overhead imposed on normal programs by the run-time checks, garbage collection, etc. This is the extreme L4-friendly benchmark, as obviously the cost to a program running on L4 would be zero. The friendly Singularity folks have conveniently measured a lower limit of this cost, published in their white paper Singularity: Rethinking the Software Stack: They find that turning off compiler run-time checks gains 4.7% performance. Note that this is a lower limit on the overhead any code (OS as well as application) will experience on Singularity. The real cost will be higher, as their test probably still had the garbage collector active, and almost certainly didn’t apply all the additional compiler optimisations that are possible once the runtime checks are removed. So, in summary, running any code on Singularity gives you a slowdown of at least about 5% compared to L4. This is for x86, but the processor architecture is unlikely to have a significant effect on this.
The other extreme is to look at a scenario where two processes do nothing but sending messages between each other. This is the extremely L4-unfriendly benchmark, as it is dominated by the cost of hardware protection (kernel entry and exit, switching addressing context), none of which applies to Singularity. The paper Sealing OS Processes to Improve Dependability and Safety measures this as “message ping/pong” (Table 8) and “IPC costs” (Table 9). As I said, they only show measurements for Singularity, Windows, FreeBSD and Linux, not the real benchmark for microkernel performance. Fortunately, L4 numbers are available too, and have been for years: The Karlsruhe team benchmarked L4 IPC on an AMD-64 machine. (It isn’t exactly the same box which was used in the Singularity benchmarks, but it is close enough, 1.6GHz vs 2.0GHz and the same architecture.) The result: cross-process IPC in Singularity costs 803 cycles for one byte, 933 cycles for four bytes, while L4 does it in 230 cycles for up to 24 bytes. Actual time: 402/467ns for Singularity, 144ns for L4.
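For what it’s worth, the quoted times are consistent with the cycle counts, assuming (as I read the figures above) that the 2.0GHz box is the Singularity one and the 1.6GHz box the L4 one:

```latex
\[
  \frac{803~\text{cycles}}{2.0~\text{GHz}} \approx 402~\text{ns}, \qquad
  \frac{933~\text{cycles}}{2.0~\text{GHz}} \approx 467~\text{ns}, \qquad
  \frac{230~\text{cycles}}{1.6~\text{GHz}} \approx 144~\text{ns}.
\]
```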
In other words, L4 is about three times faster in the case that’s maximally biased in Singularity’s favour! So much for “eliminates the overhead traditionally incurred by context-switching in conventional microkernels”…
Now, this is also on a very L4-unfriendly architecture (x86). Things look much better for L4 on processors with a tagged TLB (which is just about every one other than x86). Such processors, like ARM, MIPS, PowerPC, are prevalent in embedded systems. Embedded systems, especially battery-powered ones, also happen to be particularly sensitive to performance. So the result is clear: L4 yet again wins the performance stakes—hands down!
I will compare the two approaches from the point of view of security and relevance to embedded systems in a future blog.
What does the distinction between “trusted” and “trustworthy” mean for computer systems?
People tend to talk of a trusted computer system when they refer to a system that is trusted to perform security- or safety-critical operations. Unsurprisingly, the military and defence communities have worried about this for a while, and the term is explicitly used in the famous Orange Book, officially referred to as the “Trusted Computer System Evaluation Criteria”. It has now been replaced by the Common Criteria for IT Security Evaluation, or Common Criteria for short.
The Orange Book and the CC define an evaluation process that aims to ensure that the systems they trust to do their safety- or security-critical operations are actually trustworthy. The idea is that systems are subjected to a more-or-less thorough security evaluation, and if they meet certain criteria, they can be certified as trustworthy to a certain assurance level.
This is all fine for expensive military systems, where the odd dozen million dollars for security evaluation doesn’t matter that much. (And it is expensive: the industry estimate is that CC evaluation at the highest level, EAL7, costs $10k per line of code!) But for embedded systems, particularly consumer goods sold for no more than a few hundred dollars, such as the ubiquitous mobile phones, this approach isn’t feasible.
Well, actually it isn’t even good enough for what it’s designed for. The expensive evaluation certainly will make you sleep better if you subscribe to the theory that anything expensive must be good, so something very expensive must be very good, right? If you’re a bit more of a sceptic, you might be interested in looking at what CC actually gives you. It turns out that, besides a nice stamp of approval, it gives you no security guarantee whatsoever. It’s a glorified ISO-9000 process. Even at the highest level. If you don’t believe me, have a look at the relevant wikipedia article. Or my recent white paper “Your System is Secure? Prove it!”
At OK Labs we are going in a direction which makes much more sense. On the one hand, we are making systems more trustworthy by minimising their trusted computing base (TCB). If the security-critical code base is small (and with OKL4 it can be as small as 20,000 lines), then it is inherently less faulty than something that’s hundreds of thousands of lines of code, even if those have gone through an expensive process producing reams of printed paper. This is achieved by our OKL4 microkernel technology, the hottest thing on the planet (but I may be a bit biased ;-)). The OKL4 microkernel provides a minimal basis for secure systems. And it supports virtualization, so you can run a complete operating system (such as Linux) in a virtual machine, without having to trust it. And all that at negligible performance overhead.
So, small is beautiful as far as security is concerned. But it goes further. As explained in another white paper, OKL4 is small enough that it is possible to prove that it is secure. Using sound and solid (but slightly non-trivial) maths—the next hottest thing on the planet. And we don’t just prove things about some abstract model of the system, we prove the actual C/assembler code. Nothing short of this gives you a guarantee that your system is trustworthy. And you shouldn’t have to rely on less.
I’m sure you’ve all seen this phenomenon, which I’ll call everything-is-a-taco. Assume that for some reason tacos are suddenly very popular. The next thing that happens is that any sort of fast food starts being marketed as a taco—you get hamburgers that look and taste like tacos, pizzas that look and taste like tacos and salads called taco fiesta… you get the picture.
Similar things happen in the high-tech world. System virtualization is a big thing these days in the enterprise world; everybody seems to be into it. Well, at least the term is usually applied correctly.
Not so outside the enterprise space. Virtualization for embedded systems is receiving increasing attention, and the everything-is-a-taco phenomenon is clearly there. Some people are spreading confusion by calling a hamburger a taco.
Let me explain. As I have discussed in a recent white paper, a core characteristic of virtualization is that it provides multiple virtual machines, each of them looking like a real machine. The virtual-machine monitor, or hypervisor, creates this illusion by controlling all system resources. The virtual machines only get to see virtual resources, and the hypervisor controls how these are mapped to physical resources.
This separation of resources is behind the popularity of virtualization: it strongly isolates virtual machines from each other. It is what enables all the cool uses of virtual machines in the enterprise space, such as server consolidation, load balancing through live migration, firewalling off applications which are at risk of being compromised, etc, etc. Isolation in a virtual-machine environment is not optional, it is inherent.
Would someone get away with marketing as “virtualization” a setup where several operating systems (say Linux and Windows) are intermingled on the same processor, each running in kernel mode with full access to all physical memory? Where each bug in one OS would take down all the others? Would they not rather get laughed out of the room for suggesting this approach? Such a setup would be properly called “OS co-location”. Have you ever heard of it in the enterprise marketplace? Or seen it sold as “virtualization”? Of course not, because it wouldn’t be of much use.
Yet exactly this is happening in embedded systems, where some vendors are brazenly selling OS co-location as virtualization. It is not. Virtual-machine isolation is not optional, it’s inherent. A hamburger is not a taco, and OS co-location is not virtualization, if anything it’s pseudo-virtualization.
And why would you want it, if you can have embedded virtualization technology, such as OKL4, that gives you real virtualization at negligible overhead? I have never seen a use case where pseudo-virtualization has any real benefit over virtualization.