Microkernels Really Do Improve Security

2018/08/23

Many of us operating systems researchers, going back at least to the US DoD’s famous 1983 Orange Book, have been saying that a secure/safe system needs to keep its trusted computing base (TCB) as small as possible. For system of significant complexity, this argues for a design where components are protected in their own address spaces by an underlying operating system (OS). This OS is inherently part of the TCB, and such should itself be kept as small as possible, i.e. based on a microkernel, which comprises only some 10,000 lines of code. Which is why I have spent about a quarter century on improving microkernels, culminating in seL4, and getting them into real-world use.

Monolithic vs microkernel-based OS structure.

It is intuitive (although not universally accepted) that a microkernel-based system has security and safety advantages over a large, monolithic OS, such as Linux, Windows or macOS, with their million-lines-of-code kernels. Surprisingly, we lacked quantitative evidence backing this intuition, beyond extrapolating defect density statistics to the huge TCBs of monolithic OSes.

Finally, the evidence is here, and it is compelling

Together with two undergraduate students, I performed an analysis of Linux vulnerabilities listed as critical in the Common Vulnerabilities and Exposures (CVE) database. A vulnerability is tagged critical if it I easy to exploit and leads to full system compromise, including full access to sensitive data and full control over the system.

For each of those documented vulnerabilities we analysed how it would be affected if the attack were performed against a feature-compatible, componentised OS, based on the seL4 microkernel, that minimised the TCB. In other words, an application running on this OS should only be dependent on a minimum of services required to do its job, and no others.

We assume that the application requires network services (and thus depend on the network stack and a NIC driver), persistent storage (file system and a storage device driver) and console I/O. Any OS services not needed by the app should not be able to impact its confidentiality (C), integrity (I) and availability (A). For example, an attack on a USB device should not impact our app. Such a minimal TCB architecture is exactly what microkernels are designed to support.

The facts

The complete results are in a peer-reviewed paper, I’m summarising them here. We find that of all of the 115 critical Linux CVEs, we could analyse 112, the remaining 3 did not have enough information to understand how they could be mitigated. Of those 112:

33 (29%) were eliminated simply by implementing the OS as a componentised system on top of a microkernel! These are attacks against functionality Linux implements in the kernel, while a microkernel-based OS implements them as separate processes. Examples are file systems and device drivers. If a Linux USB device driver is compromised, the attacker gains control over the whole system, because the driver runs with kernel privileges. In a well-designed microkernel-based system, only the driver process is compromised, but since our application does not require USB, it remains completely unaffected.
A further 12 exploits (11%) are eliminated if the underlying microkernel is formally verified (proved correct), as we did with seL4. These are exploits against functionality, such as page-table management, that must be in the kernel, even if it’s a microkernel. A microkernel could be affected by such an attack, but in a verified kernel, such as seL4, the flaws which these attacks target are ruled out by the mathematical proofs.
Taken together, we see that 45 (40%) of the exploits would be completely eliminated by an OS designed based on seL4!
Another 19 (17%) of exploits are strongly mitigated, i.e. reduced to a relatively harmless level, only affecting the availability of the system, i.e. the ability of the application to make progress. These are attacks where a required component, such as NIC driver or network stack, is compromised, and as a result compromising the whole Linux system, while on the microkernel it might lead to the network service crashing (and becoming unavailable) without being able to compromise any data. So, in total, 57% of attacks are either completely eliminated or reduced to low severity!
43 exploits (38%) are weakly mitigated with a microkernel design, still posing a serious security threat but no longer qualifying as “critical”. Most of these were attacks that manage to control the GPU, implying the ability to manipulate the frame buffer. This could be used to trick the human user into entering sensitive information that could be captured by the attacker.

Only 5 compromises (4%) were not affected by OS structure. These are attacks that can compromise the system even before the OS assumes control, eg. by compromising the boot loader or re-flashing the firmware, even seL4 cannot defend against attacks that happen before it is running.

Effect of (verified) microkernel-based design on critical Linux exploits.

What can we learn from this?

So you might ask, if verification prevents compromise (as in seL4), why don’t we verify all operating systems, in particular Linux? The answer is that this is not feasible. The original verification of seL4 cost about 12 person years, with many further person years invested since. While this tiny compared to the tens of billions of dollars worth of developer time that were invested in Linux, it is about a factor 2–3 more than it would cost to develop a similar system using traditional quality assurance (as done for Linux). Furthermore, verification effort grows quadratically with the size of the code base. Verifying Linux is completely out of the question.

The conclusion seems inevitable: The monolithic OS design model, used by Linux, Windows, macOS, is fundamentally and irreparably broken from the security standpoint. Security is only achievable with a (verified) microkernel design. Furthermore, using systems like Linux in security- or safety-critical applications is at best grossly negligent and must be considered professional malpractice. It must stop.

From → operating systems and virtualization, safety and security, seL4

8 Comments

123 permalink

One question that comes into my mind that could a microkernel design introduce some new attack vectors in addition to mitigating some of the ones that would be present in a monolithic design.

Btw, referring to cvedetails as “the” CVE database seems little inaccurate to me, as CVE is not really a database in itself, but a standard for vulnerability databases and such. But I guess that data eventually gets synced between MITRE’s database, NVD and other databases in a reasonable time. Cvedetails seems to source the data from MITRE, right?

Reply
- Gernot permalink
  
  Correct about the somewhat loose reference to the CVE database.
  
  I don’t how a better systems architecture would enable new classes of attacks, minimising the TCB has been the accepted way of reducing the attack surface since the Orange Book. But there is a risk that the benefits of architecture are over-estimated: once you actually build a system you might find that it is harder than thought to keep it strictly modular. However, there has been plenty of experience with this, so that risk is low.
  
  Reply
nunya permalink

“Furthermore, using systems like Linux in security- or safety-critical applications is at best grossly negligent and must be considered professional malpractice.”

This seems a little over the top to me since i’m not seeing an OS that is production ready anywhere. What micro kernel-based OS is ready that i can run all the linux software (or feature complete alts) i need to be able to run for firewalls and servers? Genode OS Framework looks pretty cool, but they are too busy focusing on desktop GUI stuff and now mobile (cool, but wrong priorities, imo) to make a feature complete FW/server OS. Maybe i’m mistaken? The object is not to just boot hardware, after all.

Reply
Gernot permalink

Well, if you want full Linux compatibility, you’re stuck with Linux and thousands of vulnerabilities. If waiting for the zero-day is acceptable, then all good, but that seems the wrong strategy for something security *critical*. And for *safety-critical* use cases, microkernels are pretty much the norm and have been for yonks. That doesn’t stop some folks from putting Linux onto medical devices that can kill, and that’s definitely malpractice.

Also, you need nowhere near full Linux functionality to run a firewall. In fact, some people are working on doing this with seL4. And I believe QNX is running inside CISCO products.

And regarding “servers”, depends what kind of servers. For Cloud hosting, many folks use Xen, which isn’t quite a microkernel but not that far away (about a factor 10 bigger than seL4).

Reply
- nunya permalink
  
  I should have been more specific i guess. I’m really talking about the applications available in the linux ecosystem for a fw/router/wap, etc (nftables, ssh, dnscrypt-proxy, dnsmasq, hostapd, etc) or server (web/samba/xmpp/voip/email, etc). do these or suitable (full featured) alternatives exist for all of this? if not, i don’t see that there is much of a choice for these use cases.
  
  Reply
Mo Li permalink

Dear Prof Gernot,
As author said, the effect of CVE-2015-4001(a usb driver bug) can be eliminated with microkernel design because the affect would be confined to the specific driver.

However, CVE-2016-7912 is also a USB bug, and it is tagged as CA, that is, with microkernel design the analysed hypothecal security critical process still suffers Credentiality and Avalability problems. Why is CVE-2016-7912 different from CVE-2015-4001?
Do you have analysis record for each CVE listed in the paper?
Thank you very much!

Reply
- Gernot permalink
  
  After more than 5 years it’s difficult to reconstruct details. I do have an evaluation spreadsheet I could upload when I get the time…
  
  Reply

Trackbacks & Pingbacks

Microkernels Really Do Improve Security - Cog

Microkernels Really Do Improve Security