10 Years seL4: Still the Best, Still Getting Better

2019/08/06

A week ago, on 29 July, we at Trustworthy Systems celebrated seL4 Freedom Day, the 5^th anniversary of the open-sourcing of seL4. Furthermore it was seL4’s 10^th birthday – the anniversary of the completion of the first functional-correctness proof (of seL4 and of any operating system), showing that the implementation was bug-free.

This anniversary prompts me to reflect on what happened with seL4 in those 10 years, what we have achieved with it, how it evolved, and what the on-going challenges and future directions are. I hope I’ll find the time to follow this up with more in-depth examinations of some of those developments.

seL4 was build from the beginning with three objectives in mind, all of equal importance:

to have its implementation formally proved correct (which is what we mean when we say “verified”)
to solve long-standing problems with resource management in microkernels (and OSes in general)
to be suitable for real-world use.

I’ll look at how seL4 addresses all three.

The Proofs

The defining feature of seL4 is clearly its verification story, starting with the functional correctness proof. And, amazingly, it’s still fairly unique in this respect, much more so than I had assumed when we first created it. I had estimated we’d have a window of 3–5 years before someone created another verified OS kernel. Turned out I totally over-estimated the competition 😉.

While there are now a number of other OSes around with some sort of a code verification story, they are almost all toys (notwithstanding what their authors might be claiming). The only verified OS I have heard of that seems to have a defensible claim of being suitable for real-world use is the recently verified PROVENCORE system from French company Prove&Run. And even this system has a much narrower target domain: it’s designed for use in TrustZone-based trusted execution environments (TEEs), with a static architecture, and gets away with a very simple protection model. seL4, in contrast, is fully dynamic, and can support arbitrary secure system architectures, thanks to its flexible capability-based access-control model.

More proofs

The original proof showed that seL4’s C code was a correct implementation of its formal specification (mathematically speaking the code is a “refinement” of the spec). This was done for an ARM11 processor (32-bit Arm v6 architecture). There were a few missing bits and pieces, most have been completed by now.

The main remaining hole is that the kernel’s boot code is still not verified. This means that our proofs say that if the kernel boots into a safe state, it will remain in a safe state. The boot-code verification needs to show that the kernel actually gets into a safe state initially. Clearly this proof is overdue (and we are working on it as a background activity).

Beyond that, much progress has happened on the verification side:

seL4 proof chain.

We proved (in 2011) that the kernel is able to enforce integrity, i.e. prevent a subject from modifying data without explicit authorisation. This shows not only that the spec is correctly implemented, but that it has the desired property (i.e. integrity enforcement).
We proved (in 2013) that the separation-kernel configuration of seL4 enforces confidentiality in the (very strong) information-flow sense, meaning it can prevent unauthorised information leakage, including through covert storage channels. This is the complement of the integrity property, showing that (direct or indirect) reads only succeed if authorised. Note that the formalism has no notion of time, so this infoflow property cannot preclude timing channels (timing-channel prevention are one of our present foci).
We developed (in 2013) a translation-validation toolchain which proves that the binary (produced by the compiler and linker) is a correct translation of the verified C code. This means we do not have to trust the compiler (we use gcc), nor our assumptions on C semantics, drastically reducing the trusted computing base (TCB) of our verification.

Together, these proofs show that not only is the kernel’s implementation free of bugs, and that the compiler did not introduce bugs on its own, but that the kernel enforces security in a very strong sense, and that property applies to the binary that is executing on the silicon.

We finally performed (in 2011) a complete and sound worst-case execution-time (WCET) analysis of seL4. To my surprise, this was the first such analysis performed of a protected-mode OS kernel in the literature, and is a core ingredient making seL4 suitable (and superior to any other system) for use in safety-critical real-time systems where not all code is trustworthy (so-called mixed-criticality systems, MCS). We have meanwhile improved this analysis, using the translation-validation framework to link properties proved about the source code to the binary. This improves the assurance of the whole process, and allows us to leverage our correctness proofs to bound loops and eliminate infeasible paths in the binary.

More architectures

The original proofs were done for Arm v6, and have since been ported to Arm v7. We added verification of the Arm v7a virtualisation support (“hyp mode”) in April’17. Arm v7a remains the architecture with the most complete verification story. This made seL4 a verified hypervisor.

In July’18 we completed the functional correctness proofs for x64, making this the first verified 64-bit version of seL4.

And before the end of this year we expect to complete the functional-correctness proofs for the 64-bit RISC-V architecture, as well as the translation-validation toolchain that carries those proofs through to the binary.

Still to do

There are a number of things still to be completed. I have mentioned kernel initialisation earlier. There are also a number of parts that were not verified to the same degree as the rest of the kernel, in particular MMU management. A PhD thesis on formalising the MMU has just completed, so this is mostly done.

We are also working on verifying the multicore version of seL4. This is challenging because of seL4’s uncompromising design for performance, which means that the implementation is racy. Removing races would make the problem far easier to solve, but we won’t accept the performance compromises this would imply. After all, our motto is “security is no excuse for poor performance”, a core differentiator to other high-assurance systems.

Resource Management

We developed seL4, starting in 2004, on the back of 10 years of experience with high-performance L4 microkernels, including the OKL4 kernel from Open Kernel Labs, which developed out of our L4-embedded kernel and ended up on billions of Qualcomm modem chips. Another fork of L4-embedded is what now runs on the Secure Enclave of all Apple iOS devices.

Issues of earlier L4 kernels

During that time we learnt about a number of shortcomings in the original L4 design. Several of those were already fixed in our L4-embedded kernel (a fork of the Karlsruhe Pistachio kernel), including overcoming the restrictive nature of the overloaded synchronous IPC, and complementing it with an asynchronous notification mechanism. The IPC model kept evolving with seL4, until it finally settled into a protected procedure call, complemented by Notifications which are binary semaphores.

The removal of a number of non-minimal features, including “long” IPC, and IPC timeouts, had already begun with L4-embedded and was adopted in seL4. We further eliminated a number of implementation tricks that had outlived their usefulness, including the virtual TCB array, and the process-kernel design.

By the time we started with seL4, the L4 community had reached a consensus that the model of addressing IPC to threads needed to be replaced by capability-based addressing and port-like IPC endpoints. (This was helped along by Jon Shapiro’s observation that the traditional L4 model had covert storage channels.)

Capabilities also cleanly solved another issue with original L4, that of limiting communication. The original model relied on an (inflexible) process hierarchy and redirection to a monitor process (“chief”) to limit data flow. Capabilities provide a cleaner, simpler and low-overhead model: Having a privilege does not in itself imply the ability to share that privilege, an additional grant right is needed to pass on capabilities.

Our 2016 paper discusses these issues in detail.

Untypeds

An issue that remained unresolved pre-seL4 was principled management of kernel memory. Prior L4 kernels had a kernel heap for allocating kernel data structures, which lead to poor isolation and the potential for denial-of-service (DoS) attacks; some kernels introduced kernel memory pools and quota as an unconvincing fix. Kevin, who had experimented with various models for years, developed the present seL4 model: The kernel has no heap whatsoever, and a strictly bounded stack; it never allocates memory after boot. Instead, user-level code is required to explicitly provide memory to the kernel for any operation that requires meta-data allocation. The mechanism for doing this is seL4’s Untyped memory, and retyping it onto kernel-object types.

This model of retyping Untyped memory is extremely powerful. It is a very simple abstraction for managing (spatial) resources: A security domain can only create kernel objects out of whatever amount of Untypeds it is given. In particular, this makes it impossible for an agent to interfere with a domain’s ability to create kernel objects, thus avoiding DoS attacks by construction.

seL4’s memory management model extends user partitions into the kernel

Furthermore, the need for usermode to provide working memory to the kernel means that partitioning userland will implicitly lead to partitioning of kernel memory, a very powerful property. This has been a major enabler of our isolation (integrity and confidentiality) proofs.

The downside is that the model is tricky to use and makes it easy to shoot yourself in the foot (as many students will attest). However, it is not the job of the microkernel to protect your foot from yourself, it is to protect you from malicious code by providing strong isolation. This is exactly what seL4 does extremely well.

Time

The biggest remaining hole in the resource management story of the original seL4 had to do with managing time (as stated in the 2016 paper). L4 traditionally had a weak (if any) temporal isolation story, and at the time of the design of seL4, this was left in the too-hard basket, and we used a fairly naive approach to scheduling. This was always meant to be addressed later.

Temporal integrity

We fixed some minor issues when performing the WCET analysis of the kernel, including replacing the (in average fast but worst-case extremely slow) “lazy scheduling” by Benno scheduling (which is as fast in average without a pathological worst case).

But the main problem remained: no accurate accounting for time, especially with shared servers. This meant that seL4’s support for hard real-time (RT) systems was limited: only in a narrow set of scenarios could it provide the required integrity guarantee, namely that a critical thread will be able to meet its deadline.

“Passive’ servers execute on a client’s donated scheduling context.

We finally addressed this cleanly by evolving the model of scheduling contexts (developed early by the Dresden group for a pre-capability kernel) and integrating it with seL4’s capability model. This finally made time just another resource authorised by capabilities, and thus a first-class resource managed in a principled fashion. Scheduling concepts are now first-class kernel objects; they represent the right to access a share of the CPU.

This model is implemented in the MCS kernel, called so because it provides the right mechanism for supporting MCS, including the ability to guarantee time to critical threads in the presence of untrusted high-priority threads. It is the first capability-based model of managing time in a way that is suitable for hard real-time systems (together with the concurrent work on the Composite OS; KeyKOS “meters” were also time capabilities but not suitable for hard RT).

The MCS kernel is, for now, in a branch. The reason is our commitment that changes to the mainline kernel must not break verification. The MCS model requires very invasive changes to the kernel, which means re-verifying is a lot of work. However, this work is nearing completion, and we expect MCS to be merged into mainline by the end of this year or early next.

As it is the future of seL4, and has a number of other improvements that simplify many use cases, we strongly recommend new developments to be based on MCS.

Timing channels

The MCS kernel solves temporal integrity, but confidentiality remains unresolved: the kernel cannot prevent information leakage through timing channels.

Of course, in this respect seL4 is no worse than other general-purpose OSes, but in seL4’s case it’s the last remaining security problem, while others have bigger issues to deal with. In fact, given our experience with trying to stop leakage on commercial hardware, I strongly suspect that all “high-assurance” separation kernels (which are based on strict space and time partitioning, SATP) leak as well.

Time protection partitions a system, including the kernel.

Our recent work on time protection indicates that we can solve this problem in seL4, by temporally or spatially partitioning all shared hardware resources, we partition even the kernel. This assumes that the hardware manufacturers get their act together and provide mechanisms for resetting all time-shared resources (and I’m working on the RISC-V Foundation to ensure that RISC-V does).

However, so far our time-protection is a set of primitive mechanisms. They require a proper integration with the seL4 model, which we are working on right now. This will then create a new verification challenge, which we think is solvable.

Design for Real-World Use

Suitability for real-world use really mans two things: generality and performance.

Generality

For a microkernel, this means keeping it as much as possible policy-free, and instead providing primitive mechanisms that are general enough to support the construction of virtually arbitrary systems on top.

Core to this is seL4’s capability-based access-control model (strongly influenced by KeyKOS and EROS), with its support for efficient delegation. And it required solving the resource-management problems that had plagued L4 (and other) microkernels in the past. Details of this model have evolved (as discussed above), the principles and “look-and-feel” have remained.

Whether we hit the right design point is still undecided. For years we were busy supporting various security- and safety-critical deployments (and evolving the system to improve this support) that we have not focussed very much on truly general systems, including flexible multi-server architectures. This is on-going work, which recently has accelerated.

Performance

When we started the seL4 project, my message to the team was unambiguous: “I will not consider this project a success if we lose more than 10% in IPC performance against the fastest kernel we’ve built to date.”

At the time of the first publication on seL4 (SOSP, Nov’09), the verified kernel was right at that 10% limit (the unverified version was faster and almost on par with our previous ARM kernel). Further work on verification-friendly micro-optimisation of the IPC fastpath resulted in the verified seL4 outperforming all our previous kernels. And that means verified seL4 outperforms any microkernel. In almost all cases that’s by about a factor of 10 in IPC latency. The closest in performance is the Fiasco.OC (L4Re) microkernel, which is only about a factor two slower than seL4. As I said earlier: Security is no excuse for poor performance!

Our uncompromising performance focus means that we cannot take shortcuts others regularly take, and is, for example, the reason our (high-performance) multicore version is not yet verified. But it’ll happen!

Real-world deployments

That our design-for-real-use works is shown by the fact that seL4 uptake is accelerating. One might think that ten years are a long time for this, but keep in mind that for the first half of this period, seL4 was locked up in IP arrangements that make it essential useless for anything but an internal research and teaching vehicle. Furthermore, deploying seL4 for enforcing security or safety means that (unless you are developing a system from scratch) you have re-architect your system thoroughly. This is not something people enter into lightly, and it also takes time.

Boeing ULB flying on seL4.

The DARPA HACMS program was a great success in this regard. Not only did HACMS show successful use of seL4 in real-world systems: Boeing’s Unmanned Little Bird (ULB) autonomous helicopter and US Army autonomous trucks.

Cyber-retrofit of ULB.

More importantly, it demonstrated the feasibility of using seL4 to retrofit security into an existing real-world system,
a process dubbed incremental cyber-retrofit by then DARPA I2O Director John Launchbury. This is an approach we (and others) are presently taking to harden a number of real-world systems.

It is in the nature of open source that we only know about a fraction of the projects building on seL4, and it is not unusual that I find out about one of them by accident. And some projects we are involved in are commercial-in-confidence. But there are some I can talk about.

A USB-based secure communication device, which uses military-strength encryption to communicate via Bluetooth or Wifi, has been evaluated and approved for defence use. To my knowledge this is the first time that software (i.e. seL4) enforced isolation was considered equivalent to air gapping by a certification authority. It is already in use in several defence forces.
CDDC displaying windows of different classifications.
The Cross-Domain Desktop Compositor (CDDC) is a device that allows securely connecting a single keyboard, mouse and display to multiple networks of different security classifications or domains. Technically it is a drop-in replacement for a KVM switch, but more powerful, as it supports the concurrent rendering of windows from multiple domains on the same screen. Hardware ensures that each window is decorated according to its classification, and there is a reserved region on the screen that indicates where inputs go to. If so configured, the system allows the operator to cut-and-paste between windows. Previous designs (not using seL4) left certification authorities unconvinced of their security. The CDDC is about to enter a productisation phase.
HENSOLDT Cyber is developing a secure hardware-software system, based on their own secure RISC-V chip, and an seL4-based secure operating system. [Disclosure: I’m chief scientist (software) of HENSOLDT Cyber.] It targets critical systems in defence, avionics, industrial automation and automotive.
A US-based autonomous driving startup is using seL4 to protect the safety of its systems.
There are various seL4-based TrustZone TEEs in development. And the Keystone TEE for RISC-V is based on seL4.
Third parties are running seL4 training courses.
A US DoD-funded organisation has run an seL4 Summit last year and is running another one this year.

This is a selection, there is much more going on, and deployments are definitely accelerating.

The Future

Above I listed number of current engineering projects, including full, verified RISC-V support, and verifying and mainlining the MCS model. On top of that we’re looking for funding to verify the 64-bit Arm v8 port. The differences in the privileged ISA between Arm v7 and v8 are big enough to make this essentially an architecture port, comparable to RISC-V. But the RISC-V verification project resulted in improved abstractions in the proof architecture which should reduce cost.

Then there is the verification of the multicore kernel, which I said is on-going, although slowly. I’m hopeful we’ll be able to accelerate this significantly with external funding. This has significant research challenges to address, besides the proof engineering.

And the there is time protection, where we have developed the core mechanism, but not yet a proper security model integrated into the seL4 API, this is on-going research, as is the verification of the mechanisms.

In this context I’d like to come up with a better (verifiable) confidentiality notion than the sledge-hammer information flow, which is super-restrictive and only applicable to a narrow set of realistic scenarios. We want something that is much more practicable.

Besides that we are working on strongly growing the seL4 community, especially fostering the seL4 open-source community. We have taken a number of steps there, including an RFC process. Much more is in the pipeline, and I hope we’ll be able to make an announcement in the near future.

10 Years seL4: Still the Best, Still Getting Better

The Proofs

More proofs

More architectures

Still to do

Resource Management

Issues of earlier L4 kernels

Untypeds

Time

Temporal integrity

Timing channels

Design for Real-World Use

Generality

Performance

Real-world deployments

The Future

Trackbacks & Pingbacks

Leave a comment Cancel reply

Recent Posts

Categories

Archives

10 Years seL4: Still the Best, Still Getting Better

The Proofs

More proofs

More architectures

Still to do

Resource Management

Issues of earlier L4 kernels

Untypeds

Time

Temporal integrity

Timing channels

Design for Real-World Use

Generality

Performance

Real-world deployments

The Future

Share this:

Trackbacks & Pingbacks

Leave a comment Cancel reply

Recent Posts

Categories

Archives