“Trustworthy Systems Research is Done” – Are You Kidding, CSIRO?


CSIRO, Australia’s national research agency, has just decided to disband the Trustworthy Systems (TS) team, the creators of seL4, the world’s first operating-system (OS) kernel proved correct and secure. TS is widely regarded and admired as the leader in applying formal methods (mathematical proof techniques) to real-world software systems, and arguably the team that put CSIRO’s Data61 on the map internationally.


Why would they cut down their shining example of research excellence, a group with a rare track record of coming up with fundamental solutions to real problems and taking those solutions to the real world?

One of the reasons given by CSIRO is that “seL4 [is] now a mature technology that is ‘well supported’ outside the organisation”.

This claim, that seL4 is a “mature technology” that needs no more research and enjoys sufficient outside support, is stunning on so many levels. For one, the group is not accidentally called “Trustworthy Systems” (and not, say, the “seL4 Research Group”): seL4 is only the starting point for achieving trustworthiness in computer systems. It is as if, over 100 years ago, people had declared the combustion engine a solved problem as soon as it was shown that it could power a car.

The fact is that, while seL4 is mature enough to be deployed in the real world, there is plenty of fundamental research left on seL4 itself, and far more on how to achieve real-world trustworthy computer systems. Just sprinkling a bit of seL4 fairy dust over a system will not make it trustworthy. More on both points below.

In this context it is interesting to note that, just days after we learnt that CSIRO is shutting down the research that specifically aims to stop such attacks (and is arguably the best approach to achieving such protection), the Head of Australia’s Department of Home Affairs warned that the threat of cyber attacks on Australia’s critical infrastructure is “immediate”, “realistic” and “credible”, and could take down the nation’s electricity network. Great timing.

Work to do: seL4

It is true that the seL4 kernel is mature in many ways, good enough to be deployed in real-world systems. It is already in daily use in the real world, and is being designed into many more systems. But that doesn’t mean it’s “done”.

Right now, seL4 solves a number of fundamental security problems, and it provides the best possible solution to these problems. In particular, it provides the strongest possible spatial isolation, in that it guarantees that memory cannot be accessed without explicit authorisation. It also provides strictly controlled communication between subsystems, in that two subsystems (provably) cannot communicate through system calls or memory unless explicitly authorised. And it does this with unbeaten performance. This is more than any other real-world OS can give you.
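The authorisation model behind this spatial-isolation guarantee can be pictured with a toy sketch (plain Python with invented names, not the real seL4 API): a thread can read or write a memory object only if it holds an explicit capability carrying the corresponding right.

```python
# Toy model of capability-mediated memory access, loosely inspired by
# seL4's authorisation regime. Illustrative only -- not the seL4 API.

class Capability:
    """Names an object and the rights the holder has on it."""
    def __init__(self, obj, rights):
        self.obj = obj
        self.rights = frozenset(rights)

class Frame:
    """A memory object."""
    def __init__(self):
        self.data = 0

class Thread:
    """A subject; it can only act through capabilities it holds."""
    def __init__(self):
        self.caps = []

    def grant(self, cap):
        self.caps.append(cap)

    def _authorise(self, frame, right):
        for cap in self.caps:
            if cap.obj is frame and right in cap.rights:
                return
        raise PermissionError(f"no '{right}' capability for object")

    def read(self, frame):
        self._authorise(frame, "read")
        return frame.data

    def write(self, frame, value):
        self._authorise(frame, "write")
        frame.data = value

frame = Frame()
alice, bob = Thread(), Thread()
alice.grant(Capability(frame, {"read", "write"}))
bob.grant(Capability(frame, {"read"}))   # read-only grant

alice.write(frame, 42)
print(bob.read(frame))                   # 42: reading is authorised
try:
    bob.write(frame, 0)                  # no write right: denied
except PermissionError as e:
    print("denied:", e)
```

The point of the sketch is that every access is mediated by an explicit grant; in seL4 this mediation is enforced by the kernel and covered by the correctness proof.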

What seL4 cannot (yet) do, and no other OS can either, is provide temporal isolation guarantees. These come in two guises: an integrity aspect and a confidentiality aspect.

Here, integrity means the ability to guarantee the timeliness of real-time systems, especially mixed-criticality systems (MCS), where critical, high-assurance real-time tasks operate concurrently with untrusted code. seL4’s new MCS model provides temporal integrity to a significant class of MCS, and its verification is ongoing. However, it does not yet fully solve the problem: we found that there are important use cases for which the present MCS model is not sufficient. Ongoing research is addressing this, leading to further improvements of the model.
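The core idea of the MCS model, bounding each task’s CPU consumption with an explicit budget-and-period scheduling context, can be sketched as a toy discrete-time simulation (plain Python with invented names, not the actual seL4 implementation):

```python
# Toy discrete-time sketch of budget-limited execution, in the spirit
# of seL4 MCS scheduling contexts. Illustrative only.

class SchedContext:
    """Grants at most `budget` ticks of CPU per `period` ticks."""
    def __init__(self, budget, period):
        self.budget = budget
        self.period = period
        self.remaining = budget

    def tick(self, now):
        if now % self.period == 0:     # replenish at period boundary
            self.remaining = self.budget
        if self.remaining > 0:         # still has budget this period?
            self.remaining -= 1
            return True                # allowed to run this tick
        return False                   # budget exhausted: preempted

def simulate(ticks):
    """An untrusted task tries to run every tick; count what it gets."""
    sc = SchedContext(budget=2, period=5)
    return sum(1 for now in range(ticks) if sc.tick(now))

# Over 20 ticks (4 periods) the untrusted task gets exactly 2 ticks
# per 5-tick period, leaving the remainder for critical work.
print(simulate(20))   # 8
```

The sketch shows why a misbehaving task cannot monopolise the CPU: its scheduling context caps its consumption, which is what makes temporal integrity guarantees for the critical tasks possible.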

Furthermore, we have not yet developed the formal framework for reasoning about timing guarantees on top of the MCS model. This is, of course, what is needed for making high-assurance MCS a reality, and is a significant research challenge, which is presently unfunded. Again, while we’re ahead of any other system, the world’s emerging cyberphysical systems need more.

Much more work remains on the confidentiality side. Here the problem is to guarantee that there is no information leakage through covert timing channels; this kind of leakage is a serious real-world problem, as demonstrated by the Spectre attacks. Timing channels have long been put into the too-hard basket by most people. Triggered by Spectre, there is now a flurry of activity, but most of it consists of band-aid solutions that address symptoms. In contrast, we are working on a principled, fundamental approach to the complete prevention of timing channels. We call this approach time protection, in analogy to the established notion of memory protection. The feedback from the research community has been strong: the work has already won three best-paper awards. Yet we are only at the beginning of this line of work.

Specifically, we have designed some basic OS mechanisms for providing time protection, and have shown that they can be effective on the right hardware, but also that contemporary hardware is deficient. Presently, with support from the Australian Research Council and the US Air Force, we are working on proving that these mechanisms are effective on suitable hardware. This work has been progressing well, but is now under threat, as CSIRO took the unusual step of returning the Air Force funding.
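The intuition behind these mechanisms can be shown with a toy model (plain Python with invented names and made-up latencies; real time protection operates on hardware caches, not dictionaries): a secret-holding sender leaves a footprint in shared cache state, a receiver measures access latency, and scrubbing the shared state on a partition switch closes the channel.

```python
# Toy model of a cache timing channel, and its prevention by flushing
# shared state on a domain switch (the essence of time protection).
# Illustrative only; real systems deal with hardware caches.

HIT, MISS = 1, 10   # made-up access latencies in "cycles"

def access(cache, line):
    """Return the access latency; a miss loads the line into the cache."""
    if line in cache:
        return HIT
    cache.add(line)
    return MISS

def transmit(secret, flush_on_switch):
    cache = set()
    # Sender encodes its secret in cache state: touch line 0 iff bit set.
    if secret:
        access(cache, 0)
    if flush_on_switch:
        cache.clear()            # time protection: scrub shared state
    # Receiver probes line 0 and observes the latency.
    return access(cache, 0)

# Without flushing, the receiver's latency depends on the secret:
print(transmit(1, flush_on_switch=False))  # 1  (hit:  secret was 1)
print(transmit(0, flush_on_switch=False))  # 10 (miss: secret was 0)
# With flushing, the observation is independent of the secret:
print(transmit(1, flush_on_switch=True))   # 10
print(transmit(0, flush_on_switch=True))   # 10
```

In the toy model a single flush suffices; the hard part on real hardware is that some shared microarchitectural state cannot be scrubbed by the OS at all, which is exactly the hardware deficiency mentioned above.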

We are also working with the RISC-V community on defining appropriate hardware support to allow time protection to do its job. But much more research is needed on the OS side: so far we have some basic mechanisms that work in very restricted use cases. We are far from having an OS model that addresses the large class of systems where timing channels are a security threat. This work is presently unfunded.

And finally, we have not yet solved the problem of verifying seL4 for multicore platforms. While there exist kernels with a multicore verification story, their performance makes them unsuitable for real-world use. Thanks to our past research we now understand how to verify multicore seL4, but we need funding to do it.

So much for seL4 research being “done”. seL4 does define the state of the art, but the state of the art is still a fair way behind the needs of the real world.

Work to do: Scaling trustworthiness to full systems

Beyond seL4, there is the wider Trustworthy Systems agenda: creating a societal shift towards mainstream adoption of software verification, as the TS home page has been saying for years. We have made some progress here, with verification uptake increasing in academia and industry, but it is far from mainstream.

To enable this shift, the team has more concrete research goals. These include:

  • Lower-cost approaches to verifying the non-kernel parts of the trusted computing base, such as device drivers, file and network services, but also the actual applications. So far, verified software is still more expensive to produce than the usual buggy stuff (although its life-cycle cost is probably already competitive). TS’s declared aim is to produce verified software at a cost on par with that of traditionally engineered software;
  • Proofs of high-level security properties of a complete system (as opposed to “just” the underlying microkernel);
  • Proofs of timeliness of a complete real-time system built on seL4;
  • Design of a general-purpose operating system that is as broadly applicable as Linux, but where it is possible to prove security enforcement.

These are all research challenges that remain unsolved, that are of high importance for the security and safety of real-world systems, and that TS is in a prime position to address. DARPA is throwing many millions at scaling up trustworthiness. So much for “research done”.

Tackling big problems was always core to the TS approach. As you can see, we are nowhere near running out of big problems to solve! And we have the track record and credibility to deliver, but we need funding to do it.

Speaking of AI

AI systems are increasingly used in life- and mission-critical settings; autonomous cars are a prime example. But can we trust our lives to an AI system if a hacker can bypass or influence its decisions? Clearly not, and the TS research agenda is very much about enabling such trust.

I can support this with an industry quote, from Dr Daniel Potts of autonomous driving company (and seL4 Foundation member) Ghost Locomotion:

“Ghost is building a trustworthy AI system that will deliver safe self-driving for consumer cars. The company is using formal methods to achieve the reliability required to deploy AI to millions of cars with the guarantee that no harm will be done. AI can only be safely deployed in the field if the underlying system is trustworthy. 

“There is more research required into how to architect, construct and integrate an AI application on a trusted system of this complexity, and how we achieve this at scale. Investing in AI research without investing [in] trustworthy systems research will greatly diminish the impact and applicability of AI to real-world products.”

Clearly, few are better placed than TS to do this research.

Moving forward

The reaction of the community to this crisis facing TS has been incredibly supportive, and there are many discussions about supporting TS and its research agenda. Clearly, the seL4 Foundation is key, and we encourage the community to support the Foundation by joining, by providing direct financial or in-kind support, or by engaging with TS directly.

For now I’m very happy to announce that UNSW’s School of Computer Science & Engineering has committed to support TS to the end of the year!

This is great news, as it gives us the time to line up more pathways and support to ensure the future of TS and its research and tech-transfer agenda. With the community’s help we’ll get there!
