OK Labs Story (1): The Beginning
Last week I promised a bit of history on Open Kernel Labs. Here is the first installment.
It all started more than ten years ago, when one morning I got a call from Kevin, identifying himself as an IP lawyer for Qualcomm. Knowing their litigious nature, this would normally send shivers down your spine. However, the conversation was all friendly: he wanted to know details of the IP status of our L4 microkernel (Pistachio-embedded), which we had open-sourced under a BSD license (the release of June’08 is still around). This was our fork of the Pistachio kernel from Karlsruhe, which we had earlier ported to several architectures, including ARM. What was special about this fork was that it was optimised for use in resource-constrained embedded systems.
When I went into the lab after the call to tell my students, Harvey immediately drew a connection to a guy called Eric who, using a Yahoo address, had been asking very detailed technical questions on the mailing list.
This phone call was in May 2004, and Qualcomm was clearly interested in our kernel. I was going to the US anyway a couple of weeks later, and in early June I visited them in San Diego, with an NDA in place (expired a long time ago). I spent a few hours in fairly intense technical discussions with a vice president and a senior engineer (Eric), and the upshot was that I was back in early August, together with my student Benno, to deliver a three-day course in L4 design, philosophy and use (paid at consulting rates, including business-class flight). The audience were a dozen or so engineers from Qualcomm, plus some from customers.
The course went well, and by September we had a contract to deliver consulting services to help Qualcomm engineers do prototype development. Initially this work was done by Benno and Carl, but the team expanded over time to about six full-time staff. Only half a year later, in about February 2005, and unknown to us for quite a while, Qualcomm decided to move L4 into production, as the kernel underneath their modem firmware and BREW operating system. The first phones running L4 (the Toshiba W47T) started shipping in Japan in the second half of 2006.
The technical reasons behind all this we only understood over time. It turns out that Qualcomm at the time had two problems. One was the flat-address-space design of their modem stack and tightly integrated BREW operating system. This had grown to millions of lines of code, and debugging such a beast without memory protection is a nightmare. This is obviously made worse by the proliferation of applications at that (pre-smartphone) time. And it was obviously not possible to support an open environment (with arbitrary third-party apps) in that way. There was also a desire to modularise the modem stack.
Qualcomm’s Plan A was to retrofit protection into REX, their home-grown RTOS. Some people foresaw that this effort would fail, and were looking for a Plan B. (Apparently we were Plan B.2; I never found out what B.1 was.) The failure must have been spectacular, given the amazingly fast transition from evaluation to deployment of L4 (although Eric had clearly done a fair bit of prototyping before they approached us).
There was a second driver: Motorola had made a strategic move to Linux as their phone OS, and Linux-based Motorola handsets were already shipping in China. They wanted to use Qualcomm chips in Linux handsets. Qualcomm in turn did not want Linux (or any GPL software) anywhere near their core IP. In order to sell to Motorola they needed a way to isolate their IP from Linux. In short, they were looking for a virtualisation solution, which we sort-of had, in the form of an early-day Linux running on our L4 kernel. They had looked at various candidates for a system that
- supports virtualised Linux,
- has the ability to run their modem stack efficiently, and
- runs on ARM processors.
They found that what we had came closest, despite no one (including us!) claiming that what we had was production quality. So they basically contracted us to get it there, and my students did exactly that.
Ironically, the virtualised Linux didn’t make much of an impact. There were endless delays on Motorola’s side, partially due to internal politics, as well as doubts (from outside the group working with L4) that it could deliver the required performance. This scepticism, although unjustified, is somewhat understandable. The ARM9 cores used at the time (based on the ARMv5 architecture) have virtually-indexed, virtually-tagged caches. This means that the hardware cannot distinguish between data belonging to different address spaces, and consequently Linux (as well as other OSes, such as Windows CE) flushed the cache on every context switch. Because Linux runs in its own address space in the virtualised setup, invoking a Linux system call requires a context switch (and another one when returning from the system call). Even on native Linux, caches got flushed very often and performance was bad, so one would reasonably assume that virtualisation would make this worse.
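To make the cache problem concrete, here is a toy sketch in Python. It is not code from L4 or Linux, and every name and address in it (`VirtuallyTaggedCache`, `space_a`, `0x8000`, and so on) is made up for illustration. It models a cache whose lines are looked up by virtual address alone, and shows why an OS that simply switches address spaces without flushing could read another address space's data.

```python
class VirtuallyTaggedCache:
    """Toy cache keyed by virtual address only (no address-space tag)."""

    def __init__(self):
        self.lines = {}    # virtual address -> cached value
        self.flushes = 0   # number of full flushes performed

    def read(self, vaddr, page_table, memory):
        # On a miss, translate via the current page table and fill the line.
        if vaddr not in self.lines:
            self.lines[vaddr] = memory[page_table[vaddr]]
        # On a hit, the page table is never consulted.
        return self.lines[vaddr]

    def flush(self):
        self.lines.clear()
        self.flushes += 1


# Hypothetical physical memory and two address spaces that map the
# same virtual address to different physical frames.
memory = {0x1000: "data of A", 0x2000: "data of B"}
space_a = {0x8000: 0x1000}
space_b = {0x8000: 0x2000}


def run(flush_on_switch):
    """Read 0x8000 in space A, switch to space B, read 0x8000 again."""
    cache = VirtuallyTaggedCache()
    first = cache.read(0x8000, space_a, memory)
    if flush_on_switch:
        cache.flush()  # what a flush-on-every-switch OS does
    second = cache.read(0x8000, space_b, memory)
    return first, second


print(run(flush_on_switch=False))  # B sees A's cached line
print(run(flush_on_switch=True))   # B sees its own data
```

Without the flush, the second read hits the line filled by the first address space. With the flush the result is correct, but every switch starts with a cold cache, which is the cost the sceptics had in mind.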
In fact, it was the other way round: on ARM9 processors, Linux ran faster virtualised on L4 than native! This was due to a rather nifty use of hardware features that allowed us to make context switches really fast on L4 (up to 50 times faster!). We had actually published this “fast address-space switch” (FASS) trick, yet none of our competitors picked it up; it seems to have exceeded their abilities (not surprising given what we found out about the competition over time, but I’ll leave this for another time). Our original implementation of FASS had in fact been in Linux, and we had offered it upstream, but the maintainers rejected it as too complex. What’s a factor of 50 between friends?
The dedicated Motorola engineers working on L4 eventually got an ARM9-based, virtualised Linux-on-L4 phone out into the market (the Motorola Evoke). I used it, and it was definitely snappy and nice to use, with seamless integration of Linux- and BREW-based apps. Much faster than the painfully slow HTC TyTN-II, which had a much more powerful ARM11 processor running Windows Mobile. But it was too late. By that time, smartphones required much more grunt than such a low-end processor could deliver (even without virtualisation and sharing the processor with the modem), so the trend went to separate, powerful applications processors. The era of “consolidated” phones, running an application OS concurrently with a real-time OS (to support the modem stack) on the same processor was stillborn.
© 2014 by Gernot Heiser. All rights reserved.
Permission granted for verbatim reproduction, provided the reproduction is of the complete, unmodified text, is not made for commercial gain, and this copyright note is included in full. Fair-use abstracting permitted.