
OK Labs Story (6): The Management Team

Steve is an incredible networker (which helps with collecting intelligence; he has an excellent machine for picking up market rumours). His network also includes a lot of people whom he can draw on as consultants or employees.

Early in the life of the company he moved back to Chicago, to set up the company headquarters there. We needed a US presence (and parent company) and Steve initially ran it from his home. He soon got an excellent deal for office space in the heart of the Chicago CBD, directly opposite the Willis Tower. Chicago isn’t the most obvious place, and time-zone-wise the West Coast would have been better, but in the end I don’t think it mattered. Chicago is well-connected and Motorola’s mobile business (who should have been a major customer) was there.

He also went on to hire an executive team: CFO, VP Product Management, VP Marketing, all based in Chicago. The VP Sales (eventually also Chicago-based) took years, including some false starts: one resigned within two weeks for family reasons, and a French guy resigned after the first phone meeting (I never met either, but maybe they sensed more issues than I thought?). Together with Benno and myself this comprised the initial executive team.

Generally the executives Steve hired in the first round were nice and decent folks, and I liked working with them. Rob, the VP Product Management, was a much better choice for talking to engineers than Steve: he didn’t rub them the wrong way and didn’t bullshit. Dennis, the CFO, was a voice of sanity. Marti, the VP Marketing, built a great 3-body marketing team that achieved amazing visibility with a very small budget; they were full of cool ideas and executed quickly and effectively.

A clear issue was the lack of domain knowledge in the executive team. Some of the false-start Sales VPs had it, the eventual Sales VP didn’t (but tried to make up for it with Steve-style bullshitting). Rob came from Wind River and as such from the embedded-systems domain, which is at least related. He had held a senior position at Wind River, but that didn’t prepare him for defining a product strategy for a startup. In the end he mostly managed the product development process, while the actual product strategy came from Benno.

But the most crucial shortcoming of the executive team was that all but Benno (and me, and maybe Dennis in his quiet way) were yes-men/women who never challenged Steve, which is presumably why they were picked. No-one who is willing to form and express their own opinion, or dares to confront wishful thinking with facts, is welcome in the Steve world. Such a person would have to go (as happened with a number of non-executive staff, see Tony and Josh).

Chicago head office was this bizarre perfect world, where everything went to plan and everything always looked good. Benno called it the “reality-distortion field” that was created there, with all those fantastic deals that were just about to happen.

Obviously, such an environment, where fantasy is reality and critical thinking is treason, is a recipe for disaster. It is also extremely painful for someone who spends considerable effort developing bullshit detectors in his students. Unfortunately I had to get used to muting my bullshit detectors whenever I went into a meeting with Steve. Meetings in Chicago were this roller-coaster of having fun catching up with folks, especially our marketing team, and facing this deluge of bullshit. Can’t imagine how others survived that on a daily basis.

The obvious question is, why didn’t I try to do something about this earlier on? It’s a question I still struggle with; I can’t find an easy answer. I can’t claim that I didn’t see the problems, but somehow I felt too insecure to react properly. I also didn’t know how to react, and I didn’t have good mentors to turn to. And I was isolated on the Board.


© 2014 by Gernot Heiser. All rights reserved.

Permission granted for verbatim reproduction, provided the reproduction is of the complete, unmodified text, is not made for commercial gain, and this copyright note is included in full. Fair-use abstracting permitted.


OK Labs Story (5): Qualcomm

Our first customer was Qualcomm, and they were the most important one for a long time. During its lifetime, OK sourced the bulk of its revenues from Qualcomm. They were a critical partner.

The relationship was fantastic for a long time and, I’m told, quite unusual. Qualcomm doesn’t have a reputation for being an easy partner, particularly for small players. They also don’t have a track record of licensing technology; their usual approach is to buy outright (technology or company) or do it themselves.

Throughout we maintained an excellent relationship at the engineering level. The technical people at Qualcomm held our engineers in high regard, I never heard anything other than compliments.

Steve’s original achievement was the re-negotiation of our Qualcomm contract, providing secure (and significantly increased) funding that was the stepping stone for creating the company. And he managed to increase the volume of the contract a few more times.

By that time, Qualcomm had long decided to base their future firmware versions on our L4, and, in fact, end-user products were close to shipping. Getting from there to deployment on a billion devices was now mere engineering, and we knew we wouldn’t fail on that.

However, over time, Steve’s style created problems with Qualcomm. He ended up seriously annoying a sequence of division heads (i.e. the people controlling the money), mostly through his negotiation style. As a company, we gained a reputation of arrogance, which really appalled everyone (who knew about it) in Sydney.

Negotiations, in true Steve style, always went to the brink, and all deals were last-minute. Essentially we were betting the company every year when the contract was due.

I vividly remember one particular Steve moment: The Qualcomm manager was hospitalised. As Steve reported at the Board meeting, he called him up “to wish him well” but at the same time to remind him of our negotiations. Steve seemed mightily proud. But, as I soon heard through the engineering back-channel, the Qualcomm manager was hugely unimpressed.

This was on the back of an earlier disastrous misjudgment in dealing with Qualcomm. Google was moving into the mobile space with Android, and the first Android phone was the HTC Dream (sold in the US as the T-Mobile G1). We knew it would have a Qualcomm chipset, with OKL4 the OS on the baseband processor. Steve thought it a smart idea to have a press release timed for the release, titled something like “Google had a Dream, and we’re in it”.

To say that Qualcomm was unimpressed about us preempting their PR that way is the understatement of the century. They were livid. They were foaming at the mouth. Steve got a call saying “take that down immediately!” As Steve proudly reported at the Board, they fumbled around with “communication issues” due to flights etc for a few hours to give it more exposure, until finally taking it down after about half a day. Fantastic way to treat your friends, and clearly a good way to make the relationship last, right?

And it was so annoyingly short-sighted and unnecessary. We could have simply waited for a few days, until Qualcomm had made their move. There was a lot of hype around Android, and it wasn’t going to die down in a day. We might even have got more attention by staying away from the peak of the hype. And in any case, with less aggressive timing Qualcomm wouldn’t have objected; the release would have stayed up, it would have come up in Google searches, etc. But long-term thinking isn’t Steve’s strength.

Needless to say, an increasing number of people inside Qualcomm had their knives out for OK, and were waiting for the opportunity to stick them in. And the opportunities were bound to arrive. The main thing that kept the relationship going was the excellent work done by our engineers, which clearly provided value to Qualcomm. I talked more than once to senior people on the Qualcomm side who were torn between enjoying working with OK engineering and being pissed off with what they saw as arrogance and lack of trustworthiness of OK management. The net effect was that influence inside Qualcomm shifted from our friends to our opponents.

Early in the life of the company we were introduced to the Qualcomm team which was working on their new DSP architecture, which later became known as the Hexagon. We met the chief architect and a number of other staff involved. Hexagon was to become Qualcomm’s proprietary replacement for ARM cores as their modem processors, and L4 was to be ported to that architecture.

On the engineering side the work made good progress. However, problems arose from multiple directions. One was Qualcomm-internal politics, where someone started developing their own (unprotected) RTOS and set out to sabotage us. But he was greatly helped by Steve’s waning popularity with Qualcomm management. After a long and painful period of politics and negotiations, Qualcomm finally decided to use the home-grown RTOS for the Hexagon.

Technically, this was a step backwards for Qualcomm, from a protected system to a traditional flat address-space model. But for us it meant that our days as the OS underneath Qualcomm’s modem firmware were numbered. Today (as far as I know) L4 still ships on Qualcomm chips (so total L4 deployment should now be well over 3 billion), but on one or more of the auxiliary cores, no longer the main game.

While clearly not good, this wasn’t a disaster yet. Qualcomm had, throughout, received our code under the original open-source license (and could have just released it in source form). We didn’t earn royalties, only (lucrative) consulting revenue. This was one (but by far not the only) reason for a complete re-design and re-implementation of our kernel, which resulted in the OKL4 Microvisor, the basis of our virtualisation product. Essentially we now had two code bases, the “microkernel” for Qualcomm and the “Microvisor” for everyone else.

Our big chance to change the Qualcomm relationship arose when Qualcomm looked at supporting virtualisation on the application processor, and negotiated for a (non-exclusive) buy-out license of the Microvisor, to ship on their apps processors. Things seemed to go well, until Steve put an insanely huge asking price on the table. Qualcomm was seriously pissed off, and the relationship never recovered. Internally our aim was to get just over 1/6th of the price asked. Most people involved believe that, had we asked for what we really wanted, Qualcomm would have just accepted it without much negotiation. But with the price Steve asked, it was too bloody obvious that they could have done it themselves much cheaper. In the end we got nothing, except a terminally poisoned relationship.

The final straw came later, in the form of the AMSS disaster, which I’ll discuss in a future blog.


© 2014 by Gernot Heiser. All rights reserved.

Permission granted for verbatim reproduction, provided the reproduction is of the complete, unmodified text, is not made for commercial gain, and this copyright note is included in full. Fair-use abstracting permitted.


OK Labs Story (4): The CTO

Benno’s appointment as VP Engineering, for all its good sides, was also part of Steve’s divide-and-rule approach – in this case designed to side-line me. It meant that Steve established a direct link with Benno, bypassing me. I could also feel a distance developing between Benno and myself, which particularly saddened me, given that he was my star student and like a son to me. Whether this was the result of Benno thinking he had to actively compete with me, or of Steve actively bad-mouthing me, I can’t tell (but I’ve seen enough bad-mouthing of third parties to consider the latter entirely possible).

Of course, there was also a (to a degree inevitable) clash between purity and engineering realities. And I felt that I had to give Benno the freedom to fulfil his role and earn the respect of the team, so I was very careful to avoid seeming at odds with him or undermining him. My confidence in Benno’s technical insight made this easier.

But one of the effects was that I was increasingly excluded from technical design discussions and decisions.

Circumstances made this relatively easy. Part of OK’s deal with NICTA was that the outcome of the still-progressing research program (the seL4 microkernel and its formal verification) would go to OK, and I was to oversee the completion of that program. So I was 50% OK Labs CTO and 50% group leader in NICTA. As a part-timer I inherently couldn’t be running too much on a day-to-day basis.

In hindsight, this arrangement was a grave mistake. By the time seL4 was ready, OK had had to build its market and products on the existing technology, and wasn’t the right vehicle for marketing seL4. (That’s not the full story, but more about that later). We should have left seL4 out of the deal, and I should have either had a stronger day-to-day involvement or just been a consultant. But that’s with the benefit of hindsight…

So, my part-time status helped Steve to remove me from most decision-making. But I was excluded from too many other things too.

Many of the personal highlights of my OK experience were talking to customers, be it engineers, VPs or CTOs. I learned a lot about real-world trade-offs, market drivers and technical insights. And my presence helped the company’s reputation. One memorable example was a technical meeting at Nokia, with engineers and CTO-office staff, including a known virtualisation sceptic. As Abi told me afterwards, that sceptic after the meeting commented “he’s the smartest CTO I’ve met, and totally devoid of bullshit.”

But bullshit-freedom isn’t an asset in Steve World. In fact, he thought he could do anything I could do (just better, of course). For example he seemed to truly believe that, having sat through one of my presentations, he could do it himself just as well. The few times I observed him doing this, I felt like hiding in a corner out of embarrassment, and did my best to prevent damage (typically involving kicking him under the table). Any engineers around couldn’t help noticing the embarrassment either. Steve simply can’t keep his mouth shut when he thinks he knows something, even if there are real experts around. I can only imagine what went on when he was on his own.

And I was around less and less. Once Steve hired a VP Sales (who sang from Steve’s songbook without missing a beat) I was generally not taken to customer meetings any more. This was partially compensated by some of our top engineers moving into sales, such as Abi, another one of my star students. At least they knew what they were talking about, but generally weren’t in a position to take a stand against bullshit. The rare exceptions included field-application engineer Josh (also a former student) and sales guy Tony. The two of them brought in the biggest non-Qualcomm contract during my time at the company (from Motorola). In fact the two were the only successful sales team we ever had (despite Steve heaping praise on the VP Sales at the Board whenever they managed not to completely mess up a sale). But they didn’t take any bullshit, so they were pushed out and massively bad-mouthed. This became a pattern.

There were instances crying out for my involvement, but mostly I only learned about them afterwards (or not at all). At one stage, several opportunities started to develop in China. One of our engineers, a native Chinese speaker who was involved in the technical evaluations, told me later “it’s because of you, you’re well-known in China” (something I was quite oblivious to, but later had independently confirmed). The natural reaction should have been to fly me over to talk about the technology and vision. But no, Steve and his VP-Sales clone knew better. In the end, none of the China opportunities went anywhere.

Another case was DoCoMo, the main Japanese mobile operator. They had, in a 2006 white paper co-authored by Intel, proposed the concept of the dual-persona phone (two logical phones, work and private, on a single physical handset, separated by virtualisation). In 2009 they were looking at developing one. There was a competitor in play, whose only advantage over us was a security evaluation (similar to Common Criteria but extremely low-grade – glorified tyre-kicking). DoCoMo was going to go with us if we could tick that box. However, for us it would have been insane to go through the expense of an essentially worthless security “evaluation”. By the time I heard about it, they had been talking in circles for 6 months! I joined the next phone conference, and within an hour we had a solution. Too late, as it turned out: things had changed internally at DoCoMo and the project never happened.

There were plenty of other opportunities where I never got a chance to contribute. Besides Steve’s systematic side-lining of myself, a contributing factor was the cultural ignorance of Chicago (i.e. OK headquarters). Steve in particular prides himself on “understanding” other cultures, but, like his technical “insights”, this is very superficial stuff he’s read in mags and blogs. He utterly fails to understand the appeal of technical authority in continental European as well as Asian cultures. People there have high respect for professors and value their opinions, especially where the professor is strongly identified with a particular technology (as is the case with myself and the microkernel technology that formed the basis of OK products).

As such, the standard approach should have been to get me in front of senior engineers/CTOs of any prospective customer. Steve (and his VP Sales) never understood this, despite my attempts to tell them. But what do you expect from someone who seems to believe that bullshit is an adequate substitute for substance? One telling instance of a complete lack of appreciation of culture was when he asked a junior marketing staffer to approach leaders of big companies across a range of industry sectors. That way, the CEO of BMW got an email “Dear Norbert, I’d like to tell you about our virtualisation products”. The mail probably never reached Reithofer, but if it did, it could only get us filed in the category of dubious operators, rather than a respectable technology provider, and would certainly have done far more harm than good.

As a consequence I grew increasingly frustrated with my lack of ability to contribute in a meaningful way. This was compounded by my observation that the company strategy was wrong, based on wishful thinking rather than analysis of market needs and technical facts. (More on this later.) I was asked to write white papers and technical blogs (something I enjoyed greatly) but that’s a pretty minor role for the CTO and founder. And I was supposed to screen invention disclosures and work with the inventors and the patent attorneys on getting patents filed. That stopped after I assessed two of Steve’s “inventions”, which he got Josh to write up. After finding that the inventive step was non-existent, I wasn’t troubled with any further ones…

I also wasn’t effective as a director. I was caught between trying to avoid being seen as undermining the CEO and trying to voice my concerns. I failed miserably at the latter. Basically I was very naive (probably hard to believe for people who know me), but I was also quite aware of Steve’s bad-mouthing machine, which I had seen at work more than enough. I should have taken a stronger, independent stand at the Board, but that’s water under the bridge. It’s a reflection of my inexperience, which Steve exploited well.

More about the board later.

Basically, my position became more frustrating as time went on. Sometime during 2009 I had given up any hope that OK would be a major success. I stayed on partially because I thought I owed it to the many engineers who were there primarily because of me, and partially because, having been at the conception and birth, I wanted to stay around for the funeral. It couldn’t be far off.


© 2014 by Gernot Heiser. All rights reserved.

Permission granted for verbatim reproduction, provided the reproduction is of the complete, unmodified text, is not made for commercial gain, and this copyright note is included in full. Fair-use abstracting permitted.


OK Labs Story (3): The Engineers

We had a world-class engineering team, by any definition. Except for one or two we hired from outside, the complete team that spun out of NICTA on 1 January 2007 consisted of my former students (and Carl, who never was a student at UNSW but had worked with us as an engineer since the early days of NICTA). Most had years of experience hacking microkernel code, and all had an excellent understanding of what makes a microkernel tick, and how you build systems on top of L4. Each could have doubled or tripled their salary by moving to Silicon Valley. They also had grown up working in a tight-knit team, so personnel-wise we were in good shape.

Before we could actually spin out, a (to me) unexpected hurdle arose: the contract negotiations. Steve took a very hard-line approach with the employment contracts. While most high-tech startups tend to provide fairly liberal conditions, what Steve offered was generally the legal minimum. In addition there were very broad and severe non-compete clauses (extending for a full year after leaving the company). While our legal advice was that this was almost certainly unenforceable in NSW, Steve insisted on them anyway.

The result was a staff revolt. About half of the staff refused to sign the contracts in the offered form, some making extensive changes. What followed was a lengthy (4 weeks or so) period of to-ing and fro-ing until we had a contract people were willing to sign. The end effect was that the company started on a sour note. I have to accept some of the blame here, as the original contract didn’t feel right to me, not the way I wanted to treat my students. But, again, I was inexperienced, didn’t trust my instincts in this unfamiliar territory, and trusted Steve to know what he was doing. Clearly a misjudgment on my side, and one I regret to this day.

In the end, Chuck, one of the A Team, decided not to join OK Labs, but instead moved to the Valley to join the Core OS Team at Apple, where he still works happily. At least he didn’t bear any grudge against me, we’re still good friends and meet up for a beer or three whenever I’ve got spare time in SF.

Not so for two others, Alex and Hal, who did sign on but never recovered from the original confrontation. They left about a year later, triggered by Steve’s poor handling of another situation: he had forgotten to perform a stock split before assigning stock options to staff, with the result that they got 2.5 times the percentage they were meant to get. Instead of saying “sorry, guys, we stuffed up” he tried to sell the correction as an improvement to staff, with a mail starting “good news! the price of your stock options has been reduced”. Not a smart way to treat grown-up engineers.

This created more bad blood. And, while I left these management issues in Steve’s hands (bad call, I know!), staff saw both of us as “management”. And it’s safe to say most staff were there primarily because of me, and saw this as a betrayal of the trust they had in me. This was particularly the case with Alex and Hal, the two who left after a year. Hal hasn’t spoken a word to me since. It was my advanced OS course which had brought him to UNSW (as a postgraduate coursework student), so this hurts and saddens me.

My takeaway: If something seems wrong, do something, familiar territory or not. I should have known that, of course. One of the challenges was that I’m used to being able to trust my instincts, but this was an unfamiliar environment where I didn’t feel I could trust them, and Steve did a convincing show of knowing what he was doing. He also turned any doubt immediately into a personal trust issue, a great way to prevent any frank and open discussion. As I learned over the years, this was symptomatic: he would never admit to any mistake, and the mere suggestion that his approach wasn’t good would be seen as betrayal. The exact opposite of how I’m used to running things: frank and open discussions and looking for consensus.

After the bad start things improved quickly, and morale in the engineering team was very good once most people didn’t have to deal with Steve directly. Steve appointed Benno as VP Engineering, which helped to create that distance. Benno had been my star student, one of the most experienced in the team, despite his young age. He had proven himself in the early Qualcomm project, and is generally one of the sharpest people I know. Making him VP Engineering (rather than getting an experienced manager from outside) was a good move: Benno had more or less grown into the role already and did an excellent job at it. He is a real leader, respected by his peers; he absorbed a lot about software processes, and was also a visionary Chief Architect.

We had excellent engineers, and a VP Engineering in charge who, despite his youth, provided strong leadership. The team achieved many amazing feats, generally delivering high-quality software very quickly. One of the most satisfying comments came from the Motorola engineer responsible for the virtualised phone project. He said “there were no bugs!” with the look on his face of someone who for the first time enters a Tardis and sees that it’s bigger on the inside. And we got consistently glowing feedback from customers after some of our engineers visited them; they never failed to impress with their insight and professionalism.

So in engineering we clearly had what it takes to be successful. All we needed was a plan and leadership to execute it; the engineers would build whatever was required.


© 2014 by Gernot Heiser. All rights reserved.

Permission granted for verbatim reproduction, provided the reproduction is of the complete, unmodified text, is not made for commercial gain, and this copyright note is included in full. Fair-use abstracting permitted.


OK Labs Story (2): The CEO

Sometime during 2004 NICTA employed two entrepreneurs-in-residence. Their job was to identify spinout opportunities among NICTA projects. One of them was Steve, who showed up at my door as soon as word got around that we were getting significant consulting income from Qualcomm. He talked about setting up a company, and I essentially told him to go away, as I didn’t think that you could make enough money from selling operating systems.

Steve persisted and got involved, and effectively became the commercial manager of the Qualcomm project. Concurrently, new momentum developed: we were approached by TI, Ericsson, Apple and later Samsung and did paid projects with each of them, and a number of other potential commercial users had discussions with us. While this is exactly what I had predicted when I first started the ERTOS program in NICTA in 2003, namely that the classic unprotected RTOS-technology was reaching its use-by date, I was still surprised when it started to happen so quickly.

The Samsung case was, in hindsight, interesting in its own way. The folks who approached (and paid) us were the ones who had done the port of Xen to ARM, and were trying to promote this as a solution for virtualisation on phones (totally unsuitable, as I explained at length in a paper some time later). It is likely that they engaged us only to better understand this competitor technology, in order to keep it out of Samsung. This went as far as sabotaging any attempts by folks at their San-Jose-based lab to build things on L4.

Not all of the various prospects were in the mobile space. Some (including Ericsson) were producing network infrastructure (mobile base stations, routers etc), and yet others were for point-of-sale terminals. With this wide range of options, Steve eventually convinced me that there was a business opportunity. Once I made this step, it was also obvious that we had to be able to move quickly to seize opportunities, and take business risks. Neither of these are possible in a taxpayer-funded research institute, so it was clear that we had to create a spinout.

I had zero experience in this, and was happy to leave the lead to Steve, who obviously had the experience. He was on top of what was required to set up a company, and had the network to hire management and sales staff. He also created a business plan and financial models to present to the NICTA board. I was surprised how full these were of assumptions plucked out of thin air, but hey, I had no clue, and presumably that’s the way it is done.

However, what we needed in order to build the company and ramp up staffing was either external investment or a degree of stable revenue. We had a revenue stream from Qualcomm, but this was on a work-for-hire basis, with people working as requested by Qualcomm, and we would invoice monthly based on actual hours worked. While the general trend of the volume (and revenue) was upward, we needed more predictability. And it was Steve’s major achievement to re-negotiate the contract with Qualcomm for a fixed amount over 12 months, invoiced quarterly. The actual amount was a big increase over what we had invoiced the previous year (and would increase further in following years). We had what we needed to spin out!

But before that we had to negotiate a deal with NICTA, mostly concerning IP conditions, rent for the use of NICTA facilities, and NICTA’s share of equity. The eventual deal involved NICTA transferring all its rights to existing IP (which was mostly open-source code, much of it developed from open-source code created by Karlsruhe and my UNSW team prior to NICTA’s creation). It also involved exclusive licenses, with a buy-out option on achieving certain investment milestones, for the IP still under development: seL4 and its formal verification.

During the spinout negotiations I got to see Steve’s negotiation tactics – he considers himself an excellent negotiator. They were based on huge land-grabs, which he then defended viciously, trying to extract concessions for every bit of ambit territory “conceded”. Whenever he could establish that the other side had a hard deadline of sorts, he would only pretend to negotiate while in fact playing for time in an attempt to force concessions. He would also systematically and deliberately push for concessions he knew the negotiators on the other side couldn’t make, in order to force the decision up to the CEO of NICTA.

This “worked” to a degree, and he did end up with some surprisingly good deals. NICTA, being a young organisation with high expectations from its stakeholders, needed runs on the board, was keen to see companies spun out, and was as such a relatively soft target. Also, the attitude on NICTA’s side was to be supportive of its startups. Thus Steve managed to get a very good deal, much better than I thought possible (or even reasonable). Admittedly, as I learned later, some of the hard-won concessions were needed by the company (but others were pretty sweet).

Steve’s tactics created a few excellent deals in a number of other cases. However, there was a high cost. Many negotiations ended in failure when they should (and could) have been successful. And many bridges were burnt in the process.


© 2014 by Gernot Heiser. All rights reserved.

Permission granted for verbatim reproduction, provided the reproduction is of the complete, unmodified text, is not made for commercial gain, and this copyright note is included in full. Fair-use abstracting permitted.


OK Labs Story (1): The Beginning

Last week I promised a bit of history on Open Kernel Labs. Here is the first installment.

It all started more than ten years ago, when one morning I got a call from Kevin, identifying himself as an IP lawyer for Qualcomm. Given their litigious reputation, such a call would normally send shivers down your spine. However, the conversation was all friendly: he wanted to know details of the IP status of our L4 microkernel (Pistachio-embedded), which we had open-sourced under a BSD license (the release of June’08 is still around). This was our fork of the Pistachio kernel from Karlsruhe, which we had earlier ported to several architectures, including ARM. What was special about this fork was that it was optimised for use in resource-constrained embedded systems.

When I went into the lab after the call to tell my students, Harvey immediately drew a connection to a guy called Eric who, using a yahoo address, had been asking very detailed technical questions on the mailing list.

This phone call was in May 2004, and Qualcomm was clearly interested in our kernel. I was going to the US anyway a couple weeks later, and in early June I visited them in San Diego, with an NDA in place (expired a long time ago). I spent a few hours in fairly intense technical discussions with a vice president and a senior engineer (Eric), and the upshot was that I was back in early August, together with my student Benno, to deliver a three-day course in L4 design, philosophy and use (paid at consulting rates, including business-class flight). The audience were a dozen or so engineers from Qualcomm, plus some from customers.

The course went well, and by September we had a contract to deliver consulting services to help Qualcomm engineers do prototype development. Initially this work was done by Benno and Carl, but expanded over time to about six full-time staff. Only half a year later, in about February 2005, and unknown to us for quite a while, Qualcomm decided to move L4 into production, as the kernel underneath their modem firmware and BREW operating system. The first phones running L4 (the Toshiba W47T) started shipping in Japan in the second half of 2006.

The technical reasons behind all this we only understood over time. It turns out that Qualcomm at the time had two problems. One was the flat-address-space design of their modem stack and tightly integrated BREW operating system. This had grown to millions of lines of code, and debugging such a beast without memory protection is a nightmare. This is obviously made worse by the proliferation of applications at that (pre-smartphone) time. And it was obviously not possible to support an open environment (with arbitrary third-party apps) in that way. There was also a desire to modularise the modem stack.

Qualcomm’s Plan A was to retrofit protection into REX, their home-grown RTOS. Some people expected this effort to fail, and were looking for a Plan B. (Apparently we were Plan B.2; I never found out what B.1 was.) The failure must have been spectacular, given the amazingly fast transition from evaluation to deployment of L4 (although Eric had clearly done a fair bit of prototyping before they approached us).

There was a second driver: Motorola had made a strategic move to Linux as their phone OS, and Linux-based Motorola handsets were already shipping in China. They wanted to use Qualcomm chips in Linux handsets. Qualcomm in turn did not want Linux (or any GPL software) anywhere near their core IP. In order to sell to Motorola they needed a way to isolate their IP from Linux. In short, they were looking for a virtualisation solution, which we sort-of had, in the form of an early-days Linux running on our L4 kernel. They had looked at various candidates satisfying their requirements, which were

  1. supports virtualised Linux,
  2. has the ability to run their modem stack efficiently, and
  3. runs on ARM processors.

They found that what we had came closest, despite no-one (including us!) claiming that what we had was production quality. So they basically contracted us to get it there, and my students did exactly that.

Ironically, the virtualised Linux didn’t make much of an impact. There were endless delays on Motorola’s side, partially due to internal politics, as well as doubts (from outside the group working with L4) that it could deliver the required performance. This scepticism, although unjustified, is somewhat understandable. The ARM9 cores used at the time (based on the ARMv5 architecture) have virtually-indexed, virtually-tagged caches. This means that the hardware cannot distinguish between data belonging to different address spaces, and consequently Linux (as well as other OSes, such as Windows CE) flushed the cache on every context switch. As in the virtualised setup Linux runs in its own address space, invoking a Linux system call requires a context switch (and another one when returning from the system call). Even on native Linux, caches get flushed very often and performance suffers, so one would reasonably assume that virtualisation would make this worse.

In fact, it was the other way round: on ARM9 processors, Linux ran faster virtualised on L4 than native! This was due to a rather nifty use of hardware features that allowed us to make context switches really fast on L4 (up to 50 times faster!). We had actually published this “fast address-space switch” (FASS) trick, yet none of our competitors picked it up; it seems to have exceeded their abilities (not surprising given what we found out about the competition over time, but I’ll leave this for another time). And our original implementation of FASS had been in Linux, and we had offered it upstream, but the maintainers rejected it as too complex. What’s a factor of 50 between friends?
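To give a flavour of the trick (without the full details of the published design): on ARMv5 the kernel can avoid the cache and TLB flushes by relocating small address spaces into distinct 32 MB regions via the Fast Context Switch Extension (FCSE) PID, and enforcing protection between them through the domain access control register. Below is a minimal, hypothetical C sketch of that fast path; the structure and function names are made up, only the coprocessor registers (CP15 c13 for the FCSE PID, CP15 c3 for the domains) come from the ARM architecture, and a real kernel must also handle the fallback cases the comments only hint at.

    #include <stdint.h>

    /* Hypothetical per-address-space state; a real kernel tracks much more. */
    struct address_space {
        uint32_t fcse_pid;   /* which 32 MB slot this task occupies (0..127) */
        uint32_t dacr;       /* domain permissions granting access to its mappings */
    };

    /* FCSE PID register (CP15 c13): the PID sits in bits [31:25]. */
    static inline void set_fcse_pid(uint32_t pid)
    {
        __asm__ volatile("mcr p15, 0, %0, c13, c0, 0" : : "r"(pid << 25));
    }

    /* Domain Access Control Register (CP15 c3). */
    static inline void set_dacr(uint32_t dacr)
    {
        __asm__ volatile("mcr p15, 0, %0, c3, c0, 0" : : "r"(dacr));
    }

    /* Fast path of an address-space switch: no cache flush, no TLB flush,
     * no page-table switch; just re-point the FCSE PID and re-grant domains. */
    void fast_context_switch(const struct address_space *next)
    {
        set_fcse_pid(next->fcse_pid);
        set_dacr(next->dacr);
        /* Only when 32 MB slots or protection domains run out does the kernel
         * fall back to the traditional flush-everything switch. */
    }

The point is simply that the expensive flushes disappear from the common case, which is where the speedup mentioned above comes from.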

The dedicated Motorola engineers working on L4 eventually got an ARM9-based, virtualised Linux-on-L4 phone out into the market (the Motorola Evoke). I used it, and it was definitely snappy and nice to use, with seamless integration of Linux- and BREW-based apps. Much faster than the painfully slow HTC TyTN-II, which had a much more powerful ARM11 processor running Windows Mobile. But it was too late. By that time, smartphones required much more grunt than such a low-end processor could deliver (even without virtualisation and sharing the processor with the modem), so the trend went to separate, powerful applications processors. The era of “consolidated” phones, running an application OS concurrently with a real-time OS (to support the modem stack) on the same processor was stillborn.


© 2014 by Gernot Heiser. All rights reserved.

Permission granted for verbatim reproduction, provided the reproduction is of the complete, unmodified text, is not made for commercial gain, and this copyright note is included in full. Fair-use abstracting permitted.


RIP Open Kernel Labs, Welcome Cog Systems

This month marks the second anniversary of the acquisition of Open Kernel Labs (OK Labs) by General Dynamics (GD). It also marks ten years of us engaging with Qualcomm on commercialising L4 microkernel technology, and eight years since OK Labs was founded. Clearly an occasion to reflect.

In the two years since its acquisition, OK Labs has, in a way, died, and was, in a way, reborn. Specifically, in February of this year, GD closed down the former OK Labs engineering office in Sydney. But rather than this being the end of the L4-based business, it was a new beginning. The core engineers (mostly my former students) created a new, fully Australian-owned company, called Cog Systems.

Cog is continuing OK Labs’ mission of bringing L4 microkernel technology to the world. They have a reseller license that allows them to market the OKL4 Microvisor, which was the main product of OK Labs. And they added their own IP that complements it. They also partner with GD to service pre-existing contracts for OKL4 deployments. In that sense they are really OK Labs reborn.

I think this is a great outcome. We now have a local company who are experts in L4 technology, and are highly skilled and innovative. They will continue to be a great partner not only for GD, but for us in NICTA too, as they will be able to work with us on delivering seL4-based solutions to customers.

The existence of Cog is one reason I’m confidently talking to potential seL4 customers who would use engineering support to build their critical systems. It’s also a step towards creating a critical-system ecosystem in Sydney. I’m very much looking forward to working with them in the future!

I will over the next few weeks reflect on how we got here over the last ten years, in particular the history of OK Labs. Stay tuned.

PS: Neither NICTA nor I hold shares in Cog, nor do we have any other interest in the company, other than our general desire to support the Australian high-tech industry. We’re dealing with Cog at arm’s length.


© 2014 by Gernot Heiser. All rights reserved.

Permission granted for verbatim reproduction, provided the reproduction is of the complete, unmodified text, is not made for commercial gain, and this copyright note is included in full. Fair-use abstracting permitted.


seL4 is finally free! Can you afford not to use it?

The world’s most highly assured operating system (OS) kernel

… the only one with a formal proof of implementation correctness (freedom from bugs), the only one with a proof of security that extends all the way to the executable code, the only protected-mode OS with sound worst-case execution time (WCET) bounds, is finally free. It is available under the same license as the Linux kernel, the GPL v2. Everyone can use it, and everyone should. And, as I’ll explain, you may, one day, be considered negligent for not using seL4, and such liability could reasonably start with any system that is designed from now on.

Critical systems

Let’s expand on this a little. Let’s assume you, or your company, are building a device or system which has safety or security requirements. There are lots of these, where failure could lead to death or injury, loss of privacy, or loss of money. Many systems that are part of everyday life fall into that category, not only national-security types of systems (but those obviously do too).

Medical implants

Medical implants could kill a patient or fail to keep them alive if they fail. Such systems may fail because their critical functionality (that keeps the patient alive) is compromised by a “non-critical” part that misbehaves (either by a bug triggered during normal operation, or a vulnerability exploited by an attacker). Most implants have such “non-critical” code, in fact, it tends to be the vast majority of the software on the device. Most devices these days have some wireless communication capability, used for monitoring the patient, the state of the device, and maybe even to allow the physician to adjust its operation. On systems not built on seL4, there can be no guarantee that this “non-critical” software is free from dangerous faults, nor can there be a guarantee that its failure cannot cause the critical bits to fail.
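To make the isolation argument a bit more concrete, here is a minimal, hypothetical sketch of how the “non-critical” side of such a device might talk to the critical side on seL4. The seL4 system calls used are real, but the endpoint, the message layout and the scenario are assumptions for illustration: the untrusted communication stack holds a single capability to an IPC endpoint served by the critical controller, and nothing else, so even if it is fully compromised it can do no more than send requests that the critical side is free to validate and refuse.

    #include <sel4/sel4.h>

    /* Hypothetical capability handed to the comms component when the system is
     * built; it is the ONLY handle this component has on the critical side. */
    extern seL4_CPtr therapy_endpoint;

    /* Ask the critical controller to adjust a therapy parameter. The request
     * is a plain message; the controller validates it and may refuse it. */
    void request_rate_change(seL4_Word requested_rate)
    {
        seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 1); /* one message register */
        seL4_SetMR(0, requested_rate);
        seL4_Call(therapy_endpoint, info);
    }

The design point is that the capability system, not the good behaviour of the comms stack, bounds what a compromise can reach.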

Cars

Complex software in a car could misbehave and compromise critical components like engine control, brakes, air bags. Attacks on cars are well documented. The CAN networks in cars are fundamentally vulnerable, so many of the devices on the bus can be the starting point of an attack. Furthermore, functionalities that used to run on separate ECUs (microprocessors) are increasingly consolidated onto shared processors, creating more potential vulnerabilities. Car hacks clearly have the potential to kill people.

Industrial automation

Industrial automation systems are in many ways – other than scale 😉 – similar to medical implants: Some critical functionality (which may or may not be small but is likely to be highly assured) runs beside large, complex software stacks that support monitoring, control, reprogramming etc, and are far too complex for high assurance to be feasible. If such a system misbehaves it may cause millions of dollars worth of damage, or even endanger lives.

Others

There are many other systems where failure is expensive or dangerous. They include systems performing financial transactions (eg point-of-sale systems or ATMs), voting machines, systems holding or providing access to personal data (cards with medical history, etc).

There are many more examples.

Limitations

Of course, seL4 cannot give a complete guarantee of no failure either. The hardware could fail, the system could be ill-designed or wrongly configured, or a few other things could go wrong. But it definitely (for a well-designed and -configured system) makes it impossible for a compromised “non-critical” subsystem to interfere, through a software fault, with the critical part. This is by far the most likely source of error, and seL4 gives you the highest degree of certainty achievable to date that this won’t happen.

So, why would you not want to use seL4 in your next design?

Let’s examine the reasons people usually give.

License

seL4 is published under the GPLv2. This means that any improvements will have to be open-sourced as well, which is a concern to some people.

I claim this is mostly a misconception. Nothing you build on top of seL4 will be affected by the kernel’s license (just as you can build closed-source stuff on Linux). seL4 confines the GPL to the kernel, and our license files have explicit wording (taken from Linux) confirming that.

How about changes to seL4 itself? Reality is that you aren’t likely to make significant changes to seL4. For one, any change will invalidate the proofs. Ok, maybe you don’t care too much about this: even if it’s no longer 100% verified, a slightly modified seL4 system will still be massively more dependable than anything else. So maybe you want to do some “improvements” and then keep them to yourself?

As long as those are actual “improvements”, I suspect that you won’t be able to outrun the momentum that’s behind seL4 at present. No-one knows better than us how to improve seL4 even more (despite it already being much better than any comparable system), and we keep doing this. And if you have a real use case that requires some improvements, you’re better off talking to us than to try on your own.

Or maybe your improvements are really adaptations to proprietary hardware. In most cases that will be a non-issue, as seL4 contains no device drivers (other than a timer and the interrupt-controller driver), so any proprietary stuff isn’t likely to be visible to the kernel. But if you really have to modify the kernel in a way that exposes sensitive IP, then you can always get access under a different license (for a fee). If this is your situation then talk to us.

Processor architecture

Maybe you want to run seL4 on a processor that isn’t presently supported. As long as it’s a supported CPU architecture (ARM and x86) then that isn’t a big deal, and we can help you with the port. If you need it on a different architecture (say Power) then that would involve a fair bit more work. Still, you should talk to us about a port; the cost is likely to be a small fraction of your overall project cost, and will be tiny compared to the potential cost of not using seL4 (more on that below).

Performance

I thought our work in the past 20 years was enough to debunk the “microkernels are slow” or even “protection is too expensive” nonsense. Ok, some microkernels are slow, and, in fact, almost all commercial ones are, especially the so-called high-assurance ones (OKL4 is an exception). But seL4 isn’t. It runs loops around all the crap out there! And OKL4, shipped in billions of power-sensitive mobile devices, has shown that a fast microkernel doesn’t cause a performance problem. Performance is definitely one of the more stupid excuses for not using seL4!

Resource usage

Unless you have an extremely constrained system, seL4 is small (compared to just about everything else). If you’re running on a low-end microcontroller (with not only serious resource constraints but also no memory protection) you should look at our eChronos system instead.

Legacy

Maybe your design has to re-use a lot of legacy software that is dependent on your old OS environment. Whether that’s a real argument probably depends a lot on how well your legacy software is written. If it is essentially unstructured homebrew that doesn’t abstract common functionality into libraries, then you may have a real problem, one which is going to bite you hard sooner or later, whether you’re migrating to seL4 or not. If that isn’t the case, then there is likely to be a sensible migration path, the cost of which is likely to be dwarfed by the potential cost of not using seL4 (see below).

Environment and support

seL4 is a young system, with a limited support environment in terms of libraries and tools. That is true, but it’s also changing rapidly. We’ll see userland developing quickly. And don’t think that just because it’s open source there isn’t good support: between NICTA and GD and our partners, we can certainly provide highly professional support for your seL4-based development. Just talk to us.

So, what’s the trade-off?

Well, ask yourself: You’re developing some critical system. Critical in the sense that failure will be very costly, in terms of lost business, lost privacy, loss of life or limb. Imagine you’re developing this based on yesterday’s technology, the sort of stuff others are trying to sell you, and what you may have been using in the belief (at the time, at least) that it was state of the art. Now it ain’t.

So, imagine you are developing this critical system and decide not to use seL4. And, sooner or later, it fails, with the above costly consequences. You may find yourself in court, being sued for many millions, trying to explain why you developed your critical system using yesterday’s technology, rather than the state of the art. I wouldn’t want to be in your shoes. In fact, I might serve as an expert witness to the other side, which argues that you should have known, and that doing what you did was reckless.

Do you want to run that risk? For what benefit?

Think carefully.

Clarification on seL4 media reports

The open-sourcing of seL4 has created a bit of media coverage, generally very positive, good to see.

Some of it, however, is inaccurate and potentially misleading. I’ve commented where I could, but some sites don’t allow comments.

One frequent error is the (explicit or implicit) claim that seL4 was developed for or under the DARPA HACMS program, stated eg at Security Affairs, and seemingly originated at The Register. This is incorrect: seL4 pre-dates HACMS by several years, and was, in fact, a main motivation for HACMS. We are a “performer” under this project, and, together with our partners, are building an seL4-based high-assurance software stack for a drone, please see the SMACCM Project page for details. [Note 31/7/14: The Register has now corrected their article.]

LinuxPro states that DARPA and NICTA released the software; in fact, the release was by General Dynamics (who own seL4) and NICTA. Also, seL4 wasn’t developed for drones; it’s designed as a general-purpose platform suitable for all sorts of security- and safety-critical systems. I for one would like to see it making pacemakers secure; I may need one eventually, and with present technology I would feel rather uncomfortable about this.

I’ll keep adding if I find more…

Sense and Nonsense of Conference Rankings

The CORE Rankings

Some may know that a few years ago, the Australian Research Council (ARC) had a ranking of publication outlets produced. For computer science, the exercise was outsourced to CORE, the association of Australian and NZ CS departments (Oz equivalent of CRA). It categorised conferences (and journals) into A*, A, B and C venues.

I have in the past stated what I think of that list: very little. In short, I think it’s highly compromised and an embarrassment for Australian computer science. And I’m outright appalled when I see that other countries are adopting the “CORE Rankings”!

The ARC disendorsed the rankings in 2011. Yet, in 2013, CORE decided to maintain and update it. I argued that updating it with a process similar to the original one would not improve the list.

The Fellows Letter

Now, some senior colleagues (Fellows of the Australian Academy of Science) have written an open letter, denouncing not only the CORE list, but basically any use of publication venues as an indicator of research quality.

The letter was, apparently, written by Prof Bob Williamson from the ANU, a fellow group leader at NICTA. Bob is a guy I have a lot of respect for, and we rarely disagree. Here we do, completely. I also highly respect the other Fellows (one of them is my boss).

The Fellows essentially argue (with more clarification by Bob) that looking at where a person has published is useless, and the right way to judge a researcher’s work is to read their papers.

What I think

With all respect, I think this is just plain nonsense:

  1. These rankings exist, like it or not. In fact, we all use them all the time. (Ok, I cannot prove that the “all” bit is strictly true; some, like Bob, may not, but the rest of us do.) When I look at a CV, the first thing I look for is where they published. And I claim that is what most people do. And I claim it makes sense.

    Fact is that we know what the “good” venues are in our respective disciplines. This is where we send our papers, and this is where we tell our students and ECRs they need to get their papers accepted. They are the yardsticks of the community; like it or not, it is where you publish to have impact. Publishing in the right venues leads to high citations, publishing in the wrong ones doesn’t.

    Of course, we really only understand the venues in our own sub-disciplines, and maybe a few neighbouring ones. So, collecting and documenting these top venues across all of CS isn’t a bad thing, it creates clarity.

  2. The idea that someone can judge a person’s work simply by reading some of their papers (even the self-selected best ones), with respect, borders on arrogance. In effect, what this is saying is that someone from a different sub-discipline can judge what is good/significant/relevant work!

    If this was true, then we as a community could reduce our workload drastically: We’d stop having conference PCs where everyone has to read 30 papers, and every paper gets at least half a dozen reviews before being accepted (as at OSDI, where I’m presently struggling to get all my reviews done). Instead, every conference would simply convene a handful of Bobs, divide the submissions between them, and each decides which one of their share of the papers should be accepted.

    Of course, things don’t work like this, for good reasons. I’ve served on enough top-tier conference PCs to have experienced plenty of cases where the reviews of discipline experts diverge drastically on multiple papers. In my present OSDI stack of 29 papers this is true for about 35% of papers: 10 papers have at least one clear reject and one clear accept! And it is the reason why each paper gets at least 6 reviews: we get the full spectrum, and then at the PC meeting work out who’s right and who’s wrong. The result is still imperfect, but vastly superior to relying on a single opinion.

    Now these reviewers are the discipline experts (in this case, leading researchers in “systems”, incorporating mostly operating systems and distributed systems). If you get such a diversity of opinions within such a relatively narrow subdiscipline, how much would you get across all of computer science? I certainly would not claim to be able to judge the quality of a paper in 80% of computer science, and if someone thinks they can, then my respect for them takes a serious hit.

    In summary, I think the idea that someone, even one of the brightest computer scientists, can judge an arbitrary CS paper for its significance is simply indefensible. An expert PC of a top conference accepting a paper has far more significance than the opinion of a discipline outsider, even a bright one!

  3. Of course, that doesn’t justify using the publication outlets as the only criterion for promotion/hiring or anything else. That’s why we do interviews, request letters etc. Also, I definitely believe that citations are a better metric (still imperfect). But citations are a useless measure for fresh PhDs, and mostly of not much use for ECRs.

  4. Nor do I want to defend the present CORE list in any way. I said that before, but I’m repeating for completeness: the present CORE list is the result of an utterly broken process, is completely compromised, and an embarrassment for Australian computer science. And any attempt to fix it by using the existing process (or some minor variant of it) is not going to fix this. The list must either be abandoned or re-done from scratch, using a sound, robust and transparent process.

  5. My arguments only are about top venues. A track record of publishing in those means something, and identifying across all of CS what those top venues are has a value. By the same token I believe trying to categorise further (i.e. B- and C-grade venues, as done in the CORE list) is a complete waste of time. Publishing in such venues means nothing (other than positively establishing that someone has low standards). So, if we bother to have a list, it should only be a list of discipline top venues, nothing more.