
Green computing through formal verification?

2013/03/18

This may seem like a long bow to draw, and I admit there’s a bit of marketing in the title, but there’s substance behind it: Verification can save energy in data centres, by significantly speeding up transaction processing.

How does that work?

The key is the defensive way many systems are built to cope with failure. Specifically, database systems, which are behind e-commerce and many other web sites, are designed to protect against data loss when the system crashes. This is important: if you do a transaction over the web, you want to be sure that you aren’t charged twice, and that if they charge you, they will actually deliver, etc. That wouldn’t be ensured (even if the vendor is trustworthy) if the server crashed in the middle of the transaction.

Databases avoid losses by ensuring that transactions (used here as a technical term) have the so-called ACID properties of atomicity, consistency, isolation and durability. The ones relevant to this discussion are atomicity (meaning that a transaction is always either completed in its totality or not at all) and durability (meaning that once a transaction has completed successfully, its results cannot be lost). Databases achieve atomicity by operating (at least logically) on a temporary copy of the data and making all changes visible at once, without ever exposing intermediate state.
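
To make the “operate on a copy, publish all at once” idea a little more concrete, here is a toy sketch in C (with made-up data structures, and ignoring concurrency control and memory reclamation entirely): the transaction works on a private copy, and a single pointer update makes all of its changes visible together.

    /* Toy illustration of atomic visibility: work on a private copy,
     * then publish everything with one pointer swap.  This is not how
     * any particular DBMS implements atomicity. */
    #include <stdatomic.h>
    #include <stdlib.h>
    #include <string.h>

    struct accounts { long balance_a, balance_b; };

    static _Atomic(struct accounts *) current;   /* the state everyone sees */

    void transfer(long amount)
    {
        struct accounts *old  = atomic_load(&current);
        struct accounts *next = malloc(sizeof *next);

        memcpy(next, old, sizeof *next);   /* private working copy */
        next->balance_a -= amount;         /* intermediate state stays invisible */
        next->balance_b += amount;

        atomic_store(&current, next);      /* all changes appear at once */
        /* (a real system would also reclaim 'old' safely) */
    }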

Durability is generally ensured by write-ahead logging: The database appends the intended effect of the transaction to a log, which is kept in persistent storage (disk), before letting the transaction complete and making its results visible to other transactions. If the system (the database itself or the underlying operating system) crashes halfway through, the database goes through a recovery process at restart. It replays operations from the log for every transaction whose log record is complete, and discards the records of incomplete transactions. This ensures that only the effects of complete transactions are visible (atomicity).
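
As a rough illustration of the recovery side (with an invented log-record format, and assuming for simplicity that transactions appear in the log strictly one after the other), recovery boils down to replaying everything up to the last commit marker and throwing away the incomplete tail:

    /* Sketch of log replay at restart.  The record format and the
     * apply_update() redo routine are invented for illustration. */
    #include <stdio.h>

    enum rec_type { REC_UPDATE, REC_COMMIT };

    struct log_rec {
        enum rec_type type;
        long          txn_id;
        long          page;    /* which page the update touches */
        long          value;   /* new contents (grossly simplified) */
    };

    void apply_update(long page, long value);   /* hypothetical redo routine */

    void recover(FILE *log)
    {
        struct log_rec rec, pending[1024];
        int n = 0;

        while (fread(&rec, sizeof rec, 1, log) == 1) {
            if (rec.type == REC_UPDATE && n < 1024) {
                pending[n++] = rec;                    /* hold back until we see a commit */
            } else if (rec.type == REC_COMMIT) {
                for (int i = 0; i < n; i++)            /* transaction is complete: */
                    apply_update(pending[i].page,      /* redo its effects */
                                 pending[i].value);
                n = 0;
            }
        }
        /* Whatever is left in 'pending' has no commit record: the transaction
         * never completed, so its effects are simply dropped (atomicity). */
    }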

Durability is only ensured if it can be guaranteed that the complete log entry of any completed transaction can be recovered. This requires that the database only continues transaction processing once it is certain that the complete transaction log is actually safely recorded on the disk.

This certainty does not normally exist for write operations to disk. Because disks are very slow (compared to RAM), the operating system does its best to hide the latency of writes. It caches writes to files in buffers in RAM (the so-called block cache), and performs the physical writes asynchronously, that is, it overlaps them with continuing execution. This is fine, except if you want to ensure durability of your transactions!

Databases therefore perform log writes synchronously. They use an OS system call called “sync()” (to be precise, it’s “fsync()” on Unix/Linux systems). This call simply blocks the caller until any pending writes on the respective file have finished. The database then knows that the data are on disk and can proceed, with durability maintained.
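
The pattern looks roughly like this (a generic sketch, not the code of any particular database): the write() only puts the log record into the block cache, and the fsync() is the blocking step that forces it onto the disk before the commit is acknowledged.

    /* Durable commit of a log record: append, then fsync() before
     * reporting success.  The record layout is left abstract. */
    #include <unistd.h>

    int commit_txn(int log_fd, const void *rec, size_t len)
    {
        /* write() merely copies the data into the kernel's block cache... */
        if (write(log_fd, rec, len) != (ssize_t)len)
            return -1;

        /* ...so block here until the disk actually has it.  This is the
         * synchronous, latency-adding step. */
        if (fsync(log_fd) != 0)
            return -1;

        return 0;   /* only now is it safe to tell the client "committed" */
    }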

What’s the problem with this?

There’s a significant cost, in terms of latency. Disk writes are slow, and the database has to wait until they complete. In some cases, it can spend the majority of its time waiting and twiddling its thumbs.

What does formal verification have to do with this?

If the underlying operating system is guaranteed not to crash, then we don’t need to block. The database can continue processing, and the OS does the write asynchronously.

This is essentially the thought I had while sitting in a database seminar a few years ago. I was thinking “you guys go through so much trouble to ensure you don’t lose data. We could do that so much easier!”

In fact, that’s what we’re doing now, in a system we call RapiLog. We’re using the stability guarantees of our formally verified seL4 microkernel to get the best of both worlds: the performance advantage of asynchronous I/O, and the durability guarantees of synchronous writes. And does it ever work! We’re observing a doubling of database throughput under the right circumstances!

And increased throughput means that you can process more requests with the same hardware. This means that your data centre can be smaller, which not only reduces capital cost but also running cost, most of which is the cost of electricity for powering the computers and for air conditioning. That’s the link to green computing.

And the cool thing is that we can do this with unmodified legacy systems: We run an unmodified database binary (including commercial databases where we have no source) on an unmodified Linux system. We just put it into a virtual machine and let an seL4-based “virtual disk” do all the magic.
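
Conceptually (and glossing over all the real engineering; the names below are invented and this is not the actual RapiLog code), the virtual disk can acknowledge the guest’s synchronous log write immediately and complete the physical write in the background, because the verified kernel underneath it is not going to crash:

    /* Conceptual sketch of a reliability-backed virtual disk handling a
     * guest's synchronous write.  All names are hypothetical. */
    struct vdisk_req;                                            /* opaque request handle */
    extern void queue_background_write(struct vdisk_req *req);   /* physical write, later */
    extern void ack_to_guest(struct vdisk_req *req);             /* complete the guest I/O */

    void vdisk_handle_sync_write(struct vdisk_req *req)
    {
        queue_background_write(req);   /* the disk write happens asynchronously */
        ack_to_guest(req);             /* the guest's fsync() returns right away;
                                          durability now rests on the verified
                                          kernel not crashing (and on the
                                          power-loss handling discussed below) */
    }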

How about power blackouts?

Good question! What happens if mains power suddenly goes down?

For the traditional database setup that isn’t a problem. Either the transaction is recorded in a log entry that is safely on disk when the power fails, in which case the transaction is durable and will be recovered at restart time. Or there is no (complete) log entry, in which case the transaction was never completed, and is discarded at restart. But with seL4?

Turns out we can deal with that too. Computer power supplies have capacitors which store some electrical energy. When mains power is cut, that energy can keep the computer running for a bit. Not a long time, only some 100 milliseconds. But that is plenty of time to flush any remaining log data out to the disk. It turns out that all we need is about 20 milliseconds, which fits very comfortably inside the window given by the capacitors. All that’s required is an immediate warning (in the form of an interrupt raised by the hardware).
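
Schematically (again with invented names, not the actual implementation), the power-failure path looks like this:

    /* Sketch of the power-failure path: on the power-fail interrupt, push
     * any buffered log data to disk within the energy window provided by
     * the power supply's capacitors.  All names are hypothetical. */
    struct log_buffer {
        const void *data;
        unsigned    len;
        int         pending;   /* non-zero if not yet on disk */
    };

    extern struct log_buffer buffered_log;                        /* not-yet-durable log data */
    extern void disk_write_sync(const void *data, unsigned len);  /* hypothetical driver call */

    void power_fail_irq_handler(void)
    {
        /* Roughly 100 ms of residual energy remain; the flush needs only
         * about 20 ms, so it fits comfortably in the window. */
        if (buffered_log.pending) {
            disk_write_sync(buffered_log.data, buffered_log.len);
            buffered_log.pending = 0;
        }
        /* After this the machine may lose power: the log is now just as
         * durable as it would have been with synchronous writes. */
    }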

Implications on system complexity

Maybe even more important than the performance improvement is the potential for reducing system complexity. Complex systems are expensive (in terms of development and maintenance). They are also error prone, meaning more likely to fail. Reducing complexity is always a good thing. And we have shown that some of the complexity (in this case in database systems) can be avoided if we can provide guarantees about the reliability of the software. Which is what our Trustworthy Systems work is all about.

Want to know more?

Read the paper. It will be presented at next month’s EuroSys Conference in Prague.

8 Comments
  1. Blog bug report: replace “concurrency” with “consistency” in the definition of ACID

  2. Oops! Thanks for that, fixed now.

  3. John Regehr permalink

    Hi Gernot, this looks like great work, but let me play devil’s advocate for a bit. What if some of the defensive machinery has evolved in response not to blackouts and software bugs, but rather to some of the other bad things that can happen to data when you store a lot of it over a long period of time, such as various kinds of undetectable bit corruption in RAM, on busses, and on disk?

    It seems like at some level you need to take a very weak model of disk storage where bad things can happen to data at any time. On this kind of bad storage, no efficient filesystem will be able to protect from all errors, but perhaps you can show that your less-defensive system is no more vulnerable to these errors than are more traditional codes. The drawback is that this may be very hard.

    • Hi John,

      I don’t see a difference between traditional DBMS and RapiLog wrt data corruption – both are writing the same data.

      The one difference (which is spelt out in the paper) is fail-stop processor failures, which the traditional DBMS setup tolerates but RapiLog (at present) doesn’t. (It could in principle, by rebooting the HW and the microkernel without touching user memory, flush the data to disk and do a cold reboot. Present BIOSes don’t let us do this, they insist on a memory check on a machine-check interrupt, so we’d have to replace the BIOS…)

      • John Regehr permalink

        So I guess the question is, How much non-fail-stop behavior is seen in today’s datacenters and how well does the software deal with it?

        I’m not sure that reading the RAM following a hardware RAM corruption event is a good idea, it could be quite corrupted. If you’ve exploited the correctness proof to remove a lot of defensive code, this might well have the side effect of increasing the average time before you “notice” corrupted RAM by crashing.

      • I wasn’t talking about RAM corruption. A machine check would result from a CPU-internal irrecoverable fault. Of course, that could be triggered by a cache error, in which case we’d have a problem, but only if it happens while processing RapiLog-critical metadata, or in the case of an I-cache problem that affects the correct operation of seL4 itself. That’s probably orders of magnitude less likely than other machine checks that could be handled.

        No-one is going to give you real data on this, but what I could find out from talking to people informally is that silent errors are much rarer than fail-stop (i.e. machine-check exception). In other words, manufacturers are doing a good job in either masking or detecting errors, and thus preventing silent data corruption.

        Are you going to be at CPS Week? Would be good to catch up.

  4. John Regehr permalink

    Hi Gernot, unfortunately I can’t make it to CPSWEEK. Will you be at PLDI? I’m very much looking forward to seeing the paper by Magnus et al.

Trackbacks & Pingbacks

  1. When “persistent” storage isn’t | microkerneldude
