An abstract for the paper I’m writing

yo dawg —
    i heard you like concurrency libraries,
so i built a concurrency abstraction
    for building concurrency abstractions,
so you can use my library
    for implementing your library

(see this if you’re confused)

Visualizing contention’s cost

This is a quick follow-up to the previous post, with a lot more data.

The basic point is that the factors determining parallel speedup radically change depending on the grain of parallelism.

Read More

The quantum mechanics of parallelism

"Quantum mechanics differs significantly from classical mechanics in its predictions when the scale of observations becomes comparable to the atomic and sub-atomic scale, the so-called quantum realm." (Wikipedia)

And so it is with thread-level parallelism. Every thread performs some local work before needing to coordinate with other threads. When the scale of that work shrinks to the scale of coordination primitives—call it the quantum realm—Amdahl’s law no longer readily applies.

Unlike physical quantum phenomena, it’s easy enough to observe this breakdown in practice.

Read More

… and also hit up San Francisco, finally!

… and also hit up San Francisco, finally!


Visited Stockholm for the UPMARC summer school

Visited Stockholm for the UPMARC summer school


MFPS 2011

MFPS 2011 is winding down. It turned out to be a very enjoyable conference — I understood many more of the talks than I expected. Aside from the many excellent separation logic talks, the standouts for me were:

Neel Krishnaswami's talk on higher-order FRP, which amongst other things introduced an elegant, practical type system for avoiding space leaks.

Stephanie Weirich's talk on the Trellys project, which attempts to make dependent types practical for programming, not just for proof assistants. You begin with a full-blown programming language (including full recursion and other effects), which is logically inconsistent. Then you use a separate typing judgment to pick out a “logical” sublanguage, which does coincide with a logic via Curry-Howard. There are many interesting design choices about how these two languages relate, and how to achieve proof irrelevance at the right points. The upshot is that dependent types can be introduced gradually into a program.

Alex Simpson's talk about verification of probabilistic systems. The talk began with a summarization of his previous work on a sequent calculus for proving mu-calculus properties about CCS processes. I had previously seen this work, but he explained it in a way that was new to me: after giving a game semantics for mu-calculus formulas, he proceeded to motivate all of the sequent calculus rules via this game semantics — in an almost mechanical fashion. This seems like a very useful technique.

Vasileios Koutavas's talk showing how any of several naive extensions to applicative bisimulation fail to be sound, via a series of counterexamples. The conclusion in the end is that you really do need the machinery of environmental bisimulation to handle many interesting language features. I suspect the paper will serve as a great way to understand the subtleties of these language features, and I plan to read it soon.

Aside from watching these wonderful talks, I also gave a presentation of my own, about new full abstraction results for the pi-calculus using ideas from separation logic. Here’s the abstract:

Read More

STM ‘77 - ‘06

I’ve been perusing the first edition of Transactional Memory, trying to get to grips with the space of STM implementations. Below are some of the milestones in scalable multiprocessor implementations. All the implementations are free from deadlock and are linearizable: transactions appear to take place atomically. However, some provide stronger liveness and safety properties.


An implementation is obstruction-free if a (finite) transaction is guaranteed to succeed when executed in isolation, regardless of the ambient state of memory. In other words, a transaction can only fail due to active interference from other threads. It cannot be passively obstructed by another thread that happens to hold a lock. Thus a thread descheduled by a page fault or preemption cannot block other transactions.

An implementation is lock-free if some (finite) transaction is always guaranteed to succeed, even if the scheduler is unfair.

A blocking implementation guarantees, at best, livelock freedom given a fair scheduler.


An additional safety property is consistency, meaning that the reads performed within a transaction could, at all times, form an atomic snapshot of memory from some point in the past. This must always be true by the time a transaction commits (to ensure transactional atomicity), but a “doomed” transaction that has seen inconsistent values might keep running for a time before being aborted and retried. If an implementation does not guarantee consistency, transactional code cannot rely on invariants that relate different words of memory. This possibility raises the specter of transactional code spuriously throwing an exception or going into a loop.

Each implementation below is classified according to its progress guarantee (blocking, obstruction-free, lock-free) and its consistency guarantee.

Read More

Domain theory slogans

I recently gave a tutorial on domain theory, which I tried to motivate from operational intuitions. The outline for the talk was the series of slogans below.

Answers != Meaning

It’s not enough to think of the meaning of a program as the result it produces on evaluation. For example, what about lambdas, which traditionally evaluate to themselves? Makes it useless to compare them. And what about incomplete program fragments? Those need independent meaning if we want to talk about optimization, but they can’t be evaluated.

Read More

Online CS lectures

Ellen Petersen has put together “Lecturefox”, a collection of freely-available lectures from universities worldwide. There’s a nice selection of CS lectures including plenty of PL. See, for example, Dan Grossman’s PL class.

2 years ago -

Lightweight threading

Lightweight threads are a nice way to expose parallelism, especially when it’s unclear how to equally divide a problem in advance. If threads are cheap, programs can spawn thousands of them, corresponding to small chunks of work. The runtime system balances this work between the available processors, using for example work-stealing deques. Lightweight threads enable a number of fine-grained parallel programming models, with futures being the most prominent example.

What, in turn, enables lightweight threading? Generally not the kernel: trapping into kernel mode, dealing with full-blown OS scheduling, allocating a per-thread stack etc is too heavyweight. User-mode threading can do much better. User-mode thread creation and synchronization is an order of magnitude faster than the kernel equivalent, or at least was in 1992 — I don’t have a more recent comparison.

Mitch Wand’s famous paper, Continuation-based multiprocessing, shows how to implement kernel threads using call/cc. But the same technique can (and has, many times) been used for user-mode threading.

There’s a catch, though. If you want to spread your N user threads over M cores, you’ll need to multiplex them onto M kernel threads (i.e., hybrid threading). If one of those user threads calls a blocking operation, its supporting kernel thread is blocked. And if one of the kernel threads is preempted, so are all the user threads associated with it. Without additional kernel support, user-mode threading libraries must employ resizable kernel thread pools, and waste cycles managing them. Unfortunately, kernel support a la scheduler activations appears to be DOA.

Transactional Memory, 2nd edition

This is an invaluable resource for getting to grips with the range of STM implementation strategies: a freely available book from Tim Harris, James Larus, Ravi Rajwar.

2 years ago -

Unit testing in Scala

There are a bunch of nice testing frameworks for Scala. I’ve settled on SimpleSpec (which is based on Specs) because of its extremely lightweight, literate style:

object StackSpec extends Spec {
  class `An empty stack` {
    val stack = Stack()

    def `should have a size of zero` {
      stack.size must be(0)

    def `should be empty` {
      stack.isEmpty must be(true)

You get output like

Specification "StackSpec"
  An empty stack should
  +  have a size of zero
  +  be empty

showing that all tests passed. Together with, I’ve got continuous compilation and testing — and already caught a bug.

Lightweight pimp-my-library

Scala lets you effectively add declarations to a class externally to its definition, using implicit conversions. Unlike .NET’s extension methods, this pimp my library pattern does not actually change the underlying class, instead converting the class when the new features are used. In practice the conversion can often be compiled away.

You can give an implicit conversion to an anonymous type, which will be applied on a structural basis:

implicit def string2toInt(s: String): 
  { def toInt: Int } =  new {
  def toInt: Int =  Integer.parseInt(s)


Modern Microprocessors - A 90 Minute Guide!

This is a fantastic, concise, informative summary of three decades of processor design trends. Superscalar, VLIW, superpipelining, SMT, and all the related tradeoffs explained.

3 years ago -


Objectified is a 2009 documentary on design, part of a trilogy (the first being “Helvetica”, the third TBD). The film mostly consists of interviews with designers, but also has some pretty stunning footage of the manufacturing process. It articulates the quiet sense of pleasure and displeasure of interacting with man-made objects—and it seems that while designers disagree about many things, they are all very irritated by seemingly-small details that just aren’t quite right. Enjoyable, short, recommended.