Unboxed union types in Scala via the Curry-Howard isomorphism

Scala has a highly expressive type system, but it doesn’t include everything you might find yourself hankering after — at least, not as primitives. There are a few genuinely useful things which fall under this heading — higher-rank polymorphic function types and recursive structural types are two I’ll talk about more in later posts. Today I’m going to show how we can encode union types in Scala, in the course of which I’ll have an opportunity to shed a little light on the Curry-Howard isomorphism and show how it can be put to work for us.

So, first up, what is a union type? A union type is pretty much what you’d expect: it’s the union of two (or more, but I’ll limit this discussion to just two) types. The values of that type are all of the values of each of the types that it’s the union of. An example will help to make this clear, but first a little notation — for reasons which will become apparent later I’ll write the union of types T and U as T ∨ U (ie. the two types flanking the logical or’ operator), and so we write the union of the types Int and String as Int ∨ String. The values of this union type are all the values of type Int and all the values of type String.

What does this mean more concretely? It means that if we could express this type directly in Scala we would be able to write,

def size(x: Int  String) = x match {
  case i: Int => i
  case s: String => s.length
}

size(23) == 23   // OK
size("foo") == 3 // OK
size(1.0)        // Not OK, compile time error

In other words, the size method would accept arguments of either type Int or type String (and their subtypes, Null and Nothing) and nothing else.

It’s important to recognize the difference between this use of a union type and the similar use of Scala’s standard Either. Either is what’s known as a sum type, the analog of union types in languages which don’t support subtyping. Recasting our example in terms of Either we get,

def size(x: Either[Int, String]) = x match {
  case Left(i) => i
  case Right(s) => s.length
}

size(Left(23)) == 23    // OK
size(Right("foo")) == 3 // OK

Either[Int, String] can model the union type Int ∨ String because there is an isomorphism between the two types and their values. But equally clearly the Either type manages this by way of a layer of boxed representation, rather then by being an unboxed primitive feature of the type system. Can we do better than Either? Can we find a way of representing union types in Scala which doesn’t require boxing, and which provides all of the static guarantees we would expect?

It turns out that we can, but to get there we have take a detour through first-order logic via the Curry-Howard isomorphism. Curry-Howard tells us that the relationships between types in a type system can be viewed as an image of the relationships between propositions in a logical system (and vice versa). There are various ways that we can fill that claim out, depending on the type system we’re talking about and the logical system we’re working with, but for the purposes of this discussion I’m going to ignore most of the details and focus on simple examples.

To illustrate Curry-Howard (in the context of a type system with subtyping like Scala’s), we can see that there is a correspondence between intersection types (A with B in Scala) and logical conjunction (A ∧ B); between my hypothetical union types (A ∨ B) and logical disjunction (also A ∨ B, as hinted earlier); and between subtyping (A <: B in Scala) and logical implication (A ⇒ B). On the left hand side of each row in the table below we have a subtype relationship which is valid in Scala (although, in the case of the union types at the bottom, not directly expressible), and on the right hand side we have a logical formula which is obtained from the type relationship on the left by simply rewriting (with to and <: to ) — in each case the result of the rewriting is a logically valid.

(A with B) <: A (A ∧ B) ⇒ A
(A with B) <: B (A ∧ B) ⇒ B
A <: (A ∨ B) A ⇒ (A ∨ B)
B <: (A ∨ B) B ⇒ (A ∨ B)

The essence of Curry-Howard is that this mechanical rewriting process (whichever direction you go in) will always preserve validity — valid type formulae will always rewrite to valid logical formulae, and vice versa. This isn’t only true for conjunction, disjunction and implication. We can also generalize the correspondence to logical formulae which include negation (the key one for us here) and universal and existential quantification.

So what would it mean to add negation to the mix? The conjunction of two types (ie. A with B) has values which are instances of both A and B, so similarly we should expect the negation of a type A (I’ll write it as ¬[A]) to have as it’s values everything which isn’t an instance of A. This is also something which can’t be directly expressed in Scala, but suppose it was?

If it was, then we would be able to crank on the Curry-Howard isomorphism and De Morgan’s laws to give us a definition of union types in terms of intersection types (A with B) and type negation. Here’s how that might go …

First recall the De Morgan equivalence,

(A  B)  ¬(¬A  ¬B)

Now apply Curry-Howard (using Scala’s =:= type equality operator),

(A  B) =:= ¬[¬[A] with ¬[B]]

If we could work out a way of expressing this in Scala, we’d be home and dry and have our union types. So can we express type negation?

Unfortunately we can’t. But what we can do is transform all of our types in a way which allows negation to be expressed in the transformed context. We’ll then need to work out how make that work for us back in the original untransformed context.

Some readers might have been a little surprised earlier when I illustrated Curry-Howard using intersection types as the correlate of conjunction, union types as the correlate of disjunction and the subtype relation as the correlate of implication. That’s not how it’s normally done — usually product types (ie. (A, B)) model conjunction, sum types (ie. Either[A, B]) model disjunction and function types model implication. If we recast our earlier table in terms of products, sums and functions we end up with this,

(A, B) => A (A ∧ B) ⇒ A
(A, B) => B (A ∧ B) ⇒ B
A => Either[A, B] A ⇒ (A ∨ B)
B => Either[A, B] B ⇒ (A ∨ B)

On the left hand side we’re no longer looking for validity with respect to the subtype relation, instead we’re looking for evidence of the principle of parametricity, which allows us to determine if a function type is implementable just by reading it’s signature. It’s clear that all the function signatures on the left in the table above can be implemented — for the first two we have an (A, B) pair as our function argument, so we can easily evalutate to either an A or a B, using _1 or _2,

val conj1: ((A, B)) => A = p => p._1
val conj2: ((A, B)) => B = p => p._2

and for the last two we have either an A or a B as our function argument, so we can evalute to Either[A, B] (as Left[A] or Right[B] respectively).

val disj1: A => Either[A, B] = a => Left(a)
val disj2: B => Either[A, B] = b => Right(b)

This is the form in which the Curry-Howard isomorphism is typically expressed for languages without subtyping. Because this mapping doesn’t reflect subtype relations it isn’t going to be much direct use to us for expressing union types which, like intersection types, are inherently characterized in terms of subtyping. But it can help us out with negation, which is the missing piece that we need.

Either with or without subtyping, the bottom type (Scala’s Nothing type) maps to logical falsehood, so for example, the following equivalences all hold,

A => Either[A, Nothing] A ⇒ (A ∨ false)
B => Either[Nothing, B] B ⇒ (false ∨ B)

because the function signatures on the left are once again all implementable, and the logical formulae on the right are again all valid (see this post from James Iry for an explanation of why I haven’t shown the corresponding cases for products/conjunctions). Now we need to think about what a function signature like,

A => Nothing

corresponds to. On the logical side of Curry-Howard this maps to A ⇒ false, which is equivalent to ¬A. This seems fairly intuitively reasonable — there are no values of type Nothing, so the signature A => Nothing can’t be implemented (other than by throwing an exception, which isn’t allowed).

Let’s see what happens if we take this as our representation of the negation of a type,

type ¬[A] = A => Nothing

and apply it back in the subtyping context that we started with to see if we can now use De Morgan’s laws to get the union types we’re after,

type [T, U] = ¬[¬[T] with ¬[U]]

We can test this using the Scala REPL, which will very quickly show us that we’re not quite there yet,

scala> type ¬[A] = A => Nothing
defined type alias $u00AC

scala> type [T, U] = ¬[¬[T] with ¬[U]]
defined type alias $u2228

scala> implicitly[Int <:< (Int  String)]
<console>:11: error: Cannot prove that Int <:< 
  ((Int) => Nothing with (String) => Nothing) => Nothing.
       implicitly[Int <:< (Int  String)]

The expression implicitly[Int <:< (Int ∨ String)] is asking the compiler if it can prove that Int is a subtype of Int ∨ String, which it would be if we had succeeded in coming up with an encoding of union types.

So what’s gone wrong? The problem is that we have transformed the types on the right hand side of the <:< operator into function types so that we can make use of the encoding of type negation as A => Nothing. This means that the union type is itself a function type. That’s clearly not consistent with Int being a subtype of it — as the error message from the REPL shows. To make this work, then, we also need to transform the left hand side of the <:< operator into a type which could possibly be a subtype of the type on the right. What could that transformation be? How about double negation?

type ¬¬[A] = ¬[¬[A]]

Lets see what the compiler says now,

scala> type ¬¬[A] = ¬[¬[A]]
defined type alias $u00AC$u00AC

scala> implicitly[¬¬[Int] <:< (Int  String)]
res5: <:<[((Int) => Nothing) => Nothing,
  ((Int) => Nothing with (String) => Nothing) => Nothing] =
    <function1>

scala> implicitly[¬¬[String] <:< (Int  String)]
res6: <:<[((String) => Nothing) => Nothing,
  ((Int) => Nothing with (String) => Nothing) => Nothing] =
    <function1>

Bingo! ¬¬[Int] and ¬¬[String] are both now subtypes of Int ∨ String!

Let’s just check that this isn’t succeeding vacuously,

scala> implicitly[¬¬[Double] <:< (Int  String)]
<console>:12: error: Cannot prove that
  ((Double) => Nothing) => Nothing <:<
    ((Int) => Nothing with (String) => Nothing) => Nothing.

We’re almost there, but there’s one remaining loose end to tie up — we have subtype relationships which are isomorphic to the ones we want (because ¬¬[T] is isomorphic to T), but we don’t yet have a way to express those relationships with respect to the the untransformed types that we really want to work with.

We can do that by treating our ¬[T], ¬¬[T] and T ∨ U as phantom types, using them only to represent the subtype relationships on the underlying type rather that working directly with their values. Here’s how that goes for our motivating example,

def size[T](t: T)(implicit ev: (¬¬[T] <:< (Int  String))) =
  t match {
    case i: Int => i
    case s: String => s.length
  }

This is using a generalized type constraint to require the compiler to be able to prove that any T inferred as the argument type of the size method must be such that it’s double negation is a subtype of Int ∨ String. That’s only ever true when T is Int or T is String, as this REPL session shows,

scala> def size[T](t: T)(implicit ev: (¬¬[T] <:< (Int  String))) =
     |   t match {
     |     case i: Int => i
     |     case s: String => s.length
     |   }
size: [T](t: T)(implicit ev: <:<[((T) => Nothing) => Nothing,
  ((Int) => Nothing with (String) => Nothing) => Nothing])Int

scala> size(23)
res8: Int = 23

scala> size("foo")
res9: Int = 3

scala> size(1.0)
<console>:13: error: Cannot prove that
  ((Double) => Nothing) => Nothing <:<
    ((Int) => Nothing with (String) => Nothing) => Nothing.

One last little trick to finesse this slightly. The implicit evidence parameter is syntactically a bit ugly and heavyweight, and we can improve things a little by converting it to a context bound on the type parameter T like so,

type |∨|[T, U] = { type λ[X] = ¬¬[X] <:< (T  U) }

def size[T: (Int |∨| String)#λ](t: T) =
  t match {
    case i: Int => i
    case s: String => s.length
  }

And there you have it — an unboxed, statically type safe encoding of union types in unmodified Scala!

Obviously it would be nicer if Scala supported union types as primitives, but at least this construction shows that the Scala compiler has all the information it needs to be able to do that. Now we just need to pester Martin and Adriaan to make it directly accessible.