First-class polymorphic function values in shapeless (1 of 3) — Function values in Scala

Posted: 27 Apr 2012
by: Miles Sabin

Miles Sabin — 27 April 2012 [edit]

One of the distinguishing features of the HList (a data structure which combines the characteristics of both sequences and tuples) implementation in shapeless is its support for a map() higher-order function which, superficially at least, appears to operate very similarly to the one defined on ordinary Scala collection types like List,

scala> import shapeless._ ; import HList._

// List[Int]
scala> List(1, 2, 3) map singleton
res0: List[Set[Int]] = List(Set(1), Set(2), Set(3))

// List[String]
scala> List("foo", "bar", "baz") map singleton
res1: List[Set[String]] = List(Set(foo), Set(bar), Set(baz))

// HList
scala> (23 :: "foo" :: false :: HNil) map singleton
res2: Set[Int] :: Set[String] :: Set[Boolean] :: HNil =
      Set(23)  :: Set(foo)    :: Set(false)   :: HNil

(here singleton() is a function which makes a single element Set from its argument).

Arranging for things to work out this smoothly involves quite a bit of hidden sophistication, and in this short series of articles I’ll explain what that sophistication is and why it’s necessary.

The first observation to make about the REPL transcript above is that the singleton() function is an argument to the map() higher-order functions defined for List and HList. This is part and parcel of Scala making good on its claim to blend object-oriented and functional programming styles by representing functions as values.

The second observation to make is that this function is being applied to arguments of several different types — Int, String, and Boolean. In other words, the function is polymorphic in its argument type.

Given that there’s nothing at all unusual looking about the first two uses of map above (over the vanilla lists) you might put these two observations together and quite reasonably conclude that Scala directly supports polymorphic function values.

That’s not the case, however. Appearances to the contrary, all Scala values (and hence Scala function values in particular) are monomorphic (in the sense relevant here, see qualifications below), and representing polymorphic functions requires some work.

Before we get to that, we need to understand method-level parametric polymorphism in Scala. And we need to understand the differences between between Scala’s methods and Scala’s function values. With that in hand we’ll be able to see why Scala’s standard function values can only be monomorphic. That will set the stage for our exploration of ways to encode polymorphic function values in parts 2 and 3 of this series.

Method-level parametric polymorphism

The best way to get to grips with method-level parametric polymorphism is to compare the definitions and uses of a monomorphic and a polymorphic method. At the point of definition the constrast is simply between those methods with arguments of fixed types and those with arguments which have a type that is bound by a method-level type parameter. So, for example,

// Monomorphic methods have type parameter-free signatures
def monomorphic(s: String): Int = s.length

monomorphic("foo") 

// Polymorphic methods have type parameters in their signatures
def polymorphic[T](l: List[T]): Int = l.length

polymorphic(List(1, 2, 3))
polymorphic(List("foo", "bar", "baz"))

Monomorphic methods can only be applied to arguments of the fixed types specified in their signatures (and their subtypes, I’ll come back to this in a moment), whereas polymorphic methods can be applied to arguments of any types which correspond to acceptable choices for their type parameters — in the example just given we can apply monomorphic() to values of type String only, but we can apply polymorphic() to values of type List[Int] or List[String] or … List[T] for any type T.

Of course, Scala is both an object-oriented and a functional programming language, so as well as parametric polymorphism (ie. polymorphism captured by type parameters) it also exhibits subtype polymorphism. That means that the methods that I’ve been calling monomorphic are only monomorphic in the sense of parametric polymorphism and they can in fact be polymorphic in the traditional object-oriented way. For instance,

trait Base { def foo: Int }
class Derived1 extends Base { def foo = 1 }
class Derived2 extends Base { def foo = 2 }

def subtypePolymorphic(b: Base) = b.foo

subtypePolymorphic(new Derived1) // OK: Derived1 <: Base
subtypePolymorphic(new Derived2) // OK: Derived2 <: Base

Here the method subtypePolymorphic() has no type parameters, so it’s parametrically monomorphic. Nevertheless, it can be applied to values of more than one type as long as those types stand in a subtype relationship to the fixed Base type which is specified in the method signature — in other words, this method is both parametrically monomorphic and subtype polymorphic.

It’s parametric polymorphism that I’ll be talking about in the remainder of this article and the sequel — I won’t be mentioning subtype polymorphism again, and from now on I’ll just talk about methods and functions being monomorphic or polymorphic without a “parametric” qualifier.

Methods vs. function values

So far I’ve been talking about Scala methods … now we need to understand how they differ from Scala function values. Scala methods are exactly like Java methods: they’re components of classes or traits and aren’t first-class values in their own right. Scala function values, on the other hand, are first-class values and are represented by JVM-level classes rather than being components of some other class. It’s the first-class value nature of Scala function values which allows them to be passed as arguments to the higher-order functions and methods which give Scala it’s functional flavour.

Nevertheless, Scala methods can have a very function-like feel, particularly when they’re defined on Scala objects used as modules, or nested inside other method or function definitions. In these cases they appear to be free floating, lacking the implicit left-hand-side “receiver” argument characteristic of object-oriented methods. And in many cases they can be used in applications of higher-order functions without any visible additional ceremony. For example,

scala> object Module {
     |   def stringSingleton(s: String) = Set(s)
     | }
defined module Module

scala> import Module._
import Module._

scala> stringSingleton("foo")
res0: Set[String] = Set(foo)

scala> List("foo", "bar", "baz") map stringSingleton
res1: List[Set[String]] = List(Set(foo), Set(bar), Set(baz))

The stringSingleton() method of the Module object appears to be indistinguishable from a first-class function value. But the appearances are deceptive. The method isn’t free-standing: we could have used this in its body and it would have referred to the Module singleton object, even after the import. And it’s not the method which is passed to map — instead a transient function value is implicitly created to invoke the stringSingleton() method (this is a process known as eta-expansion) and it’s that function-value which is passed to map.

Fortunately these mechanics are completely invisible most of the time, but they’re relevant to us now, so let’s make them a bit more visible. We can ask explicitly for the Scala compiler to give us the eta-expanded function value, allowing us to give it a name and discover its type. We do this using Scala’s multipurpose “_” — in this manifestation it’s acting as a function-value-forming operator,

scala> val stringSingletonFn = stringSingleton _
stringSingletonFn: (String) => Set[String] = <function1>

scala> stringSingletonFn("foo")
res2: Set[String] = Set(foo)

scala> List("foo", "bar", "baz") map stringSingletonFn
res3: List[Set[String]] = List(Set(foo), Set(bar), Set(baz))

This sequence is exactly equivalent to what we had before, but now we can see that a new function value of type (String) => Set[String] has been created — it’s this which is passed to map.

In this instance both the method and the eta-expanded function value are monomorphic. Let’s see what happens if we try to do the same thing with a polymorphic method,

scala> def singleton[T](t: T) = Set(t)
singleton: [T](t: T)Set[T]

scala> singleton("foo")
res4: Set[java.lang.String] = Set(foo)

scala> singleton(23)
res5: Set[Int] = Set(23)

So far so good — our method can be applied at arbitrary types as expected. Now let’s try explicitly eta-expanding this, as we did in the monomorphic case,

scala> val singletonFn = singleton _
singletonFn: (Nothing) => Set[Nothing] = <function1>

scala> singletonFn("foo")
<console>:14: error: type mismatch;
 found   : java.lang.String("foo")
 required: Nothing
       singletonFn("foo")
                   ^

scala> singletonFn(23)
<console>:14: error: type mismatch;
 found   : Int(23)
 required: Nothing
       singletonFn(23)
                   ^

Ouch! Something’s gone badly wrong here. Look at the type inferred for singletonFn! In the earlier monomorphic case, the type of our eta expanded function was quite straightforward and unsurprising: (String) => Set[String]. But here we have (Nothing) => Set[Nothing] — where has this Nothing come from? And what’s happened to the polymorphism of the underlying method? To see what’s going on we’ll need to dig a little deeper into Scala’s function types and how function values are represented.

Scala function types

One of the joys of Scala’s mixed object-functional design is that there’s nothing particularly special about function types: (String) => Set[String] is simply syntactic sugar for one of a family of FunctionN traits, one for each function arity between 0 and (somewhat arbitrarily) 22. In this particular case we have a function with a single argument, so the function value is an instance of the Function1 trait. Simplifying a little, that trait and the implementation of the stringSingletonFn instance of it looks like this,

trait Function1[-T, +R] {
  def apply(v: T): R
}

val stringSingletonFn = new Function1[String, Set[String]] {
  def apply(v: String): Set[String] = Module.stringSingleton(v)
}

The crucial thing to notice is that the function’s argument and result type parameters are all declared at the trait level. This has two consequences. The first is that when we create an instance of Function1 we (or the compiler) must choose the argument and result types at that point. In the eta-expansion expression above we haven’t explicitly specified the argument type, so the compiler is left to infer it, and since there is no useful information for it to work with, it fills the argument type in as Nothing.

The second is an immediate consequence of the first — because we (or the compiler) must choose the argument and result types at the point at which the function value is created they are fixed, once and for all, from that point on. This means that even if we were to eliminate Nothing by specifying an argument type we would still have a problem,

scala> val singletonFn: String => Set[String] = singleton _
singletonFn: (String) => Set[String] = <function1>

scala> singletonFn("foo")
res6: Set[String] = Set(foo)

scala> singletonFn(23)
<console>:14: error: type mismatch;
 found   : Int(23)
 required: String
       singletonFn(23)
                   ^

Having specified that we want our function value instantiated to accept String arguments we are stuck with that choice forever after. Or, in other words, we have completely lost the polymorphism of the underlying method.

Returning to our example from the introduction — where we have an ordinary Scala List, if we take the above polymorphic method definition of singleton() we get an appearance of polymorphic function values,

scala> def singleton[T](t: T) = Set(t)
singleton: [T](t: T)Set[T]

// eta-expanded to Int => Set[Int]
scala> List(1, 2, 3) map singleton
res0: List[Set[Int]] = List(Set(1), Set(2), Set(3))

// eta-expanded to String => Set[String] 
scala> List("foo", "bar", "baz") map singleton
res1: List[Set[String]] = List(Set(foo), Set(bar), Set(baz))

This is because the polymorphic method is eta-expanded each time it’s used as an argument to map(), first with T instantiated as Int and then with it instantiated as String. But in both cases the resulting function value has a fixed argument type, ie. is monomorphic. And that’s just fine, because each particular List only has elements of one fixed type as well.

But now consider the HList case — here we have only one application of map(), hence we only have one opportunity for eta-expansion. This can only deliver one standard Scala function value, and that function value can be applied to arguments of just one fixed type. But we need more than one … Oops!

So here we have the crux of the problem — we can have first-class monomorphic function values and we can have second-class polymorphic methods, but we can’t have first-class polymorphic function values … at least we can’t with the standard Scala definitions. And yet it’s first-class polymorphic function values that we need to map over an HList … what to do?

In the next article in this series I’ll start explaining how we can get the best of both worlds …

Discussion

Heiko Seeberger (@hseeberger) – Sat, 28th Apr 2012, 10:39am GMT

Excellent introduction to Scala’s blend of OO and FP, one of the most important language features. Keep on the good work!

Antonio Jr. – Fri, 13th Jul 2012, 10:13pm GMT

Nice article. But maybe you can fix two listings,

val stringLengthFn = new Function1[String, List[String]] {
  def apply(v : String) : Int = Module.stringLength(v)
}

It should be Module.stringSingleton(v) and on,

scala> def singleton[T](t : T) = List(t)
singleton: [T](t: T)List[T]

scala> List(1, 2, 3) map singleton              // eta-expanded to Int => Set[Int]
res0: List[Set[Int]] = List(Set(1), Set(2), Set(3))

It should be,

def singleton[T](t : T) = Set(t)

or else,

res0: List[List[Int]] = List(List(1), List(2), List(3)).

Miles Sabin (@milessabin) – Sat, 14th Jul 2012, 12:53pm GMT

@Antonio Good catch … thanks for letting me know.