[back to bqbase index]

Comparing Impact of Type Erasure in Scala and Java

The implementation of Java Generics in JDK1.5 and following uses an erasure scheme in order to be able to compile generic types to non-generic types understood by the JVM. This is in contrast to C#2.0, which is compiled to a .NET CLR that understands generics. Scala, being designed to interact smoothly with Java, uses erasure, too. There is some discussion on the net concerning erasure in Java -- enough motivation to compare ideas and limitations of the erasure approach as implemented in Scala and Java.

  • Generics Papers
  • Erasure Mantraps (variance, arrays, casting)
  • Pattern Matching
  • Scala's Compound Types
  • Conclusion

Generics Papers

The path from the theoretical F-bounded polymorphism to its application in object-oriented programming [0] and finally JDK1.5 and C# 2.0 has seen some considerable research. I want to list a non-exhaustive list of papers here that touch on generics, erasure and variance.

Java generics are based on the Pizza design by Odersky and Wadler [1], where erasure is called "homogeneous translation". A key property of this design was that it did not modify the JVM. It was possible to translate away generics source-to-source. Sun Microsystems was interested, Bracha, Odersky, Stoutamire and Wadler reworked the design for Java, yielding GJ [2].

People had something to discuss. The GJ compiler (with generics turned off) became JDK1.3. Igarashi, Pierce and Wadler published another formalization in their Featherweight Java calculus [3]. Thorup and Torgersen introduced use-site variance, applying virtual types [4]. Use-site variance was proven sound for GJ quite a bit later by Igarashi and Viroli [6].

While Sun hesitated on the generics issue, Kennedy and Syme provided a different design for .NET [5] -- one that does not use erasure. Finally, Torgersen et al. adapted use-site variance by introducing wildcards [8,9].

When JDK1.5 came out, all new facilities were introduced in a way that prioritized backwards-compatibility. The JVM was not affected by generics, and moreover erasure of the new collections libraries was equivalent with the old, non-generic ones. In contrast, C# 2.0 offered generics but currently does not tell a story on variance (a definition-site variance scheme for C# is proposed in [10]). The intermediary language (the virtual machine) changed to accomodate generics, so consequently the non-generic collection libraries were complemented by new, generic ones.

Scala is compiled to the JVM language and aims for interoperability, so the straightforward choice was to use erasure for generics. We should mention that Scala supported generic types from the beginning, which predates the JDK1.5 release by a couple of years. Due to our compiler's architecture, our upcoming .NET backend will use erasure as well, even though interoperability on .NET would probably mandate preserving types.

Erasure Mantraps

Java Generics are often used with Java's use-site variance mechanism, wildcard types. Hence a lot of complaints regarding wildcards and erasure are really problems related to variance. We begin with two examples that show the value of variance in Scala. Then we turn to some anomalies regarding arrays.

Variance I

Safalra gives examples of the problems in his entry titled What's Wrong With Java: Type Erasure. We shall follow through his examples and see how Scala behaves on them.

Java Scala
          String[] s = new String[10];
          Object[] o = s;
          o[0] = new Object();
          val s = new Array[String](19)
          val o: Array[Object] = x
          o(0) = new Object()
runtime java.lang.ArrayStoreException: java.lang.Object compile time error: type mismatch;
          found   : scala.Array[java.lang.String]
          required: scala.Array[java.lang.Object]
          

In Java, an array of type A covariant in A, meaning you can assign an array of type B as long as B is a subtype of A (we write B <: A). In Scala, Array is written like any other generic type, and moreover is invariant. The assignment is rejected by the compiler, avoiding the problem. Now let's look at the same example using generic types.

Java Scala
          Vector<String> s=new Vector<String>(10);
          Vector<Object> o=(Vector<Object>)s;
          o.add(new Object());
          val x: List[String] = List("one", "two", "three")
          val y: List[Object] = x
compile time error: inconvertible types
found   : java.util.Vector<java.lang.String>
required: java.util.Vector<java.lang.Object>
        Vector<Object> o = (Vector<Object>)s;
                                           ^
no error

Since in Scala, List is covariant (meaning A <: B implies List[A] <: List[B], the string list can be assigned to the object list variable. This is not related to erasure per se, but a matter of variance. While it is true that erasure makes Vector[String] and Vector[Object] indistinguishable to the JVM, the prime obstacle is the lack of variance in the type system.

Salafra is right when he points out that erasure makes it impossible to add the needed runtime check (which is in case of arrays). But it would be bad design to burden developers with runtime checks. It was bad design to make arrays covariant in the first place. In Scala (and a C# proposal [10]), class signature with a variant parameter are restricted in a way that avoids runtime checks. You simply cannot have methods behave in a way that would require runtime checks. Concretely, you cannot add an element to a List without allocating a new list object (which will be of the right type).

Variance II

On, Eric Gunnerson's blog Scott Wisniewski has problems in figuring out the right bounds for the generic types.

The motivation seems to have a signature def Foo[A,B](list:List[Pair[A, B]]): Unit but without making Foo polymorphic.

Java Scala
import java.util.LinkedList;

class Pair<T1, T2> {
    public T1 First;
    public T2 Second;
}

class HashSomeMethods {

    public static void Foo(LinkedList<Pair<?, ?>> list) {
        //... code here
    }
    public static void Main(String[] args) {
        Foo(new LinkedList<Pair<Integer, String>>());
    }
}
import java.util.LinkedList;

// standard library
case class Pair[+T1, +T2](val _1:T1, val _2:T2)

object HashSomeMethods {
    // pending proposal: Foo(list: List[Pair[_, _]])
    def Foo(list: List[Pair[Any, Any]]): Unit = { 
        //... code here
    }
    def main(String[] args) {
        Foo(List[Pair[Integer, String]](...));
    }
}
compile error:
Foo(java.util.LinkedList<Pair<?,?>>) in HashSomeMethods 
    cannot be applied to 
   (java.util.LinkedList<Pair<java.lang.Integer,java.lang.String>>)
        Foo(new LinkedList<Pair<Integer, String>>());
        ^
no error

The problem in Java is that although Pair<Integer, String> <: Pair<?,?>, this subtype relationship does not carry over to LinkedList<...>. Introducing the right bound solves the problem public static void Foo(LinkedList<? extends Pair<?,?>>. In Scala, this is not an issue, because we have definition-site variance. Since List is covariant, the compiler allows to pass (we plan to allow wildcard syntax for this, try the -Xexperimental option).

There is pending proposal that allows to use wildcard types in the Scala language. Their semantics is defined as an existential type whose bounds are determined by the context. For covariant types, they are simply replaced by the upper bound. The proposal would also allow Array[String] <: Array[_] , but let us first look at arrays in more detail.

Arrays

Java Generics do not allow the creation of arrays of a type parameter. Again, Scala does not know such a restriction, although it uses erasure.

Java Scala
class GenericArrayTest<T>{

  public <T> T[] returnArray(){
    return new T[10];
  }

}
class GenericArrayTest[T]{
  def returnArray: Array[T] = {
    return new Array[T](10);
  }
}
compile error:
generic array creation
    return new T[10];
           ^
no error

In Java, the problem can be circumvented by using reflect.Array.newInstance and a class object. Bruce Eckel used an object array with a cast instead.

class GenericArrayTest<T>{
    private Class<T> tClass; // initialised somewhere

    public <T> T[] returnArray() {
      return (T[])java.lang.reflect.Array.newInstance(tClass,10);
    }
}

Behind the scenes, Scala uses an array wrapper that is backed by an object array. It also contains the additional methods and a decent toString implementation that a Scala array offers with respect to a Java array. But since Scala aims for interoperability, arrays are turned into Java arrays when needed. Thus, the following code behaves the same.

Java Scala
Object[] x = new Object[3];
      String[] y = (String[])x;
val x = Array[Object]("a","b","c")
    val y:Array[String] = x.asInstanceOf[Array[String]]
runtime error:
java.lang.ClassCastException: [Ljava.lang.Object;
runtime error:
java.lang.ClassCastException: [Ljava.lang.Object;

We should mention that Java does not have a type that corresponds to the Array[_]. In that sense, Scala's wildcard proposal is a better approximation of existential types, because it treats arrays and other generic types uniformly.

Casting

When casting, all bets are off. Generic types are no exception, and both Java and Scala behave the same.

Java Scala
Vector<String> s = new Vector<String>(10);
Vector o = s;
o.add(new Object());
Object p = s.get(0);
String t = s.get(0); // boom
    
val s = new ListBuffer[String]
val o = s.asInstanceOf[ListBuffer[Object]]
o += new Object()
val p:Object = s(0)
val t:String = s(0) // boom
compile warning:
[unchecked] unchecked call to add(E) 
  as a member of the raw type java.util.Vector
runtime error:
java.lang.ClassCastException: java.lang.Object
runtime error:
java.lang.ClassCastException: java.lang.Object

Bridge methods

Bridge methods are additional methods that get inserted in order to "connect" implementations of methods in instances of generic interfaces. For instance assume the following definitions

interface Comparable<T> { public int compareTo(T b) }
class Foo implements Comparable<Foo>{
       public int compareTo(Foo b){
               return 0;
       }
}

Then consider a method that expects a Z and a Comparable<Z>. It cannot know the signature compareTo(Foo), so it calls the interface method compareTo(Object). Consequently, Foo must also have a compareTo(Object) method, which forwards to compareTo(Foo). This method is called a bridge method.

The trouble is, what if there is a compareTo(Object) method already? Ejiro Sumi reported this problem to the TYPES forum. No doubt that this is a problem related to type erasure. In my opinion, it is not a serious problem for classes that have been designed to be generic from the start. I find it is better to signal this as an error, and that is exactly what the scala compiler does.

Java Scala
interface Comp<T>{ int comp(T x); }
class Bar { public int comp(Object x) { return 0; }}
class Foo extends Bar implements Comp<Foo> {
        public int comp(Foo x) { return 1; }
        public static void main(String[] args) {
                new Foo().comp(new Object());
        }
}
trait Comp[T] { def comp(x:T):int }
class Bar { def comp(x:Object):int = 0 }
class Foo extends Bar with Comp[Foo] { def comp(x:Foo) = 1 }
runtime error:
java.lang.ClassCastException: java.lang.Object
        at Foo.comp(odd.java:3)
        at Foo.main(odd.java:6)
compile error:
name clash between inherited members:
method comp:(java.lang.Object)scala.Int in class Bar and
method comp:(Foo)scala.Int in trait Comp
have same type after erasure: (java.lang.Object)scala.Int
class Foo extends Bar with Comp[Foo] { def comp(x:Foo) = 1 }
^

Head Types, Raw Types and instanceOf

We include this subsection only for completeness, because Scala's isInstanceOf test is ultimately translated to Java's instanceOf. Due to erasure, instanceOf (resp. isInstanceOf) tests of constructed types only compare the head type. This would be List for List<String> (resp List[String]).

In Scala, a head type is not really considered a type -- it is some artefact that occurs after typechecking. The Java designers however chose to include things like List as "raw types". This choice is so awkward and wrong that probably it was made only to go the last mile of backwards compatibility. Use of raw types is discouraged in practice, but people that cannot wrap their head around use-site variance will welcome this opportunity to "switch off" typechecking.

Pattern Matching

Finally, we should mention facilities that exist in Scala, but not in Java. Scala has match expressions, which however may not contain generic type tests.
x match {
  case List[String](_*) => // syntax error
}
In fact, just like tests of the kind of isInstanceOf[List[String]], pattern matching will erroneously match values of constructed types, because it compares only the head type List:
(List(1,2,3):List[Any]) match {
  case _:List[String]=> Console.println("oops") // reached :-/
}

The above-mentioned proposal to introduce wildcard types also outlaws type tests of constructed types in casts and isInstanceOf tests. A pending question related to GADTs is whether in patterns, constructed types should be replaced with constructed types that use wildcards List[_], or whether new type variables should be introduced which are linear, scoped over the pattern body and cannot be used for much else than polymorphic recursion. This depends on the mechanism used to implement GADTs.

Scala's Compound Types

Since Scala's type system is more expressive, type erasure affects Scala also in programs that cannot be expressed in Java. Compound types have been suggested for Java, too [11].

Compound types express combinations of several types (e.g. when defining a variable or in a pattern match). The type T1 with ... with Tn { R } talks about a type made of component types Ti and refinement R. That is, a value of a type conforms to each Ti. Combining several traits is possible using a form of multiple inheritance, and there are rules avoiding the known problems of diamond inheritance and accidental name clashes (but we cannot discuss everything here, eh? :-) ).

The type erasure of such a compound type is simply T1 (the first), so the others are forgotten. Compound types are not allowed as type tests in pattern matching, for a reason. The following simple program looks meaningful but demonstrates the limits of type erasure when compiling Scala code.

object main { // don't do this at home

  trait Impl

  trait SizeImpl extends Impl { def size = 42 }

  trait ColorImpl extends Impl { def color = "red" }

  type Both = SizeImpl with ColorImpl

  def info(x:Impl) = x match {
    case x:Both      => "size  "+x.size+" color "+x.color // you wish
    case x:SizeImpl  => "size  "+x.size
    case x:ColorImpl => "color "+x.color
    case _           => "n.a."
  }

  def main(args:Array[String]): Unit = {
    // make up some class that has a size
    class MyNode extends SizeImpl
    Console.println("hello " + info(new MyNode))
  }
}

Easy, right? If something has both, a size and a color, we want to give both. Update: Feature Request bug#789 was implemented and so this program works now.

The subtlety here is in performing two tests isInstanceOf[SizeImpl] && isInstanceOf[ColorImpl], one for each component of the compound type. Here, we ignore the fact that the erasure of Both = SizeImpl with ColorImpl is SizeImpl. This can be justified with the fact that dynamic type tests should work on the accurate type where possible, not the erasure. For List[String], accuracy is not possible, however for type SizeImpl with ColorImpl it is.

Conclusion

Erasure seems useful not just for backwards-compatibility, but because complete runtime-type information as promoted by dynamically typed languages comes at a cost. The design of the .NET CLR Generics tackles this cost by code specialization. The above cases should have made clear when it is erasure and when it is the language that is to be blamed for a particular shortcoming.

I claim that a lot of erasure-bashing is really due to a lack of definitions-site variance in Java's type system. While it's true that the existing use-site variance is more flexible, it seems cumbersome to use.

In case you are not a hardcore user already, I also hope that you got curious about Scala.

Sorry for the vagueness regarding the various Scala improvement proposals floating around. These are not written down in a publishable form. When such a write-up becomes available I shall include proper links here.

References

[0]
P. Canning, W. Cook, W. Hill, W. Olthoff, J.C. Mitchell (1989). F-Bounded Quantification for Object-Oriented Programming. FPLCA
[1]
M. Oderksy, P. Wadler (1997). Pizza into Java: Translating theory into practice. POPL
[2]
G. Bracha, M. Odersky, D. Stoutamire and P. Wadler (1998). Making the future safe for the past: Adding Genericity to the Java Programming Language. OOPSLA
[3]
A. Igarashi, B.C.Pierce and P. Wadler (1999). Featherweight Java: A minimal core calculus for Java and GJ. OOPSLA. journal version ToPLaS, 23(3):396-450, May 2001.
[4]
K.K. Thorup, M. Torgersen (1999). Unifying Genericity - Combining the Benefits of Virtual Types and Parameterized Classes. ECOOP
[5]
A. Kennedy and D. Syme. (2001). Design and Implementation of Generics for the .NET Common Language Runtime. PLDI
[6]
A. Igarashi and M. Viroli (2002). On Variance-Based Subtyping for Parametric Types. ECOOP. superceded by Variant parametric types: A flexible subtyping scheme for generics. ToPLaS, 28(5):795-847, September 2006
[7]
M. Odersky, V. Cremet, C. Röckl, M. Zenger (2003). A Nominal Theory of Objects with Dependent Types. ECOOP
[8]
M. Torgersen, C.P. Hansen, E. Ernst, P. von der Ahé, G. Bracha, N. Gafter (2004). Adding Wildcards to the Java programming language SAC-OOP
[9]
M. Torgersen, E. Plesner, C.P. Hansen (2005). Wild FJ. FOOL
[10]
B. Emir, A. Kennedy, C. Russo and D. Yu (2006). Variance and Generalized Constraints for C# Generics. ECOOP
[11]
Martin Büchi and Wolfgang Weck (1998). Compound Types for Java . OOPSLA

published:2006-10-16  last change:2006-11-03

changelog: