
Java and .NET Generics

January 10th, 2007

Java 5.0 and .NET 2.0 both support generic types, but they implement them differently. Java uses a technique called “code sharing”, in which a single type represents all instantiations of a generic type. For example, ArrayList<Integer> and ArrayList<String> are both compiled down to ArrayList, because the compiler performs erasure on generic types. .NET, on the other hand, uses “JIT code specialization”, in which each instantiation of a generic type has its own run-time representation (i.e. List<int> and List<string> each have a unique representation at run-time).
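The Java half of this is easy to observe. The following is a minimal sketch (the class and method names are made up for illustration) that compares the run-time classes of two differently parameterized lists:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Returns true when both parameterizations share one run-time class.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<Integer>();
        List<String> strings = new ArrayList<String>();
        return ints.getClass() == strings.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());                           // true
        System.out.println(new ArrayList<String>().getClass().getName()); // java.util.ArrayList
    }
}
```

Both lists report java.util.ArrayList, which is the single erased type described above.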

These implementations of generics in the two platforms imply a few key points:

1. Generic types in Java offer no easy way to write a generic over a value type. For example, you cannot declare ArrayList<int>, since it cannot be “erased” to a java.util.ArrayList at all. Instead you have to use the value type’s wrapper, e.g. ArrayList<Integer>, and rely on boxing/unboxing for further manipulation.

2. Since .NET supports run-time code specialization, the .NET implementation generally performs better than its Java counterpart, because no casting is necessary. This is particularly true for generics over value types, since no boxing/unboxing is necessary in the .NET implementation.

3. Although we can do really stupid things in Java, we cannot do the same in .NET. Consider the following piece of code. Did you think you could never get a ClassCastException with Java generics? Think again…

List<Integer> list1 = new ArrayList<Integer>();
list1.add(1);
list1.add(2);
List list2 = list1;
list2.add("A"); // strange enough?
for (int i : list1){
    System.out.println(i); // 1, 2, and boom!!!
}

You can even do this to have a List<String> instance which actually consists of an Integer object:

List<Integer> list1 = new ArrayList<Integer>();
list1.add(1);
List<String> list2 = (List)list1; // valid cast

However, the above seems to be caused by a compiler design mistake rather than by the “code sharing” technique. In fact, I believe the Java compiler could have been written to disallow such casts. In .NET, List<string> is, well, List<string>, not List or List<object> or List<int>, thus the .NET compiler has no choice but to disallow such casts.
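Since the compiler lets the raw-type trick through, one way to catch it earlier at run time is the standard Collections.checkedList wrapper. This is only a sketch (the class and method names are hypothetical), and it is a run-time check that works around erasure, not a substitute for reified generics:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckedListDemo {
    // Tries to sneak a String into a List<Integer> through a raw reference.
    // Returns true if the checked wrapper rejected it at the add() call.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rejectsWrongElement() {
        List<Integer> list1 =
                Collections.checkedList(new ArrayList<Integer>(), Integer.class);
        list1.add(1);
        List raw = list1;      // same raw-type assignment as in the example above
        try {
            raw.add("A");      // ClassCastException is thrown here, at insertion
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsWrongElement()); // true
    }
}
```

With the wrapper, the failure surfaces where the bad element is added instead of later, when the list is iterated.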

Nevertheless, there are reasons why the Java language architects decided not to use code specialization like .NET, and we can easily understand them by looking at the drawbacks of .NET generics:

1. Because it does not use type erasure, the .NET Framework had to add new classes for generic types to stay backward compatible, instead of simply adding generic support to existing classes. But it’s Microsoft who writes those classes, not the application developers.

2. One consequence of the above is that .NET applications written using generic classes won’t run on older versions of the framework. In Java, this is possible thanks to type erasure (the type must exist in the old JDK though). Nevertheless, I don’t see why this should matter, since the application won’t run anyway if you use any language features or classes of the new version, e.g. 2.0.

3. At run-time, the CLR creates a great many types, one for each instantiation of a generic type, while Java has just one representation for all instantiations. Well, as long as it runs fast, I don’t care much about a little memory inefficiency here.

The bottom line: I believe the above drawbacks of the .NET generics implementation are minor compared with its advantages over the Java counterpart, and I think the Java version will eventually evolve into something similar to .NET (or even better).

  1. March 14th, 2007 at 23:36 | #1

    Thanks a lot for your comments, Ricky. You offer a lot of insights to the topic.

    There’s no reason why .NET needs to have a separate runtime representation of each. If you made one collection each for 10,000 types, and used it once, then the code-sharing approach would be faster. That’s an extreme, of course; I don’t know at what point in a reasonable app the opposite approach would become a problem.

    First, I assume you know that a generic type representation will not be created unless it is used. Second, even if you have 10000 types, chances are that a significant number of them (those of reference types) are made by the CLR to share the same JIT-compiled x86 code. (Of course, if you are talking about 10000 struct types then things will be different, but which application would need that? :-) ) Finally, I see this as an inefficiency of space, not of time. Given that, can you elaborate on why you think the type-erasure approach would be faster?

    Have you got hard data that shows that .NET collections are faster, or is this speculation? It would be valid for the Hotspot VM to eliminate casts if it can be certain that they would always pass. Remember that casts are only actually checks at runtime.

    Well, I think that if the two implementation approaches could be measured on the same JVM or CLR, then we would have a good comparison; otherwise we may see differences that are not caused by generics alone. Anyway, some quick code shows the following results:

    Method
    Add and retrieve 1000, 100,000, 1,000,000, 10,000,000, and 20,000,000 elements (String or int) to and from non-generic and generic lists. Note that in .NET, the generic equivalent of ArrayList is List<T>, e.g. List<int>. The test was run on a 1.6GHz Pentium M. Download the code here; remember to increase the JVM’s heap size, otherwise you may receive an out of heap space error.

    JDK 5.0 Update 11

    Elements    ArrayList of String   ArrayList of int   ArrayList<String>   ArrayList<Integer>
    1000        0 ms                  0 ms               0 ms                16 ms
    100000      16 ms                 31 ms              0 ms                31 ms
    1000000     156 ms                157 ms             93 ms               125 ms
    10000000    1172 ms               2906 ms            1235 ms             2797 ms
    20000000    2797 ms               8906 ms            2594 ms             8312 ms

    .NET 2.0

    Elements    ArrayList of String   ArrayList of int   List<string>   List<int>
    1000        0 ms                  0 ms               0 ms           0 ms
    100000      0 ms                  15 ms              15 ms          0 ms
    1000000     62 ms                 93 ms              78 ms          46 ms
    10000000    859 ms                2828 ms            515 ms         593 ms
    20000000    1812 ms               6828 ms            1968 ms        921 ms

    The example of really stupid code in Java relies on the developer ignoring compile warnings, i.e., it relies on the intentionally-stupid developer.

    I agree, developers should only ignore the warnings if they are totally sure what they are doing. But if the code is more complex, it can be difficult to detect the problem even when the warnings are paid attention to. However, I admit I have never seen such stupid code in any of my projects :-) . Finally, I do not see why the compilers should not disallow really ridiculous things like casting a list of int to a list of string.

    Marcos – reflection can be emulated through the use of type tokens.

    Let’s say I need to know whether an instance of list is a list of String or a list of Integer; can I do that with type tokens? Not possible without generic type reification.

    Buu Nguyen – Generics information exists in class files. It doesn’t exist for local variables. Everywhere else it does.

    And this is where I screwed up. You’re exactly right. I relied on the “type-erasure” definition and jumped to the conclusion that the class file is not changed by generics information, and just forgot about the custom attributes. In fact, the bytecode won’t change since generic types are completely erased by the compiler, but custom attributes are added to the class files to facilitate things such as reflection of generic methods (still, reflection in Java is not comparable to .NET without generic type reification). Thanks for pointing that out, Ricky.
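As a rough illustration of the point above, the sketch below reads a field’s declared type argument through reflection. It assumes a hypothetical class SignatureDemo with a field called names; the information comes from class-file metadata (the Signature attribute), not from the list object itself:

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

class SignatureDemo {
    // Hypothetical field, declared only so there is something to reflect on.
    static List<String> names = new ArrayList<String>();

    // Reads the declared type argument of a field from class-file metadata.
    static Type declaredElementType(String fieldName) {
        try {
            Field field = SignatureDemo.class.getDeclaredField(fieldName);
            ParameterizedType type = (ParameterizedType) field.getGenericType();
            return type.getActualTypeArguments()[0];
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredElementType("names")); // class java.lang.String
        System.out.println(names.getClass());             // class java.util.ArrayList
    }
}
```

The declaration still knows it is a List<String>, while the instance only knows it is an ArrayList, which is the gap that reification would close.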

    Again, thanks for your comments and hope to hear more from you!

  2. March 14th, 2007 at 16:48 | #2

    Excellent post. Reflection is also quite different, because you can loop through generic types and ask for their type parameters. I think that can’t be done in Java because of type erasure.

    Cheers
    Marcos

  3. March 14th, 2007 at 17:12 | #3

    Great point, Marcos! Generics information simply does not exist in Java class files. I should have listed that as another strength of the .NET generics implementation.

  4. March 14th, 2007 at 17:35 | #4

    There’s no reason why .NET needs to have a separate runtime representation of each. If you made one collection each for 10,000 types, and used it once, then the code-sharing approach would be faster. That’s an extreme, of course; I don’t know at what point in a reasonable app the opposite approach would become a problem.

    Have you got hard data that shows that .NET collections are faster, or is this speculation? It would be valid for the Hotspot VM to eliminate casts if it can be certain that they would always pass. Remember that casts are only actually checks at runtime.

    The example of really stupid code in Java relies on the developer ignoring compile warnings, i.e., it relies on the intentionally-stupid developer.

    One advantage of the erasure approach is that the stupid code is compilable, and testable, so one can generify their code in small stages, rather than having to generify everywhere before testing. Now, that wouldn’t matter so much, because the IDEs have caught up with the language sufficiently, but in the early days of Java 5, the IDEs didn’t know where they were.

    Reification is proposed for Java 7.

    Marcos – reflection can be emulated through the use of type tokens.

    Buu Nguyen – Generics information exists in class files. It doesn’t exist for local variables. Everywhere else it does.

  5. March 15th, 2007 at 18:40 | #5

    “First, I assume you know that a generic type representation will not be created unless it is used. Second, even if you have 10000 types, chances are that a significant amount of these (those of reference types) are enforced by the CLR to share the same JIT-compiled x86 code.”

    I assumed the former, and didn’t realise the latter. What are the criteria the CLR uses to decide whether to share the code or not? It is at least interesting that the CLR has the choice as to whether to share the code or not, which the JVM does not have.

    “Finally, I see this as an inefficiency of space, not of time.”

    Space inefficiency works out to be time inefficiency, quite often, because of cache misses (lack of memory locality).

    The benchmarking code is flawed because the same VM instance is used for all code. I haven’t read it properly, because of that, though I appreciate your attention to detail there.

    “Finally, I do not see why the compilers should not disallow really ridiculous stuffs like casting a list of int to a list of string.”

    Because that cast might be desirable when you are only halfway through generifying some code. It is actually a good thing, if you are working with good programmers, to allow arguably crap code, but to tell the developer how crap the code is. That’s what Bracha’s optional typing is about; worth a Google.

    “Let’s say I need to know whether an instance of list is a list of String or a list of Integer; can I do that with type tokens?”

    No. You’d need some type that was a List plus a type token for that. E.g.:

    class ListPlusTypeToken<E>
    {
    public final List<E> list;
    public final Class<E> type;

    public ListPlusTypeToken(List<E> list,Class<E> type)
    {
    this.list=list;
    this.type=type;
    }
    }
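A possible way to use such a list-plus-token pair is sketched below. The nested class mirrors the one in the comment above; the wrapper class TypeTokenDemo and the helper isListOf are hypothetical names added for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class TypeTokenDemo {
    // Same shape as the ListPlusTypeToken class in the comment above.
    static final class ListPlusTypeToken<E> {
        final List<E> list;
        final Class<E> type;

        ListPlusTypeToken(List<E> list, Class<E> type) {
            this.list = list;
            this.type = type;
        }
    }

    // Answers "is this a list of X?" by asking the token, not the list.
    static boolean isListOf(ListPlusTypeToken<?> holder, Class<?> candidate) {
        return holder.type.equals(candidate);
    }

    public static void main(String[] args) {
        ListPlusTypeToken<String> strings =
                new ListPlusTypeToken<String>(new ArrayList<String>(), String.class);
        System.out.println(isListOf(strings, String.class));  // true
        System.out.println(isListOf(strings, Integer.class)); // false
    }
}
```

The element type is carried explicitly next to the list, so the question can be answered even though the list itself has been erased.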

  6. March 16th, 2007 at 13:15 | #6

    What are the criteria the CLR uses to decide whether to share the code or not?

    If a method of a generic type, or a type-parameterized method, has already been JIT-compiled for a specific reference type, then that code is reused when the same method is needed for any other reference type. The same does not hold for value types, since their internal representations differ, so unique method code must exist for each type. For structs, those with the same internal layout will share the same x86 code.

    Space inefficiency works out to be time inefficiency, quite often, because of cache misses (lack of memory locality).

    I agree, if the space inefficiency is big enough. In the CLR case, the runtime code sharing would, hopefully, prevent excessive space inefficiency.

    The benchmarking code is flawed because the same VM instance is used for all code

    I did it on JDK 5 – update 11, which is the latest. I believe there are better ways to do the benchmark, but I do not quite understand why it is flawed. What do you suggest should be done differently?

    That’s what Bracha’s optional typing is about; worth a Google.

    I’ll definitely take a look at that.

    You’d need some type that was a List plus a type token for that…

    Thanks for sharing the code. This emulation is helpful in several use cases.

    Again, thanks a lot for offering your insights into the topic, Ricky.

  7. March 16th, 2007 at 19:26 | #7

    So the code sharing works the same for all non-primitive types? I see. It seems that .NET terminology says non-value, not non-primitive. So, given that Java doesn’t support collections of primitive types, the behaviour would seem identical.

    By VM instance, I meant the actual running VM. You only get fair results if you restart your VM between each benchmark, and ideally ‘warm up’ the VM loading in the OS – as most OSs take longer to start the first instance of a process. If there’s any overlap between two tests, the JIT may kick in and skew the results. Of course, you could turn the JIT off, but then your results are skewed in a different way.

    If I don’t reply, drop me an email (ricky.clarkson@gmail.com). I only spotted this reply because my RSS reader told me that the dzone link had been promoted.

  8. March 17th, 2007 at 16:23 | #8

    So, given that Java doesn’t support collections of primitive types, the behaviour would seem identical.

    Do you mean the behavior of generics of reference types in .NET and Java? If that is what you meant, how can it be identical given that type information is erased by the Java compiler?

    You only get fair results if you restart your VM between each benchmark, and ideally ‘warm up’ the VM loading in the OS – as most OSs take longer to start the first instance of a process.

    It did indeed take a ridiculous amount of time in the first run (I recall it took 27 seconds to finish the 10,000,000 elements for ArrayList of Integer). However, I ran the benchmark several times and only recorded the result when it looked stable. I reran the benchmark many times today, and the results were close to those above. Any idea?

    If I don’t reply, drop me an email (ricky.clarkson@gmail.com)

    Thanks for your consideration, Ricky. Just a suggestion if you are not aware of it: why don’t you subscribe to this entry’s RSS (you can see it at the end of the entry) :-)

  9. March 17th, 2007 at 20:59 | #9

    Interesting.

    My main complaint about C# generics is that arithmetic doesn’t work:

    public T add(T a, T b) { return a+b; }

    will fail in C#, because operators are early-bound, and the compiler isn’t smart enough to bind them differently for different specializations. Then again, a comparable example will succeed in VB, because VB late-binds its operators.

    My main complaint about Java generics is that they break under method polymorphism, since they’re not really different types. A method with the signatures:

    DoStuff(List list)
    DoStuff(List list)

    will fail, despite the fact that you might legitimately need different implementations.

  10. March 17th, 2007 at 21:01 | #10

    Arrgh. Hopefully this will render the examples properly:

    public T add<T>(T a, T b) { return a+b; }

    DoStuff(List<String> list)
    DoStuff(List<Integer> list)

  11. March 21st, 2007 at 15:31 | #11

    …fail in C#, because operators are early-bound, and the compiler isn’t smart enough to bind them differently for different specializations. Then again, a comparable example will succeed in VB, because VB late-binds its operators.

    Are you sure this works in VB.NET? T is not constrained and can be any type, so the compiler should refuse to compile it in order to ensure type safety. I cannot get the VB.NET code to compile; it fails with the same error message as the C# version, “Operator ‘+’ is not defined for types ‘T’ and ‘T’”, which I think is expected.

    public T add<T>(T a, T b) { return a+b; }
    DoStuff(List<String> list)
    DoStuff(List<Integer> list)

    I would think the same, but when I tried it out, it turned out that if both DoStuff methods return the same type, then it would not compile, as expected, but if they return different types, then it compiled just fine.

     public static String DoStuff(List<String> list) {
        return null;
     }
     public static int DoStuff(List<Integer> list) {
        return 0;
     }

    I don’t understand why it can possibly compile, and there is no “magic” in the generated bytecode.

     public static java.lang.String DoStuff(java.util.List);
     Code:
       0:   aconst_null
       1:   areturn
    
     public static int DoStuff(java.util.List);
     Code:
       0:   iconst_0
       1:   ireturn
    

    The above does not look like valid bytecode. I’ll need to look at the spec…

  12. September 3rd, 2008 at 14:43 | #12

    Anyone interested in why methods accepting two generic types & having different return type are not considered by the compiler as a naming clash may refer to this explanation: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6182950

  13. Hoang
    September 8th, 2011 at 09:15 | #13

    Does .NET support the self-bound generic pattern, as in Java’s enum?

    Enum<E extends Enum>

  14. Hoang
    September 8th, 2011 at 09:28 | #14

    Sorry for the typo, I meant Enum<E extends Enum<E>>
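For readers unfamiliar with the pattern in question, here is a small Java sketch of a self-bound type parameter in the style of Enum<E extends Enum<E>>. The names Ranked and Priority are hypothetical, and the sketch only shows the Java side of the question:

```java
class SelfBoundDemo {
    // Hypothetical base type: E must itself be a subtype of Ranked<E>.
    static abstract class Ranked<E extends Ranked<E>> implements Comparable<E> {
        final int rank;

        Ranked(int rank) {
            this.rank = rank;
        }

        // Thanks to the self-bound, 'other' has the concrete subtype; no cast needed.
        public int compareTo(E other) {
            return Integer.compare(rank, other.rank);
        }
    }

    // A concrete subtype passes itself as the type argument.
    static final class Priority extends Ranked<Priority> {
        Priority(int rank) {
            super(rank);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Priority(1).compareTo(new Priority(2))); // -1
    }
}
```

The bound ties the type parameter to the subclass itself, so compareTo only accepts another Priority.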
