Going Native 2.0, The future of WinRT

August 8, 2012 WinRT, C#, .Net, Win8 edit

In the recent years, we have seen lots of fuzz about the return of “Going native” after the managed era popularized by Java and .NET. When WinRT was revealed last year, there was some shortsighted comments to claim that “.NET is dead” and to glorify the comeback of the C++, the true and only real way to develop an application, while at the same time, JIT was being more and more introduced in the scripted world (JavaScript being one of the most prominent JIT user). While in the end, everything is going native anyway - the difference being the length of the path to go native and how much optimized it will be - the meaning of the “native” word has slightly shifted to be strongly and implicitly coupled with the word “performance”. Even being a strong advocator for managed language, the performance level is indeed below a well written C++ application, so should we just accept this fact and get back to work with C++, with things like WinRT being the backbone of the interop? To tell you the truth, I want .NET to die and this post is about why and for what.

The Managed Era

Let’s just begin by revisiting recent history of managed development that will highlight current challenges. Remember the Java slogan? “write once runs everywhere”, it was the introduction of a paradigm where a complete “safe” single language-stack based on a virtual machine associated with a large set of API would allow to easily develop an application and target any kind of platforms/OS. It was the beginning of the “managed” era. While Java has been quite successfully adopted in several development industries, it was also quite rejected by lots of developers that were aware of memory management caveats and the JIT not being as optimized as it should be (though they did some impressive improvements over the years) with also a tremendous amount of bad design choice, like the lack of native struct, unsafe access or the route to go native through JNI extremely laborious and inefficient (and even recently, that they were considering to get rid off all native types and make everything an object, what a terrible direction!).

Java failed also in the heart of his slogan: it was in fact not possible to embrace in a single unified API all the usage of each target platforms/OS, leading to things like Swing, not what can be called an optimal UI framework. Also, from the beginning, Java was only design with a single language in mind, though lots of people found JIT/bytecode as an opportunity to port scripting languages to Java JVM.

In the meantime of early Java, Microsoft that tried to enter the Java market by integrating some custom language extensions (with the end story we know) and finally came with their own managed technology, which was in several aspects better conducted and designed: from the ground bytecode, unsafe construct, native interop, lightweight but very efficient JIT + NGEN, C# rapid language evolution, C++/CLI... etc, taking multiple language interop into account from the beginning and without the burden of the Java slogan (though Silverlight on MacOS or Moonlight were a good try).

Both systems share a similar managed monolithic stack: metadata, bytecode, JIT and GC are tightly coupled. Also performance wise, it is far from being perfect: the JIT is implying a startup cost and the executing code is not as fast as it should mainly because:

The JIT is performing poor optimization compare to full C++ -O2, because it needs to be fast when generating code (also, unlike Java hotspot JVM, .NET JIT is not able to hot swap existing JIT code by a better optimized code)
Managed types, like Array access are always checking bounds (apart for simple loops where the JIT can suppress the check if the for-limit is less or equal the array’s length)
GC can pause all threads to collect objects (though new GC in 4.5 made some improvements) which can cause unexpected slow down in an application.

But even with this performance deficiency, a managed eco-system with its comprehensive Framework is the king of productivity and language interop, with a descent overall performance for all languages running inside it. The apogee of the managed era was probably around the launch of Windows Phone and Visual Studio 2010 (using WPF for its rendering, though WPF is also built on top of lots of native code), where managed languages were the only authorized way to develop an application. That was not the best thing that could happen, considering the long list of pending issues with .NET performance, enough to stimulate all the “native coders” to strike back, and they were absolutely in their rights.

It turns out that somewhat it signs the "decline" of .NET. I don’t know much about Microsoft organization internals, but what is commonly reported is that there is some serious competition between divisions, good or bad, but for .NET, for the past few years, Microsoft seemed to running out of gas (for example, almost no significative improvements in the JIT/NGEN, lots of pending request for performance improvements, including things like SIMD that were asked for a long time), and my guess is that the required changes could only take place in a global strategy, with deep support and commitment from all divisions.

In the mean time, Google was starting to push its NativeClient technology, allowing to run sandboxed native code from the browser. Last year, in this delirium trend of going native, Microsoft revealed that even HTML5 implemented in next IE was going native! Sic.

In "Reader Q&A: When will better JITs save managed code?" Herb Sutter, one of the "Going Native" evangelist, provides some interesting insights about what the "Going Native" philosophy is thinking about JIT, with lots of inaccurate facts, but lets just focus on the main one : Even if JIT could improve in the future, managed languages made such a choice of safety over performance, that they are intrinsically doomed to not play in the big leagues. Miguel de Icaza posted a response about it in "Can JITs be faster?" and he explained lots of relevant things about why some of Herb Sutter statements were misleading.

Then WinRT came here to somewhat smooth the lines. By taking part of the .NET philosophy (metadata and some common “managed” types like strings and arrays) and the good old COM model (for a common denominator of native interop), WinRT is trying to solve the problem of language interoperability outside the CLR world (thus without the performance penalties for C++) and to provide a more “modern” OS API. Is this the definitive answer, the one that will rule them all? So far, not really, it is on the direction of the certain convergence that could lead to great things, but it is still uncertain that it will take the right track. But what could be this “right track”?

Going native 2.0, Performance for All

Though safety rules can have a negative impact on performance, managed code is not doomed to be run by poor JIT compiler (For example, Mono is able to run C# code natively compiled through LLVM on iOS/Linux) and it would be fairly easy to extend the bytecode with more "unsafe" levels to provide fine grained performance speedup (like suppressing array bounds checking...etc.).

But the first problem that can be currently identified is the lack of a strong cross-language compiler infrastructure, this is ranging from the compiler used in IE10 Javascript JIT, to the .NET JIT and NGEN compilers or into the Visual C++ compilers (to name a few), all using different code for almost the same kind of laborious and difficult problem of generating efficient machine code. Having a single common compiler is a very important step to provide a high performance code accessible from all languages.

Felix9 on Channel9 found that Microsoft could be actually working on this problem, so that's a good news, but the problem of the "performance for all" is a small part of a bigger picture. In fact the previous mentioned "right track" is a broader integrated architecture, not only an enhanced LLVM stack, but baked by Microsoft's experience in several fields (C++ compiler, JIT, GC, metadatas... etc), a system that would expose a completely externalized and modularized “CLR” composed of:

An intermediate mid level language, entirely queriable/reflectionnable, very similar to LLVM IR or .NET bytecode, defining common datatypes (primitives, string, array... Etc). An API similar to System.Reflection.Emit should be available. Vectorized types (SIMD) should be first class types as int or double are. This IL code should not be limited to CPU target usage, but should allow GPU computing (similar to AMP) : it should be possible to express HLSL bytecode with this IL, with the benefits to leverage on a common compiler infrastructure (see following points). Typeless IL should also be possible to allow dynamic languages to be expressed more directly.
A dynamic linked library/executable, like assemblies in .NET, providing metadatas, IL code, query/reflection friendly. When developing, code should be linked against assemblies/IL code (and not against crappy C/C++ headers).
An IL to native code compiler, which could be integrated in a JIT, an offline or a cloud compiler, or a mixed combination. This compiler should provide vectorization whenever target platform support it. IL code would be compiled to native code at install/deploy time, based on the target machine architecture (at dev time, it could be done after the whole application has been compiled to IL). The compiler stages should be accessible from an API and offer extension points as much as possible (providing access to IL to IL optimization, or to provide pluggable IL to native code transform). The compiler would be responsible to perform global program optimization at deploy time (or at runtime for JIT scenarios). Optimizations options should range from fast compilation (like JIT) to aggressive (offline, or hot swap code in a JIT). A profile of the application could also be used to automatically tune localized optimizations. This compiler should support advanced JIT scenarios, like dynamic hotspot analysis and On Stack Replacement (aka OSR, allowing heavy computation code to be replaced at runtime by a better optim code), unlike current .NET JIT that only compiles a method on a 1st run. This kind of optimization are really important in dynamic scenarios where type inference is sometimes discovered later (like Javascript).
An extensible allocator/memory component, allowing concurrent allocators, where the Garbage Collector/GC would be one implementation, though a major part of applications would use it to manage most of their lifecycle objects, leaving the most performance critical objects to be managed by other allocator schemes (like reference counting scenarios used by COM/WinRT). There is no restrictions to use different allocator models in a same application (and this is already what's happening when in a .NET application we need to deal with native interop to allocate objects using OS functions).

The philosophy is very similar to a CLR stack, however it doesn't force an application to be ran by a JIT compiler (yes there is NGEN in .NET, but it was designed for startup reasons, not for high performance reasons, plus it is a black box only working on assemblies installed into the GAC) and it allows mixed memory allocation GC/non-GC scenarios.

In this system, full native interoperability between languages would then be straightforward without sacrifying performance over simplicity and vice-verca. Ideally, an OS should be built from the ground up with such a core infrastructure. This is what was (is?) probably behind a project like Redhawk (for the compiler part), or Midori (for the OS part), in such an integrated system, probably only drivers would require some kind of unsafe behaviors.

[Update 9 Aug 2012: Felix9 again found that an intermediate bytecode, more low level than MSIL .NET bytecode, called MDIL could be already in used, and that could be the intermediate bytecode mentioned just above, though looking at the related patent "INTERMEDIATE LANGUAGE SUPPORT FOR CHANGE RESILIENCE", there are some native x86 registers in the specs that don't fit well with an architecture independent bytecode. Maybe they would keep MSIL as-is and leverage on a lower level MDIL. We will see.].

So what WinRT is tackling in this big picture? Metadatas, a bit of sandboxes API and an embryo of interoperability (through common datatypes and metadatas), as we can see, not so much, a basic COM++. And as we can obviously realize, WinRT is not able to provide advanced optimizations in scenarios where we use a WinRT API: for example, we cannot have a plain structure that can expose inlinable methods. Every method calls in WinRT are virtual calls, forced to go through a vtable (and sometimes several virtual calls are needed, when for example a static method is used), so even a simple property get/set will go through a virtual call. This is clearly inefficient. It looks like WinRT is only targeting coarse level API, leaving all the fine grained level API at the mercy of performance heterogeneity, restricting common scenarios where we want to access high performance code everywhere, without going through a layer of virtual calls and non-inlinable code. Using an extended COM model is not what we can call “Building the Future”.

Productivity and Performance for C# 6.0

A language like C# would be a perfect candidate in such a modular CLR system, and could be mapped easily to the previous intermediate bytecode. Though to efficiently use such a systen, C# should be improved on several aspects:

More unsafe power where we could turn off “managed” behaviors like array access checking (kind of “super unsafe mode”, where we could possibly use CPU pre-caching instructions before accessing next array elements, kind of "advanced" stuff impossible to do with current managed arrays without using unsupported tricks)
A configurable new operator that would integrate different allocator schemes.
Vectorized types (like HLSL float4) should be added to the core types. This has been asked for a long time (with ugly patches in XNA WP to "solve" this problem).
Lightweight interop to native code (in the case we would still be calling native code from C# unlike in an integrated OS): current manage to unmanaged transition is costly when calling native methods even without any "fixed" variables. An unsafe transition should be possible without the burden of the current x86/x64 prologue/epilogue of the unmanaged transition generated by current .NET JIT.

From a general language perspective, not strictly related to performance, there are lots of small area that would be important to be addressed as well:

Generics everywhere (in constructors, in implicit conversions) with more advanced constructs (contracts on operators... etc), closer to C++ template versatility but safer and less cluttered.
Struct inheritance and finalizers (to allow lightweight code to be executed on exit of a method, without going through the cumbersome "try/finally" or "using" patterns).
Add more MetaProgramming: allow static method extensions (not only for "this"), allow class mixin (mixin the content of a class inside another, usefull for things like math functions), allow modification of class/types/methods construction at compile time (for example, methods that would be called at compile time to add method/properties to a class, very similar to eigenclass in Ruby meta-programming instead of using things like T4 template code generation), more extensively, allow DSL like syntax language extensions at several points into the C# parser (Roslyn doesn't provide currently any extension point inside the parser) so that we could express language extensions in C# as well (for example, instead of having Linq syntax hardcoded, we should be able to write it as an extension parser plugin, fully written in C#). [Edit] I have posted a discussion "Meta-Programming and parser extensibility in C# and Roslyn" about what is intended behind this meta-programming thing at the Microsoft Roslyn forum. Check it out![/Edit]
A builtin symbol or link type where we could express a link to a language object (a class, a property, a method) by using a simple construction like: symbol LinkToMyMethod = @MyClass.MyMethod; instead of using Linq expressions (like (myMethod) => MyMethod inside MyClass). This would make more robust code using INotifyPropertyChanged or simplify all property based systems like WPF (which is currently an ugly duplication of the method definition).

Bottom line, is that there is less to add to C# than there is to remove from C++ to fully leverage on such a system and to greatly improve developer’s productivity, again without burning efficiency. One could argue that C++ already offers all of this and much more, but this is exactly why C++ is so much cluttered (syntax wise) and dangerous for the vast majority of developers. It allows unsafe everywhere, while unsafe code is always localized in an application (and is always source of memory corruption, so it is much easier to fix if they are clearly identified and strictly localized in the code, same than using asm keyword in non standard C/C++). It is easier and safer to track exceptional usages in a large codebase than to have it allowed everywhere.

Next?

We can hope that Microsoft took a top-down approach, by addressing unified OS API for all languages and simple interoperability first, and that they will introduce these more advanced features in later version of their OS. But this is an ideal expectation and it will be interesting to follow if Microsoft will effectively challenge this. Even if It was recently revealed that WP8 .NET applications would benefit some Cloud compilers, so far, we don't know much about it: Is it just a repackaging of NGEN (which is again, not performance oriented, generating code very similar to current JIT) or a non public RedHawk compiler?

Microsoft has lots of gold in their backyard, with years of advanced native code compilations with their C++ compiler, JIT, GC, and all the related R&D projects they have...

So to summarize this post: .NET must die to a better integrated, performance oriented, common runtime where the managed (safety/productivity) vs native (performance) is no longer a border, and this should be a structural part of next WinRT architecture evolution.