Surprise! Mozilla can produce near-native performance on the Web | Ars Technica
In a bid to make JavaScript run ever faster, Mozilla has developed asm.js. It's a limited, stripped down subset of JavaScript that the company claims will offer performance that's within a factor of two of native—good enough to use the browser for almost any application. Can JavaScript really start to rival native code performance? We've been taking a closer look.
The quest for faster JavaScript
JavaScript performance became a big deal in 2008. Prior to this, the JavaScript engines found in common Web browsers tended to be pretty slow. They were good enough for the basic scripting that the Web used at the time, but they were largely inadequate for those wanting to use the Web as a rich application platform.
In 2008, however, Google released Chrome with its V8 JavaScript engine. Around the same time, Apple brought out Safari 4 with its Nitro (née Squirrelfish Extreme) engine. These engines brought something new to the world of JavaScript: high performance achieved through just-in-time (JIT) compilation. V8 and Nitro would convert JavaScript into pieces of executable code that the CPU could run directly, improving performance by a factor of three or more.
Mozilla and Microsoft followed suit. Mozilla introduced TraceMonkey in Firefox 3.5 in 2009 and Microsoft released Chakra in 2011.
JIT compilation provided great scope for accelerating the performance of JavaScript programs, but it has its limits. The problem is JavaScript itself. The behavior of the language makes it hard to optimize. In languages such as C and C++, the behavior of a program is baked in when the program is compiled. Languages like Java and C# add a little more flexibility, but most of the time they share that same characteristic. The functions and data that make up a particular class are fixed when the program is compiled.
This isn't true of JavaScript. In JavaScript, the way an object is meant to behave can change at more or less any time. A JIT engine could produce executable code to make an object behave one way, and then that object could be modified to invalidate the executable code. This means that the executable code has to be quite conservative to guard against this kind of modification. From time to time, bugs have cropped up that cause bad code to be generated.
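To make the problem concrete, here is a minimal sketch of the kind of run-time change a JIT has to guard against. The names Point, norm2, and sumNorms are invented for illustration and do not come from any particular engine or benchmark.

```javascript
// A plain constructor with one method on its prototype.
function Point(x, y) {
  this.x = x;
  this.y = y;
}
Point.prototype.norm2 = function () {
  return this.x * this.x + this.y * this.y;
};

// A hot loop that an engine would like to compile on the assumption
// that p.norm2 is always the function defined above.
function sumNorms(points) {
  let total = 0;
  for (const p of points) {
    total += p.norm2();
  }
  return total;
}

const points = [new Point(1, 2), new Point(3, 4)];
const before = sumNorms(points); // 5 + 25 = 30

// Any code, at any time, can change what norm2 means for every Point...
Point.prototype.norm2 = function () {
  return Math.abs(this.x) + Math.abs(this.y);
};
// ...or change one object so that it no longer matches its siblings.
points[0].norm2 = function () {
  return 0;
};

const after = sumNorms(points); // 0 + 7 = 7
console.log(before, after);
```

The same call site in sumNorms ends up running three different functions, so machine code generated for the first version has to check that its assumptions still hold before each use.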
Browser developers are, therefore, in a frustrating position. They want scripting engines that are faster to enable the browser to be used for a wider range of applications, but their efforts to improve performance are hamstrung by JavaScript itself. The language simply isn't designed for high performance optimization.
Breaking the speed limit by changing the rules
This has all led to a number of efforts to change JavaScript itself. The first notable one is Google Dart, a scripting language aimed at the same kinds of programs that JavaScript is currently used for. Its syntax is broadly familiar to JavaScript developers, but it lacks many of the traits that make JavaScript difficult to optimize.
Google's original ambition was to have Dart integrated into the browser, using a Dart-specific engine where available or translating to JavaScript when not. Google also developed Dartium, a fork of its Chromium browser (Chromium being the open-source counterpart to Chrome) that includes the Dart engine.
As a practical matter, getting both Web and browser developers to embrace an all-new language with an all-new engine is an uphill struggle. JavaScript isn't going to go away any time soon, so adding additional languages simply increases the complexity of browsers and spreads development resources thinner.
asm.js
Mozilla proposed an alternative. Rather than using an entirely new language, Mozilla defines a strict subset of JavaScript that it calls asm.js. The asm.js subset of JavaScript is very limited. It eschews, for example, JavaScript's object-oriented constructs. As a result, it also eschews many of JavaScript's hard-to-optimize dynamic capabilities.
Instead of using objects and classes, asm.js programs manipulate a large array representing "memory" in a manner not entirely dissimilar to the way C and C++ programs manipulate system memory. This does not mean that concepts such as objects and classes cannot be used. It means instead that they must be implemented and used by asm.js programs in the same way that C++ compilers implement and use them. In a C++ program, an object of a class with virtual functions is typically represented in memory by the address of the class's v-table (a table of pointers to the class's virtual functions) followed by the storage for the object's data. So too in asm.js: the memory array would contain, in consecutive elements, the array index of the v-table and then the object data.
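The layout described above can be sketched in ordinary JavaScript. This is a simplified illustration rather than what any real compiler emits: HEAP32, FUNCTION_TABLE, allocate, makeVtable, and the shape functions are all hypothetical names, and the "heap" is just a small typed array.

```javascript
// One flat block of "memory", viewed as 32-bit integers.
// Addresses are simply array indices.
const HEAP32 = new Int32Array(256);

// Trivial bump allocator; index 0 is left unused so it can act as "null".
let nextFree = 1;
function allocate(words) {
  const address = nextFree;
  nextFree += words;
  return address;
}

// "Methods" are plain functions that take the address of their object.
function circleArea(self) {
  const radius = HEAP32[self + 1];
  return 3 * radius * radius; // integer stand-in for pi * r^2
}
function squareArea(self) {
  const side = HEAP32[self + 1];
  return side * side;
}
const FUNCTION_TABLE = [circleArea, squareArea];

// A v-table lives in memory too: one slot per virtual function,
// each slot holding an index into FUNCTION_TABLE.
const AREA_SLOT = 0;
function makeVtable(areaFunctionIndex) {
  const vtable = allocate(1);
  HEAP32[vtable + AREA_SLOT] = areaFunctionIndex;
  return vtable;
}
const CIRCLE_VTABLE = makeVtable(0);
const SQUARE_VTABLE = makeVtable(1);

// Object layout in consecutive elements: [v-table address, size].
function newShape(vtable, size) {
  const self = allocate(2);
  HEAP32[self] = vtable;
  HEAP32[self + 1] = size;
  return self;
}

// A virtual call: object -> v-table -> function index -> function.
function area(self) {
  const vtable = HEAP32[self];
  return FUNCTION_TABLE[HEAP32[vtable + AREA_SLOT]](self);
}

const circle = newShape(CIRCLE_VTABLE, 2);
const square = newShape(SQUARE_VTABLE, 5);
const circleResult = area(circle); // 3 * 2 * 2 = 12
const squareResult = area(square); // 5 * 5 = 25
console.log(circleResult, squareResult);
```

Nothing here is a JavaScript object in the usual sense. The "objects" are runs of integers, and calling a method means following two array lookups to find the right function.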
asm.js also contains special hints to indicate which data types are being used. In traditional JavaScript, numbers can behave more or less like integers, or more or less like floating point numbers. The behavior changes depending on the operations being performed. For example, JavaScript will let you perform bitwise operations on floating point numbers by coercing those numbers into integers first. This coercion happens automatically and implicitly, meaning that JIT compilers cannot safely assume that a number is of one type or the other. asm.js uses explicit indicators to specify whether numbers (and operations on those numbers) should use integer-like or floating point-like behavior.
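The implicit coercion is easy to see in a few lines of ordinary JavaScript. The variable names below are arbitrary and chosen only for illustration.

```javascript
const a = 5.7;
const b = 3.2;

// Arithmetic treats the values as floating point numbers.
const half = 7 / 2; // 3.5, not 3

// Bitwise operators silently truncate both operands to 32-bit integers first.
const anded = a & b; // 5 & 3 = 1
const shifted = a << 1; // 5 << 1 = 10

// Either way, the language reports a single type.
const kinds = [typeof a, typeof anded]; // both "number"

console.log(half, anded, shifted, kinds);
```

The same variable is a floating point value on one line and an integer on the next, and nothing in the source code says which was intended.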
This representation is much lower level than that found in traditional JavaScript programs, but it comes with an important constraint: it's nonetheless still JavaScript. The big memory array uses the (relatively recently introduced) JavaScript Typed Arrays. These were originally created for WebGL, but they have since become available in all modern browsers, including the WebGL-less Internet Explorer 10. The number type indicators similarly use JavaScript constructs. For example, to indicate that a number is an integer, asm.js uses "bitwise or with zero" (an operation that forces JavaScript to coerce the number to an integer, but which does not change the value of a number that is already a 32-bit integer).
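Both building blocks can be tried in any JavaScript console. The sketch below uses only standard Typed Array and operator behavior; the names HEAP32 and addInts are hypothetical and picked to resemble the style of compiler output.

```javascript
// A raw buffer of bytes, with a typed view that reads and writes
// 32-bit integers.
const buffer = new ArrayBuffer(64);
const HEAP32 = new Int32Array(buffer);

// Storing into an integer view converts the value to an integer.
HEAP32[0] = 3.7;
const stored = HEAP32[0]; // 3

// "Bitwise or with zero" leaves a 32-bit integer untouched...
const unchanged = 7 | 0; // 7
// ...truncates anything fractional...
const truncated = 7.9 | 0; // 7
// ...and wraps values that do not fit in 32 bits.
const wrapped = 2147483648 | 0; // -2147483648

// Written in the annotated style: every parameter and every result is
// tagged as an integer, so the addition behaves like 32-bit integer addition.
function addInts(a, b) {
  a = a | 0;
  b = b | 0;
  return (a + b) | 0;
}
const sum = addInts(2, 3); // 5
const overflow = addInts(2147483647, 1); // -2147483648

console.log(stored, unchanged, truncated, wrapped, sum, overflow);
```

An engine that does not know about asm.js simply executes the extra operators. One that does can read them as type declarations.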
The result is that, unlike Dart programs that need a Dart engine or explicit translation to JavaScript, asm.js programs already run in any browser. They're just JavaScript programs, albeit weird JavaScript programs that don't look like anything that a human would ever produce.
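To give a sense of what such a program looks like, here is a small hand-written module in the asm.js style. It is a sketch under assumptions: the module name AsmModule and the function sum are invented for this example, and real compiler output is far larger and less tidy. The "use asm" directive marks the function as an asm.js module; an engine without asm.js support ignores the directive and runs the body as ordinary JavaScript.

```javascript
function AsmModule(stdlib, foreign, heap) {
  "use asm";

  // A 32-bit integer view over the memory buffer supplied by the caller.
  var HEAP32 = new stdlib.Int32Array(heap);

  // Sum `count` 32-bit integers starting at byte offset `ptr`.
  function sum(ptr, count) {
    ptr = ptr | 0; // parameter type hint: integer
    count = count | 0; // parameter type hint: integer
    var total = 0;
    var i = 0;
    for (i = 0; (i | 0) < (count | 0); i = (i + 1) | 0) {
      // Byte offset ptr + 4 * i, shifted right by 2 to get an element index.
      total = (total + (HEAP32[(ptr + (i << 2)) >> 2] | 0)) | 0;
    }
    return total | 0; // return type hint: integer
  }

  return { sum: sum };
}

// The caller owns the "memory" and hands it to the module.
const heap = new ArrayBuffer(0x10000); // 64 KiB
new Int32Array(heap).set([10, 20, 30, 40]);

const instance = AsmModule(globalThis, {}, heap);
const result = instance.sum(0, 4); // 10 + 20 + 30 + 40 = 100
console.log(result);
```

Every value is annotated, every memory access goes through the typed array, and there is not an object or a string in sight.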
Fewer features mean better performance
Browsers that recognize and have explicit support for asm.js can, however, take advantage of this knowledge to perform better optimization. An engine that knows about asm.js also knows that asm.js programs are forbidden from using many JavaScript features, such as changing an object's behavior at run time. As a result, it can produce much more efficient code. Regular JavaScript JITs must include guards to detect that kind of dynamic behavior. asm.js JITs need not: asm.js forbids it, so there is nothing to guard against. This simpler model (no dynamic behavior, no memory allocation or deallocation, just a narrow set of well-defined integer and floating point operations) enables much greater optimization.
The fact that asm.js doesn't look like JavaScript any human would produce might seem like a problem. Scant few developers of native code programs use assembler, and asm.js is even more feature-deprived than most real assembly languages. Mozilla doesn't really intend for developers to write asm.js programs directly, however. Instead, the idea is that compilers use asm.js as the target, with programs themselves written in some other language.
That language is typically C or C++, and the compiler used to produce asm.js programs is another Mozilla project: Emscripten. Emscripten is a compiler based on the LLVM compiler infrastructure and the Clang C/C++ front-end. The Clang compiler reads C and C++ source code and produces an intermediate platform-independent assembler-like output called LLVM Intermediate Representation. LLVM optimizes the LLVM IR. LLVM IR is then fed into a backend code generator—the part that actually produces executable code. Traditionally, this code generator would emit x86 code. With Emscripten, it's used to produce JavaScript.
Emscripten can be used in two modes. It can produce regular JavaScript and it can produce asm.js JavaScript. In both cases, the output would not be described as human-readable. Just as with asm.js, the regular JavaScript uses the basic concept of a big array to represent "memory" with operations performed on that array. It was the success of this approach that led to the development of asm.js: asm.js is a formalized set of rules for how this style of JavaScript should be written.
So that's what asm.js is. The real question, however, is how fast does it go? We've built a number of common benchmarks using Emscripten to take a look.