Test262 is a conformance test suite for ECMAScript, the programming language on which JavaScript is based. Containing 36,103 individual tests at the time of this writing, Test262 ensures that different implementations of the language, such as the JavaScript engines in web browsers or stand-alone runtimes like Node.js or Moddable XS, agree on the minutiae of every aspect of JavaScript behavior. But Test262 doesn’t just reflect the latest formal version of the ECMAScript standard; it also plays a key role in shaping the design and evolution of features in progress.

In 2018, our work writing and maintaining Test262 focused on supporting in-progress JavaScript features by writing their tests earlier in the standardization process. In doing so, we helped uncover design issues in spec text and made it easier for vendors to begin implementations, ultimately shortening the wait before developers could use new features in real code.

ECMAScript is standardized by TC39, a committee of JavaScript developers, implementers, and programming language academics. New features (such as dynamic imports, class fields, or BigInt for arbitrary-precision integers) progress through numbered stages of maturity, from Stage 0 to Stage 4, as the committee formalizes syntax, resolves compatibility issues, proves real-world use cases, assesses community support, and incorporates feedback from implementers.

Writing more tests, earlier

Stage 3 is a particularly important milestone because it is the last stage before acceptance into the standard. To advance to Stage 4, a feature needs Test262 acceptance tests and at least two compatible implementations that pass them. In practice, however, coordinating and testing multiple implementations is a chicken-and-egg problem. Individual browser engines, for example, are often hesitant to implement a new feature first without some assurance that other engines will follow suit. Meanwhile, the implementation process itself often uncovers issues that force changes to the feature’s design, making a successful implementation a moving target.

This coordination complexity can and does affect how long features take to reach Stage 4 (although plenty of other factors affect standardization time as well). To help smooth the process, in 2018 we worked to add Test262 tests for all Stage 3 features, whether or not vendors had begun to implement them.

Across 12,278 new tests, we authored and reviewed coverage for features including BigInt, class fields and methods (public and private), dynamic import, and a variety of Internationalization APIs, and we expanded coverage for SharedArrayBuffer and Atomics used with BigInt TypedArrays. These Stage 3 tests gave implementers a head start by turning spec text into conformance test code, allowing runtimes (and in turn web browsers) to ship the features sooner and with better interoperability.

Evolving SharedArrayBuffers

To better understand Test262’s role in turning spec features into an implementation reality, consider the process of shipping SharedArrayBuffers and the Atomics APIs that coordinate safe access to shared memory in multithreaded code. Although a proposal for shared memory in JavaScript existed as early as 2015, SharedArrayBuffers were not officially ratified until they reached Stage 4 in early 2017 for inclusion in ES2017, alongside the Atomics API (which included Atomics.wait and Atomics.wake). By late 2017, all four browser engines had shipped support for SharedArrayBuffers and the Atomics methods.
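A minimal, single-threaded sketch of the Atomics calls mentioned above, assuming Node.js (browsers do not allow `Atomics.wait` on the main thread). It uses the method’s current name, `Atomics.notify`, which replaced `Atomics.wake` in 2018. In real code the waiting and the notifying would happen on different threads, such as a main thread and a worker; here both happen on one thread, so the wait can only end by value mismatch or timeout.

```javascript
// Assumes Node.js, whose main thread may block; browsers throw if
// Atomics.wait is called on the main thread.
const cell = new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));

// Atomic read-modify-write operations return the previous value.
const before = Atomics.add(cell, 0, 41); // cell held 0, now holds 41
Atomics.add(cell, 0, 1);                 // now holds 42

// wait() blocks only while the cell still holds the expected value.
const mismatch = Atomics.wait(cell, 0, 0);      // "not-equal": the cell holds 42, not 0
const timedOut = Atomics.wait(cell, 0, 42, 10); // "timed-out": nobody notifies within 10 ms

// notify() (formerly Atomics.wake) returns how many waiting agents it woke.
const woken = Atomics.notify(cell, 0, 1); // 0: no other thread is waiting

console.log({ before, value: Atomics.load(cell, 0), mismatch, timedOut, woken });
```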

In January 2018, the disclosure of the Spectre and Meltdown security vulnerabilities upended traditional assumptions about the ability of untrusted code (like JavaScript or WebAssembly) to access arbitrary virtual memory across security boundaries in the operating system. In particular, the vulnerabilities demonstrated that SharedArrayBuffers could realistically be used to build high-resolution timers that monitor cache accesses and thereby read memory within a renderer process. In response, browsers unshipped SharedArrayBuffers and Atomics until other mitigations, such as Site Isolation, could be implemented to prevent cross-origin information leaks.

This nearly unprecedented move provided a unique opportunity to revisit the design of the Atomics API and fix a naming usability issue. At the May 2018 TC39 meeting, the committee agreed to rename Atomics.wake to Atomics.notify, because the name wake was too easily confused with its counterpart, Atomics.wait. Such a change would normally be out of the question given the risk of breaking existing websites, but the forced removal of SharedArrayBuffers in the Spectre aftermath gave the committee a narrow window to change the API name.
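Code written against the ES2017 name had to change along with the spec. A minimal sketch of a compatibility helper, using a hypothetical function name, `notifyWaiters`: it prefers `Atomics.notify` and falls back to `Atomics.wake` only on engines that predate the rename.

```javascript
// Hypothetical helper: wakes up to `count` agents waiting on int32[index].
// Prefers the current name and falls back to the pre-rename Atomics.wake,
// which only engines predating the 2018 change provide.
function notifyWaiters(int32, index, count = Infinity) {
  if (typeof Atomics === "undefined") {
    throw new TypeError("Atomics is not available in this engine");
  }
  if (typeof Atomics.notify === "function") {
    return Atomics.notify(int32, index, count);
  }
  if (typeof Atomics.wake === "function") {
    return Atomics.wake(int32, index, count);
  }
  throw new TypeError("neither Atomics.notify nor Atomics.wake is available");
}

// Usage: with no agents waiting on the cell, the number woken is 0.
const flags = new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));
console.log(notifyWaiters(flags, 0));
```

Because both names take the same arguments and return the same count, the fallback is a pure rename; only the lookup differs.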

Bocoup led the effort to move all four browser engines to the new name before the feature shipped again, starting by making the name change in Test262. In this case, the test suite provided the extra impetus to coordinate a speedy patch across all existing implementations: failing tests can be an efficient way to signal a high-priority fix to multiple vendors and keep them coordinated within a narrow window of time.

Test262 also helped SharedArrayBuffers gain new functionality efficiently. When the BigInt proposal reached Stage 3, TC39 agreed to extend the Atomics and SharedArrayBuffer APIs to cover the semantics of sharing memory through BigInt64Arrays. Normally, adding this kind of functionality to an existing API would require vendors to update or write their own unit tests for the new behavior. However, because we preemptively added BigInt64Array coverage to Test262 at the beginning of Stage 3, vendors could update their implementations without writing tests of their own. They could import the Test262 tests and use them directly to confirm that the semantics of a relatively complex API had been updated correctly.
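A short sketch of what the BigInt extension looks like from user code, assuming an engine with SharedArrayBuffer and BigInt support (for example, a recent Node.js). The calls are standard Atomics methods; the values are illustrative, and the checks are our own examples of the kind of behavior BigInt TypedArray coverage needs to pin down, not tests copied from Test262.

```javascript
// Two 64-bit cells backed by shared memory.
const shared = new BigInt64Array(new SharedArrayBuffer(2 * BigInt64Array.BYTES_PER_ELEMENT));

// Atomic arithmetic takes BigInt operands and wraps like a signed 64-bit integer.
const max = 2n ** 63n - 1n;
Atomics.store(shared, 0, max);
const previous = Atomics.add(shared, 0, 1n); // returns the old value, max
const wrapped = Atomics.load(shared, 0);     // -(2n ** 63n) after wraparound

// compareExchange writes only if the cell holds the expected value.
const oldSecond = Atomics.compareExchange(shared, 1, 0n, -1n); // 0n; cell 1 is now -1n

// Number operands are rejected: BigInt64Array cells take BigInt values only.
let mixedError = null;
try {
  Atomics.add(shared, 0, 1);
} catch (err) {
  mixedError = err; // TypeError from converting the Number 1 to a BigInt
}

console.log({ previous, wrapped, oldSecond, second: shared[1], error: mixedError && mixedError.name });
```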

Improving overall interoperability

Regardless of market share, each JavaScript implementation bears an equal share of responsibility for the overall compatibility story. Spec conformance in a single runtime is not enough for a healthy, interoperable JavaScript ecosystem.

Given this shared responsibility, how do we quantify the interoperability of JavaScript features (and, in turn, their predictability for developers)? One way is to compare Test262 results for each feature across engines. If the metric should capture how reliably a developer can expect the same JavaScript code to produce the same results in multiple implementations, then it should consider only how similar the engines’ Test262 results are, not their absolute pass/fail rates: two engines that fail the same test agree just as much as two that pass it.

We therefore measured Test262 interoperability by summing the per-test differences in results across JavaScript engines. A score of 100% represents perfect agreement among all engines on all tests for a feature, with each engine weighted equally. Today, Stage 3 tests have an average score of 33%, or 51% when normalized by feature (some features have many more tests than others). Meanwhile, all ECMAScript tests together have an average interoperability score of 81%. You can check the interoperability of new features at any time on the dedicated page on test262.report.
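As an illustration only, here is one hypothetical way to compute such a score; it is not necessarily the exact formula behind the numbers above. Each test scores the fraction of engine pairs that agree on it (pass/pass or fail/fail, so absolute pass rates do not matter), and the suite score is either a plain average over tests or an average of per-feature averages. The function names and the toy data are invented for the example.

```javascript
// Hypothetical per-test score: the fraction of engine pairs whose results
// agree, so "all pass" and "all fail" both score 1. Not the published formula.
function pairwiseAgreement(outcomes) {
  let agreeing = 0;
  let pairs = 0;
  for (let i = 0; i < outcomes.length; i++) {
    for (let j = i + 1; j < outcomes.length; j++) {
      pairs++;
      if (outcomes[i] === outcomes[j]) agreeing++;
    }
  }
  return pairs === 0 ? 1 : agreeing / pairs;
}

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// results: [{ feature, outcomes }], where outcomes[i] is true if engine i passed.
function interopScores(results) {
  const perTest = results.map((r) => pairwiseAgreement(r.outcomes));
  // Group per-test scores by feature so each feature counts once.
  const byFeature = new Map();
  results.forEach((r, k) => {
    if (!byFeature.has(r.feature)) byFeature.set(r.feature, []);
    byFeature.get(r.feature).push(perTest[k]);
  });
  return {
    perTestAverage: mean(perTest),
    featureNormalized: mean([...byFeature.values()].map(mean)),
  };
}

// Toy data (invented): feature A has three tests every engine passes;
// feature B has one test that only the first of four engines passes.
const toy = [
  { feature: "A", outcomes: [true, true, true, true] },
  { feature: "A", outcomes: [true, true, true, true] },
  { feature: "A", outcomes: [true, true, true, true] },
  { feature: "B", outcomes: [true, false, false, false] },
];
console.log(interopScores(toy)); // { perTestAverage: 0.875, featureNormalized: 0.75 }
```

Feature A contributes three of the four tests, so it dominates the plain average; averaging within each feature first gives B equal weight, which is why the two numbers differ.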

These numbers give JavaScript developers good reason to be optimistic about their code running the same way across environments, runtimes, and web browsers! As more features are added to the language, they are also increasingly likely to be correct and interoperable from the start, a big improvement over the days when developers often avoided new features while waiting for vendors to gradually fix interoperability bugs.

The future of Test262 in the TC39 process

Going forward, Bocoup plans to keep writing Test262 tests as early as possible for in-development Stage 2 and ready-to-implement Stage 3 features. We would even like to see test authoring become its own milestone in the TC39 process, distinct from implementation feedback. The earlier Test262 tests are written for a new JavaScript feature, the sooner designers and implementers can be certain that there is a ground truth, in code, for the feature’s behavior.

In a more abstract sense, Test262 maintainers take part in an informal system of checks and balances, joining implementers and spec writers as key players in shaping the evolution of JavaScript. Should implementation progress stall or spec writing miss a corner case, authoring a conformance test can help push a feature into existence.

For this reason, we are excited to continue improving the quality and completeness of the Test262 suite, as well as to find new ways for Test262 to help implementers, spec writers, developers, and the broader JavaScript community benefit from predictable JavaScript implementations.