Interoperability in Our Terms – Mapping Web Conformance Tests to Web Features

Posted by Mike Pennisi

May 12, 2026

Ever think about just how huge the web platform is? As web developers, we all naturally familiarize ourselves with the bits we use every day, but since exactly no one uses all of it all the time, very few have a grasp on everything that’s available. (And that’s okay! I see you!)

We’re reminded of this every time a project pushes us outside of our comfort zone. Tackling a new challenge often involves learning what web browsers are capable of today. And because the platform is constantly growing, that re-orientation is just as much an adventure for veteran developers as it is for newbies.

This perennial problem motivated the WebDX Community Group to create a taxonomy for the platform, organized specifically for developers: the web-features project. Bocoup teamed up with some long-time collaborators at Google and spent the fall of 2025 classifying the conformance tests in WPT and Test262 according to that taxonomy.

A close-up photograph of a card catalog with the numerous drawers spanning out beyond the frame

With over 700 web-features to consider, we knew it’d be a tricky project. We didn’t know how tricky, though.

The Task At Hand

The web platform is thoroughly documented by rigorous specifications, but have you, gentle web developer, ever tried to use them as a reference? It’s a bad time. The specs use a terminology all their own, including a whole language for describing APIs. They rarely offer usage examples. They might not even be contained in a single document (e.g. the behavior of fetch metadata is defined partly by a dedicated spec, partly by HTML, and partly by Fetch). And as if all that wasn’t confusing enough, they’re written in distinct editorial styles from five standards bodies¹.

So sure, the specs are technically all you need to learn how to make any web app, but framing matters! Those documents are way more useful for folks making browsers than for folks making websites. None of the gotchas I rattled off above are a mistake; it’s just that there’s an unavoidable difference between “here’s precisely how feature X works” and “here’s how you actually use feature X.”

That’s why the web-features project exists! It’s taking a concerted effort to catalog the entire platform in terms that are meaningful to developers. It’s a multi-faceted, multi-stakeholder venture, and conformance test classification is a recent-but-important extension. Understanding test results in terms of web-features paves the way for substantial improvements to the insights that can be presented to developers when they are charting out the platform (e.g. via webstatus.dev or the web-features explorer).

A screen shot of a table from webstatus.dev showing WPT pass/fail results for various web-features

On a practical level, this meant writing “feature mapping” files across WPT and Test262. Named WEB_FEATURES.yml, these files described which tests belonged to which web-feature². They looked a little like this:

features:
- name: cookie-enabled
  files:
  - cookie-enabled-noncookie-frame.html
- name: cookies
  files:
  - "*"
  - "!cookie-enabled-noncookie-frame.html"
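
To make the semantics concrete, here is a minimal Python sketch of how a mapping like this could be resolved into per-feature test lists. It assumes each entry is a shell-style pattern matched against file names in the same directory, and that a leading `!` excludes whatever it matches. The real WPT tooling may resolve patterns differently, and `hypothetical-cookie-test.html` is a made-up file name.

```python
from fnmatch import fnmatchcase

# The WEB_FEATURES.yml example above, written out as plain Python data
# so the sketch needs no YAML parser.
MAPPING = [
    {"name": "cookie-enabled", "files": ["cookie-enabled-noncookie-frame.html"]},
    {"name": "cookies", "files": ["*", "!cookie-enabled-noncookie-frame.html"]},
]


def classify(filenames, mapping):
    """Return {feature name: sorted list of matching file names}."""
    result = {}
    for feature in mapping:
        # Assumption: entries starting with "!" are exclusions; all others include.
        includes = [p for p in feature["files"] if not p.startswith("!")]
        excludes = [p[1:] for p in feature["files"] if p.startswith("!")]
        result[feature["name"]] = sorted(
            name
            for name in filenames
            if any(fnmatchcase(name, p) for p in includes)
            and not any(fnmatchcase(name, p) for p in excludes)
        )
    return result


# The second file name is hypothetical, used only for illustration.
tests = ["cookie-enabled-noncookie-frame.html", "hypothetical-cookie-test.html"]
by_feature = classify(tests, MAPPING)
```

Under those assumptions, the one specially named file lands in cookie-enabled and everything else in the directory falls through to cookies.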

Sounds easy, right?

Getting Our Hands Dirty

You might look at a feature like device-posture and think this whole effort would be a cinch. WPT has a top-level directory named device-posture/ for Pete’s sake! Just tag all the tests in there and move on to the next feature. A whopping 110 web-features are tested in top-level WPT directories that share their ID. If you’re ready to dig deeper into the directory hierarchy, you’ll find 79 more.

$ find . -name WEB_FEATURES.yml | xargs grep -Eo '\- name:.*' | grep -E '/(.*)/WEB_FEATURES.yml:.* \1$' | wc -l
189
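
If that pipeline is hard to read, the same count can be sketched in Python. This is only an approximation of the one-liner: it assumes feature IDs appear on unquoted `- name: <id>` lines, and the demo runs against a throwaway directory tree with a hypothetical layout rather than a real WPT checkout.

```python
import os
import re
import tempfile

# Matches lines such as "- name: device-posture" and captures the feature ID.
NAME_LINE = re.compile(r"^\s*-\s*name:\s*(\S+)\s*$")


def count_self_named_features(root):
    """Count features whose ID equals the name of the directory holding their mapping file."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        if "WEB_FEATURES.yml" not in filenames:
            continue
        directory = os.path.basename(dirpath)
        with open(os.path.join(dirpath, "WEB_FEATURES.yml"), encoding="utf-8") as f:
            for line in f:
                match = NAME_LINE.match(line)
                if match and match.group(1) == directory:
                    count += 1
    return count


# Demo on a temporary tree (hypothetical layout, for illustration only).
layout = {"device-posture": ["device-posture"], "cookies": ["cookie-enabled", "cookies"]}
with tempfile.TemporaryDirectory() as root:
    for directory, names in layout.items():
        os.makedirs(os.path.join(root, directory))
        path = os.path.join(root, directory, "WEB_FEATURES.yml")
        with open(path, "w", encoding="utf-8") as f:
            f.write("features:\n")
            for name in names:
                f.write(f'- name: {name}\n  files:\n  - "*"\n')
    demo_count = count_self_named_features(root)
```

In the demo tree, device-posture and cookies each match their directory name while cookie-enabled does not, so `demo_count` is 2.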

Except this is the web we’re talking about here, so of course it’s more complicated than that!

First of all, the 189 web-features we’ve optimistically classified represent less than a fifth of the full corpus as of this writing. That convoluted Bash one-liner isn’t going to cut it 80% of the time.

Second: there’s nuance even among those apparently-cut-and-dried cases. Take the Shadow DOM web-feature, for instance. That feature covers a huge set of capabilities, and in fact, it’s something of a catch-all for more specific features. WPT’s shadow-dom/ directory houses tests for all of that, so the actual mapping has to accommodate the <slot> element, imperative slot assignment, and even one test for document.caretPositionFromPoint(). And then there are the sub-directories, each with its own set of considerations.

Most importantly, though, we have to consider the vast majority of web-features that aren’t tested in a directory created for them. CSS counters, for example, are tested in six different places! Identifying these requires custom tooling and fluency in the web platform (e.g. recognizing that “text indent” is spelled text-indent in CSS but textIndent in JavaScript³ and that Observables are part of the DOM—not JavaScript).
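
As a rough illustration of what that kind of tooling has to account for, here is a small Python sketch that looks for a CSS property under both of its spellings. It is not the tooling we used; the helper names are invented for this post, and the conversion ignores special cases such as `cssFloat` and vendor prefixes.

```python
import re


def css_to_idl_name(prop):
    """Convert a dashed CSS property name to its camelCase spelling in script."""
    return re.sub(r"-([a-z])", lambda m: m.group(1).upper(), prop)


def mentions_property(source, prop):
    """Report whether source text uses the property in either spelling."""
    # Lookarounds keep "text-indent" from matching inside a longer identifier.
    dashed = r"(?<![\w-])" + re.escape(prop) + r"(?![\w-])"
    camel = r"(?<![\w$])" + re.escape(css_to_idl_name(prop)) + r"(?![\w$])"
    return (
        re.search(dashed, source) is not None
        or re.search(camel, source) is not None
    )


# Tiny made-up snippets standing in for test file contents.
samples = {
    "style rule": "p { text-indent: 2em; }",
    "script": "el.style.textIndent = '2em';",
    "unrelated": "p { text-align: center; }",
}
hits = {label: mentions_property(src, "text-indent") for label, src in samples.items()}
```

A text search like this only produces candidates. Deciding whether a hit is actually a test for the feature still takes a human who knows the platform.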

Here again, this bedlam isn’t a mistake; I’m not throwing shade at WPT. Much of the complexity of platform engineering stems from the fact that capabilities can’t be cleanly segmented. Even though we can uniquely name features, they are rarely cohesive enough to fit into a tree-like directory structure. That reality speaks to a challenge we haven’t addressed yet, one that no amount of attention to detail can resolve.

Cross-Cutting Tests

Platform implementers are especially interested in the interactions between features. That’s not because those interactions always represent common usages, but rather because they exercise the most nuanced (read: easy to screw up) parts of their codebase. We explored this in great detail for Test262 tests ten years ago (a duration which I reveal with the strongest reluctance imaginable), so let’s trot out one of those atrocious tests–any one will do:

function newTarget() {}
newTarget.prototype = null;

var sample = new Int8Array(8);

var ta = Reflect.construct(Int8Array, [sample], newTarget);

assert.sameValue(ta.constructor, Int8Array);
assert.sameValue(Object.getPrototypeOf(ta), Int8Array.prototype);

This is a test for Typed Arrays and new.target. Unlike tests where multiple features are tested together out of convenience⁴, there’s no amount of refactoring that can correct this. The entanglement is the test.

From a strict classification standpoint, this is barely worth considering. Just classify the test for both features. Unfortunately, anyone who wants to interpret the results of tests like this, e.g. webstatus.dev, faces a higher-order challenge. If the test fails in some browser, is that because of a bug in Typed Arrays? A bug in new.target? Both?
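
To see why the interpretation gets murky, consider this Python sketch of a per-feature tally. Everything in it is hypothetical: the feature IDs, test names, and results are invented, and splitting failures into "own" and "shared" is just one possible way to surface the ambiguity, not how webstatus.dev works.

```python
from collections import defaultdict

# Hypothetical classification: test name -> features it exercises.
TEST_FEATURES = {
    "typed-array-new-target.js": ["typed-arrays", "new-target"],
    "typed-array-length.js": ["typed-arrays"],
    "new-target-basic.js": ["new-target"],
}

# Hypothetical results from one browser run (True means the test passed).
RESULTS = {
    "typed-array-new-target.js": False,
    "typed-array-length.js": True,
    "new-target-basic.js": True,
}


def tally(test_features, results):
    """Count passes and failures per feature, separating ambiguous failures."""
    counts = defaultdict(lambda: {"pass": 0, "fail_own": 0, "fail_shared": 0})
    for test, features in test_features.items():
        for feature in features:
            if results[test]:
                key = "pass"
            elif len(features) == 1:
                key = "fail_own"  # only this feature can be at fault
            else:
                key = "fail_shared"  # any of the tagged features may be at fault
            counts[feature][key] += 1
    return {feature: dict(c) for feature, c in counts.items()}


summary = tally(TEST_FEATURES, RESULTS)
```

The single cross-cutting failure shows up under both features as `fail_shared`. A plain pass/fail percentage would penalize each of them equally, even though only one may be broken.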

It’s not always easy to make this judgement call (let alone to automate it). When it comes to commenting on the work of others, we’ve always been extremely sensitive to misrepresentation. We’re downright allergic to subjective statements in conformance tests.

That’s why we’re already exploring more advanced data visualizations than the simple “pass/fail” metric you’ll find on wpt.fyi and test262.fyi. We’d like to empower viewers to understand this nuance and draw their own conclusions.

…but that’ll be a whole new blog post (ideally published here sometime within the next decade).

Complexity in All Directions

As with every single web-platform project Bocoup has taken on, the apparent simplicity of this task belies a nuance well beyond what anyone anticipated. We’re endlessly excited to dive deep and get weird, so we’re thrilled to continue chasing this one down with Google (next step: a new exclusion syntax).

These efforts demonstrate the extraordinary collaboration behind iteratively building an open platform. We do the work because we recognize the societal benefit of such platforms; we write about it because it can’t be taken for granted. That is to say: stay tuned for more!

¹ There’s the World Wide Web Consortium (W3C), the Web Hypertext Application Technology Working Group (WHATWG), Ecma International, the Internet Engineering Task Force (IETF), and the Khronos Group.
² For more on the design considerations behind this solution, check out WPT RFC #163. Test262 uses a similar approach.
³ May Tim Berners-Lee bless you and keep you if you’re searching for meaningful uses of the background CSS property.
⁴ There are actually a ton of these, so decoupling them will be a painstaking process in itself!
