Solving a Hot Potato Bug in Node.js

Contributing to open source can be fun, but it's not perfect: bureaucracy, egos, and async communication across time zones would be a big challenge for any project manager, except project managers tend to be nonexistent in open source.

I was working on fixing a bug in Mocha, which, although a renowned testing framework, is starting to show its age (10+ years), so it's going through major changes to keep up with the competition. As with any major change, it's normal to have some regressions along the way. I found one and reported it, and the Mocha team was quick to confirm my issue. So I dug into the code myself to diagnose the root cause, but it turned out this bug was much bigger than I initially thought.

Being a testing library, Mocha needs quite complex file loading and discovery code. It turns out Mocha's use case was complex enough to reveal a bug in Node.js itself, so this is the story of how I managed the communication, reporting, and escalation of the same bug across two major projects of the JavaScript ecosystem.

ESM vs CJS

To understand the bug, you need to understand the difference between a CJS and an ESM module, and especially how Node's adoption of ESM was… well, slow. I'd love to write about the mess that is the story of JavaScript modules, but it would be way too long of a tangent. If you're curious you can read this article about it. All you need to know is that Node.js has two main ways (module systems) to let files import each other: import { foo } from 'library' (ESM) and const { foo } = require('library') (CJS). Node.js is in the process of adding functionality to better support the "new" ESM way.
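To make the two styles concrete, here is a minimal sketch written as a CJS file. It uses the built-in node:path module so it runs without installing anything; the helper name basenameViaImport is made up for this illustration.

```javascript
// CJS: require() is synchronous and hands back the exports right away.
const { basename } = require('node:path');

// The ESM equivalent is only valid inside an ES module
// (a .mjs file, or a package with "type": "module"):
//   import { basename } from 'node:path';

// From a CJS file, the ESM loader is reachable through dynamic import(),
// which returns a promise. (basenameViaImport is a hypothetical helper.)
async function basenameViaImport(file) {
  const { basename: esmBasename } = await import('node:path');
  return esmBasename(file);
}

console.log(basename('/tmp/spec.test.js'));
basenameViaImport('/tmp/spec.test.js').then(console.log);
```

Both calls print the same file name; the difference is that the second one only has its answer after a promise resolves.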

One of the main pain points of the module system migration was that, although both systems are conceptually equivalent ("import the exposed symbols of some other file"), ESM has extra functionality that made it challenging to make them interoperable. The big one is that ESM modules can have await expressions outside of an async function body, a.k.a. "top-level await". The Node team dragged its feet with barebones ESM support for years. Only after fierce competition from new runtimes like Deno and Bun did Node finally add the core missing ESM functionality. The important bit for us is: require(ESM).

require(ESM) is the missing piece the ecosystem needed to start migrating en masse to ESM. It basically allows you to require() an ESM module as if it were a CJS file, helping bridge the gap and ease the transition (with caveats). Historically, if you needed to access an ESM module from CJS (the more common use case, as libraries tend to migrate faster than applications), you were supposed to use dynamic async imports, await import(module), which are very disruptive to any program flow. Imagine that instead of having require calls at the top of your file you now had to await import in each of your functions. Because don't forget, in CJS you can only await inside an async function! In reality no one ever did that. Given the popularity of TypeScript, people already had transpilers set up in their projects, which all had functionality to hide Node's lackluster ESM support. Even TypeScript itself did.
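Here is a small sketch of what that looks like from a CJS file. It writes a throwaway ES module to a temp directory and tries to require() it. The file name, the greet export, and the tryRequireEsm helper are all made up for this example. What happens depends on your Node version: with require(ESM) available the call returns the module's exports synchronously, otherwise it throws ERR_REQUIRE_ESM.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Write a throwaway ES module to a temp directory (names are arbitrary).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'require-esm-'));
const esmFile = path.join(dir, 'greet.mjs');
fs.writeFileSync(esmFile, 'export const greet = (name) => `hello ${name}`;\n');

// Hypothetical helper: reports whether require() accepted the ES module.
function tryRequireEsm(file) {
  try {
    // With require(ESM) support, this returns the module namespace synchronously.
    return { ok: true, greet: require(file).greet };
  } catch (err) {
    // Without it, Node throws ERR_REQUIRE_ESM and the caller has to
    // fall back to `await import()` inside an async function.
    return { ok: false, code: err.code };
  }
}

const result = tryRequireEsm(esmFile);
console.log(result.ok ? result.greet('mocha') : `require() refused: ${result.code}`);
```

Some Node versions also print an experimental warning on stderr when require() loads an ES module; that does not change the result.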

Whose cache is it?

Enough of Node's quirks, on to the bug. Although convenient, require(ESM) raises an important question: whose cache is it? CJS and ESM are historically known for having separate module caches. Does that mean require can now write to the ESM module cache? Will it check its own cache first? These may seem like implementation details, but they were now a pressing question for me. Since Mocha needs to support a wide range of Node versions, it could no longer rely on good old ERR_REQUIRE_ESM to fall back to import(). Mocha's implementation basically ignored any error thrown by the first require call and trusted Node.js to behave the same way if you tried to load a module twice, no matter whether through require(ESM) or import(ESM). It turns out that was not the case, and I had found a bug affecting Node versions all the way back to LTS, which goes like this:

If you tried to require(ESM) a module which had a "top-level error" (an error thrown during module evaluation, outside the body of any function), the first require call would throw as expected, but later calls to import(ESM) would just resolve to an empty object. That broke the Mocha implementation, which now couldn't tell that the module had a top-level error: the import(ESM) promise resolved normally, hiding the error, so when you ran Mocha it just skipped that file without registering any tests.

The bug itself is not super complex to define and is quite situational, but it was very important to more intricate consumers of the ESM API like Mocha. Confident in my CodeSandbox MRE (minimal reproducible example), I opened my first Node bug report. Just a quick shout-out: tools like CodeSandbox shine for MREs. They let you isolate environment differences and pin a specific Node version running in a containerized environment. A blessing for creating meaningful bug reports.
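The scenario can be sketched roughly like this. This is not Mocha's actual loader nor my original MRE, just a minimal approximation: the file name and the loadLikeATestRunner helper are made up, and the result depends on the Node version you run it with.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// A throwaway ES module that throws during evaluation (a "top-level error").
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hot-potato-'));
const brokenFile = path.join(dir, 'broken.spec.mjs');
fs.writeFileSync(brokenFile, 'throw new Error("boom at top level");\n');

// Hypothetical helper: require() first, then import() the same file,
// recording what each attempt did.
async function loadLikeATestRunner(file) {
  const outcome = { requireError: null, importError: null, namespaceKeys: null };
  try {
    require(file); // first attempt: synchronous require()
  } catch (err) {
    outcome.requireError = err.code || err.message;
  }
  try {
    // second attempt: dynamic import() of the very same file
    const ns = await import(pathToFileURL(file).href);
    outcome.namespaceKeys = Object.keys(ns);
  } catch (err) {
    outcome.importError = err.message;
  }
  return outcome;
}

loadLikeATestRunner(brokenFile).then((outcome) => {
  console.log(outcome);
  if (outcome.requireError && !outcome.importError) {
    console.log('import() resolved after require() threw: the error was swallowed');
  }
});
```

On a healthy Node version both attempts report an error. On an affected version the second attempt resolves with no exports, which is exactly how a test file can silently vanish from a run.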

The race for the fix

At this point I was confident in my bug and was hoping to claim the laurels of my investigation by merging a bug fix on Node's main branch. But my C skills were not sharp enough to set up the project locally, understand Node's internals, and work out a fix before the legendary joyeecheung (the original require(ESM) implementer) came swooping in with a swift, surgical fix. In the end the fix was merged within a week, which is quite fast for a project the size of Node.

I did end up updating Mocha's official documentation about the bug while we wait for the Node.js release, a consolation prize from the gracious JoshuaKGoldberg. I hope this inspires you to also report bugs to your favorite library. Thanks for reading.


Samuel Henrique
