Preface: Why Unit Tests Fail

We have been writing unit tests for UI without actually thinking about the environment they run in. We assume that they will catch issues before they surface to customers, but sadly a lot of introspection is needed: are the tests written for the sake of writing them, or to improve the applications we develop?
Sometimes, going for 100% code coverage is not the right metric. High code coverage is necessary, but it is certainly not sufficient!

Act 1 [2014]: Enzyme and Protractor

Testing is a necessary part of software, especially in UI, where the bugs are user-facing. A weird or ugly looking UI can leave a very bad impression. But the environments we typically use for testing do not usually resemble the actual UI; they run in simulated environments rather than in a real browser.

We used to have things like Enzyme and Protractor. Enzyme was built by Airbnb with a jQuery-like API to interface with UI components.
This sort of architecture meant you ended up testing the implementation.
For example, consider a component that keeps its state in useState:

import React, { useState, useEffect } from 'react';

function UserProfile({ userId }) {
  const [user, setUser] = useState(null);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState(null);

  useEffect(() => {
    const fetchUser = async () => {
      try {
        setLoading(true);
        const response = await fetch(`https://api.example.com/users/${userId}`);
        if (!response.ok) throw new Error('Failed to fetch user');
        const data = await response.json();
        setUser(data);
      } catch (err) {
        setError(err.message);
      } finally {
        setLoading(false);
      }
    };

    fetchUser();
  }, [userId]);

  if (loading) return <div className="loading-state">Loading...</div>;
  if (error) return <div className="error-state">Error: {error}</div>;

  return (
    <div className="user-profile">
      <h2>{user.name}</h2>
      <p>Email: {user.email}</p>
    </div>
  );
}

export default UserProfile;

Enzyme Implementation Testing

Enzyme tests often access component internals directly:

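import React from 'react';
import { mount } from 'enzyme';
import UserProfile from './UserProfile'; // assumed path to the component above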
describe('UserProfile component with Enzyme', () => {
  // Mock fetch globally
  const mockSuccessResponse = { id: 1, name: 'John Doe', email: 'john@example.com' };
  const mockFetchPromise = Promise.resolve({
    ok: true,
    json: () => Promise.resolve(mockSuccessResponse),
  });

  beforeEach(() => {
    global.fetch = jest.fn().mockImplementation(() => mockFetchPromise);
  });

  afterEach(() => {
    global.fetch = undefined;
  });

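  it('shows loading state initially', async () => {
    const wrapper = mount(<UserProfile userId="1" />);

    // Implementation detail: checking for a specific class and the loading state.
    // Note: Enzyme's wrapper.state() only works on class components, so the
    // state assertions in this test presume a class-based version of UserProfile.
    expect(wrapper.find('.loading-state').exists()).toBe(true);
    expect(wrapper.state('loading')).toBe(true);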
    // Implementation detail: wait for component to update
    await mockFetchPromise;
    wrapper.update();

    // Implementation detail: examining component internals
    expect(wrapper.state('user')).toEqual(mockSuccessResponse);
    expect(wrapper.state('loading')).toBe(false);
    expect(wrapper.find('h2').text()).toBe('John Doe');
    expect(wrapper.find('.user-profile').exists()).toBe(true);
  });


  it('updates when userId prop changes', async () => {
    const wrapper = mount(<UserProfile userId="1" />);

    // Implementation detail: checking fetch call details
    expect(global.fetch).toHaveBeenCalledWith('https://api.example.com/users/1');

    // Update prop and verify new fetch
    wrapper.setProps({ userId: '2' });
    expect(global.fetch).toHaveBeenCalledWith('https://api.example.com/users/2');
  });
});

This is testing very specific details in a very specific environment. What if we had to throttle the API?
It is possible, but bloody damn difficult and manual.

https://gist.github.com/aymanapatel/42a1775309c72cc4ae629b839bb8b6ff

We’ll compare this code and its complexity with the Testing Library and Mock Service Worker version later. But look at snippets like jest.advanceTimersByTime(1000); and the one below:

      // Excerpt from the gist: requestCount is tracked by the surrounding fetch mock
      const delay = requestCount === 1 ? 500 : 1500; // First request faster, second slower
      const mockUserData = { id: requestCount, name: 'John Doe', email: 'john@example.com' };
      return new Promise(resolve => {
        setTimeout(() => {
          resolve({
            ok: true,
            json: () => Promise.resolve(mockUserData)
          });
        }, delay);
      });

You realize how much setup code is required. This causes 2 issues:

1. There is less inclination to handle these kinds of scenarios, as there is too much boilerplate.
2. If they are added, maintaining the test cases becomes a hassle.

And when the UI breaks, it usually has less to do with a bug in a step-by-step run of the code and more to do with the dynamic nature of the web: slow APIs, inconsistent state across components, CSS overrides, etc.

Act 2 [2019]: Testing Library

Testing Library brought a different paradigm. It focused on writing integration tests with the accessibility DOM tree at the forefront.
Kent C. Dodds (one of the maintainers of React Testing Library, RTL) rooted for writing more integration tests and fewer unit and mock-heavy tests. It makes sense. You need to look at the return on investment of writing a test. 95% coverage is not an order-of-magnitude improvement over 90%, even though it can take an order of magnitude more effort to write and maintain the tests. Also, bugs arise at the integration level, such as:

Network requests
Incorrect app state
CSS overrides
Much more

And no amount of meeting a conditional code coverage criterion or hand-picking state variables is going to improve your “test coverage”.

Looking at the throttling example, with RTL it becomes much simpler.

The following uses Mock Service Worker (MSW), a handy tool to mock API requests, responses and response times. It works with Testing Library, Storybook, Playwright, etc.

https://gist.github.com/aymanapatel/e3c6aae84371480e4984c75f594ce9fb

RTL: 115 lines
Enzyme: 153 lines

The totals are not that far apart, but if you look at an individual test case, the Enzyme version is almost double the length.

Also, RTL uses MSW, which handles most of the real-world simulation of APIs.

Simplicity and predictability in test cases make the test suite more robust than following a made-up metric (please read up on Goodhart’s law).

Snapshot testing

In the spirit of not chasing made up metrics without a strong case for “100%” coverage, we now come to a particular type of testing that is very much needed in UI.

Snapshot Unit tests

All the code can be found here

Example 1: Vitest JSDOM mode

Simple working example

Here we have a Button whose styles are determined inside the component. A primary prop is passed in and decides which styles are rendered.

// Button.tsx (the props are typed, so this is a TypeScript file)
import React from 'react';

// primary is optional because it defaults to false
const Button = ({ text, primary = false }: { text: string; primary?: boolean }) => {
  // The visual regression bug:
  // When primary is true, it should have a blue background,
  // but the color is incorrectly set to a very similar shade that looks almost identical
  // in the snapshot serialization but is visually different
  const style = {
    // other styles
    backgroundColor: primary ? '#0070f3' : '#f0f0f0', // Correct color for primary
    color: primary ? 'white' : 'black',
  };
  const overrideStyle = {
    ...style,
    backgroundColor: primary ? '#0071f4' : '#f0f0f0', // Slightly different blue that passes in snapshots
    padding: primary ? '8px 16px' : '10px 20px',
  };

  return (
    <button
      style={overrideStyle}
      className={`button ${primary ? 'primary' : 'secondary'} parent-override`}
    >
      {text}
    </button>
  );
};

export default Button;

This test passes:

Now if we change the color in the Button component, the test fails.

Note that this is a trivial example, but in the real world these values could be coming from CSS variables, where a snapshot like this helps catch the bug. Note also that JSDOM does not work with inherited styles, which is covered later.
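The pass/fail behaviour described here can be sketched without any test runner. The snippet below is a simplified stand-in for what a snapshot assertion does, not Vitest's actual implementation: `renderButton` and `matchesSnapshot` are hypothetical helpers that serialize the button markup (inline styles included) and compare it with a stored string.

```javascript
// Hypothetical stand-in for rendering the Button to a serialized DOM string.
// Inline styles are part of the markup, so they end up in the snapshot.
function renderButton({ text, primary = false, primaryColor = '#0071f4' }) {
  const background = primary ? primaryColor : '#f0f0f0';
  const color = primary ? 'white' : 'black';
  const variant = primary ? 'primary' : 'secondary';
  return `<button class="button ${variant}" style="background-color: ${background}; color: ${color};">${text}</button>`;
}

// A snapshot assertion boils down to comparing the stored string with the current one.
function matchesSnapshot(stored, current) {
  return stored === current;
}

// First run: the serialized output is stored as the snapshot.
const storedSnapshot = renderButton({ text: 'Save', primary: true });

// Unchanged component: the comparison passes.
const unchanged = matchesSnapshot(
  storedSnapshot,
  renderButton({ text: 'Save', primary: true })
);

// Someone changes the inline color: the markup differs, so the comparison fails.
const afterColorChange = matchesSnapshot(
  storedSnapshot,
  renderButton({ text: 'Save', primary: true, primaryColor: '#0070f3' })
);

console.log({ unchanged, afterColorChange });
```

The comparison only sees what is written into the markup, which is why an inline style change is caught here.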

Simple non-working example

Consider the example of CSS overrides through inheritance. These are common in real systems, as we highlight further on.
This is our initial code:

With the CSS:

.primary {
    background-color: green;
    border-radius: 4px;
    padding: 8px 16px;
    color: white;
}

And the resulting screen:

Now let me add a bad CSS !important. You might say that this is bad practice, but in real life there is always someone using it to make their colors pop. It could be your design system, your cookie banner, or an old CSS file overriding this. It is a common occurrence in the real world, so we need a test that checks for it.

.primary {
    background-color: green;
    border-radius: 4px;
    padding: 8px 16px;
    color: white;
}


button {
    background-color: red !important;
}

This is the resultant screenshot:

You would expect the test to fail, but behold, it still passes.

What is worse than a failing test is a test that passes in spite of a bug. Why did this happen?

The *.snap file compares the DOM of that particular element. The markup is the same, hence the test passes.

JSDOM cannot compute styles that flow from one DOM element to its child.
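Both reasons can be illustrated with a toy model. The sketch below is a deliberate simplification, not how JSDOM or a browser is implemented: `serialize` is a hypothetical snapshot serializer that only sees the element's own markup, and `resolveBackground` is a hypothetical, heavily reduced cascade that ignores specificity and simply lets an `!important` declaration win.

```javascript
// The element as the snapshot sees it: tag, class and text only.
const button = { tag: 'button', className: 'primary', text: 'Submit' };

// Hypothetical serializer: stylesheet rules never appear in its output.
function serialize(el) {
  return `<${el.tag} class="${el.className}">${el.text}</${el.tag}>`;
}

// Toy cascade: a later matching rule wins, unless the current winner is !important
// and the later rule is not. Specificity is ignored on purpose.
function resolveBackground(el, rules) {
  let winner = null;
  for (const rule of rules) {
    const matches = rule.selector === el.tag || rule.selector === `.${el.className}`;
    if (!matches) continue;
    if (winner === null || rule.important || !winner.important) winner = rule;
  }
  return winner ? winner.background : 'transparent';
}

// What the snapshot test records versus what the user would actually see.
function renderPage(el, rules) {
  return { snapshot: serialize(el), paintedBackground: resolveBackground(el, rules) };
}

const originalCss = [{ selector: '.primary', background: 'green', important: false }];
const overriddenCss = [
  ...originalCss,
  { selector: 'button', background: 'red', important: true },
];

const before = renderPage(button, originalCss);
const after = renderPage(button, overriddenCss);

console.log(before.snapshot === after.snapshot); // the snapshot comparison
console.log(before.paintedBackground, after.paintedBackground); // what gets painted
```

The snapshot string never changes because the stylesheet is not part of it, while the painted color does change. That gap is exactly what a markup-only snapshot cannot see.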

Now, do you trust your tests?! Now do you think 100% code coverage is enough?!

Case against JSDOM

JSDOM is just a poor HTML implementation

In the example above we can see that JSDOM relies on the HTML alone. This works well for simple snapshots, but with dynamic sites (animations, CSS calculations, responsive layouts, etc.) the snapshots start to fall apart.

JSDOM has a list of CSS features that it does not natively support:

Media queries: This means you cannot test things that depend on responsive design.

CSS styles are not inherited (styles coming from a parent DOM element), as getComputedStyle does not apply inheritance in JSDOM under Jest. Look at this GitHub issue:

  const { JSDOM } = require('jsdom');

  const dom = new JSDOM(`
  <style>
    body {
      color: red;
    }
  </style>hello world
  <div class="divgroup" id="d1"
       style="padding-right: 5px;">
       Content for div 1
  </div>
  `);
  const d1 = dom.window.document.querySelector('#d1');

  // Found: padding-right is set inline on the element itself
  dom.window.getComputedStyle(d1).getPropertyValue('padding-right');

  // Per the linked issue, cannot be found: the `body` color is not inherited by the div
  dom.window.getComputedStyle(d1).getPropertyValue('color');

Injected stylesheets are not supported inside the JSDOM environment. Hence many runtime CSS-in-JS libraries, such as styled-components, do not work nicely with it. For more details, read this GitHub issue.

JSDOM requires fetch overrides (monkey-patching)
1. You have to use libraries like jest-fetch-mock or nock. They do get the job done, but it is a monkey patch, a band-aid. It cannot provide easy and fine-grained network simulation such as slow requests or a dropped internet connection.
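In essence, monkey-patching means swapping the global `fetch` for a stub and putting the original back afterwards. The sketch below shows the idea in plain JavaScript; `patchFetch` and `fakeFetch` are hypothetical helpers written for illustration, not the API of jest-fetch-mock or nock.

```javascript
// Hypothetical helper: replace the global fetch and return a function that restores it.
// In a test suite these two steps would typically live in beforeEach/afterEach.
function patchFetch(stub) {
  const originalFetch = globalThis.fetch;
  globalThis.fetch = stub;
  return function restore() {
    globalThis.fetch = originalFetch;
  };
}

// A stub that records calls and answers immediately with canned data.
const calls = [];
const fakeFetch = (url) => {
  calls.push(url);
  return Promise.resolve({
    ok: true,
    json: () => Promise.resolve({ id: 1, name: 'John Doe' }),
  });
};

const realFetch = globalThis.fetch;
const restore = patchFetch(fakeFetch);

const patchedWhileActive = globalThis.fetch === fakeFetch;
const pending = fetch('https://api.example.com/users/1'); // handled by the stub, no network involved

restore();
const restoredAfterwards = globalThis.fetch === realFetch;

console.log({ patchedWhileActive, restoredAfterwards, calls });
```

Nothing in this stub models latency, failures or a dropped connection. Each of those has to be hand-written with timers and extra branches, which is where the boilerplate comes from.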

Example 2: Playwright Snapshot tests

In order to combat the issues we typically see in non-browser environments such as JSDOM, it is better to use an environment that is as close to the user as possible, which in this case is the actual browser.

Playwright has become the de facto tool for E2E tests. Selenium and Cypress have many limitations due to their underlying architecture (Selenium WebDriver and the Cypress in-browser runner, respectively), whereas Playwright is becoming more mainstream thanks to its use of the Chrome DevTools Protocol.

Selenium uses WebDriver, which adds a lot of translation hops between the test and its execution:

Tests → JSON Wire (legacy) / W3C (new) protocol → Browser Driver → Actual Browser
                                                        ^                ^
                                                        |----------------|

This leads to heavier test runs.

Cypress uses a proxy and runs inside an actual browser, communicating over a WebSocket. It has amazing hot-reloading because of this, but the same design causes a couple of issues:

Since the browser is heavy, it runs very slowly once the number of test cases starts to increase.
Cypress was not extensible to Safari for the longest time. It started supporting Safari's WebKit in 2022 by leveraging Playwright’s WebKit build.

Therefore Playwright is the best choice for snapshot tests: it supports all major browsers and provides mechanisms to control the network, both of which help make snapshot tests predictable and useful.

We can look into its doc on how to use Visual comparisons in Playwright.

import { test, expect } from '@playwright/test';

test('example test', async ({ page }) => {
  await page.goto('https://playwright.dev');
  await expect(page).toHaveScreenshot({ maxDiffPixels: 100 });
});

It is as simple as navigating to a path and calling the toHaveScreenshot method.

Two of its options are worth calling out:

maxDiffPixels: Allows small variations to exist, so that the tests do not become too flaky.
stylePath: Allows a stylesheet to be injected to reduce the dynamic nature of the screenshot, for example a layout style or differing scrollbar behavior across browsers/OSes.
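To make the role of `maxDiffPixels` concrete, here is a toy comparison. It is only a sketch of the idea, not Playwright's actual algorithm (which also applies a configurable per-pixel color threshold); `countDifferentPixels` and `screenshotsMatch` are hypothetical helpers, and images are modeled as flat arrays of color strings.

```javascript
// Count the pixels that differ between two equally sized images.
function countDifferentPixels(baseline, actual) {
  if (baseline.length !== actual.length) {
    throw new Error('Screenshots must have the same dimensions');
  }
  let different = 0;
  for (let i = 0; i < baseline.length; i++) {
    if (baseline[i] !== actual[i]) different++;
  }
  return different;
}

// The comparison passes as long as the number of differing pixels stays within budget.
function screenshotsMatch(baseline, actual, { maxDiffPixels = 0 } = {}) {
  return countDifferentPixels(baseline, actual) <= maxDiffPixels;
}

// A 100x100 blue image as the stored baseline.
const baseline = new Array(10000).fill('#0070f3');

// Rendering noise: 40 pixels come out in a slightly different shade.
const withNoise = baseline.slice();
for (let i = 0; i < 40; i++) withNoise[i] = '#0071f4';

// A real regression: the whole button turned red.
const regression = new Array(10000).fill('#ff0000');

const noiseTolerated = screenshotsMatch(baseline, withNoise, { maxDiffPixels: 100 });
const regressionCaught = !screenshotsMatch(baseline, regression, { maxDiffPixels: 100 });

console.log({ noiseTolerated, regressionCaught });
```

The budget is a trade-off: set it too low and rendering noise makes the test flaky, set it too high and a small but real regression slips through.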

Even Vitest (a Jest-API-compatible testing framework from the Vite ecosystem) has an experimental browser mode (leveraging Playwright) that lets you write tests that run in an actual browser instead of environments such as JSDOM or HappyDOM.

It can take screenshots, but at the time of writing, comparing and asserting them against new code is still not possible. See this blog by Maya to read more on this topic.

Example 3: Storybook

The latest version of Storybook (version 9) has brought different types of testing to the forefront while building components:
1. Interaction testing
2. Accessibility testing
3. Visual testing

Visual testing is the one we are most interested in here. Since styles can come from an external CSS source, there is always a chance that a change gets missed in a Jest-based unit test. Storybook along with Chromatic provides a cleaner way to manage this.
It introduces the concept of accepting or denying a change. You might accept it if it was a design system change, but you can also deny it if it is not intended and is in fact a regression.

Case against Snapshot tests

There are still some issues with using snapshot tests:

Since snapshot tests scale to a lot of images, they can lead to noise.
There is also a psychological aspect to validating snapshot tests that makes them less useful. A developer might only have a particular component in scope, so a failure elsewhere might look like a false positive. The developer can then just update the snapshot and let it slide.

Tools like Chromatic, Percy and Applitools help alleviate this issue. They make it less overwhelming by providing a dashboard as well as a review process to make sure the UI/UX is as intended.


Ayman Patel
