Snapshot tests in React: why they do more harm than good

Hey there!

I have been exposed to snapshot tests recently and I am still trying to find out why they are so frequently used to test React applications. To me, they seem brittle, hard to maintain, vague and prone to false negatives. In my opinion, they have no place in most React applications. Here's why:

Tests

Tests have two functions:

  1. To fail when the behavior under test has changed

  2. To pass when the behavior under test has not changed

If a test passes when the behavior has changed, the result is a false positive.

If a test fails when the behavior has not changed, the result is a false negative.

In my experience, snapshot tests are prone to both false negatives and positives.
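To make the terminology concrete, here is a minimal sketch of the two definitions above as a small lookup. The function name classifyResult and its parameters are made up for illustration; they are not part of Jest or any other library.

```javascript
// Hypothetical helper: classifies a single test run using the definitions above.
// "Positive" means the test passed, "negative" means it failed.
function classifyResult({ behaviorChanged, testPassed }) {
  if (testPassed && behaviorChanged) return "false positive";
  if (!testPassed && !behaviorChanged) return "false negative";
  return "correct result";
}

console.log(classifyResult({ behaviorChanged: true, testPassed: true }));
console.log(classifyResult({ behaviorChanged: false, testPassed: false }));
```

The first call logs "false positive" and the second logs "false negative"; the two remaining combinations are the outcomes we actually want from a test.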

Snapshot tests

Snapshot tests have advantages; otherwise, they wouldn't be so widespread. I'll briefly enumerate them so we can later compare them to the disadvantages.

Pros

Fast to write

This is the easiest one to identify. Snapshot tests are quick to write because they just need one assertion: expect(subject).toMatchSnapshot().

Easy to update

Broken snapshot tests are also easy to update. Most of the time, you just have to run jest --updateSnapshot or equivalent.

Cons

This is what you're here for. From my point of view, there are several problems with snapshot tests that make them less useful than unit or integration tests using, e.g., React Testing Library.

Prone to false negatives

In my opinion, this is the biggest issue with snapshot tests. Tests frequently fail without any behavior change. There are several examples of this, as we will explore below.

Changing a class name, without changing the underlying styles

Let's look at the following Button component:

function Button({ children }) {
  return (
    <button className="button">
      {children}
    </button>
  );
}

Currently, it has a class named button.

Let's say this is a primary button and now we need to add a secondary button.

We could do this by having a button class that contains the common styles between the primary and secondary buttons. Also, we would need a primary and secondary class for styling each of the button types.

We could implement it like so:

function Button({
  children,
  secondary,
}) {
  const typeClass = secondary
    ? "secondary"
    : "primary";

  return (
    <button
      className={`button ${typeClass}`}
    >
      {children}
    </button>
  );
}

Now, from the user's perspective, the primary button hasn't changed. However, if we look at the code, the button's className may be "button primary" or "button secondary".

We obviously don't have a test for the latter yet. But if we had a snapshot test for the former, it would now fail, even though the button looks and behaves the same from the perspective of the user of our application. This is a false negative: the test failed, but the behavior has not changed.
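This false negative can be reproduced without React at all. The sketch below is a simplified model, not how Jest or React work internally: renderV1 and renderV2 are hypothetical stand-ins that return plain objects for the two versions of the primary Button, serialize imitates a snapshot, and describeForUser keeps only what a user could perceive.

```javascript
// Simplified model: each "render" returns a plain object describing the element.
function renderV1(children) {
  return { type: "button", props: { className: "button" }, children };
}

function renderV2(children, secondary = false) {
  const typeClass = secondary ? "secondary" : "primary";
  return {
    type: "button",
    props: { className: `button ${typeClass}` },
    children,
  };
}

// Snapshot-style comparison: serializes everything, class names included.
const serialize = (element) => JSON.stringify(element, null, 2);

// Behavior-style comparison: keeps only the role and the visible label.
const describeForUser = (element) => ({
  role: element.type,
  label: element.children,
});

const before = renderV1("Click me");
const after = renderV2("Click me");

const snapshotMatches = serialize(before) === serialize(after);
const userViewMatches =
  JSON.stringify(describeForUser(before)) ===
  JSON.stringify(describeForUser(after));

console.log({ snapshotMatches, userViewMatches });
```

In this model the snapshot comparison reports a difference while the user-facing description is identical, which is exactly the mismatch described above.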

How would you replace a snapshot test in this case? With visual tests. If what we are asserting is the look of our component, visual tests are, in my opinion, the best choice.

Prone to false positives

Updating an element's styles without changing its markup

Let’s say there is an element with a class applied to it. By default, snapshot tests do not inline the component's styles and instead create a snapshot with the class names. If you are using a different serializer, such as Emotion's snapshot serializer, this con does not apply.

If you are using Jest's default serializer, changing the CSS properties of a class does not cause snapshot tests to fail. This means that it is possible to introduce behavior-breaking changes that wouldn't be caught by snapshot tests.

Let's say we have a simple Button component defined as follows:

function Button({ children }) {
  return (
    <button className="button">
      {children}
    </button>
  );
}

.button {
  color: blue;
}

Now, let's write a simple snapshot test:

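import renderer from "react-test-renderer";
// Assumes Button is the default export of a sibling file.
import Button from "./Button";

it("renders correctly", () => {
  const tree = renderer
    .create(<Button>Click me</Button>)
    .toJSON();
  expect(tree).toMatchSnapshot();
});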

Which will generate the following snapshot:

exports[`renders correctly 1`] = `
<button
  className="button"
>
  Click me
</button>
`;

Now, let's say we update the button class to be:

.button {
  color: blue;
  display: none;
}

The test will still pass, but the button is now invisible. This constitutes a false positive since the test passed when it shouldn't.
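Here is a simplified model of why the snapshot cannot notice this change: the serialized markup only contains the class name, while the declarations behind that class live in the stylesheet. The names stylesheetBefore, stylesheetAfter and isVisible are illustrative, and the visibility check is a rough stand-in for what a browser or a visual test would determine.

```javascript
// The markup a snapshot records: it mentions the class name, not its styles.
const markup = '<button className="button">Click me</button>';

// Hypothetical stylesheets, before and after the change.
const stylesheetBefore = { button: { color: "blue" } };
const stylesheetAfter = { button: { color: "blue", display: "none" } };

// The snapshot comparison only ever sees the markup.
const snapshotPasses = (stored, current) => stored === current;

// Rough stand-in for "can the user see the element?".
const isVisible = (stylesheet, className) =>
  stylesheet[className]?.display !== "none";

console.log(snapshotPasses(markup, markup)); // true, the markup did not change
console.log(isVisible(stylesheetBefore, "button")); // true
console.log(isVisible(stylesheetAfter, "button")); // false
```

The stylesheet never enters the snapshot comparison, so nothing that happens inside it can make the test fail.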

This snapshot test can be replaced with:

  • Visual tests to ensure the visual aspect of the button (e.g., padding, font size, color);

  • Unit or integration tests to assert the desired behavior of the button (e.g., clicking on it triggers an action).

Second-order effects

Snapshot tests are prone to false positives and false negatives. This results in several consequences, which we will explore below.

Maintenance burden

The ease of writing snapshot tests is sometimes replaced with the pain of updating them later. Snapshot tests frequently trigger false negatives due to, e.g., renaming a class, or refactoring a component.

If the component being changed is used in many places of the application, it can generate a huge amount of failing snapshots, which will need to be updated.

These updated snapshots will likely be bundled in a single pull request, which will be extremely time-consuming for a reviewer to validate, assuming they validate it at all.

In the best-case scenario, the reviewer spends time reviewing repetitive changes, and we're wasting valuable time, as there are other testing strategies (i.e., visual, unit, integration tests) that would be faster to validate. In the worst-case scenario, the reviewer may skip verifying snapshots, possibly missing out on bugs that could have been caught.

Hinders refactors

Given that changing the DOM structure, renaming classes or altering props may cause snapshots to fail incorrectly, snapshot tests can become an obstacle to smaller refactors. For example, someone less familiar with a system may not be comfortable making changes that require updating many snapshots, out of fear of unknowingly breaking something, which can hinder their ability to improve the codebase.

Let's say you have a Button component in your design system that is used in hundreds of other components in your codebase. This Button component takes an href prop and redirects the user to that URL when the button is clicked.

The problem is that this redirection is being implemented by rendering a button with an onClick handler:

function Button({ href, children }) {
  const handleClick = () =>
    window.location.assign(href);

  return (
    <button onClick={handleClick}>
      {children}
    </button>
  );
}

This works when you click the button, so that's how it's implemented in the design system. As time goes by, more components use this Button. Eventually, you realize that clicking the button while holding Command or Ctrl does not open the href in a new tab.

Now, you see that this button should have been an a tag all along. Anchor tags natively support opening their href in a new tab when clicked while holding Command or Ctrl.

The problem we face now is that we will need to update hundreds of snapshots since the button element will now become an a tag. Even though the initial behavior has remained the same (i.e., pressing the button opens the href), the commit will contain a swath of updated snapshots that shouldn't need updating in the first place, since the behavior hasn’t changed.
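To illustrate the scale of the problem, here is a toy model (not real React rendering) in which a few hypothetical consumer components embed the design-system Button. Swapping the button element for an a changes every stored snapshot, while a check of the observable behavior, i.e., where a click leads, is unaffected. All names (buttonAsButton, buttonAsAnchor, consumers, destinationOf) are made up for this sketch.

```javascript
// Two implementations of the design-system Button, modeled as plain objects.
const buttonAsButton = (href, label) => ({
  type: "button",
  onClick: () => href, // stands in for window.location.assign(href)
  children: label,
});

const buttonAsAnchor = (href, label) => ({ type: "a", href, children: label });

// Hypothetical consumers; a real codebase would have hundreds.
const consumers = [
  (Button) => ({ type: "nav", children: [Button("/home", "Home")] }),
  (Button) => ({ type: "footer", children: [Button("/about", "About")] }),
  (Button) => ({ type: "section", children: [Button("/docs", "Docs")] }),
];

// Snapshot-style serialization; functions are reduced to a placeholder.
const serialize = (tree) =>
  JSON.stringify(tree, (key, value) =>
    typeof value === "function" ? "[Function]" : value
  );

// Behavior: where does clicking the rendered Button take the user?
const destinationOf = (tree) => {
  const button = tree.children[0];
  return button.type === "a" ? button.href : button.onClick();
};

const changedSnapshots = consumers.filter(
  (render) =>
    serialize(render(buttonAsButton)) !== serialize(render(buttonAsAnchor))
).length;

const changedBehavior = consumers.filter(
  (render) =>
    destinationOf(render(buttonAsButton)) !==
    destinationOf(render(buttonAsAnchor))
).length;

console.log({ changedSnapshots, changedBehavior });
```

Every consumer's snapshot changes, yet none of the behavior checks do. In a real codebase the same ratio would apply to hundreds of snapshot files.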

Lack of assertion clarity

Snapshot tests do not clearly state what they are asserting, since the only assertion is toMatchSnapshot, which leaves it unclear what behavior is being tested. This can be mitigated with explicit test names or comments. However, in my experience, snapshot test descriptions are usually vague.

This assertion ambiguity may cause confusion when reviewing failed snapshots, leading to time wasted on deciphering what a test case is validating and, ultimately, resulting in the approval of snapshots without being confident in their correctness.

False sense of confidence due to high code coverage

Snapshot tests provide great code coverage with a single test since they render the whole component. For simple components, snapshot tests can provide 100% code coverage in a single assertion.

From a purely metric perspective, that is great. From a confidence point of view, not so much.

Given that snapshot tests can more easily result in false positives, when compared to other types of tests, code coverage as a metric isn't as useful for quantifying the confidence in a system’s correct behavior.

For example, a test suite with 100% code coverage can provide less confidence in a system's correct behavior when compared to a suite with 90% code coverage, if the former has a high false positive rate.
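Here is one crude way to put numbers on this argument. Both the formula and the rates below are invented for illustration and are not measurements: the sketch assumes that the confidence a suite provides can be approximated as its code coverage discounted by its false positive rate.

```javascript
// Crude, hypothetical model: the share of code whose regressions the suite
// would actually catch. Both inputs are percentages (0 to 100).
function effectiveCoverage(coveragePercent, falsePositivePercent) {
  return (coveragePercent * (100 - falsePositivePercent)) / 100;
}

// Invented example rates, not measurements.
const snapshotHeavySuite = effectiveCoverage(100, 50);
const behaviorFocusedSuite = effectiveCoverage(90, 5);

console.log(snapshotHeavySuite < behaviorFocusedSuite); // true
```

Under these made-up rates, the suite with 100% coverage ends up protecting less of the code than the one with 90%, which is the point of the comparison above.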

Ultimately, this can be harmful as it might provide high, albeit unfounded, confidence that the system under test is behaving as expected.

The solution for this problem is replacing snapshot tests with the appropriate types of tests for the use case under test, even if code coverage is lowered.

Alternatives

What should you replace snapshot tests with? Well, it depends.

In my opinion, a good rule of thumb is:

  1. Is the test asserting a component's appearance (e.g., color, placement, size)? If so, you should resort to visual tests.

  2. Is the test validating a component's behavior (e.g., clicking it triggers a fetch, the disabled state prevents a click)? If so, you should use unit, integration or end-to-end tests.

Visual tests also have their disadvantages, such as being more expensive, slower, and prone to false negatives due to browser version updates, as well as the resulting images varying depending on the browser and operating system the tests are being executed on.

Despite the cons, I believe visual tests are a net positive due to being much easier to compare. Furthermore, their drawbacks can also be mitigated by best practices, e.g., reducing the screenshot area to the minimum that encloses the part being asserted.

Regarding unit, integration and end-to-end tests, they should be selected based on the usual tradeoff: unit tests are faster, while end-to-end tests provide higher confidence. Integration tests are a mix of both.

Conclusion

As I have laid out here, I believe the disadvantages of snapshot tests outweigh their advantages. In my opinion, this testing strategy is better replaced with visual and unit/integration/end-to-end tests. The former are useful for validating styles and the latter for verifying behavior.

I'd like to know your opinion and invite you to explain in which use cases you believe snapshot testing can be useful. Feel free to write a comment below or ping me on Twitter.

Thank you for reading 🤓