Snapshot tests in React: why they do more harm than good
Hey there!
I have been exposed to snapshot tests recently and I am still trying to find out why they are so frequently used to test React applications. To me, they seem brittle, hard to maintain, vague and prone to false negatives. In my opinion, they have no place in most React applications. Here's why:
Tests
Tests have two functions:
To fail when the behavior under test has changed
To pass when the behavior under test has not changed
If a test passes when the behavior has changed, the result is a false positive.
If a test fails when the behavior has not changed, the result is a false negative.
In my experience, snapshot tests are prone to both false negatives and positives.
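The two definitions above can be written down as a tiny truth table. This is only an illustrative sketch: `classifyOutcome` and its two boolean parameters are hypothetical names, not part of any testing library.

```javascript
// Classify a single test run using the definitions above.
// `behaviorChanged` and `testPassed` are hypothetical inputs for illustration.
function classifyOutcome(behaviorChanged, testPassed) {
  // The behavior changed, but the test did not notice.
  if (behaviorChanged && testPassed) return "false positive";
  // Nothing changed for the user, but the test failed anyway.
  if (!behaviorChanged && !testPassed) return "false negative";
  // The test passed or failed for the right reason.
  return "correct";
}

console.log(classifyOutcome(true, true));   // false positive
console.log(classifyOutcome(false, false)); // false negative
console.log(classifyOutcome(true, false));  // correct
console.log(classifyOutcome(false, true));  // correct
```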
Snapshot tests
Snapshot tests have advantages; otherwise, they wouldn't be so widespread. I'll briefly enumerate them so we can later compare them to the disadvantages.
Pros
Fast to write
This is the easiest one to identify. Snapshot tests are quick to write because they need just one assertion: expect(subject).toMatchSnapshot().
Easy to update
Broken snapshot tests are also easy to update. Most of the time, you just have to run jest --updateSnapshot or equivalent.
Cons
This is what you're here for. From my point of view, there are several problems with snapshot tests that make them less useful than unit or integration tests using, e.g., React Testing Library.
Prone to false negatives
In my opinion, this is the biggest issue with snapshot tests. Tests frequently fail without any behavior change. There are several examples of this, as we will explore below.
Changing a class name, without changing the underlying styles
Let's look at the following Button component:
function Button({ children }) {
  return (
    <button className="button">
      {children}
    </button>
  );
}
Currently, it has a class named button.
Let's say this is a primary button and now we need to add a secondary button.
We could do this by having a button class that contains the common styles between the primary and secondary buttons. Also, we would need a primary and a secondary class for styling each of the button types.
We could implement it like so:
function Button({
  children,
  secondary,
}) {
  const typeClass = secondary
    ? "secondary"
    : "primary";
  return (
    <button
      className={`button ${typeClass}`}
    >
      {children}
    </button>
  );
}
Now, from the user's perspective, the primary button hasn't changed. However, if we look at the code, the button's className may now be "button primary" or "button secondary".
Obviously, we don't have a test for the latter. But if we had a snapshot test for the former, it would now fail, even though, from the perspective of the user of our application, the button looks and behaves the same. This is a false negative: the test failed, but the behavior has not changed.
How would you replace a snapshot test in this case? Visual tests. If what we are asserting is the look of our component, visual tests are, in my opinion, the best choice.
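To make this failure mode concrete without pulling in React or Jest, here is a minimal, dependency-free sketch. The plain objects returned by `buttonV1` and `buttonV2` and the `serialize` helper are stand-ins invented for illustration; they only loosely imitate a rendered element and a snapshot serializer.

```javascript
// Stand-ins for the two versions of Button: each returns a plain object
// describing the rendered element (not a real React element).
function buttonV1({ children }) {
  return { type: "button", props: { className: "button" }, children };
}

function buttonV2({ children, secondary = false }) {
  const typeClass = secondary ? "secondary" : "primary";
  return { type: "button", props: { className: `button ${typeClass}` }, children };
}

// A toy serializer, loosely imitating what a snapshot file stores.
function serialize(node) {
  const attrs = Object.entries(node.props)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join("");
  return `<${node.type}${attrs}>${node.children}</${node.type}>`;
}

const before = serialize(buttonV1({ children: "Click me" }));
const after = serialize(buttonV2({ children: "Click me" }));

// Snapshot-style check: compares the full serialized output.
const snapshotMatches = before === after;

// Narrower check: only what the user relies on (a button with this label).
const rendered = buttonV2({ children: "Click me" });
const behaviorHolds =
  rendered.type === "button" && rendered.children === "Click me";

console.log(snapshotMatches); // false
console.log(behaviorHolds);   // true
```

The snapshot-style comparison fails because of the extra class name, while the narrower check keeps passing because nothing the user depends on has moved.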
Prone to false positives
Updating an element's style without changing its markup
Let’s say there is an element with a class applied to it. By default, snapshot tests do not inline the component's styles and instead create a snapshot with the class names. If you are using a different serializer, such as Emotion's snapshot serializer, this con does not apply.
If you are using Jest's default serializer, changing the CSS properties of a class does not cause snapshot tests to fail. This means that it is possible to introduce behavior-breaking changes that wouldn't be caught by snapshot tests.
Let's say we have a simple Button component defined as follows:
function Button({ children }) {
  return (
    <button className="button">
      {children}
    </button>
  );
}

.button {
  color: blue;
}
Now, let's write a simple snapshot test:
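import renderer from "react-test-renderer";
// Assumed location of the component; adjust the path to your project.
import Button from "./Button";

it("renders correctly", () => {
  const tree = renderer
    .create(<Button>Click me</Button>)
    .toJSON();
  expect(tree).toMatchSnapshot();
});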
Which will generate the following snapshot:
exports[`renders correctly 1`] = `
<button
  className="button"
>
  Click me
</button>
`;
Now, let's say we update the button class to be:
.button {
  color: blue;
  display: none;
}
The test will still pass, but the button is now invisible. This constitutes a false positive since the test passed when it shouldn't.
This snapshot test can be replaced with:
Visual tests to ensure the visual aspect of the button (e.g., padding, font size, color);
Unit or integration tests to assert the desired behavior of the button (e.g., clicking on it triggers an action).
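The reason the snapshot keeps passing can be shown with a small, dependency-free sketch. Everything here is a hypothetical stand-in: `snapshotOf` imitates a serializer that records class names only, the stylesheets are modelled as plain objects, and `isVisible` represents the kind of question a visual or visibility check would ask.

```javascript
// Toy snapshot: only the markup is captured, including the class name,
// but none of the CSS declarations behind that class.
function snapshotOf(node) {
  return `<${node.type} className="${node.className}">${node.children}</${node.type}>`;
}

// Hypothetical stylesheets, modelled as class name -> declarations.
const stylesBefore = { button: { color: "blue" } };
const stylesAfter = { button: { color: "blue", display: "none" } };

const button = { type: "button", className: "button", children: "Click me" };

// Stand-in for a check that actually looks at the applied styles.
function isVisible(node, styles) {
  const declarations = styles[node.className] || {};
  return declarations.display !== "none";
}

// The stylesheet is not an input to the snapshot, so editing it cannot change
// the serialized output: the stored and the new snapshot are identical.
const storedSnapshot = snapshotOf(button);
const snapshotStillPasses = snapshotOf(button) === storedSnapshot;

console.log(snapshotStillPasses);             // true
console.log(isVisible(button, stylesBefore)); // true
console.log(isVisible(button, stylesAfter));  // false
```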
Second-order effects
Snapshot tests are prone to false positives and false negatives. This results in several consequences, which we will explore below.
Maintenance burden
The ease of writing snapshot tests is sometimes replaced with the pain of updating them later. Snapshot tests frequently trigger false negatives due to, e.g., renaming a class, or refactoring a component.
If the component being changed is used in many places of the application, it can generate a huge amount of failing snapshots, which will need to be updated.
These updated snapshots will likely be bundled into a pull request, where validating them will be extremely time-consuming for a reviewer, assuming they are validated at all.
In the best-case scenario, the reviewer spends time reviewing repetitive changes, and we're wasting valuable time, as there are other testing strategies (i.e., visual, unit, integration tests) that would be faster to validate. In the worst-case scenario, the reviewer may skip verifying snapshots, possibly missing out on bugs that could have been caught.
Hinders refactors
Given that changing the DOM structure, renaming classes or altering props may cause snapshots to fail incorrectly, snapshot tests can become an obstacle to smaller refactors. For example, someone less familiar with a system may not be comfortable making changes that require updating many snapshots, out of fear of unknowingly breaking something, which can hinder their ability to improve the codebase.
Let's say you have a Button component in your design system that is used in hundreds of other components in your codebase. Now, your Button component takes an href prop, which will redirect the user to the value of href when the button is clicked.
The problem is that this redirection is being implemented by rendering a button with an onClick handler:
function Button({ href, children }) {
  const handleClick = () =>
    window.location.assign(href);
  return (
    <button onClick={handleClick}>
      {children}
    </button>
  );
}
This works when you click the button, so that's how it's implemented in the design system. As time goes by, more components use this Button. Eventually, you realize that clicking that button while pressing command or ctrl does not open the href in a new tab.
Now, you see that this button should have been an a tag all along. Anchor tags natively support the behavior of opening their href in a new tab when clicked while pressing command or ctrl.
The problem we face now is that we will need to update hundreds of snapshots since the button element will now become an a tag. Even though the initial behavior has remained the same (i.e., pressing the button opens the href), the commit will contain a swath of updated snapshots that shouldn't need updating in the first place, since the behavior hasn’t changed.
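The ripple effect can be sketched in a few lines of plain JavaScript. This assumes snapshots capture the fully rendered tree, so every consumer's snapshot embeds the shared Button's markup. The three consumers, the plain-object elements and the `serialize` helper are all hypothetical stand-ins, not React or Jest APIs.

```javascript
// Toy serializer: turns a nested plain-object tree into a markup string.
function serialize(node) {
  const children = node.children
    .map((child) => (typeof child === "string" ? child : serialize(child)))
    .join("");
  return `<${node.type}>${children}</${node.type}>`;
}

// The shared component before and after the refactor.
const oldButton = (label) => ({ type: "button", children: [label] });
const newButton = (label) => ({ type: "a", children: [label] });

// Hypothetical consumers that each embed the shared Button.
const consumers = {
  Header: (Button) => ({ type: "header", children: [Button("Home")] }),
  Card: (Button) => ({ type: "div", children: ["Title", Button("Read more")] }),
  Footer: (Button) => ({ type: "footer", children: [Button("Contact")] }),
};

// A consumer's snapshot fails if its serialized tree differs after the refactor.
const failingSnapshots = Object.entries(consumers)
  .filter(
    ([, render]) => serialize(render(oldButton)) !== serialize(render(newButton))
  )
  .map(([name]) => name);

console.log(failingSnapshots); // [ 'Header', 'Card', 'Footer' ]
```

One change in a leaf component invalidates the snapshot of every component that renders it, which is what turns a small refactor into a large diff.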
Lack of assertion clarity
Because the only assertion is toMatchSnapshot, a snapshot test does not state what behavior it is verifying. Explicit test names or comments can mitigate this. However, in my experience, snapshot test descriptions are usually vague.
This assertion ambiguity may cause confusion when reviewing failed snapshots, leading to time wasted on deciphering what a test case is validating and, ultimately, resulting in the approval of snapshots without being confident in their correctness.
False sense of confidence due to high code coverage
Snapshot tests provide great code coverage with a single test since they render the whole component. For simple components, snapshot tests can provide 100% code coverage in a single assertion.
From a pure metric perspective that is great. From a confidence point of view, not that much.
Given that snapshot tests can more easily result in false positives, when compared to other types of tests, code coverage as a metric isn't as useful for quantifying the confidence in a system’s correct behavior.
For example, a test suite with 100% code coverage can provide less confidence in a system's correct behavior when compared to a suite with 90% code coverage, if the former has a high false positive rate.
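A back-of-the-envelope calculation illustrates the comparison. The model below is a deliberately crude assumption made up for this sketch, not an established metric: it treats confidence as the share of covered code whose tests would actually catch a breaking change. The false positive rates of 30% and 5% are likewise invented numbers.

```javascript
// Crude, hypothetical model: confidence = coverage * (1 - false positive rate),
// with both inputs expressed as percentages.
function effectiveConfidence(coveragePercent, falsePositivePercent) {
  return (coveragePercent * (100 - falsePositivePercent)) / 100;
}

// Made-up numbers, for illustration only.
const snapshotHeavySuite = effectiveConfidence(100, 30);
const behaviorFocusedSuite = effectiveConfidence(90, 5);

console.log(snapshotHeavySuite);   // 70
console.log(behaviorFocusedSuite); // 85.5
```

Under these assumed numbers, the suite with lower coverage ends up with the higher score, which is the point of the comparison above.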
Ultimately, this can be harmful as it might provide high, albeit unfounded, confidence that the system under test is behaving as expected.
The solution for this problem is replacing snapshot tests with the appropriate types of tests for the use case under test, even if code coverage is lowered.
Alternatives
What should you replace snapshot tests with? Well, it depends.
In my opinion, a good rule of thumb is:
Is the test asserting a component's appearance (e.g., color, placement, size)? If so, you should resort to visual tests.
Is the test validating a component's behavior (e.g., clicking it triggers a fetch, the disabled state prevents a click)? If so, you should use unit, integration or end-to-end tests.
Visual tests also have their disadvantages, such as being more expensive, slower, and prone to false negatives due to browser version updates, as well as the resulting images varying depending on the browser and operating system the tests are being executed on.
Despite the cons, I believe visual tests are a net positive due to being much easier to compare. Furthermore, their drawbacks can also be mitigated by best practices, e.g., reducing the screenshot area to the minimum that encloses the part being asserted.
Regarding unit, integration and end-to-end tests, they should be selected based on the usual tradeoff: unit tests are faster, while end-to-end tests provide higher confidence. Integration tests are a mix of both.
Conclusion
As I have laid out here, I believe the disadvantages of snapshot tests outweigh their advantages. In my opinion, this testing strategy is better replaced with visual and unit/integration/end-to-end tests. The former are useful for style validation and the latter for behavior assurance.
I'd like to know your opinion and invite you to explain in which use cases you believe snapshot testing can be useful. Feel free to write a comment below or ping me on Twitter.
Thank you for reading 🤓