Fighting False Alerts in Our Playwright Test Suite: My Battle for Stability 🛡️ - Part #1
For the past couple of years, Playwright has been my daily sidekick in making sure our product stays rock-solid. Every single day, our test suite runs like clockwork, acting as our safety net 🕸️ — catching regressions before they ever reach production.
Here’s how we roll:
✅ We run tests against a preview environment before merging changes into the main branch.
✅ We execute our E2E test suite daily against both DEV and STABLE environments to support our daily release process (see the config sketch after this list).
✅ Our tests cover functional tests (to check our product’s core features) and snapshot tests (to make sure our charts look just right 📊).
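To give you an idea of how the same suite can be pointed at different environments, here's a minimal config sketch (the env var name E2E_BASE_URL and the URLs are placeholders, not our real setup):

// playwright.config.ts (sketch, not our actual config)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // The CI job selects the target environment (preview, DEV, or STABLE) via an env var.
    baseURL: process.env.E2E_BASE_URL ?? 'https://dev.example.com',
  },
});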
And the best part? It’s a team effort! 💪 Everyone contributes to the test suite, making it a shared responsibility across the board.
But recently… things started getting annoying 😤. I noticed an increasing number of false test failures — alerts screaming “something’s broken!” when, in reality, nothing was. Turns out, most of these were caused by Playwright bugs 🐞 or misuse of Playwright features 😅. And let me tell you, debugging flaky tests feels like trying to catch smoke with bare hands!
Luckily, I had the chance to dig deep into this problem while working on a ticket to investigate test flakiness. In this blog, I’ll walk you through the issues I faced and how I solved them to make our test suite more stable and reliable.
Let’s dive in! 🚀
Issue #1: The Mysterious 1px Issue in Snapshot Tests 🧐
Our first headache? Snapshot tests. We were already following best practices — running snapshots inside the official Playwright Docker image — and everything was working fine… until it wasn’t.
One day, our tests started failing in GitLab CI/CD, and the reason? An extra 1px difference in the snapshots! 😩 At first, it seemed like a tiny issue, but trust me, debugging this was a nightmare. We spent two full days banging our heads against the wall, trying to figure out why this pesky 1px shift was happening only in our pipeline.
After some deep diving 🔍, we found that this wasn’t just our problem. The issue has been reported twice since 2022, and it still hasn’t been fixed! 😤 You can check the reports here and here.
Turns out, the culprit was architectural differences between the environments. The snapshots were originally generated on a Mac M1 machine inside Docker, but our tests ran on a Linux machine inside Docker. Even though both were running in containers, the underlying host architecture differed, which could explain the subtle rendering mismatch.
Cracking the Code: The Real Problem 🔍
After exploring multiple solutions, I realized the bug was in the .toHaveScreenshot() method when used directly on a locator.
For example, this would sometimes cause flakiness and introduce the 1px difference:
await expect(page.locator('.data-provider')).toHaveScreenshot('region-chart-dev.png');
So, I thought — what if instead of capturing a screenshot of the specific locator, we capture the entire page and then clip just the needed element? 🤯 In other words, what if I used the .toHaveScreenshot() method on the Page rather than on the locator?
The Fix: Bounding Box to the Rescue 🦸♂️
To clip the element correctly, we needed to:
✅ Get the exact position (x, y) of the element.
✅ Get its width and height.
And guess what? Playwright already has a built-in method for exactly this, .boundingBox():
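// boundingBox() returns the element's position and size, or null if the element isn't visible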
const boundingBox = await page.locator('.data-provider').boundingBox();
With this approach, our updated code looked like this:
await expect(page).toHaveScreenshot(
  'region-chart-dev.png',
  { clip: await page.locator('.data-provider').boundingBox() }
);
And just like that… the issue was gone! 🎉
This small but game-changing fix made our snapshot tests stable again. No more flaky 1px failures! 🚀
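If you want to reuse this pattern across specs, here's a minimal sketch of how it could be wrapped in a helper (the name expectClippedScreenshot is hypothetical, not part of Playwright or our actual suite). It also handles the fact that .boundingBox() returns null when the element isn't visible:

import { expect, type Page } from '@playwright/test';

// Hypothetical helper: screenshot the whole page, clipped to one element's bounding box.
async function expectClippedScreenshot(page: Page, selector: string, name: string) {
  const box = await page.locator(selector).boundingBox();
  // boundingBox() returns null if the element is hidden or detached, so fail loudly
  // instead of passing null to the clip option.
  if (!box) {
    throw new Error(`No bounding box for "${selector}". Is the element visible?`);
  }
  await expect(page).toHaveScreenshot(name, { clip: box });
}

// Usage: await expectClippedScreenshot(page, '.data-provider', 'region-chart-dev.png');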
Wrapping Up 🎬
Debugging flaky tests can be frustrating, but every problem has a solution — sometimes, you just need to dig a little deeper! 🕵️♂️ This was just one of the tricky issues we faced in our Playwright test suite, and trust me, there’s more to come.
In the next parts of this series, I’ll share more real-world problems we encountered and how we tackled them to make our tests faster, more stable, and less flaky. So stay tuned! 📢
Until next time — happy testing! 🚀💙