Story: Try, Try, Try Again

Dale Fukami
Feb 27, 2025
3 min read

I once worked in an organization that had a dedicated QA team of about 5 members. The product was a web-based app. At some point in their history they started writing Selenium tests to automate much of the work. Kudos to them for realizing they could potentially save a lot of repetitive work by automating.

I was tasked with rewriting the custom dropdown widget. That’s a bit of a story in itself, but the relevant part is that there were no unit tests for the existing component. So one of my first actions was to connect with the QA team to see if they had any tests or suites of tests that exercised the dropdown in any of the areas I knew I’d be touching.

Fortunately, there was a small set that could serve as at least a minor backup during my work. Running them wasn’t simple (again, a story) but once I was given access to their tooling I could schedule some tests to run on my branch and they’d go. Before making any changes, I wanted to double check that all the tests ran successfully on the main branch. What I was greeted with was quite a surprise: half the tests failed.

I connected with my QA teammate to find out what I’d done wrong. It turns out this was a common occurrence with these tests. There were a lot of reasons given as to why they might fail in the test environment, but the solution was to just retry the ones that failed. As I dug deeper I discovered that the tool they had built was set up to automatically retry failing tests up to 10 times and only then report a failure. So each failure in my first run meant that the test had actually failed 10 times! The answer was still to retry them again. Eventually, after a few runs, all the tests had “passed”. The reality is only that every test had passed at least once.

Now, yes, end-to-end tests are fickle and difficult to write in a way that ensures consistency. I understand that, having written numerous Selenium tests previously. However, I couldn’t understand how the team explicitly in charge of ensuring quality could think that a 5% or lower success rate meant our product was good enough for users. What was going on here?! As I continued digging into the reasoning behind all the retries, I discovered that they’d found a blog post online that said something along the lines of, “end-to-end tests sometimes fail due to network conditions, etc., so build in a retry system to prevent flaky failures from failing the entire suite”. They had been using this blog post, specifically that line in it, to justify retrying tests over and over again until they passed. What they didn’t pay attention to was the part a few lines later, where the post said, “...so that you can then go back and fix the test to be consistent”.
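To make the difference concrete, here is a minimal sketch of the kind of retry helper that blog post seemed to have in mind. This is not the QA team’s actual tool; the names `run_with_retries`, `RetryResult`, and `make_flaky`, and the default of 3 attempts, are all made up for illustration. The point is that every failed attempt gets recorded, so a test that only passes after retries is flagged as flaky instead of being quietly counted as a pass.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RetryResult:
    """Outcome of one test, including every failed attempt (hypothetical type)."""
    name: str
    passed: bool
    attempts: int
    errors: List[str] = field(default_factory=list)

    @property
    def flaky(self) -> bool:
        # Passed, but only after at least one failed attempt: needs fixing.
        return self.passed and self.attempts > 1


def run_with_retries(name: str, test: Callable[[], None],
                     max_attempts: int = 3) -> RetryResult:
    """Run `test` up to `max_attempts` times, keeping a record of each failure."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            test()
        except Exception as exc:  # includes AssertionError from failed checks
            errors.append(f"attempt {attempt}: {exc!r}")
        else:
            return RetryResult(name, True, attempt, errors)
    return RetryResult(name, False, max_attempts, errors)


def make_flaky(failures_before_pass: int) -> Callable[[], None]:
    """Build a stand-in test that fails a fixed number of times, then passes."""
    calls = {"n": 0}

    def test() -> None:
        calls["n"] += 1
        if calls["n"] <= failures_before_pass:
            raise AssertionError("dropdown did not open")

    return test


result = run_with_retries("dropdown_opens", make_flaky(2), max_attempts=3)
print(result.passed, result.attempts, result.flaky)  # True 3 True
```

The suite can still go green when `passed` is true, but anything with `flaky` set belongs on a list of tests to go back and fix, which is the half of the advice that got lost.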

It’s amazing to me that entire teams of people can take a single written line and end up so hyperfocused on it that they lose sight of the big picture so badly. Not only did they end up with a test suite that required countless retries to finally achieve a “passing” state, they were no doubt spending more time clicking the retry button and waiting than they would have if they had just run the entire test script by hand!

Please, understand why you do things. Yes, when you’re just learning you’ll follow a few simple rules to guide you, but once you have the hang of those rules, dig deeper! Why do those rules exist? What’s the purpose of each of these steps? Try a quick thought experiment like, “what if we told our customer that our QA procedure passes 5% of the time so it should be good enough for you?”. Do this thinking at every level of your job and you’ll understand and solve problems better than 95% of people out there.
