Fix Fragile Tests: How Brittle Appium Locators Caused Release Delays

The Problem: Fragile Tests Hidden by Metrics

A consumer app team with about 15 engineers had what they thought was a decent QA setup with over 200 test cases. They measured QA health by test case count, which looked great on paper.

When their QA engineer went on paternity leave in March, the CI pipeline started failing on flows that had been stable for months. The cause was a UI refresh two sprints earlier: it had moved or renamed elements, so the locators in the Appium scripts no longer pointed at the right things. The app looked almost identical to users, but the scripts couldn't adapt.
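To make that failure mode concrete, here is a minimal sketch in Python. It does not use Appium: the screen model, the element ids, and the helpers `find_by_position` and `find_by_id` are hypothetical stand-ins, assumed only for illustration of how a locator can break when the UI shifts.

```python
from typing import Optional

# Toy model of a screen: a flat list of elements in layout order.
# All ids and texts below are hypothetical, not taken from the team's app.
old_screen = [
    {"type": "Button", "id": "login_button", "text": "Log in"},
    {"type": "Button", "id": "signup_button", "text": "Sign up"},
]

# After the refresh: a new button is inserted first and the login id is renamed.
new_screen = [
    {"type": "Button", "id": "promo_banner", "text": "See offers"},
    {"type": "Button", "id": "sign_in_button", "text": "Log in"},
    {"type": "Button", "id": "signup_button", "text": "Sign up"},
]


def find_by_position(screen: list, element_type: str, index: int) -> Optional[dict]:
    """Mimic a positional locator: the index-th element of a given type (0-based)."""
    matches = [e for e in screen if e["type"] == element_type]
    return matches[index] if index < len(matches) else None


def find_by_id(screen: list, element_id: str) -> Optional[dict]:
    """Mimic an id-based locator: the first element with a matching id."""
    for element in screen:
        if element["id"] == element_id:
            return element
    return None


# Positional lookup still returns something after the refresh, but the wrong element.
before = find_by_position(old_screen, "Button", 0)
after = find_by_position(new_screen, "Button", 0)

# Id lookup survives a move (signup_button) but not a rename (login_button).
moved = find_by_id(new_screen, "signup_button")
renamed = find_by_id(new_screen, "login_button")

print(before["text"], "|", after["text"], "|", moved["text"], "|", renamed)
```

In this toy model the positional lookup silently lands on a different button, while the id lookup fails outright after the rename. Either way the script breaks even though the screen looks nearly the same to a user.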

Three people tried to fix it, including two engineers who hadn't touched the test suite in months. It took the better part of a week, and one release went out without proper regression testing because deadlines didn't move.

The Real Cost of Maintenance

When the QA engineer returned, he revealed that 50-60% of his week was spent maintaining scripts: updating locators, fixing things that broke after UI changes, and keeping the test suite alive. Only about a third of his time was actually spent finding bugs.

The team realized they'd been measuring the wrong thing. Nobody was tracking how much time went into just keeping tests from falling apart.
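One way to track that missing number is to log hours by category and report each category's share of the week. The sketch below is a minimal illustration in Python; the category names, the `share_by_category` helper, and the hours in `time_log` are all made up for the example and are not the team's actual data.

```python
from collections import defaultdict

# Hypothetical one-week time log as (category, hours) pairs.
# The numbers are invented for illustration only.
time_log = [
    ("maintenance", 6.0),
    ("bug_finding", 4.0),
    ("maintenance", 8.0),
    ("bug_finding", 5.0),
    ("maintenance", 8.0),
    ("meetings", 4.0),
    ("bug_finding", 5.0),
]


def share_by_category(entries):
    """Return each category's fraction of the total logged hours."""
    totals = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: hours / grand_total for category, hours in totals.items()}


shares = share_by_category(time_log)

# Print the largest share first so maintenance overhead is hard to miss.
for category, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{category}: {share:.0%}")
```

Reported next to test case count, a figure like this would show how much of the suite's apparent health is paid for in upkeep.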

The Solution: Moving Beyond Locators

The team has been rebuilding their test suite over the last couple of months using a tool that doesn't rely on locators at all. Tests are written in plain English, and the tool reads the screen the way a human would. When the UI changes, it adapts.

The QA engineer reported that for the first time in two years, he came into a Monday without a list of broken scripts to fix before he could do his actual job.

The locator problem had been quietly capping how fast they could ship, and they didn't fully see it until the test suite broke down.

📖 Read the full source: r/openclaw