AI & Delivery · May 2026 · 5 min read

Predicting flaky tests with Random Forest.

How I used a Random Forest classifier on test metadata to quarantine flaky tests before they could block a merge, and won back developers' trust.

Nothing kills velocity faster than a test suite nobody trusts. Our automation suite had reached that breaking point. Flaky tests were everywhere, and they brought false positives, inconsistent runs, wasted CI/CD cycles, and a build pipeline that was visibly slowing down. Developers stopped trusting automation results altogether, which is the worst failure mode a QA org can land in.

The instinct in most orgs is to write more tests or to add another gate. I went the other direction, in line with my broader take that quality is a systems problem, not a testing problem. I built an AI-based system that predicts and isolates flaky tests using a Random Forest classifier trained on test metadata: execution time, file change frequency, historical pass and fail variance, and a handful of other signals that quietly encode whether a test is actually stable.
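To make that feature list concrete, here is a minimal sketch of turning raw run history into one feature vector per test. It is an illustration under stated assumptions, not the feature set from the full write-up: the `RunRecord` shape, the `change_counts` map (commits touching the files a test covers), and the extra flip-rate feature are all hypothetical. A real pipeline would pull these from CI logs and version control.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RunRecord:
    """One historical execution of a test (hypothetical record shape)."""
    test_id: str
    passed: bool
    duration_s: float


def extract_features(runs: list[RunRecord], change_counts: dict[str, int]) -> dict[str, list[float]]:
    """Build one feature vector per test from its run history.

    Assumes `runs` is ordered oldest first, and that `change_counts` (hypothetical)
    maps a test id to the number of recent commits touching the files it covers.
    Vector layout: [mean_duration, duration_stdev, pass_fail_variance,
                    flip_rate, change_frequency].
    """
    by_test: dict[str, list[RunRecord]] = {}
    for run in runs:
        by_test.setdefault(run.test_id, []).append(run)

    features = {}
    for test_id, history in by_test.items():
        outcomes = [1.0 if r.passed else 0.0 for r in history]
        durations = [r.duration_s for r in history]
        pass_rate = mean(outcomes)
        # Variance of a pass/fail (Bernoulli) outcome: 0 for always-pass or
        # always-fail tests, peaking at 0.25 for a 50/50 test.
        variance = pass_rate * (1.0 - pass_rate)
        # Flip rate (an assumed extra signal): share of consecutive runs whose
        # outcomes differ, pass->fail or fail->pass.
        flips = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
        flip_rate = flips / (len(outcomes) - 1) if len(outcomes) > 1 else 0.0
        features[test_id] = [
            mean(durations),
            pstdev(durations),  # 0.0 when there is a single run
            variance,
            flip_rate,
            float(change_counts.get(test_id, 0)),
        ]
    return features
```

A test that alternates pass and fail with no code change ends up with the maximum variance (0.25) and a flip rate of 1.0, which is exactly the profile the classifier should learn to flag.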

The pipeline is straightforward. Engineer features from test history. Label flaky tests using historical pass and fail patterns. Train the Random Forest classifier on that labeled set. Then, on every PR, flag the tests the model predicts will be flaky and route them to a quarantine tier. CI runs the stable, high-signal tests on the hot path. The quarantined tests still run, just off the critical path, where they cannot block a merge.
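The four steps can be sketched end to end in plain Python. Treat this as a hedged illustration, not the production system: the labeling rule (a test that both passed and failed on the same commit is flaky), the 0.5 quarantine threshold, and every function name are hypothetical. To stay dependency-free, it grows a small forest of depth-one trees, each fit on a bootstrap sample with a random feature subset. A real deployment would more likely use a library implementation such as scikit-learn's `RandomForestClassifier` with deeper trees.

```python
import random


def label_flaky(outcomes_by_commit: dict[str, list[bool]]) -> int:
    """Hypothetical labeling rule: flaky (1) if the test both passed and failed on
    the same commit, where the code did not change but the result did; else 0."""
    return int(any(True in runs and False in runs for runs in outcomes_by_commit.values()))


def _majority(labels: list[int]) -> int:
    # Ties resolve to 0 (stable) so a coin flip never quarantines a test.
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(X, y, rows, features):
    """Fit a depth-one tree: the (feature, threshold) split with the fewest errors on `rows`."""
    default = _majority([y[i] for i in rows])
    # Fallback is a constant tree: threshold +inf sends every input left.
    best = (sum(y[i] != default for i in rows), 0, float("inf"), default, default)
    for f in features:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            left_cls, right_cls = _majority(left), _majority(right)
            err = sum(v != left_cls for v in left) + sum(v != right_cls for v in right)
            if err < best[0]:
                best = (err, f, t, left_cls, right_cls)
    return best[1:]  # (feature, threshold, left_class, right_class)


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging plus random feature subsets: the two ingredients of a random forest."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # sqrt(p) features per split, a common default
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        features = rng.sample(range(n_features), k)
        forest.append(fit_stump(X, y, rows, features))
    return forest


def predict_proba(forest, x) -> float:
    """Fraction of trees that vote 'flaky' for feature vector x."""
    votes = [lc if x[f] <= t else rc for f, t, lc, rc in forest]
    return sum(votes) / len(votes)


def route(tests, forest, threshold=0.5):
    """Split a PR's tests into the blocking hot path and the non-blocking quarantine tier."""
    hot, quarantine = [], []
    for test_id, x in tests.items():
        (quarantine if predict_proba(forest, x) >= threshold else hot).append(test_id)
    return hot, quarantine
```

In this sketch, `route` only decides where a test runs and never deletes anything, which matches the idea that quarantined tests keep running off the critical path. The threshold is the tuning knob. Lowering it moves more unstable tests off the hot path, at the cost of occasionally parking a healthy one.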

Within weeks, the impact was clear. The automated test run on every PR finished in under 20 minutes. False positives dropped by roughly 80 percent. CI/CD runs got about 25 percent faster because we stopped paying for retries on tests that were never going to be deterministic. Most importantly, developer confidence in automation came back: automation became a reliable validation system again, not a noise generator.

This was not just a technical win; it was a trust win. It pairs naturally with letting AI write the tests humans never would: one side generates broad coverage, and the other quietly evicts unstable tests before they erode confidence.

Worth noting: the only reason this shipped is that I stayed hands-on. That is the whole argument behind the unpopular opinion that might save your career. Leaders who lose touch with the tools lose the ability to build things like this. The full write-up, with the feature set and the CI wiring, lives on my main site.

Tags: AI Driven Testing · Flaky Tests · Machine Learning · Quality Engineering · Test Automation · DevOps · CI/CD · Random Forest
