Detecting Flaky Tests in Jenkins Pipelines
Flaky tests silently destroy CI/CD confidence. Learn how to automatically detect, track, and manage them in Jenkins.
You've seen it before. A test passes on your machine, fails in CI, then passes again on retry. Nobody touches the code. The test just... flickers.
This is a flaky test, and it's one of the most insidious problems in continuous integration.
Why flaky tests are dangerous
A single flaky test seems harmless. Retry the build, it passes, move on. But flaky tests compound:
- Trust erosion. Developers start ignoring test failures because "it's probably just that flaky test." Then a real failure slips through.
- Wasted time. Each flaky failure triggers an investigation. Even if it only takes 5 minutes to verify "oh, it's that test again," multiply that by 10 developers hitting it 3 times a week and you get 150 minutes, or 2.5 hours, of wasted engineering time weekly.
- Retry costs. If your pipeline retries on failure, every flaky test failure doubles your CI compute cost for that build.
- Merge queue delays. In teams using merge queues, a flaky test failure sends your PR back to the end of the queue.
What makes a test flaky?
Flaky tests typically fall into a few categories:
Timing dependencies
Tests that depend on specific timing — sleep(500), race conditions, or assumptions about execution order — are the most common source of flakiness. They pass on fast machines and fail on loaded CI agents.
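One common fix is to replace fixed sleeps with polling: wait for the condition you actually care about, up to a deadline. A minimal Python sketch (the `wait_until` helper and its parameters are illustrative, not from any particular framework):

```python
import time

def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll until predicate() is truthy, instead of sleeping a fixed amount.

    Passes as soon as the condition holds (fast machines stay fast) and
    only fails after the full timeout (loaded CI agents get slack).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

A test would then assert `wait_until(lambda: server.is_ready())` rather than calling `sleep(500)` and hoping the server came up in time.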
Shared state
Tests that read or write shared state (databases, files, environment variables) without proper isolation can interfere with each other. Test A writes a record, Test B reads it — but only when they run in a specific order.
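The usual cure is per-test isolation: give each test its own copy of the shared resource so execution order stops mattering. A sketch using Python's `unittest` with a fresh temporary directory per test (the `RecordStoreTest` name and file contents are made up for illustration):

```python
import os
import shutil
import tempfile
import unittest

class RecordStoreTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own directory, so no test can see
        # another test's records regardless of execution order.
        self.dir = tempfile.mkdtemp()
        self.path = os.path.join(self.dir, "records.txt")

    def tearDown(self):
        # Clean up so state never leaks into later runs either.
        shutil.rmtree(self.dir)

    def test_write_then_read(self):
        with open(self.path, "w") as f:
            f.write("record-1")
        with open(self.path) as f:
            self.assertEqual(f.read(), "record-1")
```

The same pattern applies to databases (per-test schemas or transactions rolled back in teardown) and environment variables (save and restore around each test).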
External dependencies
Tests that hit real APIs, external services, or network resources are inherently unreliable. DNS hiccups, rate limits, or service outages all cause intermittent failures.
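The standard remedy is to stub the external service at the boundary so the test exercises your logic without touching the network. A sketch using Python's `unittest.mock` (the `get_username` function and the client's `get` method are hypothetical):

```python
from unittest import mock

def get_username(client, user_id):
    """Function under test: looks up a user via an API client."""
    return client.get(f"/users/{user_id}")["name"]

# Replace the real client with a stub: no DNS, no rate limits,
# no outages -- the test result depends only on our own code.
fake_client = mock.Mock()
fake_client.get.return_value = {"name": "ada"}

assert get_username(fake_client, 7) == "ada"
fake_client.get.assert_called_once_with("/users/7")
```

Keep a small number of separate integration tests that do hit the real service, and run them on their own schedule so their inherent unreliability doesn't pollute the main suite.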
Non-deterministic data
Tests that use random data, current timestamps, or floating-point comparisons can produce different results across runs.
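Two cheap fixes: seed any randomness so runs are reproducible, and compare floats with a tolerance rather than exact equality. A short Python illustration:

```python
import math
import random

# Fixed seed: "random" test data is now identical on every run.
random.seed(1234)
sample = [random.random() for _ in range(3)]
assert len(sample) == 3

# Floating-point: exact equality is fragile...
total = 0.1 + 0.2
assert total != 0.3
# ...so compare with a tolerance instead.
assert math.isclose(total, 0.3)
```

Timestamps deserve the same treatment: inject a clock (or freeze time in the test) instead of calling the current time directly inside the code under test.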
How BuildButler detects flaky tests
BuildButler tracks every test execution across every build. This longitudinal data is the key to automatic flaky test detection.
Detection criteria
A test is flagged as flaky when it meets either condition:
- Flip frequency: The test flips between pass and fail 5 or more times within its last 30 executions
- Low pass rate: The test has a pass rate below 80% over its recent execution window
Both thresholds are configurable in Settings.
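BuildButler's internals aren't shown here, but the two criteria above are simple enough to sketch. Assuming a test's history arrives as a list of pass/fail booleans (newest last), a minimal Python version might look like:

```python
def is_flaky(results, window=30, max_flips=5, min_pass_rate=0.80):
    """Flag a test as flaky per the two criteria described above.

    results: list of booleans, newest last (True = pass, False = fail).
    """
    recent = results[-window:]
    if not recent:
        return False
    # Flip frequency: count pass<->fail transitions in the window.
    flips = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
    # Low pass rate: fraction of passing runs in the window.
    pass_rate = sum(recent) / len(recent)
    return flips >= max_flips or pass_rate < min_pass_rate
```

A consistently failing test (pass rate 0%, zero flips) would trip the pass-rate criterion here; a real detector would likely treat "always failing" as broken rather than flaky, which is one reason the flip-frequency criterion exists as a separate signal.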
What you see in the dashboard
The Flaky Tests page shows:
- Flaky test trend chart: How many flaky tests you have over time (ideally trending down)
- Flaky test list: Every test flagged as flaky, with its pass rate, flip count, and last failure date
- Transition filters: Filter by tests that became flaky recently, tests that were fixed, or tests that have been flaky for a long time
Test history deep dive
Click any test to see its Test History — a timeline showing every execution, duration trends, and the exact builds where it flipped. This makes root-cause analysis dramatically faster.
A strategy for managing flaky tests
Detection is step one. Here's a practical workflow for actually reducing flakiness:
1. Quantify the problem
Before you fix anything, know the scope. How many flaky tests do you have? What's the trend? BuildButler's flaky test trend chart gives you this immediately.
2. Triage by impact
Not all flaky tests are equal. Prioritize by:
- Failure frequency: Tests that fail on 30% of runs are more urgent than tests that fail on 5%
- Pipeline impact: A flaky test in a critical deployment pipeline is more urgent than one in a nightly job
- Fix difficulty: Some flaky tests have obvious fixes (add a retry, mock the external service). Start with easy wins.
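The first two criteria are easy to automate. A toy Python sketch of the sort order (the test names and numbers are invented for illustration):

```python
tests = [
    {"name": "test_deploy_smoke", "fail_rate": 0.30, "critical": True},
    {"name": "test_nightly_report", "fail_rate": 0.30, "critical": False},
    {"name": "test_login", "fail_rate": 0.05, "critical": True},
]

# Critical-pipeline tests first, then by failure frequency descending.
triaged = sorted(tests, key=lambda t: (not t["critical"], -t["fail_rate"]))

assert [t["name"] for t in triaged] == [
    "test_deploy_smoke", "test_login", "test_nightly_report",
]
```

Fix difficulty resists automation, so keep it as a manual tiebreaker: among similarly ranked tests, take the easy wins first.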
3. Tag and track
Use BuildButler's test tagging to label flaky tests by category — timing, shared-state, external-dependency. This helps you spot systemic patterns. If 80% of your flaky tests are timing-related, invest in better test infrastructure rather than fixing tests one by one.
4. Set a quality gate
Once you've reduced flakiness to a manageable level, prevent regression. Set up notifications to alert your team when a new test becomes flaky. Treat new flakiness like a build failure — investigate and fix immediately.
5. Monitor the trend
The goal isn't zero flaky tests (that's often impractical). The goal is a downward trend. Track your flaky test count weekly and celebrate progress.
The payoff
Teams that actively manage flaky tests report:
- 50–70% fewer CI retries
- Higher developer confidence in test results
- Faster merge times (fewer false-failure delays)
- Significantly lower CI compute costs
Flaky tests are a solved problem — you just need the data to find them. Get started with BuildButler and see your flaky tests in minutes.