Detecting Flaky Tests in Jenkins Pipelines
Flaky tests silently destroy CI/CD confidence. Learn how to automatically detect, track, and manage them in Jenkins.
You've seen it before. A test passes on your machine, fails in CI, then passes again on retry. Nobody touches the code. The test just... flickers.
This is a flaky test, and it's one of the most insidious problems in continuous integration.
Why flaky tests are dangerous
A single flaky test seems harmless. Retry the build, it passes, move on. But flaky tests compound:
- Trust erosion. Developers start ignoring test failures because "it's probably just that flaky test." Then a real failure slips through.
- Wasted time. Each flaky failure triggers an investigation. Even if it only takes 5 minutes to verify "oh, it's that test again," multiply that by 10 developers hitting it 3 times a week and you get 150 minutes, or 2.5 hours, of wasted engineering time weekly.
- Retry costs. If your pipeline retries on failure, every flaky test failure doubles your CI compute cost for that build.
- Merge queue delays. In teams using merge queues, a flaky test failure sends your PR back to the end of the queue.
What makes a test flaky?
Flaky tests typically fall into a few categories:
Timing dependencies
Tests that depend on specific timing — sleep(500), race conditions, or assumptions about execution order — are the most common source of flakiness. They pass on fast machines and fail on loaded CI agents.
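One common fix is to replace fixed sleeps with polling: wait for the condition you actually care about, up to a deadline. A minimal Python sketch (the `wait_until` helper and its parameters are illustrative, not from any particular framework):

```python
import time

def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll until predicate() is truthy, instead of sleeping a fixed amount.

    Passes as soon as the condition holds (fast machines stay fast) and
    only fails after the full timeout (loaded CI agents get slack).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

A test would then assert `wait_until(lambda: server.is_ready())` rather than calling `sleep(500)` and hoping the server came up in time.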
Shared state
Tests that read or write shared state (databases, files, environment variables) without proper isolation can interfere with each other. Test A writes a record, Test B reads it — but only when they run in a specific order.
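The usual cure is per-test isolation: give each test its own copy of the shared resource so execution order stops mattering. A sketch using Python's `unittest` with a fresh temporary directory per test (the `RecordStoreTest` name and file contents are made up for illustration):

```python
import os
import shutil
import tempfile
import unittest

class RecordStoreTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own directory, so no test can see
        # another test's records regardless of execution order.
        self.dir = tempfile.mkdtemp()
        self.path = os.path.join(self.dir, "records.txt")

    def tearDown(self):
        # Clean up so state never leaks into later runs either.
        shutil.rmtree(self.dir)

    def test_write_then_read(self):
        with open(self.path, "w") as f:
            f.write("record-1")
        with open(self.path) as f:
            self.assertEqual(f.read(), "record-1")
```

The same pattern applies to databases (per-test schemas or transactions rolled back in teardown) and environment variables (save and restore around each test).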
External dependencies
Tests that hit real APIs, external services, or network resources are inherently unreliable. DNS hiccups, rate limits, or service outages all cause intermittent failures.
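The standard remedy is to stub the external service at the boundary so the test exercises your logic without touching the network. A sketch using Python's `unittest.mock` (the `get_username` function and the client's `get` method are hypothetical):

```python
from unittest import mock

def get_username(client, user_id):
    """Function under test: looks up a user via an API client."""
    return client.get(f"/users/{user_id}")["name"]

# Replace the real client with a stub: no DNS, no rate limits,
# no outages -- the test result depends only on our own code.
fake_client = mock.Mock()
fake_client.get.return_value = {"name": "ada"}

assert get_username(fake_client, 7) == "ada"
fake_client.get.assert_called_once_with("/users/7")
```

Keep a small number of separate integration tests that do hit the real service, and run them on their own schedule so their inherent unreliability doesn't pollute the main suite.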
Non-deterministic data
Tests that use random data, current timestamps, or floating-point comparisons can produce different results across runs.
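Two cheap fixes: seed any randomness so runs are reproducible, and compare floats with a tolerance rather than exact equality. A short Python illustration:

```python
import math
import random

# Fixed seed: "random" test data is now identical on every run.
random.seed(1234)
sample = [random.random() for _ in range(3)]
assert len(sample) == 3

# Floating-point: exact equality is fragile...
total = 0.1 + 0.2
assert total != 0.3
# ...so compare with a tolerance instead.
assert math.isclose(total, 0.3)
```

Timestamps deserve the same treatment: inject a clock (or freeze time in the test) instead of calling the current time directly inside the code under test.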
How BuildButler detects flaky tests
BuildButler tracks every test execution across every build. This longitudinal data is the key to automatic flaky test detection.
Detection criteria
A test is flagged as flaky when it meets either condition:
- Flip frequency: The test flips between pass and fail 5 or more times within its last 30 executions
- Low pass rate: The test has a pass rate below 80% over its recent execution window
Both thresholds are configurable in Settings.
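BuildButler's internals aren't shown here, but the two criteria above are simple enough to sketch. Assuming a test's history arrives as a list of pass/fail booleans (newest last), a minimal Python version might look like:

```python
def is_flaky(results, window=30, max_flips=5, min_pass_rate=0.80):
    """Flag a test as flaky per the two criteria described above.

    results: list of booleans, newest last (True = pass, False = fail).
    """
    recent = results[-window:]
    if not recent:
        return False
    # Flip frequency: count pass<->fail transitions in the window.
    flips = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
    # Low pass rate: fraction of passing runs in the window.
    pass_rate = sum(recent) / len(recent)
    return flips >= max_flips or pass_rate < min_pass_rate
```

A consistently failing test (pass rate 0%, zero flips) would trip the pass-rate criterion here; a real detector would likely treat "always failing" as broken rather than flaky, which is one reason the flip-frequency criterion exists as a separate signal.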
What you see in the dashboard
The Flaky Tests page shows:
- Flaky test trend chart: How many flaky tests you have over time (ideally trending down)
- Flaky test list: Every test flagged as flaky, with its pass rate, flip count, and last failure date
- Transition filters: Filter by tests that became flaky recently, tests that were fixed, or tests that have been flaky for a long time
Test history deep dive
Click any test to see its Test History — a timeline showing every execution, duration trends, and the exact builds where it flipped. This makes root-cause analysis dramatically faster.
A strategy for managing flaky tests
Detection is step one. Here's a practical workflow for actually reducing flakiness:
1. Quantify the problem
Before you fix anything, know the scope. How many flaky tests do you have? What's the trend? BuildButler's flaky test trend chart gives you this immediately.
2. Triage by impact
Not all flaky tests are equal. Prioritize by:
- Failure frequency: Tests that fail on 30% of runs are more urgent than tests that fail on 5%
- Pipeline impact: A flaky test in a critical deployment pipeline is more urgent than one in a nightly job
- Fix difficulty: Some flaky tests have obvious fixes (add a retry, mock the external service). Start with easy wins.
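The first two criteria are easy to automate. A toy Python sketch of the sort order (the test names and numbers are invented for illustration):

```python
tests = [
    {"name": "test_deploy_smoke", "fail_rate": 0.30, "critical": True},
    {"name": "test_nightly_report", "fail_rate": 0.30, "critical": False},
    {"name": "test_login", "fail_rate": 0.05, "critical": True},
]

# Critical-pipeline tests first, then by failure frequency descending.
triaged = sorted(tests, key=lambda t: (not t["critical"], -t["fail_rate"]))

assert [t["name"] for t in triaged] == [
    "test_deploy_smoke", "test_login", "test_nightly_report",
]
```

Fix difficulty resists automation, so keep it as a manual tiebreaker: among similarly ranked tests, take the easy wins first.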
3. Tag and track
Use BuildButler's test tagging to label flaky tests by category — timing, shared-state, external-dependency. This helps you spot systemic patterns. If 80% of your flaky tests are timing-related, invest in better test infrastructure rather than fixing tests one by one.
4. Set a quality gate
Once you've reduced flakiness to a manageable level, prevent regression. Set up notifications to alert your team when a new test becomes flaky. Treat new flakiness like a build failure — investigate and fix immediately.
5. Monitor the trend
The goal isn't zero flaky tests (that's often impractical). The goal is a downward trend. Track your flaky test count weekly and celebrate progress.
The payoff
Teams that actively manage flaky tests report:
- 50–70% fewer CI retries
- Higher developer confidence in test results
- Faster merge times (fewer false-failure delays)
- Significantly lower CI compute costs
Flaky tests are a solved problem — you just need the data to find them. Get started with BuildButler and see your flaky tests in minutes.