BuildButler

Flaky Tests

Automatically detect and track tests that intermittently pass and fail across builds.

The Flaky Tests page identifies tests that frequently flip between pass and fail on the same branch — a common source of developer frustration and eroded pipeline confidence.

[Screenshot: Flaky Tests trend chart with drill-down into flaky tests by date]

How flaky detection works

BuildButler analyses test results across consecutive builds and flags tests that meet either of these criteria:

  • Transition Count — tests with 5 or more status flips in the last 30 runs
  • Reliability Threshold — tests with a pass rate below 80%

Both thresholds are configurable in Settings → Flaky Tests. An info banner at the top of the page shows the current detection rules, with a Configure link to adjust them.
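The two criteria can be sketched as a single check over a test's recent history. This is a minimal illustration, not BuildButler's implementation. It assumes results are ordered oldest to newest and use only the statuses PASSED and FAILED. It also assumes the pass rate is measured over the same 30-run window as the transition count, which this page does not specify. The function and constant names are hypothetical.

```python
from typing import Sequence

# Hypothetical constants mirroring the documented default thresholds.
WINDOW_SIZE = 30        # only the most recent 30 runs are considered
MIN_TRANSITIONS = 5     # 5 or more status flips => flaky
MIN_PASS_RATE = 0.80    # pass rate below 80% => flaky (assumed same window)


def count_transitions(statuses: Sequence[str]) -> int:
    """Count how many times consecutive results change status."""
    return sum(1 for prev, cur in zip(statuses, statuses[1:]) if prev != cur)


def pass_rate(statuses: Sequence[str]) -> float:
    """Fraction of runs that passed; an empty history counts as fully passing."""
    if not statuses:
        return 1.0
    return sum(1 for s in statuses if s == "PASSED") / len(statuses)


def is_flaky(history: Sequence[str],
             window: int = WINDOW_SIZE,
             min_transitions: int = MIN_TRANSITIONS,
             min_pass_rate: float = MIN_PASS_RATE) -> bool:
    """Flag a test if EITHER criterion is met within the recent window.

    `history` is ordered oldest -> newest, e.g. ["PASSED", "FAILED", ...].
    """
    recent = list(history[-window:])
    return (count_transitions(recent) >= min_transitions
            or pass_rate(recent) < min_pass_rate)
```

Because the criteria are combined with `or`, a test that fails steadily (few flips, low pass rate) is flagged just like one that alternates rapidly. Flips that fall outside the 30-run window no longer count.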

Flaky Tests Trend chart

This line chart shows the total number of flaky tests detected per day over the selected date range, so you can track whether flakiness is improving or getting worse.

Click any data point on the trend line to see the specific flaky tests for that date. A table appears below showing:

Column      Description
Test Name   Name of the flaky test
Class       Fully-qualified test class
Job         Jenkins job that ran the test
Transition  Status change direction (e.g. FAILED → PASSED, PASSED → FAILED)
Build       Build number (clickable link)
Time        Timestamp of the build
Duration    Test execution time
Error       Error message if the test failed
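The relationship between the trend line and the drill-down table can be sketched as grouping transition records by calendar day. This is an illustration under assumptions, not the product's data model. It assumes each data point counts the distinct tests (keyed by class and name) that recorded a flaky transition on that day. The `FlakyEvent` record, its field names, and both functions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class FlakyEvent:
    """One drill-down row (hypothetical record shape mirroring the table columns)."""
    test_name: str
    test_class: str
    job: str
    transition: str            # e.g. "FAILED → PASSED"
    build: int
    time: datetime
    duration_s: float
    error: Optional[str] = None


def daily_trend(events: list[FlakyEvent]) -> dict[date, int]:
    """One trend-line point per day: the number of distinct flaky tests seen that day."""
    tests_by_day: dict[date, set[tuple[str, str]]] = defaultdict(set)
    for e in events:
        # Key on class + name so same-named tests in different classes stay distinct.
        tests_by_day[e.time.date()].add((e.test_class, e.test_name))
    return {day: len(tests) for day, tests in sorted(tests_by_day.items())}


def drill_down(events: list[FlakyEvent], day: date) -> list[FlakyEvent]:
    """Rows shown after clicking the data point for `day`, newest first."""
    rows = [e for e in events if e.time.date() == day]
    return sorted(rows, key=lambda e: e.time, reverse=True)
```

Under this assumption, a test that flips twice on the same day adds one to that day's point, while the drill-down still lists both transitions.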

Date range

The date range picker supports quick presets: 7d, 14d, 30d, 60d, 90d, or a custom From / To range.
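For scripting against the same windows, the presets can be resolved to concrete From / To dates. This is a small sketch. It assumes a preset such as 7d means the last seven calendar days including today, and that both bounds are inclusive. The page does not define these boundaries, and `resolve_range` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# Preset labels as listed in the date range picker, mapped to a day count.
PRESET_DAYS = {"7d": 7, "14d": 14, "30d": 30, "60d": 60, "90d": 90}


def resolve_range(preset: Optional[str] = None,
                  start: Optional[date] = None,
                  end: Optional[date] = None,
                  today: Optional[date] = None) -> Tuple[date, date]:
    """Return an inclusive (from, to) range for a preset or a custom From / To pair."""
    today = today or date.today()
    if preset is not None:
        if preset not in PRESET_DAYS:
            raise ValueError(f"unknown preset: {preset!r}")
        # Assumption: "7d" covers today plus the previous 6 days.
        return today - timedelta(days=PRESET_DAYS[preset] - 1), today
    if start is None or end is None:
        raise ValueError("a custom range needs both From and To")
    if start > end:
        raise ValueError("From must not be after To")
    return start, end
```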

Configuring flaky detection thresholds

Navigate to Settings → Flaky Tests to adjust:

Setting                Default              Description
Transition Count       5 flips in 30 runs   How many status flips trigger the flaky flag
Reliability Threshold  80%                  Tests below this pass rate are flagged

Click Save to apply changes, or Reset to Defaults to restore the original thresholds.
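The settings form, with its Save and Reset to Defaults actions, can be modelled as an immutable settings object. This is a sketch only. The class, field names, and validation ranges are hypothetical and do not describe BuildButler's actual storage. The threshold is stored as a fraction (0.80 for 80%). Transitions are capped below the window size because N runs can contain at most N - 1 flips.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FlakySettings:
    """Hypothetical model of the Settings → Flaky Tests form."""
    min_transitions: int = 5              # status flips that trigger the flag...
    window_runs: int = 30                 # ...within this many recent runs
    reliability_threshold: float = 0.80   # pass rate below this is flagged

    def validate(self) -> None:
        # N runs can contain at most N - 1 status flips.
        if not 1 <= self.min_transitions < self.window_runs:
            raise ValueError("transition count must be between 1 and window_runs - 1")
        if not 0.0 < self.reliability_threshold <= 1.0:
            raise ValueError("reliability threshold must be in (0, 1]")


DEFAULTS = FlakySettings()


def save(current: FlakySettings, **changes) -> FlakySettings:
    """Apply edited fields and validate them (persistence is omitted)."""
    updated = replace(current, **changes)
    updated.validate()
    return updated


def reset_to_defaults() -> FlakySettings:
    """Restore the original thresholds."""
    return DEFAULTS
```

Validating inside `save` rejects an impossible configuration before it is applied, rather than producing one that silently never flags any test.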

Tips for reducing flakiness

  • Start with the most frequently flipped tests at the top of the list
  • Use the Test History page to inspect the Duration Trend and Status Timeline for each flaky test
  • Tag flaky tests to track them and assign ownership
  • Consider quarantining consistently flaky tests while you investigate
