Flake Hunt Toolkit

Detecting and eliminating flaky tests

Tech Stack: Playwright, TypeScript, CI Analytics
Category: Test Reliability
Status: Production

The Challenge

Flaky tests were eroding team confidence in the test suite. False failures were causing unnecessary re-runs, blocking deployments, and wasting developer time investigating non-issues.

The CI pipeline had become unreliable: 30% of pipeline runs failed because of flaky tests rather than actual product bugs. Teams started ignoring test failures, which defeats the purpose of automated testing.

Flakiness Reduced: 92%
Flaky Tests Fixed: 50+
Pipeline Reliability: 99.2%

Technical Solution

Built a comprehensive toolkit for detecting, analyzing, and eliminating flaky tests with trace-on-failure debugging.

playwright.config.ts TypeScript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry failed tests in CI; a test that passes only on retry is flagged as flaky
  retries: process.env.CI ? 2 : 0,

  // Capture trace on first retry for debugging
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },

  // Reporter for flake detection
  reporter: [
    ['html'],
    ['json', { outputFile: 'test-results/results.json' }],
    ['./reporters/flake-detector.ts']
  ],
});

reporters/flake-detector.ts TypeScript
import type { Reporter, TestCase, TestResult } from '@playwright/test/reporter';

class FlakeDetector implements Reporter {
  // Full title path -> number of retries needed before the test passed
  private flakyTests: Map<string, number> = new Map();

  onTestEnd(test: TestCase, result: TestResult) {
    // A test that passes only on a retry attempt is flaky
    if (result.retry > 0 && result.status === 'passed') {
      // titlePath() includes project and file, so same-named tests do not collide
      const name = test.titlePath().filter(Boolean).join(' › ');
      this.flakyTests.set(name, result.retry);
    }
  }

  onEnd() {
    if (this.flakyTests.size > 0) {
      console.log('⚠️ Flaky tests detected:');
      this.flakyTests.forEach((retries, name) => {
        console.log(`  - ${name} (passed after ${retries} ${retries === 1 ? 'retry' : 'retries'})`);
      });
    }
  }
}

// Playwright loads a custom reporter through the module's default export
export default FlakeDetector;
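
The `json` reporter configured above writes `test-results/results.json`, which a post-run script could mine for the same flakiness signal. The sketch below walks a report object and lists specs whose final status is `flaky`. The `JsonReport`, `JsonSuite`, `JsonSpec`, and `JsonTest` types are a simplified assumption about the report shape (nested `suites`, each holding `specs` whose `tests` carry a `status`), and `collectFlaky` is a hypothetical helper, not part of Playwright.

```typescript
// Simplified, assumed subset of the Playwright JSON report shape.
interface JsonTest { status: 'expected' | 'unexpected' | 'flaky' | 'skipped'; }
interface JsonSpec { title: string; tests: JsonTest[]; }
interface JsonSuite { title: string; specs?: JsonSpec[]; suites?: JsonSuite[]; }
interface JsonReport { suites: JsonSuite[]; }

// Hypothetical helper: recursively collect the full titles of flaky specs.
function collectFlaky(report: JsonReport): string[] {
  const flaky: string[] = [];
  const visit = (suite: JsonSuite, path: string[]): void => {
    const here = [...path, suite.title];
    for (const spec of suite.specs ?? []) {
      // A spec counts as flaky if any of its per-project tests ended as 'flaky'.
      if (spec.tests.some((t) => t.status === 'flaky')) {
        flaky.push([...here, spec.title].join(' > '));
      }
    }
    for (const child of suite.suites ?? []) visit(child, here);
  };
  for (const suite of report.suites) visit(suite, []);
  return flaky;
}

// In-memory sample; a real script would JSON.parse the contents of results.json.
const sample: JsonReport = {
  suites: [{
    title: 'login.spec.ts',
    specs: [{ title: 'shows form', tests: [{ status: 'expected' }] }],
    suites: [{
      title: 'Login',
      specs: [{ title: 'submits credentials', tests: [{ status: 'flaky' }] }],
    }],
  }],
};

console.log(collectFlaky(sample)); // [ 'login.spec.ts > Login > submits credentials' ]
```

Keeping the traversal a pure function over a parsed object makes it easy to unit test and to reuse in a CI step that uploads the list to an analytics store.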

Key Implementation Details

  • Trace-on-failure workflow capturing screenshots, videos, and network logs
  • Custom reporter for tracking retry patterns and identifying chronic flakes
  • Historical flakiness dashboard tracking test reliability over time
  • Automatic quarantine system for tests exceeding flakiness threshold
  • Root cause analysis templates for common flake patterns
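
The historical tracking and quarantine items above can be illustrated with two small pure functions. This is a sketch under stated assumptions, not the toolkit's actual implementation: the `RunOutcome` type, the `flakeRate` and `shouldQuarantine` helpers, the 20-run window, and the 10% threshold are all hypothetical.

```typescript
// Hypothetical per-run outcome for a single test, ordered oldest first.
type RunOutcome = 'passed' | 'flaky' | 'failed';

// Share of the most recent `window` runs in which the test passed only on retry.
function flakeRate(history: RunOutcome[], window = 20): number {
  const recent = history.slice(-window);
  if (recent.length === 0) return 0;
  const flakyRuns = recent.filter((o) => o === 'flaky').length;
  return flakyRuns / recent.length;
}

// Quarantine once the recent flake rate exceeds the (assumed) threshold.
function shouldQuarantine(history: RunOutcome[], threshold = 0.1, window = 20): boolean {
  return flakeRate(history, window) > threshold;
}

// 20 clean runs versus 20 runs of which 3 needed a retry to pass.
const stable: RunOutcome[] = Array(20).fill('passed');
const chronic: RunOutcome[] = [...Array(16).fill('passed'), 'flaky', 'flaky', 'passed', 'flaky'];

console.log(flakeRate(chronic));        // 0.15
console.log(shouldQuarantine(stable));  // false
console.log(shouldQuarantine(chronic)); // true
```

A sliding window keeps old, already-fixed flakes from holding a test in quarantine indefinitely; the right window and threshold depend on how often the suite runs.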

Tech Stack

  • 🎭 Playwright: Test Framework
  • 📘 TypeScript: Custom Reporters
  • 📊 CI Analytics: Trend Analysis
  • 🔍 Trace Viewer: Debug Tooling

Impact & Results

  • Reduced test flakiness by 92% through systematic identification and fixes
  • Improved CI pipeline reliability to 99.2% from a 70% baseline
  • Saved 10+ hours per week previously spent on false failure investigations
  • Restored team confidence in test results with reliable feedback
  • Faster debugging with comprehensive trace artifacts on failures