The Challenge
Flaky tests were eroding team confidence in the test suite. False failures were causing unnecessary re-runs, blocking deployments, and wasting developer time investigating non-issues.
The CI pipeline had become unreliable, with a 30% failure rate attributed to test flakiness rather than actual product bugs. Teams started ignoring test failures, defeating the purpose of automated testing.
92% Flakiness Reduced
50+ Flaky Tests Fixed
99.2% Pipeline Reliability
Technical Solution
Built a comprehensive toolkit for detecting, analyzing, and eliminating flaky tests with trace-on-failure debugging.
playwright.config.ts
TypeScript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry flaky tests to confirm stability
  retries: process.env.CI ? 2 : 0,

  // Capture trace on first retry for debugging
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },

  // Reporter for flake detection
  reporter: [
    ['html'],
    ['json', { outputFile: 'test-results/results.json' }],
    ['./reporters/flake-detector.ts'],
  ],
});
reporters/flake-detector.ts
TypeScript
import type { Reporter, TestCase, TestResult } from '@playwright/test/reporter';

class FlakeDetector implements Reporter {
  // Maps each test's full title path to the retry attempt on which it passed
  private flakyTests: Map<string, number> = new Map();

  onTestEnd(test: TestCase, result: TestResult) {
    // Track tests that passed on retry (flaky)
    if (result.retry > 0 && result.status === 'passed') {
      // titlePath() avoids collisions between tests that share a title
      this.flakyTests.set(test.titlePath().join(' > '), result.retry);
    }
  }

  onEnd() {
    if (this.flakyTests.size > 0) {
      console.log('⚠️ Flaky tests detected:');
      this.flakyTests.forEach((retries, name) => {
        console.log(` - ${name} (passed after ${retries} retries)`);
      });
    }
  }
}

// Playwright loads a custom reporter from the module's default export
export default FlakeDetector;
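The rule the reporter applies (a test that fails first and then passes on a retry counts as flaky) can be shown in isolation. The following is a minimal sketch with no Playwright dependency; the names `AttemptStatus`, `Verdict`, and `classifyAttempts` are hypothetical and exist only for illustration.

```typescript
// Hypothetical status values, mirroring the subset of outcomes discussed above.
type AttemptStatus = 'passed' | 'failed' | 'timedOut';
type Verdict = 'stable' | 'flaky' | 'failing';

// Classify one test from its ordered attempts: index 0 is the first run,
// later entries are retries. Assumes retries stop at the first pass.
function classifyAttempts(attempts: AttemptStatus[]): Verdict {
  if (attempts.length === 0) {
    throw new Error('expected at least one attempt');
  }
  const last = attempts[attempts.length - 1];
  if (last !== 'passed') {
    return 'failing'; // never recovered, so likely a real failure
  }
  // Passed on the first attempt: stable. Passed only after a retry: flaky.
  return attempts.length === 1 ? 'stable' : 'flaky';
}

console.log(classifyAttempts(['passed']));                       // stable
console.log(classifyAttempts(['failed', 'passed']));             // flaky
console.log(classifyAttempts(['failed', 'timedOut', 'failed'])); // failing
```

Separating "flaky" from "failing" is what keeps retries from masking real regressions: a test that never passes still fails the run.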
Key Implementation Details
- Trace-on-failure workflow: traces (including network logs) recorded on the first retry, screenshots taken on failure, and videos retained for failed tests
- Custom reporter for tracking retry patterns and identifying chronic flakes
- Historical flakiness dashboard tracking test reliability over time
- Automatic quarantine system for tests exceeding flakiness threshold
- Root cause analysis templates for common flake patterns
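As an illustration of how the historical tracking and the quarantine threshold could fit together, here is a minimal sketch. Everything in it is an assumption rather than the project's actual implementation: the `RunRecord` shape, the function names, the 20% threshold, and the minimum of 5 runs are all hypothetical.

```typescript
// One row per test per CI run. This shape is assumed for the sketch;
// it is not the format of Playwright's JSON report.
interface RunRecord {
  testId: string;
  flaky: boolean; // true when the test passed only after a retry
}

interface FlakeStats {
  runs: number;
  flakyRuns: number;
  rate: number; // flakyRuns / runs
}

// Aggregate per-test flake rates across the recorded history.
function flakeStats(records: RunRecord[]): Map<string, FlakeStats> {
  const stats = new Map<string, FlakeStats>();
  records.forEach((record) => {
    const entry = stats.get(record.testId) || { runs: 0, flakyRuns: 0, rate: 0 };
    entry.runs += 1;
    if (record.flaky) entry.flakyRuns += 1;
    entry.rate = entry.flakyRuns / entry.runs;
    stats.set(record.testId, entry);
  });
  return stats;
}

// Pick tests whose flake rate exceeds the threshold, ignoring tests with
// too little history to judge. Both defaults are illustrative values.
function selectQuarantine(records: RunRecord[], threshold = 0.2, minRuns = 5): string[] {
  const quarantined: string[] = [];
  flakeStats(records).forEach((s, testId) => {
    if (s.runs >= minRuns && s.rate > threshold) quarantined.push(testId);
  });
  return quarantined.sort();
}

// Helper for building sample data: `total` runs, the first `flaky` of them flaky.
function makeRuns(testId: string, total: number, flaky: number): RunRecord[] {
  const rows: RunRecord[] = [];
  for (let i = 0; i < total; i++) rows.push({ testId, flaky: i < flaky });
  return rows;
}

const sampleHistory: RunRecord[] = [
  ...makeRuns('login', 10, 3),    // 30% flaky: above the threshold
  ...makeRuns('checkout', 10, 1), // 10% flaky: below the threshold
  ...makeRuns('search', 2, 2),    // 100% flaky, but too few runs to judge
];

console.log(selectQuarantine(sampleHistory)); // [ 'login' ]
```

The minimum-run guard is a design choice: without it, a new test that flakes once in its first run would be quarantined on a single data point.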
Tech Stack
- Playwright: Test Framework
- TypeScript: Custom Reporters
- CI Analytics: Trend Analysis
- Trace Viewer: Debug Tooling
Impact & Results
- Reduced test flakiness by 92% through systematic identification and fixes
- Raised CI pipeline reliability from a 70% baseline to 99.2%
- Saved 10+ hours per week previously spent on false failure investigations
- Restored team confidence in test results with reliable feedback
- Faster debugging with comprehensive trace artifacts on failures