The hardest test suite I ever built - a pytest case study
- Track: Testing, Quality Assurance, Security
- Type: Talk
- Level: Advanced
- Duration: 30 minutes
Abstract
For years, this real-time video system had no tests. Every change produced unpredictable side effects. Accuracy drifted. Production incidents and escalations followed. The only “verification” was manual inspection and hope.
When I joined the project, this was the reality - and building a proper integration test suite became my first priority.
In this talk, I’ll share how I designed and evolved the hardest integration test suite of my career using pytest - and kept it readable.
The system processed live streams in production and was non-deterministic: individual detections were only 80–90% accurate. For testing, we replayed recorded scenarios to make system behavior observable and comparable across runs. But binary pass/fail assertions were not enough: a single failed event did not mean the whole system was broken, yet we needed a way to measure when it actually was.
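To make that concrete, here is a minimal sketch of a tolerance-based assertion; `replay()` and the recording names are hypothetical stand-ins, not the production API:

```python
import pytest

MIN_ACCURACY = 0.80  # assumed floor, matching the 80-90% range above

def replay(recording):
    # Hypothetical stand-in for the real replay harness: it would feed the
    # recorded stream through the pipeline and count detections.
    return 17, 20  # (detected, expected)

@pytest.mark.parametrize("recording", ["lobby_cam", "parking_lot"])
def test_recording_accuracy(recording):
    detected, expected = replay(recording)
    accuracy = detected / expected
    # Assert on the aggregate rate, not on every individual event:
    # one missed detection must not fail the run; a real drop must.
    assert accuracy >= MIN_ACCURACY, f"{recording}: {accuracy:.0%} below floor"
```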
Instead of writing one massive test, I built a layered architecture (sketched after the list):
- dual parametrization - recording scope and event scope
- orchestration in fixtures - assertions in tiny, single-purpose tests
- statistics collection during execution
- end-of-run aggregation that summarizes system accuracy
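A minimal sketch of how these layers can fit together; the scenario table, the `run_pipeline()` harness, and the event names are illustrative assumptions, not the real suite:

```python
import pytest

# Hypothetical scenario table: recording -> expected events.
SCENARIOS = {
    "lobby_cam": ["person_enters", "person_exits"],
    "parking_lot": ["car_arrives", "car_leaves"],
}

def run_pipeline(recording):
    # Stand-in for the real replay harness; returns the set of events
    # the pipeline actually detected in the recording.
    return set(SCENARIOS[recording])

def pytest_generate_tests(metafunc):
    # Dual parametrization: the outer axis is the recording, the inner
    # axis is each expected event within it. pytest expands this into
    # one tiny test per (recording, event) pair.
    if {"recording", "event"} <= set(metafunc.fixturenames):
        metafunc.parametrize(
            ("recording", "event"),
            [
                pytest.param(rec, ev, id=f"{rec}-{ev}")
                for rec, events in SCENARIOS.items()
                for ev in events
            ],
        )

@pytest.fixture(scope="session")
def replayed():
    # Orchestration lives in the fixture: each recording is replayed at
    # most once per session and the result is shared across its tests.
    cache = {}
    def _get(recording):
        if recording not in cache:
            cache[recording] = run_pipeline(recording)
        return cache[recording]
    return _get

def test_event_detected(replayed, recording, event):
    # Single-purpose assertion: was this one expected event observed?
    assert event in replayed(recording)
```

Keeping orchestration in the fixture means a failing assertion points at exactly one event in one recording, which is what keeps the suite readable as it grows.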
The result was a suite that could catch regressions from model changes and produce reproducible evidence - HTML reports, structured dumps, and a summary statistics file.
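As a sketch of that end-of-run aggregation, the `conftest.py` hooks below collect per-recording statistics during execution and dump a summary when the session ends; the grouping key and the file name are assumptions, not the original format:

```python
import json
from collections import defaultdict

import pytest

_stats = defaultdict(lambda: {"passed": 0, "failed": 0})

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    # Collect statistics while tests execute, grouped by recording.
    outcome = yield
    report = outcome.get_result()
    if report.when == "call":
        # Assumed test id layout "test_name[recording-event]".
        recording = item.name.split("[")[-1].split("-")[0]
        _stats[recording]["passed" if report.passed else "failed"] += 1

def pytest_sessionfinish(session, exitstatus):
    # Summarize per-recording accuracy once the whole run is over and
    # write it next to the HTML report (file name is illustrative).
    summary = {
        rec: {**counts, "accuracy": counts["passed"] / sum(counts.values())}
        for rec, counts in _stats.items()
    }
    with open("accuracy_summary.json", "w") as fh:
        json.dump(summary, fh, indent=2)
```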
This talk explores how far pytest can be stretched beyond unit tests - into a framework for architecting complex integration test suites.