vibe-concurrent-test-safety
npx machina-cli add skill ash1794/vibe-engineering/concurrent-test-safety --openclaw
Flaky tests are almost always concurrency bugs in the test, not the code.
When to Use This Skill
- Writing tests that launch goroutines or async operations
- Tests that use shared mock objects accessed by multiple goroutines
- Tests that fail intermittently ("flaky")
- After adding the -race flag and getting failures
- Tests that involve daemon/server startup and shutdown
When NOT to Use This Skill
- Purely synchronous tests with no concurrency
- Tests that are already race-free (verified with the -race flag)
- Simple mock-based tests with single-threaded access
Common Concurrency Bugs in Tests
1. Direct Mock State Access
// BAD: Race condition -- mock.Requests accessed while goroutine writes
go daemon.Run(ctx)
time.Sleep(100 * time.Millisecond)
assert.Equal(t, 3, len(mock.Requests)) // RACE!
// GOOD: Thread-safe accessor (the Sleep is kept only for brevity; see 3. Assertions on Timing)
go daemon.Run(ctx)
time.Sleep(100 * time.Millisecond)
assert.Equal(t, 3, mock.RequestCount()) // Safe
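The accessor in the GOOD example only helps if the mock guards its state with a lock. A minimal sketch of such a mock follows; `SafeMock`, `Record`, `Requests`, and `recordConcurrently` are hypothetical names chosen for illustration, not part of the original skill.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMock is a hypothetical mock that keeps its recorded requests private
// and guards every read and write with the same mutex.
type SafeMock struct {
	mu       sync.Mutex
	requests []string
}

// Record appends a request; safe to call from any goroutine.
func (m *SafeMock) Record(req string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.requests = append(m.requests, req)
}

// RequestCount returns the number of recorded requests under the lock.
func (m *SafeMock) RequestCount() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.requests)
}

// Requests returns a copy, so callers never share the backing slice
// with goroutines that are still recording.
func (m *SafeMock) Requests() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]string(nil), m.requests...)
}

// recordConcurrently records n requests from n goroutines and waits for all of them.
func recordConcurrently(m *SafeMock, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Record(fmt.Sprintf("req-%d", i))
		}(i)
	}
	wg.Wait()
}

func main() {
	m := &SafeMock{}
	recordConcurrently(m, 3)
	fmt.Println(m.RequestCount())
}
```

Keeping the slice unexported means a test cannot reach for `mock.requests` directly, so the unsafe pattern from the BAD example cannot be written by accident.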
2. Missing Context Cancellation Before Cleanup
// BAD: Close bus while daemon goroutine still using it
go daemon.Run(ctx)
defer bus.Close() // daemon may still be writing!
// GOOD: Cancel context first, wait, then cleanup
ctx, cancel := context.WithCancel(context.Background())
go daemon.Run(ctx)
defer func() {
	cancel()        // Signal daemon to stop
	<-daemon.Done() // Wait for it to exit
	bus.Close()     // Now safe
}()
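The shutdown order above assumes the daemon exposes a way to wait for its exit. One possible shape is sketched here: `Daemon`, `Bus`, `Publish`, `LatePublishes`, and `runAndShutdown` are hypothetical stand-ins, and the `Done()` channel is closed when `Run` returns.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Bus is a hypothetical shared resource. It counts publishes that arrive
// after Close so that a wrong shutdown order becomes observable.
type Bus struct {
	mu     sync.Mutex
	closed bool
	late   int
}

func (b *Bus) Publish() {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.closed {
		b.late++
	}
}

func (b *Bus) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.closed = true
}

func (b *Bus) LatePublishes() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.late
}

// Daemon publishes to the bus until its context is cancelled.
type Daemon struct {
	bus  *Bus
	done chan struct{}
}

func NewDaemon(bus *Bus) *Daemon {
	return &Daemon{bus: bus, done: make(chan struct{})}
}

// Run closes the done channel on exit so callers can join the goroutine.
func (d *Daemon) Run(ctx context.Context) {
	defer close(d.done)
	for {
		select {
		case <-ctx.Done():
			return
		default:
			d.bus.Publish()
		}
	}
}

// Done is closed once Run has returned.
func (d *Daemon) Done() <-chan struct{} { return d.done }

// runAndShutdown follows the order: cancel, wait, then close the bus.
func runAndShutdown() int {
	bus := &Bus{}
	ctx, cancel := context.WithCancel(context.Background())
	d := NewDaemon(bus)
	go d.Run(ctx)

	cancel()    // signal the daemon to stop
	<-d.Done()  // wait until Run has returned
	bus.Close() // no goroutine can still be publishing
	return bus.LatePublishes()
}

func main() {
	fmt.Println("publishes after Close:", runAndShutdown())
}
```

Because `bus.Close()` runs only after `Done()` is closed, no publish can arrive late. Swapping the last two steps would make late publishes possible.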
3. Assertions on Timing
// BAD: Relies on timing
go startServer()
time.Sleep(50 * time.Millisecond) // May not be enough
resp := callServer()
// GOOD: Wait for readiness
go startServer()
waitForReady(server) // Poll or use channel
resp := callServer()
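The source does not show how `waitForReady` is implemented. One option, sketched here under the assumption that the server closes a channel once startup finishes, is a wait bounded by a timeout; `Server`, `NewServer`, `Start`, and `startAndWait` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Server is a hypothetical server that signals readiness by closing a channel.
type Server struct {
	ready chan struct{}
}

func NewServer() *Server {
	return &Server{ready: make(chan struct{})}
}

// Start simulates startup work, then closes ready exactly once.
func (s *Server) Start(startup time.Duration) {
	time.Sleep(startup) // stands in for real startup work, e.g. binding a listener
	close(s.ready)
}

// waitForReady blocks until the server is ready or the timeout elapses.
func waitForReady(s *Server, timeout time.Duration) error {
	select {
	case <-s.ready:
		return nil
	case <-time.After(timeout):
		return errors.New("server not ready before timeout")
	}
}

// startAndWait starts a server in the background and waits for readiness.
func startAndWait(startupMillis, timeoutMillis int) error {
	s := NewServer()
	go s.Start(time.Duration(startupMillis) * time.Millisecond)
	return waitForReady(s, time.Duration(timeoutMillis)*time.Millisecond)
}

func main() {
	if err := startAndWait(20, 2000); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ready")
}
```

The timeout here is an upper bound, not a guess at how long startup takes: the wait returns as soon as the server signals, and a generous limit costs nothing when the test passes.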
Audit Checklist
- No direct access to shared mock fields (use accessors)
- Context cancelled before resource cleanup
- Goroutines joined before test ends (<-done or wg.Wait())
- No time.Sleep for synchronization (use channels, waitgroups, or polling)
- Assertions use Eventually or polling for async results
- Test passes with -race flag
- Test passes when run 100 times (-count=100)
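For the "Eventually or polling" item, a small polling helper is often enough. The sketch below uses only the standard library, and `eventually` and `countAsync` are hypothetical names. If the test suite already uses testify, its `assert.Eventually` takes a similar condition, wait duration, and tick.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// eventually polls cond every tick until it returns true or timeout elapses.
// It performs one final check at the deadline before giving up.
func eventually(cond func() bool, timeout, tick time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if cond() {
			return true
		}
		time.Sleep(tick)
	}
	return cond()
}

// countAsync increments a counter n times from a background goroutine and
// reports whether the expected total was observed before the timeout.
func countAsync(n int64, timeoutMillis int) bool {
	var counter int64
	go func() {
		for i := int64(0); i < n; i++ {
			atomic.AddInt64(&counter, 1)
		}
	}()
	return eventually(
		func() bool { return atomic.LoadInt64(&counter) == n },
		time.Duration(timeoutMillis)*time.Millisecond,
		time.Millisecond,
	)
}

func main() {
	fmt.Println(countAsync(100, 2000))
}
```

The sleep inside `eventually` only paces the polling loop. Correctness comes from re-checking the condition, which is what separates this from a single fixed `time.Sleep` followed by an assertion.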
Output Format
Concurrent Test Safety Audit: [Test File]
Issues Found: X
| # | Issue | Line | Fix |
|---|---|---|---|
| 1 | Direct mock access | :42 | Use mock.RequestCount() |
| 2 | Missing cancel before Close | :15 | Call cancel() and wait on Done() before bus.Close() |
Suggested Fixes
[Code snippets for each fix]
Source
https://github.com/ash1794/vibe-engineering/blob/master/skills/concurrent-test-safety/SKILL.md
Overview
Flaky tests are often concurrency bugs in the test, not in the code. This skill helps you spot race conditions, shared mock state issues, and cleanup ordering problems in tests that involve goroutines or async operations.
How This Skill Works
Identify common concurrency pitfalls in tests: direct access to shared mocks, missing context cancellation before cleanup, and timing-based assertions. Replace with thread-safe accessors, enforce proper shutdown sequences, and rely on synchronization primitives or readiness signals. Validate fixes by running with -race and repeating tests to surface flakiness.
Quick Start
- Step 1: Identify concurrency in the failing test (goroutines, async ops).
- Step 2: Replace direct mock field access with thread-safe accessors and cancel context before cleanup.
- Step 3: Run tests with -race and -count=100 to confirm stability.
Best Practices
- Avoid direct access to shared mock fields; use thread-safe accessors
- Cancel context before resource cleanup and wait for proper shutdown
- Join goroutines before the test ends (wait on a done channel or call wg.Wait())
- Do not use time.Sleep for synchronization; use channels, waitgroups, or polling
- Run with -race and consider repeated runs (-count=100) to surface flakiness
Example Use Cases
- Direct mock access causes race: reading mock.Requests while a goroutine writes
- Missing cancel before Close: shutdown occurs while daemon is still using the bus
- Assertions on timing: test relies on Sleep instead of readiness signal
- Switch to thread-safe accessors and explicit synchronization to fix race
- Daemon startup/shutdown pattern that requires cancellation before cleanup