
Learn why you should review implementations even with 100% test coverage

npx machina-cli add skill Intai/story-flow/learn-review-implementation --openclaw
Files (1)
SKILL.md
11.0 KB

Why Review Implementation Even with 100% Test Coverage?

Overview

This learning module helps junior developers understand why, as of 2026, human code review remains essential even when AI-generated code has 100% unit test coverage. Tests only verify that code doesn't crash - humans must verify correctness, readability, and maintainability.

Instructions

Present the following content interactively. After each section, use AskUserQuestion to offer 3 options:

  • "Continue to next section"
  • "Show me an example"
  • "I have a question"

Section 1: What 100% Test Coverage Actually Proves (and Doesn't)

Explain what test coverage measures and its limitations:

What 100% test coverage proves:

  • Every line of code is executed during tests
  • The code doesn't crash during execution

That's it. Coverage doesn't verify correctness - test assertions could be inadequate, wrong, or missing entirely.

What 100% test coverage does NOT prove:

  • The code behaves correctly (assertions could be wrong)
  • The code meets functional requirements (assertions could be inadequate)
  • The code is easy to understand and intuitive
  • The code will be easy to modify later
  • The variable and function names are clear
  • The solution is appropriately simple

The key insight:

Tests with 100% coverage answer "Does it crash?" They don't necessarily answer "Does it work?" or "Is it good code?" All three questions matter.
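To make the gap concrete, here is a small sketch (the `applyDiscount` function and its inputs are invented for illustration): a buggy function whose weak test executes every line, achieving 100% coverage, and still passes.

```javascript
// Hypothetical buggy function: it should apply `percent` as a
// percentage discount, but it subtracts the raw number instead.
function applyDiscount(price, percent) {
  return price - percent; // BUG: should be price * (1 - percent / 100)
}

// This check executes every line (100% coverage) and "passes",
// because it only asserts that *some* number came back.
const result = applyDiscount(200, 10);
console.assert(typeof result === 'number'); // passes

// Yet the result is 190, while 10% off 200 should be 180.
// Only an assertion on the actual value would expose the bug.
```

The coverage report looks identical whether the assertion checks the real value or merely the type; only a reviewer reading the assertion can tell the difference.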


Section 2: Why Test Cases Need Review

Since 100% coverage only proves code doesn't crash, the test assertions themselves are critical - and they need human review.

The Problem with Trusting Tests Blindly

// This test "passes" but proves nothing
test('should process order', () => {
  const result = processOrder(mockOrder);
  expect(result).toBeDefined();  // Weak assertion - just checks it exists
});

// This test passes but has wrong expectation
test('should calculate total', () => {
  const total = calculateTotal([10, 20, 30]);
  expect(total).toBe(50);  // Wrong! Should be 60
});

// This test is missing assertions entirely
test('should update user', async () => {
  await updateUser({ id: 1, name: 'John' });
  // No assertions - test passes if it doesn't crash
});

All three tests pass. None prove correctness.

What to Check in Every Test

1. Are assertions actually verifying behavior?

  • expect(result).toBeDefined() - Too weak
  • expect(result).toBe(expectedValue) - Better

2. Are assertions checking the right values?

  • Review expected values manually - don't trust them
  • Trace through the logic yourself

3. Are critical assertions present?

  • Side effects verified (database, API calls)
  • Edge cases covered
  • Error conditions tested

4. Do assertions match requirements?

  • Cross-reference with acceptance criteria
  • Ensure all requirements have corresponding tests
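As a framework-free sketch of the side-effect point above, a minimal hand-rolled spy can record calls so the test asserts what actually happened. The `makeSpy`, `updateUser`, and `notify` names here are invented for illustration, not part of any real library.

```javascript
// A minimal hand-rolled spy: records every call's arguments.
function makeSpy() {
  const fn = (...args) => { fn.calls.push(args); };
  fn.calls = [];
  return fn;
}

// Hypothetical function under review: it should notify after updating.
function updateUser(user, notify) {
  // ...imagine the user record being persisted here...
  notify(user.id, 'updated');
}

const notifySpy = makeSpy();
updateUser({ id: 1, name: 'John' }, notifySpy);

// Assert the side effect happened, with the right arguments,
// not merely that updateUser ran without crashing.
console.assert(notifySpy.calls.length === 1);
console.assert(notifySpy.calls[0][0] === 1 && notifySpy.calls[0][1] === 'updated');
```

In a real suite a mocking utility plays the spy's role, but the review question is the same: does the test assert on the calls, or just trigger them?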

Test Review Checklist

  • Each test has meaningful assertions (not just .toBeDefined())
  • Expected values are verified to be correct
  • All code paths have assertions, not just execution
  • Edge cases have specific assertions
  • Error scenarios verify the error, not just that one occurred
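To sketch the last checklist item, compare asserting that an error occurred with asserting which error occurred. The `parseAge` validator below is invented for illustration.

```javascript
// Hypothetical validator that throws distinct errors.
function parseAge(input) {
  const age = Number(input);
  if (Number.isNaN(age)) throw new Error('age must be a number');
  if (age < 0) throw new Error('age must be non-negative');
  return age;
}

let message = null;
try {
  parseAge('-5');
} catch (err) {
  message = err.message;
}

// Weak: only proves *an* error happened. A bug that throws the
// wrong error would still pass.
console.assert(message !== null);

// Better: proves the *right* error happened.
console.assert(message === 'age must be non-negative');
```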

Section 3: Readability: Code That Passes Tests but Is Hard to Understand

Present examples of working code that has readability issues:

Example 1: Unclear Variable Names

// Passes tests but hard to understand
const d = new Date();
const t = d.getTime();
const x = t - (24 * 60 * 60 * 1000);
const r = items.filter(i => i.c > x);

// BETTER - Self-documenting
const now = new Date();
const currentTimestamp = now.getTime();
const oneDayAgo = currentTimestamp - (24 * 60 * 60 * 1000);
const recentItems = items.filter(item => item.createdAt > oneDayAgo);

Both pass the same tests. Only one is readable.

Example 2: Dense Logic

// Passes tests but requires mental parsing
return users.filter(u => u.a && u.r.includes('admin') && !u.d).map(u => ({...u, p: u.p.filter(p => p.e)}));

// BETTER - Step by step
const activeUsers = users.filter(user => user.isActive);
const adminUsers = activeUsers.filter(user => user.roles.includes('admin'));
const nonDeletedAdmins = adminUsers.filter(user => !user.isDeleted);

return nonDeletedAdmins.map(user => ({
  ...user,
  permissions: user.permissions.filter(permission => permission.enabled)
}));

Example 3: Magic Numbers and Strings

// Passes tests but meaning is unclear
if (status === 3 && retries < 5) {
  setTimeout(retry, 30000);
}

// BETTER - Named constants explain intent
const STATUS_FAILED = 3;
const MAX_RETRIES = 5;
const RETRY_DELAY_MS = 30000;

if (status === STATUS_FAILED && retries < MAX_RETRIES) {
  setTimeout(retry, RETRY_DELAY_MS);
}

Section 4: Maintainability: Code That Passes Tests but Is Hard to Change

Present examples of code that passes tests but will cause problems later:

Example 1: Over-Engineering

// Passes tests but over-engineered for a simple task
interface ConfigurationStrategy {
  getConfig(): Config;
}

class JsonConfigStrategy implements ConfigurationStrategy {
  getConfig() { return JSON.parse(fs.readFileSync('config.json', 'utf8')); }
}

class ConfigFactory {
  static create(type: string): ConfigurationStrategy {
    if (type === 'json') return new JsonConfigStrategy();
    throw new Error('Unknown type');
  }
}

const config = ConfigFactory.create('json').getConfig();

// BETTER - Simple solution for simple problem
const config = JSON.parse(fs.readFileSync('config.json', 'utf8'));

Example 2: Under-Engineering

// Passes tests but duplicates logic that should be shared
function validateUserEmail(email) {
  return email.includes('@') && email.includes('.') && email.length > 5;
}

function validateContactEmail(email) {
  return email.includes('@') && email.includes('.') && email.length > 5;
}

function validateNotificationEmail(email) {
  return email.includes('@') && email.includes('.') && email.length > 5;
}

// BETTER - Single source of truth
function validateEmail(email) {
  return email.includes('@') && email.includes('.') && email.length > 5;
}

Example 3: Hidden Complexity

// Passes tests but nesting makes flow hard to follow
function processOrder(order) {
  if (order.status === 'pending') {
    if (order.items.length > 0) {
      if (order.payment) {
        if (order.payment.verified) {
          if (order.shipping) {
            if (order.shipping.address) {
              return fulfillOrder(order);
            } else { return { error: 'No address' }; }
          } else { return { error: 'No shipping' }; }
        } else { return { error: 'Payment not verified' }; }
      } else { return { error: 'No payment' }; }
    } else { return { error: 'No items' }; }
  } else { return { error: 'Not pending' }; }
}

// BETTER - Early returns flatten the logic
function processOrder(order) {
  if (order.status !== 'pending') {
    return { error: 'Not pending' };
  }
  if (order.items.length === 0) {
    return { error: 'No items' };
  }
  if (!order.payment?.verified) {
    return { error: 'Payment not verified' };
  }
  if (!order.shipping?.address) {
    return { error: 'No shipping address' };
  }

  return fulfillOrder(order);
}

Section 5: Potential Issues in AI-Generated Code

Explain patterns that AI might produce that pass tests but need human review:

Pattern 1: Verbose but Functional

AI sometimes generates more code than necessary because it's pattern-matching from training data rather than finding the simplest solution.

// AI might generate
const result = array.map(item => item).filter(item => item !== null).reduce((acc, item) => {
  acc.push(item);
  return acc;
}, []);

// Human reviewer suggests
const result = array.filter(item => item !== null);

Pattern 2: Technically Correct Names

AI may choose names that are technically accurate but not intuitive in the domain:

// AI-generated (technically accurate)
const temporalBoundaryCheck = date > startDate && date < endDate;

// Human-suggested (domain-intuitive)
const isWithinCampaignPeriod = date > campaignStart && date < campaignEnd;

Pattern 3: Inconsistent Style

AI might mix patterns from different sources:

// Inconsistent: mixes callback and promise styles
getData(id, (err, result) => {
  if (err) throw err;
  processResult(result).then(console.log);
});

// Consistent: uses one style throughout
const result = await getData(id);
const processed = await processResult(result);
console.log(processed);

Section 6: What to Look For During Review

Present the review perspective:

Readability Questions

Ask yourself:

  • Can I understand what this code does without reading every line?
  • Would a new team member understand this in 6 months?
  • Are the names self-explanatory?
  • Is the code flow easy to follow?

Maintainability Questions

Ask yourself:

  • If requirements change, where would I need to modify this?
  • Is the same logic duplicated anywhere?
  • Are there unnecessary abstractions?
  • Are there missing abstractions?

Consistency Questions

Ask yourself:

  • Does this follow our team's coding patterns?
  • Does this use our existing utilities and helpers?
  • Is the style consistent with surrounding code?

Simplicity Questions

Ask yourself:

  • Is there a simpler way to achieve the same result?
  • Are there unnecessary intermediate steps?
  • Could this be achieved with fewer lines without sacrificing clarity?
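For instance, a reviewer asking "is there a simpler way?" might flag a manual loop that a built-in expresses more directly. This is a hypothetical example; both versions behave identically.

```javascript
// Manual accumulation: correct, but more ceremony than needed.
function sumPricesVerbose(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    total = total + items[i].price;
  }
  return total;
}

// Equivalent, shorter, and arguably clearer with reduce.
function sumPrices(items) {
  return items.reduce((total, item) => total + item.price, 0);
}
```

Both pass the same tests, so only review catches the extra ceremony. Note the simplicity questions cut both ways: reduce is only the better choice when it stays readable.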

Section 7: Quick Reference Checklist

Present this checklist for the developer to keep:

During implementation review, check:

  • Variable names describe their purpose clearly
  • Function names indicate what they do
  • Complex logic is broken into understandable steps
  • No magic numbers or strings without explanation
  • Nesting depth is reasonable (≤3 levels preferred)
  • Similar code is not duplicated
  • Abstractions are appropriate (not over/under-engineered)
  • Code follows existing project patterns
  • Uses existing project utilities where applicable
  • Style is consistent with surrounding code

Red flags to watch for:

  • Dense one-liners that need mental parsing
  • Deeply nested conditionals
  • Single-letter variable names (except loop counters)
  • Functions doing too many things
  • Inconsistent naming conventions

During test review, check:

  • Each test has meaningful assertions (not just .toBeDefined())
  • Expected values are manually verified to be correct
  • All code paths have assertions, not just execution
  • Edge cases have specific assertions
  • Error scenarios verify the error details, not just that one occurred

Remember:

100% coverage means the code ran. Review tells you both the code AND the tests are correct. You need both.


Example Inputs

  • Why should I review code that passes all tests?
  • What's wrong with 100% test coverage?
  • Teach me about code review
  • What should I look for in AI-generated code?
  • Why isn't test coverage enough?

Source

https://github.com/Intai/story-flow/blob/main/plugins/junior-flow/skills/learn-review-implementation/SKILL.md

Overview

This interactive module teaches junior developers why human code review remains essential even with 100% unit test coverage. It explains that tests verify non-crashing behavior but not correctness, readability, or long-term maintainability.

How This Skill Works

The skill presents content in sections and prompts the user with AskUserQuestion after each section, offering three options: continue, view an example, or ask a question. It uses realistic code examples and checklists to illustrate the limits of test coverage and the importance of readable, maintainable code.

When to Use It

  • After achieving full unit test coverage, to understand what that coverage does and doesn't prove
  • During code reviews to assess readability and maintainability
  • When test assertions look weak or misaligned with requirements
  • When exploring edge cases and side effects not fully covered by tests
  • During onboarding or junior-senior collaboration to reinforce best practices

Quick Start

  1. Learn what 100% coverage proves and what it doesn't
  2. Review test assertions for accuracy and coverage of behavior
  3. Improve readability by updating names and documenting intent

Best Practices

  • Audit test assertions for meaningfulness and correctness
  • Cross-check tests against acceptance criteria
  • Document and rename unclear variables for readability
  • Ensure all code paths and edge cases are asserted
  • Verify side effects (DB/API) are tested and observable

Example Use Cases

  • A test suite that passes by only checking that code runs, not outcomes
  • A test asserting a result is defined instead of checking the actual value
  • A test with a wrong expected total (e.g., expecting 50 when the correct answer is 60)
  • A test that makes no assertions at all
  • An example showing unclear variable names vs self-documenting names
