
Learn why you should review implementations even with 100% test coverage

npx machina-cli add skill Intai/story-flow/learn-review-implementation --openclaw
Files (1)
SKILL.md
11.0 KB

Why Review Implementation Even with 100% Test Coverage?

Overview

This learning module helps junior developers understand why, as of 2026, human code review remains essential even when AI-generated code has 100% unit test coverage. Tests only verify that code doesn't crash - humans must verify correctness, readability, and maintainability.

Instructions

Present the following content interactively. After each section, use AskUserQuestion to offer 3 options:

  • "Continue to next section"
  • "Show me an example"
  • "I have a question"

Section 1: What 100% Test Coverage Actually Proves (and Doesn't)

Explain what test coverage measures and its limitations:

What 100% test coverage proves:

  • Every line of code is executed during tests
  • The code doesn't crash during execution

That's it. Coverage doesn't verify correctness - test assertions could be inadequate, wrong, or missing entirely.

What 100% test coverage does NOT prove:

  • The code behaves correctly (assertions could be wrong)
  • The code meets functional requirements (assertions could be inadequate)
  • The code is easy to understand and intuitive
  • The code will be easy to modify later
  • The variable and function names are clear
  • The solution is appropriately simple

The key insight:

Tests with 100% coverage answer "Does it crash?" They don't necessarily answer "Does it work?" or "Is it good code?" All three questions matter.
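To make the gap concrete, here is a small sketch (the `applyDiscount` function and its inputs are invented for illustration): a buggy function whose weak test executes every line, achieving 100% coverage, and still passes.

```javascript
// Hypothetical buggy function: it should apply `percent` as a
// percentage discount, but it subtracts the raw number instead.
function applyDiscount(price, percent) {
  return price - percent; // BUG: should be price * (1 - percent / 100)
}

// This check executes every line (100% coverage) and "passes",
// because it only asserts that *some* number came back.
const result = applyDiscount(200, 10);
console.assert(typeof result === 'number'); // passes

// Yet the result is 190, while 10% off 200 should be 180.
// Only an assertion on the actual value would expose the bug.
```

The coverage report looks identical whether the assertion checks the real value or merely the type; only a reviewer reading the assertion can tell the difference.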


Section 2: Why Test Cases Need Review

Since 100% coverage only proves code doesn't crash, the test assertions themselves are critical - and they need human review.

The Problem with Trusting Tests Blindly

// This test "passes" but proves nothing
test('should process order', () => {
  const result = processOrder(mockOrder);
  expect(result).toBeDefined();  // Weak assertion - just checks it exists
});

// This test passes but has wrong expectation
test('should calculate total', () => {
  const total = calculateTotal([10, 20, 30]);
  expect(total).toBe(50);  // Wrong! Should be 60
});

// This test is missing assertions entirely
test('should update user', async () => {
  await updateUser({ id: 1, name: 'John' });
  // No assertions - test passes if it doesn't crash
});

All three tests pass. None prove correctness.

What to Check in Every Test

1. Are assertions actually verifying behavior?

  • expect(result).toBeDefined() - Too weak
  • expect(result).toBe(expectedValue) - Better

2. Are assertions checking the right values?

  • Review expected values manually - don't trust them
  • Trace through the logic yourself

3. Are critical assertions present?

  • Side effects verified (database, API calls)
  • Edge cases covered
  • Error conditions tested

4. Do assertions match requirements?

  • Cross-reference with acceptance criteria
  • Ensure all requirements have corresponding tests
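As a framework-free sketch of the side-effect point above, a minimal hand-rolled spy can record calls so the test asserts what actually happened. The `makeSpy`, `updateUser`, and `notify` names here are invented for illustration, not part of any real library.

```javascript
// A minimal hand-rolled spy: records every call's arguments.
function makeSpy() {
  const fn = (...args) => { fn.calls.push(args); };
  fn.calls = [];
  return fn;
}

// Hypothetical function under review: it should notify after updating.
function updateUser(user, notify) {
  // ...imagine the user record being persisted here...
  notify(user.id, 'updated');
}

const notifySpy = makeSpy();
updateUser({ id: 1, name: 'John' }, notifySpy);

// Assert the side effect happened, with the right arguments,
// not merely that updateUser ran without crashing.
console.assert(notifySpy.calls.length === 1);
console.assert(notifySpy.calls[0][0] === 1 && notifySpy.calls[0][1] === 'updated');
```

In a real suite a mocking utility plays the spy's role, but the review question is the same: does the test assert on the calls, or just trigger them?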

Test Review Checklist

  • Each test has meaningful assertions (not just .toBeDefined())
  • Expected values are verified to be correct
  • All code paths have assertions, not just execution
  • Edge cases have specific assertions
  • Error scenarios verify the error, not just that one occurred
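To sketch the last checklist item, compare asserting that an error occurred with asserting which error occurred. The `parseAge` validator below is invented for illustration.

```javascript
// Hypothetical validator that throws distinct errors.
function parseAge(input) {
  const age = Number(input);
  if (Number.isNaN(age)) throw new Error('age must be a number');
  if (age < 0) throw new Error('age must be non-negative');
  return age;
}

let message = null;
try {
  parseAge('-5');
} catch (err) {
  message = err.message;
}

// Weak: only proves *an* error happened. A bug that throws the
// wrong error would still pass.
console.assert(message !== null);

// Better: proves the *right* error happened.
console.assert(message === 'age must be non-negative');
```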

Section 3: Readability: Code That Passes Tests but Is Hard to Understand

Present examples of working code that has readability issues:

Example 1: Unclear Variable Names

// Passes tests but hard to understand
const d = new Date();
const t = d.getTime();
const x = t - (24 * 60 * 60 * 1000);
const r = items.filter(i => i.c > x);

// BETTER - Self-documenting
const now = new Date();
const currentTimestamp = now.getTime();
const oneDayAgo = currentTimestamp - (24 * 60 * 60 * 1000);
const recentItems = items.filter(item => item.createdAt > oneDayAgo);

Both pass the same tests. Only one is readable.

Example 2: Dense Logic

// Passes tests but requires mental parsing
return users.filter(u => u.a && u.r.includes('admin') && !u.d).map(u => ({...u, p: u.p.filter(p => p.e)}));

// BETTER - Step by step
const activeUsers = users.filter(user => user.isActive);
const adminUsers = activeUsers.filter(user => user.roles.includes('admin'));
const nonDeletedAdmins = adminUsers.filter(user => !user.isDeleted);

return nonDeletedAdmins.map(user => ({
  ...user,
  permissions: user.permissions.filter(permission => permission.enabled)
}));

Example 3: Magic Numbers and Strings

// Passes tests but meaning is unclear
if (status === 3 && retries < 5) {
  setTimeout(retry, 30000);
}

// BETTER - Named constants explain intent
const STATUS_FAILED = 3;
const MAX_RETRIES = 5;
const RETRY_DELAY_MS = 30000;

if (status === STATUS_FAILED && retries < MAX_RETRIES) {
  setTimeout(retry, RETRY_DELAY_MS);
}

Section 4: Maintainability: Code That Passes Tests but Is Hard to Change

Present examples of code that passes tests but will cause problems later:

Example 1: Over-Engineering

// Passes tests but over-engineered for a simple task
interface ConfigurationStrategy {
  getConfig(): Config;
}

class JsonConfigStrategy implements ConfigurationStrategy {
  getConfig() { return JSON.parse(fs.readFileSync('config.json', 'utf8')); }
}

class ConfigFactory {
  static create(type: string): ConfigurationStrategy {
    if (type === 'json') return new JsonConfigStrategy();
    throw new Error('Unknown type');
  }
}

const config = ConfigFactory.create('json').getConfig();

// BETTER - Simple solution for simple problem
const config = JSON.parse(fs.readFileSync('config.json', 'utf8'));

Example 2: Under-Engineering

// Passes tests but duplicates logic that should be shared
function validateUserEmail(email) {
  return email.includes('@') && email.includes('.') && email.length > 5;
}

function validateContactEmail(email) {
  return email.includes('@') && email.includes('.') && email.length > 5;
}

function validateNotificationEmail(email) {
  return email.includes('@') && email.includes('.') && email.length > 5;
}

// BETTER - Single source of truth
function validateEmail(email) {
  return email.includes('@') && email.includes('.') && email.length > 5;
}

Example 3: Hidden Complexity

// Passes tests but nesting makes flow hard to follow
function processOrder(order) {
  if (order.status === 'pending') {
    if (order.items.length > 0) {
      if (order.payment) {
        if (order.payment.verified) {
          if (order.shipping) {
            if (order.shipping.address) {
              return fulfillOrder(order);
            } else { return { error: 'No address' }; }
          } else { return { error: 'No shipping' }; }
        } else { return { error: 'Payment not verified' }; }
      } else { return { error: 'No payment' }; }
    } else { return { error: 'No items' }; }
  } else { return { error: 'Not pending' }; }
}

// BETTER - Early returns flatten the logic
function processOrder(order) {
  if (order.status !== 'pending') {
    return { error: 'Not pending' };
  }
  if (order.items.length === 0) {
    return { error: 'No items' };
  }
  if (!order.payment?.verified) {
    return { error: 'Payment not verified' };
  }
  if (!order.shipping?.address) {
    return { error: 'No shipping address' };
  }

  return fulfillOrder(order);
}

Section 5: Potential Issues in AI-Generated Code

Explain patterns that AI might produce that pass tests but need human review:

Pattern 1: Verbose but Functional

AI sometimes generates more code than necessary because it's pattern-matching from training data rather than finding the simplest solution.

// AI might generate
const result = array.map(item => item).filter(item => item !== null).reduce((acc, item) => {
  acc.push(item);
  return acc;
}, []);

// Human reviewer suggests
const result = array.filter(item => item !== null);

Pattern 2: Technically Correct Names

AI may choose names that are technically accurate but not intuitive in the domain:

// AI-generated (technically accurate)
const temporalBoundaryCheck = date > startDate && date < endDate;

// Human-suggested (domain-intuitive)
const isWithinCampaignPeriod = date > campaignStart && date < campaignEnd;

Pattern 3: Inconsistent Style

AI might mix patterns from different sources:

// Inconsistent: mixes callback and promise styles
getData(id, (err, result) => {
  if (err) throw err;
  processResult(result).then(console.log);
});

// Consistent: uses one style throughout
const result = await getData(id);
const processed = await processResult(result);
console.log(processed);

Section 6: What to Look For During Review

Present the review perspective:

Readability Questions

Ask yourself:

  • Can I understand what this code does without reading every line?
  • Would a new team member understand this in 6 months?
  • Are the names self-explanatory?
  • Is the code flow easy to follow?

Maintainability Questions

Ask yourself:

  • If requirements change, where would I need to modify this?
  • Is the same logic duplicated anywhere?
  • Are there unnecessary abstractions?
  • Are there missing abstractions?

Consistency Questions

Ask yourself:

  • Does this follow our team's coding patterns?
  • Does this use our existing utilities and helpers?
  • Is the style consistent with surrounding code?

Simplicity Questions

Ask yourself:

  • Is there a simpler way to achieve the same result?
  • Are there unnecessary intermediate steps?
  • Could this be achieved with fewer lines without sacrificing clarity?
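For instance, a reviewer asking "is there a simpler way?" might flag a manual loop that a built-in expresses more directly. This is a hypothetical example; both versions behave identically.

```javascript
// Manual accumulation: correct, but more ceremony than needed.
function sumPricesVerbose(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    total = total + items[i].price;
  }
  return total;
}

// Equivalent, shorter, and arguably clearer with reduce.
function sumPrices(items) {
  return items.reduce((total, item) => total + item.price, 0);
}
```

Both pass the same tests, so only review catches the extra ceremony. Note the simplicity questions cut both ways: reduce is only the better choice when it stays readable.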

Section 7: Quick Reference Checklist

Present this checklist for the developer to keep:

During implementation review, check:

  • Variable names describe their purpose clearly
  • Function names indicate what they do
  • Complex logic is broken into understandable steps
  • No magic numbers or strings without explanation
  • Nesting depth is reasonable (≤3 levels preferred)
  • Similar code is not duplicated
  • Abstractions are appropriate (not over/under-engineered)
  • Code follows existing project patterns
  • Uses existing project utilities where applicable
  • Style is consistent with surrounding code

Red flags to watch for:

  • Dense one-liners that need mental parsing
  • Deeply nested conditionals
  • Single-letter variable names (except loop counters)
  • Functions doing too many things
  • Inconsistent naming conventions

During test review, check:

  • Each test has meaningful assertions (not just .toBeDefined())
  • Expected values are manually verified to be correct
  • All code paths have assertions, not just execution
  • Edge cases have specific assertions
  • Error scenarios verify the error details, not just that one occurred

Remember:

100% coverage means the code ran. Review tells you both the code AND the tests are correct. You need both.


Example Inputs

  • Why should I review code that passes all tests?
  • What's wrong with 100% test coverage?
  • Teach me about code review
  • What should I look for in AI-generated code?
  • Why isn't test coverage enough?

Source

https://github.com/Intai/story-flow/blob/main/plugins/junior-flow/skills/learn-review-implementation/SKILL.md

Overview

This interactive module teaches junior developers why human code review remains essential even with 100% unit test coverage. It explains that tests verify non-crashing behavior but not correctness, readability, or long-term maintainability.

How This Skill Works

The skill presents content in sections and prompts the user with AskUserQuestion after each section, offering three options: continue, view an example, or ask a question. It uses realistic code examples and checklists to illustrate the limits of test coverage and the importance of readable, maintainable code.

When to Use It

  • After achieving full unit test coverage, to understand what that coverage does and doesn't prove
  • During code reviews to assess readability and maintainability
  • When test assertions look weak or misaligned with requirements
  • When exploring edge cases and side effects not fully covered by tests
  • During onboarding or junior-senior collaboration to reinforce best practices

Quick Start

  1. Learn what 100% coverage proves and what it doesn't
  2. Review test assertions for accuracy and coverage of behavior
  3. Improve readability by updating names and documenting intent

Best Practices

  • Audit test assertions for meaningfulness and correctness
  • Cross-check tests against acceptance criteria
  • Document and rename unclear variables for readability
  • Ensure all code paths and edge cases are asserted
  • Verify side effects (DB/API) are tested and observable

Example Use Cases

  • A test suite that passes by only checking that code runs, not outcomes
  • A test asserting a result is defined instead of checking the actual value
  • A test with a wrong expected total (e.g., expecting 50 when the correct answer is 60)
  • A test that makes no assertions at all
  • An example showing unclear variable names vs self-documenting names
