Research Test

```
npx machina-cli add skill Axect/magi-researchers/research-test --openclaw
```

Research Test & Visualization Skill
Description
Creates tests for research code and generates publication-quality visualizations. Requires implemented code in src/ of an active research output directory.
Usage
/research-test [path/to/output/dir]
Arguments
`$ARGUMENTS` — Optional path to the research output directory. If not provided, uses the most recent `outputs/*/` directory.
Instructions
Claude-Only Mode
When `--claude-only` is active (passed from the parent `/research` pipeline), all Gemini/Codex MCP calls in this skill are replaced with Claude Agent subagents (`subagent_type: general-purpose`). Subagents use the Read tool to access files instead of `@filepath`. Output filenames remain unchanged; each output starts with `> Source: Claude Agent subagent (claude-only mode, {style})`.
MCP Tool Rules
- Gemini: Use the following model fallback chain. Try each model in order; if a call fails (error, timeout, or model-not-found), retry with the next model:
  - `model: "gemini-3.1-pro-preview"` (preferred)
  - `model: "gemini-2.5-pro"` (fallback)
  - Claude (last resort — skip the Gemini MCP tool and use Claude directly)
- Visualization: Use `matplotlib` with `scienceplots` (`['science', 'nature']` style). Save plots as PNG (300 dpi) and PDF.
- File References: Use `@filepath` in the prompt parameter to pass saved artifacts (e.g., `@plan/research_plan.md`) instead of pasting file content inline. The CLI tools read files directly, preventing context truncation.
- Web Search: Use web search freely whenever testing requires checking best practices, benchmark references, or visualization techniques:
  - Claude: Use the `WebSearch` tool directly
  - Gemini: Add `search: true` to `mcp__gemini-cli__ask-gemini` calls
  - Codex: Add `search: true` to `mcp__codex-cli__ask-codex` calls
  - When to search: testing methodologies, benchmark datasets, expected values for validation, visualization best practices
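The fallback chain above can be sketched in Python. Here `ask_gemini` is a hypothetical stand-in for the actual `mcp__gemini-cli__ask-gemini` call, shown only to illustrate the try-in-order logic:

```python
def ask_with_fallback(ask_gemini, prompt):
    """Try each Gemini model in order; return None to signal a Claude fallback."""
    for model in ["gemini-3.1-pro-preview", "gemini-2.5-pro"]:
        try:
            return ask_gemini(prompt=prompt, model=model)
        except Exception:
            continue  # error, timeout, or model-not-found: try the next model
    return None  # caller should fall back to Claude directly


# Stub for illustration: only the fallback model "exists".
def stub(prompt, model):
    if model != "gemini-2.5-pro":
        raise RuntimeError("model not found")
    return f"[{model}] ok"
```

If every model in the chain fails, the `None` return is the cue to skip the MCP tool entirely, matching the "Claude (last resort)" rule.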
Step 0: Locate Implementation
- Find the active research output directory (from `$ARGUMENTS` or most recent).
- Verify `src/` exists and contains implementation code.
- Read the research plan (`plan/research_plan.md`) for test strategy guidance.
- Read all source files to understand what needs testing.
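A minimal sketch of the directory-resolution logic above (the function name `locate_output_dir` is illustrative, not part of the skill):

```python
from pathlib import Path

def locate_output_dir(arg=None, root="outputs"):
    """Resolve the research output directory: explicit argument wins,
    otherwise pick the most recently modified outputs/*/ directory."""
    if arg:
        return Path(arg)
    candidates = [p for p in Path(root).iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError("no outputs/*/ directory found")
    # most recently modified directory is treated as the active one
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

After resolving the directory, the skill would still verify that `src/` exists before proceeding.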
Step 1: Test Strategy Discussion
- Prepare a summary of the implemented code (key functions, expected behaviors, edge cases).
- Consult Gemini for test suggestions:

```
mcp__gemini-cli__ask-gemini(
  prompt: "Given the following research plan and implementation, suggest comprehensive test cases. Include unit tests, integration tests, and validation tests against known results.\n\nResearch plan:\n@{output_dir}/plan/research_plan.md\n\nSource files:\n@{output_dir}/src/*.py",
  model: "gemini-3.1-pro-preview" // fallback: "gemini-2.5-pro" → Claude
)
```

If `--claude-only`: Replace the Gemini call above with:

```
Agent(
  subagent_type: "general-purpose",
  prompt: "You are a Creative-Divergent test strategist. Think broadly about edge cases, unusual failure modes, and creative validation approaches.

    Use the Read tool to read:
    - {output_dir}/plan/research_plan.md
    - All .py files in {output_dir}/src/

    Suggest comprehensive test cases for this research implementation. Include unit tests, integration tests, and validation tests against known results. Consider edge cases and boundary conditions.

    Return your suggestions as structured text (do not save to a file)."
)
```
- Synthesize the test suggestions into a test plan:
  - Unit tests: Individual function correctness
  - Integration tests: Component interaction
  - Validation tests: Results match known/expected values
  - Edge cases: Boundary conditions, degenerate inputs
- Present the test plan to the user for approval/modifications.
Step 2: Test Implementation
- Create `tests/` directory if it doesn't exist.
- Write test code using `pytest`:
  - `tests/test_*.py` files matching source modules
  - Clear test names describing what's being tested
  - Appropriate assertions with informative failure messages
- Run tests with `uv run pytest tests/ -v` and report results.
- Fix any failing tests (or flag them for user attention if the fix isn't clear).
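As an illustration of this layout, a hypothetical `tests/test_solver.py` might look like the following. `trapezoid` is an invented stand-in for a function that would normally live in `src/`; it is inlined here only so the example is self-contained:

```python
import math

def trapezoid(f, a, b, n=1000):
    """Stand-in for the function under test (would be imported from src/)."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

def test_trapezoid_constant_is_exact():
    # unit test: integrating a constant over [0, 2] gives the interval length
    assert math.isclose(trapezoid(lambda x: 1.0, 0.0, 2.0), 2.0)

def test_trapezoid_validation_against_known_result():
    # validation test: the integral of sin(x) over [0, pi] is exactly 2
    assert math.isclose(trapezoid(math.sin, 0.0, math.pi), 2.0, abs_tol=1e-5)

def test_trapezoid_degenerate_interval():
    # edge case: a zero-width interval must integrate to zero
    assert abs(trapezoid(math.sin, 1.0, 1.0)) < 1e-12
```

Note that each test name states what is being checked, and the three tests map onto the unit/validation/edge-case categories from the test plan.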
Step 3: Visualization
- Create `plots/` directory if it doesn't exist.
- Generate visualizations using matplotlib + scienceplots:

```python
import matplotlib.pyplot as plt
import scienceplots

plt.style.use(['science', 'nature'])
```
- For each visualization:
  - Create the figure with appropriate size for the content
  - Use proper labels, titles, and legends
  - Apply domain-appropriate plot types (see domain template)
  - Save in both formats:

```python
fig.savefig('plots/{name}.png', dpi=300, bbox_inches='tight')
fig.savefig('plots/{name}.pdf', bbox_inches='tight')
```
- Visualization checklist:
- Axes labeled with quantities and units
- Legend present if multiple series
- Appropriate scale (linear, log, etc.)
- Color-blind friendly palette (scienceplots default handles this)
- Saved as both PNG and PDF
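Putting the style setup, checklist, and dual-format save together, one plot-generation helper might look like this sketch. The data, labels, and the `save_plot` name are invented for illustration, and the style call falls back to matplotlib defaults if scienceplots is absent:

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend for scripted figure generation
import matplotlib.pyplot as plt
import numpy as np

try:
    import scienceplots  # noqa: F401
    plt.style.use(["science", "nature"])
except (ImportError, OSError):
    pass  # fall back to matplotlib defaults if scienceplots is missing

def save_plot(name, outdir="plots"):
    """Generate one checklist-compliant figure and save it as PNG + PDF."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    x = np.linspace(0, 10, 200)
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(x, np.exp(-x / 3), label="decay")   # one legend entry per series
    ax.set_xlabel("Time $t$ (s)")               # quantity with units
    ax.set_ylabel("Amplitude (arb. units)")
    ax.set_yscale("log")                        # scale appropriate to the data
    ax.legend()
    png = f"{outdir}/{name}.png"
    pdf = f"{outdir}/{name}.pdf"
    fig.savefig(png, dpi=300, bbox_inches="tight")
    fig.savefig(pdf, bbox_inches="tight")
    plt.close(fig)
    return png, pdf
```

The inline comments mark where each checklist item is satisfied; a real helper would take the data and labels as parameters rather than hard-coding them.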
Step 4: Plot Manifest Generation
After all plots are generated, create plots/plot_manifest.json — a structured registry of every plot for use by the report phase.
- For each plot generated in Step 3, collect the following metadata:

```json
{
  "plot_id": "descriptive_snake_case_name",
  "files": {
    "png": "plots/{name}.png",
    "pdf": "plots/{name}.pdf"
  },
  "description": "One-sentence description of what the plot shows",
  "section_hint": "results | methodology | validation | comparison | testing",
  "caption": "Publication-ready figure caption (2-3 sentences). Include key quantitative findings visible in the plot.",
  "markdown_snippet": "",
  "source_context": "Brief note on what code/data generated this plot"
}
```

- Write the complete manifest as a JSON array to `plots/plot_manifest.json`:

```json
{
  "generated_at": "YYYY-MM-DD HH:MM",
  "total_plots": N,
  "plots": [ ...entries... ]
}
```

- Section hint values (controlled vocabulary):
  - `results` — Key findings, main experimental outcomes
  - `methodology` — Algorithmic diagrams, data pipeline illustrations
  - `validation` — Comparison with known/expected values, error analysis
  - `comparison` — Baseline vs. proposed method, ablation studies
  - `testing` — Test coverage, pass/fail distributions, edge case behavior
- Caption guidelines:
- First sentence: what the plot shows (e.g., "Training loss over 100 epochs for three model variants.")
- Second sentence: key observation (e.g., "The proposed method converges 2.3x faster than the baseline.")
- Third sentence (optional): implication or context.
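A sketch of writing the manifest with Python's standard library; the example entry is hypothetical and only mirrors the schema above:

```python
import json
from datetime import datetime

def write_manifest(entries, path="plots/plot_manifest.json"):
    """Write the plot manifest in the wrapper format described above."""
    manifest = {
        "generated_at": datetime.now().strftime("%Y-%m-%d %H:%M"),
        "total_plots": len(entries),
        "plots": entries,
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest

# Hypothetical entry following the per-plot schema.
entry = {
    "plot_id": "training_loss_curve",
    "files": {"png": "plots/training_loss_curve.png",
              "pdf": "plots/training_loss_curve.pdf"},
    "description": "Training loss over epochs for three model variants.",
    "section_hint": "results",
    "caption": "Training loss over 100 epochs for three model variants. "
               "Captions should state the key observation visible in the plot.",
    "markdown_snippet": "![Training loss](plots/training_loss_curve.png)",
    "source_context": "Generated by the training script's logging hook",
}
```

Keeping the writer in one place makes it easy for the report phase to rely on a stable `generated_at`/`total_plots`/`plots` shape.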
Step 5: Phase Gate
Before presenting to the user, execute a lightweight quality checkpoint:
- Self-assessment: Evaluate the test and visualization outputs against the following checklist and assign a confidence level (`High`, `Medium`, or `Low`):
| Checklist Item | Criteria |
|---|---|
| Coverage adequacy | Key functions and components have corresponding tests; no major gaps |
| Edge case handling | Boundary conditions, degenerate inputs, and error paths are tested |
| Visualization quality | Plots follow scienceplots style, axes labeled, legends present, saved as PNG+PDF |
| Result reproducibility | Tests are deterministic or use fixed seeds; results are consistent across runs |
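The reproducibility row can be enforced with a seed-pinning test like this sketch, where `noisy_measurement` is an invented stand-in for a stochastic routine in `src/`:

```python
import random

def noisy_measurement(n, seed):
    """Stand-in for a stochastic research routine; a fixed seed makes it deterministic."""
    rng = random.Random(seed)  # local RNG: no dependence on global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def test_reproducibility_fixed_seed():
    # the same seed must reproduce identical results across runs
    assert noisy_measurement(5, seed=42) == noisy_measurement(5, seed=42)
```

Using a local `random.Random(seed)` (or `numpy.random.default_rng(seed)`) rather than the global RNG keeps results consistent even when tests run in a different order.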
- Conditional MAGI mini-review (if confidence is `Medium` or `Low`):
  - Send the test results + plot summaries to Codex for a focused review targeting the low-scoring checklist items:

```
mcp__codex-cli__ask-codex(
  prompt: "Review these research tests and visualizations for coverage, edge cases, visualization quality, and reproducibility. Focus on: {low_scoring_items}\n\n@{output_dir}/plots/plot_manifest.json\n@{output_dir}/tests/test_*.py"
)
```

  If `--claude-only`: Replace the Codex call above with:

```
Agent(
  subagent_type: "general-purpose",
  prompt: "You are an Analytical-Convergent test reviewer. Focus on coverage gaps, reproducibility, and practical quality issues.

    Use the Read tool to read:
    - {output_dir}/plots/plot_manifest.json
    - All test_*.py files in {output_dir}/tests/

    Review these research tests and visualizations for coverage, edge cases, visualization quality, and reproducibility. Focus on: {low_scoring_items}

    Return your review as structured text (do not save to a file)."
)
```

- Go/No-Go synthesis: Write a brief gate report with:
- Confidence level and justification
- Checklist scores (pass/partial/fail for each item)
- Issues found (if any) and applied fixes
- Go/No-Go decision
- Save to `tests/phase_gate.md`.
If the gate returns No-Go, fix the identified issues before presenting to the user. Maximum 1 fix iteration.
Step 6: Summary
Present to the user:
- Test results summary (passed/failed/skipped)
- List of generated plots with descriptions
- Phase gate result summary
- Any issues found during testing
- Suggestions for additional tests or visualizations
Notes
- If scienceplots is not installed, run `uv add SciencePlots` first.
- For physics: include comparison with analytical/theoretical results in plots.
- For AI/ML: include learning curves, comparison charts, and metric tables.
- For statistics: include residual diagnostics, Q-Q plots, forest plots, and posterior densities.
- For mathematics: include proof dependency DAGs, phase portraits, and parameter space plots.
- For paper: include argument maps, figure placement plans, and citation network diagrams.
Source
https://github.com/Axect/magi-researchers/blob/main/skills/research-test/SKILL.md
Overview
Automates testing for research code in the active output directory and creates publication-quality visuals. It validates the src/ implementation against the research plan, and produces ready-to-publish figures using matplotlib with scienceplots. This helps ensure correctness, reproducibility, and compelling visuals for papers.
How This Skill Works
The tool locates the target directory from the optional argument or the most recent outputs/*/; it verifies that src/ exists and reads plan/research_plan.md for test guidance. It synthesizes a test plan covering unit, integration, and validation tests, then implements tests under tests/ with pytest. Visual outputs are generated using matplotlib with scienceplots style (nature/science) and saved as PNG (300 dpi) and PDF, with artifacts referenced via @filepath to avoid content truncation.
When to Use It
- When you have an active research output with a src/ folder and plan/research_plan.md and need a formal test plan.
- When you want unit tests for individual functions, integration tests for component interactions, and validation tests against known results.
- When you need publication-ready visualizations generated from your results and sources.
- When you must save visuals and test artifacts as high-quality PNG (300 dpi) and PDF files for manuscripts or reports.
- When you want to integrate with Claude/Gemini/Codex pipelines and need guidance on tool usage and file references.
Quick Start
- Step 1: Run /research-test [path/to/output/dir] to point to the active research output directory.
- Step 2: Review plan/research_plan.md and the src/* files to understand the intended tests.
- Step 3: Implement tests under tests/ with pytest and generate visuals; artifacts are saved as PNG (300 dpi) and PDF using matplotlib scienceplots.
Best Practices
- Verify that the target directory exists and contains src/ and plan/research_plan.md before starting.
- Sketch unit tests for individual functions, then add integration tests to cover module interactions.
- Use pytest and organize tests under tests/ with clear naming like test_<module>.py.
- Configure matplotlib with scienceplots style (nature/science) to match publication standards and save as PNG (300 dpi) and PDF.
- Reference artifacts via @filepath to pass saved plan and source artifacts to prompts and avoid large inlined content.
Example Use Cases
- Bioinformatics project: test a data parser in src/ and generate a figure of expression-level distributions.
- Physics simulation: unit tests for a solver, integration tests for time-step interactions, and a phase-diagram visualization.
- ML research pipeline: tests for data loader and model wrapper; visuals show training/validation curves.
- Chemical kinetics: reproducibility suite with concentration vs time plots saved as PNG and PDF.
- Ecology study: figure panel illustrating experimental results with publication-ready formatting.