Research Test

```
npx machina-cli add skill Axect/magi-researchers/research-test --openclaw
```

Research Test & Visualization Skill
Description
Creates tests for research code and generates publication-quality visualizations. Requires implemented code in src/ of an active research output directory.
Usage
/research-test [path/to/output/dir]
Arguments
`$ARGUMENTS` — Optional path to the research output directory. If not provided, uses the most recent `outputs/*/` directory.
Instructions
Claude-Only Mode
When `--claude-only` is active (passed from the parent `/research` pipeline), all Gemini/Codex MCP calls in this skill are replaced with Claude Agent subagents (`subagent_type: general-purpose`). Subagents use the Read tool to access files instead of `@filepath`. Output filenames remain unchanged; each output starts with `> Source: Claude Agent subagent (claude-only mode, {style})`.
MCP Tool Rules
- Gemini: Use the following model fallback chain. Try each model in order; if a call fails (error, timeout, or model-not-found), retry with the next model:
  - `model: "gemini-3.1-pro-preview"` (preferred)
  - `model: "gemini-2.5-pro"` (fallback)
  - Claude (last resort — skip the Gemini MCP tool and use Claude directly)
- Visualization: Use `matplotlib` with `scienceplots` (`['science', 'nature']` style). Save plots as PNG (300 dpi) and PDF.
- File References: Use `@filepath` in the prompt parameter to pass saved artifacts (e.g., `@plan/research_plan.md`) instead of pasting file content inline. The CLI tools read files directly, preventing context truncation.
- Web Search: Use web search freely whenever testing requires checking best practices, benchmark references, or visualization techniques:
  - Claude: Use the `WebSearch` tool directly
  - Gemini: Add `search: true` to `mcp__gemini-cli__ask-gemini` calls
  - Codex: Add `search: true` to `mcp__codex-cli__ask-codex` calls
  - When to search: testing methodologies, benchmark datasets, expected values for validation, visualization best practices
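The fallback chain above can be sketched in Python. Here `ask_gemini` is a hypothetical stand-in for the actual `mcp__gemini-cli__ask-gemini` call, shown only to illustrate the try-in-order logic:

```python
def ask_with_fallback(ask_gemini, prompt):
    """Try each Gemini model in order; return None to signal a Claude fallback."""
    for model in ["gemini-3.1-pro-preview", "gemini-2.5-pro"]:
        try:
            return ask_gemini(prompt=prompt, model=model)
        except Exception:
            continue  # error, timeout, or model-not-found: try the next model
    return None  # caller should fall back to Claude directly


# Stub for illustration: only the fallback model "exists".
def stub(prompt, model):
    if model != "gemini-2.5-pro":
        raise RuntimeError("model not found")
    return f"[{model}] ok"
```

If every model in the chain fails, the `None` return is the cue to skip the MCP tool entirely, matching the "Claude (last resort)" rule.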
Step 0: Locate Implementation
- Find the active research output directory (from `$ARGUMENTS` or most recent).
- Verify `src/` exists and contains implementation code.
- Read the research plan (`plan/research_plan.md`) for test strategy guidance.
- Read all source files to understand what needs testing.
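A minimal sketch of the directory-resolution logic above (the function name `locate_output_dir` is illustrative, not part of the skill):

```python
from pathlib import Path

def locate_output_dir(arg=None, root="outputs"):
    """Resolve the research output directory: explicit argument wins,
    otherwise pick the most recently modified outputs/*/ directory."""
    if arg:
        return Path(arg)
    candidates = [p for p in Path(root).iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError("no outputs/*/ directory found")
    # most recently modified directory is treated as the active one
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

After resolving the directory, the skill would still verify that `src/` exists before proceeding.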
Step 1: Test Strategy Discussion
- Prepare a summary of the implemented code (key functions, expected behaviors, edge cases).
- Consult Gemini for test suggestions:

```
mcp__gemini-cli__ask-gemini(
  prompt: "Given the following research plan and implementation, suggest comprehensive test cases. Include unit tests, integration tests, and validation tests against known results.\n\nResearch plan:\n@{output_dir}/plan/research_plan.md\n\nSource files:\n@{output_dir}/src/*.py",
  model: "gemini-3.1-pro-preview" // fallback: "gemini-2.5-pro" → Claude
)
```

If `--claude-only`: Replace the Gemini call above with:

```
Agent(
  subagent_type: "general-purpose",
  prompt: "You are a Creative-Divergent test strategist. Think broadly about edge cases, unusual failure modes, and creative validation approaches.

    Use the Read tool to read:
    - {output_dir}/plan/research_plan.md
    - All .py files in {output_dir}/src/

    Suggest comprehensive test cases for this research implementation. Include unit tests, integration tests, and validation tests against known results. Consider edge cases and boundary conditions.

    Return your suggestions as structured text (do not save to a file)."
)
```
- Synthesize the test suggestions into a test plan:
  - Unit tests: Individual function correctness
  - Integration tests: Component interaction
  - Validation tests: Results match known/expected values
  - Edge cases: Boundary conditions, degenerate inputs
- Present the test plan to the user for approval/modifications.
Step 2: Test Implementation
- Create `tests/` directory if it doesn't exist.
- Write test code using `pytest`:
  - `tests/test_*.py` files matching source modules
  - Clear test names describing what's being tested
  - Appropriate assertions with informative failure messages
- Run tests with `uv run pytest tests/ -v` and report results.
- Fix any failing tests (or flag them for user attention if the fix isn't clear).
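As an illustration of this layout, a hypothetical `tests/test_solver.py` might look like the following. `trapezoid` is an invented stand-in for a function that would normally live in `src/`; it is inlined here only so the example is self-contained:

```python
import math

def trapezoid(f, a, b, n=1000):
    """Stand-in for the function under test (would be imported from src/)."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

def test_trapezoid_constant_is_exact():
    # unit test: integrating a constant over [0, 2] gives the interval length
    assert math.isclose(trapezoid(lambda x: 1.0, 0.0, 2.0), 2.0)

def test_trapezoid_validation_against_known_result():
    # validation test: the integral of sin(x) over [0, pi] is exactly 2
    assert math.isclose(trapezoid(math.sin, 0.0, math.pi), 2.0, abs_tol=1e-5)

def test_trapezoid_degenerate_interval():
    # edge case: a zero-width interval must integrate to zero
    assert abs(trapezoid(math.sin, 1.0, 1.0)) < 1e-12
```

Note that each test name states what is being checked, and the three tests map onto the unit/validation/edge-case categories from the test plan.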
Step 3: Visualization
- Create `plots/` directory if it doesn't exist.
- Generate visualizations using matplotlib + scienceplots:

```python
import matplotlib.pyplot as plt
import scienceplots

plt.style.use(['science', 'nature'])
```
- For each visualization:
  - Create the figure with appropriate size for the content
  - Use proper labels, titles, and legends
  - Apply domain-appropriate plot types (see domain template)
  - Save in both formats:

```python
fig.savefig('plots/{name}.png', dpi=300, bbox_inches='tight')
fig.savefig('plots/{name}.pdf', bbox_inches='tight')
```
- Visualization checklist:
- Axes labeled with quantities and units
- Legend present if multiple series
- Appropriate scale (linear, log, etc.)
- Color-blind friendly palette (scienceplots default handles this)
- Saved as both PNG and PDF
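Putting the style setup, checklist, and dual-format save together, one plot-generation helper might look like this sketch. The data, labels, and the `save_plot` name are invented for illustration, and the style call falls back to matplotlib defaults if scienceplots is absent:

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend for scripted figure generation
import matplotlib.pyplot as plt
import numpy as np

try:
    import scienceplots  # noqa: F401
    plt.style.use(["science", "nature"])
except (ImportError, OSError):
    pass  # fall back to matplotlib defaults if scienceplots is missing

def save_plot(name, outdir="plots"):
    """Generate one checklist-compliant figure and save it as PNG + PDF."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    x = np.linspace(0, 10, 200)
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(x, np.exp(-x / 3), label="decay")   # one legend entry per series
    ax.set_xlabel("Time $t$ (s)")               # quantity with units
    ax.set_ylabel("Amplitude (arb. units)")
    ax.set_yscale("log")                        # scale appropriate to the data
    ax.legend()
    png = f"{outdir}/{name}.png"
    pdf = f"{outdir}/{name}.pdf"
    fig.savefig(png, dpi=300, bbox_inches="tight")
    fig.savefig(pdf, bbox_inches="tight")
    plt.close(fig)
    return png, pdf
```

The inline comments mark where each checklist item is satisfied; a real helper would take the data and labels as parameters rather than hard-coding them.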
Step 4: Plot Manifest Generation
After all plots are generated, create plots/plot_manifest.json — a structured registry of every plot for use by the report phase.
- For each plot generated in Step 3, collect the following metadata:

```json
{
  "plot_id": "descriptive_snake_case_name",
  "files": {
    "png": "plots/{name}.png",
    "pdf": "plots/{name}.pdf"
  },
  "description": "One-sentence description of what the plot shows",
  "section_hint": "results | methodology | validation | comparison | testing",
  "caption": "Publication-ready figure caption (2-3 sentences). Include key quantitative findings visible in the plot.",
  "markdown_snippet": "",
  "source_context": "Brief note on what code/data generated this plot"
}
```

- Write the complete manifest as a JSON array to `plots/plot_manifest.json`:

```json
{
  "generated_at": "YYYY-MM-DD HH:MM",
  "total_plots": N,
  "plots": [ ...entries... ]
}
```

- Section hint values (controlled vocabulary):
  - `results` — Key findings, main experimental outcomes
  - `methodology` — Algorithmic diagrams, data pipeline illustrations
  - `validation` — Comparison with known/expected values, error analysis
  - `comparison` — Baseline vs. proposed method, ablation studies
  - `testing` — Test coverage, pass/fail distributions, edge case behavior
- Caption guidelines:
- First sentence: what the plot shows (e.g., "Training loss over 100 epochs for three model variants.")
- Second sentence: key observation (e.g., "The proposed method converges 2.3x faster than the baseline.")
- Third sentence (optional): implication or context.
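A sketch of writing the manifest with Python's standard library; the example entry is hypothetical and only mirrors the schema above:

```python
import json
from datetime import datetime

def write_manifest(entries, path="plots/plot_manifest.json"):
    """Write the plot manifest in the wrapper format described above."""
    manifest = {
        "generated_at": datetime.now().strftime("%Y-%m-%d %H:%M"),
        "total_plots": len(entries),
        "plots": entries,
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest

# Hypothetical entry following the per-plot schema.
entry = {
    "plot_id": "training_loss_curve",
    "files": {"png": "plots/training_loss_curve.png",
              "pdf": "plots/training_loss_curve.pdf"},
    "description": "Training loss over epochs for three model variants.",
    "section_hint": "results",
    "caption": "Training loss over 100 epochs for three model variants. "
               "Captions should state the key observation visible in the plot.",
    "markdown_snippet": "![Training loss](plots/training_loss_curve.png)",
    "source_context": "Generated by the training script's logging hook",
}
```

Keeping the writer in one place makes it easy for the report phase to rely on a stable `generated_at`/`total_plots`/`plots` shape.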
Step 5: Phase Gate
Before presenting to the user, execute a lightweight quality checkpoint:
- Self-assessment: Evaluate the test and visualization outputs against the following checklist and assign a confidence level (`High`, `Medium`, or `Low`):
| Checklist Item | Criteria |
|---|---|
| Coverage adequacy | Key functions and components have corresponding tests; no major gaps |
| Edge case handling | Boundary conditions, degenerate inputs, and error paths are tested |
| Visualization quality | Plots follow scienceplots style, axes labeled, legends present, saved as PNG+PDF |
| Result reproducibility | Tests are deterministic or use fixed seeds; results are consistent across runs |
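The reproducibility row can be enforced with a seed-pinning test like this sketch, where `noisy_measurement` is an invented stand-in for a stochastic routine in `src/`:

```python
import random

def noisy_measurement(n, seed):
    """Stand-in for a stochastic research routine; a fixed seed makes it deterministic."""
    rng = random.Random(seed)  # local RNG: no dependence on global state
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def test_reproducibility_fixed_seed():
    # the same seed must reproduce identical results across runs
    assert noisy_measurement(5, seed=42) == noisy_measurement(5, seed=42)
```

Using a local `random.Random(seed)` (or `numpy.random.default_rng(seed)`) rather than the global RNG keeps results consistent even when tests run in a different order.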
- Conditional MAGI mini-review (if confidence is `Medium` or `Low`):
  - Send the test results + plot summaries to Codex for a focused review targeting the low-scoring checklist items:

```
mcp__codex-cli__ask-codex(
  prompt: "Review these research tests and visualizations for coverage, edge cases, visualization quality, and reproducibility. Focus on: {low_scoring_items}\n\n@{output_dir}/plots/plot_manifest.json\n@{output_dir}/tests/test_*.py"
)
```

  If `--claude-only`: Replace the Codex call above with:

```
Agent(
  subagent_type: "general-purpose",
  prompt: "You are an Analytical-Convergent test reviewer. Focus on coverage gaps, reproducibility, and practical quality issues.

    Use the Read tool to read:
    - {output_dir}/plots/plot_manifest.json
    - All test_*.py files in {output_dir}/tests/

    Review these research tests and visualizations for coverage, edge cases, visualization quality, and reproducibility. Focus on: {low_scoring_items}

    Return your review as structured text (do not save to a file)."
)
```

- Go/No-Go synthesis: Write a brief gate report with:
- Confidence level and justification
- Checklist scores (pass/partial/fail for each item)
- Issues found (if any) and applied fixes
- Go/No-Go decision
- Save to `tests/phase_gate.md`.
If the gate returns No-Go, fix the identified issues before presenting to the user. Maximum 1 fix iteration.
Step 6: Summary
Present to the user:
- Test results summary (passed/failed/skipped)
- List of generated plots with descriptions
- Phase gate result summary
- Any issues found during testing
- Suggestions for additional tests or visualizations
Notes
- If scienceplots is not installed, run `uv add SciencePlots` first.
- For physics: include comparison with analytical/theoretical results in plots.
- For AI/ML: include learning curves, comparison charts, and metric tables.
- For statistics: include residual diagnostics, Q-Q plots, forest plots, and posterior densities.
- For mathematics: include proof dependency DAGs, phase portraits, and parameter space plots.
- For paper: include argument maps, figure placement plans, and citation network diagrams.
Source
https://github.com/Axect/magi-researchers/blob/main/skills/research-test/SKILL.md
Overview
Automates testing for research code in the active output directory and creates publication-quality visuals. It validates the src/ implementation against the research plan, and produces ready-to-publish figures using matplotlib with scienceplots. This helps ensure correctness, reproducibility, and compelling visuals for papers.
How This Skill Works
The tool locates the target directory from the optional argument or the most recent outputs/*/; it verifies that src/ exists and reads plan/research_plan.md for test guidance. It synthesizes a test plan covering unit, integration, and validation tests, then implements tests under tests/ with pytest. Visual outputs are generated using matplotlib with scienceplots style (nature/science) and saved as PNG (300 dpi) and PDF, with artifacts referenced via @filepath to avoid content truncation.
When to Use It
- When you have an active research output with a src/ folder and plan/research_plan.md and need a formal test plan.
- When you want unit tests for individual functions, integration tests for component interactions, and validation tests against known results.
- When you need publication-ready visualizations generated from your results and sources.
- When you must save visuals and test artifacts as high-quality PNG (300 dpi) and PDF files for manuscripts or reports.
- When you want to integrate with Claude/Gemini/Codex pipelines and need guidance on tool usage and file references.
Quick Start
- Step 1: Run /research-test [path/to/output/dir] to point to the active research output directory.
- Step 2: Review plan/research_plan.md and the src/* files to understand the intended tests.
- Step 3: Implement tests under tests/ with pytest and generate visuals; artifacts are saved as PNG (300 dpi) and PDF using matplotlib scienceplots.
Best Practices
- Verify that the target directory exists and contains src/ and plan/research_plan.md before starting.
- Sketch unit tests for individual functions, then add integration tests to cover module interactions.
- Use pytest and organize tests under tests/ with clear naming like test_<module>.py.
- Configure matplotlib with scienceplots style (nature/science) to match publication standards and save as PNG (300 dpi) and PDF.
- Reference artifacts via @filepath to pass saved plan and source artifacts to prompts and avoid large inlined content.
Example Use Cases
- Bioinformatics project: test a data parser in src/ and generate a figure of expression-level distributions.
- Physics simulation: unit tests for a solver, integration tests for time-step interactions, and a phase-diagram visualization.
- ML research pipeline: tests for data loader and model wrapper; visuals show training/validation curves.
- Chemical kinetics: reproducibility suite with concentration vs time plots saved as PNG and PDF.
- Ecology study: figure panel illustrating experimental results with publication-ready formatting.