# Research

```shell
npx machina-cli add skill Axect/magi-researchers/research --openclaw
```

## Research Workflow — Full Pipeline
## Description
Runs the complete research pipeline: Brainstorming → Planning → Implementation → Testing & Visualization → Reporting. Orchestrates all phases with user checkpoints between each.
## Usage

```
/research "research topic" [--domain physics|ai_ml|statistics|mathematics|paper] [--weights '{"novelty":0.4,"feasibility":0.3,"impact":0.3}'] [--depth low|medium|high|max] [--personas N] [--claude-only] [--resume <output_dir>]
```
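For example (topic and flag values here are purely illustrative):

```
/research "anomalous diffusion in active matter" --domain physics --depth high --personas 3
```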
## Arguments

`$ARGUMENTS` — The research topic (required) and optional flags:

- `--domain` — Research domain (physics, ai_ml, statistics, mathematics, paper). Auto-inferred if omitted.
- `--weights` — JSON object of scoring weights for direction ranking. See `/research-brainstorm` for defaults per domain.
- `--depth` — Controls brainstorm review depth (default: `medium`):
  - `low` — Skip cross-review, go directly to synthesis (fastest, lowest cost)
  - `medium` — Standard one-shot cross-review (default)
  - `high` — Cross-review + adversarial debate (most thorough, highest cost)
  - `max` — Hierarchical MAGI-in-MAGI: N persona subagents run parallel mini-MAGI pipelines, then meta-review + adversarial debate across all perspectives (deepest, highest cost)
- `--personas N|auto` — Number of domain-specialist subagents for `--depth max` (default: `auto`, range: 2-5). When `auto`, Claude analyzes the topic to determine the optimal persona count. Ignored for other depth levels.
- `--claude-only` — Replace all Gemini/Codex MCP calls with Claude Agent subagents across all phases. Use when external model endpoints are unavailable. Forwarded to all sub-skills automatically.
- `--resume <output_dir>` — Resume an interrupted pipeline from a previous output directory. See the Resume Protocol below.
## Instructions

### MCP Tool Rules
- **Gemini**: Use the following model fallback chain. Try each model in order; if a call fails (error, timeout, or model-not-found), retry with the next model:
  1. `model: "gemini-3.1-pro-preview"` (preferred)
  2. `model: "gemini-2.5-pro"` (fallback)
  3. Claude (last resort — skip the Gemini MCP tool, use Claude directly)
- **Codex**: Use `mcp__codex-cli__brainstorm` for ideation, `mcp__codex-cli__ask-codex` for analysis/review. If Codex fails 2+ times, fall back to Claude directly.
- **File References**: Use `@filepath` in the prompt parameter to pass saved artifacts (e.g., `@plan/research_plan.md`) instead of pasting file content inline. The CLI tools read files directly, preventing context truncation.
- **Context7**: Use `mcp__plugin_context7_context7__query-docs` for library documentation lookups during implementation.
- **Web Search**: Use web search freely whenever factual verification, recent developments, or literature context would strengthen the analysis:
  - Claude: Use the `WebSearch` tool directly
  - Gemini: Add `search: true` to `mcp__gemini-cli__ask-gemini` or `mcp__gemini-cli__brainstorm` calls
  - Codex: Add `search: true` to `mcp__codex-cli__ask-codex` or `mcp__codex-cli__brainstorm` calls
  - When to search: prior work verification, methodological precedents, dataset/library availability, related approaches, fact-checking quantitative claims
  - Claude-only mode: Claude Agent subagents cannot use WebSearch. The main Claude agent should search beforehand and include findings in the subagent prompt.
- **Visualization**: Use `matplotlib` with `scienceplots` (`['science', 'nature']` style). Save plots as PNG (300 dpi) and PDF.
- **LaTeX**: Use LaTeX for all mathematical expressions in output documents. Inline: `$...$`. Display equations: put each `$$` delimiter on its own line, with the equation on a separate line between them. Never write display equations on a single line as `$$ equation $$`.
When this skill is invoked, execute the full research pipeline below. Always pause for user confirmation between phases.
### Phase Gate Protocol
Phase gates are lightweight quality checkpoints inserted before each USER CHECKPOINT. Each gate follows the same structure but uses phase-specific criteria.
Gate procedure:
1. **Self-assessment**: Claude evaluates the phase output against the checklist below and assigns a confidence level: `High`, `Medium`, or `Low`.
2. **Conditional MAGI mini-review** (if confidence is `Medium` or `Low`):
   - Send the phase output to one MAGI model for a focused review (Gemini for scientific/plan quality, Codex for implementation/test quality)
   - If `--claude-only`: Replace the MAGI model call with a Claude Agent subagent (`subagent_type: general-purpose`). Use the appropriate cognitive style: Creative-Divergent for scientific/plan review, Analytical-Convergent for implementation/test review. The subagent reads files via the `Read` tool instead of `@filepath`.
   - The review prompt should target the specific checklist items that scored low
3. **Go/No-Go synthesis**: Claude writes a brief gate report with:
   - Confidence level and justification
   - Checklist scores (pass/partial/fail for each item)
   - Issues found (if any) and applied fixes
   - Go/No-Go decision
4. Save to `{phase_dir}/phase_gate.md` (e.g., `plan/phase_gate.md`)
Phase-specific checklists:
| Phase | Checklist Items |
|---|---|
| Plan (Phase 2) | Completeness (all objectives addressed), methodology soundness, resource feasibility, risk identification |
| Implement (Phase 3) | Code correctness, alignment with plan, error handling, dependency management |
| Test (Phase 4) | Coverage adequacy, edge case handling, visualization quality, result reproducibility |
If a gate returns No-Go, Claude must fix the identified issues before presenting to the user. Maximum 1 fix iteration per gate.
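The Go/No-Go synthesis in step 3 can be sketched in Python. This is a minimal illustration, not part of the skill itself; the function and field names are invented for the example:

```python
def gate_decision(checklist_scores: dict[str, str]) -> str:
    """Derive a Go/No-Go decision from per-item scores.

    Scores are "pass", "partial", or "fail"; any "fail" blocks the gate.
    """
    if any(score == "fail" for score in checklist_scores.values()):
        return "No-Go"
    return "Go"


def render_gate_report(phase: str, confidence: str,
                       scores: dict[str, str], issues: list[str]) -> str:
    """Render the brief gate report saved to {phase_dir}/phase_gate.md."""
    lines = [
        f"# Phase Gate: {phase}",
        f"- Confidence: {confidence}",
        "- Checklist:",
        *[f"  - {item}: {score}" for item, score in scores.items()],
        "- Issues: " + ("; ".join(issues) if issues else "none"),
        f"- Decision: {gate_decision(scores)}",
    ]
    return "\n".join(lines)
```

Whether a single "partial" score should also trigger a fix iteration is a policy choice left to the reviewer; the sketch only blocks on outright failures.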
### Resume Protocol

When `--resume <output_dir>` is provided, the pipeline skips initialization and infers the current phase from the presence of key artifact files in the output directory. This avoids requiring the LLM to maintain a separate state file — the artifacts themselves serve as checkpoints.
Phase inference rules (evaluated top-down; first match wins):
| Condition | Inference | Action |
|---|---|---|
| `report.md` exists | Pipeline complete | Inform user; offer to re-run specific phases |
| `plots/plot_manifest.json` exists | Phase 4 complete | Resume from Phase 5 (Reporting) |
| `src/` contains at least one `.py` file | Phase 3 complete | Resume from Phase 4 (Testing) |
| `plan/research_plan.md` exists | Phase 2 complete | Resume from Phase 3 (Implementation) |
| `brainstorm/synthesis.md` exists | Phase 1 complete | Resume from Phase 2 (Planning) |
| None of the above | No phase complete | Start from Phase 1 (Brainstorming) |
Resume procedure:

1. Use the `Glob` tool to check for each artifact in the order above.
2. Read the first few lines of the matched artifact to confirm it is non-empty.
3. Announce to the user: detected phase, output directory, and which phase will be resumed.
4. Read the domain template if `brainstorm/personas.md` or `brainstorm/weights.json` exist (to restore context).
5. Continue the pipeline from the inferred phase, skipping all prior phases.
Important: On resume, do NOT re-create the output directory or overwrite existing artifacts. Append or create only the artifacts for the resumed phase and beyond.
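Assuming the table above is the complete decision procedure, the top-down inference can be sketched in Python (the phase labels returned here are illustrative):

```python
from pathlib import Path


def infer_phase(out: Path) -> str:
    """Return the phase to resume from, checking artifacts top-down (first match wins)."""
    if (out / "report.md").exists():
        return "complete"
    if (out / "plots" / "plot_manifest.json").exists():
        return "phase5_reporting"
    if any((out / "src").glob("**/*.py")):
        return "phase4_testing"
    if (out / "plan" / "research_plan.md").exists():
        return "phase3_implementation"
    if (out / "brainstorm" / "synthesis.md").exists():
        return "phase2_planning"
    return "phase1_brainstorming"
```

Note that `Path.glob` on a missing `src/` directory simply yields nothing, so no existence pre-check is needed.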
### Artifact Contract Protocol
Before starting each phase (2 through 5), verify that the required predecessor artifacts exist and are non-empty. Use the Glob and Read tools for deterministic, tool-based validation — do not rely on memory or assumptions.
Required artifacts per phase:
| Phase | Required Artifacts | Validation Method |
|---|---|---|
| Phase 2 (Plan) | brainstorm/synthesis.md | Glob + Read first 3 lines (non-empty) |
| Phase 3 (Implement) | plan/research_plan.md | Glob + Read first 3 lines (non-empty) |
| Phase 4 (Test) | At least one .py file in src/, plan/research_plan.md | Glob for src/**/*.py + Glob for plan |
| Phase 5 (Report) | brainstorm/synthesis.md, plan/research_plan.md, at least one .py in src/, plots/plot_manifest.json | Glob for each path |
On validation failure:
- List the missing or empty artifacts with specific file paths.
- Ask the user: "The following artifacts are missing: [list]. Would you like to (a) go back and generate them, or (b) proceed anyway?"
- If the user chooses to proceed, continue with a warning note in the phase gate.
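As a sketch only, the per-phase check in the table above could be expressed like this (the `REQUIRED` mapping mirrors the table; the helper name is invented for the example):

```python
from pathlib import Path

# Required predecessor artifacts per phase; "*" entries are glob patterns.
REQUIRED = {
    2: ["brainstorm/synthesis.md"],
    3: ["plan/research_plan.md"],
    4: ["src/**/*.py", "plan/research_plan.md"],
    5: ["brainstorm/synthesis.md", "plan/research_plan.md",
        "src/**/*.py", "plots/plot_manifest.json"],
}


def missing_artifacts(out: Path, phase: int) -> list[str]:
    """Return required-artifact patterns with no non-empty match under `out`."""
    missing = []
    for pattern in REQUIRED[phase]:
        matches = list(out.glob(pattern)) if "*" in pattern else [out / pattern]
        if not any(p.exists() and p.stat().st_size > 0 for p in matches):
            missing.append(pattern)
    return missing
```

A non-empty return value would then trigger the user prompt described above.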
### Phase 0: Initialization
1. Parse `$ARGUMENTS`:
   - Extract the research topic (everything before flags, or the entire string)
   - Extract domain if `--domain` is specified; otherwise infer from topic keywords
   - Parse `--resume <output_dir>`: If provided, skip steps 2-6 and execute the Resume Protocol above. The pipeline will jump directly to the inferred phase.
2. Create the output directory structure:

   ```
   outputs/{sanitized_topic}_{YYYYMMDD}_v{N}/
   ├── brainstorm/
   ├── plan/
   ├── src/
   ├── tests/
   └── plots/
   ```

   - Sanitize topic: lowercase, spaces→underscores, remove special chars, max 50 chars
   - Date format: YYYYMMDD (today's date)
   - Version: Glob for `outputs/{sanitized_topic}_{YYYYMMDD}_v*/` and set N = max existing + 1 (start at v1)
3. If the domain has a template in `${CLAUDE_PLUGIN_ROOT}/templates/domains/`, read it as context.
4. Parse `--weights`: If provided, validate and store. If omitted, domain defaults will be used by the brainstorm sub-skill.
5. Parse `--depth`: Accept `low`, `medium` (default), `high`, or `max`.
6. Parse `--personas N|auto`: Accept an integer 2-5 or the string `auto` (default: `auto`). Only used when `--depth max`; ignored otherwise.
   - If `auto`: Defer persona count determination to Phase 1 (brainstorm sub-skill Step 0b), where Claude analyzes the topic to select the optimal N.
   - If an explicit integer is given: Use that value directly.
7. Parse `--claude-only`: Boolean flag (default: `false`). When present, all Gemini/Codex MCP calls across all phases are replaced with Claude Agent subagents. This flag is forwarded to every sub-skill invocation.
8. Announce to the user: topic, domain, output directory, active weights (user-provided or domain default), depth level, persona count (if `max`; show `auto` if no explicit `--personas` was given), and claude-only mode (if active).
### Phase 1: Brainstorming

Execute the `/magi-researchers:research-brainstorm` workflow, forwarding all flags: `--domain`, `--weights`, `--depth`, `--personas` (only when `--depth max`), and `--claude-only` (if active).
**Step 0 & 0b — Setup & Persona Casting:**

- Brainstorm sub-skill parses weights and depth, assigns expert personas
- Outputs: `brainstorm/weights.json`, `brainstorm/personas.md`
- Personas are used in all subsequent phases where MAGI models are invoked

**Step 1a — Parallel Brainstorming (with personas):**

- Gemini and Codex brainstorm independently with assigned personas
- Save to `brainstorm/gemini_ideas.md` and `brainstorm/codex_ideas.md`

**Step 1b — Cross-Check (`--depth medium|high`):**

- Gemini reviews Codex ideas → `brainstorm/gemini_review_of_codex.md`
- Codex reviews Gemini ideas → `brainstorm/codex_review_of_gemini.md`
- Skipped if `--depth low`

**Step 1b+ — Adversarial Debate (`--depth high` only):**

- Top 3 disagreements identified → Round 2 defend/concede/revise
- Outputs: `brainstorm/debate_round2_gemini.md`, `brainstorm/debate_round2_codex.md`

**Steps 1-max-a~d — Hierarchical MAGI-in-MAGI (`--depth max` only):**

- Layer 1: N persona subagents spawned in parallel, each running a mini-MAGI pipeline (Gemini brainstorm + Codex brainstorm + cross-review + conclusion) → `brainstorm/persona_{i}/`
- Layer 2: Gemini and Codex meta-review all N conclusions; Claude extracts the top 3 disagreements; adversarial debate (defend/concede/revise) → `brainstorm/meta_review_*.md`, `brainstorm/meta_debate_*.md`
- Layer 3: Claude produces an enriched synthesis with cross-persona consensus, unique contributions, debate resolution, and emergent insights → `brainstorm/synthesis.md`

**Step 1c — Synthesis (with weighted scoring):**

- Claude reads all documents, applies weights from `weights.json` to rank directions
- Creates `brainstorm/synthesis.md` with weighted scores and debate resolution (if applicable)
- Present top research directions to user
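The weighted ranking in Step 1c amounts to a weighted sum over the scoring criteria; a minimal sketch (criterion scores on a 0-10 scale are an assumption of the example, not specified by the skill):

```python
def rank_directions(directions: dict[str, dict[str, float]],
                    weights: dict[str, float]) -> list[tuple[str, float]]:
    """Rank candidate research directions by weighted score, best first.

    `directions` maps a direction name to per-criterion scores;
    `weights` maps the same criteria (e.g. novelty/feasibility/impact) to weights.
    """
    scored = {
        name: sum(weights[c] * scores[c] for c in weights)
        for name, scores in directions.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```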
>>> USER CHECKPOINT: Confirm research direction <<<
### Phase 2: Research Planning

**Artifact Contract**: Verify `brainstorm/synthesis.md` exists and is non-empty (Glob + Read first 3 lines). On failure, follow the Artifact Contract Protocol above.
**Step 2a — Plan Drafting:**

- Based on the user-confirmed direction from Phase 1:
  - Define specific research objectives
  - Outline the technical approach (algorithms, models, data)
  - Specify implementation requirements (language, libraries, compute)
  - Design the test strategy
  - Plan visualizations
- Save to `plan/research_plan.md`
**Step 2b — Murder Board:**

Submit the research plan to Gemini as a hostile reviewer to stress-test for critical flaws:

```
mcp__gemini-cli__ask-gemini(
  prompt: "You are a hostile but fair research reviewer. Your job is to find fatal flaws in this research plan — flaws that would cause the research to fail, produce invalid results, or waste significant effort.\n\nAttack the plan on these dimensions:\n1. **Methodological flaws**: Are there fundamental errors in the proposed approach?\n2. **Missing assumptions**: What unstated assumptions could invalidate results?\n3. **Scalability risks**: Will this approach break on realistic problem sizes?\n4. **Data/resource gaps**: Are required datasets, compute, or libraries actually available?\n5. **Novelty concerns**: Has this exact approach been tried and failed before?\n\nFor each flaw found, rate its severity (Critical/Major/Minor) and explain the likely failure mode.\n\nResearch Plan:\n@{output_dir}/plan/research_plan.md",
  model: "gemini-3.1-pro-preview"  // fallback chain applies
)
```

Save to `plan/murder_board.md`.
If `--claude-only`: Replace the Gemini murder board call above with a Claude Agent subagent:

```
Agent(
  subagent_type: "general-purpose",
  prompt: "You are an Adversarial-Critical reviewer. Your cognitive style is hostile but fair — you actively search for fatal flaws, unstated assumptions, and failure modes. You are NOT here to be helpful; you are here to break the plan.

    Use the Read tool to read: {output_dir}/plan/research_plan.md

    Attack the plan on these dimensions:
    1. **Methodological flaws**: Are there fundamental errors in the proposed approach?
    2. **Missing assumptions**: What unstated assumptions could invalidate results?
    3. **Scalability risks**: Will this approach break on realistic problem sizes?
    4. **Data/resource gaps**: Are required datasets, compute, or libraries actually available?
    5. **Novelty concerns**: Has this exact approach been tried and failed before?

    For each flaw found, rate its severity (Critical/Major/Minor) and explain the likely failure mode.

    Save to {output_dir}/plan/murder_board.md. Start with:
    > Source: Claude Agent subagent (claude-only mode, Adversarial-Critical)"
)
```
**Step 2c — Mitigations:**

Claude reviews each flaw from the murder board and documents a mitigation strategy:

- For each identified flaw:
  - Acknowledge or dispute the flaw (with reasoning)
  - If acknowledged: propose a concrete mitigation (plan modification, fallback strategy, or scoping change)
  - Rate mitigation confidence: `High`, `Medium`, `Low`
- If any mitigation has `Low` confidence, perform one revision pass: update the relevant section of `research_plan.md` and re-assess.
- Save to `plan/mitigations.md`.
Phase Gate: Plan — Execute the Phase Gate Protocol with Plan checklist.
>>> USER CHECKPOINT: Approve research plan <<<

Present to user: plan summary, murder board highlights, mitigations, and gate result.
### Phase 3: Implementation

**Artifact Contract**: Verify `plan/research_plan.md` exists and is non-empty (Glob + Read first 3 lines). On failure, follow the Artifact Contract Protocol above.
Execute the `/magi-researchers:research-implement` workflow:

- Follow `research_plan.md` to implement code in `src/`
- Use Context7 for library documentation as needed
- Validate basic functionality

**Phase Gate: Implement** — Execute the Phase Gate Protocol with the Implement checklist.

Present the implementation summary to the user, including the gate result.
>>> USER CHECKPOINT: Review implementation <<<
### Phase 4: Testing & Visualization

**Artifact Contract**: Verify at least one `.py` file exists in `src/` (Glob `src/**/*.py`) and `plan/research_plan.md` exists. On failure, follow the Artifact Contract Protocol above.
Execute the `/magi-researchers:research-test` workflow:

**Step 1 — Test Design:**

- Consult Gemini for test case suggestions
- Claude synthesizes the test strategy
- Present to user for approval

**Step 2 — Test Execution:**

- Write tests in `tests/`
- Run with `uv run pytest tests/ -v`
- Report results

**Step 3 — Visualization:**

- Generate plots using matplotlib + scienceplots (`['science', 'nature']` style)
- Save as PNG (300 dpi) and PDF in `plots/`

**Step 4 — Plot Manifest:**

- Generate `plots/plot_manifest.json` with metadata for every plot: plot_id, file paths, description, section_hint, publication-ready caption, and markdown snippet
- This manifest is the primary input for Phase 5's plot integration
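A manifest entry carrying the Step 4 fields might be built like this. The exact JSON schema is not specified by the skill, so the key names and nesting here are an illustrative guess:

```python
import json
from pathlib import Path


def manifest_entry(plot_id: str, stem: str, description: str,
                   section_hint: str, caption: str) -> dict:
    """Build one plot-manifest entry with the metadata listed in Step 4."""
    return {
        "plot_id": plot_id,
        "files": {"png": f"plots/{stem}.png", "pdf": f"plots/{stem}.pdf"},
        "description": description,
        "section_hint": section_hint,
        "caption": caption,
        "markdown": f"![{caption}](plots/{stem}.png)",
    }


def write_manifest(entries: list[dict], path: Path) -> None:
    """Serialize all entries to plot_manifest.json."""
    path.write_text(json.dumps({"plots": entries}, indent=2))
```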
Phase Gate: Test — Execute the Phase Gate Protocol with Test checklist.
>>> USER CHECKPOINT: Review test results and visualizations <<<
### Phase 5: Reporting

**Artifact Contract**: Verify all of the following exist (Glob for each): `brainstorm/synthesis.md`, `plan/research_plan.md`, at least one `.py` in `src/`, and `plots/plot_manifest.json`. On failure, follow the Artifact Contract Protocol above.
Execute the `/magi-researchers:research-report` workflow:

**Step 0 — Gather & Health Check:**

- Inventory all phase outputs
- Read `plots/plot_manifest.json` (create it if missing but plots exist)
- Verify all plot files are present and valid

**Step 1 — Content Assembly & Plot Mapping:**

- Read all phase artifacts
- Map manifest plots to report sections using `section_hint` tags

**Step 2 — Report Draft with Integrated Plots:**

- Generate `report.md` using the `${CLAUDE_PLUGIN_ROOT}/templates/report_template.md` structure
- Actively embed plots from the manifest with contextualizing paragraphs and quantitative observations
- Include all sections: Background, Brainstorming, Methodology, Implementation, Results, Testing, Conclusion

**Step 3 — Gap Detection & Plot Generation Loop (max 2 iterations):**

- Identify claims without supporting figures or results needing visualization
- Generate new plots (write matplotlib code → execute → save to `plots/` → update manifest)
- Re-draft affected sections with the newly generated plots

**Step 4 — MAGI Traceability Review (parallel cross-verification):**

- Inject personas: if `brainstorm/personas.md` exists, prepend the assigned personas to the Gemini and Codex review prompts for continuity
- Gemini (BALTHASAR) reviews for scientific rigor: orphaned claims, orphaned plots, weak claim-evidence links, caption quality
- Codex (CASPER) reviews for visualization quality: missing visualizations, plot-narrative mismatch, encoding improvements, reproducibility gaps
- Claude (MELCHIOR) synthesizes both reviews — consensus issues are high-priority fixes; divergent suggestions are evaluated on merit

**Step 5 — Write Final Report:**

- Save the finalized `report.md`
- Present a summary with plot integration statistics
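The `section_hint` mapping in Step 1 can be sketched as a simple grouping pass. The manifest shape is assumed to match Phase 4's Step 4 fields, and the "Unassigned" bucket is an invention of this example (such plots would feed the Step 3 gap-detection loop):

```python
def map_plots_to_sections(manifest: dict,
                          sections: list[str]) -> dict[str, list[str]]:
    """Group manifest plots by report section via their section_hint tags."""
    mapping: dict[str, list[str]] = {s: [] for s in sections}
    mapping["Unassigned"] = []
    for plot in manifest.get("plots", []):
        hint = plot.get("section_hint")
        key = hint if hint in mapping else "Unassigned"
        mapping[key].append(plot["plot_id"])
    return mapping
```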
>>> USER CHECKPOINT: Review and finalize report <<<
### Completion
Announce completion with:
- Output directory location
- Summary of all generated artifacts
- Any follow-up suggestions (e.g., expand implementation, add more tests, explore alternative directions)
### Notes

- If any phase fails, stop and inform the user with clear error context
- User can skip phases by saying "skip" at any checkpoint
- The workflow state is maintained through artifact files — use `--resume <output_dir>` to resume an interrupted pipeline from the last completed phase
- Each phase skill can also be run independently outside this pipeline
## Source

```shell
git clone https://github.com/Axect/magi-researchers
```

[View on GitHub](https://github.com/Axect/magi-researchers/blob/main/skills/research/SKILL.md)

## Overview
This skill runs the complete research pipeline from brainstorming to final reporting, orchestrating Brainstorming → Planning → Implementation → Testing & Visualization → Reporting. It includes user checkpoints between phases and supports domain options (physics, ai_ml, statistics, mathematics, or paper) with configurable depth and personas.
## How This Skill Works

It orchestrates sequential research phases with phase gates and user confirmations between each stage. Depending on settings, it may route tasks through subagents (including Claude or Gemini) and follows the MCP tool rules, passing saved artifacts to CLI tools via `@filepath` references. It pauses between phases for alignment and quality control, and can resume from a saved output directory using `--resume`.
## When to Use It
- Launch a new project from topic to final report with phase-based checkpoints.
- Tune research direction using domain-specific weights for novelty, feasibility, and impact.
- Apply thorough cross-review or adversarial debate at higher depths to increase rigor.
- Resume an interrupted pipeline quickly from a saved output directory.
- Operate with Claude-only or other model configurations when endpoints are unavailable.
## Quick Start

1. Start a new project with `/research "topic"` plus optional domain/flags (e.g., `--domain physics`).
2. Choose depth (`--depth`) and personas (`--personas`) if needed.
3. Review each phase checkpoint, and use `--resume <output_dir>` to continue if the run was interrupted.
## Best Practices

- Define the domain and weights at the start to guide direction ranking.
- Pass saved artifacts by `@filepath` reference instead of inlining file content, to prevent context loss.
- Choose an appropriate depth and number of personas for the task complexity.
- Review and document decisions at each phase checkpoint to maintain traceability.
- Verify methods and results with web searches and external sources when needed.
## Example Use Cases
- A physics topic on quantum dot simulations from brainstorming to reporting.
- An AI/ML method comparison study with planning, experiments, and visualization.
- A statistics evaluation of experimental design and data analysis plan.
- A mathematics proof exploration with phased validation and visualization.
- A research paper writing workflow including literature review and results reporting.