What does this runbook test?

It runs end-to-end OpenSkills validation across runtime tests and example agents, checking skill discovery, skill activation, and tool calls so that regressions surface before a merge.

How are failures documented?

Failures are reported with what was run, what passed, what failed, a repro command, and a suggested fix, per the Reporting Format.

Where should I run it?

Locally or in CI, using the baseline cargo test command and the example-agent npm checks described in the runbook.

openskills-e2e-test-runbook

npx machina-cli add skill Geeksfino/openskills/openskills-e2e-test-runbook --openclaw


OpenSkills E2E Test Runbook

Use this skill for confidence checks before merging runtime, tooling, or example-agent changes.

Test Layers

Runtime regression tests
Sandbox-focused tests
Example-agent behavior checks
Optional binding smoke tests

Baseline Commands

cargo test -p openskills-runtime

Example-agent checks (from example directories):

npm install
npm start "What skills are available?"
npm start "Create a new skill called 'note-taker'."
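
The baseline commands above can be wrapped so that each one is recorded as passed or failed. The following is a minimal sketch in POSIX sh, not part of the runbook itself: `run_step`, `PASSED`, `FAILED`, and `EXAMPLE_DIR` are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSkills): run one check and
# record its label under PASSED or FAILED based on the exit status.
PASSED=""
FAILED=""

run_step() {
  # $1 is a human-readable label; the remaining arguments are the command.
  label=$1
  shift
  if "$@"; then
    PASSED="${PASSED}${label}
"
  else
    FAILED="${FAILED}${label}
"
  fi
}

# Example wiring, commented out so the block has no side effects.
# EXAMPLE_DIR is a placeholder for one of the example directories.
# run_step "runtime tests" cargo test -p openskills-runtime
# cd "$EXAMPLE_DIR" || exit 1
# run_step "npm install" npm install
# run_step "discovery prompt" npm start "What skills are available?"
# printf 'Passed:\n%sFailed:\n%s' "$PASSED" "$FAILED"
```

An exit status only shows that a command finished without error. Whether the agent actually discovered or activated the right skill still has to be judged from its output.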

E2E Expectations

Skills discover successfully.
Skill activation occurs for matching prompts.
Tool calls align with intent (activation, file reads/writes, script runs).
No unexpected sandbox failures.
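
The discovery expectation can be spot-checked by searching the agent's output for a skill name you expect to see. This is a rough sketch under a strong assumption: the runbook does not document the example agents' output format, so the check below only tests for a case-insensitive substring. `mentions_skill` and `EXPECTED_SKILL` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text on stdin mentions the
# skill name given as $1 (fixed-string, case-insensitive match).
mentions_skill() {
  grep -qiF -- "$1"
}

# Example wiring, commented out; EXPECTED_SKILL is a placeholder
# for a skill name you know should be discoverable.
# npm start "What skills are available?" | mentions_skill "$EXPECTED_SKILL" \
#   || echo "discovery check failed for $EXPECTED_SKILL"
```

A substring match is deliberately loose. It can confirm that a name appears, but it cannot confirm that a skill was activated or that tool calls matched intent.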

Reporting Format

What was run
What passed
What failed
Repro command for each failure
Suggested next fix
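
The five report fields above can be emitted in a fixed order so that reports stay comparable between runs. The sketch below is one possible layout, not a format the runbook prescribes; `render_report` is a hypothetical helper and the field labels are taken from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print one report entry in the order used by
# the Reporting Format. Arguments, in order: what was run, what
# passed, what failed, repro command, suggested next fix.
render_report() {
  printf 'What was run: %s\n' "$1"
  printf 'What passed: %s\n' "$2"
  printf 'What failed: %s\n' "$3"
  printf 'Repro command: %s\n' "$4"
  printf 'Suggested next fix: %s\n' "$5"
}

# Example call with placeholder values, commented out:
# render_report "<commands run>" "<passing checks>" "<failing checks>" \
#   "<command that reproduces the failure>" "<proposed fix>"
```

When several checks fail, calling the helper once per failure keeps each repro command next to the failure it reproduces.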

Source

https://github.com/Geeksfino/openskills/blob/main/.cursor/skills/openskills-e2e-test-runbook/SKILL.md

Overview

This runbook performs end-to-end OpenSkills validation across runtime tests and example agents to surface regressions before merges. It checks skill discovery, activation behavior, and tool-call accuracy, then reports outcomes, repro commands, and suggested fixes.

How This Skill Works

The runbook covers four test layers: 1) runtime regression tests, 2) sandbox-focused tests, 3) example-agent behavior checks, and 4) optional binding smoke tests. The baseline commands are cargo test -p openskills-runtime for the runtime, plus npm install and npm start with a test prompt, run from the example directories. Results are written up in a standard report that lists what was run, what passed, what failed, a repro command for each failure, and a suggested next fix.

When to Use It

Before merging runtime, tooling, or example-agent changes.
During CI to catch regressions early.
To validate example-agent prompts and corresponding activation behavior.
To verify tool calls align with intent (activation, file reads/writes, scripts).
To investigate sandbox or runtime failures and isolate regressions.

Quick Start

Step 1: Run baseline runtime tests: cargo test -p openskills-runtime.
Step 2: Run example-agent checks from an example directory: npm install; npm start "What skills are available?"; npm start "Create a new skill called 'note-taker'."
Step 3: Write up the results in the Reporting Format, including a repro command for each failure.

Best Practices

Run all test layers locally or in CI to ensure coverage.
Capture detailed repro commands for any failure.
Verify that skills are discovered and activate only for matching prompts.
Inspect tool call sequences to confirm intended actions (activation, file I/O, scripts).
Review sandbox stability and binding smoke tests for regressions.

Example Use Cases

Running cargo test -p openskills-runtime to validate runtime changes.
Using npm install and npm start with prompts such as "What skills are available?"
Checking activation behavior for a given prompt to ensure correct skill triggering.
Inspecting tool-call logs to confirm they match the intended actions (activation, reads, writes, scripts).
Documenting failures with a repro command and suggested fix in the Reporting Format.
