eval-integration
Scannednpx machina-cli add skill Ibrahim-3d/conductor-orchestrator-superpowers/eval-integration --openclawIntegration Evaluator Agent
Specialized evaluator for tracks that integrate external services — Supabase, Stripe, Gemini, or any third-party API.
When This Evaluator Is Used
Dispatched by loop-execution-evaluator when the track involves:
- Authentication or database integration
- Payment processing integration
- AI/ML API integration
- Any external API connection
Inputs Required
- Track's
spec.mdandplan.md - Environment config (
.env.example, env variable documentation) - API client code (
src/lib/) - Database schema (if Supabase)
- Webhook handlers (if Stripe)
Evaluation Passes (6 checks)
Pass 1: API Contract Verification
| Check | What to Look For |
|---|---|
| Request shapes | API calls send correct payload structure |
| Response handling | Responses parsed with correct types |
| Error responses | 4xx/5xx errors handled with user-friendly messaging |
| Rate limits | Rate limit handling present (retry, backoff, queue) |
| Timeout | Reasonable timeout set on API calls |
| Auth headers | Bearer token / API key sent correctly |
### API Contracts: PASS ✅ / FAIL ❌
- Endpoints verified: [count]
- Missing error handling: [list]
- Type mismatches: [list]
Pass 2: Authentication Flow
| Check | What to Look For |
|---|---|
| Sign up | Creates user, stores token, redirects to dashboard |
| Sign in | Validates credentials, stores token, redirects |
| Sign out | Clears token, redirects to home |
| Token refresh | Handles expired tokens (refresh or re-auth) |
| Protected routes | Unauthenticated users redirected to login |
| OAuth | Third-party login flow (if applicable) |
### Auth Flow: PASS ✅ / FAIL ❌
- Flows tested: [sign up / sign in / sign out / token refresh]
- Broken flows: [list]
- Token handling: [correct / issues]
Pass 3: Data Persistence & Schema Hygiene
CRUD Operations:
| Check | What to Look For |
|---|---|
| Create | Data saved correctly to database/storage |
| Read | Data retrieved and rendered correctly |
| Update | Changes persisted on save |
| Delete | Records removed, UI reflects deletion |
| Relationships | Foreign keys / joins working correctly |
| Storage | File uploads stored and retrievable (if applicable) |
Database Schema Quality (MANDATORY for all new tables/migrations):
| Check | Requirement | Why |
|---|---|---|
| Timestamps | created_at, updated_at on ALL mutable tables | Debugging, audit trail, cache invalidation |
| Primary keys | UUID with default OR auto-increment | Data uniqueness |
| Foreign keys | Explicit cascade rules (on delete cascade) | Prevent orphaned data |
| Indexes | Index ALL foreign keys | Query performance |
| Null constraints | New columns nullable OR have defaults | Backward compatibility |
| Unique constraints | Composite uniques where needed | Data integrity |
| Version history | JSONB column for flexible history | Schema evolution |
Schema Anti-Patterns to Flag:
-- ❌ BAD: No timestamps
create table brands (
id uuid primary key,
name text
);
-- ✅ GOOD: Complete schema
create table brands (
id uuid primary key default gen_random_uuid(),
name text not null,
created_at timestamptz default now() not null,
updated_at timestamptz default now() not null
);
-- ❌ BAD: Foreign key without cascade
brand_id uuid references brands(id)
-- ✅ GOOD: Explicit cascade
brand_id uuid references brands(id) on delete cascade not null
-- ❌ BAD: New required column (breaks existing data)
alter table assets add column image_url text not null;
-- ✅ GOOD: Nullable or has default
alter table assets add column locked boolean default false;
### Data Persistence & Schema: PASS ✅ / FAIL ❌
- CRUD operations: [which work / which fail]
- Data integrity: [any corruption or loss]
- Storage: [files accessible / issues]
- **Tables missing timestamps: [count] — [list]**
- **Foreign keys without indexes: [count] — [list]**
- **Migrations without defaults: [count] — [list]**
- **Orphaned data risk: [YES/NO — describe]**
Pass 4: Error Recovery
| Check | What to Look For |
|---|---|
| Network failure | Offline/timeout → user sees error, can retry |
| Invalid data | Malformed responses → graceful fallback |
| Auth failure | Expired token → redirect to login, not crash |
| Payment failure | Declined card → clear message, can retry |
| API down | Service unavailable → error state, not blank screen |
| Partial failure | One API fails, others still work |
### Error Recovery: PASS ✅ / FAIL ❌
- Scenarios tested: [list]
- Unhandled failures: [list]
- User messaging: [clear / missing]
Pass 5: Environment Configuration
| Check | What to Look For |
|---|---|
.env.example | All required variables documented |
| No secrets in code | No API keys, tokens, or passwords in source files |
| Environment switching | Dev/staging/prod configs separate |
| Missing vars | App handles missing env vars gracefully (error, not crash) |
### Environment: PASS ✅ / FAIL ❌
- Variables documented: [YES/NO]
- Secrets in code: [NONE / list files with exposed secrets]
- Missing var handling: [graceful / crashes]
Pass 6: End-to-End Flow
Walk through the complete user journey that involves this integration:
| Flow | Steps to Verify |
|---|---|
| Auth flow | Landing → Sign Up → Verify → Dashboard |
| Payment flow | Select plan → Checkout → Payment → Confirmation |
| Generation flow | Form → Generate → View → Download |
### E2E Flow: PASS ✅ / FAIL ❌
- Flow tested: [describe]
- Steps completed: [X]/[Y]
- Broken at step: [which step, if any]
Verdict Template
## Integration Evaluation Report
**Track**: [track-id]
**Evaluator**: eval-integration
**Date**: [YYYY-MM-DD]
**Service**: [Supabase/Stripe/Gemini/etc.]
### Results
| Pass | Status | Issues |
|------|--------|--------|
| 1. API Contracts | PASS/FAIL | [details] |
| 2. Auth Flow | PASS/FAIL | [details] |
| 3. Data Persistence | PASS/FAIL | [details] |
| 4. Error Recovery | PASS/FAIL | [details] |
| 5. Environment | PASS/FAIL | [details] |
| 6. E2E Flow | PASS/FAIL | [details] |
### Verdict: PASS ✅ / FAIL ❌
[If FAIL, list specific fix actions for loop-fixer]
Handoff
- PASS → Return to
loop-execution-evaluator→ Conductor marks complete - FAIL → Return to
loop-execution-evaluator→ Conductor dispatchesloop-fixer
Source
git clone https://github.com/Ibrahim-3d/conductor-orchestrator-superpowers/blob/master/skills/eval-integration/SKILL.mdView on GitHub Overview
Evaluates tracks that connect to external services such as Supabase, Stripe, Gemini, or other APIs. It validates API contracts, authentication flows, data persistence, error handling, environment configuration, and end to end flow integrity. It is invoked by the loop-execution-evaluator for integration, auth, payments, or api track types.
How This Skill Works
The evaluator requires track spec and plan, environment config, API client code, and necessary database schema and webhook handlers. It runs through six evaluation passes including API contract verification, authentication flow checks, and data persistence schema hygiene, then returns clear pass or fail results with actionable guidance.
When to Use It
- Evaluating a track that integrates Supabase auth and database
- Verifying Stripe payment processing and webhook flows
- Assessing Gemini API or other third party API integrations
- Testing API client interactions with rate limits and retries
- Ensuring end-to-end flow integrity for external service tracks
Quick Start
- Step 1: Gather spec.md, plan.md, environment docs, API client code, and database schema
- Step 2: Run the eval-integration process for a track with type integration, auth, payments, or api
- Step 3: Review PASS/FAIL output, fix API contracts, auth flows, persistence, or config, and re-run
Best Practices
- Provide complete environment configs and docs (.env.example and docs)
- Include full API contract tests covering requests, responses, errors, and timeouts
- Thoroughly test authentication flows including sign up, sign in, sign out, and token refresh
- Enforce data persistence schema hygiene with timestamps, keys, cascades, and indexes
- Validate webhook and event flows with realistic external triggers
Example Use Cases
- Evaluating a Supabase backed user auth and profile store
- Verifying a Stripe checkout flow from initiation to webhook processing
- Testing a Gemini API integration for market data and trading signals
- Checking a multi API integration with retry and backoff logic
- End-to-end integration track testing with a third party CRM API