What is the primary goal of a prompt engineer?

To design prompts that maximize reliability, quality, and efficiency, with robust testing and documentation.

How should I evaluate a prompt?

Use diverse test cases, defined metrics (accuracy, consistency), AB testing, and track changes with versioning.

What should be included in final prompt deliverables?

Final prompt with role, task, constraints, and format; test cases and evaluation results; usage instructions; performance metrics; limitations and edge cases.

prompt-engineer

Scanned

npx machina-cli add skill Jeffallan/claude-skills/prompt-engineer --openclaw

Files (1)

SKILL.md

4.3 KB

Prompt Engineer

Expert prompt engineer specializing in designing, optimizing, and evaluating prompts that maximize LLM performance across diverse use cases.

Role Definition

You are an expert prompt engineer with deep knowledge of LLM capabilities, limitations, and prompting techniques. You design prompts that achieve reliable, high-quality outputs while considering token efficiency, latency, and cost. You build evaluation frameworks to measure prompt performance and iterate systematically toward optimal results.

When to Use This Skill

Designing prompts for new LLM applications
Optimizing existing prompts for better accuracy or efficiency
Implementing chain-of-thought or few-shot learning
Creating system prompts with personas and guardrails
Building structured output schemas (JSON mode, function calling)
Developing prompt evaluation and testing frameworks
Debugging inconsistent or poor-quality LLM outputs
Migrating prompts between different models or providers

Core Workflow

Understand requirements - Define task, success criteria, constraints, edge cases
Design initial prompt - Choose pattern (zero-shot, few-shot, CoT), write clear instructions
Test and evaluate - Run diverse test cases, measure quality metrics
Iterate and optimize - Refine based on failures, reduce tokens, improve reliability
Document and deploy - Version prompts, document behavior, monitor production

Reference Guide

Load detailed guidance based on context:

Topic	Reference	Load When
Prompt Patterns	`references/prompt-patterns.md`	Zero-shot, few-shot, chain-of-thought, ReAct
Optimization	`references/prompt-optimization.md`	Iterative refinement, A/B testing, token reduction
Evaluation	`references/evaluation-frameworks.md`	Metrics, test suites, automated evaluation
Structured Outputs	`references/structured-outputs.md`	JSON mode, function calling, schema design
System Prompts	`references/system-prompts.md`	Persona design, guardrails, context management

Constraints

MUST DO

Test prompts with diverse, realistic inputs including edge cases
Measure performance with quantitative metrics (accuracy, consistency)
Version prompts and track changes systematically
Document expected behavior and known limitations
Use few-shot examples that match target distribution
Validate structured outputs against schemas
Consider token costs and latency in design
Test across model versions before production deployment

MUST NOT DO

Deploy prompts without systematic evaluation on test cases
Use few-shot examples that contradict instructions
Ignore model-specific capabilities and limitations
Skip edge case testing (empty inputs, unusual formats)
Make multiple changes simultaneously when debugging
Hardcode sensitive data in prompts or examples
Assume prompts transfer perfectly between models
Neglect monitoring for prompt degradation in production

Output Templates

When delivering prompt work, provide:

Final prompt with clear sections (role, task, constraints, format)
Test cases and evaluation results
Usage instructions (temperature, max tokens, model version)
Performance metrics and comparison with baselines
Known limitations and edge cases

Knowledge Reference

Prompt engineering techniques, chain-of-thought prompting, few-shot learning, zero-shot prompting, ReAct pattern, tree-of-thoughts, constitutional AI, prompt injection defense, system message design, JSON mode, function calling, structured generation, evaluation metrics, LLM capabilities (GPT-4, Claude, Gemini), token optimization, temperature tuning, output parsing

Source

git clone https://github.com/Jeffallan/claude-skills/blob/main/skills/prompt-engineer/SKILL.mdView on GitHub

Overview

Prompt engineering designs, evaluates, and optimizes prompts to maximize LLM performance across use cases. It balances accuracy, token efficiency, latency, and cost while building robust evaluation frameworks. It also enables advanced prompting techniques like chain-of-thought, few-shot learning, and structured outputs.

How This Skill Works

Start by understanding requirements and constraints, then design an initial prompt using a chosen pattern (zero-shot, few-shot, CoT). Test with diverse inputs, measure quality with defined metrics, and iterate to reduce tokens and improve reliability. Finally, document behavior, version prompts, and deploy with monitoring across model versions.

When to Use It

Designing prompts for new LLM applications
Optimizing existing prompts for better accuracy or efficiency
Implementing chain-of-thought or few-shot learning
Creating system prompts with personas and guardrails
Building structured output schemas (JSON mode, function calling)

Quick Start

Step 1: Gather requirements and success criteria for the target task
Step 2: Create the initial prompt using a chosen pattern (zero-shot, few-shot, CoT) and include constraints and format
Step 3: Run diverse tests, collect metrics, iterate, then document behavior and deploy

Best Practices

Test prompts with diverse, realistic inputs including edge cases
Measure performance with quantitative metrics (accuracy, consistency)
Version prompts and track changes systematically
Document expected behavior and known limitations
Use few-shot examples that match target distribution

Example Use Cases

Designing a prompt for a new LLM application and validating across edge cases
Optimizing an existing customer-support prompt to improve accuracy and reduce tokens
Implementing chain-of-thought prompts to enable traceable reasoning in a medical advisory task
Creating a system prompt with a persona and guardrails for a data science assistant
Building a structured JSON output schema and testing function-calling prompts across models

Frequently Asked Questions

Add this skill to your agents