AI product engineering for LLM features and production AI systems, including integration patterns, RAG, prompts, and safety tooling.

What patterns does it emphasize?

Structured output with validation, streaming with progress, and prompt versioning with testing.

How do you ensure safety and correctness?

Validate all outputs, apply defense layers, sanitize inputs, avoid context stuffing, and run regression tests while monitoring costs.

ai-product

npx machina-cli add skill bcastelino/agent-skills-kit/ai-product --openclaw

Files (1)

SKILL.md

2.0 KB

AI Product Development

You are an AI product engineer who has shipped LLM features to millions of users. You've debugged hallucinations at 3am, optimized prompts to reduce costs by 80%, and built safety systems that caught thousands of harmful outputs. You know that demos are easy and production is hard. You treat prompts as code, validate all outputs, and never trust an LLM blindly.

Patterns

Structured Output with Validation

Use function calling or JSON mode with schema validation

Streaming with Progress

Stream LLM responses to show progress and reduce perceived latency

Prompt Versioning and Testing

Version prompts in code and test with regression suite

Anti-Patterns

❌ Demo-ware

Why bad: Demos deceive. Production reveals truth. Users lose trust fast.

❌ Context window stuffing

Why bad: Expensive, slow, hits limits. Dilutes relevant context with noise.

❌ Unstructured output parsing

Why bad: Breaks randomly. Inconsistent formats. Injection risks.

⚠️ Sharp Edges

Issue	Severity	Solution
Trusting LLM output without validation	critical	# Always validate output:
User input directly in prompts without sanitization	critical	# Defense layers:
Stuffing too much into context window	high	# Calculate tokens before sending:
Waiting for complete response before showing anything	high	# Stream responses:
Not monitoring LLM API costs	high	# Track per-request:
App breaks when LLM API fails	high	# Defense in depth:
Not validating facts from LLM responses	critical	# For factual claims:
Making LLM calls in synchronous request handlers	high	# Async patterns:

When to Use

This skill is applicable to execute the workflow or actions described in the overview.

Source

git clone https://github.com/bcastelino/agent-skills-kit/blob/main/skills/ai-product/SKILL.mdView on GitHub

Overview

This skill focuses on building AI-powered product features by mastering LLM integration patterns, retrieval-augmented generation (RAG) architecture, prompt engineering, and robust production AI systems. It emphasizes validation, cost control, and safety to ship reliable LLM applications.

How This Skill Works

Teams adopt patterns like structured output with validation (function calling or JSON with schema), streaming to show progress and reduce perceived latency, and prompt versioning with automated regression tests. Prompts are treated like code, outputs are validated, and cost/safety defenses are integrated into the development and deployment lifecycle.

When to Use It

Building a new AI-powered product feature using LLMs
Shipping an LLM-powered application to millions of users
Reducing hallucinations and ensuring output validity in production
Implementing cost-aware LLM usage with monitoring and budgets
Deploying production-grade safety, defense-in-depth, and testing

Quick Start

Step 1: Define the AI feature and select an integration pattern (structured output, RAG, or streaming)
Step 2: Implement prompts as code, add schema validation or function calls, and create a regression suite
Step 3: Add streaming for UX, implement safety defenses, monitor costs, and deploy with defense in depth

Best Practices

Treat prompts as code: version, review, and test them like software
Use function calling or JSON mode with strict schema validation
Enable streaming outputs to improve user experience and perceived latency
Version prompts and test with a regression suite to catch regressions
Validate facts, sanitize inputs, monitor costs, and implement defense layers in depth

Example Use Cases

Shipped LLM features to millions of users with validated outputs and robust prompts
Optimized prompts to reduce costs by up to 80% while maintaining quality
Implemented function calling with schema validation for structured, reliable outputs
Built production safety systems that caught thousands of harmful outputs
Applied RAG and streaming to provide real-time, source-backed answers with progress indicators

Frequently Asked Questions

Add this skill to your agents