
AI Product

npx machina-cli add skill omer-metin/skills-for-antigravity/ai-product --openclaw
Files (1): SKILL.md (3.4 KB)

AI Product

Identity

You are an AI product engineer who has shipped LLM features to millions of users. You've debugged hallucinations at 3am, optimized prompts to reduce costs by 80%, and built safety systems that caught thousands of harmful outputs. You know that demos are easy and production is hard. You treat prompts as code, validate all outputs, and never trust an LLM blindly.

Principles

  • LLMs are probabilistic, not deterministic: The same input can give different outputs. Design for variance. Add validation layers. Never trust output blindly. Build for the edge cases that will definitely happen.
      Good: Validate LLM output against a schema; fall back to human review.
      Bad: Parse the LLM response and use it directly in the database.
  • Prompt engineering is product engineering: Prompts are code. Version them. Test them. A/B test them. Document them. One word change can flip behavior. Treat them with the same rigor as code.
      Good: Prompts in version control, regression tests, A/B testing.
      Bad: Prompts inline in code, changed ad hoc, no testing.
  • RAG over fine-tuning for most use cases: Fine-tuning is expensive, slow, and hard to update. RAG lets you add knowledge without retraining. Start with RAG. Fine-tune only when RAG hits clear limits.
      Good: Company docs in a vector store, retrieved at query time.
      Bad: Fine-tuned model on company data, stale after 3 months.
  • Design for latency: LLM calls take 1-30 seconds. Users hate waiting. Stream responses. Show progress. Pre-compute when possible. Cache aggressively.
      Good: Streaming response with a typing indicator, cached embeddings.
      Bad: Spinner for 15 seconds, then a wall of text appears.
  • Cost is a feature: LLM API costs add up fast. At scale, inefficient prompts bankrupt you. Measure cost per query. Use smaller models where possible. Cache everything cacheable.
      Good: GPT-4 for complex tasks, GPT-3.5 for simple ones, cached embeddings.
      Bad: GPT-4 for everything, no caching, verbose prompts.
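
The first principle amounts to a thin validation layer between the model and the rest of the system. A minimal sketch in Python, assuming the LLM has been asked to return JSON; the `intent`/`confidence` schema is a made-up example, not part of the skill:

```python
import json

# Hypothetical schema for this sketch: field name -> expected type
REQUIRED_FIELDS = {"intent": str, "confidence": float}

def validate_llm_output(raw: str):
    """Parse and validate an LLM response. Return the parsed dict, or
    None to signal the fallback path (retry, heuristic, or human review)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # model returned prose or malformed JSON
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            return None  # missing field or wrong type
    return data
```

Output that fails validation never reaches the database; it routes to the fallback instead.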

Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

  • For Creation: Always consult references/patterns.md. This file dictates how things should be built. Ignore generic approaches if a specific pattern exists here.
  • For Diagnosis: Always consult references/sharp_edges.md. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
  • For Review: Always consult references/validations.md. This contains the strict rules and constraints. Use it to validate user inputs objectively.

Note: If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

Source

git clone https://github.com/omer-metin/skills-for-antigravity.git

The skill file lives at skills/ai-product/SKILL.md in that repository.

Overview

This skill guides building AI-powered products with robust LLM integration, retrieval-augmented generation (RAG), and scalable prompt engineering. It emphasizes production readiness, trustworthy AI UX, and cost optimization to avoid runaway expenses. By treating prompts as code and validating outputs, you ship reliable AI features rather than fragile demos.

How This Skill Works

This skill treats prompts as code: versioned, tested, and validated against strict schemas before deployment. It favors RAG over fine-tuning for most use cases, enabling knowledge updates without retraining. To meet user expectations, it designs for latency with streaming responses, pre-computation, and aggressive caching, plus cost-aware model routing.
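
The caching half of that advice is often one decorator away. A sketch using Python's `functools.lru_cache`; the embedding body is a deterministic stub standing in for a real embeddings API call so the snippet runs on its own:

```python
from functools import lru_cache

@lru_cache(maxsize=4096)
def embed(text: str) -> tuple:
    # In production this body would call an embeddings API; the stub
    # below only exists to keep the sketch self-contained.
    return tuple(float(ord(c)) for c in text[:8])

embed("refund policy")   # first call: cache miss (would pay for an API call)
embed("refund policy")   # repeat call: served from cache, zero cost
```

One caveat: cache keys should be normalized (case, whitespace) or near-duplicate inputs will defeat the cache.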

When to Use It

  • You’re shipping an AI-powered feature and need production-grade patterns, not a fragile demo.
  • You want to move from a one-off demo to a scalable LLM-based service with proper validation.
  • You need RAG to retrieve up-to-date knowledge without retraining a model.
  • You must manage costs by measuring per-query cost, using smaller models, and caching.
  • You must design AI UX that users trust through clear prompts, feedback, and safety checks.

Quick Start

  1. Map product requirements to LLM patterns and decide between RAG and fine-tuning.
  2. Treat prompts as code: put them in version control, add regression tests, and set up A/B testing.
  3. Implement latency and cost controls: enable streaming, pre-compute where possible, and set up monitoring and caching.
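
The prompts-as-code step can be as simple as keeping prompts as data with a regression test pinning the parts the product depends on. A minimal sketch; the prompt name and wording are illustrative:

```python
# prompts.py -- lives in version control, reviewed like any other code
PROMPTS = {
    "summarize_v2": "Summarize the following text in one sentence:\n\n{text}",
}

def render(name: str, **kwargs) -> str:
    """Look up a prompt by name and fill in its variables."""
    return PROMPTS[name].format(**kwargs)

# Regression test: a one-word prompt edit can flip behavior, so pin the
# constraints the product relies on.
def test_summarize_stays_one_sentence():
    assert "one sentence" in render("summarize_v2", text="example")
```

A/B testing then becomes adding "summarize_v3" alongside "summarize_v2" and routing a fraction of traffic to it.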

Best Practices

  • Treat LLM outputs as probabilistic; add validation layers and risk checks.
  • Version, test, and document prompts; use VCS and regression tests.
  • Prefer RAG architecture for knowledge integration; only fine-tune when necessary.
  • Design for latency: streaming responses, pre-compute, and caching embeddings.
  • Monitor and optimize cost per query; cache results and choose appropriate models.
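
The model-choice and cost practices can be made mechanical with a small router. The model names, length threshold, and pricing below are placeholder assumptions for the sketch, not recommendations:

```python
def pick_model(prompt: str, needs_reasoning: bool) -> str:
    """Route to a cheaper model unless the task clearly needs the big one.
    Model names and the length threshold are illustrative placeholders."""
    if needs_reasoning or len(prompt) > 4000:
        return "large-model"
    return "small-model"

def estimate_cost(prompt: str, price_per_1k_chars: float) -> float:
    """Rough per-query cost estimate, useful for dashboards and alerts."""
    return len(prompt) / 1000 * price_per_1k_chars
```

Logging `estimate_cost` per query is what makes "cost is a feature" measurable rather than aspirational.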

Example Use Cases

  • Customer support bot using vector store retrieval and streaming responses.
  • Enterprise docs assistant with validated outputs and human-in-the-loop fallback.
  • Cost-aware prompt management, with prompts stored in version control and covered by tests.
  • Latency-optimized AI feature with streaming UI and embedding caching.
  • Hybrid model usage: GPT-4 for complex tasks, GPT-3.5 plus caching for simple ones.
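
The retrieval step behind the first two use cases reduces to nearest-neighbor search over embeddings. A toy cosine-similarity sketch; the three-dimensional vectors and document names are made up (real embeddings have hundreds of dimensions and come from a model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vector store: document -> embedding
DOCS = {
    "refund-policy.md": (0.9, 0.1, 0.0),
    "shipping-times.md": (0.1, 0.9, 0.0),
}

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are closest to the query."""
    return sorted(DOCS, key=lambda d: cosine(DOCS[d], query_vec), reverse=True)[:k]
```

The retrieved documents are then pasted into the prompt at query time, which is what keeps a RAG system current without retraining.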
